s Command Packet / s Command Response Packet
The s command packet and s command response packet are paired. When you notify the server of the start of voice data transmission with the s command, the server returns an s command response packet.
If the s command response packet is a single "s" character, it indicates success. You can then begin supplying voice data with p command packets. When all voice data has been sent, notify the server of the end of transmission with an e command packet. If the connection is still open after that, you can start again from the s command packet.
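The ordering described above (s, then p packets, then e, then optionally s again) can be sketched as a small state tracker. This is an illustration only: the class and method names are hypothetical, it does not perform any network I/O, and it assumes that a failed s response leaves the connection usable for another s command.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # connected, no transmission in progress
    STARTING = auto()    # s command sent, waiting for the s response
    STREAMING = auto()   # s succeeded, p commands are allowed


class SessionOrder:
    """Tracks which command may be sent next on one connection."""

    def __init__(self):
        self.state = State.IDLE

    def send_s(self):
        if self.state is not State.IDLE:
            raise RuntimeError("s is only valid when no transmission is in progress")
        self.state = State.STARTING

    def on_s_response(self, text):
        if self.state is not State.STARTING:
            raise RuntimeError("unexpected s response")
        # A bare "s" means success; anything else carries an error message.
        # Assumption: after a failure the client may send s again.
        self.state = State.STREAMING if text == "s" else State.IDLE
        return self.state is State.STREAMING

    def send_p(self):
        if self.state is not State.STREAMING:
            raise RuntimeError("p requires a successful s response first")

    def send_e(self):
        if self.state is not State.STREAMING:
            raise RuntimeError("e requires an active transmission")
        self.state = State.IDLE
```

A real client would call these methods alongside its actual send and receive calls, so that an out-of-order command is caught locally before it reaches the server.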
s Command Packet
Starts voice data transmission. In addition to notifying the start of transmission, you need to send the format of the voice to be transmitted, the speech recognition engine (connection engine name) you want to use, authentication information (APPKEY), and other parameters.
Format
Type TEXT
s <audio_format> <grammar_file_names>
s <audio_format> <grammar_file_names> <key>=<value> ...
The delimiter between s and each parameter block is a single space. Details of each parameter are explained below.
<audio_format>
Specifies the audio format to be sent. This parameter is required. For the format names that can be specified, see About Audio Formats in the development guide.
<grammar_file_names>
Specifies the connection engine name. This parameter is required.
<key>
The following key strings can be specified for <key>. If <value> contains spaces, enclose the value in double quotes (").
| <key> | Description |
|---|---|
| authorization | Set the [APPKEY] listed on MyPage or the one-time APPKEY obtained from the One-time APPKEY Issuance API for authorization. This parameter is required. |
| profileId | The ID of the user-specific data file (profile) in which user-specific words are registered as a user dictionary. Only that user can use the specified profile. Profiles are stored in a user-specific area, so names do not collide with those of other users. For profileId, specify a string consisting of alphanumeric characters, "-" (hyphen), and "_" (underscore). Strings starting with "__" (two underscores) are reserved by the speech recognition engine, so do not specify them. When you register words on the user dictionary registration screen on MyPage, a profile with the same name as the Service ID (automatically generated from the Account ID) listed in the connection information on MyPage is automatically created and saved on the server. To perform speech recognition using this profile, set profileId to the Service ID prefixed with a colon ":". (Example) If the Service ID is "aiueo12345", set ":aiueo12345" as the value for profileId. User dictionary: the easiest way to register words in the user dictionary is through the user dictionary registration screen on MyPage. Alternatively, you can use the user dictionary registration API or specify words for each request. See How to Register User Dictionary for details on each method. |
| profileWords | Sets the words to be used for word registration or keyword biasing. For word registration with the Hybrid engine, enter "written<single-byte space>pronunciation"; for keyword biasing with the End to End engine, enter "written<single-byte space>alternative_written<single-byte space>biasing_level". To register multiple words, separate them with "\|" (vertical bar). When sending, enclose the entire value of the profileWords parameter in double quotes. To specify a class name for word registration, use "written<single-byte space>pronunciation<single-byte space>class_name". For keyword biasing, the alternative written form and the biasing level are optional; to specify a biasing level without an alternative written form, use "written<single-byte space><single-byte space>biasing_level". |
| keepFillerToken | To include filler words (such as "あー" or "えー") in the recognition result string, specify keepFillerToken=1. There are various filler words, but each is written with % at the beginning and end. (Example) "%あー%", "%えー%", "%おー%", "%えーっと%". If keepFillerToken=1 is not specified, filler words are removed from the recognition result string. |
| segmenterProperties | Parameters for speech detection. They enable or disable speaker diarization and adjust its results. Multiple parameters can be set, separated by spaces: segmenterProperties="key1=value1 key2=value2...". The following parameters can be set. useDiarizer: setting 1 enables speaker diarization. Specifiable values: 0 or 1. Default value: 0. diarizerAlpha: controls how easily new speakers appear. The larger the value, the more easily new speakers appear; the smaller the value, the less easily they appear. Effective only when useDiarizer=1. Specifiable values: 0 or more. Recommended range: 1e-100 to 1e50. Default value: 1. diarizerTransitionBias: controls how easily the speaker switches. The larger the value, the more easily speakers switch; the smaller the value, the less easily they switch. Effective only when useDiarizer=1. Specifiable values: 0 or more and less than 1. Recommended range: 1e-150 to 1e-10. Default value: 1e-40 (1e-20 for 8k). |
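The format and quoting rules above can be sketched as a small helper that assembles the command line. The names `build_s_command` and `quote_value` are hypothetical and not part of any client library, and the `diarizerAlpha` value is an arbitrary illustration taken from within the recommended range. The sketch does not handle values that themselves contain double quotes, since the text above does not describe how those are escaped.

```python
def quote_value(value):
    """Wrap a value in double quotes when it contains a space."""
    value = str(value)
    return f'"{value}"' if " " in value else value


def build_s_command(audio_format, engine, **params):
    """Assemble an s command line from its parameter blocks."""
    if "authorization" not in params:
        raise ValueError("authorization is required")
    # Blocks are separated by a single space, starting with the literal "s".
    blocks = ["s", audio_format, engine]
    blocks += [f"{key}={quote_value(val)}" for key, val in params.items()]
    return " ".join(blocks)


# Illustrative values only; XXXXXXXXXXXXXX stands in for a real APPKEY.
segmenter = "useDiarizer=1 diarizerAlpha=1e-20"
print(build_s_command(
    "MSB16K", "-a-general",
    segmenterProperties=segmenter,
    keepFillerToken=1,
    authorization="XXXXXXXXXXXXXX",
))
```

Keyword arguments keep their call order, so the key=value blocks appear in the order they were passed.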
s command to use words registered on MyPage
s MSB16K -a-general profileId=:<ServiceID> authorization=XXXXXXXXXXXXXX
s command to temporarily register words for this session
s MSB16K -a-general profileWords="AMI あみ|AmiVoice あみぼいす" authorization=XXXXXXXXXXXXXX
In the above example, profileId is not specified, and "AMI あみ|AmiVoice あみぼいす" (2 words) are registered. If you continue to send voice data within this session, these words will be used in the recognition process. After sending the e command packet (end of session), these words become invalid and are not saved.
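As a rough sketch, the profileWords value in the example above could be built from individual entries like this. The helper names are hypothetical, entries are joined with "|" as in that example, and the biasing level is passed through as given because its allowed values are not described here.

```python
def hybrid_word(written, pronunciation, class_name=None):
    """One word-registration entry: written, pronunciation, optional class name."""
    fields = [written, pronunciation]
    if class_name is not None:
        fields.append(class_name)
    return " ".join(fields)


def biasing_word(written, alternative="", level=None):
    """One keyword-biasing entry: written, optional alternative and biasing level."""
    if level is None:
        # No biasing level: drop the trailing space left by an empty alternative.
        return f"{written} {alternative}".rstrip()
    # With the alternative omitted this yields two consecutive spaces.
    return f"{written} {alternative} {level}"


def profile_words(entries):
    """Join entries with "|" and enclose the whole value in double quotes."""
    return 'profileWords="' + "|".join(entries) + '"'


print(profile_words([
    hybrid_word("AMI", "あみ"),
    hybrid_word("AmiVoice", "あみぼいす"),
]))
```

The printed block can be placed between the connection engine name and the authorization parameter of the s command.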
How to use the user dictionary
To perform speech recognition using words registered in a profile, send the s command packet with profileId set to the ID of the profile in which the user dictionary was registered, prefixed with ":" (colon).
If multiple members are using it and profileId is specified without adding ":" to the beginning, recognition accuracy may decrease.
s command to use a custom profile for speech recognition
s MSB16K -a-general profileId=:test authorization=XXXXXXXXXXXXXX
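Before sending, a client might check a profileId against the naming rules given for that parameter. The following is a minimal sketch with a hypothetical function name. It assumes that "alphanumeric" means ASCII letters and digits, that at most one leading ":" is allowed, and that the "__" reservation applies to the name after the colon; none of these details are confirmed by the text above.

```python
import re

# Letters, digits, "-" and "_", with an optional leading ":" as in the examples.
_PROFILE_ID = re.compile(r":?[A-Za-z0-9_-]+")


def check_profile_id(profile_id):
    """Return True if profile_id follows the rules described for profileId."""
    if not _PROFILE_ID.fullmatch(profile_id):
        return False
    # Names starting with "__" are reserved by the speech recognition engine.
    return not profile_id.lstrip(":").startswith("__")
```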
s Command Response Packet
This is sent from the server to the client in response to the s command.
Format
Type TEXT
Response packet for successful start request
If the start request is successful, a single s character is returned.
s
Response packet for failed start request
If the start request fails, an error message is returned after s with a single space in between.
s <error_message>
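The two response forms can be told apart with a few lines of code. This is a sketch with a hypothetical function name; it assumes the packet arrives as a text string with no trailing newline.

```python
def parse_s_response(packet):
    """Return (ok, error_message) for an s command response packet."""
    if packet == "s":
        # A single "s" character means the start request succeeded.
        return True, None
    if packet.startswith("s "):
        # On failure the error message follows "s" and a single space.
        return False, packet[2:]
    raise ValueError(f"not an s command response: {packet!r}")
```

For example, `parse_s_response("s received unsupported audio format")` returns `(False, "received unsupported audio format")`.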
Error Messages
Client Errors
These are errors due to incorrect request parameters or authentication information in the s command. Please correct and resend the request.
| Error Message | Content |
|---|---|
| s received unsupported audio format | There was an error in the specified audio format. |
| s can't verify service authorization | Authentication failed. This is due to one of the following reasons: - APPKEY is not set - The set APPKEY is incorrect - Accessed from an IP address not allowed by the One-time APPKEY |
| s can't validate service authorization | Authentication failed. This is due to one of the following reasons: - The configured APPKEY is incorrect (including cases where the account is disabled) - The specified connection engine name is incorrect (e.g., typos like writing a-general instead of -a-general, or specifying an engine exclusive to AmiVoice API Private without having a contract for it) |
| s service authorization has expired: <expirationTime> <expiresIn> | The expiration time defined by the One-time APPKEY has expired. |
| s can't connect to recognizer server | Authentication failed. The One-time APPKEY is invalid. |
| s can't connect to recognizer server (can't find available servers) | Connection failed because the combination of the Sampling Rate in the audio format and the specified connection engine name is invalid, and a suitable engine could not be found. For example, this may occur when an 8k audio format is specified for an engine that does not support an 8k sampling rate. For details on the sampling rates supported by each speech recognition engine, please see the List of Speech Recognition Engines. |
| s can't start feeding audio data to recognizer server | The process to start sending audio data failed due to an error in the specified segmenter parameters. |
Server Errors
These are errors that may rarely occur due to infrastructure system failures. Please wait for a while and resend the request.
| Error Message | Content |
|---|---|
| s can't connect to recognizer server (can't connect to server) | Could not connect to the speech recognition server. |
| s can't connect to recognizer server (can't find available servers because all requested servers are busy) | Could not connect because all appropriate speech recognition servers for the specified connection engine name or audio format were busy. |
| s can't connect to recognizer server (can't find available servers because maximum allowed clients has reached) | Could not connect to the speech recognition server because the maximum number of connectable clients has been reached. |
| s can't connect to recognizer server (can't send data) | Connection failed due to communication error between servers in the infrastructure system. |
| s can't connect to recognizer server (can't receive data) | Connection failed due to communication error between servers in the infrastructure system. |
| s can't connect to recognizer server (disconnected by force) | Connection failed due to communication error between servers in the infrastructure system. |
Errors Due to Limitations
These occur when limitations are violated. Please retry from the s command request.
| Error Message | Content |
|---|---|
| s session timeout occurred | A session timeout occurred. This occurs when the maximum session time in the limitations is exceeded. The server has initiated the disconnection process. |
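Because the three error groups call for different handling (fix the request, wait and resend, or retry from the s command), a client may want to classify a failed response first. The sketch below uses a hypothetical function name and assumes the server sends the messages exactly as listed in the tables above; anything else is reported as "unknown".

```python
# Parenthesized details that the tables list under Server Errors.
SERVER_DETAILS = (
    "(can't connect to server)",
    "(can't find available servers because all requested servers are busy)",
    "(can't find available servers because maximum allowed clients has reached)",
    "(can't send data)",
    "(can't receive data)",
    "(disconnected by force)",
)

CLIENT_PREFIXES = (
    "s received unsupported audio format",
    "s can't verify service authorization",
    "s can't validate service authorization",
    "s service authorization has expired:",
    "s can't start feeding audio data to recognizer server",
)


def classify_s_error(packet):
    """Map a failed s response to "client", "server", "limitation" or "unknown"."""
    if packet == "s session timeout occurred":
        return "limitation"
    prefix = "s can't connect to recognizer server"
    if packet.startswith(prefix):
        detail = packet[len(prefix):].strip()
        if detail in SERVER_DETAILS:
            return "server"
        # The bare message and "(can't find available servers)" are client errors.
        if detail in ("", "(can't find available servers)"):
            return "client"
        return "unknown"
    return "client" if packet.startswith(CLIENT_PREFIXES) else "unknown"
```

A caller could then resend after a delay only for "server", and surface "client" errors to the user for correction.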