s Command Packet / s Command Response Packet

The s command packet and s command response packet are paired. When you notify the server of the start of voice data transmission with the s command, the server returns an s command response packet.

If the s command response packet is a single "s" character, the start request succeeded, and you can begin sending voice data with p command packets. When all voice data has been sent, signal the end of transmission with an e command packet. If the connection is still open after that, you can start again from the s command packet.
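The packet order described above can be sketched as a small state machine. This is only an illustration: the names `State`, `TRANSITIONS`, `next_state`, and `check_sequence` are made up for this example, and the sketch assumes every s command received a successful response.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"            # no transmission in progress; only "s" is allowed
    STREAMING = "streaming"  # "s" succeeded; "p" and "e" are allowed


# Allowed transitions, derived from the packet order described in the text.
TRANSITIONS = {
    (State.IDLE, "s"): State.STREAMING,
    (State.STREAMING, "p"): State.STREAMING,
    (State.STREAMING, "e"): State.IDLE,
}


def next_state(state: State, command: str) -> State:
    """Return the state after sending `command`, or raise if it is out of order."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            f"command {command!r} is not allowed in state {state.value}"
        ) from None


def check_sequence(commands: str) -> State:
    """Walk a sequence such as "sppe" and return the final state."""
    state = State.IDLE
    for command in commands:
        state = next_state(state, command)
    return state
```

For example, `check_sequence("sppesppe")` walks two complete transmissions on one connection, while `check_sequence("p")` raises because no s command was sent first.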

s Command Packet

Starts voice data transmission. In addition to notifying the start of transmission, you need to send the format of the voice to be transmitted, the speech recognition engine (connection engine name) you want to use, authentication information (API key), and other parameters.

Format

Type TEXT

s <audio_format> <grammar_file_names>
s <audio_format> <grammar_file_names> <key>=<value> ...

The delimiter between s and each parameter block is a single space. Details of each parameter are explained below.
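As a minimal sketch of the format above, the following helper joins the parameter blocks with single spaces and wraps any value that contains a space in double quotes. The function name `build_s_command` is an assumption made for this example. Values that themselves contain a double quote are not handled, because the format above does not describe an escaping rule.

```python
def build_s_command(audio_format: str, engine: str, **params: str) -> str:
    """Assemble an s command packet as a single line of text."""
    parts = ["s", audio_format, engine]
    for key, value in params.items():
        value = str(value)
        # A value that contains spaces must be enclosed in double quotes.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    # Every block is separated from the next by exactly one space.
    return " ".join(parts)


packet = build_s_command(
    "MSB16K",
    "-a-general",
    authorization="XXXXXXXXXXXXXX",
)
print(packet)
```

Keyword arguments keep their call order, so the `<key>=<value>` blocks appear in the packet in the order they were passed.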

<audio_format>

Specifies the format of the audio to be sent. This parameter is required. For the format names that can be specified, see About Audio Formats in the development guide.

<grammar_file_names>

Specifies the connection engine name. This parameter is required.

<key>

The following key strings can be specified for <key>. If <value> contains spaces, enclose the value in double quotes "".

<key>	Description
authorization	Specifies the API key: either the [API key] listed on MyPage or an API key obtained through the API Key Issuance API. This parameter is required.
profileId	The ID of the user-specific data file (profile) in which words are registered for the user dictionary. Only the owning user can use the specified profile. Profiles are stored in a user-specific area, so names never collide with those of other users. * `profileId` must consist of alphanumeric characters, "-" (hyphen), and "_" (underscore). Strings starting with "__" (two underscores) are reserved by the speech recognition engine and must not be specified. When you register words on the user dictionary registration screen on MyPage, a profile with the same name as the Service ID (generated automatically from the Account ID and listed in the connection information on MyPage) is created and saved on the server. To perform speech recognition with this profile, set `profileId` to the Service ID prefixed with a colon ":". (Example) If the Service ID is "aiueo12345", set `profileId` to ":aiueo12345". Note (User Dictionary): The easiest way to register words in the user dictionary is the user dictionary registration screen on MyPage. You can also use the user dictionary registration API or set words for each request. See How to Register User Dictionary for details on each method.
profileWords	Specifies the words to use for word registration or keyword biasing. For word registration with the Hybrid engine, write each word as "written<single-byte space>spoken". For keyword biasing with the End to End engine, write each word as "written<single-byte space>alternative_written<single-byte space>biasing_level". To register multiple words, separate them with "|". When sending, enclose the entire value of the `profileWords` parameter in double quotes. * To specify a class name for word registration, use "written<single-byte space>spoken<single-byte space>class_name". The alternative written form and the biasing level are optional for keyword biasing, but if you omit the alternative written form, use "written<single-byte space><single-byte space>biasing_level".
keepFillerToken	To keep filler words (such as "あー" or "えー") in the recognition result string, specify `keepFillerToken=1`. There are various filler words, but each is written with % at the beginning and end. (Example) "%あー%", "%えー%", "%おー%", "%えーっと%". If `keepFillerToken=1` is not specified, filler words are removed from the recognition result string.
segmenterProperties	Parameters for speech detection. They enable or disable speaker diarization and adjust its results. Multiple parameters can be set, separated by spaces: segmenterProperties="key1=value1 key2=value2...". The following parameters can be set. useDiarizer: Setting 1 enables speaker diarization. ・Specifiable values: 0 or 1 ・Default value: 0. diarizerAlpha: Controls how easily new speakers appear. The larger the value, the more easily new speakers appear; the smaller the value, the less easily. Only effective when `useDiarizer=1`. ・Specifiable values: 0 or more ・Recommended range: 1e-100 to 1e50 ・Default value: 1. diarizerTransitionBias: Controls how easily the speaker switches. The larger the value, the more easily speakers switch; the smaller the value, the less easily. Only effective when `useDiarizer=1`. ・Specifiable values: 0 or more and less than 1 ・Recommended range: 1e-150 to 1e-10 ・Default value: 1e-40 (1e-20 for 8k)
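The `segmenterProperties` value described in the table can be assembled with a helper like the one below. This is a sketch under stated assumptions: the function name and its arguments are invented for this example, and it assumes the server accepts numbers written in the same exponent notation the table uses (such as 1e-40). The range checks mirror the "Specifiable values" in the table.

```python
from typing import Optional


def build_segmenter_properties(
    use_diarizer: bool = False,
    diarizer_alpha: Optional[float] = None,
    diarizer_transition_bias: Optional[float] = None,
) -> str:
    """Return a quoted segmenterProperties="..." block for the s command."""
    if diarizer_alpha is not None and diarizer_alpha < 0:
        raise ValueError("diarizerAlpha must be 0 or more")
    if diarizer_transition_bias is not None and not 0 <= diarizer_transition_bias < 1:
        raise ValueError("diarizerTransitionBias must be 0 or more and less than 1")

    pairs = [f"useDiarizer={1 if use_diarizer else 0}"]
    # The tuning parameters only take effect when useDiarizer=1,
    # so they are left out when diarization is disabled.
    if use_diarizer:
        if diarizer_alpha is not None:
            pairs.append(f"diarizerAlpha={diarizer_alpha:g}")
        if diarizer_transition_bias is not None:
            pairs.append(f"diarizerTransitionBias={diarizer_transition_bias:g}")
    # The whole value is quoted because the pairs are separated by spaces.
    return 'segmenterProperties="' + " ".join(pairs) + '"'
```

The returned string can be appended to an s command after the connection engine name, separated by a single space.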

s command to use words registered on MyPage

s MSB16K -a-general profileId=:<ServiceID> authorization=XXXXXXXXXXXXXX

s command to temporarily register words for this session

s MSB16K -a-general profileWords="AMI あみ|AmiVoice あみぼいす" authorization=XXXXXXXXXXXXXX

In the above example, profileId is not specified, and two words, "AMI あみ" and "AmiVoice あみぼいす", are registered. As long as you continue to send voice data within this session, these words are used in the recognition process. After the e command packet is sent (end of session), these words become invalid and are not saved.
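A `profileWords` value like the one in the example above can be built from a list of entries, as in this sketch. The function name `build_profile_words` is an assumption for illustration. Rejecting fields that contain a space, "|", or a double quote is a conservative choice made here, since no escaping rule for those characters is documented.

```python
from typing import Iterable, Sequence


def build_profile_words(entries: Iterable[Sequence[str]]) -> str:
    """Return a quoted profileWords="..." block for the s command."""
    words = []
    for fields in entries:
        for field in fields:
            # A space separates fields, "|" separates entries, and '"' would
            # end the quoted value early, so none may appear inside a field.
            if " " in field or "|" in field or '"' in field:
                raise ValueError(f"unsupported character in field: {field!r}")
        # Fields are joined by one single-byte space. An empty field (for
        # example an omitted alternative written form) leaves two spaces in a row.
        words.append(" ".join(fields))
    return 'profileWords="' + "|".join(words) + '"'


block = build_profile_words([("AMI", "あみ"), ("AmiVoice", "あみぼいす")])
print(block)
```

Each entry is a tuple of fields in the order the engine expects, so the same helper covers "written spoken", "written spoken class_name", and the keyword biasing forms.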

How to use the user dictionary

To perform speech recognition using words registered in a profile, specify the profileId of the profile in which the user dictionary was previously registered, prefixed with ":" (colon), when sending the s command packet. If multiple members share the profile and profileId is specified without the leading ":", recognition accuracy may decrease.

s command to use a custom profile for speech recognition

s MSB16K -a-general profileId=:test authorization=XXXXXXXXXXXXXX
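Before sending a `profileId`, a client may want to check it against the naming rules given in the `<key>` table. The sketch below is one possible check, not part of the API. It assumes that "alphanumeric" means ASCII letters and digits, and that the leading ":" used in the examples is a prefix that is exempt from the character rule. The function name is made up for this example.

```python
import re

# ASCII letters, digits, hyphen, and underscore (assumed reading of the rule).
_PROFILE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_profile_id(value: str) -> bool:
    """Check a profileId value against the documented naming rules."""
    # Treat a leading ":" as a prefix rather than part of the name (assumption).
    name = value[1:] if value.startswith(":") else value
    if name.startswith("__"):
        return False  # names starting with "__" are reserved by the engine
    return _PROFILE_NAME.fullmatch(name) is not None
```

With this check, ":test" and ":aiueo12345" pass, while "__internal" and "my profile" are rejected.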

s Command Response Packet

This is sent from the server to the client in response to the s command.

Format

Type TEXT

Response packet for successful start request

If the start request is successful, a single s character is returned.

Response packet for failed start request

If the start request fails, an error message is returned after s with a single space in between.

s <error_message>
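The two response forms above can be told apart with a few lines of code. In this sketch the function name `parse_s_response` is an assumption, and the packet is expected to arrive as text without a trailing newline.

```python
from typing import Optional


def parse_s_response(packet: str) -> Optional[str]:
    """Return None on success, or the error message on failure."""
    if packet == "s":
        return None  # a single "s" means the start request succeeded
    if packet.startswith("s "):
        return packet[2:]  # everything after "s " is the error message
    raise ValueError(f"not an s command response packet: {packet!r}")
```

A caller can start sending p command packets when the result is `None` and otherwise inspect the returned message.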

Error Messages

Client Errors

These are errors due to incorrect request parameters or authentication information in the s command. Please correct and resend the request.

Error Message	Content
s received unsupported audio format	There was an error in the specified audio format.
s can't verify service authorization	Authentication failed. This is due to one of the following reasons: - The API key is not set - The API key that was set is incorrect - The request came from an IP address that is not allowed by the Restrictions assigned to the API key
s can't validate service authorization	Authentication failed. This is due to one of the following reasons: - The configured API key is incorrect (including cases where the account is disabled) - The specified connection engine name is incorrect (e.g., typos like writing `a-general` instead of `-a-general`, or specifying an engine exclusive to AmiVoice API Private without having a contract for it)
s service authorization has expired: <expirationTime> <expiresIn>	The expiration time set for the API key has passed.
s can't connect to recognizer server	Authentication failed. The API Key is invalid.
s can't connect to recognizer server (can't find available servers)	Connection failed because the combination of the Sampling Rate in the audio format and the specified connection engine name is invalid, and a suitable engine could not be found. For example, this may occur when an 8k audio format is specified for an engine that does not support an 8k sampling rate. For details on the sampling rates supported by each speech recognition engine, please see the List of Speech Recognition Engines.
s can't start feeding audio data to recognizer server	The process to start sending audio data failed due to an error in the specified segmenter parameters.

Server Errors

These are errors that may rarely occur due to infrastructure system failures. Please wait for a while and resend the request.

Error Message	Content
s can't connect to recognizer server (can't connect to server)	Could not connect to the speech recognition server.
s can't connect to recognizer server (can't find available servers because all requested servers are busy)	Could not connect because all appropriate speech recognition servers for the specified connection engine name or audio format were busy.
s can't connect to recognizer server (can't find available servers because maximum allowed clients has reached)	Could not connect to the speech recognition server because the maximum number of connectable clients has been reached.
s can't connect to recognizer server (can't send data)	Connection failed due to communication error between servers in the infrastructure system.
s can't connect to recognizer server (can't receive data)	Connection failed due to communication error between servers in the infrastructure system.
s can't connect to recognizer server (disconnected by force)	Connection failed due to communication error between servers in the infrastructure system.

Errors Due to Limitations

These occur when limitations are violated. Please retry from the s command request.

Error Message	Content
s session timeout occurred	A session timeout occurred. This occurs when the maximum session time in the limitations is exceeded. The server has initiated the disconnection process.
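Because each error group calls for a different reaction (fix the request, wait and resend, or retry from the s command), a client can map a failed response to its group. The sketch below assumes the server sends the messages exactly as they are written in the tables above. The names `classify_s_error`, `CLIENT_ERRORS`, `SERVER_ERRORS`, and `LIMIT_ERRORS` are invented for this example.

```python
CLIENT_ERRORS = {
    "received unsupported audio format",
    "can't verify service authorization",
    "can't validate service authorization",
    "can't connect to recognizer server",
    "can't connect to recognizer server (can't find available servers)",
    "can't start feeding audio data to recognizer server",
}
SERVER_ERRORS = {
    "can't connect to recognizer server (can't connect to server)",
    "can't connect to recognizer server (can't find available servers"
    " because all requested servers are busy)",
    "can't connect to recognizer server (can't find available servers"
    " because maximum allowed clients has reached)",
    "can't connect to recognizer server (can't send data)",
    "can't connect to recognizer server (can't receive data)",
    "can't connect to recognizer server (disconnected by force)",
}
LIMIT_ERRORS = {"session timeout occurred"}
# This message carries variable fields after the colon, so match by prefix.
EXPIRED_PREFIX = "service authorization has expired:"


def classify_s_error(packet: str) -> str:
    """Map an "s <error_message>" packet to its error group."""
    message = packet[2:] if packet.startswith("s ") else packet
    if message in CLIENT_ERRORS or message.startswith(EXPIRED_PREFIX):
        return "client"      # correct the request, then resend
    if message in SERVER_ERRORS:
        return "server"      # wait for a while, then resend
    if message in LIMIT_ERRORS:
        return "limitation"  # retry from the s command
    return "unknown"
```

Exact matching matters here because several client and server errors share the prefix "can't connect to recognizer server" and differ only in the parenthesized detail.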
