Request Parameters
This section explains the parameters that can be set when requesting speech recognition with the AmiVoice API. Although the transmission method differs between the HTTP and WebSocket interfaces, the parameters that can be set are the same.
List of Parameters
authorization (authentication information) and grammarFileNames (connected engine name) are required. All other parameters are optional. Some parameters are not supported by every interface; see the table below.
Parameter Name | Description | Required | Sync HTTP | WebSocket | Async HTTP |
---|---|---|---|---|---|
authorization | Authentication information | ● | ● | ● | ● |
grammarFileNames | Connected engine name | ● | ● | ● | ● |
profileId | Profile ID | | ● | ● | ● |
profileWords | Word registration list | | ● | ● | ● |
keepFillerToken | Suppression of automatic filler word removal | | ● | ● | ● |
segmenterProperties | Parameters for speech segment detection | | ● | ● | |
resultUpdatedInterval | Interval for recognition in progress events | | | ● | |
loggingOptOut | Change logging or no logging settings | | | | ● |
contentId | User-defined ID | | | | ● |
compatibleWithSync | Result format compatibility | | | | ● |
speakerDiarization | Speaker diarization enable option | | | | ● |
diarizationMinSpeaker | Minimum estimated number of speakers for diarization | | | | ● |
diarizationMaxSpeaker | Maximum estimated number of speakers for diarization | | | | ● |
sentimentAnalysis | Sentiment analysis enable option | | | | ● |
For information on how to send these request parameters, please see the documentation for each interface (synchronous HTTP, WebSocket, and asynchronous HTTP).
Parameter Details
The following sections explain the details of each parameter.
Required Parameters
authorization
Authentication information
Authentication information must be set to use the API. The authentication information is the [APPKEY] listed on your My Page, or the one-time APPKEY obtained from the One-time APPKEY Issuance API.
When connecting to the speech recognition server from a browser application, please use a one-time APPKEY to avoid writing the APPKEY in the HTML file. For details, please see One-time APPKEY.
grammarFileNames
Connected engine name
Specify the speech recognition engine to use for the session. Only one engine can be specified per session. The values that can be set are listed in the Connected Engine Name table or on your My Page. For details, please see Speech Recognition Engines.
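As a reference, the sketch below shows how the two required parameters might be sent in a single synchronous HTTP request from Python. The endpoint URL and the multipart field names ("u" for the APPKEY, "d" for the request parameters, "a" for the audio) are assumptions for illustration; confirm the exact request format in the synchronous HTTP interface documentation.

```python
import requests

# Assumed synchronous HTTP endpoint; verify against the interface documentation.
URL = "https://acp-api.amivoice.com/v1/recognize"

# The two required parameters: authorization (APPKEY) and grammarFileNames (engine name).
appkey = "YOUR_APPKEY"                    # or a one-time APPKEY for browser applications
d_value = "grammarFileNames=-a-general"   # connected engine name

with open("sample.wav", "rb") as f:
    response = requests.post(
        URL,
        data={"u": appkey, "d": d_value},  # assumed field names
        files={"a": ("sample.wav", f, "application/octet-stream")},
    )

print(response.json())  # recognition result (JSON)
```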
Optional Parameters
profileId
Profile ID
A profile is a user-specific data file that exists on the speech recognition server, where users can name and save registered words. The profile ID is an identifier to specify that data file. For details, please see Word Registration.
profileWords
Word registration list
You can register words that are valid only for the session. Each word is registered in the format "notation<single-byte space>reading". If specifying a class name, use "notation<single-byte space>reading<single-byte space>class name". When registering multiple words, separate them with "|" (a single-byte vertical bar). The value format is as follows (example without class names):
notation1 reading1|notation2 reading2|notation3 reading3|notation4 reading4
For details, please see Word Registration.
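The snippet below sketches how a profileWords value might be assembled and packed into the request parameter string. The word list format follows the description above; the "d" parameter packing and the URL encoding are assumptions for illustration.

```python
from urllib.parse import quote

# Hypothetical word list; each entry is "notation<single-byte space>reading"
# (append a third space-separated field for a class name if needed).
words = [
    "SDGs エスディージーズ",
    "音声認識 おんせいにんしき",
]

# Entries are joined with a single-byte vertical bar, as described above.
profile_words = "|".join(words)

# Assumption: the value is URL-encoded when embedded in the request parameter
# string, because it contains spaces and multibyte characters.
d_value = "grammarFileNames=-a-general profileWords=" + quote(profile_words)
print(d_value)
```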
keepFillerToken
Suppression of automatic filler word removal
Specify 1 or 0. The default is 0. Specify 1 if you do not want filler words (such as "あー" or "えー") to be automatically removed from the speech recognition results. Also see Automatic Filler Word Removal.
Filler words are surrounded by single-byte "%" before and after the word. Here are examples of filler words:
%あー%
%えー%
%おー%
%えっと%
Please also see the AmiVoice Tech Blog article How to choose whether to display or remove unnecessary words (fillers) with AmiVoice API.
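When keepFillerToken=1 is set, fillers remain in the result text wrapped in single-byte "%", as shown above. The sketch below shows one way to post-process such text in Python, either dropping the fillers or keeping the words while removing the markers; the sample sentence is made up for illustration.

```python
import re

# Hypothetical recognition result containing filler tokens.
text = "%えー%本日は%えっと%よろしくお願いします"

# Drop the fillers entirely.
without_fillers = re.sub(r"%[^%]*%", "", text)

# Or keep the filler words but strip the "%" markers.
markers_stripped = re.sub(r"%([^%]*)%", r"\1", text)

print(without_fillers)   # 本日はよろしくお願いします
print(markers_stripped)  # えー本日はえっとよろしくお願いします
```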
segmenterProperties
Parameters for speech segment detection
The following parameters can be set:
useDiarizer
- Setting this to 1 enables speaker diarization in the synchronous HTTP and WebSocket interfaces. It is disabled by default. For details, please see Speaker Diarization.

diarizerAlpha
- Controls how readily new speakers appear in speaker diarization for the synchronous HTTP and WebSocket interfaces. Larger values make new speakers more likely to appear, while smaller values make them less likely. diarizerAlpha=0 is special and is treated as if 1e0 (i.e., 1) were specified. If not set, it is treated as if diarizerAlpha=0 were specified.

diarizerTransitionBias
- Controls how readily speaker transitions occur in speaker diarization for the synchronous HTTP and WebSocket interfaces. Larger values make speaker transitions more likely, while smaller values make them less likely. diarizerTransitionBias=0 is special and is treated as if 1e-40 were specified. However, for engines supporting 8kHz audio, such as the general-purpose engine (-a-general), it is treated as if 1e-20 were specified when 8kHz audio is sent. If not set, it is treated as if diarizerTransitionBias=0 were specified.
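The sketch below shows one way these sub-parameters might be combined into a segmenterProperties value. The space-separated key=value packing, the URL encoding, and the specific numeric values are assumptions for illustration; see the interface documentation and Speaker Diarization for the exact format.

```python
from urllib.parse import quote

# Hypothetical sub-parameter values; tune diarizerAlpha / diarizerTransitionBias
# for your audio, or omit them to use the defaults described above.
segmenter = "useDiarizer=1 diarizerAlpha=1e-10 diarizerTransitionBias=1e-20"

# Assumption: the value is URL-encoded when embedded in the request parameter string.
d_value = "grammarFileNames=-a-general segmenterProperties=" + quote(segmenter)
print(d_value)
```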
WebSocket API Specific Parameters
resultUpdatedInterval
Interval for recognition in progress events
Specifies the interval in milliseconds at which recognition in progress events are issued.
- If set to 0, recognition in progress events are not issued.
- Recognition in progress events are issued each time the specified amount of audio data has been processed. They are based on the amount of audio processed, not on the actual elapsed time. If the value is not a multiple of 100, it is rounded up to the next multiple of 100.
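As a small illustration of the rounding rule above, the helper below computes the interval that would effectively apply for a given setting (0 means recognition in progress events are not issued).

```python
import math

def effective_interval(result_updated_interval_ms: int) -> int:
    """Round a resultUpdatedInterval value up to the next multiple of 100 ms,
    as described above; 0 disables recognition in progress events."""
    return math.ceil(result_updated_interval_ms / 100) * 100

print(effective_interval(1000))  # 1000 -> 1000
print(effective_interval(1250))  # 1250 -> treated as 1300
print(effective_interval(0))     # 0 -> recognition in progress events are not issued
```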
Asynchronous HTTP Interface Specific Parameters
loggingOptOut
Change logging or no logging settings
loggingOptOut=<True|False>
Specifies whether to disable logging. When set to True, the server does not retain logs for the session. The default is False.
contentId
User-defined ID
contentId=<arbitrary string>
You can specify an arbitrary string defined on the user side. It is included in the status and result responses for that session. By default, no contentId is set.
compatibleWithSync
Result format compatibility
compatibleWithSync=<True|False>
Formats the results in a way compatible with the synchronous HTTP interface. The default is False.
speakerDiarization
Speaker diarization enable option
speakerDiarization=<True|False>
Enables speaker diarization. The default is False. For details, please see Speaker Diarization.
diarizationMinSpeaker
Minimum estimated number of speakers for diarization
diarizationMinSpeaker=<int>
Valid only when speaker diarization is enabled; specifies the minimum estimated number of speakers in the audio. The value must be 1 or higher. The default is 1. For details, please see Speaker Diarization.
diarizationMaxSpeaker
Maximum estimated number of speakers for diarization
diarizationMaxSpeaker=<int>
Valid only when speaker diarization is enabled; specifies the maximum estimated number of speakers in the audio. The value must be greater than or equal to diarizationMinSpeaker. The default is 10. For details, please see Speaker Diarization.
sentimentAnalysis
Sentiment analysis enable option
sentimentAnalysis=<True|False>
Enables sentiment analysis. The default is False. For details, please see Sentiment Analysis.
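The sketch below combines the asynchronous-HTTP-specific parameters above into a single request from Python. The endpoint URL, the multipart field names ("u", "d", "a"), and the contentId value are assumptions for illustration; see the asynchronous HTTP interface documentation for the actual request format.

```python
import requests

# Assumed asynchronous HTTP endpoint; verify against the interface documentation.
URL = "https://acp-api-async.amivoice.com/v1/recognitions"

# Asynchronous-HTTP-specific parameters described above, plus the required engine name.
d_value = " ".join([
    "grammarFileNames=-a-general",
    "loggingOptOut=True",        # do not retain logs for this session
    "contentId=meeting-0001",    # hypothetical user-defined ID
    "compatibleWithSync=False",
    "speakerDiarization=True",
    "diarizationMinSpeaker=2",
    "diarizationMaxSpeaker=6",
    "sentimentAnalysis=True",
])

with open("meeting.wav", "rb") as f:
    response = requests.post(
        URL,
        data={"u": "YOUR_APPKEY", "d": d_value},  # assumed field names
        files={"a": ("meeting.wav", f, "application/octet-stream")},
    )

# The response typically contains an identifier used to poll for the job's
# status and result; see the asynchronous HTTP interface documentation.
print(response.json())
```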