Response Codes and Messages
This section explains the responses when processing fails.
HTTP Interface
Synchronous HTTP and Asynchronous HTTP Speech Recognition Job Creation Request
When a speech recognition request fails, the response code and an error message indicating the cause of the failure are set in `code` and `message` at the top level of the speech recognition result. For details, please see the table in Response Codes and Messages Details.
Example:
{
  "results": [
    {
      "tokens": [],
      "tags": [],
      "rulename": "",
      "text": ""
    }
  ],
  "text": "",
  "code": "-",
  "message": "received illegal service authorization"
}
For synchronous HTTP, when recognition succeeds, `code` and `message` are empty strings (`""`).
Example:
...
"utteranceid": "20220602/14/018122d65d370a30116494c8_20220602_141442",
"text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。",
"code": "",
"message": ""
...
For asynchronous HTTP, when the job creation request succeeds, `code` and `message` are not returned.
Example:
{"sessionid":"017ac8786c5b0a0504399999","text":"..."}
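The rules above can be sketched as a small Python helper that inspects the top level of a parsed response. The names `check_recognition_response` and `RecognitionError` are hypothetical and not part of the AmiVoice API. The sketch relies only on what this section states: a failure sets a non-empty `code`, a synchronous success returns empty strings, and a successful asynchronous job creation omits both fields.

```python
import json


class RecognitionError(Exception):
    """Hypothetical exception carrying the response code and message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_recognition_response(body):
    """Return the parsed response, or raise RecognitionError on failure.

    Assumes (per this section) that a missing or empty "code" means success.
    """
    data = json.loads(body)
    code = data.get("code", "")
    if code:
        raise RecognitionError(code, data.get("message", ""))
    return data
```

Treating a missing `code` the same as an empty one lets a single helper cover both the synchronous response and the asynchronous job creation response.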
Asynchronous HTTP Result and Status Retrieval
For asynchronous HTTP, even if the speech recognition request succeeds, processing may be interrupted for some reason while the speech recognition job is running. In that case, when you retrieve the result and status, `status` becomes `error` and `error_message` is set to a message indicating the cause of the failure. `error_message` may include a response code (`dsrh_code`) and an output message (`amineth_message`) indicating why the speech recognition process failed. For the meaning of response codes and messages, please see the table in Response Codes and Messages Details.
Example:
{
  "status": "error",
  "audio_md5": "40f59fe5fc7745c33b33af44be43f6ad",
  "audio_size": 306980,
  "service_id": "{YOUR_SERVICE_ID}",
  "session_id": "017c25ec12c00a304474a999",
  "error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}
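If you need the individual fields rather than the whole string, a minimal sketch like the following can pull the `key=value` pairs out of `error_message`. It assumes the layout shown in the example above, which this section does not guarantee for every error. The function name `parse_error_message` is hypothetical, and values containing a single quote are not handled.

```python
import re

# Matches key=value pairs such as amineth_code='o' or amineth_result=0.
# Assumption: the layout follows the example above; other layouts yield {}.
_PAIR = re.compile(r"(\w+)=(?:'([^']*)'|([^,\s]+))")


def parse_error_message(error_message):
    """Extract key/value pairs embedded in an error_message string.

    All values are returned as strings, whether or not they were quoted.
    """
    fields = {}
    for key, quoted, bare in _PAIR.findall(error_message):
        fields[key] = quoted if quoted else bare
    return fields
```

The extracted `amineth_code` can then be looked up in the Response Codes and Messages Details table.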
Client Errors
When the response code in the Response Codes and Messages Details table is `>`, `o`, `+`, `-`, or `%`, the failure is a client error. Please note that retrying without resolving the cause of the error will produce the same result.
Other errors are likely to be infrastructure issues, so please wait for a while and then retry.
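The distinction between client errors and other errors can be sketched as a retry policy. In the sketch below, `send_request` is a hypothetical callable that returns a `(code, message)` tuple with an empty code on success. The attempt count and backoff delays are illustrative assumptions, since this section only says to wait a while before retrying.

```python
import time

# Codes this section lists as client errors; retrying unchanged will not help.
CLIENT_ERROR_CODES = frozenset({">", "o", "+", "-", "%"})


def is_client_error(code):
    """Return True if the response code is one of the listed client errors."""
    return code in CLIENT_ERROR_CODES


def call_with_retry(send_request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until success, a client error, or the last attempt.

    send_request is a hypothetical callable returning (code, message),
    where an empty code means success. Expects max_attempts >= 1.
    """
    for attempt in range(1, max_attempts + 1):
        code, message = send_request()
        if not code or is_client_error(code) or attempt == max_attempts:
            return code, message
        # Assumed policy: exponential backoff (1s, 2s, 4s, ...).
        sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.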
Audio Data Transmission Failure (Response Code `>`)
Cause
This error occurs in the following cases:
- When the request sent via the synchronous or asynchronous HTTP interface does not contain audio data
Countermeasure
If this error occurs, please check the following:
- Whether audio data is being sent
- Whether the sent audio is an empty (zero-byte) file
- Whether the body of the sent container format contains audio data
If data is sent but does not contain speech, `o` is returned instead, as described in Reject below.
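The first two checks can be run on the client before sending. The following is a minimal sketch with a hypothetical function name. It only catches a missing or zero-byte file and cannot confirm that a container body actually holds audio data.

```python
import os


def audio_file_problem(path):
    """Return a description of an obvious problem with the file, or None."""
    if not os.path.isfile(path):
        return "file not found"
    if os.path.getsize(path) == 0:
        return "file is empty (zero bytes)"
    return None
```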
Reject (Response Code `o`)
Cause
This response is returned in the following cases:
- No speech could be detected in the audio.
- Speech was detected, but the confidence of the recognition result was below the confidence threshold.
- Speech was detected, but the recognition result contained no characters that could be output (explained below).
Whichever of these is the cause, the error message is always `recognition result is rejected because confidence is below the threshold`.
A speech recognition result contains no characters that can be output in the following cases:
- All audio was recognized as fillers such as "あー" or "えーと" and automatically deleted
- All audio was estimated to be noise
However, when `keepFillerToken` is set to 1, fillers are output.
When transcribing, AmiVoice API runs a two-stage pipeline: speech detection followed by speech recognition. If no speech is detected, speech recognition is not performed. The speech detection stage uses a deep learning model, not just volume, to determine whether the audio is a human voice. Even so, audio that passes speech detection may ultimately be estimated as noise during speech recognition.
Countermeasure
If this error occurs, please check the following:
Whether the sent audio data contains speech
Please check the sent audio data. For example, speech cannot be detected in the following kinds of audio data. They may indicate a program bug or a problem with the recording system.
- Silence
- Noise only
- Stereo audio with speech only in the second channel (only the first channel of multi-channel audio is recognized; please see Stereo in the audio format)
Speech may also go undetected in unclear, low-volume audio, such as a recording made very far from the sound source.
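One way to spot the silent or second-channel-only cases locally is to compare per-channel peak levels. The sketch below uses a hypothetical function name and assumes a 16-bit PCM WAV file. A peak value says nothing about whether the signal is speech or noise, so treat the result as a hint only.

```python
import sys
import wave
from array import array


def channel_peaks(path):
    """Return the peak absolute sample value for each channel of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        samples = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian
    # Samples are interleaved: ch0, ch1, ch0, ch1, ...
    return [max((abs(s) for s in samples[ch::channels]), default=0)
            for ch in range(channels)]
```

A first-channel peak at or near zero alongside a much larger second-channel peak suggests that the speech was recorded on the channel that is not recognized.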
Whether an incorrect audio format is specified in the request
When sending headerless audio, specifying an incorrect audio format in the request prevents speech detection and speech recognition from working correctly. Please check that the audio format in the request matches the sent audio data.
When speaking correctly using rule grammar
(If you are not using rule grammar, you can ignore this case.) If you receive this error even though the utterance is correct, you can lower the confidence threshold. However, lowering the confidence threshold increases the likelihood of accepting incorrect utterances.
In some cases, such as when the audio actually sent does not contain speech, it may not be necessary to treat `o` as an error.
Response Codes and Messages Details
code | message | Description |
---|---|---|
+ | received unsupported audio format | Received audio data in an unsupported audio data format |
- | received illegal service authorization | Received an invalid APPKEY (service authentication key string) |
! | failed to connect to recognizer server | Communication failure within the speech recognition server (failed to connect to DSRM or DSRS) |
> | failed to send audio data to recognizer server | Communication failure within the speech recognition server (failed to send audio data to DSRS). Please also see Audio Data Transmission Failure. |
< | failed to receive recognition result from recognizer server | Communication failure within the speech recognition server (failed to receive recognition results from DSRS) |
# | received invalid recognition result from recognizer server | Communication failure within the speech recognition server (invalid format of recognition results received from DSRS) |
$ | timeout occurred while receiving audio data from client | An inactivity (no-communication) timeout occurred while receiving audio data from the client |
% | received too large audio data from client | The number of bytes of audio data received from the client is too large (does not occur with the WebSocket interface) |
o | recognition result is rejected because confidence is below the threshold | Recognition failed because the overall confidence of the recognition result was below the confidence threshold. This error is also returned when no speech could be detected from the entire received audio data, or when all results were fillers and there were no recognition results to respond with. Please also see Reject. |
b | recognition result is rejected because recognizer server is busy | Recognition failed because the speech recognition server is busy |
x | recognition result is rejected because grammar files are not loaded | Recognition failed because the dictionary is not loaded |
c | recognition result is rejected because the recognition process is cancelled | Recognition failed because a recognition process interruption request was made |
? | recognition result is rejected because fatal error occurred in recognizer server | Recognition failed because a fatal error occurred during recognition on the speech recognition server |
^ | invalid parameter (...) | An invalid parameter was specified. Only for Asynchronous HTTP Interface. |