Response Codes and Messages

This section explains the responses when processing fails.

HTTP Interface

Synchronous HTTP and Asynchronous HTTP Speech Recognition Job Creation Request

When a speech recognition request fails, the top-level code and message fields of the speech recognition result contain the response code and an error message indicating the cause of the failure. For details, please see the table below.

Example:

{
  "results": [
    {
      "tokens": [],
      "tags": [],
      "rulename": "",
      "text": ""
    }
  ],
  "text": "",
  "code": "-",
  "message": "received illegal service authorization"
}

For synchronous HTTP, when recognition processing is successful, code and message will be empty strings ("").

Example:

...
  "utteranceid": "20220602/14/018122d65d370a30116494c8_20220602_141442",
  "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。",
  "code": "",
  "message": ""
...

For asynchronous HTTP, when the job creation request is successful, code and message are not returned.

Example:

{"sessionid":"017ac8786c5b0a0504399999","text":"..."}
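The two success conventions above (empty strings for synchronous HTTP, absent fields for asynchronous job creation) can be handled by one check. The following is a minimal sketch, not part of the official client; the helper name request_failed is hypothetical, and the sample payloads are trimmed versions of the examples in this section.

```python
import json


def request_failed(response: dict) -> bool:
    """Return True if the top-level `code` field signals a failure.

    Synchronous HTTP sets `code` to "" on success, while a successful
    asynchronous job creation response omits `code` altogether, so a
    missing field is treated the same as an empty string.
    """
    return response.get("code", "") != ""


# Sample payloads modeled on the examples in this section.
sync_ok = json.loads('{"text": "...", "code": "", "message": ""}')
async_ok = json.loads('{"sessionid": "017ac8786c5b0a0504399999", "text": "..."}')
failed = json.loads(
    '{"text": "", "code": "-", "message": "received illegal service authorization"}'
)

for name, payload in [("sync_ok", sync_ok), ("async_ok", async_ok), ("failed", failed)]:
    print(name, "failed" if request_failed(payload) else "succeeded")
```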

Asynchronous HTTP Result and Status Retrieval

With asynchronous HTTP, a speech recognition job can still be interrupted during processing even when the request itself succeeded. In that case, when you retrieve the result and status, status is error and error_message contains a message describing the cause of the failure. error_message may include a response code (amineth_code) and an output message (amineth_message) indicating why the speech recognition process failed. For the meaning of response codes and messages, please see the table in Response Codes and Messages Details.

Example:

{
    "status": "error",
    "audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
    "audio_size":306980,
    "service_id":"{YOUR_SERVICE_ID}",
    "session_id":"017c25ec12c00a304474a999",
    "error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}
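Because the response code and output message are embedded in a free-form string, a client has to extract them before it can act on them. The following is a minimal sketch that assumes the values always appear as name='value' pairs without embedded single quotes, as in the example above; the exact layout of error_message is not specified here, and the helper name parse_error_message is hypothetical.

```python
import re

# Assumption: fields are written as name='value' and values contain no
# single quotes, as in the example error_message in this section.
_FIELD = re.compile(r"(amineth_code|amineth_message)='([^']*)'")


def parse_error_message(error_message: str) -> dict:
    """Extract amineth_code and amineth_message, if present.

    Returns an empty dict when neither field can be found, since
    error_message is only said to possibly include them.
    """
    return dict(_FIELD.findall(error_message))


example = (
    "ERROR: Failed to transcribe in recognition process - amineth_result=0, "
    "amineth_code='o', amineth_message='recognition result is rejected "
    "because confidence is below the threshold'"
)
parsed = parse_error_message(example)
print(parsed.get("amineth_code"), "/", parsed.get("amineth_message"))
```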

Client Errors

A response code of >, o, +, -, or % in the Response Codes and Messages Details table indicates a client error. Note that retrying without resolving the cause of the error will yield the same result.

Other errors are likely to be infrastructure issues, so wait for a while and then retry.
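The retry rule above can be expressed as a small decision function. This is a sketch under the assumption that an empty code means success, as in the synchronous HTTP example; the function name should_retry is hypothetical, and the waiting strategy between retries is left to the caller.

```python
# Response codes that this section classifies as client errors.
CLIENT_ERROR_CODES = frozenset({">", "o", "+", "-", "%"})


def should_retry(code: str) -> bool:
    """Decide whether resending the same request is worthwhile.

    - "" means success (assumption based on the synchronous example),
      so there is nothing to retry.
    - A client error returns the same result until its cause is fixed.
    - Any other code is likely an infrastructure issue, so retrying
      after a wait is reasonable.
    """
    return code != "" and code not in CLIENT_ERROR_CODES


print(should_retry("-"))  # client error
print(should_retry(""))   # success
```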

Audio Data Transmission Failure (Response Code `>`)

Cause

This error occurs in the following cases:

When the request sent via the synchronous or asynchronous HTTP interface does not contain any audio data

Countermeasure

If this error occurs, please check the following:

Whether audio data is actually being sent
Whether the file being sent is empty (a zero-byte file)
Whether the body of the container file actually contains audio data

If data is sent but it contains no speech, the response code o, described below, is returned instead.
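The checks listed above can be run locally before a file is sent. The following is a minimal sketch for WAV files only, using Python's standard wave module; the function name check_wav_before_sending and the returned strings are hypothetical, and other container formats would need their own inspection.

```python
import os
import wave


def check_wav_before_sending(path: str) -> str:
    """Return "ok", or a short reason the file would likely be rejected.

    Only WAV containers are handled; wave.open raises wave.Error for
    files that are not valid WAV.
    """
    if not os.path.isfile(path):
        return "file not found"
    if os.path.getsize(path) == 0:
        return "zero-byte file"
    with wave.open(path, "rb") as wav:
        # A valid header followed by an empty data chunk has no audio.
        if wav.getnframes() == 0:
            return "container has no audio frames"
    return "ok"
```

A caller could skip the upload and log the returned reason whenever the result is not "ok".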

Reject (Response Code `o`)

Cause

This response is returned in the following cases:

Could not detect speech from the audio.
Speech was detected from the audio, and as a result of speech recognition, the confidence was below the confidence threshold.
Speech was detected from the audio, and as a result of speech recognition, there were no characters that could be output. (See the note below.)

note

Regardless of which of the causes above applies, the error message is always "recognition result is rejected because confidence is below the threshold".

note

There are no characters that can be output in the speech recognition result in the following cases:

All audio was recognized as fillers like "あー" or "えーと" and automatically deleted
All audio was estimated to be noise

However, when keepFillerToken is set to 1, fillers will be output.

When transcribing with AmiVoice API, a two-stage pipeline of speech detection followed by speech recognition is performed. If no speech is detected, speech recognition is not performed. In the speech detection phase, a deep learning model determines whether the audio is a human voice rather than relying on volume alone. Even so, audio that passes speech detection may ultimately be estimated as noise by the speech recognition process.

Countermeasure

If this error occurs, please check the following:

Whether the sent audio data contains speech

Please check the sent audio data. For example, speech cannot be detected from the following kinds of audio data. If you are sending audio like this, there may be a program bug or a problem with the recording system.

Silence
Contains only noise
Stereo audio in which speech is only in the second channel (only the first channel of multi-channel audio is the target of recognition; please see Stereo in the audio format)

Also, speech may not be detected from unclear and low-volume audio, such as when recording very far from the sound source.
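One way to spot the silent-file and wrong-channel cases above is to look at the peak level of each channel. The following is a rough sketch for 16-bit PCM WAV files using only the Python standard library; the function name channel_peaks is hypothetical, and a low peak is only a hint, since the service decides what counts as speech with its own model.

```python
import array
import sys
import wave


def channel_peaks(path: str) -> list:
    """Return the peak absolute sample value of each channel.

    Handles 16-bit PCM WAV only. A peak of 0 in the first channel means
    that channel is digitally silent, even if other channels have audio.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved: ch0, ch1, ch0, ch1, ...
    return [
        max((abs(s) for s in samples[c::channels]), default=0)
        for c in range(channels)
    ]
```

For a stereo recording, a result such as a zero first value and a large second value suggests that the speech was recorded on the channel that is not recognized.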

Whether an incorrect audio format is specified in the request

When sending audio without headers, if an incorrect audio format is specified at the time of request, speech detection and speech recognition processing will not be performed correctly. Please check if the audio format in the request is correctly set for the sent audio data.

When you are using rule grammar and speaking correctly

(If you are not using rule grammar, you can ignore this case.) If you receive this response even though you are speaking correctly, you can lower the confidence threshold. Note, however, that lowering the confidence threshold makes it more likely that incorrect utterances are accepted.

tip

In some cases, such as when the audio that was sent really does not contain speech, o does not need to be treated as an error.