
Limitations

This section explains the limitations of the AmiVoice API.

WebSocket Interface

Maximum Session Duration: 24 hours

The maximum time a session can be maintained with the WebSocket interface is 24 hours. Once this time has elapsed, the connection is terminated regardless of whether processing is still in progress. To continue recognition, reconnect and start a new session.
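A client that runs long recognition jobs can track how long the current connection has been open and plan a reconnect before the 24-hour limit is reached. The sketch below does only the time bookkeeping. The `SessionClock` class and the 5-minute safety margin are illustrative assumptions, not part of the AmiVoice API.

```python
from datetime import datetime, timedelta, timezone

# Documented maximum WebSocket session duration.
MAX_SESSION = timedelta(hours=24)
# Hypothetical safety margin so the client reconnects before the server disconnects it.
SAFETY_MARGIN = timedelta(minutes=5)


class SessionClock:
    """Hypothetical helper that tracks when the current WebSocket session started."""

    def __init__(self, started_at: datetime) -> None:
        self.started_at = started_at

    def deadline(self) -> datetime:
        # The server terminates the connection once 24 hours have elapsed.
        return self.started_at + MAX_SESSION

    def should_reconnect(self, now: datetime) -> bool:
        # Reconnect early, leaving SAFETY_MARGIN before the hard limit.
        return now >= self.deadline() - SAFETY_MARGIN


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
clock = SessionClock(start)
print(clock.deadline())  # 2024-01-02 09:00:00+00:00
print(clock.should_reconnect(start + timedelta(hours=23)))  # False
print(clock.should_reconnect(start + timedelta(hours=23, minutes=56)))  # True
```

Reconnecting slightly early lets the client close the session at a point of its choosing instead of being cut off mid-utterance.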

Forced Disconnection Time Due to Non-Speech Period: 600 seconds

The connection will be terminated if no speech is detected for 600 seconds. To continue recognition, reconnect and start a new session.

If this disconnection occurs, you will receive the following message in the p command response packet:

p can't feed audio data to recognizer server

Please see the p command response packet in the reference, and Maintaining Sessions in the WebSocket interface usage guide.
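One way to notice this disconnection is to check each text response from the server for the message above. This sketch assumes the response arrives as a single text line of the form `p <message>`; the helper name `is_non_speech_timeout` is hypothetical, and the exact packet format should be confirmed against the reference.

```python
# Message documented for a disconnection after 600 seconds without speech.
NON_SPEECH_TIMEOUT_MESSAGE = "can't feed audio data to recognizer server"


def is_non_speech_timeout(packet: str) -> bool:
    """Return True if a p response packet reports the non-speech disconnection.

    Assumes the packet is a text line of the form "p <message>".
    """
    command, _, message = packet.strip().partition(" ")
    return command == "p" and NON_SPEECH_TIMEOUT_MESSAGE in message


print(is_non_speech_timeout("p can't feed audio data to recognizer server"))  # True
print(is_non_speech_timeout("p"))  # False
```

When the check returns True, the client opens a new connection and starts over.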

Forced Disconnection Time Due to No Communication: 60 seconds

The connection will be terminated if no data is received for 60 seconds.

If this disconnection occurs before recognition processing begins, you will receive the following message:

e timeout occurred

If it occurs during recognition processing, you will receive the following message:

e timeout occurred while recognizing audio data from client

Please see the e command response packet in the reference, and Maintaining Sessions in the WebSocket interface usage guide.
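Because the two timeout messages differ only in their ending, a client can tell them apart with an exact comparison. In this sketch, `classify_timeout` and its return labels are hypothetical; only the two message strings come from this page.

```python
from typing import Optional

# Messages documented for the 60-second no-communication timeout.
TIMEOUT_BEFORE_RECOGNITION = "e timeout occurred"
TIMEOUT_DURING_RECOGNITION = "e timeout occurred while recognizing audio data from client"


def classify_timeout(packet: str) -> Optional[str]:
    """Return "before", "during", or None for a text packet from the server."""
    text = packet.strip()
    # Exact comparison keeps the cases distinct, since one message is a prefix of the other.
    if text == TIMEOUT_DURING_RECOGNITION:
        return "during"
    if text == TIMEOUT_BEFORE_RECOGNITION:
        return "before"
    return None


print(classify_timeout("e timeout occurred"))  # before
print(classify_timeout("e timeout occurred while recognizing audio data from client"))  # during
```

A "during" result means audio had already been sent in this session, so the client may need to resend the unrecognized portion after reconnecting.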

Maximum Duration of a Single Utterance: 30 seconds

If an utterance exceeds the maximum duration, the speech recognition result will be returned as if the utterance had ended at the maximum time. The subsequent audio will be processed as a new utterance.

*An utterance is defined as a "voice" segment separated by silence (periods without voice) of about 1 second or more.
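To see what the 30-second cap means for a long stretch of continuous speech, the sketch below computes the segments that would be returned as separate utterances. It is arithmetic only: `split_utterance` is a hypothetical helper, and real boundaries depend on the server's speech detection. The limit is a parameter because it differs by interface (30 seconds here, 60 seconds for the asynchronous HTTP interface).

```python
from typing import List, Tuple


def split_utterance(duration_s: float, max_utterance_s: float = 30.0) -> List[Tuple[float, float]]:
    """Split one continuous utterance into (start, end) segments no longer than the limit."""
    duration_s = float(duration_s)
    segments = []
    start = 0.0
    while start < duration_s:
        # Each segment ends at the cap or at the end of the speech, whichever comes first.
        end = min(start + max_utterance_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


print(split_utterance(75))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

A 75-second utterance without a pause of about 1 second therefore yields three confirmed results rather than one.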

Synchronous HTTP Interface

Maximum Acceptable Audio Data Size: 16,777,215 bytes (about 16 MiB)

The maximum size of audio data that can be sent in a single request using the synchronous HTTP interface is 16,777,215 bytes. If you need to send audio data exceeding this limit, please use the asynchronous HTTP interface.
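As a rough guide, if the audio is uncompressed 16 kHz, 16-bit mono PCM (an assumed format, 32,000 bytes per second), 16,777,215 bytes holds about 524 seconds, or just under 9 minutes. The sketch picks an interface from the data size alone; `choose_http_interface` and `pcm_seconds` are hypothetical helpers, and the returned strings are labels, not API values.

```python
# Documented maximum request size for the synchronous HTTP interface.
SYNC_MAX_BYTES = 16_777_215  # 2**24 - 1, about 16 MiB


def choose_http_interface(audio_size_bytes: int) -> str:
    """Return "sync" if the audio fits in one synchronous request, else "async"."""
    return "sync" if audio_size_bytes <= SYNC_MAX_BYTES else "async"


def pcm_seconds(size_bytes: int, sample_rate: int = 16_000, bytes_per_sample: int = 2) -> float:
    """Duration of raw mono PCM audio of the given size (assumed format)."""
    return size_bytes / (sample_rate * bytes_per_sample)


print(choose_http_interface(10_000_000))  # sync
print(choose_http_interface(20_000_000))  # async
print(round(pcm_seconds(SYNC_MAX_BYTES), 1))  # 524.3
```

Compressed formats fit more audio into the same byte limit, so the size check, not the duration, is what matters for this interface.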

Forced Disconnection Time Due to Non-Speech Period: 50 seconds

The connection will be terminated if no speech is detected for 50 seconds. To continue recognition processing, please reconnect and send the audio again.

Maximum Duration of a Single Utterance: 30 seconds

If an utterance in the audio data continues for the maximum duration or longer, a confirmed recognition result will be generated and returned for the audio up to that point, as if one utterance had ended. The subsequent audio will be processed as a new utterance.

*An utterance is defined as a "voice" segment separated by silence (periods without voice) of about 1 second or more.

Forced Disconnection Time Due to No Communication: 60 seconds

The connection will be terminated if no data is received for 60 seconds.

Asynchronous HTTP Interface

Maximum Acceptable Audio Data Size: 2,147,483,647 bytes (about 2 GiB) (Maximum 3 hours when speaker diarization is enabled)

The maximum size of audio data that can be sent in a single request using the asynchronous HTTP interface is 2,147,483,647 bytes.

Additionally, for requests with speaker diarization enabled, the maximum audio length is 3 hours. If the audio exceeds this limit, the request is rejected immediately with the following error:

{"results":[{"tokens":[],"tags":[],"rulename":"","text":""}],"code":"^","message":"request too large (audio duration exceeded 3 hours with speaker diarization)"}
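Checking these limits before uploading avoids a rejected request. In the sketch below, `check_async_request` is a hypothetical helper and the audio duration is supplied by the caller (for example, read from the file header). The error body shown above is parsed with the standard `json` module to read its `code` and `message` fields.

```python
import json
from typing import List

# Documented limits for the asynchronous HTTP interface.
ASYNC_MAX_BYTES = 2_147_483_647  # 2**31 - 1, about 2 GiB
DIARIZATION_MAX_SECONDS = 3 * 60 * 60  # 3 hours


def check_async_request(size_bytes: int, duration_s: float, diarization: bool) -> List[str]:
    """Return a list of limit violations; an empty list means the request looks acceptable."""
    problems = []
    if size_bytes > ASYNC_MAX_BYTES:
        problems.append("audio data exceeds 2,147,483,647 bytes")
    if diarization and duration_s > DIARIZATION_MAX_SECONDS:
        problems.append("audio longer than 3 hours with speaker diarization enabled")
    return problems


# Error body from the example above.
body = (
    '{"results":[{"tokens":[],"tags":[],"rulename":"","text":""}],"code":"^",'
    '"message":"request too large (audio duration exceeded 3 hours with speaker diarization)"}'
)
error = json.loads(body)
print(error["code"], error["message"])
print(check_async_request(500_000_000, 4 * 3600, diarization=True))
# ['audio longer than 3 hours with speaker diarization enabled']
```

The same 4-hour file passes the check when speaker diarization is disabled, because only the byte limit applies then.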

Maximum Duration of a Single Utterance: 60 seconds

If an utterance in the audio data continues for the maximum duration or longer, a confirmed recognition result will be generated and returned for the audio up to that point, as if one utterance had ended. The subsequent audio will be processed as a new utterance.

*An utterance is defined as a "voice" segment separated by silence (periods without voice) of about 1 second or more.

Forced Disconnection Time Due to Non-Speech Period: No limit

The connection is not terminated because of a non-speech period.

Speech Recognition Result Retention Period: 7 days

Results are stored for 7 days (168 hours) after the speech recognition process is completed.
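A client that fetches results later can compute the last moment a result is still retrievable. In this sketch, `result_expires_at` and `is_retrievable` are hypothetical helpers. Whether a result is still available at exactly the 168-hour mark is not specified here, so the sketch treats that moment as expired.

```python
from datetime import datetime, timedelta, timezone

# Documented retention period for recognition results.
RETENTION = timedelta(days=7)  # 168 hours


def result_expires_at(completed_at: datetime) -> datetime:
    """Return when a result stops being retrievable, given its completion time."""
    return completed_at + RETENTION


def is_retrievable(completed_at: datetime, now: datetime) -> bool:
    # Assumption: the result is available strictly before the 168-hour mark.
    return now < result_expires_at(completed_at)


done = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(result_expires_at(done))  # 2024-03-08 12:00:00+00:00
print(is_retrievable(done, done + timedelta(days=6)))  # True
```

Storing the completion time alongside each job ID is enough to schedule retrieval well before this deadline.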

Forced Disconnection Time Due to No Communication: 60 seconds

The connection will be terminated if no data is received for 60 seconds.