Troubleshooting

General Questions

Q. What should I do if the speech recognition accuracy is not as good as I would like?

Please check the following points, and also see Client Errors in the Handling of Results section. If the issue persists, please contact us.

  • Check the number of channels in the audio data

    For stereo recordings, only the first channel will be recognized. If you want both channels to be recognized, please send separate audio files for each channel.

  • Check the sampling rate of the audio data

    Telephone audio is typically recorded at 8kHz, while most other widely used audio is 16kHz or higher. The 会話_汎用 engine supports each of these sampling rates, but because each rate is routed to a different backend engine, you must specify the correct sampling rate in your request. *If the sampling rate specification is incorrect, almost nothing will be recognized (the recognition result will be empty).

  • Check the endianness of the audio format specification

    When sending headerless audio data (RAW data), you need to specify the endianness in addition to the sampling rate. Add 'lsb' for little-endian data and 'msb' for big-endian data (e.g., lsb8K, msb16K).

    *If the lsb or msb specification is incorrect or reversed, almost nothing will be recognized (the recognition result will be empty).

  • Confirm that the audio to be recognized has been recorded

    Recording quality is affected by the surrounding environment and the microphone used, so please listen to the audio and confirm that the speaker's voice is recorded at a clearly audible level.

  • Check if the speech content matches the engine being used

    Recognition accuracy will decrease for audio with frequent use of technical terms. We offer domain-specific engines for medical, financial, and insurance industries. If the speech content seems to match, please try these. For details, please see Speech Recognition Engines. If you want to recognize unique terms or abbreviations used only within your company, please utilize the Word Registration feature.
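Putting the sampling-rate and endianness points above together, the format string for headerless (RAW) audio can be assembled like this. `rawFormatString` is a hypothetical helper for illustration only; the lsb/msb and 8K/16K codes are the ones described above:

```javascript
// Build a RAW-audio format string such as "lsb8K" or "msb16K".
// "little" -> lsb (little-endian), "big" -> msb (big-endian).
function rawFormatString(sampleRateHz, endianness) {
  const prefix = endianness === "big" ? "msb" : "lsb";
  const rate = sampleRateHz >= 16000 ? "16K" : "8K";
  return prefix + rate;
}

console.log(rawFormatString(8000, "little")); // "lsb8K"  (typical telephone audio)
console.log(rawFormatString(16000, "big"));   // "msb16K"
```

If this string does not match the actual byte order and sampling rate of your data, the recognition result will be empty, as noted above.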

Q. Is it possible to perform speech recognition on video files or streaming formats like HLS?

We do not provide APIs or sample programs for speech recognition of streaming video or audio. For unsupported data formats, you need to convert them to supported audio formats.

For converting audio formats, free software such as FFmpeg is available. However, depending on the video/audio format, the format's own licensing may impose usage restrictions or royalty obligations. Please check and use such software at your own responsibility.

If you want to perform speech recognition while playing video or audio, you can change getUserMedia() to getDisplayMedia() in the JavaScript sample program to recognize browser or system audio instead of microphone input. Please consider this option as well.

(Example) wrp.js

Before change

navigator.mediaDevices.getUserMedia(
{audio: true, video: false}

After change

navigator.mediaDevices.getDisplayMedia(
{audio: true, video: true}

*Audio capture with getDisplayMedia() is not supported by all browsers. We have confirmed that it works with Chrome and Edge.

Questions about My Page

Q. Is it possible to know the amount of usage per client application?

Usage amounts are tracked per account, so you need a separate account for each client application. If you need to separate accounts, please also consider using the "Customer Management Function for Service Providers"; for details, please contact us through the inquiry form.

Q. Why doesn't a word from a file show up on the word registration page after I have registered it?

It's possible that the file's character encoding is not UTF-8, or the file format is incorrect, preventing successful registration. Please check the file's character encoding and format.

Questions about Client Libraries

Q. How do I resume my connection to the speech recognition server after the internet disconnects and recovers?

Creating a new socket is not a problem, but we recommend performing termination processing first and then reconnecting. With the client libraries, you can reconnect by calling wrp.disconnect() followed by wrp.connect().

(Example)

wrp.construct();
wrp.connect();
wrp.feedDataResume();
wrp.feedData();
wrp.feedData();
wrp.feedData();
--> Error occurs
wrp.disconnect();
wrp.connect();
wrp.feedDataResume();
wrp.feedData();
wrp.feedData();
wrp.feedData();
wrp.feedDataPause();

Q. Why does my Android application fail to connect to the server?

On Android, you cannot connect to the server from the main thread (UI thread). Please perform this operation on a separate thread. Also, you need to set the permission (android.permission.INTERNET) to perform network operations.

Questions about Sample Programs

Q. Do you have a sample program for the asynchronous speech recognition API?

Please check the Python sample code for the Asynchronous HTTP Interface. We also publish sample applications with source code on the AmiVoice Tech Blog, so please see those as well.

Q. Why aren't my parameters applied when I run curl commands in a Windows batch file?

In a Windows batch file, "%" is a special character and must be escaped by doubling it: write "%20" as "%%20". Please check the content of your commands.

Q. Why are some characters garbled in the Windows command prompt?

The Windows command prompt does not support displaying 4-byte UTF-8 characters, causing them to appear garbled.

Q. I get "PHP Warning: PHP Startup: Unable to load dynamic library 'openssl'" in PHP. Is openssl.so absolutely necessary?

If openssl is compiled into PHP as a built-in feature, it is not necessary.

Questions about Server Connection

Q. Why does APPKEY authentication fail?

Please check the specified parameters and the status of your credit card registration. Also, after contract signing, it may take up to 10 minutes for AmiVoice API to become available.

Q. Why am I getting SSL certificate errors when connecting?

Customers using environments with Zscaler, a third-party security solution, have reported that it is necessary to specify Zscaler's certificate on the connecting client.

Q. Why does the connection fail due to a timeout?

It's possible that network settings such as firewalls or proxy servers are blocking the connection. Please check your firewall and proxy server settings.

Web browsers may have network settings configured, so please try the JavaScript version of the sample. If you're using the AmiVoice API client library in an environment that requires proxy server settings, please use the setProxyServerName() method to configure the proxy server.

Q. Why are HTTP statuses 431 and 414 returned?

The request headers, URI, or query string are too long. If you're specifying profileWords in the query string, please specify it in the request body instead.

Questions about Speech Recognition Results and Job Status

Q. Why aren't the words I registered on the word registration page reflected?

You need to specify the profile ID parameter in the request parameters. Note that "profileId" and "profileWords" are not independent request parameters, but "child parameters" of the d parameter. For details, please check Word Registration on My Page.
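As an illustration of this child-parameter structure, the value of the d parameter is a single string that bundles parameters such as grammarFileNames and profileId. The engine name -a-general and profile ID :example-profile below are placeholder values, and buildDParameter is a hypothetical helper:

```javascript
// Build the "d" request parameter: child parameters such as
// grammarFileNames and profileId are joined into one string.
function buildDParameter(engineName, profileId) {
  const parts = ["grammarFileNames=" + engineName];
  if (profileId) {
    parts.push("profileId=" + profileId);
  }
  return parts.join(" ");
}

console.log(buildDParameter("-a-general", ":example-profile"));
// "grammarFileNames=-a-general profileId=:example-profile"
```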

Q. Why can't I recognize my own audio file?

You may be using an audio format that AmiVoice API doesn't support, or the audio format of your file may not match the format specified in the parameters. Please confirm that you're using an audio format supported by AmiVoice API and that your parameter specifications are correct. Also, be careful not to send the filename instead of the actual audio file data to the server. For information about audio formats, please check Audio Formats.

Q. What should I do if I want to use a word that contains characters that cannot be used, such as "|" and ":", in the word registration?

You need to replace these with different characters when registering, and then replace them back to the original characters after obtaining the speech recognition results in your application's processing. For example, you can register URL-encoded characters and then URL-decode the speech recognition result string in your application.
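For example, a sketch in JavaScript using URL encoding; the word "A|B" and the restoreWord helper are purely illustrative:

```javascript
// Register the URL-encoded form of a word that contains "|" or ":".
const word = "A|B";
const registered = encodeURIComponent(word); // "A%7CB" — register this string instead

// After recognition, replace occurrences of the registered form
// in the result text with the original word.
function restoreWord(resultText, encodedWord, originalWord) {
  return resultText.split(encodedWord).join(originalWord);
}

console.log(restoreWord("これは A%7CB です", "A%7CB", "A|B")); // "これは A|B です"
```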

Q. Is there any way to check whether "No logging" is enabled?

Please check the usage amounts for "Logging" and "No logging" under "This Month's Usage" on your My Page. If your usage is added to the "No logging" amount, then "No logging" is enabled. Previously, when "No logging" was active, [nolog] was appended to the "utteranceid" value in the speech recognition results, but this [nolog] suffix has been discontinued.

Q. Why is the status of a job in the asynchronous HTTP speech recognition API "queued"?

After a successful request, it takes time for speech recognition to complete, depending on the length of the submitted audio. When you retrieve the job status, the status transitions as follows as processing progresses:

queued → started → processing → completed

If the status is queued, speech recognition is not yet complete, so please wait for a while before making another request to retrieve the results.

If an error occurs during processing, it will transition to error from any of the states instead of reaching completed:

queued → error
queued → started → error
queued → started → processing → error

After a successful speech recognition request, implement a process to retrieve the job status at 30-second intervals until the status becomes completed or error.
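The retrieval loop described above can be sketched as follows. waitForJob is a hypothetical helper, and getJobStatus stands in for your own call to the job status endpoint:

```javascript
// Poll the job status until it reaches "completed" or "error".
// getJobStatus: an async function you supply that returns the current
// status string; intervalMs defaults to the recommended 30 seconds.
async function waitForJob(getJobStatus, intervalMs = 30000) {
  for (;;) {
    const status = await getJobStatus();
    if (status === "completed" || status === "error") {
      return status;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With a stub that walks through queued → started → processing → completed, waitForJob resolves with "completed"; in a real application, getJobStatus would issue the HTTP request to retrieve the job status.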

Q. How do I put a line break after a punctuation mark?

AmiVoice API does not have a feature to automatically insert line breaks, so you need to process the speech recognition result text on the client side. (Example) In JavaScript, replace "。" with "。" followed by a <br> tag:

text = text.replace(/。/g, "。<br>");

Q. Is it possible to get hiragana speech recognition results?

The JSON of speech recognition results includes "reading" information for each word.

For details about the JSON, please see the speech recognition result responses for each API (Synchronous HTTP Interface, Asynchronous HTTP Interface, WebSocket Interface). The "spoken" field represents the reading of each word.

However, this "reading" is based on the pronunciation, so it may differ from "furigana". For example, the reading for the notation "私 は" will be "わたくし わ".
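As a sketch, the readings can be collected from the tokens of each result segment. The sample object below only illustrates the relevant fields; see the response documentation for each API for the full structure:

```javascript
// Collect the "spoken" (reading) of each word from a recognition result.
function collectReadings(result) {
  return result.results
    .flatMap((segment) => segment.tokens)
    .map((token) => token.spoken);
}

// Minimal illustration of the result shape (only the fields used here).
const sample = {
  results: [
    { tokens: [
      { written: "私", spoken: "わたくし" },
      { written: "は", spoken: "わ" },
    ]},
  ],
};

console.log(collectReadings(sample).join(" ")); // "わたくし わ"
```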

Other Questions

Q. Why does the BIZTEL speech recognition service history linkage show an error for engines other than "会話_汎用"?

Currently, only "会話_汎用" supports the audio data format (sampling rate less than 16kHz) sent from BIZTEL, which is why other engines result in errors.

Q. Is it possible to reflect the registered words when using AmiVoice with AudioCodes VoiceAI Connect?

We have received information from customers that it is possible to specify the profile ID by adding it after the connection engine name in the Connection Engine Name (folderId), like this:

<connection engine name> profileId=<:profile ID>

For more details, please confirm with AudioCodes.