Development Guide
You can obtain speech recognition results by connecting to the speech recognition server's endpoint via HTTP or WebSocket and sending audio data along with request parameters. This guide explains how to use the AmiVoice API and is intended for developers building applications with it.
Basic Functions
Generally, client applications performing speech recognition using the AmiVoice API need to implement the following:
- Obtain audio data from a recording device or network
- Convert audio data to a compatible format (not necessary if using a supported audio format)
- Send audio data to the speech recognition API endpoint
- Receive speech recognition results
- Interpret and use the speech recognition results (e.g., display them as on-screen captions, infer intent to generate voicebot responses, or feed them into summarization such as meeting minutes)
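The steps above can be sketched for an HTTP client as follows. This is a minimal illustration under stated assumptions, not the documented request format: the endpoint URL, the field names (`engine`, `audio`), and the shape of the JSON response are hypothetical placeholders, and the actual values are defined in the Request Parameters and interface pages. The `opener` argument exists only so that the network call can be replaced in tests.

```python
import json
import urllib.request
import uuid

# Hypothetical placeholder: substitute the endpoint documented for the
# interface you choose.
ENDPOINT = "https://example.invalid/recognize"


def build_multipart(fields, audio, boundary=None):
    """Encode text fields and one audio part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The audio goes last; "audio" is an assumed field name.
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="audio"; filename="audio"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(audio)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def recognize(audio, fields, opener=urllib.request.urlopen):
    """Send audio with request parameters and return the decoded JSON reply."""
    body, content_type = build_multipart(fields, audio)
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would read the audio (for example with `open(path, "rb").read()`), convert it only if the format is not supported, call `recognize`, and then pass the returned data to whatever consumes the result, such as a caption display.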
The following is an overview of the interaction between the client program and the speech recognition server.
Interface Type and Usage
The AmiVoice API provides three speech recognition interfaces. This section describes the features and intended use cases of each so that you can choose the appropriate interface.
Request
To obtain speech recognition results, you configure various settings in the request to the server and send the audio data.
- For the items to set in a request, please see Request Parameters.
- For supported audio data, please see Audio Format.
- For available speech recognition engines and supported languages, please see Speech Recognition Engines.
The method of sending requests differs depending on whether HTTP or WebSocket is used, so we will explain each interface in order.
For how the server logs the data you send and the speech recognition results, please see Logging.
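To illustrate one way the sending method can differ: in the simplest HTTP case the whole audio travels in a single request body, whereas a WebSocket client typically sends the audio incrementally as it becomes available. The sketch below shows only the chunking side of that, assuming the audio is sent as a sequence of binary frames. `send_frame` is a hypothetical callback standing in for the send function of whichever WebSocket library you use, and the chunk size is an arbitrary example value. Any start and end messages the protocol requires are not shown here.

```python
from typing import Callable, Iterator


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of audio, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def stream_audio(
    audio: bytes,
    send_frame: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> int:
    """Pass each chunk to send_frame and return the number of frames sent."""
    count = 0
    for chunk in iter_chunks(audio, chunk_size):
        send_frame(chunk)  # hypothetical: e.g. the library's binary send call
        count += 1
    return count
```

Sending in small chunks lets the server begin processing before the recording has finished, which is the usual reason to choose a streaming interface for live audio.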
Response
The speech recognition server returns the text transcribed from the audio you send. For details on the other information returned along with the text, please see Speech Recognition Results. For error handling, please see Response Codes and Messages.
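As a rough sketch of handling a response, the function below extracts the transcribed text and raises an exception when the server reports an error. It assumes a JSON response in which `text` holds the transcription, and a non-empty `code` with an accompanying `message` indicates a failure. These field names and the error convention are assumptions made for illustration, so check them against Speech Recognition Results and Response Codes and Messages.

```python
import json


class RecognitionError(Exception):
    """Raised when the response reports an error code."""


def extract_text(raw: str) -> str:
    """Return the transcribed text, or raise RecognitionError on failure."""
    payload = json.loads(raw)
    code = payload.get("code", "")  # assumed: empty string means success
    if code:
        raise RecognitionError(f"{code}: {payload.get('message', '')}")
    return payload.get("text", "")
```

Raising a dedicated exception keeps error handling separate from the code that consumes the text, so a caller can retry or report the failure in one place.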
Development Resources
This section covers information for making better use of the AmiVoice API when developing applications, along with client libraries, sample programs, and limitations.
Advanced Features
Advanced features include options for improving speech recognition accuracy.
Additional features such as speaker diarization and sentiment analysis are also available. Please use them according to your purpose.
There are also features that support issuing secure authentication keys and operating the services you build.
Client Libraries
We introduce client libraries for easily using the AmiVoice API from various languages.
Sample Programs
We introduce sample programs in various programming languages using the AmiVoice API.
Limitations
We explain the limitations you should be aware of when using the AmiVoice API.