Development Guide

You can obtain speech recognition results by connecting to the speech recognition server's endpoint over HTTP or WebSocket and sending audio data along with request parameters. This guide explains how to use the AmiVoice API for developers building applications with it.

Basic Functions

A client application that performs speech recognition with the AmiVoice API generally needs to implement the following steps:

  1. Obtain audio data from a recording device or network
  2. Convert audio data to a compatible format (unnecessary if the audio is already in a supported format)
  3. Send audio data to the speech recognition API endpoint
  4. Receive speech recognition results
  5. Interpret and use the speech recognition results (e.g., display them as on-screen captions, infer intent to generate voicebot responses, or feed them into summarization such as meeting minutes)
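
The steps above can be sketched for the HTTP case roughly as follows. This is a minimal sketch under stated assumptions, not the official client: the endpoint URL, the form-field names `u`, `d`, and `a`, the engine name `-a-general`, and the `text` field of the response are assumptions that should be checked against the API reference, and `build_multipart` and `recognize` are hypothetical helper names.

```python
import json
import uuid
import urllib.request

# Assumed endpoint; confirm against the API reference before use.
ENDPOINT = "https://acp-api.amivoice.com/v1/recognize"


def build_multipart(fields, audio):
    """Encode text fields plus one audio part as multipart/form-data.

    The audio part (assumed field name "a") is written last.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    body = b""
    for name, value in fields.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8")
    body += (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="a"; filename="audio"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    body += audio + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def recognize(audio_path, app_key, engine="-a-general"):
    """Send one audio file and return the transcribed text."""
    # Step 1: obtain audio data. The file is assumed to be in a supported
    # format already, so step 2 (conversion) is skipped.
    with open(audio_path, "rb") as f:
        audio = f.read()
    # Step 3: send the audio together with the request parameters.
    body, content_type = build_multipart(
        {"u": app_key, "d": f"grammarFileNames={engine}"}, audio
    )
    request = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Step 4: receive the recognition result.
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    # Step 5: use the result; this sketch only returns the transcript.
    return result.get("text", "")
```

Calling `recognize("sample.wav", "YOUR_APP_KEY")` would perform the whole round trip; the file name and key are placeholders. Building the multipart body by hand keeps the sketch free of third-party dependencies, though an HTTP library would normally do this for you.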

The following is an overview of the interaction between the client program and the speech recognition server.

Figure. AmiVoice API Overview

Interface Type and Usage

The AmiVoice API provides three speech recognition interfaces. This section explains the features and intended use cases of each so that you can choose the appropriate one.

Request

To obtain speech recognition results, you must configure various settings in the request to the server and send the audio data.

How a request is sent differs depending on whether HTTP or WebSocket is used, so each interface is explained in turn.
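
To illustrate how a WebSocket request differs in shape from a single HTTP request, the sketch below frames one recognition session as a sequence of messages: an opening text frame, a series of binary audio frames, and a closing text frame. The `s`, `p`, and `e` command prefixes, the `16K` audio format name, the `-a-general` engine name, and the `authorization=` parameter are assumptions to be verified against the WebSocket interface reference; the function names are hypothetical. No WebSocket library is used, since only the framing is shown.

```python
END_COMMAND = "e"  # assumed text frame that closes the session


def start_command(audio_format, engine, app_key):
    """Text frame that opens a recognition session (assumed 's' command)."""
    return f"s {audio_format} {engine} authorization={app_key}"


def audio_frames(audio, chunk_size=4096):
    """Yield binary frames: an assumed 'p' prefix byte plus one audio chunk."""
    for offset in range(0, len(audio), chunk_size):
        yield b"p" + audio[offset:offset + chunk_size]


def session_frames(audio, audio_format, engine, app_key, chunk_size=4096):
    """Return every frame for one session, in the order it would be sent."""
    frames = [start_command(audio_format, engine, app_key)]
    frames.extend(audio_frames(audio, chunk_size))
    frames.append(END_COMMAND)
    return frames
```

A real client would presumably also wait for the server's reply to the opening frame before streaming audio, and would read result events as they arrive; that handling is omitted here.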

For how the server logs the data you send and the speech recognition results, see Logging.

Response

The speech recognition server returns the text transcribed from the audio you sent. For details on the information returned in addition to the text, see Speech Recognition Results. For error handling, see Response Codes and Messages.
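
As a hedged illustration of handling a response, the sketch below extracts the transcript from a JSON body and raises an exception when the body reports a failure. The field names `text`, `code`, and `message`, and the convention that an empty `code` means success, are assumptions to be confirmed in Speech Recognition Results and Response Codes and Messages; `extract_text` and `RecognitionError` are hypothetical names.

```python
import json


class RecognitionError(Exception):
    """Raised when the response body reports a failure."""


def extract_text(response_body):
    """Return the transcript from a JSON response body, or raise on error.

    Assumed shape: {"text": "...", "code": "", "message": ""}, where a
    non-empty "code" signals a failure described by "message".
    """
    result = json.loads(response_body)
    code = result.get("code", "")
    if code:
        raise RecognitionError(f"{code}: {result.get('message', '')}")
    return result.get("text", "")
```

Separating success from failure at this point lets the rest of the application (captions, voicebot logic, summarization) work with plain text and handle errors in one place.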

Development Resources

This section covers information that helps you make better use of the AmiVoice API when developing applications, along with client libraries, sample programs, and limitations.

Advanced Features

These features improve speech recognition accuracy.

Additional features such as speaker diarization and sentiment analysis are also available. Use them as your purpose requires.

There are also features that support creating secure authentication keys and operating the services you have built.

Client Libraries

This section introduces client libraries that make it easy to use the AmiVoice API from various programming languages.

Sample Programs

This section introduces sample programs that use the AmiVoice API in various programming languages.

Limitations

This section explains the limitations you should be aware of when using the AmiVoice API.