Introduction

AmiVoice API is a speech recognition API that converts speech to text. When you send audio, it returns the spoken content as text. You can build speech-enabled applications such as meeting transcription or voice dialogue systems.

Figure. Overview of AmiVoice API

Documentation Structure

Please see the "Introduction and Operation Guide" for security and operational information before implementation, the "Development Guide" for implementation details, the "Reference" for API specifications, and the "Help" section if you encounter any issues.

Quick Start

1. Obtain an APPKEY

Register from the User Registration Page and note down the APPKEY shown under [Connection Information] on your MyPage. Set it as an environment variable using the following command.

export APPKEY=your_appkey_here
Tip:

The AmiVoice Tech Blog provides step-by-step instructions on how to register as a user and convert an audio file to text using AmiVoice API. Please see the following for reference:

Let's Try Using AmiVoice API

2. Prepare an Audio File

Prepare the audio file you want to transcribe. You can use the following sample audio (test.wav) as is.

For information on supported audio file formats, please see Audio Formats.
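If you cannot download the sample and just want to check connectivity, a placeholder WAV file can be generated with Python's standard wave module. This is a sketch only: it writes one second of 16 kHz, 16-bit mono silence, which is a valid PCM WAV but will naturally transcribe to nothing.

```python
import wave

# Write one second of 16 kHz, 16-bit mono silence to test.wav.
# A silent file is only useful for verifying that requests succeed;
# use real speech audio to see actual transcription results.
with wave.open("test.wav", "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 16-bit samples (2 bytes)
    w.setframerate(16000)    # 16 kHz sampling rate
    w.writeframes(b"\x00\x00" * 16000)  # 16000 frames = 1 second
```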

3. Execute Speech Recognition

Execute the following. Replace test.wav with the path to your audio file.

curl https://acp-api.amivoice.com/v1/recognize \
-F d=-a-general \
-F u=$APPKEY \
-F a=@test.wav | jq
Note:
  • If the curl command is not installed, download the package for your OS from https://curl.se/ or use a package manager to install curl.
  • The result text is Unicode-escaped. In the above command, jq is used to format the response for better readability. If jq is not installed, try executing without the | jq part. The jq command can be downloaded from https://stedolan.github.io/jq/ for your OS or installed using a package manager.
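The same request can also be issued from code. The endpoint and the d, u, and a field names below are taken from the curl command above; the function names build_multipart and recognize are illustrative. This minimal Python sketch uses only the standard library to build the multipart/form-data body that curl's -F options produce:

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://acp-api.amivoice.com/v1/recognize"

def build_multipart(fields, file_field, filename, file_bytes):
    """Build a multipart/form-data body equivalent to curl's -F options."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f'{value}\r\n').encode()
        )
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers

def recognize(audio_path, appkey, grammar="-a-general"):
    """POST an audio file to the synchronous endpoint; return the parsed JSON."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, headers = build_multipart(
        {"d": grammar, "u": appkey}, "a", os.path.basename(audio_path), audio
    )
    req = urllib.request.Request(API_URL, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would then be `result = recognize("test.wav", os.environ["APPKEY"])`, mirroring the curl invocation.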
4. Check the Results

If successful, a JSON like the following will be returned. The transcription result is included in the text field.

{
  "results": [
    {
      "tokens": [ ... ],
      "confidence": 0.998,
      "starttime": 250,
      "endtime": 8794,
      "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。"
    }
  ],
  "utteranceid": "20220602/14/018122d637320a301bc194c9_20220602_141433",
  "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。",
  "code": "",
  "message": ""
}

For details on the response content, please see Speech Recognition Result Format.
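Extracting the transcription from a parsed response can be sketched as follows. The field names come from the sample JSON above; the assumptions (hedged here, and stated in the comments) are that an empty code indicates success and that starttime/endtime are offsets in milliseconds.

```python
import json

# Sample response from the Quick Start (the tokens list is elided).
sample = """
{
  "results": [
    {
      "confidence": 0.998,
      "starttime": 250,
      "endtime": 8794,
      "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。"
    }
  ],
  "utteranceid": "20220602/14/018122d637320a301bc194c9_20220602_141433",
  "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。",
  "code": "",
  "message": ""
}
"""

response = json.loads(sample)
# Assumption: an empty code means success; otherwise message describes the error.
if response["code"] == "":
    print(response["text"])  # full transcription
    for segment in response["results"]:
        # Assumption: starttime/endtime are millisecond offsets into the audio.
        print(segment["starttime"], segment["endtime"], segment["confidence"])
```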

Next Steps

The Quick Start used the Synchronous HTTP interface. To handle real-time audio sources, use the WebSocket interface; to process large audio files exceeding 15 MB, use the Asynchronous HTTP interface. For each use case and points to consider when choosing an interface, please see Interface Type and Usage.
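The selection criteria above can be summarized as a rule of thumb. The helper below is purely illustrative (the function name and return strings are not part of the API), applying: real-time source, then WebSocket; file over 15 MB, then Asynchronous HTTP; otherwise the Synchronous HTTP interface from the Quick Start.

```python
import os

def choose_interface(audio_path=None, realtime=False):
    """Illustrative rule of thumb for picking an AmiVoice API interface."""
    if realtime:
        return "WebSocket"                 # real-time audio sources
    if audio_path and os.path.getsize(audio_path) > 15 * 1024 * 1024:
        return "Asynchronous HTTP"         # large files over 15 MB
    return "Synchronous HTTP"              # default, as in the Quick Start
```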

We also provide client libraries and sample programs to support development.

We also provide features for improving speech recognition accuracy.

We also provide additional features such as speaker diarization and sentiment analysis. Please use them according to your purpose.

We also provide features to support the operation of your built services.

Please also see the comprehensive Development Guide.