Introduction
AmiVoice API is a speech recognition API: you send it audio, and it returns the spoken content as text. You can use it to build speech-enabled applications such as meeting transcription tools and voice dialogue systems.
Quick Start
Get your APPKEY
Register as a user and find your APPKEY in the [Connection Info] section on the My Page dashboard. Set it as an environment variable with the following command.
export AMIVOICE_APPKEY=your_appkey_here
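The Python example in this guide reads the key with os.environ["AMIVOICE_APPKEY"], so a missing variable shows up as a bare KeyError. The following is a minimal sketch of a check that fails with a clearer message. The helper name load_appkey is hypothetical and is not part of AmiVoice API.

```python
import os


def load_appkey(var_name: str = "AMIVOICE_APPKEY") -> str:
    """Return the APPKEY from the environment, or fail with a clear message.

    load_appkey is a hypothetical helper for illustration only.
    """
    # Treat an unset variable and an empty/whitespace-only value the same way
    appkey = os.environ.get(var_name, "").strip()
    if not appkey:
        raise RuntimeError(
            f"{var_name} is not set; run: export {var_name}=your_appkey_here"
        )
    return appkey
```

You can then pass load_appkey() as the u parameter instead of reading os.environ directly.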
Prepare an audio file
Prepare an audio file to transcribe. To try things out right away, you can use the sample audio file test.wav.
For supported audio file formats, see Audio Formats.
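Before sending a WAV file, it can help to check its header against the formats listed in Audio Formats. The following is a minimal sketch that uses Python's standard wave module. It only reports the file's properties and does not decide whether AmiVoice API accepts them. The helper name describe_wav is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file header (hypothetical helper)."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bits": w.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, describe_wav("test.wav") returns the channel count, bit depth, sample rate, and duration, which you can compare against Audio Formats.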
Run speech recognition
Run one of the following. Replace test.wav with the path to your audio file. The curl example pipes the response through jq to pretty-print the JSON; omit | jq if jq is not installed.
curl
curl https://acp-api.amivoice.com/v1/recognize \
  -F d=-a-general \
  -F u=$AMIVOICE_APPKEY \
  -F a=@test.wav | jq
Python
import os

import requests

# d: engine (-a-general), u: APPKEY, a: audio file (same fields as the curl example)
with open("test.wav", "rb") as f:
    response = requests.post(
        "https://acp-api.amivoice.com/v1/recognize",
        data={"d": "-a-general", "u": os.environ["AMIVOICE_APPKEY"]},
        files={"a": f},
    )
data = response.json()
print(data["text"])  # The JSON parser automatically converts Unicode escapes to readable text
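The Python example prints data["text"] without checking for errors. The sample response under Check the result shows empty code and message fields on success. Assuming that a non-empty code signals a failure (confirm the exact semantics in Speech Recognition Results), a minimal defensive check could look like the following. The function name extract_text is hypothetical.

```python
def extract_text(body: dict) -> str:
    """Return the transcript from a parsed recognition response.

    Assumption: an empty "code" field means success (as in the sample
    response); any other value is treated as an error and reported
    together with the "message" field.
    """
    code = body.get("code", "")
    if code:
        message = body.get("message", "")
        raise RuntimeError(f"AmiVoice API returned code={code!r}: {message}")
    return body.get("text", "")
```

To use it, call response.raise_for_status() to surface HTTP-level errors, then replace the last two lines of the example with print(extract_text(response.json())).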
Check the result
On success, you will receive JSON like the following. The text field contains the transcribed text.
{
  "results": [
    {
      "tokens": [ ... ],
      "confidence": 0.998,
      "starttime": 250,
      "endtime": 8794,
      "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。"
    }
  ],
  "utteranceid": "20220602/14/018122d637320a301bc194c9_20220602_141433",
  "text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。",
  "code": "",
  "message": ""
}
For details about the response, see Speech Recognition Results.
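Besides the top-level text, each element of results has its own text, confidence, starttime, and endtime. The following is a minimal sketch that lists these segments. It assumes that starttime and endtime are milliseconds from the start of the audio; the sample values 250 and 8794 are consistent with this, but confirm it in Speech Recognition Results. The helper name list_segments is hypothetical, and it ignores the tokens array.

```python
def list_segments(body: dict) -> list:
    """Return (start_s, end_s, confidence, text) for each entry in "results".

    Assumes starttime/endtime are in milliseconds; tokens are not used here.
    """
    segments = []
    for result in body.get("results", []):
        segments.append(
            (
                result["starttime"] / 1000,  # ms -> seconds (assumed unit)
                result["endtime"] / 1000,
                result["confidence"],
                result["text"],
            )
        )
    return segments
```

For example: for start, end, conf, text in list_segments(data): print(f"{start:.2f}-{end:.2f}s ({conf:.3f}) {text}")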
Next Steps
For detailed API usage, refer to the following guides.
Developer Guide
Detailed information for API development, including interface selection, request parameters, and result formats.
Introduction & Operations Guide
Essential information for deploying and operating AmiVoice API in production environments.
Advanced Features
Engine Selection
Choose an engine optimized for your domain, such as medical.
Streaming
Use the WebSocket interface to transcribe real-time audio sources such as microphone input.
Batch Processing
For large files or high volumes of audio, use the asynchronous HTTP interface for batch processing.
Word Registration
Register domain-specific terms and proper nouns to improve recognition accuracy.
Speaker Diarization
For audio containing multiple speakers, identify which speaker spoke when.
Utterance Volume Tag
Use utterance volume tags to obtain aggregated information about speech segments.