Asynchronous HTTP Interface
The asynchronous HTTP interface is a non-blocking HTTP API for transcribing long audio files.
To use this API, follow these steps:
- Create a speech recognition job
- Poll to check the status of the speech recognition job and retrieve the results
How to Use
1. Create a speech recognition job
To create a job, set the request parameters in the same way as the synchronous HTTP interface, and send the request to the asynchronous HTTP interface endpoint.
POST https://acp-api-async.amivoice.com/v1/recognitions
For example, to send a speech recognition request for the test.wav file with logging disabled, using the curl command:
curl https://acp-api-async.amivoice.com/v1/recognitions \
-F u={APP_KEY} \
-F d="grammarFileNames=-a-general loggingOptOut=True" \
-F a=@test.wav
While the endpoint differs from the synchronous HTTP interface, the method for setting request parameters is the same.
Some parameters, such as sentiment analysis, are only supported by the asynchronous HTTP interface.
Unlike the synchronous HTTP interface, logging or no logging is specified by request parameters, not by the endpoint.
Logging is enabled by default. To disable logging, specify loggingOptOut=True in the d parameter.
In case of success
A successful response includes a sessionid. This is the job ID for the user's speech recognition request and is used to check the job status and obtain results. The text field always contains "...".
Example
{"sessionid":"017ac8786c5b0a0504399999","text":"..."}
In case of failure
A failed response does not include a sessionid. You can determine the cause of failure from the code and message fields.
Please see Response codes and messages.
Example
{
"results": [{ "tokens": [], "tags": [], "rulename": "", "text": "" }],
"text": "",
"code": "-",
"message": "received illegal service authorization"
}
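The two cases above can be told apart by the presence of sessionid. The following is a minimal sketch, assuming the response body is available as a JSON string. The function name parse_job_response is hypothetical and is not part of the AmiVoice API.

```python
import json


def parse_job_response(body: str) -> str:
    """Return the job ID from a job-creation response, or raise ValueError.

    Hypothetical helper: a response without `sessionid` is treated as a
    failure, and its `code` and `message` are reported as the cause.
    """
    response = json.loads(body)
    if "sessionid" in response:
        return response["sessionid"]
    code = response.get("code", "")
    message = response.get("message", "")
    raise ValueError(f"job creation failed: {message} (code={code})")


if __name__ == "__main__":
    # Success example taken from this page
    ok = '{"sessionid":"017ac8786c5b0a0504399999","text":"..."}'
    print(parse_job_response(ok))
```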
2. Check the status of the speech recognition job and retrieve results
After successfully sending a speech recognition request, poll the job status until status becomes completed or error.
Retrieving the job status
Jobs are executed sequentially on the server side. To check the job status or retrieve results, query the result retrieval endpoint GET /v1/recognitions/{session_id}.
Set {session_id} to the job ID (sessionid) obtained when creating the job. Specify the authentication information (the authorization request parameter) in the Authorization header.
With curl, the request looks as follows. Here, assume the sessionid is 017c25ec12c00a304474a999.
curl -H "Authorization: Bearer {APPKEY}" \
https://acp-api-async.amivoice.com/v1/recognitions/017c25ec12c00a304474a999
queued status
Immediately after the request is sent, status is queued.
{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"queued"}
started status
When the job is taken from the queue, status becomes started.
{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"started"}
processing status
When the actual speech recognition process begins, status becomes processing. Please see the sample below. Here, the result is formatted with line breaks for readability.
You can use the size of the audio received by the API (audio_size) and its MD5 checksum (audio_md5) to verify that the audio was transmitted correctly.
The time it takes to go from processing to completed depends on the length of the audio; as a guideline, it is roughly 0.5 to 1.5 times the audio length.
{
"audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
"audio_size":306980,
"service_id":"{YOUR_SERVICE_ID}",
"session_id":"017c25ec12c00a304474a999",
"status":"processing"
}
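The transmission check described above could be done locally along these lines. This is a sketch, assuming audio_md5 is the lowercase hexadecimal MD5 digest of the uploaded bytes (which matches the format of the sample value) and audio_size is the byte count. The function name audio_matches is hypothetical.

```python
import hashlib


def audio_matches(audio: bytes, status: dict) -> bool:
    """Compare local audio bytes with `audio_size` and `audio_md5`
    taken from a status response (assumed: byte count and hex MD5)."""
    local_md5 = hashlib.md5(audio).hexdigest()
    return (
        len(audio) == status.get("audio_size")
        and local_md5 == status.get("audio_md5")
    )
```

In practice, audio would be the bytes read from the file that was sent, and status the parsed JSON of a processing or completed response.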
completed status
When speech recognition is complete, status becomes completed. At this point, you can obtain the speech recognition results from results and segments in the response. The results are stored on the server for a certain period after speech recognition processing is completed. For the retention period, please see "Speech recognition result retention period" in the limitations of the asynchronous HTTP interface.
For details on the response including recognition results, please see Speech recognition results.
If you access results that have already been deleted after the retention period, you will get a 404 NOT FOUND error. For errors, please see the Error response list in the reference.
error status
If speech recognition fails for some reason, status becomes error. In this case, the cause of the error is set in error_message.
Example:
{
"status": "error",
"audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
"audio_size":306980,
"service_id":"{YOUR_SERVICE_ID}",
"session_id":"017c25ec12c00a304474a999",
"error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}
The error_message may include amineth_code='{response code}' and amineth_message='{error message}'. For details, please see the table in the response codes and messages details.
In particular, if the error message includes amineth_code='o', there is a problem with the client's request method or audio file, so retrying will yield the same result. For details, please see "Reject (Response code=o)". Errors other than 'o' are likely caused by an issue with the AmiVoice API infrastructure, so please wait a while before retrying.
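The retry rule above can be sketched as follows. The regular expressions assume the amineth_code='...' and amineth_message='...' layout shown in the example and that the values contain no single quotes. The function names are hypothetical.

```python
import re

# Patterns assumed from the example error_message on this page
_CODE_RE = re.compile(r"amineth_code='([^']*)'")
_MESSAGE_RE = re.compile(r"amineth_message='([^']*)'")


def parse_error_message(error_message: str) -> dict:
    """Extract the response code and message, or None where absent."""
    code = _CODE_RE.search(error_message)
    message = _MESSAGE_RE.search(error_message)
    return {
        "code": code.group(1) if code else None,
        "message": message.group(1) if message else None,
    }


def should_retry(error_message: str) -> bool:
    """Code 'o' means the request or audio is at fault, so a retry
    would give the same result; anything else may be transient."""
    return parse_error_message(error_message)["code"] != "o"
```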
Content ID
You can set any string in the contentId of the d parameter when making a request. For example, by setting information such as an ID issued by the application, a file name, or user information, you can later obtain this information as part of the recognition result.
For example, to send a request with the file name set as the contentId using the curl command:
curl https://acp-api-async.amivoice.com/v1/recognitions \
-F u={APP_KEY} \
-F d="grammarFileNames=-a-general loggingOptOut=True contentId=test.wav" \
-F a=@test.wav
When retrieving the job status or results, content_id is included as follows:
{"content_id":"test.wav","service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"queued"}
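One way an application might use content_id is to map responses back to its own identifiers when several jobs are in flight. The following is a sketch under that assumption. The function name index_by_content_id is hypothetical, and the input is a list of already parsed status responses.

```python
def index_by_content_id(responses):
    """Map each `content_id` to the most recent response carrying it.

    Responses without a `content_id` (jobs submitted without
    `contentId`) are skipped.
    """
    index = {}
    for response in responses:
        content_id = response.get("content_id")
        if content_id is not None:
            index[content_id] = response
    return index
```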
Sample Code
Here is Python sample code demonstrating the typical flow of the asynchronous HTTP interface.
Request Parameters
An AmiVoice API APPKEY is required for execution. Set your AmiVoice API APPKEY in the following line:
app_key = 'TODO: Please set APPKEY here'
Decide on the options to set in the d parameter. Here, we'll set the following:
- Engine: General purpose (grammarFileNames=-a-general)
- Logging: None (loggingOptOut=True)
- Content ID: File name (contentId=filename)
- Speaker diarization: Enabled (speakerDiarization=True)
- Number of speakers: Max=Min=2 (diarizationMinSpeaker=2, diarizationMaxSpeaker=2)
- Sentiment analysis: Enabled (sentimentAnalysis=True)
domain = {
    'grammarFileNames': '-a-general',
    'loggingOptOut': 'True',
    'contentId': filename,
    'speakerDiarization': 'True',
    'diarizationMinSpeaker': '2',
    'diarizationMaxSpeaker': '2',
    'sentimentAnalysis': 'True',
    ...
We're also registering two words. profileId is commented out, so the registered words will only be valid for this session. For details, please see Word Registration.
    #'profileId': 'test',
    'profileWords': 'wwww よんこだぶる|www2 とりぷるだぶる',
}
URL-encode the values of the key-value pairs to be set in the d parameter. In Python, we use urllib.parse.quote.
params = {
    'u': app_key,
    'd': ' '.join([f'{key}={urllib.parse.quote(value)}' for key, value in domain.items()]),
}
logger.info(params)
params will look like this:
{'u': 'XXXX', 'd': 'grammarFileNames=-a-general loggingOptOut=True contentId=www.wav profileWords=wwww%20%E3%82%88%E3%82%93%E3%81%93%E3%81%A0%E3%81%B6%E3%82%8B%7Cwww2%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2 sentimentAnalysis=True'}
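To check that the encoding is what you intended, the d string can be decoded back into key-value pairs. This is a sketch, assuming pairs are separated by single spaces and that each value was encoded with urllib.parse.quote as above. The function name decode_d_parameter is hypothetical.

```python
import urllib.parse


def decode_d_parameter(d: str) -> dict:
    """Split a `d` string on spaces and URL-decode each value.

    Splitting on spaces is safe here because spaces inside values
    were encoded as %20 by urllib.parse.quote.
    """
    decoded = {}
    for pair in d.split(" "):
        key, _, value = pair.partition("=")
        decoded[key] = urllib.parse.unquote(value)
    return decoded
```

Decoding the profileWords value from the output above should give back the two registered words, including the space and the | separator.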
Sending the Speech Recognition Job Request
Send the params from earlier and the audio file via HTTP POST. To keep the HTTP communication readable, this sample uses the HTTP client library requests.
request_response = requests.post(
    url=endpoint,
    data={key: value for key, value in params.items()},
    files={'a': (filename, open(filename, 'rb').read(), 'application/octet-stream')}
)
Check whether the call was successful using the HTTP status code. Also, check whether the job was created by verifying that sessionid exists in the response.
if request_response.status_code != 200:
    logger.error(f'Failed to request - {request_response.content}')
    exit(1)
request = request_response.json()
if 'sessionid' not in request:
    logger.error(f'Failed to create job - {request["message"]} ({request["code"]})')
    exit(2)
logger.info(request)
When job creation is successful, you'll get a response like this. Use the sessionid included in the response to check the job status and retrieve results.
{'sessionid': '01838d9535080a304474a07f', 'text': '...'}
Checking the Speech Recognition Job Status
Send an HTTP GET request to recognitions/{sessionid}. Poll until status in the response becomes completed or error. Here, we check the result every 10 seconds.
while True:
    # HTTP GET request to `recognitions/{sessionid}`
    result_response = requests.get(
        url=f'{endpoint}/{request["sessionid"]}',
        headers={'Authorization': f'Bearer {app_key}'}
    )
    if result_response.status_code == 200:
        result = result_response.json()
        if 'status' in result and (result['status'] == 'completed' or result['status'] == 'error'):
            # If the `status` in the response is `completed` or `error`, format and output the result
            print(json.dumps(result, ensure_ascii=False, indent=4))
            exit(0)
        else:
            # If the `status` in the response is not `completed` or `error`, the job is still running
            # So, wait a bit (10 seconds here) before checking the status again
            logger.info(result)
            time.sleep(10)
    else:
        # If the HTTP response code is not 200, exit
        logger.error(f'Failed. Response is {result_response.content}')
        exit(3)
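The loop above polls indefinitely. As a variation that is not part of the sample, the polling could be bounded by an overall timeout. The following is a sketch in which wait_for_job, fetch_status, and the default values are all hypothetical. fetch_status is any callable that returns the parsed status response.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], dict],
                 interval: float = 10.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until `status` is `completed` or `error`.

    Raises TimeoutError if another wait of `interval` seconds would
    pass the deadline. `interval` and `timeout` are illustrative
    defaults, not values recommended by the API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in ("completed", "error"):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"job still {result.get('status')!r} after {timeout} seconds")
        sleep(interval)
```

Here fetch_status could wrap the requests.get call shown above and return result_response.json(). Passing sleep as a parameter makes the loop easy to exercise without real waiting.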
Code
Here's the complete Python code we've discussed so far.
import time
import json
import urllib.parse
import logging
import requests

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")

endpoint = 'https://acp-api-async.amivoice.com/v1/recognitions'
app_key = 'TODO: Please set APPKEY here'
filename = 'www-2.wav'

# Request parameters
domain = {
    'grammarFileNames': '-a-general',
    'loggingOptOut': 'True',
    'contentId': filename,
    'speakerDiarization': 'True',
    'diarizationMinSpeaker': '2',
    'diarizationMaxSpeaker': '2',
    'sentimentAnalysis': 'True',
    #'profileId': 'test',
    'profileWords': 'wwww よんこだぶる|www2 とりぷるだぶる',
}
params = {
    'u': app_key,
    'd': ' '.join([f'{key}={urllib.parse.quote(value)}' for key, value in domain.items()]),
}
logger.info(params)

# Send job request
request_response = requests.post(
    url=endpoint,
    data={key: value for key, value in params.items()},
    files={'a': (filename, open(filename, 'rb').read(), 'application/octet-stream')}
)
if request_response.status_code != 200:
    logger.error(f'Failed to request - {request_response.content}')
    exit(1)
request = request_response.json()
if 'sessionid' not in request:
    logger.error(f'Failed to create job - {request["message"]} ({request["code"]})')
    exit(2)
logger.info(request)

# Check status every 10 seconds until results are ready
while True:
    # Send HTTP GET request to `recognitions/{sessionid}`
    result_response = requests.get(
        url=f'{endpoint}/{request["sessionid"]}',
        headers={'Authorization': f'Bearer {app_key}'}
    )
    if result_response.status_code == 200:
        result = result_response.json()
        if 'status' in result and (result['status'] == 'completed' or result['status'] == 'error'):
            # If the `status` in the response is `completed` or `error`, format and output the result
            print(json.dumps(result, ensure_ascii=False, indent=4))
            exit(0)
        else:
            # If the `status` in the response is not `completed` or `error`, the job is still running
            # So, wait a bit before checking the status again (10 seconds in this case)
            logger.info(result)
            time.sleep(10)
    else:
        # If the HTTP response code is not 200, exit
        logger.error(f'Failed. Response is {result_response.content}')
        exit(3)
How to Run
Make sure Python3 is installed on your system.
Install the required library:
pip install requests
Download the sample audio file (www-2.wav) and copy it to the same directory as the program.
This is an audio file of someone saying "トリプル・ダブルは、バスケットボールの記録に関する用語です。" In the sample code, we register the word "www2" for the pronunciation "とりぷるだぶる", so you can confirm that this is working effectively.
To run the sample program, execute the following from the command line:
python async-http-sample.py
The execution result will be as follows:
$ python async-http-sample.py
2022-12-06 15:01:03,336 {'u': 'XXXX', 'd': 'grammarFileNames=-a-general loggingOptOut=True contentId=www-2.wav speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2 sentimentAnalysis=True profileWords=wwww%20%E3%82%88%E3%82%93%E3%81%93%E3%81%A0%E3%81%B6%E3%82%8B%7Cwww2%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B'}
2022-12-06 15:01:03,345 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:04,117 https://acp-api-async.amivoice.com:443 "POST /v1/recognitions HTTP/1.1" 200 55
2022-12-06 15:01:04,119 {'sessionid': '0184e605ff170a306b8f9c96', 'text': '...'}
2022-12-06 15:01:04,122 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:04,309 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:04,312 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:14,328 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:14,517 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:14,519 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:24,523 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:24,718 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:24,721 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:34,728 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:34,886 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:34,888 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:44,940 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:45,114 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:45,118 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:55,124 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:56,735 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:56,736 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:06,743 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:06,940 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 113
2022-12-06 15:02:06,942 {'status': 'started', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:16,948 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:17,108 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 113
2022-12-06 15:02:17,109 {'status': 'started', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:27,114 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:27,281 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 183
2022-12-06 15:02:27,283 {'status': 'processing', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav', 'audio_size': 270444, 'audio_md5': 'fd7d144824e8a5982d3aaa4cda5358a8'}
2022-12-06 15:02:37,290 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:37,476 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 2481
{
"status": "completed",
"session_id": "0184e605ff170a306b8f9c96",
"service_id": "amiyamamoto",
"content_id": "www-2.wav",
"audio_size": 270444,
"audio_md5": "fd7d144824e8a5982d3aaa4cda5358a8",
"segments": [
{
"results": [
{
"tokens": [
{
"written": "www2",
"confidence": 1,
"starttime": 1620,
"endtime": 2548,
"spoken": "とりぷるだぶる",
"label": "speaker0"
},
{
"written": "は",
"confidence": 1,
"starttime": 2548,
"endtime": 2788,
"spoken": "は",
"label": "speaker0"
},
{
"written": "バスケットボール",
"confidence": 1,
"starttime": 2916,
"endtime": 3956,
"spoken": "ばすけっとぼーる",
"label": "speaker0"
},
{
"written": "の",
"confidence": 0.99,
"starttime": 3956,
"endtime": 4052,
"spoken": "の",
"label": "speaker0"
},
{
"written": "記録",
"confidence": 1,
"starttime": 4052,
"endtime": 4404,
"spoken": "きろく",
"label": "speaker0"
},
{
"written": "に",
"confidence": 1,
"starttime": 4404,
"endtime": 4532,
"spoken": "に",
"label": "speaker0"
},
{
"written": "関する",
"confidence": 1,
"starttime": 4532,
"endtime": 5060,
"spoken": "かんする",
"label": "speaker0"
},
{
"written": "用語",
"confidence": 1,
"starttime": 5060,
"endtime": 5412,
"spoken": "ようご",
"label": "speaker1"
},
{
"written": "です",
"confidence": 0.96,
"starttime": 5412,
"endtime": 5940,
"spoken": "です",
"label": "speaker0"
},
{
"written": "。",
"confidence": 0.8,
"starttime": 5940,
"endtime": 6196,
"spoken": "_",
"label": "speaker0"
}
],
"confidence": 1,
"starttime": 1300,
"endtime": 6196,
"tags": [],
"rulename": "",
"text": "www2はバスケットボールの記録に関する用語です。"
}
],
"text": "www2はバスケットボールの記録に関する用語です。"
}
],
"utteranceid": "20221206/15/0184e60741170a30522339d0_20221206_150225[nolog]",
"text": "www2はバスケットボールの記録に関する用語です。",
"code": "",
"message": "",
"sentiment_analysis": {
"segments": [
{
"starttime": 1680,
"endtime": 2860,
/* sentimental parameters */
},
{
"starttime": 3520,
"endtime": 4900,
/* sentimental parameters */
}
]
}
}
The text field is the result text, which shows "www2はバスケットボールの記録に関する用語です。" This indicates that the speech content has been correctly recognized. Also, "トリプルダブル" has been converted to "www2", confirming that the word registration is working as intended. For details on the results, please see Speech Recognition Results.
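Because speaker diarization is enabled, each token in the result carries a label. The following sketch groups consecutive tokens by speaker. It assumes the segments, results, and tokens nesting shown in the output above. The function name text_by_speaker is hypothetical.

```python
def text_by_speaker(result: dict) -> list:
    """Merge consecutive tokens that share a `label` into
    (label, text) pairs, in the order they were spoken."""
    merged = []
    for segment in result.get("segments", []):
        for res in segment.get("results", []):
            for token in res.get("tokens", []):
                label = token.get("label")
                if merged and merged[-1][0] == label:
                    # Same speaker as the previous token: extend the run
                    merged[-1] = (label, merged[-1][1] + token["written"])
                else:
                    merged.append((label, token["written"]))
    return merged
```

A new pair is started each time the label changes, so a single word attributed to another speaker splits the surrounding text into separate runs.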
Troubleshooting
received illegal service authorization
If you see the following message, it's possible that the AmiVoice API APPKEY has not been set:
2022-10-11 10:10:44,928 Failed to create job - received illegal service authorization (-)
Please check if the APPKEY is set and correct in the following part of the code:
app_key = 'TODO: Please set APPKEY here'
Please also see the Request Parameters section on this page.
No such file or directory: 'www-2.wav'
If you see the following message, the audio file does not exist in the execution directory:
FileNotFoundError: [Errno 2] No such file or directory: 'www-2.wav'
Please download the sample audio file (www-2.wav) and copy it to the directory where you're running the command. After confirming the file exists, please try running the command again.
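Both problems in this section can be caught before any request is sent. The following is a sketch of such a check. The function name preflight_problems is hypothetical, and the placeholder string is the one used in the sample code.

```python
import os

# Placeholder value from the sample code on this page
PLACEHOLDER_APP_KEY = 'TODO: Please set APPKEY here'


def preflight_problems(app_key: str, filename: str) -> list:
    """Return a list of human-readable problems; empty if none found.

    Only checks that the key was changed from the placeholder and that
    the file exists; it cannot tell whether the key is actually valid.
    """
    problems = []
    if not app_key or app_key == PLACEHOLDER_APP_KEY:
        problems.append("APPKEY is not set")
    if not os.path.isfile(filename):
        problems.append(f"audio file not found: {filename}")
    return problems
```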
Other Documentation
- For the API reference, please see Asynchronous HTTP Interface.