Asynchronous HTTP Interface

The asynchronous HTTP interface is a non-blocking HTTP API for transcribing long audio files.

To use this API, follow these steps:

1. Create a speech recognition job.
2. Poll the status of the speech recognition job until it finishes, then retrieve the results.

How to Use

1. Create a speech recognition job

To create a job, set the request parameters in the same way as the synchronous HTTP interface, and send the request to the asynchronous HTTP interface endpoint.

POST https://acp-api-async.amivoice.com/v1/recognitions

For example, to send a speech recognition request for the test.wav file with logging disabled, using the curl command:

curl https://acp-api-async.amivoice.com/v1/recognitions \
     -F u={APP_KEY} \
     -F d="grammarFileNames=-a-general loggingOptOut=True" \
     -F a=@test.wav

While the endpoint differs from the synchronous HTTP interface, the method for setting request parameters is the same.

note

Some parameters, such as sentiment analysis, are only supported by the asynchronous HTTP interface.

caution

Unlike the synchronous HTTP interface, whether logging is enabled is specified by a request parameter, not by the endpoint. Logging is enabled by default. To disable logging, specify loggingOptOut=True in the d parameter.

In case of success

A successful response includes a sessionid. This is the job ID for the speech recognition request, and it is used to check the job status and obtain the results. The text field always contains "...".

Example

{"sessionid":"017ac8786c5b0a0504399999","text":"..."}

In case of failure

The failed response does not include a sessionid. You can determine the cause of failure from the code and message. Please see Response codes and messages.

Example

{
  "results": [{ "tokens": [], "tags": [], "rulename": "", "text": "" }],
  "text": "",
  "code": "-",
  "message": "received illegal service authorization"
}
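
The success and failure cases above can be told apart by checking for sessionid. The following is a minimal sketch rather than part of the official sample: the names extract_session_id and JobCreationError are hypothetical, and it assumes the response body has already been parsed from JSON into a dict.

```python
class JobCreationError(Exception):
    """Hypothetical error type for a job-creation response without a sessionid."""

    def __init__(self, code, message):
        super().__init__(f"{message} ({code})")
        self.code = code
        self.message = message


def extract_session_id(body: dict) -> str:
    """Return the job ID from a parsed job-creation response, or raise JobCreationError."""
    session_id = body.get("sessionid")
    if session_id:
        return session_id
    # A failed response carries `code` and `message` instead of `sessionid`.
    raise JobCreationError(body.get("code", ""), body.get("message", ""))
```

Raising an exception keeps the failure details (code and message) together, so the caller can log them or look them up in Response codes and messages.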

2. Check the status of the speech recognition job and retrieve results

After successfully sending a speech recognition request, check the job status and poll until the status becomes completed or error.

Retrieving the job status

Jobs are executed sequentially on the server side. To check the job status or retrieve results, query the result retrieval endpoint GET /v1/recognitions/{session_id}.

Set {session_id} to the job ID (sessionid) obtained when creating the job. Pass the authentication information (the authorization request parameter) in the Authorization header.

With curl, the request looks like this. Here, the sessionid is assumed to be 017c25ec12c00a304474a999.

curl -H "Authorization: Bearer {APP_KEY}" \
     https://acp-api-async.amivoice.com/v1/recognitions/017c25ec12c00a304474a999

queued status

Immediately after sending the request, the status will be in the queued state.

{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"queued"}

started status

When the job is taken from the queue, the status becomes started.

{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"started"}

processing status

When the actual speech recognition process begins, the status becomes processing. Please see the sample below. Here, the results are formatted with line breaks for readability. You can use the size of the audio received by the API (audio_size) and the MD5 checksum (audio_md5) to verify that the audio was transmitted correctly.

The time it takes to go from processing to completed depends on the length of the audio. As a rough guideline, it is about 0.5 to 1.5 times the length of the audio.

{
    "audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
    "audio_size":306980,
    "service_id":"{YOUR_SERVICE_ID}",
    "session_id":"017c25ec12c00a304474a999",
    "status":"processing"
}
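
The transmission check described above can be sketched as follows. This is an illustration only: the function name audio_matches is hypothetical, and it assumes that audio_md5 is the lowercase hexadecimal MD5 digest of the uploaded file bytes and that audio_size is the byte count, which is what the sample response suggests.

```python
import hashlib


def audio_matches(audio_bytes: bytes, status: dict) -> bool:
    """Compare local audio bytes with audio_size and audio_md5 from a status response."""
    local_size = len(audio_bytes)
    local_md5 = hashlib.md5(audio_bytes).hexdigest()
    # Both fields are absent in earlier states such as queued, so .get() returns None there.
    return status.get("audio_size") == local_size and status.get("audio_md5") == local_md5
```

For example, audio_matches(open('test.wav', 'rb').read(), status) could be called once the status response contains both fields.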

completed status

When speech recognition is complete, the status becomes completed. At this time, you can obtain the speech recognition results in the results and segments of the response. The results are stored on the server for a certain period after the speech recognition server processing is completed. For the retention period, please see "Speech recognition result retention period" in the limitations of the asynchronous HTTP interface.

For details on the response including recognition results, please see Speech recognition results.

note

If you access results that have been deleted after a certain period, you will get a 404 NOT FOUND error. For errors, please see the Error response list in the reference.

error status

If speech recognition fails for some reason, the status becomes error. In this case, the cause of the error is set in error_message.

Example:

{
    "status": "error",
    "audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
    "audio_size":306980,
    "service_id":"{YOUR_SERVICE_ID}",
    "session_id":"017c25ec12c00a304474a999",
    "error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}

The error_message may include amineth_code='{response code}' and amineth_message='{error message}'. For details, please see the table in the response codes and messages details.

In particular, if the error message includes amineth_code='o', there is a problem with the client's request method or audio file, so retrying will yield the same result. For details, please see "Reject (Response code=o)". For codes other than 'o', the cause is likely an issue with the AmiVoice API infrastructure, so please wait a while before retrying.
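
The retry rule above can be expressed as a small helper. This is a sketch under assumptions: the name should_retry is hypothetical, the regular expression is based only on the error_message example shown above, and treating a message with no amineth_code as retryable is a choice made here, not documented behavior.

```python
import re

# Matches amineth_code='...' as it appears in the error_message example.
_AMINETH_CODE = re.compile(r"amineth_code='([^']*)'")


def should_retry(error_message: str) -> bool:
    """Return False for rejects (code 'o'); True for any other code or no code."""
    match = _AMINETH_CODE.search(error_message)
    if match and match.group(1) == "o":
        # Reject: the request or the audio file is the problem, so a retry gives the same result.
        return False
    return True
```

When should_retry returns True, wait a while before sending the request again rather than retrying immediately.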

Content ID

You can freely set a string in the contentId of the d parameter when making a request. For example, by setting information such as an ID issued by the application, a file name, or user information, you can later obtain this information as part of the recognition result.

For example, to send a request setting the file name as the contentId using the curl command:

curl https://acp-api-async.amivoice.com/v1/recognitions \
     -F u={APP_KEY} \
     -F d="grammarFileNames=-a-general loggingOptOut=True contentId=test.wav" \
     -F a=@test.wav

When retrieving the job status or results, content_id will be included as follows:

{"content_id":"test.wav","service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"queued"}
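
Because content_id is echoed back in each status response, responses for several jobs can be matched to the files they came from. A minimal sketch, assuming the responses have already been parsed into dicts and that content_id is absent when contentId was not set (as in the earlier examples); the name index_by_content_id is hypothetical.

```python
def index_by_content_id(responses):
    """Map content_id -> response, skipping responses that carry no content_id."""
    return {r["content_id"]: r for r in responses if "content_id" in r}
```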

Sample Code

The following Python sample code demonstrates the typical flow of the asynchronous HTTP interface.

Request Parameters

An AmiVoice API APPKEY is required for execution. Set your AmiVoice API APPKEY in the following line:

app_key = 'TODO: Please set APPKEY here'

Decide on the options to set in the d parameter. Here, we'll set the following:

Engine: General purpose (grammarFileNames=-a-general)
Logging: None (loggingOptOut=True)
Content ID: File name (contentId=filename)
Speaker diarization: Enabled (speakerDiarization=True)
Number of speakers: Max=Min=2 (diarizationMinSpeaker=2, diarizationMaxSpeaker=2)
Sentiment analysis: Enabled (sentimentAnalysis=True)

domain = {
    'grammarFileNames': '-a-general',
    'loggingOptOut': 'True',
    'contentId': filename,
    'speakerDiarization': 'True',
    'diarizationMinSpeaker': '2',
    'diarizationMaxSpeaker': '2',
    'sentimentAnalysis': 'True',
    ...

We're also registering two words. profileId is commented out, so the registered words will only be valid for this session. For details, please see Word Registration.

    #'profileId': 'test',
    'profileWords': 'wwww よんこだぶる|www2 とりぷるだぶる',
}

URL encode the values of the key-value pairs to be set in the d parameter. In Python, we use urllib.parse.quote.

params = {
    'u': app_key,
    'd': ' '.join([f'{key}={urllib.parse.quote(value)}' for key, value in domain.items()]),
}
logger.info(params)

params will look like this:

{'u': 'XXXX', 'd': 'grammarFileNames=-a-general loggingOptOut=True contentId=www-2.wav speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2 sentimentAnalysis=True profileWords=wwww%20%E3%82%88%E3%82%93%E3%81%93%E3%81%A0%E3%81%B6%E3%82%8B%7Cwww2%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B'}
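
The key-value pairs in d are separated by spaces, so a value that itself contains a space (such as profileWords) has to be URL encoded to stay in one piece. The round trip below illustrates this. It is a debugging sketch, not part of the official sample: build_d mirrors the join used above, and parse_d is a hypothetical helper that assumes every item has the form key=value.

```python
import urllib.parse


def build_d(options: dict) -> str:
    """Join key=value pairs with spaces, URL-encoding each value."""
    return " ".join(f"{key}={urllib.parse.quote(value)}" for key, value in options.items())


def parse_d(d: str) -> dict:
    """Split a d string on spaces, then URL-decode each value."""
    pairs = (item.split("=", 1) for item in d.split(" ") if item)
    return {key: urllib.parse.unquote(value) for key, value in pairs}
```

Parsing a logged d string back into a dict is a quick way to confirm that the encoded values are what you intended to send.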

Request to Create a Speech Recognition Job

Send the above params and the audio file via HTTP POST. For readability in describing HTTP communication, we'll use the HTTP client library requests in this sample.

request_response = requests.post(
    url=endpoint,
    data={key: value for key, value in params.items()},
    files={'a': (filename, open(filename, 'rb').read(), 'application/octet-stream')}
)

Check if the call was successful using the HTTP status code. Also, check if the job creation was successful by verifying the existence of sessionid in the response.

if request_response.status_code != 200:
    logger.error(f'Failed to request - {request_response.content}')
    exit(1)

request = request_response.json()

if 'sessionid' not in request:
    logger.error(f'Failed to create job - {request["message"]} ({request["code"]})')
    exit(2)

logger.info(request)

When job creation is successful, you'll get a response like this. Use the sessionid included in the response to check the job status and retrieve results.

{'sessionid': '01838d9535080a304474a07f', 'text': '...'}

Checking the Speech Recognition Job Status

Send an HTTP GET request to recognitions/{sessionid}. Poll until the status in the response becomes completed or error. Here, we're checking the result every 10 seconds.

while True:
    # HTTP GET request to `recognitions/{sessionid}`
    result_response = requests.get(
        url=f'{endpoint}/{request["sessionid"]}',
        headers={'Authorization': f'Bearer {app_key}'}
    )
    if result_response.status_code == 200:
        result = result_response.json()
        if 'status' in result and (result['status'] == 'completed' or result['status'] == 'error'):
            # If the `status` in the response is `completed` or `error`, format and output the result
            print(json.dumps(result, ensure_ascii=False, indent=4))
            exit(0)
        else:
            # If the `status` in the response is not `completed` or `error`, the job is still running
            # So, wait a bit (10 seconds here) before checking the status again
            logger.info(result)
            time.sleep(10)
    else:
        # If the HTTP response code is not 200, exit
        logger.error(f'Failed. Response is {result_response.content} - HTTP {result_response.status_code}')
        exit(3)
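
The loop above polls indefinitely. One possible variation adds an overall timeout and takes the status request as a function argument, which also makes the loop testable without network access. This is a sketch, not the official sample: poll_until_done, fetch_status, and the default timeout of 3600 seconds are all assumptions made for illustration.

```python
import time

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"completed", "error"}


def poll_until_done(fetch_status, interval=10.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or the timeout would pass."""
    deadline = clock() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Give up if the next wait would run past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {result.get('status')!r} after {timeout} seconds")
        sleep(interval)
```

Here fetch_status is any callable that returns the parsed status response, for example a small function wrapping the requests.get call shown above and returning result_response.json().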

Code

Here's the complete Python code we've discussed so far.

async-http-sample.py
import time
import json
import urllib.parse
import logging

import requests


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")


endpoint = 'https://acp-api-async.amivoice.com/v1/recognitions'
app_key = 'TODO: Please set APPKEY here'
filename = 'www-2.wav'

# Request parameters
domain = {
    'grammarFileNames': '-a-general',
    'loggingOptOut': 'True',
    'contentId': filename,
    'speakerDiarization': 'True',
    'diarizationMinSpeaker': '2',
    'diarizationMaxSpeaker': '2',
    'sentimentAnalysis': 'True',
    #'profileId': 'test',
    'profileWords': 'wwww よんこだぶる|www2 とりぷるだぶる',
}
params = {
    'u': app_key,
    'd': ' '.join([f'{key}={urllib.parse.quote(value)}' for key, value in domain.items()]),
}
logger.info(params)

# Send job request
request_response = requests.post(
    url=endpoint,
    data={key: value for key, value in params.items()},
    files={'a': (filename, open(filename, 'rb').read(), 'application/octet-stream')}
)

if request_response.status_code != 200:
    logger.error(f'Failed to request - {request_response.content}')
    exit(1)

request = request_response.json()

if 'sessionid' not in request:
    logger.error(f'Failed to create job - {request["message"]} ({request["code"]})')
    exit(2)

logger.info(request)

# Check status every 10 seconds until results are ready
while True:
    # HTTP GET request to `recognitions/{sessionid}`
    result_response = requests.get(
        url=f'{endpoint}/{request["sessionid"]}',
        headers={'Authorization': f'Bearer {app_key}'}
    )
    if result_response.status_code == 200:
        result = result_response.json()
        if 'status' in result and (result['status'] == 'completed' or result['status'] == 'error'):
            # If the `status` in the response is `completed` or `error`, format and output the result
            print(json.dumps(result, ensure_ascii=False, indent=4))
            exit(0)
        else:
            # If the `status` in the response is not `completed` or `error`, the job is still running
            # So, wait a bit (10 seconds in this case) before checking the status again 
            logger.info(result)
            time.sleep(10)
    else:
        # If the HTTP response code is not 200, exit
        logger.error(f'Failed. Response is {result_response.content} - HTTP {result_response.status_code}')
        exit(3)

How to Run

Make sure Python3 is installed on your system.

Install the required library:

pip install requests

Download the sample audio file (www-2.wav) and copy it to the same directory as the program.

info

This is an audio file of someone saying "トリプル・ダブルは、バスケットボールの記録に関する用語です。" In the sample code, we register the word "www2" for the pronunciation "とりぷるだぶる", so you can confirm that this is working effectively.

To run the sample program, execute the following from the command line:

python async-http-sample.py

The execution result will be as follows:

$ python async-http-sample.py
2022-12-06 15:01:03,336 {'u': 'XXXX', 'd': 'grammarFileNames=-a-general loggingOptOut=True contentId=www-2.wav speakerDiarization=True diarizationMinSpeaker=2 diarizationMaxSpeaker=2 sentimentAnalysis=True profileWords=wwww%20%E3%82%88%E3%82%93%E3%81%93%E3%81%A0%E3%81%B6%E3%82%8B%7Cwww2%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B'}
2022-12-06 15:01:03,345 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:04,117 https://acp-api-async.amivoice.com:443 "POST /v1/recognitions HTTP/1.1" 200 55
2022-12-06 15:01:04,119 {'sessionid': '0184e605ff170a306b8f9c96', 'text': '...'}
2022-12-06 15:01:04,122 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:04,309 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:04,312 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:14,328 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:14,517 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:14,519 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:24,523 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:24,718 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:24,721 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:34,728 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:34,886 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:34,888 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:44,940 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:45,114 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:45,118 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:01:55,124 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:01:56,735 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 112
2022-12-06 15:01:56,736 {'status': 'queued', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:06,743 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:06,940 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 113
2022-12-06 15:02:06,942 {'status': 'started', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:16,948 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:17,108 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 113
2022-12-06 15:02:17,109 {'status': 'started', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav'}
2022-12-06 15:02:27,114 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:27,281 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 183
2022-12-06 15:02:27,283 {'status': 'processing', 'session_id': '0184e605ff170a306b8f9c96', 'service_id': 'amiyamamoto', 'content_id': 'www-2.wav', 'audio_size': 270444, 'audio_md5': 'fd7d144824e8a5982d3aaa4cda5358a8'}
2022-12-06 15:02:37,290 Starting new HTTPS connection (1): acp-api-async.amivoice.com:443
2022-12-06 15:02:37,476 https://acp-api-async.amivoice.com:443 "GET /v1/recognitions/0184e605ff170a306b8f9c96 HTTP/1.1" 200 2481
{
    "status": "completed",
    "session_id": "0184e605ff170a306b8f9c96",
    "service_id": "amiyamamoto",
    "content_id": "www-2.wav",
    "audio_size": 270444,
    "audio_md5": "fd7d144824e8a5982d3aaa4cda5358a8",
    "segments": [
        {
            "results": [
                {
                    "tokens": [
                        {
                            "written": "www2",
                            "confidence": 1,
                            "starttime": 1620,
                            "endtime": 2548,
                            "spoken": "とりぷるだぶる",
                            "label": "speaker0"
                        },
                        {
                            "written": "は",
                            "confidence": 1,
                            "starttime": 2548,
                            "endtime": 2788,
                            "spoken": "は",
                            "label": "speaker0"
                        },
                        {
                            "written": "バスケットボール",
                            "confidence": 1,
                            "starttime": 2916,
                            "endtime": 3956,
                            "spoken": "ばすけっとぼーる",
                            "label": "speaker0"
                        },
                        {
                            "written": "の",
                            "confidence": 0.99,
                            "starttime": 3956,
                            "endtime": 4052,
                            "spoken": "の",
                            "label": "speaker0"
                        },
                        {
                            "written": "記録",
                            "confidence": 1,
                            "starttime": 4052,
                            "endtime": 4404,
                            "spoken": "きろく",
                            "label": "speaker0"
                        },
                        {
                            "written": "に",
                            "confidence": 1,
                            "starttime": 4404,
                            "endtime": 4532,
                            "spoken": "に",
                            "label": "speaker0"
                        },
                        {
                            "written": "関する",
                            "confidence": 1,
                            "starttime": 4532,
                            "endtime": 5060,
                            "spoken": "かんする",
                            "label": "speaker0"
                        },
                        {
                            "written": "用語",
                            "confidence": 1,
                            "starttime": 5060,
                            "endtime": 5412,
                            "spoken": "ようご",
                            "label": "speaker1"
                        },
                        {
                            "written": "です",
                            "confidence": 0.96,
                            "starttime": 5412,
                            "endtime": 5940,
                            "spoken": "です",
                            "label": "speaker0"
                        },
                        {
                            "written": "。",
                            "confidence": 0.8,
                            "starttime": 5940,
                            "endtime": 6196,
                            "spoken": "_",
                            "label": "speaker0"
                        }
                    ],
                    "confidence": 1,
                    "starttime": 1300,
                    "endtime": 6196,
                    "tags": [],
                    "rulename": "",
                    "text": "www2はバスケットボールの記録に関する用語です。"
                }
            ],
            "text": "www2はバスケットボールの記録に関する用語です。"
        }
    ],
    "utteranceid": "20221206/15/0184e60741170a30522339d0_20221206_150225[nolog]",
    "text": "www2はバスケットボールの記録に関する用語です。",
    "code": "",
    "message": "",
    "sentiment_analysis": {
        "segments": [
            {
                "starttime": 1680,
                "endtime": 2860,
                /* sentiment parameters */
            },
            {
                "starttime": 3520,
                "endtime": 4900,
                /* sentiment parameters */
            }
        ]
    }
}

The text field contains the recognition result, "www2はバスケットボールの記録に関する用語です。", which shows that the speech was recognized correctly. The spoken "とりぷるだぶる" (トリプルダブル) is written as "www2", confirming that the word registration took effect. For details on the results, please see Speech Recognition Results.

Troubleshooting

received illegal service authorization

If you see the following message, it's possible that the AmiVoice API APPKEY has not been set:

2022-10-11 10:10:44,928 Failed to create job - received illegal service authorization (-)

Please check if the APPKEY is set and correct in the following part of the code:

app_key = 'TODO: Please set APPKEY here'

Please also see the Request Parameters section on this page.

No such file or directory: 'www-2.wav'

If you see the following message, the audio file does not exist in the execution directory:

FileNotFoundError: [Errno 2] No such file or directory: 'www-2.wav'

Please download the sample audio file (www-2.wav) and copy it to the directory where you're running the command. After confirming the file exists, please try running the command again.
