Asynchronous HTTP Interface
The asynchronous HTTP interface is a non-blocking HTTP API for transcribing long audio into text.
To use this API, follow these steps:
- Create a speech recognition job
- Poll to check the status of the speech recognition job and retrieve the results
Creating the job in step 1 works almost the same way as a request to the synchronous HTTP interface, except for how logging options are specified.
Endpoint
Unlike the synchronous HTTP interface, the base endpoint is the same regardless of whether logging is enabled or not.
https://acp-api-async.amivoice.com/v1/recognitions
1. Creating a Job
Endpoint:
POST https://acp-api-async.amivoice.com/v1/recognitions
Request
The request method is the same as the synchronous HTTP interface. For details, please see Request in the synchronous HTTP interface reference.
About d parameters
The d parameters of the synchronous HTTP interface can also be used. Parameters that are valid only for the asynchronous HTTP interface are listed in the following table.
Parameter Name | Value | Description |
---|---|---|
loggingOptOut | True or False | Specifies whether logging is disabled. When set to True, the system does not save logs during the session. The default is False. |
contentId | Any string | A user-defined string. It is included in the status and result responses for that session. Not set by default. |
compatibleWithSync | True or False | Result format compatibility. When set to True, results are formatted in a way compatible with the synchronous HTTP interface. The default is False. |
speakerDiarization | True or False | Enables speaker diarization. The default is False. |
diarizationMinSpeaker | int | Minimum estimated number of speakers in the audio. Only valid when speaker diarization is enabled. Must be set to 1 or higher. The default is 1. |
diarizationMaxSpeaker | int | Maximum estimated number of speakers in the audio. Only valid when speaker diarization is enabled. Must be set to a value greater than or equal to diarizationMinSpeaker. The default is 10. |
sentimentAnalysis | True or False | Enables sentiment analysis. The default is False. |
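The parameters above can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name build_d_parameter is hypothetical, and it assumes that the d value is a space-separated list of key=value pairs with URL-encoded values, as described in the synchronous HTTP interface reference. It also checks the diarizationMinSpeaker and diarizationMaxSpeaker constraints from the table before sending anything.

```python
from urllib.parse import quote


def build_d_parameter(options):
    """Join options into one string for the d parameter.

    Assumption: pairs are written as key=value, separated by spaces,
    with URL-encoded values (see the synchronous interface reference).
    """
    min_spk = options.get("diarizationMinSpeaker")
    max_spk = options.get("diarizationMaxSpeaker")
    if min_spk is not None and min_spk < 1:
        raise ValueError("diarizationMinSpeaker must be 1 or higher")
    # When the minimum is not given, compare against its default of 1.
    lower_bound = min_spk if min_spk is not None else 1
    if max_spk is not None and max_spk < lower_bound:
        raise ValueError("diarizationMaxSpeaker must be >= diarizationMinSpeaker")

    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            # The table writes boolean values as True / False.
            value = "True" if value else "False"
        parts.append(f"{key}={quote(str(value), safe='')}")
    return " ".join(parts)


d = build_d_parameter({
    "loggingOptOut": True,
    "contentId": "meeting 01",
    "speakerDiarization": True,
    "diarizationMinSpeaker": 2,
    "diarizationMaxSpeaker": 4,
})
print(d)
```

Validating the speaker range on the client side avoids creating a job that would be rejected later; the exact server-side behavior for invalid values is not described here.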
Response
The successful response is in JSON format and contains the following values:
Key | Description |
---|---|
sessionid | Job ID for the user's speech recognition request. |
text | Always returns "...". |
Example
{"sessionid":"017ac8786c5b0a0504399999","text":"..."}
The failure response is in JSON format and contains the following values:
Key | Description |
---|---|
results | Array (1 element) |
results[].tokens | Array (empty) |
results[].tags | Array (empty) |
results[].rulename | String (empty) |
results[].text | String (empty) |
text | String (empty) |
code | Single character code representing the result. Please see Response Codes and Messages. |
message | String describing the error content. Please see Response Codes and Messages. |
Example
{
"results": [{ "tokens": [], "tags": [], "rulename": "", "text": "" }],
"text": "",
"code": "-",
"message": "received illegal service authorization"
}
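A client needs to tell the two response shapes apart. The sketch below is one possible way to do so, assuming that only a successful response contains sessionid; the names parse_create_response and JobCreationError are hypothetical and not part of the API.

```python
import json


class JobCreationError(Exception):
    """Raised when the create-job response reports a failure."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_create_response(body):
    """Return the sessionid from a create-job response body (JSON text)."""
    data = json.loads(body)
    if "sessionid" in data:
        return data["sessionid"]
    # Failure responses carry code and message instead of sessionid.
    raise JobCreationError(data.get("code", ""), data.get("message", ""))


ok = '{"sessionid":"017ac8786c5b0a0504399999","text":"..."}'
print(parse_create_response(ok))
```

The returned sessionid is the value to pass as session_id when checking the job status.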
2. Checking Job Status, Retrieving Results
Endpoint:
GET https://acp-api-async.amivoice.com/v1/recognitions/{session_id}
Request
Request Parameters
Parameter Name | Required | Transmission Method | Description |
---|---|---|---|
session_id | ● | Path parameter | Specify the ID obtained in the response when creating the job. |
Authentication
Please specify the APPKEY in the Authorization header.
Authorization: Bearer {APPKEY}
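As an illustration, the request can be prepared with the Python standard library. This is a minimal sketch: build_status_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://acp-api-async.amivoice.com/v1/recognitions"


def build_status_request(session_id, appkey):
    """Build the GET request for a job's status and results."""
    # session_id is a path parameter, so it is appended to the base URL.
    url = f"{BASE_URL}/{quote(session_id, safe='')}"
    headers = {"Authorization": f"Bearer {appkey}"}
    return Request(url, headers=headers, method="GET")


req = build_status_request("017ac8786c5b0a0504399999", "YOUR_APPKEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; a non-200 response is raised by urlopen as urllib.error.HTTPError.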
Response
If the request succeeds, the response contains the status of the speech recognition request and associated information. If it fails, an HTTP response code other than 200 is returned. On success, status takes one of the following five values:
status | Description |
---|---|
queued | The job is registered in the queue. |
started | The job has been taken out of the queue and is preparing the execution environment. |
processing | The job is being executed. |
completed | Results have been obtained from the speech recognition process. If code is an empty string, meaning the speech recognition was successful, the results are included in segments. |
error | An error occurred when trying to execute the job or during job execution. The error details are stored in error_message. |
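Because queued, started, and processing are intermediate states, a client typically repeats the request until the status becomes completed or error. The following sketch shows one way to structure that loop. The function name wait_for_job, the polling interval, and the timeout are illustrative assumptions; no recommended polling interval is stated here.

```python
import time

TERMINAL_STATES = {"completed", "error"}


def wait_for_job(fetch_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the job reaches completed or error.

    fetch_status: callable returning the decoded JSON response as a dict.
    interval, timeout: illustrative values in seconds, not documented ones.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout} seconds")
        sleep(interval)


# Simulated sequence of responses, standing in for real GET requests.
fake = iter([{"status": "queued"}, {"status": "processing"}, {"status": "completed"}])
print(wait_for_job(lambda: next(fake), interval=0.0)["status"])
```

In real use, fetch_status would issue the GET request and decode the JSON body. Keeping it as a parameter makes the loop easy to test without network access.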
Information Included in the Response
The following table summarizes the information included in each state: queued, started, processing, completed, and error. The columns are labeled with each state's initial letter (q, s, p, c, e). A means the key is always included; O means it is included only if that information is available.
Key | q | s | p | c | e | Description |
---|---|---|---|---|---|---|
status | A | A | A | A | A | Job status. Takes the states: queued, started, processing, completed, error. |
audio_md5 | A | A | O | MD5 checksum value of the received audio file. | ||
audio_size | A | A | O | Size of the received audio file. | ||
content_id | O | O | O | O | O | Value of contentId set by the user at the time of request. |
service_id | A | A | A | A | User name ID. | |
segments | | | | A | | Results of the speech recognition process. Speech recognition results per utterance. |
utteranceid | | | | A | | Results of the speech recognition process. Recognition result information ID. |
text | | | | A | | Results of the speech recognition process. Overall recognition result text combining all "recognition results for speech segments". |
code | | | | A | | Results of the speech recognition process. Single character code representing the result. |
message | | | | A | | Results of the speech recognition process. String describing the error content. |
error_message | | | | | A | String describing the error content. |
Details of the completed (speech recognition result) response
When the status is completed, the response contains the speech recognition results in JSON format. Unlike the synchronous HTTP interface, the recognition results are placed under segments on a per-utterance basis.
Key | Description |
---|---|
segments | Speech recognition results per utterance |
segments[].results | Array of "recognition results for speech segments" |
segments[].results[].confidence | Confidence (a value between 0 and 1; 0 is low confidence, 1 is high confidence) |
segments[].results[].starttime | Speech start time (0 is the beginning of the audio data) |
segments[].results[].endtime | Speech end time (0 is the beginning of the audio data) |
segments[].results[].tags | Unused (empty array) |
segments[].results[].rulename | Unused (empty string) |
segments[].results[].text | Recognition result text |
segments[].results[].tokens | Array of morphemes of the recognition result text |
segments[].results[].tokens[].written | Notation of the morpheme (word) |
segments[].results[].tokens[].confidence | Confidence of the morpheme (likelihood of the recognition result) |
segments[].results[].tokens[].starttime | Start time of the morpheme (0 is the beginning of the audio data) |
segments[].results[].tokens[].endtime | End time of the morpheme (0 is the beginning of the audio data) |
segments[].results[].tokens[].spoken | Reading of the morpheme |
segments[].results[].tokens[].label | Speaker label (speaker0, speaker1, ...). Included in the result only when speakerDiarization=True is specified at the time of request. |
utteranceid | Recognition result information ID |
text | Overall recognition result text combining all "recognition results for speech segments" |
code | Single character code representing the result. Please see Response Codes and Messages. |
message | String describing the error content. Please see Response Codes and Messages. |
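To show how the nested structure can be traversed, here is a minimal sketch that reads per-utterance text and groups tokens by speaker label. The function names and the sample data are hypothetical, and the sample contains only the keys the code reads. Joining the written values without a separator is an assumption that suits Japanese text; other languages may need a space between words.

```python
def utterances(job):
    """Yield (starttime, endtime, text) for each recognition result."""
    for segment in job.get("segments", []):
        for result in segment.get("results", []):
            yield result["starttime"], result["endtime"], result["text"]


def speaker_turns(job):
    """Group consecutive tokens that share a speaker label.

    Tokens carry "label" only when speakerDiarization=True was requested;
    tokens without a label are grouped under None.
    """
    turns = []
    for segment in job.get("segments", []):
        for result in segment.get("results", []):
            for token in result.get("tokens", []):
                label = token.get("label")
                if turns and turns[-1][0] == label:
                    turns[-1][1].append(token["written"])
                else:
                    turns.append((label, [token["written"]]))
    return [(label, "".join(words)) for label, words in turns]


# Hypothetical, heavily abbreviated response used only for illustration.
sample = {
    "status": "completed",
    "segments": [
        {"results": [{"starttime": 0, "endtime": 1200, "text": "AB",
                      "tokens": [{"written": "A", "label": "speaker0"},
                                 {"written": "B", "label": "speaker0"}]}]},
        {"results": [{"starttime": 1500, "endtime": 2000, "text": "C",
                      "tokens": [{"written": "C", "label": "speaker1"}]}]},
    ],
}
print(list(utterances(sample)))
print(speaker_turns(sample))
```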
Details of the error response
This is an example of a response when speech recognition processing fails. For each value, please see Information Included in the Response.
Example
{
"status": "error",
"audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
"audio_size":306980,
"service_id":"{YOUR_SERVICE_ID}",
"session_id":"017c25ec12c00a304474a999",
"error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}
Error Response
If a call to the endpoint for checking job status and retrieving results fails, an HTTP status code other than 200 is returned. The response body is JSON data containing the following information:
Parameter Name | Description |
---|---|
errorCode | Response code. |
errorMessage | Error message. |
Example:
{
"errorCode":401,
"errorMessage":"Invalid authorization header format"
}
Status Codes and Error Messages
The status codes and error messages in case of call failure are as follows:
HTTP Status Code | Error Message | Description |
---|---|---|
401 | No app_key | APPKEY is not set |
401 | No authorization header | Authorization header is not set |
401 | Invalid authorization header format | Authorization header format is invalid |
401 | Failed to authorize for the app_key | Authentication with the specified APPKEY failed |
404 | Specified session_id is not found | The job with the specified session_id was not found. If this error occurs even though the correct session_id is specified, the possible causes are: (1) the data retention period has passed; (2) a user different from the one who made the request is trying to retrieve the job status or results. |
500 | - | Internal error. Please contact us. |
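The table above can be turned into client-side diagnostics. The sketch below is one possible mapping, assuming the error body follows the errorCode / errorMessage format shown earlier; the function name describe_call_failure and the hint texts are hypothetical.

```python
import json


def describe_call_failure(http_status, body):
    """Turn a non-200 response into a short diagnostic string."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # The body may be missing or not JSON, for example on a 500.
        payload = {}
    message = payload.get("errorMessage", "") if isinstance(payload, dict) else ""

    if http_status == 401:
        hint = "check the APPKEY and the Authorization header"
    elif http_status == 404:
        hint = "check the session_id, the retention period, and which user created the job"
    elif http_status == 500:
        hint = "internal error on the service side"
    else:
        hint = "status code not listed in the table"
    return f"HTTP {http_status}: {message or '-'} ({hint})"


body = '{"errorCode":401,"errorMessage":"Invalid authorization header format"}'
print(describe_call_failure(401, body))
```

Note that these HTTP-level failures are separate from a job whose status is error: in that case the call itself succeeds with 200 and the details are in error_message.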