Asynchronous HTTP Interface

The asynchronous HTTP interface is a non-blocking HTTP API for transcribing long audio into text.

To use this API, follow these steps:

1. Create a speech recognition job.
2. Poll the job to check its status and retrieve the results.

Creating the job in step 1 is almost the same as sending a request to the synchronous HTTP interface, except for how logging options are specified.

Endpoint

Unlike the synchronous HTTP interface, the base endpoint is the same whether or not logging is enabled (logging is controlled by the `loggingOptOut` d parameter instead).

https://acp-api-async.amivoice.com/v1/recognitions

1. Creating a Job

Endpoint:

POST https://acp-api-async.amivoice.com/v1/recognitions

Request

Requests are sent in the same way as for the synchronous HTTP interface. For details, see Request in the synchronous HTTP interface reference.
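The request format itself is defined in the synchronous reference, so the following is only a sketch under stated assumptions. It builds (without sending) a job-creation POST using only the Python standard library. It assumes the synchronous interface's multipart/form-data field names `u` (APPKEY), `d` (parameters) and `a` (audio data); verify these against the synchronous HTTP interface reference. The function name `build_create_job_request` is hypothetical.

```python
import urllib.request
import uuid

# Job-creation endpoint from this reference (same with or without logging).
JOB_ENDPOINT = "https://acp-api-async.amivoice.com/v1/recognitions"


def build_create_job_request(app_key: str, d_param: str, audio: bytes) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST that creates a job.

    ASSUMPTION: the form field names u (APPKEY), d (parameters) and a
    (audio data) are taken from the synchronous HTTP interface.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Text fields: the APPKEY and the d parameter string.
    for name, value in (("u", app_key), ("d", d_param)):
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # Binary field: the audio data itself.
    header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="a"; filename="audio"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(header.encode("utf-8") + audio + b"\r\n")
    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        JOB_ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` and decoding the JSON body; building and sending are kept apart so the request can be inspected first.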

About d parameters

All d parameters of the synchronous HTTP interface can also be set here. The following table lists the parameters that are valid only for the asynchronous HTTP interface.

Parameter Name	Value	Description
loggingOptOut	True|False	Specifies whether to disable logging. When set to `True`, the system does not save logs during the session. The default is `False`.
contentId	Any string	Any user-defined string. It is included (as `content_id`) in the status and result responses for that session. Not set by default.
compatibleWithSync	True|False	Result format compatibility. When set to `True`, results are formatted to be compatible with the synchronous HTTP interface. The default is `False`.
speakerDiarization	True|False	Enables speaker diarization. The default is `False`.
diarizationMinSpeaker	int	Minimum estimated number of speakers in the audio. Valid only when speaker diarization is enabled. Must be 1 or higher. The default is 1.
diarizationMaxSpeaker	int	Maximum estimated number of speakers in the audio. Valid only when speaker diarization is enabled. Must be greater than or equal to diarizationMinSpeaker. The default is 10.
sentimentAnalysis	True|False	Enables sentiment analysis. The default is `False`.
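As an illustration only, the hypothetical helper below assembles a d parameter string and enforces the speaker-count constraints from the table (minimum of at least 1, maximum not below the minimum). It assumes the synchronous interface's convention of space-separated `key=value` pairs with a `grammarFileNames` entry; that layout is not defined in this table, so check the synchronous reference. Values containing spaces are not escaped here.

```python
def build_d_param(grammar: str, **options) -> str:
    """Join d parameters into one space-separated key=value string.

    ASSUMPTION: the "key=value key=value" layout and the grammarFileNames
    key come from the synchronous HTTP interface, not from this table.
    """
    if options.get("speakerDiarization") is True:
        # Defaults from the table: min 1, max 10.
        lo = options.get("diarizationMinSpeaker", 1)
        hi = options.get("diarizationMaxSpeaker", 10)
        if lo < 1:
            raise ValueError("diarizationMinSpeaker must be 1 or higher")
        if hi < lo:
            raise ValueError("diarizationMaxSpeaker must be >= diarizationMinSpeaker")
    pairs = [f"grammarFileNames={grammar}"]
    for key, value in options.items():
        # Python booleans render as True/False, matching the table's values.
        # Values containing spaces are not escaped by this sketch.
        pairs.append(f"{key}={value}")
    return " ".join(pairs)
```

For example, `build_d_param("-a-general", loggingOptOut=True)` yields `grammarFileNames=-a-general loggingOptOut=True`. Validating on the client simply surfaces a bad speaker range before a job is queued.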

Response

A successful response is a JSON object containing the following keys:

Key	Description
sessionid	Job ID for the user's speech recognition request.
text	Always the literal string `...`.

Example

{"sessionid":"017ac8786c5b0a0504399999","text":"..."}

A failed response is a JSON object containing the following keys:

Key		Description
results		Array (1 element)
	tokens	Array (empty)
	tags	Array (empty)
	rulename	String (empty)
	text	String (empty)
text		String (empty)
code		Single character code representing the result. Please see Response Codes and Messages.
message		String describing the error content. Please see Response Codes and Messages.

Example

{
  "results": [{ "tokens": [], "tags": [], "rulename": "", "text": "" }],
  "text": "",
  "code": "-",
  "message": "received illegal service authorization"
}
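The two response shapes can be told apart by the presence of `sessionid`. The following minimal sketch returns the job ID on success and raises on failure, reporting `code` and `message`; the names `parse_create_job_response` and `JobCreationError` are hypothetical.

```python
import json


class JobCreationError(Exception):
    """Raised when the job-creation response carries no session ID."""


def parse_create_job_response(body: str) -> str:
    """Return the session ID from a job-creation response, or raise.

    A successful response has "sessionid"; a failed one has an empty
    "text" plus "code" and "message" describing the error.
    """
    payload = json.loads(body)
    session_id = payload.get("sessionid")
    if session_id:
        return session_id
    raise JobCreationError(f"code={payload.get('code')!r}: {payload.get('message', '')}")
```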

2. Checking Job Status, Retrieving Results

Endpoint:

GET https://acp-api-async.amivoice.com/v1/recognitions/{session_id}

Request

Request Parameters

Parameter Name	Required	Transmission Method	Description
session_id	●	Path parameter	Specify the ID obtained in the response when creating the job.

Authentication

Please specify the APPKEY in the Authorization header.

Authorization: Bearer {APPKEY}
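The following sketch builds (without sending) the status request: the session ID goes into the path and the APPKEY into a Bearer `Authorization` header. The function name `build_status_request` is hypothetical; only the URL and header come from this reference.

```python
import urllib.parse
import urllib.request

STATUS_ENDPOINT = "https://acp-api-async.amivoice.com/v1/recognitions/{session_id}"


def build_status_request(app_key: str, session_id: str) -> urllib.request.Request:
    """Build, but do not send, the GET request that checks a job."""
    # Percent-encode the path parameter defensively (IDs are normally plain hex).
    url = STATUS_ENDPOINT.format(session_id=urllib.parse.quote(session_id, safe=""))
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {app_key}"},
    )
```

Usage: `with urllib.request.urlopen(build_status_request(key, sid)) as resp: status = json.load(resp)`. Note that `urlopen` raises `urllib.error.HTTPError` for non-200 responses, whose body then holds the error JSON.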

Response

If the request succeeds, the response contains the status of the speech recognition job and related information. If it fails, an HTTP status code other than 200 is returned (see Error Response). On success, `status` takes one of the following five values:

status	Description
queued	The job is registered in the queue.
started	The job has been taken out of the queue and is preparing the execution environment.
processing	The job is being executed.
completed	The speech recognition process has finished. If `code` is an empty string, recognition succeeded and the results are included in `segments`.
error	An error occurred when trying to execute the job or during job execution. The error details are stored in `error_message`.
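Only `completed` and `error` are final, so a client keeps polling while the job is `queued`, `started` or `processing`. The loop below is a sketch: `fetch_status` is a hypothetical caller-supplied function that performs the GET and returns the decoded JSON, and the 10-second interval and one-hour timeout are arbitrary assumptions, not documented limits.

```python
import time
from typing import Callable

# Statuses after which the job no longer changes.
TERMINAL_STATES = {"completed", "error"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until status is completed or error, then return that response."""
    deadline = time.monotonic() + timeout_s
    while True:
        response = fetch_status()
        status = response.get("status")
        if status in TERMINAL_STATES:
            return response
        # queued, started or processing: the job is still in progress.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable without real waiting; for long audio, a longer interval avoids needless requests.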

Information Included in the Response

The following table summarizes the keys included in the response for each state. The columns q, s, p, c, and e stand for the initial letters of queued, started, processing, completed, and error. A means the key is always included; O means it is included if the information is available.

Key	q	s	p	c	e	Description
status	A	A	A	A	A	Job status. Takes the states: queued, started, processing, completed, error.
audio_md5			A	A	O	MD5 checksum value of the received audio file.
audio_size			A	A	O	Size of the received audio file.
content_id	O	O	O	O	O	Value of contentId set by the user at the time of request.
service_id	A	A	A	A		User ID.
segments				A		Results of the speech recognition process. Speech recognition results per utterance.
utteranceid				A		Results of the speech recognition process. Recognition result information ID.
text				A		Results of the speech recognition process. Overall recognition result text combining all "recognition results for speech segments".
code				A		Results of the speech recognition process. Single character code representing the result.
message				A		Results of the speech recognition process. String describing the error content.
error_message					A	String describing the error content
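Putting the table together, a finished job succeeded only if `status` is `completed` and `code` is an empty string. The hypothetical helper below (names `extract_result` and `RecognitionFailed` are illustrative) returns the result keys in that case and raises otherwise, using `error_message` for the `error` state and `code`/`message` for a completed job whose recognition failed.

```python
class RecognitionFailed(Exception):
    """Raised when a finished job did not produce usable results."""


def extract_result(response: dict) -> dict:
    """Return the recognition result keys from a finished job, or raise."""
    status = response.get("status")
    if status == "error":
        raise RecognitionFailed(response.get("error_message", "unknown error"))
    if status != "completed":
        raise ValueError(f"job has not finished yet (status={status!r})")
    if response.get("code") != "":
        # A non-empty code means recognition itself did not succeed.
        raise RecognitionFailed(f"code={response.get('code')!r}: {response.get('message', '')}")
    return {
        "utteranceid": response.get("utteranceid"),
        "text": response.get("text", ""),
        "segments": response.get("segments", []),
    }
```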

Details of the completed (speech recognition result) response

When the status is completed, it returns the speech recognition results in JSON format. Unlike the synchronous HTTP interface, the recognition results are placed under segments on a per-utterance basis.

				Description
segments
	results			Array of "recognition results for speech segments"
		confidence		Confidence (value between 0 and 1. 0: low confidence, 1: high confidence)
		starttime		Speech start time (0 is the beginning of the audio data)
		endtime		Speech end time (0 is the beginning of the audio data)
		tags		Unused (empty array)
		rulename		Unused (empty string)
		text		Recognition result text
		tokens		Array of morphemes of the recognition result text
			written	Notation of the morpheme (word)
			confidence	Confidence of the morpheme (likelihood of the recognition result)
			starttime	Start time of the morpheme (0 is the beginning of the audio data)
			endtime	End time of the morpheme (0 is the beginning of the audio data)
			spoken	Pronunciation of the morpheme
			label	Speaker label (speaker0, speaker1, ...). Included only when speakerDiarization=True is specified in the request.
utteranceid				Recognition result information ID
text				Overall recognition result text combining all "recognition results for speech segments"
code				Single character code representing the result. Please see Response Codes and Messages.
message				String describing the error content. Please see Response Codes and Messages.
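The nesting above is `segments` → `results` → `tokens`. As a small sketch, the hypothetical helper below walks that structure and groups each token's `written` form by its speaker `label`; tokens without a label (diarization not requested) are grouped under `None`.

```python
from collections import defaultdict


def words_by_speaker(response: dict) -> dict:
    """Group token notations (written) by speaker label.

    Tokens carry "label" only when speakerDiarization=True was requested;
    unlabeled tokens are collected under the key None.
    """
    grouped = defaultdict(list)
    for segment in response.get("segments", []):
        for result in segment.get("results", []):
            for token in result.get("tokens", []):
                grouped[token.get("label")].append(token["written"])
    return dict(grouped)
```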

Details of the error response

This is an example of a response when speech recognition processing fails. For each value, please see Information Included in the Response.

Example

{
    "status": "error",
    "audio_md5":"40f59fe5fc7745c33b33af44be43f6ad",
    "audio_size":306980,
    "service_id":"{YOUR_SERVICE_ID}",
    "session_id":"017c25ec12c00a304474a999",
    "error_message": "ERROR: Failed to transcribe in recognition process - amineth_result=0, amineth_code='o', amineth_message='recognition result is rejected because confidence is below the threshold'"
}

Error Response

If the call to the endpoint for checking job status and retrieving results fails, it returns an HTTP status code other than 200. The response body returns JSON data containing the following information:

Parameter Name	Description
errorCode	Response code.
errorMessage	Error message.

Example:

{
    "errorCode":401,
    "errorMessage":"Invalid authorization header format"
}

Status Codes and Error Messages

The status codes and error messages in case of call failure are as follows:

HTTP Status Code	Error Message	Description
401	No app_key	APPKEY is not set
401	No authorization header	Authorization header is not set
401	Invalid authorization header format	Authorization header format is invalid
401	Failed to authorize for the app_key	Authentication with the specified APPKEY failed
404	Specified session_id is not found	The job with the specified `session_id` was not found. If this error occurs even though the `session_id` is correct, possible causes include: the data retention period has passed, or a user other than the one who made the request is trying to retrieve the job status or results.
500	-	Internal error. Please contact us.
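The table can be turned into actionable hints on the client side. The hypothetical `describe_status_error` below takes the HTTP status and raw body of a failed status check, reads `errorMessage` when the body is the JSON object described above (falling back to the raw body otherwise), and adds a hint derived from the table; the hint wording is this sketch's own.

```python
import json


def describe_status_error(http_status: int, body: str) -> str:
    """Turn a non-200 status-check response into a readable hint."""
    try:
        message = json.loads(body).get("errorMessage", "")
    except (ValueError, AttributeError):
        # Not the JSON object described above: keep the raw body.
        message = body
    if http_status == 401:
        hint = "check the APPKEY and the 'Authorization: Bearer {APPKEY}' header"
    elif http_status == 404:
        hint = ("session_id not found: it may be wrong, past the retention "
                "period, or requested by a different user")
    elif http_status == 500:
        hint = "internal error: contact support"
    else:
        hint = "unexpected status code"
    return f"HTTP {http_status} ({message}): {hint}"
```

Only 401 responses are worth retrying after fixing credentials; a 404 for a known-good ID usually means the data is gone and the job must be resubmitted.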
