Synchronous HTTP Interface

The synchronous HTTP interface allows you to send request parameters and audio data to the server and receive speech recognition results as a response.

How to Use

Sending Speech Recognition Requests

The endpoint depends on whether logging is enabled.

POST https://acp-api.amivoice.com/v1/recognize    (logging)
POST https://acp-api.amivoice.com/v1/nolog/recognize (no logging)

For the differences between these, please see Logging.

Set the required request parameters (authentication information, connection engine name, and audio data) as follows:

u={authentication information}
d={connection engine name}
a={binary audio data}

Send these to the server using multipart POST. The binary audio data must be placed in the final part of the HTTP multipart.

Let's make an actual speech recognition request with the curl command. To run speech recognition on the sample audio file (test.wav) with the 会話_汎用 engine (-a-general), run the following command. This example connects to the "no logging" endpoint, which doesn't save audio logs on the server.

curl https://acp-api.amivoice.com/v1/nolog/recognize \
     -F u={APP_KEY} \
     -F d=-a-general \
     -F a=@test.wav

Structure of HTTP headers and HTTP body for multipart POST request

The structure will be as follows:

POST https://acp-api.amivoice.com/v1/recognize
Content-Type: multipart/form-data;boundary=some-boundary-string

--some-boundary-string
Content-Disposition: form-data; name="u"

(This part contains {APP_KEY})
--some-boundary-string
Content-Disposition: form-data; name="d"

-a-general
--some-boundary-string
Content-Disposition: form-data; name="a"
Content-Type: application/octet-stream

(The audio data is stored in the last part)
--some-boundary-string--
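
The structure above can also be assembled by hand. The following is a minimal sketch using only the Python standard library; the helper name build_multipart and its arguments are assumptions made for illustration and are not part of the API. It writes the text parameters first and always places the a part last.

```python
import uuid


def build_multipart(fields, audio, boundary=None):
    """Return (content_type, body) for a multipart/form-data request.

    fields: dict of text parameters such as {"u": ..., "d": ...}
    audio:  raw bytes for the "a" part, which is always written last
    """
    if "a" in fields:
        raise ValueError('pass the audio via the "audio" argument so it stays last')
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    # The audio part goes last, followed by the closing boundary.
    lines += [
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="a"',
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        f"--{boundary}--".encode(),
        b"",
    ]
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

The returned body can then be sent with any HTTP client, for example urllib.request.Request(url, data=body, headers={"Content-Type": content_type}).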

warning

Parameters set after the a parameter are ignored.

For example, if you put the u parameter last as shown below, it will result in an authentication error.

curl https://acp-api.amivoice.com/v1/nolog/recognize \
     -F d=-a-general \
     -F a=@test.wav \
     -F u={APP_KEY}      # u is specified after a

Response

{
  "results": [
    {
      "tokens": [],
      "tags": [],
      "rulename": "",
      "text": ""
    }
  ],
  "text": "",
  "code": "-",
  "message": "received illegal service authorization"
}

Also, if you put the d parameter last as shown below, you'll get an error saying the specified speech recognition engine cannot be found.

curl https://acp-api.amivoice.com/v1/nolog/recognize \
     -F u={APP_KEY} \
     -F a=@test.wav \
     -F d=-a-general      # d is specified after a

Response

{
  "results": [
    {
      "tokens": [],
      "tags": [],
      "rulename": "",
      "text": ""
    }
  ],
  "text": "",
  "code": "!",
  "message": "failed to connect to recognizer server (can't find available servers)"
}

For information about the response, please see Speech Recognition Results.
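
Client code will usually want to detect error responses like the two shown above. The following is a minimal sketch; the function name check_response is hypothetical, and treating any non-empty code as a failure is an assumption based on the error examples above, so confirm it against Speech Recognition Results.

```python
import json


def check_response(raw):
    """Parse a response body and return the recognized text.

    Assumption: a non-empty "code" field signals an error, as in the
    examples above ("-" and "!").
    """
    data = json.loads(raw)
    code = data.get("code", "")
    if code:
        message = data.get("message", "")
        raise RuntimeError(f"recognition failed (code={code!r}): {message}")
    return data.get("text", "")
```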

Specifying Audio Format

If the audio you're sending has no header (that is, raw audio data, unlike WAV or Ogg files, which carry a header), you need to specify the audio format. Set the audio format in the c request parameter.

c={audio format}

For the audio formats that can be specified, please see the Audio Format Compatibility Table.

For example, to send an audio file test.pcm with a sampling rate of 16kHz, 16-bit quantization, and little-endian byte order, specify LSB16K for the c parameter as follows:

curl https://acp-api.amivoice.com/v1/recognize \
     -F u={APP_KEY} \
     -F d=-a-general \
     -F c=LSB16K \
     -F a=@test.pcm

Structure of HTTP headers and HTTP body for multipart POST request

The structure will be as follows:

POST https://acp-api.amivoice.com/v1/recognize
Content-Type: multipart/form-data;boundary=some-boundary-string

--some-boundary-string
Content-Disposition: form-data; name="u"

(This part contains {APP_KEY})
--some-boundary-string
Content-Disposition: form-data; name="d"

-a-general
--some-boundary-string
Content-Disposition: form-data; name="c"

LSB16K
--some-boundary-string
Content-Disposition: form-data; name="a"
Content-Type: application/octet-stream

(The audio data is stored in the last part)
--some-boundary-string--
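
Whether the c parameter is needed can often be decided by looking at the first bytes of the file. The following is a rough sketch; the helper name needs_format_param is hypothetical, and it only recognizes the two containers named above (WAV and Ogg). Other formats with headers are not checked, so see the Audio Format Compatibility Table for those.

```python
def needs_format_param(head: bytes) -> bool:
    """Return True when the leading bytes show no WAV or Ogg header.

    head: at least the first 12 bytes of the audio file.
    """
    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    is_wav = head[:4] == b"RIFF" and head[8:12] == b"WAVE"
    # Every Ogg page starts with the capture pattern "OggS".
    is_ogg = head[:4] == b"OggS"
    return not (is_wav or is_ogg)
```

For a file such as test.pcm, read the first 12 bytes in binary mode and pass them in. If the result is True, add a c value such as LSB16K that matches how the audio was recorded.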

Multiple Parameters

If you want to set request parameters other than the required parameters, such as the profile ID (profileId), you can set multiple parameters in the d parameter as follows:

d=<key>=<value> <key>=<value> <key>=<value> ...

Separate each <key>=<value> pair with a space or a line break.
The connection engine name is still required. In this form, specify it with the key grammarFileNames, for example grammarFileNames=-a-general.

Example:

curl https://acp-api.amivoice.com/v1/recognize \
     -F u={APP_KEY} \
     -F d="grammarFileNames=-a-general profileId=:user01" \
     -F a=@test.wav

The <value> in each "<key>=<value>" pair above must be URL encoded. For example, to set a word with the notation "www" and pronunciation "とりぷるだぶる" in profileWords, encode the space between the notation and pronunciation as %20, and encode "とりぷるだぶる" as %E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B.

curl https://acp-api.amivoice.com/v1/recognize \
     -F u={APP_KEY} \
     -F d="grammarFileNames=-a-general profileWords=www%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B" \
     -F a=@test.wav

note

Use UTF-8 character encoding
In this URL encoding, spaces are converted to "%20" instead of "+"
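
Encoding the values by hand is error-prone, so it can help to build the d value programmatically. The following is a minimal sketch using urllib.parse.quote from the Python standard library; the helper name build_d is hypothetical. quote encodes as UTF-8 by default and turns spaces into %20 rather than "+".

```python
from urllib.parse import quote


def build_d(params):
    """Join key=value pairs with spaces, percent-encoding each value."""
    # safe="" makes quote() escape "/" as well; letters, digits and
    # "-", "_", ".", "~" are left as they are.
    return " ".join(f"{key}={quote(value, safe='')}" for key, value in params.items())


d_value = build_d({
    "grammarFileNames": "-a-general",
    "profileWords": "www とりぷるだぶる",
})
```

The resulting d_value can be passed as the d part of the multipart request.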

Sending Parameters as URL Query Strings

Parameters other than a, such as c, d, and u, can be sent either as URL query strings or in the HTTP body as multipart/form-data.

note

To avoid hitting HTTP header size limits, we recommend sending all parameters as multipart/form-data.
If the same parameter is specified in both the URL query string and the multipart body, the value in the query string takes precedence.
Although u can be sent as a query string, we recommend always sending it in the HTTP body as multipart/form-data so that it does not leak into logs along the communication path.

When sending the d parameter as a query string, you need to URL encode the value of the d parameter again.

https://acp-api.amivoice.com/v1/recognize?d=grammarFileNames%3D-a-general%20profileWords%3Dhogehoge%2520%25E3%2581%25BB%25E3%2581%2592%25E3%2581%25BB%25E3%2581%2592%25E3%2581%25A6%25E3%2581%2599%25E3%2581%25A8

"%" is converted to "%25", "=" to "%3D", and spaces to "%20".
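
The second round of encoding can be sketched as follows, again with urllib.parse.quote. The function name recognize_url is hypothetical; the input is a d value whose individual values have already been URL encoded once.

```python
from urllib.parse import quote


def recognize_url(base_url, d_value):
    """Append an already value-encoded d string as a query parameter."""
    # Encoding the whole string again turns "%" into "%25",
    # "=" into "%3D", and spaces into "%20".
    return f"{base_url}?d={quote(d_value, safe='')}"


url = recognize_url(
    "https://acp-api.amivoice.com/v1/recognize",
    "grammarFileNames=-a-general profileWords=hogehoge%20"
    "%E3%81%BB%E3%81%92%E3%81%BB%E3%81%92%E3%81%A6%E3%81%99%E3%81%A8",
)
```

With this input, url is the same as the example URL shown above.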
