Synchronous HTTP Interface
The synchronous HTTP interface allows you to send request parameters and audio data to the server and receive speech recognition results as a response.
How to Use
Sending Speech Recognition Requests
The endpoint differs depending on whether logging is enabled.
POST https://acp-api.amivoice.com/v1/recognize (logging)
POST https://acp-api.amivoice.com/v1/nolog/recognize (no logging)
For the differences between these, please see Logging.
Specify the parameter names for the required request parameters (authentication information, connection engine name, and audio data) as follows:
u={authentication information}
d={connection engine name}
a={audio data binary}
Send these to the server using multipart POST. The binary audio data must be placed in the final part of the HTTP multipart.
Let's make an actual speech recognition request with the curl command. To run speech recognition on the sample audio file (test.wav) using the 会話_汎用 (general-purpose conversation) engine (-a-general), do the following. Here we connect to the "no logging" endpoint, which doesn't save audio logs on the server.
curl https://acp-api.amivoice.com/v1/nolog/recognize \
-F u={APP_KEY} \
-F d=-a-general \
-F a=@test.wav
Structure of HTTP headers and HTTP body for multipart POST request
The structure will be as follows:
POST https://acp-api.amivoice.com/v1/recognize
Content-Type: multipart/form-data;boundary=some-boundary-string
--some-boundary-string
Content-Disposition: form-data; name="u"
(This part contains <APPKEY>)
--some-boundary-string
Content-Disposition: form-data; name="d"
-a-general
--some-boundary-string
Content-Disposition: form-data; name="a"
Content-Type: application/octet-stream
(The audio data is stored in the last part)
--some-boundary-string--
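As a rough illustration, the body layout above can be assembled by hand in Python. This is a minimal sketch, not an official client: the function name build_multipart_body and the default boundary are hypothetical, and the part headers simply mirror the structure shown above. The audio part is always written last.

```python
def build_multipart_body(fields, audio, boundary="some-boundary-string"):
    """Assemble a multipart/form-data body with the audio part ("a") last.

    fields: dict of text parameters such as {"u": ..., "d": ...}
    audio:  raw audio bytes
    (Hypothetical helper for illustration only.)
    """
    if "a" in fields:
        raise ValueError('pass the audio via the "audio" argument, not in fields')
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    # Text parameters first, in the order given.
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("ascii"),
            b"",
            value.encode("utf-8"),
        ]
    # The binary audio data goes in the final part.
    chunks += [
        delimiter,
        b'Content-Disposition: form-data; name="a"',
        b"Content-Type: application/octet-stream",
        b"",
        audio,
    ]
    # Closing delimiter, followed by a trailing CRLF.
    chunks += [delimiter + b"--", b""]
    body = crlf.join(chunks)
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body
```

The returned content type and body could then be passed to any HTTP client, for example as the Content-Type header and data of a urllib.request.Request for the endpoint. In real use the boundary should be a string that cannot occur inside the audio data.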
Parameters set after the a parameter are ignored.
For example, if you put the u parameter last as shown below, it will result in an authentication error.
curl https://acp-api.amivoice.com/v1/nolog/recognize \
-F d=-a-general \
-F a=@test.wav \
-F u={APP_KEY} # u is specified after a
Response
{
"results": [
{
"tokens": [],
"tags": [],
"rulename": "",
"text": ""
}
],
"text": "",
"code": "-",
"message": "received illegal service authorization"
}
Also, if you put the d parameter last as shown below, you'll get an error saying the specified speech recognition engine cannot be found.
curl https://acp-api.amivoice.com/v1/nolog/recognize \
-F u={APP_KEY} \
-F a=@test.wav \
-F d=-a-general # d is specified after a
Response
{
"results": [
{
"tokens": [],
"tags": [],
"rulename": "",
"text": ""
}
],
"text": "",
"code": "!",
"message": "failed to connect to recognizer server (can't find available servers)"
}
For information about the response, please see Speech Recognition Results.
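Both error responses above carry a non-empty code and message. A minimal Python sketch for surfacing such errors follows. It assumes, which this page does not state, that a successful response has an empty code, so check Speech Recognition Results before relying on it. The function name raise_on_error is hypothetical.

```python
import json


def raise_on_error(response_text):
    """Parse a recognition response and raise if it reports an error.

    Assumption: an empty "code" means success (not confirmed on this page).
    """
    result = json.loads(response_text)
    code = result.get("code", "")
    if code:
        message = result.get("message", "")
        raise RuntimeError(f"recognition failed: code={code!r} message={message!r}")
    return result
```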
Specifying Audio Format
If the audio you're sending has no header (unlike WAV or Ogg, which carry one), you need to specify the audio format. Set the audio format in the c request parameter.
c={audio format}
For the audio formats that can be specified, please see the Audio Format Compatibility Table.
For example, to send an audio file test.pcm with a sampling rate of 16 kHz, 16-bit quantization, and little-endian byte order, specify LSB16K for the c parameter as follows:
curl https://acp-api.amivoice.com/v1/recognize \
-F u={APP_KEY} \
-F d=-a-general \
-F c=LSB16K \
-F a=@test.pcm
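Whether the c parameter is needed depends on whether the audio carries its own header. As a rough Python sketch, the check below looks only at the RIFF/WAVE and OggS signatures. Any other header-bearing formats listed in the Audio Format Compatibility Table are not covered, and the function name needs_audio_format is hypothetical.

```python
def needs_audio_format(audio):
    """Return True if the bytes lack a WAV or Ogg header, so "c" should be set.

    Only the RIFF/WAVE and OggS magic numbers are checked
    (illustrative, not exhaustive).
    """
    is_wav = audio[:4] == b"RIFF" and audio[8:12] == b"WAVE"
    is_ogg = audio[:4] == b"OggS"
    return not (is_wav or is_ogg)
```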
Structure of HTTP headers and HTTP body for multipart POST request
The structure will be as follows:
POST https://acp-api.amivoice.com/v1/recognize
Content-Type: multipart/form-data;boundary=some-boundary-string
--some-boundary-string
Content-Disposition: form-data; name="u"
(This part contains <APPKEY>)
--some-boundary-string
Content-Disposition: form-data; name="d"
-a-general
--some-boundary-string
Content-Disposition: form-data; name="c"
LSB16K
--some-boundary-string
Content-Disposition: form-data; name="a"
Content-Type: application/octet-stream
(The audio data is stored in the last part)
--some-boundary-string--
Multiple Parameters
If you want to set request parameters other than the required parameters, such as the profile ID (profileId), you can set multiple parameters in the d parameter as follows:
d=<key>=<value> <key>=<value> <key>=<value> ...
- Separate each <key>=<value> pair with a space or a line break.
- Since the connection engine name is required, specify it in this case with grammarFileNames as the key, as in grammarFileNames=-a-general.
Example:
curl https://acp-api.amivoice.com/v1/recognize \
-F u={APP_KEY} \
-F d="grammarFileNames=-a-general profileId=:user01" \
-F a=@test.wav
The <value> in the "<key>=<value>" above needs to be URL encoded.
For example, to set a word with the spelling "www" and pronunciation "とりぷるだぶる" in profileWords, encode the space between the spelling and pronunciation as %20, and encode "とりぷるだぶる" as %E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B.
curl https://acp-api.amivoice.com/v1/recognize \
-F u={APP_KEY} \
-F d="grammarFileNames=-a-general profileWords=www%20%E3%81%A8%E3%82%8A%E3%81%B7%E3%82%8B%E3%81%A0%E3%81%B6%E3%82%8B" \
-F a=@test.wav
- Use UTF-8 character encoding
- In this URL encoding, spaces are converted to "%20" instead of "+"
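The encoding rules above can be sketched in Python with the standard library's urllib.parse.quote, which percent-encodes as UTF-8 by default and turns a space into %20 rather than +. The helper name build_d_parameter is hypothetical, and encoding every reserved character in each value (safe="") is an assumption rather than something this page specifies.

```python
from urllib.parse import quote


def build_d_parameter(params):
    """Join key=value pairs with spaces, percent-encoding each value as UTF-8."""
    return " ".join(f"{key}={quote(value, safe='')}" for key, value in params.items())


# Spelling "www" and pronunciation "とりぷるだぶる", separated by a space.
d = build_d_parameter({
    "grammarFileNames": "-a-general",
    "profileWords": "www とりぷるだぶる",
})
print(d)
```

The resulting string is what would be passed as the value of the d part in the multipart request.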
Sending Parameters as URL Query Strings
Parameters other than a, such as c, d, and u, can be sent either as URL query strings or in the HTTP body using the multipart method.
- To avoid hitting HTTP header size limits, it's recommended to send all parameters using the multipart method.
- If the same parameter is specified in both the URL query string and the multipart, the value in the query parameter takes precedence.
- Although u can be sent as a query string, it's recommended to always send it in the HTTP body using the multipart method to prevent potential leaks in communication path logs.
When sending the d parameter as a query string, you need to URL encode the value of the d parameter again.
https://acp-api.amivoice.com/v1/recognize?d=grammarFileNames%3D-a-general%20profileWords%3Dhogehoge%2520%25E3%2581%25BB%25E3%2581%2592%25E3%2581%25BB%25E3%2581%2592%25E3%2581%25A6%25E3%2581%2599%25E3%2581%25A8
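To illustrate the second round of encoding, the following Python sketch encodes the profileWords value once, builds the d value, and then encodes the whole d value again for the query string. The variable names are illustrative, and the word "hogehoge ほげほげてすと" is the one obtained by decoding the URL above.

```python
from urllib.parse import quote

BASE_URL = "https://acp-api.amivoice.com/v1/recognize"

# First pass: encode only the value that needs it.
profile_words = quote("hogehoge ほげほげてすと", safe="")
d_value = f"grammarFileNames=-a-general profileWords={profile_words}"

# Second pass: encode the whole d value for use in the query string.
url = f"{BASE_URL}?d={quote(d_value, safe='')}"
print(url)
```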
"%" is converted to "%25", "=" to "%3D", and spaces to "%20".
Other Documentation
- For API reference, please see Synchronous HTTP Interface.
- We provide a client library (Hrp) that encapsulates the communication processing and procedures for using the HTTP interface, so you can create speech recognition applications by implementing only the necessary interfaces. First, try running the sample program HrpTester. For the interface specifications of the Hrp client library, please see Hrp (HTTP Interface Client) in the client library documentation.