A Event Packet
This packet is sent from the server to the client when the recognition process is completed and the recognition result is accepted.
Format
Type: JSON
A <result>
<result> contains the following JSON:
Field | Description |
---|---|
results | Array of "recognition results for speech segments". Although it is an array, it always contains only 1 element. |
results[].confidence | Confidence level (value between 0 and 1. 0: low confidence, 1: high confidence) |
results[].starttime | Speech start time (0 is the beginning of the audio data) |
results[].endtime | Speech end time (0 is the beginning of the audio data) |
results[].tags | Unused (empty array) |
results[].rulename | Unused (empty string) |
results[].text | Recognition result text |
results[].tokens | Array of morphemes in the recognition result text |
results[].tokens[].written | Notation of the morpheme (word) |
results[].tokens[].confidence | Confidence of the morpheme (likelihood of the recognition result) |
results[].tokens[].starttime | Start time of the morpheme (0 is the beginning of the audio data) |
results[].tokens[].endtime | End time of the morpheme (0 is the beginning of the audio data) |
results[].tokens[].spoken | Reading of the morpheme *3 |
utteranceid | Recognition result information ID *1 |
text | Overall recognition result text combining all "recognition results for speech segments" *2 |
code | Single character code representing the result. Please see Response Codes and Messages. *2 |
message | String representing the error content. Please see Response Codes and Messages. *2 |
*1
For the WebSocket speech recognition protocol, the recognition result information ID is assigned to the recognition result information for each speech segment. For the HTTP speech recognition protocol, it is assigned to the recognition result information for the entire audio data uploaded in one session (which may contain multiple speech segments).
*2
On successful recognition:
body.code == "" and body.message == "" and body.text != ""
On failed recognition:
body.code != "" and body.message != "" and body.text == ""
*3
For Japanese engine recognition results, 'spoken' is in hiragana. For English engine recognition results, 'spoken' is not a reading (please ignore it). For Chinese engine recognition results, 'spoken' is in pinyin.
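The packet format and the success rule in note *2 can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names `parse_a_event` and `is_success` are hypothetical, the sketch assumes the packet arrives as a text string consisting of `A`, one space, and the JSON, and the sample packets are shortened, made-up payloads that carry only the three fields the rule inspects.

```python
import json


def parse_a_event(packet: str) -> dict:
    """Split an 'A <result>' packet and decode the JSON part.

    Assumes the event name and the JSON are separated by a single space.
    """
    name, sep, payload = packet.partition(" ")
    if name != "A" or not sep:
        raise ValueError("not an A event packet")
    return json.loads(payload)


def is_success(body: dict) -> bool:
    """Apply note *2: empty code and message, non-empty text."""
    return body["code"] == "" and body["message"] == "" and body["text"] != ""


# Hypothetical, shortened packets for illustration only.
ok_packet = 'A {"text": "www", "code": "", "message": ""}'
ng_packet = (
    'A {"text": "", "code": "o", "message": '
    '"recognition result is rejected because confidence is below the threshold"}'
)

ok_body = parse_a_event(ok_packet)
ng_body = parse_a_event(ng_packet)
print(is_success(ok_body), is_success(ng_body))
```

Checking all three fields together, rather than `code` alone, follows the condition exactly as note *2 states it.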
List of codes and messages included in JSON
code | message | Description |
---|---|---|
"+" | "received unsupported audio format" | Received audio data in an unsupported format |
"-" | "received illegal service authorization" | Received an invalid APPKEY (service authentication key string) |
"!" | "failed to connect to recognizer server" | Communication failure within the speech recognition server (failed to connect to DSRM or DSRS) |
">" | "failed to send audio data to recognizer server" | Communication failure within the speech recognition server (failed to send audio data to DSRS) |
"<" | "failed to receive recognition result from recognizer server" | Communication failure within the speech recognition server (failed to receive recognition results from DSRS) |
"#" | "received invalid recognition result from recognizer server" | Communication failure within the speech recognition server (invalid format of recognition results received from DSRS) |
"$" | "timeout occurred while receiving audio data from client" | No-communication timeout occurred while receiving audio data from the client |
"%" | "received too large audio data from client" | The number of bytes of audio data received from the client is too large (does not occur with WebSocket interface) |
"o" | "recognition result is rejected because confidence is below the threshold" | Recognition failed because the overall confidence of the recognition result is below the confidence threshold * This error is also returned when no utterance could be detected from the entire received audio data, so no recognition result can be returned. Possible causes for failure to detect utterances include audio data loss or incorrect specification of audio data format. |
"b" | "recognition result is rejected because recognizer server is busy" | Recognition failed because the speech recognition server is busy |
"x" | "recognition result is rejected because grammar files are not loaded" | Recognition failed because the dictionary is not loaded |
"c" | "recognition result is rejected because the recognition process is cancelled" | Recognition failed because a request to stop the recognition process was made |
"t" | "recognition result is rejected because timeout occurred during recognition process" | Recognition failed because the recognition process was interrupted due to a timeout or other reason |
"?" | "recognition result is rejected because fatal error occurred in recognizer server" | Recognition failed because a fatal error occurred in the speech recognition server during recognition |
"s" | "recognition result is rejected because recognition process was not started before timeout occurred" | Recognition failed because the recognition process did not start within a certain time after the audio data received from the client was put into the audio data queue |
"e" | "recognition result is rejected because recognition process was not finished before timeout occurred" | Recognition failed because the recognition process did not complete within a certain time after the audio data received from the client was put into the audio data queue |
"" | "" | Recognition successful |
Response Example
{
  "results": [
    {
      "tokens": [
        {
          "written": "www",
          "confidence": 1.0,
          "starttime": 16020,
          "endtime": 16916,
          "spoken": "\u3068\u308a\u3077\u308b\u3060\u3076\u308b"
        }
      ],
      "confidence": 0.997,
      "starttime": 15700,
      "endtime": 17188,
      "tags": [],
      "rulename": "",
      "text": "www"
    }
  ],
  "utteranceid": "20191127/ja_ja-amivoicecloud-16k-user@016ead249db00a3011a68536-1127_225504",
  "text": "www",
  "code": "",
  "message": ""
}
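As a rough illustration of how the nested fields fit together, the following Python sketch walks the tokens of the response example above. The function name `token_spans` is hypothetical, the embedded JSON is a copy of the example with `utteranceid` left out for brevity, and the computed length is expressed in whatever time unit the packet itself uses.

```python
import json

# Copy of the response example above, without "utteranceid".
# The raw string keeps the \u escapes intact so that json.loads decodes them.
response = json.loads(r'''
{
  "results": [
    {
      "tokens": [
        {"written": "www", "confidence": 1.0, "starttime": 16020,
         "endtime": 16916,
         "spoken": "\u3068\u308a\u3077\u308b\u3060\u3076\u308b"}
      ],
      "confidence": 0.997, "starttime": 15700, "endtime": 17188,
      "tags": [], "rulename": "", "text": "www"
    }
  ],
  "text": "www", "code": "", "message": ""
}
''')


def token_spans(body: dict) -> list:
    """Return (written, spoken, starttime, length) for each token.

    "results" always holds exactly one element, so index 0 is used.
    """
    segment = body["results"][0]
    return [
        (t["written"], t["spoken"], t["starttime"], t["endtime"] - t["starttime"])
        for t in segment["tokens"]
    ]


spans = token_spans(response)
for written, spoken, start, length in spans:
    print(written, spoken, start, length)
```

For a Japanese engine result such as this one, `spoken` decodes to hiragana, as described in note *3.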