
WebSocket Interface

After establishing a WebSocket connection, you can make speech recognition requests via text messages and receive sequential responses. You can send audio data in small chunks, such as a real-time recorded audio stream, and obtain recognition results sequentially.

The process follows these steps:

  1. WebSocket connection
  2. Speech recognition request
  3. Sending audio data
  4. Receiving status events
  5. Obtaining speech recognition results
  6. Notifying the end of audio data transmission
  7. WebSocket disconnection
note

We provide client libraries that simplify creating real-time speech recognition applications by hiding the details of the WebSocket interface. For usage instructions, please see How to use the real-time speech recognition library Wrp.

Applications using audio streaming implement their logic by receiving commands and command responses (s, p, e) and events corresponding to server processing (S, E, C, U, A) after establishing a WebSocket connection. The general flow is as follows:

Figure. Commands and Events

The following sections explain the implementation method step by step. For details on commands and command responses (s, p, e), server processing events (S, E, C, U, A), and responses, please see Streaming Responses.

How to Use

1. WebSocket Connection

Connect to the speech recognition server using WebSocket. Two endpoints are available, depending on whether you allow logging:

wss://acp-api.amivoice.com/v1/     (logging)
wss://acp-api.amivoice.com/v1/nolog/ (no logging)

For information on logging, please see Logging.

Communication with the server is done via text messages. While we explain using Python code here, you can perform real-time speech recognition by sending and receiving text messages after establishing a WebSocket connection in other languages as well.

We'll use Python's websocket-client library to handle WebSocket connections easily. We'll connect to the AmiVoice API WebSocket interface endpoint with logging. When the WebSocket connection is established with the server, on_open is called, and when a message is received from the server, on_message is called. We'll add processing to these methods to explain communication with the speech recognition server.

import websocket
import logging


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(threadName)s %(message)s")


def on_open(ws):
    logger.info("open")

def on_message(ws, message):
    logger.info(f"message: {message}")

def on_close(ws, close_status_code, close_msg):
    logger.info("close")


ws = websocket.WebSocketApp('wss://acp-api.amivoice.com/v1/',
                            on_open=on_open,
                            on_message=on_message,
                            on_close=on_close)
ws.run_forever()
caution

In rare cases when the underlying system is extremely congested, the WebSocket connection may fail. In such cases, please try to reconnect several times until successful.
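Such a retry loop can be kept independent of any particular client. Below is a minimal sketch, assuming a `connect` callable that raises `OSError` on failure; the attempt count and delay are illustrative values, not requirements of the API:

```python
import time


def connect_with_retry(connect, max_attempts=3, delay_seconds=2.0):
    """Call connect() until it succeeds, retrying on connection errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(delay_seconds)


# Example (hypothetical usage with websocket-client's blocking API):
# ws = connect_with_retry(lambda: websocket.create_connection('wss://acp-api.amivoice.com/v1/'))
```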

2. Speech Recognition Request

After successfully connecting via WebSocket, send the s command. The s command has the following format:

s <audio_format> <grammar_file_names> <key>=<value> ...

For audio_format, specify the format of the audio to be sent in this session. For grammar_file_names, specify a connection engine name from the request parameters. Then set the authentication information (authorization) in the format authorization={APPKEY}. Other request parameters can be set in the <key>=<value> format.

Let's transcribe the audio file (test.wav) included in the sample using the general-purpose engine (-a-general). Since this audio file is a wav container with a 16kHz sampling rate, specify 16K for audio_format. For details, please see In case of audio files with headers. For grammar_file_names, set -a-general, the most broadly applicable engine. Add the following code to the on_open handler for the WebSocket connection:

APPKEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

def on_open(ws):
    logger.info("open")
    command = f"s 16K -a-general authorization={APPKEY}"
    logger.info(f"send> {command}")
    ws.send(command)

To set other request parameters, add them to the s command in the format <key1>=<value1> <key2>=<value2>.... Here, we'll add two parameters to the s command line to deliberately display unnecessary words like "あのー" and "えーっと" (keepFillerToken) and change the timing of sending update events (resultUpdatedInterval) to 1 second.

def on_open(ws):
    logger.info("open")
    command = f"s 16K -a-general authorization={APPKEY} keepFillerToken=1 resultUpdatedInterval=1000"
    logger.info(f"send> {command}")
    ws.send(command)

If the value to be set for a request parameter contains spaces, enclose the value in double quotes like "value". For example, when setting multiple parameters for segmenterProperties, do it as follows:

segmenterProperties="useDiarizer=1 diarizerAlpha=1e-20 diarizerTransitionBias=1e-10"

The on_open function becomes:

def on_open(ws):
    logger.info("open")
    command = f"s 16K -a-general authorization={APPKEY} segmenterProperties=\"useDiarizer=1 diarizerAlpha=1e-20 diarizerTransitionBias=1e-10\""
    logger.info(f"send> {command}")
    ws.send(command)

For details, please see the s command packet in the reference.

Response

When the speech recognition request is accepted, the server returns a text message, the s command response packet.

On success:

s

On failure:

An error message follows s, separated by a single half-width space. For error types, please see the s command response packet in the reference.

s <error message>

Example:

s received unsupported audio format
caution

In rare cases when the underlying system is extremely congested, it may return an error like the following. In such cases, please try to retry the s command several times until it succeeds. For details on error messages, please see the s command packet Server Error in the reference. Also, see the Client Program State Transition described later.

s can't connect to recognizer server (can't connect to server)

To process the s command response, add the following code to on_message:

def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        if content != "":
            logger.error(content)
            return
        # s command succeeded

3. Sending Audio Data

Once the s command succeeds, you can send the audio file. Use the p command to send the binary audio data. The p command has the following format:

p<audio_data>

audio_data is the binary audio data. Please set the audio data in the audio format specified by the s command at the start of the session.

info

Please send audio data that matches the audio format specified in the s command. Even if the format is different, it won't result in an error, but the response may take a very long time or the recognition results may not be obtained correctly.

note
  • The maximum size of audio data that can be sent in one p command is 16MB. If the data size is larger than that, please split it.
  • You can split the audio data at any point. There's no need to be aware of wav chunks or mp3/ogg/opus frame boundaries.
  • You cannot change the format of the data you send midway. If you want to change the audio format, end with the e command and make a new speech recognition request from s. The same applies to audio files with headers; end with the e command for each file and make a new speech recognition request from s.

In the handler for the s command response, send the audio file (test.wav) included in the sample. Add the following to on_message. Here, we're using sleep to simulate sending at the same timing as real-time, as if recording from a microphone.

import time
import threading


def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        if content != "":
            logger.error(content)
            return

        # s command succeeded
        # If the request was successful, send the audio file data to the server
        def send_audio(*args):
            with open(filename, mode='rb') as file:
                buf = file.read(audio_block_size)
                while buf:
                    logger.info("send> p [..({} bytes)..]".format(len(buf)))
                    ws.send(b'p' + buf,
                            opcode=websocket.ABNF.OPCODE_BINARY)
                    buf = file.read(audio_block_size)
                    # test.wav is 16-bit, 16kHz, i.e. 32,000 bytes/sec. Sending 16,000 bytes per chunk and waiting 0.5 seconds matches real time
                    time.sleep(0.5)
            logger.info("send> e")
            # After finishing sending the audio, send the e command
            ws.send('e')
        threading.Thread(target=send_audio).start()

4. Receiving Status Events

When you send audio, you'll receive a G event, speech-start (S) and speech-end (E) events, and a C event indicating that speech recognition processing has started. The G event notifies information generated on the server side; please ignore it. The S and E events include the relative time, in milliseconds, from the start of the audio.

For example, when sending the audio file (test.wav), the events received from the server will be as follows:

G
S 250
C
E 8800

Add the following code to on_message to process status events. The code added here doesn't do anything, but add appropriate processing if you want to use the speech start and end times.

def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        pass  # ...omitted...
    elif event == 'G':
        pass
    elif event == 'S':
        starttime = int(content)
    elif event == 'E':
        endtime = int(content)
    elif event == 'C':
        pass
note

If speech cannot be detected from the audio data, these events will not be obtained, and speech recognition result events will not be obtained either. The following reasons can be considered, so please check:

  • There is no audio at all, or the volume is extremely low. Check if the recording system is not muted and if the volume settings are appropriate.
  • The audio format and audio data do not match. Check the audio format.

5. Obtaining Speech Recognition Results

The speech recognition server notifies intermediate results as U events. Here, since we've set resultUpdatedInterval=1000 in the s command, U events will be obtained approximately every second. When processing is complete and the result is confirmed, an A event is obtained. For details on the results, please see Speech Recognition Results. For a series of results for the test.wav audio, please see the Sample Code Results.

import json  # needed to parse the U and A result payloads


def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        pass  # ...omitted...
    elif event == 'G':
        pass
    elif event == 'S':
        starttime = int(content)
    elif event == 'E':
        endtime = int(content)
    elif event == 'C':
        pass
    elif event == 'U':
        raw = json.loads(content) if content else ''
    elif event == 'A' or event == 'R':
        raw = json.loads(content) if content else ''

6. Notifying the End of Audio Data Transmission

After sending all the audio, you can end the speech recognition session by sending the e command.

e

In the following code, the e command is sent after all the audio file data has been sent.

def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        if content != "":
            logger.error(content)
            return

        # s command succeeded
        # If the request was successful, send the audio file data to the server
        def send_audio(*args):
            with open(filename, mode='rb') as file:
                buf = file.read(audio_block_size)
                while buf:
                    logger.info("send> p [..({} bytes)..]".format(len(buf)))
                    ws.send(b'p' + buf,
                            opcode=websocket.ABNF.OPCODE_BINARY)
                    buf = file.read(audio_block_size)
                    # test.wav is 16-bit, 16kHz, i.e. 32,000 bytes/sec. Sending 16,000 bytes per chunk and waiting 0.5 seconds matches real time
                    time.sleep(0.5)
            # After finishing sending the audio, send the e command
            logger.info("send> e")
            ws.send('e')
        threading.Thread(target=send_audio).start()

When you send the e command, the speech recognition server processes all received audio, returns all results, and then returns the e command response. It may take time to complete depending on the length of the sent audio, so please wait for the e command response to get all results.

caution

To handle unexpected situations such as communication failures or server-side delays, set appropriate communication timeouts and ensure that your application functions properly even if there is no response from the speech recognition server.
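One way to implement such a timeout is a watchdog that fires when no server message arrives within a deadline. The following is a minimal sketch using only the standard library; the class name and timeout value are assumptions for illustration, not part of the API:

```python
import threading


class ResponseWatchdog:
    """Invoke a callback if reset() is not called again within the timeout."""

    def __init__(self, timeout_seconds, on_timeout):
        self._timeout = timeout_seconds
        self._on_timeout = on_timeout
        self._timer = None

    def reset(self):
        # Call this from on_message each time the server responds;
        # it restarts the countdown.
        self.cancel()
        self._timer = threading.Timer(self._timeout, self._on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def cancel(self):
        # Stop the countdown, e.g. when the session ends normally.
        if self._timer is not None:
            self._timer.cancel()
```

In an application, `reset()` would be called from `on_message`, and the `on_timeout` callback could call `ws.close()` so the program does not wait forever for a response that never comes.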

7. Closing the WebSocket

Here, we close the WebSocket when the response to the e command is received.

import json  # needed to parse the U and A result payloads


def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        pass  # ...omitted...
    elif event == 'G':
        pass
    elif event == 'S':
        starttime = int(content)
    elif event == 'E':
        endtime = int(content)
    elif event == 'C':
        pass
    elif event == 'U':
        raw = json.loads(content) if content else ''
    elif event == 'A' or event == 'R':
        raw = json.loads(content) if content else ''
    elif event == 'e':
        logger.info("close>")
        ws.close()

Sample Code

Here's the complete Python code we've discussed so far.

websocket-sample.py
import time
import websocket
import json
import threading
import logging


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(threadName)s %(message)s")

server = 'wss://acp-api.amivoice.com/v1/'
filename = 'test.wav'
codec = "16K"
audio_block_size = 16000
grammar_file_names = "-a-general"
options = {
    "profileId": "",
    "profileWords": "",
    "keepFillerToken": "",
    "resultUpdatedInterval": "1000",
    "authorization": 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
}


def on_open(ws):
    logger.info("open")
    def start(*args):
        command = "s {} {}".format(codec, grammar_file_names)
        for k, v in options.items():
            if v != "":
                if k == 'profileWords':
                    v = '"' + v.replace('"', '""') + '"'
                command += f" {k}={v}"
        logger.info(f"send> {command}")
        ws.send(command)
    threading.Thread(target=start).start()


def on_message(ws, message):
    event = message[0]
    content = message[2:].rstrip()
    logger.info(f"message: {event} {content}")

    if event == 's':
        if content == "can't connect to recognizer server":
            logger.error(content)
            return

        def send_audio(*args):
            with open(filename, mode='rb') as file:
                buf = file.read(audio_block_size)
                while buf:
                    logger.debug("send> p [..({} bytes)..]".format(len(buf)))
                    ws.send(b'p' + buf,
                            opcode=websocket.ABNF.OPCODE_BINARY)
                    buf = file.read(audio_block_size)
                    time.sleep(0.5)
            logger.info("send> e")
            ws.send('e')
        threading.Thread(target=send_audio).start()

    elif event == 'G':
        pass
    elif event == 'S':
        starttime = int(content)
    elif event == 'E':
        endtime = int(content)
    elif event == 'C':
        pass
    elif event == 'U':
        raw = json.loads(content) if content else ''
    elif event == 'A' or event == 'R':
        raw = json.loads(content) if content else ''
    elif event == 'e':
        logger.info("close>")
        ws.close()


def on_close(ws, close_status_code, close_msg):
    logger.info("close")


logger.info("open> {}".format(server))
ws = websocket.WebSocketApp(server,
                            on_open=on_open,
                            on_message=on_message,
                            on_close=on_close)
ws.run_forever()

Execution

You can run it as follows:

$ python websocket-sample.py

Results

The operation log when sending an audio file (test.wav) looks like this. It displays the time in milliseconds from the start of the program, the thread name, and the message.

You can see that the Thread-1 thread is sending audio (send> p [..(16000 bytes)..]), and interim results (message: U) are received approximately every second. Finally, the confirmed result (message: A) is obtained.

         4  MainThread   open> wss://acp-api.amivoice.com/v1/
94 MainThread open
94 MainThread send> s LSB16K -a-general resultUpdatedInterval=1000 authorization={APPKEY}
133 MainThread message: s
134 Thread-1 send> p [..(16000 bytes)..]
174 MainThread message: G
637 Thread-1 send> p [..(16000 bytes)..]
668 MainThread message: S 250
668 MainThread message: C
1139 Thread-1 send> p [..(16000 bytes)..]
1639 Thread-1 send> p [..(16000 bytes)..]
2144 Thread-1 send> p [..(16000 bytes)..]
2647 Thread-1 send> p [..(16000 bytes)..]
3148 Thread-1 send> p [..(16000 bytes)..]
3174 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2..."}
3649 Thread-1 send> p [..(16000 bytes)..]
4153 Thread-1 send> p [..(16000 bytes)..]
4179 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u3072\u3068\u3068\u304d"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u3072\u3068\u3068\u304d..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u3072\u3068\u3068\u304d..."}
4656 Thread-1 send> p [..(16000 bytes)..]
5157 Thread-1 send> p [..(16000 bytes)..]
5184 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a..."}
5658 Thread-1 send> p [..(16000 bytes)..]
6159 Thread-1 send> p [..(16000 bytes)..]
6187 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3"},{"written":"\u3092"},{"written":"\u6301"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u6301..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u6301..."}
6660 Thread-1 send> p [..(16000 bytes)..]
7161 Thread-1 send> p [..(16000 bytes)..]
7185 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3"},{"written":"\u3092"},{"written":"\u5b9f\u73fe"},{"written":"\u3057"},{"written":"\u3001"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001..."}
7662 Thread-1 send> p [..(16000 bytes)..]
8164 Thread-1 send> p [..(16000 bytes)..]
8199 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3"},{"written":"\u3092"},{"written":"\u5b9f\u73fe"},{"written":"\u3057"},{"written":"\u3001"},{"written":"\u8c4a\u304b"},{"written":"\u306a"},{"written":"\u672a\u6765"},{"written":"\u3092"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092..."}
8668 Thread-1 send> p [..(16000 bytes)..]
9171 Thread-1 send> p [..(16000 bytes)..]
9188 MainThread message: E 8800
9190 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3"},{"written":"\u3092"},{"written":"\u5b9f\u73fe"},{"written":"\u3057"},{"written":"\u3001"},{"written":"\u8c4a\u304b"},{"written":"\u306a"},{"written":"\u672a\u6765"},{"written":"\u3092"},{"written":"\u5275\u9020"},{"written":"\u3057\u3066"},{"written":"\u3044\u304f"},{"written":"\u3053\u3068"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068..."}
9390 MainThread message: U {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2"},{"written":"\u306f"},{"written":"\u3001"},{"written":"\u4eba"},{"written":"\u3068"},{"written":"\u6a5f\u68b0"},{"written":"\u3068"},{"written":"\u306e"},{"written":"\u81ea\u7136"},{"written":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3"},{"written":"\u3092"},{"written":"\u5b9f\u73fe"},{"written":"\u3057"},{"written":"\u3001"},{"written":"\u8c4a\u304b"},{"written":"\u306a"},{"written":"\u672a\u6765"},{"written":"\u3092"},{"written":"\u5275\u9020"},{"written":"\u3057\u3066"},{"written":"\u3044\u304f"},{"written":"\u3053\u3068"},{"written":"\u3092"},{"written":"\u76ee\u6307\u3057"},{"written":"\u307e\u3059"},{"written":"\u3002"},{"written":"..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002..."}],"text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002..."}
9471 MainThread message: A {"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2","confidence":1.00,"starttime":522,"endtime":1578,"spoken":"\u3042\u3069\u3070\u3093\u3059\u3068\u3081\u3067\u3043\u3042"},{"written":"\u306f","confidence":1.00,"starttime":1578,"endtime":1866,"spoken":"\u306f"},{"written":"\u3001","confidence":0.74,"starttime":1866,"endtime":2026,"spoken":"_"},{"written":"\u4eba","confidence":1.00,"starttime":2026,"endtime":2314,"spoken":"\u3072\u3068"},{"written":"\u3068","confidence":1.00,"starttime":2314,"endtime":2426,"spoken":"\u3068"},{"written":"\u6a5f\u68b0","confidence":1.00,"starttime":2426,"endtime":2826,"spoken":"\u304d\u304b\u3044"},{"written":"\u3068","confidence":1.00,"starttime":2826,"endtime":2954,"spoken":"\u3068"},{"written":"\u306e","confidence":1.00,"starttime":2954,"endtime":3082,"spoken":"\u306e"},{"written":"\u81ea\u7136","confidence":1.00,"starttime":3082,"endtime":3434,"spoken":"\u3057\u305c\u3093"},{"written":"\u306a","confidence":1.00,"starttime":3434,"endtime":3530,"spoken":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3","confidence":1.00,"starttime":3530,"endtime":4378,"spoken":"\u3053\u307f\u3085\u306b\u3051\u30fc\u3057\u3087\u3093"},{"written":"\u3092","confidence":1.00,"starttime":4378,"endtime":4458,"spoken":"\u3092"},{"written":"\u5b9f\u73fe","confidence":1.00,"starttime":4458,"endtime":4922,"spoken":"\u3058\u3064\u3052\u3093"},{"written":"\u3057","confidence":1.00,"starttime":4922,"endtime":5434,"spoken":"\u3057"},{"written":"\u3001","confidence":1.00,"starttime":5434,"endtime":5546,"spoken":"_"},{"written":"\u8c4a\u304b","confidence":1.00,"starttime":5546,"endtime":5994,"spoken":"\u3086\u305f\u304b"},{"written":"\u306a","confidence":1.00,"starttime":5994,"endtime":6090,"spoken":"\u306a"},{"written":"\u672a\u6765","confidence":1.00,"starttime":6090,"endtime":6490,"spoken":"\u307f\u3089\u3044"},{"written":"\u3092","confidence":1.00,"starttim
e":6490,"endtime":6554,"spoken":"\u3092"},{"written":"\u5275\u9020","confidence":0.93,"starttime":6554,"endtime":7050,"spoken":"\u305d\u3046\u305e\u3046"},{"written":"\u3057\u3066","confidence":0.99,"starttime":7050,"endtime":7210,"spoken":"\u3057\u3066"},{"written":"\u3044\u304f","confidence":1.00,"starttime":7210,"endtime":7418,"spoken":"\u3044\u304f"},{"written":"\u3053\u3068","confidence":1.00,"starttime":7418,"endtime":7690,"spoken":"\u3053\u3068"},{"written":"\u3092","confidence":1.00,"starttime":7690,"endtime":7722,"spoken":"\u3092"},{"written":"\u76ee\u6307\u3057","confidence":0.77,"starttime":7722,"endtime":8090,"spoken":"\u3081\u3056\u3057"},{"written":"\u307e\u3059","confidence":0.76,"starttime":8090,"endtime":8506,"spoken":"\u307e\u3059"},{"written":"\u3002","confidence":0.82,"starttime":8506,"endtime":8794,"spoken":"_"}],"confidence":0.998,"starttime":250,"endtime":8794,"tags":[],"rulename":"","text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002"}],"utteranceid":"20220620/ja_ja-amivoicecloud-16k-user01@01817dce7ba30a301ccf8536-0620_061133","text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002","code":"","message":""}
9495 MainThread message: G
9672 Thread-1 send> p [..(2980 bytes)..]
10174 Thread-1 send> e
10225 MainThread message: e
10225 MainThread close>

Maintaining the Session

When sending audio in real-time, if no audio is detected for 600 seconds (i.e., continuously sending silent audio), the server will disconnect the session. In this case, you will receive a p command response packet like the following:

p can't feed audio data to recognizer server

Additionally, if there is no communication for 60 seconds, the server will close the session. In this case, you will receive an e command response packet like the following:

e timeout occurred while recognizing audio data from client

When you receive these responses, please reconnect and then send the audio again.
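During pauses in real input, one option (within the 600-second no-speech limit above) is to keep sending audio so the 60-second no-communication timeout does not fire. The sketch below generates silent chunks, assuming the raw 16-bit, 16kHz linear PCM payload that corresponds to the 16K format used in this article; this is an illustration, not an officially prescribed keep-alive mechanism:

```python
SAMPLE_RATE = 16000    # samples per second for the 16K format
BYTES_PER_SAMPLE = 2   # 16-bit linear PCM


def silent_chunk(duration_seconds=1.0):
    """Return silent PCM audio of the given duration as raw bytes."""
    n_bytes = int(SAMPLE_RATE * BYTES_PER_SAMPLE * duration_seconds)
    return b"\x00" * n_bytes


# The chunk would be framed like any other audio with the p command:
# ws.send(b'p' + silent_chunk(), opcode=websocket.ABNF.OPCODE_BINARY)
```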

Client Program State Transition

The state of the client program transitions as follows, according to command transmissions and responses:
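As a rough sketch, these transitions can be modeled as a table keyed by (current state, observed event). The state and event names below are illustrative assumptions, not official AmiVoice terminology:

```python
from enum import Enum, auto


class ClientState(Enum):
    DISCONNECTED = auto()  # no WebSocket connection
    CONNECTED = auto()     # WebSocket open, no recognition session yet
    IN_SESSION = auto()    # s command accepted; p commands may be sent
    FINISHING = auto()     # e command sent; waiting for remaining results


# (current state, observed event) -> next state
TRANSITIONS = {
    (ClientState.DISCONNECTED, "ws_open"): ClientState.CONNECTED,
    (ClientState.CONNECTED, "s_accepted"): ClientState.IN_SESSION,
    (ClientState.IN_SESSION, "e_sent"): ClientState.FINISHING,
    (ClientState.FINISHING, "e_response"): ClientState.DISCONNECTED,
}


def next_state(state, event):
    """Return the next state, staying put on events that do not transition."""
    return TRANSITIONS.get((state, event), state)
```

Events such as U or A arrive while in session or finishing and do not change the state; only the command responses drive the transitions.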

Other Documents

  • For command and response sequences, and response details, please see Streaming Response.
  • For the API reference, please see WebSocket Interface.
  • We provide a client library (Wrp) that encapsulates the communication processing and procedures for using the WebSocket interface into a class library, allowing you to easily create speech recognition applications by simply implementing the necessary interfaces. First, please see How to Use the Real-time Speech Recognition Library Wrp.