How to Use the Real-time Speech Recognition Library Wrp

The `Wrp` library allows you to develop real-time applications using the WebSocket interface of the AmiVoice API, with an interface similar to the AmiVoice SDK. You can send streaming audio and receive results sequentially. The library is available in Java, C#, C++, Python, and PHP.
Overview of Client Program

The flow of a program using `Wrp` is as follows:

Methods

The client program performs the following processes in order. The corresponding `Wrp` methods are listed in parentheses.
- Connect (`connect`)
- Request speech recognition (`feedDataResume`)
- Send audio data (`feedData`)
- End audio data transmission (`feedDataPause`)
- Disconnect (`disconnect`)
Events

Notification events for speech detection and speech recognition from the server are both obtained through methods of the listener class. There are five events, listed below. The method name to implement in the listener class is shown with each event.
- Event notified when the start of speech is detected: `utteranceStarted(startTime)`
- Event notified when the end of speech is detected: `utteranceEnded(endTime)`
- Event notified when speech recognition processing starts: `resultCreated()`
- Intermediate recognition result notification event: `resultUpdated(result)`
- Recognition result notification event: `resultFinalized(result)`
Be sure to implement the recognition result notification event `resultFinalized(result)`. For the other events, implement handling according to the notifications from the server as needed.
Implementation Guide

We will explain how to use `Wrp` step by step, showing samples for each language.

The code examples below are all excerpts from WrpSimpleTester, published in the GitHub repository advanced-media-inc/amivoice-api-client-library; please see the source files there for the complete code.

For explanations of how to run the samples and of the file structure, please see the client library sample program WrpSimpleTester.
1. Initialization

Create an instance of the `Wrp` class.
**Java**

```java
// Initialize WebSocket speech recognition server
com.amivoice.wrp.Wrp wrp = com.amivoice.wrp.Wrp.construct();
```

**C#**

```csharp
// Initialize WebSocket speech recognition server
com.amivoice.wrp.Wrp wrp = com.amivoice.wrp.Wrp.construct();
```

**C++**

```cpp
// Initialize WebSocket speech recognition server
Pointer<com::amivoice::wrp::Wrp> wrp = com::amivoice::wrp::Wrp::construct();
```

**PHP**

```php
// Initialize WebSocket speech recognition server
$wrp = com\amivoice\wrp\Wrp::construct();
```

**Python**

```python
# Initialize WebSocket speech recognition server
wrp = com.amivoice.wrp.Wrp.construct()
```
2. Implementing the Listener Class

Implement the event handlers by inheriting from the `com.amivoice.wrp.WrpListener` class.

The speech recognition results are obtained in the `result` argument of `resultFinalized`. For details, please see the WebSocket interface documentation on the speech recognition result format. Note that the recognition result text is UTF-8 encoded and Unicode escaped; please also see About Result Text.

In the code below, each of the methods `utteranceStarted`, `utteranceEnded`, `resultCreated`, `resultUpdated`, and `resultFinalized` logs to standard output. Set an instance of the listener that implements these methods on the `wrp` instance with `wrp.setListener(listener)`. The Unicode escapes in the result text are decoded with the `text_` method; the complete code for the `text_` method is published on GitHub.
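As a point of reference, such Unicode escapes can be decoded with standard JSON tooling. The sketch below is not the library's `text_` method; it is a minimal Python stand-in that assumes the result payload is a JSON object with a `text` field, per the WebSocket interface result format:

```python
import json

def decode_result_text(result):
    """Extract the recognized text from a raw result payload.

    The payload is a JSON string in which non-ASCII characters appear
    as \\uXXXX escapes; json.loads() decodes them back to Unicode.
    Returns None when the payload is not valid JSON.
    """
    try:
        return json.loads(result).get("text")
    except ValueError:
        return None

# A Unicode-escaped payload as it might arrive from the server
raw = '{"text":"\\u3053\\u3093\\u306b\\u3061\\u306f"}'
print(decode_result_text(raw))  # こんにちは
```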
**Java**

```java
public class WrpTester implements com.amivoice.wrp.WrpListener {
    public static void main(String[] args) {
        // Create WebSocket speech recognition server event listener
        com.amivoice.wrp.WrpListener listener = new WrpTester(verbose);
        wrp.setListener(listener);
    }

    @Override
    public void utteranceStarted(int startTime) {
        System.out.println("S " + startTime);
    }

    @Override
    public void utteranceEnded(int endTime) {
        System.out.println("E " + endTime);
    }

    @Override
    public void resultCreated() {
        System.out.println("C");
    }

    @Override
    public void resultUpdated(String result) {
        System.out.println("U " + result);
        String text = text_(result);
        if (text != null) {
            System.out.println(" -> " + text);
        }
    }

    @Override
    public void resultFinalized(String result) {
        System.out.println("F " + result);
        String text = text_(result);
        if (text != null) {
            System.out.println(" -> " + text);
        }
    }
}
```
**C#**

```csharp
public class WrpTester : com.amivoice.wrp.WrpListener {
    public static void Main(string[] args) {
        // Create WebSocket speech recognition server event listener
        com.amivoice.wrp.WrpListener listener = new WrpTester(verbose);
        wrp.setListener(listener);
    }

    public void utteranceStarted(int startTime) {
        Console.WriteLine("S " + startTime);
    }

    public void utteranceEnded(int endTime) {
        Console.WriteLine("E " + endTime);
    }

    public void resultCreated() {
        Console.WriteLine("C");
    }

    public void resultUpdated(string result) {
        Console.WriteLine("U " + result);
        string text = text_(result);
        if (text != null) {
            Console.WriteLine(" -> " + text);
        }
    }

    public void resultFinalized(string result) {
        Console.WriteLine("F " + result);
        string text = text_(result);
        if (text != null) {
            Console.WriteLine(" -> " + text);
        }
    }
}
```
**C++**

```cpp
class WrpTester : private com::amivoice::wrp::WrpListener {
    public: static void main(const StringList& args) {
        // Create WebSocket speech recognition server event listener
        Pointer<com::amivoice::wrp::WrpListener> listener = new WrpTester(verbose);
        wrp->setListener(listener);
    }

    public: void utteranceStarted(int startTime) override {
        print("S %d", startTime);
    }

    public: void utteranceEnded(int endTime) override {
        print("E %d", endTime);
    }

    public: void resultCreated() override {
        print("C");
    }

    public: void resultUpdated(const char* result) override {
        print("U %s", String().fromUTF8(result).to());
        unsigned short* text = text_(result);
        if (text != NULL) {
            print(" -> %s", String().fromUTF16(text).to());
            delete[] text;
        }
    }

    public: void resultFinalized(const char* result) override {
        print("F %s", String().fromUTF8(result).to());
        unsigned short* text = text_(result);
        if (text != NULL) {
            print(" -> %s", String().fromUTF16(text).to());
            delete[] text;
        }
    }
};
```
Please see GitHub for the implementation of `String().fromUTF8(result).to()`.
**PHP**

```php
class WrpTester implements com\amivoice\wrp\WrpListener {
    public static function main($args) {
        // Create WebSocket speech recognition server event listener
        $listener = new WrpTester($verbose);
        $wrp->setListener($listener);
    }

    public function utteranceStarted($startTime) {
        p("S " . $startTime);
    }

    public function utteranceEnded($endTime) {
        p("E " . $endTime);
    }

    public function resultCreated() {
        p("C");
    }

    public function resultUpdated($result) {
        p("U " . $result);
        $text = $this->text_($result);
        if ($text !== null) {
            p(" -> " . $text);
        }
    }

    public function resultFinalized($result) {
        p("F " . $result);
        $text = $this->text_($result);
        if ($text !== null) {
            p(" -> " . $text);
        }
    }
}
```

The helper `p` is implemented as follows:

```php
function p($s = "") {
    print s($s) . "\n";
}
```
**Python**

```python
class WrpTester(com.amivoice.wrp.WrpListener):
    @staticmethod
    def main(args):
        # Create WebSocket speech recognition server event listener
        listener = WrpTester(verbose)
        # Set listener to wrp object
        wrp.setListener(listener)

    def utteranceStarted(self, startTime):
        print("S %d" % startTime)

    def utteranceEnded(self, endTime):
        print("E %d" % endTime)

    def resultCreated(self):
        print("C")

    def resultUpdated(self, result):
        print("U %s" % result)
        text = self.text_(result)
        if text is not None:
            print(" -> %s" % text)

    def resultFinalized(self, result):
        print("F %s" % result)
        text = self.text_(result)
        if text is not None:
            print(" -> %s" % text)
```
3. Connection (`connect`)

Connect to the speech recognition server.

The following parameter must be set before calling this method:

- `serverURL` ... The WebSocket interface endpoint. Specify one of the following URLs:
  - wss://acp-api.amivoice.com/v1/ (logging)
  - wss://acp-api.amivoice.com/v1/nolog/ (no logging)

You can adjust the behavior by setting the following parameters:

- `proxyServerName` ... Specify this when the client program connects through a proxy server.
- `connectTimeout` ... Connection timeout with the server, in milliseconds.
- `receiveTimeout` ... Timeout for receiving data from the server, in milliseconds.

For server-side timeouts, please see Limitations.

The following code connects to the speech recognition server. If `verbose` is `false`, error messages are not displayed.
**Java**

```java
wrp.setServerURL(serverURL);
wrp.setProxyServerName(proxyServerName);
wrp.setConnectTimeout(connectTimeout);
wrp.setReceiveTimeout(receiveTimeout);
// Connect to WebSocket speech recognition server
if (!wrp.connect()) {
    if (!verbose) {
        System.out.println(wrp.getLastMessage());
    }
    System.out.println("Failed to connect to the WebSocket speech recognition server " + serverURL + ".");
    return;
}
```
**C#**

```csharp
wrp.setServerURL(serverURL);
wrp.setProxyServerName(proxyServerName);
wrp.setConnectTimeout(connectTimeout);
wrp.setReceiveTimeout(receiveTimeout);
// Connect to WebSocket speech recognition server
if (!wrp.connect()) {
    if (!verbose) {
        Console.WriteLine(wrp.getLastMessage());
    }
    Console.WriteLine("Failed to connect to the WebSocket speech recognition server " + serverURL + ".");
    return;
}
```
**C++**

```cpp
wrp->setServerURL(serverURL);
wrp->setProxyServerName(proxyServerName);
wrp->setConnectTimeout(connectTimeout);
wrp->setReceiveTimeout(receiveTimeout);
// Connect to WebSocket speech recognition server
if (!wrp->connect()) {
    if (!verbose) {
        print("%s", wrp->getLastMessage());
    }
    print("Failed to connect to the WebSocket speech recognition server %s.", serverURL);
    return;
}
```
**PHP**

```php
$wrp->setServerURL($serverURL);
$wrp->setProxyServerName($proxyServerName);
$wrp->setConnectTimeout($connectTimeout);
$wrp->setReceiveTimeout($receiveTimeout);
// Connect to WebSocket speech recognition server
if (!$wrp->connect()) {
    if (!$verbose) {
        p($wrp->getLastMessage());
    }
    p("Failed to connect to the WebSocket speech recognition server " . $serverURL . ".");
    return;
}
```
**Python**

```python
wrp.setServerURL(serverURL)
wrp.setProxyServerName(proxyServerName)
wrp.setConnectTimeout(connectTimeout)
wrp.setReceiveTimeout(receiveTimeout)
# Connect to WebSocket speech recognition server
if not wrp.connect():
    if not verbose:
        print(wrp.getLastMessage())
    print("Failed to connect to the WebSocket speech recognition server %s." % serverURL)
    return
```
4. Speech Recognition Request (`feedDataResume`)

Send a speech recognition request. This method blocks until the connection to the appropriate speech recognition server specified by the request parameters is established and word registration preparation is complete. If it fails, it returns an error; for details, please see the error messages of the `s` command response packet.

The following parameters must be set before calling this method:

- `codec` ... Specifies the audio format. For supported audio formats, please see Audio Formats.
- `grammarFileNames` ... The connection engine name. For supported engines, please see Speech Recognition Engines.
- `authorization` ... Authentication information. Specify the APPKEY listed on your My Page, or a One-time APPKEY.

The behavior can be adjusted by setting the following parameters:

- `profileId`
- `profileWords`
- `keepFillerToken`
- `segmenterProperties`
- `resultUpdatedInterval`

For details, please see Parameter Details.

To avoid deadlock, do not call the `feedDataResume` method from listener class methods such as `resultFinalized` that receive event notifications.
The following code sends a speech recognition request to the server with the above parameters, and prints a message to standard output if an error occurs. If `verbose` is `false`, error messages are not displayed.
**Java**

```java
wrp.setCodec(codec);
wrp.setGrammarFileNames(grammarFileNames);
wrp.setAuthorization(authorization);
// Start sending audio data to WebSocket speech recognition server
if (!wrp.feedDataResume()) {
    if (!verbose) {
        System.out.println(wrp.getLastMessage());
    }
    System.out.println("Failed to start sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**C#**

```csharp
wrp.setCodec(codec);
wrp.setAuthorization(authorization);
wrp.setGrammarFileNames(grammarFileNames);
// Start sending audio data to WebSocket speech recognition server
if (!wrp.feedDataResume()) {
    if (!verbose) {
        Console.WriteLine(wrp.getLastMessage());
    }
    Console.WriteLine("Failed to start sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**C++**

```cpp
wrp->setCodec(codec);
wrp->setAuthorization(authorization);
wrp->setGrammarFileNames(grammarFileNames);
// Start sending audio data to WebSocket speech recognition server
if (!wrp->feedDataResume()) {
    if (!verbose) {
        print("%s", wrp->getLastMessage());
    }
    print("Failed to start sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**PHP**

```php
$wrp->setCodec($codec);
$wrp->setAuthorization($authorization);
$wrp->setGrammarFileNames($grammarFileNames);
// Start sending audio data to WebSocket speech recognition server
if (!$wrp->feedDataResume()) {
    if (!$verbose) {
        p($wrp->getLastMessage());
    }
    p("Failed to start sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**Python**

```python
wrp.setCodec(codec)
wrp.setAuthorization(authorization)
wrp.setGrammarFileNames(grammarFileNames)
# Start sending audio data to WebSocket speech recognition server
if not wrp.feedDataResume():
    if not verbose:
        print(wrp.getLastMessage())
    print("Failed to start sending audio data to the WebSocket speech recognition server.")
    break
```
5. Sending Audio Data (`feedData`)

Next, send the audio data. This method does not block. If an error occurs on the server side, the error is returned on the next method call; for details on the error content, please see the error messages of the `p` command response packet. Once you start sending audio data, the listener class methods are called according to the server-side processing.
- Send audio data that matches the `codec` specified in `feedDataResume`. A mismatched format does not produce an error, but the response may take a very long time, or the recognition results may not be obtained correctly.
- If no speech can be detected in the sent audio data, the listener class methods are not called. Check the following possible causes:
  - No audio is included at all, or the volume is very low. Check that the recording system is not muted and that the volume settings are appropriate.
  - The audio format and the audio data do not match. Check Audio Formats.
- The maximum size of audio data that can be sent in one `feedData` method call is 16 MB. If the data is larger than that, split it.
- Audio data can be split at any point. There is no need to respect wav chunk boundaries or mp3, flac, or opus frame boundaries.
- You cannot change the format of the data mid-stream. To change the audio format, end with the `e` command and start a new speech recognition request with `s`. The same applies to audio files with headers: end with the `e` command for each file and start a new speech recognition request with `s`.
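The splitting rules above can be sketched as follows. This is a hypothetical helper, not part of the `Wrp` library: it slices an audio byte stream into chunks that stay under the 16 MB per-call limit, at arbitrary byte offsets:

```python
# Hypothetical helper, not part of the Wrp library: split raw audio
# bytes into chunks no larger than a given limit. Per the rules above,
# the split points can fall anywhere - chunk/frame boundaries in the
# container format do not matter.
MAX_FEED_BYTES = 16 * 1024 * 1024  # 16 MB limit per feedData() call

def split_audio(audio, chunk_size=MAX_FEED_BYTES):
    if not 0 < chunk_size <= MAX_FEED_BYTES:
        raise ValueError("chunk_size must be within the 16 MB limit")
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]

audio = bytes(10_000_000)  # 10 MB of silence as a stand-in
chunks = split_audio(audio, 4096)
# every chunk is within the limit and nothing is lost
assert all(len(c) <= 4096 for c in chunks)
assert b"".join(chunks) == audio
```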
The following code reads audio data from the file specified by `audioFileName` and sends it to the WebSocket speech recognition server. If `sleepTime` is `-1`, the code sleeps until the number of pending recognition results becomes 1 or less. If `verbose` is `false`, error messages are not displayed.
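When streaming from a file, the sample paces itself by sleeping between chunks. For uncompressed PCM, the per-chunk sleep time that simulates real-time capture corresponds to the audio duration carried by each chunk; a small sketch (the 16 kHz / 16-bit / mono figures are example values chosen for illustration, not something the API mandates):

```python
def realtime_sleep_ms(chunk_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Milliseconds of audio contained in one chunk of raw PCM.

    Sleeping this long between feedData() calls approximates a live
    microphone feed.
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return chunk_bytes * 1000 // bytes_per_second

# 4096-byte chunks of 16 kHz, 16-bit, mono PCM carry 128 ms of audio
print(realtime_sleep_ms(4096, 16000))  # 128
```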
**Java**

```java
try (FileInputStream audioStream = new FileInputStream(audioFileName)) {
    // Read audio data from audio data file
    byte[] audioData = new byte[4096];
    int audioDataReadBytes = audioStream.read(audioData, 0, audioData.length);
    while (audioDataReadBytes > 0) {
        // Check if sleep time has been calculated
        if (sleepTime >= 0) {
            // If sleep time has been calculated...
            // Sleep for a short time
            wrp.sleep(sleepTime);
        } else {
            // If sleep time has not been calculated...
            // Sleep for a short time
            wrp.sleep(1);
            // Sleep until the number of waiting recognition results becomes 1 or less
            int maxSleepTime = 50000;
            while (wrp.getWaitingResults() > 1 && maxSleepTime > 0) {
                wrp.sleep(100);
                maxSleepTime -= 100;
            }
        }
        // Send audio data to WebSocket speech recognition server
        if (!wrp.feedData(audioData, 0, audioDataReadBytes)) {
            if (!verbose) {
                System.out.println(wrp.getLastMessage());
            }
            System.out.println("Failed to send audio data to the WebSocket speech recognition server.");
            break;
        }
        // Read audio data from audio data file
        audioDataReadBytes = audioStream.read(audioData, 0, audioData.length);
    }
} catch (IOException e) {
    System.out.println("Failed to read the audio data file " + audioFileName + ".");
}
```
**C#**

```csharp
try {
    using (FileStream audioStream = new FileStream(audioFileName, FileMode.Open, FileAccess.Read)) {
        // Read audio data from audio data file
        byte[] audioData = new byte[4096];
        int audioDataReadBytes = audioStream.Read(audioData, 0, audioData.Length);
        while (audioDataReadBytes > 0) {
            // Check if sleep time has been calculated
            if (sleepTime >= 0) {
                // If sleep time has been calculated...
                // Sleep for a short time
                wrp.sleep(sleepTime);
            } else {
                // If sleep time has not been calculated...
                // Sleep for a short time
                wrp.sleep(1);
                // Sleep until the number of waiting recognition results becomes 1 or less
                int maxSleepTime = 50000;
                while (wrp.getWaitingResults() > 1 && maxSleepTime > 0) {
                    wrp.sleep(100);
                    maxSleepTime -= 100;
                }
            }
            // Send audio data to WebSocket speech recognition server
            if (!wrp.feedData(audioData, 0, audioDataReadBytes)) {
                if (!verbose) {
                    Console.WriteLine(wrp.getLastMessage());
                }
                Console.WriteLine("Failed to send audio data to the WebSocket speech recognition server.");
                break;
            }
            // Read audio data from audio data file
            audioDataReadBytes = audioStream.Read(audioData, 0, audioData.Length);
        }
    }
} catch (IOException) {
    Console.WriteLine("Failed to read the audio data file " + audioFileName + ".");
}
```
**C++**

```cpp
// Open audio data file
FILE* audioStream;
if (fopen_s(&audioStream, audioFileName->to(), "rb") == 0) {
    // Read audio data from audio data file
    char audioData[4096];
    int audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
    while (audioDataReadBytes > 0) {
        // Check if sleep time has been calculated
        if (sleepTime >= 0) {
            // If sleep time has been calculated...
            // Sleep for a short time
            wrp->sleep(sleepTime);
        } else {
            // If sleep time has not been calculated...
            // Sleep for a short time
            wrp->sleep(1);
            // Sleep until the number of waiting recognition results becomes 1 or less
            int maxSleepTime = 50000;
            while (wrp->getWaitingResults() > 1 && maxSleepTime > 0) {
                wrp->sleep(100);
                maxSleepTime -= 100;
            }
        }
        // Send audio data to WebSocket speech recognition server
        if (!wrp->feedData(audioData, 0, audioDataReadBytes)) {
            if (!verbose) {
                print("%s", wrp->getLastMessage());
            }
            print("Failed to send audio data to the WebSocket speech recognition server.");
            break;
        }
        // Read audio data from audio data file
        audioDataReadBytes = (int)fread(audioData, 1, 4096, audioStream);
    }
    // Close audio data file
    fclose(audioStream);
} else {
    print("Failed to read the audio data file %s.", audioFileName->to());
}
```
**PHP**

```php
$audioStream = false;
try {
    // Open audio data file
    $audioStream = @fopen($audioFileName, "rb");
    if ($audioStream === false) {
        throw new \Exception();
    }
    // Read audio data from audio data file
    $audioData = @fread($audioStream, 4096);
    while ($audioData !== false && strlen($audioData) > 0) {
        // Check if sleep time has been calculated
        if ($sleepTime >= 0) {
            // If sleep time has been calculated...
            // Sleep for a short time
            $wrp->sleep($sleepTime);
        } else {
            // If sleep time has not been calculated...
            // Sleep for a short time
            $wrp->sleep(1);
            // Sleep until the number of waiting recognition results becomes 1 or less
            $maxSleepTime = 50000;
            while ($wrp->getWaitingResults() > 1 && $maxSleepTime > 0) {
                $wrp->sleep(100);
                $maxSleepTime -= 100;
            }
        }
        // Send audio data to WebSocket speech recognition server
        if (!$wrp->feedData($audioData, 0, strlen($audioData))) {
            if (!$verbose) {
                p($wrp->getLastMessage());
            }
            p("Failed to send audio data to the WebSocket speech recognition server.");
            break;
        }
        // Read audio data from audio data file
        $audioData = @fread($audioStream, 4096);
    }
} catch (\Exception $e) {
    p("Failed to read the audio data file " . $audioFileName . ".");
} finally {
    // Close audio data file
    if ($audioStream !== false) {
        @fclose($audioStream);
        $audioStream = false;
    }
}
```
**Python**

```python
try:
    with open(audioFileName, "rb") as audioStream:
        # Read audio data from audio data file
        audioData = audioStream.read(4096)
        while len(audioData) > 0:
            # Check if sleep time has been calculated
            if sleepTime >= 0:
                # If sleep time has been calculated...
                # Sleep for a short time
                wrp.sleep(sleepTime)
            else:
                # If sleep time has not been calculated...
                # Sleep for a short time
                wrp.sleep(1)
                # Sleep until the number of waiting recognition results becomes 1 or less
                maxSleepTime = 50000
                while wrp.getWaitingResults() > 1 and maxSleepTime > 0:
                    wrp.sleep(100)
                    maxSleepTime -= 100
            # Send audio data to WebSocket speech recognition server
            if not wrp.feedData(audioData, 0, len(audioData)):
                if not verbose:
                    print(wrp.getLastMessage())
                print("Failed to send audio data to the WebSocket speech recognition server.")
                break
            # Read audio data from audio data file
            audioData = audioStream.read(4096)
except:
    print("Failed to read the audio data file %s." % audioFileName)
```
6. End of Audio Data Transmission (`feedDataPause`)

Call this method when audio data transmission is complete. It blocks until speech recognition processing has finished. The request may fail for some reason; for details, please see the error messages of the `e` command response packet.

To avoid deadlock, do not call the `feedDataPause` method from listener class methods such as `resultFinalized` that receive event notifications.

If you send all of the audio data at once instead of streaming it, the speech recognition processing takes time, so it will take a while for the results to return. Expect roughly 0.5 to 1.5 times the length of the audio you sent.
The following code informs the server that all audio data has been sent and blocks until the results are returned. If `verbose` is `false`, error messages are not displayed.
**Java**

```java
// Completion of sending audio data to the WebSocket speech recognition server
if (!wrp.feedDataPause()) {
    if (!verbose) {
        System.out.println(wrp.getLastMessage());
    }
    System.out.println("Failed to complete sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**C#**

```csharp
// Completion of sending audio data to the WebSocket speech recognition server
if (!wrp.feedDataPause()) {
    if (!verbose) {
        Console.WriteLine(wrp.getLastMessage());
    }
    Console.WriteLine("Failed to complete sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**C++**

```cpp
// Completion of sending audio data to the WebSocket speech recognition server
if (!wrp->feedDataPause()) {
    if (!verbose) {
        print("%s", wrp->getLastMessage());
    }
    print("Failed to complete sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**PHP**

```php
// Completion of sending audio data to the WebSocket speech recognition server
if (!$wrp->feedDataPause()) {
    if (!$verbose) {
        p($wrp->getLastMessage());
    }
    p("Failed to complete sending audio data to the WebSocket speech recognition server.");
    break;
}
```
**Python**

```python
# Completion of sending audio data to the WebSocket speech recognition server
if not wrp.feedDataPause():
    if not verbose:
        print(wrp.getLastMessage())
    print("Failed to complete sending audio data to the WebSocket speech recognition server.")
    break
```
7. Disconnection (`disconnect`)

Finally, disconnect from the speech recognition server.
**Java**

```java
// Disconnect from the WebSocket speech recognition server
wrp.disconnect();
```

**C#**

```csharp
// Disconnect from the WebSocket speech recognition server
wrp.disconnect();
```

**C++**

```cpp
// Disconnect from the WebSocket speech recognition server
wrp->disconnect();
```

**PHP**

```php
// Disconnect from the WebSocket speech recognition server
$wrp->disconnect();
```

**Python**

```python
# Disconnect from the WebSocket speech recognition server
wrp.disconnect()
```
Client Program State Transition
The client program undergoes the following state transitions.
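As a rough illustration of those transitions, the states and the methods that move between them can be modeled as follows. The state labels are informal names chosen for this sketch, not identifiers from the `Wrp` API:

```python
# Informal model of the client-side states described in this guide;
# the method names mirror the steps above, but the state labels
# themselves are illustrative, not part of the Wrp API.
TRANSITIONS = {
    ("disconnected", "connect"): "connected",
    ("connected", "feedDataResume"): "feeding",
    ("feeding", "feedData"): "feeding",
    ("feeding", "feedDataPause"): "connected",
    ("connected", "disconnect"): "disconnected",
}

def run(methods, state="disconnected"):
    for m in methods:
        state = TRANSITIONS[(state, m)]  # KeyError -> illegal call order
    return state

# the canonical call sequence ends back in the disconnected state
seq = ["connect", "feedDataResume", "feedData", "feedData", "feedDataPause", "disconnect"]
print(run(seq))  # disconnected
```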
Other Documentation

- For how to obtain the Wrp client library source and sample programs, please see How to Obtain. Also see How to Run Sample Programs and Directory and File Structure.
- For the reference of the `Wrp` class library, please see Wrp.
- The `Wrp` class library uses the WebSocket interface of the AmiVoice API. Also see WebSocket Interface Overview.