
How to Use the Real-time Speech Recognition Library Wrp

The Wrp library allows you to develop real-time applications using the WebSocket interface of AmiVoice API with an interface similar to AmiVoice SDK. You can send streaming audio and receive results sequentially. This library is available in languages such as Java, C#, C++, Python, and PHP.

Overview of Client Program

The flow of a program using Wrp is as follows:

Figure. Commands and Events

Methods

The client program performs the following processes in order. The corresponding Wrp methods are listed in parentheses.

  1. Connect (connect)
  2. Request speech recognition (feedDataResume)
  3. Send audio data (feedData)
  4. End audio data transmission (feedDataPause)
  5. Disconnect (disconnect)
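Put together, these five steps form the skeleton of a client program. The following is a sketch assembled from the excerpts later in this document; variables such as serverURL, codec, grammarFileNames, authorization, and audioData are assumed to be prepared by the caller, and error handling is omitted:

```java
// Sketch only: assumes the AmiVoice client library is on the classpath and
// that serverURL, codec, grammarFileNames, authorization, and audioData exist.
com.amivoice.wrp.Wrp wrp = com.amivoice.wrp.Wrp.construct();
wrp.setServerURL(serverURL);
wrp.setCodec(codec);
wrp.setGrammarFileNames(grammarFileNames);
wrp.setAuthorization(authorization);
if (wrp.connect()) {                                  // 1. Connect
	if (wrp.feedDataResume()) {                       // 2. Request speech recognition
		wrp.feedData(audioData, 0, audioData.length); // 3. Send audio data
		wrp.feedDataPause();                          // 4. End transmission, wait for results
	}
	wrp.disconnect();                                 // 5. Disconnect
}
```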

Events

Notification events from the server, for both speech detection and speech recognition, are received through methods of the listener class. There are five events; implement the Wrp listener method shown for each:

  • Start of speech detected: utteranceStarted(startTime)
  • End of speech detected: utteranceEnded(endTime)
  • Speech recognition processing started: resultCreated()
  • Intermediate recognition result notification: resultUpdated(result)
  • Final recognition result notification: resultFinalized(result)

Be sure to implement resultFinalized(result), the final recognition result notification event. For the other events, implement handling as needed based on the notifications from the server.

Implementation Guide

We will explain how to use Wrp step by step while showing samples for each language.

The code examples shown below are all excerpts from WrpSimpleTester, published in the GitHub repository advanced-media-inc/amivoice-api-client-library. For the complete code, see the source files in that repository.

For explanations on execution methods and file structures, please see the client library sample program WrpSimpleTester.

1. Initialization

Create an instance of the Wrp class.

// Initialize WebSocket speech recognition server
com.amivoice.wrp.Wrp wrp = com.amivoice.wrp.Wrp.construct();

2. Implementing the Listener Class

Implement event handlers by inheriting the com.amivoice.wrp.WrpListener class.

The speech recognition results are passed in the result argument of resultFinalized. For details on the format, see the speech recognition result format in the WebSocket interface documentation. Note that the recognition result text is UTF-8 encoded and Unicode-escaped; see also About Result Text.

In the code below, each of utteranceStarted, utteranceEnded, resultCreated, resultUpdated, and resultFinalized logs to standard output. Register the listener instance that implements these methods on the wrp instance with wrp.setListener(listener). The Unicode escapes in the result text are decoded with the text_ method; the complete code for text_ is published on GitHub.

public class WrpTester implements com.amivoice.wrp.WrpListener {
	public static void main(String[] args) {
		// ... (excerpt)
		// Create WebSocket speech recognition server event listener
		com.amivoice.wrp.WrpListener listener = new WrpTester(verbose);
		wrp.setListener(listener);
		// ...
	}

	@Override
	public void utteranceStarted(int startTime) {
		System.out.println("S " + startTime);
	}

	@Override
	public void utteranceEnded(int endTime) {
		System.out.println("E " + endTime);
	}

	@Override
	public void resultCreated() {
		System.out.println("C");
	}

	@Override
	public void resultUpdated(String result) {
		System.out.println("U " + result);
		String text = text_(result);
		if (text != null) {
			System.out.println(" -> " + text);
		}
	}

	@Override
	public void resultFinalized(String result) {
		System.out.println("F " + result);
		String text = text_(result);
		if (text != null) {
			System.out.println(" -> " + text);
		}
	}

	// ...
}
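The text_ method referenced above decodes the Unicode escapes in the result text. As an illustration of what such decoding involves, here is a hypothetical standalone helper (UnicodeDecodeDemo and decodeUnicodeEscapes are illustrative names, not part of the Wrp library; see the GitHub repository for the actual text_ implementation):

```java
public class UnicodeDecodeDemo {
	// Decode backslash-uXXXX escape sequences into the characters they
	// represent; assumes the escapes are well-formed four-digit hex.
	static String decodeUnicodeEscapes(String s) {
		StringBuilder out = new StringBuilder();
		int i = 0;
		while (i < s.length()) {
			char c = s.charAt(i);
			if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
				out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
				i += 6;
			} else {
				out.append(c);
				i++;
			}
		}
		return out.toString();
	}

	public static void main(String[] args) {
		// Decodes a Unicode-escaped Japanese greeting
		System.out.println(decodeUnicodeEscapes("\\u3053\\u3093\\u306b\\u3061\\u306f")); // prints こんにちは
	}
}
```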

3. Connection (connect)

Connect to the speech recognition server.

The parameter that must be set before calling this method is as follows:

  • serverURL ... WebSocket interface endpoint

Please specify the following URL:

wss://acp-api.amivoice.com/v1/     (logging)
wss://acp-api.amivoice.com/v1/nolog/ (no logging)

You can adjust the behavior by setting the following parameters:

  • proxyServerName ... Specify when the client program connects through a proxy server
  • connectTimeout ... Connection timeout with the server. Unit is milliseconds.
  • receiveTimeout ... Timeout for receiving data from the server. Unit is milliseconds.

For server-side timeouts, please see Limitations.

The following code connects to the speech recognition server. The detailed error message from wrp.getLastMessage() is printed only when verbose is false.

wrp.setServerURL(serverURL);
wrp.setProxyServerName(proxyServerName);
wrp.setConnectTimeout(connectTimeout);
wrp.setReceiveTimeout(receiveTimeout);

// Connect to WebSocket speech recognition server
if (!wrp.connect()) {
	if (!verbose) {
		System.out.println(wrp.getLastMessage());
	}
	System.out.println("Failed to connect to the WebSocket speech recognition server " + serverURL + ".");
	return;
}

4. Speech Recognition Request (feedDataResume)

Send a speech recognition request. This method blocks until the connection to the speech recognition server specified by the request parameters is established and preparation such as word registration is complete. On failure it returns an error; for details, see the error messages of the s command response packet.

The following parameters must be set before calling this method (they correspond to the setCodec, setGrammarFileNames, and setAuthorization calls in the code below):

  • codec ... Audio format of the audio data to be sent
  • grammarFileNames ... Speech recognition engine to connect to
  • authorization ... APPKEY used for authentication

The behavior can be adjusted by setting the following parameters:

  • profileId
  • profileWords
  • keepFillerToken
  • segmenterProperties
  • resultUpdatedInterval

For details, please see Parameter Details.

warning

To avoid deadlock, do not call the feedDataResume method from listener class methods such as resultFinalized that receive event notifications.

The following code sends a speech recognition request to the server with the above parameters and, if an error occurs, prints a message to standard output. The detailed error message from wrp.getLastMessage() is printed only when verbose is false.

wrp.setCodec(codec);
wrp.setGrammarFileNames(grammarFileNames);
wrp.setAuthorization(authorization);

// Start sending audio data to WebSocket speech recognition server
if (!wrp.feedDataResume()) {
	if (!verbose) {
		System.out.println(wrp.getLastMessage());
	}
	System.out.println("Failed to start sending audio data to the WebSocket speech recognition server.");
	break;
}

5. Sending Audio Data (feedData)

Next, send the audio data. This method does not block. If an error occurs on the server side, it will return an error on the next method call. For details on the error content, please see the error messages of the p command response packet. Once you start sending audio data, the listener class methods will be called according to the server-side processing.

info
  • Please send audio data that matches the codec specified in feedDataResume. Even if the format is different, it will not result in an error, but the response may take a very long time or the recognition results may not be obtained correctly.
  • If speech cannot be detected from the sent audio data, the listener class methods will not be called. Please check the following possible reasons:
    • No audio is included at all, or the volume is very low. Check if the recording system is not muted and if the volume settings are appropriate.
    • The audio format and audio data do not match. Check the Audio Formats.
note
  • The maximum size of audio data that can be sent in one feedData method call is 16MB. If the data size is larger than that, please split it.
  • Audio data can be split at any point. There is no need to be aware of wav chunk boundaries or mp3, flac, opus frame boundaries.
  • You cannot change the format of the data being sent midway. If you want to change the audio format, end with the e command and start a new speech recognition request from s. The same applies to audio files with headers; end with the e command for each file and start a new speech recognition request from s.
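The 16MB limit in the note above can be handled by splitting the byte array before each feedData call. The following is a self-contained sketch (ChunkDemo and splitIntoChunks are illustrative names, not part of the Wrp library); a small chunk size is used here for demonstration:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
	// Split data into chunks no larger than maxChunkBytes, so each chunk
	// stays under the per-call size limit of feedData.
	static List<byte[]> splitIntoChunks(byte[] data, int maxChunkBytes) {
		List<byte[]> chunks = new ArrayList<>();
		for (int offset = 0; offset < data.length; offset += maxChunkBytes) {
			int length = Math.min(maxChunkBytes, data.length - offset);
			byte[] chunk = new byte[length];
			System.arraycopy(data, offset, chunk, 0, length);
			chunks.add(chunk);
		}
		return chunks;
	}

	public static void main(String[] args) {
		// 10 bytes split into chunks of at most 4 bytes: 4 + 4 + 2
		List<byte[]> chunks = splitIntoChunks(new byte[10], 4);
		System.out.println(chunks.size()); // prints 3
	}
}
```

In practice, each element of the returned list would be passed to one feedData call with the 16MB (not 4-byte) limit.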

The following code reads audio data from the file specified by audioFileName and sends it to the WebSocket speech recognition server. If sleepTime is -1, the client instead sleeps until the number of recognition results still pending drops to 1 or less (waiting up to 50 seconds). The detailed error message from wrp.getLastMessage() is printed only when verbose is false.

try (FileInputStream audioStream = new FileInputStream(audioFileName)) {
	// Read audio data from audio data file
	byte[] audioData = new byte[4096];
	int audioDataReadBytes = audioStream.read(audioData, 0, audioData.length);
	while (audioDataReadBytes > 0) {
		// Check if sleep time has been calculated
		if (sleepTime >= 0) {
			// If sleep time has been calculated...
			// Sleep for a short time
			wrp.sleep(sleepTime);
		} else {
			// If sleep time has not been calculated...
			// Sleep for a short time
			wrp.sleep(1);

			// Sleep until the number of waiting recognition results becomes 1 or less
			int maxSleepTime = 50000;
			while (wrp.getWaitingResults() > 1 && maxSleepTime > 0) {
				wrp.sleep(100);
				maxSleepTime -= 100;
			}
		}

		// Send audio data to WebSocket speech recognition server
		if (!wrp.feedData(audioData, 0, audioDataReadBytes)) {
			if (!verbose) {
				System.out.println(wrp.getLastMessage());
			}
			System.out.println("Failed to send audio data to the WebSocket speech recognition server.");
			break;
		}

		// Read audio data from audio data file
		audioDataReadBytes = audioStream.read(audioData, 0, audioData.length);
	}
} catch (IOException e) {
	System.out.println("Failed to read the audio data file " + audioFileName + ".");
}

6. End of Audio Data Transmission (feedDataPause)

Call this when audio data transmission is complete. This method blocks until speech recognition processing has finished. The request may fail; for details, see the error messages of the e command response packet.

warning

To avoid deadlock, do not call the feedDataPause method from listener class methods such as resultFinalized that receive event notifications.

info

If you send all the audio data at once instead of streaming it, the speech recognition processing takes time, so it will be a while before the results return. Expect roughly 0.5 to 1.5 times the duration of the audio you sent.

The following code informs the server that all audio data has been sent and blocks until the results are returned. The detailed error message from wrp.getLastMessage() is printed only when verbose is false.

// Complete sending audio data to the WebSocket speech recognition server
if (!wrp.feedDataPause()) {
	if (!verbose) {
		System.out.println(wrp.getLastMessage());
	}
	System.out.println("Failed to complete sending audio data to the WebSocket speech recognition server.");
	break;
}

7. Disconnection (disconnect)

Finally, disconnect from the speech recognition server.

// Disconnect from the WebSocket speech recognition server
wrp.disconnect();

Client Program State Transition

The client program undergoes the following state transitions.
