Packets and State Transitions
Packet List
The packets exchanged between client and server in the WebSocket speech recognition protocol are as follows:
| Packet Name | Related State | Description |
|---|---|---|
| s command packet | Audio Supply | Command to start sending audio data |
| s command response | Audio Supply | Response to the command to start sending audio data |
| p command packet | Audio Supply | Command to send audio data |
| p command response | Audio Supply | Response to the command to send audio data |
| e command packet | Audio Supply | Command to stop sending audio data |
| e command response | Audio Supply | Response to the command to stop sending audio data |
| S event packet | Utterance Detection | Notification of utterance start detection |
| E event packet | Utterance Detection | Notification of utterance end detection |
| C event packet | Speech Recognition | Notification of recognition process start |
| U event packet | - | Notification issued while recognition is in progress |
| A/R event packet | Speech Recognition | Notification of recognition process results |
| G event packet | - | Notification of the result of an action performed within the server |
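
The packet list above can be kept as a small lookup table so that client code knows which state transition a packet belongs to. The following is a minimal sketch, not part of the protocol definition: the names `PACKETS` and `related_state` are hypothetical, and it assumes that "A/R" denotes two alternative event names, `A` and `R`.

```python
from typing import Optional

# (packet name, kind) -> related state transition; None where the table shows "-".
# Assumption: "A/R event packet" covers two separate event names, "A" and "R".
PACKETS = {
    ("s", "command"): "Audio Supply",
    ("s", "response"): "Audio Supply",
    ("p", "command"): "Audio Supply",
    ("p", "response"): "Audio Supply",
    ("e", "command"): "Audio Supply",
    ("e", "response"): "Audio Supply",
    ("S", "event"): "Utterance Detection",
    ("E", "event"): "Utterance Detection",
    ("C", "event"): "Speech Recognition",
    ("U", "event"): None,
    ("A", "event"): "Speech Recognition",
    ("R", "event"): "Speech Recognition",
    ("G", "event"): None,
}


def related_state(name: str, kind: str) -> Optional[str]:
    """Return the state transition a packet belongs to, or None if it has none."""
    return PACKETS[(name, kind)]
```

Keying on both the name and the kind keeps the lowercase `e` command apart from the uppercase `E` event, which differ only in case.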
State Transition List
The state transitions that exist in the WebSocket speech recognition protocol are as follows:
- Audio Supply State Transition
- Utterance Detection State Transition
- Speech Recognition State Transition
1. Audio Supply State Transition
The states representing the status of audio data supply from client to server are as follows:
Audio Supply State Transition Table
| Packet Name | 0 Initialized [Initial State] | 1 starting | 2 started | 3 providing | 4 ending |
|---|---|---|---|---|---|
| s command packet | Start supply → 1 | (ERROR) | (ERROR) | (ERROR) | (ERROR) |
| s command response (success) | - | (OK) → 2 | - | - | - |
| s command response (failure) | - | (ERROR) → 0 | - | - | - |
| p command packet | (ERROR) | (ERROR) | Supplying → 3 | Supplying → 3 | (ERROR) |
| p command response (success) | - | - | - | (OK) → 3 | - |
| p command response (failure) | - | - | - | (ERROR) → 0 | - |
| e command packet | (ERROR) | (ERROR) | Stop supply → 4 | Stop supply → 4 | (ERROR) |
| e command response (success) | - | - | - | - | (OK) → 0 |
| e command response (failure) | - | - | - | - | (ERROR) → 0 |
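
The transition table above maps directly onto a dictionary keyed by packet and current state. The sketch below is one possible client-side encoding, under stated assumptions: the packet labels (`"s"`, `"s_ok"`, `"s_ng"`, and so on), `SupplyState`, `ProtocolError`, and `step` are hypothetical names, and cells shown as `-` are treated the same way as `(ERROR)` cells, which the table itself does not specify.

```python
from enum import IntEnum


class SupplyState(IntEnum):
    INITIALIZED = 0  # initial state
    STARTING = 1
    STARTED = 2
    PROVIDING = 3
    ENDING = 4


class ProtocolError(Exception):
    """Raised for a (packet, state) pair that has no transition in the table."""


S = SupplyState

# Hypothetical labels: "s", "p", "e" are command packets;
# "<cmd>_ok" / "<cmd>_ng" are that command's success / failure response.
TRANSITIONS = {
    ("s", S.INITIALIZED): S.STARTING,
    ("s_ok", S.STARTING): S.STARTED,
    ("s_ng", S.STARTING): S.INITIALIZED,
    ("p", S.STARTED): S.PROVIDING,
    ("p", S.PROVIDING): S.PROVIDING,
    ("p_ok", S.PROVIDING): S.PROVIDING,
    ("p_ng", S.PROVIDING): S.INITIALIZED,
    ("e", S.STARTED): S.ENDING,
    ("e", S.PROVIDING): S.ENDING,
    ("e_ok", S.ENDING): S.INITIALIZED,
    ("e_ng", S.ENDING): S.INITIALIZED,
}


def step(state: SupplyState, packet: str) -> SupplyState:
    """Return the next state, or raise ProtocolError if the pair is not in the table."""
    try:
        return TRANSITIONS[(packet, state)]
    except KeyError:
        raise ProtocolError(f"{packet!r} not allowed in state {state.name}") from None
```

A full session (`s`, its success response, one or more `p` exchanges, then `e` and its response) returns to state 0, while sending `p` before `s` raises `ProtocolError`.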
Audio Supply State Transition Diagram

2. Utterance Detection State Transition
The states representing the status of utterance detection are as follows:
Utterance Detection State Transition Table
| Packet Name | 6 not-detecting [Initial State] | 7 detecting |
|---|---|---|
| S event packet | → 7 | - |
| E event packet | - | → 6 |
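
Because this table has only two states, it can be sketched as a lookup with a default. The example below is illustrative only: the constant and function names are hypothetical, and it assumes that a `-` cell (for example, an S event while already detecting) leaves the state unchanged.

```python
NOT_DETECTING = 6  # initial state
DETECTING = 7

UTTERANCE_TRANSITIONS = {
    ("S", NOT_DETECTING): DETECTING,
    ("E", DETECTING): NOT_DETECTING,
}


def on_utterance_event(state: int, event: str) -> int:
    """Return the next utterance detection state.

    Assumption: pairs shown as "-" in the table keep the current state.
    """
    return UTTERANCE_TRANSITIONS.get((event, state), state)
```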
Utterance Detection State Transition Diagram
3. Speech Recognition State Transition
The states representing the status of speech recognition processing are as follows:
Speech Recognition State Transition Table
| Packet Name | 8 not-recognizing [Initial State] | 9 recognizing |
|---|---|---|
| C event packet | → 9 | - |
| A/R event packet | - | → 8 |
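
As a rough illustration of the table above, the function below folds a sequence of event names into the resulting speech recognition state. It is a sketch under assumptions: `RecognitionState` and `recognition_state_after` are hypothetical names, "A/R" is read as two event names `A` and `R`, and any event that has no entry for the current state (including U and G, which have no related state in the packet list) is assumed to leave the state unchanged.

```python
from enum import IntEnum
from typing import Iterable


class RecognitionState(IntEnum):
    NOT_RECOGNIZING = 8  # initial state
    RECOGNIZING = 9


def recognition_state_after(
    events: Iterable[str],
    state: RecognitionState = RecognitionState.NOT_RECOGNIZING,
) -> RecognitionState:
    """Apply each event in order and return the final recognition state."""
    for event in events:
        if event == "C" and state is RecognitionState.NOT_RECOGNIZING:
            state = RecognitionState.RECOGNIZING
        elif event in ("A", "R") and state is RecognitionState.RECOGNIZING:
            state = RecognitionState.NOT_RECOGNIZING
        # Assumption: every other (event, state) pair keeps the current state.
    return state
```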
Speech Recognition State Transition Diagram