Speech Recognition Engines
AmiVoice API provides multiple speech recognition engines tailored for various languages and purposes. By selecting the most suitable speech recognition engine for the audio you want to recognize, you can improve accuracy. This section explains the languages supported by the speech recognition engines, types of engines, and key points for choosing the appropriate one.
The medical engines are scheduled to be consolidated and reorganized on February 1, 2025. The engine lineup after the consolidation is described in the "Note" section.
List of Speech Recognition Engines
Here is a list of speech recognition engines provided by AmiVoice API.
Language | Engine Name | Language Model | Supported Sampling Rates | Connection Engine Name |
---|---|---|---|---|
Japanese | 会話_汎用 | General | 8k / 16k | -a-general |
Japanese | 会話_医療 | Medical | 16k | -a-medgeneral (planned to change to -a-medical; *2) |
Japanese | 会話_製薬 | Pharmaceutical | 16k | -a-bizmrreport |
Japanese | 会話_金融 | Finance | 16k | -a-bizfinance |
Japanese | 会話_保険 | Insurance | 16k | -a-bizinsurance |
Japanese | 音声入力_汎用 | Large-scale General | 16k | -a-general-input |
Japanese | 音声入力_医療 | Medical | 16k | -a-medgeneral-input (planned to change to -a-medical-input; *2) |
Japanese | 音声入力_製薬 | Pharmaceutical | 16k | -a-bizmrreport-input (planned to be integrated into -a-medical-input; *2) |
Japanese | 音声入力_保険 | Insurance | 16k | -a-bizinsurance-input |
Japanese | 音声入力_金融 | Finance | 16k | -a-bizfinance-input |
Japanese | 音声入力_電子カルテ | Electronic Medical Records | 16k | -a-medkarte-input (planned to be integrated into -a-medical-input; *2) |
English | 英語_汎用 | General | 8k(*3) / 16k | -a-general-en |
Chinese | 中国語_汎用 | General | 8k(*3) / 16k | -a-general-zh |
Korean(*1) | 韓国語_汎用 | General | 8k(*3) / 16k | -a-general-ko |
- (*1) Korean is not supported by the asynchronous API. Support is planned for the future.
- (*2) Scheduled to change on February 1, 2025. Users who used the old engine names on or before October 30, 2024 can continue to use the old connection engine names after the change. Existing applications will not be affected, but we recommend switching to the new engine names. The new engines will be available from November 1.
The correspondence between current speech recognition engines and new engines is as shown in the table below.
Current Engine Name | Current Connection Engine Name | New Engine Name | New Connection Engine Name |
---|---|---|---|
会話_医療 | -a-medgeneral | 会話_医療 | -a-medical |
会話_製薬 | -a-bizmrreport | 会話_医療 | -a-medical |
音声入力_医療 | -a-medgeneral-input | 音声入力_医療 | -a-medical-input |
音声入力_製薬 | -a-bizmrreport-input | 音声入力_医療 | -a-medical-input |
音声入力_電子カルテ | -a-medkarte-input | 音声入力_医療 | -a-medical-input |
- (*3) The 8k engines for English, Chinese, and Korean are not supported by the asynchronous API. Support is planned for the future.
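As a minimal sketch, an application that keeps connection engine names in its configuration could translate the old medical engine names to the new ones with a simple lookup. The mapping is transcribed from the correspondence table in note (*2); `MEDICAL_ENGINE_MIGRATION` and `migrate_engine_name` are hypothetical names used for illustration and are not part of AmiVoice API. The sketch also assumes that 会話_製薬 (-a-bizmrreport) is merged into -a-medical, as the layout of that table suggests.

```python
# Old -> new connection engine names, transcribed from the correspondence
# table in note (*2). The -a-bizmrreport entry is an assumption based on
# the table layout.
MEDICAL_ENGINE_MIGRATION = {
    "-a-medgeneral": "-a-medical",
    "-a-bizmrreport": "-a-medical",
    "-a-medgeneral-input": "-a-medical-input",
    "-a-bizmrreport-input": "-a-medical-input",
    "-a-medkarte-input": "-a-medical-input",
}


def migrate_engine_name(name: str) -> str:
    """Return the post-consolidation connection engine name.

    Names that are not affected by the consolidation are returned unchanged.
    """
    return MEDICAL_ENGINE_MIGRATION.get(name, name)


if __name__ == "__main__":
    for old in ("-a-medkarte-input", "-a-general"):
        print(old, "->", migrate_engine_name(old))
```

Because unaffected names pass through unchanged, a helper like this could be applied to every configured engine name without special-casing the non-medical engines.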
Engine Name
For Japanese, multiple speech recognition engines are provided, each combining a purpose (acoustic model) with a language model.
Purpose
There are "会話" engines optimized for transcribing natural conversations between people, and "音声入力" engines optimized for when people speak to machines. Each uses acoustic models trained on different datasets. However, the purpose is not just about the difference in acoustic models, but also includes optimizations for each specific use case.
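The combination of purpose and language model can be pictured as a lookup table. The following is a minimal sketch, not an official client: the connection engine names and sampling rates are copied from the engine list above (the names in use before the February 1, 2025 consolidation), while `Engine`, `JAPANESE_ENGINES`, and `select_engine` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Tuple


class Engine(NamedTuple):
    connection_name: str
    sampling_rates_khz: Tuple[int, ...]


# (purpose, language model) -> engine, copied from the engine list.
# These are the names in use before the medical engine consolidation.
JAPANESE_ENGINES = {
    ("会話", "汎用"): Engine("-a-general", (8, 16)),
    ("会話", "医療"): Engine("-a-medgeneral", (16,)),
    ("会話", "製薬"): Engine("-a-bizmrreport", (16,)),
    ("会話", "金融"): Engine("-a-bizfinance", (16,)),
    ("会話", "保険"): Engine("-a-bizinsurance", (16,)),
    ("音声入力", "汎用"): Engine("-a-general-input", (16,)),
    ("音声入力", "医療"): Engine("-a-medgeneral-input", (16,)),
    ("音声入力", "製薬"): Engine("-a-bizmrreport-input", (16,)),
    ("音声入力", "保険"): Engine("-a-bizinsurance-input", (16,)),
    ("音声入力", "金融"): Engine("-a-bizfinance-input", (16,)),
    ("音声入力", "電子カルテ"): Engine("-a-medkarte-input", (16,)),
}


def select_engine(purpose: str, model: str, sampling_rate_khz: int = 16) -> str:
    """Return the connection engine name for a purpose and language model.

    Raises KeyError if the combination is not in the engine list, and
    ValueError if the engine does not support the given sampling rate.
    """
    key = (purpose, model)
    if key not in JAPANESE_ENGINES:
        raise KeyError(f"no engine for purpose={purpose!r}, model={model!r}")
    engine = JAPANESE_ENGINES[key]
    if sampling_rate_khz not in engine.sampling_rates_khz:
        raise ValueError(
            f"{engine.connection_name} does not support {sampling_rate_khz}k audio"
        )
    return engine.connection_name


if __name__ == "__main__":
    print(select_engine("会話", "汎用", sampling_rate_khz=8))
    print(select_engine("音声入力", "金融"))
```

Keeping the sampling rates next to each name lets a caller reject unsupported audio, such as 8k audio for a medical engine, before sending a request.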
Characteristics and Points to Note
"会話" engines are designed to easily remove unnecessary words like "えーっと" or "あのー". In the standard settings, these unnecessary words are recognized and automatically removed. You can also configure settings to deliberately display these unnecessary words. Please see Specifying Filler Word Output. When using "