Speech Recognition Engines

AmiVoice API provides multiple speech recognition engines tailored for various languages and purposes. By selecting the most suitable speech recognition engine for the audio you want to recognize, you can improve accuracy. This section explains the languages supported by the speech recognition engines, types of engines, and key points for choosing the appropriate one.

caution

We plan to consolidate and reorganize the medical engines on February 1, 2025. The configuration after the consolidation is described in the caution notes in the sections below.

【AmiVoice API】 Notice of Planned Consolidation of Speech Recognition Engines for the Medical Industry

List of Speech Recognition Engines

Here is a list of speech recognition engines provided by AmiVoice API.

Language	Engine Name	Language Model	Supported Sampling Rates	Connection Engine Name
Japanese	会話_汎用	General	8k / 16k	`-a-general`
Japanese	会話_医療	Medical	16k	`-a-medgeneral` Planned to change to `-a-medical`(*2)
Japanese	会話_製薬	Pharmaceutical	16k	`-a-bizmrreport` Planned to be integrated into `-a-medical`(*2)
Japanese	会話_金融	Finance	16k	`-a-bizfinance`
Japanese	会話_保険	Insurance	16k	`-a-bizinsurance`
Japanese	音声入力_汎用	Large-scale General	16k	`-a-general-input`
Japanese	音声入力_医療	Medical	16k	`-a-medgeneral-input` Planned to change to `-a-medical-input`(*2)
Japanese	音声入力_製薬	Pharmaceutical	16k	`-a-bizmrreport-input` Planned to be integrated into `-a-medical-input`(*2)
Japanese	音声入力_保険	Insurance	16k	`-a-bizinsurance-input`
Japanese	音声入力_金融	Finance	16k	`-a-bizfinance-input`
Japanese	音声入力_電子カルテ	Electronic Medical Records	16k	`-a-medkarte-input` Planned to be integrated into `-a-medical-input`(*2)
English	英語_汎用	General	8k(*3) / 16k	`-a-general-en`
Chinese	中国語_汎用	General	8k(*3) / 16k	`-a-general-zh`
Korean(*1)	韓国語_汎用	General	8k(*3) / 16k	`-a-general-ko`

caution

(*1) Korean is not yet supported by the asynchronous API. Support is planned for the future.
(*2) Planned to change on February 1, 2025. Users who have used the old connection engine names up to October 30, 2024 can continue to use them after the change, so existing applications will keep working. We nevertheless recommend switching to the new engine names. The new engines are available from November 1, 2024.

The correspondence between current speech recognition engines and new engines is as shown in the table below.

Current Speech Recognition Engine Name	Current Connection Engine Name	New Speech Recognition Engine Name	New Connection Engine Name
会話_医療	`-a-medgeneral`	会話_医療	`-a-medical`
会話_製薬	`-a-bizmrreport`	会話_医療	`-a-medical`
音声入力_医療	`-a-medgeneral-input`	音声入力_医療	`-a-medical-input`
音声入力_製薬	`-a-bizmrreport-input`	音声入力_医療	`-a-medical-input`
音声入力_電子カルテ	`-a-medkarte-input`	音声入力_医療	`-a-medical-input`

(*3) The 8k engines for English, Chinese, and Korean are not yet supported by the asynchronous API. Support is planned for the future.
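An application can encode the sampling rates and asynchronous-API restrictions from the table and footnotes (*1) and (*3) as a small lookup, and check it before sending audio. The following is a minimal Python sketch. `EngineInfo`, `ENGINES`, and `is_supported` are hypothetical names and are not part of any AmiVoice SDK. The data reflects only the table above; the new `-a-medical` engines are omitted because their sampling rates are not listed there.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngineInfo:
    language: str
    rates_khz: frozenset                            # sampling rates listed in the engine table
    async_unsupported_khz: frozenset = frozenset()  # rates the asynchronous API does not accept yet

# Japanese engines that the table lists with 16k only
_JA_16K_ONLY = [
    "-a-medgeneral", "-a-bizmrreport", "-a-bizfinance", "-a-bizinsurance",
    "-a-general-input", "-a-medgeneral-input", "-a-bizmrreport-input",
    "-a-bizinsurance-input", "-a-bizfinance-input", "-a-medkarte-input",
]

ENGINES = {
    "-a-general": EngineInfo("Japanese", frozenset({8, 16})),
    **{name: EngineInfo("Japanese", frozenset({16})) for name in _JA_16K_ONLY},
    # (*3): the 8k variants are not yet available through the asynchronous API
    "-a-general-en": EngineInfo("English", frozenset({8, 16}), frozenset({8})),
    "-a-general-zh": EngineInfo("Chinese", frozenset({8, 16}), frozenset({8})),
    # (*1): Korean is not yet available through the asynchronous API at any rate
    "-a-general-ko": EngineInfo("Korean", frozenset({8, 16}), frozenset({8, 16})),
}

def is_supported(engine_name: str, rate_khz: int, async_api: bool = False) -> bool:
    """Return True if the engine accepts audio at rate_khz through the chosen API style."""
    info = ENGINES.get(engine_name)
    if info is None or rate_khz not in info.rates_khz:
        return False
    return not (async_api and rate_khz in info.async_unsupported_khz)
```

For example, `is_supported("-a-general-en", 8)` is True, but `is_supported("-a-general-en", 8, async_api=True)` is False.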

Engine Name

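For Japanese, multiple engines are provided, each combining a purpose (acoustic model) with a language model. The engine name reflects both: for example, 会話_医療 combines the 会話 purpose with the Medical language model.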

Purpose

There are "会話" (conversation) engines, optimized for transcribing natural conversations between people, and "音声入力" (voice input) engines, optimized for speech directed at machines. Each uses an acoustic model trained on different data. The purpose, however, determines more than the acoustic model: each engine type also includes other optimizations for its use case.

Characteristics and Points to Note

"会話" engines are designed to make it easy to remove filler words such as "えーっと" (um) or "あのー" (uh). By default, these fillers are recognized and automatically removed from the output. You can also configure the engine to output them explicitly; see Specifying Filler Word Output. With "音声入力" engines, fillers are often not identified as such. They may be left in the output or misrecognized as other words.

Use Cases

Use "会話" engines for transcribing audio from meetings, phone calls, etc.
Use "音声入力" engines for dictation of electronic medical records, reports, emails, short messages, or for dialogues with robots or voice chatbots.
If you can't narrow down the use case, use "会話" engines.

Language Model

Each "domain" such as medical, pharmaceutical, finance, insurance has its own frequently used vocabulary and expressions. We have prepared "domain-specific" language models optimized for each of these domains.

The Japanese language models are listed below. Because each language model is provided as one engine per purpose, the use case of each engine is also described.

Language Model	Language Model Description and Engines for Each Purpose
General	Can be used to transcribe speech of any kind, without restriction to a particular purpose. Available for "会話" only. 会話_汎用 (`-a-general`): for transcribing meetings, videos, and other cases where the input is not restricted
Large-scale General	Can be used for dictation or for transcribing voice dialogues, without restriction to a particular purpose. Its vocabulary is significantly larger than that of the General model. It is especially rich in rarely spoken words and in names of landmarks, places, and facilities such as shrines, temples, castles, bridges, hot springs, zoos, aquariums, art museums, museums, dams, and tunnels. Available for "音声入力" only. 音声入力_汎用 (`-a-general-input`): for dictation in various scenarios, voice dialogue applications, etc.
Finance	Adds terms and expressions used in the finance industry to the "General" language model. 会話_金融 (`-a-bizfinance`): for transcribing conversations during face-to-face sales, etc. 音声入力_金融 (`-a-bizfinance-input`): for voice input of daily reports, emails, etc.
Insurance	Adds terms and expressions used in the insurance industry to the "General" language model. 会話_保険 (`-a-bizinsurance`): for transcribing conversations during face-to-face sales, etc. 音声入力_保険 (`-a-bizinsurance-input`): for voice input of daily reports, emails, etc.
Medical	Adds vocabulary from various medical specialties, medical terms, and expressions used in medical-industry meetings to the "General" language model. It covers many disease names, drug names, hospital names, surgery names, place names, etc. 会話_医療 (`-a-medgeneral`): for transcribing medical-industry meetings, doctor-patient conversations during consultations, medical-related videos, etc. 音声入力_医療 (`-a-medgeneral-input`): for voice input of care records, medical information, etc.
Pharmaceutical	Adds many pharmaceutical-industry terms and expressions to the "Medical" language model. It covers many disease names, drug names, hospital names, etc. 会話_製薬 (`-a-bizmrreport`): for transcribing conversations during face-to-face sales, etc. 音声入力_製薬 (`-a-bizmrreport-input`): for writing pharmacist medication guidance records, voice input of MR (medical representative) daily sales reports, etc.
Electronic Medical Records	Specialized for dictating medical documents such as electronic medical record findings, medical certificates, medical information provision documents, and referral letters. Available for "音声入力" only. 音声入力_電子カルテ (`-a-medkarte-input`): for dictating electronic medical records across medical specialties
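Each engine pairs a purpose with a language model, so engine selection can be expressed as a lookup keyed on both. The sketch below is hypothetical. `select_engine` and the key strings are illustrative, not an AmiVoice API. It uses the pre-consolidation connection names from the table above. It defaults to 会話, following the advice to use "会話" engines when the use case cannot be narrowed down.

```python
CONVERSATION = "会話"   # natural conversation between people
VOICE_INPUT = "音声入力"  # speech directed at a machine

# (purpose, language model) -> connection engine name, as listed in the table above
ENGINE_BY_PURPOSE_AND_MODEL = {
    (CONVERSATION, "General"): "-a-general",
    (VOICE_INPUT, "Large-scale General"): "-a-general-input",
    (CONVERSATION, "Finance"): "-a-bizfinance",
    (VOICE_INPUT, "Finance"): "-a-bizfinance-input",
    (CONVERSATION, "Insurance"): "-a-bizinsurance",
    (VOICE_INPUT, "Insurance"): "-a-bizinsurance-input",
    (CONVERSATION, "Medical"): "-a-medgeneral",
    (VOICE_INPUT, "Medical"): "-a-medgeneral-input",
    (CONVERSATION, "Pharmaceutical"): "-a-bizmrreport",
    (VOICE_INPUT, "Pharmaceutical"): "-a-bizmrreport-input",
    (VOICE_INPUT, "Electronic Medical Records"): "-a-medkarte-input",
}

def select_engine(language_model: str, purpose: str = CONVERSATION) -> str:
    """Return the connection engine name, or raise ValueError if the pairing does not exist."""
    try:
        return ENGINE_BY_PURPOSE_AND_MODEL[(purpose, language_model)]
    except KeyError:
        raise ValueError(
            f"no {purpose} engine exists for the {language_model!r} language model"
        ) from None

print(select_engine("Finance", VOICE_INPUT))  # prints -a-bizfinance-input
```

Pairings that the table does not offer raise an error instead of silently falling back. For example, the General model exists only for 会話 and Large-scale General only for 音声入力.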

caution

We plan to consolidate and reorganize medical engines on February 1, 2025. After the consolidation, the list of language models will be as follows:

New "Medical Conference" and "Medical General" language models will be added
The "Pharmaceutical" language model will be integrated into the "Medical Conference" language model

Language Model	Language Model Description and Engines for Each Purpose
General	Can be used to transcribe speech of any kind, without restriction to a particular purpose. Available for "会話" only. 会話_汎用 (`-a-general`): for transcribing meetings, videos, and other cases where the input is not restricted
Large-scale General	Can be used for dictation or for transcribing voice dialogues, without restriction to a particular purpose. Its vocabulary is significantly larger than that of the General model. It is especially rich in rarely spoken words and in names of landmarks, places, and facilities such as shrines, temples, castles, bridges, hot springs, zoos, aquariums, art museums, museums, dams, and tunnels. Available for "音声入力" only. 音声入力_汎用 (`-a-general-input`): for dictation in various scenarios, voice dialogue applications, etc.
Finance	Adds terms and expressions used in the finance industry to the "General" language model. 会話_金融 (`-a-bizfinance`): for transcribing conversations during face-to-face sales, etc. 音声入力_金融 (`-a-bizfinance-input`): for voice input of daily reports, emails, etc.
Insurance	Adds terms and expressions used in the insurance industry to the "General" language model. 会話_保険 (`-a-bizinsurance`): for transcribing conversations during face-to-face sales, etc. 音声入力_保険 (`-a-bizinsurance-input`): for voice input of daily reports, emails, etc.
Medical Conference	Adds vocabulary from various medical specialties, medical terms, and expressions used in medical-industry meetings to the "General" language model. It covers many disease names, drug names, hospital names, surgery names, place names, etc. Available for "会話" only. 会話_医療 (`-a-medical`): for transcribing medical-industry meetings, doctor-patient conversations during consultations, medical-related videos, face-to-face sales conversations, MR daily sales reports, etc.
Medical General	Specialized for dictating medical documents such as electronic medical record findings, medical certificates, medical information provision documents, referral letters, care records, and pharmacist medication guidance records. Available for "音声入力" only. 音声入力_医療 (`-a-medical-input`): for dictation by medical professionals in various specialties, including doctors and pharmacists
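Applications that keep connection engine names in configuration can apply the consolidation with a simple mapping. This is an illustrative sketch: `MEDICAL_ENGINE_MIGRATION` and `migrate_engine_name` are hypothetical names. The mapping assumes that, as in the correspondence table of current and new engines, 会話_製薬 moves to `-a-medical`, and that 音声入力_製薬 and 音声入力_電子カルテ move to `-a-medical-input`.

```python
# Connection engine name being consolidated -> recommended name after February 1, 2025
MEDICAL_ENGINE_MIGRATION = {
    "-a-medgeneral": "-a-medical",
    "-a-bizmrreport": "-a-medical",
    "-a-medgeneral-input": "-a-medical-input",
    "-a-bizmrreport-input": "-a-medical-input",
    "-a-medkarte-input": "-a-medical-input",
}

def migrate_engine_name(name: str) -> str:
    """Return the post-consolidation name; names not being consolidated pass through unchanged."""
    return MEDICAL_ENGINE_MIGRATION.get(name, name)

for old in ["-a-medkarte-input", "-a-general"]:
    print(old, "->", migrate_engine_name(old))
```

Because unaffected names pass through unchanged, the function can be applied to every configured engine name without special-casing.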

List of Class Names for Japanese Language Models

Here's a list of class names defined in Japanese speech recognition engines. Classes are used when registering words. For details, please see Word Registration. API users cannot add new classes.

Class Name	General	Large-scale General	Finance	Insurance	Pharmaceutical	Medical	Electronic Medical Records	Notes
固有名詞	●	●	●	●	●	●
名前	●	●	●	●	●	●		Represents surname
名前(名)	●	●	●	●	●	●		Represents given name
名前							●	Represents full name *1
駅名	●	●	●	●	●	●
地名	●	●	●	●	●
会社名	●	●	●	●	●	●
部署名	●	●	●	●	●	●
役職名	●	●	●	●	●	●
記号	●	●	●	●	●	●
括弧開き	●	●	●	●	●	●
括弧閉じ	●	●	●	●	●	●
元号	●	●	●	●	●	●	●
病名					●	●	●
薬品名					●	●	●
病院名					●	●	●
手術名						●	●
地名_区町村						●	●
地名_支庁市郡						●	●

(*1) The 名前 class represents a full name in the Electronic Medical Records language model but a surname in the other language models.
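API users cannot add classes, so a word-registration tool may want to reject entries whose class is not defined for the target language model. The following sketch encodes the table above (pre-consolidation). `CLASS_AVAILABILITY` and `is_valid_class` are hypothetical helpers, not part of the AmiVoice API.

```python
COMMON = frozenset({"General", "Large-scale General", "Finance", "Insurance",
                    "Pharmaceutical", "Medical"})
ALL_MODELS = COMMON | {"Electronic Medical Records"}
PHARMA_AND_MEDICAL = frozenset({"Pharmaceutical", "Medical", "Electronic Medical Records"})
MEDICAL_AND_EMR = frozenset({"Medical", "Electronic Medical Records"})

# Class name -> language models in which the class is defined
CLASS_AVAILABILITY = {
    **dict.fromkeys(["固有名詞", "名前(名)", "駅名", "会社名", "部署名",
                     "役職名", "記号", "括弧開き", "括弧閉じ"], COMMON),
    "名前": ALL_MODELS,            # surname, except full name in Electronic Medical Records (*1)
    "地名": COMMON - {"Medical"},  # the Medical model uses 地名_区町村 / 地名_支庁市郡 instead
    "元号": ALL_MODELS,
    **dict.fromkeys(["病名", "薬品名", "病院名"], PHARMA_AND_MEDICAL),
    **dict.fromkeys(["手術名", "地名_区町村", "地名_支庁市郡"], MEDICAL_AND_EMR),
}

def is_valid_class(class_name: str, language_model: str) -> bool:
    """Return True if class_name is defined for language_model; unknown classes are rejected."""
    return language_model in CLASS_AVAILABILITY.get(class_name, frozenset())

print(is_valid_class("病名", "Finance"))  # prints False
```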

caution

We plan to consolidate and reorganize medical engines on February 1, 2025. After the consolidation, the list of class names will be as follows:

New "Medical Conference" and "Medical General" language models will be added
The "Pharmaceutical" language model will be integrated into the "Medical Conference" language model

Class Name	General	Large-scale General	Finance	Insurance	Medical Conference	Medical General	Notes
固有名詞	●	●	●	●	●
名前	●	●	●	●	●		Represents surname
名前(名)	●	●	●	●	●		Represents given name
名前						●	Represents full name *1
駅名	●	●	●	●	●
地名	●	●	●	●
会社名	●	●	●	●	●
部署名	●	●	●	●	●
役職名	●	●	●	●	●
記号	●	●	●	●	●
括弧開き	●	●	●	●	●
括弧閉じ	●	●	●	●	●
元号	●	●	●	●	●	●
病名					●	●
薬品名					●	●
病院名					●	●
手術名					●	●
地名_区町村					●	●
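地名_支庁市郡					●	●

(*1) The 名前 class represents a full name in the Medical General language model but a surname in the other language models.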

List of Class Names for Chinese Language Model

This is a list of class names defined in the Chinese speech recognition engine.

Class Name	General
固有名词一般	●
姓	●
名	●

List of Class Names for Korean Language Model

This is a list of class names defined in the Korean speech recognition engine.

Class Name	General
固有名詞	●
地名	●
駅名	●
会社名	●
名前(姓)	●
名前(名)	●

Supported Sampling Rates

All speech recognition engines support a 16kHz sampling rate. Some engines also support 8kHz, the rate commonly used in telephone communication. For details on sampling rates, see Sampling Rate in the audio format section.

tip

When recording audio yourself, record at 16kHz sampling rate and use a 16kHz engine.
For telephone audio, use an 8kHz engine.
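If you do not control how the audio was recorded, you can read the sampling rate from the WAV header and confirm that the engine supports it before sending. The following is a minimal sketch using Python's standard `wave` module. `SUPPORTED_RATES_HZ` covers only a few engines from the engine table, for illustration. The function names are hypothetical, and `make_silent_wav` exists only to demonstrate the check.

```python
import io
import wave

# Supported rates (Hz) for a few engines from the engine table; illustrative subset only.
SUPPORTED_RATES_HZ = {
    "-a-general": {8000, 16000},
    "-a-general-input": {16000},
    "-a-general-en": {8000, 16000},
}

def wav_sample_rate(data: bytes) -> int:
    """Read the sampling rate from the header of in-memory WAV data."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()

def check_engine_for_wav(engine_name: str, data: bytes) -> int:
    """Return the WAV sampling rate, or raise ValueError if the engine does not list it."""
    rate = wav_sample_rate(data)
    if rate not in SUPPORTED_RATES_HZ.get(engine_name, set()):
        raise ValueError(
            f"{engine_name} does not support {rate} Hz audio; "
            "choose another engine or record at a supported rate"
        )
    return rate

def make_silent_wav(rate: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()

# 8 kHz telephone audio is accepted by -a-general but would be rejected by -a-general-input.
print(check_engine_for_wav("-a-general", make_silent_wav(8000)))  # prints 8000
```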

Connection Engine Names

In the Request Parameters, set the connection engine name (grammarFileNames) to a string from the "Connection Engine Name" column of the list of speech recognition engines. For the engine names available in AmiVoice API Private, see your My Page.
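As an illustration, the sketch below assembles form fields for an HTTP recognition request. It makes two assumptions, which should be checked against the Request Parameters page: the application key goes in a `u` field, and parameters such as grammarFileNames go in a `d` field as space-separated key=value pairs with URL-encoded values. The helper names are hypothetical, and no request is sent.

```python
from urllib.parse import quote

def build_d_parameter(params: dict[str, str]) -> str:
    """Join parameters as space-separated key=value pairs, URL-encoding each value (assumed format)."""
    return " ".join(f"{key}={quote(value)}" for key, value in params.items())

def build_form_fields(app_key: str, grammar_file_names: str) -> dict[str, str]:
    """Return hypothetical form fields; the 'u' and 'd' field names are assumptions to verify."""
    return {
        "u": app_key,
        "d": build_d_parameter({"grammarFileNames": grammar_file_names}),
    }

print(build_form_fields("YOUR_APP_KEY", "-a-general"))
```

The engine name is the only value that changes when switching engines, so it can be kept in configuration rather than hard-coded.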

Costs

Costs vary depending on the engine. For details, please see AmiVoice API Pricing.

About Recognition Accuracy

Words that are not in the vocabulary of a speech recognition engine are never output. When such a word is spoken, it is recognized as a word with a similar pronunciation, as a sequence of shorter words with similar pronunciations, or simply as an incorrect word. Because of constraints on computational resources and processing time, each speech recognition engine has a fixed vocabulary. General-purpose engines such as "会話_汎用" and "音声入力_汎用" have large vocabularies so that they can be used in many scenarios, but they do not include words specific to particular industries or uses.

For terminology common in industries such as medicine, finance, and insurance, an engine specialized for that industry achieves higher recognition rates on that field's vocabulary. For words used mainly within a particular organization, use word registration.

tip

We have compared and reported on the difference in recognition rates between general-purpose engines and domain-specific engines on the AmiVoice Tech Blog. Please see Comparing the Speech Recognition Accuracy of AmiVoice's Domain-Specific Engines (General vs. Electronic Medical Records) and Comparison of Recognition Results between Voice Input Engine and Conversation Engine with the Same Utterance.
