
Speech Recognition Engines

AmiVoice API provides multiple speech recognition engines tailored for various languages and purposes. By selecting the most suitable speech recognition engine for the audio you want to recognize, you can improve accuracy. This section explains the languages supported by the speech recognition engines, types of engines, and key points for choosing the appropriate one.

caution

We plan to consolidate and reorganize the medical engines on February 1, 2025. The engine lineup after the consolidation is described in the caution notes in each section below.

【AmiVoice API】 Notice of Planned Consolidation of Speech Recognition Engines for the Medical Industry

List of Speech Recognition Engines

Here is a list of speech recognition engines provided by AmiVoice API.

Language | Engine Name | Language Model | Supported Sampling Rates | Connection Engine Name
Japanese | 会話_汎用 | General | 8k / 16k | -a-general
Japanese | 会話_医療 | Medical | 16k | -a-medgeneral (planned to change to -a-medical (*2))
Japanese | 会話_製薬 | Pharmaceutical | 16k | -a-bizmrreport
Japanese | 会話_金融 | Finance | 16k | -a-bizfinance
Japanese | 会話_保険 | Insurance | 16k | -a-bizinsurance
Japanese | 音声入力_汎用 | Large-scale General | 16k | -a-general-input
Japanese | 音声入力_医療 | Medical | 16k | -a-medgeneral-input (planned to change to -a-medical-input (*2))
Japanese | 音声入力_製薬 | Pharmaceutical | 16k | -a-bizmrreport-input (planned to be integrated into -a-medical-input (*2))
Japanese | 音声入力_保険 | Insurance | 16k | -a-bizinsurance-input
Japanese | 音声入力_金融 | Finance | 16k | -a-bizfinance-input
Japanese | 音声入力_電子カルテ | Electronic Medical Records | 16k | -a-medkarte-input (planned to be integrated into -a-medical-input (*2))
English | 英語_汎用 | General | 8k(*3) / 16k | -a-general-en
Chinese | 中国語_汎用 | General | 8k(*3) / 16k | -a-general-zh
Korean(*1) | 韓国語_汎用 | General | 8k(*3) / 16k | -a-general-ko

caution
  • (*1) Korean is not supported by the asynchronous API. Support is planned for the future.
  • (*2) The change is planned for February 1, 2025. Users who used the old engine names on or before October 30, 2024 can keep using the old connection engine names after the change, so existing applications will continue to work. Even so, we recommend switching to the new engine names. The new engines will be available from November 1.

The current speech recognition engines correspond to the new engines as shown in the table below.

Current Engine Name | Current Connection Engine Name | Engine Name After Change | Connection Engine Name After Change
会話_医療 | -a-medgeneral | 会話_医療 | -a-medical
会話_製薬 | -a-bizmrreport | 会話_医療 | -a-medical
音声入力_医療 | -a-medgeneral-input | 音声入力_医療 | -a-medical-input
音声入力_製薬 | -a-bizmrreport-input | 音声入力_医療 | -a-medical-input
音声入力_電子カルテ | -a-medkarte-input | 音声入力_医療 | -a-medical-input

  • (*3) The 8k engines for English, Chinese, and Korean are not supported by the asynchronous API. Support is planned for the future.
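
The correspondence between old and new connection engine names can be kept as a small lookup table in client code. The following is a minimal sketch, not part of AmiVoice API: the names `NEW_ENGINE_NAME` and `migrate_engine_name` are hypothetical, and the mapping is copied from the correspondence table above, where 会話_製薬 shares the 会話_医療 row after the change.

```python
# Old-to-new connection engine names, copied from the correspondence table.
# The dict and function names are hypothetical, for illustration only.
NEW_ENGINE_NAME = {
    "-a-medgeneral": "-a-medical",
    "-a-bizmrreport": "-a-medical",
    "-a-medgeneral-input": "-a-medical-input",
    "-a-bizmrreport-input": "-a-medical-input",
    "-a-medkarte-input": "-a-medical-input",
}


def migrate_engine_name(name: str) -> str:
    """Return the post-consolidation name, or the input if it is unaffected."""
    return NEW_ENGINE_NAME.get(name, name)


print(migrate_engine_name("-a-medkarte-input"))  # -a-medical-input
print(migrate_engine_name("-a-general"))         # -a-general
```

Engines that are not part of the consolidation pass through unchanged, so the helper can be applied to every configured engine name.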

Engine Name

Japanese speech recognition is offered as multiple engines, each combining a purpose (acoustic model) with a language model.

Purpose

There are "会話" engines, optimized for transcribing natural conversation between people, and "音声入力" engines, optimized for speech directed at a machine. The two use acoustic models trained on different datasets. The purpose is more than a difference in acoustic models, however: each engine is also optimized for its specific use case.

Characteristics and Points to Note

"会話" engines are designed to make it easy to remove filler words such as "えーっと" or "あのー". With the default settings, these words are recognized and automatically removed from the output. You can also configure the engine to output them; see Specifying Filler Word Output. "音声入力" engines often do not treat these words as fillers, so they are either left in the output or misrecognized as other words.

Use Cases
  • Use "会話" engines for transcribing audio from meetings, phone calls, etc.
  • Use "音声入力" engines for dictation of electronic medical records, reports, emails, short messages, or for dialogues with robots or voice chatbots.
  • If you can't narrow down the use case, use "会話" engines.

Language Model

Each domain, such as medical, pharmaceutical, finance, or insurance, has its own frequently used vocabulary and expressions. We provide domain-specific language models optimized for each of these domains.

The Japanese language models are listed below. Because each model is offered as one or more purpose-specific engines, the use cases for each engine are also given.

Language model, description, and engines for each purpose:

General
Can be used for transcribing speech content without limiting the purpose. Exclusively for "会話".
  • 会話_汎用 (-a-general): For transcribing meetings, videos, and cases where input is not limited

Large-scale General
Can be used for dictation without limiting the purpose or transcribing voice dialogues. It has a significantly larger vocabulary than the general model. Rich in vocabulary including rarely spoken words and names of landmarks, places, facilities such as shrines, temples, castles, bridges, hot springs, zoos, aquariums, art museums, museums, dams, tunnels, etc. Exclusively for "音声入力".
  • 音声入力_汎用 (-a-general-input): For dictation in various scenarios, voice dialogue applications, etc.

Finance
In addition to the "General" language model, terms and expressions used in the finance industry are added.
  • 会話_金融 (-a-bizfinance): For transcribing conversations during face-to-face sales, etc.
  • 音声入力_金融 (-a-bizfinance-input): For voice input of daily reports, email creation, etc.

Insurance
In addition to the "General" language model, terms and expressions used in the insurance industry are added.
  • 会話_保険 (-a-bizinsurance): For transcribing conversations during face-to-face sales, etc.
  • 音声入力_保険 (-a-bizinsurance-input): For voice input of daily reports, email creation, etc.

Medical
In addition to the "General" language model, various medical specialties, medical terms, and expressions used in medical industry meetings are added. It covers many disease names, drug names, hospital names, surgery names, place names, etc.
  • 会話_医療 (-a-medgeneral): For transcribing medical industry meetings, doctor-patient conversations during consultations, medical-related videos, etc.
  • 音声入力_医療 (-a-medgeneral-input): For voice input of care records, medical-related information, etc.

Pharmaceutical
In addition to the "Medical" language model, many pharmaceutical industry terms and expressions are added. It covers many disease names, drug names, hospital names, etc.
  • 会話_製薬 (-a-bizmrreport): For transcribing conversations during face-to-face sales, etc.
  • 音声入力_製薬 (-a-bizmrreport-input): For creating pharmacist medication guidance documents, voice input of MR sales daily reports, etc.

Electronic Medical Records
Specialized for dictation of various medical documents such as electronic medical record findings, medical certificates, medical information provision documents, referral letters, etc. Exclusively for "音声入力".
  • 音声入力_電子カルテ (-a-medkarte-input): For dictation of electronic medical records in various medical specialties

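
Since each Japanese engine is a combination of a purpose and a language model, choosing an engine amounts to a lookup on that pair. The sketch below illustrates this under stated assumptions: `ENGINES` and `pick_engine` are hypothetical names, not part of AmiVoice API, and the entries are the connection engine names listed above, valid before the consolidation planned for February 1, 2025.

```python
# (purpose, language model) -> connection engine name, from the tables above.
# Hypothetical helper for illustration; names are the pre-consolidation ones.
ENGINES = {
    ("会話", "General"): "-a-general",
    ("会話", "Finance"): "-a-bizfinance",
    ("会話", "Insurance"): "-a-bizinsurance",
    ("会話", "Medical"): "-a-medgeneral",
    ("会話", "Pharmaceutical"): "-a-bizmrreport",
    ("音声入力", "Large-scale General"): "-a-general-input",
    ("音声入力", "Finance"): "-a-bizfinance-input",
    ("音声入力", "Insurance"): "-a-bizinsurance-input",
    ("音声入力", "Medical"): "-a-medgeneral-input",
    ("音声入力", "Pharmaceutical"): "-a-bizmrreport-input",
    ("音声入力", "Electronic Medical Records"): "-a-medkarte-input",
}


def pick_engine(purpose: str, language_model: str) -> str:
    """Return the connection engine name for a purpose and language model."""
    try:
        return ENGINES[(purpose, language_model)]
    except KeyError:
        # Not every pair exists, e.g. "Large-scale General" is 音声入力 only.
        raise ValueError(
            f"no engine combines {purpose!r} with {language_model!r}"
        ) from None


print(pick_engine("会話", "Finance"))  # -a-bizfinance
```

Raising an error for a missing pair, rather than silently falling back to a general engine, makes an unsupported combination visible during configuration.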
caution

We plan to consolidate and reorganize medical engines on February 1, 2025. After the consolidation, the list of language models will be as follows:

  • New "Medical Conference" and "Medical General" language models have been added
  • The "Pharmaceutical" language model will be integrated into the "Medical Conference" language model

Language model, description, and engines for each purpose:

General
Can be used for transcribing speech content without limiting the purpose. Exclusively for "会話".
  • 会話_汎用 (-a-general): For transcribing meetings, videos, and cases where input is not limited

Large-scale General
Can be used for dictation without limiting the purpose or transcribing voice dialogues. It has a significantly larger vocabulary than the general model. Rich in vocabulary including rarely spoken words and names of landmarks, places, facilities such as shrines, temples, castles, bridges, hot springs, zoos, aquariums, art museums, museums, dams, tunnels, etc. Exclusively for "音声入力".
  • 音声入力_汎用 (-a-general-input): For dictation in various scenarios, voice dialogue applications, etc.

Finance
In addition to the "General" language model, terms and expressions used in the finance industry are added.
  • 会話_金融 (-a-bizfinance): For transcribing conversations during face-to-face sales, etc.
  • 音声入力_金融 (-a-bizfinance-input): For voice input of daily reports, email creation, etc.

Insurance
In addition to the "General" language model, terms and expressions used in the insurance industry are added.
  • 会話_保険 (-a-bizinsurance): For transcribing conversations during face-to-face sales, etc.
  • 音声入力_保険 (-a-bizinsurance-input): For voice input of daily reports, email creation, etc.

Medical Conference
In addition to the "General" language model, various medical specialties, medical terms, and expressions used in medical industry meetings are added. It covers many disease names, drug names, hospital names, surgery names, place names, etc. Exclusively for "会話".
  • 会話_医療 (-a-medical): For transcribing medical industry meetings, doctor-patient conversations during consultations, medical-related videos, face-to-face sales conversations, MR sales daily reports, etc.

Medical General
Specialized for dictation of various medical documents such as electronic medical record findings, medical certificates, medical information provision documents, referral letters, care records, pharmacist medication guidance documents, etc. Exclusively for "音声入力".
  • 音声入力_医療 (-a-medical-input): For dictation by various medical specialists including doctors and pharmacists

List of Class Names for Japanese Language Models

The class names defined in the Japanese speech recognition engines are listed below. Classes are used when registering words; for details, see Word Registration. API users cannot add new classes.

Class Name | General | Large-scale General | Finance | Insurance | Pharmaceutical | Medical | Electronic Medical Records | Notes
固有名詞
名前: Represents surname
名前(名): Represents given name
名前: Represents full name (*1)
駅名
地名
会社名
部署名
役職名
記号
括弧開き
括弧閉じ
元号
病名
薬品名
病院名
手術名
地名_区町村
地名_支庁市郡
  • (*1) The "名前" class represents a full name in the Electronic Medical Records language model, but a surname in the other language models.
caution

We plan to consolidate and reorganize medical engines on February 1, 2025. After the consolidation, the list of class names will be as follows:

  • New "Medical Conference" and "Medical General" language models have been added
  • The "Pharmaceutical" language model will be integrated into the "Medical Conference" language model
Class Name | General | Large-scale General | Finance | Insurance | Medical Conference | Medical General | Notes
固有名詞
名前: Represents surname
名前(名): Represents given name
名前: Represents full name (*1)
駅名
地名
会社名
部署名
役職名
記号
括弧開き
括弧閉じ
元号
病名
薬品名
病院名
手術名
地名_区町村
地名_支庁市郡

List of Class Names for Chinese Language Model

This is a list of class names defined in the Chinese speech recognition engine.

Class Name | General
固有名词一般

List of Class Names for Korean Language Model

This is a list of class names defined in the Korean speech recognition engine.

Class Name | General
固有名詞
地名
駅名
会社名
名前(姓)
名前(名)

Supported Sampling Rates

All speech recognition engines support 16kHz. Some engines support 8kHz sampling rate, which is commonly used in telephone communications. For information on sampling rates, please see Sampling Rate in the audio format section.

tip
  • When recording audio yourself, record at 16kHz sampling rate and use a 16kHz engine.
  • For telephone audio, use an 8kHz engine.
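
The sampling-rate rules from the engine list and its footnotes can be expressed as a small check. This is a rough sketch under stated assumptions: `supports_rate` and the three sets are hypothetical names, not part of AmiVoice API, and the sketch models only the 8 kHz and 16 kHz cases named in this section.

```python
# Engines listed with 8k support in the engine list; every engine supports 16k.
EIGHT_KHZ_ENGINES = {"-a-general", "-a-general-en", "-a-general-zh", "-a-general-ko"}
# (*3): 8k engines for English, Chinese, and Korean are not on the asynchronous API.
NO_ASYNC_8K = {"-a-general-en", "-a-general-zh", "-a-general-ko"}
# (*1): Korean is not supported on the asynchronous API at all.
NO_ASYNC = {"-a-general-ko"}


def supports_rate(engine: str, rate_hz: int, asynchronous: bool = False) -> bool:
    """Check an engine against the documented sampling-rate rules."""
    if rate_hz not in (8000, 16000):
        raise ValueError("this sketch only models 8 kHz and 16 kHz")
    if asynchronous and engine in NO_ASYNC:
        return False
    if rate_hz == 16000:
        return True
    if engine not in EIGHT_KHZ_ENGINES:
        return False
    return not (asynchronous and engine in NO_ASYNC_8K)


print(supports_rate("-a-general", 8000))                        # True
print(supports_rate("-a-bizfinance", 8000))                     # False
print(supports_rate("-a-general-en", 8000, asynchronous=True))  # False
```

For telephone audio this means 会話_汎用 (-a-general) is the only Japanese engine in the list that accepts 8 kHz input.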

Connection Engine Names

For the Connection Engine Name (grammarFileNames) in the Request Parameters, specify the string in the "Connection Engine Name" column of the table. For engine names published in AmiVoice API Private, please see your My Page.
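
As an illustration of passing a connection engine name as `grammarFileNames`, the sketch below validates a name against the engine list and returns it as a request parameter. `KNOWN_ENGINES` and `request_parameters` are hypothetical names, and the sketch deliberately stops short of sending anything: how parameters are encoded and transmitted is defined in the Request Parameters reference, not here.

```python
# Connection engine names copied from the engine list in this section.
# Engines published in AmiVoice API Private are not included in this set.
KNOWN_ENGINES = {
    "-a-general", "-a-medgeneral", "-a-bizmrreport", "-a-bizfinance",
    "-a-bizinsurance", "-a-general-input", "-a-medgeneral-input",
    "-a-bizmrreport-input", "-a-bizinsurance-input", "-a-bizfinance-input",
    "-a-medkarte-input", "-a-general-en", "-a-general-zh", "-a-general-ko",
    "-a-medical", "-a-medical-input",
}


def request_parameters(engine: str) -> dict:
    """Return the engine selection as a request-parameter mapping."""
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown connection engine name: {engine!r}")
    return {"grammarFileNames": engine}


print(request_parameters("-a-general-input"))
```

Validating the name locally catches typos such as a missing "-input" suffix before a request is made.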

Costs

Costs vary depending on the engine. For details, please see AmiVoice API Pricing.

About Recognition Accuracy

Words that are not in the vocabulary of the speech recognition engine will not be output. If a word not in the vocabulary is spoken, it will be recognized as a word with a similar pronunciation, a combination of shorter words with similar pronunciations, or simply as an incorrect word. Due to computational resource and time constraints, each speech recognition engine has a fixed vocabulary. General-purpose engines such as "会話_汎用" and "音声入力_汎用" have many vocabulary words registered to be usable in various scenarios, but they do not include words specific to particular industries or uses.

For specialized terms used in industries such as medical, finance, and insurance, an engine specialized for that industry achieves higher recognition rates for the vocabulary of that field. For words specific to a particular organization, use word registration.

tip

The AmiVoice Tech Blog reports comparisons of recognition rates between general-purpose and domain-specific engines. See "Comparing the Speech Recognition Accuracy of AmiVoice's Domain-Specific Engines (General vs. Electronic Medical Records)" and "Comparison of Recognition Results between Voice Input Engine and Conversation Engine with the Same Utterance".