Components of Word Registration
A word registration consists of a "notation", a "pronunciation", and a "class". The "notation" and "pronunciation" are mandatory; the "class" is optional. The following explains each component.
Item | Description | Required | Example |
---|---|---|---|
Notation | The string obtained as a result of speech recognition when the word is spoken. | ● | AmiVoice |
Pronunciation | Information representing how the word is pronounced. The method of describing the pronunciation differs for each language. | ● | あみぼいす |
Class | A classification used to specify the category or type of the word. This classification allows the speech recognition system to distinguish words with the same pronunciation used in different contexts. Classes are defined for each engine, and API users cannot add classes. | | 固有名詞 |
The English engine does not support word registration.
Overview of Word Registration
For example, if you want to register the word "パレオパラドキシア" because it's not being recognized, register the notation and pronunciation pair as follows. Separate the notation and pronunciation with a space. If you also want to set a class, please see How to Set Class.
パレオパラドキシア ぱれおぱらどきしあ
Setting multiple pronunciations for the same notation
You can set multiple pronunciations for one notation.
For example, you can give the notation "AMI" two pronunciations, "あみ" and "あどばんすとめでぃあ".
AMI あみ
AMI あどばんすとめでぃあ
Setting the same pronunciation for multiple notations
You can set the same pronunciation for multiple different notations. This does not cause an error, but which notation is output is undefined, so doing this intentionally is not recommended.
For example, you can set notations like "AMI" and "AmiVoice" for the pronunciation "あみ".
AMI あみ
AmiVoice あみ
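Because the chosen notation is undefined when several notations share a pronunciation, it can help to check a word list for such collisions before registering it. The following is a minimal sketch, not part of the AmiVoice API: the function name `find_shared_pronunciations` is hypothetical, and it assumes each entry is a "notation pronunciation" line as shown in the examples above.

```python
from collections import defaultdict


def find_shared_pronunciations(lines):
    """Return {pronunciation: [notations]} for pronunciations used by 2+ notations."""
    by_pronunciation = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        notation, pronunciation = parts[0], parts[1]
        if notation not in by_pronunciation[pronunciation]:
            by_pronunciation[pronunciation].append(notation)
    # Keep only pronunciations that map to more than one notation.
    return {p: n for p, n in by_pronunciation.items() if len(n) > 1}


entries = ["AMI あみ", "AmiVoice あみ", "AMI あどばんすとめでぃあ"]
conflicts = find_shared_pronunciations(entries)
print(conflicts)  # {'あみ': ['AMI', 'AmiVoice']}
```

An empty result means no pronunciation is shared, so the undefined-selection case described above does not arise for that list.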
Notation
The "notation" is the string you want to output for the spoken audio.
Special Characters Usable in Notation
Among the characters that can be used in the notation, there are symbols that have special functions.
Character | Character Name | Description |
---|---|---|
_ | Underscore | Symbol that outputs as a space in speech recognition results |
It is not possible to output an underscore (_) as a speech recognition result.
Characters That Cannot Be Registered in Notation
Strings containing the following characters cannot be registered in the notation.
Character | Character Name |
---|---|
\| | Vertical bar |
(space) | Space |
: | Colon |
Although spaces cannot be used in a notation, an underscore (_) in a registered notation is output as a space in the speech recognition results.
For example, if you want to output "Advanced Media" when "あみ" is spoken, register the word as "Advanced_Media あみ".
Advanced_Media あみ
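The notation rules above can be checked locally before a word is registered. The sketch below is an illustration only: `check_notation` and `displayed_form` are hypothetical helper names, and the check assumes the forbidden characters are exactly the half-width vertical bar, space, and colon listed in the table.

```python
# Characters that cannot appear in a notation, per the table above.
FORBIDDEN = {"|": "vertical bar", " ": "space", ":": "colon"}


def check_notation(notation):
    """Return the names of forbidden characters found in the notation."""
    return [name for ch, name in FORBIDDEN.items() if ch in notation]


def displayed_form(notation):
    """Preview the recognition output: underscores are output as spaces."""
    return notation.replace("_", " ")


print(check_notation("Advanced Media"))  # ['space']
print(check_notation("Advanced_Media"))  # []
print(displayed_form("Advanced_Media"))  # Advanced Media
```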
Pronunciation
"Pronunciation" refers to how the word is pronounced (how it's spoken).
How to Describe Pronunciation for Each Language
The method of describing pronunciation differs for each language. The following explains the description method for each language.
Japanese
For Japanese, describe using hiragana or katakana.
Chinese
For Chinese, describe using pinyin with tones represented by numbers. For example, "我们" should be described as "wo3men5".
我们 wo3men5
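A rough format check for numbered pinyin can catch missing tone digits before registration. This is only a sketch under stated assumptions: it assumes lowercase ASCII letters, tone digits 1 to 5 (with 5 as the neutral tone, as in "wo3men5"), and an optional underscore between syllables, which this page describes as the silence symbol for Chinese. It does not verify that each syllable is a real pinyin syllable, and `looks_like_numbered_pinyin` is a hypothetical name.

```python
import re

# One or more syllables, each made of letters followed by a tone digit 1-5,
# optionally separated by a single underscore.
PINYIN_RE = re.compile(r"[a-z]+[1-5](?:_?[a-z]+[1-5])*")


def looks_like_numbered_pinyin(pronunciation):
    """Return True if every syllable ends with a tone digit from 1 to 5."""
    return PINYIN_RE.fullmatch(pronunciation) is not None


print(looks_like_numbered_pinyin("wo3men5"))  # True
print(looks_like_numbered_pinyin("wo3men"))   # False (last syllable has no tone)
print(looks_like_numbered_pinyin("women"))    # False (no tone digits)
```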
Korean
For Korean, describe using Hangul.
Special characters for pronunciation
Some of the characters that can be used in a pronunciation have special functions. The following shows the special characters available for each language.
Japanese
Character | Character name | Description |
---|---|---|
. | Half-width period | Symbol for syllable separation and suppression of long vowels |
_ | Half-width underscore | Symbol representing silence |
- The AmiVoice Tech Blog explains how to use these special characters, particularly the half-width period in pronunciations. For details, please see the following: 【For intermediate users】About automatic conversion of word pronunciation in AmiVoice
- By using an underscore (_), a word is more likely to be recognized as one continuous word even if there is a slight silence in the middle of it. For example, if you register "AmiVoice青果店 あみぼいす_せいかてん", it is more likely to be recognized as "AmiVoice青果店" even if there is a momentary pause between "あみぼいす" and "せいかてん".
AmiVoice青果店 あみぼいす_せいかてん
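The following sketch shows one way to build a Japanese pronunciation with a silence marker and to sanity-check its characters. It is an illustration, not AmiVoice's own validation: the helper names are hypothetical, and the allowed set is assumed to be full-width hiragana and katakana, the prolonged sound mark (ー), and the two special characters from the table above. Whether the engine accepts other characters, such as half-width katakana, is not covered here.

```python
import re

# Hiragana (U+3041-U+3096), katakana (U+30A1-U+30FA), the prolonged sound
# mark (U+30FC), plus the special characters "." and "_".
KANA_PRONUNCIATION_RE = re.compile(r"[\u3041-\u3096\u30A1-\u30FA\u30FC._]+")


def join_with_silence(*parts):
    """Join pronunciation parts with "_", the symbol representing silence."""
    return "_".join(parts)


def is_kana_pronunciation(pronunciation):
    """Return True if the pronunciation uses only the assumed allowed characters."""
    return KANA_PRONUNCIATION_RE.fullmatch(pronunciation) is not None


pronunciation = join_with_silence("あみぼいす", "せいかてん")
print(pronunciation)                         # あみぼいす_せいかてん
print(is_kana_pronunciation(pronunciation))  # True
print(is_kana_pronunciation("AmiVoice"))     # False
```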
Chinese and Korean
Character | Character name | Description |
---|---|---|
_ | Half-width underscore | Symbol representing silence |
Class
In the AmiVoice API, the classification that specifies the category or type of a word is called a class. Classes allow the speech recognition system to distinguish words with the same pronunciation used in different contexts. Classes are defined for each speech recognition engine. For example, in the case of the "会話_汎用" engine (-a-general), the following classes are defined. For details, please see the list of class names for Japanese language models of the speech recognition engine.
- 固有名詞
- 名前
- 名前(名)
- 駅名
- 地名
- 会社名
- 部署名
- 役職名
- 記号
- 括弧開き
- 括弧閉じ
- 元号
Note:
- In the "会話_汎用" engine, the 名前 class represents surnames, and the 名前(名) class represents first names.
- If a non-existent class name is specified, the word is treated as if no class name had been specified.
For example, if you specify a word as the "名前" class, that word will be more easily recognized in contexts where personal names are spoken. Conversely, it will be less likely to be recognized in contexts other than where personal names are spoken, which can reduce problems of incorrect recognition of words with the same pronunciation in different contexts. If there is a class that fits the word you are trying to register, please try to set the class whenever possible.
If a full name is not recognized well even after registering it as a word, here are some strategies:
- Split the full name into surname and given name, and register the surname in the 名前 class and the given name in the 名前(名) class. This makes the name easier to recognize even if there is silence or a filler between the surname and given name when spoken. On the other hand, registering other homophonic names makes misrecognition more likely. (For example, if you register "山田" in the 名前 class, and both "太郎" and "太朗" with the pronunciation "たろう" in the 名前(名) class, the speech "やまだたろう" might be recognized as "山田太朗" when you wanted "山田太郎".)
- Insert an underscore (_), which represents silence, between the surname and given name in the pronunciation. This makes the name easier to recognize even if there is a slight silence between the surname and given name. However, it will not be recognized correctly if there is a filler between the surname and given name.
How to set a class
The class is written after the "notation" and "pronunciation", separated from them by a space. For example, if the station name "アソーク駅" is not recognized and you want to specify the class 駅名, write it as follows:
アソーク駅 あそーくえき 駅名
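The line format described above (notation, pronunciation, and an optional class separated by single spaces) can be produced and read back with a few lines of code. This is a minimal sketch: `format_entry` and `parse_entry` are hypothetical names, and the code only handles the text format shown on this page, not how the line is sent to the API.

```python
def format_entry(notation, pronunciation, word_class=None):
    """Build one registration line; the class is appended only when given."""
    fields = [notation, pronunciation]
    if word_class:
        fields.append(word_class)
    return " ".join(fields)


def parse_entry(line):
    """Split a registration line into (notation, pronunciation, class or None)."""
    parts = line.split(" ")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 space-separated fields: {line!r}")
    word_class = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], word_class


print(format_entry("アソーク駅", "あそーくえき", "駅名"))
# アソーク駅 あそーくえき 駅名
print(parse_entry("パレオパラドキシア ぱれおぱらどきしあ"))
# ('パレオパラドキシア', 'ぱれおぱらどきしあ', None)
```

Splitting on a single space works here because a notation cannot contain spaces, so the field count is unambiguous.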
Please also see Explanation of "Classes" selectable with AmiVoice API's word registration function (General-purpose engine) on the AmiVoice Tech Blog.
Special Word Registration
There are special types of word registration that are only supported in some engines.
Filler Words
The "音声入力_氏名" engine is used for recognizing only names, and the "音声入力_住所" engine is used for recognizing only addresses. These engines do not have filler words preset, but if words other than names or addresses are also spoken, users can register filler words themselves as needed.
The classes that can be used for filler words are as follows:
Class Name | Description |
---|---|
フィラー(文頭) | Class used when you want to insert words like "えー", "わたしは", "ぼくは" before a full name or surname, or words like "えーと", "住所は" before an address |
フィラー(文末) | Class used when you want to insert words like "です" or "ともうします" after a full name or first name, or words like "です" after an address |
When registering filler words, please enclose the notation in half-width percent signs (%). For the pronunciation, write the word's pronunciation as usual without adding percent signs. For example, if you want to register the word "あのー" in the "フィラー(文頭)" class, write it as follows:
%あのー% あのー フィラー(文頭)
In "フィラー(文頭)" or "フィラー(文末)", you can register not only general filler words like "えー" or "あのー", but also words (short phrases) that are used only at the beginning or end of a sentence and that you want to treat as fillers rather than names or addresses.
For example, suppose that in the "音声入力_氏名" engine you register "私は" and "です" as fillers at the beginning and end of the sentence respectively.
%私は% わたしは フィラー(文頭)
%です% です フィラー(文末)
Now, let's say you speak "わたしは、やまだあみです".
わたしは、やまだあみです
The recognition result of this voice will be as follows, with the words registered as fillers automatically removed:
ヤマダアミ
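The filler registration format described above (notation enclosed in half-width percent signs, plain pronunciation, then the filler class) can be generated as follows. This is a sketch for illustration: `format_filler` and its `position` argument are hypothetical, and only the two class names listed in the table above are used.

```python
# Filler class names, copied from the table above.
FILLER_CLASSES = {
    "head": "フィラー(文頭)",  # fillers spoken before a name or address
    "tail": "フィラー(文末)",  # fillers spoken after a name or address
}


def format_filler(word, pronunciation, position):
    """Build a filler line: %notation% pronunciation class."""
    if position not in FILLER_CLASSES:
        raise ValueError("position must be 'head' or 'tail'")
    return f"%{word}% {pronunciation} {FILLER_CLASSES[position]}"


print(format_filler("私は", "わたしは", "head"))  # %私は% わたしは フィラー(文頭)
print(format_filler("です", "です", "tail"))      # %です% です フィラー(文末)
```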