Keyword Biasing for End to End Engine

For keyword biasing (boosting) in the End to End engine, you can specify a "written", an "alternative written", and a "biasing level" for each word. Only the "written" is required. Each component is explained below.

| Item | Description | Required | Example |
| --- | --- | --- | --- |
| Written | The string you want to make more likely to appear as a speech recognition result. | Yes | AmiVoice |
| Alternative Written | The string you want to replace with the written. It can also be treated as pronunciation information for the written. The alternative written is also made more likely to appear as a recognition result candidate before being replaced with the written. | No | アミボイス |
| Biasing Level | Specifies the strength of keyword biasing. Values from 0 to 1 can be specified, with a default of 0.5 if not specified. | No | 0.7 |

Overview of Keyword Biasing

For example, if "AmiVoice" is not being recognized and you want to boost it, you can register the written, alternative written, and biasing level as follows. The written and the alternative written, and the alternative written and the biasing level, are each separated by a single space.

AmiVoice アミボイス 0.6

In this case, not only will "AmiVoice" be more likely to appear as a recognition result (with a biasing level of 0.6), but "アミボイス" will also be more likely to appear as a recognition result candidate, and any "アミボイス" in the recognition result will then be replaced with "AmiVoice".

The alternative written can be omitted. In that case, put two consecutive spaces between the written and the biasing level. The biasing level can also be omitted, in which case the default value of 0.5 is applied.
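
The entry format described above can be sketched as a small helper. This is a minimal illustration, not part of the service: the function name `format_entry` and its parameters are hypothetical, and it only builds the string for a single entry.

```python
from typing import Optional


def format_entry(written: str,
                 alternative: Optional[str] = None,
                 level: Optional[float] = None) -> str:
    """Build one keyword biasing entry (hypothetical helper)."""
    if level is None:
        # Biasing level omitted: the default of 0.5 applies on the engine side.
        return written if alternative is None else f"{written} {alternative}"
    # An omitted alternative written leaves an empty middle field,
    # which yields the two consecutive spaces before the biasing level.
    return f"{written} {alternative or ''} {level}"


print(format_entry("AmiVoice", "アミボイス", 0.6))
print(format_entry("AmiVoice", level=0.6))
```

The second call shows the case where the alternative written is omitted, so the written and the biasing level are separated by two spaces.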

info

Setting Multiple Alternative Writtens for the Same Written

You can set multiple alternative writtens for one written. However, you cannot apply different biasing levels to the same written. If multiple entries with the same written are included in a keyword biasing request, the biasing level applied to the first one (or the default 0.5 if the first one omits the biasing level specification) will be applied to all entries with that written.

For example, to have "AMI" output for utterances pronounced either "アミ" or "アドバンストメディア", you can register "AMI" as the written with both as alternative writtens.

AMI  0.7
AMI アミ
AMI アドバンストメディア

In this case, a biasing level of 0.7 will also be applied to "AMI アミ" and "AMI アドバンストメディア".
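
The "first entry wins" rule can be modeled on the client side to preview which biasing level each entry will effectively receive. The sketch below assumes entries are available as a list of strings in the format shown above; `parse_entry` and `effective_levels` are hypothetical names, and the engine's own handling is not visible from this documentation.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_LEVEL = 0.5  # default biasing level stated in this document


def parse_entry(entry: str) -> Tuple[str, Optional[str], Optional[float]]:
    """Split 'written[ alternative[ level]]' on single spaces."""
    parts = entry.split(" ")
    written = parts[0]
    # An empty middle field means the alternative written was omitted.
    alternative = parts[1] if len(parts) > 1 and parts[1] else None
    level = float(parts[2]) if len(parts) > 2 else None
    return written, alternative, level


def effective_levels(entries: List[str]) -> List[Tuple[str, Optional[str], float]]:
    """Apply the level of the first entry for each written to all its entries."""
    first_level: Dict[str, float] = {}
    result = []
    for entry in entries:
        written, alternative, level = parse_entry(entry)
        if written not in first_level:
            first_level[written] = DEFAULT_LEVEL if level is None else level
        result.append((written, alternative, first_level[written]))
    return result


for row in effective_levels(["AMI  0.7", "AMI アミ", "AMI アドバンストメディア"]):
    print(row)
```

Because the level is taken from the first entry, putting the entry that carries the intended biasing level first avoids silently falling back to 0.5.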

note

Please do not register the same alternative written for multiple writtens.
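
A client-side check can catch this before a request is sent. The following is a minimal sketch, assuming the entries are already available as (written, alternative written) pairs; the function name `conflicting_alternatives` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def conflicting_alternatives(
    pairs: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Return each alternative written that is registered for more than one written."""
    writtens_by_alternative = defaultdict(set)
    for written, alternative in pairs:
        if alternative:  # entries without an alternative written are ignored
            writtens_by_alternative[alternative].add(written)
    return {
        alternative: sorted(writtens)
        for alternative, writtens in writtens_by_alternative.items()
        if len(writtens) > 1
    }


pairs = [("AMI", "アミ"), ("Advanced_Media", "アミ"), ("AmiVoice", "アミボイス")]
print(conflicting_alternatives(pairs))
```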

Written

The "written" is the string you want to make more likely to appear.

Special Characters Usable in Written

There are special characters that have specific functions when used in the written.

| Character | Character Name | Description |
| --- | --- | --- |
| _ | Underscore | Symbol representing word separation |

note

It is not possible to output an underscore (_) as a speech recognition result.

tip

When a word ending with a half-width alphanumeric character is followed by a word starting with a half-width alphanumeric character, a half-width space is inserted between the two words.

For example, if you want to output "Advanced Media" when "あみ" is spoken, use keyword biasing like "Advanced_Media アミ". (The biasing level is optional)

Advanced_Media アミ

However, a half-width space is not inserted in the following cases, because the characters on the two sides of the underscore are not both half-width alphanumeric:

アドバンスト_メディア アミ
トリプル_W トリプルダブル
Triple_ダブル トリプルダブル

Also, if you don't use an underscore in the written as shown below, it will be interpreted as a continuous word, so no half-width space will be inserted.

AdvancedMedia アミ
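
The spacing rule can be modeled as follows. This is a sketch of the behavior described in this section, not the engine's implementation: `render_written` is a hypothetical name, "half-width alphanumeric" is assumed to mean ASCII letters and digits, and words are assumed to be joined directly when no space is inserted.

```python
def is_halfwidth_alnum(ch: str) -> bool:
    """Assumption: half-width alphanumeric means an ASCII letter or digit."""
    return ch.isascii() and ch.isalnum()


def render_written(written: str) -> str:
    """Join the underscore-separated words of a written as they would be output."""
    words = written.split("_")
    output = words[0]
    for word in words[1:]:
        # A half-width space is inserted only between two half-width
        # alphanumeric characters; otherwise the words are joined directly.
        if output and word and is_halfwidth_alnum(output[-1]) and is_halfwidth_alnum(word[0]):
            output += " "
        output += word
    return output


for written in ["Advanced_Media", "アドバンスト_メディア", "トリプル_W", "Triple_ダブル", "AdvancedMedia"]:
    print(written, "->", render_written(written))
```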

Characters That Cannot Be Registered in Written

Strings containing the following characters cannot be registered in the written:

| Character | Character Name |
| --- | --- |
| \| | Vertical bar |
| (space) | Space |
| : | Colon |

Alternative Written

The "alternative written" is a string you want to replace with the "written". Like the written, the alternative written is also made more likely to appear as a recognition result candidate before being replaced with the written. While it can be used to provide pronunciation information for the "written", it has a different nature from the "pronunciation" in word registration for the Hybrid engine.

For example, let's say you want to make the word "雲母" (mica) easier to recognize. Since "雲母" is normally read as "うんも", it can be a candidate for recognition results of the utterance "うんも" without using the alternative written to provide pronunciation information. Therefore, keyword biasing as follows will make "雲母" more likely to be recognized when "うんも" is spoken. The numeric value for the biasing level is just an example.

雲母  0.8

Now, if you also want the utterance "きらら" to be recognized as "雲母", you can use the alternative written for keyword biasing as follows:

雲母 キララ 0.8

In this case, both utterances "うんも" and "きらら" will be more likely to be recognized as "雲母".

tip

The alternative written is a mechanism where the End to End engine replaces it with the "written" word when it produces the alternative written as a recognition result. Therefore, if the engine never considers the alternative written as a recognition result candidate, keyword biasing will not work. For Japanese End to End engines, proper nouns are more likely to have katakana transcriptions as recognition result candidates, so registering katakana transcriptions as alternative writtens can be expected to work appropriately for keyword biasing.

For example, if you want the utterance "ぱれおぱらどきしあ" to be recognized as "絶滅哺乳類" (extinct mammal), specify "絶滅哺乳類" as the written and register the katakana transcription "パレオパラドキシア" as the alternative written. This is more likely to work than registering the hiragana "ぱれおぱらどきしあ".

note

There is no guarantee that every part of the recognition result matching the alternative written will be replaced with the written, so the alternative written may still appear in the recognition results. If you need a certain string to always be replaced with a different string, it is recommended to post-process the recognition results on the client system side.
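
Such client-side post-processing could look like the sketch below. It is one possible approach, not something provided by the service; the function name `apply_replacements` and the replacement table are illustrative. All replacements are applied in a single pass so that the output of one replacement is never replaced again, and longer source strings are tried first.

```python
import re
from typing import Dict


def apply_replacements(text: str, replacements: Dict[str, str]) -> str:
    """Replace every occurrence of each key with its value in one pass."""
    if not replacements:
        return text
    # Longer keys come first so that "アミボイス" wins over "アミ".
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


table = {"アミボイス": "AmiVoice", "アミ": "AMI"}
print(apply_replacements("アミボイスとアミ", table))
```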

Characters That Cannot Be Registered in Alternative Written

Strings containing the following characters cannot be registered in the alternative written:

| Character | Character Name |
| --- | --- |
| \| | Vertical bar |
| (space) | Space |
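
The character restrictions for the written and the alternative written can be checked before registration. The sketch below is a hypothetical client-side validator; it assumes "space" means the half-width space (U+0020), which this document does not state explicitly.

```python
from typing import List

# Characters that cannot be registered, per field.
# Assumption: "space" refers to the half-width space only.
FORBIDDEN = {
    "written": set("| :"),
    "alternative": set("| "),
}


def forbidden_characters(value: str, field: str) -> List[str]:
    """Return the forbidden characters found in value for the given field."""
    return sorted(set(value) & FORBIDDEN[field])


print(forbidden_characters("Advanced Media", "written"))
print(forbidden_characters("A:B", "alternative"))
```

Note that the colon is rejected only in the written, so the second call reports no problem.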

Special Characters Usable in Alternative Written

There are special characters that have specific functions when used in the alternative written.

| Character | Character Name | Description |
| --- | --- | --- |
| _ | Underscore | Symbol representing word separation |

tip

If the underscore "_" is used in the written or alternative written to represent word separation, a recognition result candidate must be separated into words in the same way to be considered a match. For example, suppose keyword biasing is specified as follows and "ぱれおぱらどきしあ" is spoken. If the recognition result candidate is the single continuous word "パレオパラドキシア", it is judged to be different from the two words "パレオパラ" and "ドキシア" given in the alternative written, and this keyword biasing will not work.

絶滅哺乳類 パレオパラ_ドキシア 0.5

Conversely, if the written or alternative written is made into a continuous word without using an underscore, keyword biasing will work even if the recognition result candidate is separated into multiple words.
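
The matching behavior described in this tip can be modeled roughly as follows. This is only an illustrative model, assuming a recognition result candidate is available as a list of words; the engine's actual matching logic is not documented here, and `matches` is a hypothetical name. The model requires a word boundary in the candidate wherever the registered string has an underscore, and allows any segmentation elsewhere.

```python
from typing import List


def matches(registered: str, candidate_words: List[str]) -> bool:
    """Check whether a registered string matches a run of candidate words."""
    registered_words = registered.split("_")
    flat = "".join(registered_words)
    if not flat:
        return False
    text = "".join(candidate_words)

    # Character offsets at which the candidate has a word boundary.
    boundaries = {0}
    offset = 0
    for word in candidate_words:
        offset += len(word)
        boundaries.add(offset)

    # Offsets inside the registered string where an underscore demands a boundary.
    required = []
    offset = 0
    for word in registered_words[:-1]:
        offset += len(word)
        required.append(offset)

    for start in boundaries:
        end = start + len(flat)
        if (end in boundaries
                and text[start:end] == flat
                and all(start + r in boundaries for r in required)):
            return True
    return False


print(matches("パレオパラ_ドキシア", ["パレオパラドキシア"]))
print(matches("パレオパラドキシア", ["パレオパラ", "ドキシア"]))
```

Under this model, registering the continuous form without an underscore is the more permissive choice, which is consistent with the guidance above.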

Biasing Level

The "biasing level" specifies the strength of keyword biasing. Values from 0 to 1 can be specified. A value of 0 represents no emphasis, meaning the word becomes neither more likely nor less likely to appear. A value of 1 represents strong keyword biasing, potentially at the expense of overall recognition accuracy. If not specified, the default value is 0.5.
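
A client can validate the biasing level before building an entry. The following is a minimal sketch with a hypothetical helper name, `resolve_biasing_level`; it only encodes the range and default stated above and says nothing about how the engine treats out-of-range values.

```python
from typing import Optional

DEFAULT_BIASING_LEVEL = 0.5  # applied when no level is specified


def resolve_biasing_level(level: Optional[float] = None) -> float:
    """Return the level to use, rejecting values outside the 0 to 1 range."""
    if level is None:
        return DEFAULT_BIASING_LEVEL
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"biasing level must be between 0 and 1, got {level}")
    return float(level)


print(resolve_biasing_level())
print(resolve_biasing_level(0.7))
```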