Rule Grammar

Speech Recognition Using Grammar Files

The "音声入力_ルール" engine (-a-rule-input-private), available in AmiVoice API Private, performs speech recognition using grammar files created by users. It is effective where utterances are expected to follow fixed rules over a limited vocabulary, for example when verifying a membership number while recognizing customer information in an IVR system.

Grammar File Format

The "音声入力_ルール" engine can use grammar files written in JSpeech Grammar Format (JSGF) or Speech Recognition Grammar Specification (SRGS). Both formats let you describe the phrases (sets of words) that the speech recognition engine should recognize, restricted to what the grammar allows, together with tags (strings returned to the application as accompanying information when a phrase is recognized). When creating a new grammar file, we recommend JSGF. This page gives a brief explanation of both formats; for the detailed specifications, refer to the official JSGF and SRGS specification pages.

JSGF Grammar File Simple Specification

JSGF grammar files consist of the following elements:

Header Information

Describes character encoding information and country/language information. The format of the "Header Information" is as follows:

#JSGF V1.0;
#JSGF V1.0 <character encoding information>;
#JSGF V1.0 <character encoding information> <country/language information>;

Character encoding information and country/language information can be omitted.

Example:

#JSGF V1.0;
#JSGF V1.0 MS932;
#JSGF V1.0 MS932 ja-JP;
#JSGF V1.0 UTF-8;

Grammar Statement

Describes the grammar name. The format of the "grammar statement" is as follows:

grammar [Grammar Name];

For the grammar name, specify the grammar file name without its path and extension.

Example:

grammar Sample;
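
As a rough illustration of how the header information and the grammar statement fit together, the following Python sketch derives both lines from a grammar file path. The helper name `jsgf_preamble` and its parameters are hypothetical and are not part of AmiVoice API; the function only assembles text in the formats described above.

```python
from pathlib import Path
from typing import Optional


def jsgf_preamble(path: str, encoding: Optional[str] = None,
                  locale: Optional[str] = None) -> str:
    """Return the header line and grammar statement for a grammar file path."""
    if locale and not encoding:
        # The header formats listed above place the locale after an encoding.
        raise ValueError("locale requires an encoding")
    header = " ".join(p for p in ("#JSGF V1.0", encoding, locale) if p) + ";"
    # Grammar name: the file name without its path and extension.
    name = Path(path).stem
    return f"{header}\ngrammar {name};"


print(jsgf_preamble("grammars/Sample.gram", "UTF-8"))
```

Calling the helper with only a path produces the shortest header form, `#JSGF V1.0;`.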

Rule Definition Statement

Describes the actual definition of phrases and tags that you want the speech recognition engine to recognize. Multiple rule definition statements can be written in a single grammar file. There are two types of rules: public rules and private rules. Public rules define phrases or tags that you want the speech recognition engine to recognize on their own. Private rules are referenced by other rules. The format of the "Rule Definition Statement" for public rules is as follows:

public <[Rule Name]> = [Rule Definition];

The format of the "Rule Definition Statement" for private rules is as follows:

<[Rule Name]> = [Rule Definition];

For rule definitions, please see Rule Definition.

Example:

public <sample1> = おはよう <sample2>;
<sample2> = AmiVoice\あみぼいす;
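
To make the distinction between public and private rules concrete, here is a minimal Python sketch that lists the rule names in a grammar text by looking for the two statement formats shown above. The function name `classify_rules` is hypothetical, and the regular expression is a simplification that assumes each rule definition statement starts on its own line.

```python
import re

# Start of a rule definition statement: "public <a> =" or "<b> =".
RULE_START = re.compile(r"^\s*(public\s+)?<(\w+)>\s*=", re.MULTILINE)


def classify_rules(grammar_text: str) -> dict:
    """Group rule names by whether they are declared public or private."""
    rules = {"public": [], "private": []}
    for match in RULE_START.finditer(grammar_text):
        kind = "public" if match.group(1) else "private"
        rules[kind].append(match.group(2))
    return rules


sample = """public <sample1> = おはよう <sample2>;
<sample2> = AmiVoice\\あみぼいす;
"""
print(classify_rules(sample))
```

A check like this can help confirm that every rule meant to be recognized on its own is actually declared with `public`.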

Rule Definition

Rule definitions consist of the following elements:

Words

Describe the words you want the speech recognition engine to recognize. Words can be described in series or in parallel. The "notation" and "pronunciation" of a word are separated by a \ (backslash). If there are multiple "pronunciations" for one "notation", each pronunciation is separated by a / (slash). The format is as follows:

Symbol   Description
( )      Indicates "grouping"
|        Indicates "parallel"
\        Indicates the separator between notation and pronunciation
/        Indicates the separator between pronunciations

Example of series:

・・・ AmiVoice\あみぼいす/あみ 音声認識\おんせいにんしき エンジン ・・・

Example of parallel:

・・・ ( AmiVoice\あみぼいす/あみ | 音声認識\おんせいにんしき | エンジン ) ・・・

Please also check the following regarding how to describe words:

  • The characters that can be specified for "pronunciation" are limited to hiragana, "ー" (long vowel mark), and "." (period).
  • Whether or not you write a "pronunciation" with "ー" (long vowel mark), the speech recognition engine internally treats both spellings as the same "pronunciation". Keep this identification rule in mind so that you do not register duplicate words. Example:
      "かー" = "かあ", "きゃー" = "きゃあ"
      "きー" = "きい"
      "くー" = "くう", "きゅー" = "きゅう"
      "けー" = "けい" = "けえ"
      "こー" = "こう" = "こお", "きょー" = "きょう" = "きょお", etc.
  • When "は" or "へ" is used in a "pronunciation", it is always treated as "h a" or "h e". It is never treated as "w a" or "e".
  • Placing a "." (period) in a "pronunciation" suppresses the identification rule above at that position. Example:
      "やまのうち" → "やまのーち"
      "やまの.うち" → "やまのうち"
      "りくうんきょく" → "りくーんきょく"
      "りく.うんきょく" → "りくうんきょく", etc.
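
The identification rule for long vowels can be approximated on the application side to spot duplicate pronunciations before registering words. The Python sketch below is inferred only from the examples listed above; the function name `normalize_pronunciation` and the kana tables are assumptions, and the engine's actual internal processing may differ.

```python
# Vowel of each hiragana, by row of the kana table (including small kana).
ROWS = {
    "a": "あかさたなはまやらわがざだばぱぁゃ",
    "i": "いきしちにひみりぎじぢびぴぃ",
    "u": "うくすつぬふむゆるぐずづぶぷぅゅ",
    "e": "えけせてねへめれげぜでべぺぇ",
    "o": "おこそとのほもよろをごぞどぼぽぉょ",
}
VOWEL_OF = {kana: vowel for vowel, row in ROWS.items() for kana in row}

# Kana that lengthen the preceding vowel, following the equivalences above.
LENGTHENERS = {"a": "あ", "i": "い", "u": "う", "e": "いえ", "o": "うお"}


def normalize_pronunciation(pronunciation: str) -> str:
    """Rewrite lengthening kana as "ー" unless a "." blocks the rule."""
    out = []
    prev_vowel = None
    for ch in pronunciation:
        if ch == ".":
            prev_vowel = None  # a period blocks the rule at this position
            continue
        if prev_vowel and ch in LENGTHENERS[prev_vowel]:
            out.append("ー")
            prev_vowel = None
            continue
        out.append(ch)
        prev_vowel = VOWEL_OF.get(ch)  # None for "ー", "ん", "っ"
    return "".join(out)


print(normalize_pronunciation("やまのうち"))
print(normalize_pronunciation("やまの.うち"))
```

Two pronunciations that normalize to the same string, such as "こう" and "こー", would count as duplicates under this approximation.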

Rule Names

Use a rule name to refer to another rule within the same grammar file. Rule names can also be described in series or in parallel. The format is as follows:

<[Rule Name]>

Example:

・・・ こんにちは <name> さん ・・・

Special Rule Names

You can also describe references to special rules. The rule names representing special rules are as follows:

Rule Name    Description
<NULL>       Special rule meaning nothing
<VOID>       Special rule meaning it never matches any utterance
<GARBAGE>    Special rule meaning it matches any utterance

Example:

・・・ こんにちは ( <NULL> {no-name} | アミ {AMI} ) ・・・
・・・ こんばんは <VOID> ・・・
<GARBAGE>+ ( おはよう | おやすみ ) <GARBAGE>+

Tip: The special rule <GARBAGE> is an AmiVoice-specific extension to JSGF and is not part of the standard JSGF specification.

Repetition

You can describe four types of repetition counts: "once", "zero or once", "zero or more repetitions", and "one or more repetitions". The format is as follows:

Symbol   Description
[ ]      Symbols indicating "zero or once"
*        Postfix symbol indicating "zero or more repetitions"
+        Postfix symbol indicating "one or more repetitions"

Example of "once":

・・・ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン ・・・

Example of "zero or once":

・・・ [ AmiVoice\あみぼいす ] ・・・
・・・ [ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン ] ・・・

Example of "zero or more repetitions":

・・・ AmiVoice\あみぼいす * ・・・
・・・ ( AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン )* ・・・

Example of "one or more repetitions":

・・・ AmiVoice\あみぼいす + ・・・
・・・ ( AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン )+ ・・・

Tags

You can describe tags (strings returned to the application as accompanying information when phrases are recognized) within rule definitions. Tags let the application obtain information from a recognized phrase other than the words that make it up. Assigning the same tag to different phrases with the same meaning makes it easier for the application receiving the tag string to implement processing that is independent of wording or language. The format is as follows:

{[Tag]}

Example:

・・・ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン {AmiVoice} ・・・
・・・ ( 一つ\ひとつ | 一個\いっこ | 一本\いっぽん ) {1} ・・・

SRGS Grammar File Simple Specification

SRGS grammar files come in two formats: ABNF-format SRGS grammar files and XML-format SRGS grammar files. The difference between them is only in their representation format, and there is no difference in the content they can express. The specification of SRGS grammar files almost entirely encompasses the specification of JSGF grammar files, so conversion between JSGF grammar files and SRGS grammar files can be done relatively easily.

The main conversion rules between JSGF grammar files and SRGS grammar files are as follows:

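Each item below shows the JSGF grammar file form, followed by the ABNF-format and XML-format SRGS grammar file forms.

Header Information
  JSGF grammar file:
    #JSGF V1.0 MS932 ja-JP;
  ABNF-format SRGS grammar file:
    #ABNF 1.0 MS932;
    language ja-JP;
  XML-format SRGS grammar file:
    <?xml version="1.0" encoding="MS932"?>
    <grammar xmlns="..." version="1.0" xml:lang="ja-JP">
    ...
    </grammar>

Grammar Declaration
  JSGF grammar file:
    grammar a;
  ABNF-format SRGS grammar file:
    (None) *The grammar name is expressed by the file name.
  XML-format SRGS grammar file:
    (None) *The grammar name is expressed by the file name.

Public Rule Definition
  JSGF grammar file:
    public <a> = A;
  ABNF-format SRGS grammar file:
    public $a = A;
  XML-format SRGS grammar file:
    <rule id="a" scope="public">
    A
    </rule>

Private Rule Definition
  JSGF grammar file:
    <b> = B;
  ABNF-format SRGS grammar file:
    $b = B;
  XML-format SRGS grammar file:
    <rule id="b">
    B
    </rule>

Grouping
  JSGF grammar file:
    ( A B C )
  ABNF-format SRGS grammar file:
    ( A B C )
  XML-format SRGS grammar file:
    <item>A B C</item>

Parallel
  JSGF grammar file:
    ( A | B | C )
  ABNF-format SRGS grammar file:
    ( A | B | C )
  XML-format SRGS grammar file:
    <one-of>
    <item>A</item>
    <item>B</item>
    <item>C</item>
    </one-of>

Rule Name
  JSGF grammar file:
    <b>
  ABNF-format SRGS grammar file:
    $b
  XML-format SRGS grammar file:
    <ruleref uri="#b"/>

Special Rule Names
  JSGF grammar file:
    <NULL>
    <VOID>
    <GARBAGE>
  ABNF-format SRGS grammar file:
    $NULL
    $VOID
    $GARBAGE
  XML-format SRGS grammar file:
    <ruleref special="NULL"/>
    <ruleref special="VOID"/>
    <ruleref special="GARBAGE"/>

0 or 1 time
  JSGF grammar file:
    [ A ]
  ABNF-format SRGS grammar file:
    [ A ]
    or
    A <0-1>
  XML-format SRGS grammar file:
    <item repeat="0-1">
    A
    </item>

0 or more times
  JSGF grammar file:
    A *
  ABNF-format SRGS grammar file:
    A <0->
  XML-format SRGS grammar file:
    <item repeat="0-">
    A
    </item>

1 or more times
  JSGF grammar file:
    A +
  ABNF-format SRGS grammar file:
    A <1->
  XML-format SRGS grammar file:
    <item repeat="1-">
    A
    </item>

Tag
  JSGF grammar file:
    { T }
  ABNF-format SRGS grammar file:
    { T }
  XML-format SRGS grammar file:
    <tag>T</tag>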
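
Part of the JSGF-to-ABNF conversion is mechanical enough to sketch in code. The Python example below rewrites only rule references and the postfix repetition symbols in a single rule definition statement; the function name `jsgf_rule_to_abnf` is hypothetical. It does not convert the header or grammar statement, and it assumes that "*" and "+" never appear inside words or tags.

```python
import re


def jsgf_rule_to_abnf(statement: str) -> str:
    """Convert one JSGF rule definition statement to ABNF-format SRGS."""
    # Rule references and rule names: <b> becomes $b (also <NULL> -> $NULL).
    out = re.sub(r"<\s*(\w+)\s*>", r"$\1", statement)
    # Postfix repetition: "A *" becomes "A <0->", "A +" becomes "A <1->".
    out = re.sub(r"\s*\*", " <0->", out)
    out = re.sub(r"\s*\+", " <1->", out)
    return out


print(jsgf_rule_to_abnf("public <number> = <num>+;"))
```

Grouping, parallel alternatives, optional groups, and tags are written the same way in both formats, so this sketch leaves them untouched.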

Samples

Sample 1: Greeting

Greeting.gram

#JSGF V1.0 UTF-8;
grammar Greeting;

public <greeting> = おはよう [ございます] | こんにちは | こんばんは | さようなら;

It can recognize "おはよう", "おはようございます", "こんにちは", "こんばんは", "さようなら".

Sample 2: Name

Name.gram

#JSGF V1.0 UTF-8;
grammar Name;

public <name> = [私\わたし わ] <last> <first> [です];

<last> = 鈴木\すずき {suzuki} | 中村\なかむら {nakamura};
<first> = 太郎\たろー {taro} | 花子\はなこ {hanako};

It can recognize phrases like "私は鈴木太郎です" or "私は中村花子です". "私は" and "です" can be omitted. It cannot recognize phrases with different word orders like "鈴木花子は私です" or incomplete phrases like "私は中村です".
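
Because this grammar is small, the full set of phrases it accepts can be enumerated. The Python sketch below lists the combinations implied by the rule, using the written forms from the description above ("私は" for the optional prefix); the variable names are illustrative only.

```python
from itertools import product

prefixes = ["", "私は"]      # optional [私 ...] part
last_names = ["鈴木", "中村"]
first_names = ["太郎", "花子"]
suffixes = ["", "です"]      # optional [です] part

# Every phrase is prefix + last name + first name + suffix, in that order.
phrases = {"".join(parts)
           for parts in product(prefixes, last_names, first_names, suffixes)}

print(len(phrases))
print("私は鈴木太郎です" in phrases)
print("私は中村です" in phrases)
```

The enumeration yields 16 phrases; a phrase that omits the first name or changes the word order is not among them.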

Sample 3: Zip Code

ZipCode.gram

#JSGF V1.0 UTF-8;
grammar ZipCode;

public <zipcode> = <num> <num> <num> [の] <num> <num> <num> <num>;

<num> = 1\いち {1}
| 2\に/にー {2}
| 3\さん {3}
| 4\よん/し/しー {4}
| 5\ご {5}
| 6\ろく {6}
| 7\なな/しち {7}
| 8\はち {8}
| 9\きゅー/く/くー {9}
| 0\ぜろ/れー/まる {0};

It can recognize phrases like "123の4567". When "123の4567" is recognized, the tag string becomes "1|2|3|4|5|6|7". Removing the "|" characters from the tag string on the application side yields the zip code.
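
The application-side step of removing "|" can be sketched in Python as follows. The function name `zip_from_tags` and the hyphenated output format are assumptions made for illustration; only the "|"-separated tag string comes from the description above.

```python
def zip_from_tags(tag_string: str) -> str:
    """Turn a tag string such as "1|2|3|4|5|6|7" into a zip code."""
    digits = tag_string.replace("|", "")
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"unexpected tag string: {tag_string!r}")
    # Assumed display format: three digits, a hyphen, four digits.
    return f"{digits[:3]}-{digits[3:]}"


print(zip_from_tags("1|2|3|4|5|6|7"))
```

Validating the digit count guards against tag strings that did not come from the <zipcode> rule.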

Sample 4: Continuous Number Input

Number.gram

#JSGF V1.0 UTF-8;
grammar Number;

public <number> = <num>+;

<num> = 1\いち {1}
| 2\に/にー {2}
| 3\さん {3}
| 4\よん/し/しー {4}
| 5\ご {5}
| 6\ろく {6}
| 7\なな/しち {7}
| 8\はち {8}
| 9\きゅー/く/くー {9}
| 0\ぜろ/れー/まる {0};

It can recognize phrases like "いちにさん", "ごろくなな", etc.

Sample 5: Drive-through

Burger.gram

#JSGF V1.0 UTF-8;
grammar Burger;

public <order> = (<item> [] <count> [])+ [[] お願い\おねがい [します]];

public <item> = ( ハンバーガー {bur}
| ドリンク {dri}
| フライドポテト {fri}
| コカコーラ {col}
| アイスクリーム {ice}
);

<count> = ( 一つ\ひとつ {1} | 二つ\ふたつ {2} | 三つ\みっつ {3} | 四つ\よっつ {4} | 五つ\いつつ {5}
| 六つ\むっつ {6} | 七つ\ななつ {7} | 八つ\やっつ {8} | 九つ\ここのつ {9}
| 一個\いっこ {1} | 二個\にこ {2} | 三個\さんこ {3} | 四個\よんこ {4} | 五個\ごこ {5}
| 六個\ろっこ {6} | 七個\ななこ {7} | 八個\はっこ {8} | 九個\きゅーこ {9}
);

public <end> = (以上\いじょー | OK\おっけー) [です] {end};

The "+" symbol means one or more repetitions, so it can recognize phrases like:

"ハンバーガー"
"ひとつ"
"ハンバーガー一個"
"ハンバーガー一個とアイスクリームを二つお願いします"
"ハンバーガー一個とアイスクリームを二つドリンク三つ"

By using the tag string on the application side, it's easy to perform the same processing whether "一つ" or "一個" is recognized. For example, if "ハンバーガー一個とアイスクリームを二つドリンク三つ" is recognized, the tag string becomes "bur|1|ice|2|dri|3".
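
As an illustration of wording-independent processing, the Python sketch below turns a tag string such as "bur|1|ice|2|dri|3" into a list of (item, count) pairs. The function name `parse_order` and the `ITEM_NAMES` table are hypothetical, and the code assumes the tag string alternates item tags and count tags, as in the example above.

```python
# Display names for the item tags used in Burger.gram.
ITEM_NAMES = {
    "bur": "ハンバーガー",
    "dri": "ドリンク",
    "fri": "フライドポテト",
    "col": "コカコーラ",
    "ice": "アイスクリーム",
}


def parse_order(tag_string: str) -> list:
    """Pair each item tag with the count tag that follows it."""
    tags = tag_string.split("|")
    if len(tags) % 2 != 0:
        raise ValueError("expected alternating item and count tags")
    order = []
    for item_tag, count_tag in zip(tags[0::2], tags[1::2]):
        order.append((ITEM_NAMES[item_tag], int(count_tag)))
    return order


print(parse_order("bur|1|ice|2|dri|3"))
```

Since "一つ" and "一個" both carry the tag "1", the resulting counts are identical regardless of which counter word was spoken.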