Rule Grammar
Speech Recognition Using Grammar Files
The "音声入力_ルール" engine (-a-rule-input-private), available in AmiVoice API Private, performs speech recognition using grammar files that you create. It is effective when utterances are expected to follow fixed rules and use a limited vocabulary, for example when an IVR system confirms a membership number while collecting customer information.
Grammar File Format
The "音声入力_ルール" engine accepts grammar files written in JSpeech Grammar Format (JSGF) or Speech Recognition Grammar Specification (SRGS). Both formats let you describe the phrases (sets of words) that the speech recognition engine should recognize, restricting recognition to those phrases, together with tags (strings returned to the application as accompanying information when a phrase is recognized). When creating a new grammar file, we recommend JSGF. This section briefly explains both formats; for the detailed specifications, see the following web pages:
- JSpeech Grammar Format: https://www.w3.org/TR/jsgf/
- Speech Recognition Grammar Specification Version 1.0: https://www.w3.org/TR/speech-grammar/
Simple Specification of JSGF Grammar Files
JSGF grammar files consist of the following elements:
Header Information
Describes character encoding information and country/language information. The format of the "Header Information" is as follows:
#JSGF V1.0;
#JSGF V1.0 <character encoding information>;
#JSGF V1.0 <character encoding information> <country/language information>;
Character encoding information and country/language information can be omitted.
Example:
#JSGF V1.0;
#JSGF V1.0 MS932;
#JSGF V1.0 MS932 ja-JP;
#JSGF V1.0 UTF-8;
Grammar Statement
Describes the grammar name. The format of the "grammar statement" is as follows:
grammar [Grammar Name];
For the grammar name, specify the grammar file name without its path and extension.
Example:
grammar Sample;
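As a rough illustration, the header line and the grammar statement could be generated as follows. The helper names `jsgf_header` and `grammar_statement` and the path `grammars/Sample.jsgf` are hypothetical and are not part of AmiVoice API; the sketch only mirrors the formats described above.

```python
from pathlib import PurePosixPath
from typing import Optional


def jsgf_header(encoding: Optional[str] = None, language: Optional[str] = None) -> str:
    """Build the header line; both fields may be omitted."""
    if language and not encoding:
        # None of the documented forms gives a language without an encoding.
        raise ValueError("language is only shown together with an encoding")
    parts = ["#JSGF V1.0"]
    if encoding:
        parts.append(encoding)
    if language:
        parts.append(language)
    return " ".join(parts) + ";"


def grammar_statement(grammar_path: str) -> str:
    """Use the file name without its path and extension as the grammar name."""
    name = PurePosixPath(grammar_path).stem
    return f"grammar {name};"


print(jsgf_header("UTF-8", "ja-JP"))
print(grammar_statement("grammars/Sample.jsgf"))
```

Deriving the grammar name from the file name in one place helps keep the two from drifting apart when a file is renamed.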
Rule Definition Statement
Describes the actual definition of phrases and tags that you want the speech recognition engine to recognize. Multiple rule definition statements can be written in a single grammar file. There are two types of rules: public rules and private rules. Public rules define phrases or tags that you want the speech recognition engine to recognize on their own. Private rules are referenced by other rules. The format of the "Rule Definition Statement" for public rules is as follows:
public <[Rule Name]> = [Rule Definition];
The format of the "Rule Definition Statement" for private rules is as follows:
<[Rule Name]> = [Rule Definition];
For rule definitions, please see Rule Definition.
Example:
public <sample1> = おはよう <sample2>;
<sample2> = AmiVoice\あみぼいす;
Rule Definition
Rule definitions consist of the following elements:
Words
Describe the words you want the speech recognition engine to recognize. Words can be described in series or in parallel. The "notation" and "pronunciation" of a word are separated by a \ (backslash). If there are multiple "pronunciations" for one "notation", each pronunciation is separated by a / (slash). The format is as follows:
| Symbol | Description |
|---|---|
| ( ) | Indicates "grouping" |
| \| | Indicates "parallel" |
| \ | Indicates the separator between notation and pronunciation |
| / | Indicates the separator between pronunciations |
Example of series:
・・・ AmiVoice\あみぼいす/あみ 音声認識\おんせいにんしき エンジン ・・・
Example of parallel:
・・・ ( AmiVoice\あみぼいす/あみ | 音声認識\おんせいにんしき | エンジン ) ・・・
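The word notation above can be assembled programmatically, which may help when a word list comes from a database or spreadsheet. The following is a minimal sketch; the function names `word`, `series`, and `parallel` are hypothetical helpers, not part of any AmiVoice library.

```python
from typing import Iterable, Sequence


def word(notation: str, pronunciations: Sequence[str] = ()) -> str:
    """Join a notation and its pronunciations with a backslash and slashes."""
    if not pronunciations:
        return notation
    return notation + "\\" + "/".join(pronunciations)


def series(items: Iterable[str]) -> str:
    """Words spoken one after another are separated by spaces."""
    return " ".join(items)


def parallel(items: Iterable[str]) -> str:
    """Alternatives are grouped with ( ) and separated by |."""
    return "( " + " | ".join(items) + " )"


words = [
    word("AmiVoice", ["あみぼいす", "あみ"]),
    word("音声認識", ["おんせいにんしき"]),
    word("エンジン"),
]
print(series(words))
print(parallel(words))
```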
Please also check the following regarding how to describe words:
- The characters that can be specified for "pronunciation" are limited to hiragana, "ー" (long vowel mark), and "." (period).
- In a "pronunciation", a long vowel written with "ー" (long vowel mark) and the same long vowel written out with a vowel kana are treated internally by the speech recognition engine as the same "pronunciation". Keep this identification rule in mind and avoid registering duplicate words.
Example:
"かー" = "かあ" "きゃー" = "きゃあ"
"きー" = "きい"
"くー" = "くう" "きゅー" = "きゅう"
"けー" = "けい" = "けえ"
"こー" = "こう" = "こお" "きょー" = "きょう" = "きょお" etc.
- When using "は" or "へ" in "pronunciation", these are always treated as "h a" or "h e". They are never treated as "w a" or "e".
- Placing a "." (period) in a "pronunciation" suppresses the identification rule above at that position. Example:
"やまのうち" → "やまのーち"
"やまの.うち" → "やまのうち"
"りくうんきょく" → "りくーんきょく"
"りく.うんきょく" → "りくうんきょく" etc.
Rule Names
Use a rule name to refer to another rule in the same grammar file. Like words, rule names can be described in series or in parallel. The format is as follows:
<[Rule Name]>
Example:
・・・ こんにちは <name> さん ・・・
Special Rule Names
You can also describe references to special rules. The rule names representing special rules are as follows:
| Rule Name | Description |
|---|---|
| <NULL> | Special rule that matches without any word being spoken |
| <VOID> | Special rule that never matches any utterance |
| <GARBAGE> | Special rule that matches any utterance |
Example:
・・・ こんにちは ( <NULL> {no-name} | アミ {AMI} ) ・・・
・・・ こんばんは <VOID> ・・・
<GARBAGE>+ ( おはよう | おやすみ ) <GARBAGE>+
The special rule <GARBAGE> is an AmiVoice extension to JSGF and is not part of the standard JSGF specification.
Repetition
You can describe four types of repetition counts: "once", "zero or once", "zero or more repetitions", and "one or more repetitions". The format is as follows:
| Symbol | Description |
|---|---|
| [ ] | Symbols indicating "zero or once" |
| * | Postfix symbol indicating "zero or more repetitions" |
| + | Postfix symbol indicating "one or more repetitions" |
Example of "once":
・・・ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン ・・・
Example of "zero or once":
・・・ [ AmiVoice\あみぼいす ] ・・・
・・・ [ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン ] ・・・
Example of "zero or more repetitions":
・・・ AmiVoice\あみぼいす * ・・・
・・・ ( AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン )* ・・・
Example of "one or more repetitions":
・・・ AmiVoice\あみぼいす + ・・・
・・・ ( AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン )+ ・・・
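The repetition symbols behave much like the corresponding operators in regular expressions. The sketch below illustrates this by analogy only: each word is replaced by a single letter (a for AmiVoice, o for 音声認識, e for エンジン), and Python's `re` module stands in for the recognizer. The engine itself does not use Python regular expressions, and the `accepts` helper is hypothetical.

```python
import re

# Grammar fragment (with letters standing in for words) -> equivalent regex.
PATTERNS = {
    "[ a ]": r"(?:a)?",         # zero or once
    "a *": r"(?:a)*",           # zero or more repetitions
    "a +": r"(?:a)+",           # one or more repetitions
    "( a o e )+": r"(?:aoe)+",  # the whole group repeats, not just the last word
}


def accepts(fragment: str, utterance: str) -> bool:
    """True if the whole utterance is covered by the fragment."""
    return re.fullmatch(PATTERNS[fragment], utterance) is not None


print(accepts("[ a ]", ""))
print(accepts("a +", ""))
print(accepts("( a o e )+", "aoeaoe"))
```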
Tags
You can describe tags (strings returned to the application as accompanying information when phrases are recognized) within rule definitions. Tags let the application receive information other than the words that make up the recognized phrase. If you assign the same tag to different phrases that have the same meaning, the application receiving the tag string can more easily implement processing that does not depend on wording or language. The format is as follows:
{[Tag]}
Example:
・・・ AmiVoice\あみぼいす 音声認識\おんせいにんしき エンジン {AmiVoice} ・・・
・・・ ( 一つ\ひとつ | 一個\いっこ | 一本\いっぽん ) {1} ・・・
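Putting the elements above together, the following sketch generates a complete grammar for a spoken digit sequence such as a membership number. The grammar name `MemberNumber`, the rule names `<number>` and `<digit>`, the use of Arabic digits as notation, and the pronunciation list are all assumptions made for illustration; please adjust them to your own vocabulary and verify the result with the engine.

```python
# Hypothetical digit vocabulary: (notation and tag, pronunciations).
DIGITS = [
    ("0", ["ぜろ", "れい"]),
    ("1", ["いち"]),
    ("2", ["に"]),
    ("3", ["さん"]),
    ("4", ["よん", "し"]),
    ("5", ["ご"]),
    ("6", ["ろく"]),
    ("7", ["なな", "しち"]),
    ("8", ["はち"]),
    ("9", ["きゅう", "く"]),
]


def digit_entry(digit: str, pronunciations: list) -> str:
    """One alternative: notation, pronunciations, and a tag holding the digit."""
    return digit + "\\" + "/".join(pronunciations) + " {" + digit + "}"


def build_grammar(name: str) -> str:
    """Return the text of a JSGF grammar whose public rule is a digit sequence."""
    alternatives = " | ".join(digit_entry(d, prons) for d, prons in DIGITS)
    lines = [
        "#JSGF V1.0 UTF-8 ja-JP;",
        f"grammar {name};",
        "public <number> = <digit>+;",  # one or more digits
        f"<digit> = ( {alternatives} );",
    ]
    return "\n".join(lines) + "\n"


grammar_text = build_grammar("MemberNumber")
print(grammar_text)
```

Because every pronunciation of a digit carries the same tag, the application can rebuild the number from the tags without caring whether the speaker said "よん" or "し".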
Simple Specification of SRGS Grammar Files
SRGS grammar files come in two formats: ABNF-format SRGS grammar files and XML-format SRGS grammar files. The two differ only in notation; there is no difference in the content they can express. The SRGS specification covers almost all of the JSGF specification, so converting between JSGF grammar files and SRGS grammar files is relatively easy.
The main conversion rules between JSGF grammar files and SRGS grammar files are as follows:
| Item | JSGF Grammar File | ABNF-format SRGS Grammar File | XML-format SRGS Grammar File |
|---|---|---|---|
| Header Information | #JSGF V1.0 MS932 ja-JP; | #ABNF 1.0 MS932; language ja-JP; | <?xml version="1.0" encoding="MS932"?> <grammar xmlns="..." version="1.0" xml:lang="ja-JP"> ... </grammar> |
| Grammar Statement | grammar a; | (None) *Grammar name is expressed by filename | (None) *Grammar name is expressed by filename |
| Public Rule Definition | public <a> = A; | public $a = A; | <rule id="a" scope="public"> A </rule> |
| Private Rule Definition | <b> = B; | $b = B; | <rule id="b"> B </rule> |
| Grouping | ( A B C ) | ( A B C ) | <item>A B C</item> |
| Parallel | ( A \| B \| C ) | ( A \| B \| C ) | <one-of> <item>A</item> <item>B</item> <item>C</item> </one-of> |
| Rule Name | <b> | $b | <ruleref uri="#b"/> |
| Special Rule Names | <NULL> <VOID> <GARBAGE> | $NULL $VOID $GARBAGE | <ruleref special="NULL"/> <ruleref special="VOID"/> <ruleref special="GARBAGE"/> |
| 0 or 1 time | [ A ] | [ A ] or A <0-1> | <item repeat="0-1"> A </item> |
| 0 or more times | A * | A <0-> | <item repeat="0-"> A </item> |
| 1 or more times | A + | A <1-> | <item repeat="1-"> A </item> |
| Tag | { T } | { T } | <tag>T</tag> |
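As an illustration of how mechanical the conversion is, the sketch below applies the JSGF to ABNF rows of the table above to a small grammar. The function name `jsgf_to_abnf` is hypothetical, and the conversion is deliberately naive: it covers only the constructs listed in the table, expects one statement per line, and would also rewrite a `*`, `+`, or `<name>` that appears inside a tag. Whether the notation\pronunciation form carries over unchanged to ABNF files is assumed here, not confirmed.

```python
import re

_HEADER = re.compile(r"#JSGF V1\.0(?: ([^\s;]+))?(?: ([^\s;]+))?;")
_GRAMMAR_STATEMENT = re.compile(r"grammar \S+;")


def jsgf_to_abnf(jsgf_text: str) -> str:
    """Convert simple JSGF text to ABNF-format SRGS, row by row of the table."""
    out = []
    for raw in jsgf_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = _HEADER.fullmatch(line)
        if header:
            encoding, language = header.groups()
            out.append(f"#ABNF 1.0 {encoding};" if encoding else "#ABNF 1.0;")
            if language:
                out.append(f"language {language};")
            continue
        if _GRAMMAR_STATEMENT.fullmatch(line):
            continue  # SRGS expresses the grammar name by the file name
        line = re.sub(r"<\s*(\w+)\s*>", r"$\1", line)  # <b> -> $b
        line = re.sub(r"\s*\*", " <0->", line)         # A * -> A <0->
        line = re.sub(r"\s*\+", " <1->", line)         # A + -> A <1->
        out.append(line)
    return "\n".join(out)


source = "\n".join([
    "#JSGF V1.0 MS932 ja-JP;",
    "grammar a;",
    "public <a> = A <b>+;",
    "<b> = B *;",
])
print(jsgf_to_abnf(source))
```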