Automatic Removal of Filler Words

Filler words are hesitation sounds such as "あのー" and "えーっと" (roughly "um" and "uh") that add nothing to the content. They are automatically removed from speech recognition results.

For example, if you speak as follows:

えーっと、会議があるので、えー、それまでに、あのー、資料を作成しておきます。
(Um, there's a meeting, so, uh, before then, er, I'll prepare the materials.)

The recognition result for this speech is:

会議があるのでそれまでに資料を作成しておきます。
(There's a meeting, so I'll prepare the materials before then.)

The following types of words are treated as filler words:

Language	Filler word examples
Japanese	あー, あのー, えー, おー, えっと
English	ah, urm, hmm
Chinese	呃, 啊, 哎呀
Korean	어, 으, 음

Note

Users cannot add their own filler words.
The set of filler words may change as recognition accuracy is improved, so no list of filler words is published.

If the speech above is recorded in an audio file named test-with-filler.wav, you can confirm that filler words are removed automatically by running the following curl command. For details on this procedure, see Transcribing Short Audio Files; for WebSocket, see Speech Recognition Request.

curl -sS https://acp-api.amivoice.com/v1/recognize \
      -F u={APPKEY} \
      -F "d=-a-general" \
      -F a=@test-with-filler.wav | jq
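If you would rather call the API from a program than from curl, the following Python sketch sends the same multipart/form-data request using only the standard library. The endpoint and the form fields u, d, and a come from the curl command above. The helper names build_multipart and recognize are hypothetical, and the sketch simply keeps the audio field last, in the same order as the curl options.

```python
import json
import os
import urllib.request
import uuid

# Endpoint and form fields (u, d, a) are taken from the curl example above.
ENDPOINT = "https://acp-api.amivoice.com/v1/recognize"


def build_multipart(fields: dict[str, str], file_field: str, file_name: str,
                    file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields followed by one file part as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for name, value in fields.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    # The file part goes last, matching the order of -F options in the curl example.
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{file_field}"; '
                   f'filename="{file_name}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def recognize(app_key: str, wav_path: str, d: str = "-a-general") -> dict:
    """POST an audio file to the recognize endpoint and return the parsed JSON."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"u": app_key, "d": d}, "a", os.path.basename(wav_path), audio)
    request = urllib.request.Request(
        ENDPOINT, data=body, method="POST",
        headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, recognize(app_key, "test-with-filler.wav") returns the same data that jq pretty-prints, and json.dumps(result, ensure_ascii=False, indent=2) prints it with the Japanese text intact.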

Response

{
  "results": [
    {
      "tokens": [
        {
          "written": "会議",
          "confidence": 0.99,
          "starttime": 656,
          "endtime": 1184,
          "spoken": "かいぎ"
        },
        {
          "written": "が",
          "confidence": 1,
          "starttime": 1184,
          "endtime": 1312,
          "spoken": "が"
        },
        {
          "written": "ある",
          "confidence": 1,
          "starttime": 1312,
          "endtime": 1536,
          "spoken": "ある"
        },
        {
          "written": "ので",
          "confidence": 1,
          "starttime": 1536,
          "endtime": 1920,
          "spoken": "ので"
        },
        {
          "written": "それ",
          "confidence": 1,
          "starttime": 2384,
          "endtime": 2736,
          "spoken": "それ"
        },
        {
          "written": "まで",
          "confidence": 1,
          "starttime": 2736,
          "endtime": 3024,
          "spoken": "まで"
        },
        {
          "written": "に",
          "confidence": 1,
          "starttime": 3024,
          "endtime": 3296,
          "spoken": "に"
        },
        {
          "written": "資料",
          "confidence": 0.97,
          "starttime": 3920,
          "endtime": 4384,
          "spoken": "しりょう"
        },
        {
          "written": "を",
          "confidence": 1,
          "starttime": 4384,
          "endtime": 4544,
          "spoken": "を"
        },
        {
          "written": "作成",
          "confidence": 0.98,
          "starttime": 4576,
          "endtime": 5136,
          "spoken": "さくせい"
        },
        {
          "written": "して",
          "confidence": 1,
          "starttime": 5136,
          "endtime": 5392,
          "spoken": "して"
        },
        {
          "written": "おき",
          "confidence": 0.99,
          "starttime": 5392,
          "endtime": 5664,
          "spoken": "おき"
        },
        {
          "written": "ます",
          "confidence": 0.98,
          "starttime": 5664,
          "endtime": 5952,
          "spoken": "ます"
        },
        {
          "written": "。",
          "confidence": 0.21,
          "starttime": 5952,
          "endtime": 5984,
          "spoken": "_"
        }
      ],
      "confidence": 0.993,
      "starttime": 0,
      "endtime": 5984,
      "tags": [],
      "rulename": "",
      "text": "会議があるのでそれまでに資料を作成しておきます。"
    }
  ],
  "utteranceid": "20240801/08/01910b1c09cc0a303c1094c9_20240801_082432",
  "text": "会議があるのでそれまでに資料を作成しておきます。",
  "code": "",
  "message": ""
}
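Each element of results[].tokens[] carries the written form, the reading (spoken), and start and end times in milliseconds. The Python sketch below, with hypothetical helper names, flattens the tokens, rebuilds the text from tokens[].written, and lists pauses between consecutive words. For this Japanese response, joining the written forms reproduces the "text" field, and the two longer pauses fall where えー and あのー were spoken before being removed. A gap can equally be an ordinary pause, though, so treat it as a hint rather than a filler detector.

```python
from typing import Any


def tokens_of(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten results[].tokens[] into one list, keeping their order."""
    return [tok for result in response["results"] for tok in result["tokens"]]


def joined_text(response: dict[str, Any]) -> str:
    """Concatenate tokens[].written with no separator (suits Japanese text)."""
    return "".join(tok["written"] for tok in tokens_of(response))


def long_gaps(response: dict[str, Any],
              min_gap_ms: int = 300) -> list[tuple[str, str, int]]:
    """List (previous word, next word, gap in ms) for silences >= min_gap_ms.

    A gap may be a removed filler word or simply a pause in speech.
    """
    toks = tokens_of(response)
    gaps = []
    for prev, nxt in zip(toks, toks[1:]):
        gap = nxt["starttime"] - prev["endtime"]
        if gap >= min_gap_ms:
            gaps.append((prev["written"], nxt["written"], gap))
    return gaps
```

For the response above, long_gaps returns [("ので", "それ", 464), ("に", "資料", 624)].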

Suppressing Automatic Removal of Filler Words

Setting keepFillerToken=1 in the request parameters turns off the automatic removal of filler words. This is useful, for example, when you want to check whether a call center operator is using too many filler words.

With this setting, the recognition result for the audio above is:

%えっと%会議があるので%えー%それまでに%あのー%資料を作成しておきます。

Each filler word is enclosed in half-width "%" characters, so your program needs to handle this notation, for example by removing or counting the marked words. Here is an excerpt of the response:

{
  "results": [
    {
      "tokens": [
        {
          "written": "%えっと%",
          "confidence": 0.95,
          "starttime": 0,
          "endtime": 592,
          "spoken": "えっと"
        },
        /* omitted */
      ],
      "text": "%えっと%会議があるので%えー%それまでに%あのー%資料を作成しておきます。",
      /* omitted */
    }
  ],
  "text": "%えっと%会議があるので%えー%それまでに%あのー%資料を作成しておきます。",
  /* omitted */
}

Note

When "ぱーせんと" ("percent") is spoken, "%" is recognized as a word on its own. In that case results[0].tokens[].written is the single character "%", which distinguishes it from the "%" delimiters around filler words.

{
  "results": [
    {
      "tokens": [
        {
          "written": "%",
          "confidence": 1,
          "starttime": 0,
          "endtime": 800,
          "spoken": "ぱーせんと"
        },
        /* omitted */
      ],
      /* omitted */
    }
  ],
  /* omitted */
}
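Because a lone "%" is a real word while filler words are wrapped as "%...%", it is safer to classify fillers token by token than to search the text string. A minimal Python sketch, assuming the token format shown above (the function names are hypothetical):

```python
import re
from typing import Any, Optional

# A filler token's written form is "%<word>%", e.g. "%えっと%".
# A lone "%" (spoken as "ぱーせんと") does not match, because at least one
# non-"%" character is required between the delimiters.
FILLER_PATTERN = re.compile(r"%([^%]+)%")


def filler_word(token: dict[str, Any]) -> Optional[str]:
    """Return the filler without its % delimiters, or None for ordinary words."""
    match = FILLER_PATTERN.fullmatch(token["written"])
    return match.group(1) if match else None


def is_filler(token: dict[str, Any]) -> bool:
    """True if tokens[].written has the %...% filler form."""
    return filler_word(token) is not None


def text_without_fillers(tokens: list[dict[str, Any]]) -> str:
    """Join the written forms of non-filler tokens with no separator (Japanese)."""
    return "".join(t["written"] for t in tokens if not is_filler(t))
```

Applied to the tokens of a keepFillerToken=1 response for test-with-filler.wav, text_without_fillers yields 会議があるのでそれまでに資料を作成しておきます。, while a "%" token from "ぱーせんと" is kept as an ordinary word.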

To obtain results that keep the filler words for the same test-with-filler.wav file, add keepFillerToken=1 to the d parameter and run the following curl command. For details on this procedure, see Transcribing Short Audio Files; for WebSocket, see Speech Recognition Request.

Example curl command

curl -sS https://acp-api.amivoice.com/v1/recognize \
      -F u={APPKEY} \
      -F "d=-a-general keepFillerToken=1" \
      -F a=@test-with-filler.wav | jq
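As the command shows, keepFillerToken=1 is passed inside the same d parameter as -a-general, separated by a space. If you assemble d in code, a small helper such as the hypothetical build_d below keeps that format in one place. It is only a sketch: it rejects values containing spaces instead of guessing how such values should be encoded.

```python
def build_d(engine: str, **params: object) -> str:
    """Build a d value such as "-a-general keepFillerToken=1".

    engine is the first item (for example "-a-general"); each keyword
    argument becomes a space-separated key=value pair.
    """
    items = [engine]
    for key, value in params.items():
        text = str(value)
        # Spaces separate items in d, so a value containing one would be misread.
        if " " in text:
            raise ValueError(f"value for {key} contains a space: {text!r}")
        items.append(f"{key}={text}")
    return " ".join(items)
```

For example, build_d("-a-general", keepFillerToken=1) returns "-a-general keepFillerToken=1", the same value used in the curl command.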

Response

{
  "results": [
    {
      "tokens": [
        {
          "written": "%えっと%",
          "confidence": 0.95,
          "starttime": 0,
          "endtime": 592,
          "spoken": "えっと"
        },
        {
          "written": "会議",
          "confidence": 0.99,
          "starttime": 656,
          "endtime": 1184,
          "spoken": "かいぎ"
        },
        {
          "written": "が",
          "confidence": 1,
          "starttime": 1184,
          "endtime": 1312,
          "spoken": "が"
        },
        {
          "written": "ある",
          "confidence": 1,
          "starttime": 1312,
          "endtime": 1536,
          "spoken": "ある"
        },
        {
          "written": "ので",
          "confidence": 1,
          "starttime": 1536,
          "endtime": 1920,
          "spoken": "ので"
        },
        {
          "written": "%えー%",
          "confidence": 0.99,
          "starttime": 1968,
          "endtime": 2224,
          "spoken": "えー"
        },
        {
          "written": "それ",
          "confidence": 1,
          "starttime": 2224,
          "endtime": 2528,
          "spoken": "それ"
        },
        {
          "written": "まで",
          "confidence": 1,
          "starttime": 2528,
          "endtime": 2800,
          "spoken": "まで"
        },
        {
          "written": "に",
          "confidence": 1,
          "starttime": 2800,
          "endtime": 3088,
          "spoken": "に"
        },
        {
          "written": "%あのー%",
          "confidence": 1,
          "starttime": 3120,
          "endtime": 3600,
          "spoken": "あのー"
        },
        {
          "written": "資料",
          "confidence": 1,
          "starttime": 3712,
          "endtime": 4176,
          "spoken": "しりょう"
        },
        {
          "written": "を",
          "confidence": 1,
          "starttime": 4176,
          "endtime": 4336,
          "spoken": "を"
        },
        {
          "written": "作成",
          "confidence": 1,
          "starttime": 4368,
          "endtime": 4928,
          "spoken": "さくせい"
        },
        {
          "written": "して",
          "confidence": 1,
          "starttime": 4928,
          "endtime": 5184,
          "spoken": "して"
        },
        {
          "written": "おき",
          "confidence": 0.99,
          "starttime": 5184,
          "endtime": 5456,
          "spoken": "おき"
        },
        {
          "written": "ます",
          "confidence": 0.98,
          "starttime": 5456,
          "endtime": 5744,
          "spoken": "ます"
        },
        {
          "written": "。",
          "confidence": 0.32,
          "starttime": 5744,
          "endtime": 5776,
          "spoken": "_"
        }
      ],
      "confidence": 0.993,
      "starttime": 0,
      "endtime": 5776,
      "tags": [],
      "rulename": "",
      "text": "%えっと%会議があるので%えー%それまでに%あのー%資料を作成しておきます。"
    }
  ],
  "utteranceid": "20240801/08/01910b1dde010a301e8894c2_20240801_082632",
  "text": "%えっと%会議があるので%えー%それまでに%あのー%資料を作成しておきます。",
  "code": "",
  "message": ""
}
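For the call center use case mentioned earlier, the tokens can be reduced to a few numbers. The following Python sketch (filler_stats is a hypothetical name) counts tokens written in the %...% form and the time they span, relative to the total duration of results[]. For the response above it reports 3 filler tokens out of 17, covering 1328 ms, or about 23% of the 5776 ms utterance. What ratio counts as "too many" is a threshold you would choose yourself.

```python
import re
from typing import Any

# Filler tokens are written "%<word>%"; a lone "%" is an ordinary word.
FILLER_PATTERN = re.compile(r"%[^%]+%")


def filler_stats(response: dict[str, Any]) -> dict[str, float]:
    """Summarize filler usage in a response obtained with keepFillerToken=1."""
    results = response["results"]
    tokens = [tok for result in results for tok in result["tokens"]]
    fillers = [tok for tok in tokens if FILLER_PATTERN.fullmatch(tok["written"])]
    # Time spent on fillers, from each token's starttime/endtime (milliseconds).
    filler_ms = sum(tok["endtime"] - tok["starttime"] for tok in fillers)
    # Total utterance time, taken as the sum of each result's duration.
    total_ms = sum(r["endtime"] - r["starttime"] for r in results)
    return {
        "filler_count": len(fillers),
        "token_count": len(tokens),
        "filler_ms": filler_ms,
        "filler_time_ratio": filler_ms / total_ms if total_ms else 0.0,
    }
```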
