Transcribing Long Audio Files

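This tutorial walks through using the API to convert long recordings, such as meetings, lectures, and call-center calls, into text. Instead of writing a program, it uses the curl and jq commands.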

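Preparation

To follow this tutorial, you need the following:

curl
jq
An AmiVoice API account and its APPKEY
An audio file to transcribe

Note

jq is used only to format results so they are easier to read. You can complete the transcription steps in this tutorial without installing it.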

curl

Check whether the curl command is installed on your system.

curl -V

If no version information is displayed, download the package for your operating system from https://curl.se/, or install curl with a package manager.

jq

Check whether the jq command is installed on your system.

jq -V

If no version information is displayed, download the package for your operating system from https://stedolan.github.io/jq/, or install jq with a package manager.
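Both checks can also be done in one pass. The following is a minimal sketch, not part of the official instructions: check_tool is a hypothetical helper name, and it relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# Report whether a command is on PATH; returns non-zero when it is missing.
# check_tool is a hypothetical helper name used only for this illustration.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

# curl is required for this tutorial; jq is only used to format results.
check_tool curl || echo "install curl before continuing"
check_tool jq || echo "jq is optional; results will be printed unformatted"
```

Unlike `curl -V`, this only confirms that the command exists; it does not print the version.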

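Getting an APPKEY

Register for AmiVoice API.
Log in to My Page and note the APPKEY listed under [Common Connection Information] on the [Connection Information] tab.

Tip

The AmiVoice Tech Blog describes the steps for getting an APPKEY in detail. See "Trying Out AmiVoice API".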

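Audio file

Prepare the audio file to transcribe. This tutorial uses test.wav, the audio file bundled with the client library sample programs.

Note

When preparing an audio file, check that its format is supported. For the supported formats, see "About Audio Formats".
The length of audio files that can be accepted is limited. See "Limitations".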

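Execution

1. Speech recognition request

The parameters for a speech recognition request are exactly the same as for the synchronous HTTP interface.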

curl https://acp-api-async.amivoice.com/v1/recognitions \
     -F d=-a-general \
     -F u={APPKEY} \
     -F a=@test.wav

If the request succeeds, you receive a response like the following. The request is placed in a queue as a job and processed in order.

{"sessionid":"017c25ec12c00a304474a999","text":"..."}
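Because the next step needs the sessionid, it can be convenient to capture it in a shell variable. The sketch below makes several assumptions: the APPKEY is held in an environment variable named APPKEY, the helper names submit_audio and session_id_of are hypothetical, and the sed pattern handles only a simple string field like the one in the sample response (it is not a general JSON parser). The `-s` option merely hides curl's progress meter.

```shell
#!/bin/sh
# Submit an audio file and print the raw JSON response.
# Same request as in the tutorial; APPKEY is assumed to be set in the environment.
submit_audio() {
  curl -s https://acp-api-async.amivoice.com/v1/recognitions \
       -F d=-a-general \
       -F "u=$APPKEY" \
       -F "a=@$1"
}

# Print the sessionid field of a response read from stdin (empty if absent).
# With jq installed, `jq -r .sessionid` does the same job more robustly.
session_id_of() {
  sed -n 's/.*"sessionid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Real use (needs network access and a valid APPKEY):
#   session_id=$(submit_audio test.wav | session_id_of)
# Offline demonstration with the sample response shown above:
sample='{"sessionid":"017c25ec12c00a304474a999","text":"..."}'
session_id=$(printf '%s\n' "$sample" | session_id_of)
echo "sessionid: $session_id"
```

If session_id comes back empty, the response did not contain a sessionid, so it is reasonable to print the raw response and inspect it before continuing.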

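2. Getting the job status and results

Use the sessionid returned by the request to get the job status (status) and results. You need to repeat this request until the speech recognition result is available. Specify {APPKEY} in the Authorization header.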

curl -H "Authorization: Bearer {APPKEY}" \
     https://acp-api-async.amivoice.com/v1/recognitions/017c25ec12c00a304474a999

Immediately after the request is sent, status is queued.

{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"queued"}

When the job is taken from the queue, status changes to started.

{"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"started"}

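When speech recognition processing actually starts, status changes to processing. You can use the audio size and MD5 checksum that the API received to confirm that the audio you sent was handled correctly. How long the job stays in processing depends on the length of the audio.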

{"audio_md5":"40f59fe5fc7745c33b33af44be43f6ad","audio_size":306980,"service_id":"{YOUR_SERVICE_ID}","session_id":"017c25ec12c00a304474a999","status":"processing"}
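Repeating the status request by hand gets tedious for long audio, so the polling can be wrapped in a loop that stops once status reaches completed, the final state described in the next section. This is a sketch under stated assumptions rather than an official client: fetch_job, job_status, and wait_for_job are hypothetical names, APPKEY is assumed to be an environment variable, and the 10-second interval and 360-attempt limit are arbitrary defaults chosen for illustration. Any status other than the four named in this tutorial is treated as a reason to stop.

```shell
#!/bin/sh
# Fetch the current job JSON for a session ID (same request as in the tutorial).
fetch_job() {
  curl -s -H "Authorization: Bearer $APPKEY" \
       "https://acp-api-async.amivoice.com/v1/recognitions/$1"
}

# Print the status field of a job JSON read from stdin.
# A simple sed match, not a JSON parser; `jq -r .status` is the robust choice.
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | tail -n 1
}

# Poll until the job completes, then print the final JSON to stdout.
# $1 = session ID
# $2 = seconds between polls (illustrative default: 10)
# $3 = maximum number of polls (illustrative default: 360)
wait_for_job() {
  interval=${2:-10}
  remaining=${3:-360}
  while [ "$remaining" -gt 0 ]; do
    body=$(fetch_job "$1") || return 1
    status=$(printf '%s\n' "$body" | job_status)
    case $status in
      completed) printf '%s\n' "$body"; return 0 ;;
      queued|started|processing) echo "status: $status" >&2 ;;
      *) echo "unexpected status: '$status'" >&2; return 1 ;;
    esac
    remaining=$((remaining - 1))
    sleep "$interval"
  done
  echo "gave up waiting" >&2
  return 2
}

# Example (needs network access and a valid APPKEY):
#   wait_for_job 017c25ec12c00a304474a999 > result.json
```

Progress messages go to standard error, so redirecting standard output to a file captures only the final JSON.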

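Results

When speech recognition finishes, status changes to completed. The recognition result is then available in results. The jq command is used here to format the result so it is easier to read.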

curl -H "Authorization: Bearer {APPKEY}" \
     https://acp-api-async.amivoice.com/v1/recognitions/017c25ec12c00a304474a999 | jq

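Below is a complete example response. In addition to the transcription, it includes word-level results, audio timestamps, confidence scores, and other information. For details, see "Speech Recognition Results".

Response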

{
  "audio_md5": "40f59fe5fc7745c33b33af44be43f6ad",
  "audio_size": 306980,
  "results": {
    "code": "",
    "message": "",
    "segments": [
      {
        "code": "",
        "message": "",
        "results": [
          {
            "confidence": 1.0,
            "endtime": 8778,
            "rulename": "",
            "starttime": 250,
            "tags": [],
            "text": "アドバンスト・メディアは、人と機械等の自然なコミュニケーションを実現し、豊かな未来を創造していくことをめざします。",
            "tokens": [
              {
                "confidence": 1.0,
                "endtime": 1578,
                "spoken": "あどばんすとめでぃあ",
                "starttime": 570,
                "written": "アドバンスト・メディア"
              },
              {
                "confidence": 1.0,
                "endtime": 1850,
                "spoken": "は",
                "starttime": 1578,
                "written": "は"
              },
              {
                "confidence": 0.77,
                "endtime": 2010,
                "spoken": "_",
                "starttime": 1850,
                "written": "、"
              },
              {
                "confidence": 1.0,
                "endtime": 2314,
                "spoken": "ひと",
                "starttime": 2010,
                "written": "人"
              },
              {
                "confidence": 1.0,
                "endtime": 2426,
                "spoken": "と",
                "starttime": 2314,
                "written": "と"
              },
              {
                "confidence": 1.0,
                "endtime": 2826,
                "spoken": "きかい",
                "starttime": 2426,
                "written": "機械"
              },
              {
                "confidence": 0.76,
                "endtime": 2922,
                "spoken": "とう",
                "starttime": 2826,
                "written": "等"
              },
              {
                "confidence": 1.0,
                "endtime": 3082,
                "spoken": "の",
                "starttime": 2922,
                "written": "の"
              },
              {
                "confidence": 1.0,
                "endtime": 3434,
                "spoken": "しぜん",
                "starttime": 3082,
                "written": "自然"
              },
              {
                "confidence": 1.0,
                "endtime": 3530,
                "spoken": "な",
                "starttime": 3434,
                "written": "な"
              },
              {
                "confidence": 1.0,
                "endtime": 4362,
                "spoken": "こみゅにけーしょん",
                "starttime": 3530,
                "written": "コミュニケーション"
              },
              {
                "confidence": 1.0,
                "endtime": 4442,
                "spoken": "を",
                "starttime": 4362,
                "written": "を"
              },
              {
                "confidence": 1.0,
                "endtime": 4906,
                "spoken": "じつげん",
                "starttime": 4442,
                "written": "実現"
              },
              {
                "confidence": 1.0,
                "endtime": 5242,
                "spoken": "し",
                "starttime": 4906,
                "written": "し"
              },
              {
                "confidence": 0.83,
                "endtime": 5642,
                "spoken": "_",
                "starttime": 5242,
                "written": "、"
              },
              {
                "confidence": 1.0,
                "endtime": 5978,
                "spoken": "ゆたか",
                "starttime": 5642,
                "written": "豊か"
              },
              {
                "confidence": 1.0,
                "endtime": 6090,
                "spoken": "な",
                "starttime": 5978,
                "written": "な"
              },
              {
                "confidence": 1.0,
                "endtime": 6490,
                "spoken": "みらい",
                "starttime": 6090,
                "written": "未来"
              },
              {
                "confidence": 1.0,
                "endtime": 6554,
                "spoken": "を",
                "starttime": 6490,
                "written": "を"
              },
              {
                "confidence": 0.92,
                "endtime": 7034,
                "spoken": "そうぞう",
                "starttime": 6554,
                "written": "創造"
              },
              {
                "confidence": 1.0,
                "endtime": 7210,
                "spoken": "して",
                "starttime": 7034,
                "written": "して"
              },
              {
                "confidence": 1.0,
                "endtime": 7402,
                "spoken": "いく",
                "starttime": 7210,
                "written": "いく"
              },
              {
                "confidence": 0.8,
                "endtime": 7674,
                "spoken": "こと",
                "starttime": 7402,
                "written": "こと"
              },
              {
                "confidence": 1.0,
                "endtime": 7706,
                "spoken": "を",
                "starttime": 7674,
                "written": "を"
              },
              {
                "confidence": 0.78,
                "endtime": 7962,
                "spoken": "めざ",
                "starttime": 7706,
                "written": "めざ"
              },
              {
                "confidence": 0.78,
                "endtime": 8490,
                "spoken": "します",
                "starttime": 7962,
                "written": "します"
              },
              {
                "confidence": 0.83,
                "endtime": 8778,
                "spoken": "_",
                "starttime": 8490,
                "written": "。"
              }
            ]
          }
        ],
        "text": "アドバンスト・メディアは、人と機械等の自然なコミュニケーションを実現し、豊かな未来を創造していくことをめざします。"
      }
    ],
    "text": "アドバンスト・メディアは、人と機械等の自然なコミュニケーションを実現し、豊かな未来を創造していくことをめざします。",
    "utteranceid": "20210927/06/017c25ed38cc0a30425239d0_20210927_062436[nolog]"
  },
  "service_id": "{YOUR_SERVICE_ID}",
  "session_id": "017c25ec12c00a304474a999",
  "status": "completed"
}
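Beyond pretty-printing, jq can pull individual fields out of a response shaped like the one above. The following sketch assumes jq is installed and that the completed response has been saved to a file; the function names transcript and low_confidence_tokens, the file name result.json, and the 0.8 threshold are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Print the full transcription from a completed job JSON read from stdin.
transcript() {
  jq -r '.results.text'
}

# Print tokens whose confidence is below the threshold given as $1.
# Output is tab-separated: starttime, endtime, confidence, written form.
low_confidence_tokens() {
  jq -r --argjson min "$1" '
    .results.segments[].results[].tokens[]
    | select(.confidence < $min)
    | [.starttime, .endtime, .confidence, .written]
    | @tsv'
}

# Example, with the completed response saved as result.json:
#   transcript < result.json
#   low_confidence_tokens 0.8 < result.json
```

Listing low-confidence tokens is one way to find the parts of a long recording that are most worth checking by ear.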

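Next steps

The usage guide explains how to convert speech to text with AmiVoice API, including the asynchronous HTTP interface used here.
In the usage guide, see "Request Parameters" for the parameters you can set on a request, "Speech Recognition Results" for details of the response, and "Asynchronous HTTP Interface" for the asynchronous HTTP interface of AmiVoice API.
For the API reference, see "Asynchronous HTTP Interface".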
