Transcribing short audio files
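You can easily turn a short audio file you have on hand (16 MB or smaller) into text simply by sending it to the HTTP endpoint of the AmiVoice API. Rather than writing a program, this tutorial explains how to use the API with the curl and jq commands. Longer audio files are covered in the next tutorial, "Transcribing long audio files".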
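Before sending a file, you can check that it is within the short-audio size limit. The following is a minimal Python sketch. It assumes "16 MB" means 16 MiB (16 * 1024 * 1024 bytes), which you should confirm in the AmiVoice API documentation. The helper name is_short_audio is hypothetical.

```python
import os

# Assumption: "16MB" is read as 16 MiB. Confirm the exact byte limit in the AmiVoice API docs.
SHORT_AUDIO_LIMIT_BYTES = 16 * 1024 * 1024


def is_short_audio(path, limit=SHORT_AUDIO_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, is_short_audio("test.wav") returning False means the file belongs in the long-audio workflow instead.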
Preparation
To follow this tutorial, you need the following:
- curl
- jq
- An AmiVoice API registration and your APPKEY
- An audio file to transcribe
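We use the jq command to format the results so they are easier to read. jq is optional: the transcription steps later in this tutorial work without it, so you can continue without installing it.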
curl
Check whether the curl command is installed on your system.
curl -V
If no version is displayed, download the package for your operating system from https://curl.se/ or install curl with your package manager.
jq
Check whether the jq command is installed on your system.
jq -V
If no version is displayed, download the package for your operating system from https://stedolan.github.io/jq/ or install jq with your package manager.
Getting the APPKEY
- Register for the AmiVoice API.
- Log in to My Page and note the APPKEY listed under [General connection information] on the [Connection information] tab.
The AmiVoice Tech Blog describes the steps for getting an APPKEY in detail. For more on getting an APPKEY, see also "Try the AmiVoice API".
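Once you move from copy-pasted commands to a program, you may prefer not to hard-code the APPKEY in source files. A minimal Python sketch that reads it from an environment variable follows. The variable name AMIVOICE_APPKEY and the helper load_appkey are hypothetical; AmiVoice does not define them.

```python
import os


def load_appkey(var="AMIVOICE_APPKEY"):
    """Read the APPKEY from an environment variable (the name is a hypothetical choice)."""
    key = os.environ.get(var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError(f"Set {var} to the APPKEY shown on My Page")
    return key
```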
Audio file
Prepare the audio file you want to transcribe. Here we use the audio file (test.wav) that ships with the sample programs of the client library.
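If your file is a WAV file, Python's standard wave module can show its basic properties before you upload it. The following is a sketch only. describe_wav is a hypothetical helper, and the wave module reads only uncompressed PCM WAV files. This section does not say which formats and sample rates the API accepts, so check the AmiVoice documentation for that.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, sample width and duration of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_sec": frames / rate,
        }
```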
Running the request
Open a terminal, then copy and run the following command. Replace test.wav with the path to the audio file you prepared, and replace {APP_KEY} with your own APPKEY.
curl https://acp-api.amivoice.com/v1/recognize \
-F d=-a-general \
-F u={APP_KEY} \
-F a=@test.wav
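The same request can also be sent from a program using only Python's standard library. The following is a minimal sketch. It assumes the form fields d, u and a behave exactly as in the curl command above, and it sends the audio part with Content-Type application/octet-stream. That content type is an assumption. The audio part is written last, mirroring the order of the curl command. build_multipart and recognize are hypothetical names.

```python
import json
import os
import urllib.request
import uuid

# Endpoint taken from the curl example above.
ENDPOINT = "https://acp-api.amivoice.com/v1/recognize"


def build_multipart(fields, file_field, file_path):
    """Encode text fields plus one file as multipart/form-data (file part last).

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields:
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    with open(file_path, "rb") as f:
        data = f.read()
    filename = os.path.basename(file_path)
    chunks.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + data
        + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def recognize(file_path, appkey, engine="-a-general", timeout=120):
    """POST d, u and a as in the curl example and return the parsed JSON response."""
    body, content_type = build_multipart([("d", engine), ("u", appkey)], "a", file_path)
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: result = recognize("test.wav", "YOUR_APPKEY"). Because json.loads already decodes the escape sequences, result["text"] holds readable text.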
Result
If the request succeeds, you get a JSON result like the following:
{"results":[{"tokens":[{"written":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2","confidence":1.00,"starttime":522,"endtime":1578,"spoken":"\u3042\u3069\u3070\u3093\u3059\u3068\u3081\u3067\u3043\u3042"},{"written":"\u306f","confidence":1.00,"starttime":1578,"endtime":1866,"spoken":"\u306f"},{"written":"\u3001","confidence":0.72,"starttime":1866,"endtime":2026,"spoken":"_"},{"written":"\u4eba","confidence":1.00,"starttime":2026,"endtime":2314,"spoken":"\u3072\u3068"},{"written":"\u3068","confidence":1.00,"starttime":2314,"endtime":2426,"spoken":"\u3068"},{"written":"\u6a5f\u68b0","confidence":1.00,"starttime":2426,"endtime":2826,"spoken":"\u304d\u304b\u3044"},{"written":"\u3068","confidence":1.00,"starttime":2826,"endtime":2938,"spoken":"\u3068"},{"written":"\u306e","confidence":1.00,"starttime":2938,"endtime":3082,"spoken":"\u306e"},{"written":"\u81ea\u7136","confidence":1.00,"starttime":3082,"endtime":3434,"spoken":"\u3057\u305c\u3093"},{"written":"\u306a","confidence":1.00,"starttime":3434,"endtime":3530,"spoken":"\u306a"},{"written":"\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3","confidence":1.00,"starttime":3530,"endtime":4378,"spoken":"\u3053\u307f\u3085\u306b\u3051\u30fc\u3057\u3087\u3093"},{"written":"\u3092","confidence":1.00,"starttime":4378,"endtime":4442,"spoken":"\u3092"},{"written":"\u5b9f\u73fe","confidence":1.00,"starttime":4442,"endtime":4922,"spoken":"\u3058\u3064\u3052\u3093"},{"written":"\u3057","confidence":1.00,"starttime":4922,"endtime":5434,"spoken":"\u3057"},{"written":"\u3001","confidence":0.45,"starttime":5434,"endtime":5562,"spoken":"_"},{"written":"\u8c4a\u304b","confidence":1.00,"starttime":5562,"endtime":5994,"spoken":"\u3086\u305f\u304b"},{"written":"\u306a","confidence":1.00,"starttime":5994,"endtime":6090,"spoken":"\u306a"},{"written":"\u672a\u6765","confidence":1.00,"starttime":6090,"endtime":6490,"spoken":"\u307f\u3089\u3044"},{"written":"\u3092","confidence":1.00,"starttime":6490,"endtime":6554,"spoken":"\u3092"},{"written":"\u5275\u9020","confidence":0.93,"starttime":6554,"endtime":7050,"spoken":"\u305d\u3046\u305e\u3046"},{"written":"\u3057\u3066","confidence":0.99,"starttime":7050,"endtime":7210,"spoken":"\u3057\u3066"},{"written":"\u3044\u304f","confidence":1.00,"starttime":7210,"endtime":7418,"spoken":"\u3044\u304f"},{"written":"\u3053\u3068","confidence":1.00,"starttime":7418,"endtime":7690,"spoken":"\u3053\u3068"},{"written":"\u3092","confidence":1.00,"starttime":7690,"endtime":7722,"spoken":"\u3092"},{"written":"\u76ee\u6307\u3057","confidence":0.76,"starttime":7722,"endtime":8090,"spoken":"\u3081\u3056\u3057"},{"written":"\u307e\u3059","confidence":0.76,"starttime":8090,"endtime":8506,"spoken":"\u307e\u3059"},{"written":"\u3002","confidence":0.82,"starttime":8506,"endtime":8794,"spoken":"_"}],"confidence":0.998,"starttime":250,"endtime":8794,"tags":[],"rulename":"","text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002"}],"utteranceid":"20220602/14/018122d637320a301bc194c9_20220602_141433","text":"\u30a2\u30c9\u30d0\u30f3\u30b9\u30c8\u30fb\u30e1\u30c7\u30a3\u30a2\u306f\u3001\u4eba\u3068\u6a5f\u68b0\u3068\u306e\u81ea\u7136\u306a\u30b3\u30df\u30e5\u30cb\u30b1\u30fc\u30b7\u30e7\u30f3\u3092\u5b9f\u73fe\u3057\u3001\u8c4a\u304b\u306a\u672a\u6765\u3092\u5275\u9020\u3057\u3066\u3044\u304f\u3053\u3068\u3092\u76ee\u6307\u3057\u307e\u3059\u3002","code":"","message":""}
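The Japanese text in the recognition result is written as Unicode escape sequences (for example, \u30a2). The JSON parser built into your programming language restores it easily; here we convert it with the jq command. This version of the command passes d and u as query parameters in the URL rather than as form fields. As before, replace test.wav with your audio file and <APPKEY> with your APPKEY.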
curl -F a=@test.wav "https://acp-api.amivoice.com/v1/recognize?d=-a-general&u=<APPKEY>" | jq
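This time, the Japanese in the recognition result should be displayed in readable form, with indentation. Look for text in the result: it holds the transcription of the audio.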
"text": "アドバンスト・メディアは、人と機械との自然なコミュニケーションを実現し、豊かな未来を創造していくことを目指します。"
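In a program, the JSON parser does the decoding that jq performed above. The following is a minimal Python sketch. json.loads turns the escape sequences back into Japanese text. get_transcript returns the top-level text field. pretty re-serializes with ensure_ascii=False, which gives output similar to jq's. In the successful example above, code and message are empty strings. Treating a non-empty code as an error is an assumption based on that example. The function names are hypothetical.

```python
import json


def get_transcript(raw):
    """Parse a raw response string and return its top-level "text" field."""
    resp = json.loads(raw)  # \uXXXX escapes become ordinary characters here
    # Assumption: "code" is empty on success (as in the example) and non-empty on failure.
    if resp.get("code"):
        raise RuntimeError(
            f"recognition failed: code={resp['code']!r}, message={resp.get('message', '')!r}"
        )
    return resp["text"]


def pretty(raw):
    """Re-indent the response and keep non-ASCII characters readable, similar to jq."""
    return json.dumps(json.loads(raw), ensure_ascii=False, indent=2)
```

For example, you can save the response with curl's -o result.json option and then pass the file's contents to get_transcript.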
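A complete example response is shown below. Besides the transcription itself, the response contains word-level results, their timing within the audio, confidence scores, and more. For details, see "Speech recognition results".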
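The word-level details are in the tokens array inside each entry of results. The following sketch flattens those tokens and picks out low-confidence ones, for example so they can be reviewed by hand. The 0.8 threshold and the function names are hypothetical. The code also treats starttime and endtime as milliseconds. That is an assumption, so confirm it in "Speech recognition results".

```python
def all_tokens(resp):
    """Flatten the word-level tokens from every entry of resp["results"]."""
    return [tok for result in resp.get("results", []) for tok in result.get("tokens", [])]


def low_confidence(resp, threshold=0.8):
    """Return (written, starttime, endtime, confidence) for tokens below the threshold."""
    return [
        (t["written"], t["starttime"], t["endtime"], t["confidence"])
        for t in all_tokens(resp)
        if t["confidence"] < threshold
    ]


def print_tokens(resp):
    """Print one line per token, converting times to seconds (assumes milliseconds)."""
    for t in all_tokens(resp):
        start = t["starttime"] / 1000
        end = t["endtime"] / 1000
        print(f"{start:7.3f}-{end:7.3f}s  {t['confidence']:.2f}  {t['written']}")
```

Consider the raw example response shown earlier. With the default threshold, low_confidence flags four tokens: 、 (0.72), 、 (0.45), 目指し (0.76) and ます (0.76).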
Response
{
"results": [
{
"tokens": [
{
"written": "アドバンスト・メディア",
"confidence": 1,
"starttime": 522,
"endtime": 1578,
"spoken": "あどばんすとめでぃあ"
},
{
"written": "は",
"confidence": 1,
"starttime": 1578,
"endtime": 1866,
"spoken": "は"
},
{
"written": "、",
"confidence": 0.72,
"starttime": 1866,
"endtime": 2026,
"spoken": "_"
},
{
"written": "人",
"confidence": 1,
"starttime": 2026,
"endtime": 2314,
"spoken": "ひと"
},
{
"written": "と",
"confidence": 1,
"starttime": 2314,
"endtime": 2426,
"spoken": "と"
},
{
"written": "機械",
"confidence": 1,
"starttime": 2426,
"endtime": 2826,
"spoken": "きかい"
},
{
"written": "と",
"confidence": 1,
"starttime": 2826,
"endtime": 2938,
"spoken": "と"
},
{
"written": "の",
"confidence": 1,
"starttime": 2938,
"endtime": 3082,
"spoken": "の"
},
{
"written": "自然",
"confidence": 1,
"starttime": 3082,
"endtime": 3434,
"spoken": "しぜん"
},
{
"written": "な",
"confidence": 1,
"starttime": 3434,
"endtime": 3530,
"spoken": "な"
},
{
"written": "コミュニケーション",
"confidence": 1,
"starttime": 3530,
"endtime": 4378,
"spoken": "こみゅにけーしょん"
},
{
"written": "を",
"confidence": 1,
"starttime": 4378,
"endtime": 4442,
"spoken": "を"
},
{
"written": "実現",
"confidence": 1,
"starttime": 4442,
"endtime": 4922,
"spoken": "じつげん"
},
{
"written": "し",
"confidence": 1,
"starttime": 4922,
"endtime": 5434,
"spoken": "し"
},
{
"written": "、",
"confidence": 0.45,
"starttime": 5434,
"endtime": 5562,
"spoken": "_"
},
{
"written": "豊か",
"confidence": 1,
"starttime": 5562,
"endtime": 5994,
"spoken": "ゆたか"
},
{
"written": "な",
"confidence": 1,
"starttime": 5994,
"endtime": 6090,
"spoken": "な"
},
{
"written": "未来",
"confidence": 1,
"starttime": 6090,
"endtime": 6490,
"spoken": "みらい"
},
{
"written": "を",
"confidence": 1,
"starttime": 6490,
"endtime": 6554,
"spoken": "を"
},
{
"written": "創造",
"confidence": 0.93,
"starttime": 6554,
"endtime": 7050,
"spoken": "そうぞう"
},
{
"written": "