[ Linkit Smart 7688 ] Using Microsoft Bing's Cognitive Services Speech Recognition from Python

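Microsoft offers many cloud services, for example Bing Speech, Language Understanding Intelligent Service, Text Analytics, and Speaker Recognition. In this post the Linkit Smart 7688 sends recorded speech to Microsoft, the Cognitive Services Speech Recognition service converts the speech to text, and the text is returned to the Linkit Smart 7688.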


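Prerequisites

  1. Prepare a Linkit Smart 7688 development board

  2. Connect the Linkit Smart 7688 to your computer

  3. Update the firmware to 0.9.3
    http://goo.gl/dVLQ2Y

  4. Attach a USB sound card to the Linkit Smart 7688 through an OTG cable

  5. Plug a microphone and a speaker into the external USB sound card

  6. Install the packages required by the USB sound card
    REF: http://goo.gl/D5rHtu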


Wiring diagram: Linkit Smart 7688 connected to the external USB sound card
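
Once the sound card is working and a test clip has been recorded, it may be worth confirming that the file really is sampled at 16 kHz, because the recognition request later in this post declares samplerate=16000 in its Content-type header. The following is a minimal sketch using only the standard wave module. The helper names (describe_wav, matches_request_header, write_silence) and the demo file are made up for illustration, and the silent clip exists only to have something to inspect.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) of a PCM WAV file."""
    w = wave.open(path, "rb")
    try:
        return w.getnchannels(), w.getsampwidth(), w.getframerate()
    finally:
        w.close()


def matches_request_header(path, expected_rate=16000):
    """True if the file's sample rate equals the rate declared in the request."""
    return describe_wav(path)[2] == expected_rate


def write_silence(path, rate=16000, seconds=1):
    """Write a mono 16-bit silent clip (demo helper, not part of the original post)."""
    w = wave.open(path, "wb")
    try:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    finally:
        w.close()


# Demo: create a throwaway clip and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
print(describe_wav(demo_path))
```

On the board, calling describe_wav('turn_on_the_headlights.wav') on your own recording would show whether it needs to be re-recorded or resampled before it is uploaded.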


Microsoft side

Step 1. Sign up for an account on the Microsoft website

https://www.microsoft.com/cognitive-services/


Step 2. Sign in to the Microsoft Cognitive Services website

https://www.microsoft.com/cognitive-services/


Step 3. Click "Get started for free"


Step 4. Select "Bing Speech - Preview", check "I agree to the Microsoft Cognitive Services Terms and Microsoft Privacy Statement", then click "Subscribe"


Linkit Smart 7688 side

Step 1. SSH into the Linkit Smart 7688



Step 2. Obtain a Microsoft token on the Linkit Smart 7688

See the post "Obtaining a Microsoft Cognitive Services Token".
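
The script in Step 3 uses a variable named accessToken without defining it, so the token from this step has to reach that script somehow. One possible way, sketched below, is to export the token into an environment variable and build the request headers from it. The variable name SPEECH_ACCESS_TOKEN and the helpers token_from_env and build_headers are hypothetical; only the header fields themselves come from the Step 3 script.

```python
import os


def token_from_env(name="SPEECH_ACCESS_TOKEN"):
    """Read the token from an environment variable (the name is an assumption)."""
    return os.environ.get(name, "").strip()


def build_headers(access_token, sample_rate=16000):
    """Build the same two headers the Step 3 script sends."""
    if not access_token:
        raise ValueError("access token is empty; obtain one first (Step 2)")
    return {
        "Content-type": "audio/wav; samplerate=%d" % sample_rate,
        "Authorization": "Bearer " + access_token,
    }
```

Failing early on an empty token tends to be easier to debug than an authorization error coming back from the service.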


Step 3. Run speech recognition on the Linkit Smart 7688
# Python 2 script (uses httplib and print statements)
import httplib
import json
import uuid

# Token obtained in Step 2; replace this placeholder with the real token.
accessToken = "YOUR_ACCESS_TOKEN"

getUUID = uuid.uuid4()
print "getUUID: " + str(getUUID)

# Read the recorded WAV file that will be sent for recognition.
with open('turn_on_the_headlights.wav', 'rb') as f:
    wavBody = f.read()

requestid = str(getUUID)   # this can be any unique GUID
appid = "D4D52672-91D7-4C74-8AD8-42B1D98141A5"
locale = "en-US"
deviceOS = "linux"
version = "3.0"
instanceid = str(getUUID)  # this can be any unique GUID
headers = {"Content-type": "audio/wav; samplerate=16000", "Authorization": "Bearer " + accessToken}

# POST the audio to the Bing Speech recognition endpoint over HTTPS.
conn = httplib.HTTPSConnection("speech.platform.bing.com")
conn.request("POST", "/recognize/query?scenarios=ulm&appid=" + appid + "&locale=" + locale + "&device.os=" + deviceOS + "&version=" + version + "&format=json&requestid=" + requestid + "&instanceid=" + instanceid, wavBody, headers)
response = conn.getresponse()
print response.status, response.reason
data = response.read()
print data
conn.close()

# Parse the JSON reply and print the recognized text of the first result.
decodejson = json.loads(data)
print "\n" + decodejson["results"][0]["lexical"]
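
The request line above assembles the query string by hand with string concatenation. As an alternative sketch, the same parameters can be passed through urlencode, which also takes care of escaping if a value ever contains special characters. The function name build_recognize_path and its optional request_id argument are hypothetical; the parameter names and values are the ones used in the script. The import fallback is there so that the snippet runs on both the board's Python 2 and on Python 3.

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3
import uuid


def build_recognize_path(appid, locale="en-US", device_os="linux",
                         version="3.0", request_id=None):
    """Return the path and query string for the recognition request."""
    if request_id is None:
        request_id = str(uuid.uuid4())  # any unique GUID
    params = [
        ("scenarios", "ulm"),
        ("appid", appid),
        ("locale", locale),
        ("device.os", device_os),
        ("version", version),
        ("format", "json"),
        ("requestid", request_id),
        ("instanceid", request_id),     # the script reuses the same GUID here
    ]
    return "/recognize/query?" + urlencode(params)


print(build_recognize_path("D4D52672-91D7-4C74-8AD8-42B1D98141A5"))
```

The returned string could then be passed as the second argument of conn.request in place of the concatenated one.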


Step 4. Run the Python code
python speech_recognition.py  
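
The last line of the script indexes decodejson["results"][0]["lexical"] directly, so it raises an exception whenever the reply is not JSON or carries no result (silence or an expired token might be possible causes). A more defensive version might look like the sketch below. The helper extract_lexical is hypothetical, and the sample reply is invented for the demo: it contains only the one field the script reads and is not a captured response from the service.

```python
import json


def extract_lexical(body):
    """Return the first result's 'lexical' text, or None if it is missing."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    try:
        payload = json.loads(body)
    except ValueError:
        return None                     # reply was not valid JSON
    if not isinstance(payload, dict):
        return None
    results = payload.get("results")
    if not isinstance(results, list) or not results:
        return None                     # no recognition result returned
    first = results[0]
    if not isinstance(first, dict):
        return None
    return first.get("lexical")


# Made-up reply for illustration only.
sample = '{"results": [{"lexical": "turn on the headlights"}]}'
print(extract_lexical(sample))
```

In the script, the final print could then check for None and show the raw data instead of crashing.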



Archer

A full-stack engineer with experience in C, Python, Objective-C, Swift, Node.js, Lua, Linkit Smart 7688, Raspberry Pi, ARM mbed, Arduino, and IoT solutions.

ALL RIGHTS RESERVED. COPYRIGHT © 2016. Designed and Coded by Makee.io