Speech recognition and Python

Question

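I want to know where one could start with speech recognition. Not with a library or anything that is fairly "black-boxed"; instead, I want to know where I can actually write a simple speech recognition script myself. I have done some searching and did not find much, but what I have seen is that there are dictionaries of "sounds" or syllables that can be pieced together to form text. So basically my question is: where can I get started with this?

Also, since this is a little optimistic, I would be fine with a library (for now) to use in my program. I noticed that some speech-to-text libraries and APIs return only one result. That is OK, but it would be unreliable. My current program already checks the grammar and everything else of any text entered, so if I had, say, the top ten results from the speech-to-text software, it could check each one and rule out any that don't make sense.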

dr. Neox · Accepted Answer

UPDATE: This no longer works, because Google closed its platform.

-

You can use https://pypi.python.org/pypi/pygsr

$> pip install pygsr

Example usage:

    from pygsr import Pygsr

    speech = Pygsr()
    # duration in seconds
    speech.record(3)
    # select the language
    phrase, complete_response = speech.speech_to_text('en_US')
    print(phrase)

alexis · Answer

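If you really want to understand speech recognition from the ground up, look for a good signal processing package for Python and then read up on speech recognition independently of the software.

But speech recognition is an extremely complex problem (basically because sounds interact in all sorts of ways when we talk). Even if you start with the best speech recognition library you can get your hands on, you will by no means find yourself with nothing more to do.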
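As a rough idea of what the "ground up" route starts with, here is a minimal sketch of the very first signal-processing step: cutting audio into short overlapping frames and computing two simple per-frame features (log energy and zero-crossing count). It uses only the standard library and a synthetic signal; the frame sizes, sample rate, and helper names are illustrative assumptions, and a real recognizer would go on to compute richer features and feed them into a statistical model.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n, used to soften the frame edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def log_energy(frame, window):
    """Log of the windowed frame energy (small offset avoids log(0))."""
    energy = sum((s * w) ** 2 for s, w in zip(frame, window))
    return math.log(energy + 1e-12)

def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

# Synthetic test signal (an assumption for illustration):
# 0.1 s of silence followed by 0.1 s of a 440 Hz tone, sampled at 8 kHz.
RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 440 * t / RATE) for t in range(800)]
signal = silence + tone

FRAME_LEN, HOP = 200, 80  # 25 ms frames with a 10 ms hop at 8 kHz
window = hamming(FRAME_LEN)
frames = frame_signal(signal, FRAME_LEN, HOP)
features = [(log_energy(f, window), zero_crossings(f)) for f in frames]

for index, (energy, crossings) in enumerate(features):
    print(index, round(energy, 2), crossings)
```

Frames covering the silent part show very low energy and no zero crossings, while frames covering the tone show much higher values; telling speech from silence this way is about the simplest thing such features allow.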

toine · Answer

Pocketsphinx is also a good alternative. There are Python bindings provided through SWIG that make it easy to integrate into a script.

For example:

    from __future__ import print_function  # lets the print() calls below run on Python 2 as well

    from itertools import islice
    from os import path

    from pocketsphinx import *
    from sphinxbase import *

    MODELDIR = "../../../model"
    DATADIR = "../../../test/data"

    # Create a decoder with a specific acoustic model, language model and dictionary.
    config = Decoder.default_config()
    config.set_string('-hmm', path.join(MODELDIR, 'hmm/en_US/hub4wsj_sc_8k'))
    config.set_string('-lm', path.join(MODELDIR, 'lm/en_US/hub4.5000.DMP'))
    config.set_string('-dict', path.join(MODELDIR, 'lm/en_US/hub4.5000.dic'))
    decoder = Decoder(config)

    # Decode a static file.
    decoder.decode_raw(open(path.join(DATADIR, 'goforward.raw'), 'rb'))

    # Retrieve the best hypothesis.
    hypothesis = decoder.hyp()
    print('Best hypothesis: ', hypothesis.best_score, hypothesis.hypstr)
    print('Best hypothesis segments: ', [seg.word for seg in decoder.seg()])

    # Access the N best decodings (here the first 10).
    print('Best 10 hypothesis: ')
    for best in islice(decoder.nbest(), 10):
        print(best.hyp().best_score, best.hyp().hypstr)

    # Decode streaming data.
    decoder = Decoder(config)
    decoder.start_utt('goforward')
    stream = open(path.join(DATADIR, 'goforward.raw'), 'rb')
    while True:
        buf = stream.read(1024)
        if buf:
            decoder.process_raw(buf, False, False)
        else:
            break
    decoder.end_utt()
    print('Stream decoding result:', decoder.hyp().hypstr)
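The N-best list printed by the Pocketsphinx example is what the question's "check each result and rule out the ones that don't make sense" idea needs as input. A minimal sketch of that filtering step follows. The names `filter_hypotheses` and `toy_grammar_check`, the word list, and the scores are all made up for illustration; the asker's real grammar checker would take the place of the toy one. With Pocketsphinx, the `(score, text)` pairs could presumably be built from `best.hyp().best_score` and `best.hyp().hypstr` as in the loop above.

```python
def filter_hypotheses(hypotheses, is_valid, limit=10):
    """Return up to `limit` (score, text) pairs whose text passes is_valid.

    The input order (best hypothesis first) is preserved.
    """
    kept = []
    for score, text in hypotheses:
        if is_valid(text):
            kept.append((score, text))
        if len(kept) == limit:
            break
    return kept

# Hypothetical stand-in for a real grammar checker: accept a sentence
# only if every word is in a small known vocabulary.
KNOWN_WORDS = {"go", "forward", "ten", "meters"}

def toy_grammar_check(text):
    words = text.lower().split()
    return bool(words) and all(word in KNOWN_WORDS for word in words)

# Made-up N-best list, best score first (not real recognizer output).
nbest = [
    (-7000, "go forward ten meters"),
    (-7100, "go for word ten meters"),
    (-7200, "go forward ten meter"),
    (-7300, "go forward"),
]

accepted = filter_hypotheses(nbest, toy_grammar_check)
for score, text in accepted:
    print(score, text)
```

Keeping the scores alongside the text means the caller can still pick the highest-ranked hypothesis among those that survive the check.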

anatoly techtonik · Answer

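If you want to learn more about the subject of speech recognition in Python, take a look at the following link:

http://www.slideshare.net/mchua/sigproc-selfstudy-1732382 - signal processing in Python, including audio signals, which are the most interesting to play with.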

Noah Krasser · Answer

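I know the question is old, but just for people in the future:

I use the speech_recognition module and I love it. ~~The only thing is that it requires an Internet connection, because it uses Google to recognize the speech. But that shouldn't be a problem in most cases.~~ The recognition works almost perfectly.

EDIT:

The speech_recognition package can use more than just Google for the transcription, including CMU Sphinx (which allows offline recognition). The only difference is a subtle change in the recognize command: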

https://pypi.python.org/pypi/SpeechRecognition/

Here is a small code example:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:   # use the default microphone as the audio source
        audio = r.listen(source)      # listen for the first phrase and extract it into audio data

    try:
        print("You said " + r.recognize_google(audio))  # Google Speech Recognition - ONLINE
        print("You said " + r.recognize_sphinx(audio))  # CMU Sphinx - OFFLINE
    except sr.UnknownValueError:      # speech is unintelligible
        print("Could not understand audio")
    except sr.RequestError as e:      # recognizer unreachable or not installed
        print("Recognition request failed: {0}".format(e))
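On the question's wish for more than one result: `recognize_google` accepts a `show_all=True` argument that returns the raw response instead of a single string. The sketch below pulls the candidate transcripts out of such a response. The response layout (a dict with an `"alternative"` list of entries carrying `"transcript"`) is an assumption based on what the library has returned in practice and may change, and `FakeRecognizer` is only a hypothetical stand-in so the snippet runs without a microphone or network.

```python
def transcripts_from_response(response):
    """Extract transcript strings, best first, from a raw response.

    Assumed layout: {"alternative": [{"transcript": ...}, ...]}.
    Anything that is not a dict (for example an empty list when
    nothing was recognized) yields no transcripts.
    """
    if not isinstance(response, dict):
        return []
    return [alt["transcript"]
            for alt in response.get("alternative", [])
            if "transcript" in alt]

def recognize_alternatives(recognizer, audio):
    """Ask the recognizer for its raw response and return all transcripts."""
    return transcripts_from_response(
        recognizer.recognize_google(audio, show_all=True))

class FakeRecognizer(object):
    """Hypothetical stand-in for sr.Recognizer, returning canned data."""
    def recognize_google(self, audio, show_all=False):
        return {
            "alternative": [
                {"transcript": "go forward", "confidence": 0.9},
                {"transcript": "go for word"},
            ],
            "final": True,
        }

print(recognize_alternatives(FakeRecognizer(), audio=None))
```

In real use, the `r` and `audio` objects from the example above would be passed in place of the fake recognizer.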

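The only thing that doesn't work well for me is listening in an infinite loop. After a few minutes it hangs. (It doesn't crash; it just stops responding.)

EDIT: If you want to use the microphone without the infinite loop, you should specify the recording length. Example code:

    import speech_recognition as sr

    time_to_record = 5  # maximum phrase length in seconds (example value)

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Speak:")
        # stop recording after time_to_record seconds
        audio = r.listen(source, timeout=None, phrase_time_limit=time_to_record)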

tehmisvh · Answer

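Dragonfly provides a clean framework for speech recognition on Windows. Check its documentation for example usage. Since you aren't looking for the large range of features Dragonfly provides, you might want to take a look at the no-longer-maintained PySpeech library instead.

Its source code looks easy to understand, and that may be what you want to look at first.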