Recognizing multiple keywords using PocketSphinx

Question

I have installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I can't work out how to add recognition of multiple words.

All I want is for the code to recognize single words, which I can then switch() on in the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.
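For illustration, the switch() step could look something like the sketch below. It is plain Java with no PocketSphinx dependency; `CommandDispatcher`, `Command` and `toCommand` are hypothetical names, and the input string simply stands in for whatever text the recognizer reports.

```java
import java.util.Locale;

/** Hypothetical helper: maps a recognized word to a movement command. */
final class CommandDispatcher {

    /** Commands the application understands; UNKNOWN covers everything else. */
    enum Command { UP, DOWN, LEFT, RIGHT, UNKNOWN }

    /** Normalizes the recognized text and switches on it. */
    static Command toCommand(String recognizedText) {
        if (recognizedText == null) {
            return Command.UNKNOWN;
        }
        // Trim and lower-case defensively in case the text carries
        // surrounding whitespace or mixed case.
        switch (recognizedText.trim().toLowerCase(Locale.ROOT)) {
            case "up":    return Command.UP;
            case "down":  return Command.DOWN;
            case "left":  return Command.LEFT;
            case "right": return Command.RIGHT;
            default:      return Command.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(toCommand("left"));   // prints LEFT
        System.out.println(toCommand("hello"));  // prints UNKNOWN
    }
}
```

Anything that is not one of the four words falls through to UNKNOWN, so the caller can ignore it.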

Any help with this would be appreciated. I have spotted other users with similar problems, but nobody seems to know the answer so far.

One thing that baffles me is why we need to use the "wakeup" constant at all:

    private static final String KWS_SEARCH = "wakeup";
    private static final String KEYPHRASE = "oh mighty computer";
    . . .
    recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

What has "wakeup" got to do with anything?

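I have made some progress (?): using addGrammarSearch I can use a .gram file to list my words, e.g. up, down, left, right, forwards, backwards, which seems to work well if all I say are those particular words. However, any other word causes the system to match what was said to the "nearest" word in the list. Ideally, I don't want recognition to occur at all if the spoken word is not in the .gram file...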

Nikolay Shmyrev · Accepted Answer

You can use addKeywordSearch, which takes a file of keyphrases. Put one phrase per line, with the threshold for each phrase between slashes (/.../), for example:

    up /1.0/
    down /1.0/
    left /1.0/
    right /1.0/
    forwards /1e-1/

The thresholds must be chosen so as to avoid false alarms.
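A keyword list in this format can also be generated rather than typed by hand. Here is a minimal sketch, assuming the one-phrase-per-line layout shown above. `KeywordListWriter` and its methods are hypothetical helpers, not part of PocketSphinx, and the thresholds are kept as strings so that notation such as 1e-1 is written out unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that writes "phrase /threshold/" lines to a file. */
final class KeywordListWriter {

    /** Formats a single entry, e.g. ("up", "1.0") becomes "up /1.0/". */
    static String formatLine(String phrase, String threshold) {
        return phrase + " /" + threshold + "/";
    }

    /** Formats every entry of the map, preserving its iteration order. */
    static List<String> formatLines(Map<String, String> thresholds) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : thresholds.entrySet()) {
            lines.add(formatLine(entry.getKey(), entry.getValue()));
        }
        return lines;
    }

    /** Writes the formatted lines to the target file, one phrase per line. */
    static void write(Path target, Map<String, String> thresholds) throws IOException {
        Files.write(target, formatLines(thresholds), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the phrases in insertion order.
        Map<String, String> thresholds = new LinkedHashMap<>();
        thresholds.put("up", "1.0");
        thresholds.put("down", "1.0");
        thresholds.put("forwards", "1e-1");

        Path out = Files.createTempFile("keywords", ".list");
        write(out, thresholds);
        System.out.println(String.join("\n", Files.readAllLines(out)));
    }
}
```

Keeping the per-phrase thresholds in one map makes it easier to adjust them while testing for false alarms.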

Pixel · Answer

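Thanks to Nikolay's tip (see his answer above), I have developed the following code, which works fine and does not recognize words unless they are on the list. You can copy and paste it directly into the main class of the PocketSphinxDemo code: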

    // Assumes the imports already present in the demo's main class
    // (including the static imports of makeText and defaultSetup).
    public class PocketSphinxActivity extends Activity implements RecognitionListener {

        private static final String DIGITS_SEARCH = "digits";
        private SpeechRecognizer recognizer;

        @Override
        public void onCreate(Bundle state) {
            super.onCreate(state);
            setContentView(R.layout.main);
            ((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");
            try {
                Assets assets = new Assets(PocketSphinxActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                // oops
            }
            ((TextView) findViewById(R.id.caption_text))
                    .setText("Say up, down, left, right, forwards, backwards");
            reset();
        }

        @Override
        public void onPartialResult(Hypothesis hypothesis) {
        }

        @Override
        public void onResult(Hypothesis hypothesis) {
            ((TextView) findViewById(R.id.result_text)).setText("");
            if (hypothesis != null) {
                // Show the recognized keyword.
                String text = hypothesis.getHypstr();
                makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
            }
        }

        @Override
        public void onBeginningOfSpeech() {
        }

        @Override
        public void onEndOfSpeech() {
            reset();
        }

        private void setupRecognizer(File assetsDir) {
            File modelsDir = new File(assetsDir, "models");
            recognizer = defaultSetup()
                    .setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                    .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                    .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                    .getRecognizer();
            recognizer.addListener(this);

            // Despite the .gram name, this file is a keyword list, not a grammar.
            File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
            recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
        }

        // Stop the current search and start listening for keywords again.
        private void reset() {
            recognizer.stop();
            recognizer.startListening(DIGITS_SEARCH);
        }
    }

Your digits.gram file should look something like this:

    up /1e-1/
    down /1e-1/
    left /1e-1/
    right /1e-1/
    forwards /1e-1/
    backwards /1e-1/

You should experiment with the thresholds between the slashes (/.../) to tune performance. Here 1e-1 represents 0.1 (I think), and I think the maximum is 1.0.
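To check how a threshold such as 1e-1 reads as a number, a line of the file can be parsed in plain Java. This is only a sketch: `KeywordLine` is a hypothetical class, and the regular expression reflects nothing more than the "phrase /threshold/" layout used in this answer, not an official definition of the file format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for one "phrase /threshold/" line. */
final class KeywordLine {

    // Phrase, optional whitespace, then the threshold between two slashes.
    private static final Pattern LINE = Pattern.compile("^(.+?)\\s*/([^/]+)/\\s*$");

    final String phrase;
    final double threshold;

    private KeywordLine(String phrase, double threshold) {
        this.phrase = phrase;
        this.threshold = threshold;
    }

    /** Parses a line such as "forwards /1e-1/"; rejects anything else. */
    static KeywordLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a 'phrase /threshold/' line: " + line);
        }
        // Double.parseDouble understands scientific notation, so "1e-1" is 0.1.
        return new KeywordLine(m.group(1), Double.parseDouble(m.group(2).trim()));
    }

    public static void main(String[] args) {
        KeywordLine entry = parse("forwards /1e-1/");
        System.out.println(entry.phrase);     // prints forwards
        System.out.println(entry.threshold);  // prints 0.1
    }
}
```

So 1e-1 is indeed 0.1 in ordinary scientific notation. Which values work well in practice still has to be found by experiment.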

And it's 5:30 pm, so I can stop working now. Result.

portsample · Answer

I am working on updating Antinous' modification of the PocketSphinx demo so that it runs in Android Studio. This is what I have so far:

    // Note: change MainActivity to PocketSphinxActivity for demo use...
    public class MainActivity extends Activity implements RecognitionListener {

        private static final String DIGITS_SEARCH = "digits";
        private SpeechRecognizer recognizer;

        /* Used to handle permission request */
        private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;

        @Override
        public void onCreate(Bundle state) {
            super.onCreate(state);
            setContentView(R.layout.main);
            ((TextView) findViewById(R.id.caption_text))
                    .setText("Preparing the recognizer");

            // Check if user has given permission to record audio
            int permissionCheck = ContextCompat.checkSelfPermission(
                    getApplicationContext(), Manifest.permission.RECORD_AUDIO);
            if (permissionCheck != PackageManager.PERMISSION_GRANTED) {
                ActivityCompat.requestPermissions(this,
                        new String[]{Manifest.permission.RECORD_AUDIO},
                        PERMISSIONS_REQUEST_RECORD_AUDIO);
                // Returns without setting up the recognizer; the permission
                // result is not handled anywhere yet.
                return;
            }

            // Sync the assets and set up the recognizer off the UI thread.
            new AsyncTask<Void, Void, Exception>() {
                @Override
                protected Exception doInBackground(Void... params) {
                    try {
                        Assets assets = new Assets(MainActivity.this);
                        File assetDir = assets.syncAssets();
                        setupRecognizer(assetDir);
                    } catch (IOException e) {
                        return e;
                    }
                    return null;
                }

                @Override
                protected void onPostExecute(Exception result) {
                    if (result != null) {
                        ((TextView) findViewById(R.id.caption_text))
                                .setText("Failed to init recognizer " + result);
                    } else {
                        reset();
                    }
                }
            }.execute();

            ((TextView) findViewById(R.id.caption_text))
                    .setText("Say one, two, three, four, five, six...");
        }

        /**
         * In partial result we get quick updates about current hypothesis. In
         * keyword spotting mode we can react here, in other modes we need to wait
         * for final result in onResult.
         */
        @Override
        public void onPartialResult(Hypothesis hypothesis) {
            if (hypothesis == null || recognizer == null) {
                return;
            }
            //recognizer.rapidSphinxPartialResult(hypothesis.getHypstr());
            String text = hypothesis.getHypstr();
            // Note: this compares the text with the search name "digits",
            // not with a keyword listed in digits.gram.
            if (text.equals(DIGITS_SEARCH)) {
                recognizer.cancel();
                performAction();
                recognizer.startListening(DIGITS_SEARCH);
            } else {
                //Toast.makeText(getApplicationContext(), "Partial result = " + text, Toast.LENGTH_SHORT).show();
            }
        }

        @Override
        public void onResult(Hypothesis hypothesis) {
            ((TextView) findViewById(R.id.result_text)).setText("");
            if (hypothesis != null) {
                String text = hypothesis.getHypstr();
                makeText(getApplicationContext(), "Hypothesis" + text, Toast.LENGTH_SHORT).show();
            } else {
                makeText(getApplicationContext(), "hypothesis = null", Toast.LENGTH_SHORT).show();
            }
        }

        @Override
        public void onDestroy() {
            super.onDestroy();
            // The recognizer is still null if onCreate returned early
            // to request the audio permission.
            if (recognizer != null) {
                recognizer.cancel();
                recognizer.shutdown();
            }
        }

        @Override
        public void onBeginningOfSpeech() {
        }

        @Override
        public void onEndOfSpeech() {
            reset();
        }

        @Override
        public void onTimeout() {
        }

        private void setupRecognizer(File assetsDir) throws IOException {
            // The recognizer can be configured to perform multiple searches
            // of different kind and switch between them
            recognizer = defaultSetup()
                    .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                    .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                    // .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                    .getRecognizer();
            recognizer.addListener(this);

            File digitsGrammar = new File(assetsDir, "digits.gram");
            recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
        }

        // Stop the current search and start listening for keywords again.
        private void reset() {
            recognizer.stop();
            recognizer.startListening(DIGITS_SEARCH);
        }

        @Override
        public void onError(Exception error) {
            ((TextView) findViewById(R.id.caption_text)).setText(error.getMessage());
        }

        public void performAction() {
            // do here whatever you want
            makeText(getApplicationContext(), "performAction done... ", Toast.LENGTH_SHORT).show();
        }
    }

Caveat emptor: this is a work in progress. Check back later. Suggestions would be appreciated.