Speech recognition on iOS 10

Code I wrote

Using
http://dev.classmethod.jp/smartphone/iphone/try-ios10-speech-recognizer/
as a reference, I wrote the following code with Xamarin.iOS.

C#
using System;
using AVFoundation;
using UIKit;
using Foundation;
using Speech;

namespace SpeechRecognizerSample
{
    public partial class ViewController : UIViewController
    {
        // Recognizer fixed to Japanese.
        readonly SFSpeechRecognizer speechRecognizer = new SFSpeechRecognizer(new NSLocale("ja-JP"));
        readonly AVAudioEngine audioEngine = new AVAudioEngine();
        SFSpeechAudioBufferRecognitionRequest recognitionRequest;
        SFSpeechRecognitionTask recognitionTask;

        protected ViewController(IntPtr handle) : base(handle)
        {
            // Note: this .ctor should not contain any initialization logic.
        }

        public override void ViewDidLoad()
        {
            base.ViewDidLoad();

            button.Enabled = true;
            button.SetTitle("音声認識スタート", UIControlState.Normal); // "Start speech recognition"

            // The button toggles between starting and stopping recognition.
            button.TouchUpInside += (_, e) =>
            {
                if (audioEngine.Running)
                {
                    // Stop capturing and tell the request that no more audio is coming.
                    audioEngine.Stop();
                    recognitionRequest?.EndAudio();
                    button.Enabled = false;
                    button.SetTitle("停止中", UIControlState.Disabled); // "Stopping"
                }
                else
                {
                    startRecording();
                    button.SetTitle("音声認識を中止", UIControlState.Normal); // "Stop speech recognition"
                }
            };
        }

        public override void ViewDidAppear(bool animated)
        {
            base.ViewDidAppear(animated);

            SFSpeechRecognizer.RequestAuthorization((authStatus) =>
            {
                // Update the UI on the main queue.
                NSOperationQueue.MainQueue.AddOperation(() =>
                {
                    switch (authStatus)
                    {
                        case SFSpeechRecognizerAuthorizationStatus.Authorized:
                            button.Enabled = true;
                            break;
                        case SFSpeechRecognizerAuthorizationStatus.Denied:
                            button.Enabled = false;
                            // "Access to speech recognition has been denied"
                            button.SetTitle("音声認識へのアクセスが拒否されています", UIControlState.Disabled);
                            break;
                        case SFSpeechRecognizerAuthorizationStatus.Restricted:
                            button.Enabled = false;
                            // "Speech recognition is not available on this device"
                            button.SetTitle("この端末で音声認識はできません", UIControlState.Disabled);
                            break;
                        case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                            button.Enabled = false;
                            // "Speech recognition has not been authorized yet"
                            button.SetTitle("音声認識はまだ許可されていません", UIControlState.Disabled);
                            break;
                    }
                });
            });
        }

        // Cancels any recognition task that is still running.
        void refreshTask()
        {
            recognitionTask?.Cancel();
            recognitionTask = null;
        }

        void startAudioEngine()
        {
            audioEngine.Prepare();
            NSError err;
            if (!audioEngine.StartAndReturnError(out err))
            {
                throw new Exception(err.Description);
            }

            label.Text = "どうぞしゃべってください"; // "Please speak"
        }

        void startRecording()
        {
            NSError err;
            refreshTask();

            // Configure the audio session for recording.
            // Note: the errors returned here are overwritten and never checked.
            var audioSession = AVAudioSession.SharedInstance();
            err = audioSession.SetCategory(AVAudioSessionCategory.Record);
            audioSession.SetMode(AVAudioSession.ModeMeasurement, out err);
            err = audioSession.SetActive(true, AVAudioSessionSetActiveOptions.NotifyOthersOnDeactivation);

            recognitionRequest = new SFSpeechAudioBufferRecognitionRequest();
            var inputNode = audioEngine.InputNode;
            if (inputNode == null)
            {
                throw new InvalidProgramException("Audio engine has no input node");
            }
            // Ask for intermediate results while the user is still speaking.
            recognitionRequest.ShouldReportPartialResults = true;

            recognitionTask = speechRecognizer.GetRecognitionTask(recognitionRequest, (result, error) =>
            {
                var isFinal = false;
                if (result != null)
                {
                    label.Text = result.BestTranscription.FormattedString;
                    isFinal = result.Final;
                }
                if (error != null || isFinal)
                {
                    // Recognition has finished or failed: tear down and reset the UI.
                    audioEngine.Stop();
                    inputNode.RemoveTapOnBus(0);

                    recognitionRequest = null;
                    recognitionTask = null;

                    button.Enabled = true;
                    button.SetTitle("音声認識スタート", UIControlState.Normal); // "Start speech recognition"
                }
            });

            // Feed microphone buffers into the recognition request.
            var recordingFormat = inputNode.GetBusOutputFormat(0);
            inputNode.InstallTapOnBus(0, 1024, recordingFormat, (buffer, when) =>
            {
                recognitionRequest?.Append(buffer);
            });
            startAudioEngine();
        }
    }
}

The screen contains only a label and a button.

The problem

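When I tap the button to start and then speak, words and sentences are recognized while recognition is running.
However, when I say only a single character, it behaves as shown below, which is my problem.
Examples: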

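Spoken: "あ"

During recognition: no result is returned
After tapping the button to stop: "あ" is returned

Spoken: "か"

During recognition: "家" is returned
After tapping the button to stop: "家", "間", and so on are returned

Spoken: "け"

During recognition: "家" is returned
After tapping the button to stop: "家" is returned

Spoken: "さ"

During recognition: "左" is returned
After tapping the button to stop: "さ" is returned

Spoken: "く"

During recognition: "９" is returned
After tapping the button to stop: "９" is returned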

What I want to achieve

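I want to recognize a single kana character.
It does not look like the length of the result can be specified, so if that is the case I would like to decide based on the first character.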
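
The first-character idea could be sketched roughly as follows. `KanaPicker` and `FirstKana` are hypothetical names introduced here for illustration; they are not part of Xamarin.iOS or the Speech framework. The sketch only inspects strings, so it cannot turn a kanji candidate such as "家" back into a kana; that would need a reading lookup, which is outside its scope.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (an assumption for illustration), not a framework API.
public static class KanaPicker
{
    // True when c is a hiragana (U+3041-U+3096) or katakana (U+30A1-U+30FA) letter.
    static bool IsKana(char c)
    {
        return (c >= '\u3041' && c <= '\u3096') || (c >= '\u30A1' && c <= '\u30FA');
    }

    // Returns the first text element of the first candidate that starts with a kana,
    // or null when no candidate does (for example when all candidates are kanji or digits).
    public static string FirstKana(IEnumerable<string> candidates)
    {
        foreach (var text in candidates)
        {
            if (string.IsNullOrWhiteSpace(text)) continue;

            // GetNextTextElement keeps a base character together with any combining marks.
            var first = StringInfo.GetNextTextElement(text.Trim());
            if (first.Length > 0 && IsKana(first[0]))
            {
                return first;
            }
        }
        return null;
    }
}
```

The candidates would be the transcription strings collected in the recognition callback, for example `result.BestTranscription.FormattedString`, plus any alternative transcriptions if the recognizer provides them. With the results listed above, the candidates "家" and "間" would give null, while a list containing "左" and "さ" would give "さ".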


2 answers

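Hello. I understand that single characters are not being recognized well.
I think this is probably difficult to achieve with the Speech framework at present.

I am building a speech recognition app natively, and while words and sentences are recognized with reasonable accuracy,
the accuracy drops the shorter the utterance gets.

The reason is probably that the recognizer also uses the surrounding audio as context
to improve the accuracy of the recognized text.
A single character gives it no such context, so the accuracy is low.

If you find out anything, I would like to hear about it too.
Thank you.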

Posted 2016/12/17 02:41

Edited 2016/12/17 02:44

Deactivated user


C#
recognitionTask = speechRecognizer.GetRecognitionTask(recognitionRequest, (result, error) =>
{
    // some method...
});

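The callback defined in the code above, which receives the recognition result, is called when the audio buffer has filled up, been sent to the recognition server, and a result has come back, or when an error has occurred.
When you were recognizing audio longer than one character, the buffer held more than one character's worth of audio, so I think it seemed as though results were coming back properly.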

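Possible solutions (I have not tried them) are:
1. Try making the buffer size smaller
2. Explicitly press the button to signal that voice input is complete
3. Run a timer so that the microphone stays on for only a few seconds after the button is pressed; once the time has elapsed, perform the same processing as when the button is pressed, so that the input is treated as finished

Those are the three options I can think of.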
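
The third option could be sketched as below. This is only an illustration under assumptions: `AutoStop` is a hypothetical helper written for this answer, not an API of Xamarin.iOS or the Speech framework, and it uses `Task.Delay` in place of a platform timer so that it does not depend on any iOS type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (an assumption for illustration), not a framework API.
public sealed class AutoStop
{
    CancellationTokenSource cts;

    // Runs `stop` once after `delay`, unless Cancel() or another ArmAsync() call comes first.
    public async Task ArmAsync(TimeSpan delay, Action stop)
    {
        // Cancel a countdown that is still pending from an earlier call.
        cts?.Cancel();
        var current = cts = new CancellationTokenSource();
        try
        {
            await Task.Delay(delay, current.Token);
        }
        catch (OperationCanceledException)
        {
            return; // stopped manually, or re-armed, before the time elapsed
        }
        stop();
    }

    // Call this when the user taps the button to stop manually.
    public void Cancel()
    {
        cts?.Cancel();
    }
}
```

In the question's `ViewController`, `ArmAsync` would be called right after `startRecording()`, with a `stop` action that does what the button's stop branch already does (`audioEngine.Stop()` followed by `recognitionRequest?.EndAudio()`), and `Cancel()` would be called when the user stops manually. The delay, for example `TimeSpan.FromSeconds(2)`, is an arbitrary value that would need tuning. When awaited from the UI thread, the continuation should normally resume on that thread, but that is worth verifying in the actual app.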

Posted 2016/12/05 14:50