Speech recognition with Whisper

Question

### Background

I wanted to try the Python speech-recognition library Whisper, but I could not get it to work, so I am asking here.

### Goal

Load an audio file that sits in the same folder as the Jupyter notebook file with Whisper and run speech recognition on it.

### Problem and error message

```
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4848/290471416.py in <module>
      5 path = ".\testsample.mp3"
      6 
----> 7 result = model.transcribe(path, verbose=True, language='ja')
      8 print(result["text"])

~\anaconda3\envs\pip-env\lib\site-packages\whisper\transcribe.py in transcribe(model, audio, verbose, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, condition_on_previous_text, **decode_options)
     82         decode_options["fp16"] = False
     83 
---> 84     mel = log_mel_spectrogram(audio)
     85 
     86     if decode_options.get("language", None) is None:

~\anaconda3\envs\pip-env\lib\site-packages\whisper\audio.py in log_mel_spectrogram(audio, n_mels)
    109     if not torch.is_tensor(audio):
    110         if isinstance(audio, str):
--> 111             audio = load_audio(audio)
    112         audio = torch.from_numpy(audio)
    113 

~\anaconda3\envs\pip-env\lib\site-packages\whisper\audio.py in load_audio(file, sr)
     42             ffmpeg.input(file, threads=0)
     43             .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr)
---> 44             .run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True)
     45         )
     46     except ffmpeg.Error as e:

~\anaconda3\envs\pip-env\lib\site-packages\ffmpeg\_run.py in run(stream_spec, cmd, capture_stdout, capture_stderr, input, quiet, overwrite_output)
    318         pipe_stderr=capture_stderr,
    319         quiet=quiet,
--> 320         overwrite_output=overwrite_output,
    321     )
    322     out, err = process.communicate(input)

~\anaconda3\envs\pip-env\lib\site-packages\ffmpeg\_run.py in run_async(stream_spec, cmd, pipe_stdin, pipe_stdout, pipe_stderr, quiet, overwrite_output)
    283     stderr_stream = subprocess.PIPE if pipe_stderr or quiet else None
    284     return subprocess.Popen(
--> 285         args, stdin=stdin_stream, stdout=stdout_stream, stderr=stderr_stream
    286     )
    287 

~\anaconda3\envs\pip-env\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    798                                 c2pread, c2pwrite,
    799                                 errread, errwrite,
--> 800                                 restore_signals, start_new_session)
    801         except:
    802             # Cleanup if the child failed starting.

~\anaconda3\envs\pip-env\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1205                                          env,
   1206                                          os.fspath(cwd) if cwd is not None else None,
-> 1207                                          startupinfo)
   1208             finally:
   1209                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] 指定されたファイルが見つかりません。
```

### Source code in question

```python
import whisper

model = whisper.load_model("large")

path = "testsample.mp3"

result = model.transcribe(path, verbose=True, language='ja')
print(result["text"])
```

### What I tried

Running the source code above raises `FileNotFoundError: [WinError 2] 指定されたファイルが見つかりません。` ("The system cannot find the file specified."). I suspected that something was wrong with the path, so I created a file named `aaa.csv` in the folder that contains the audio file and tried the following.

```python
import glob

files = glob.glob("./*")
print(files)
# Output: ['.\aaa.csv', '.\testsample.mp3', '.\whisper-test.ipynb']
```

Both the CSV file and the audio file were listed correctly, and the code below ran without any problem.

```python
import pandas as pd

df = pd.read_csv(files[0])
```

Replacing `path` in the source code above with `files[1]` did not help either: the same error was raised. Why does this happen?

### What I tried (2)

I tried running the following code suggested by jbpb0, but it also raised an error.

```python
import ffmpeg

file = '<audio file name>'
sr = 16000
out, _ = (ffmpeg.input(file, threads=0).output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr).run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True))
```

```
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_13216/2098733105.py in <module>
      2 file = 'testsample.mp3'
      3 sr = 16000
----> 4 out, _ = (ffmpeg.input(file, threads=0).output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sr).run(cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True))

~\anaconda3\envs\pip-env\lib\site-packages\ffmpeg\_run.py in run(stream_spec, cmd, capture_stdout, capture_stderr, input, quiet, overwrite_output)
    318         pipe_stderr=capture_stderr,
    319         quiet=quiet,
--> 320         overwrite_output=overwrite_output,
    321     )
    322     out, err = process.communicate(input)

~\anaconda3\envs\pip-env\lib\site-packages\ffmpeg\_run.py in run_async(stream_spec, cmd, pipe_stdin, pipe_stdout, pipe_stderr, quiet, overwrite_output)
    283     stderr_stream = subprocess.PIPE if pipe_stderr or quiet else None
    284     return subprocess.Popen(
--> 285         args, stdin=stdin_stream, stdout=stdout_stream, stderr=stderr_stream
    286     )
    287 

~\anaconda3\envs\pip-env\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    798                                 c2pread, c2pwrite,
    799                                 errread, errwrite,
--> 800                                 restore_signals, start_new_session)
    801         except:
    802             # Cleanup if the child failed starting.

~\anaconda3\envs\pip-env\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1205                                          env,
   1206                                          os.fspath(cwd) if cwd is not None else None,
-> 1207                                          startupinfo)
   1208             finally:
   1209                 # Child is launched. Close the parent's copy of those pipe

FileNotFoundError: [WinError 2] 指定されたファイルが見つかりません。
```

### Supplementary information (framework/tool versions, etc.)

python 3.7.11
whisper 1.0

Accepted Answer

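Install ffmpeg itself, that is, the command-line program and not only the Python `ffmpeg` package. The traceback ends inside `subprocess.Popen`, which is trying to launch the `ffmpeg` command, so the file that cannot be found is the ffmpeg executable rather than the audio file.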

Reference
[Setup](https://github.com/openai/whisper#setup)
> It also requires the command-line tool ffmpeg to be installed on your system
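To confirm whether the ffmpeg command is actually visible from the notebook's Python process, a quick check along the following lines may help. This is only a minimal sketch using the standard library; the helper names `find_executable` and `describe_ffmpeg` are made up for illustration and are not part of Whisper or ffmpeg-python.

```python
import shutil
import subprocess


def find_executable(name="ffmpeg"):
    """Return the full path of `name` if it is found on PATH, otherwise None."""
    return shutil.which(name)


def describe_ffmpeg():
    """Report whether the ffmpeg command can be launched from this process."""
    path = find_executable("ffmpeg")
    if path is None:
        # Same situation as in the traceback: Popen cannot find the program.
        return "ffmpeg was not found on PATH; install the command-line tool."
    # Ask ffmpeg for its version banner to confirm that it really runs.
    completed = subprocess.run([path, "-version"], capture_output=True, text=True)
    lines = completed.stdout.splitlines()
    first_line = lines[0] if lines else "(no version output)"
    return f"ffmpeg found at {path}: {first_line}"


if __name__ == "__main__":
    print(describe_ffmpeg())
```

If this reports that ffmpeg was not found, installing it as described in the Setup section linked above and then restarting the Jupyter kernel, so that the updated PATH is picked up, would be the likely next step.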
