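I would like to do machine learning with DeepLab on Windows 10.
The PC I set this up on has Windows 10, a Core i5-6500 @ 3.20GHz, 8 GB of RAM, and a GTX 1050.
In an Anaconda3 virtual environment I prepared Python 3.6.9 and tensorflow 1.14.0, then installed DeepLab.
I then ran model_test.py and local_test.sh to check the installation. model_test.py printed a large number of warnings but finished normally.
local_test.sh, on the other hand, fails with the error below and does not finish normally.
How can I get it to run?
Since the message says the value for v is missing, I thought I could supply one with the -v option, but running sh local_test.sh -v -1 just produces the same error message.
This is my first time touching Python at all, let alone machine learning, so I have no idea what needs to be done.
Sorry for the trouble, but I would appreciate your advice.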
local_test
C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\flags\_flagvalues.py", line 696, in get_value
    return next(args) if value is None else value
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/XXX/tensorflow/models/research/deeplab/model_test.py", line 147, in <module>
    tf.test.main()
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\test.py", line 64, in main
    return _googletest.main(argv)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\googletest.py", line 65, in main
    benchmark.benchmarks_main(true_main=main_wrapper)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\benchmark.py", line 407, in benchmarks_main
    true_main()
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\googletest.py", line 64, in main_wrapper
    return app.run(main=g_main, argv=args)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\app.py", line 293, in run
    flags_parser,
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\app.py", line 362, in _run_init
    flags_parser=flags_parser,
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\app.py", line 212, in _register_and_parse_flags_with_usage
    args_to_main = flags_parser(original_argv)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\app.py", line 31, in _parse_flags_tolerate_undef
    return flags.FLAGS(_sys.argv if argv is None else argv, known_only=True)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\platform\flags.py", line 112, in __call__
    return self.__dict__['__wrapped'].__call__(*args, **kwargs)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\flags\_flagvalues.py", line 626, in __call__
    unknown_flags, unparsed_args = self._parse_args(args, known_only)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\flags\_flagvalues.py", line 744, in _parse_args
    value = get_value()
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\absl\flags\_flagvalues.py", line 698, in get_value
    raise _exceptions.Error('Missing value for flag ' + arg)  # pylint: disable=undefined-loop-variable
absl.flags._exceptions.Error: Missing value for flag -v
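To make the last line of the traceback concrete: the parser appears to treat -v as a flag that takes a value, so a bare -v with nothing after it is rejected. Below is a minimal sketch of that behaviour. It uses argparse from the standard library as a stand-in, not absl itself, and the names parse_verbosity and FlagError, as well as the default of -1, are my own assumptions for illustration.

```python
import argparse


class FlagError(Exception):
    """Hypothetical error type raised instead of exiting the interpreter."""


class _RaisingParser(argparse.ArgumentParser):
    def error(self, message):
        # argparse normally prints usage and exits; raise so callers can inspect it.
        raise FlagError(message)


def parse_verbosity(argv):
    """Parse an integer-valued -v/--verbosity option from argv.

    This only mimics the "flag needs a value" behaviour seen in the
    traceback; it is not how absl is implemented.
    """
    parser = _RaisingParser(add_help=False)
    parser.add_argument("-v", "--verbosity", type=int, default=-1)
    # Unknown arguments are tolerated, similar in spirit to known_only=True.
    args, _unknown = parser.parse_known_args(argv)
    return args.verbosity


if __name__ == "__main__":
    print(parse_verbosity(["-v", "-1"]))  # value supplied, parses fine
    try:
        parse_verbosity(["-v"])  # bare -v, no value
    except FlagError as exc:
        print("rejected:", exc)
```

In this sketch the value has to reach the process that actually parses the flag, which may be why passing -v -1 to local_test.sh itself changed nothing: the script's own arguments are not necessarily forwarded to model_test.py.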
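(Addendum)
Changing line 40 of local_test.sh from python "${WORK_DIR}"/model_test.py -v to python "${WORK_DIR}"/model_test.py -v -1 let the script get past that point. However, the following warning is now repeated for several hundred lines, in slightly different forms each time. What is the problem here?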
WARNING:tensorflow:Entity <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x000001C557142828>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x000001C557142828>>: AssertionError: Bad argument number for Name: 3, expecting 4
(Addendum 2)
Downgrading gast from 0.3.2 to 0.2.2 made the warning shown in the addendum above disappear.
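For anyone hitting the same warning, here is a small sketch for deciding whether an installed gast is newer than the version that worked for me. The helper names are hypothetical, 0.2.2 is simply the version reported above rather than an official requirement, and the parser assumes purely numeric dotted versions.

```python
# Version that made the AutoGraph warning disappear in this post (not an official pin).
KNOWN_GOOD_GAST = "0.2.2"


def parse_version(text):
    """Turn a purely numeric dotted version such as '0.3.2' into (0, 3, 2)."""
    return tuple(int(part) for part in text.strip().split("."))


def gast_needs_downgrade(installed, known_good=KNOWN_GOOD_GAST):
    """Return True if the installed version is newer than the known-good one."""
    return parse_version(installed) > parse_version(known_good)


if __name__ == "__main__":
    for version in ("0.3.2", "0.2.2"):
        print(version, gast_needs_downgrade(version))
```

The installed version can be read from pip show gast, and the downgrade itself was done with pip install gast==0.2.2 inside the same Anaconda environment.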
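The warnings that remain say that tf.xxx functions are deprecated and that tf.compat.v1.xxx should be used instead. Is it safe to ignore these?
In addition, the following error occurs, so local_test.sh still does not finish normally.
Since it is a ResourceExhaustedError, I have been trying to adjust the batch size, crop size, and so on in train.py, but that has not solved it...
Does anyone know a way around this?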
error
2019-10-04 13:58:24.570618: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at conv_ops.cc:486 : Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
    return fn(*args)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\XXX\Anaconda3\envs\TensorFlow\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_pointwise/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[gradients/xception_65/middle_flow/block1/unit_5/xception_module/separable_conv3_depthwise/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_12968]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[4,128,257,257] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node xception_65/entry_flow/block1/unit_1/xception_module/separable_conv2_pointwise/Conv2D}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.
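To get a feel for the numbers in the log above, here is a rough calculation of how much memory the single tensor named in the OOM message would need. It assumes "type float" means 4 bytes per element and that the leading 4 in the shape is the batch dimension; both are my reading of the log, and the helper names are made up for illustration.

```python
from functools import reduce
from operator import mul

FLOAT32_BYTES = 4  # assumption: "type float" in the log is 32-bit


def tensor_bytes(shape, bytes_per_element=FLOAT32_BYTES):
    """Bytes needed to hold one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


def to_mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    logged_shape = (4, 128, 257, 257)  # shape copied from the OOM message
    for batch in (4, 2, 1):
        shape = (batch,) + logged_shape[1:]
        size = tensor_bytes(shape)
        print("batch", batch, "->", size, "bytes,", round(to_mib(size), 1), "MiB")
```

This is only one of the many tensors held during training, so it does not give the total memory required. It does suggest that the failing run was still using a batch dimension of 4, so the values I changed in train.py may not be the ones that actually took effect.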