実現したいこと
現在、Lambdaで動かしている処理が制限時間の15分を超えそうになっており、AWS Glue(Python shell)への移行を検討しています。
そこで、このClassmethodの記事のようにPython shellにてJobを作成・実行し、RedshiftにSQLを投げて実行したいです。
試したことや設定
"接続"を作成
AWS GlueのConsoleの[データカタログ]-[データベース]-[接続]から、対象のRedshiftへの接続を作成
=> "接続のテスト"を実行し、成功することを確認
IAMロールを作成
AmazonS3FullAccess
とAWSGlueServiceRole
を付与したRoleを作成。Glue実行時にはこのRoleを使用。
psycopg2モジュールを配置
psycopg2_binary-2.8.6-cp36-cp36m-manylinux1_x86_64.whlをS3に配置し、Python shell Job実行時にPythonライブラリパスとして追加。
実行結果
上記の設定で、Redshift上のTableにselect
を実行するJobを実行。
Python3
1import psycopg2 2def get_connection(): 3 rs = "postgresql://***USER***:***PASSWORD***@***REDSHIFT ENDPOINT***:5439/***DB NAME***" 4 return psycopg2.connect(rs) 5with get_connection() as conn: 6 with conn.cursor() as cur: 7 cur.execute("select * from SCHEMA.TABLE") 8 rows = cur.fetchall() 9 print(rows)
"接続"をAttachして実行
なぜか、S3へのアクセスがForbidden
となり失敗。(S3に上げたモジュールへのアクセスが失敗している?)
Traceback (most recent call last): File "/tmp/runscript.py", line 114, in <module> temp_file_path = download_user_script(args.scriptLocation) File "/tmp/runscript.py", line 91, in download_user_script download_from_s3(args.scriptLocation, temp_file_path) File "/tmp/runscript.py", line 81, in download_from_s3 s3.download_file(bucket_name, s3_key, new_file_path) File "/usr/local/lib/python3.6/site-packages/boto3/s3/inject.py", line 172, in download_file extra_args=ExtraArgs, callback=Callback) File "/usr/local/lib/python3.6/site-packages/boto3/s3/transfer.py", line 307, in download_file future.result() File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 106, in result return self._coordinator.result() File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 265, in result raise self._exception File "/usr/local/lib/python3.6/site-packages/s3transfer/tasks.py", line 255, in _main self._submit(transfer_future=transfer_future, **kwargs) File "/usr/local/lib/python3.6/site-packages/s3transfer/download.py", line 345, in _submit **transfer_future.meta.call_args.extra_args File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call return self._make_api_call(operation_name, kwargs) File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call raise error_class(parsed_response, operation_name) botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
"接続"をAttachせずに実行
Libraryは展開できているようだが、Redshiftへの接続エラーが発生。
Traceback (most recent call last): File "/tmp/runscript.py", line 123, in <module> runpy.run_path(temp_file_path, run_name='__main__') File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path pkg_name=pkg_name, script_name=fname) File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code mod_name, mod_spec, pkg_name, script_name) File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/tmp/glue-python-scripts-btjivu4_/test.py", line 7, in <module> File "/tmp/glue-python-scripts-btjivu4_/test.py", line 5, in get_connection File "/glue/lib/installation/psycopg2/__init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: No route to host Is the server running on host "REDSHIFT_ENDPOINT" (***.***.***.***) and accepting TCP/IP connections on port 5439? During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/tmp/runscript.py", line 142, in <module> raise e_type(e_value).with_traceback(new_stack) File "/tmp/glue-python-scripts-btjivu4_/test.py", line 7, in <module> File "/tmp/glue-python-scripts-btjivu4_/test.py", line 5, in get_connection File "/glue/lib/installation/psycopg2/__init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: No route to host Is the server running on host "REDSHIFT_ENDPOINT" (***.***.***.***) and accepting TCP/IP connections on port 5439? Traceback (most recent call last): File "/tmp/runscript.py", line 123, in <module> runpy.run_path(temp_file_path, run_name='__main__') File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path pkg_name=pkg_name, script_name=fname) File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code mod_name, mod_spec, pkg_name, script_name) File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/tmp/glue-python-scripts-btjivu4_/test.py", line 7, in <module> File "/tmp/glue-python-scripts-btjivu4_/test.py", line 5, in get_connection File "/glue/lib/installation/psycopg2/__init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: No route to host Is the server running on host "REDSHIFT_ENDPOINT" (***.***.***.***) and accepting TCP/IP connections on port 5439? During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/tmp/runscript.py", line 142, in <module> raise e_type(e_value).with_traceback(new_stack) File "/tmp/glue-python-scripts-btjivu4_/test.py", line 7, in <module> File "/tmp/glue-python-scripts-btjivu4_/test.py", line 5, in get_connection File "/glue/lib/installation/psycopg2/__init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: No route to host Is the server running on host "REDSHIFT_ENDPOINT" (***.***.***.***) and accepting TCP/IP connections on port 5439?
どのようにすればAWS GlueからRedshiftにSQLを投げられるのでしょうか?
知見のある方のお知恵を拝借できれば幸いです。
バッドをするには、ログインかつ
こちらの条件を満たす必要があります。