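I am testing a load of CSV file data into BigQuery with Embulk.
Some of the columns contain dates, and their values mix the yyyymmdd and yyyymmddhhmmss forms.
Because the two forms are mixed, the usual approach skips the values in one form or the other.
So I want to use embulk-filter-timestamp_format to take the values in as strings and load them that way.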
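The idea of trying several formats in turn, which the later configuration expresses as a `default_from_timestamp_format` list, can be sketched in Python as follows. This is not the plugin's code. `parse_mixed` and `FROM_FORMATS` are hypothetical names, and the formats assume the values are literally yyyymmdd and yyyymmddhhmmss as described above (the configurations below use `/`-separated patterns instead).

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order (assumed yyyymmddhhmmss / yyyymmdd).
FROM_FORMATS = ["%Y%m%d%H%M%S", "%Y%m%d"]


def parse_mixed(value: str) -> datetime:
    """Return the first successful parse of value; raise ValueError if none matches."""
    for fmt in FROM_FORMATS:
        try:
            # strptime rejects leftover characters, so each format only
            # accepts values of exactly its own shape.
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not fit; try the next one
    raise ValueError(f"no format matched: {value!r}")


if __name__ == "__main__":
    for raw in ["20190219", "20190219182503"]:
        print(raw, "->", parse_mixed(raw).strftime("%Y-%m-%d %H:%M:%S"))
```

Because each pattern must consume the whole string, neither form is silently truncated by the other, so the order of the list does not change the result here.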
While testing, I get what looks like an invalid time zone error. I define the time zone in the Embulk configuration file, but it does not take effect.
■Plugins
embulk (0.9.8 java)
embulk-input-oracle (0.9.3)
embulk-output-bigquery (0.4.9)
embulk-filter-timestamp_format (0.3.1)

■Embulk configuration file
in:
  type: file
  path_prefix: xxxxx\DATA_CSV
  parser:
    type: csv
    delimiter: ','
    skip_header_line: false
    charset: UTF-8
    newline: CRLF
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 0
    allow_extra_columns: false
    allow_optional_columns: true
    columns:
    - {name: xxx001, type: string}
    - {name: xxx002, type: string}
    - {name: xxx003, type: timestamp}
    - {name: xxx004, type: timestamp}
filters:
  - type: timestamp_format
    default_to_timezone: "Asia/Tokyo"
    default_to_timestamp_format: "%Y-%m-%d %H:%M:%S"
    columns:
      - {name: xxx003, type: timestamp, format: '%Y/%m/%d'}
      - {name: xxx004, type: timestamp, format: '%Y/%m/%d %H:%M:%S'}
out: {type: bigquery, auth_method: json_key, json_keyfile: 'xxxxx.json', project: xxxxx, dataset: xxxx, auto_create_table: true, table: CSV_BQ_TABLE, open_read_timeout_sec: 360000, send_timeout_sec: 360000, read_timeout_sec: 360000}

■Error at run time
Caused by: java.lang.AbstractMethodError: Method org/embulk/filter/timestamp_format/TimestampParser$TimestampParserColumnOptionImpl.getTimeZoneId()Lcom/google/common/base/Optional; is abstract
    at org.embulk.filter.timestamp_format.TimestampParser$TimestampParserColumnOptionImpl.getTimeZoneId(TimestampParser.java)
    at org.embulk.spi.time.TimestampParser.<init>(TimestampParser.java:27)
    at org.embulk.filter.timestamp_format.TimestampParser.getTimestampParser(TimestampParser.java:249)
    at org.embulk.filter.timestamp_format.TimestampParser.<init>(TimestampParser.java:70)
    at org.embulk.filter.timestamp_format.ColumnCaster.getTimestampParser(ColumnCaster.java:91)
    at org.embulk.filter.timestamp_format.ColumnCaster.buildTimestampParserMap(ColumnCaster.java:65)
    at org.embulk.filter.timestamp_format.ColumnCaster.<init>(ColumnCaster.java:51)
    at org.embulk.filter.timestamp_format.ColumnVisitorImpl.<init>(ColumnVisitorImpl.java:44)
    at org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin$1.<init>(TimestampFormatFilterPlugin.java:162)
    at org.embulk.filter.timestamp_format.TimestampFormatFilterPlugin.open(TimestampFormatFilterPlugin.java:159)
    at org.embulk.spi.util.Filters.open(Filters.java:50)
    at org.embulk.spi.util.Executors.process(Executors.java:59)
    at org.embulk.spi.util.Executors.process(Executors.java:38)
    at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:172)
    at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:169)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Error: java.lang.RuntimeException: java.lang.AbstractMethodError: Method org/embulk/filter/timestamp_format/TimestampParser$TimestampParserColumnOptionImpl.getTimeZoneId()Lcom/google/common/base/Optional; is abstract
※ Result after the filter was installed correctly: the records are skipped.
2019-02-19 18:25:03.497 +0900: Embulk v0.9.15
2019-02-19 18:25:04.366 +0900 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2019-02-19 18:25:08.661 +0900 [INFO] (main): Gem's home and path are set by default: "C:\Users\xxxx.embulk\lib\gems"
2019-02-19 18:25:10.668 +0900 [INFO] (main): Started Embulk v0.9.15
2019-02-19 18:25:18.645 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-bigquery (0.4.9)
2019-02-19 18:25:18.801 +0900 [INFO] (0001:transaction): Loaded plugin embulk-filter-timestamp_format (0.3.2)
2019-02-19 18:25:18.894 +0900 [INFO] (0001:transaction): Listing local files at directory 'C:\Users\xxxx\Desktop\embulk\DATA_CSV' filtering filename by prefix ''
2019-02-19 18:25:18.894 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2019-02-19 18:25:19.003 +0900 [INFO] (0001:transaction): Loading files [C:\Users\xxxx\Desktop\embulk\DATA_CSV\SRDCSBAK.B_RS_TRAN_2_11.DAT, C:\Users\xxxx\Desktop\embulk\DATA_CSV\tmp\SRDCSBAK.B_RS_TRAN_2_690.DAT]
2019-02-19 18:25:19.097 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / tasks=79
2019-02-19 18:25:19.238 +0900 [INFO] (0001:transaction): embulk-output-bigquery: Get dataset... fluid-emissary-216806:embulk_test
2019-02-19 18:25:21.503 +0900 [INFO] (0001:transaction): embulk-output-bigquery: Create table... fluid-emissary-216806:embulk_test.LOAD_TEMP_a1f6a8a7_ef2d_4e0b_a061_fe5190c6ba8b_CSV_BQ_TABLE
2019-02-19 18:25:22.539 +0900 [INFO] (0001:transaction): {done: 0 / 79, running: 0}
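One possible reason for the skipped records, offered as an assumption rather than a diagnosis: the parser declares the date columns as `type: timestamp` without a `format`, so the CSV parser itself must parse them before the filter sees them, using its `default_timestamp_format`, which Embulk documents as `%Y-%m-%d %H:%M:%S.%N %z`. The Python sketch below approximates that single-format check (`%f` standing in for `%N`) to show that neither the yyyymmdd nor the yyyymmddhhmmss form would pass. `split_rows` and `ASSUMED_DEFAULT` are hypothetical names.

```python
from datetime import datetime

# Hypothetical approximation of the CSV parser's default timestamp format
# (%Y-%m-%d %H:%M:%S.%N %z in Embulk notation; %f stands in for %N here).
ASSUMED_DEFAULT = "%Y-%m-%d %H:%M:%S.%f %z"


def split_rows(values):
    """Split raw values into (parsed, rejected) under the single default format."""
    parsed, rejected = [], []
    for value in values:
        try:
            parsed.append(datetime.strptime(value, ASSUMED_DEFAULT))
        except ValueError:
            rejected.append(value)  # a record holding this value would be skipped
    return parsed, rejected


if __name__ == "__main__":
    ok, bad = split_rows(["20190219", "20190219182503", "2019-02-19 18:25:03.000 +0900"])
    print(len(ok), bad)  # 1 ['20190219', '20190219182503']
```

If this is the cause, declaring those columns as `type: string` in the parser, as intended at the start of the question, would leave all parsing to the filter's format list.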
※ 2019/2/20: configuration 001
in:
  type: file
  path_prefix: xxx\DATA_CSV
  parser:
    type: csv
    delimiter: ','
    skip_header_line: false
    charset: UTF-8
    newline: CRLF
    quote: '"'
    escape: '"'
    trim_if_not_quoted: false
    skip_header_lines: 0
    allow_extra_columns: false
    allow_optional_columns: true
    columns:
    - {name: colname001, type: string}
    - {name: colname002, type: string}
    - {name: colname003, type: string}
    - {name: colname004, type: timestamp}
    - {name: colname005, type: timestamp}
    - {name: colname006, type: timestamp}
    - {name: colname007, type: timestamp}
    - {name: colname008, type: timestamp}
filters:
  - type: timestamp_format
    default_to_timezone: "Asia/Tokyo"
    default_from_timestamp_format: ["%Y/%m/%d", "%Y/%m/%d %H:%M:%S"]
    columns:
      - {name: colname004, type: timestamp, format: '%Y/%m/%d'}
      - {name: colname005, type: timestamp, format: '%Y/%m/%d %H:%M:%S'}
      - {name: colname006, type: timestamp, format: '%Y/%m/%d %H:%M:%S'}
      - {name: colname007, type: timestamp, format: '%Y/%m/%d %H:%M:%S'}
      - {name: colname008, type: timestamp, format: '%Y/%m/%d %H:%M:%S'}
out: {type: bigquery, auth_method: json_key, json_keyfile: 'xxxxxx.json', project: xxxx-216806, dataset: embulk_test, auto_create_table: true, table: CSV_BQ_TABLE, open_read_timeout_sec: 360000, send_timeout_sec: 360000, read_timeout_sec: 360000}
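This configuration sets only `default_to_timezone`. If the input side falls back to a `default_from_timezone` of UTC, as the plugin README describes (an assumption worth checking against the installed version), date-only values would come out shifted by nine hours. The Python sketch below shows that arithmetic. `to_tokyo` and the fixed +09:00 offset standing in for Asia/Tokyo (which currently has no daylight saving time) are illustrative assumptions, not the plugin's code.

```python
from datetime import datetime, timedelta, timezone

# Fixed +09:00 offset used as a stand-in for Asia/Tokyo (no DST there today).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
OUT_FORMAT = "%Y-%m-%d %H:%M:%S"  # same pattern as default_to_timestamp_format in the first config


def to_tokyo(naive: datetime, from_tz: timezone = timezone.utc) -> datetime:
    """Interpret a parsed wall-clock time in from_tz, then convert it to Tokyo time."""
    return naive.replace(tzinfo=from_tz).astimezone(TOKYO)


if __name__ == "__main__":
    parsed = datetime.strptime("2019/02/19", "%Y/%m/%d")
    print(to_tokyo(parsed).strftime(OUT_FORMAT))         # 2019-02-19 09:00:00
    print(to_tokyo(parsed, TOKYO).strftime(OUT_FORMAT))  # 2019-02-19 00:00:00
```

If the CSV values are already Tokyo local times, also setting `default_from_timezone: "Asia/Tokyo"` in the filter would correspond to the second call, which keeps the wall-clock value unchanged.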
