I want to read a csv from S3 with Lambda (Python) and transfer it to Opensearch

Question edit history

Added corrections

2022/12/05 09:32

Posted

msy47

Score 26

test CHANGED Viewed

File without changes

test CHANGED Viewed

@@ -27,13 +27,15 @@
 ### Problem and error message
-Tried reading the csv as a stream, but an error occurs
+Plan: read the csv as a stream, append the values to arrays, then put them into the document
 ```
 Error message:
-[ERROR] Runtime.ImportModuleError: Unable to import module `sample` : No module named `BytesIO`
+[ERROR] Runtime.UserCodeSyntaxError: Syntax error in module `sample`: unexpected indent (sample.py, line 47)
-    Traceback (most recent call last):
+Traceback (most recent call last):
+  File "/var/task/sample.py" Line 47
+    for row in csv.reader(textIo):
 ```
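Python raises "unexpected indent" when a line is indented more deeply than the statement before it without a new block having been opened. The following is a minimal sketch of the read loop with consistent indentation, not the poster's actual file. The helper name `rows_from_body`, the UTF-8 encoding and the `skipinitialspace` option are assumptions made for illustration.

```python
import csv
import io


def rows_from_body(body: bytes, encoding: str = "utf-8"):
    """Yield each CSV row in `body` as a list of stripped strings."""
    # io.BytesIO wraps the raw bytes; io.TextIOWrapper decodes them as text.
    # newline="" is what the csv module documentation recommends for file objects.
    text_io = io.TextIOWrapper(io.BytesIO(body), encoding=encoding, newline="")
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(text_io, skipinitialspace=True):
        if row:  # skip blank lines
            yield [field.strip() for field in row]


# Sample bytes modeled on the csv example given in the question.
sample = (
    b"vmanage name, ip, host, session1, session2, sessionmax\n"
    b"vmanage-1, 1.1.1.1, testhost, 10, 9, 20\n"
)
rows = list(rows_from_body(sample))
for row in rows:
    print(row)
```

In the Lambda handler, the `body` argument would be the bytes returned by `obj['Body'].read()`.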
@@ -48,6 +50,15 @@
 import io
 import BytesIO
 from requests_aws4auth import AWS4Auth
+# Arrays
+vmanagename = []
+ip = []
+hostname = []
+session1 = []
+session2 = []
+sessionmax = []
 region = '' # e.g. us-west-1
 service = 'es'
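As an alternative to six module-level lists, each row can be collected as one dict, so the values of a row stay together and can be posted as a document directly. This is only a sketch under assumptions: the function name `documents_from_csv`, the header-row handling and the integer conversion of the session columns are not from the question. The key names follow the document keys in the question's handler, without the trailing blanks.

```python
import csv
import io

# Key names follow the question's document, with trailing blanks removed.
FIELDS = ["vmanagename", "ip", "hostname", "session1", "session2", "sessionmax"]
# Assumption: the session columns hold integers and should be indexed as numbers.
NUMERIC_FIELDS = {"session1", "session2", "sessionmax"}


def documents_from_csv(body: bytes, has_header: bool = True) -> list:
    """Return one dict per CSV data row, keyed by FIELDS."""
    reader = csv.DictReader(
        io.StringIO(body.decode("utf-8"), newline=""),
        fieldnames=FIELDS,
        skipinitialspace=True,
    )
    if has_header:
        next(reader, None)  # discard the header row (assumed to be present)
    documents = []
    for row in reader:
        doc = {}
        for key in FIELDS:
            value = (row.get(key) or "").strip()
            # Convert only clean digit strings; leave anything else as text.
            doc[key] = int(value) if key in NUMERIC_FIELDS and value.isdigit() else value
        documents.append(doc)
    return documents


sample = (
    b"vmanage name, ip, host, session1, session2, sessionmax\n"
    b"vmanage-1, 1.1.1.1, testhost, 10, 9, 20\n"
)
documents = documents_from_csv(sample)
print(documents)
```

Each element of `documents` could then be passed as the `json=` argument of the existing `requests.post` call.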

AWS Lambda Python Amazon S3

Changed title

2022/12/05 02:43

Posted

msy47

Score 26

test CHANGED Viewed

	@@ -1 +1 @@
1	- I want to read S3 (csv) with Lambda (Python) and transfer to Opensearch
1	+ I want to read a csv from S3 with Lambda (Python) and transfer it to Opensearch

test CHANGED Viewed

File without changes

AWS Lambda Python Amazon S3

Change

2022/12/05 02:43

Posted

msy47

Score 26

test CHANGED Viewed

	@@ -1 +1 @@
1	- About ~~the logic for~~ transferring S3 (csv) to Opensearch with Lambda (Python)
1	+ I want to read S3 (csv) with Lambda (Python) and transfer to Opensearch

test CHANGED Viewed

File without changes

AWS Lambda Python Amazon S3

Edit

2022/12/05 02:15

Posted

msy47

Score 26

test CHANGED Viewed

File without changes

test CHANGED Viewed

@@ -2,12 +2,13 @@
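 We plan to use AWS Lambda, S3, and Opensearch.
-From the sample source below
+Using the sample source, I was able to confirm the following behavior.
 Loading streaming data into Amazon OpenSearch Service - Amazon OpenSearch Service
 https://docs.aws.amazon.com/ja_jp/opensearch-service/latest/developerguide/integrations.html
   → "Loading streaming data from Amazon S3"
 1. A csv upload to S3 acts as the trigger, and
 2. the csv file is transferred to Opensearch.
 I was able to confirm this behavior, but
 the CSV we will actually use is not a log or the like;
 it is a simple comma-separated file like the one below.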
@@ -15,28 +16,24 @@
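 ### What I want to achieve
-Instead of regular-expression ingestion, based on the sample,
-I would like to change the logic to comma-separated csv ingestion.
-- [ ] Ingest comma-separated values instead of using regular expressions
+- [ ] Change the regular-expression ingestion to csv reading.
-Ultimately, I aim to graph (visualize) the csv data in Opensearch.
+- [ ] The goal is to graph (visualize) the ingested csv data in Opensearch
-As background, the csv is the json result obtained with the vmanage api, reduced to the required fields only.
+* The format is still being confirmed.
 Example csv contents:
-vmanage, ip, host, session1, session2, session3
+vmanage name, ip, host, session1, session2, sessionmax
-vmanage, 1.1.1.1, test, 10, 9, 10
+vmanage-1, 1.1.1.1, testhost, 10, 9, 20
 ### Problem and error message
+Tried reading the csv as a stream, but an error occurs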
 ```
 Error message:
-From Cloud Watch
-[ERROR] TypeError : an integer is required (got type str)
+[ERROR] Runtime.ImportModuleError: Unable to import module `sample` : No module named `BytesIO`
-Traceback (most recent call last):
+    Traceback (most recent call last):
- File "/var/task/sample.py", line 37, in handler
-   lines = body.splitlines(',')
 ```
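Both errors quoted in this revision can be reproduced without AWS. `bytes.splitlines()` accepts only the optional `keepends` flag, so passing `','` raises `TypeError`. `BytesIO` is a class in the standard `io` module rather than a module of its own, so `import BytesIO` fails at import time. A minimal sketch, with an illustrative byte string standing in for the S3 object body:

```python
import io

# Illustrative stand-in for obj['Body'].read().
body = b"vmanage-1, 1.1.1.1, testhost, 10, 9, 20\n"

# splitlines() takes no separator argument; a str here raises TypeError.
try:
    body.splitlines(",")
    splitlines_error = None
except TypeError as exc:
    splitlines_error = exc

# BytesIO is reached through the io module (or: from io import BytesIO).
buffer = io.BytesIO(body)
first_line = buffer.readline().decode("utf-8")
fields = [field.strip() for field in first_line.split(",")]

print(type(splitlines_error).__name__)
print(fields)
```

With `import io` already present in the source, `io.BytesIO(body)` works as written and the separate `import BytesIO` line can simply be dropped.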
@@ -47,6 +44,9 @@
 import boto3
 import re
 import requests
+import csv
+import io
+import BytesIO
 from requests_aws4auth import AWS4Auth
 region = '' # e.g. us-west-1
@@ -63,11 +63,6 @@
 s3 = boto3.client('s3')
-# Regular expressions used to parse some simple log lines
-ip_pattern = re.compile('(\d+\.\d+\.\d+\.\d+)')
-time_pattern = re.compile('\[(\d+\/\w\w\w\/\d\d\d\d:\d\d:\d\d:\d\d\s-\d\d\d\d)\]')
-message_pattern = re.compile('\"(.+)\"')
 # Lambda execution starts here
 def handler(event, context):
     for record in event['Records']:
@@ -79,56 +74,26 @@
         # Get, read, and split the file into lines
         obj = s3.get_object(Bucket=bucket, Key=key)
         body = obj['Body'].read()
-        lines = body.splitlines()
+        textIo = io.TextIOWrapper(io.BytesIO(body))
         # Match the regular expressions to each line and index the JSON
+        for row in lines csv.reader(textIo):
+            print(row)
+            vmanagename = row[0]
+            ip = row[1]
-        for line in lines:
+            hostname = row[2]
-            line = line.decode("utf-8")
+            session1 = row[3]
+            session2 = row[4]
-            ip = ip_pattern.search(line).group(1)
+            sessionmax = row[5]
-            timestamp = time_pattern.search(line).group(1)
-            message = message_pattern.search(line).group(1)
-            document = { "ip": ip, "timestamp": timestamp, "message": message }
+            document = { "vmanagename ": vmanagename , "ip  ": ip , "hostname ": hostname , "session1 ": session1 , "session2": session2, "sessionmax ": sessionmax }
             r = requests.post(url, auth=awsauth, json=document, headers=headers)
 ```
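 ### What I tried
-・Dropped the regular expressions
 ・Read the comma-separated csv
-・Want to send the csv I read as json built from a dict
+・Whether it would be better to turn the csv into a dict
-How should I change the existing search(line).group part?
-I have no idea... I would appreciate your help.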
-```Python/Lambda
-# Regular expressions used to parse some simple log lines
-# Do not search with regular expressions
-# ip_pattern = re.compile('(\d+\.\d+\.\d+\.\d+)')
-# time_pattern = re.compile('\[(\d+\/\w\w\w\/\d\d\d\d:\d\d:\d\d:\d\d\s-\d\d\d\d)\]')
-# message_pattern = re.compile('\"(.+)\"')
-# Lambda execution starts here
-def handler(event, context):
-    for record in event['Records']:
-        # Get the bucket name and key for the new file
-        bucket = record['s3']['bucket']['name']
-        key = record['s3']['object']['key']
-        # Get, read, and split the file into lines
-        obj = s3.get_object(Bucket=bucket, Key=key)
-        body = obj['Body'].read()
-       # csv is comma-separated
-        lines = body.splitlines(',')
-        # Match the regular expressions to each line and index the JSON
-        for line in lines:
-            line = line.decode("utf-8")
-           # How should the part below be changed?
+・Considering other approaches
-            ip = ip_pattern.search(line).group(1)
-            timestamp = time_pattern.search(line).group(1)
-            message = message_pattern.search(line).group(1)
-```
 ### Additional information (FW/tool versions, etc.)

AWS Lambda Python Amazon S3

Added what I tried and the error

2022/11/30 07:42

Posted

msy47

Score 26

test CHANGED Viewed

File without changes

test CHANGED Viewed

@@ -30,7 +30,14 @@
 ```
-Error message: none
+Error message:
+From Cloud Watch
+[ERROR] TypeError : an integer is required (got type str)
+Traceback (most recent call last):
+ File "/var/task/sample.py", line 37, in handler
+   lines = body.splitlines(',')
 ```
 ### Relevant source code
@@ -92,6 +99,37 @@
 How should I change the existing search(line).group part?
 I have no idea... I would appreciate your help.
+```Python/Lambda
+# Regular expressions used to parse some simple log lines
+# Do not search with regular expressions
+# ip_pattern = re.compile('(\d+\.\d+\.\d+\.\d+)')
+# time_pattern = re.compile('\[(\d+\/\w\w\w\/\d\d\d\d:\d\d:\d\d:\d\d\s-\d\d\d\d)\]')
+# message_pattern = re.compile('\"(.+)\"')
+# Lambda execution starts here
+def handler(event, context):
+    for record in event['Records']:
+        # Get the bucket name and key for the new file
+        bucket = record['s3']['bucket']['name']
+        key = record['s3']['object']['key']
+        # Get, read, and split the file into lines
+        obj = s3.get_object(Bucket=bucket, Key=key)
+        body = obj['Body'].read()
+       # csv is comma-separated
+        lines = body.splitlines(',')
+        # Match the regular expressions to each line and index the JSON
+        for line in lines:
+            line = line.decode("utf-8")
+           # How should the part below be changed?
+            ip = ip_pattern.search(line).group(1)
+            timestamp = time_pattern.search(line).group(1)
+            message = message_pattern.search(line).group(1)
+```
 ### Additional information (FW/tool versions, etc.)
 None

AWS Lambda Python Amazon S3

Added a supplementary note about the part I want to change

2022/11/30 06:52

Posted

msy47

Score 26

test CHANGED Viewed

File without changes

test CHANGED Viewed

@@ -86,8 +86,11 @@
 ```
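 ### What I tried
-・Not run yet
-・Source code under consideration
+・Dropped the regular expressions
+・Read the comma-separated csv
+・Want to send the csv I read as json built from a dict
+How should I change the existing search(line).group part?
+I have no idea... I would appreciate your help.
 ### Additional information (FW/tool versions, etc.)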
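One way the `search(line).group(1)` calls could be replaced is to decode each line, split it on commas, and pair the values with column names. The sketch below assumes the six columns of the csv example in this revision. The helper name `line_to_document` is hypothetical, and plain `str.split` does not handle quoted fields that contain commas (the standard `csv` module does).

```python
# Column names taken from the csv example in the question at this revision.
KEYS = ["vmanage", "ip", "host", "session1", "session2", "session3"]


def line_to_document(line: bytes) -> dict:
    """Turn one comma-separated line into the JSON document to index."""
    # Decode first, then split on commas and trim the blanks around each value.
    values = [value.strip() for value in line.decode("utf-8").split(",")]
    return dict(zip(KEYS, values))


document = line_to_document(b"vmanage, 1.1.1.1, test, 10, 9, 10")
print(document)
```

Inside the sample's `for line in lines:` loop, `document = line_to_document(line)` would take the place of the three regex lines and the `document = {...}` assignment, provided `lines` still comes from `body.splitlines()` with no argument.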

AWS Lambda Python Amazon S3