How do I read a CSV stored in S3 with csv.DictReader?

Question

I have code that fetches an AWS S3 object. How can I read this StreamingBody with Python's csv.DictReader?

    import boto3, csv

    session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
    s3_resource = session.resource('s3')
    s3_object = s3_resource.Object(<bucket>, <key>)
    streaming_body = s3_object.get()['Body']
    #csv.DictReader(???)

gary · Answer

The code would look something like this:

    import boto3
    import csv

    # get a handle on s3
    s3 = boto3.resource(u's3')

    # get a handle on the bucket that holds your file
    bucket = s3.Bucket(u'bucket-name')

    # get a handle on the object you want (i.e. your file)
    obj = bucket.Object(key=u'test.csv')

    # get the object
    response = obj.get()

    # read the contents of the file and split it into a list of lines
    # (splitlines() rather than split(), so fields containing spaces stay intact)
    # for python 2:
    # lines = response[u'Body'].read().splitlines()
    # for python 3 you need to decode the incoming bytes:
    lines = response['Body'].read().decode('utf-8').splitlines()

    # now iterate over those lines
    for row in csv.DictReader(lines):
        # here you get a sequence of dicts
        # do whatever you want with each line here
        print(row)

You could compact this a bit in real code, but I tried to keep it step by step to show the object hierarchy with boto3.
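As a rough sketch of what the compacted form might look like, the decode and parse steps can be folded into one small helper. The name `rows_from_body` is hypothetical, and `io.BytesIO` stands in for the `StreamingBody` here so the snippet runs without AWS credentials; it only assumes the body has a `read()` method that returns bytes.

```python
import csv
import io


def rows_from_body(body, encoding="utf-8"):
    """Read the whole body into memory and return a list of row dicts.

    `body` is assumed to be any object with a read() method returning bytes.
    """
    text = body.read().decode(encoding)
    # splitlines() keeps fields that contain spaces intact
    return list(csv.DictReader(text.splitlines()))


# Stand-in for response['Body']; the sample data is made up for illustration
fake_body = io.BytesIO(b"name,city\nAlice,New York\nBob,Paris\n")
rows = rows_from_body(fake_body)
for row in rows:
    print(row)
```

With a real object, the call would presumably be `rows_from_body(obj.get()['Body'])`. This still loads the entire file into memory.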

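Edit, following the comment about avoiding reading the entire file into memory: I haven't run into that requirement, so I can't speak authoritatively, but I would try wrapping the stream so that I get a text-file-like iterator. For example, you could use the codecs library to replace the CSV parsing section above with something like: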

    import codecs

    for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
        print(row)
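To try the wrapping approach without touching S3, here is a minimal sketch that uses `io.BytesIO` as a stand-in for the `StreamingBody` (an assumption for illustration; the sample data is made up). It shows that the reader returned by `codecs.getreader` yields decoded text lines that `csv.DictReader` can consume directly.

```python
import codecs
import csv
import io

# Stand-in for response['Body']: any byte stream with a read() method
fake_body = io.BytesIO("id,note\n1,café\n2,two words\n".encode("utf-8"))

# Wrap the byte stream so that iterating over it yields decoded text lines
text_stream = codecs.getreader("utf-8")(fake_body)

seen = []
for row in csv.DictReader(text_stream):
    seen.append(row)
    print(row)
```

The wrapper pulls bytes from the underlying stream in chunks as lines are requested, so it should not need the whole file decoded up front, though I have not measured memory use against a real S3 object.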