Boto3: download all files from an S3 bucket

Question

I'm using boto3 to get files from an S3 bucket. I need functionality similar to aws s3 sync.

My current code is:

    #!/usr/bin/python
    import boto3

    s3 = boto3.client('s3')
    list = s3.list_objects(Bucket='my_bucket_name')['Contents']
    for key in list:
        s3.download_file('my_bucket_name', key['Key'], key['Key'])

This works fine as long as the bucket contains only files. If a folder is present inside the bucket, it throws an error:

    Traceback (most recent call last):
      File "./test", line 6, in <module>
        s3.download_file('my_bucket_name', key['Key'], key['Key'])
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/inject.py", line 58, in download_file
        extra_args=ExtraArgs, callback=Callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 651, in download_file
        extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 666, in _download_file
        self._get_object(bucket, key, filename, extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 690, in _get_object
        extra_args, callback)
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 707, in _do_get_object
        with self._osutil.open(filename, 'wb') as f:
      File "/usr/local/lib/python2.7/dist-packages/boto3/s3/transfer.py", line 323, in open
        return open(filename, mode)
    IOError: [Errno 2] No such file or directory: 'my_folder/.8Df54234'

Is this a proper way to download a complete S3 bucket using boto3? And how do I download folders?

Grant Langseth · Accepted Answer

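When working with buckets that hold more than 1000 objects, you need a solution that uses NextContinuationToken to walk through sequential sets of at most 1000 keys. This solution first compiles a list of objects, then iteratively creates the required directories and downloads the existing objects.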

    import boto3
    import os

    s3_client = boto3.client('s3')

    def download_dir(prefix, local, bucket, client=s3_client):
        """
        params:
        - prefix: pattern to match in s3
        - local: local path to folder in which to place files
        - bucket: s3 bucket with target contents
        - client: initialized s3 client object
        """
        keys = []
        dirs = []
        next_token = ''
        base_kwargs = {
            'Bucket': bucket,
            'Prefix': prefix,
        }
        while next_token is not None:
            kwargs = base_kwargs.copy()
            if next_token != '':
                kwargs.update({'ContinuationToken': next_token})
            results = client.list_objects_v2(**kwargs)
            # 'Contents' is missing when nothing matches the prefix
            contents = results.get('Contents', [])
            for i in contents:
                k = i.get('Key')
                if k[-1] != '/':
                    keys.append(k)
                else:
                    dirs.append(k)
            next_token = results.get('NextContinuationToken')
        # Create the directories that exist as explicit "folder" keys
        for d in dirs:
            dest_pathname = os.path.join(local, d)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
        # Download every file, creating parent directories as needed
        for k in keys:
            dest_pathname = os.path.join(local, k)
            if not os.path.exists(os.path.dirname(dest_pathname)):
                os.makedirs(os.path.dirname(dest_pathname))
            client.download_file(bucket, k, dest_pathname)

glefait · Answer

I had the same need and wrote the following function, which downloads the files recursively.

Directories are created locally only if they contain files.

    import boto3
    import os

    def download_dir(client, resource, dist, local='/tmp', bucket='your_bucket'):
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Delimiter='/', Prefix=dist):
            # Recurse into each "subdirectory" (common prefix)
            if result.get('CommonPrefixes') is not None:
                for subdir in result.get('CommonPrefixes'):
                    download_dir(client, resource, subdir.get('Prefix'), local, bucket)
            for file in result.get('Contents', []):
                # Skip folder placeholder keys, which cannot be written as files
                if file.get('Key').endswith('/'):
                    continue
                dest_pathname = os.path.join(local, file.get('Key'))
                if not os.path.exists(os.path.dirname(dest_pathname)):
                    os.makedirs(os.path.dirname(dest_pathname))
                resource.meta.client.download_file(bucket, file.get('Key'), dest_pathname)

The function is called like this:

    def _start():
        client = boto3.client('s3')
        resource = boto3.resource('s3')
        download_dir(client, resource, 'clientconf/', '/tmp', bucket='my-bucket')

John Rotenstein · Answer

Amazon S3 does not have folders or directories. It is a flat file structure.

To maintain the appearance of directories, path names are stored as part of the object Key (the filename). For example:

images/foo.jpg

In this case, the whole Key is images/foo.jpg, rather than just foo.jpg.
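As a small illustration of that point, the key is one string, and the "directory" is simply whatever precedes the last slash. The helper name split_key below is hypothetical and not part of boto3; it only uses the standard library.

```python
import posixpath


def split_key(key):
    """Split an S3 key into its prefix ("folder" part) and final name.

    posixpath is used rather than os.path because keys conventionally
    use '/' as the delimiter, whatever the local operating system is.
    """
    return posixpath.split(key)


print(split_key("images/foo.jpg"))  # ('images', 'foo.jpg')
print(split_key("images/"))         # ('images', '')
```

A key that ends in a slash has an empty final name, which is why such "folder placeholder" keys cannot be saved as local files.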

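I suspect that your problem is that boto is returning a file called my_folder/.8Df54234 and is attempting to save it to the local filesystem. However, your local filesystem interprets the my_folder/ portion as a directory name, and that directory does not exist on your local filesystem.

You could either truncate the filename so that only the .8Df54234 portion is saved, or create the necessary directories before writing the files. Note that these could be multi-level nested directories.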
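The second option (creating the directories first) might look roughly like the sketch below. The function name download_key and its parameters are hypothetical; client is assumed to be an S3 client created with boto3.client('s3'), and only its download_file method is used.

```python
import os


def download_key(client, bucket, key, root):
    """Download one object to root/key, creating parent directories first.

    Returns the local path, or None for folder placeholder keys
    (keys ending in '/'), which have nothing to download.
    """
    if key.endswith("/"):
        return None
    # Build the local path from the key's '/'-separated parts
    local_path = os.path.join(root, *key.split("/"))
    parent = os.path.dirname(local_path)
    if parent:
        # exist_ok=True tolerates directories that already exist (Python 3.2+)
        os.makedirs(parent, exist_ok=True)
    client.download_file(bucket, key, local_path)
    return local_path
```

Because os.makedirs creates every missing intermediate directory, multi-level keys such as a/b/c/file.txt are handled by the same call.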

A simpler way is to use the AWS Command Line Interface (CLI), for example:

aws s3 cp --recursive s3://my_bucket_name local_folder

There is also a sync option that copies only new and modified files.
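For anyone who wants a similar "only new and modified files" check in Python, here is a minimal sketch. It assumes a simple size-and-timestamp comparison and is not the CLI's actual implementation; the function name needs_download is hypothetical. The remote values correspond to the Size and LastModified fields of an entry returned by list_objects_v2.

```python
import os
from datetime import datetime, timezone


def needs_download(local_path, remote_size, remote_last_modified):
    """Return True if the local copy is missing or looks stale.

    remote_last_modified must be a timezone-aware datetime, which is
    what boto3 returns for 'LastModified'.
    """
    if not os.path.isfile(local_path):
        return True
    stat = os.stat(local_path)
    if stat.st_size != remote_size:
        return True
    # Compare in UTC so the local timezone does not affect the result
    local_mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return remote_last_modified > local_mtime
```

A download loop could call this for each listed object and skip the ones for which it returns False.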

Tushar Niras · Answer

    import os
    import boto3

    # initiate s3 resource
    s3 = boto3.resource('s3')

    # select bucket
    my_bucket = s3.Bucket('my_bucket_name')

    # download file into current directory
    for s3_object in my_bucket.objects.all():
        # Need to split s3_object.key into path and file name,
        # else it will give error file not found.
        path, filename = os.path.split(s3_object.key)
        # Keys ending in '/' are folder placeholders with no file name; skip them.
        if not filename:
            continue
        my_bucket.download_file(s3_object.key, filename)

Shan · Answer

I'm currently achieving the task using the following:

    #!/usr/bin/python
    import os
    import boto3

    s3 = boto3.client('s3')
    objects = s3.list_objects(Bucket='bucket')['Contents']
    for s3_key in objects:
        s3_object = s3_key['Key']
        if not s3_object.endswith("/"):
            # Regular object: download it under the same relative path
            s3.download_file('bucket', s3_object, s3_object)
        else:
            # Folder placeholder: create the local directory
            if not os.path.exists(s3_object):
                os.makedirs(s3_object)

It does the job, but I'm not sure this is a good way to do it. I'm leaving it here to help other users, and to invite further answers with a better way of achieving this.

ifoukarakis · Answer

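Better late than never :) The previous answer with the paginator is really good. However, it is recursive, and you might end up hitting Python's recursion limit. Here is an alternative approach, with a couple of extra checks.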

    import os
    import errno
    import boto3

    def assert_dir_exists(path):
        """
        Checks if the directory tree in path exists. If not, it creates it.
        :param path: the path to check if it exists
        """
        try:
            os.makedirs(path)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise

    def download_dir(client, bucket, path, target):
        """
        Downloads recursively the given S3 path to the target directory.
        :param client: S3 client to use.
        :param bucket: the name of the bucket to download from
        :param path: The S3 directory to download.
        :param target: the local directory to download the files to.
        """
        # Handle missing / at end of prefix
        if not path.endswith('/'):
            path += '/'

        paginator = client.get_paginator('list_objects_v2')
        for result in paginator.paginate(Bucket=bucket, Prefix=path):
            # Download each file individually
            # ('Contents' is missing when nothing matches the prefix)
            for key in result.get('Contents', []):
                # Calculate relative path
                rel_path = key['Key'][len(path):]
                # Skip paths ending in /
                if not key['Key'].endswith('/'):
                    local_file_path = os.path.join(target, rel_path)
                    # Make sure directories exist
                    local_file_dir = os.path.dirname(local_file_path)
                    assert_dir_exists(local_file_dir)
                    client.download_file(bucket, key['Key'], local_file_path)

    client = boto3.client('s3')
    download_dir(client, 'bucket-name', 'path/to/data', 'downloads')

Ganatra · Answer

Getting all the files in one go is a very bad idea; you should fetch them in batches instead.

One implementation that I use to fetch a particular folder (directory) from S3 is:

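    from boto3.session import Session

    # The original snippet was a method excerpt relying on self and on
    # undefined globals; here those values are passed in as parameters.
    def get_directory(bucket_name, directory_path, download_path, exclude_file_names,
                      aws_access_key_id=None, aws_secret_access_key=None, region_name=None):
        # prepare session
        session = Session(aws_access_key_id=aws_access_key_id,
                          aws_secret_access_key=aws_secret_access_key,
                          region_name=region_name)
        # get instances for client, resource and bucket
        client = session.client('s3')
        resource = session.resource('s3')
        bucket = resource.Bucket(bucket_name)
        listing = client.list_objects(Bucket=bucket_name, Prefix=directory_path)
        for s3_key in listing.get('Contents', []):
            s3_object = s3_key['Key']
            # skip excluded names and folder placeholder keys
            if s3_object not in exclude_file_names and not s3_object.endswith('/'):
                bucket.download_file(s3_object, download_path + str(s3_object.split('/')[-1]))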

If you still want to fetch the whole bucket, use the CLI, as @John Rotenstein mentioned above:

aws s3 cp --recursive s3://bucket_name download_path
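The "fetch in batches" advice above can also be sketched with a boto3 paginator, which requests one page of keys at a time instead of building the full listing up front. The function name iter_key_batches and the default page_size of 100 are illustrative assumptions; client is assumed to be an S3 client created with boto3.client('s3').

```python
def iter_key_batches(client, bucket, prefix="", page_size=100):
    """Yield lists of object keys, one list per page of results.

    S3 returns at most 1000 keys per request, so page_size values
    above 1000 have no additional effect.
    """
    paginator = client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        Prefix=prefix,
        PaginationConfig={"PageSize": page_size},
    )
    for page in pages:
        # 'Contents' is absent when a page has no matching objects
        yield [obj["Key"] for obj in page.get("Contents", [])]
```

Each yielded batch can be downloaded before the next page is requested, so memory use stays bounded by the page size rather than by the size of the bucket.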

mattalxndr · Answer

There is a workaround that runs the AWS CLI in the same process.

Install awscli as a Python library:

pip install awscli

Then define this function:

    import os
    from awscli.clidriver import create_clidriver

    def aws_cli(*cmd):
        old_env = dict(os.environ)
        try:
            # Environment
            env = os.environ.copy()
            env['LC_CTYPE'] = u'en_US.UTF'
            os.environ.update(env)
            # Run awscli in the same process; main() takes the arguments as one list
            exit_code = create_clidriver().main(list(cmd))
            # Deal with problems
            if exit_code > 0:
                raise RuntimeError('AWS CLI exited with code {}'.format(exit_code))
        finally:
            # Restore the original environment
            os.environ.clear()
            os.environ.update(old_env)

To execute:

aws_cli('s3', 'sync', '/path/to/source', 's3://bucket/destination', '--delete')

Rajesh Rajendran · Answer

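    import os
    import boto3

    # The bucket name is a placeholder
    my_bucket = boto3.resource('s3').Bucket('my_bucket_name')

    for objs in my_bucket.objects.all():
        print(objs.key)
        # S3 keys use '/' as the delimiter regardless of the local OS separator
        path = '/tmp/' + '/'.join(objs.key.split('/')[:-1])
        try:
            if not os.path.exists(path):
                os.makedirs(path)
            my_bucket.download_file(objs.key, '/tmp/' + objs.key)
        except FileExistsError as fe:
            print(objs.key + ' exists')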

This code downloads the content into the /tmp/ directory. You can change the directory if you want.

snat2100 · Answer

If you want to call a bash script from Python, here is a simple way to load files from a folder in an S3 bucket into a local folder (on a Linux machine):

    import boto3
    import subprocess
    import os

    ###TOEDIT###
    my_bucket_name = "your_my_bucket_name"
    bucket_folder_name = "your_bucket_folder_name"
    local_folder_path = "your_local_folder_path"
    ###TOEDIT###

    # 1. Load the list of files existing in the bucket folder
    FILES_NAMES = []
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('{}'.format(my_bucket_name))
    for object_summary in my_bucket.objects.filter(Prefix="{}/".format(bucket_folder_name)):
        # Keep only the file name so it can be compared with os.listdir() below
        # (this assumes the files sit directly in the folder, not in subfolders)
        file_name = os.path.basename(object_summary.key)
        if file_name:
            FILES_NAMES.append(file_name)

    # 2. List only new files that do not exist in local folder (to not copy everything!)
    new_filenames = list(set(FILES_NAMES) - set(os.listdir(local_folder_path)))

    # 3. Time to load files in your destination folder
    for new_filename in new_filenames:
        upload_S3files_CMD = """aws s3 cp s3://{}/{}/{} {}""".format(my_bucket_name, bucket_folder_name, new_filename, local_folder_path)
        # The keyword argument is lowercase: shell=True
        subprocess_call = subprocess.call(upload_S3files_CMD, shell=True)
        if subprocess_call != 0:
            print("ALERT: loading files not working correctly, please re-check new loaded files")