How do I list the directory contents of an S3 bucket using Python and Boto3?

Question

I am trying to list all the directories in an S3 bucket using Python and Boto3.

I am using the following code:

    s3 = session.resource('s3')  # I already have a boto3 Session object

    bucket_names = [
        'this/bucket/',
        'that/bucket/'
    ]

    for name in bucket_names:
        bucket = s3.Bucket(name)
        for obj in bucket.objects.all():  # this raises an exception
            pass  # handle obj

When I run this, I get the following exception stack trace:

    File "botolist.py", line 67, in <module>
      for obj in bucket.objects.all():
    File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 82, in __iter__
      for page in self.pages():
    File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 165, in pages
      for page in pages:
    File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 83, in __iter__
      response = self._make_request(current_kwargs)
    File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 155, in _make_request
      return self._method(**current_kwargs)
    File "/Library/Python/2.7/site-packages/botocore/client.py", line 270, in _api_call
      return self._make_api_call(operation_name, kwargs)
    File "/Library/Python/2.7/site-packages/botocore/client.py", line 335, in _make_api_call
      raise ClientError(parsed_response, operation_name)
    botocore.exceptions.ClientError: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.

What is the correct way to list the directories in a bucket?

Thanks in advance...

Henry Henrinson · Accepted Answer

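The other answers here are all poor. A single call to client.list_objects() returns at most 1,000 results, and the remaining answers are either wrong or too complicated.

Handling the continuation token yourself is a terrible idea. Just use a paginator, which handles that logic for you.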

The solution you need is:

    [e['Key']
     for p in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket')
     for e in p.get('Contents', [])]  # 'Contents' is missing when a page is empty
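The paginator approach can also be wrapped in a small generator so that keys are streamed instead of collected up front. This is only a minimal sketch, not the answerer's code: the name `iter_keys` and its `prefix` parameter are made up for illustration, and `client` is assumed to be a boto3 S3 client such as the one returned by `boto3.client('s3')`.

```python
def iter_keys(client, bucket, prefix=''):
    """Yield every object key in `bucket` that starts with `prefix`.

    `iter_keys` is a hypothetical helper, not part of boto3.
    `client` is assumed to be a boto3 S3 client.
    """
    paginator = client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent when a page holds no matching objects.
        for entry in page.get('Contents', []):
            yield entry['Key']
```

With a real client, something like `for key in iter_keys(client, 'my_bucket', 'logs/'): print(key)` would print each key as its page arrives; the bucket name and prefix here are placeholders.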

Anne M. · Answer

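If you have a session, create a client and read CommonPrefixes from the client's list_objects response:

    client = session.client('s3',
                            # region_name='eu-west-1'
                            )

    result = client.list_objects(Bucket='MyBucket', Delimiter='/')
    # CommonPrefixes is missing when there are no folders, so default to []
    for obj in result.get('CommonPrefixes', []):
        prefix = obj.get('Prefix')  # handle the prefix here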

There may be a large number of folders, and you may want to start from a subfolder. Something like this can handle both cases:

    def folders(client, bucket, prefix=''):
        paginator = client.get_paginator('list_objects')
        for result in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
            # renamed from `prefix` so the loop variable does not shadow the argument
            for common_prefix in result.get('CommonPrefixes', []):
                yield common_prefix.get('Prefix')

    gen_folders = folders(client, 'MyBucket')
    list(gen_folders)

    gen_subfolders = folders(client, 'MyBucket', prefix='MySubFolder/')
    list(gen_subfolders)

Vor · Answer

Alternatively, you can use boto3.client.

Example:

    >>> import boto3
    >>> client = boto3.client('s3')
    >>> client.list_objects(Bucket='MyBucket')

list_objects also supports other arguments that you may need in order to iterate over the results: Bucket, Delimiter, EncodingType, Marker, MaxKeys, Prefix.
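As a rough illustration of how a few of those arguments fit together, the sketch below reads a single page of one "directory level". The function name `list_one_page` and the default `max_keys=100` are assumptions made for this example, `client` is assumed to be a boto3 S3 client, and the function deliberately reads only one response, so it reports whether more results remain rather than fetching them.

```python
def list_one_page(client, bucket, prefix='', max_keys=100):
    """Return (folders, keys, truncated) for one list_objects response.

    `list_one_page` is a hypothetical helper, not part of boto3.
    """
    response = client.list_objects(
        Bucket=bucket,
        Prefix=prefix,     # only keys that start with this string
        Delimiter='/',     # deeper keys are grouped into CommonPrefixes
        MaxKeys=max_keys,  # upper bound on results in this one response
    )
    # Both keys are optional in the response, so default to empty lists.
    folders = [p['Prefix'] for p in response.get('CommonPrefixes', [])]
    keys = [o['Key'] for o in response.get('Contents', [])]
    return folders, keys, response.get('IsTruncated', False)
```

When the third value is true there are more results than fit in one response, and a paginator is probably the simpler way to read the rest.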

Behrooz · Answer

The best way to get a list of all objects with a specific prefix in an S3 bucket is to use list_objects_v2 together with ContinuationToken, which overcomes the limit of 1,000 objects per page.

    import boto3

    s3 = boto3.client('s3')
    s3_bucket = 'your-bucket'
    s3_prefix = 'your/prefix'

    partial_list = s3.list_objects_v2(Bucket=s3_bucket, Prefix=s3_prefix)
    # 'Contents' is absent when nothing matches, so default to an empty list
    obj_list = partial_list.get('Contents', [])
    while partial_list['IsTruncated']:
        next_token = partial_list['NextContinuationToken']
        partial_list = s3.list_objects_v2(
            Bucket=s3_bucket,
            Prefix=s3_prefix,
            ContinuationToken=next_token)
        obj_list.extend(partial_list.get('Contents', []))

Old_Mortality · Answer

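I would have thought that a bucket name cannot contain a slash. You say you want to list all the directories in a bucket, but your code tries to list all the contents (not necessarily directories) of several buckets. Those buckets probably do not exist, because their names are illegal. So when you run

    bucket = s3.Bucket(name)

bucket is probably null, and the subsequent listing fails.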
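One possible way to act on this is to split each path into a bucket name and a key prefix before listing. The sketch below assumes that in a string like 'this/bucket/' the part before the first slash is the real bucket name and the rest is a key prefix, which is a guess about the asker's intent. The helper names `split_bucket_path` and `iter_objects` are made up for illustration, and `s3` is assumed to be the resource returned by `session.resource('s3')`.

```python
def split_bucket_path(path):
    """Split 'bucket/some/prefix/' into ('bucket', 'some/prefix/').

    Hypothetical helper: assumes the text before the first slash is the
    bucket name and everything after it is a key prefix.
    """
    bucket_name, _, prefix = path.partition('/')
    return bucket_name, prefix


def iter_objects(s3, path):
    """Return the objects under `path`, given a boto3 S3 resource `s3`."""
    bucket_name, prefix = split_bucket_path(path)
    # objects.filter(Prefix=...) restricts the listing to keys under the prefix.
    return s3.Bucket(bucket_name).objects.filter(Prefix=prefix)
```

Under that assumption, the loop in the question would become `for obj in iter_objects(s3, name):`, with the bucket name no longer containing a slash.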
