Reading the contents of a tar file without extracting it, in a Python script

Question

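I have a tar file that contains a number of files. I need to write a Python script that reads the contents of those files, without untarring the archive, and gives the total character count, including letters, spaces, newline characters, everything.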

ghostdog74 · Accepted Answer

You can use getmembers():

    >>> import tarfile
    >>> tar = tarfile.open("test.tar")
    >>> tar.getmembers()
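Each entry returned by getmembers() is a TarInfo object that carries the member's header data. As a minimal sketch (the helper name `member_sizes` is made up for illustration), the sizes of the regular files can be listed from the headers alone, without reading any file content:

```python
import tarfile


def member_sizes(tar_path):
    """Map each regular file's name to its size in bytes, using header data only."""
    with tarfile.open(tar_path) as tar:
        # TarInfo.isfile() is False for directories, links and device nodes.
        return {m.name: m.size for m in tar.getmembers() if m.isfile()}
```

Note that `size` is a byte count, which equals the character count only for single-byte encodings.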

After that, you can use extractfile() to extract the members as file objects. Just an example:

    import os
    import tarfile

    os.chdir("/tmp/foo")
    # The with block closes the archive even if an error occurs.
    with tarfile.open("test.tar") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:
                # Directories and other non-file members have no content.
                continue
            content = f.read()  # bytes in Python 3
            print("%s has %d newlines" % (member.name, content.count(b"\n")))
            print("%s has %d spaces" % (member.name, content.count(b" ")))
            print("%s has %d characters" % (member.name, len(content)))

With the file object "f" in the example above, you can use read(), readlines(), and so on.
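To get the single grand total the question asks for, the per-member counts can be summed. The following is only a sketch: the function name `count_characters` and the UTF-8 default are assumptions, since the question does not say how the files are encoded.

```python
import tarfile


def count_characters(tar_path, encoding="utf-8"):
    """Return the total number of characters in all regular files of a tar archive."""
    total = 0
    with tarfile.open(tar_path) as tar:
        for member in tar:  # iterating a TarFile yields TarInfo objects
            if not member.isfile():
                continue  # skip directories, links and device nodes
            f = tar.extractfile(member)
            # Decode so that a multi-byte character is counted once;
            # use len(f.read()) instead if a byte count is what you want.
            total += len(f.read().decode(encoding))
    return total
```

Calling `count_characters("test.tar")` reads each member into memory in turn, so nothing is written to disk.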

Stefano Borini · Answer

You need to use the tarfile module. Specifically, you use an instance of the class TarFile to access the file, and then access the names with TarFile.getnames().

 |  getnames(self)
 |      Return the members of the archive as a list of their names. It has
 |      the same order as the list returned by getmembers().

If instead you want to read the content, use this method:

 |  extractfile(self, member)
 |      Extract a member from the archive as a file object. `member' may be
 |      a filename or a TarInfo object. If `member' is a regular file, a
 |      file-like object is returned. If `member' is a link, a file-like
 |      object is constructed from the link's target. If `member' is none of
 |      the above, None is returned.
 |      The file-like object is read-only and provides the following
 |      methods: read(), readline(), readlines(), seek() and tell()
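As the docstring says, extractfile() takes either a filename or a TarInfo object, so a name from getnames() can be passed straight in. A small sketch of the two calls together (the helper names `list_names` and `read_member` are made up for illustration):

```python
import tarfile


def list_names(tar_path):
    """Return the member names in archive order."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()


def read_member(tar_path, name):
    """Return the bytes of one member, or None if it is not a file or link."""
    with tarfile.open(tar_path) as tar:
        f = tar.extractfile(name)  # raises KeyError if the name is not in the archive
        return None if f is None else f.read()
```

The None check matters because directories appear in getnames() too, and extractfile() returns None for them.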

ThorSummoner · Answer

An implementation of the methods mentioned by @stefano-borini: access a tar archive member by file name.

    # python3
    # myArchive is an already opened tarfile.TarFile
    myFile = myArchive.extractfile(
        dict(zip(
            myArchive.getnames(),
            myArchive.getmembers()
        ))['path/to/file']
    ).read()

Credits:

dict(zip( from https://stackoverflow.com/a/209854/169568
tarfile.getnames from https://stackoverflow.com/a/2018523/169568
Additionally, for my use case I was reading the tar archive from a buffer; see "How to construct a TarFile object in memory from byte buffer in Python 3?"
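For the buffer case, one common approach is to wrap the bytes in io.BytesIO and pass it to tarfile.open() through the `fileobj` argument. This is a sketch rather than the code from the linked question, and the helper name `open_tar_from_bytes` is made up:

```python
import io
import tarfile


def open_tar_from_bytes(data):
    """Open a tar archive held in a bytes object without touching the disk."""
    # mode "r:*" lets tarfile detect gzip, bz2 or xz compression automatically.
    return tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
```

The returned TarFile supports getnames(), getmembers() and extractfile() in the same way as one opened from a path, and should be closed (or used in a with block) when done.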