マルチページPDFをPythonで画像オブジェクトのリストに変換する方法は？

Question

Pythonで画像をディスクに保存せずに（PDF $ ===ドキュメントをリスト構造の一連の画像オブジェクトに変換したい）、複数ページのPDFドキュメントをPILイメージで処理したい）にします。これまでのところ、最初に画像をファイルに書き込むためにのみこれを行うことができます。

from wand.image import Image with Image(filename='source.pdf') as img: with img.convert('png') as converted: converted.save(filename='pyout/page.png')

しかし、どうすれば上記のimgオブジェクトをPIL.Imageオブジェクトのリストに直接変換できますか？

Bryant Kou · Answer

新しい答え：

pip install pdf2image

from pdf2image import convert_from_path, convert_from_bytes images = convert_from_path('/path/to/my.pdf')

枕も設置する必要があるかもしれません。これはLinuxでのみ機能する可能性があります。

https://github.com/Belval/pdf2image

結果は2つの方法で異なる場合があります。

古い答え：

Python 3.4：

from PIL import Image from wand.image import Image as wimage import os import io if __name__ == "__main__": filepath = "fill this in" assert os.path.exists(filepath) page_images = [] with wimage(filename=filepath, resolution=200) as img: for page_wand_image_seq in img.sequence: page_wand_image = wimage(page_wand_image_seq) page_jpeg_bytes = page_wand_image.make_blob(format="jpeg") page_jpeg_data = io.BytesIO(page_jpeg_bytes) page_image = Image.open(page_jpeg_data) page_images.append(page_image)

最後に、システムコールを実行してmogrifyを実行できますが、一時ファイルを管理する必要があるため、より複雑になる可能性があります。