Extracting a specific PDF from a Zip file that contains many PDFs

Question

Given:

The name and location of a Zip file, e.g. collectionOfPdfFiles2017.Zip
- The Zip file is a flat collection of PDFs with no folder structure.
The name of a PDF file inside that Zip file, e.g. someFileFrom2017.pdf

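Wanted:

A way to extract the named PDF from the given Zip file on the console.
The file must not be modified.
- Basically, the extracted file should be in the same state as if I had extracted the whole archive and copied the needed file by hand.
Ideally it goes into a destination folder, but that is a luxury.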

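How can I do this? I currently have a script that searches for a string inside the PDF files stored in Zip files and prints the name of the Zip and of the PDF in which it was found. I am posting it here for clarity.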

    #!/bin/bash
    echo "Hi I'll find text in pdf files that are stored inside Zip files."
    echo ""
    echo "Enter search string:"
    read -r searchString
    echo "Ok. I'll search all Zip files for content with this text..."
    for z in *.Zip
    do
        zipinfo -1 "$z" |   # Get the list of filenames in the Zip file
        while IFS= read -r f
        do
            unzip -p "$z" "$f" |   # Extract each PDF to standard output instead of a file
            pdftotext - - |        # Then convert it to text, reading from stdin, writing to stdout
            grep -q "$searchString" && echo "$z -> $f"   # And finally grep the text
        done
    done

This script was made possible thanks to this answer.

cmak.fr · Accepted Answer

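To extract a specific file from a zip archive (-j drops the directory path stored in the archive, -d sets the destination folder):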

unzip -j "myarchive.Zip" "in/archive/file.pdf" -d "/destination/path/"

In your script, replace the body of the inner loop with:

    # Set a destination path
    dest="/path/to/unzip/to"

    # Dump the PDF text to a temporary file
    tempfile=$(mktemp)

    # Unzip the file to stdout and convert it to text
    unzip -p "$z" "$f" | pdftotext - "$tempfile"

    if grep -q "$searchString" "$tempfile"; then
        # The text was found, so extract the PDF itself
        unzip -j "$z" "$f" -d "$dest"
        # some text output
        echo "$z -> $f"
    fi

    rm "$tempfile"