Bash script: a more elegant way to perform these operations

Question

I have these three files:

file.txt.7z = 5.4GB
file-1.txt.7z = 251M
file-2.txt.7z = 7.7M

They are the only files in the directory:

$ tree
.
├── file.txt.7z
├── file-1.txt.7z
└── file-2.txt.7z

I want to:

Decompress the files
Combine them into a single file
Split that combined file into files of 500,000 lines each
End up with many files that have the ".txt" extension

Right now I do it like this:

p7zip -d "*.txt.7z"
cat file-1.txt >> file.txt
rm file-1.txt
cat file-2.txt >> file.txt
rm file-2.txt
split -l 500000 file.txt
for f in *; do mv "$f" "$f.txt"; done

How can I achieve this in a more elegant way?

RomanPerekhrest · Answer

7za + split solution (a single pipeline):

7za e "*.7z" -so 2> /dev/null | split -l500000 --additional-suffix=".txt" --numeric-suffixes=1 - "file"

7za options:

e - extract (decompress) the archives
-so - write the extracted content to STDOUT

split options:

--additional-suffix=".txt" - append the suffix .txt to every resulting file name
--numeric-suffixes=1 - use numeric suffixes starting at 1
- (hyphen) - read data from STDIN (standard input)
"file" - the common prefix for all resulting file names

The command above produces files named file01.txt, file02.txt, and so on.
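To see the naming scheme without the archives at hand, here is a minimal sketch that feeds generated lines into the same split options. It assumes GNU split (coreutils 8.16 or later); the temporary directory, the `seq` input, and the 4-line chunk size are stand-ins chosen for illustration and are not part of the answer above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory (assumption: mktemp is available).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Ten generated lines stand in for the stream that `7za e ... -so` would produce.
seq 1 10 | split -l4 --additional-suffix=".txt" --numeric-suffixes=1 - "file"

# Show the resulting chunks and their line counts.
wc -l file*.txt
```

With ten input lines and four lines per chunk, this leaves file01.txt and file02.txt with four lines each and file03.txt with the remaining two.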

Rastapopoulos · Answer

After decompressing, you can use a pipe together with split's --filter option.

p7zip -d *.txt.7z
cat file.txt file-1.txt file-2.txt | split -l 500000 --filter='cat > $FILE.txt'
rm file*

Here is the documentation for the --filter option:

‘--filter=COMMAND’
With this option, rather than simply writing to each output file, write through a pipe to the specified shell COMMAND for each output file. COMMAND should use the $FILE environment variable, which is set to a different output file name for each invocation of the command. For example, imagine that you have a 1TiB compressed file that, if uncompressed, would be too large to reside on disk, yet you must split it into individually-compressed pieces of a more manageable size. To do that, you might run this command:

xz -dc BIG.xz | split -b200G --filter='xz > $FILE.xz' - big-

Assuming a 10:1 compression ratio, that would create about fifty 20GiB files with names ‘big-aa.xz’, ‘big-ab.xz’, ‘big-ac.xz’, etc.
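As a small-scale illustration of the documentation quoted above, the following sketch pipes each chunk through a compressor. It assumes GNU split and gzip are installed; gzip stands in for xz, and the `seq` input, chunk size, and temporary directory are illustrative choices only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory (assumption: mktemp is available).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# split pipes each 4-line chunk to the filter command. $FILE holds the name
# split would otherwise have written to (xaa, xab, ...). The single quotes
# keep the calling shell from expanding $FILE before split sees it.
seq 1 10 | split -l 4 --filter='gzip > "$FILE.txt.gz"'

# Decompress the second chunk to confirm what it contains.
gzip -dc xab.txt.gz
```

Because no prefix is given, split falls back to its default prefix x, so the compressed chunks are named xaa.txt.gz, xab.txt.gz, and xac.txt.gz.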

If you need to keep a file containing the entire combined output, you can use tee, which copies its standard input both to standard output and to the files given as arguments.

cat file.txt file-1.txt file-2.txt | tee all.txt | split -l 500000 --filter='cat > $FILE.txt'
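The tee variant can be tried on small inputs with a sketch like the one below. It assumes GNU split; the three generated files and the 4-line chunk size are stand-ins for the real data. The filter is spelled `cat > "$FILE.txt"` here so that each chunk is actually copied under shells such as bash or sh, where a bare redirection would only create an empty file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory (assumption: mktemp is available).
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Small stand-ins for the decompressed file.txt, file-1.txt and file-2.txt.
seq 1 4  > file.txt
seq 5 8  > file-1.txt
seq 9 10 > file-2.txt

# tee saves the combined stream to all.txt and passes it on to split,
# which writes each 4-line chunk through the filter to xaa.txt, xab.txt, ...
cat file.txt file-1.txt file-2.txt | tee all.txt | split -l 4 --filter='cat > "$FILE.txt"'

# Compare line counts of the combined file and the chunks.
wc -l all.txt x*.txt
```

Concatenating the chunks in name order should reproduce all.txt exactly, which is a quick way to check that nothing was lost in the split.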