How do I split a file and compress the pieces directly?

Question

I have a 100 GB file that I want to split into 100 files of 1 GB each (at line breaks).

For example:

split --bytes=1024M /path/to/input /path/to/output

I then want to apply gzip/zip to each of the 100 generated files.

Is it possible to do this with a single command?

Peter · Answer

Use "--filter":

split --bytes=1024M --filter='gzip > $FILE.gz' /path/to/input /path/to/output
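Because the question asks for splits at line breaks, a line-aware variant of this command may be closer to what is wanted. The sketch below assumes GNU split with --filter support and uses --line-bytes so that no line is cut in half. The function name split_gz and its arguments are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): split INPUT into
# gzip-compressed pieces of at most SIZE bytes, breaking only at line ends.
# Assumes GNU split with --filter support.
split_gz() {
    input=$1    # file to split
    prefix=$2   # output prefix, may include a directory
    size=$3     # maximum piece size, e.g. 1024M
    # split exports $FILE (the name of each piece) to the filter command;
    # the single quotes keep the calling shell from expanding it early.
    split --line-bytes="$size" --filter='gzip > "$FILE.gz"' "$input" "$prefix"
}
```

It could be called as: split_gz /path/to/input /path/to/output 1024M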

Iván · Answer

Used with the -d option, this command generates numeric suffixes:

split -d -b 2048m "myDump.dmp" "myDump.dmp.part-" && gzip myDump.dmp.part*

Generated files:

 myDump.dmp.part-00 myDump.dmp.part-01 myDump.dmp.part-02 ...

Paused until further notice. · Answer

A one-liner using a conditional is about as close as you can get:

cd /path/to/output && split --bytes=1024M /path/to/input/filename && gzip x*

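gzip runs only if split succeeds, because of the conditional &&. The same && sits between cd and split, so split runs only if cd succeeded. Note that, as invoked here, split and gzip write to the current directory rather than to a specified output directory. If necessary, you can create the directory first: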

mkdir -p /path/to/output && cd /path/to/output && split --bytes=1024M /path/to/input/filename && gzip x*
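As an alternative to changing directory, the PREFIX argument of split can carry a directory component, so the pieces can be written elsewhere without cd. A minimal sketch under that assumption; the function name split_into_dir and the x prefix are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: split INPUT into SIZE-byte pieces under OUTDIR and
# gzip them, without changing the current directory.
split_into_dir() {
    input=$1    # file to split
    outdir=$2   # directory that receives the pieces
    size=$3     # piece size, e.g. 1024M
    # Each step runs only if the previous one succeeded.
    mkdir -p "$outdir" &&
        split --bytes="$size" "$input" "$outdir/x" &&
        gzip "$outdir"/x*
}
```

It could be called as: split_into_dir /path/to/input/filename /path/to/output 1024M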

To put everything back together:

gunzip /path/to/files/x* && cat /path/to/files/x* > /path/to/dest/filename
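If the compressed pieces should stay on disk, the restore can also be done in one pass. This relies on the shell expanding the glob in sorted order, which matches the order in which split names its pieces, and on gunzip -c writing each decompressed piece to standard output in turn. The function name join_gz is illustrative:

```shell
#!/bin/sh
# Hypothetical helper: rebuild the original file from gzip-compressed
# split pieces, leaving the .gz pieces untouched.
join_gz() {
    prefix=$1   # e.g. /path/to/files/x
    dest=$2     # e.g. /path/to/dest/filename
    # -c writes to stdout, so the pieces are decompressed in glob order
    # and concatenated into dest.
    gunzip -c "$prefix"*.gz > "$dest"
}
```

It could be called as: join_gz /path/to/files/x /path/to/dest/filename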

splaisan · Answer

A bash function that compresses on the fly with pigz

function splitreads(){
# add this function to your .bashrc or alike
# split large compressed read files into chunks of fixed size
# suffix is a three digit counter starting with 000
# take compressed input and compress output with pigz
# keeps the read-in-pair suffix in outputs
# requires pigz installed or modification to use gzip

usage="# splitreads <reads.fastq.gz> <reads per chunk; default 10000000>
";
if [ $# -lt 1 ]; then
    echo; echo ${usage}; return;
fi;

# threads for pigz (adapt to your needs)
thr=8

input=$1

# extract prefix and read number in pair
# this code is adapted to paired reads
# e.g. sample_1.fastq.gz gives base=sample_1, pref=sample, readn=_1
base=$(basename ${input%.f*.gz})
pref=$(basename ${input%_?.f*.gz})
readn="${base#"${base%%_*}"}"

# 10M reads (4 lines each)
binsize=$((${2:-10000000}*4))

# split in bins of ${binsize}
echo "# splitting ${input} in chunks of $((${binsize}/4)) reads"

cmd="zcat ${input} \
  | split \
    -a 3 \
    -d \
    -l ${binsize} \
    --numeric-suffixes \
    --additional-suffix ${readn} \
    --filter='pigz -p ${thr} > \$FILE.fq.gz' \
    - ${pref}_"

echo "# ${cmd}"
eval ${cmd}
}