Split a file into N pieces with the same name in different target directories

Question

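I want to split sourcefile.txt, which contains 10000 lines (and grows every day), into 30 equal files. I have directories named prog1 through prog30, and I want to save each piece into one of these directories under the same file name: for example /prog1/myfile.txt, /prog2/myfile.txt, and so on up to /prog30/myfile.txt.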

This is my bash script, divide.sh, which is run in the prog directory:

#!/bin/bash
programpath=/home/mywebsite/project/a1/
array=/prog1/
totalline=$(wc -l < ./sourcefile.txt)
divide="$(( $totalline / 30 ))"
split --lines=$divide $./prog1/myfile.txt
exit 1
fi

jthill · Accepted Answer

A sed version, for fun:

lines=$(wc -l <sourcefile.txt)
perfile=$(( (lines+29)/30 ))    # see https://www.rfc-editor.org/rfc/rfc968.txt
last=0
sed -nf- sourcefile.txt <<EOD
$(while let $((last<lines)); do
    mkdir -p prog$((last/perfile+1))
    echo $((last+1)),$((last+perfile)) w prog$((last/perfile+1))/myfile.txt
    : $((last+=perfile))
done)
EOD
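To see what the loop above feeds to sed, the following sketch prints the generated sed program instead of running it. The values (10 lines, 3 parts) and the `parts` variable are assumptions chosen for illustration, not taken from the question.

```shell
#!/bin/bash
# Sketch: build the sed program as text, without running sed.
# Assumed example values: 10 lines split into 3 parts.
lines=10
parts=3
perfile=$(( (lines + parts - 1) / parts ))   # ceiling division

last=0
script=$(while (( last < lines )); do
    # one "first,last w file" command per output directory
    echo "$((last+1)),$((last+perfile)) w prog$((last/perfile+1))/myfile.txt"
    (( last += perfile ))
done)

printf '%s\n' "$script"
```

Each printed line is an address range followed by a `w` command, so sed writes every line in that range to the named file. The last range may run past the end of the input, which is harmless.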

ashishk · Answer

#!/bin/bash

# assuming the file is in the same folder as the script
INPUT=large_file.txt
# assuming the folder called "output" is in the same folder
# as the script and there are folders that have the pattern
# prog01 prog02 ... prog30
# create that with mkdir output/prog{01..30}
OUTPUT_FOLDER=output

OUTPUT_FILE_FORMAT=myfile

# split
# -n l/30 -> 30 files, without breaking lines (plain -n 30 cuts by byte count)
# $OUTPUT_FILE_FORMAT -> should start with this pattern
# --numeric-suffixes=1 -> end of file name should start from 01
split -n l/30 --numeric-suffixes=1 "$INPUT" "$OUTPUT_FILE_FORMAT"

# move all files to their respective directories
for i in {01..30}
do
    mv "$OUTPUT_FILE_FORMAT$i" "$OUTPUT_FOLDER/prog$i/myfile.txt"
done

echo "done :)"

exit

The split command is enough for this task. Note, however, that this solution requires the folder names to start at prog01 rather than prog1.
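If the directories are named prog1 through prog30 as in the question, one possible adaptation is to strip the zero padding when building the target path. This is a sketch under assumptions: it runs in a scratch directory, generates its own sample input with `seq`, and relies on GNU split's `-n l/30` mode, which splits into 30 files without breaking lines.

```shell
#!/bin/bash
# Sketch: scratch directory and sample data are assumptions for illustration.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 100 > large_file.txt
mkdir -p output/prog{1..30}

# GNU split: 30 pieces, lines kept whole, suffixes 01..30
split -n l/30 --numeric-suffixes=1 large_file.txt myfile

for i in {01..30}; do
    # 10#$i forces base 10, so 08 and 09 are not rejected as invalid octal
    mv "myfile$i" "output/prog$((10#$i))/myfile.txt"
done
```

The `10#` prefix matters because bash arithmetic treats a leading zero as octal, and 08 and 09 are not valid octal numbers.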

αғsнιη · Answer

An awk-only solution (N here equals 30 files):

awk 'BEGIN{ cmd="wc -l <sourcefile.txt"; cmd|getline l; l=int((l+29)/30); close(cmd) }
     NR%l==1{trgt=sprintf("prog%d",((++c)))}{print >(trgt"/myfile.txt")}' sourcefile.txt

Or let the shell compute the line count of sourcefile.txt and pass it to awk, as suggested by jthill:

awk 'NR%l==1{trgt=sprintf("prog%d",((++c)))}{print >trgt"/myfile.txt"}' l=$(( ($(wc -l <sourcefile.txt)+29)/30 )) sourcefile.txt
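The same idea can be tried on a small input to check the line counts. The sketch below is a variation, not the answer's exact command: it assumes 10 lines and 3 parts, uses a hypothetical variable `n` for the part count, tests `(NR - 1) % l == 0` so that it also works when each part holds a single line, and closes each output file before opening the next.

```shell
#!/bin/bash
# Sketch: scratch directory, sample data and n=3 are assumptions.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 10 > sourcefile.txt
n=3
for i in $(seq 1 "$n"); do mkdir -p "prog$i"; done

# lines per part, rounded up
perfile=$(( ($(wc -l < sourcefile.txt) + n - 1) / n ))

awk -v l="$perfile" '
    # start a new output file every l lines
    (NR - 1) % l == 0 { if (out) close(out); out = sprintf("prog%d/myfile.txt", ++c) }
    { print > out }
' sourcefile.txt

wc -l prog*/myfile.txt
```

With these values the first two parts receive 4 lines each and the last part receives the remaining 2.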

RomanPerekhrest · Answer

A split + bash solution:

lines=$(echo "t=$(wc -l ./sourcefile.txt | cut -d' ' -f1); d=30; if(t%d) t/d+1 else t/d" | bc)
split -l "$lines" ./sourcefile.txt "myfile.txt" --numeric-suffixes=1

for f in myfile.txt[0-9]*; do
    # constructing directory name; 10# forces base 10 so 08 and 09 are not read as octal
    dir_n="prog$((10#${f#*txt}))"
    mv "$f" "$dir_n/myfile.txt"
done

This assumes you already have folders named prog1 through prog30 (as you said).

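- lines - contains the integer number of lines per output file
  - t - the total number of lines in the file ./sourcefile.txt
  - d=30 is the divisor
- --numeric-suffixes=1 - split option that tells it to use numeric suffixes starting at 1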
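The bc expression above rounds the quotient up when there is a remainder. As a sketch, the same rounding can be done with bash integer arithmetic alone; `ceil_div` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/bash
# ceil_div is a hypothetical helper: integer division of t by d, rounded up.
ceil_div() {
    local t=$1 d=$2
    echo $(( (t + d - 1) / d ))
}

ceil_div 10000 30   # 10000 is not a multiple of 30, so the result is rounded up
ceil_div 9000 30    # 9000 is a multiple of 30, so nothing is added
```

For 10000 lines this gives 334 lines per file, and for 9000 lines it gives 300, matching what the `if(t%d) t/d+1 else t/d` expression computes.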

Kramer · Answer

Steps

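Count the lines in the file: lines=$(wc -l < "${file}")
Get the number of lines each output file needs. Bash integer division truncates, so add 29 before dividing by 30 to round up: linesPerFile=$(( (lines + 29) / 30 ))
Use split to divide the file: split -l "${linesPerFile}" -d --additional-suffix=-filename.extension "${file}"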

Expected result

x00-filename.extension, x01-filename.extension ... xN-filename.extension
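The naming produced by these steps can be checked on a small input. This sketch assumes GNU split (for `--additional-suffix`), a scratch directory, and a 10-line sample file divided into 3 parts rather than 30; `linesPerFile` is an illustrative variable name.

```shell
#!/bin/bash
# Sketch: sample data and the part count of 3 are assumptions.
work=$(mktemp -d)
cd "$work" || exit 1
seq 1 10 > filename.extension

lines=$(wc -l < filename.extension)
linesPerFile=$(( (lines + 2) / 3 ))      # ceiling of lines / 3

# -d uses numeric suffixes, which start at 00 by default
split -l "$linesPerFile" -d --additional-suffix=-filename.extension filename.extension

ls
```

With the default prefix `x`, the pieces are named x00-filename.extension, x01-filename.extension and x02-filename.extension.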

To process several files at once, wrap it in a for loop:

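#!/bin/bash
# pathToWorkingDir must be set before this runs
for file in $(find "${pathToWorkingDir}" -type f -name "filename.extension")
do
    # lines per output file for this input, rounded up
    linesPerFile=$(( ($(wc -l < "${file}") + 29) / 30 ))
    # write the pieces next to their source file so runs do not overwrite each other
    if split -l "${linesPerFile}" -d --additional-suffix=-filename.extension "${file}" "${file%/*}/x"; then
        echo "${file} was split correctly"
    else
        echo "there was a problem splitting ${file}"
        exit 1 #we exit with an error code
    fi
done
exit 0 #if all processed fine we exit with a success code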

Ole Tange · Answer

GNU Parallel：

parallel -j30 -a sourcefile.txt --pipepart --block -1 cat '>'prog{#}/myfile.txt

This runs 30 jobs in parallel, splits sourcefile.txt into one part per job (i.e. 30 parts), passes each part to cat, and saves the output to prog{jobnumber}/myfile.txt.