How do I split a file like split does, but to stdout, for piping to a command?

Question

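I have a large .sql file full of SELECT statements that contain data I want to insert into my SQL Server database. I'm looking for a way to take the file's contents, 100 lines at a time, and pass them to the commands I have set up to do the rest.

Basically, I'm looking for a split that outputs to stdout, not to files.

I'm also using CygWin on Windows, so I don't have access to the full suite of tools.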

Ehryk · Accepted Answer

I ended up with something awful. If there is a better way, please post it.

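#!/bin/bash
# Usage: ./insert.sh FILE LINES_PER_BATCH
# Needs bash: +=, $'\n' and ${var::len} are not POSIX sh.
DONE=false
until $DONE; do
    for i in $(seq 1 "$2"); do
        IFS= read -r line || DONE=true
        [ -z "$line" ] && continue
        lines+=$line$'\n'
    done
    # Drop the last 10 characters of the batch before sending it on.
    sql=${lines::${#lines}-10}
    (cat "Header.sql"; echo "$sql") | sqlcmd
    #echo "--- PROCESSED ---"
    lines=
done < "$1"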

Run it with ./insert.sh "File.sql" 100, where 100 is the number of lines to process at a time.
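For comparison, here is a minimal POSIX sh sketch of the same batching idea that avoids the bash-only string operations. The function name chunk_pipe is made up for this illustration, and wc -l only stands in for the real consumer, so treat it as a sketch rather than a drop-in replacement for the script above.

```shell
#!/bin/sh
# chunk_pipe N CMD [ARGS...]   (hypothetical helper, not from the answer above)
# Reads stdin and runs CMD once per batch of up to N lines,
# feeding that batch to CMD on its stdin.
chunk_pipe() {
    n=$1
    shift
    eof=0
    while [ "$eof" -eq 0 ]; do
        batch=
        count=0
        while [ "$count" -lt "$n" ]; do
            if ! IFS= read -r line; then
                eof=1
                # A last line without a trailing newline is still data.
                [ -n "$line" ] || break
            fi
            batch="${batch}${line}
"
            count=$((count + 1))
            [ "$eof" -eq 0 ] || break
        done
        # Skip the command if nothing was read (input ended on a batch boundary).
        if [ "$count" -gt 0 ]; then
            printf '%s' "$batch" | "$@"
        fi
    done
}

# Demo: five lines in batches of two; wc -l stands in for the real consumer.
printf '%s\n' 1 2 3 4 5 | chunk_pipe 2 wc -l
```

With the files from the answer, a call might look like chunk_pipe 100 sh -c '{ cat Header.sql; cat; } | sqlcmd' < File.sql. Building each batch in a shell variable is slow for very large inputs, which is one reason the other answers delegate the work to dedicated tools.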

Graeme · Answer

I think the easiest way to do this is:

while IFS= read -r line; do
    { printf '%s\n' "$line"; head -n 99; } | other_commands
done <database_file

You need to use read for the first line of each section because there appears to be no other way to stop when the end of the file is reached.

Ole Tange · Answer

GNU Parallel is made for this:

cat bigfile | parallel --pipe -N100 yourscript

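By default it runs one job per CPU core. You can force it to run a single job with '-j1'.

Version 20140422 includes a fast version that can deliver 3.5 GB/s. The price is that it cannot deliver exactly 100 lines, but if you know the approximate line length you can set --block to 100 times that (here I assume the line length is close to 500 bytes):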

parallel --pipepart --block 50k yourscript :::: bigfile
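If the typical line length is not known in advance, it can be estimated from the file itself. The sketch below is only an illustration of the arithmetic (mean bytes per line times the number of lines wanted per block); the helper name block_size_for is hypothetical and is not part of GNU Parallel.

```shell
#!/bin/sh
# block_size_for FILE N   (hypothetical helper)
# Prints roughly N times the mean line length of FILE, in bytes.
block_size_for() {
    # $(( ... )) also normalises any padding that wc may print.
    bytes=$(( $(wc -c < "$1") ))
    lines=$(( $(wc -l < "$1") ))
    # Refuse to divide by zero on a file with no newline-terminated lines.
    [ "$lines" -gt 0 ] || return 1
    echo $(( bytes * $2 / lines ))
}

# Possible use, assuming --block accepts a plain byte count:
#   parallel --pipepart --block "$(block_size_for bigfile 100)" yourscript :::: bigfile
```

For a file whose lines average 500 bytes this prints 50000, which is in line with the 50k used above.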

mikeserv · Answer

# Note: the indented lines below must begin with tabs, because <<- strips only leading tabs.
_linc() ( ${sh-da}sh ${dbg+-vx} 4<&0 <&3 ) 3<<-ARGS 3<<\CMD
	set -- $( [ $((i=${1%%*[!0-9]*}-1)) -gt 1 ] && {
		shift && echo "\${inc=$i}" ; }
		unset cmd ; [ $# -gt 0 ] || cmd='echo incr "#$((i=i+1))" ; cat'
		printf '%s ' 'me=$$ ;' \
		'_cmd() {' '${dbg+set -vx ;}' "$@" "$cmd" '
		}' )
	ARGS
	s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
		i_cmd <<"${s:=${me}SPLIT${me}}"
		${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
		a$s
	INC
CMD

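The function above uses sed to apply its argument list as a command string to an arbitrary line increment. The commands you specify on the command line are sourced into a temporary shell function, which is fed a here-document on stdin consisting of each increment's worth of lines.

You use it like this: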

time printf 'this is line #%d\n' `seq 1000` |
_linc 193 sed -e \$= -e r \- \| tail -n2
#output
193
this is line #193
193
this is line #386
193
this is line #579
193
this is line #772
193
this is line #965
35
this is line #1000
printf 'this is line #%d\n' `seq 1000` 0.00s user 0.00s system 0% cpu 0.004 total

The mechanism here is very simple:

i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s

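That's the sed script. Basically we just printf $increment * n;. So if you set your increment to 100, printf writes you a sed script consisting of 100 lines that say only $!n, plus one insert line for the top end of the here-doc and one append line for the bottom. That's it. Most of the rest just handles options.

The n (next) command tells sed to print the current line, delete it, and pull in the next one. The $! address means it should only try that on any line but the last.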
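To see the n mechanism in isolation, here is a minimal sketch that generates such a script and uses it to print a marker line after every N input lines. The name mark_every and the marker text are invented for this illustration. It emits N-1 copies of $!n, because the line already in the pattern space at the start of each cycle counts as the first of the group.

```shell
#!/bin/sh
# mark_every N   (hypothetical helper)
# Copies stdin to stdout and appends a marker line after every N lines
# (and after a final, shorter group).
mark_every() {
    n=$1
    script=$(
        i=1
        # N-1 times: print the current line and pull in the next one,
        # except on the last line of input.
        while [ "$i" -lt "$n" ]; do
            printf '$!n\n'
            i=$((i + 1))
        done
        # Queue the marker so it is written after the group's last line.
        printf 'a\\\n-- end of chunk --\n'
    )
    sed -e "$script"
}

# Demo: mark groups of two lines.
printf '%s\n' 1 2 3 4 5 | mark_every 2
```

Replacing the marker with a here-document terminator, and inserting a command line before the first line of each group, gives the shape of the script that the function builds.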

If you provide only the increment, it will:

printf 'this is line #%d\n' `seq 10` |
_linc 3
#output
incr #1
this is line #1
this is line #2
this is line #3
incr #2
this is line #4
this is line #5
this is line #6
incr #3
this is line #7
this is line #8
this is line #9
incr #4
this is line #10

So what happens behind the scenes here is that, when no command string is provided, the function is set to echo a counter and cat its input. If you saw it on the command line it would look like:

{ echo "incr #$((i=i+1))" ; cat ; } <<HEREDOC
this is line #7
this is line #8
this is line #9
HEREDOC

It executes one of these per increment. Look:

printf 'this is line #%d\n' `seq 10` |
dbg= _linc 3
#output
set -- ${inc=2}
+ set -- 2
me=$$ ; _cmd() { ${dbg+set -vx ;} echo incr "#$((i=i+1))" ; cat
}
+ me=19396
s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s
INC
+ s=
+ . /dev/stdin
+ seq 2
+ printf $!n\n%.0b 1 2
+ sed -f - /dev/fd/4
_cmd <<"19396SPLIT19396"
this is line #1
this is line #2
this is line #3
19396SPLIT19396
+ _cmd
+ set -vx ; echo incr #1
+ cat
this is line #1
this is line #2
this is line #3
_cmd <<"19396SPLIT19396"
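The trace shows sed turning the input into a stream of _cmd here-documents that a shell then reads. A much smaller sketch of that idea, without the option handling, might look like the following. The name wrap_chunks and the CHUNK_ delimiter are assumptions made for this illustration, and it breaks if an input line happens to equal the delimiter.

```shell
#!/bin/sh
# wrap_chunks N   (hypothetical helper)
# Rewrites stdin so that every group of up to N lines becomes the body of
# a here-document fed to a shell function named _cmd.
wrap_chunks() {
    n=$1
    tag=CHUNK_$$
    script=$(
        # Before the first line of each group: open a here-document.
        printf 'i\\\n_cmd <<"%s"\n' "$tag"
        i=1
        # Let the next N-1 lines pass through unchanged.
        while [ "$i" -lt "$n" ]; do
            printf '$!n\n'
            i=$((i + 1))
        done
        # After the last line of the group: close the here-document.
        printf 'a\\\n%s\n' "$tag"
    )
    sed -e "$script"
}

# Demo: define _cmd, then let a shell execute the generated stream.
# Here _cmd just counts the lines of each group.
{
    echo '_cmd() { wc -l; }'
    printf '%s\n' 1 2 3 4 5 | wrap_chunks 2
} | sh
```

Because the delimiter is quoted, the shell does not expand anything inside the data lines, so text containing $ or backticks passes through untouched.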

It's really fast:

time yes | sed = | sed -n 'p;n' |
_linc 4000 'printf "current line and char count\n"
    sed "1w /dev/fd/2" | wc -c
    [ $((i=i+1)) -ge 5000 ] && kill "$me" || echo "$i"'
#OUTPUT
current line and char count
19992001
36000
4999
current line and char count
19996001
36000
current line and char count
[2] 17113 terminated yes |
    17114 terminated sed = |
    17115 terminated sed -n 'p;n'
yes 0.86s user 0.06s system 5% cpu 16.994 total
sed = 9.06s user 0.30s system 55% cpu 16.993 total
sed -n 'p;n' 7.68s user 0.38s system 47% cpu 16.992 total

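Above I tell it to increment every 4000 lines. 17 seconds later I've processed 20 million lines. Of course the logic isn't serious there: we only read each line twice and count all of its characters, but the possibilities are pretty wide open. Also, if you look closely, you might notice that the filters providing the input seem to be taking most of the time anyway.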

don_crissti · Answer

"Basically, I'm looking for a split that outputs to stdout, not to files."

If you have access to GNU split, the --filter option does exactly that:

‘--filter=command’ With this option, rather than simply writing to each output file, write through a pipe to the specified Shell command for each output file.

So, in your case, you could either use those commands with --filter, e.g.

split -l 100 --filter='{ cat Header.sql; cat; } | sqlcmd; printf "%s\n" DONE' infile

or write a script, e.g. myscript:

#!/bin/sh
{ cat Header.sql; cat; } | sqlcmd
printf '%s\n' '--- PROCESSED ---'

and then simply run

split -l 100 --filter=./myscript infile
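To try --filter without a database at hand, something like the sketch below can help. It assumes GNU split, and the wrapper name each_chunk is hypothetical. For each invocation of the filter, split sets the environment variable FILE to the name the chunk would have had on disk (xaa, xab, ...), which is handy for labelling output.

```shell
#!/bin/sh
# each_chunk N FILE CMD   (hypothetical wrapper around GNU split --filter)
# Runs the shell command string CMD once per N-line chunk of FILE,
# with the chunk on CMD's stdin and $FILE set by split.
each_chunk() {
    lines=$1
    file=$2
    cmd=$3
    split -l "$lines" --filter="$cmd" "$file"
}

# Example call (not run here): label each chunk, then show its contents.
#   each_chunk 100 infile 'echo "== $FILE =="; cat'
```

Note the single quotes around the command string in the example call: they keep $FILE from being expanded by the calling shell, so that the shell started by split sees it.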