Parsing text from columns

Question

    2018-05-24 23:57:30 1.1.1.1 8.8.4.4
    2018-05-24 23:57:32 2.2.2.2 8.8.4.4
    2018-05-24 23:58:12 8.8.8.8 8.8.4.4
    2018-05-24 23:58:23 8.8.8.8 8.8.4.4
    2018-05-24 23:59:40 8.8.8.8 8.8.4.4
    2018-05-24 23:59:51 8.8.8.8 8.8.4.4

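I have a log file in the format above. I need to parse it so that the output looks like the following: when rows repeat (comparing columns 3 and 4), only the first and last rows should be shown.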

    2018-05-24 23:57:30 1.1.1.1 8.8.4.4
    2018-05-24 23:57:32 2.2.2.2 8.8.4.4
    2018-05-24 23:58:12 8.8.8.8 8.8.4.4
    2018-05-24 23:59:51 8.8.8.8 8.8.4.4

αғsнιη · Accepted Answer

With awk:

    awk '!first[$3, $4]{ first[$3, $4]= $0 }
        { last[$3, $4]= $0 }
    END{ for (x in last) print first[x] (last[x] != first[x]? ORS last[x]:"") }' infile

    2018-05-24 23:58:12 8.8.8.8 8.8.4.4
    2018-05-24 23:59:51 8.8.8.8 8.8.4.4
    2018-05-24 23:57:30 1.1.1.1 8.8.4.4
    2018-05-24 23:57:32 2.2.2.2 8.8.4.4

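The first associative array holds the first line seen for each key, where the key is the combination of column 3 and column 4. The last array is overwritten on every line, so it always holds the most recent line for the same key.

After all lines have been read, the values in first are the first occurrences (one per distinct column 3 and column 4 pair), and the values in last are the last occurrences.

Then, in END, we print the value stored in first followed by the one stored in last. The expression (last[x] != first[x]? ORS last[x]:"") prevents a line from being printed twice when its combination of columns 3 and 4 occurs only once.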
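In awk the traversal order of for (x in last) is unspecified, which is why the output shown above does not follow the input order. If input order matters, one possible variation is to record the order in which keys first appear. This is only a sketch: the function name first_last_by_key and the order array are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Sketch only: first_last_by_key is a hypothetical helper name.
first_last_by_key() {
    awk '
        {
            k = $3 SUBSEP $4              # same key as first[$3, $4]
            if (!(k in first)) {          # first time this key is seen
                first[k] = $0
                order[++n] = k            # assumption: input order is wanted
            }
            last[k] = $0                  # overwritten on every line
        }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                print first[k]
                if (last[k] != first[k]) print last[k]
            }
        }
    ' "$@"
}

printf '%s\n' \
    '2018-05-24 23:57:30 1.1.1.1 8.8.4.4' \
    '2018-05-24 23:57:32 2.2.2.2 8.8.4.4' \
    '2018-05-24 23:58:12 8.8.8.8 8.8.4.4' \
    '2018-05-24 23:58:23 8.8.8.8 8.8.4.4' \
    '2018-05-24 23:59:51 8.8.8.8 8.8.4.4' |
    first_last_by_key
```

Like the original one-liner, this groups by key across the whole file, so a key that reappears later in the log is merged with its earlier occurrences.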

choroba · Answer

Perl to the rescue:

    perl -ane '
        if ($F[2] ne $c3 || $F[3] ne $c4) {
            $printed or print $previous;
            $printed = print;
        } else {
            $printed = 0;
        }
        ($c3, $c4, $previous) = (@F[2, 3], $_);
        END { print $previous unless $printed }
    ' -- input.file

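-n reads the input line by line and runs the code for each line.
-a splits each input line on whitespace into the @F array.
$c3 and $c4 hold the previous values of columns 3 and 4; the current values are in $F[2] and $F[3] (arrays are indexed from 0).
$previous stores the previous line in case it needs to be printed.
$printed just prevents a line from being printed twice (which would otherwise happen when columns 3 and 4 differ from the previous line).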
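The state variables described above track consecutive runs rather than keys across the whole file. For readers who prefer awk, the same run-based logic might be sketched as follows. The function name runs_first_last and the short sample lines are assumptions made for illustration; this is a port of the idea, not the answer's own code.

```shell
#!/bin/sh
# Sketch only: runs_first_last is a hypothetical helper name.
runs_first_last() {
    awk '
        {
            key = $3 SUBSEP $4
            if (NR == 1 || key != prev_key) {
                # A new run starts: flush the last line of the previous
                # run unless it was already printed as that run start.
                if (NR > 1 && !printed) print previous
                print
                printed = 1
            } else {
                printed = 0               # inside a run: remember, do not print
            }
            prev_key = key
            previous = $0
        }
        END { if (NR > 0 && !printed) print previous }
    ' "$@"
}

# Columns 3 and 4 form the runs [t1 t2 t3], [t4], [t5 t6].
printf '%s\n' \
    'd t1 A X' 'd t2 A X' 'd t3 A X' \
    'd t4 B X' \
    'd t5 A X' 'd t6 A X' |
    runs_first_last
```

On this sample the t2 line is dropped and the remaining five lines are printed, because the second A X run is treated as separate from the first. The question does not say which behaviour is wanted when a key reappears later in the log.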

Rakesh Sharma · Answer

    perl -lane '
        *x = sub { print for splice @A; } if $. == 1;
        x() if $. > 1 and $F[2] ne $c3 || $F[3] ne $c4;
        ($c3, $c4, $A[!!@A]) = (@F[2,3], $_);
        x() if eof;
    ' include.txt

§ How it works.

 ° Array @A holds at most 2 elements at any time: the first and last lines of the current range.
 ° Subroutine &x prints the array @A and empties it afterwards.
 ° Print the previous range, provided we are not at the first line and either of the previous columns does not match the current one.
 ° Update the previous columns and the array.

¶ Another method, using the sed editor, is detailed below.

    #! /bin/sh
    # declare regex assist variables
    b='[:space:]'
    s="[$b]"    # \s
    S="[^$b]"   # \S

    # \S+ \s+
    F="$S$S*"
    sp="$s$s*"
    F_s="$F$sp" # \S+\s+

    # composition of a line
    L="$F_s$F_s\($F\)$sp\($F\)"

    # matching next line
    M=".*$s\1$sp\2"

    # 2 lines when they match with 3,4 fields
    L2="$L\(\n$M\)\{1\}"

    # 3 lines when they match in fields 3,4
    L3="$L\(\n$M\)\{2\}"

    #### code
    sed -e '
       # bring on board next line for interrogation
       N

       # 2 lines fields 3,4 do not match
       # display the first line... redo code with remaining
       '"/^$L2\$/"'!{
          P;D
       }

       # 3 lines with first two match but third not match in fields 3,4
       :a;h;N
       '"/^$L3\$/"'!{
          x;p;g
          s/.*\(\n\)/\1/;D
       }
       s/\n.*\(\n\)/\1/;ba
    ' include.txt

cheft · Answer

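You can also get the unique lines by comparing only columns 3 and 4, and then append the last line of the file. However, the last line may then appear twice if its columns 3 and 4 differ from those of the line before it.

So, if needed, add another pipe to uniq to remove the duplicate.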

    { uniq -f2 your_file; tail -n1 your_file; } | uniq

-f2 makes uniq skip the first two blank-separated fields when comparing lines.
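The duplicate-last-line case can be reproduced with a small file. The sketch below wraps the pipeline in a function; the name first_of_runs_plus_last, the temporary file and the 9.9.9.9 sample line are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: first_of_runs_plus_last is a hypothetical helper name.
# $1 is the path to the log file.
first_of_runs_plus_last() {
    # uniq -f2 keeps the first line of each run (fields 1-2 ignored),
    # tail -n1 appends the final line, and the trailing uniq drops that
    # line again if uniq -f2 had already emitted it.
    { uniq -f2 "$1"; tail -n1 "$1"; } | uniq
}

log=$(mktemp)
printf '%s\n' \
    '2018-05-24 23:58:12 8.8.8.8 8.8.4.4' \
    '2018-05-24 23:58:23 8.8.8.8 8.8.4.4' \
    '2018-05-24 23:59:51 9.9.9.9 8.8.4.4' > "$log"

# The last line starts its own run, so without the trailing uniq
# it would be printed twice.
first_of_runs_plus_last "$log"
rm -f "$log"
```

Note that only the final line of the file is appended, so with several multi-line runs this approach does not print the last line of each earlier run.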

MiniMax · Answer

First variant

    paste -d'\n' <(uniq -f2 input.txt) <(tac input.txt | uniq -f2 | tac) | uniq

Second variant

    awk '
    $3$4 == prev {
        buf = $0 ORS
    }
    $3$4 != prev {
        print buf $0
        prev = $3$4
        buf = ""
    }
    END {
        printf("%s", buf)
    }' input.txt

Testing

Input (made more complex for testing)

    2018-05-24 23:57:30 1.1.1.1 8.8.4.4
    2018-05-24 23:57:32 2.2.2.2 8.8.4.4
    2018-05-24 23:58:12 8.8.8.8 8.8.4.4
    2018-05-24 23:58:23 8.8.8.8 8.8.4.4
    2018-05-24 23:59:40 8.8.8.8 8.8.4.4
    2018-05-24 23:59:51 8.8.8.8 8.8.4.4
    2018-05-25 00:18:12 8.8.1.8 8.8.4.4
    2018-05-25 00:18:23 8.8.1.8 8.8.4.4
    2018-05-25 00:19:40 8.8.1.8 8.8.4.4
    2018-05-25 00:19:51 8.8.1.8 8.8.4.4
    2018-05-25 00:39:51 8.8.2.8 8.8.4.4
    2018-05-25 00:49:52 8.8.2.8 8.8.4.4
    2018-05-25 00:59:51 8.8.2.8 8.8.4.4

Output (both variants)

    2018-05-24 23:57:30 1.1.1.1 8.8.4.4
    2018-05-24 23:57:32 2.2.2.2 8.8.4.4
    2018-05-24 23:58:12 8.8.8.8 8.8.4.4
    2018-05-24 23:59:51 8.8.8.8 8.8.4.4
    2018-05-25 00:18:12 8.8.1.8 8.8.4.4
    2018-05-25 00:19:51 8.8.1.8 8.8.4.4
    2018-05-25 00:39:51 8.8.2.8 8.8.4.4
    2018-05-25 00:59:51 8.8.2.8 8.8.4.4