Count the number of times each number occurs

Question

The file contains five columns of numbers.

Example:

12 34 67 88 10
4 90 12 10 7
33 12 5 76 34

I want to print each number that repeats, together with how many times it appears. Example:

3 : 12
2 : 34

Bodo · Answer

This awk script prints output in the format shown in the example.

awk '{
    for ( i=1; i<=NF; i++ ) # loop over all fields/columns
        dict[$i]++;         # count occurrence in an array using the field value as index/key
}
END {                       # after processing all data
    for (key in dict)       # iterate over all array keys
        if(dict[key]>1)     # if the key occurred more than once
            print dict[key] " : " key # print counter and key
}' inputfile

With the example input, the output is:

2 : 10
3 : 12
2 : 34

If you remove the condition if(dict[key]>1), numbers that occur only once are listed as well.
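As a minimal sketch of that variation, the threshold can be passed in instead of deleting the condition. The function name count_fields and the awk variable min are hypothetical names chosen for illustration; they do not appear in the answer. The trailing sort is there only to make the order predictable, because the order in which awk walks for (key in dict) is unspecified.

```shell
#!/bin/sh
# count_fields MIN FILE: print "count : value" for every whitespace-separated
# field in FILE that occurs at least MIN times.
# (count_fields and min are illustrative names, not from the answer.)
count_fields() {
    awk -v min="$1" '
    {
        for (i = 1; i <= NF; i++)   # loop over all fields of the line
            dict[$i]++              # count each value
    }
    END {
        for (key in dict)           # order of keys is unspecified
            if (dict[key] >= min)
                print dict[key] " : " key
    }' "$2" | sort -n -k3,3         # order by the value (third field)
}
```

Calling count_fields 1 inputfile lists every number, while count_fields 2 inputfile lists only the repeated ones.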

If you want to sort the result by number of occurrences in descending order, append

| sort -nr

This sorts numerically in reverse (descending) order.

So the awk command above combined with the sort

awk '...' inputfile | sort -nr

produces

3 : 12
2 : 34
2 : 10

As glenn jackman mentioned in a comment, you can tell GNU AWK to traverse the array sorted by value when processing it in END by adding PROCINFO["sorted_in"] = "@val_num_desc" above the for loop.

END { # after processing all data
    # In GNU AWK only you can use the next line to sort the array for processing
    PROCINFO["sorted_in"] = "@val_num_desc" # sort descending by numeric value
    for (key in dict) # iterate over all array keys
        if(dict[key]>1) # if the key occurred more than once
            print dict[key] " : " key # print counter and key
}

With this GNU-specific extension you get sorted results without piping to sort.
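Put together, the complete GNU AWK variant might look like the sketch below. It assumes GNU AWK is installed under the name gawk, and the function name count_sorted is made up for illustration. How entries with equal counts are ordered relative to each other is left to gawk.

```shell
#!/bin/sh
# count_sorted FILE: print "count : value" for repeated values, highest count
# first, using the GNU AWK array traversal order instead of an external sort.
# (count_sorted is an illustrative name; this requires gawk.)
count_sorted() {
    gawk '
    {
        for (i = 1; i <= NF; i++)   # loop over all fields of the line
            dict[$i]++              # count each value
    }
    END {
        PROCINFO["sorted_in"] = "@val_num_desc"   # traverse by value, descending
        for (key in dict)
            if (dict[key] > 1)
                print dict[key] " : " key
    }' "$1"
}
```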

roaima · Answer

You can use a pipeline

tr -s ' ' '\n' < datafile | sort | uniq -c -d

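Depending on how sophisticated you want the answer to be, you could also filter on numeric values. Remove the -d to show all values rather than only those with a count greater than one.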
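One possible reading of "filter on numeric values", sketched here as an assumption rather than as what the answer had in mind, is to drop every token that is not made up solely of digits before counting. The function name count_numeric is hypothetical.

```shell
#!/bin/sh
# count_numeric FILE: same idea as the pipeline above, but tokens that are
# not plain unsigned integers are ignored.
# (count_numeric is an illustrative name, not from the answer.)
count_numeric() {
    tr -s ' \t' '\n\n' < "$1" |   # one token per line
        grep -E '^[0-9]+$' |      # keep digit-only tokens
        sort |
        uniq -c -d                # count, show repeated values only
}
```

With this filter, a stray word that happens to appear twice in the file is not reported.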

terdon · Answer

This is very similar to @roaima's answer, but using sed avoids ending up with multiple spaces in the output when counting:

$ sed -E 's/ +/\n/g' file | sort | uniq -c -d
2 10
3 12
2 34

And, to sort numerically and add the :, you can do:

$ sed -E 's/ +/\n/g' file | sort | uniq -c -d | sort -rn | sed -E 's/([0-9]) /\1 : /'
3 : 12
2 : 34
2 : 10

Or:

$ grep -oP '\d+' file | sort | uniq -c -d | sort -rn | sed -E 's/([0-9]) /\1 : /'
3 : 12
2 : 34
2 : 10

Or, using Perl:

$ perl -lae '$k{$_}++ for @F; END{ @keys = grep { $k{$_} > 1 } keys(%k); @keys = sort { $k{$b} <=> $k{$a} } @keys; print "$k{$_} : $_" for @keys }' file
3 : 12
2 : 10
2 : 34

Or, if you're into brevity:

$ perl -lae '$k{$_}++for@F}{print"$k{$_} : $_"for sort{$k{$b}<=>$k{$a}}grep{$k{$_}>1}keys(%k)' file
3 : 12
2 : 10
2 : 34

Jim L. · Answer

Assuming the input file is named bar and is well structured as shown (whitespace and/or newlines between numbers), one solution is:

for n in $(cat bar); do echo "$n"; done | sort | uniq -c | sort -nr
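The pipeline above lists every value, including those that occur only once, in the raw uniq -c layout. As a sketch of one way to get the count : number layout from the question and keep only repeated values, an awk filter can be appended. The function name count_dupes is made up for illustration.

```shell
#!/bin/sh
# count_dupes FILE: count whitespace-separated values and print
# "count : value" for those seen more than once, highest count first.
# (count_dupes is an illustrative name, not from the answer.)
count_dupes() {
    for n in $(cat "$1"); do      # relies on word splitting, as in the answer
        echo "$n"
    done |
        sort | uniq -c | sort -nr |
        awk '$1 > 1 { print $1 " : " $2 }'   # keep repeated values, reformat
}
```

Calling count_dupes bar on the example input should list 12 first, since it is the only value that occurs three times.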

Praveen Kumar BS · Answer

Command:

sed "N;s/\n/ /g" filename | sed "N;s/\n/ /g"| perl -pne "s/ /\n/g"| sed '/^$/d'| awk '{a[$1]++}END{for(x in a){print x,a[x]}}'|awk '$2 >1 {print $0}'

Output

sed "N;s/\n/ /g" i.txt | sed "N;s/\n/ /g"| perl -pne "s/ /\n/g"| sed '/^$/d'| awk '{a[$1]++}END{for(x in a){print x,a[x]}}'|awk '$2 >1 {print $0}'
10 2
12 3
34 2