Merge two files on a matching column

Question

File1.txt

    id No
    gi|371443199|gb|JH556661.1| 7907290
    gi|371443198|gb|JH556662.1| 7573913
    gi|371443197|gb|JH556663.1| 7384412
    gi|371440577|gb|JH559283.1| 6931777

File2.txt

    id P R S
    gi|367088741|gb|AGAJ01056324.1| 5 5 0
    gi|371443198|gb|JH556662.1| 2 2 0
    gi|367090281|gb|AGAJ01054784.1| 4 4 0
    gi|371440577|gb|JH559283.1| 21 19 2

output.txt

    id P R S NO
    gi|371443198|gb|JH556662.1| 2 2 0 7573913
    gi|371440577|gb|JH559283.1| 21 19 2 6931777

File1.txt has two columns and File2.txt has four. I want to join the two files on their unique id (the first field, array[1], must match in both file1.txt and file2.txt) and output only the rows whose ids match (see output.txt).

I tried join -v <(sort file1.txt) <(sort file2.txt). Any help with an awk or join command would be appreciated.

rush · Accepted Answer

join works fine:

    $ join <(sort File1.txt) <(sort File2.txt) | column -t | tac
    id                           No       P   R   S
    gi|371443198|gb|JH556662.1|  7573913  2   2   0
    gi|371440577|gb|JH559283.1|  6931777  21  19  2

P.S. Does the order of the output columns matter?

If so:

    $ join <(sort File1.txt) <(sort File2.txt) | tac | awk '{print $1,$3,$4,$5,$2}' | column -t
    id                           P   R   S  No
    gi|371443198|gb|JH556662.1|  2   2   0  7573913
    gi|371440577|gb|JH559283.1|  21  19  2  6931777
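The two commands above depend on the header line happening to sort after the data rows (hence tac) and on awk to reorder the columns. As a minimal sketch of an alternative, assuming the first line of each file is a header, the header can be handled separately and join -o can choose the output order directly. The scratch files f1.sorted and f2.sorted are hypothetical names, and the sample data is recreated so the script runs on its own:

```shell
#!/bin/sh
# Sketch: header-aware join with the column order chosen by join -o.
set -e
export LC_ALL=C          # make sort and join agree on collation order
dir=$(mktemp -d)         # scratch directory for the demo
cd "$dir"

cat > File1.txt <<'EOF'
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
EOF

cat > File2.txt <<'EOF'
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
EOF

# Sort only the data rows (everything after line 1) on the join field.
tail -n +2 File1.txt | sort -k1,1 > f1.sorted
tail -n +2 File2.txt | sort -k1,1 > f2.sorted

{
    # Header: File2's header line followed by File1's second column name.
    printf '%s %s\n' "$(head -n 1 File2.txt)" "$(head -n 1 File1.txt | cut -d ' ' -f 2)"
    # -o takes FILE.FIELD pairs: all four fields of File2, then field 2 of File1.
    join -o 2.1,2.2,2.3,2.4,1.2 f1.sorted f2.sorted
} > output.txt

cat output.txt
```

With this approach the rows come out sorted by id rather than in the order of File2.txt; the result can still be piped through column -t for alignment.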

Birei · Answer

One way using awk:

Contents of script.awk:

    ## Process first file of arguments. Save 'id' as key and 'No' as value
    ## of a hash.
    FNR == NR {
        if ( FNR == 1 ) {
            header = $2
            next
        }
        hash[ $1 ] = $2
        next
    }

    ## Process second file of arguments. Print header in first line and for
    ## the rest check if first field is found in the hash.
    FNR < NR {
        if ( $1 in hash || FNR == 1 ) {
            printf "%s %s\n", $0, ( FNR == 1 ? header : hash[ $1 ] )
        }
    }

Run it like this:

    awk -f script.awk File1.txt File2.txt | column -t

Which yields:

    id                           P   R   S  NO
    gi|371443198|gb|JH556662.1|  2   2   0  7573913
    gi|371440577|gb|JH559283.1|  21  19  2  6931777
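For the sample data, the same idea can be condensed into a one-liner. This is only a sketch, not part of the answer above: it stores the header of File1.txt in the array like any other row, which works only on the assumption that both header lines begin with the same key (id here). The array name no and the output file merged.txt are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: condensed awk lookup join; the header is treated as an ordinary row.
set -e
dir=$(mktemp -d)         # scratch directory for the demo
cd "$dir"

cat > File1.txt <<'EOF'
id No
gi|371443199|gb|JH556661.1| 7907290
gi|371443198|gb|JH556662.1| 7573913
gi|371443197|gb|JH556663.1| 7384412
gi|371440577|gb|JH559283.1| 6931777
EOF

cat > File2.txt <<'EOF'
id P R S
gi|367088741|gb|AGAJ01056324.1| 5 5 0
gi|371443198|gb|JH556662.1| 2 2 0
gi|367090281|gb|AGAJ01054784.1| 4 4 0
gi|371440577|gb|JH559283.1| 21 19 2
EOF

# First file (NR == FNR): remember column 2 under the key in column 1.
# Second file: print the line plus the stored value when the key is known.
awk 'NR == FNR { no[$1] = $2; next } $1 in no { print $0, no[$1] }' \
    File1.txt File2.txt > merged.txt

cat merged.txt
```

Because nothing is sorted, the rows keep the order they have in File2.txt, and the header stays on the first line without any tac step.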