grep patterns from a file with exact match, searching only the first column

Question

I have a large file like this:

denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff

And my pattern file is:

denovo1
denovo3
denovo22

I'm trying to use fgrep to extract only the lines that exactly match a pattern in my pattern file (so I want denovo1 but not denovo11). I tried using -x for an exact match, but I got an empty file. This is what I tried:

fgrep -x --file="pattern" bigfile.txt > clusters.blast.uniq

Is there a way to make grep search only in the first column?

steeldriver · Accepted Answer

You probably want the -w flag. From man grep:

 -w, --word-regexp
     Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

i.e.

$ grep -wFf patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
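The difference between -x and -w can be checked on the sample data with a small sketch like the following. The scratch directory and the sixth input line (with denovo1 in the second column) are hypothetical additions for illustration; they are not part of the original question.

```shell
# Scratch directory so the sketch does not touch real files.
dir=$(mktemp -d)

# Sample data from the question, plus one hypothetical extra line
# in which denovo1 appears in the second column.
cat > "$dir/file" <<'EOF'
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
other denovo1 zzzz wwww vvvv
EOF

printf '%s\n' denovo1 denovo3 denovo22 > "$dir/patfile"

# -x requires the WHOLE line to equal a pattern, so nothing is selected here.
# "|| true" keeps the script going when grep exits non-zero on no match.
x_out=$(grep -xFf "$dir/patfile" "$dir/file" || true)

# -w accepts a whole-word match in ANY column, so the extra line is selected too.
w_out=$(grep -wFf "$dir/patfile" "$dir/file" || true)

printf 'with -x:\n%s\n' "$x_out"
printf 'with -w:\n%s\n' "$w_out"

rm -r "$dir"
```

With -x the result is empty because no line consists solely of an ID, which is consistent with the empty file described in the question. With -w, denovo11 is rejected, but a whole-word hit in a later column still selects the line.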

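To force matching only in the first column, you need to modify the entries in the pattern file to add a line anchor. You can also use the \b word anchor instead of the command-line -w switch. For example, in patfile: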

^denovo1\b
^denovo3\b
^denovo22\b

then

$ grep -f patfile file
denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii

Note that you need to drop the -F switch if the file contains regular expressions instead of simple fixed strings.
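For a long ID list, the anchored pattern file does not have to be edited by hand. The sketch below generates it with sed, assuming the IDs contain no regular-expression metacharacters and the list has no blank lines. The scratch directory, the file names inside it, and the extra sixth input line are hypothetical. Note that \b is a GNU grep extension.

```shell
dir=$(mktemp -d)

# Sample data from the question, plus a hypothetical line with
# denovo1 in the second column to show the effect of the ^ anchor.
cat > "$dir/bigfile.txt" <<'EOF'
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
other denovo1 zzzz wwww vvvv
EOF

printf '%s\n' denovo1 denovo3 denovo22 > "$dir/pattern"

# Turn each plain ID into ^ID\b ("&" is the matched line, "\\" a literal backslash).
anchored=$(sed 's/.*/^&\\b/' "$dir/pattern")
printf '%s\n' "$anchored" > "$dir/patfile"

# No -F here: the pattern file now holds regular expressions.
anchored_out=$(grep -f "$dir/patfile" "$dir/bigfile.txt" || true)

printf '%s\n' "$anchored_out"

rm -r "$dir"
```

Because every pattern starts with ^, the line with denovo1 in the second column is no longer selected.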

Hackaholic · Answer

You can also use awk:

awk 'NR==FNR{a[$0]=$0}NR>FNR{if($1==a[$1])print $0}' pattern_file big_file

Output:

denovo1 xxx yyyy oggugu ddddd
denovo22 hhhh yyyy kkkk iiii
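The awk approach can be reproduced on the sample data with the sketch below. It is a variant rather than the exact command above: it uses awk's `in` operator to test whether the first field is a known ID, which avoids creating array entries during the lookup and should also skip blank input lines. The scratch directory, the array name `ids`, and the extra sixth input line are hypothetical.

```shell
dir=$(mktemp -d)

# Sample data from the question, plus a hypothetical line with
# denovo1 in the second column.
cat > "$dir/big_file" <<'EOF'
denovo1 xxx yyyy oggugu ddddd
denovo11 ggg hhhh bbbb gggg
denovo22 hhhh yyyy kkkk iiii
denovo2 yyyyy rrrr fffff jjjj
denovo33 hhh yyy eeeee fffff
other denovo1 zzzz wwww vvvv
EOF

printf '%s\n' denovo1 denovo3 denovo22 > "$dir/pattern_file"

awk_out=$(awk '
    # First file (NR == FNR): store each ID as an array key, then skip to the next line.
    NR == FNR { ids[$1]; next }
    # Second file: print the line when its first field is one of the stored keys.
    $1 in ids
' "$dir/pattern_file" "$dir/big_file")

printf '%s\n' "$awk_out"

rm -r "$dir"
```

Since the comparison is an exact string match on the first whitespace-separated field, denovo11 and the line with denovo1 in the second column are both left out, with no anchors or word boundaries needed.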