Compare only the words in files

Question

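I have two files that I need to compare.

The problem is that their indentation and line breaks are formatted differently, so diff file1 file2 returns the entire contents of both files.

Is there a way to ignore everything except the actual text?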

Gilles 'SO- stop being evil' · Answer

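diff -w ignores all changes in horizontal whitespace. That takes care of indentation, but it does not help if the lines were wrapped to a different width, or re-wrapped after the text changed.

Depending on how the text is formatted, comparing the output of fmt may or may not work.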
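To make that limitation concrete, here is a small sketch using throwaway sample files (the file names and contents are made up for illustration). It assumes a diff that supports -w, as the GNU and BSD versions do.

```shell
# Throwaway directory for the demo files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'alpha beta\ngamma delta\n' > "$dir/orig"
printf '    alpha   beta\n\tgamma delta\n' > "$dir/indented"
printf 'alpha\nbeta gamma\ndelta\n' > "$dir/rewrapped"

# Same words, different indentation and spacing: diff -w finds no difference.
if diff -w "$dir/orig" "$dir/indented" > /dev/null; then
    indent_result=same
else
    indent_result=different
fi

# Same words, different line breaks: diff -w still reports a difference.
if diff -w "$dir/orig" "$dir/rewrapped" > /dev/null; then
    wrap_result=same
else
    wrap_result=different
fi

echo "indentation only: $indent_result"
echo "re-wrapped:       $wrap_result"
rm -r "$dir"
```

The first comparison succeeds because only spaces and tabs differ. The second fails because diff still compares line by line, and the line boundaries have moved.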

diff -u --label=file1 <(fmt file1) --label=file2 <(fmt file2)
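As a rough illustration of the fmt approach on sample data, the sketch below re-flows two made-up files to the same width and compares the results. It writes the re-flowed copies to temporary files instead of using process substitution, so it should also run in a plain POSIX sh. The width of 72 is an arbitrary choice.

```shell
# Throwaway directory for the demo files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'the quick brown fox\njumps over the lazy dog\n' > "$dir/a"
printf 'the quick\nbrown fox jumps over\nthe lazy dog\n' > "$dir/b"

# Re-flow both files to the same width, then compare the re-flowed copies.
fmt -w 72 "$dir/a" > "$dir/a.fmt"
fmt -w 72 "$dir/b" > "$dir/b.fmt"

if diff -u "$dir/a.fmt" "$dir/b.fmt"; then
    fmt_result=same
else
    fmt_result=different
fi

echo "after fmt: $fmt_result"
rm -r "$dir"
```

This works here because both files are a single unindented paragraph. fmt generally preserves indentation and does not join lines whose indentation differs, which is one reason the approach can fail on other inputs.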

If you can install wdiff, its whole purpose is to solve exactly the problem you are facing. It is available from EPEL.

Git has this feature built in, and it works outside a Git repository too.

git diff --word-diff file1 file2

Kusalananda · Answer

You can use wdiff ("word diff").

$ cat file1
this is file 1, it is two lines long

$ cat file2
this is file 2, it is three lines long

$ wdiff file1 file2
this is file [-1,-] {+2,+} it is [-two-] {+three+} lines long

$ wdiff --no-common file1 file2
======================================================================
[-1,-] {+2,+}
======================================================================
[-two-] {+three+}
======================================================================

AdminBee · Answer

You can also try meld. It is a fairly powerful file comparison tool (albeit a graphical one) and should be available on CentOS.

Paul_Pedant · Answer

diff has several options for this.

  -i, --ignore-case               ignore case differences in file contents
  -E, --ignore-tab-expansion      ignore changes due to tab expansion
  -Z, --ignore-trailing-space     ignore white space at line end
  -b, --ignore-space-change       ignore changes in the amount of white space
  -w, --ignore-all-space          ignore all white space
  -B, --ignore-blank-lines        ignore changes whose lines are all blank
      --strip-trailing-cr         strip trailing carriage return on input

If words actually move between lines, you can reduce each input file to a stream of words and compare those. However, that loses a lot of context about where the words came from. The following takes "word" to mean "alphanumeric string" and compares the files in order at the word level.

diff <( tr -cs '[:alnum:]' '\n' < file1 ) <( tr -cs '[:alnum:]' '\n' < file2 )
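For a worked example of the word-stream idea, here is a minimal sketch on made-up sample files. The helper name words and the file contents are hypothetical, not part of the answer above. The sketch also drops the empty first line that tr emits when a file starts with indentation, and it uses temporary files so it does not depend on process substitution.

```shell
# words FILE: print one alphanumeric "word" per line (hypothetical helper).
# The sed step drops the empty line produced when FILE starts with
# non-alphanumeric characters such as indentation.
words() {
    tr -cs '[:alnum:]' '\n' < "$1" | sed '/^$/d'
}

# Throwaway directory for the demo files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'this is file 1,\nit is two lines long\n' > "$dir/file1"
printf '  this is\n  file 2, it is\n  three lines long\n' > "$dir/file2"

words "$dir/file1" > "$dir/w1"
words "$dir/file2" > "$dir/w2"

# Keep only the changed words from a zero-context unified diff:
# "-" marks a word only in file1, "+" a word only in file2.
# The pattern skips the "---", "+++" and "@@" header lines.
changes=$(diff -U0 "$dir/w1" "$dir/w2" | grep '^[-+][^-+]')
echo "$changes"
rm -r "$dir"
```

For these two sample files the script prints -1, +2, -two and +three, one per line. Punctuation is discarded along with the whitespace, so a change from a comma to a period would go unnoticed.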