Inexact text search

Question

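Is there a utility like grep or uniq that does inexact matching, or do I have to write one myself?

By inexact I mean something like a 90% match (the threshold may vary). For example, I have a file containing several strings: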

abc123
abd123
abc223
qwe938

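In this case, such a utility should return the first three strings, or report that they are similar. Of course, unlike with grep or uniq, I don't know the pattern of the file's contents in advance.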

laebshade · Accepted Answer

agrep or tre-agrep will do what you're asking. They are "approximate" regular expression matchers, i.e. approximate greps. See the Wikipedia article for more details.

% tre-agrep --help | head
Usage: tre-agrep [OPTION]... PATTERN [FILE]...
Searches for approximate matches of PATTERN in each FILE or standard input.
Example: `tre-agrep -2 optimize foo.txt' outputs all lines in file `foo.txt' that
match "optimize" within two errors.  E.g. lines which contain "optimise",
"optmise", and "opitmize" all match.

Regexp selection and interpretation:
  -e, --regexp=PATTERN      use PATTERN as a regular expression
  -i, --ignore-case         ignore case distinctions
  -k, --literal             PATTERN is a literal string

% agrep | head
usage: agrep [-@#abcdehiklnoprstvwxyBDGIMSV] [-f patternfile] [-H dir] pattern [files]

summary of frequently used options:
(For a more detailed listing see 'man agrep'.)
-#: find matches with at most # errors
-c: output the number of matched records
-d: define record delimiter
-h: do not output file names
-i: case-insensitive search, e.g., 'a' = 'A'
-l: output the names of files that contain a match
-n: output record prefixed by record number
-v: output those records that have no matches
-w: pattern has to match as a Word, e.g., 'win' will not match 'wind'
-B: best match mode. find the closest matches to the pattern
-G: output the files that contain a match
-H 'dir': the cast-dictionary is located in directory 'dir'
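If you do end up writing something yourself, the idea of matching "within N errors" can be sketched in a few lines of Python. This is only an illustration and not how agrep or tre-agrep are implemented: it assumes that an "error" means one insertion, deletion, or substitution (plain Levenshtein distance), and it compares whole strings, whereas the tools above look for an approximate match anywhere in a line. The function names edit_distance, similar_to_pattern, and similar_to_any are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn a into b (assumed here as the meaning of "errors")."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similar_to_pattern(pattern, words, max_errors=1):
    """Words whose whole-string distance to pattern is at most max_errors."""
    return [w for w in words if edit_distance(pattern, w) <= max_errors]


def similar_to_any(words, max_errors=1):
    """Words within max_errors of at least one *other* word in the list.
    Needs no pattern, but compares every pair, so it is quadratic."""
    return [w for i, w in enumerate(words)
            if any(edit_distance(w, other) <= max_errors
                   for j, other in enumerate(words) if j != i)]


words = ["abc123", "abd123", "abc223", "qwe938"]
print(similar_to_pattern("abc123", words))  # ['abc123', 'abd123', 'abc223']
print(similar_to_any(words))                # ['abc123', 'abd123', 'abc223']
```

The second function is closer to the question's situation, where no pattern is known in advance. The pairwise comparison is fine for small files; for large ones a dedicated tool is the better choice.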