In Python, is there a concise way to check whether two text files have the same contents?

Question

I don't care what the differences are. I just want to know whether the contents differ.

Federico A. Ramponi · Accepted Answer

The low-level way:

    with open(filename1) as f1:
        with open(filename2) as f2:
            if f1.read() == f2.read():
                ...

The high-level way:

    import filecmp

    if filecmp.cmp(filename1, filename2, shallow=False):
        ...
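As a rough, self-contained illustration of the high-level way, the sketch below wraps `filecmp.cmp` and tries it on temporary files. With the default `shallow=True`, files whose `os.stat()` signatures (type, size, modification time) match are treated as equal without being read, which is why `shallow=False` matters here. The names `same_contents` and `write_temp` are hypothetical helpers made up for this example.

```python
import filecmp
import os
import tempfile


def same_contents(path_a, path_b):
    """Return True if both files hold identical bytes."""
    # shallow=False makes filecmp compare the actual contents instead of
    # trusting matching os.stat() signatures.
    return filecmp.cmp(path_a, path_b, shallow=False)


def write_temp(data):
    """Hypothetical helper: write bytes to a new temp file, return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    a = write_temp(b"hello\n")
    b = write_temp(b"hello\n")
    c = write_temp(b"world\n")
    try:
        print(same_contents(a, b))  # identical contents
        print(same_contents(a, c))  # same size, different contents
    finally:
        for path in (a, b, c):
            os.remove(path)
```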

Rich · Answer

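If you are after even basic efficiency, you probably want to check the file sizes first:

    import os

    if os.path.getsize(filename1) == os.path.getsize(filename2):
        with open(filename1, 'r') as f1, open(filename2, 'r') as f2:
            if f1.read() == f2.read():
                pass  # Files are the same.

This saves you from reading every line of two files that are not even the same size and therefore cannot be identical.

(Going even further, you could call out to a fast MD5sum of each file and compare those, but that is not "in Python", so I'll stop here.)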

tzot · Answer

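This is a functional-style file comparison function. It returns False immediately if the files differ in size; otherwise it reads them in 4 KiB blocks and returns False as soon as it hits the first difference.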

    import functools
    import itertools
    import operator
    import os
    import sys

    def filecmp(filename1, filename2):
        "Do the two files have exactly the same contents?"
        with open(filename1, "rb") as fp1, open(filename2, "rb") as fp2:
            if os.fstat(fp1.fileno()).st_size != os.fstat(fp2.fileno()).st_size:
                return False  # different sizes ∴ not equal
            # Binary mode: the end-of-file sentinel must be b"", not "".
            fp1_reader = functools.partial(fp1.read, 4096)
            fp2_reader = functools.partial(fp2.read, 4096)
            cmp_pairs = zip(iter(fp1_reader, b""), iter(fp2_reader, b""))
            inequalities = itertools.starmap(operator.ne, cmp_pairs)
            return not any(inequalities)

    if __name__ == "__main__":
        print(filecmp(sys.argv[1], sys.argv[2]))

Just a different take :)
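For readers less used to the functional style, the block-by-block comparison described in this answer could also be spelled out as an explicit loop. This is only a sketch of the same idea, not tzot's code; the name `files_equal` and the `block_size` parameter are assumptions made for the example.

```python
import os


def files_equal(path_a, path_b, block_size=4096):
    """Compare two files block by block, stopping at the first difference."""
    # Different sizes can never mean identical contents.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(block_size)
            block_b = fb.read(block_size)
            if block_a != block_b:
                return False  # first differing block found
            if not block_a:
                return True  # both files reached EOF together
```

Because the loop returns at the first mismatching block, two large files that differ near the start are never read in full.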

user32141 · Answer

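Since I can't comment on other people's answers, I'll write my own.

If you use MD5, you must not simply call md5.update(f.read()), because that loads the whole file at once and uses too much memory. Feed the hash in chunks instead:

    import hashlib

    def get_file_md5(f, chunk_size=8192):
        # f must be a file object opened in binary mode ("rb").
        h = hashlib.md5()
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
        return h.hexdigest()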
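One possible way to use such a chunked hash for the original question is sketched below. `get_file_md5` repeats the function from this answer so the example runs on its own; `same_md5` is a hypothetical wrapper, not part of the answer. Note that equal digests only indicate equal contents up to hash collisions.

```python
import hashlib


def get_file_md5(f, chunk_size=8192):
    """Chunked MD5 of a file object opened in binary mode."""
    h = hashlib.md5()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
    return h.hexdigest()


def same_md5(path_a, path_b):
    """Hypothetical wrapper: True if both files have the same MD5 digest."""
    # Binary mode avoids newline translation and yields the bytes hashlib needs.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return get_file_md5(fa) == get_file_md5(fb)
```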

Jeremy Cantrell · Answer

Use an MD5 hash of the files' contents:

    import hashlib

    def checksum(f):
        md5 = hashlib.md5()
        # Open in binary mode: update() requires bytes, not str.
        with open(f, "rb") as fh:
            md5.update(fh.read())
        return md5.hexdigest()

    def is_contents_same(f1, f2):
        return checksum(f1) == checksum(f2)

    if not is_contents_same('foo.txt', 'bar.txt'):
        print('The contents are not the same!')

mmattax · Answer

    with open(filename1, "r") as f1, open(filename2, "r") as f2:
        print(f1.read() == f2.read())

ConcernedOfTunbridgeWells · Answer

For larger files, you could compute an MD5 or SHA hash of each file.
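A minimal sketch of that idea, assuming SHA-256 as the SHA variant (the answer does not say which one) and reading in chunks so large files are never held in memory whole. The function names, the `algorithm` parameter and the 64 KiB chunk size are choices made for this example.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=65536):
    """Hex digest of a file, read in chunks so memory use stays bounded."""
    h = hashlib.new(algorithm)  # e.g. "md5", "sha1", "sha256"
    with open(path, "rb") as handle:
        # iter(callable, sentinel) keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_digest(path_a, path_b, algorithm="sha256"):
    """True if both files produce the same digest under the given algorithm."""
    return file_digest_hex(path_a, algorithm) == file_digest_hex(path_b, algorithm)
```

Hashing always reads both files to the end, so for a one-off comparison of two files a direct byte comparison is usually at least as fast; digests tend to pay off when they can be stored and reused.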

Prashanth Babu · Answer

    # Raw strings keep "\t" in the Windows paths from being read as a tab.
    filename1 = r"G:\test1.TXT"
    filename2 = r"G:\test2.TXT"

    with open(filename1) as f1:
        with open(filename2) as f2:
            file1list = f1.read().splitlines()
            file2list = f2.read().splitlines()
            list1length = len(file1list)
            list2length = len(file2list)
            if list1length == list2length:
                # Same number of lines: report each pair of lines.
                for index in range(len(file1list)):
                    if file1list[index] == file2list[index]:
                        print(file1list[index] + "==" + file2list[index])
                    else:
                        print(file1list[index] + "!=" + file2list[index] + " Not-Equal")
            else:
                print("difference in the size of the file and number of lines")