Get the last n lines of a file with Python, similar to tail

Question

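I'm writing a log file viewer for a web application, and for that I want to paginate through the lines of the log file. The items in the file are line based, with the newest item at the bottom.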

So I need a tail() method that can read n lines from the bottom and supports an offset. This is what I came up with:

    def tail(f, n, offset=0):
        """Reads a n lines from f with an offset of offset lines."""
        avg_line_length = 74
        to_read = n + offset
        while 1:
            try:
                f.seek(-(avg_line_length * to_read), 2)
            except IOError:
                # woops.  apparently file is smaller than what we want
                # to step back, go to the beginning instead
                f.seek(0)
            pos = f.tell()
            lines = f.read().splitlines()
            if len(lines) >= to_read or pos == 0:
                return lines[-to_read:offset and -offset or None]
            avg_line_length *= 1.3

Is this a reasonable approach? What is the recommended way to tail log files with offsets?

Armin Ronacher · Accepted Answer

The code I ended up using. I think this is the best so far:

    def tail(f, n, offset=None):
        """Reads a n lines from f with an offset of offset lines.  The return
        value is a tuple in the form ``(lines, has_more)`` where `has_more` is
        an indicator that is `True` if there are more lines in the file.
        """
        avg_line_length = 74
        to_read = n + (offset or 0)
        while 1:
            try:
                f.seek(-(avg_line_length * to_read), 2)
            except IOError:
                # woops.  apparently file is smaller than what we want
                # to step back, go to the beginning instead
                f.seek(0)
            pos = f.tell()
            lines = f.read().splitlines()
            if len(lines) >= to_read or pos == 0:
                return lines[-to_read:offset and -offset or None], \
                       len(lines) > to_read or pos > 0
            avg_line_length *= 1.3
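For anyone on Python 3: seek() there needs an integer, and text-mode files refuse nonzero end-relative seeks, so the function above needs small changes. Below is a minimal sketch of one possible adaptation, not the author's code. It assumes the file is opened in binary mode, clamps the seek position instead of catching an exception, and asks for one line more than requested so that a partial first line is never returned. The `avg_line_length` parameter and the paging loop at the bottom are additions for illustration.

```python
import io


def tail(f, n, offset=0, avg_line_length=74):
    """Return (lines, has_more) for a file opened in binary mode.

    lines    -- up to n lines, skipping the newest `offset` lines
    has_more -- True if older lines exist before the returned ones
    """
    to_read = n + offset
    size = f.seek(0, io.SEEK_END)
    while True:
        # Clamp the guessed start position to the beginning of the file.
        pos = max(0, size - int(avg_line_length * to_read))
        f.seek(pos)
        lines = f.read().splitlines()
        # Need one extra line: the first one may be partial when pos > 0.
        if len(lines) > to_read or pos == 0:
            end = len(lines) - offset
            start = max(0, end - n)
            has_more = start > 0 or pos > 0
            return lines[start:max(end, 0)], has_more
        avg_line_length *= 1.3  # guess was too small, look further back


# Paging through a small in-memory "log", newest page first.
log = io.BytesIO(b"".join(b"line%d\n" % i for i in range(10)))
offset, has_more = 0, True
while has_more:
    page, has_more = tail(log, 4, offset)
    print(page)
    offset += len(page)
```

The returned lines are bytes; decode them with whatever encoding the log uses.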

S.Lott · Answer

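This may be quicker than yours. It makes no assumptions about line length. It backs through the file one block at a time until it has found the right number of '\n' characters.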

    def tail(f, lines=20):
        total_lines_wanted = lines
        BLOCK_SIZE = 1024
        f.seek(0, 2)
        block_end_byte = f.tell()
        lines_to_go = total_lines_wanted
        block_number = -1
        blocks = []  # blocks of size BLOCK_SIZE, in reverse order starting
                     # from the end of the file
        while lines_to_go > 0 and block_end_byte > 0:
            if (block_end_byte - BLOCK_SIZE > 0):
                # read the last block we haven't yet read
                f.seek(block_number*BLOCK_SIZE, 2)
                blocks.append(f.read(BLOCK_SIZE))
            else:
                # file too small, start from beginning
                f.seek(0, 0)
                # only read what was not read
                blocks.append(f.read(block_end_byte))
            lines_found = blocks[-1].count('\n')
            lines_to_go -= lines_found
            block_end_byte -= BLOCK_SIZE
            block_number -= 1
        all_read_text = ''.join(reversed(blocks))
        return '\n'.join(all_read_text.splitlines()[-total_lines_wanted:])

I don't like tricky assumptions about line length when, as a practical matter, you can never know things like that.

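Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74-character estimate is actually accurate, make the block size 2048 and you'll tail 20 lines almost immediately.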

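Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. With these high-level I/O packages, I doubt you'll see any performance consequence from trying to align on OS block boundaries. If you use lower-level I/O, then you might see a speedup.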

Mark · Answer

Assuming a Unix-like system on Python 2, you can do:

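    import os

    def tail(f, n, offset=0):
        # f is the file name; the line count must be a string in the command
        stdin, stdout = os.popen2("tail -n " + str(n + offset) + " " + f)
        stdin.close()
        lines = stdout.readlines()
        stdout.close()
        # drop the newest `offset` lines (a plain [:-0] would return nothing)
        return lines[:-offset] if offset else lines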

For Python 3 you can do:

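    import subprocess

    def tail(f, n, offset=0):
        # f is the file name; every argument passed to Popen must be a string
        proc = subprocess.Popen(['tail', '-n', str(n + offset), f],
                                stdout=subprocess.PIPE)
        lines = proc.stdout.readlines()
        # drop the newest `offset` lines (a plain [:-0] would return nothing)
        return lines[:-offset] if offset else lines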

A. Coady · Answer

If reading the whole file is acceptable, then use a deque.

    from collections import deque
    deque(f, maxlen=n)

Prior to 2.6, deques didn't have a maxlen option, but it's easy enough to implement.

    import itertools

    def maxque(items, size):
        items = iter(items)
        q = deque(itertools.islice(items, size))
        for item in items:
            del q[0]
            q.append(item)
        return q
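Since the question also asks for an offset, here is a rough sketch of how the deque idea could be extended to support one. The function name and the `offset` parameter are my own additions, not part of the answer above: keep the last n + offset items, then drop the newest offset of them.

```python
from collections import deque


def tail(lines, n, offset=0):
    """Return up to n items, skipping the newest `offset` items of an iterable."""
    if n <= 0:
        return []
    # Stream through the input, remembering only the last n + offset items.
    window = deque(lines, maxlen=n + offset)
    # Discard the newest `offset` items; the rest is the requested page.
    keep = max(len(window) - offset, 0)
    return list(window)[:keep][-n:]
```

Called as tail(open(path), 20, offset=40) it still reads the whole file once, but never holds more than n + offset lines in memory.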

If it's a requirement to read the file from the end, then use a gallop (a.k.a. exponential) search.

    def tail(f, n):
        assert n >= 0
        pos, lines = n+1, []
        while len(lines) <= n:
            try:
                f.seek(-pos, 2)
            except IOError:
                f.seek(0)
                break
            finally:
                lines = list(f)
            pos *= 2
        return lines[-n:]

glenbot · Answer

Here is my answer. Pure Python. Using timeit it seems pretty fast. Tailing 100 lines of a log file that has 100,000 lines:

    >>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10)
    0.0014600753784179688
    >>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100)
    0.00899195671081543
    >>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=1000)
    0.05842900276184082
    >>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=10000)
    0.5394978523254395
    >>> timeit.timeit('tail.tail(f, 100, 4098)', 'import tail; f = open("log.txt", "r");', number=100000)
    5.377126932144165

Here is the code:

    import os

    def tail(f, lines=1, _buffer=4098):
        """Tail a file and get X lines from the end"""
        # place holder for the lines found
        lines_found = []
        # block counter will be multiplied by buffer
        # to get the block size from the end
        block_counter = -1
        # loop until we find X lines
        while len(lines_found) < lines:
            try:
                f.seek(block_counter * _buffer, os.SEEK_END)
            except IOError:  # either file is too small, or too many lines requested
                f.seek(0)
                lines_found = f.readlines()
                break
            lines_found = f.readlines()
            # we found enough lines, get out
            # Removed this line because it was redundant the while will catch
            # it, I left it for history
            # if len(lines_found) > lines:
            #    break
            # decrement the block counter to get the
            # next X bytes
            block_counter -= 1
        return lines_found[-lines:]

papercrane · Answer

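S.Lott's answer above almost works for me, but it ends up giving me partial lines. It turns out that it corrupts data on block boundaries because data holds the read blocks in reversed order. When ''.join(data) is called, the blocks are in the wrong order. This fixes that.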

    def tail(f, window=20):
        """
        Returns the last `window` lines of file `f` as a list.
        f - a byte file-like object
        """
        if window == 0:
            return []
        BUFSIZ = 1024
        f.seek(0, 2)
        bytes = f.tell()
        size = window + 1
        block = -1
        data = []
        while size > 0 and bytes > 0:
            if bytes - BUFSIZ > 0:
                # Seek back one whole BUFSIZ
                f.seek(block * BUFSIZ, 2)
                # read BUFFER
                data.insert(0, f.read(BUFSIZ))
            else:
                # file too small, start from beginning
                f.seek(0, 0)
                # only read what was not read
                data.insert(0, f.read(bytes))
            linesFound = data[0].count('\n')
            size -= linesFound
            bytes -= BUFSIZ
            block -= 1
        return ''.join(data).splitlines()[-window:]

dimitri · Answer

Simple and fast solution with mmap:

    import mmap
    import os

    def tail(filename, n):
        """Returns last n lines from the filename. No exception handling"""
        size = os.path.getsize(filename)
        with open(filename, "rb") as f:
            # for Windows the mmap parameters are different
            fm = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
            try:
                for i in xrange(size - 1, -1, -1):
                    if fm[i] == '\n':
                        n -= 1
                        if n == -1:
                            break
                return fm[i + 1 if i else 0:].splitlines()
            finally:
                fm.close()

Hauke Rehfeld · Answer

An even cleaner Python 3 compatible version that doesn't insert but appends and reverses:

    def tail(f, window=1):
        """
        Returns the last `window` lines of file `f` as a list of bytes.
        """
        if window == 0:
            return b''
        BUFSIZE = 1024
        f.seek(0, 2)
        end = f.tell()
        nlines = window + 1
        data = []
        while nlines > 0 and end > 0:
            i = max(0, end - BUFSIZE)
            nread = min(end, BUFSIZE)
            f.seek(i)
            chunk = f.read(nread)
            data.append(chunk)
            nlines -= chunk.count(b'\n')
            end -= nread
        return b'\n'.join(b''.join(reversed(data)).splitlines()[-window:])

Use it like this:

    with open(path, 'rb') as f:
        last_lines = tail(f, 3).decode('utf-8')

ShadowRanger · Answer

Posting an answer at the behest of commenters on my answer to a similar question, where the same technique was used to mutate the last line of a file, not just get it.

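For a file of significant size, mmap is the best way to do this. To improve on the existing mmap answer, this version is portable between Windows and Linux, and should run faster (though it won't work without some modifications on 32-bit Python with files in the GB range; see the other answer for hints on handling this, and for modifying it to work on Python 2).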

    import io  # Gets consistent version of open for both Py2.7 and Py3.x
    import itertools
    import mmap

    def skip_back_lines(mm, numlines, startidx):
        '''Factored out to simplify handling of n and offset'''
        for _ in itertools.repeat(None, numlines):
            startidx = mm.rfind(b'\n', 0, startidx)
            if startidx < 0:
                break
        return startidx

    def tail(f, n, offset=0):
        # Reopen file in binary mode
        with io.open(f.name, 'rb') as binf, mmap.mmap(binf.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # len(mm) - 1 handles files ending w/newline by getting the prior line
            startofline = skip_back_lines(mm, offset, len(mm) - 1)
            if startofline < 0:
                return []  # Offset lines consumed whole file, nothing to return
                # If using a generator function (yield-ing, see below),
                # this should be a plain return, no empty list
            endoflines = startofline + 1  # Slice end to omit offset lines
            # Find start of lines to capture (add 1 to move from newline to beginning of following line)
            startofline = skip_back_lines(mm, n, startofline) + 1
            # Passing True to splitlines makes it return the list of lines without
            # removing the trailing newline (if any), so list mimics f.readlines()
            return mm[startofline:endoflines].splitlines(True)
            # If Windows style \r\n newlines need to be normalized to \n, and input
            # is ASCII compatible, can normalize newlines with:
            # return mm[startofline:endoflines].replace(os.linesep.encode('ascii'), b'\n').splitlines(True)

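This assumes the number of lines tailed is small enough that you can safely read them all into memory at once. You could also make this a generator function and manually read a line at a time by replacing the final line with: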

            mm.seek(startofline)
            # Call mm.readline n times, or until EOF, whichever comes first
            # Python 3.2 and earlier:
            for line in itertools.islice(iter(mm.readline, b''), n):
                yield line
            # 3.3+:
            yield from itertools.islice(iter(mm.readline, b''), n)

Lastly, this reads in binary mode (necessary to use mmap), so it gives str lines (Py2) and bytes lines (Py3). If you want unicode (Py2) or str (Py3), the iterative approach could be tweaked to decode for you and/or fix newlines:

            lines = itertools.islice(iter(mm.readline, b''), n)
            if f.encoding:  # Decode if the passed file was opened with a specific encoding
                lines = (line.decode(f.encoding) for line in lines)
            if 'b' not in f.mode:  # Fix line breaks if passed file opened in text mode
                lines = (line.replace(os.linesep, '\n') for line in lines)
            # Python 3.2 and earlier:
            for line in lines:
                yield line
            # 3.3+:
            yield from lines

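Note: I typed this all up on a machine where I lack access to Python to test. Please let me know if I typoed anything. This was similar enough to my other answer that I think it should work, but the tweaks (e.g. handling an offset) could lead to subtle errors. Please let me know in the comments if there are any mistakes.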
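Since the answer above was written without a chance to test it, a compact Python 3 variant of the same rfind-based idea that can be run and checked may be useful. This is only a sketch, not the author's code: the name `tail_mmap`, the path-based signature and the explicit handling of empty files and of a missing final newline are my own assumptions.

```python
import mmap
import os
import tempfile


def tail_mmap(path, n, offset=0):
    """Return n lines (bytes, newlines kept), skipping the newest `offset` lines."""
    if n <= 0 or os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        # Ignore a trailing newline so it is not counted as an empty last line.
        idx = end - 1 if mm[end - 1:end] == b"\n" else end
        # Step back over the newest `offset` lines.
        for _ in range(offset):
            idx = mm.rfind(b"\n", 0, idx)
            if idx < 0:
                return []  # the offset consumed the whole file
        stop = min(idx + 1, end)  # slice end, keeping that line's newline
        # Step back over the n lines to return.
        for _ in range(n):
            idx = mm.rfind(b"\n", 0, idx)
            if idx < 0:
                break  # reached the start of the file
        return mm[idx + 1:stop].splitlines(True)


# Usage with a throwaway file:
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"a\nb\nc\nd\n")
print(tail_mmap(tmp.name, 2, offset=1))
os.remove(tmp.name)
```

As with the answer above, the result is a list of bytes lines, so decode them if you need text.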

Marko · Answer

I found the Popen approach above to be the best solution. It's quick and dirty, and it works. For Python 2.6 on a Unix machine I used the following:

    import subprocess

    def GetLastNLines(self, n, fileName):
        """
        Name:           Get LastNLines
        Description:    Gets last n lines using Unix tail
        Output:         returns last n lines of a file
        Keyword argument:
        n -- number of last lines to return
        filename -- Name of the file you need to tail into
        """
        # use the fileName argument that the docstring describes
        p = subprocess.Popen(['tail', '-n', str(n), fileName], stdout=subprocess.PIPE)
        soutput, sinput = p.communicate()
        return soutput

soutput will contain the last n lines of the file. To iterate through soutput line by line:

    for line in GetLastNLines(50,'myfile.log').split('\n'):
        print line

Emilio · Answer

An update of @papercrane's solution to Python 3. Open the file with open(filename, 'rb') and:

    def tail(f, window=20):
        """Returns the last `window` lines of file `f` as a list.
        """
        if window == 0:
            return []
        BUFSIZ = 1024
        f.seek(0, 2)
        remaining_bytes = f.tell()
        size = window + 1
        block = -1
        data = []
        while size > 0 and remaining_bytes > 0:
            if remaining_bytes - BUFSIZ > 0:
                # Seek back one whole BUFSIZ
                f.seek(block * BUFSIZ, 2)
                # read BUFFER
                bunch = f.read(BUFSIZ)
            else:
                # file too small, start from beginning
                f.seek(0, 0)
                # only read what was not read
                bunch = f.read(remaining_bytes)
            bunch = bunch.decode('utf-8')
            data.insert(0, bunch)
            size -= bunch.count('\n')
            remaining_bytes -= BUFSIZ
            block -= 1
        return ''.join(data).splitlines()[-window:]

GL2014 · Answer

Here is a pretty simple implementation:

    with open('/etc/passwd', 'r') as f:
        try:
            f.seek(0,2)
            s = ''
            while s.count('\n') < 11:
                cur = f.tell()
                f.seek((cur - 10))
                s = f.read(10) + s
                f.seek((cur - 10))
            print s
        except Exception as e:
            f.readlines()

Eyecue · Answer

Based on S.Lott's top voted answer (Sep 25 '08 at 21:43), but fixed for small files.

    def tail(the_file, lines_2find=20):
        the_file.seek(0, 2)                         #go to end of file
        bytes_in_file = the_file.tell()
        lines_found, total_bytes_scanned = 0, 0
        while lines_2find+1 > lines_found and bytes_in_file > total_bytes_scanned:
            byte_block = min(1024, bytes_in_file-total_bytes_scanned)
            the_file.seek(-(byte_block+total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += the_file.read(1024).count('\n')
        the_file.seek(-total_bytes_scanned, 2)
        line_list = list(the_file.readlines())
        return line_list[-lines_2find:]
        #we read at least 21 line breaks from the bottom, block by block for speed
        #21 to ensure we don't get a half line

Hope this is useful.

Travis Bear · Answer

There are some existing implementations of tail on PyPI which you can install using pip:

mtFileUtil
multitail
log4tailer
...

Depending on your situation, there may be advantages to using one of these existing tools.

Samba Siva Reddy · Answer

Simple:

    with open("test.txt") as f:
        data = f.readlines()
        tail = data[-2:]
        print(''.join(tail))

rabbit · Answer

You can go to the end of your file with f.seek(0, 2) and then read off lines one by one with the following replacement for readline():

    def readline_backwards(self, f):
        backline = ''
        last = ''
        while not last == '\n':
            backline = last + backline
            if f.tell() <= 0:
                return backline
            f.seek(-1, 1)
            last = f.read(1)
            f.seek(-1, 1)
        backline = last
        last = ''
        while not last == '\n':
            backline = last + backline
            if f.tell() <= 0:
                return backline
            f.seek(-1, 1)
            last = f.read(1)
            f.seek(-1, 1)
        f.seek(1, 1)
        return backline

Brian · Answer

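For efficiency with very large files (common in log file situations where you may want to use tail), you generally want to avoid reading the whole file (even if you do it without reading the whole file into memory at once). However, you do need to somehow work out the offset in lines rather than characters. One possibility is reading backwards with seek() char by char, but this is very slow. Instead, it's better to process in larger blocks.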

I have a utility function I wrote a while ago to read files backwards that can be used here.

    import os, itertools

    def rblocks(f, blocksize=4096):
        """Read file as series of blocks from end of file to start.

        The data itself is in normal order, only the order of the blocks is reversed.
        ie. "hello world" -> ["ld","wor", "lo ", "hel"]
        Note that the file must be opened in binary mode.
        """
        if 'b' not in f.mode.lower():
            raise Exception("File must be opened using binary mode.")
        size = os.stat(f.name).st_size
        fullblocks, lastblock = divmod(size, blocksize)
        # The first(end of file) block will be short, since this leaves
        # the rest aligned on a blocksize boundary.  This may be more
        # efficient than having the last (first in file) block be short
        f.seek(-lastblock,2)
        yield f.read(lastblock)
        for i in range(fullblocks-1,-1, -1):
            f.seek(i * blocksize)
            yield f.read(blocksize)

    def tail(f, nlines):
        buf = ''
        result = []
        for block in rblocks(f):
            buf = block + buf
            lines = buf.splitlines()
            # Return all lines except the first (since may be partial)
            if lines:
                # Prepend: this block comes earlier in the file than the
                # lines collected so far (extend() would scramble the order)
                result = lines[1:] + result  # First line may not be complete
                if(len(result) >= nlines):
                    return result[-nlines:]
                buf = lines[0]
        return ([buf]+result)[-nlines:]

    f=open('file_to_tail.txt','rb')
    for line in tail(f, 20):
        print line

[Edit] Added a more specific version (avoids the need to reverse twice).
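For Python 3, the block-wise idea above might look roughly like the sketch below. It is an illustration under my own assumptions, not the code from this answer: the names `reversed_lines` and `carry` and the `offset` parameter are hypothetical, the file must be opened in binary mode, and only '\n' is treated as a line separator. Lines are yielded newest first, so pagination becomes a slice over the generator.

```python
import io
import os
from itertools import islice


def reversed_lines(f, blocksize=4096):
    """Yield lines (bytes, without newlines) of binary file f, last line first."""
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    carry = b""  # start of a line whose beginning lies in an earlier block
    first = True
    while pos > 0:
        start = max(0, pos - blocksize)
        f.seek(start)
        block = f.read(pos - start) + carry
        pos = start
        if first and block.endswith(b"\n"):
            block = block[:-1]  # a final newline does not start another line
        first = False
        parts = block.split(b"\n")
        carry = parts[0]  # may be incomplete until the previous block is read
        for line in reversed(parts[1:]):
            yield line
    if not first:
        yield carry  # the first line of the file


def tail(f, n, offset=0):
    """Return n lines in file order, skipping the newest `offset` lines."""
    newest_first = list(islice(reversed_lines(f), offset, offset + n))
    return newest_first[::-1]


# Usage on an in-memory file:
buf = io.BytesIO(b"a\nb\nc\nd\n")
print(tail(buf, 2, offset=1))
```

Because the generator stops as soon as enough lines have been produced, only the blocks near the end of the file are read.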

fdb · Answer

Based on Eyecue's answer (Jun 10 '10 at 21:28): this class adds head() and tail() methods to the file object.

    class File(file):
        def head(self, lines_2find=1):
            self.seek(0)                            #Rewind file
            return [self.next() for x in xrange(lines_2find)]

        def tail(self, lines_2find=1):
            self.seek(0, 2)                         #go to end of file
            bytes_in_file = self.tell()
            lines_found, total_bytes_scanned = 0, 0
            while (lines_2find+1 > lines_found and
                   bytes_in_file > total_bytes_scanned):
                byte_block = min(1024, bytes_in_file-total_bytes_scanned)
                self.seek(-(byte_block+total_bytes_scanned), 2)
                total_bytes_scanned += byte_block
                lines_found += self.read(1024).count('\n')
            self.seek(-total_bytes_scanned, 2)
            line_list = list(self.readlines())
            return line_list[-lines_2find:]

Usage:

    f = File('path/to/file', 'r')
    f.head(3)
    f.tail(3)

David Rogers · Answer

Several of these solutions have issues if the file doesn't end in \n, or in ensuring that the complete first line is read.

    def tail(file, n=1, bs=1024):
        f = open(file)
        f.seek(-1,2)
        l = 1-f.read(1).count('\n') # If file doesn't end in \n, count it anyway.
        B = f.tell()
        while n >= l and B > 0:
            block = min(bs, B)
            B -= block
            f.seek(B, 0)
            l += f.read(block).count('\n')
        f.seek(B, 0)
        l = min(l,n) # discard first (incomplete) line if l > n
        lines = f.readlines()[-l:]
        f.close()
        return lines

Jigar Wala · Answer

An update to the answer given by A. Coady.

Works with Python 3.

This uses exponential search, buffers only N lines from the back, and is very efficient.

    import time
    import os
    import sys

    def tail(f, n):
        assert n >= 0
        pos, lines = n+1, []
        # set file pointer to end
        f.seek(0, os.SEEK_END)
        isFileSmall = False
        while len(lines) <= n:
            try:
                f.seek(f.tell() - pos, os.SEEK_SET)
            except ValueError as e:
                # lines greater than file seeking size
                # seek to start
                f.seek(0,os.SEEK_SET)
                isFileSmall = True
            except IOError:
                print("Some problem reading/seeking the file")
                sys.exit(-1)
            finally:
                lines = f.readlines()
                if isFileSmall:
                    break
            pos *= 2
        print(lines)
        return lines[-n:]

    with open("stream_logs.txt") as f:
        while(True):
            time.sleep(0.5)
            print(tail(f,2))

Hal Canary · Answer

Not the first example using a deque, but a simpler one. This one is general: it works on any iterable object, not just a file.

    #!/usr/bin/env python
    import sys
    import collections

    def tail(iterable, N):
        deq = collections.deque()
        for thing in iterable:
            if len(deq) >= N:
                deq.popleft()
            deq.append(thing)
        for thing in deq:
            yield thing

    if __name__ == '__main__':
        for line in tail(sys.stdin,10):
            sys.stdout.write(line)

Raj · Answer

This is my version of tailf:

    import sys, time, os

    filename = 'path to file'

    try:
        with open(filename) as f:
            size = os.path.getsize(filename)
            if size < 1024:
                s = size
            else:
                s = 999
            f.seek(-s, 2)
            l = f.read()
            print l
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1)
                    continue
                print line
    except IOError:
        pass

Quinten Cabo · Answer

There is a very useful module that can do this:

    from file_read_backwards import FileReadBackwards

    with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:
        # getting lines by lines starting from the last line up
        for l in frb:
            print(l)

Leifbk · Answer

I had to read a specific value from the last line of a file, and stumbled upon this thread. Rather than reinventing the wheel in Python, I ended up with a tiny shell script, saved as /usr/local/bin/get_last_netp:

    #! /bin/bash
    tail -n1 /home/leif/projects/transfer/export.log | awk {'print $14'}

And in the Python program:

    from subprocess import check_output

    last_netp = int(check_output("/usr/local/bin/get_last_netp"))

Y Kal · Answer

    import itertools

    fname = 'log.txt'
    offset = 5
    n = 10
    with open(fname) as f:
        n_last_lines = list(reversed([x for x in itertools.islice(f, None)][-(offset+1):-(offset+n+1):-1]))

moylop260 · Answer

    import time

    attemps = 600
    wait_sec = 5
    fname = "YOUR_PATH"

    with open(fname, "r") as f:
        where = f.tell()
        for i in range(attemps):
            line = f.readline()
            if not line:
                time.sleep(wait_sec)
                f.seek(where)
            else:
                print line, # already has newline

Kant Manapure · Answer

    abc = "2018-06-16 04:45:18.68"
    filename = "abc.txt"
    with open(filename) as myFile:
        for num, line in enumerate(myFile, 1):
            if abc in line:
                lastline = num
    print "last occurance of work at file is in "+str(lastline)