Read the first N lines of a file in Python

Question

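I have a large raw data file that I would like to trim to a specified size. I am experienced in .NET C#, but I would like to do this in Python, both to simplify things and out of interest.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?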

John La Rooy · Accepted Answer

Python 2

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Python 3

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Here is another way (works in both Python 2 and 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
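The two approaches above behave differently when the file has fewer than N lines. Here is a minimal sketch of that difference. The helper name `head` and the use of `io.StringIO` as a stand-in for a real file are assumptions made for illustration only. `islice` simply stops at end of file, while calling `next()` past the end raises `StopIteration`.

```python
import io
from itertools import islice

def head(fileobj, n):
    """Return up to the first n lines of an open, iterable file object."""
    return list(islice(fileobj, n))

# io.StringIO stands in for a real file opened with open(...)
short_file = io.StringIO("one\ntwo\n")
print(head(short_file, 5))  # only two lines exist, so only two come back

# The next()-based version raises StopIteration on the same input
short_file.seek(0)
try:
    lines = [next(short_file) for _ in range(5)]
except StopIteration:
    lines = None
    print("fewer than 5 lines available")
```

If short files are possible, the `islice` form is probably the less surprising choice.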

ghostdog74 · Answer

N = 10
file = open("file.txt", "r")  # "r" opens the file for reading
for i in range(N):
    line = next(file).strip()
    print(line)
file.close()

G M · Answer

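If you want to read the first lines quickly and you don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.

E.g. for the first 5 lines: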

with open("pathofmyfileandfileandname") as myfile:
    firstNlines = myfile.readlines()[0:5]  # put here the interval you want

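Note: the whole file is read, so this is not the best approach from a performance point of view. It is, however, easy to use, fast to write, and easy to remember, so it is very convenient if you just want to perform a one-time calculation.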

print(firstNlines)
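Because `readlines()` loads the entire file, a streaming variant may be a better fit for the large file described in the question. This is a rough sketch, not a definitive implementation: the function name `trim_file` and its parameters are hypothetical, and it assumes a text file that can be opened with the default encoding.

```python
from itertools import islice

def trim_file(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path, one line at a time."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice stops after n lines, so the rest of src is never read
        dst.writelines(islice(src, n))
```

Calling `trim_file("big.txt", "small.txt", 1000)` (hypothetical file names) would write at most 1000 lines to the new file without holding the source in memory.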

Cro-Magnon · Answer

What I do is read the N lines using pandas. I don't think the performance is the best, but for example, if N=1000:

import pandas as pd

yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)

artdanil · Answer

The file object does not expose a specific method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    lines.extend(f.readline() for i in range(N))
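One thing to keep in mind with `readline()`: at end of file it returns an empty string instead of raising, so a file shorter than N lines pads the list with `''`. Below is a hedged sketch of one way to stop early. The function name `first_lines` is hypothetical, and `io.StringIO` stands in for a real file.

```python
import io

def first_lines(fileobj, n):
    """Collect up to n lines with readline(), stopping at end of file."""
    lines = []
    for _ in range(n):
        line = fileobj.readline()
        if not line:  # readline() returns '' once the file is exhausted
            break
        lines.append(line)
    return lines

# The stand-in file has two lines, so asking for five returns just those two
print(first_lines(io.StringIO("a\nb\n"), 5))
```

A blank line in the file is still `"\n"`, which is truthy, so it is kept rather than mistaken for end of file.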

FatihAkici · Answer

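The two most intuitive ways of doing this would be:

1. Iterate over the file line by line, and break after N lines.
2. Iterate over the file line by line, calling next() N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: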

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print(line)
        counter += 1
        if counter == N:
            break

# Method 2:
with open("fileName", "r") as f:
    for i in range(N):
        line = next(f)
        print(line)

The bottom line is that, as long as you don't use readlines() or otherwise pull the whole file into memory, you have plenty of options.
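As a sketch of the "don't pull the whole file into memory" idea, the first method can be wrapped in a generator that yields lines lazily and stops after N. The name `iter_head` is hypothetical and `io.StringIO` stands in for a real file; this is one possible shape, not the only one.

```python
import io

def iter_head(fileobj, n):
    """Lazily yield at most n lines from an iterable of lines."""
    if n <= 0:
        return
    for count, line in enumerate(fileobj, start=1):
        yield line
        if count >= n:
            break  # stop before the next line is read

for line in iter_head(io.StringIO("a\nb\nc\n"), 2):
    print(line.rstrip("\n"))
```

Because the loop breaks right after the Nth line is yielded, the rest of the file stays unread.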

fdb · Answer

Based on gnibbler's top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

class File(file):
    def head(self, lines_2find=1):
        self.seek(0)  # Rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)  # go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find + 1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file - total_bytes_scanned)
            self.seek(-(byte_block + total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r')
f.head(3)
f.tail(3)
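The class above subclasses the built-in `file` type, which exists only in Python 2. For Python 3, a rough equivalent could be written as two plain functions. Note that this sketch swaps the seek-based block scan for `collections.deque`, which is simpler but reads the whole file once for `tail`. The function names mirror the methods above; the signatures are assumptions.

```python
from collections import deque
from itertools import islice

def head(path, n=1):
    """Return the first n lines of the file at path."""
    with open(path) as f:
        return list(islice(f, n))

def tail(path, n=1):
    """Return the last n lines; scans the file once, keeping only n lines in memory."""
    with open(path) as f:
        return list(deque(f, maxlen=n))
```

For very large files where only the tail matters, the seek-from-the-end strategy of the original class avoids that full scan.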

Maxim Plaksin · Answer

The most convenient way, in my own experience:

LINE_COUNT = 3
print([s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT])

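This solution is based on a list comprehension. The object returned by open() supports the iteration interface. enumerate() wraps it and returns (index, item) tuples; we then check that the index is inside the accepted range (if i < LINE_COUNT) and print the result.

Enjoy Python. ;)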
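One caveat with the comprehension: the `if` filter does not stop the iteration, so every line of the file is still visited. A hedged variant using `itertools.takewhile` stops at the first index outside the range. The function name `take_lines` is hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
from itertools import takewhile

LINE_COUNT = 3

def take_lines(fileobj, limit=LINE_COUNT):
    """Return lines whose index is below limit, stopping at the first one that is not."""
    pairs = takewhile(lambda pair: pair[0] < limit, enumerate(fileobj))
    return [line for _, line in pairs]

print(take_lines(io.StringIO("a\nb\nc\nd\ne\n")))
```

`takewhile` has to read one extra line to discover that the condition failed, but it does not go any further than that.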

cacosomoza · Answer

If you have a really big file and you want the output to be a numpy array, using np.genfromtxt will freeze your computer. In my experience, this works much better:

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''
    rows = []  # unknown number of lines, so use list
    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array

John Machin · Answer

If you want something that obviously (without looking up esoteric stuff in manuals) works without imports or try/except, and that works on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like *x head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)

Steve Bading · Answer

Starting with Python 2.6, you can take advantage of the more sophisticated functions in the IO base class. So the top-rated answer above can be rewritten as:

with open("datafile") as myfile:
    # Caution: the argument to readlines() is a size hint (bytes/characters), not a line count
    head = myfile.readlines(N)
print(head)

(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown.)
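It is worth checking what the argument to `readlines()` means before relying on it. In the `io` module it is documented as a size hint (total bytes or characters read so far), not a number of lines. The small sketch below contrasts it with `islice`, which does count lines; `io.StringIO` and the sample text are stand-ins chosen for illustration.

```python
import io
from itertools import islice

text = "alpha\nbeta\ngamma\ndelta\n"

# readlines(3): the 3 is a size hint, so reading stops once roughly 3 characters are consumed
by_hint = io.StringIO(text).readlines(3)

# islice(f, 3): the 3 is a line count
by_count = list(islice(io.StringIO(text), 3))

print(len(by_hint), len(by_count))
```

When an exact line count matters, `islice` is probably the safer of the two, and it also copes with files shorter than N lines without raising.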

Surya · Answer

For the first 5 lines, simply do:

N = 5
with open("data_file", "r") as file:
    for i in range(N):
        print(next(file))

Mansur Ali · Answer

#!/usr/bin/python
import subprocess

p = subprocess.Popen(["head", "-n", "3", "passlist"], stdout=subprocess.PIPE)
output, err = p.communicate()
print(output)

This method worked for me.

Sukanta · Answer

This worked for me:

f = open("history_export.csv", "r")
line = 5
for x in range(line):
    a = f.readline()
    print(a)
f.close()