Import multiple Excel files into Python pandas and concatenate them into one dataframe

Question

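I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe. However, I have not been able to work out how to build the concatenated dataframe. Here is what I have so far: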

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs:
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)

ericmjl · Accepted Answer

As mentioned in the comments, one error you are making is that you are looping over an empty list.
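To make that concrete: in the question's code, `dfs` is still empty when the loop starts, so the loop body never runs, and `pd.ExcelFile` is handed the whole list of names rather than one path. One way to restructure it, as a minimal sketch, is to loop over the file names, collect one dataframe per file in a list, and call `pd.concat` once at the end. The function name `concat_excel_files` and its `reader` parameter are assumptions made for illustration and are not part of the original code; `reader` defaults to `pd.read_excel`.

```python
import glob
import os

import pandas as pd


def concat_excel_files(filenames, sheet_name="Sheet1", reader=pd.read_excel):
    """Read one sheet from each file and stack the results row-wise.

    `reader` is a hypothetical hook (defaults to pd.read_excel) so that
    another loader can be swapped in.
    """
    dfs = []
    for filename in filenames:  # loop over the file names, not over dfs
        dfs.append(reader(filename, sheet_name=sheet_name))
    if not dfs:
        return pd.DataFrame()  # pd.concat raises ValueError on an empty list
    return pd.concat(dfs, ignore_index=True)


if __name__ == "__main__":
    # Path taken from the question; adjust it to your own directory.
    path = r'C:\DRO\DCL_rawdata_files\excelfiles'
    filenames = glob.glob(os.path.join(path, "*.xlsx"))
    big_df = concat_excel_files(filenames)
    print(big_df.shape)
```

Collecting the frames first and concatenating once avoids copying the growing dataframe on every iteration.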

Here is how I would do it, using an example in which five identical Excel files are appended one after another.

(1) Imports:

import os
import pandas as pd

(2) List the files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store', '.ipynb_checkpoints', '.localized', 'Screen Shot 2013-12-28 at 7.15.45 PM.png', 'test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls', 'Untitled0.ipynb', 'Werewolf Modelling', '~$Random Numbers.xlsx']

(3) Pick out the 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
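The slice test `f[-3:] == 'xls'` keeps only names that end in `xls`, so `.xlsx` files are left out. If both kinds are wanted, one option is to compare the file extension instead. The sketch below also skips Office lock files such as `~$Random Numbers.xlsx` from the listing above. The helper name `select_excel_files` and the extension set are assumptions for illustration.

```python
import os

EXCEL_EXTENSIONS = {".xls", ".xlsx"}  # assumed set; extend as needed


def select_excel_files(names):
    """Return the names that look like Excel workbooks, skipping lock files."""
    selected = []
    for name in names:
        extension = os.path.splitext(name)[1].lower()
        if extension in EXCEL_EXTENSIONS and not name.startswith("~$"):
            selected.append(name)
    return selected


files = ['.DS_Store', '.ipynb_checkpoints', '.localized',
         'Screen Shot 2013-12-28 at 7.15.45 PM.png',
         'test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls',
         'test1.xls', 'Untitled0.ipynb', 'Werewolf Modelling',
         '~$Random Numbers.xlsx']
files_xls = select_excel_files(files)
print(files_xls)
```

For this particular listing it selects the same five `.xls` files as the slice test.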

(4) Initialize an empty dataframe:

df = pd.DataFrame()

(5) Loop over the list of files and append each one to the empty dataframe:

for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, data])

(6) Enjoy your new dataframe. :-)

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
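Note that the index in this output restarts at 0 for every file, because each file's rows keep their own labels. If a single running index is preferred, `pd.concat` accepts `ignore_index=True`. A small sketch, with an in-memory frame standing in for the Excel sheets (the three-row sample is an illustrative stand-in, not data read from the files):

```python
import pandas as pd

# Stand-in for one sheet; in practice this would come from pd.read_excel.
sheet = pd.DataFrame({"Result": ["a", "b", "c"], "Sample": [1, 2, 3]})
frames = [sheet, sheet]  # two identical "files"

kept = pd.concat(frames)                           # labels repeat per file
renumbered = pd.concat(frames, ignore_index=True)  # one running index

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Calling `df.reset_index(drop=True)` on the finished dataframe has the same effect after the fact.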

john blue · Answer

This works with Python 2.x.

Run it from the directory that contains the Excel files.

See http://pbpython.com/Excel-file-combine.html

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame (the context manager saves and closes the file)
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')

Tarun Bhavnani · Answer

import os
import pandas as pd

os.chdir('...')

# read first file for column names
fdf = pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

# create counter to segregate the different files' data
fdf["counter"] = 1
nm = list(fdf)
c = 2

# read first 1000 files
# note: os.listdir() also returns first_file.xlsx, so it is read a second time here
for i in os.listdir():
    print(c)
    if c < 1001:
        if "xlsx" in i:
            df = pd.read_excel(i, sheet_name="sheet_name")
            df["counter"] = c
            if list(df) == nm:
                # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
                fdf = pd.concat([fdf, df])
                c += 1
            else:
                print("headers name not match")
        else:
            print("not xlsx")

fdf = fdf.reset_index(drop=True)
# relax
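The idea in this answer is to keep only files whose headers match the first file and to tag each row with where it came from. It can also be written by collecting frames in a list and concatenating once. The sketch below works on frames that are already loaded, so it does not depend on how the files are read. `combine_matching`, the `frames_by_name` mapping, and the `source` column are hypothetical names introduced for illustration.

```python
import pandas as pd


def combine_matching(frames_by_name):
    """Concatenate frames whose columns match the first frame's columns.

    frames_by_name maps a label (e.g. a file name) to a DataFrame.
    Returns the combined frame and the list of skipped labels.
    """
    expected = None
    kept, skipped = [], []
    for name, frame in frames_by_name.items():
        if expected is None:
            expected = list(frame.columns)  # first frame defines the headers
        if list(frame.columns) == expected:
            kept.append(frame.assign(source=name))  # tag rows with their origin
        else:
            skipped.append(name)
    combined = pd.concat(kept, ignore_index=True) if kept else pd.DataFrame()
    return combined, skipped


frames = {
    "a.xlsx": pd.DataFrame({"Result": ["a"], "Sample": [1]}),
    "b.xlsx": pd.DataFrame({"Result": ["b"], "Sample": [2]}),
    "c.xlsx": pd.DataFrame({"Other": [0]}),  # header mismatch, skipped
}
combined, skipped = combine_matching(frames)
print(combined)
print(skipped)
```

Returning the skipped labels, rather than only printing them, makes it easier to check afterwards which files were left out.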