Pandas DataFrame to a PDF using Python

Question

What is an efficient way to generate a PDF from a pandas DataFrame?

wgwz · Accepted Answer

One way is to use markdown. You can call df.to_html(), which converts the dataframe into an HTML table. From there you can put the generated HTML into a markdown file (.md) (see http://daringfireball.net/projects/markdown/basics ). There are then utilities that convert markdown to PDF ( https://www.npmjs.com/package/markdown-pdf ).

One all-in-one tool for this method is the Atom text editor ( https://atom.io/ ). It supports extensions: search for "markdown to pdf" and you will find one that does the conversion.

Note: when I used to_html() recently, I had to remove extra '\n' characters for some reason. I chose to do that with Atom -> Find -> '\n' -> Replace "".

Overall this should do the trick!
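The steps in this answer could be scripted roughly as follows. This is a minimal sketch rather than the answerer's own code: the helper name dataframe_to_markdown_file, the demo data, and the output path are assumptions made for illustration. It relies on the fact that Markdown passes block-level HTML through unchanged.

```python
import os
import tempfile

import pandas as pd


def dataframe_to_markdown_file(df, md_path, title="Report"):
    """Write df as an HTML table inside a Markdown file and return the HTML."""
    # Markdown leaves block-level HTML untouched, so the table produced
    # by to_html() can sit directly in the .md file.
    html = df.to_html(index=False)
    # Strip the newlines that to_html() emits between tags, which the
    # note in this answer describes removing by hand in Atom.
    html = html.replace("\n", "")
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(f"# {title}\n\n{html}\n")
    return html


# Hypothetical demo data and output location.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
md_path = os.path.join(tempfile.gettempdir(), "report.md")
html = dataframe_to_markdown_file(df, md_path)
```

The resulting report.md can then be handed to a markdown-to-PDF converter such as the markdown-pdf package or the Atom extension mentioned in this answer.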

Dalibor · Answer

Here is how I do it from a sqlite database using sqlite3, pandas and pdfkit.

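    import sqlite3

    import pandas as pd
    import pdfkit as pdf

    # Read the whole table from the SQLite database into a DataFrame
    con = sqlite3.connect("baza.db")
    df = pd.read_sql_query("select * from dobit", con)

    # Write the DataFrame to an intermediate HTML file
    df.to_html('/home/linux/izvestaj.html')

    # Convert the HTML file to a PDF
    nazivFajla = '/home/linux/pdfPrintOut.pdf'
    pdf.from_file('/home/linux/izvestaj.html', nazivFajla)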
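If the intermediate HTML file is not needed, pdfkit also offers from_string, which takes the HTML directly. The sketch below is a variation on the code in this answer, not a replacement for it. The in-memory database, the column names godina and iznos, and the helper names query_to_html and html_to_pdf are all hypothetical. pdfkit is imported inside the helper so the rest can be tried without wkhtmltopdf installed.

```python
import sqlite3

import pandas as pd


def query_to_html(con, sql):
    """Run sql on an open sqlite3 connection and return an HTML table string."""
    df = pd.read_sql_query(sql, con)
    return df.to_html(index=False)


def html_to_pdf(html, pdf_path):
    """Render an HTML string to pdf_path with pdfkit (needs wkhtmltopdf)."""
    import pdfkit  # imported lazily so the HTML step works without pdfkit

    pdfkit.from_string(html, pdf_path)


# Hypothetical in-memory stand-in for baza.db and its dobit table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dobit (godina INTEGER, iznos REAL)")
con.executemany("INSERT INTO dobit VALUES (?, ?)", [(2016, 10.5), (2017, 12.0)])
html = query_to_html(con, "SELECT * FROM dobit")
con.close()

# Uncomment once pdfkit and wkhtmltopdf are installed:
# html_to_pdf(html, "pdfPrintOut.pdf")
```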

user3226167 · Answer

First plot the table with matplotlib, then generate the PDF:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from matplotlib.backends.backend_pdf import PdfPages

    df = pd.DataFrame(np.random.random((10, 3)), columns=("col 1", "col 2", "col 3"))

    # https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.axis('tight')
    ax.axis('off')
    the_table = ax.table(cellText=df.values, colLabels=df.columns, loc='center')

    # https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
    pp = PdfPages("foo.pdf")
    pp.savefig(fig, bbox_inches='tight')
    pp.close()

Refs:

How do I plot only a table in Matplotlib?

Reduce left and right margins in matplotlib plot
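For a DataFrame with more rows than fit comfortably in one figure, the same table-plotting idea can be applied chunk by chunk, saving one figure per PDF page. This is only a sketch under assumptions: the helper name dataframe_to_paged_pdf, the rows_per_page default, and the output path are hypothetical, and the Agg backend is selected so that it also runs on a machine without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def dataframe_to_paged_pdf(df, pdf_path, rows_per_page=20):
    """Write df to pdf_path as tables, one chunk of rows per page.

    Returns the number of pages written.
    """
    pages = 0
    with PdfPages(pdf_path) as pp:
        for start in range(0, len(df), rows_per_page):
            chunk = df.iloc[start:start + rows_per_page]
            fig, ax = plt.subplots(figsize=(12, 4))
            ax.axis("off")
            ax.table(
                cellText=chunk.values.tolist(),
                colLabels=list(chunk.columns),
                loc="center",
            )
            pp.savefig(fig, bbox_inches="tight")
            plt.close(fig)  # free the figure before drawing the next page
            pages += 1
    return pages


# Hypothetical demo data: 50 rows split into pages of 20, 20 and 10 rows.
df = pd.DataFrame(np.random.random((50, 3)), columns=("col 1", "col 2", "col 3"))
pdf_path = os.path.join(tempfile.gettempdir(), "foo_paged.pdf")
pages = dataframe_to_paged_pdf(df, pdf_path)
```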

mit · Answer

This is a solution with an intermediate HTML file.

The table is pretty printed with some minimal CSS.

The PDF conversion is done with weasyprint. You need to pip install weasyprint.

    import pandas as pd
    import weasyprint

    # Templates wrapped around the table by the pretty printer below
    HTML_TEMPLATE1 = '''
    <html>
    <head>
    <style>
      h2 {
        text-align: center;
        font-family: Helvetica, Arial, sans-serif;
      }
      table {
        margin-left: auto;
        margin-right: auto;
      }
      table, th, td {
        border: 1px solid black;
        border-collapse: collapse;
      }
      th, td {
        padding: 5px;
        text-align: center;
        font-family: Helvetica, Arial, sans-serif;
        font-size: 90%;
      }
      table tbody tr:hover {
        background-color: #dddddd;
      }
      .wide {
        width: 90%;
      }
    </style>
    </head>
    <body>
    '''

    HTML_TEMPLATE2 = '''
    </body>
    </html>
    '''

    # This is the table pretty printer used below.
    # It is defined first so the script runs top to bottom.
    def to_html_pretty(df, filename='/tmp/out.html', title=''):
        '''
        Write an entire dataframe to an HTML file
        with nice formatting.
        Thanks to @stackoverflowuser2010 for the
        pretty printer see https://stackoverflow.com/a/47723330/362951
        '''
        ht = ''
        if title != '':
            ht += '<h2> %s </h2>\n' % title
        ht += df.to_html(classes='wide', escape=False)

        with open(filename, 'w') as f:
            f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)

    # Create a pandas dataframe with demo data:
    demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
    df = pd.read_csv(demodata_csv)

    # Pretty print the dataframe as an html table to a file
    intermediate_html = '/tmp/intermediate.html'
    to_html_pretty(df, intermediate_html, 'Iris Data')
    # if you do not want pretty printing, just use pandas:
    # df.to_html(intermediate_html)

    # Convert the html file to a pdf file using weasyprint
    out_pdf = '/tmp/demo.pdf'
    weasyprint.HTML(intermediate_html).write_pdf(out_pdf)

Thanks to @stackoverflowuser2010 for the pretty printer, see stackoverflowuser2010's answer https://stackoverflow.com/a/47723330/362951

I did not use pdfkit because I had some problems with it on a headless machine. But weasyprint is great.
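If the intermediate file on disk is not wanted, weasyprint can also be given the HTML as a string through the string= argument of weasyprint.HTML. The following is a minimal sketch under assumptions, not part of the original answer: the helper names build_html_document and write_pdf, the shortened stylesheet, and the demo rows are hypothetical. weasyprint is imported inside the helper so the HTML step can be tried on its own.

```python
import html as html_lib

import pandas as pd

# Hypothetical, shortened stylesheet; the full one is in the answer above.
CSS = "table, th, td { border: 1px solid black; border-collapse: collapse; }"


def build_html_document(df, title=""):
    """Return a complete HTML page, as a string, containing df as a table."""
    heading = f"<h2>{html_lib.escape(title)}</h2>\n" if title else ""
    table = df.to_html(classes="wide")
    return (
        f"<html><head><style>{CSS}</style></head>"
        f"<body>\n{heading}{table}\n</body></html>"
    )


def write_pdf(document, out_pdf):
    """Render an HTML string to out_pdf with weasyprint."""
    import weasyprint  # imported lazily so the HTML step works without it

    weasyprint.HTML(string=document).write_pdf(out_pdf)


# Hypothetical demo rows standing in for the iris data set.
df = pd.DataFrame({"species": ["setosa", "virginica"], "sepal_length": [5.1, 6.3]})
document = build_html_document(df, "Iris Data")

# Uncomment once weasyprint is installed:
# write_pdf(document, "/tmp/demo.pdf")
```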