Reading a PDF file using iText5 for .NET

Question

I am using C# as my programming platform and iTextSharp to read PDF content. I used the code below to read the content, but it seems to read it page by page.

    using System;
    using System.Text;
    using System.Windows.Forms;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    public string ReadPdfFile(object Filename)
    {
        string strText = string.Empty;
        try
        {
            PdfReader reader = new PdfReader((string)Filename);
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Extract the text of one page at a time.
                ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
                String s = PdfTextExtractor.GetTextFromPage(reader, page, its);

                // Round-trip the text through the system default encoding into UTF-8.
                s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
                strText = strText + s;
            }
            reader.Close();
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        return strText;
    }

Can anyone help me write code that reads the PDF content line by line?

Jonathan · Accepted Answer

Try this. Use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy; it adds newline characters to the returned text. You can then use strText.Split('\n') to split the text into a string[] and consume it line by line.
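A minimal sketch of that suggestion, assuming iTextSharp 5.x is referenced. The class name PdfLineReader and the helpers SplitLines and ReadPdfLines are hypothetical names chosen for illustration, not part of iTextSharp. The encoding round trip from the question is left out here on the assumption that the string returned by GetTextFromPage can be used directly.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfLineReader
{
    // Hypothetical helper: split extracted text on line breaks,
    // dropping empty entries so blank lines are not returned.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Hypothetical helper: collect the lines of every page in order.
    public static List<string> ReadPdfLines(string path)
    {
        var lines = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // Use a fresh strategy instance for each page.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
                lines.AddRange(SplitLines(pageText));
            }
        }
        finally
        {
            // Release the file even if extraction throws.
            reader.Close();
        }
        return lines;
    }
}
```

Calling PdfLineReader.ReadPdfLines with a file path should then give you one list entry per extracted line. How well the lines match the visual layout depends on how the PDF positions its text, so treat the result as a starting point.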