python-docx: replace a string in a paragraph while keeping its style

Question

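I need help replacing a string in a Word document while keeping the formatting of the entire document.

I'm using python-docx. After reading the documentation, I found that it works on whole paragraphs, so I lose formatting such as bold or italic words. The text to be replaced is itself in bold, and I would like to keep it that way. This is the code I'm using: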

    from docx import Document


    def replace_string2(filename):
        doc = Document(filename)
        for p in doc.paragraphs:
            if 'Text to find and replace' in p.text:
                print('SEARCH FOUND!!')
                text = p.text.replace('Text to find and replace', 'new text')
                style = p.style
                # Assigning p.text rebuilds the paragraph as a single run,
                # so run-level formatting (bold, italic) is lost here.
                p.text = text
                p.style = style
        # doc.save(filename)
        doc.save('test.docx')
        return 1

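So if I run this on something like the following, the paragraph containing the string to be replaced loses its formatting:

This is paragraph 1, and this is text in bold.

This is paragraph 2, and I will replace old text

The current result is:

This is paragraph 1, and this is text in bold.

This is paragraph 2, and I will replace new text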

Alo · Accepted Answer

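I posted this question (even though there are a few similar ones here) because, as far as I know, none of them solved the problem. One of them used the oodocx library; I tried it, but it didn't work. So I found a workaround.

The code is very similar, but the logic is this: when I find a paragraph that contains the string I want to replace, I add another loop over its runs. (This only works if the string I want to replace has the same formatting throughout.)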

    from docx import Document


    def replace_string(filename):
        doc = Document(filename)
        for p in doc.paragraphs:
            if 'old text' in p.text:
                inline = p.runs
                # Loop added to work with runs (strings with same style)
                for i in range(len(inline)):
                    if 'old text' in inline[i].text:
                        text = inline[i].text.replace('old text', 'new text')
                        inline[i].text = text
                print(p.text)
        doc.save('dest1.docx')
        return 1
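The limitation of the per-run loop can be reproduced without a Word file. What follows is a minimal sketch and not part of the original answer: `FakeRun` is a hypothetical stand-in for a python-docx run (any object with a writable `text` attribute), and `replace_in_runs` is an illustrative name for the same loop.

```python
from dataclasses import dataclass


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx run: just a writable .text."""
    text: str


def replace_in_runs(runs, old, new):
    """Replace `old` with `new` inside each run; return how many runs changed."""
    changed = 0
    for run in runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


# The search text sits entirely inside one run: the replacement works.
single = [FakeRun("I will replace "), FakeRun("old text")]
hits_single = replace_in_runs(single, "old text", "new text")

# The same text split over two runs: no single run contains it.
split = [FakeRun("I will replace "), FakeRun("old "), FakeRun("text")]
hits_split = replace_in_runs(split, "old text", "new text")

print(hits_single, [r.text for r in single])
print(hits_split, [r.text for r in split])
```

In the second case the paragraph text as a whole contains the search string, but no individual run does, so nothing is replaced.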

adejones · Answer

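This is what works for me to retain the text style when replacing text.

Based on Alo's answer and the fact that the search text can be split over several runs, here is what worked for me to replace placeholder text in a template docx file. It checks all of the document's paragraphs and any table cell contents for the placeholders.

Once the search text is found in a paragraph, it loops through the runs to identify which ones contain part of the search text. It then inserts the replacement text in the first of those runs and blanks out the remaining search-text characters in the others.

I hope this helps someone. Here's the gist if anyone wants to improve it.

Edit: I have since discovered python-docx-template, which allows jinja2-style templating within a docx template. Here is a link to the documentation.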


    import docx


    def docx_replace(doc, data):
        # Collect body paragraphs plus every paragraph inside table cells.
        paragraphs = list(doc.paragraphs)
        for t in doc.tables:
            for row in t.rows:
                for cell in row.cells:
                    for paragraph in cell.paragraphs:
                        paragraphs.append(paragraph)
        for p in paragraphs:
            for key, val in data.items():
                key_name = '${{{}}}'.format(key)  # I'm using placeholders in the form ${PlaceholderName}
                if key_name in p.text:
                    inline = p.runs
                    # Replace strings and retain the same style.
                    # The text to be replaced can be split over several runs so
                    # search through, identify which runs need to have text replaced
                    # then replace the text in those identified
                    started = False
                    key_index = 0
                    # found_runs is a list of (inline index, index of match, length of match)
                    found_runs = list()
                    found_all = False
                    replace_done = False
                    for i in range(len(inline)):

                        # case 1: found in single run so short circuit the replace
                        if key_name in inline[i].text and not started:
                            found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                            text = inline[i].text.replace(key_name, str(val))
                            inline[i].text = text
                            replace_done = True
                            found_all = True
                            break

                        if key_name[key_index] not in inline[i].text and not started:
                            # keep looking ...
                            continue

                        # case 2: search for partial text, find first run
                        if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                            # check sequence
                            start_index = inline[i].text.find(key_name[key_index])
                            check_length = len(inline[i].text)
                            for text_index in range(start_index, check_length):
                                if inline[i].text[text_index] != key_name[key_index]:
                                    # no match so must be false positive
                                    break
                            if key_index == 0:
                                started = True
                            chars_found = check_length - start_index
                            key_index += chars_found
                            found_runs.append((i, start_index, chars_found))
                            if key_index != len(key_name):
                                continue
                            else:
                                # found all chars in key_name
                                found_all = True
                                break

                        # case 2: search for partial text, find subsequent run
                        if key_name[key_index] in inline[i].text and started and not found_all:
                            # check sequence
                            chars_found = 0
                            check_length = len(inline[i].text)
                            for text_index in range(0, check_length):
                                # stop once the whole placeholder has been matched
                                if key_index < len(key_name) and inline[i].text[text_index] == key_name[key_index]:
                                    key_index += 1
                                    chars_found += 1
                                else:
                                    break
                            # no match so must be end
                            found_runs.append((i, 0, chars_found))
                            if key_index == len(key_name):
                                found_all = True
                                break

                    if found_all and not replace_done:
                        for i, item in enumerate(found_runs):
                            index, start, length = item
                            if i == 0:
                                # first matching run receives the replacement text
                                text = inline[index].text.replace(inline[index].text[start:start + length], str(val))
                                inline[index].text = text
                            else:
                                # later runs just lose their part of the placeholder
                                text = inline[index].text.replace(inline[index].text[start:start + length], '')
                                inline[index].text = text
                    # print(p.text)


    # usage
    doc = docx.Document('path/to/template.docx')
    docx_replace(doc, dict(ItemOne='replacement text', ItemTwo="Some replacement text\nand some more"))
    doc.save('path/to/destination.docx')
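The same idea (put the replacement in the first run that holds part of the search text, then blank the matched characters in the following runs) can also be sketched with character offsets instead of a character-by-character scan. This is only an illustrative alternative, not the code from the gist: `replace_across_runs` and `FakeRun` are hypothetical names, and the function assumes only that each run has a writable `text` attribute, as python-docx runs do.

```python
from dataclasses import dataclass


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx run: just a writable .text."""
    text: str


def replace_across_runs(runs, old, new):
    """Replace the first occurrence of `old` in the joined run text.

    The replacement goes into the run where the match starts, so it takes
    that run's formatting; matched characters in later runs are removed.
    Returns True if a replacement was made.
    """
    if not old:
        return False
    full = "".join(run.text for run in runs)
    start = full.find(old)
    if start == -1:
        return False
    end = start + len(old)

    offset = 0  # position of the current run's first character in `full`
    inserted = False
    for run in runs:
        run_start, run_end = offset, offset + len(run.text)
        offset = run_end
        if run_end <= start or run_start >= end:
            continue  # this run does not overlap the match
        lo = max(start, run_start) - run_start
        hi = min(end, run_end) - run_start
        middle = "" if inserted else new
        inserted = True
        run.text = run.text[:lo] + middle + run.text[hi:]
    return True


# A placeholder split over two runs, as Word often stores it.
runs = [FakeRun("Dear "), FakeRun("${Na"), FakeRun("me}"), FakeRun(", welcome")]
replaced = replace_across_runs(runs, "${Name}", "Ahmed")
print(replaced, [r.text for r in runs])
```

With python-docx, one would presumably pass `paragraph.runs` in place of the `FakeRun` list. Only the first occurrence is handled per call, so a caller wanting every occurrence replaced would call it in a loop until it returns False.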

zain · Answer

    from docx import Document

    document = Document('old.docx')
    dic = {'name': 'ahmed', 'me': 'zain'}
    for p in document.paragraphs:
        inline = p.runs
        for i in range(len(inline)):
            text = inline[i].text
            # Only runs whose entire text equals a key are replaced.
            if text in dic:
                inline[i].text = dic[text]
    document.save('new.docx')