How to match a newline character in a Python raw string

Question

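I got a little confused about Python raw strings. I know that a raw string treats '\' as a normal backslash (for example, r'\n' is the two characters '\' and 'n'). However, I was wondering what to do if I want to match a newline character in a raw string. I tried r'\n', but it didn't work.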

mgilson · Accepted Answer

In a regular expression, you need to specify that you're in multiline mode:

>>> import re
>>> s = """cat
... dog"""
>>>
>>> re.match(r'cat\ndog',s,re.M)
<_sre.SRE_Match object at 0xcb7c8>

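Note that re translates the \n (in the raw string) into a newline. As you indicated in your comments, you don't actually need re.M for the pattern to match, but it does help $ and ^ match more intuitively: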

>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog'
>>> re.match(r'^cat$\ndog',s).group(0)  #doesn't match
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r'^cat$\ndog',s,re.M).group(0)  #matches.
'cat\ndog'
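
As a rough sketch of the point above, the following script checks that a raw literal holds a backslash and the letter n, that the re module nevertheless reads that pair as a newline, and that re.M only changes how ^ and $ behave. The variable names are illustrative and not from the original answer.

```python
import re

s = "cat\ndog"  # a real newline between the two words

# The raw literal holds two characters: a backslash and the letter n.
raw_len = len(r'\n')
# The normal literal holds one character: the newline itself.
plain_len = len('\n')

# The re module interprets the backslash-n pair itself, so both patterns match.
raw_match = re.match(r'cat\ndog', s)
plain_match = re.match('cat\ndog', s)

# Without re.M, $ matches only at the end of the string
# (or just before a trailing newline), so this attempt fails.
no_flag = re.match(r'^cat$\ndog', s)

# With re.M, $ also matches just before each newline.
with_flag = re.match(r'^cat$\ndog', s, re.M)

print(raw_len, plain_len)
print(raw_match is not None, plain_match is not None)
print(no_flag, with_flag)
```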

Gareth Latty · Answer

The simplest answer is to simply not use a raw string. You can escape backslashes by using \\.

If some segments contain a large number of backslashes, you can concatenate raw strings and normal strings as needed:

r"some string \ with \ backslashes" "\n"

(Python automatically concatenates string literals that have only whitespace between them.)
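
A small sketch of both suggestions, using made-up variable names: the first part compares an escaped backslash with its raw-string equivalent, and the second joins a raw literal to a normal literal that supplies the real newline.

```python
# Escaping the backslash in a normal literal gives the same text as a raw literal.
escaped = "\\n"   # backslash followed by the letter n
raw = r"\n"       # the same two characters

# Adjacent literals are joined into one string: the raw part keeps its
# backslashes, and the normal part contributes an actual newline character.
joined = r"some string \ with \ backslashes" "\n"

print(escaped == raw)
print(repr(joined))
```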

Note that if you are working with paths on Windows, the easiest option is to just use forward slashes; they will still work fine.

Rajat Subhra Bhowmick · Answer

def clean_with_puncutation(text):
    import re
    from string import punctuation

    # Map each punctuation character, plus a few multi-character markers, to a token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'

    # Always put the multi-character sequences at the front to avoid overlapping results.
    # Inside the raw string, \n is interpreted by the re module as a newline.
    regex = r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#\$\%\^\&\*\(\)\[\]\{\}\;\:\,\./\?\|\`\_\+\\\=\~\-\<\>]"
    #text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0
    for match in re.finditer(regex, text):
        #print(match.group())
        #print(punctuation_token[match.group()])
        #print ("Match at index: %s, %s" % (match.start(), match.end()))
        text_ = text_ + text[index:match.start()] + " " + punctuation_token[match.group()] + " "
        index = match.end()
    # Keep whatever follows the last match instead of dropping it.
    return text_ + text[index:]

Mohammad Hossein zare mehrjard · Answer

You can also use [\r\n] to match a new line.
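
A minimal sketch of that character class, assuming the goal is to split text that mixes Windows, Unix, and old Mac line endings. The sample text and variable names are made up for illustration.

```python
import re

# Sample text mixing \r\n, \n and a lone \r as line endings.
text = "first\r\nsecond\nthird\rfourth"

# [\r\n]+ treats any run of carriage returns and line feeds as one break.
# Because of the +, consecutive blank lines are collapsed as well.
lines = re.split(r'[\r\n]+', text)

# The built-in str.splitlines() handles the same endings without a regex.
builtin_lines = text.splitlines()

print(lines)
print(builtin_lines)
```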