How do I split a string into a list?

Question

Given this string:

2+24*48/32

what is the most efficient approach for creating this list:

['2', '+', '24', '*', '48', '/', '32']

Glyph · Answer

It just so happens that the tokens you want to split on are already Python tokens, so you can use the built-in tokenize module. It's almost a one-liner:

from io import StringIO
from tokenize import generate_tokens

STRING = 1  # index of the token text within each token tuple
list(token[STRING]
     for token in generate_tokens(StringIO('2+24*48/32').readline)
     if token[STRING])
# ['2', '+', '24', '*', '48', '/', '32']
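As a sketch of the same idea in a more explicit form (Python 3 assumed): each token yielded by generate_tokens is a named tuple, so the text and the type can be read by attribute rather than by index. The helper name split_tokens is made up for illustration and is not part of the answer above.

```python
from io import StringIO
from tokenize import generate_tokens, tok_name

def split_tokens(source):
    """Return (type_name, text) pairs for every non-empty token in source."""
    return [(tok_name[tok.type], tok.string)
            for tok in generate_tokens(StringIO(source).readline)
            if tok.string]  # skips the empty NEWLINE/ENDMARKER tokens

pairs = split_tokens('2+24*48/32')
print(pairs[:2])                     # [('NUMBER', '2'), ('OP', '+')]
print([text for _, text in pairs])   # ['2', '+', '24', '*', '48', '/', '32']
```

Because the tokenizer skips whitespace between tokens, '2 + 24' should produce the same kind of list as '2+24'.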

Readonly · Answer

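You can use split from the re module.

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting list.

Example code:

import re
data = re.split(r'(\D)', '2+24*48/32')

\D

\D matches any non-digit character. When the UNICODE flag is not specified (Python 2) or the ASCII flag is set (Python 3), this is equivalent to the set [^0-9].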
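To make the effect of the capturing parentheses concrete, here is a small sketch comparing the same split with and without the group. The variable names are only for illustration.

```python
import re

expr = '2+24*48/32'

# With a capturing group, the separators are kept in the result.
with_group = re.split(r'(\D)', expr)

# Without the group, the separators are dropped.
without_group = re.split(r'\D', expr)

print(with_group)     # ['2', '+', '24', '*', '48', '/', '32']
print(without_group)  # ['2', '24', '48', '32']
```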

molasses · Answer

>>> import re
>>> re.findall(r'\d+|\D+', '2+24*48/32=10')
['2', '+', '24', '*', '48', '/', '32', '=', '10']

This matches runs of consecutive digits or runs of consecutive non-digits.

Each match is returned as a new element in the list.

Depending on the usage, you may need to alter the regular expression, for example if you need to match numbers with a decimal point.

>>> re.findall(r'[0-9\.]+|[^0-9\.]+', '2+24*48/32=10.1')
['2', '+', '24', '*', '48', '/', '32', '=', '10.1']
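One thing to keep in mind is that \D+ returns every run of consecutive non-digits as a single element, so adjacent operators or parentheses end up merged. The sketch below shows that, along with one possible alternative pattern (my own assumption, not from the answer above) that emits each non-digit, non-space character separately.

```python
import re

# Runs of non-digits come back as one element: note the '*(' below.
grouped = re.findall(r'\d+|\D+', '2*(3+4)')
print(grouped)   # ['2', '*(', '3', '+', '4', ')']

# Hypothetical alternative: a decimal number, an integer,
# or any single character that is neither a digit nor whitespace.
TOKEN = re.compile(r'\d+\.\d+|\d+|[^\d\s]')
separate = TOKEN.findall('2*(3+4.5)')
print(separate)  # ['2', '*', '(', '3', '+', '4.5', ')']
```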

Jerub · Answer

This looks like a parsing problem, so I am compelled to present a solution based on parsing techniques.

While it may seem that you want to "split" this string, I think what you actually want to do is "tokenize" it. Tokenization, or lexing, is the compilation step before parsing. I have amended my original example in an edit to implement a proper recursive descent parser here. This is the easiest way to implement a parser by hand.

import re

# (token type, pattern) pairs, tried in order at the current position
patterns = [
    ('number', re.compile(r'\d+')),
    ('*', re.compile(r'\*')),
    ('/', re.compile(r'/')),
    ('+', re.compile(r'\+')),
    ('-', re.compile(r'-')),
]
whitespace = re.compile(r'\s+')

def tokenize(string):
    while string:
        # strip off whitespace
        m = whitespace.match(string)
        if m:
            string = string[m.end():]
            continue
        for tokentype, pattern in patterns:
            m = pattern.match(string)
            if m:
                yield tokentype, m.group(0)
                string = string[m.end():]
                break
        else:
            # no pattern matched: fail instead of looping forever
            raise ValueError("Tokenize Error, unexpected %r" % string[0])

def parseNumber(tokens):
    tokentype, literal = tokens.pop(0)
    assert tokentype == 'number'
    return int(literal)

def parseMultiplication(tokens):
    product = parseNumber(tokens)
    while tokens and tokens[0][0] in ('*', '/'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '*':
            product *= parseNumber(tokens)
        elif tokentype == '/':
            product /= parseNumber(tokens)  # true division, yields a float
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))
    return product

def parseAddition(tokens):
    total = parseMultiplication(tokens)
    while tokens and tokens[0][0] in ('+', '-'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '+':
            total += parseMultiplication(tokens)
        elif tokentype == '-':
            total -= parseMultiplication(tokens)
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))
    return total

def parse(tokens):
    tokenlist = list(tokens)
    returnvalue = parseAddition(tokenlist)
    if tokenlist:
        print('Unconsumed data', tokenlist)
    return returnvalue

def main():
    string = '2+24*48/32'
    for tokentype, literal in tokenize(string):
        print(tokentype, literal)
    print(parse(tokenize(string)))

if __name__ == '__main__':
    main()

Implementing the handling of brackets is left as an exercise for the reader. This example correctly performs multiplication before addition.
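For anyone who wants a starting point on that exercise, here is one way bracket handling could be sketched, under my own assumptions. It is a self-contained variant rather than a patch to the code above: tokens are plain strings, all function names are hypothetical, and division is true division. The only new rule is parse_atom, which on seeing '(' parses a full sub-expression and then requires a matching ')'.

```python
import re

TOKEN = re.compile(r'\s*(\d+|[-+*/()])')

def tokenize(text):
    """Split text into number and operator strings, skipping whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError('unexpected input at position %d' % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_atom(tokens):
    if not tokens:
        raise ValueError('unexpected end of input')
    tok = tokens.pop(0)
    if tok == '(':
        # A bracketed sub-expression restarts at the lowest-precedence rule.
        value = parse_addition(tokens)
        if not tokens or tokens.pop(0) != ')':
            raise ValueError("expected ')'")
        return value
    if not tok.isdigit():
        raise ValueError('expected a number, got %r' % tok)
    return int(tok)

def parse_multiplication(tokens):
    product = parse_atom(tokens)
    while tokens and tokens[0] in ('*', '/'):
        if tokens.pop(0) == '*':
            product *= parse_atom(tokens)
        else:
            product /= parse_atom(tokens)
    return product

def parse_addition(tokens):
    total = parse_multiplication(tokens)
    while tokens and tokens[0] in ('+', '-'):
        if tokens.pop(0) == '+':
            total += parse_multiplication(tokens)
        else:
            total -= parse_multiplication(tokens)
    return total

def parse(text):
    tokens = tokenize(text)
    value = parse_addition(tokens)
    if tokens:
        raise ValueError('unconsumed tokens: %r' % tokens)
    return value

print(parse('2+24*48/32'))     # 38.0
print(parse('(2+24)*48/32'))   # 39.0
```

The grammar has three levels (addition, multiplication, atom), and each level only calls the one below it, except that a bracket jumps back to the top. That recursion is what lets nesting work to any depth.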

Ber · Answer

This is a parsing problem, so neither regex nor split() is the "good" solution. Use a parser generator instead.

I would look closely at pyparsing. There have also been some decent articles about pyparsing in Python Magazine.

Jiayao Yu · Answer

import re

s = "2+24*48/32"

p = re.compile(r'(\W+)')

p.split(s)

Cristian · Answer

Regular expressions:

>>> import re
>>> splitter = re.compile(r'([+*/])')
>>> splitter.split("2+24*48/32")

You can expand the regular expression to include any other characters you want to split on.
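As a sketch of what that expansion might look like, the character class below also includes '-', '%', '(' and ')'. Which characters to add is my own assumption. When two separators are adjacent, or one sits at the very start, re.split returns empty strings between them, so they are filtered out.

```python
import re

# '-' is placed first in the class so it is taken literally, not as a range.
splitter = re.compile(r'([-+*/%()])')

parts = splitter.split('(2+24)*48%32')
tokens = [p for p in parts if p]  # drop the empty strings

print(parts)   # ['', '(', '2', '+', '24', ')', '', '*', '48', '%', '32']
print(tokens)  # ['(', '2', '+', '24', ')', '*', '48', '%', '32']
```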

habnabit · Answer

Another solution to this would be to avoid writing a calculator like that altogether. Writing an RPN parser is much simpler, and it has none of the ambiguity inherent in writing math with infix notation.

import math
import operator

# token -> (number of operands to pop, function applied to them)
calc_operands = {
    '+': (2, operator.add),
    '-': (2, operator.sub),
    '*': (2, operator.mul),
    '/': (2, operator.truediv),
    '//': (2, operator.floordiv),
    '%': (2, operator.mod),
    '^': (2, operator.pow),
    '**': (2, math.pow),
    'abs': (1, operator.abs),
    'ceil': (1, math.ceil),
    'floor': (1, math.floor),
    'round': (2, round),
    'trunc': (1, int),
    'log': (2, math.log),
    'ln': (1, math.log),
    'pi': (0, lambda: math.pi),
    'e': (0, lambda: math.e),
}

def calculate(inp):
    stack = []
    for tok in inp.split():
        if tok in calc_operands:
            n_pops, func = calc_operands[tok]
            # operands come off the stack in reverse order
            args = [stack.pop() for _ in range(n_pops)]
            args.reverse()
            stack.append(func(*args))
        elif '.' in tok:
            stack.append(float(tok))
        else:
            stack.append(int(tok))
    if not stack:
        raise ValueError('no items on the stack.')
    result = stack.pop()
    if stack:
        raise ValueError('%d item(s) left on the stack.' % len(stack))
    return result

calculate('24 38 * 32 / 2 +')
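To make the stack discipline concrete, here is a reduced sketch that handles only the four basic operators and records the stack after each token. The name rpn_trace and the cut-down operator table are assumptions for illustration. The input is the question's expression 2+24*48/32 rewritten in postfix form.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def rpn_trace(expression):
    """Evaluate a space-separated RPN expression.

    Returns (result, snapshots), where snapshots holds a copy of the
    stack taken after each token was processed.
    """
    stack, snapshots = [], []
    for tok in expression.split():
        if tok in OPS:
            right = stack.pop()  # the operand pushed last is the right-hand one
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok) if '.' in tok else int(tok))
        snapshots.append(list(stack))
    if len(stack) != 1:
        raise ValueError('expected exactly one value, got %d' % len(stack))
    return stack[0], snapshots

result, steps = rpn_trace('2 24 48 * 32 / +')
for step in steps:
    print(step)
print(result)  # 38.0
```

No precedence table or bracket handling is needed: the order of the tokens already encodes the order of evaluation.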

jbchichoko · Answer

>>> import re
>>> my_string = "2+24*48/32"
>>> my_list = re.findall(r"-?\d+|\S", my_string)
>>> print(my_list)
['2', '+', '24', '*', '48', '/', '32']

This will do the trick. I have encountered this kind of problem before.
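A caveat worth checking for your input: the optional -? attaches to the digits that follow it, so a binary minus gets absorbed into the number. The short sketch below shows this, with a variant without -? for comparison. The variable names are made up.

```python
import re

# The optional minus binds to the following digits,
# so the subtraction operator disappears into '-3'.
absorbed = re.findall(r"-?\d+|\S", "5-3")
print(absorbed)  # ['5', '-3']

# Without "-?", every operator stays a separate element.
separate = re.findall(r"\d+|\S", "5-3")
print(separate)  # ['5', '-', '3']
```

Telling a unary minus from a binary one generally needs context (what the previous token was), which a single regular expression pass like this does not track.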

Diamond Python · Answer

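This doesn't answer the question exactly, but I believe it solves what you're trying to achieve. I would add it as a comment, but I don't have permission to do so yet.

Personally, I would take advantage of Python's math functionality directly with exec:

expression = "2+24*48/32"
exec("result = " + expression)
print(result)
38.0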
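If the expression can come from somewhere you don't control, one possible alternative that still leans on Python's own parser is to parse the string with the ast module and evaluate only arithmetic nodes. This is a sketch under my own assumptions, not part of the answer above: the function name evaluate and the set of allowed operators are illustrative choices.

```python
import ast
import operator

# Only these operator node types are evaluated; anything else is rejected.
BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv}
UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def evaluate(expression):
    """Evaluate +, -, *, / over numeric literals using Python's parser."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY:
            return UNARY[type(node.op)](walk(node.operand))
        raise ValueError('unsupported expression: %s' % type(node).__name__)
    return walk(ast.parse(expression, mode='eval'))

print(evaluate("2+24*48/32"))  # 38.0
```

Precedence and brackets come for free here, since ast.parse has already built the tree. Names, calls and attribute access fall through to the ValueError.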