How can I split a string containing a mathematical expression in Python?

Question

I wrote a Python program that converts infix expressions to postfix. The problem is the input. If I enter something like this (it will be a string):

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

I split it with .split() and the program works correctly. But I want the user to be able to enter something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see, I want the whitespace to be optional, while the program still splits the string into parentheses, integers (not single digits), and operators.

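I tried to solve it with a for loop, but I don't know how to capture the integers (73, 34, 72) as whole numbers instead of one digit at a time (7, 3, 3, 4, 7, 2).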

To summarize, I want to split a string like ((81 * 6) /42+ (3-1)) into this:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]

Eric Duminil · Accepted Answer

Tree with `ast`

You can use `ast` to get the tree of the expression:

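```python
import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source)

def show_children(node, level=0):
    # Number literals are ast.Constant (ast.Num is deprecated since Python 3.8)
    if isinstance(node, ast.Constant):
        print(' ' * level + str(node.value))
    else:
        print(' ' * level + str(node))
    # Recurse into child nodes, indenting one space per level
    for child in ast.iter_child_nodes(node):
        show_children(child, level + 1)

show_children(node)
```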

Output:

```
<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1
```

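As @user2357112 pointed out in a comment: `ast.parse` parses Python syntax, not mathematical expressions. `(1+2)(3+4)` would be parsed as a function call, and list comprehensions would be accepted even though they should probably not be considered valid mathematical expressions.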
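A minimal sketch of one way to act on this caveat, assuming you only want non-negative integer literals combined with `+ - * /`: parse with `mode='eval'` and walk the tree with `ast.walk`, rejecting every node type outside a whitelist. `ALLOWED` and `parse_arithmetic` are hypothetical names, and this particular whitelist also rejects unary minus (`-3`) and `**`; you would have to add `ast.UnaryOp`/`ast.USub` or `ast.Pow` if you need them.

```python
import ast

# Hypothetical whitelist: the only node types accepted as "arithmetic".
ALLOWED = (ast.Expression, ast.BinOp, ast.Constant,
           ast.Add, ast.Sub, ast.Mult, ast.Div)


def parse_arithmetic(source):
    """Parse source with ast, then reject anything that is not integer arithmetic."""
    tree = ast.parse(source, mode='eval')  # raises SyntaxError on invalid Python
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise SyntaxError('not arithmetic: %s' % type(node).__name__)
        # bool is a subclass of int, so compare the exact type
        if isinstance(node, ast.Constant) and type(node.value) is not int:
            raise SyntaxError('only integer literals are allowed')
    return tree


parse_arithmetic('((81 * 6) /42+ (3-1))')  # accepted
for bad in ['(1+2)(3+4)', '[x for x in range(3)]']:
    try:
        parse_arithmetic(bad)
    except SyntaxError as exc:
        print(bad, '->', exc)
```

The whitelist approach keeps Python's parser (and its precedence rules) but closes the door on calls, comprehensions, names and other non-arithmetic syntax.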

List with a regular expression

If you want a flat structure, a regex works:

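```python
import re

source = '((81 * 6) /42+ (3-1))'
# A run of digits, or any single character that is neither a space nor a digit
number_or_symbol = re.compile(r'(\d+|[^ 0-9])')
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
```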

It looks for either:

a run of one or more digits
or a single character that is neither a digit nor a space

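Once you have the list of elements, you can check whether the syntax is correct. For example, you can use a stack to check whether the parentheses match, or check whether every element is a known one.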
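A minimal sketch of both checks, assuming the tokens come from the `number_or_symbol` pattern of this answer and that the only known symbols are `( ) + - * /`. `check_tokens` and `KNOWN_SYMBOLS` are hypothetical names. The stack holds the positions of the `(` tokens that are still open, so an error can point at the offending token.

```python
import re

KNOWN_SYMBOLS = {'(', ')', '+', '-', '*', '/'}


def check_tokens(tokens):
    """Raise SyntaxError on an unknown element or unbalanced parentheses."""
    stack = []  # indices of '(' tokens that have not been closed yet
    for i, tok in enumerate(tokens):
        if tok == '(':
            stack.append(i)
        elif tok == ')':
            if not stack:
                raise SyntaxError("unmatched ')' at token %d" % i)
            stack.pop()
        elif tok not in KNOWN_SYMBOLS and not re.fullmatch(r'[0-9]+', tok):
            raise SyntaxError('unknown element %r at token %d' % (tok, i))
    if stack:
        raise SyntaxError("unmatched '(' at token %d" % stack[-1])


number_or_symbol = re.compile(r'(\d+|[^ 0-9])')
tokens = number_or_symbol.findall('((81 * 6) /42+ (3-1))')
check_tokens(tokens)  # passes silently
```

Note that this only checks parentheses and the vocabulary; something like `1 + * 2` still passes and would be caught later by the parser itself.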

Horia Coman · Answer

You need to implement a very simple tokenizer for your input. You have the following kinds of tokens:

(
)
+
-
*
/
\d+

In the input string, they can appear separated by any amount of whitespace.

So the first step is to process the string from start to finish, extract these tokens, and then do your parsing on the tokens rather than on the string itself.

A handy way to do this is with the following regular expression: '\s*([()+*/-]|\d+)'. Then you can do:

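```python
import re

the_input = '(3+(2*5))'
tokens = []
# Optional leading whitespace, then one token: a parenthesis/operator or a run of digits
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
# Stop at the last non-whitespace character so trailing spaces are not a syntax error
end = len(the_input.rstrip())
while current_pos < end:
    match = tokenizer.match(the_input, current_pos)
    if match is None:
        raise SyntaxError('Syntax error at position %d' % current_pos)
    tokens.append(match.group(1))
    current_pos = match.end()
print(tokens)
```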

This prints ['(', '3', '+', '(', '2', '*', '5', ')', ')'].

You could also use `re.findall` or `re.finditer`, but they would skip over the parts that don't match, and in this case those are syntax errors.
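To make the difference concrete, here is a small sketch (the function name `tokenize_strict` is hypothetical) that wraps the `match` loop of this answer in a function and compares it with `findall` on an input containing an invalid character. `findall` searches for the next match anywhere in the string, so the stray `x` simply vanishes from its result, while the anchored `match` loop stops right at it.

```python
import re

TOKEN = re.compile(r'\s*([()+*/-]|\d+)')


def tokenize_strict(text):
    """Match tokens back to back; fail on the first character that is not a token."""
    tokens = []
    pos = 0
    end = len(text.rstrip())  # ignore trailing whitespace
    while pos < end:
        m = TOKEN.match(text, pos)  # anchored at pos, unlike findall/search
        if m is None:
            raise SyntaxError('unexpected input at position %d' % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


# findall skips the 'x' without complaint
print(TOKEN.findall('3 + x * 2'))

# the strict loop reports it instead
try:
    tokenize_strict('3 + x * 2')
except SyntaxError as exc:
    print(exc)
```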

Christian Dean · Answer

Hand-rolling a simple expression tokenizer is actually fairly easy, and I think you'll learn more that way too.

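So, for the sake of teaching and learning, here is an implementation of a simple, extensible expression tokenizer. It works on the "maximal-munch" rule, which means it behaves "greedily", consuming as many characters as possible to build each token.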

Without further ado, here is the tokenizer:

```python
class ExpressionTokenizer:
    def __init__(self, expression, operators):
        self.buffer = expression
        self.pos = 0
        self.operators = operators

    def _next_token(self):
        atom = self._get_atom()
        # Skip any whitespace before the next token
        while atom and atom.isspace():
            self._skip_whitespace()
            atom = self._get_atom()
        if atom is None:
            return None
        elif atom.isdigit():
            return self._tokenize_number()
        elif atom in self.operators:
            return self._tokenize_operator()
        else:
            raise SyntaxError('Unexpected character %r at position %d' % (atom, self.pos))

    def _skip_whitespace(self):
        while self._get_atom():
            if self._get_atom().isspace():
                self.pos += 1
            else:
                break

    def _tokenize_number(self):
        # Maximal munch: keep consuming digits for as long as there are any
        endpos = self.pos + 1
        while self._get_atom(endpos) and self._get_atom(endpos).isdigit():
            endpos += 1
        number = self.buffer[self.pos:endpos]
        self.pos = endpos
        return number

    def _tokenize_operator(self):
        operator = self.buffer[self.pos]
        self.pos += 1
        return operator

    def _get_atom(self, pos=None):
        # Character at pos (default: current position), or None past the end
        pos = self.pos if pos is None else pos
        try:
            return self.buffer[pos]
        except IndexError:
            return None

    def tokenize(self):
        while True:
            token = self._next_token()
            if token is None:
                break
            yield token
```

Here is a usage demo:

```python
tokenizer = ExpressionTokenizer('((81 * 6) /42+ (3-1))', {'+', '-', '*', '/', '(', ')'})
for token in tokenizer.tokenize():
    print(token)
```

This produces the output:

```
(
(
81
*
6
)
/
42
+
(
3
-
1
)
)
```

Jingjie YANG · Answer

Quick regex answer: re.findall(r"\d+|[()+\-*\/]", str_in)

Demonstration:

```
>>> import re
>>> str_in = "((81 * 6) /42+ (3-1))"
>>> re.findall(r"\d+|[()+\-*\/]", str_in)
['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
```

For the nested parentheses, you can use a stack to keep track of the nesting level.
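A minimal sketch of that idea, assuming the flat token list produced by the regex of this answer: each `(` pushes a new sub-list onto the stack and each `)` pops it, so the result nests exactly as the parentheses do. The function name `nest` is hypothetical, and the sketch only groups tokens; it does not apply operator precedence.

```python
import re


def nest(tokens):
    """Turn a flat token list into nested lists, one list per parenthesized group."""
    stack = [[]]  # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == '(':
            group = []
            stack[-1].append(group)  # attach the new group to its parent
            stack.append(group)      # and make it the current group
        elif tok == ')':
            if len(stack) == 1:
                raise SyntaxError("unmatched ')'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise SyntaxError("unmatched '('")
    return stack[0]


tokens = re.findall(r"\d+|[()+\-*\/]", "((81 * 6) /42+ (3-1))")
print(nest(tokens))
```

The depth of the stack at any point is the current nesting level, which is also what you would use to report where an unmatched parenthesis occurs.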

Bill Bell · Answer

This doesn't give exactly the result you want, but it might interest others who come across this question. It uses the pyparsing library.

```python
# Stolen from http://pyparsing.wikispaces.com/file/view/simpleArith.py/30268305/simpleArith.py
# Copyright 2006, by Paul McGuire
# ... and slightly altered
from pyparsing import *

integer = Word(nums).setParseAction(lambda t: int(t[0]))
variable = Word(alphas, exact=1)
operand = integer | variable

expop = Literal('^')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
factop = Literal('!')

# infixNotation replaces the older operatorPrecedence name (removed in pyparsing 3)
expr = infixNotation(
    operand,
    [("!", 1, opAssoc.LEFT),
     ("^", 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT)]
)

print(expr.parseString('((81 * 6) /42+ (3-1))'))
```

Output:

[[[[81, '*', 6], '/', 42], '+', [3, '-', 1]]]

McGrady · Answer

If you don't want to use the re module, you can try this:

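```python
s = "((81 * 6) /42+ (3-1))"
r = [""]
# Note: spaces are removed first, so "3 4" would be merged into the single token "34"
for i in s.replace(" ", ""):
    if i.isdigit() and r[-1].isdigit():
        r[-1] = r[-1] + i  # extend the number being built
    else:
        r.append(i)
print(r[1:])  # drop the initial empty placeholder
```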

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
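The same "merge consecutive digits" idea can also be written with `itertools.groupby` from the standard library, which groups adjacent characters by whether they are digits. This is a sketch with a hypothetical function name, and like the loop above it removes spaces first, so `3 4` would become `34`.

```python
from itertools import groupby


def split_expression(s):
    """Split s into integer tokens and single-character symbol tokens."""
    tokens = []
    for is_digit, group in groupby(s.replace(' ', ''), key=str.isdigit):
        if is_digit:
            tokens.append(''.join(group))  # a run of digits becomes one token
        else:
            tokens.extend(group)           # every other character is its own token
    return tokens


print(split_expression("((81 * 6) /42+ (3-1))"))
```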

Michael Grazebrook · Answer

Using Grako:

```
start = expr $;
expr = calc | value;
calc = value operator value;
value = integer | "(" @:expr ")" ;
operator = "+" | "-" | "*" | "/";
integer = /\d+/;
```

Grako compiles this grammar into a Python parser.

For this example, the return value would look like this:

['73', '+', ['34', '-', '72', '/', ['33', '-', '3']], '+', ['56', '+', ['95', '-', '28']]]

Normally you would use the generated semantics class as a template for further processing.

clintval · Answer

To offer a more verbose regex approach that is easy to extend, you could do the following:

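```python
import re

solution = []
# Capturing group, so re.split keeps the numbers (digits and dots) in its result
pattern = re.compile(r'([\d\.]+)')
s = '((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )'

for token in re.split(pattern, s):
    token = token.strip()
    if re.match(pattern, token):
        solution.append(float(token))
        continue
    # Text between numbers: drop spaces, one token per remaining character
    for character in re.sub(' ', '', token):
        solution.append(character)
```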

This gives you the result:

solution = ['(', '(', 73.0, '+', '(', '(', 34.0, '-', 72.0, ')', '/', '(', 33.0, '-', 3.0, ')', ')', ')', '+', '(', 56.0, '+', '(', 95.0, '-', 28.0, ')', ')', ')']

aslisabanci · Answer

Similar to @McGrady's answer, you can do this with a basic queue implementation. As a very basic implementation, the Queue class could look like this:

```python
class Queue:
    EMPTY_QUEUE_ERR_MSG = "Cannot do this operation on an empty queue."

    def __init__(self):
        self._items = []

    def __len__(self) -> int:
        return len(self._items)

    def is_empty(self) -> bool:
        return len(self) == 0

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        try:
            return self._items.pop(0)
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)

    def peek(self):
        try:
            return self._items[0]
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)
```

Using this simple class, you can implement the parsing function like this:

```python
from typing import List

def tokenize_with_queue(exp: str) -> List:
    queue = Queue()
    cum_digit = ""  # digits of the number currently being read
    for c in exp.replace(" ", ""):
        if c in ["(", ")", "+", "-", "/", "*"]:
            if cum_digit != "":
                queue.enqueue(cum_digit)
                cum_digit = ""
            queue.enqueue(c)
        elif c.isdigit():
            cum_digit += c
        else:
            raise ValueError
    if cum_digit != "":  # one last sweep in case there are any digits waiting
        queue.enqueue(cum_digit)
    return [queue.dequeue() for i in range(len(queue))]
```

Test it as follows:

```python
exp = "((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )"
print(tokenize_with_queue(exp))
```

The token list will look like this:

['(', '(', '73', '+', '(', '(', '34', '-', '72', ')', '/', '(', '33', '-', '3', ')', ')', ')', '+', '(', '56', '+', '(', '95', '-', '28', ')', ')', ')']
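Since `list.pop(0)` has to shift every remaining element, the standard library's `collections.deque` (constant-time `append` and `popleft`) is a natural stand-in for the hand-written `Queue`. Here is a sketch of the same function on top of it; the name `tokenize_with_deque` is hypothetical, and the `ValueError` message is an addition for easier debugging.

```python
from collections import deque
from typing import List


def tokenize_with_deque(exp: str) -> List[str]:
    queue = deque()
    cum_digit = ""  # digits of the number currently being read
    for c in exp.replace(" ", ""):
        if c in "()+-/*":
            if cum_digit:
                queue.append(cum_digit)  # flush the pending number first
                cum_digit = ""
            queue.append(c)
        elif c.isdigit():
            cum_digit += c
        else:
            raise ValueError("unexpected character %r" % c)
    if cum_digit:  # a number may end the expression
        queue.append(cum_digit)
    # Drain in FIFO order, mirroring the dequeue loop of the original answer
    return [queue.popleft() for _ in range(len(queue))]


print(tokenize_with_deque("((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )"))
```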
