How to use the spaCy lemmatizer to get a word into its base form

Question

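I am new to spaCy and I want to use its lemmatizer function, but I don't know how to use it. I would like to pass in a string containing a word and get back a string with the base form of that word.

Examples:

'words' => 'word'
'did' => 'do'

Thank you.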

damio · Answer

The previous answer is convoluted and can't be edited, so here is a more conventional one.

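    # make sure you downloaded the English model with "python -m spacy download en"
    # (on spaCy v3 and later the 'en' shortcut is gone; the model is named e.g. 'en_core_web_sm')
    import spacy

    nlp = spacy.load('en')

    doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")

    # token.lemma is the integer ID of the lemma, token.lemma_ is its string form
    for token in doc:
        print(token, token.lemma, token.lemma_)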

Output:

    Apples 6617 apples
    and 512 and
    oranges 7024 orange
    are 536 be
    similar 1447 similar
    . 453 .
    Boots 4622 boot
    and 512 and
    hippos 98365 hippo
    are 536 be
    n't 538 not
    . 453 .

From the official Lightning tour.
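
To get plain strings back, as the question asks, the loop above can be wrapped in a small helper. This is only a minimal sketch: `lemmatize` is a hypothetical name, not part of spaCy, and it assumes `nlp` is an already loaded pipeline such as the one returned by `spacy.load` above.

```python
def lemmatize(nlp, text):
    """Return the lemma string of every token in `text`.

    `nlp` is assumed to be a loaded spaCy pipeline (or any callable that
    returns tokens exposing a `lemma_` attribute).
    """
    # token.lemma is an integer ID; token.lemma_ is the readable string.
    return [token.lemma_ for token in nlp(text)]


# Usage with spaCy (assumes the English model is installed):
#   import spacy
#   nlp = spacy.load('en')
#   print(lemmatize(nlp, u"did"))
```

Passing `nlp` in as an argument means the model is loaded once and reused, since loading it is the slow part.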

RAVI · Answer

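Code:

    # Written for an early spaCy release (the spacy.en module) and Python 2 (print statement)
    import os
    from spacy.en import English, LOCAL_DATA_DIR

    data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)
    nlp = English(data_dir=data_dir)

    doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")

    # token.lemma is the integer ID of the lemma, token.lemma_ is its string form
    for token in doc3:
        print token, token.lemma, token.lemma_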

Output:

    this 496 this
    is 488 be
    spacy 173779 spacy
    lemmatize 1510965 lemmatize
    testing 2900 testing
    . 419 .
    programming 3408 programming
    books 1011 book
    are 488 be
    more 529 more
    better 615 better
    than 555 than
    others 871 others

Reference example: here

joel · Answer

If you only want to use the Lemmatizer, you can do that in the following way:

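    # spaCy 2.x-era imports
    from spacy.lemmatizer import Lemmatizer
    from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

    # Build a standalone lemmatizer from the English index, exception, and rule tables
    lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)

    # Pass the word together with its coarse part-of-speech tag
    lemmas = lemmatizer(u'ducks', u'NOUN')
    print(lemmas)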

Output:

['duck']
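
The three tables passed to `Lemmatizer` above hint at how a rule-based lemmatizer of this kind works: irregular forms are looked up in an exception table, and otherwise suffix rules are applied and the candidates that appear in an index of known lemmas are kept. The following is a simplified, self-contained sketch of that idea, not spaCy's actual implementation. The tiny tables and the name `toy_lemmatize` are made up for illustration.

```python
# Hypothetical miniature tables, keyed by part-of-speech tag.
INDEX = {"NOUN": {"duck", "book"}, "VERB": {"do", "test"}}
EXCEPTIONS = {"NOUN": {}, "VERB": {"did": ["do"]}}
RULES = {"NOUN": [("s", "")], "VERB": [("ing", ""), ("s", "")]}


def toy_lemmatize(word, pos):
    """Return a list of candidate lemmas for `word` tagged as `pos`."""
    word = word.lower()

    # 1. Irregular forms come straight from the exception table.
    exceptions = EXCEPTIONS.get(pos, {})
    if word in exceptions:
        return list(exceptions[word])

    # 2. Apply each suffix rule and keep candidates found in the index.
    known = INDEX.get(pos, set())
    candidates = []
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in known and candidate not in candidates:
                candidates.append(candidate)

    # 3. If nothing matched, fall back to the word itself.
    return candidates or [word]


print(toy_lemmatize("ducks", "NOUN"))  # ['duck']
print(toy_lemmatize("did", "VERB"))    # ['do']
```

The part-of-speech tag matters because the same surface form can need different rules as a noun or as a verb, which is why the `Lemmatizer` call above takes `u'NOUN'` as its second argument.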