How is the Vader "compound" polarity score calculated in Python NLTK?

Question

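I am using the Vader SentimentAnalyzer to obtain polarity scores. I previously used the probability scores for positive/negative/neutral, but I just realized that the "compound" score, which ranges from -1 (most negative) to 1 (most positive), provides a single measure of polarity. I wonder how the "compound" score is computed. Is it calculated from the [pos, neu, neg] vector?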

alvas · Answer

The VADER algorithm outputs sentiment scores for 4 classes of sentiment https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L441 :

neg: Negative
neu: Neutral
pos: Positive
compound: Compound (i.e. the aggregated score)

Let's walk through the code. The first instance of compound is at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L421 , where it is computed as:

compound = normalize(sum_s)

The normalize() function is defined at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L107 :

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score/math.sqrt((score*score) + alpha)
    return norm_score

So there is a hyperparameter, alpha.
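To get a feel for what the default alpha does, here is a minimal sketch that copies the formula from the snippet above and applies it to a few raw sums. The input values are made up for illustration and do not come from any real sentence.

```python
import math


def normalize(score, alpha=15):
    """Same formula as the NLTK snippet: score / sqrt(score^2 + alpha)."""
    return score / math.sqrt(score * score + alpha)


# Hypothetical raw valence sums, chosen only for illustration.
for sum_s in (-10, -4, -1, 0, 1, 4, 10):
    print(f"{sum_s:>4} -> {normalize(sum_s):+.4f}")
```

With a positive alpha the denominator is always larger than the absolute value of the score, so the result stays strictly between -1 and 1 and approaches either end as the raw sum grows.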

As for sum_s, it is the sum of the sentiments argument passed to the score_valence() function https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L41

And if we trace this sentiments argument back, we see that it is computed when the polarity_scores() function is called, at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L217 :

def polarity_scores(self, text):
    """
    Return a float for sentiment strength based on the input text.
    Positive values are positive valence, negative value are negative
    valence.
    """
    sentitext = SentiText(text)
    #text, words_and_emoticons, is_cap_diff = self.preprocess(text)

    sentiments = []
    words_and_emoticons = sentitext.words_and_emoticons
    for item in words_and_emoticons:
        valence = 0
        i = words_and_emoticons.index(item)
        if (i < len(words_and_emoticons) - 1 and item.lower() == "kind" and \
            words_and_emoticons[i+1].lower() == "of") or \
            item.lower() in BOOSTER_DICT:
            sentiments.append(valence)
            continue

        sentiments = self.sentiment_valence(valence, sentitext, item, i, sentiments)

    sentiments = self._but_check(words_and_emoticons, sentiments)

Looking at the polarity_scores function, it iterates over every word and emoticon in the SentiText and checks each one with the rule-based sentiment_valence() function to assign it a valence score https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L24 ; see Section 2.1.1 of http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf

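So, going back to the compound score, we see that:

the compound score is the normalized score of sum_s, and
sum_s is the sum of valences computed from some heuristics and a sentiment lexicon (a.k.a. sentiment intensity), and
the normalized score is simply sum_s divided by the square root of its square plus an alpha parameter, which increases the denominator of the normalization function.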

Is it calculated from the [pos, neu, neg] vector?

Not really =)

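If we look at the score_valence function https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L411 , we see that the compound score is computed from sum_s before the pos, neg and neu scores are computed. Those come from _sift_sentiment_scores(), which computes the individual pos, neg and neu scores from the raw scores returned by sentiment_valence(), without using the sum.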
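The separation between the two computations can be sketched as follows. This is a simplified stand-in, not the library's code: the names sift_scores and score_valence_sketch are hypothetical, the +1 / -1 offsets reflect my reading of _sift_sentiment_scores() and should be checked against vader.py, and further adjustments in the real function (such as punctuation emphasis and rounding) are left out.

```python
import math


def normalize(score, alpha=15):
    return score / math.sqrt(score * score + alpha)


def sift_scores(valences):
    """Split per-token valences into a positive sum, a negative sum and a neutral count.

    The +1 / -1 offsets are an assumption based on my reading of
    _sift_sentiment_scores(); verify them against vader.py.
    """
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    return pos_sum, neg_sum, neu_count


def score_valence_sketch(valences):
    """Hypothetical, simplified version of score_valence()."""
    if not valences:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    # compound depends only on the sum of the per-token valences
    compound = normalize(float(sum(valences)))
    # pos / neu / neg are proportions computed separately from the same list
    pos_sum, neg_sum, neu_count = sift_scores(valences)
    total = pos_sum + abs(neg_sum) + neu_count
    return {
        "neg": abs(neg_sum / total),
        "neu": neu_count / total,
        "pos": pos_sum / total,
        "compound": compound,
    }


# Hypothetical valences for a five-token sentence.
scores = score_valence_sketch([1.9, 0.0, 0.0, -0.5, 0.0])
print(scores)
```

In this sketch pos, neu and neg always sum to 1, while compound is derived from the raw sum alone, so the compound score cannot be recovered from the [pos, neu, neg] vector.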

Looking at this alpha mathemagic, the output of the normalization seems rather unstable (if left unconstrained), depending on the value of alpha:

[Plots of the normalized score for alpha=0, alpha=15, alpha=50000 and alpha=0.001 are not reproduced here.]

It gets funky with negative values:

[Plots of the normalized score for alpha=-10, alpha=-1,000,000 and alpha=-1,000,000,000 are not reproduced here.]
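Since the plots themselves are not available here, the effect of alpha can be explored numerically instead. The sketch below reuses the normalize() formula; try_normalize is a hypothetical helper added only to catch the inputs for which the formula is undefined, and the raw sums are made-up values.

```python
import math


def normalize(score, alpha=15):
    return score / math.sqrt(score * score + alpha)


def try_normalize(score, alpha):
    """Return the normalized score, or None where the formula is undefined."""
    try:
        return normalize(score, alpha)
    except (ValueError, ZeroDivisionError):
        # ValueError: score^2 + alpha < 0 (square root of a negative number)
        # ZeroDivisionError: score^2 + alpha == 0
        return None


scores = (-4, -2, 0, 2, 4)  # hypothetical raw sums
for alpha in (0, 15, 50000, 0.001, -10):
    row = [try_normalize(s, alpha) for s in scores]
    print(alpha, [None if v is None else round(v, 4) for v in row])
```

With alpha=0 every nonzero sum collapses to exactly -1 or 1 and a sum of 0 is undefined. A very large alpha squeezes everything toward 0. A negative alpha makes the formula undefined whenever score*score + alpha is not positive, and can push the result outside the [-1, 1] range for larger sums.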

leonfrench · Answer

The "About the Scoring" section of the GitHub repo has an explanation.