How is the feature score (/importance) in the XGBoost package calculated?

Question

The command xgb.importance returns a graph of feature importance measured by an f score.

What does this f score represent, and how is it calculated?

Output: graph of feature importance

T. Scharf · Accepted Answer

This is a metric that simply sums up how many times each feature is split on. It is analogous to the Frequency metric in the R version. https://cran.r-project.org/web/packages/xgboost/xgboost.pdf

It is about as basic a feature importance metric as you can get.

i.e. how many times was this variable split on?

The code for this method shows that it simply counts every occurrence of a given feature as a split variable across all the trees.

Source: https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953

```python
def get_fscore(self, fmap=''):
    """Get feature importance of each feature.

    Parameters
    ----------
    fmap: str (optional)
       The name of feature map file
    """
    trees = self.get_dump(fmap)  ## dump all the trees to text
    fmap = {}
    for tree in trees:  ## loop through the trees
        for line in tree.split('\n'):  # text processing
            arr = line.split('[')
            if len(arr) == 1:  # text processing
                continue
            fid = arr[1].split(']')[0]  # text processing
            fid = fid.split('<')[0]  # split on the greater/less (find variable name)
            if fid not in fmap:  # if the feature id hasn't been seen yet
                fmap[fid] = 1  # add it
            else:
                fmap[fid] += 1  # else increment it
    return fmap  # return the fmap, which has the counts of each time a variable was split on
```
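To see the counting logic in isolation, here is a minimal standalone sketch that applies the same text processing to a couple of hand-written tree dumps. The function name `count_splits` and the sample strings are illustrative assumptions, not part of the library or output from a real model. The last step divides each count by the total, which is roughly the idea behind a relative frequency such as the R package's Frequency column.

```python
from collections import Counter


def count_splits(tree_dumps):
    """Count how many split nodes use each feature across all dumped trees."""
    counts = Counter()
    for tree in tree_dumps:
        for line in tree.split("\n"):
            parts = line.split("[")
            if len(parts) == 1:
                # Leaf line: it has no "[feature<threshold]" condition.
                continue
            condition = parts[1].split("]")[0]  # e.g. "f2<2.45"
            feature = condition.split("<")[0]   # e.g. "f2"
            counts[feature] += 1
    return dict(counts)


# Hand-written samples in the style of a text dump (assumed, not real model output).
sample_dumps = [
    "0:[f2<2.45] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.43\n"
    "\t2:[f3<1.75] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.1",
    "0:[f2<4.85] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.3\n"
    "\t2:leaf=-0.1",
]

fscore = count_splits(sample_dumps)

# Relative share of splits per feature (counts divided by the total number of splits).
total = sum(fscore.values())
frequency = {name: n / total for name, n in fscore.items()}

print(fscore)
print(frequency)
```

Note that the count ignores how much each split actually improves the model: a feature used in many shallow, low-gain splits can outrank one used in a few decisive splits.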

Aerin · Answer

I found this answer to be correct and thorough. It shows the implementation of feature_importances.

https://stats.stackexchange.com/questions/162162/relative-variable-importance-for-boosting