I am currently reading "Hands-On Machine Learning with Scikit-Learn & TensorFlow". When I try to recreate the transformation pipeline code, I get an error. How can I fix it?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

# And we can now run the whole pipeline simply:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
43
44 num_pipeline = Pipeline([
---> 45 ('selector', DataFrameSelector(num_attribs)),
46 ('imputer', Imputer(strategy = "median")),
47 ('attribs_adder', CombinedAttributesAdder()),
NameError: name 'DataFrameSelector' is not defined
DataFrameSelector cannot be found, so you need to import it. It is not part of sklearn, but a class with the same name is available in sklearn-features:
from sklearn_features.transformers import DataFrameSelector
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        # Nothing to learn; the selector is stateless.
        return self

    def transform(self, X):
        # Pick the named DataFrame columns and return them as a NumPy array.
        return X[self.attribute_names].values
This should work.
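As a quick check, here is a minimal sketch of the selector used as the first step of a Pipeline. The DataFrame, its column names, and its values are made up for illustration and are not the book's housing data; the book's CombinedAttributesAdder step is left out because it is not defined in this thread.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # stateless: nothing to fit

    def transform(self, X):
        return X[self.attribute_names].values


# Toy data, invented for this example only.
toy = pd.DataFrame({
    "rooms": [2.0, 4.0, 6.0],
    "income": [1.0, 2.0, 3.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

num_attribs = ["rooms", "income"]
toy_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),  # DataFrame -> ndarray
    ("std_scaler", StandardScaler()),              # zero mean, unit variance
])

toy_prepared = toy_pipeline.fit_transform(toy)
print(toy_prepared.shape)
```

Once the class is defined in the same session (or imported), the NameError from the question should no longer be raised at the 'selector' step.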
If you are following Hands-On Machine Learning with Scikit-Learn and TensorFlow, it is defined on the next page, as a custom-made transformer that selects DataFrame columns:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values
This may work.
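If you are on a recent scikit-learn release, the rest of the question's pipeline may need adjusting too, since sklearn.preprocessing.Imputer has been removed in favor of sklearn.impute.SimpleImputer. The sketch below is one possible adaptation, not the book's code: it swaps in SimpleImputer, uses OneHotEncoder in place of LabelBinarizer (my substitution, since LabelBinarizer is designed for target labels rather than pipeline features), omits CombinedAttributesAdder, and runs on a small made-up DataFrame.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Made-up stand-in for the housing DataFrame (values are illustrative only).
housing_toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, float("nan"), 1274.0],
    "median_income": [8.3, 8.3, 7.3, 5.6],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "INLAND"],
})

num_attribs = ["total_rooms", "median_income"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),
    ("imputer", SimpleImputer(strategy="median")),  # fills the missing value
    ("std_scaler", StandardScaler()),
])

cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("one_hot", OneHotEncoder()),  # assumption: replaces LabelBinarizer
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

# Two scaled numeric columns plus one column per category.
housing_toy_prepared = full_pipeline.fit_transform(housing_toy)
print(housing_toy_prepared.shape)
```

Newer scikit-learn versions also provide ColumnTransformer, which can select columns by name directly and may make a hand-written selector unnecessary.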