Python equivalent of R's paste

Question

I am a new Python aficionado. As an R user, I rely on one function that concatenates two or more variables in a data frame, and it is very useful. For example, suppose I have this data frame:

   categorie titre tarifMin  lieu  long   lat  img dateSortie
1       Zoo, Aquar      0.0 Aquar 2.385 48.89 ilo,          0
2       Zoo, Aquar      4.5 Aquar 2.408 48.83 ilo,          0
6       lieu Jardi      0.0 Jardi 2.320 48.86 ilo,          0
7       lieu  Bois      0.0  Bois 2.455 48.82 ilo,          0
13     espac Canal      0.0 Canal 2.366 48.87 ilo,          0
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,          0
15      parc Le Ma     20.0 Le Ma 2.353 48.87 ilo,          0

I want to create a new column that combines fixed text with other columns of the data frame. In R, I do:

> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+ paste("Evenement permanent -->",y$categorie,
+ y$titre,"C partir de",y$tarifMin,"€uros"),
+ paste("Evenement permanent -->",y$categorie,
+ y$titre,"sans prix indique"))

And the result is:

> y
   categorie titre tarifMin  lieu  long   lat  img dateSortie
1       Zoo, Aquar      0.0 Aquar 2.385 48.89 ilo,          0
2       Zoo, Aquar      4.5 Aquar 2.408 48.83 ilo,          0
6       lieu Jardi      0.0 Jardi 2.320 48.86 ilo,          0
7       lieu  Bois      0.0  Bois 2.455 48.82 ilo,          0
13     espac Canal      0.0 Canal 2.366 48.87 ilo,          0
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,          0
15      parc Le Ma     20.0 Le Ma 2.353 48.87 ilo,          0
                                                thecolThatIWant
1    Evenement permanent --> Zoo, Aquar C partir de 0.0 €uros
2    Evenement permanent --> Zoo, Aquar C partir de 4.5 €uros
6    Evenement permanent --> lieu Jardi C partir de 0.0 €uros
7     Evenement permanent --> lieu Bois C partir de 0.0 €uros
13  Evenement permanent --> espac Canal C partir de 0.0 €uros
14 Evenement permanent --> espac Canal C partir de -1.0 €uros
15  Evenement permanent --> parc Le Ma C partir de 20.0 €uros

My question is: how can I do the same thing in Python with pandas or another module?

What I have tried so far: I am a very new user, so please excuse my mistakes. I tried to replicate the example in Python, and I get something like this:

import pandas as pd
table = pd.read_csv("y.csv", sep=",")
tt = table.loc[:, ['categorie', 'titre', 'tarifMin', 'long', 'lat', 'lieu']]
tt
  categorie  titre  tarifMin   long    lat   lieu
0      Zoo,  Aquar       0.0  2.385  48.89  Aquar
1      Zoo,  Aquar       4.5  2.408  48.83  Aquar
2      lieu  Jardi       0.0  2.320  48.86  Jardi
3      lieu   Bois       0.0  2.455  48.82   Bois
4     espac  Canal       0.0  2.366  48.87  Canal
5     espac  Canal      -1.0  2.384  48.89  Canal
6      parc  Le Ma      20.0  2.353  48.87  Le Ma

Basically, I tried this:

sc = "Even permanent -->" + " " + tt.titre + " " + tt.lieu
tt['theColThatIWant'] = sc
tt

And I got this:

  categorie  titre  tarifMin   long    lat   lieu                 theColThatIWant
0      Zoo,  Aquar       0.0  2.385  48.89  Aquar  Even permanent --> Aquar Aquar
1      Zoo,  Aquar       4.5  2.408  48.83  Aquar  Even permanent --> Aquar Aquar
2      lieu  Jardi       0.0  2.320  48.86  Jardi  Even permanent --> Jardi Jardi
3      lieu   Bois       0.0  2.455  48.82   Bois    Even permanent --> Bois Bois
4     espac  Canal       0.0  2.366  48.87  Canal  Even permanent --> Canal Canal
5     espac  Canal      -1.0  2.384  48.89  Canal  Even permanent --> Canal Canal
6      parc  Le Ma      20.0  2.353  48.87  Le Ma  Even permanent --> Le Ma Le Ma

Do I now have to loop with a condition, or is there vectorization like in R?

shadowtalker · Answer

Here is a simple implementation that works on lists and probably other iterables. Caveat: it has only been lightly tested, on Python 3.5:

import functools

def reduce_concat(x, sep=""):
    # Convert every element to str, then join them pairwise with sep
    return functools.reduce(lambda a, b: a + sep + b, map(str, x))

def paste(*lists, sep=" ", collapse=None):
    # zip(*lists) groups the i-th element of every input; note that zip stops
    # at the shortest input, whereas R would recycle the shorter ones
    result = map(lambda x: reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return reduce_concat(result, sep=collapse)
    return list(result)

print(paste([1,2,3], [11,12,13], sep=','))
print(paste([1,2,3], [11,12,13], sep=',', collapse=";"))
# ['1,11', '2,12', '3,13']
# 1,11;2,12;3,13

You can have a bit more fun and replicate other functions, such as paste0:

paste0 = functools.partial(paste, sep="")
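A pandas Series is iterable, so this paste can take data frame columns directly. The following is a sketch on a hypothetical three-row excerpt of the question's data. Its reduce_concat converts every element with str up front, so calls with a single input still return strings. Unlike R, this paste does not recycle a scalar. A plain string would be zipped character by character, so the sketch repeats the prefix explicitly:

```python
import functools

import pandas as pd


def reduce_concat(x, sep=""):
    # Convert every element to str first, then join pairwise with sep
    return functools.reduce(lambda a, b: a + sep + b, map(str, x))


def paste(*lists, sep=" ", collapse=None):
    # zip(*lists) groups the i-th element of every input, like R's paste
    result = map(lambda x: reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return reduce_concat(result, sep=collapse)
    return list(result)


paste0 = functools.partial(paste, sep="")

# Hypothetical three-row excerpt of the question's data frame
df = pd.DataFrame({
    "categorie": ["Zoo", "lieu", "espac"],
    "titre": ["Aquar", "Bois", "Canal"],
})

# Repeat the constant prefix so zip() sees one copy per row
prefix = ["Evenement permanent -->"] * len(df)
df["label"] = paste(prefix, df["categorie"], df["titre"])
print(df["label"].tolist())
```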

Edward · Answer

In this particular case, R's paste is closest to Python's format, which was added in Python 2.6. It is newer and somewhat more flexible than the older % operator.
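The following sketch builds one message from the question in several ways for comparison. The f-string form needs Python 3.6+ and is not mentioned in the answer above. Unlike paste, format, % and f-strings insert no separators automatically, so the sketch writes the spaces out. The last line, " ".join, is the closest match to paste's default sep = " ":

```python
categorie, titre, tarif = "Zoo", "Aquar", 4.5

# str.format (Python 2.6+)
a = "Evenement permanent --> {} {} C partir de {} €uros".format(categorie, titre, tarif)
# the older % operator
b = "Evenement permanent --> %s %s C partir de %s €uros" % (categorie, titre, tarif)
# f-string (Python 3.6+)
c = f"Evenement permanent --> {categorie} {titre} C partir de {tarif} €uros"
# closest in spirit to paste(): join the pieces with a single space
d = " ".join(["Evenement permanent -->", categorie, titre, "C partir de", str(tarif), "€uros"])
print(a)
```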

For a purely Pythonic answer without numpy or pandas, here is one way to do it, using the original data as a list of lists. A list of dicts would also work, but it seemed more cluttered to me.

# -*- coding: utf-8 -*-
names = ['categorie', 'titre', 'tarifMin', 'lieu', 'long', 'lat', 'img', 'dateSortie']
records = [
    ['Zoo', 'Aquar', 0.0, 'Aquar', 2.385, 48.89, 'ilo', 0],
    ['Zoo', 'Aquar', 4.5, 'Aquar', 2.408, 48.83, 'ilo', 0],
    ['lieu', 'Jardi', 0.0, 'Jardi', 2.320, 48.86, 'ilo', 0],
    ['lieu', 'Bois', 0.0, 'Bois', 2.455, 48.82, 'ilo', 0],
    ['espac', 'Canal', 0.0, 'Canal', 2.366, 48.87, 'ilo', 0],
    ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0],
    ['parc', 'Le Ma', 20.0, 'Le Ma', 2.353, 48.87, 'ilo', 0],
]

def prix(p):
    if (p != -1):
        return 'C partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent --> {}, {} {}'.format(a[0], a[1], prix(a[2]))

[m.append(msg(m)) for m in records]

from pprint import pprint
pprint(records)

The result is:

[['Zoo', 'Aquar', 0.0, 'Aquar', 2.385, 48.89, 'ilo', 0, 'Evenement permanent --> Zoo, Aquar C partir de 0.0 \xe2\x82\xacuros'],
 ['Zoo', 'Aquar', 4.5, 'Aquar', 2.408, 48.83, 'ilo', 0, 'Evenement permanent --> Zoo, Aquar C partir de 4.5 \xe2\x82\xacuros'],
 ['lieu', 'Jardi', 0.0, 'Jardi', 2.32, 48.86, 'ilo', 0, 'Evenement permanent --> lieu, Jardi C partir de 0.0 \xe2\x82\xacuros'],
 ['lieu', 'Bois', 0.0, 'Bois', 2.455, 48.82, 'ilo', 0, 'Evenement permanent --> lieu, Bois C partir de 0.0 \xe2\x82\xacuros'],
 ['espac', 'Canal', 0.0, 'Canal', 2.366, 48.87, 'ilo', 0, 'Evenement permanent --> espac, Canal C partir de 0.0 \xe2\x82\xacuros'],
 ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0, 'Evenement permanent --> espac, Canal sans prix indique'],
 ['parc', 'Le Ma', 20.0, 'Le Ma', 2.353, 48.87, 'ilo', 0, 'Evenement permanent --> parc, Le Ma C partir de 20.0 \xe2\x82\xacuros']]

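I defined the list names but never actually use it. You could define a dictionary with the column names as keys and the field numbers (starting from 0) as values, but I didn't bother, to keep the example simple.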
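Here is a sketch of that dictionary idea on a hypothetical two-row excerpt of records. The name col is illustrative. Each field is looked up by its column name instead of a hard-coded position:

```python
names = ['categorie', 'titre', 'tarifMin', 'lieu', 'long', 'lat', 'img', 'dateSortie']
records = [
    ['Zoo', 'Aquar', 4.5, 'Aquar', 2.408, 48.83, 'ilo', 0],
    ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0],
]

# Column name -> field number (starting from 0)
col = {name: i for i, name in enumerate(names)}


def prix(p):
    if p != -1:
        return 'C partir de {} €uros'.format(p)
    return 'sans prix indique'


def msg(rec):
    # Look fields up by name instead of hard-coded positions
    return 'Evenement permanent --> {}, {} {}'.format(
        rec[col['categorie']], rec[col['titre']], prix(rec[col['tarifMin']]))


messages = [msg(r) for r in records]
print(messages)
```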

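The functions prix and msg are fairly simple. The only tricky part is the list comprehension [m.append(msg(m)) for m in records]. It iterates over all the records and modifies each one in place, appending a new field built by calling msg.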
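Since list.append returns None, that comprehension also builds a throwaway list of None values. The following sketch shows two more conventional alternatives. It reuses prix and msg as defined above and uses hypothetical rows trimmed to their first three fields:

```python
def prix(p):
    if p != -1:
        return 'C partir de {} €uros'.format(p)
    return 'sans prix indique'


def msg(a):
    return 'Evenement permanent --> {}, {} {}'.format(a[0], a[1], prix(a[2]))


# Hypothetical rows keeping only the first three fields
records = [['Zoo', 'Aquar', 0.0], ['espac', 'Canal', -1.0]]

# Non-mutating: build new rows and leave records unchanged
with_msg = [m + [msg(m)] for m in records]

# In place: a plain loop makes the side effect explicit
for m in records:
    m.append(msg(m))

print(with_msg == records)  # both now hold the same rows
```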

SAHIL BHANGE · Answer

This works much like R's paste command. R code:

words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
paste(words,collapse="|")

[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"

Python：

words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)

Result:

'Here|I|want|to|concatenate|words|using|pipe|delimeter'
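str.join accepts only strings. The following sketch, with hypothetical values, shows the usual workarounds: map(str, ...) for non-string items, and zip for the element-wise (non-collapsing) form of paste:

```python
values = [1, 2.5, "x"]
# "|".join(values) would raise TypeError: 1 and 2.5 are not str
joined = "|".join(map(str, values))  # like paste(values, collapse = "|")

a = ["a", "b", "c"]
b = [1, 2, 3]
# Element-wise, like paste(a, b, sep = "-") in R
pairs = ["-".join(map(str, t)) for t in zip(a, b)]
print(joined, pairs)
```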

lowtech · Answer

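My answer is loosely based on the original question and was adapted from woles's answer. I'd like to illustrate two points:

paste corresponds to Python's % operator
with apply, you can compute a new value per row and assign it to a new column

For R people: there is no direct form of ifelse, but there are good ways to replace it.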

import numpy as np
import pandas as pd

dates = pd.date_range('20140412', periods=7)
df = pd.DataFrame(np.random.randn(7, 4), index=dates, columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']

def apply_to_row(x):
    # The if statement plays the role of R's ifelse, one row at a time
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

# axis=1 calls apply_to_row once per row
df['theColumnIWant'] = df.apply(apply_to_row, axis=1)
print(df)
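Row-wise apply is not the only option for the ifelse part. numpy.where picks element-wise between two arrays, so it can stand in for ifelse without a Python-level loop. This is a swapped-in technique, not the one used above. Here is a minimal sketch on a hypothetical four-row excerpt of the question's data frame y. Adding string Series with + in pandas concatenates them element-wise, much like paste:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's data frame
y = pd.DataFrame({
    "categorie": ["Zoo", "Zoo", "espac", "parc"],
    "titre": ["Aquar", "Aquar", "Canal", "Le Ma"],
    "tarifMin": [0.0, 4.5, -1.0, 20.0],
})

# Element-wise string concatenation, similar to paste(..., sep = " ")
base = "Evenement permanent --> " + y["categorie"] + " " + y["titre"]
with_price = base + " C partir de " + y["tarifMin"].astype(str) + " €uros"
without_price = base + " sans prix indique"

# np.where(condition, a, b) plays the role of R's ifelse(condition, a, b)
y["thecolThatIWant"] = np.where(y["tarifMin"] != -1, with_price, without_price)
print(y["thecolThatIWant"].tolist())
```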

胡亦朗 · Answer

Let's try apply:

df.apply(lambda x: str(x.loc[desired_col]) + "pasting?", axis=1)

This gives you something similar to paste.
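Here is a minimal runnable sketch of the apply approach. desired_col and the two-column frame are hypothetical stand-ins. The second call shows one way to combine several columns:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"categorie": ["Zoo", "lieu"], "titre": ["Aquar", "Bois"]})

# desired_col is a hypothetical column name to paste from
desired_col = "titre"

# With axis=1, the lambda is called once per row; x is that row as a Series
df["pasted"] = df.apply(lambda x: str(x.loc[desired_col]) + " pasting?", axis=1)

# Several columns at once, joined with a space like paste's default sep
df["both"] = df.apply(lambda x: " ".join(str(x[c]) for c in ["categorie", "titre"]), axis=1)
print(df)
```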

shouldsee · Answer

You can try pandas.Series.str.cat:

import pandas as pd

def paste0(ss, sep=None, na_rep=None,):
    '''Analogy to R paste0'''
    ss = [pd.Series(s) for s in ss]
    ss = [s.astype(str) for s in ss]
    s = ss[0]
    res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
    return res

pasteA = paste0

Or simply use sep.join():


def paste0(ss, sep=None, na_rep=None,
           castF=str,  # the Python 2 original used unicode: many languages don't work well with Python 2 str
           ):
    if sep is None:
        sep = ''
    res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
    return res

pasteB = paste0

%timeit pasteA([range(1000), range(1000, 0, -1)], sep='_')
# 100 loops, best of 3: 7.11 ms per loop
%timeit pasteB([range(1000), range(1000, 0, -1)], sep='_')
# 100 loops, best of 3: 2.24 ms per loop

I used itertools to mimic R's recycling of shorter arguments:

import itertools

def paste0(ss, sep=None, na_rep=None, castF=str):
    '''Analogy to R paste0 '''
    if sep is None:
        sep = u''
    L = max([len(e) for e in ss])
    # cycle() repeats the shorter inputs, mimicking R's recycling
    it = zip(*[itertools.cycle(e) for e in ss])
    res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
    # res = pd.Series(res)
    return res
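Here is a trimmed Python 3 sketch of the same recycling idea, with zip instead of itertools.izip and str instead of unicode. The name paste0_recycled is hypothetical:

```python
import itertools


def paste0_recycled(ss, sep=""):
    # The longest input decides how many results there are
    n = max(len(e) for e in ss)
    # cycle() repeats each input forever; zip() takes one item from each
    it = zip(*[itertools.cycle(e) for e in ss])
    return [sep.join(str(s) for s in next(it)) for _ in range(n)]


print(paste0_recycled([["x"], range(1, 4)]))  # like paste0("x", 1:3)
print(paste0_recycled([["a", "b"], [1, 2, 3, 4]], sep="_"))
```

Note that cycle() treats a bare string as a sequence of characters, so the sketch wraps a constant like "x" in a list.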

patsy might be relevant (I'm not an experienced user of it myself).

Corey Levinson · Answer

If you just want to paste two string columns, you can simplify @shouldsee's answer, because you don't need to write a function. For example, in my case:

df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')

Both Series may need to be of dtype object for this to work (I haven't checked).
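Converting both sides with astype(str) before calling str.cat sidesteps the dtype question. The .str accessor is only available on string-like Series, and a numeric Series raises AttributeError. Here is a small sketch on hypothetical data in the two columns from the example above:

```python
import pandas as pd

# Hypothetical data; id_part_two is numeric (int64), not object
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# Convert both columns to strings before concatenating with str.cat
df["newcol"] = df["id_part_one"].astype(str).str.cat(
    df["id_part_two"].astype(str), sep="_"
)
print(df["newcol"].tolist())
```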

Michał · Answer

Here is a simple example of how to achieve this (if I understand correctly what you want to do):

import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# itertuples() yields one tuple per row: (index, A, B, C, D)
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent --> %s , next data %s' % (index, A, B))

Output:

>>> df
                   A         B         C         D
2013-01-01 -0.400550 -0.204032 -0.954237  0.019025
2013-01-02  0.509040 -0.611699  1.065862  0.034486
2013-01-03  0.366230  0.805068 -0.144129 -0.912942
2013-01-04  1.381278 -1.783794  0.835435 -0.140371
2013-01-05  1.140866  2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952  0.111293 -0.108521

This loop prints:

2013-01-01 00:00:00 Evenement permanent --> -0.400550121168 , next data -0.204032344442
2013-01-02 00:00:00 Evenement permanent --> 0.509040318928 , next data -0.611698560541
2013-01-03 00:00:00 Evenement permanent --> 0.366230438863 , next data 0.805067758304
2013-01-04 00:00:00 Evenement permanent --> 1.38127775713 , next data -1.78379439485
2013-01-05 00:00:00 Evenement permanent --> 1.14086631509 , next data 2.75500268167
2013-01-06 00:00:00 Evenement permanent --> -0.610568516983 , next data -0.282952162792
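If you want a new column rather than printed lines, the same loop can feed a list comprehension. The following sketch uses fixed values from np.arange instead of random numbers, so the result is reproducible. row.Index, row.A and row.B are the namedtuple fields that itertuples produces for this frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=3)
# Fixed values instead of random ones so the result is reproducible
df = pd.DataFrame(np.arange(12.0).reshape(3, 4), index=dates, columns=list('ABCD'))

# itertuples() yields namedtuples: row.Index is the timestamp, row.A etc. the columns
df['message'] = [
    '%s Evenement permanent --> %s , next data %s' % (row.Index.date(), row.A, row.B)
    for row in df.itertuples()
]
print(df['message'].tolist())
```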