共通の要素を共有するリストをマージする

Question

私の入力はリストのリストです。それらのいくつかは、共通の要素を共有しています。

L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]

共通の要素を共有するすべてのリストをマージし、同じアイテムのリストがなくなるまでこの手順を繰り返す必要があります。ブール演算とwhileループの使用を考えましたが、良い解決策を思い付くことができませんでした。

最終結果は次のようになります。

L = [['a','b','c','d','e','f','g','o','p'],['k']]

Jochen Ritzel · Answer

リストはグラフの表記として見ることができます。つまり、['a','b','c']は、3つのノードが相互に接続されたグラフです。あなたが解決しようとしている問題は、このグラフの接続されたコンポーネントを見つけることです。

これには NetworkX を使用できます。これには、正しいことがほぼ保証されているという利点があります。

l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] import networkx from networkx.algorithms.components.connected import connected_components def to_graph(l): G = networkx.Graph() for part in l: # each sublist is a bunch of nodes G.add_nodes_from(part) # it also imlies a number of edges: G.add_edges_from(to_edges(part)) return G def to_edges(l): """ treat `l` as a Graph and returns it's edges to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)] """ it = iter(l) last = next(it) for current in it: yield last, current last = current G = to_graph(l) print connected_components(G) # prints [['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p'], ['k']]

これを効率的に自分で解決するには、リストをとにかくグラフっぽいものに変換する必要があるため、最初からnetworkXを使用することもできます。

Howard · Answer

アルゴリズム：

リストから最初のセットAを取得
お互いにBがAと共通の要素を持っている場合は、リスト内のBがBをAに結合します。リストからBを削除
aと重複しなくなるまで2.を繰り返します。
aをoutpupに入れます
リストの残りで1.を繰り返す

したがって、リストの代わりにセットを使用したい場合があります。次のプログラムはそれを行うべきです。

l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']] out = [] while len(l)>0: first, *rest = l first = set(first) lf = -1 while len(first)>lf: lf = len(first) rest2 = [] for r in rest: if len(first.intersection(set(r)))>0: first |= set(r) else: rest2.append(r) rest = rest2 out.append(first) l = rest print(out)

Nicholas Braaksma · Answer

共通の値を持つリストをマージしようとする同じ問題に遭遇しました。この例はあなたが探しているものかもしれません。リストをループするのは1回だけで、結果セットは更新されます。

lists = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] lists = sorted([sorted(x) for x in lists]) #Sorts lists in place so you dont miss things. Trust me, needs to be done. resultslist = [] #Create the empty result list. if len(lists) >= 1: # If your list is empty then you dont need to do anything. resultlist = [lists[0]] #Add the first item to your resultset if len(lists) > 1: #If there is only one list in your list then you dont need to do anything. for l in lists[1:]: #Loop through lists starting at list 1 listset = set(l) #Turn you list into a set merged = False #Trigger for index in range(len(resultlist)): #Use indexes of the list for speed. rset = set(resultlist[index]) #Get list from you resultset as a set if len(listset & rset) != 0: #If listset and rset have a common value then the len will be greater than 1 resultlist[index] = list(listset | rset) #Update the resultlist with the updated union of listset and rset merged = True #Turn trigger to True break #Because you found a match there is no need to continue the for loop. if not merged: #If there was no match then add the list to the resultset, so it doesnt get left out. resultlist.append(l) print resultlist

＃

resultset = [['a', 'b', 'c', 'd', 'e', 'g', 'f', 'o', 'p'], ['k']]

MAK · Answer

これは graph として問題をモデル化することで解決できると思います。各サブリストはノードであり、2つのサブリストに共通の要素がある場合にのみ、別のノードとEdgeを共有します。したがって、マージされたサブリストは基本的に、グラフでは接続されたコンポーネントです。それらのすべてをマージすることは、単にすべての接続されたコンポーネントを見つけてリストすることの問題です。

これは、グラフを単純にたどることで実行できます。 [〜＃〜] bfs [〜＃〜] と [〜＃〜] dfs [〜＃〜] の両方を使用できますが、ここではDFSを使用しています私にとっては少し短いです。

l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] taken=[False]*len(l) l=[set(elem) for elem in l] def dfs(node,index): taken[index]=True ret=node for i,item in enumerate(l): if not taken[i] and not ret.isdisjoint(item): ret.update(dfs(item,i)) return ret def merge_all(): ret=[] for i,node in enumerate(l): if not taken[i]: ret.append(list(dfs(node,i))) return ret print(merge_all())

pillmuncher · Answer

Jochen Ritzelが指摘として、グラフ内の接続されたコンポーネントを探しています。グラフライブラリを使用せずに実装する方法は次のとおりです。

from collections import defaultdict def connected_components(lists): neighbors = defaultdict(set) seen = set() for each in lists: for item in each: neighbors[item].update(each) def component(node, neighbors=neighbors, seen=seen, see=seen.add): nodes = set([node]) next_node = nodes.pop while nodes: node = next_node() see(node) nodes |= neighbors[node] - seen yield node for node in neighbors: if node not in seen: yield sorted(component(node)) L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] print list(connected_components(L))

duhaime · Answer

かなり大きなリストに対してOPで何百万回も説明されているクラスタリング手法を実行する必要があったため、上記で提案された方法のうち、最も正確で最もパフォーマンスの高い方法を特定したいと考えました。

上記の各メソッドに対して2 ^ 1から2 ^ 10のサイズの入力リストを10回試行し、各メソッドに同じ入力リストを使用して、上記の各アルゴリズムの平均実行時間をミリ秒で測定しました。結果は次のとおりです。

これらの結果は、常に正しい結果を返すメソッドの中で、@ jochenが最速であることがわかりました。一貫して正しい結果を返さないメソッドの中で、makのソリューションはすべての入力要素を含まないことが多く（つまり、リストメンバーのリストが欠落しています）、braaksma、cmangla、およびアスタリスクのソリューションは最大限にマージされることが保証されていません。

興味深いのは、2つの最速の正しいアルゴリズムが適切にランク付けされた順序で、現在までの上位2つの賛成票を持っていることです。

テストの実行に使用されるコードは次のとおりです。

from networkx.algorithms.components.connected import connected_components from itertools import chain from random import randint, random from collections import defaultdict, deque from copy import deepcopy from multiprocessing import Pool import networkx import datetime import os ## # @mimomu ## def mimomu(l): l = deepcopy(l) s = set(chain.from_iterable(l)) for i in s: components = [x for x in l if i in x] for j in components: l.remove(j) l += [list(set(chain.from_iterable(components)))] return l ## # @Howard ## def howard(l): out = [] while len(l)>0: first, *rest = l first = set(first) lf = -1 while len(first)>lf: lf = len(first) rest2 = [] for r in rest: if len(first.intersection(set(r)))>0: first |= set(r) else: rest2.append(r) rest = rest2 out.append(first) l = rest return out ## # Nx @Jochen Ritzel ## def jochen(l): l = deepcopy(l) def to_graph(l): G = networkx.Graph() for part in l: # each sublist is a bunch of nodes G.add_nodes_from(part) # it also imlies a number of edges: G.add_edges_from(to_edges(part)) return G def to_edges(l): """ treat `l` as a Graph and returns it's edges to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)] """ it = iter(l) last = next(it) for current in it: yield last, current last = current G = to_graph(l) return list(connected_components(G)) ## # Merge all @MAK ## def mak(l): l = deepcopy(l) taken=[False]*len(l) l=map(set,l) def dfs(node,index): taken[index]=True ret=node for i,item in enumerate(l): if not taken[i] and not ret.isdisjoint(item): ret.update(dfs(item,i)) return ret def merge_all(): ret=[] for i,node in enumerate(l): if not taken[i]: ret.append(list(dfs(node,i))) return ret result = list(merge_all()) return result ## # @cmangla ## def cmangla(l): l = deepcopy(l) len_l = len(l) i = 0 while i < (len_l - 1): for j in range(i + 1, len_l): # i,j iterate over all pairs of l's elements including new # elements from merged pairs. We use len_l because len(l) # may change as we iterate i_set = set(l[i]) j_set = set(l[j]) if len(i_set.intersection(j_set)) > 0: # Remove these two from list l.pop(j) l.pop(i) # Merge them and append to the orig. list ij_union = list(i_set.union(j_set)) l.append(ij_union) # len(l) has changed len_l -= 1 # adjust 'i' because elements shifted i -= 1 # abort inner loop, continue with next l[i] break i += 1 return l ## # @pillmuncher ## def pillmuncher(l): l = deepcopy(l) def connected_components(lists): neighbors = defaultdict(set) seen = set() for each in lists: for item in each: neighbors[item].update(each) def component(node, neighbors=neighbors, seen=seen, see=seen.add): nodes = set([node]) next_node = nodes.pop while nodes: node = next_node() see(node) nodes |= neighbors[node] - seen yield node for node in neighbors: if node not in seen: yield sorted(component(node)) return list(connected_components(l)) ## # @NicholasBraaksma ## def braaksma(l): l = deepcopy(l) lists = sorted([sorted(x) for x in l]) #Sorts lists in place so you dont miss things. Trust me, needs to be done. resultslist = [] #Create the empty result list. if len(lists) >= 1: # If your list is empty then you dont need to do anything. resultlist = [lists[0]] #Add the first item to your resultset if len(lists) > 1: #If there is only one list in your list then you dont need to do anything. for l in lists[1:]: #Loop through lists starting at list 1 listset = set(l) #Turn you list into a set merged = False #Trigger for index in range(len(resultlist)): #Use indexes of the list for speed. rset = set(resultlist[index]) #Get list from you resultset as a set if len(listset & rset) != 0: #If listset and rset have a common value then the len will be greater than 1 resultlist[index] = list(listset | rset) #Update the resultlist with the updated union of listset and rset merged = True #Turn trigger to True break #Because you found a match there is no need to continue the for loop. if not merged: #If there was no match then add the list to the resultset, so it doesnt get left out. resultlist.append(l) return resultlist ## # @Rumple Stiltskin ## def stiltskin(l): l = deepcopy(l) hashdict = defaultdict(int) def hashit(x, y): for i in y: x[i] += 1 return x def merge(x, y): sums = sum([hashdict[i] for i in y]) if sums > len(y): x[0] = x[0].union(y) else: x[1] = x[1].union(y) return x hashdict = reduce(hashit, l, hashdict) sets = reduce(merge, l, [set(),set()]) return list(sets) ## # @Asterisk ## def asterisk(l): l = deepcopy(l) results = {} for sm in ['min', 'max']: sort_method = min if sm == 'min' else max l = sorted(l, key=lambda x:sort_method(x)) queue = deque(l) grouped = [] while len(queue) >= 2: l1 = queue.popleft() l2 = queue.popleft() s1 = set(l1) s2 = set(l2) if s1 & s2: queue.appendleft(s1 | s2) else: grouped.append(s1) queue.appendleft(s2) if queue: grouped.append(queue.pop()) results[sm] = grouped if len(results['min']) < len(results['max']): return results['min'] return results['max'] ## # Validate no more clusters can be merged ## def validate(output, L): # validate all sublists are maximally merged d = defaultdict(list) for idx, i in enumerate(output): for j in i: d[j].append(i) if any([len(i) > 1 for i in d.values()]): return 'not maximally merged' # validate all items in L are accounted for all_items = set(chain.from_iterable(L)) accounted_items = set(chain.from_iterable(output)) if all_items != accounted_items: return 'missing items' # validate results are good return 'true' ## # Timers ## def time(func, L): start = datetime.datetime.now() result = func(L) delta = datetime.datetime.now() - start return result, delta ## # Function runner ## def run_func(args): func, L, input_size = args results, elapsed = time(func, L) validation_result = validate(results, L) return func.__name__, input_size, elapsed, validation_result ## # Main ## all_results = defaultdict(lambda: defaultdict(list)) funcs = [mimomu, howard, jochen, mak, cmangla, braaksma, asterisk] args = [] for trial in range(10): for s in range(10): input_size = 2**s # get some random inputs to use for all trials at this size L = [] for i in range(input_size): sublist = [] for j in range(randint(5, 10)): sublist.append(randint(0, 2**24)) L.append(sublist) for i in funcs: args.append([i, L, input_size]) pool = Pool() for result in pool.imap(run_func, args): func_name, input_size, elapsed, validation_result = result all_results[func_name][input_size].append({ 'time': elapsed, 'validation': validation_result, }) # show the running time for the function at this input size print(input_size, func_name, elapsed, validation_result) pool.close() pool.join() # write the average of time trials at each size for each function with open('times.tsv', 'w') as out: for func in all_results: validations = [i['validation'] for j in all_results[func] for i in all_results[func][j]] linetype = 'incorrect results' if any([i != 'true' for i in validations]) else 'correct results' for input_size in all_results[func]: all_times = [i['time'].microseconds for i in all_results[func][input_size]] avg_time = sum(all_times) / len(all_times) out.write(func + '	' + str(input_size) + '	' + \ str(avg_time) + '	' + linetype + '
')

そしてプロットのために：

library(ggplot2) df <- read.table('times.tsv', sep='	') p <- ggplot(df, aes(x=V2, y=V3, color=as.factor(V1))) + geom_line() + xlab('number of input lists') + ylab('runtime (ms)') + labs(color='') + scale_x_continuous(trans='log10') + facet_wrap(~V4, ncol=1) ggsave('runtimes.png')

mimomu · Answer

Itertoolsはリストをマージするための高速なオプションであり、この問題を解決しました。

import itertools LL = set(itertools.chain.from_iterable(L)) # LL is {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'k', 'o', 'p'} for each in LL: components = [x for x in L if each in x] for i in components: L.remove(i) L += [list(set(itertools.chain.from_iterable(components)))] # then L = [['k'], ['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p']]

大きなセットの場合、LLを最も一般的な要素から頻度の低い要素まで頻度で並べ替えると、処理が少し速くなります。

Rumple Stiltskin · Answer

私の試み。機能的に見えます。

#!/usr/bin/python from collections import defaultdict l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] hashdict = defaultdict(int) def hashit(x, y): for i in y: x[i] += 1 return x def merge(x, y): sums = sum([hashdict[i] for i in y]) if sums > len(y): x[0] = x[0].union(y) else: x[1] = x[1].union(y) return x hashdict = reduce(hashit, l, hashdict) sets = reduce(merge, l, [set(),set()]) print [list(sets[0]), list(sets[1])]

dani herrera · Answer

私は非互換バージョンを見逃しています。 2018年（7年後）に投稿します

簡単で不安定アプローチ：

1）要素が共通の場合、デカルト積（クロス結合）を両方マージします
2）重複を削除

#your list l=[['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] #import itertools from itertools import product, groupby #inner lists to sets (to list of sets) l=[set(x) for x in l] #cartesian product merging elements if some element in common for a,b in product(l,l): if a.intersection( b ): a.update(b) b.update(a) #back to list of lists l = sorted( [sorted(list(x)) for x in l]) #remove dups list(l for l,_ in groupby(l)) #result [['a', 'b', 'c', 'd', 'e', 'f', 'g', 'o', 'p'], ['k']]

Zags · Answer

これは、依存関係のないかなり高速なソリューションです。次のように機能します。

各サブシストに一意の参照番号を割り当てます（この場合、サブリストの初期インデックス）
各サブリスト、および各サブリストの各アイテムの参照要素の辞書を作成します。
変更が発生しなくなるまで、次の手順を繰り返します。

3a。各サブリストの各項目を確認します。そのアイテムの現在の参照番号がそのサブリストの参照番号と異なる場合、要素は2つのリストの一部である必要があります。 2つのリストをマージし（参照から現在のサブリストを削除）、現在のサブリスト内のすべてのアイテムの参照番号を新しいサブリストの参照番号に設定します。

この手順で変更が発生しないのは、すべての要素が1つのリストに含まれているためです。反復ごとにワーキングセットのサイズが減少するため、アルゴリズムは必ず終了します。

 def merge_overlapping_sublists(lst): output, refs = {}, {} for index, sublist in enumerate(lst): output[index] = set(sublist) for elem in sublist: refs[elem] = index changes = True while changes: changes = False for ref_num, sublist in list(output.items()): for elem in sublist: current_ref_num = refs[elem] if current_ref_num != ref_num: changes = True output[current_ref_num] |= sublist for elem2 in sublist: refs[elem2] = current_ref_num output.pop(ref_num) break return list(output.values())

このコードの一連のテストを次に示します。

def compare(a, b): a = list(b) try: for elem in a: b.remove(elem) except ValueError: return False return not b import random lst = [["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"]] random.shuffle(lst) assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c", "d", "e"}]) lst = [["a", "b"], ["b", "c"], ["f", "d"], ["d", "e"]] random.shuffle(lst) assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c",}, {"d", "e", "f"}]) lst = [["a", "b"], ["k", "c"], ["f", "g"], ["d", "e"]] random.shuffle(lst) assert compare(merge_overlapping_sublists(lst), [{"a", "b"}, {"k", "c"}, {"f", "g"}, {"d", "e"}]) lst = [["a", "b", "c"], ["b", "d", "e"], ["k"], ["o", "p"], ["e", "f"], ["p", "a"], ["d", "g"]] random.shuffle(lst) assert compare(merge_overlapping_sublists(lst), [{"k"}, {"a", "c", "b", "e", "d", "g", "f", "o", "p"}]) lst = [["a", "b"], ["b", "c"], ["a"], ["a"], ["b"]] random.shuffle(lst) assert compare(merge_overlapping_sublists(lst), [{"a", "b", "c"}])

戻り値はセットのリストであることに注意してください。

sarnold · Answer

あなたが何を望んでいるのか全くわからないまま、私はあなたが何を意味するのかを推測することにしました。すべての要素を一度だけ見つけたいのです。

#!/usr/bin/python def clink(l, acc): for sub in l: if sub.__class__ == list: clink(sub, acc) else: acc[sub]=1 def clunk(l): acc = {} clink(l, acc) print acc.keys() l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']] clunk(l)

出力は次のようになります。

['a', 'c', 'b', 'e', 'd', 'g', 'f', 'k', 'o', 'p']

Scott Boston · Answer

グラフ理論および接続されたコンポーネントの問題であるため、networkxライブラリを使用できます。

import networkx as nx L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']] G = nx.Graph() #Add nodes to Graph G.add_nodes_from(sum(L, [])) #Create edges from list of nodes q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in L] for i in q: #Add edges to Graph G.add_edges_from(i) #Find all connnected components in graph and list nodes for each component [list(i) for i in nx.connected_components(G)]

出力：

[['p', 'c', 'f', 'g', 'o', 'a', 'd', 'b', 'e'], ['k']]

cmangla · Answer

これはおそらくより単純/高速なアルゴリズムであり、うまく機能するようです-

l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']] len_l = len(l) i = 0 while i < (len_l - 1): for j in range(i + 1, len_l): # i,j iterate over all pairs of l's elements including new # elements from merged pairs. We use len_l because len(l) # may change as we iterate i_set = set(l[i]) j_set = set(l[j]) if len(i_set.intersection(j_set)) > 0: # Remove these two from list l.pop(j) l.pop(i) # Merge them and append to the orig. list ij_union = list(i_set.union(j_set)) l.append(ij_union) # len(l) has changed len_l -= 1 # adjust 'i' because elements shifted i -= 1 # abort inner loop, continue with next l[i] break i += 1 print l # prints [['k'], ['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p']]

Eric Chow · Answer

私は同じ問題を解決していました。モジュールをインポートしたくないので、すべての答えに満足していません。

したがって、私はクラスターの組み合わせアルゴリズムを作成しました。

クラスターの組み合わせアルゴリズム

このアルゴリズムは、共通の要素がある場合、2つ以上のグループを結合します。

Cluster Combinationまたは[〜＃〜] cc [〜＃〜]という名前を最初に付けたのは、kmeansクラスタリングアルゴリズムに触発されたからです。第二に、私は実際にクラスターを組み合わせていました。

重要なのは、2つの一時リストを使用することです。 1つ目はelementsと呼ばれ、すべてのグループに存在するすべての要素を格納します。 2番目はlabelsという名前です。 sklearnのkmeansアルゴリズムからインスピレーションを得ました。 'labels'は要素のラベルまたは重心を格納します。ここでは、クラスターの最初の要素を重心にしています。最初は、値は0からlength-1まで昇順です。

グループごとに、「要素」に「インデックス」が表示されます。次に、インデックスに従ってグループのラベルを取得します。そして、ラベルの最小値を計算します。これがそれらの新しいラベルになります。 labels for groupのすべての要素を新しいラベルで置き換えます。

あるいは、反復ごとに、2つ以上の既存のグループを結合しようとします。グループに0と2のラベルがある場合、新しいラベル0を見つけます。これは2つのラベルの最小値です。私はそれらを0で置き換えます。

def cluser_combine(groups): n_groups=len(groups) #first, we put all elements appeared in 'gruops' into 'elements'. elements=list(set.union(*[set(g) for g in groups])) #and sort elements. elements.sort() n_elements=len(elements) #I create a list called clusters, this is the key of this algorithm. #I was inspired by sklearn kmeans implementation. #they have an attribute called labels_ #the url is here: #https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html #i called this algorithm cluster combine, because of this inspiration. labels=list(range(n_elements)) #for each group, I get their 'indices' in 'elements' #I then get the labels for indices. #and i calculate the min of the labels, that will be the new label for them. #I replace all elements with labels in labels_for_group with the new label. #or to say, for each iteration, #i try to combine two or more existing groups. #if the group has labels of 0 and 2 #i find out the new label 0, that is the min of the two. #i than replace them with 0. for i in range(n_groups): #if there is only zero/one element in the group, skip if len(groups[i])<=1: continue indices=list(map(elements.index, groups[i])) labels_for_group=list(set([labels[i] for i in indices])) #if their is only one label, all the elements in group are already have the same label, skip. if len(labels_for_group)==1: continue labels_for_group.sort() label=labels_for_group[0] #combine for k in range(n_elements): if labels[k] in labels_for_group[1:]: labels[k]=label new_groups=[] for label in set(labels): new_group = [elements[i] for i, v in enumerate(labels) if v == label] new_groups.append(new_group) return new_groups

あなたの質問の詳細な結果を印刷しました：

cluser_combine([['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']])

要素：
['a'、 'b'、 'c'、 'd'、 'e'、 'f'、 'g'、 'k'、 'o'、 'p']
ラベル：
[0、1、2、3、4、5、6、7、8、9]
-------------------- group 0 ----------------------- -
グループは：
['a'、 'b'、 'c']
要素内のグループのインデックス
[0、1、2]
結合前のラベル
[0、1、2、3、4、5、6、7、8、9]
組み合わせる...
結合後のラベル
[0、0、0、3、4、5、6、7、8、9]
-------------------- group 1 ----------------------- -
グループは：
['b'、 'd'、 'e']
要素内のグループのインデックス
[1、3、4]
結合前のラベル
[0、0、0、3、4、5、6、7、8、9]
組み合わせる...
結合後のラベル
[0、0、0、0、0、5、6、7、8、9]
-------------------- group 2 ----------------------- -
グループは：
['k']
-------------------- group 3 ----------------------- -
グループは：
['o'、 'p']
要素内のグループのインデックス
[8、9]
結合前のラベル
[0、0、0、0、0、5、6、7、8、9]
組み合わせる...
結合後のラベル
[0、0、0、0、0、5、6、7、8、8]
-------------------- group 4 ----------------------- -
グループは：
['e'、 'f']
要素内のグループのインデックス
[4、5]
結合前のラベル
[0、0、0、0、0、5、6、7、8、8]
組み合わせる...
結合後のラベル
[0、0、0、0、0、0、6、7、8、8]
-------------------- group 5 ----------------------- -
グループは：
['p'、 'a']
要素内のグループのインデックス
[9、0]
結合前のラベル
[0、0、0、0、0、0、6、7、8、8]
組み合わせる...
結合後のラベル
[0、0、0、0、0、0、6、7、0、0]
--------------------グループ6 ----------------------- -
グループは：
['d'、 'g']
要素内のグループのインデックス
[3、6]
結合前のラベル
[0、0、0、0、0、0、6、7、0、0]
組み合わせる...
結合後のラベル
[0、0、0、0、0、0、0、7、0、0]
（[0、0、0、0、0、0、0、7、0、0]、
[['a'、 'b'、 'c'、 'd'、 'e'、 'f'、 'g'、 'o'、 'p']、['k']] ）

詳細は my github jupyter notebook を参照してください