How can I move all files matching a particular name into a new folder when the number of matching files exceeds 10?

Question

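I am trying to write a script that, when run, scans a directory for all files, automatically detects patterns in the file names, and then moves the files according to the additional logic described below.

Suppose the folder contains the following files: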

aaa.txt
temp-203981.log
temp-098723.log
temp-123197.log
temp-734692.log
test1.sh
test2.sh
test3.sh

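The script should scan the directory automatically and find that there are four files (temp-XXX.log) and three files (testXXX.sh) whose names share a matching prefix. Once it has the file counts, it should compare them against a defined limit, say 3.

If the number of files matching a given name exceeds the limit, the script should move those files into a folder named after the part of the file name that matched.

So the parent folder from above should end up looking like this: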

aaa.txt
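temp.log (this would be a folder containing temp-734692.log, temp-123197.log, temp-098723.log, and temp-203981.log)
test.sh (this would be a folder containing test1.sh, test2.sh, and test3.sh)

I hope this makes sense.

P.S. I am using ASH for this script, so it needs to run without many of the fancy bash features; otherwise this would be easier.

Thanks!

Edit: Changed the beginning for clarity. It might also be easier if I specified a predefined delimiter, such as "&", that every file name contains. The script would still need to create variable folder names based on the part of the file name before the delimiter, but I think this would make the task clearer and easier.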
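The delimiter idea from the edit above could be sketched in POSIX sh roughly as follows. This is only an illustration under stated assumptions: the function name group_by_delim and its three arguments (directory, delimiter, limit) are made up for this example, the folder is named after the text before the first delimiter, and file names are assumed not to contain newlines.

```shell
#!/bin/sh
# Sketch: group files by the text before a fixed delimiter and move every
# group with more than `limit` members into a directory named after that text.
# group_by_delim and its arguments are illustrative, not from the question.
group_by_delim() (
    dir=$1 delim=$2 limit=$3
    cd "$dir" || exit 1

    # Print the part before the first delimiter for each regular file,
    # then count how often each prefix occurs.
    for f in *"$delim"*; do
        [ -f "$f" ] || continue
        printf '%s\n' "${f%%"$delim"*}"
    done | sort | uniq -c |
    while read -r count prefix; do
        [ -n "$prefix" ] || continue          # name started with the delimiter
        [ "$count" -gt "$limit" ] || continue # group too small, leave it alone
        mkdir -p -- "$prefix" || continue
        for f in "$prefix$delim"*; do
            [ -f "$f" ] && mv -- "$f" "$prefix/"
        done
    done
)
```

Calling `group_by_delim . '&' 3` would then process the current directory; the function body runs in a subshell, so the caller's working directory and variables are left untouched.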

MiniMax · Accepted Answer

Please check whether this works for you. I will add an explanation of how it works. I tested it in dash.

Note: File names must not contain spaces or newlines.

#!/bin/dash

limit=1

printf "%s\n" * |
sed 's/[-0-9]*\..*$//' |
uniq -c |
awk -v lim=${limit} '$1 >= lim {print $2}' |
sort -r |
while read -r i; do
    for j in "${i}"*; do
        [ -f "$j" ] || continue

        dir=${i}.${j#*.}

        [ -d "$dir" ] || mkdir "$dir"
        mv -v "$j" "$dir"
    done
done
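To see what the first half of the pipeline above produces, the prefix-counting stages can be isolated into a small function. This is a sketch, not the answer's code: count_prefixes is a name invented here, and an explicit sort is added before uniq -c because uniq only merges adjacent lines (for example, temp-1.log, temp-new1.log and temp1.log sort in the C locale so that the two temp prefixes are not adjacent). The final reverse sort lists a longer prefix such as temp-new-file before the shorter temp, presumably so that the more specific group is handled first.

```shell
#!/bin/sh
# Sketch: read file names on stdin and print every prefix (as derived by the
# sed expression from the answer) that occurs at least `limit` times.
# count_prefixes is an illustrative name.
count_prefixes() {
    sed 's/[-0-9]*\..*$//' |   # strip trailing digits/hyphens and the extension
    sort |                     # make equal prefixes adjacent for uniq
    uniq -c |                  # count occurrences of each prefix
    awk -v lim="$1" '$1 >= lim { print $2 }' |
    sort -r                    # longer prefixes before the shorter ones they extend
}
```

Feeding it the eight sample names from the question with a limit of 3, for instance `printf '%s\n' * | count_prefixes 3` in that directory, prints test and then temp.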

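There is one problem here: a file name can be identical to the future directory name, as with aaa.txt. In the aaa.txt case the file name contains no extra characters, so nothing is removed from it. The new directory name is therefore the same as the file name, which causes an error: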

mkdir: cannot create directory ‘aaa.txt’: File exists
mv: 'aaa.txt' and 'aaa.txt' are the same file

One workaround for this problem is to check whether the proposed directory name is equal to a file name and, if so, add a number to the future directory name, such as aaa1.txt.
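The workaround just described might look like the following in POSIX sh. It is a sketch only: unique_dir_name is a hypothetical helper that is not part of the script above, and it assumes the number should go directly before the extension, as in the aaa1.txt example.

```shell
#!/bin/sh
# Sketch: print a directory name that does not collide with an existing
# non-directory. If the proposed name is taken by a file, insert a number
# before the extension (aaa.txt -> aaa1.txt, aaa2.txt, ...).
# unique_dir_name is an illustrative name.
unique_dir_name() (
    name=$1
    case $name in
        *.*) base=${name%.*} ext=.${name##*.} ;;
        *)   base=$name ext= ;;
    esac
    n=0
    candidate=$name
    # Keep counting while something that is not a directory has this name.
    while [ -e "$candidate" ] && [ ! -d "$candidate" ]; do
        n=$((n + 1))
        candidate=$base$n$ext
    done
    printf '%s\n' "$candidate"
)
```

In the loop of the script above, the assignment could then become something like `dir=$(unique_dir_name "${i}.${j#*.}")`. An existing directory with the proposed name is reused as is, which matches the script's own `[ -d "$dir" ] || mkdir "$dir"` check.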

Demonstration

Before running the script:

$ tree
.
├── aaa.txt
├── temp-098723.log
├── temp-123197.log
├── temp-203981.log
├── temp-734692.log
├── temp-new-file123.log
├── temp-new-file-2323-12.log
├── temp-new-file-342.log
├── test1.sh
├── test2.sh
└── test3.sh

0 directories, 11 files

After running the script, script.sh:

$ tree
.
├── aaa.txt
├── temp.log
│   ├── temp-098723.log
│   ├── temp-123197.log
│   ├── temp-203981.log
│   └── temp-734692.log
├── temp-new-file.log
│   ├── temp-new-file123.log
│   ├── temp-new-file-2323-12.log
│   └── temp-new-file-342.log
└── test.sh
    ├── test1.sh
    ├── test2.sh
    └── test3.sh

3 directories, 11 files

igal · Answer

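I may be misunderstanding what you are asking here, but as stated I think this question has some subtleties and calls for a relatively sophisticated solution; that is, I am not sure how simple a script that does what you want can be. For example, let's look carefully at your sample list of files: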

aaa.txt
temp-203981.log
temp-098723.log
temp-123197.log
temp-734692.log
test1.sh
test2.sh
test3.sh

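According to your question, the prefixes extracted from this list should be temp and test; aaa is excluded because only one file has aaa as a prefix and your example threshold is 3. But why isn't te a prefix, given that seven files start with te? Or, since it seems you want to group the files by file name suffix first, why isn't one of the new subdirectories t.log or temp-.log rather than temp.log? I hope this discussion makes it clear that if you really want the program to determine potential prefixes on its own, without taking a list of prefixes as an argument, then your question statement contains several ambiguities that need to be resolved (and corresponding choices that need to be made).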
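The ambiguity can be made concrete by counting every leading substring shared by the sample names. The snippet below is a small illustration, not part of the solution that follows; shared_prefixes is a name invented for this example, and it ignores the grouping by suffix mentioned above.

```shell
#!/bin/sh
# Sketch: read file names on stdin, count how many names share each leading
# substring, and print "count prefix" for prefixes that reach `threshold`.
# shared_prefixes is an illustrative name.
shared_prefixes() {
    awk -v threshold="$1" '
        {
            # Record every leading substring of the current name.
            for (i = 1; i <= length($0); i++)
                count[substr($0, 1, i)]++
        }
        END {
            for (p in count)
                if (count[p] >= threshold)
                    print count[p], p
        }
    ' | LC_ALL=C sort -k2      # order by prefix for stable output
}
```

With the eight sample names and a threshold of 3, this reports t and te (7 files each), tem, temp and temp- (4 each), and tes and test (3 each), so a threshold alone does not single out temp and test.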

Here is a Python script that uses a simple trie data structure to search for the longest matching prefixes that satisfy some constraints (which can be supplied as arguments).

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""
trieganize.py

Use the trie data structure to look for prefixes of filenames in a given
directory and then reorganize those files into subdirectories based on
those prefixes.

In this script the trie data structure is just a dictionary of the
following form:

    trie = {
        "count":    integer,
        "children": dictionary,
        "leaf":     boolean
    }

Where the dictionary keys have the following semantics.

count:
    stores the number of total descendents of the given trie node

children:
    stores the child trie nodes of the given node

leaf:
    denotes whether this trie corresponds to the final character in a word
"""

import sys
import os
import string


def add_word_to_trie(trie, word):
    """Add a new word to the trie."""
    if word:
        trie["count"] += 1
        if word[0] not in trie["children"]:
            trie["children"][word[0]] = \
                {"count": 0, "children": {}, "leaf": False}
        add_word_to_trie(trie=trie["children"][word[0]], word=word[1:])
    else:
        trie["leaf"] = True
    return trie


def expand_trie(trie, prefix='', words=None):
    """Given a trie, return the list of words it encodes."""
    if words is None:
        words = list()
    if trie["leaf"]:
        words.append(prefix)
    for character, child in trie["children"].iteritems():
        if trie["children"]:
            expand_trie(trie=child, prefix=prefix+character, words=words)
    return words


def extract_groups_from_trie(
    trie, threshold=0, prefix='', groups=None,
    minimum_prefix_length=0,
    maximum_prefix_length=float("inf"),
    prefix_charset=string.ascii_letters,
):
    """Given a trie and some prefix constraints, return a dictionary which
    groups together the words in the trie based on shared prefixes which
    satisfy the specified constraints.
    """
    if groups is None:
        groups = dict()
    if trie["count"] >= threshold:
        # Keep only the children that can still extend a valid prefix
        children = {
            character: child
            for character, child in trie["children"].iteritems()
            if (
                child["count"] >= threshold and
                len(prefix) + 1 >= minimum_prefix_length and
                len(prefix) + 1 <= maximum_prefix_length and
                character in prefix_charset
            )
        }
        if not children:
            # The prefix cannot be extended any further: emit the group
            groups[prefix] = expand_trie(trie, prefix)
        else:
            for character, child in children.iteritems():
                # Pass the constraints along so they apply at every depth
                extract_groups_from_trie(
                    trie=child, threshold=threshold,
                    prefix=prefix+character, groups=groups,
                    minimum_prefix_length=minimum_prefix_length,
                    maximum_prefix_length=maximum_prefix_length,
                    prefix_charset=prefix_charset,
                )
    return groups


def reorganize_files(basedir, suffix_separator='.', threshold=3):
    """Takes a path to a directory and reorganizes the files in that
    directory into subdirectories based on the prefixes of their
    filenames."""

    # Get the list of file names
    filenames = os.listdir(basedir)

    # Group the filenames by suffix
    suffixes = {}
    for filename in filenames:
        basename, separator, suffix = filename.rpartition(suffix_separator)
        if suffix not in suffixes:
            suffixes[suffix] = []
        suffixes[suffix].append(basename)

    # For each suffix, search for prefixes
    for suffix, basenames in suffixes.iteritems():

        # Initialize a trie object
        trie = {"count": 0, "children": {}, "leaf": False}

        # Add the filenames to the trie
        for basename in basenames:
            add_word_to_trie(trie, basename)

        # Break the filenames up into groups based on their prefixes
        groups = extract_groups_from_trie(trie, threshold)

        # Organize the groups of files into subdirectories
        for prefix, group in groups.iteritems():
            targetdir = os.path.join(basedir, prefix + suffix_separator + suffix)
            os.mkdir(targetdir)
            for basename in group:
                filename = basename + suffix_separator + suffix
                sourcefile = os.path.join(basedir, filename)
                targetfile = os.path.join(targetdir, filename)
                os.rename(sourcefile, targetfile)


if __name__ == "__main__":
    reorganize_files(basedir=sys.argv[1])

To demonstrate this Python script, I wrote a small shell script that creates a test directory and populates it:

#!/usr/bin/bash
# create-test-dir.sh

rm -rf /tmp/testdir
mkdir -p /tmp/testdir

files=(
aaa.txt
temp-203981.log
temp-098723.log
temp-123197.log
temp-734692.log
test1.sh
test2.sh
test3.sh
)

for file in "${files[@]}"; do touch "/tmp/testdir/${file}"; done

You can run the script:

bash create-test-dir.sh

Afterwards, the test directory looks like this (run tree /tmp/testdir):

/tmp/testdir/
|-- aaa.txt
|-- temp-098723.log
|-- temp-123197.log
|-- temp-203981.log
|-- temp-734692.log
|-- test1.sh
|-- test2.sh
`-- test3.sh

0 directories, 8 files

Now you can run the Python script:

python trieganize.py /tmp/testdir

Afterwards, the files are organized as follows:

/tmp/testdir/
|-- aaa.txt
|-- temp.log
|   |-- temp-098723.log
|   |-- temp-123197.log
|   |-- temp-203981.log
|   `-- temp-734692.log
`-- test.sh
    |-- test1.sh
    |-- test2.sh
    `-- test3.sh

2 directories, 8 files

m0dular · Answer

Yes, bash would make this easier, but here is a POSIX solution:

#!/bin/sh

for pattern in "$@"; do
    set -- "$pattern"*

    if [ $# -gt 2 ]; then
        for f in "$@"; do
            [ -f "$f" ] || continue
            ext="${f##*.}"
            dest="${pattern}.${ext}"
            [ -d "$dest" ] || mkdir "$dest"
            mv "$f" "$dest"
        done
    fi
done
exit

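This takes any number of patterns, e.g. ./script temp test. For each pattern, it sets the positional parameters to the files matching the pattern and, if three or more files match, moves them into a folder named <pattern>.<file_extension>. Using your sample files, I got the intended result.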
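The positional-parameter trick can be tried out on its own. In the script above, $# counts every glob match, directories included, and in POSIX sh an unmatched glob stays behind as one literal word. The sketch below uses a made-up helper, count_matches, to count only the regular files; it is a variation for illustration, not part of the answer's script.

```shell
#!/bin/sh
# Sketch: count the regular files whose names start with the given pattern
# by loading the glob matches into the positional parameters.
# count_matches is an illustrative name.
count_matches() (
    set -- "$1"*               # "$@" now holds the matches (or the literal pattern)
    n=0
    for f in "$@"; do
        [ -f "$f" ] && n=$((n + 1))
    done
    printf '%s\n' "$n"
)
```

In the sample directory, `count_matches test` would print 3, and a pattern that matches nothing prints 0 rather than 1.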

Edit: The script now tests that $f is a regular file, to avoid moving directories and the like.