Split a single file into multiple files based on a matching string in Linux

Question

I have a file with the following content.

File.txt:

661###############20160315###
###########################
###########################
661###############20160316###
###########################
661###############20160317###
###########################

I want to split this single file into multiple files based on the starting string "661" and the date (2016MMDD), and name the resulting files 20160315.txt, 20160316.txt, and so on. For example, each split file would contain the following.

20160315.txt would contain:

661###############20160315########
################################
################################

20160316.txt would contain:

661###############20160316########
################################

20160317.txt would contain:

661###############20160317#######
###############################

Is there an awk command that can do this?

maulinglawns · Answer

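I'm sure there is an awk command that can do this, but my awk skills aren't good enough to come up with a solution. In the meantime, you can use something like this:

#!/bin/bash
# Split tosplit into xx00, xx01, ... at every line that starts with 661;
# -z removes empty pieces.
csplit -z tosplit '/^661/' '{*}'
for file in xx*; do
    # Use the first 8-digit number starting with 2 (the date) as the new name.
    newName=$(grep -Eo '2[0-9]{7}' "$file" | head -n 1)
    mv -- "$file" "$newName.txt"
done
# Clean up any pieces that were not renamed.
rm -f xx*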

tosplit is this file (the sample file):

661###############20160315###
###########################
###########################
661###############20160316###
###########################
661###############20160317###
###########################

After running this script (in the same directory as the tosplit file), you get three files:

$ ls 2016031*
20160315.txt  20160316.txt  20160317.txt

...which look like this:

$ cat 20160315.txt
661###############20160315###
###########################
###########################
$ cat 20160316.txt
661###############20160316###
###########################
$ cat 20160317.txt
661###############20160317###
###########################

It is probably (?) possible to have csplit name the files as well, but that is also above my pay grade.
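On the naming question: I'm not aware of a csplit option that takes the file name from the matched text, so the rename loop is probably still needed for date-based names. What csplit can control is the prefix and the numeric suffix of the pieces. A minimal sketch, assuming GNU csplit (the -b, -z and -s options and the {*} repeat count are used as GNU coreutils documents them); the scratch directory, the shortened sample data and the part_ prefix are made up for illustration:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Shortened, hypothetical sample in the same shape as tosplit.
printf '%s\n' \
    '661#####20160315###' '#####' '#####' \
    '661#####20160316###' '#####' \
    '661#####20160317###' '#####' > tosplit

# -f sets the output prefix, -b a printf-style numeric suffix,
# -z drops empty pieces, -s suppresses the byte counts csplit normally prints.
csplit -s -z -f part_ -b '%02d.txt' tosplit '/^661/' '{*}'

# One numbered piece per 661 record.
ls part_*.txt
```

The pieces come out numbered rather than dated, so this only replaces the xx prefix, not the renaming step.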

FloHimself · Answer

Something like this with awk

awk '/^661/{f=substr($0,match($0,/2016[0-9]{4}/),8)".txt"}{print>>f}' file.txt

might work for you.

Basically, the parts are:

/^661/{...}                                    # on each line starting with 661
match($0,/2016[0-9]{4}/)                       # find the index of the date (2016MMDD) in the current line
substr($0,match($0,/2016[0-9]{4}/),8)          # extract the date from the current line
f=substr($0,match($0,/2016[0-9]{4}/),8)".txt"  # assign it to f and append ".txt"
{print>>f}                                     # append the current line to the file named by f
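To see the match() and substr() steps in isolation, here is a small sketch that runs them on a single line. The sample line and the shell variable names pos and name are only for illustration, and the regex spells out four [0-9] classes instead of {4} so it does not depend on interval-expression support:

```shell
# Hypothetical sample line in the same shape as the question's data.
line='661###############20160315###'

# match() returns the 1-based index of the first match (0 if there is none).
pos=$(printf '%s\n' "$line" |
    awk '{ print match($0, /2016[0-9][0-9][0-9][0-9]/) }')

# substr() takes 8 characters starting at that index; ".txt" is concatenated.
name=$(printf '%s\n' "$line" |
    awk '{ print substr($0, match($0, /2016[0-9][0-9][0-9][0-9]/), 8) ".txt" }')

echo "date starts at character $pos, output file would be $name"
```

Since match() also sets RSTART and RLENGTH, substr($0, RSTART, RLENGTH) should be an equivalent way to write the extraction.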

With traditional awk implementations, you may have to replace the interval expression with something like:

awk '/^661/{f=substr($0,match($0,/2016[01][0-9][0-9][0-9]/),8)".txt"}{print>>f}' file.txt

Depending on your use case, you may also want to change the redirection behavior, i.e. print>f vs. print>>f.
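The difference shows up when the command is run more than once. In awk, print > f truncates the file the first time it is opened in a run and then keeps writing to it, while print >> f keeps whatever the file already contained. A rough sketch, assuming a shortened sample file; the scratch directory and the append_ and trunc_ prefixes are added here only so the two variants can be compared side by side:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
cd "$(mktemp -d)" || exit 1

# Shortened, hypothetical sample: two records of two lines each.
printf '%s\n' '661#####20160315###' '#####' \
              '661#####20160316###' '#####' > file.txt

# Run each variant twice to see what a rerun does to existing output.
for run in 1 2; do
    # '>>' appends: the second run adds to the first run's output.
    awk '/^661/{f="append_" substr($0,match($0,/2016[0-9][0-9][0-9][0-9]/),8) ".txt"}
         {print >> f}' file.txt
    # '>' truncates on first open: the second run replaces the first run's output.
    awk '/^661/{f="trunc_" substr($0,match($0,/2016[0-9][0-9][0-9][0-9]/),8) ".txt"}
         {print > f}' file.txt
done

# append_20160315.txt now holds 4 lines, trunc_20160315.txt holds 2.
wc -l append_20160315.txt trunc_20160315.txt
```

So print>f is the safer choice if the command may be rerun over the same input, and print>>f fits when output from several runs or several input files should accumulate.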