Split one file into multiple files based on a delimiter

Question

I have one file with -| as the delimiter after each section... I need to create a separate file for each section using Unix.

Example input file

wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Expected result in file 1

wertretr
ewretrtret
1212132323
000232
-|

Expected result in file 2

ereteertetet
232434234
erewesdfsfsfs
0234342343
-|

Expected result in file 3

jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

ctrl-alt-delor · Answer

A one-liner, no programming. (Apart from the regex, etc.)

csplit --digits=2 --quiet --prefix=outfile infile "/-|/+1" "{*}"
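For readers without csplit, the same split can be approximated in Python. The csplit arguments say: cut just after every line matching -| ("/-|/+1"), repeat as often as possible ("{*}"), and name the pieces outfile00, outfile01, ... (--prefix, --digits=2). The sketch below is an approximation under stated assumptions, not csplit itself: it checks for the literal substring -| (in csplit's basic regex syntax an unescaped | is an ordinary character), it skips empty pieces, and the function name split_after_matching_lines is hypothetical.

```python
import os
import tempfile


def split_after_matching_lines(path, prefix="outfile", digits=2, marker="-|"):
    """Rough analogue of: csplit --digits=2 --prefix=outfile infile "/-|/+1" "{*}".

    Each piece ends just after a line containing `marker`. Empty pieces are
    skipped (a simplification). Returns the names of the files written.
    """
    written = []
    chunk = []

    def flush():
        # Write the buffered lines (if any) to prefix00, prefix01, ...
        if chunk:
            name = f"{prefix}{len(written):0{digits}d}"
            with open(name, "w") as out:
                out.writelines(chunk)
            written.append(name)
            chunk.clear()

    with open(path) as src:
        for line in src:
            chunk.append(line)
            if marker in line:  # literal match, like csplit's basic regex -|
                flush()
    flush()  # lines after the last marker, if any
    return written


if __name__ == "__main__":
    # Small demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "infile")
        with open(infile, "w") as f:
            f.write("a\nb\n-|\nc\n-|\n")
        for name in split_after_matching_lines(infile, prefix=os.path.join(tmp, "outfile")):
            print(os.path.basename(name))
```

Called as split_after_matching_lines("infile") on the question's sample (one item per line), this should write outfile00 through outfile02 in the current directory.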

William Pursell · Answer

awk '{print $0 " -|"> "file" NR}' RS='-\|' input-file

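Explanation (edited):

RS is the record separator, and this solution uses a GNU awk extension that allows it to be more than one character. NR is the record number.

The print statement prints each record, followed by " -|", to a file whose name contains the record number.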
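The record-separator idea (treat the input as records separated by -| and write record N to fileN) can be sketched in Python by reading the whole file and splitting on the literal separator. Unlike the awk one-liner, this sketch puts the separator back on its own line, trims newlines around each record and skips empty records; those are choices made for the sketch, not awk behavior, and the function name split_on_separator is hypothetical.

```python
import os
import tempfile


def split_on_separator(path, out_prefix="file", sep="-|"):
    """Split the whole text on `sep` (like awk with RS set to it) into out_prefix1, out_prefix2, ..."""
    with open(path) as src:
        text = src.read()  # the whole input is held in memory

    written = []
    for record in text.split(sep):
        record = record.strip("\n")
        if not record:
            continue  # e.g. the bare newline left after the final separator
        # 1-based numbering like NR, but counting only non-empty records
        name = f"{out_prefix}{len(written) + 1}"
        with open(name, "w") as out:
            out.write(record + "\n" + sep + "\n")  # re-append the separator
        written.append(name)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "input-file")
        with open(infile, "w") as f:
            f.write("one\ntwo\n-|\nthree\n-|\n")
        for name in split_on_separator(infile, out_prefix=os.path.join(tmp, "file")):
            with open(name) as f:
                print(os.path.basename(name), repr(f.read()))
```

Because it reads everything at once, this suits small inputs; for very large files a line-by-line loop is the safer design.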

twalberg · Answer

Debian has csplit, but I don't know whether that is common to all/most/other distributions. Even if it isn't, it shouldn't be too hard to track down the source and compile it...

John David Smith · Answer

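I solved a slightly different problem, where the file contains lines giving the name of the file that the following text should go to. This Perl code does the trick for me: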

#!/path/to/perl -w

#comment the line below for UNIX systems
use Win32::Clipboard;

# Get command line flags
#print ($#ARGV, "\n");
if ($#ARGV == 0) {
    print STDERR "usage: ncsplit.pl --mff -- filename.txt [...] \n\nNote that no space is allowed between the '--' and the related parameter.\n\nThe mff is found on a line followed by a filename. All of the contents of filename.txt are written to that file until another mff is found.\n";
    exit;
}

# this package sets the ARGV count variable to -1;
use Getopt::Long;
my $mff = "";
GetOptions('mff=s' => \$mff);    # =s: --mff takes a string value

# set a default $mff variable
if ($mff eq "") { $mff = "-#-" };
print("using file switch=", $mff, "\n\n");

while ($_ = shift @ARGV) {
    if (-f "$_") {
        push @filelist, $_;
    }
}

# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.
$readfile = $filelist[0];

open SOURCEFILE, "<$readfile" or die "File not found...\n\n";
#print SOURCEFILE;

while (<SOURCEFILE>) {
    /^$mff (.*$)/o;
    $outname = $1;
    # print $outname;
    # print "right is: $1 \n";
    if (/^$mff /) {
        open OUTFILE, ">$outname";
        print "opened $outname\n";
    } else {
        print OUTFILE "$_";
    }
}
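The core idea of this script (a line starting with the marker and a filename switches the output file; every other line goes to the current file) can be sketched in Python. The default marker -#- comes from the script; the function name split_by_marker and the choice to silently drop lines that appear before the first marker are assumptions of this sketch.

```python
import os
import tempfile


def split_by_marker(path, marker="-#-", out_dir="."):
    """Send lines to files named on marker lines such as '-#- name.txt'."""
    prefix = marker + " "
    written = []
    out = None
    try:
        with open(path) as src:
            for line in src:
                if line.startswith(prefix):
                    # A marker line closes the current file and opens the named one.
                    if out is not None:
                        out.close()
                    name = line[len(prefix):].strip()
                    out = open(os.path.join(out_dir, name), "w")
                    written.append(name)
                elif out is not None:
                    out.write(line)
                # else: no file has been named yet, so the line is dropped (assumption)
    finally:
        if out is not None:
            out.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "filename.txt")
        with open(infile, "w") as f:
            f.write("-#- a.txt\nfirst\n-#- b.txt\nsecond\n")
        print(split_by_marker(infile, out_dir=tmp))
```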

Thanh · Answer

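The following command works. It compares each line with "-|" literally, because in an awk regex an unescaped | is the alternation operator, and it keeps the delimiter line at the end of each file. I hope it helps.

awk 'BEGIN{file = 0; filename = "output_" file ".txt"} {print $0 > filename} $0 == "-|" {close(filename); file++; filename = "output_" file ".txt"}' input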

rkyser · Answer

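You can also use awk. I'm not very familiar with awk, but the following seemed to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Note that the last partN.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it can be done with a little tweaking. Does anyone have any suggestions?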

awk_pattern file:

BEGIN{ fn = "part1.txt"; n = 1 }
{
   print > fn
   if (substr($0,1,2) == "-|") {
       close (fn)
       n++
       fn = "part" n ".txt"
   }
}

bash command:

awk -f awk_pattern input.file
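One way to avoid a trailing empty partN.txt is to open the next output file only when there is actually a line to write to it. A Python sketch of that idea, assuming (as in the awk script) that a delimiter line starts with -| and that outputs are named part1.txt, part2.txt, ...; the function name split_into_parts is hypothetical, and skipping blank lines between parts is an extra assumption so that a stray trailing newline does not start a new part.

```python
import os
import tempfile


def split_into_parts(path, out_dir=".", marker="-|"):
    """Write part1.txt, part2.txt, ...; a part ends after a line starting with `marker`."""
    written = []
    out = None
    try:
        with open(path) as src:
            for line in src:
                if out is None:
                    if not line.strip():
                        continue  # skip blank lines between parts (assumption)
                    # Open the next part only now, so no empty file is created.
                    name = os.path.join(out_dir, f"part{len(written) + 1}.txt")
                    out = open(name, "w")
                    written.append(name)
                out.write(line)
                if line.startswith(marker):
                    out.close()
                    out = None
    finally:
        if out is not None:
            out.close()  # also covers input that ends without a final delimiter
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "input.file")
        with open(infile, "w") as f:
            f.write("a\n-|\nb\n-|\n\n")
        print([os.path.basename(p) for p in split_into_parts(infile, out_dir=tmp)])
```

The same "open lazily" change could be made in the awk script by deferring the file-name switch until the next non-delimiter line is read.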


Aaron Hall · Answer

Use csplit if you have it.

If you don't, but you have Python... don't use Perl.

Lazy reading of the file

Your file may be too large to hold in memory all at once, so reading it line by line may be preferable. Assume the input file is named "samplein":

$ python3 -c "from itertools import count
with open('samplein') as file:
    for i in count():
        firstline = next(file, None)
        if firstline is None:
            break
        with open(f'out{i}', 'w') as out:
            out.write(firstline)
            for line in file:
                out.write(line)
                if line == '-|\n':
                    break"

ctrlc-root · Answer

A Python 3 script that splits a file into multiple files, using file names given in the delimiter lines. Example input file:

# Ignored
######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END
# Ignored
######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END

Here is the script:

#!/usr/bin/env python3
import os
import argparse

# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'

# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")
args = parser.parse_args()

# read the input file
with open(args.input_file, 'r') as input_file:
    input_data = input_file.read()

# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
    # discard lines until the next start delimiter
    while input_lines and not input_lines[0].startswith(start_delimiter):
        input_lines.pop(0)

    # corner case: no delimiter found and no more lines left
    if not input_lines:
        break

    # extract the output filename from the start delimiter
    output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
    output_path = os.path.join(args.output_dir, output_filename)

    # open the output file
    print("extracting file: {0}".format(output_path))
    with open(output_path, 'w') as output_file:
        # while we have lines left and they don't match the end delimiter
        while input_lines and not input_lines[0].startswith(end_delimiter):
            output_file.write("{0}\n".format(input_lines.pop(0)))

    # remove end delimiter if present
    if input_lines:
        input_lines.pop(0)

Finally, here's how to run it:

$ python3 script.py -i input-file.txt -o ./output-folder/
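The script above reads the whole file into memory and removes lines with pop(0), which shifts the rest of the list on every call. A streaming variant of the same BEGIN/END filtering, reading one line at a time, is sketched below. The delimiter strings are copied from the script; the function name extract_filtered is hypothetical, and unlike the script it prints no progress messages.

```python
import os
import tempfile

START = "######## FILTER BEGIN"
END = "######## FILTER END"


def extract_filtered(path, out_dir):
    """Copy each BEGIN/END block into out_dir/<name after BEGIN>, one line at a time."""
    written = []
    out = None
    try:
        with open(path) as src:
            for line in src:
                if out is None:
                    if line.startswith(START):
                        name = line[len(START):].strip()
                        out = open(os.path.join(out_dir, name), "w")
                        written.append(name)
                    # other lines outside a block are ignored
                elif line.startswith(END):
                    out.close()
                    out = None
                else:
                    out.write(line)
    finally:
        if out is not None:
            out.close()  # input ended inside a block
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "input-file.txt")
        with open(infile, "w") as f:
            f.write(f"# Ignored\n{START} foo.conf\nThis goes in foo.conf.\n{END}\n")
        print(extract_filtered(infile, tmp))
```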

amaksr · Answer

Here is some Perl code that does this:

#!/usr/bin/perl
open(FI, "file.txt") or die "Input file not found";
$cur = 0;
open(FO, ">res.$cur.txt") or die "Cannot open output file $cur";
while (<FI>) {
    print FO $_;
    if (/^-\|/) {
        close(FO);
        $cur++;
        open(FO, ">res.$cur.txt") or die "Cannot open output file $cur"
    }
}
close(FO);

mbonnin · Answer

cat file | ( I=0; echo -n "" > file0; while IFS= read -r line; do echo "$line" >> file$I; if [ "$line" == '-|' ]; then I=$((I+1)); echo -n "" > file$I; fi; done )

And a formatted version:

#!/bin/bash
cat FILE | (
    I=0
    echo -n "" > file0
    while IFS= read -r line; do
        echo "$line" >> file$I
        if [ "$line" == '-|' ]; then
            I=$((I+1))
            echo -n "" > file$I
        fi
    done
)