csplit multiple files into multiple files

Question

Folks,

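I'm a bit stumped on this one. I'm trying to write a bash script that uses csplit to take multiple input files and split them according to the same pattern. (For context: I have multiple TeX files with questions in them, separated by the \question command. I want to extract each question into its own file.)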

This is the code I have so far:

#!/bin/bash
# This script uses csplit to run through an input TeX file (or list of TeX files) to separate out all the questions into their own files.

# This line is for the user to input the name of the file they need questions split from.
read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
read -ep "Type the directory where you would like to save the split files: " save
read -ep "What unit do these questions belong to?" unit

# This is a check for the user to confirm the file list, and proceed if true:
echo "The file(s) being split is/are $files. Please confirm that you wish to split this file, or cancel."
select ynf in "Yes" "No"; do
    case $ynf in
        No ) exit;;
        Yes )
            echo "The split files will be saved to $save. Please confirm that you wish to save the files here."
            select ynd in "Yes" "No"; do
                case $ynd in
                    Yes )
                        # This line will create a loop to conduct the script over all the files in the list.
                        for i in ${files[@]}
                        do
                            # Mass re-naming is formatted to give "guestion###.tex' to enable processing a large number of questions quickly.
                            # csplit is the utility used here; run "man csplit" to learn more of its functionality.
                            # the structure is "csplit [name of file] [output options] [search filter] [separator(s)].
                            # this script calls csplit, will accept the name of the file in the argument, searches the files for calls of "question", splits the file everywhere it finds a line with "question", and renames it according to the scheme [prefix]#[suffix] (the %03d in the suffix-format is what increments the numbering automatically).
                            # the '\question' allows searching for \question, which eliminates the split for \end{questions}; eliminating the \begin{questions} split has not yet been understood.
                            csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\question'/ '{*}'
                        done; exit;;
                    No ) exit;;
                esac
            done
    esac
done
return

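I can confirm that it runs the loop as intended over the input files I have. However, the behaviour I'm seeing is that the first file is split into "q1.tex q2.tex q3.tex" as expected, and when the script moves on to the next file in the list, it splits the questions and overwrites the old files; the third file then overwrites the splits of the second file, and so on. What I would like to happen is this: if File1 has 3 questions, it outputs: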

q1.tex q2.tex q3.tex

And then, if File2 has 4 questions, it continues incrementing to:

q4.tex q5.tex q6.tex q7.tex

Is there a way for csplit to detect the numbering that has already been done in this loop and increment accordingly?

Thanks for any help you can provide!

roaima · Accepted Answer

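The csplit command keeps no saved context between runs (nor should it), so every invocation restarts its numbering from the beginning (suffix 0 with GNU csplit). There is no way to change that, but you can maintain your own count value and interpolate it into the prefix string.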
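A minimal sketch of that idea, assuming GNU csplit (for the long options and `{*}`). The function name `split_each` and the `f<N>` part of the prefix are hypothetical, chosen only for illustration; the pattern `\\question` is also an assumption, written so that the regular expression matches a literal backslash followed by `question`. Here the counter is a per-file index, so pieces from different files get distinct names rather than one continuous sequence.

```bash
#!/bin/bash
# Hypothetical helper: split_each SAVE_DIR UNIT FILE...
# Keeps its own per-file counter and interpolates it into the csplit prefix,
# so pieces from different input files can never overwrite each other.
split_each() {
    local save=$1 unit=$2 n=0 file
    shift 2
    for file in "$@"; do
        n=$((n + 1))
        # Output names look like <save>/<unit>f<n>q000.tex, q001.tex, ...
        csplit --quiet \
               --prefix="$save/${unit}f${n}q" \
               --suffix-format='%03d.tex' \
               "$file" '/\\question/' '{*}'
    done
}
```

Called as `split_each "$save" "$unit" "${files[@]}"`, this leaves the numbering inside each file to csplit and only guarantees that the names do not collide.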

Alternatively, try replacing

read -ep "Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. " files
...
for i in ${files[@]}
do
    csplit $i --prefix=$save'/'$unit'q' --suffix-format='%03d.tex' /'\question'/ '{*}'
done

with

read -a files -ep 'Type the directory and/or name of the file needed to split. If there is more than one file, enter the files separated by a space. '
...
cat "${files[@]}" | csplit - --prefix="$save/${unit}q" --suffix-format='%03d.tex' '/\question/' '{*}'

This is one of those relatively rare instances where you really do need cat {file} | ..., because csplit takes only a single file argument (or - for stdin).
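The pipeline above could be wrapped up roughly as follows. This is a sketch under assumptions: GNU csplit is installed, the function name `split_all` is hypothetical, and the pattern is written as `\\question` so that it matches a literal backslash (the original pattern is kept unchanged in the answer text).

```bash
#!/bin/bash
# Hypothetical helper: split_all SAVE_DIR UNIT FILE...
# Concatenates every input file and lets a single csplit run number
# the pieces, so the numbering continues across file boundaries.
split_all() {
    local save=$1 unit=$2
    shift 2
    cat -- "$@" |
        csplit --quiet \
               --prefix="$save/${unit}q" \
               --suffix-format='%03d.tex' \
               - '/\\question/' '{*}'
}
```

One consequence of concatenating is that anything between the last question of one file and the first question of the next (for example a closing `\end{questions}` and the following preamble) ends up in the same piece as that last question.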

I've changed the read action to use an array variable, since that is what you are (correctly) trying to use in the for ... do csplit ... loop.

Whatever you end up doing, I strongly recommend double-quoting every variable you use, especially array lists such as "${files[@]}".
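To illustrate why the quoting matters, here is a small sketch. The file names and the `count_args` helper are made up for the example; it only counts how many arguments the shell passes after expansion.

```bash
#!/bin/bash
# Hypothetical file names, one of them containing spaces.
files=("unit 1/exam a.tex" "exam b.tex")

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell word-splits each element on whitespace.
unquoted=$(count_args ${files[@]})
# Quoted: each array element stays a single argument.
quoted=$(count_args "${files[@]}")

echo "unquoted: $unquoted, quoted: $quoted"
# prints "unquoted: 5, quoted: 2"
```

With the unquoted form, csplit or cat would be handed five nonexistent paths instead of the two intended files.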

JJoao · Answer

With Awk, you can do it like this:

awk '/\question/ {i++} ; {print > ("q" i ".tex")}' exam*.tex

If you want to define an output directory (d) and a topic (t), and control the width of the number:

awk '/\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, i++)} {print>f}' d=d1 t=t1 ex*

To skip the TeX preamble, you can print only once "f" has been defined:

awk '/\question/ {f=sprintf("%s/%s-q%03d.tex", d, t, ++i)} f {print>f}' d=d1 t=t1 ex*
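The last one-liner could be wrapped in a small shell function along these lines. This is a sketch, not the answerer's code: the function name `split_with_awk` is hypothetical, the pattern is written as `/\\question/` on the assumption that a literal backslash should be matched, and the `close(f)` call is an addition meant to avoid running out of open file handles when there are many questions.

```bash
#!/bin/bash
# Hypothetical helper: split_with_awk OUT_DIR TOPIC FILE...
# Writes each question to OUT_DIR/TOPIC-qNNN.tex, numbering continuously
# across all input files and skipping everything before the first \question.
split_with_awk() {
    local outdir=$1 topic=$2
    shift 2
    awk -v d="$outdir" -v t="$topic" '
        # A new question starts: close the previous file, pick the next name.
        /\\question/ { if (f) close(f); f = sprintf("%s/%s-q%03d.tex", d, t, ++i) }
        # Print only once an output file has been chosen (skips the preamble).
        f { print > f }
    ' "$@"
}
```

Because awk reads all the files as one stream, the counter `i` keeps increasing from one file to the next, which is the numbering behaviour the question asks for.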

durmus yılmaz · Answer

You can use this script:

grep -o -P '(parameter).*(parameter)' your_teX_file.teX > questions.txt

Once questions.txt holds all the questions, one per line, you can split it:

split -l 1 questions.txt