Reading from a text file and parsing lines into words in C

Question

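I am a beginner in C and systems programming. For a homework assignment, I need to write a program that reads input from standard input, parses the lines into words, and sends the words to sort subprocesses using System V message queues (to count words, for example). I got stuck at the input part. I am trying to process the input, remove non-alpha characters, convert all alpha words to lower case, and finally split each line into separate words. So far I can print all the alpha words in lower case, but there are blank lines between some words, which I don't think is correct. Could someone take a look and give me some suggestions?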

Example from a text file: The Project Gutenberg EBook of The Iliad of Homer, by Homer

I think the correct output should be:

    the
    project
    gutenberg
    ebook
    of
    the
    iliad
    of
    homer
    by
    homer

But my output is the following:

    project
    gutenberg
    ebook
    of
    the
    iliad
    of
    homer
                <------There is a line there
    by
    homer

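I think the empty line is caused by the space between the "," and "by". I tried things like "if isspace(c) then do nothing", but that doesn't work. My code is below. Any help or suggestion is appreciated.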

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <unistd.h>
    #include <string.h>

    //Main Function
    int main (int argc, char **argv)
    {
        int c;
        char *input = argv[1];
        FILE *input_file;

        input_file = fopen(input, "r");

        if (input_file == 0)
        {
            //fopen returns 0, the NULL pointer, on failure
            perror("Cannot open input file\n");
            exit(-1);
        }
        else
        {
            while ((c = fgetc(input_file)) != EOF)
            {
                //if it's an alpha, convert it to lower case
                if (isalpha(c))
                {
                    c = tolower(c);
                    putchar(c);
                }
                else if (isspace(c))
                {
                    ;   //do nothing
                }
                else
                {
                    c = '\n';
                    putchar(c);
                }
            }
        }

        fclose(input_file);
        printf("\n");
        return 0;
    }

Edit

I edited my code and finally got the correct output:

    int main (int argc, char **argv)
    {
        int c;
        char *input = argv[1];
        FILE *input_file;

        input_file = fopen(input, "r");

        if (input_file == 0)
        {
            //fopen returns 0, the NULL pointer, on failure
            perror("Cannot open input file\n");
            exit(-1);
        }
        else
        {
            int found_Word = 0;

            while ((c = fgetc(input_file)) != EOF)
            {
                //if it's an alpha, convert it to lower case
                if (isalpha(c))
                {
                    found_Word = 1;
                    c = tolower(c);
                    putchar(c);
                }
                else
                {
                    if (found_Word)
                    {
                        putchar('\n');
                        found_Word = 0;
                    }
                }
            }
        }

        fclose(input_file);
        printf("\n");
        return 0;
    }

Rob · Accepted Answer

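I think you just need to ignore any non-alpha character, i.e. when !isalpha(c) is true, and otherwise convert the character to lower case. To do that, you need to keep track of whether you are currently inside a word, so that a newline is printed only once, when the word ends.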

    int found_Word = 0;

    while ((c = fgetc(input_file)) != EOF)
    {
        if (!isalpha(c))
        {
            if (found_Word)
            {
                putchar('\n');
                found_Word = 0;
            }
        }
        else
        {
            found_Word = 1;
            c = tolower(c);
            putchar(c);
        }
    }

If you need to handle apostrophes within words such as "isn't", then this should do it:

    int found_Word = 0;
    int found_apostrophe = 0;

    while ((c = fgetc(input_file)) != EOF)
    {
        if (!isalpha(c))
        {
            if (found_Word)
            {
                if (!found_apostrophe && c == '\'')
                {
                    found_apostrophe = 1;
                }
                else
                {
                    found_apostrophe = 0;
                    putchar('\n');
                    found_Word = 0;
                }
            }
        }
        else
        {
            if (found_apostrophe)
            {
                putchar('\'');
                found_apostrophe = 0;
            }
            found_Word = 1;
            c = tolower(c);
            putchar(c);
        }
    }

abarnert · Answer

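I suspect you really want to treat all non-alpha characters as separators, rather than treating only spaces as separators and ignoring the other non-alpha characters. Otherwise, foo--bar would show up as the single word foobar, right? The good news is that this makes things easier: you can remove the isspace clause and just use the else clause.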

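Meanwhile, whether you treat punctuation specially or not, you have a problem: you print a newline for every separator character you encounter. So a line that ends with two separator characters in a row (for example \r\n), or even a sentence that ends with a period, prints a blank line. The obvious way around that is to keep track of the last character, or of a flag, so that you print a newline only if the character before it was a letter.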

For example:

    int last_c = 0;

    while ((c = fgetc(input_file)) != EOF)
    {
        //if it's an alpha, convert it to lower case
        if (isalpha(c))
        {
            c = tolower(c);
            putchar(c);
        }
        else if (isalpha(last_c))
        {
            //first separator after a word: end the line
            putchar('\n');
        }
        last_c = c;
    }

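But do you really want to treat all punctuation the same way? The problem statement implies that you do, but in real life that is a bit odd. For example, foo--bar should probably show up as the separate words foo and bar, but should it's really show up as the separate words it and s? For that matter, using isalpha as your rule for "word characters" also means that, say, 2nd will show up as nd.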

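So, if isalpha isn't the appropriate rule for distinguishing word characters from separator characters, you will have to write your own function that makes the right distinction. Such a rule is easy to express in logic (e.g., isalnum(c) || c == '\'') or with a table (just an array of 128 ints, so the function is c >= 0 && c < 128 && Word_char_table[c]). Doing it that way also means you can later extend your code to handle Latin-1 or Unicode, or to handle program text (which has different word characters than English text), or …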
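As a rough illustration of that idea, here is a minimal sketch of both forms of the rule, plus a splitter that uses it. The names is_word_char, word_char_table, init_word_char_table, is_word_char_tbl and split_words are hypothetical, and the rule itself (letters, digits and the apostrophe) is only one possible choice, not something the assignment specifies. The splitter works on a string buffer rather than a FILE so that it is easy to try out; the same loop should carry over to fgetc and putchar.

```c
#include <ctype.h>
#include <stddef.h>

/* Logic form of the rule (an assumed rule: ASCII letters, digits, apostrophe). */
int is_word_char(int c)
{
    return c >= 0 && c < 128 && (isalnum(c) || c == '\'');
}

/* Table form: one entry per ASCII code, filled in once at start-up. */
unsigned char word_char_table[128];

void init_word_char_table(void)
{
    for (int c = 0; c < 128; c++)
        word_char_table[c] = (unsigned char)(isalnum(c) || c == '\'');
}

int is_word_char_tbl(int c)
{
    return c >= 0 && c < 128 && word_char_table[c];
}

/* Copies text into out as lowercased words, one per line.
   Every run of non-word characters ends the current word exactly once.
   Output is truncated if out is too small. Returns the length written. */
size_t split_words(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    int in_word = 0;

    if (out_size == 0)
        return 0;

    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;

        if (is_word_char(c)) {
            if (n + 1 < out_size)
                out[n++] = (char)tolower(c);
            in_word = 1;
        } else if (in_word) {
            /* first separator after a word: end the line */
            if (n + 1 < out_size)
                out[n++] = '\n';
            in_word = 0;
        }
    }
    if (in_word && n + 1 < out_size)
        out[n++] = '\n';   /* terminate the last word */
    out[n] = '\0';
    return n;
}
```

With this rule, the input foo--bar it's 2nd comes out as the four lines foo, bar, it's and 2nd. One limitation of such a simple rule is that quotation marks written as apostrophes stay attached to the word, so 'hello' keeps its quotes.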

P0W · Answer

It looks like the words are separated by spaces, so

    while ((c = fgetc(input_file)) != EOF)
    {
        if (isalpha(c))
        {
            c = tolower(c);
            putchar(c);
        }
        else if (isspace(c))
        {
            putchar('\n');
        }
    }

will also work, provided the input text never has more than one space between words.
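If that condition cannot be guaranteed, one possible variation is to defer the newline until the next word actually starts, so that a run of spaces (or a trailing newline) produces at most one line break. The sketch below keeps the same rule as the loop above (whitespace separates words, other non-letters are simply dropped). The function name split_on_space and the string-buffer interface are assumptions made so the example is easy to test; they are not part of the original answer.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies text into out as lowercased words separated by single newlines.
   Whitespace separates words; any other non-letter is dropped.
   Output is truncated if out is too small. Returns the length written. */
size_t split_on_space(const char *text, char *out, size_t out_size)
{
    size_t n = 0;
    int pending_newline = 0;   /* whitespace seen since the last letter */

    if (out_size == 0)
        return 0;

    for (; *text != '\0'; text++) {
        unsigned char c = (unsigned char)*text;

        if (isalpha(c)) {
            /* emit the deferred separator only when a new word begins */
            if (pending_newline && n + 1 < out_size)
                out[n++] = '\n';
            pending_newline = 0;
            if (n + 1 < out_size)
                out[n++] = (char)tolower(c);
        } else if (isspace(c) && n > 0) {
            pending_newline = 1;   /* ignored before the first word */
        }
    }
    out[n] = '\0';
    return n;
}
```

Because the newline is written only when another word follows, this version produces no leading blank line and no trailing newline; the caller can add a final newline if one is wanted.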