Reading a text file in R with multiple spaces as the delimiter

Question

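I have a big data set consisting of about 94 columns and 3 million rows. The file uses both single and multiple spaces as the delimiter between columns. I need to read some of the columns from this file in R. To do this I tried read.table() with the options shown in the code below: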

    ### Defining the columns to be read from the file: the first 5 columns, then we do not read the next 24, after this we read the next 5 columns. The last 60 columns are not read in.
    col_classes = c(rep("character", 2), rep("numeric", 3), rep("NULL", 24), rep("numeric", 5), rep("NULL", 60))

    ### Reading first 100 rows of the data
    data <- read.table(file, sep = " ", header = F, nrows = 100, na.strings = "", stringsAsFactors = F)

The method above does not work because the file I need to read has multiple spaces as the delimiter between some of the columns. Is there a way to read this file efficiently?

Simon O'Hanlon · Accepted Answer

You need to change your delimiter. " " refers to exactly one whitespace character. "" refers to whitespace of any length as the delimiter.

    data <- read.table(file, sep = "", header = F, nrows = 100, na.strings = "", stringsAsFactors = F)

From the manual:

If sep = "" (the default for read.table) the separator is 'white space', that is one or more spaces, tabs, newlines or carriage returns.
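To make the difference concrete, here is a small sketch using made-up inline data (the `txt` string and its three-column layout are illustrative assumptions, not the asker's file). It also passes `colClasses`, because the `"NULL"` entries only take effect when the vector is actually handed to `read.table()`.

```r
# Made-up sample: three columns separated by one or more spaces.
txt <- "a1 1.5   10\nb2   2.5 20\n"

# sep = " " treats every single space as a field boundary, so each run
# of spaces produces extra empty fields (and extra columns).
bad <- read.table(text = txt, sep = " ", header = FALSE,
                  fill = TRUE, stringsAsFactors = FALSE)

# sep = "" treats any run of whitespace as one separator.
good <- read.table(text = txt, sep = "", header = FALSE,
                   stringsAsFactors = FALSE)

# "NULL" in colClasses drops a column: keep columns 1 and 3 only.
picked <- read.table(text = txt, sep = "", header = FALSE,
                     colClasses = c("character", "NULL", "numeric"))

print(ncol(bad))
print(ncol(good))
print(picked)
```

For the 94-column file, the same idea would mean passing the `col_classes` vector from the question as `colClasses = col_classes` together with `sep = ""`.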

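Also, with a large data file you may want to consider data.table::fread to quickly read the data straight into a data.table. I was using this function myself this morning. It is still experimental, but I find it works very well indeed.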
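A rough sketch of the `fread` route, assuming the data.table package is installed. The temporary file, its contents, and the column positions passed to `select` are invented for illustration. `sep = " "` is given explicitly; `fread` is expected to treat a run of spaces as a single separator in that case, but it is worth checking the result on a few rows of the real file first.

```r
library(data.table)

# Made-up sample file: three columns separated by one or more spaces.
path <- tempfile(fileext = ".txt")
writeLines(c("a1 1.5   10",
             "b2   2.5 20"), path)

# select keeps only the listed columns (here the 1st and 3rd), so the
# remaining columns are not kept in the result.
dt <- fread(path, sep = " ", header = FALSE, select = c(1, 3))

print(dt)
```

On the real file, `nrows = 100` can be added to try the call on a small slice before reading all 3 million rows.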

littlebird · Answer

If you would rather use the tidyverse (or just the readr package), you can use read_table instead.

read_table(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = "NA", skip = 0, n_max = Inf, guess_max = min(n_max, 1000), progress = show_progress(), comment = "")

And see the description here:

read_table() and read_table2() are designed to read the type of textual data where each column is separated by one (or more) columns of space.
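A minimal sketch of that call, assuming readr is installed. The file and its column names are invented, and the columns are deliberately padded so that they line up vertically, which is the layout the description above has in mind. Behaviour on ragged, unaligned input may differ between readr versions, so treat this as a starting point rather than a guarantee.

```r
library(readr)

# Made-up sample file with a header row; columns are padded with spaces
# so that every field starts at the same position on each line.
path <- tempfile(fileext = ".txt")
writeLines(c("id   name   value",
             "1    a      10.5",
             "22   bb     3"), path)

# Column types are guessed from the data when col_types is not given.
df <- read_table(path, col_names = TRUE)

print(df)
```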

cmbarbu · Answer

If your fields have a fixed width, you should consider using read.fwf(), which may handle missing values better.
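A short sketch of why fixed widths help with missing values, using an invented file whose layout (2, 5 and 3 characters per field) is an assumption made for this example. In the second row the middle field is blank. A whitespace-splitting reader would see only two fields on that line, whereas `read.fwf()` cuts by position and so keeps the gap in the right column.

```r
# Made-up fixed-width sample: id (2 chars), value (5 chars), count (3 chars).
path <- tempfile(fileext = ".txt")
writeLines(c("a1  1.5 10",
             "b2      20",   # the value field is blank (missing)
             "c3  3.5 30"), path)

# widths gives the number of characters per field; extra arguments such
# as strip.white are passed on to read.table().
fw <- read.fwf(path, widths = c(2, 5, 3),
               col.names = c("id", "value", "count"),
               strip.white = TRUE, stringsAsFactors = FALSE)

print(fw)
```

Columns can be skipped here too: a negative entry in `widths` tells `read.fwf()` to skip that many characters.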