OverflowError: Python int too large to convert to C long in torchtext.datasets.text_classification.DATASETS['AG_NEWS']()

Question

I have Python 3.6.8 installed on 64-bit Windows 10. I installed torch and torchtext with pip; the torch version is 1.2.0.

I am trying to load the AG_NEWS dataset with the following code:

    import torch
    import torchtext
    from torchtext.datasets import text_classification
    NGRAMS = 2
    import os
    if not os.path.isdir('./.data'):
        os.mkdir('./.data')
    train_dataset, test_dataset = text_classification.DATASETS['AG_NEWS'](root='./.data', ngrams=NGRAMS, vocab=None)

The last statement of the code above raises the following error:

    ---------------------------------------------------------------------------
    OverflowError                             Traceback (most recent call last)
    <ipython-input-1-7e8544fdaaf6> in <module>
          6 if not os.path.isdir('./.data'):
          7     os.mkdir('./.data')
    ----> 8 train_dataset, test_dataset = text_classification.DATASETS['AG_NEWS'](root='./.data', ngrams=NGRAMS, vocab=None)
          9 # BATCH_SIZE = 16
         10 # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    c:\users\pramodp\appdata\local\programs\python\python36\lib\site-packages\torchtext\datasets\text_classification.py in AG_NEWS(*args, **kwargs)
        168     """
        169
    --> 170     return _setup_datasets(*(("AG_NEWS",) + args), **kwargs)
        171
        172

    c:\users\pramodp\appdata\local\programs\python\python36\lib\site-packages\torchtext\datasets\text_classification.py in _setup_datasets(dataset_name, root, ngrams, vocab, include_unk)
        126     if vocab is None:
        127         logging.info('Building Vocab based on {}'.format(train_csv_path))
    --> 128         vocab = build_vocab_from_iterator(_csv_iterator(train_csv_path, ngrams))
        129     else:
        130         if not isinstance(vocab, Vocab):

    c:\users\pramodp\appdata\local\programs\python\python36\lib\site-packages\torchtext\vocab.py in build_vocab_from_iterator(iterator)
        555     counter = Counter()
        556     with tqdm(unit_scale=0, unit='lines') as t:
    --> 557         for tokens in iterator:
        558             counter.update(tokens)
        559             t.update(1)

    c:\users\pramodp\appdata\local\programs\python\python36\lib\site-packages\torchtext\datasets\text_classification.py in _csv_iterator(data_path, ngrams, yield_cls)
         33     with io.open(data_path, encoding="utf8") as f:
         34         reader = unicode_csv_reader(f)
    ---> 35         for row in reader:
         36             tokens = ' '.join(row[1:])
         37             tokens = tokenizer(tokens)

    c:\users\pramodp\appdata\local\programs\python\python36\lib\site-packages\torchtext\utils.py in unicode_csv_reader(unicode_csv_data, **kwargs)
        128             maxInt = int(maxInt / 10)
        129
    --> 130     csv.field_size_limit(sys.maxsize)
        131
        132     if six.PY2:

    OverflowError: Python int too large to convert to C long

The following code raises the same error, so I think the problem lies either with Windows or with torchtext.

    from torchtext import data

    pos = data.TabularDataset(
        path='data/pos/pos_wsj_train.tsv', format='tsv',
        fields=[('text', data.Field()),
                ('labels', data.Field())])

Can anyone help? Also, the files themselves don't contain any large numbers.

Nikhil Mehra · Accepted Answer

I ran into a similar problem. I changed one line of code in the torchtext\utils.py file, and the error went away.

Change this:

    csv.field_size_limit(sys.maxsize)

to this:

    csv.field_size_limit(maxInt)
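Why this works: CPython's csv module stores the field size limit in a C `long`, and on 64-bit Windows a `long` is only 32 bits, while `sys.maxsize` is 2**63 - 1. So the "large number" is `sys.maxsize` itself, not anything in your data. The loop just above line 130 of utils.py (its last line, 128, is visible in the traceback) already shrinks `maxInt` by a factor of 10 until `csv.field_size_limit` accepts it, so passing `maxInt` reuses a value known to fit. The sketch below reproduces that loop in isolation; `largest_accepted_field_size_limit` is a hypothetical helper name, and it uses integer division `//` where torchtext uses `int(maxInt / 10)`.

```python
import csv
import sys


def largest_accepted_field_size_limit() -> int:
    """Return the first of sys.maxsize, sys.maxsize // 10, ... that csv accepts.

    csv.field_size_limit raises OverflowError when the value does not fit
    in the platform's C long; we keep dividing by 10 until it stops raising.
    As a side effect, the accepted value becomes the active limit.
    """
    max_int = sys.maxsize
    while True:
        try:
            csv.field_size_limit(max_int)
            return max_int
        except OverflowError:
            max_int //= 10  # integer division avoids float rounding


if __name__ == "__main__":
    print("sys.maxsize:   ", sys.maxsize)
    print("accepted limit:", largest_accepted_field_size_limit())
```

On 64-bit Windows, where a C `long` tops out at 2,147,483,647, this should settle on 922337203; on 64-bit Linux and macOS, `sys.maxsize` itself is accepted on the first try.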

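If you would rather not edit files under site-packages (the change is lost whenever torchtext is reinstalled), one possible alternative is to patch `csv.field_size_limit` in your own script before calling `text_classification.DATASETS['AG_NEWS'](...)`. This is a sketch that assumes torchtext looks up `csv.field_size_limit` through the `csv` module at call time, as line 130 of the traceback suggests. `C_LONG_MAX` and `clamped_field_size_limit` are hypothetical names, and the patch affects every user of the csv module in the process.

```python
import csv
import ctypes

# Largest value a C long can hold on this platform:
# 2**31 - 1 on Windows (32-bit long), 2**63 - 1 on 64-bit Linux/macOS.
C_LONG_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_long) - 1) - 1

_original_field_size_limit = csv.field_size_limit


def clamped_field_size_limit(*args):
    """Replacement for positional calls to csv.field_size_limit.

    With no argument it returns the current limit; with one argument it
    clamps the new limit to C_LONG_MAX before delegating, so it never
    raises OverflowError.
    """
    if not args:
        return _original_field_size_limit()
    (new_limit,) = args
    return _original_field_size_limit(min(new_limit, C_LONG_MAX))


# Replace the attribute on the csv module; code that calls
# csv.field_size_limit(...) after this point gets the clamped version.
csv.field_size_limit = clamped_field_size_limit
```

With the patch in place, torchtext's `csv.field_size_limit(sys.maxsize)` call is clamped to the platform maximum instead of overflowing.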
Hope this helps.