UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Question

I'm simply trying to decode a string like \uXXXX\uXXXX\uXXXX, but I get an error:

    $ python
    Python 2.7.6 (default, Sep 9 2014, 15:04:36)
    [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print u'\u041e\u043b\u044c\u0433\u0430'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)

I'm a Python newbie. What is the problem? Thanks!

Martijn Pieters · Accepted Answer

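Python is trying to be helpful. You cannot decode Unicode data; it is already decoded. So Python first tries to encode the data (using the ASCII codec) to get bytes that it can then decode. It is this implicit encoding step that fails.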
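The implicit step can be written out by hand. The following is a minimal sketch, not part of the original answer, that assumes Python 2's default codec is ASCII and that calling `.decode('utf-8')` on a `unicode` object behaves like an ASCII encode followed by the requested decode. The names `text`, `error` and `encoded` are illustrative. It should run on Python 2.7 and on Python 3.3+, where the encode step always has to be spelled out.

```python
# The same text as in the question, written with escapes so the file stays ASCII.
text = u'\u041e\u043b\u044c\u0433\u0430'

error = None
try:
    # Roughly what Python 2 does for text.decode('utf-8'):
    # encode to bytes with the default (ASCII) codec, then decode those bytes.
    text.encode('ascii').decode('utf-8')
except UnicodeEncodeError as exc:
    error = exc
    # exc.end is exclusive, so the message reports positions start..end-1.
    print('%s: cannot encode positions %d-%d' % (
        type(exc).__name__, exc.start, exc.end - 1))

# Encoding with a codec that covers Cyrillic succeeds.
encoded = text.encode('utf-8')
print(repr(encoded))
```

The exception is raised by the encode call, before any UTF-8 decoding is attempted, which is why the traceback in the question names the 'ascii' codec rather than UTF-8.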

If you have Unicode data, it only makes sense to encode it to UTF-8, not to decode it:

    >>> print u'\u041e\u043b\u044c\u0433\u0430'
    Ольга
    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
    '\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do; no further decoding is necessary.

The same implicit conversion takes place in the other direction: if you try to encode a byte string, an implicit decode is triggered first:

    >>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)
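One way to stay clear of both implicit conversions is to check the type before converting. This is only a sketch under stated assumptions: `to_bytes` and `to_text` are hypothetical helper names, not functions from the standard library or from the answer above, and UTF-8 is assumed as the default encoding. The `type(u'')` and `type(b'')` trick is used so the same code runs on Python 2.7 and Python 3.3+.

```python
# Resolve the text and byte string types for the running interpreter:
# (unicode, str) on Python 2, (str, bytes) on Python 3.
text_type = type(u'')
bytes_type = type(b'')


def to_bytes(value, encoding='utf-8'):
    """Encode text to bytes; return byte strings unchanged."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value


def to_text(value, encoding='utf-8'):
    """Decode bytes to text; return text unchanged."""
    if isinstance(value, bytes_type):
        return value.decode(encoding)
    return value


name = u'\u041e\u043b\u044c\u0433\u0430'
data = to_bytes(name)

# Applying a helper twice is harmless, unlike .encode('utf8').encode('utf8').
assert to_bytes(data) == data
assert to_text(to_text(data)) == name
```

Because each helper only converts values of the opposite type, a byte string is never encoded again and Unicode text is never decoded, so the ASCII codec does not get involved.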

Ranvijay Sachan · Answer

You can set the default encoding to utf-8:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')