UnicodeEncodeError: 'latin-1' codec can't encode character

Question

What is causing this error when I try to insert a foreign character into the database?

>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

And how do I resolve it?

Thanks!

bobince · Accepted Answer

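The character U+201C LEFT DOUBLE QUOTATION MARK is not present in the Latin-1 (ISO-8859-1) encoding.

It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings: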

    >>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
    UnicodeEncodeError
    >>> u'He said \u201CHello\u201D'.encode('cp1252')
    'He said \x93Hello\x94'

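If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. But any other Unicode characters that are not present in cp1252 will still cause errors.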
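To make the limits of cp1252 concrete, here is a small Python 3 sketch. The helper name `can_encode` and the sample strings are made up for illustration; the sketch only checks which strings each codec can represent.

```python
def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, chosen to show the three cases.
samples = {
    "curly quotes": "\u201cHello\u201d",  # in cp1252, not in latin-1
    "euro sign": "\u20ac",                 # in cp1252, not in latin-1
    "e acute": "\u00e9",                   # in both
    "CJK": "\u65e5\u672c\u8a9e",           # in neither
}

for label, text in samples.items():
    support = {codec: can_encode(text, codec)
               for codec in ("latin-1", "cp1252", "utf-8")}
    print(label, support)
```

Only UTF-8 reports True for every sample, which is why switching codecs from latin-1 to cp1252 merely moves the failure to a different set of characters.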

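You can use encode(..., 'ignore') to suppress the errors by getting rid of the characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. You should also ideally tell MySQL that you are using UTF-8 strings (by setting the database connection and the collation on string columns), so that it can get case-insensitive comparison and sorting right.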
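As a rough illustration of that trade-off, the following Python 3 sketch compares the 'ignore' error handler with UTF-8. The 'replace' handler is not mentioned above and is included only for comparison; the variable names are illustrative.

```python
text = "He said \u201cHello\u201d"

# 'ignore' silently drops characters the codec cannot represent.
dropped = text.encode("latin-1", "ignore")

# 'replace' substitutes '?' for them (shown only for comparison).
replaced = text.encode("latin-1", "replace")

# UTF-8 can represent every character, so nothing is lost.
utf8 = text.encode("utf-8")

print(dropped)                       # b'He said Hello'
print(replaced)                      # b'He said ?Hello?'
print(utf8)                          # b'He said \xe2\x80\x9cHello\xe2\x80\x9d'
print(utf8.decode("utf-8") == text)  # True
```

Both error handlers lose information permanently, while the UTF-8 bytes decode back to the original string.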

Nick · Answer

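I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:

Using UTF8 with Python MySQLdb

Edit: Quoting from the above URL to satisfy the request in the first comment...

"UnicodeEncodeError: 'latin-1' codec can't encode character ..."

This is because MySQLdb normally tries to encode everything to latin-1. This can be fixed by executing the following commands right after you've established the connection: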

    db.set_character_set('utf8')
    dbc.execute('SET NAMES utf8;')
    dbc.execute('SET CHARACTER SET utf8;')
    dbc.execute('SET character_set_connection=utf8;')

"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().

Cheney · Answer

The best solution is:

1. Set MySQL's charset to 'utf-8'.
2. Do as this comment suggests (add use_unicode=True and charset="utf8"):

    db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8") – KyungHoon Kim Mar 13 '14 at 17:04

For details, see:

    class Connection(_mysql.connection):

        """MySQL Database Connection Object"""

        default_cursor = cursors.Cursor

        def __init__(self, *args, **kwargs):
            """
            Create a connection to the database. It is strongly recommended
            that you only use keyword parameters. Consult the MySQL C API
            documentation for more information.

            host
              string, host to connect
            user
              string, user to connect as
            passwd
              string, password to use
            db
              string, database to use
            port
              integer, TCP/IP port to connect to
            unix_socket
              string, location of unix_socket to use
            conv
              conversion dictionary, see MySQLdb.converters
            connect_timeout
              number of seconds to wait before the connection attempt
              fails.
            compress
              if set, compression is enabled
            named_pipe
              if set, a named pipe is used to connect (Windows only)
            init_command
              command which is run once the connection is created
            read_default_file
              file from which default client values are read
            read_default_group
              configuration group to use from the default file
            cursorclass
              class object, used to create cursors (keyword only)
            use_unicode
              If True, text-like columns are returned as unicode objects
              using the connection's character set. Otherwise, text-like
              columns are returned as strings. columns are returned as
              normal strings. Unicode objects will always be encoded to
              the connection's character set regardless of this setting.
            charset
              If supplied, the connection character set will be changed
              to this character set (MySQL-4.1 and newer). This implies
              use_unicode=True.
            sql_mode
              If supplied, the session SQL mode will be changed to this
              setting (MySQL-4.1 and newer). For more details and legal
              values, see the MySQL documentation.
            client_flag
              integer, flags to use or 0
              (see MySQL docs or constants/CLIENTS.py)
            ssl
              dictionary or mapping, contains SSL connection parameters;
              see the MySQL documentation for more details
              (mysql_ssl_set()). If this is set, and the client does not
              support SSL, NotSupportedError will be raised.
            local_infile
              integer, non-zero enables LOAD LOCAL INFILE; zero disables
            autocommit
              If False (default), autocommit is disabled.
              If True, autocommit is enabled.
              If None, autocommit isn't set and server default is used.

            There are a number of undocumented, non-standard methods. See the
            documentation for the MySQL C API for some hints on what they do.
            """

knitti · Answer

I hope your database is at least UTF-8. If so, you will need to run yourstring.encode('utf-8') before you try putting the string into the database.

jabley · Answer

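You are trying to store the Unicode code point \u201c using an encoding, ISO-8859-1 / Latin-1, that can't describe that code point. Either you need to alter the database to use utf-8 and store the string data using an appropriate encoding, or you need to sanitise your inputs before storing the content, e.g. using something like Sam Ruby's excellent i18n guide. That guide talks about the issues windows-1252 can cause, suggests how to process it, and links to sample code.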
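One possible way to sanitise input is sketched below in Python 3. This is an assumption for illustration, not the approach from the linked guide: it swaps a handful of common Windows-1252 punctuation marks for ASCII equivalents so that the text fits in Latin-1. The table `PUNCTUATION` and the function name `to_latin1_safe` are hypothetical, and the table is far from exhaustive.

```python
# Illustrative mapping of common "smart" punctuation to ASCII.
# This table is an assumption for the example, not an exhaustive list.
PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1_safe(text):
    """Swap known punctuation, then replace anything still outside Latin-1 with '?'."""
    cleaned = text.translate(PUNCTUATION)
    return cleaned.encode("latin-1", "replace").decode("latin-1")


print(to_latin1_safe("He said \u201cHello\u201d \u2026"))  # He said "Hello" ...
```

Characters that are already valid Latin-1, such as é, pass through unchanged. Anything outside the table and outside Latin-1 still becomes '?', so this kind of sanitising is lossy in a way that storing UTF-8 is not.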

mgojohn · Answer

SQLAlchemy users can simply declare the field with convert_unicode=True.

Example: sqlalchemy.String(1000, convert_unicode=True)

SQLAlchemy will then simply accept unicode objects and return them back, handling the encoding itself.

Docs

msw · Answer

Latin-1 (aka ISO 8859-1) is a single-octet character encoding scheme, and you can't fit \u201c (“) into a byte.

Did you mean to use UTF-8 encoding?
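The single-octet limit can be checked directly. This is a minimal Python 3 sketch with illustrative variable names:

```python
quote = "\u201c"

print(ord(quote))        # 8220, the code point of the left double quotation mark
print(ord(quote) < 256)  # False: Latin-1 can only hold code points 0..255

# Every Latin-1 byte decodes to the code point with the same number.
all_latin1 = bytes(range(256)).decode("latin-1")
print(all(ord(ch) == i for i, ch in enumerate(all_latin1)))  # True

# UTF-8 spends three bytes on the same character instead.
print(len(quote.encode("utf-8")))  # 3
```

This is where the "ordinal not in range(256)" part of the error message comes from.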

Uday Allu · Answer

Use the snippet below to convert accented Latin characters to their plain, unaccented English equivalents:

    import unicodedata

    def strip_accents(text):
        # Decompose each character, then drop the combining marks (category 'Mn').
        return "".join(char for char in unicodedata.normalize('NFKD', text)
                       if unicodedata.category(char) != 'Mn')

    strip_accents('áéíñóúü')

Output:

    'aeinouu'