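While converting a database to UTF-8, I noticed some strange behavior with the control characters 0x80-0x9F. For example, with the method below, 0x92 (right apostrophe) is not converted to UTF-8, and the rest of the column's content is truncated.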
CREATE TABLE `bar` (
`content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.06 sec)
SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.06 sec)
ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected, 1 warning (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 1
SHOW WARNINGS;
+---------+------+-------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\x80\x81\x82\x83\x84\x85...' for column 'content' at row 1 |
+---------+------+-------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
SELECT * FROM bar;
+---------+
| content |
+---------+
| |
+---------+
1 row in set (0.06 sec)
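Normally 0x80-0x9F are not allowed in Latin1, but MySQL appears to handle them differently:
MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as "undefined," whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. [src]
However, MySQL does not seem to be able to convert values in that range from its latin1 character set to the UTF-8 character set.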
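As a point of reference, the mapping the question expects can be sketched outside MySQL. The Python below is only an illustration, not what MySQL does internally: the helper name `mysql_latin1_to_utf8` is made up, and the fallback that sends the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to the matching C1 control code points is an assumption about MySQL's latin1.

```python
def mysql_latin1_to_utf8(data: bytes) -> bytes:
    """Transcode bytes stored as cp1252-style latin1 into UTF-8 (hypothetical helper)."""
    chars = []
    for b in data:
        raw = bytes([b])
        try:
            # cp1252 assigns printable characters to most of 0x80-0x9F.
            chars.append(raw.decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: the bytes cp1252 leaves undefined map to the
            # C1 control code point with the same number (ISO 8859-1).
            chars.append(raw.decode("latin-1"))
    return "".join(chars).encode("utf-8")


# The byte string inserted in the question.
sample = bytes.fromhex(
    "8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F"
)
converted = mysql_latin1_to_utf8(sample)

# 0x92 on its own becomes the three-byte UTF-8 sequence for U+2019.
print(mysql_latin1_to_utf8(b"\x92"))
```

If the ALTER behaved like this sketch, 0x92 would end up as the right single quotation mark and nothing after it would be lost.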
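These characters got into my database through copy/paste from Word documents (cp1252). I may have found a way to make the application enforce proper UTF-8 values for new entries, but I need to make sure the old ones are converted correctly.
Is there a way, from within MySQL, to convert these to UTF-8 without going through every row of every text column and replacing them with ASCII-friendly versions?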
I'm not sure. I started by trying to reproduce your problem, but the ALTER worked fine for me.
test > CREATE TABLE `bar` ( `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 0 rows affected (0.02 sec)
Query OK, 1 row affected (0.00 sec)
test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0
test > select * from bar;
+---------------------------------+
| content |
+---------------------------------+
| ����������������������������� |
+---------------------------------+
1 row in set (0.00 sec)
test > set names utf8;
Query OK, 0 rows affected (0.00 sec)
test > select * from bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
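One possible reading of the two SELECT results above, assuming the terminal expects UTF-8: while the results character set is latin1, the raw bytes 0x80-0x9F reach the terminal unchanged, and none of them is a valid UTF-8 sequence on its own, so each is displayed as a replacement character. After set names utf8 the server transcodes the results before sending them. A small Python sketch of the first half (the variable names are illustrative):

```python
# The bytes stored in the column, as a latin1 connection would return them.
raw = bytes.fromhex(
    "8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F"
)

# A UTF-8 terminal cannot decode lone bytes in 0x80-0x9F, so each one
# is rendered as U+FFFD (the replacement character).
shown = raw.decode("utf-8", errors="replace")

print(shown == "\ufffd" * len(raw))
```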
Here are my relevant character set settings:
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Edit
My character set settings before running set names utf8:
test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
Version
test > select version();
+-------------------------+
| version() |
+-------------------------+
| 5.1.41-3ubuntu12.10-log |
+-------------------------+
1 row in set (0.00 sec)
You may need to convert the character set to cp1250 before loading the data.
I ran this first:
mysql> show character set like 'cp%';
+---------+---------------------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------------------+-------------------+--------+
| cp850 | DOS West European | cp850_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
+---------+---------------------------+-------------------+--------+
8 rows in set (0.00 sec)
cp1252 is not present here. The closest is cp1250.
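It may be worth checking how close cp1250 really is before relying on it. Going by Python's codec tables (an assumption that they match MySQL's tables for the same names), the two Windows code pages agree on 0x92 but disagree on bytes such as 0x8C:

```python
# Decode the same byte with both code pages and compare the results.
samples = {
    b: (bytes([b]).decode("cp1252"), bytes([b]).decode("cp1250"))
    for b in (0x92, 0x8C)
}

for byte, (as_cp1252, as_cp1250) in samples.items():
    print(hex(byte), as_cp1252, as_cp1250, as_cp1252 == as_cp1250)
```

So text pasted from Word could come out with different letters where the two code pages diverge.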
Try this sequence:
drop database if exists dtest;
create database dtest;
use dtest
set names cp1250;
CREATE TABLE `bar` (
`content` text
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
SELECT content FROM bar;
SHOW VARIABLES LIKE '%char%';
set names utf8;
SHOW VARIABLES LIKE '%char%';
ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
SELECT content FROM bar;
and see what happens.
I got this on MySQL 5.5.19 for Linux:
mysql> drop database if exists dtest;
Query OK, 0 rows affected (0.00 sec)
mysql> create database dtest;
Query OK, 1 row affected (0.00 sec)
mysql> use dtest
Database changed
mysql> set names cp1250;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `bar` (
-> `content` text
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT content FROM bar;
+---------------------------------+
| content |
+---------------------------------+
| ??
?????? |
+---------------------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | cp1250 |
| character_set_connection | cp1250 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | cp1250 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------ŽÂÂâââââ---------------------------------------------------+
| â¬ÂâÆââ¦â â¡â°Å â¹Å ¢Å¡âºÅÂ
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
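The garbled final SELECT in the Linux run looks like correct UTF-8 output being displayed by a terminal that reads it as a single-byte character set. That is a guess from the visible characters, not something the transcript confirms. Under that assumption, the effect can be reproduced in Python:

```python
# First three characters the column should contain after conversion.
text = "\u20ac\u201a\u0192"  # euro sign, low single quote, florin

# What the server would send when the results character set is utf8.
utf8_bytes = text.encode("utf-8")

# What a Latin-1 terminal would make of those bytes: one character per byte.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))

# The data itself is intact: reversing the misreading recovers the text.
recovered = mojibake.encode("latin-1").decode("utf-8")
```

If this is what happened, the stored data is fine and only the terminal encoding needs to change.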
I got this on MySQL for Windows 5.5.12 on my Windows 7 machine:
mysql> drop database if exists dtest;
Query OK, 1 row affected (0.00 sec)
mysql> create database dtest;
Query OK, 1 row affected (0.02 sec)
mysql> use dtest
Database changed
mysql> set names cp1250;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE `bar` (
-> `content` text
-> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT content FROM bar;
+---------------------------------+
| content |
+---------------------------------+
| Ç?é?äàåçëèï??Ä??æÆôöòûù?ÖÜ¢??₧? |
+---------------------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | cp1250 |
| character_set_connection | cp1250 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | cp1250 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\MySQL_5.5.12\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\MySQL_5.5.12\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)
mysql> ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.06 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT content FROM bar;
+---------------------------------------------------------------------------------+
| content |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
Give it a try!!!