UTF-8 problems when reading a CSV file with fgetcsv

Question

I tried to read a CSV file and echo its contents, but the output shows the wrong characters.

Mäx Müstermänn -> MÃ¤x MÃ¼stermÃ¤nn

The CSV file is encoded as UTF-8 without BOM (checked with Notepad++).

This is the content of the CSV file:

"Mäx";"Müstermänn"

My PHP script

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<?php
$handle = fopen("specialchars.csv", "r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv($handle, 1000, ";")) {
    $num = count($data);
    for ($c = 0; $c < $num; $c++) {
        // output data
        echo "<td>$data[$c]</td>";
    }
    echo "</tr><tr>";
}
?>
</body>
</html>

I tried using setlocale(LC_ALL, 'de_DE.utf8'); as suggested here, without success. The content is still displayed incorrectly.

What am I missing?

Edit:

echo mb_detect_encoding($data[$c],'UTF-8'); prints UTF-8 UTF-8.

echo file_get_contents("specialchars.csv"); gives me "MÃ¤x";"MÃ¼stermÃ¤nn".

And

print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))

gives me

Array ( [0] => MÃ¤x [1] => MÃ¼stermÃ¤nn )

What does this mean?

testing · Accepted Answer

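I got it working now, after removing the header command. I think the problem was that the PHP file itself was encoded as ISO-8859-1. I set it to UTF-8 without BOM. I thought I had already done that, but perhaps I undid it with one undo too many.

In addition, I used SET NAMES 'utf8' for the database. Now the data is correct in the database as well.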

robssanches · Answer

Try this:

<?php
$handle = fopen("specialchars.csv", "r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
while ($data = fgetcsv($handle, 1000, ";")) {
    $data = array_map("utf8_encode", $data); // added
    $num = count($data);
    for ($c = 0; $c < $num; $c++) {
        // output data
        echo "<td>$data[$c]</td>";
    }
    echo "</tr><tr>";
}
?>
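One caveat with the snippet above: utf8_encode assumes its input is ISO-8859-1, so applying it to data that is already UTF-8 encodes it a second time, which yields exactly the MÃ¤x pattern from the question. The following is a minimal sketch of a guard against that. `toUtf8Once` is a hypothetical helper name, and the sketch assumes that the mbstring extension is loaded and that anything which is not valid UTF-8 is ISO-8859-1. It uses mb_convert_encoding because utf8_encode is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert a field to UTF-8 only if it is not valid UTF-8 already.
// Assumption: any input that is not valid UTF-8 is ISO-8859-1.
function toUtf8Once(string $field): string
{
    if (mb_check_encoding($field, 'UTF-8')) {
        return $field; // already UTF-8, leave untouched
    }
    return mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1');
}

$utf8  = "M\u{00E4}x"; // "Mäx" as UTF-8 bytes
$latin = "M\xE4x";     // "Mäx" as ISO-8859-1 bytes

// Blindly converting UTF-8 input reproduces the mojibake from the question.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

// With the guard, both inputs end up as the same UTF-8 string.
$row = array_map('toUtf8Once', [$utf8, $latin]);
```

In the loop above, this would replace the array_map("utf8_encode", $data) call with array_map('toUtf8Once', $data).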

user2992220 · Answer

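I had a similar problem: parsing a CSV file with special characters such as é, è, ö, and so on.

The following worked for me:

To display the characters correctly on the HTML page, I needed this header:

header('Content-Type: text/html; charset=UTF-8');

To parse every character correctly, I used:

utf8_encode(fgets($file));

Don't forget to use the "Multibyte String Functions" in all subsequent string operations, for example:

mb_strtolower($value, 'UTF-8');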
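To illustrate why the multibyte functions matter, here is a small sketch comparing byte-oriented and character-oriented calls on a UTF-8 string. The variable names are illustrative only, and the mbstring extension is assumed to be loaded. The string is written with a \u{...} escape so the example does not depend on the encoding of the PHP file itself.

```php
<?php
$name = "M\u{00C4}X"; // "MÄX" as UTF-8; "Ä" takes two bytes

$byteLength = strlen($name);                 // counts bytes
$charLength = mb_strlen($name, 'UTF-8');     // counts characters
$lower      = mb_strtolower($name, 'UTF-8'); // lowercases "Ä" as well
$firstTwo   = mb_substr($name, 0, 2, 'UTF-8');

// substr works on bytes and cuts "Ä" in half, leaving invalid UTF-8.
$brokenCut      = substr($name, 0, 2);
$brokenCutValid = mb_check_encoding($brokenCut, 'UTF-8');
```

Here strlen reports 4 while mb_strlen reports 3, and the byte-based cut is no longer valid UTF-8.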

Andreas Stokholm · Answer

Try putting this at the top of your file (before any other output):

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Manvel · Answer

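The problem is that the function reports UTF-8 (you can check this with mb_detect_encoding) but performs no conversion, so the bytes are simply treated as if they were UTF-8. You therefore need iconv to convert from the file's original encoding (Windows-1251, also known as CP1251). Since fgetcsv returns an array, I suggest writing a custom function instead. [Sorry for my English]

function customfgetcsv(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false) {
        // convert the raw line from CP1251 to UTF-8, then split it
        return explode($separator, iconv("CP1251", "UTF-8", $buffer));
    }
    return false;
}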
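As an alternative to a custom line reader, the conversion can be attached to the stream itself, so that fgetcsv keeps doing the parsing. The sketch below uses PHP's built-in convert.iconv.* stream filter. `readCsvAsUtf8` is a hypothetical helper name, the iconv extension is assumed to be available, and the caller is assumed to know the source encoding.

```php
<?php
// Hypothetical helper: read a delimited file whose bytes are in $sourceEncoding
// and return all rows with their fields converted to UTF-8.
function readCsvAsUtf8(string $path, string $sourceEncoding, string $separator = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Built-in iconv stream filter: converts the data as it is read.
    $filter = stream_filter_append(
        $handle,
        "convert.iconv.$sourceEncoding/UTF-8",
        STREAM_FILTER_READ
    );
    if ($filter === false) {
        fclose($handle);
        throw new RuntimeException("Cannot convert from $sourceEncoding");
    }

    $rows = [];
    // Enclosure and escape are passed explicitly; 0 means no line length limit.
    while (($data = fgetcsv($handle, 0, $separator, '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);

    return $rows;
}
```

A call such as readCsvAsUtf8('specialchars.csv', 'CP1251') would then return rows that can be echoed directly into a UTF-8 page, and quoted fields containing the separator stay intact because fgetcsv handles the splitting.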

Petr Hladík · Answer

In my case, the source file was encoded as windows-1250, and iconv printed tons of notices about illegal characters in the input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param resource &$handle
 * @param integer $length
 * @param string $separator
 *
 * @return array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false) {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic conversion of a windows-1250 or iso-8859-2 string into utf-8
 *
 * @param string $s
 *
 * @return string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;
    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);
    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

In response to @manvel's answer: use str_getcsv instead of explode, because of cases like this one:

some;Nice;value;"and;here;comes;combinated;value";and;some;others

explode splits the string into these parts:

some
Nice
value
"and
here
comes
combinated
value"
and
some
others

whereas str_getcsv splits it into these parts:

some
Nice
value
and;here;comes;combinated;value
and
some
others
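The difference can be checked with a short sketch using the sample line above. The variable names are illustrative. The enclosure and escape arguments are passed to str_getcsv explicitly so that the call does not rely on default values.

```php
<?php
$line = 'some;Nice;value;"and;here;comes;combinated;value";and;some;others';

// explode knows nothing about quoting and splits on every semicolon.
$naive = explode(';', $line);

// str_getcsv honours the enclosure character and keeps the quoted field whole.
$parsed = str_getcsv($line, ';', '"', '\\');

$naiveCount  = count($naive);
$parsedCount = count($parsed);
$quotedField = $parsed[3];
```

explode yields 11 parts here, while str_getcsv yields 7, with the quoted value preserved as a single field.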