Output Unicode strings in a Windows console app

Question

Hi, I tried to output a Unicode string to the console using iostreams and failed.

I found this: Using Unicode font in C++ console app, and this snippet works.

    SetConsoleOutputCP(CP_UTF8);
    wchar_t s[] = L"èéøÞǽлљΣæča";
    int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char* m = new char[bufferSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
    wprintf(L"%S", m);
    delete[] m; // release the conversion buffer

However, I could not find a way to output Unicode correctly with iostreams. Any suggestions?

This does not work:

    SetConsoleOutputCP(CP_UTF8);
    utf8_locale = locale(old_locale, new boost::program_options::detail::utf8_codecvt_facet());
    wcout.imbue(utf8_locale);
    wcout << L"¡Hola!" << endl;

EDIT: I could not find any solution other than wrapping this snippet in a stream. I hope somebody has a better idea.

    // Unicode output for a Windows console
    ostream &operator-(ostream &stream, const wchar_t *s)
    {
        int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
        char *buf = new char[bufSize];
        WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
        wprintf(L"%S", buf);
        delete[] buf;
        return stream;
    }

    ostream &operator-(ostream &stream, const wstring &s)
    {
        stream - s.c_str();
        return stream;
    }

DuckMaestro · Answer

I have verified a solution here using Visual Studio 2010, by way of this MSDN article and MSDN blog post. The trick is an obscure call to _setmode(..., _O_U16TEXT).

Solution:

    #include <iostream>
    #include <stdio.h> // _fileno, stdout
    #include <io.h>
    #include <fcntl.h>

    int wmain(int argc, wchar_t* argv[])
    {
        _setmode(_fileno(stdout), _O_U16TEXT);
        std::wcout << L"Testing unicode -- English -- Ελληνικά -- Español." << std::endl;
    }
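Not part of the original answer, but one way to keep the mode change contained is a small scoped guard. The sketch below assumes the Microsoft CRT on Windows, where _setmode returns the previous mode or -1 on failure, and compiles to a no-op elsewhere. ScopedU16Stdout and print_wide_line are hypothetical names chosen for this illustration.

```cpp
#include <cstdio>
#include <iostream>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper (not from the answer): puts stdout into UTF-16 text
// mode for the lifetime of the object and restores the old mode afterwards.
class ScopedU16Stdout {
public:
    ScopedU16Stdout() {
#ifdef _WIN32
        std::fflush(stdout);  // flush pending output before switching modes
        previous_ = _setmode(_fileno(stdout), _O_U16TEXT);
#endif
    }
    ~ScopedU16Stdout() {
#ifdef _WIN32
        if (previous_ != -1) {
            std::wcout.flush();
            _setmode(_fileno(stdout), previous_);
        }
#endif
    }
    ScopedU16Stdout(const ScopedU16Stdout&) = delete;
    ScopedU16Stdout& operator=(const ScopedU16Stdout&) = delete;

    // True if the mode switch happened (always false off Windows).
    bool active() const { return previous_ != -1; }

private:
    int previous_ = -1;
};

// Prints one wide string while the guard is in effect.
void print_wide_line(const wchar_t* text) {
    ScopedU16Stdout guard;
    std::wcout << text << L'\n';
}
```

Calling print_wide_line(L"Ελληνικά") from wmain would then leave stdout in its original mode once the call returns.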

Screenshot: Unicode in console

David · Answer

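Unicode Hello World in Chinese

Here is a Hello World in Chinese. Actually it is just "Hello". I tested this on Windows 10, but I think it may work on Windows Vista and later. Before Windows Vista it will be hard if you want a programmatic solution rather than configuring the console, the registry, and so on. If you really need to do this on Windows 7, have a look here: Change console font Windows 7

I don't want to claim this is the only solution, but this is what worked for me.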

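Outline

1. Unicode project setup
2. Set the console code page to Unicode
3. Find and use a font that supports the characters you want to display
4. Use the locale of the language you want to display
5. Use wide-character output, i.e. std::wcout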

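1. Project setup

I am using Visual Studio 2017 CE. I created an empty console app. The default settings are fine. But if you run into problems or use a different IDE, you may want to check the following:

In the project properties, find Configuration Properties -> General -> Project Defaults -> Character Set. It should be "Use Unicode Character Set", not "Multi-Byte". This defines the _UNICODE and UNICODE preprocessor macros.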

    int wmain(int argc, wchar_t* argv[])

Also, I think we should use the wmain function instead of main. Both work, but in a Unicode environment wmain may be more convenient.

Also, my source files are encoded as UTF-16-LE, which seems to be the default in Visual Studio 2017.

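2. Console code page

This is quite obvious. We need a Unicode code page in the console. If you want to check your default code page, just open a console and type chcp without any arguments. We have to change it to 65001, which is the UTF-8 code page (see Windows Code Page Identifiers). There is a preprocessor macro for that code page: CP_UTF8. I needed to set both the input and the output code page. When I omitted either one, the output was incorrect.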

    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

You might also want to check the BOOL return values of these functions.
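A minimal sketch of such a check, which also puts the previous code pages back when it goes out of scope. ScopedUtf8Console is a hypothetical helper name, not something from this answer. Off Windows it does nothing and simply assumes the terminal already uses UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switches both console code pages to UTF-8 and
// restores the previous values on destruction.
class ScopedUtf8Console {
public:
    ScopedUtf8Console() {
#ifdef _WIN32
        oldOutput_ = GetConsoleOutputCP();
        oldInput_ = GetConsoleCP();
        // Both setters return a BOOL; zero means the call failed.
        const bool outputOk = SetConsoleOutputCP(CP_UTF8) != 0;
        const bool inputOk = SetConsoleCP(CP_UTF8) != 0;
        ok_ = outputOk && inputOk;
#else
        ok_ = true;  // assumption: nothing to configure off Windows
#endif
    }
    ~ScopedUtf8Console() {
#ifdef _WIN32
        // A saved value of 0 is treated here as "unknown" and left alone.
        if (oldOutput_ != 0) SetConsoleOutputCP(oldOutput_);
        if (oldInput_ != 0) SetConsoleCP(oldInput_);
#endif
    }
    ScopedUtf8Console(const ScopedUtf8Console&) = delete;
    ScopedUtf8Console& operator=(const ScopedUtf8Console&) = delete;

    // True if both code pages were switched successfully.
    bool ok() const { return ok_; }

private:
    unsigned int oldOutput_ = 0;
    unsigned int oldInput_ = 0;
    bool ok_ = false;
};
```

Declaring one ScopedUtf8Console at the top of wmain and testing ok() before printing keeps the setup and the cleanup in one place.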

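3. Choose a font

So far I have not found a console font that supports every character, so I had to choose one. If you want to output characters that are partly available only in one font and partly only in another, I believe it is impossible to find a solution. Only maybe if there is a font out there that supports every character. But I also did not look into how to install a font.

I think it is not possible to use two different fonts in the same console window at the same time.

How do you find a compatible font? Open your console, click the icon in the top-left corner of the window, and go to the properties of the console window. Go to the Font tab, choose a font, and click OK. Then try to enter your characters in the console window. Repeat this until you find a font you can work with. Then note down the name of the font.

You can also change the size of the font in the properties window. Once you have found a size you are happy with, note down the size values shown in the "Selected Font" section of the properties window. It shows the width and height in pixels.

To actually set the font programmatically, use: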

    CONSOLE_FONT_INFOEX fontInfo;
    // ... configure fontInfo
    SetCurrentConsoleFontEx(hConsole, false, &fontInfo);

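For details, see my example at the end of this answer, or look it up in the fine manual: SetCurrentConsoleFont. This function only exists since Windows Vista.

4. Set the locale

We need to set the locale to that of the language whose characters we want to print.

    char* a = setlocale(LC_ALL, "chinese");

The return value is interesting. It contains a string that describes exactly which locale was chosen. Just give it a try :-) I tested with chinese and german. More info: setlocale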
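If you want to inspect that return value yourself, a small sketch like the following may help. resolved_locale_name is a hypothetical helper, and the exact string you get back for a name such as "chinese" depends on the C runtime. Note that the call changes the process-wide locale as a side effect.

```cpp
#include <clocale>
#include <string>

// Asks the C runtime to switch to the requested locale and returns the name
// it reports back, or an empty string if the runtime rejects the request.
std::string resolved_locale_name(const char* requested) {
    const char* chosen = std::setlocale(LC_ALL, requested);
    return chosen ? std::string(chosen) : std::string();
}
```

Printing resolved_locale_name("chinese") and resolved_locale_name("german") side by side shows what the runtime actually selected for each name.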

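5. Use wide-character output

Not much to say here. If you want to output wide characters, use this for example:

    std::wcout << L"你好" << std::endl;

Oh, and don't forget the L prefix for wide characters! Also, if you type literal Unicode characters like this into the source file, the source file must be Unicode encoded. The default in Visual Studio is UTF-16-LE. Alternatively, you can use Notepad++ and set the encoding to UCS-2 LE BOM.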

Example

Finally, I put it all together as an example:

    #include <Windows.h>
    #include <algorithm> // std::copy
    #include <iostream>
    #include <io.h>
    #include <fcntl.h>
    #include <locale.h>
    #include <wincon.h>

    int wmain(int argc, wchar_t* argv[])
    {
        SetConsoleTitle(L"My Console Window - 你好");
        HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);

        char* a = setlocale(LC_ALL, "chinese");
        SetConsoleOutputCP(CP_UTF8);
        SetConsoleCP(CP_UTF8);

        CONSOLE_FONT_INFOEX fontInfo;
        fontInfo.cbSize = sizeof(fontInfo);
        fontInfo.FontFamily = 54;
        fontInfo.FontWeight = 400;
        fontInfo.nFont = 0;
        const wchar_t myFont[] = L"KaiTi";
        fontInfo.dwFontSize = { 18, 41 };
        std::copy(myFont, myFont + (sizeof(myFont) / sizeof(wchar_t)), fontInfo.FaceName);

        SetCurrentConsoleFontEx(hConsole, false, &fontInfo);

        std::wcout << L"Hello World!" << std::endl;
        std::wcout << L"你好!" << std::endl;
        return 0;
    }

Cheers!

Puppy · Answer

wcout needs to have its locale set separately from the CRT's. Here is how it can be fixed:

    int _tmain(int argc, _TCHAR* argv[])
    {
        char* locale = setlocale(LC_ALL, "English"); // Get the CRT's current locale.
        std::locale lollocale(locale);
        setlocale(LC_ALL, locale); // Restore the CRT.
        std::wcout.imbue(lollocale); // Now set the std::wcout to have the locale that we got from the CRT.
        std::wcout << L"¡Hola!";
        std::cin.get();
        return 0;
    }

I just tested it, and it displays the string here absolutely fine.

Henrik Haftmann · Answer

SetConsoleCP() and chcp are not the same!

Take a look at this program snippet:

    SetConsoleCP(65001); // 65001 = UTF-8
    static const char s[] = "tränenüberströmt™\n";
    DWORD slen = lstrlenA(s); // narrow-string length, matches WriteConsoleA
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), s, slen, &slen, NULL);

The source code must be saved as UTF-8 without a BOM (byte order mark; signature). Then the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which does not match CP65001 (= UTF-8).
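To check which of the two cases a given source file falls into, you can look at its first three bytes. Here is a minimal sketch (both helper names are made up for this example); the UTF-8 BOM is the byte sequence EF BB BF.

```cpp
#include <fstream>
#include <string>

// Returns true if the byte sequence starts with the UTF-8 BOM (EF BB BF).
bool starts_with_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Reads the first three bytes of a file and checks them for the BOM.
// A file that cannot be opened or is shorter than three bytes yields false.
bool file_has_utf8_bom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char head[3] = {};
    in.read(head, 3);
    return in.gcount() == 3 && starts_with_utf8_bom(std::string(head, 3));
}
```

Running file_has_utf8_bom on the .cpp file before compiling tells you whether the editor added a signature.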

Change the display font to Lucida Console; otherwise UTF-8 output will not work at all.

Type: chcp
Answer: 850
Type: test.exe
Answer: tr├ñnen├╝berstr├ÂmtÔäó
Type: chcp
Answer: 65001 - this setting has been changed by SetConsoleCP(), but with no useful effect.
Type: chcp 65001
Type: test.exe
Answer: tränenüberströmt™ - all OK now.

Tested with: German Windows XP SP3

newtover · Answer

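I recently wanted to stream Unicode from Python to the Windows console, and here is the minimum you need to do:

You should set the console font to one that covers Unicode symbols. There is not a wide choice: Console Properties > Font > Lucida Console
You should change the current console code page: run chcp 65001 in the console, or use the corresponding method in C++ code
Write to the console using WriteConsoleW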
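As a rough sketch of the last step in C++ (this is an illustration, not code from the answer): WriteConsoleW only works on a real console handle, so the hypothetical helper write_utf16 below uses GetConsoleMode to detect a console and falls back to writing UTF-8 bytes with WriteFile when stdout is redirected. The non-Windows branch is only there so the sketch compiles elsewhere.

```cpp
#include <cstdio>
#include <cwchar>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: writes UTF-16 text to stdout, returns true on success.
bool write_utf16(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    DWORD written = 0;
    if (GetConsoleMode(out, &mode)) {
        // stdout is a real console: hand the UTF-16 text over directly.
        return WriteConsoleW(out, text.c_str(),
                             static_cast<DWORD>(text.size()),
                             &written, nullptr) != 0;
    }
    // stdout is a file or pipe: convert to UTF-8 and write raw bytes.
    if (text.empty()) return true;
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, text.c_str(),
                                          static_cast<int>(text.size()),
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return false;
    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, text.c_str(),
                        static_cast<int>(text.size()),
                        &utf8[0], bytes, nullptr, nullptr);
    return WriteFile(out, utf8.data(), static_cast<DWORD>(utf8.size()),
                     &written, nullptr) != 0;
#else
    // Assumption: off Windows the C library's wide output is good enough.
    return std::fputws(text.c_str(), stdout) >= 0;
#endif
}
```

With the fallback branch, something like myapp.exe > out.txt produces a UTF-8 file rather than an empty one.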

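Take a look at an interesting article about Java Unicode on the Windows console.

Besides, in Python you cannot write to the default sys.stdout in this case; you need to substitute it with something that uses os.write(1, binarystring), or call a wrapper around WriteConsoleW directly. It seems the same is true in C++.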

Victor Gubin · Answer

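There are a few issues with msvcrt and io streams.

The trick _setmode(_fileno(stdout), _O_U16TEXT); works only with MS VC++, not with MinGW-GCC. Moreover, it sometimes crashes, depending on the Windows configuration.
SetConsoleCP(65001) for UTF-8 may fail in many multibyte character scenarios, but is always OK for UTF-16LE
You need to restore the previous console code page on application exit.

The Windows console supports UNICODE with the ReadConsole and WriteConsole functions in UTF-16LE mode. The side effect is that piping will not work in this case, i.e. myapp.exe >> ret.log yields a 0-byte ret.log file. If you are OK with this fact, you can try my library as follows.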

    const char* umessage = "Hello!\nПривет!\nПривіт!\nΧαιρετίσματα!\nHelló!\nHallå!\n";

    ...
    #include <console.hpp>
    #include <ios>
    ...

    std::ostream& cout = io::console::out_stream();
    cout << umessage
         << 1234567890ull << '\n'
         << 123456.78e+09 << '\n'
         << 12356.789e+10L << '\n'
         << std::hex << 0xCAFEBABE
         << std::endl;

The library auto-converts your UTF-8 into UTF-16LE and writes it to the console using WriteConsole. There are also error and input streams. Another benefit of the library is colors.
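For readers curious what a UTF-8 to UTF-16 conversion involves, here is a simplified, portable sketch. It is not the library's actual code (on Windows, MultiByteToWideChar would be the usual choice), and utf8_to_utf16 is a hypothetical name. Malformed bytes become U+FFFD; overlong encodings and encoded surrogates are not rejected, so treat it as an illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes UTF-8 bytes into UTF-16 code units.
// Simplified: invalid input is replaced with U+FFFD instead of reported.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp = 0xFFFD;  // default for invalid lead bytes
        std::size_t extra = 0;      // continuation bytes still expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }

        for (std::size_t k = 0; k < extra; ++k) {
            const bool valid = i < in.size() &&
                (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80;
            if (!valid) { cp = 0xFFFD; break; }  // truncated sequence
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (cp > 0x10FFFF) cp = 0xFFFD;

        if (cp >= 0x10000) {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

The resulting code units are what a function such as WriteConsoleW expects, since wchar_t is 16 bits wide on Windows.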

Link to the example app: https://github.com/incoder1/IO/tree/master/examples/iostreams

Library home page: https://github.com/incoder1/IO


Afriza N. Arief · Answer

First of all, sorry: I probably don't have the required fonts, so I cannot test this yet.

Something looks a bit fishy here:

    // the following is said to be working
    SetConsoleOutputCP(CP_UTF8); // output is in UTF8
    wchar_t s[] = L"èéøÞǽлљΣæča";
    int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char* m = new char[bufferSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
    wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
                       //     lower case %s in wprintf() is used for WideChar
    printf("%s", m);   // <-- does this work as well? try it to verify my assumption

while

    // the following is said to have problem
    SetConsoleOutputCP(CP_UTF8);
    utf8_locale = locale(old_locale, new boost::program_options::detail::utf8_codecvt_facet());
    wcout.imbue(utf8_locale);
    wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
    // have you tried passing the multibyte equivalent by converting to utf8 first?
    int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char* m = new char[bufferSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
    cout << m << endl;

How about this?

    // without setting locale to UTF8, you pass WideChars
    wcout << L"¡Hola!" << endl;

    // set locale to UTF8 and use cout
    SetConsoleOutputCP(CP_UTF8);
    cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
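A minimal sketch of that second variant, with the UTF-8 bytes of "¡Hola!" spelled out as escapes so the result does not depend on how the source file is saved. The helper names are hypothetical. Like the suggestion itself this is untested on a real Windows console, and whether the text renders correctly still depends on the Windows version, the CRT, and the console font.

```cpp
#include <iostream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// "¡Hola!" as explicit UTF-8 bytes: U+00A1 encodes as C2 A1.
std::string hola_utf8() {
    return "\xC2\xA1Hola!";
}

// Switches the console output code page to UTF-8 (Windows only) and sends
// the already-encoded bytes through the narrow stream.
bool print_utf8_line(const std::string& utf8) {
#ifdef _WIN32
    SetConsoleOutputCP(CP_UTF8);
#endif
    std::cout << utf8 << '\n';
    return std::cout.good();
}
```

Here print_utf8_line(hola_utf8()) plays the role of the cout line in the snippet above.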

Joma · Answer

Default encoding:

Windows: UTF-16.

Linux: UTF-8.

MacOS: UTF-8.

My solution steps include null characters \0 (to avoid truncation), without using functions from the windows.h header:

Add macros to detect the platform.

    #if defined (_WIN32)
    #define WINDOWSLIB 1

    #elif defined (__ANDROID__) || defined(ANDROID) // Android
    #define ANDROIDLIB 1

    #elif defined (__APPLE__) // iOS, Mac OS
    #define MACOSLIB 1

    #elif defined (__LINUX__) || defined(__gnu_linux__) || defined(__linux__) // Ubuntu - Fedora - Centos - RedHat
    #define LINUXLIB 1
    #endif

Create conversion functions from std::wstring to std::string and vice versa.

    #include <locale>
    #include <iostream>
    #include <string>
    #ifdef WINDOWSLIB
    #include <Windows.h>
    #endif

    using namespace std::literals::string_literals;

    // Convert std::wstring to std::string
    std::string WidestringToString(const std::wstring& wstr, const std::string& locale)
    {
        if (wstr.empty())
        {
            return std::string();
        }
        size_t pos;
        size_t begin = 0;
        std::string ret;
        size_t size;
    #ifdef WINDOWSLIB
        _locale_t lc = _create_locale(LC_ALL, locale.c_str());
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != std::wstring::npos && begin < wstr.length())
        {
            std::wstring segment = std::wstring(&wstr[begin], pos - begin);
            _wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
            std::string converted = std::string(size, 0);
            _wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
            ret.append(converted);
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
            _wcstombs_s_l(&size, nullptr, 0, &segment[0], _TRUNCATE, lc);
            std::string converted = std::string(size, 0);
            _wcstombs_s_l(&size, &converted[0], size, &segment[0], _TRUNCATE, lc);
            converted.resize(size - 1);
            ret.append(converted);
        }
        _free_locale(lc);
    #elif defined LINUXLIB
        std::string currentLocale = setlocale(LC_ALL, nullptr);
        setlocale(LC_ALL, locale.c_str());
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != std::wstring::npos && begin < wstr.length())
        {
            std::wstring segment = std::wstring(&wstr[begin], pos - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            std::string converted = std::string(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            std::wstring segment = std::wstring(&wstr[begin], wstr.length() - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            std::string converted = std::string(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
        }
        setlocale(LC_ALL, currentLocale.c_str());
    #elif defined MACOSLIB
    #endif
        return ret;
    }

    // Convert std::string to std::wstring
    std::wstring StringToWideString(const std::string& str, const std::string& locale)
    {
        if (str.empty())
        {
            return std::wstring();
        }
        size_t pos;
        size_t begin = 0;
        std::wstring ret;
        size_t size;
    #ifdef WINDOWSLIB
        _locale_t lc = _create_locale(LC_ALL, locale.c_str());
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos)
        {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            _mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
            converted.resize(size - 1);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            _mbstowcs_s_l(&size, &converted[0], converted.size(), &segment[0], _TRUNCATE, lc);
            converted.resize(size - 1);
            ret.append(converted);
        }
        _free_locale(lc);
    #elif defined LINUXLIB
        std::string currentLocale = setlocale(LC_ALL, nullptr);
        setlocale(LC_ALL, locale.c_str());
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos)
        {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
        }
        setlocale(LC_ALL, currentLocale.c_str());
    #elif defined MACOSLIB
    #endif
        return ret;
    }

Print the std::string. Check RawString Suffix.

Linux code. Print the std::string directly using std::cout.
If you have a std::wstring:
1. Convert it to std::string.
2. Print it with std::cout.

    std::wstring x = L"\0\001日本ABC\0DE\0F\0G????\0"s;
    std::string result = WidestringToString(x, "en_US.UTF-8");
    std::cout << "RESULT=" << result << std::endl;
    std::cout << "RESULT_SIZE=" << result.size() << std::endl;

If you need to print Unicode on Windows, you have to use WriteConsole to print the Unicode characters from a std::wstring or std::string.

    // Defined first so that WriteUnicodeLine below can call it.
    void WriteUnicode(const std::string& s)
    {
    #ifdef WINDOWSLIB
        std::wstring unicode = Insane::String::Strings::StringToWideString(s);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
    #elif defined LINUXLIB
        std::cout << s;
    #elif defined MACOSLIB
    #endif
    }

    void WriteUnicodeLine(const std::string& s)
    {
    #ifdef WINDOWSLIB
        WriteUnicode(s);
        std::cout << std::endl;
    #elif defined LINUXLIB
        std::cout << s << std::endl;
    #elif defined MACOSLIB
    #endif
    }

    void WriteUnicodeLineW(const std::wstring& ws)
    {
    #ifdef WINDOWSLIB
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
        std::cout << std::endl;
    #elif defined LINUXLIB
        std::cout << String::Strings::WidestringToString(ws) << std::endl;
    #elif defined MACOSLIB
    #endif
    }

    void WriteUnicodeW(const std::wstring& ws)
    {
    #ifdef WINDOWSLIB
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), ws.c_str(), static_cast<DWORD>(ws.length()), nullptr, nullptr);
    #elif defined LINUXLIB
        std::cout << String::Strings::WidestringToString(ws);
    #elif defined MACOSLIB
    #endif
    }

Windows code. Use the WriteUnicodeLine or WriteUnicode functions. The same code can be used on Linux.

    std::wstring x = L"\0\001日本ABC\0DE\0F\0G????\0"s;
    std::string result = WidestringToString(x, "en_US.UTF-8");
    WriteUnicodeLine(u8"RESULT" + result);
    WriteUnicodeLine(u8"RESULT_SIZE" + std::to_string(result.size()));

Finally, on Windows you need powerful and complete support for Unicode characters in the console. I recommend ConEmu, set as the default terminal on Windows.

Tested on Microsoft Visual Studio and JetBrains CLion.

Tested on Microsoft Visual Studio 2017 with VC++; std=c++17. (Windows project)

Tested on Microsoft Visual Studio 2017 with g++; std=c++17. (Linux project)

Tested on JetBrains CLion 2018.3 with g++; std=c++17. (Linux toolchain / remote)

QA

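Q. Why don't you use the <codecvt> header functions and classes?
A. It is deprecated. Removed or deprecated features cannot be built with VC++, although g++ has no problem with them. I prefer 0 warnings and no headaches.

Q. wstring on Windows is interchan.
A. It is deprecated. Removed or deprecated features cannot be built with VC++, although g++ has no problem with them. I prefer 0 warnings and no headaches.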

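Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and each character is stored in UTF-16 units. If a character is greater than U+FFFF, it is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in a single wchar_t element, so no surrogate pairs are needed. Check Standard data types on UNIX, Linux, and Windows.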
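To illustrate the difference, here is a small sketch that counts code points in a std::wstring on either platform. count_code_points is a hypothetical helper for this example. It assumes the string holds UTF-16 when wchar_t is 2 bytes and UTF-32 otherwise.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a wide string. Where wchar_t is 16 bits
// (Windows), a high surrogate followed by a low surrogate counts once.
std::size_t count_code_points(const std::wstring& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned long unit = static_cast<unsigned long>(s[i]);
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (sizeof(wchar_t) == 2 && isHigh && i + 1 < s.size()) {
            const unsigned long next = static_cast<unsigned long>(s[i + 1]);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                ++i;  // skip the low half of the surrogate pair
            }
        }
        ++count;
    }
    return count;
}
```

For L"\U0001F600", size() reports 2 elements with a 2-byte wchar_t and 1 element with a 4-byte wchar_t, while count_code_points reports one code point in both cases.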

Q. Is std::string cross-platform?
A. Yes. std::string uses char elements. The char type is guaranteed to have the same byte size on all compilers: 1 byte. Check Standard data types on UNIX, Linux, and Windows.

call me Steve · Answer

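I don't think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP Function, it seems that you need to set an appropriate code page for the character set you are going to output.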