Java equivalent to JavaScript's encodeURIComponent that produces identical output?

Question

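I've been experimenting with various bits of Java code, trying to come up with something that encodes a string containing quotes, spaces and "exotic" Unicode characters and produces output identical to JavaScript's encodeURIComponent function.

My torture test string is: "A" B ± "

If I enter the following JavaScript statement in Firebug: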

encodeURIComponent('"A" B ± "');

Then I get:

"%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class EncodingTest {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String s = "\"A\" B ± \"";
            System.out.println("URLEncoder.encode returns " + URLEncoder.encode(s, "UTF-8"));
            System.out.println("getBytes returns " + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
        }
    }

The output of this program:

    URLEncoder.encode returns %22A%22+B+%C2%B1+%22
    getBytes returns "A" B Â± "

Close, but no cigar! What is the best way of encoding a UTF-8 string using Java so that it produces the same output as JavaScript's encodeURIComponent?

EDIT: I'm on Java 1.4 and moving to Java 5.

Tomalak · Accepted Answer

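Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts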
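The two post-processing steps can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name PostProcessSketch and the method name encodeLikeJavaScript are made up for this example, and the sketch assumes the input contains no unpaired surrogates.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PostProcessSketch {

    // Hypothetical helper name; not part of any library.
    static String encodeLikeJavaScript(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")   // step 1: URLEncoder writes a space as "+"
                    .replace("%21", "!")   // step 2: restore the characters that
                    .replace("%27", "'")   //         encodeURIComponent leaves literal
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00B1 is the plus-minus sign from the torture test string.
        System.out.println(encodeLikeJavaScript("\"A\" B \u00B1 \""));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```

String.replace is used here instead of replaceAll because every pattern is a literal, so no regex escaping is needed. A literal "+" in the input is already turned into "%2B" by URLEncoder before step 1 runs, so it is not affected by the replacement.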

John Topley · Answer

This is the class I came up with in the end:

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.net.URLEncoder;

    /**
     * Utility class for JavaScript compatible UTF-8 encoding and decoding.
     *
     * @see http://stackoverflow.com/questions/607176/Java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
     * @author John Topley
     */
    public class EncodingUtil {

        /**
         * Decodes the passed UTF-8 String using an algorithm that's compatible with
         * JavaScript's <code>decodeURIComponent</code> function. Returns
         * <code>null</code> if the String is <code>null</code>.
         *
         * @param s The UTF-8 encoded String to be decoded
         * @return the decoded String
         */
        public static String decodeURIComponent(String s) {
            if (s == null) {
                return null;
            }

            String result = null;

            try {
                result = URLDecoder.decode(s, "UTF-8");
            }
            // This exception should never occur.
            catch (UnsupportedEncodingException e) {
                result = s;
            }

            return result;
        }

        /**
         * Encodes the passed String as UTF-8 using an algorithm that's compatible
         * with JavaScript's <code>encodeURIComponent</code> function. Returns
         * <code>null</code> if the String is <code>null</code>.
         *
         * @param s The String to be encoded
         * @return the encoded String
         */
        public static String encodeURIComponent(String s) {
            // Honour the documented contract: null in, null out.
            if (s == null) {
                return null;
            }

            String result = null;

            try {
                result = URLEncoder.encode(s, "UTF-8")
                        .replaceAll("\\+", "%20")
                        .replaceAll("\\%21", "!")
                        .replaceAll("\\%27", "'")
                        .replaceAll("\\%28", "(")
                        .replaceAll("\\%29", ")")
                        .replaceAll("\\%7E", "~");
            }
            // This exception should never occur.
            catch (UnsupportedEncodingException e) {
                result = s;
            }

            return result;
        }

        /**
         * Private constructor to prevent this class from being instantiated.
         */
        private EncodingUtil() {
            super();
        }
    }

Ravi Wallau · Answer

Using the JavaScript engine that is shipped with Java 6:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;

    public class Wow {
        public static void main(String[] args) throws Exception {
            ScriptEngineManager factory = new ScriptEngineManager();
            ScriptEngine engine = factory.getEngineByName("JavaScript");
            engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
        }
    }

Output: %22A%22%20B%20%c2%b1%20%22

The case of the hex digits is different, but it's closer to what you want.
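Upper-case and lower-case hex digits in a percent-escape are equivalent under RFC 3986, but if a byte-for-byte match with the Firebug output is needed, the escapes can be normalized afterwards. The following is only a sketch; the class name PercentCaseSketch and the method name upperCaseEscapes are invented for this example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentCaseSketch {

    // Matches one percent-escape such as %c2 or %B1.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    // Hypothetical helper: upper-cases the hex digits of every escape
    // and leaves all other characters untouched.
    static String upperCaseEscapes(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer(encoded.length());
        while (m.find()) {
            m.appendReplacement(out, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseEscapes("%22A%22%20B%20%c2%b1%20%22"));
        // prints %22A%22%20B%20%C2%B1%20%22
    }
}
```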

Chris Nitchie · Answer

I use java.net.URI#getRawPath(), e.g.

    String s = "a+b c.html";
    String fixed = new URI(null, null, s, null).getRawPath();

The value of fixed will be a+b%20c.html, which is what you want.

Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example

    URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");

will give you a%20b%20c.html, which will be interpreted as a b c.html.
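A small check of how the two approaches treat a literal plus may be useful here. In the sketch below (class and method names are invented for illustration), URLEncoder percent-encodes the literal "+" as "%2B" before the replacement runs, so the replacement only touches the "+" that stood for the space. The URI-based version keeps the "+" unescaped, which differs from encodeURIComponent, since that function also writes a literal plus as "%2B".

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class PlusHandlingCheck {

    // URLEncoder followed by the "+" -> "%20" replacement.
    static String viaUrlEncoder(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // The java.net.URI approach: treat the input as a path and read it back raw.
    static String viaUriPath(String s) {
        try {
            return new URI(null, null, s, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaUrlEncoder("a+b c.html")); // a%2Bb%20c.html
        System.out.println(viaUriPath("a+b c.html"));    // a+b%20c.html
    }
}
```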

Joe Mill · Answer

I came up with my own version of encodeURIComponent, because the posted solution has one problem.

So here's my class:

    import java.io.UnsupportedEncodingException;
    import java.util.BitSet;

    public final class EscapeUtils {

        /** used for the encodeURIComponent function */
        private static final BitSet dontNeedEncoding;

        static {
            dontNeedEncoding = new BitSet(256);

            // a-z
            for (int i = 97; i <= 122; ++i) {
                dontNeedEncoding.set(i);
            }
            // A-Z
            for (int i = 65; i <= 90; ++i) {
                dontNeedEncoding.set(i);
            }
            // 0-9
            for (int i = 48; i <= 57; ++i) {
                dontNeedEncoding.set(i);
            }
            // '()*
            for (int i = 39; i <= 42; ++i) {
                dontNeedEncoding.set(i);
            }
            dontNeedEncoding.set(33);  // !
            dontNeedEncoding.set(45);  // -
            dontNeedEncoding.set(46);  // .
            dontNeedEncoding.set(95);  // _
            dontNeedEncoding.set(126); // ~
        }

        /**
         * A Utility class should not be instantiated.
         */
        private EscapeUtils() {
        }

        /**
         * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
         *
         * @param input
         *            A component of a URI
         * @return the escaped URI component
         */
        public static String encodeURIComponent(String input) {
            if (input == null) {
                return input;
            }

            StringBuilder filtered = new StringBuilder(input.length());
            for (int i = 0; i < input.length(); ) {
                // Read a whole code point so that a surrogate pair is encoded
                // as a single UTF-8 sequence rather than two separate chars.
                final int cp = input.codePointAt(i);
                i += Character.charCount(cp);
                if (dontNeedEncoding.get(cp)) {
                    filtered.append((char) cp);
                } else {
                    final byte[] b = codePointToBytesUTF(cp);

                    for (int j = 0; j < b.length; ++j) {
                        filtered.append('%');
                        filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                        filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                    }
                }
            }
            return filtered.toString();
        }

        private static byte[] codePointToBytesUTF(int cp) {
            try {
                return new String(Character.toChars(cp)).getBytes("UTF-8");
            } catch (UnsupportedEncodingException e) {
                return new byte[] { (byte) cp };
            }
        }
    }

sangupta · Answer

I came up with another implementation, documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can handle Unicode bytes as well.

silver · Answer

This is a straightforward example of Ravi Wallau's solution:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class EncodeURIComponentDemo {

        public String buildSafeURL(String partialURL, String documentName)
                throws ScriptException {
            ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
            ScriptEngine scriptEngine = scriptEngineManager
                    .getEngineByName("JavaScript");

            // eval() returns Object, so convert the result to a String.
            String urlSafeDocumentName = String.valueOf(scriptEngine
                    .eval("encodeURIComponent('" + documentName + "')"));
            String safeURL = partialURL + urlSafeDocumentName;

            return safeURL;
        }

        public static void main(String[] args) {
            EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
            String partialURL = "https://www.website.com/document/";
            String documentName = "Tom & Jerry Manuscript.pdf";

            try {
                System.out.println(demo.buildSafeURL(partialURL, documentName));
            } catch (ScriptException se) {
                se.printStackTrace();
            }
        }
    }

Output: https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

It also answers the hanging question in Loren Shqipognja's comment about how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns Object, so it can be converted to a String via String.valueOf(), among other methods.

Nuno Cruces · Answer

This is what I'm using:

    // requires: import java.nio.charset.StandardCharsets;
    private static final String HEX = "0123456789ABCDEF";

    public static String encodeURIComponent(String str) {
        if (str == null) return null;

        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        StringBuilder builder = new StringBuilder(bytes.length);

        for (byte c : bytes) {
            // Unreserved characters are copied; everything else is percent-encoded.
            if (c >= 'a' ? c <= 'z' || c == '~' :
                c >= 'A' ? c <= 'Z' || c == '_' :
                c >= '0' ? c <= '9' : c == '-' || c == '.')
                builder.append((char)c);
            else
                builder.append('%')
                       .append(HEX.charAt(c >> 4 & 0xf))
                       .append(HEX.charAt(c & 0xf));
        }

        return builder.toString();
    }

This goes beyond JavaScript's behaviour by percent-encoding every character that is not an unreserved character according to RFC 3986.
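To make the difference concrete, the sketch below compares the punctuation that encodeURIComponent leaves literal (taken from the character class quoted earlier in this thread) with the punctuation in RFC 3986's unreserved set. The class and constant names are invented for this illustration.

```java
final class SafeSetComparison {

    // Punctuation left literal by encodeURIComponent.
    static final String JS_LITERAL_PUNCTUATION = "-_.!~*'()";

    // Punctuation in the RFC 3986 "unreserved" set (besides letters and digits).
    static final String RFC3986_UNRESERVED_PUNCTUATION = "-._~";

    // Characters that JavaScript keeps literal but a strict RFC 3986 encoder escapes.
    static String literalOnlyInJavaScript() {
        StringBuilder result = new StringBuilder();
        for (char c : JS_LITERAL_PUNCTUATION.toCharArray()) {
            if (RFC3986_UNRESERVED_PUNCTUATION.indexOf(c) < 0) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(literalOnlyInJavaScript()); // prints !*'()
    }
}
```

So for input containing any of ! * ' ( ) an encoder restricted to the unreserved set will not produce output identical to encodeURIComponent, although the result decodes to the same string.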

This is the opposite conversion:

    // requires: import java.nio.charset.StandardCharsets;
    public static String decodeURIComponent(String str) {
        if (str == null) return null;

        int length = str.length();
        byte[] bytes = new byte[length / 3];
        StringBuilder builder = new StringBuilder(length);

        for (int i = 0; i < length; ) {
            char c = str.charAt(i);
            if (c != '%') {
                builder.append(c);
                i += 1;
            } else {
                // Collect a run of consecutive %xx escapes, then decode it as UTF-8.
                int j = 0;
                do {
                    char h = str.charAt(i + 1);
                    char l = str.charAt(i + 2);
                    i += 3;

                    h -= '0';
                    if (h >= 10) {
                        h |= ' ';
                        h -= 'a' - '0';
                        if (h >= 6) throw new IllegalArgumentException();
                        h += 10;
                    }

                    l -= '0';
                    if (l >= 10) {
                        l |= ' ';
                        l -= 'a' - '0';
                        if (l >= 6) throw new IllegalArgumentException();
                        l += 10;
                    }

                    bytes[j++] = (byte)(h << 4 | l);
                    if (i >= length) break;
                    c = str.charAt(i);
                } while (c == '%');
                builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
            }
        }

        return builder.toString();
    }

balazs · Answer

For me this worked:

    import org.apache.http.client.utils.URIBuilder;

    String encodedString = new URIBuilder()
        .setParameter("i", stringToEncode)
        .build()
        .getRawQuery() // output: i=encodedString
        .substring(2);

or with a different UriBuilder

    import javax.ws.rs.core.UriBuilder;

    String encodedString = UriBuilder.fromPath("")
        .queryParam("i", stringToEncode)
        .toString() // output: ?i=encodedString
        .substring(3);

In my opinion, using a standard library is a better idea than post-processing manually. Also, @Chris's answer looked good, but it doesn't work for URLs like "http://a+b c.html".

AlexN · Answer

I used String encodedUrl = new URI(null, url, null).toASCIIString(); to encode URLs. To add parameters after the existing ones in the url, I use UriComponentsBuilder.

Mike Bryant · Answer

I have successfully used the java.net.URI class like so:

    // requires: import java.net.URI; import java.net.URISyntaxException;
    public static String uriEncode(String string) {
        String result = string;
        if (null != string) {
            try {
                String scheme = null;
                String ssp = string;
                // Split off the scheme (the part before the first ':'), if any.
                int es = string.indexOf(':');
                if (es > 0) {
                    scheme = string.substring(0, es);
                    ssp = string.substring(es + 1);
                }
                result = (new URI(scheme, ssp, null)).toString();
            } catch (URISyntaxException usex) {
                // ignore and use string that has syntax error
            }
        }
        return result;
    }

Aliaksei Nikuliak · Answer

The Guava library has PercentEscaper:

Escaper percentEscaper = new PercentEscaper("-_.*", false);

"-_.*" are the safe characters.

false says that PercentEscaper should escape a space with "%20" rather than "+".

honzajde · Answer

I found the PercentEscaper class in the google-http-java-client library, which can be used to implement encodeURIComponent quite easily.

PercentEscaper from the google-http-java-client javadoc; google-http-java-client home