I have a UTF-8 encoded string, for example:
Thats a Nice joke 😆😆😆 😛
I need to extract all the emojis present in the sentence. The emojis can be of any kind.
When this sentence is viewed in a terminal with the command less text.txt, it is displayed as:
Thats a Nice joke <U+1F606><U+1F606><U+1F606> <U+1F61B>
This is the corresponding UTF code for each emoji. All the emoji codes can be found at emojitracker.
To find all occurrences, I used the regular expression pattern (<U\+\w+?>), but it didn't work on the UTF-8 encoded string.
Here is my code:
String s="Thats a Nice joke 😆😆😆 😛";
Pattern pattern = Pattern.compile("(<U\\+\\w+?>)");
Matcher matcher = pattern.matcher(s);
List<String> matchList = new ArrayList<String>();
while (matcher.find()) {
matchList.add(matcher.group());
}
for(int i=0;i<matchList.size();i++){
System.out.println(matchList.get(i));
}
This pdf says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So I want to capture any character lying within this range.
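One likely explanation for why the (<U\+\w+?>) pattern finds nothing: the <U+1F606> text is only how less renders code points it cannot display, so the string itself never contains the characters "<U+". The sketch below (class and method names are hypothetical, chosen for illustration) reproduces that notation from the code points to make the difference visible.

```java
import java.util.stream.Collectors;

final class CodePointNotation {
    // Renders each supplementary code point (above U+FFFF) as <U+XXXX>;
    // BMP characters are passed through unchanged.
    static String render(String s) {
        return s.codePoints()
                .mapToObj(cp -> Character.isSupplementaryCodePoint(cp)
                        ? String.format("<U+%X>", cp)
                        : new String(Character.toChars(cp)))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String s = new StringBuilder("Thats a Nice joke ")
                .appendCodePoint(0x1F606).appendCodePoint(0x1F606).appendCodePoint(0x1F606)
                .append(' ').appendCodePoint(0x1F61B).toString();
        System.out.println(render(s));          // the form shown by less
        System.out.println(s.contains("<U+"));  // the raw string has no such text
    }
}
```

A pattern written against the rendered notation can only match after a conversion like this; on the raw string it has to match the code points themselves.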
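The pdf that you just mentioned says Range: 1F300–1F5FF for Miscellaneous Symbols and Pictographs. So let's say I want to capture any character lying within this range. Now what to do?
Okay, but I'll just note that the emoji in your question are outside that range! :-)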
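A quick way to check that remark, as a small sketch (the class and method names are made up for illustration): compare each code point from the question against 1F300–1F5FF and ask the JDK which Unicode block it belongs to.

```java
final class RangeCheck {
    // True when the code point lies in 1F300-1F5FF (Miscellaneous Symbols and Pictographs).
    static boolean inMiscSymbolsAndPictographs(int codePoint) {
        return codePoint >= 0x1F300 && codePoint <= 0x1F5FF;
    }

    public static void main(String[] args) {
        int[] emojiFromQuestion = {0x1F606, 0x1F61B};
        for (int cp : emojiFromQuestion) {
            System.out.printf("U+%X in 1F300-1F5FF: %b, block: %s%n",
                    cp, inMiscSymbolsAndPictographs(cp), Character.UnicodeBlock.of(cp));
        }
    }
}
```

Character.UnicodeBlock.of(int) reports the block according to the Unicode version the running JDK supports, so very new code points may come back as null on older runtimes.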
The fact that these are above 0xFFFF complicates things, because Java strings store UTF-16. So we can't just use one simple character class for it. We're going to have surrogate pairs. (More: http://www.unicode.org/faq/utf_bom.html )
U+1F300 in UTF-16 ends up being the pair \uD83C\uDF00; U+1F5FF ends up being \uD83D\uDDFF. Note that the first character went up, so we cross at least one boundary. So we have to know what ranges of surrogate pairs we're looking for.
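For reference, the pair for any code point can be obtained from Character.toChars. The following is a minimal sketch (SurrogateDemo and toEscapes are illustrative names, not part of any library) that prints the two UTF-16 units for both ends of the range.

```java
final class SurrogateDemo {
    // Formats the UTF-16 code units of a code point as backslash-u escapes.
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("U+1F300 -> " + toEscapes(0x1F300));
        System.out.println("U+1F5FF -> " + toEscapes(0x1F5FF));
        // The same halves are also available individually:
        System.out.println(Character.highSurrogate(0x1F300) == 0xD83C);
        System.out.println(Character.lowSurrogate(0x1F300) == 0xDF00);
    }
}
```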
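Not being steeped in knowledge about the inner workings of UTF-16, I wrote a program to find out (source at the end; I'd double-check it if I were you, rather than trusting me). It tells me we're looking for \uD83C followed by anything in the range \uDF00-\uDFFF (inclusive), or \uD83D followed by anything in the range \uDC00-\uDDFF (inclusive).
So armed with that knowledge, in theory we could now write a pattern: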
// This is wrong, keep reading
Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
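That's an alternation of two non-capturing groups, the first group for the pairs starting with \uD83C, and the second group for the pairs starting with \uD83D.
But that fails (doesn't find anything). I'm fairly sure it's because we're trying to specify half of a surrogate pair in various places: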
Pattern p = Pattern.compile("(?:\uD83C[\uDF00-\uDFFF])|(?:\uD83D[\uDC00-\uDDFF])");
// Half of a pair --------------^------^------^-----------^------^------^
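We can't just split up surrogate pairs like that; they're called surrogate pairs for a reason. :-)
Consequently, I don't think we can use regular expressions (or indeed, any string-based approach) for this. I think we have to search through char arrays.
char arrays hold UTF-16 values, so we can find those half-pairs in the data if we look for them the hard way: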
String s = new StringBuilder()
.append("Thats a Nice joke ")
.appendCodePoint(0x1F606)
.appendCodePoint(0x1F606)
.appendCodePoint(0x1F606)
.append(" ")
.appendCodePoint(0x1F61B)
.toString();
char[] chars = s.toCharArray();
int index;
char ch1;
char ch2;
index = 0;
while (index < chars.length - 1) { // -1 because we're looking for two-char-long things
ch1 = chars[index];
if ((int)ch1 == 0xD83C) {
ch2 = chars[index+1];
if ((int)ch2 >= 0xDF00 && (int)ch2 <= 0xDFFF) {
System.out.println("Found emoji at index " + index);
index += 2;
continue;
}
}
else if ((int)ch1 == 0xD83D) {
ch2 = chars[index+1];
if ((int)ch2 >= 0xDC00 && (int)ch2 <= 0xDDFF) {
System.out.println("Found emoji at index " + index);
index += 2;
continue;
}
}
++index;
}
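Obviously that's just debug-level code, but it does the job. (In your given string, with its emoji, of course it won't find anything, as they're outside the range. But if you change the upper bound on the second pair to 0xDEFF instead of 0xDDFF, it will. No idea if that would also include non-emojis, though.)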
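The same scan can also be written against code points instead of char halves, which avoids working out surrogate ranges by hand. This is only a sketch under that assumption (CodePointScan and findInRange are names invented here); it walks the string with codePointAt and Character.charCount and reports the char index of each hit.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointScan {
    // Returns the char index of every code point that falls inside [lo, hi].
    static List<Integer> findInRange(String s, int lo, int hi) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp >= lo && cp <= hi) {
                hits.add(i);
            }
            i += Character.charCount(cp); // 2 for a surrogate pair, otherwise 1
        }
        return hits;
    }

    public static void main(String[] args) {
        String s = new StringBuilder("Thats a Nice joke ")
                .appendCodePoint(0x1F606).appendCodePoint(0x1F606).appendCodePoint(0x1F606)
                .append(' ').appendCodePoint(0x1F61B).toString();
        System.out.println(findInRange(s, 0x1F300, 0x1F5FF)); // the range from the pdf
        System.out.println(findInRange(s, 0x1F300, 0x1F6FF)); // widened upper bound
    }
}
```

The widened bound 0x1F6FF corresponds to the pair \uD83D\uDEFF, that is, the 0xDEFF change mentioned above.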
Source of my program to find out what the surrogate ranges were:
public class FindRanges {
public static void main(String[] args) {
char last0 = '\0';
char last1 = '\0';
for (int x = 0x1F300; x <= 0x1F5FF; ++x) {
char[] chars = new StringBuilder().appendCodePoint(x).toString().toCharArray();
if (chars[0] != last0) {
if (last0 != '\0') {
System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
}
System.out.print("\\u" + Integer.toHexString((int)chars[0]).toUpperCase() + " \\u" + Integer.toHexString((int)chars[1]).toUpperCase());
last0 = chars[0];
}
last1 = chars[1];
}
if (last0 != '\0') {
System.out.println("-\\u" + Integer.toHexString((int)last1).toUpperCase());
}
}
}
Output:
\uD83C \uDF00-\uDFFF
\uD83D \uDC00-\uDDFF
I wrote a simple method using emoji-java to remove all emojis, including fitzpatrick modifiers. It requires an external library, but it is easier to maintain than those monster regexes.
Usage:
String input = "A string ????with a \uD83D\uDC66\uD83C\uDFFFfew ????emojis!";
String result = EmojiParser.removeAllEmojis(input);
emoji-java Maven installation:
<dependency>
<groupId>com.vdurmont</groupId>
<artifactId>emoji-java</artifactId>
<version>3.1.3</version>
</dependency>
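Gradle:
compile 'com.vdurmont:emoji-java:3.1.3'
Edit: the previously submitted answer has been pulled into the emoji-java source code.
I had a similar problem. The following worked well for me and matches surrogate pairs: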
public class SplitByUnicode {
public static void main(String[] argv) throws Exception {
String string = "Thats a Nice joke 😆😆😆 😛";
System.out.println("Original String:"+string);
String regexPattern = "[\uD83C-\uDBFF\uDC00-\uDFFF]+";
byte[] utf8 = string.getBytes("UTF-8");
String string1 = new String(utf8, "UTF-8");
Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(string1);
List<String> matchList = new ArrayList<String>();
while (matcher.find()) {
matchList.add(matcher.group());
}
for(int i=0;i<matchList.size();i++){
System.out.println(i+":"+matchList.get(i));
}
}
}
The output is:
Original String:Thats a Nice joke 😆😆😆 😛
0:😆😆😆
1:😛
Found the regex from https://stackoverflow.com/a/24071599/915972
In Java 8:
public static String mysqlSafe(String input) {
if (input == null) return null;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < input.length(); i++) {
if (i < (input.length() - 1)) { // Emojis are two characters long in Java, e.g. a rocket emoji is "\uD83D\uDE80";
if (Character.isSurrogatePair(input.charAt(i), input.charAt(i + 1))) {
i += 1; //also skip the second character of the emoji
continue;
}
}
sb.append(input.charAt(i));
}
return sb.toString();
}
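The loop above drops every valid surrogate pair. A code-point based variant of the same idea might look like the sketch below (SupplementaryFilter and stripSupplementary are hypothetical names); it keeps only BMP code points, so like the original it removes all supplementary characters, not just emoji, and leaves BMP emoji such as U+263A in place.

```java
final class SupplementaryFilter {
    // Removes every code point above U+FFFF; BMP characters are kept as they are.
    static String stripSupplementary(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isBmpCodePoint)
             .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String rocket = new String(Character.toChars(0x1F680));
        System.out.println(stripSupplementary("lift" + rocket + "off"));
    }
}
```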
You can do it like this:
String s="Thats a Nice joke 😆😆😆 😛";
Pattern pattern = Pattern.compile("[\ud83c\udc00-\ud83c\udfff]|[\ud83d\udc00-\ud83d\udfff]|[\u2600-\u27ff]",
Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(s);
List<String> matchList = new ArrayList<String>();
while (matcher.find()) {
matchList.add(matcher.group());
}
for(int i=0;i<matchList.size();i++){
System.out.println(matchList.get(i));
}
Here is the best regex for extracting all emojis:
(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32-\ude3a]|[\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])
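It identifies single-character emojis that the other answers do not account for. For details on how this regex works, see this post: https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb#.enomgcu6
Assuming you are asking for the standard Unicode emoji ranges (different vendors have different blocks), you could consider the following three ranges:
In addition to all the thoughtful explanation T.J. Crowder has shared with us, starting with Java 7 you can easily match UTF-16 encoded surrogate pairs.
See the documentation:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
A Unicode character can also be represented in a regular-expression by using its hex notation (hexadecimal code point value) directly as described in construct \x{...}, for example a supplementary character U+2011F can be specified as \x{2011F}, instead of two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.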
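As a minimal sketch of that construct: the range U+1F300 to U+1F64F used below is an assumption picked for illustration (it spans Miscellaneous Symbols and Pictographs plus the Emoticons block, which is where the emoji from the question live), and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodePointRegex {
    // Code-point range written with the \x{...} construct (Java 7 and later).
    private static final Pattern EMOJI = Pattern.compile("[\\x{1F300}-\\x{1F64F}]");

    // Returns each matching code point as its own string.
    static List<String> extract(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOJI.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String s = new StringBuilder("Thats a Nice joke ")
                .appendCodePoint(0x1F606).appendCodePoint(0x1F606).appendCodePoint(0x1F606)
                .append(' ').appendCodePoint(0x1F61B).toString();
        System.out.println(extract(s).size() + " matches: " + extract(s));
    }
}
```

Because the class is expressed in code points, the regex engine never sees half of a surrogate pair, which is the problem the earlier \uD83C[...] attempt ran into.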
If you still can't switch to Java 7, you can extend the valuable UnicodeEscaper provided by Guava.
Here is an example implementation:
public class SimpleEscaper extends UnicodeEscaper
{
@Override
protected char[] escape(int codePoint)
{
if (0x1f000 <= codePoint && codePoint <= 0x1ffff)
{
return Integer.toHexString(codePoint).toCharArray();
}
return Character.toChars(codePoint);
}
}
Emoji regex
public static final String sEmojiRegex = "(?:[\\u2700-\\u27bf]|" +
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?" +
"(?:\\u200d(?:[^\\ud800-\\udfff]|" +
"(?:[\\ud83c\\udde6-\\ud83c\\uddff]){2}|" +
"[\\ud800\\udc00-\\uDBFF\\uDFFF]|[\\u2600-\\u26FF])[\\ufe0e\\ufe0f]?(?:[\\u0300-\\u036f\\ufe20-\\ufe23\\u20d0-\\u20f0]|[\\ud83c\\udffb-\\ud83c\\udfff])?)*|" +
"[\\u0023-\\u0039]\\ufe0f?\\u20e3|\\u3299|\\u3297|\\u303d|\\u3030|\\u24c2|[\\ud83c\\udd70-\\ud83c\\udd71]|[\\ud83c\\udd7e-\\ud83c\\udd7f]|\\ud83c\\udd8e|[\\ud83c\\udd91-\\ud83c\\udd9a]|[\\ud83c\\udde6-\\ud83c\\uddff]|[\\ud83c\\ude01-\\ud83c\\ude02]|\\ud83c\\ude1a|\\ud83c\\ude2f|[\\ud83c\\ude32-\\ud83c\\ude3a]|[\\ud83c\\ude50-\\ud83c\\ude51]|\\u203c|\\u2049|[\\u25aa-\\u25ab]|\\u25b6|\\u25c0|[\\u25fb-\\u25fe]|\\u00a9|\\u00ae|\\u2122|\\u2139|\\ud83c\\udc04|[\\u2600-\\u26FF]|\\u2b05|\\u2b06|\\u2b07|\\u2b1b|\\u2b1c|\\u2b50|\\u2b55|\\u231a|\\u231b|\\u2328|\\u23cf|[\\u23e9-\\u23f3]|[\\u23f8-\\u23fa]|\\ud83c\\udccf|\\u2934|\\u2935|[\\u2190-\\u21ff]";
Some emojis (1627)
// count = 1627
public static final String sEmojiTest = "????????????????????????????????☺️????????????????????????????????????????????????????????????????????????????????????????????????????????????????☹️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????☠️????????????????????????????????????????????????????????????????????????????????????✊????????????✌️????????????????????????☝️✋????????????????????????????✍️????????????????????????????????????????????????????????????????????????????????????♀????????????????????♀????????♀????????♀????????♀????????️♀️????????⚕????⚕????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✈????✈????????????????????⚖????⚖????????????????????????????????????♀????????????♂????????♂????????♂????????♂????♀????♂????♀????♂????????♂????????♂????????♂????????♂????????????????????♂????♀????????♀????????????????????????❤️????????❤️????????????❤️????????????❤️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????⛑????????????????????????????????☂️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????☘️??????????????????????????????????????????????????????????
??????????????????????????????????????????????????????????????????????????????⭐️????✨⚡️????????☄☀️????⛅️????????????☁️????⛈????????☃️⛄️❄️????????????????????????????☔️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????☕️????????????????????????????????????????????????????⚽️????????⚾️????????????????????????????????????????⛳️????????????????⛸????⛷????????️♀️????????????♀????♂????♀????♂⛹️♀️⛹????♀????♂????️♀️????????♀????????♀????????♀????♂????♀????????????♀????????♀????????????????????????????????????????????????????????♀????♂????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✈️????????????????????????⛵️????????????⛴????⚓️????⛽️????????????????????????⛲️????????????????????????????⛱????????⛰????????????????????⛺️????????????????????????????????????????????????????????????????????????????????⛪️????????????⛩????????????????????????????????????????????????????????????⌚️????????????⌨️????????????????????????????????????????????????????????????????????????☎️????????????????????????????⏱⏲⏰????⌛️⏳????????????????????????????????????????????????????????????????⚖️????????⚒????⛏????⚙️⛓????????????????⚔️????????⚰️⚱️????????????????⚗️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✉️????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????✂️????????✒️????????????✏️????????????????????????❤️????????????????????????❣️????????????????????????????????☮️✝️☪️????☸️
✡️????????☯️☦️????⛎♈️♉️♊️♋️♌️♍️♎️♏️♐️♑️♒️♓️????⚛️????☢️☣️????????????????️????????????️✴️????????????㊙️㊗️????????????????????️????️????????????️????❌⭕️????⛔️????????????????♨️????????????????????????????❗️❕❓❔‼️⁉️????????〽️⚠️????????⚜️????♻️✅????️????❇️✳️❎????????Ⓜ️????????????????♿️????️????????️????????????????????????????????????????????????????ℹ️????????????????????????????????????0️⃣1️⃣2️⃣3️⃣4️⃣5️⃣6️⃣7️⃣8️⃣9️⃣????????#️⃣*️⃣▶️⏸⏯⏹⏺⏭⏮⏩⏪⏫⏬◀️????????➡️⬅️⬆️⬇️↗️↘️↙️↖️↕️↔️↪️↩️⤴️⤵️????????????????????????????➕➖➗✖️????????™️©️®️〰️➰➿????????????????????✔️☑️????⚪️⚫️????????????????????????????????????????▪️▫️◾️◽️◼️◻️⬛️⬜️????????????????????????????????????????????????????♠️♣️♥️♦️????????????️????????????????????????????????????????????????????????????????????????????????????????????????????️????????????????️?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????⚽️????????⚾️????????????????????????????????????????⛳️????????????????⛸????⛷????????️♀️????????♀️????????♀️????????♀️????????♀️????????♀️????️????????????????????????????????????????????????♀️????♂️????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????♂️????????♂️????????♂️????????♂️????????♂️????????♂️⛹️♀️⛹????♀️⛹????♀️⛹????♀️⛹????♀️⛹????♀️⛹️⛹????⛹????⛹????⛹????⛹????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????♂️????????♂️????????♂️????????♂️????????♂️????????♂️????️♀️????????♀️????????♀️????????♀️????????♀️????????♀️????️????????????????????????????????????????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????????????????????????????????????????????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????????????????????????????????????????????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????♂️????????♂️????????♂️????????♂️????????♂️????????♂️????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????????????????????????????????????????????????????????????????????????????????????????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️????????????????????????????????????????????????♀️????????♀️????????♀️????????♀️????????♀️????????♀️???
?????????????????????????????????????????????????????????????????????????????????????????????♀️????♂️????????????????????????????????????????????????????????????????????";
Function to test the emojis
public void checkMatchingEmojis() {
final Pattern pattern = Pattern.compile(sEmojiRegex);
final Matcher matcher = pattern.matcher(sEmojiTest);
int foundEmojiCount = 0;
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
foundEmojiCount++;
}
System.out.println("*******************************************");
System.out.println("Input Emoji count = 1627");
System.out.println("Captured Emoji count = " + foundEmojiCount);
System.out.println("*******************************************");
}
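Here is the gist, tested with all Unicode 10 emojis.
Thanks to Kevin Scott.
There are two ways to solve this troublesome problem.
The first is to use a third-party library such as emoji-java or emoji4j, both mentioned above. You can easily use methods such as containsEmoji and removesEmoji. However, your own app then has to keep up with updates to these libraries.
As for me, I wanted to find a simple solution to this problem.
After searching for a whole day, I found the magic regex.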
"(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)"
I tested it in Java and it works. It solved my problem perfectly.
You can view it on this GitHub page:
https://github.com/zly394/EmojiRegex
Note:
The answer provided by @Eric Nakagawa contains some errors and does not work properly.
You can also use the emoji4j library.
String emojiText = "A ????, ???? and a ???? became friends. For ????'s birthday party, they all had ????s, ????s, ????s and ????.";
EmojiUtils.removeAllEmojis(emojiText);//returns "A , and a became friends. For 's birthday party, they all had s, s, s and .
Just solve it using a regular expression:
s = s.replaceAll("\\p{So}+", "");
You can find the details here:
http://www.regular-expressions.info/unicode.html
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#OTHER_SYMBOL
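A small sketch of what that one-liner does and does not remove (OtherSymbolDemo is an illustrative name): \p{So} is the Unicode general category "Symbol, other", so it strips emoji such as U+1F606 but also non-emoji symbols such as the copyright sign, while the variation selector U+FE0F is not in that category and stays behind.

```java
final class OtherSymbolDemo {
    // Removes every run of characters whose general category is So (Symbol, other).
    static String stripOtherSymbols(String s) {
        return s.replaceAll("\\p{So}+", "");
    }

    public static void main(String[] args) {
        String grin = new String(Character.toChars(0x1F606));
        String copyright = new String(Character.toChars(0x00A9));
        // Emoji and the copyright sign are both category So, so both disappear.
        System.out.println(stripOtherSymbols("a" + grin + "b" + copyright + "c"));
        // U+2764 (heavy black heart) is So, but the trailing U+FE0F is left over.
        String heart = new String(Character.toChars(0x2764)) + new String(Character.toChars(0xFE0F));
        System.out.println(stripOtherSymbols(heart).length());
    }
}
```

Whether a given emoji is recognised as So depends on the Unicode version of the running JDK, so results for recently added emoji may differ between Java releases.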
This is what I use to remove emojis, and so far it has proven to let all other alphabets through.
private static String remove_Emojis(String name)
{
//we will store all the letters in this array
ArrayList<Character> nonEmoji = new ArrayList<>();
// and when we rebuild the name we will put it in here
String newName = "";
// we are going to loop through checking each character to see if its an emoji or not
for (int i = 0; i < name.length(); i++)
{
if (Character.isLetterOrDigit(name.charAt(i)))
{
nonEmoji.add(name.charAt(i));
}
else
{
// this is just a 2nd check in case the other method didn't allow some letter
if (Build.VERSION.SDK_INT > 18)
{
if (Character.isAlphabetic(name.charAt(i)))
{
nonEmoji.add(name.charAt(i));
}
}
}
if (name.charAt(i) == ' ')// may want to consider adding or '-' or '\''
{
nonEmoji.add(' ');// just add it
}
if (name.charAt(i) == '@' && !name.contains(" "))// I put this in for email addresses
{
nonEmoji.add('@');
}
}
// finally just loop through building it back out
for (int i = 0; i < nonEmoji.size(); i++) {
newName += nonEmoji.get(i);
}
return newName;
}
You can generate your own regex whenever the spec changes, using this tool (screenshot here).
For UTF-8/32 mode (as a string), expanded mode:
" # Use the 'Mega-Conversion' tool to change into other syntaxes"
" # -------------------------------------------------------------"
" "
" [#*0-9] \\x{FE0F} \\x{20E3}"
" | [\\x{A9}\\x{AE}\\x{203C}\\x{2049}\\x{2122}\\x{2139}\\x{2194}-\\x{2199}\\x{21A9}\\x{21AA}\\x{231A}\\x{231B}\\x{2328}\\x{23CF}\\x{23E9}-\\x{23F3}\\x{23F8}-\\x{23FA}\\x{24C2}\\x{25AA}\\x{25AB}\\x{25B6}\\x{25C0}\\x{25FB}-\\x{25FE}\\x{2600}-\\x{2604}\\x{260E}\\x{2611}\\x{2614}\\x{2615}\\x{2618}]"
" | \\x{261D} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{2620}\\x{2622}\\x{2623}\\x{2626}\\x{262A}\\x{262E}\\x{262F}\\x{2638}-\\x{263A}\\x{2640}\\x{2642}\\x{2648}-\\x{2653}\\x{265F}\\x{2660}\\x{2663}\\x{2665}\\x{2666}\\x{2668}\\x{267B}\\x{267E}\\x{267F}\\x{2692}-\\x{2697}\\x{2699}\\x{269B}\\x{269C}\\x{26A0}\\x{26A1}\\x{26AA}\\x{26AB}\\x{26B0}\\x{26B1}\\x{26BD}\\x{26BE}\\x{26C4}\\x{26C5}\\x{26C8}\\x{26CE}\\x{26CF}\\x{26D1}\\x{26D3}\\x{26D4}\\x{26E9}\\x{26EA}\\x{26F0}-\\x{26F5}\\x{26F7}\\x{26F8}]"
" | \\x{26F9}"
" (?:"
" \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{26FA}\\x{26FD}\\x{2702}\\x{2705}\\x{2708}\\x{2709}]"
" | [\\x{270A}-\\x{270D}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{270F}\\x{2712}\\x{2714}\\x{2716}\\x{271D}\\x{2721}\\x{2728}\\x{2733}\\x{2734}\\x{2744}\\x{2747}\\x{274C}\\x{274E}\\x{2753}-\\x{2755}\\x{2757}\\x{2763}\\x{2764}\\x{2795}-\\x{2797}\\x{27A1}\\x{27B0}\\x{27BF}\\x{2934}\\x{2935}\\x{2B05}-\\x{2B07}\\x{2B1B}\\x{2B1C}\\x{2B50}\\x{2B55}\\x{3030}\\x{303D}\\x{3297}\\x{3299}\\x{1F004}\\x{1F0CF}\\x{1F170}\\x{1F171}\\x{1F17E}\\x{1F17F}\\x{1F18E}\\x{1F191}-\\x{1F19A}]"
" | \\x{1F1E6} [\\x{1F1E8}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F2}\\x{1F1F4}\\x{1F1F6}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FD}\\x{1F1FF}]"
" | \\x{1F1E7} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EF}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
" | \\x{1F1E8} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1EE}\\x{1F1F0}-\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}-\\x{1F1FF}]"
" | \\x{1F1E9} [\\x{1F1EA}\\x{1F1EC}\\x{1F1EF}\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1FF}]"
" | \\x{1F1EA} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1ED}\\x{1F1F7}-\\x{1F1FA}]"
" | \\x{1F1EB} [\\x{1F1EE}-\\x{1F1F0}\\x{1F1F2}\\x{1F1F4}\\x{1F1F7}]"
" | \\x{1F1EC} [\\x{1F1E6}\\x{1F1E7}\\x{1F1E9}-\\x{1F1EE}\\x{1F1F1}-\\x{1F1F3}\\x{1F1F5}-\\x{1F1FA}\\x{1F1FC}\\x{1F1FE}]"
" | \\x{1F1ED} [\\x{1F1F0}\\x{1F1F2}\\x{1F1F3}\\x{1F1F7}\\x{1F1F9}\\x{1F1FA}]"
" | \\x{1F1EE} [\\x{1F1E8}-\\x{1F1EA}\\x{1F1F1}-\\x{1F1F4}\\x{1F1F6}-\\x{1F1F9}]"
" | \\x{1F1EF} [\\x{1F1EA}\\x{1F1F2}\\x{1F1F4}\\x{1F1F5}]"
" | \\x{1F1F0} [\\x{1F1EA}\\x{1F1EC}-\\x{1F1EE}\\x{1F1F2}\\x{1F1F3}\\x{1F1F5}\\x{1F1F7}\\x{1F1FC}\\x{1F1FE}\\x{1F1FF}]"
" | \\x{1F1F1} [\\x{1F1E6}-\\x{1F1E8}\\x{1F1EE}\\x{1F1F0}\\x{1F1F7}-\\x{1F1FB}\\x{1F1FE}]"
" | \\x{1F1F2} [\\x{1F1E6}\\x{1F1E8}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1FF}]"
" | \\x{1F1F3} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}-\\x{1F1EC}\\x{1F1EE}\\x{1F1F1}\\x{1F1F4}\\x{1F1F5}\\x{1F1F7}\\x{1F1FA}\\x{1F1FF}]"
" | \\x{1F1F4} \\x{1F1F2}"
" | \\x{1F1F5} [\\x{1F1E6}\\x{1F1EA}-\\x{1F1ED}\\x{1F1F0}-\\x{1F1F3}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FC}\\x{1F1FE}]"
" | \\x{1F1F6} \\x{1F1E6}"
" | \\x{1F1F7} [\\x{1F1EA}\\x{1F1F4}\\x{1F1F8}\\x{1F1FA}\\x{1F1FC}]"
" | \\x{1F1F8} [\\x{1F1E6}-\\x{1F1EA}\\x{1F1EC}-\\x{1F1F4}\\x{1F1F7}-\\x{1F1F9}\\x{1F1FB}\\x{1F1FD}-\\x{1F1FF}]"
" | \\x{1F1F9} [\\x{1F1E6}\\x{1F1E8}\\x{1F1E9}\\x{1F1EB}-\\x{1F1ED}\\x{1F1EF}-\\x{1F1F4}\\x{1F1F7}\\x{1F1F9}\\x{1F1FB}\\x{1F1FC}\\x{1F1FF}]"
" | \\x{1F1FA} [\\x{1F1E6}\\x{1F1EC}\\x{1F1F2}\\x{1F1F3}\\x{1F1F8}\\x{1F1FE}\\x{1F1FF}]"
" | \\x{1F1FB} [\\x{1F1E6}\\x{1F1E8}\\x{1F1EA}\\x{1F1EC}\\x{1F1EE}\\x{1F1F3}\\x{1F1FA}]"
" | \\x{1F1FC} [\\x{1F1EB}\\x{1F1F8}]"
" | \\x{1F1FD} \\x{1F1F0}"
" | \\x{1F1FE} [\\x{1F1EA}\\x{1F1F9}]"
" | \\x{1F1FF} [\\x{1F1E6}\\x{1F1F2}\\x{1F1FC}]"
" | [\\x{1F201}\\x{1F202}\\x{1F21A}\\x{1F22F}\\x{1F232}-\\x{1F23A}\\x{1F250}\\x{1F251}\\x{1F300}-\\x{1F321}\\x{1F324}-\\x{1F384}]"
" | \\x{1F385} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F386}-\\x{1F393}\\x{1F396}\\x{1F397}\\x{1F399}-\\x{1F39B}\\x{1F39E}-\\x{1F3C1}]"
" | \\x{1F3C2} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F3C3}\\x{1F3C4}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F3C5}\\x{1F3C6}]"
" | \\x{1F3C7} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F3C8}\\x{1F3C9}]"
" | \\x{1F3CA}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F3CB}\\x{1F3CC}]"
" (?:"
" \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F3CD}-\\x{1F3F0}]"
" | \\x{1F3F3}"
" (?: \\x{FE0F} \\x{200D} \\x{1F308} )?"
" | \\x{1F3F4}"
" (?:"
" \\x{200D} \\x{2620} \\x{FE0F}"
" | \\x{E0067} \\x{E0062}"
" (?:"
" \\x{E0065} \\x{E006E} \\x{E0067}"
" | \\x{E0073} \\x{E0063} \\x{E0074}"
" | \\x{E0077} \\x{E006C} \\x{E0073}"
" )"
" \\x{E007F}"
" )?"
" | [\\x{1F3F5}\\x{1F3F7}-\\x{1F440}]"
" | \\x{1F441}"
" (?: \\x{FE0F} \\x{200D} \\x{1F5E8} \\x{FE0F} )?"
" | [\\x{1F442}\\x{1F443}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F444}\\x{1F445}]"
" | [\\x{1F446}-\\x{1F450}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F451}-\\x{1F465}]"
" | [\\x{1F466}\\x{1F467}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F468}"
" (?:"
" \\x{200D}"
" (?:"
" [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
" | \\x{2764} \\x{FE0F} \\x{200D}"
" (?: \\x{1F48B} \\x{200D} )?"
" \\x{1F468}"
" | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
" | \\x{1F466}"
" (?: \\x{200D} \\x{1F466} )?"
" | \\x{1F467}"
" (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
" | [\\x{1F468}\\x{1F469}] \\x{200D}"
" (?:"
" \\x{1F466}"
" (?: \\x{200D} \\x{1F466} )?"
" | \\x{1F467}"
" (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
" )"
" | [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
" )"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?:"
" \\x{200D}"
" (?:"
" [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
" | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
" )"
" )?"
" )?"
" | \\x{1F469}"
" (?:"
" \\x{200D}"
" (?:"
" [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
" | \\x{2764} \\x{FE0F} \\x{200D}"
" (?: \\x{1F48B} \\x{200D} )?"
" [\\x{1F468}\\x{1F469}]"
" | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}]"
" | \\x{1F466}"
" (?: \\x{200D} \\x{1F466} )?"
" | \\x{1F467}"
" (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
" | \\x{1F469} \\x{200D}"
" (?:"
" \\x{1F466}"
" (?: \\x{200D} \\x{1F466} )?"
" | \\x{1F467}"
" (?: \\x{200D} [\\x{1F466}\\x{1F467}] )?"
" )"
" | [\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
" )"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?:"
" \\x{200D}"
" (?:"
" [\\x{2695}\\x{2696}\\x{2708}] \\x{FE0F}"
" | [\\x{1F33E}\\x{1F373}\\x{1F393}\\x{1F3A4}\\x{1F3A8}\\x{1F3EB}\\x{1F3ED}\\x{1F4BB}\\x{1F4BC}\\x{1F527}\\x{1F52C}\\x{1F680}\\x{1F692}\\x{1F9B0}-\\x{1F9B3}]"
" )"
" )?"
" )?"
" | [\\x{1F46A}-\\x{1F46D}]"
" | \\x{1F46E}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F46F}"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" | \\x{1F470} [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F471}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F472} [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F473}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F474}-\\x{1F476}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F477}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F478} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F479}-\\x{1F47B}]"
" | \\x{1F47C} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F47D}-\\x{1F480}]"
" | [\\x{1F481}\\x{1F482}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F483} [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F484}"
" | \\x{1F485} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F486}\\x{1F487}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F488}-\\x{1F4A9}]"
" | \\x{1F4AA} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F4AB}-\\x{1F4FD}\\x{1F4FF}-\\x{1F53D}\\x{1F549}-\\x{1F54E}\\x{1F550}-\\x{1F567}\\x{1F56F}\\x{1F570}\\x{1F573}]"
" | \\x{1F574} [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F575}"
" (?:"
" \\x{FE0F} \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F576}-\\x{1F579}]"
" | \\x{1F57A} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F587}\\x{1F58A}-\\x{1F58D}]"
" | [\\x{1F590}\\x{1F595}\\x{1F596}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F5A4}\\x{1F5A5}\\x{1F5A8}\\x{1F5B1}\\x{1F5B2}\\x{1F5BC}\\x{1F5C2}-\\x{1F5C4}\\x{1F5D1}-\\x{1F5D3}\\x{1F5DC}-\\x{1F5DE}\\x{1F5E1}\\x{1F5E3}\\x{1F5E8}\\x{1F5EF}\\x{1F5F3}\\x{1F5FA}-\\x{1F644}]"
" | [\\x{1F645}-\\x{1F647}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F648}-\\x{1F64A}]"
" | \\x{1F64B}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F64C} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F64D}\\x{1F64E}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F64F} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F680}-\\x{1F6A2}]"
" | \\x{1F6A3}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F6A4}-\\x{1F6B3}]"
" | [\\x{1F6B4}-\\x{1F6B6}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F6B7}-\\x{1F6BF}]"
" | \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F6C1}-\\x{1F6C5}\\x{1F6CB}]"
" | \\x{1F6CC} [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F6CD}-\\x{1F6D2}\\x{1F6E0}-\\x{1F6E5}\\x{1F6E9}\\x{1F6EB}\\x{1F6EC}\\x{1F6F0}\\x{1F6F3}-\\x{1F6F9}\\x{1F910}-\\x{1F917}]"
" | [\\x{1F918}-\\x{1F91C}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F91D}"
" | [\\x{1F91E}\\x{1F91F}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | [\\x{1F920}-\\x{1F925}]"
" | \\x{1F926}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F927}-\\x{1F92F}]"
" | [\\x{1F930}-\\x{1F936}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F937}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F938}\\x{1F939}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | \\x{1F93A}"
" | \\x{1F93C}"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" | [\\x{1F93D}\\x{1F93E}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F940}-\\x{1F945}\\x{1F947}-\\x{1F970}\\x{1F973}-\\x{1F976}\\x{1F97A}\\x{1F97C}-\\x{1F9A2}\\x{1F9B0}-\\x{1F9B4}]"
" | [\\x{1F9B5}\\x{1F9B6}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F9B7}"
" | [\\x{1F9B8}\\x{1F9B9}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F9C0}-\\x{1F9C2}\\x{1F9D0}]"
" | [\\x{1F9D1}-\\x{1F9D5}] [\\x{1F3FB}-\\x{1F3FF}]?"
" | \\x{1F9D6}"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F9D7}-\\x{1F9DD}]"
" (?:"
" \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F}"
" | [\\x{1F3FB}-\\x{1F3FF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" )?"
" | [\\x{1F9DE}\\x{1F9DF}]"
" (?: \\x{200D} [\\x{2640}\\x{2642}] \\x{FE0F} )?"
" | [\\x{1F9E0}-\\x{1F9FF}]"
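The expanded form above is laid out with spaces and line breaks for readability, so in Java it would need to be compiled with `Pattern.COMMENTS`, which tells the engine to ignore that whitespace. Here is a minimal sketch of that, using only three alternatives copied from the pattern above. It is not the full emoji set, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpandedEmojiFragment {
    // Three alternatives excerpted from the expanded pattern; NOT the full emoji set.
    // Pattern.COMMENTS makes the engine ignore the layout whitespace in the pattern.
    static final Pattern FRAGMENT = Pattern.compile(
          "   [\\x{1F680}-\\x{1F6A2}]"
        + " | \\x{1F6C0} [\\x{1F3FB}-\\x{1F3FF}]?"
        + " | [\\x{1F9E0}-\\x{1F9FF}]",
        Pattern.COMMENTS);

    // Returns every substring of text matched by the fragment, in order.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = FRAGMENT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Build the test characters from code points to avoid source-encoding issues.
        String rocket = new String(Character.toChars(0x1F680));
        String bathWithSkinTone = new String(Character.toChars(0x1F6C0))
                                + new String(Character.toChars(0x1F3FD));
        for (String s : extract("go " + rocket + " then " + bathWithSkinTone)) {
            System.out.println(s + " (" + s.codePointCount(0, s.length()) + " code point(s))");
        }
    }
}
```

The `\x{h...h}` escape needs Java 7 or later. Because the skin-tone modifier is part of the same alternative, the bath emoji and its modifier come back as one match rather than two.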
For UTF-16 mode (as a string), in compressed form:
"[#*0-9]\\uFE0F\\u20E3|[\\u00A9\\u00AE\\u203C\\u2049\\u2122\\u2139\\u2"
"194-\\u2199\\u21A9\\u21AA\\u231A\\u231B\\u2328\\u23CF\\u23E9-\\u23F3\\"
"u23F8-\\u23FA\\u24C2\\u25AA\\u25AB\\u25B6\\u25C0\\u25FB-\\u25FE\\u260"
"0-\\u2604\\u260E\\u2611\\u2614\\u2615\\u2618]|\\u261D(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\u2620\\u2622\\u2623\\u2626\\u262A\\u262E\\u262F\\u26"
"38-\\u263A\\u2640\\u2642\\u2648-\\u2653\\u265F\\u2660\\u2663\\u2665\\u"
"2666\\u2668\\u267B\\u267E\\u267F\\u2692-\\u2697\\u2699\\u269B\\u269C\\"
"u26A0\\u26A1\\u26AA\\u26AB\\u26B0\\u26B1\\u26BD\\u26BE\\u26C4\\u26C5\\"
"u26C8\\u26CE\\u26CF\\u26D1\\u26D3\\u26D4\\u26E9\\u26EA\\u26F0-\\u26F5"
"\\u26F7\\u26F8]|\\u26F9(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640"
"\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\u26FA\\u"
"26FD\\u2702\\u2705\\u2708\\u2709]|[\\u270A-\\u270D](?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\u270F\\u2712\\u2714\\u2716\\u271D\\u2721\\u2728\\u273"
"3\\u2734\\u2744\\u2747\\u274C\\u274E\\u2753-\\u2755\\u2757\\u2763\\u27"
"64\\u2795-\\u2797\\u27A1\\u27B0\\u27BF\\u2934\\u2935\\u2B05-\\u2B07\\u"
"2B1B\\u2B1C\\u2B50\\u2B55\\u3030\\u303D\\u3297\\u3299]|\\uD83C(?:[\\u"
"DC04\\uDCCF\\uDD70\\uDD71\\uDD7E\\uDD7F\\uDD8E\\uDD91-\\uDD9A]|\\uDDE"
"6\\uD83C[\\uDDE8-\\uDDEC\\uDDEE\\uDDF1\\uDDF2\\uDDF4\\uDDF6-\\uDDFA\\u"
"DDFC\\uDDFD\\uDDFF]|\\uDDE7\\uD83C[\\uDDE6\\uDDE7\\uDDE9-\\uDDEF\\uDD"
"F1-\\uDDF4\\uDDF6-\\uDDF9\\uDDFB\\uDDFC\\uDDFE\\uDDFF]|\\uDDE8\\uD83C"
"[\\uDDE6\\uDDE8\\uDDE9\\uDDEB-\\uDDEE\\uDDF0-\\uDDF5\\uDDF7\\uDDFA-\\u"
"DDFF]|\\uDDE9\\uD83C[\\uDDEA\\uDDEC\\uDDEF\\uDDF0\\uDDF2\\uDDF4\\uDDF"
"F]|\\uDDEA\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDED\\uDDF7-\\uDDFA]"
"|\\uDDEB\\uD83C[\\uDDEE-\\uDDF0\\uDDF2\\uDDF4\\uDDF7]|\\uDDEC\\uD83C["
"\\uDDE6\\uDDE7\\uDDE9-\\uDDEE\\uDDF1-\\uDDF3\\uDDF5-\\uDDFA\\uDDFC\\uD"
"DFE]|\\uDDED\\uD83C[\\uDDF0\\uDDF2\\uDDF3\\uDDF7\\uDDF9\\uDDFA]|\\uDD"
"EE\\uD83C[\\uDDE8-\\uDDEA\\uDDF1-\\uDDF4\\uDDF6-\\uDDF9]|\\uDDEF\\uD8"
"3C[\\uDDEA\\uDDF2\\uDDF4\\uDDF5]|\\uDDF0\\uD83C[\\uDDEA\\uDDEC-\\uDDE"
"E\\uDDF2\\uDDF3\\uDDF5\\uDDF7\\uDDFC\\uDDFE\\uDDFF]|\\uDDF1\\uD83C[\\u"
"DDE6-\\uDDE8\\uDDEE\\uDDF0\\uDDF7-\\uDDFB\\uDDFE]|\\uDDF2\\uD83C[\\uD"
"DE6\\uDDE8-\\uDDED\\uDDF0-\\uDDFF]|\\uDDF3\\uD83C[\\uDDE6\\uDDE8\\uDD"
"EA-\\uDDEC\\uDDEE\\uDDF1\\uDDF4\\uDDF5\\uDDF7\\uDDFA\\uDDFF]|\\uDDF4\\"
"uD83C\\uDDF2|\\uDDF5\\uD83C[\\uDDE6\\uDDEA-\\uDDED\\uDDF0-\\uDDF3\\uD"
"DF7-\\uDDF9\\uDDFC\\uDDFE]|\\uDDF6\\uD83C\\uDDE6|\\uDDF7\\uD83C[\\uDD"
"EA\\uDDF4\\uDDF8\\uDDFA\\uDDFC]|\\uDDF8\\uD83C[\\uDDE6-\\uDDEA\\uDDEC"
"-\\uDDF4\\uDDF7-\\uDDF9\\uDDFB\\uDDFD-\\uDDFF]|\\uDDF9\\uD83C[\\uDDE6"
"\\uDDE8\\uDDE9\\uDDEB-\\uDDED\\uDDEF-\\uDDF4\\uDDF7\\uDDF9\\uDDFB\\uDD"
"FC\\uDDFF]|\\uDDFA\\uD83C[\\uDDE6\\uDDEC\\uDDF2\\uDDF3\\uDDF8\\uDDFE\\"
"uDDFF]|\\uDDFB\\uD83C[\\uDDE6\\uDDE8\\uDDEA\\uDDEC\\uDDEE\\uDDF3\\uDD"
"FA]|\\uDDFC\\uD83C[\\uDDEB\\uDDF8]|\\uDDFD\\uD83C\\uDDF0|\\uDDFE\\uD8"
"3C[\\uDDEA\\uDDF9]|\\uDDFF\\uD83C[\\uDDE6\\uDDF2\\uDDFC]|[\\uDE01\\uD"
"E02\\uDE1A\\uDE2F\\uDE32-\\uDE3A\\uDE50\\uDE51\\uDF00-\\uDF21\\uDF24-"
"\\uDF84]|\\uDF85(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDF86-\\uDF93\\uDF9"
"6\\uDF97\\uDF99-\\uDF9B\\uDF9E-\\uDFC1]|\\uDFC2(?:\\uD83C[\\uDFFB-\\u"
"DFFF])?|[\\uDFC3\\uDFC4](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\"
"uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDFC5\\uDFC6"
"]|\\uDFC7(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDFC8\\uDFC9]|\\uDFCA(?:\\"
"u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?)?|[\\uDFCB\\uDFCC](?:\\uD83C[\\uDFFB-\\uDFFF]("
"?:\\u200D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uF"
"E0F)?|[\\uDFCD-\\uDFF0]|\\uDFF3(?:\\uFE0F\\u200D\\uD83C\\uDF08)?|\\u"
"DFF4(?:\\u200D\\u2620\\uFE0F|\\uDB40\\uDC67\\uDB40\\uDC62\\uDB40(?:\\"
"uDC65\\uDB40\\uDC6E\\uDB40\\uDC67|\\uDC73\\uDB40\\uDC63\\uDB40\\uDC74"
"|\\uDC77\\uDB40\\uDC6C\\uDB40\\uDC73)\\uDB40\\uDC7F)?|[\\uDFF5\\uDFF7"
"-\\uDFFF])|\\uD83D(?:[\\uDC00-\\uDC40]|\\uDC41(?:\\uFE0F\\u200D\\uD8"
"3D\\uDDE8\\uFE0F)?|[\\uDC42\\uDC43](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\"
"uDC44\\uDC45]|[\\uDC46-\\uDC50](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDC"
"51-\\uDC65]|[\\uDC66\\uDC67](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC68(?"
":\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83"
"D(?:\\uDC8B\\u200D\\uD83D)?\\uDC68|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDF"
"A4\\uDFA8\\uDFEB\\uDFED]|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?"
"|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?|[\\uDC68\\uDC69]\\u200D\\"
"uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83D["
"\\uDC66\\uDC67])?)|[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD"
"83E[\\uDDB0-\\uDDB3])|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D(?:[\\u2695"
"\\u2696\\u2708]\\uFE0F|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uD"
"FEB\\uDFED]|\\uD83D[\\uDCBB\\uDCBC\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD8"
"3E[\\uDDB0-\\uDDB3]))?)?|\\uDC69(?:\\u200D(?:[\\u2695\\u2696\\u2708"
"]\\uFE0F|\\u2764\\uFE0F\\u200D\\uD83D(?:\\uDC8B\\u200D\\uD83D)?[\\uDC"
"68\\uDC69]|\\uD83C[\\uDF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]"
"|\\uD83D(?:\\uDC66(?:\\u200D\\uD83D\\uDC66)?|\\uDC67(?:\\u200D\\uD83"
"D[\\uDC66\\uDC67])?|\\uDC69\\u200D\\uD83D(?:\\uDC66(?:\\u200D\\uD83D"
"\\uDC66)?|\\uDC67(?:\\u200D\\uD83D[\\uDC66\\uDC67])?)|[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92])|\\uD83E[\\uDDB0-\\uDDB3])|\\uD83C[\\u"
"DFFB-\\uDFFF](?:\\u200D(?:[\\u2695\\u2696\\u2708]\\uFE0F|\\uD83C[\\u"
"DF3E\\uDF73\\uDF93\\uDFA4\\uDFA8\\uDFEB\\uDFED]|\\uD83D[\\uDCBB\\uDCB"
"C\\uDD27\\uDD2C\\uDE80\\uDE92]|\\uD83E[\\uDDB0-\\uDDB3]))?)?|[\\uDC6"
"A-\\uDC6D]|\\uDC6E(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC6F(?:\\u200D[\\u2"
"640\\u2642]\\uFE0F)?|\\uDC70(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC71(?"
":\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?)?|\\uDC72(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC"
"73(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC74-\\uDC76](?:\\uD83C[\\uDFFB-\\"
"uDFFF])?|\\uDC77(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDC78(?:\\uD83C[\\uDF"
"FB-\\uDFFF])?|[\\uDC79-\\uDC7B]|\\uDC7C(?:\\uD83C[\\uDFFB-\\uDFFF])"
"?|[\\uDC7D-\\uDC80]|[\\uDC81\\uDC82](?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uD"
"C83(?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDC84|\\uDC85(?:\\uD83C[\\uDFFB-"
"\\uDFFF])?|[\\uDC86\\uDC87](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C"
"[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDC88-\\uD"
"CA9]|\\uDCAA(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDCAB-\\uDCFD\\uDCFF-\\"
"uDD3D\\uDD49-\\uDD4E\\uDD50-\\uDD67\\uDD6F\\uDD70\\uDD73]|\\uDD74(?:"
"\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD75(?:\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?|\\uFE0F\\u200D[\\u2640\\u2642]\\uFE0F)?"
"|[\\uDD76-\\uDD79]|\\uDD7A(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD87\\uD"
"D8A-\\uDD8D]|[\\uDD90\\uDD95\\uDD96](?:\\uD83C[\\uDFFB-\\uDFFF])?|["
"\\uDDA4\\uDDA5\\uDDA8\\uDDB1\\uDDB2\\uDDBC\\uDDC2-\\uDDC4\\uDDD1-\\uDD"
"D3\\uDDDC-\\uDDDE\\uDDE1\\uDDE3\\uDDE8\\uDDEF\\uDDF3\\uDDFA-\\uDE44]|"
"[\\uDE45-\\uDE47](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\"
"uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE48-\\uDE4A]|\\uDE"
"4B(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u20"
"0D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4C(?:\\uD83C[\\uDFFB-\\uDFFF])?|"
"[\\uDE4D\\uDE4E](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\u"
"DFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDE4F(?:\\uD83C[\\uDFF"
"B-\\uDFFF])?|[\\uDE80-\\uDEA2]|\\uDEA3(?:\\u200D[\\u2640\\u2642]\\uF"
"E0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|["
"\\uDEA4-\\uDEB3]|[\\uDEB4-\\uDEB6](?:\\u200D[\\u2640\\u2642]\\uFE0F|"
"\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDE"
"B7-\\uDEBF]|\\uDEC0(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDEC1-\\uDEC5\\u"
"DECB]|\\uDECC(?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDECD-\\uDED2\\uDEE0-"
"\\uDEE5\\uDEE9\\uDEEB\\uDEEC\\uDEF0\\uDEF3-\\uDEF9])|\\uD83E(?:[\\uDD"
"10-\\uDD17]|[\\uDD18-\\uDD1C](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD1D|"
"[\\uDD1E\\uDD1F](?:\\uD83C[\\uDFFB-\\uDFFF])?|[\\uDD20-\\uDD25]|\\uD"
"D26(?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u2"
"00D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDD27-\\uDD2F]|[\\uDD30-\\uDD36]("
"?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDD37(?:\\u200D[\\u2640\\u2642]\\uFE0"
"F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\u"
"DD38\\uDD39](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFF"
"F](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|\\uDD3A|\\uDD3C(?:\\u200D[\\"
"u2640\\u2642]\\uFE0F)?|[\\uDD3D\\uDD3E](?:\\u200D[\\u2640\\u2642]\\u"
"FE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|"
"[\\uDD40-\\uDD45\\uDD47-\\uDD70\\uDD73-\\uDD76\\uDD7A\\uDD7C-\\uDDA2\\"
"uDDB0-\\uDDB4]|[\\uDDB5\\uDDB6](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDB"
"7|[\\uDDB8\\uDDB9](?:\\u200D[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-"
"\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uDDC0-\\uDDC2\\uDDD"
"0]|[\\uDDD1-\\uDDD5](?:\\uD83C[\\uDFFB-\\uDFFF])?|\\uDDD6(?:\\u200D"
"[\\u2640\\u2642]\\uFE0F|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u"
"2642]\\uFE0F)?)?|[\\uDDD7-\\uDDDD](?:\\u200D[\\u2640\\u2642]\\uFE0F"
"|\\uD83C[\\uDFFB-\\uDFFF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?)?|[\\uD"
"DDE\\uDDDF](?:\\u200D[\\u2640\\u2642]\\uFE0F)?|[\\uDDE0-\\uDDFF])"
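To see how the two forms line up, it helps to work out the surrogate pairs by hand. The sketch below (the class and method names are mine, not part of either pattern) prints the UTF-16 code units for a few code points taken from the expanded form. It only shows the arithmetic. Whether a particular regex engine will match separately escaped surrogate halves against paired input depends on the engine, so test the compressed form in the engine you actually use.

```java
public class SurrogateArithmetic {
    // Formats a code point as the escaped UTF-16 code unit(s) that encode it.
    static String utf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields one char for BMP code points, two (a surrogate pair) above U+FFFF.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0x2640, 0x1F3FB, 0x1F3FF, 0x1F9E0, 0x1F9FF};
        for (int cp : samples) {
            System.out.println(String.format("U+%04X", cp) + " -> " + utf16Escapes(cp));
        }
    }
}
```

For example, U+1F3FB and U+1F3FF share the high surrogate D83C and differ only in the low surrogate (DFFB and DFFF), which is why the skin-tone range `[\\x{1F3FB}-\\x{1F3FF}]` shows up in the compressed form as `\\uD83C[\\uDFFB-\\uDFFF]`.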
Regular expressions are too slow for this, and the emoji set is updated very frequently.
Try this project: simple-emoji-4j
It is compatible with Emoji 12.0 (2018.10.15).
Usage is simple:
EmojiUtils.containsEmoji(str)
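If adding a dependency is not an option, a plain code point scan also avoids the regex engine. The following is only a rough sketch and is not how simple-emoji-4j works internally (I have not checked). The class name and the three hard-coded ranges are assumptions chosen for illustration: Miscellaneous Symbols and Pictographs, Emoticons, and Transport and Map Symbols. It knows nothing about ZWJ sequences, flags, or keycaps, and it reports a skin-tone modifier as a separate item.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointScan {
    // Illustrative ranges only; a real emoji list is much larger and changes with each Unicode release.
    static boolean inSketchRanges(int cp) {
        return (cp >= 0x1F300 && cp <= 0x1F5FF)   // Miscellaneous Symbols and Pictographs
            || (cp >= 0x1F600 && cp <= 0x1F64F)   // Emoticons
            || (cp >= 0x1F680 && cp <= 0x1F6FF);  // Transport and Map Symbols
    }

    // Walks the string by code point (not by char) so surrogate pairs stay together.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (inSketchRanges(cp)) {
                found.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // The sentence from the question: three U+1F606, a space, then one U+1F61B.
        String laugh = new String(Character.toChars(0x1F606));
        String tongue = new String(Character.toChars(0x1F61B));
        String s = "Thats a Nice joke " + laugh + laugh + laugh + " " + tongue;
        System.out.println(extract(s).size() + " emoji found");
    }
}
```

Stepping by `Character.charCount(cp)` is what keeps the loop from splitting a surrogate pair, which was the problem with searching the `char` array directly.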