Escaping special characters when generating XML in Java

Question

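I am developing an XML export feature that lets application users export their data in XML format. I had the feature built and working until it started failing in some cases. I then realized the cause was special characters that need to be encoded: for example, the data may contain & or ! or % or ' or # and so on, and these need to be escaped properly. I was wondering whether there is a generic utility that can escape all special characters according to the XML specification. I couldn't find anything on Google.

Is there something like that already out there? Or is there another way to do it?

Here is the code I am using to generate the XML: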

    Document xmldoc = new DocumentImpl();
    Element root = xmldoc.createElement("Report");
    Element chartName = xmldoc.createElement(
            (exportData.getChartName() == null) ? "Report" : exportData.getChartName());
    if (exportData.getExportDataList().size() > 0
            && exportData.getExportDataList().get(0) instanceof Vector) {
        // First row is the HEADER, i.e name
        Vector<String> name = exportData.getExportDataList().get(0);
        for (int i = 1; i < exportData.getExportDataList().size(); i++) {
            Vector<String> value = exportData.getExportDataList().get(i);
            Element sub_root = xmldoc.createElement("Data");
            // I had to remove a for loop from here. StackOverflow description field would not take that. :(
                // Insert header row
                Element node = xmldoc.createElementNS(null, replaceUnrecognizedChars(name.get(j)));
                Node node_value = xmldoc.createTextNode(value.get(j));
                node.appendChild(node_value);
                sub_root.appendChild(node);
                chartName.appendChild(sub_root);
            }
        }
    }
    root.appendChild(chartName);

    // Prepare the DOM document for writing
    Source source = new DOMSource(root);
    // Prepare the output file
    Result result = new StreamResult(file);
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);

Sample XML:

    <Data>
        <TimeStamp>2010-08-31 00:00:00.0</TimeStamp>
        <[Name that needs to be encoded]>0.0</[Name that needs to be encoded]>
        <Group_Average>1860.0</Group_Average>
    </Data>

gigadot · Accepted Answer

You can use the Apache Commons Lang library to escape strings.

    import org.apache.commons.lang.StringEscapeUtils;

    String escapedXml = StringEscapeUtils.escapeXml("the data might contain & or ! or % or ' or # etc");

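However, what you are actually looking for is a way to convert an arbitrary string into a valid XML tag name. For ASCII characters, an XML tag name must begin with one of _:a-zA-Z and may be followed by any number of characters from _:a-zA-Z0-9.-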
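The naming rule above lends itself to a small helper. The following is only a sketch, not code from this answer or from any library: `XmlNames`, `isValidAsciiName`, and `toXmlName` are hypothetical names, and it covers just the ASCII subset of the XML name rules (the XML specification also permits `.` and `-` after the first character). It deliberately replaces `:` as well, on the assumption that the name will be passed to a namespace-aware API such as `createElementNS`, where a colon is read as a prefix separator.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of any library) for ASCII-only XML tag names. */
final class XmlNames {
    // ASCII subset of the XML name rule: the first character is one of _ : a-z A-Z,
    // later characters may also be digits, '.' or '-'.
    private static final Pattern VALID = Pattern.compile("[_:A-Za-z][_:A-Za-z0-9.-]*");

    private XmlNames() {}

    /** Returns true if s already satisfies the ASCII naming rule. */
    static boolean isValidAsciiName(String s) {
        return s != null && VALID.matcher(s).matches();
    }

    /** Replaces every disallowed character with '_' so the result can be used as a tag name. */
    static String toXmlName(String raw) {
        if (raw == null || raw.isEmpty()) {
            return "_";
        }
        StringBuilder sb = new StringBuilder(raw.length() + 1);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // ':' is intentionally not kept (assumption: namespace-aware consumers).
            boolean keep = isAsciiLetter(c) || isAsciiDigit(c) || c == '_' || c == '.' || c == '-';
            sb.append(keep ? c : '_');
        }
        // A digit, '.' or '-' may not start a name, so prefix such results with '_'.
        char first = sb.charAt(0);
        if (!isAsciiLetter(first) && first != '_') {
            sb.insert(0, '_');
        }
        return sb.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(toXmlName("[Name that needs to be encoded]"));
    }
}
```

Running `main` prints `_Name_that_needs_to_be_encoded_`. Note that the mapping is lossy: two different headers such as `a&b` and `a#b` both become `a_b`, so the original text cannot be recovered from the tag name.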

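I don't think there is a library that does this for you, so you have to implement your own function that converts an arbitrary string to match this pattern, or alternatively turn the name into the value of an attribute.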

<property name="no more need to be encoded, it should be handled by XML library">0.0</property>
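The attribute variant can be sketched with the DOM and `Transformer` classes from the JDK. This is an illustration under assumptions rather than the asker's actual export code: the class `PropertyExport`, the method `toXml`, and the sample header are made up for the example. The point it shows is that a header stored as an attribute value, and a cell stored as a text node, are both escaped by the serializer with no manual work.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical example class: writes one header/value pair as <property name="...">. */
final class PropertyExport {
    private PropertyExport() {}

    static String toXml(String header, String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element data = doc.createElement("Data");
            doc.appendChild(data);

            // The tag name is fixed; the free-form header goes into an attribute value.
            Element property = doc.createElement("property");
            property.setAttribute("name", header);
            property.appendChild(doc.createTextNode(value));
            data.appendChild(property);

            // Serialize to a string; the serializer escapes attribute and text content.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("Sales & <Marketing> #1", "1 < 2"));
    }
}
```

With this layout the header survives unchanged when the file is parsed again, which a sanitized tag name cannot guarantee.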

Chintan Raghwani · Answer

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.StringReader;
    import java.net.URL;
    import java.net.URLConnection;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class RssParser {
        int length;
        URL url;
        URLConnection urlConn;
        NodeList nodeList;
        Document doc;
        Node node;
        Element firstEle;
        NodeList titleList;
        Element ele;
        NodeList txtEleList;
        String retVal, urlStrToParse, rootNodeName;

        public RssParser(String urlStrToParse, String rootNodeName) {
            this.urlStrToParse = urlStrToParse;
            this.rootNodeName = rootNodeName;

            url = null; urlConn = null; nodeList = null; doc = null; node = null;
            firstEle = null; titleList = null; ele = null; txtEleList = null; retVal = null;

            try {
                url = new URL(this.urlStrToParse); // the URL that will be parsed
                urlConn = url.openConnection();
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();

                String s = isToString(urlConn.getInputStream());
                // Naive escaping: this also rewrites the '&' of entities that are already escaped.
                s = s.replace("&", "&amp;");
                // Assumes the feed does not carry an XML declaration of its own.
                StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"utf-8\"?>");
                sb.append("\n" + s);
                System.out.println("STR: \n" + sb.toString());
                s = sb.toString();

                // Parse the escaped text; the connection's stream was already consumed by isToString().
                doc = db.parse(new InputSource(new StringReader(s)));
                // This is the first node, which contains the other inner element nodes.
                nodeList = doc.getElementsByTagName(this.rootNodeName);
                length = nodeList.getLength();
                firstEle = doc.getDocumentElement();
            } catch (ParserConfigurationException pce) {
                System.out.println("Could not Parse XML: " + pce.getMessage());
            } catch (SAXException se) {
                System.out.println("Could not Parse XML: " + se.getMessage());
            } catch (IOException ioe) {
                System.out.println("Invalid XML: " + ioe.getMessage());
            } catch (Exception e) {
                System.out.println("Error: " + e.toString());
            }
        }

        // Reads the whole stream into a String (uses the platform default charset).
        public String isToString(InputStream in) throws IOException {
            StringBuilder out = new StringBuilder();
            byte[] b = new byte[512];
            for (int i; (i = in.read(b)) != -1;) {
                out.append(new String(b, 0, i));
            }
            return out.toString();
        }

        // Returns the text of the i-th element named param, or null if it has no value.
        public String getVal(int i, String param) {
            node = nodeList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println("Param: " + param);
                titleList = firstEle.getElementsByTagName(param);
                if (firstEle.hasAttribute("id"))
                    System.out.println("hasAttrib----------------");
                else
                    System.out.println("Has NOTNOT NOT");
                System.out.println("titleList: " + titleList.toString());
                ele = (Element) titleList.item(i);
                System.out.println("ele: " + ele);
                txtEleList = ele.getChildNodes();
                retVal = txtEleList.item(0).getNodeValue();
                if (retVal == null)
                    return null;
                System.out.println("retVal: " + retVal);
            }
            return retVal;
        }
    }

Abhishek Jha · Answer

Use the code below to escape the characters in a string for XML. StringEscapeUtils is available in the Apache Commons Lang 3 jar.

StringEscapeUtils.escapeXml11("String to be escaped");
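For readers who cannot add the Commons Lang dependency, the core of what escaping text for XML involves can be sketched with the JDK alone. This is a hand-rolled illustration, not the library's implementation: the class `XmlText` and its `escape` method are hypothetical names, and the sketch only replaces the five characters that have predefined XML entities. It makes no attempt to handle control characters that XML does not allow.

```java
/** Hypothetical minimal escaper for the five predefined XML entities. */
final class XmlText {
    private XmlText() {}

    /** Returns s with &, <, >, " and ' replaced by their predefined entities. */
    static String escape(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("the data might contain & or ! or % or ' or # etc"));
    }
}
```

Running `main` prints `the data might contain &amp; or ! or % or &apos; or # etc`. Of the characters listed in the question, only & and ' are changed; !, % and # have no special meaning in XML text and pass through untouched.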