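The XML file looks roughly like the one below. I parse it with DOM, but when an element contains Chinese text, the full string is not returned; only the first Chinese character comes back, e.g. "蒙":
XML file 1: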
[code="java"]
Source Milk Title
2011-05-30T12:47:58Z
1
Milk
Milk Title
2011-08-14T12:23:16Z
蒙牛的好喝酸奶
2011-06-06T12:52:21Z
2
蒙牛酸奶
蒙牛的好喝酸奶
2011-06-06T12:52:21Z
[/code]
The relevant code is as follows:
Code 1:
[code="java"] DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(xmlString));
Document doc = db.parse(is);
NodeList nodes = doc.getElementsByTagName("post");
eventsArrayList = new ArrayList<myEvents>(); //Gertig
//Iterate the events
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
eventsArrayList.add(new myEvents());
NodeList eventIDNum = element.getElementsByTagName("id");
Element line = (Element) eventIDNum.item(0);
eventsArrayList.get(i).eventID = Integer.parseInt(getCharacterDataFromElement(line));
NodeList eventName = element.getElementsByTagName("name");
line = (Element) eventName.item(0);
eventsArrayList.get(i).name = getCharacterDataFromElement(line).trim();
// String reName = getCharacterDataFromElement(line);
// String reTrimName = getCharacterDataFromElement(line).trim();
// NodeList eventBudget = element.getElementsByTagName("content");
// line = (Element) eventBudget.item(0);
// eventsArrayList.get(i).budget = Double.parseDouble(getCharacterDataFromElement(line));
NodeList eventContent = element.getElementsByTagName("content");
line = (Element) eventContent.item(0);
eventsArrayList.get(i).content = getCharacterDataFromElement(line).trim();
}
[/code]
Code 2:
[code="java"]public static String getCharacterDataFromElement(Element e) {
Node child = e.getFirstChild();
Node lchild = e.getLastChild();
if (child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
CharacterData lcd = (CharacterData) lchild;
String cdStr = cd.getNodeValue();
String lcdStr = lcd.getNodeValue();
return cd.getData();
}
return "?"; //ListActivity will display a ? if a null value is passed to the Rails server
}[/code]
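Stepping through Code 2, I found that the Chinese string is not treated as one complete node here but as several nodes. For example, when parsing 蒙牛的好喝酸奶, getFirstChild() returns the first character "蒙" and getLastChild() returns the last character "奶".
Where does the problem come from? Any help would be appreciated.
Also, while debugging I found that the string passed in is not the original XML file 1 but something like the following, in which the Chinese characters appear in an encoded form. This may be related, but I don't understand why.
XML file 2: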
<?xml version="1.0" encoding="UTF-8"?>
Source Milk Title
2011-05-30T12:47:58Z
1
Milk
Milk Title
2011-08-14T12:23:16Z
蒙牛的好喝酸奶
2011-06-06T12:52:21Z
2
蒙牛酸奶
蒙牛的好喝酸奶
2011-06-06T12:52:21Z
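One possible explanation, offered as a sketch rather than a confirmed diagnosis: DOM does not guarantee that an element's text is stored in a single Text node, so reading only getFirstChild() can return just a fragment. Assuming that is what happens here, a helper that concatenates all text children returns the whole string. The class name ElementText and the helpers getAllCharacterData and splitNameElement are hypothetical; splitNameElement only builds a name element with two adjacent text nodes by hand, to imitate the split seen in the debugger.

[code="java"]
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ElementText {

    /** Concatenates every text and CDATA child of e instead of reading only the first one. */
    public static String getAllCharacterData(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            // Comments are also CharacterData, so check the node type explicitly.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    /** Hypothetical test fixture: a name element whose text is split across two Text nodes. */
    public static Element splitNameElement() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element name = doc.createElement("name");
            name.appendChild(doc.createTextNode("蒙"));
            name.appendChild(doc.createTextNode("牛酸奶"));
            return name;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Element name = splitNameElement();
        // Reading only the first child yields the first fragment.
        System.out.println(name.getFirstChild().getNodeValue());
        // Concatenating all text children yields the full string.
        System.out.println(getAllCharacterData(name));
    }
}
[/code]

Where the parser supports them, the standard DOM methods getTextContent() on the element, or normalize() called on the element before reading its first child, are alternatives worth trying.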