Thread: Entity reference Conversion to Special Character

Replies: 2 - Last Post: Oct 24, 2008 10:52 PM by: prunge
ipodee

Posts: 1
Entity reference Conversion to Special Character
Posted: Oct 23, 2008 8:40 AM
 
Does anyone know how to disable the automatic conversion of entity references by SAXParser?

For example, when I feed a source XML file containing an entity reference like "&lt;" to SAXParser.parse(...), it is converted to the < character in the target XML.

I don't want this to happen. How can I prevent it?

Thanks,

Kevin

Message was edited by: ipodee

joehw

Posts: 122
Re: Entity reference Conversion to Special Character
Posted: Oct 23, 2008 3:11 PM   in response to: ipodee
 
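It isn't possible to turn that off, since entities like &lt; are built into XML and the parser always expands them. You can, however, escape them by putting the text in a CDATA section, where &lt; is passed through as literal text.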

Joe
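
To illustrate the CDATA suggestion, here is a minimal sketch using the JDK's default SAX parser. The class name CdataDemo and the helper textOf() are hypothetical names made up for this example. It parses the same text once as ordinary character data and once inside a CDATA section, and collects whatever the parser reports through characters().

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compares plain character data with a CDATA section
public class CdataDemo {

    // Parses the XML and returns all text reported via characters()
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // characters() may be called several times, so accumulate
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Outside CDATA the built-in entity is expanded
        System.out.println(textOf("<root>a &lt; b</root>"));                 // prints: a < b
        // Inside CDATA the same characters are treated as literal text
        System.out.println(textOf("<root><![CDATA[a &lt; b]]></root>"));     // prints: a &lt; b
    }
}
```

Note that this only helps if you control the source XML, since the CDATA markers have to be present in the input document.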

prunge

Posts: 101
Re: Entity reference Conversion to Special Character
Posted: Oct 24, 2008 4:43 PM   in response to: ipodee
 
Hi Kevin,

If you are using Xerces, you can turn on the

http://apache.org/xml/features/scanner/notify-builtin-refs

feature (see http://xerces.apache.org/xerces2-j/features.html#scanner.notify-builtin-refs) to be notified of these entities. You will still receive the parsed characters in the characters() method, but because they are surrounded by startEntity() and endEntity() events, you can write additional logic to handle them:

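import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Wrapper class added so the snippet compiles on its own; the name is arbitrary
public class BuiltinRefsDemo
{
	public static void main(String[] args) throws Exception
	{
		SAXParserFactory spf = SAXParserFactory.newInstance();
		spf.setNamespaceAware(true);
		// Xerces feature: report built-in references (&lt; etc.) as entity events
		spf.setFeature("http://apache.org/xml/features/scanner/notify-builtin-refs", true);

		SAXParser parser = spf.newSAXParser();
		XmlHandler handler = new XmlHandler();

		// Register the handler as the LexicalHandler too; otherwise it is treated as a
		// standard DefaultHandler and startEntity()/endEntity() are never called
		parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);

		String xml = "<root>This is a &lt;test&gt;</root>";
		StringReader reader = new StringReader(xml);

		parser.parse(new InputSource(reader), handler);
	}

	private static class XmlHandler extends DefaultHandler2
	{
		@Override
		public void characters(char[] ch, int start, int length)
		throws SAXException
		{
			System.out.println("Characters: " + new String(ch, start, length));
		}

		@Override
		public void endEntity(String name) throws SAXException
		{
			System.out.println("end entity: " + name);
		}

		@Override
		public void startEntity(String name) throws SAXException
		{
			System.out.println("start entity: " + name);
		}
	}
}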


And the output is:

Characters: This is a
start entity: lt
Characters: <
end entity: lt
Characters: test
start entity: gt
Characters: >
end entity: gt

Hope this helps,

Peter

(fixed message to have properly escaped entities)

Message was edited by: prunge



