Replies: 3 - Last Post: Jul 30, 2007 3:52 PM by: justinlindh
Any known issues with 4-byte utf-8 characters and JAX-WS?
Posted: Jul 19, 2007 2:26 PM
I have a webservice hosted in JBoss and recently upgraded to JAX-WS from JAX-RPC.
Everything's working well, except for a bug that wasn't there under JAX-RPC: when the SOAP message body contains a character that takes 4 bytes in UTF-8 (e.g. some Japanese characters), the server returns a "Bad request" fault somewhere early in the stack.
Is there any known issue with 4-byte UTF-8 characters and JAX-WS? The byte sequence of the character I'm using to test is F0 A6 9F 8C.
The character (assuming it renders correctly here) is 𦟌. It occurs in the text content of an element in the SOAP body (not in an attribute value or identifier).
TIA, Chris
Re: Any known issues with 4-byte utf-8 characters and JAX-WS?
Posted: Jul 20, 2007 12:07 PM
in response to: chriscorbell
Most UTF-8 encoded Japanese characters will encode in 3 bytes. For example, the character 漢 (KAN) encodes as the three UTF-8 code units E6 BC A2. If you have Japanese characters that encode as four UTF-8 code units, you must be using characters above the Basic Multilingual Plane (supplementary characters). Maybe you are encoding the characters incorrectly? Are you really using supplementary characters?
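To see the 3-byte versus 4-byte difference concretely, here is a minimal Java sketch. The class name `Utf8Length` and the helper `utf8Hex` are hypothetical names made up for illustration; the only library calls used are `Character.toChars` and `String.getBytes(Charset)`. It assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // Hypothetical helper: returns the UTF-8 bytes of one code point
    // as space-separated uppercase hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A character inside the Basic Multilingual Plane: three bytes.
        System.out.println(utf8Hex(0x6F22));   // E6 BC A2
        // A supplementary character: four bytes.
        System.out.println(utf8Hex(0x267CC));  // F0 A6 9F 8C
    }
}
```

The second value matches the byte sequence given in the original question, which suggests the test character really is a supplementary character rather than a mis-encoded one.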
Regards, John O'Conner
Re: Any known issues with 4-byte utf-8 characters and JAX-WS?
Posted: Jul 20, 2007 7:58 PM
in response to: chriscorbell
Converting your UTF-8 to a Unicode code point value, I get U+267CC, definitely in the supplementary area. It is a completely valid Unicode character, supported nicely in Java SE 5 and higher. What version of Java are you using? Maybe you are using an older version of the Java platform, one that doesn't quite grok the character. Can you try a slightly less ambitious character, perhaps one up to U+FFFF...let's see how your app works then, and then we'll re-evaluate the problem.
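For anyone who wants to repeat that conversion, a rough sketch in Java follows. The class name `SupplementaryCheck` and the method `firstCodePoint` are hypothetical; it assumes Java 7 or later for `StandardCharsets`. It decodes the four bytes from the question and shows that the result is one code point stored as two `char` values (a surrogate pair), which is the case that code written only for 16-bit characters tends to mishandle.

```java
import java.nio.charset.StandardCharsets;

class SupplementaryCheck {
    // Hypothetical helper: decodes UTF-8 bytes and returns the first code point.
    static int firstCodePoint(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // The byte sequence quoted in the original question.
        byte[] bytes = {(byte) 0xF0, (byte) 0xA6, (byte) 0x9F, (byte) 0x8C};
        String s = new String(bytes, StandardCharsets.UTF_8);
        int cp = firstCodePoint(bytes);

        System.out.printf("U+%X%n", cp);                             // U+267CC
        System.out.println(Character.isSupplementaryCodePoint(cp));  // true
        System.out.println(s.length());                              // 2 (char units)
        System.out.println(s.codePointCount(0, s.length()));         // 1 (code point)
    }
}
```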
Regards, John O'Conner http://joconner.com
Re: Any known issues with 4-byte utf-8 characters and JAX-WS?
Posted: Jul 30, 2007 3:52 PM
in response to: joconner
I'm also having some problems that sound similar to this.
I'm submitting data from a web page, and when I use Japanese characters I'm seeing the following received in the debugger: \u0006F22\u0005B57 (for: 漢字)
This is UTF-16, but I'm sending this data to a dotnet application that is expecting UTF-8. How can I do this conversion? I've tried: String utf8Body = new String(request.getBody().getBytes(), "UTF-8");
But this only serves to mangle the String once received. I'm new to internationalization issues, so any help is greatly appreciated.
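One possible direction, offered as a sketch rather than a confirmed fix: a Java String is always held as UTF-16 internally, so `new String(bytes, "UTF-8")` only decodes bytes into another String and never produces UTF-8 output. The no-argument `getBytes()` also uses the platform default charset, which may not be able to represent Japanese text. UTF-8 exists only as bytes, so the conversion happens when the String is encoded for sending. The class `Utf8Convert`, the method `toUtf8`, and the variable `body` below are hypothetical stand-ins for whatever `request.getBody()` returns in the real application; the sketch assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

class Utf8Convert {
    // Hypothetical helper: encodes the String's characters as UTF-8 bytes,
    // ready to be written to the outgoing stream.
    static byte[] toUtf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the text received from the web page.
        String body = "\u6F22\u5B57";

        byte[] wire = toUtf8(body);
        System.out.println(wire.length);  // 6 (three bytes per character)

        // Decoding with the same charset restores the original text.
        String roundTrip = new String(wire, StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(body));  // true
    }
}
```

The bytes in `wire` are what the receiving application would need to see. If the data is sent through a Writer or an HTTP client instead of raw bytes, the charset would have to be set on that layer rather than on the String itself.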