Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 11.0, 12.5
Description
I'm trying to understand which charset is used when typing into the NetBeans console. I expected it to obey my current locale (which is UTF-8 on my system), but it obviously does not. I then set out to work out which charset it actually is, and the answer seems to be: none. Here are my findings:
Using NetBeans 12.5 + Maven and the following code:
final InputStream IN = System.in;
do {
    System.out.println("Byte: " + IN.read());
} while (IN.available() > 0);
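For anyone who wants to reproduce this, the snippet above can be wrapped in a complete class along the following lines. This is only a sketch: the class name `StdinByteDump` and the helper `readAvailable` are made up for illustration, and printing `Charset.defaultCharset()` is an addition that merely shows what the JVM believes the default charset is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class StdinByteDump {
    // Same loop as in the report: read one byte, then keep reading
    // while the stream says more bytes are available without blocking.
    // Values are 0..255, or -1 if the stream is already at end-of-stream.
    static List<Integer> readAvailable(InputStream in) throws IOException {
        List<Integer> bytes = new ArrayList<>();
        do {
            bytes.add(in.read());
        } while (in.available() > 0);
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        // Not part of the original snippet: shows the JVM default charset.
        System.out.println("Default charset: " + Charset.defaultCharset());
        for (int b : readAvailable(System.in)) {
            System.out.println("Byte: " + b);
        }
    }
}
```

Keeping the loop in a method that takes any `InputStream` makes it possible to feed it known bytes (for example from a `ByteArrayInputStream`) and compare them with what the IDE console delivers.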
When I enter "€" in the NetBeans console, I get the following unexpected output ("10" is just the newline character):
Byte: 172
Byte: 10
It is unexpected because € is not encoded as the single byte 172 (0xac) in either UTF-8 or UTF-16. In UTF-8 it is three bytes (0xe2 0x82 0xac), and in UTF-16 it is two (0x20 0xac).
Similarly, entering 𐐷 (DESERET SMALL LETTER YEE), I get something unexpected:
Byte: 1
Byte: 55
Byte: 10
In other words, these are 0x01 0x37. In UTF-8 it should be 0xf0 0x90 0x90 0xb7; in UTF-16 it should be 0xd8 0x01 0xdc 0x37.
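The expected byte sequences quoted above can be checked with a few lines of Java. This is a small verification sketch; the class name `ExpectedBytes` and the helper `hex` are made up for illustration, and UTF-16 is taken in big-endian order without a byte order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExpectedBytes {
    // Encodes s with the given charset and formats the result
    // as space-separated hex bytes, e.g. "0xe2 0x82 0xac".
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20ac";       // € (U+20AC)
        String yee = "\ud801\udc37";  // 𐐷 (U+10437), a surrogate pair in Java strings

        System.out.println("€ UTF-8:    " + hex(euro, StandardCharsets.UTF_8));
        System.out.println("€ UTF-16BE: " + hex(euro, StandardCharsets.UTF_16BE));
        System.out.println("𐐷 UTF-8:    " + hex(yee, StandardCharsets.UTF_8));
        System.out.println("𐐷 UTF-16BE: " + hex(yee, StandardCharsets.UTF_16BE));
    }
}
```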
I'm on Linux and my locale (as reported by the locale command) is UTF-8, but these results look like the encoding is "half UTF-16": it is like UTF-16, except that the high byte of every 16-bit code unit is missing and only the low byte arrives.
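To illustrate the "half UTF-16" pattern, the sketch below keeps only the low 8 bits of each Java `char` (one UTF-16 code unit) and reproduces exactly the byte values observed above. This is only a guess at the kind of truncation that would yield these values, not a confirmed description of what the console does internally; the class name `LowByteTruncation` and the method `lowBytes` are hypothetical.

```java
import java.util.Arrays;

public class LowByteTruncation {
    // Keeps only the low 8 bits of every UTF-16 code unit in s,
    // discarding the high byte (an assumed failure mode, for illustration).
    static int[] lowBytes(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(i) & 0xff;
        }
        return out;
    }

    public static void main(String[] args) {
        // € is U+20AC: one code unit 0x20AC, low byte 0xAC = 172
        System.out.println(Arrays.toString(lowBytes("\u20ac")));
        // 𐐷 is U+10437: code units 0xD801 0xDC37, low bytes 0x01 and 0x37 = 1 and 55
        System.out.println(Arrays.toString(lowBytes("\ud801\udc37")));
    }
}
```

The results, 172 for € and 1, 55 for 𐐷, match the values printed in the Maven runs (ignoring the trailing newline byte 10).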
If I run the same code within an Ant project or a Gradle project, it works fine for the symbol €, and the bytes reported are consistent with my UTF-8 locale.
But if I enter 𐐷 (DESERET SMALL LETTER YEE), then:
- with the Gradle project, it outputs "Byte: -1" (no other bytes are reported), meaning read() signalled end-of-stream, and
- with the Ant project, it outputs nothing and the program does not seem to stop; I have to abort the Run manually.