internal/syscall/windows: GetACP returns wrong codepage #16857
Comments
When reading from the console, Go decodes the bytes from the stream to UTF-8.
What value does GetConsoleCP() return in your environment?
@mattn I will let you try and fix this, since I have nothing but English around me. Alex
CL https://golang.org/cl/27575 mentions this issue.
@tkausl you always change console codepage with
No, I don't. I've googled a bit; there is not much information about default charsets for Windows, but according to this site, Windows uses code page 1252 by default for Western European countries, while the console usually uses 850 for the same countries. Not sure why, but these seem to be the defaults.
@tkausl can you please try this change https://golang.org/cl/27575 to see if it fixes your problem? (Let us know if you need more instructions.) Thank you. Alex
This does change the input I get but it's still not the correct one. With this change I get the sequence
Seems like they are equivalent, so this change gets us one step further at least.
Oh, you actually chose to convert them to decomposed characters (MB_COMPOSITE). I'm not sure what the advantages are, but it would be great if Stdout could actually re-encode them correctly.
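For readers unfamiliar with the distinction: a precomposed ä is the single code point U+00E4, while the decomposed form is a plain a followed by the combining diaeresis U+0308. The following is a minimal sketch in Go showing that the two forms produce different UTF-8 byte sequences even though they render identically. The helper name hexOfString is made up for illustration, the sketch does not call the Windows API, and it is not the code from either CL.

```go
package main

import "fmt"

// hexOfString returns the UTF-8 bytes of s as space-separated,
// upper-case hex pairs. (Hypothetical helper for this illustration.)
func hexOfString(s string) string {
	return fmt.Sprintf("% X", []byte(s))
}

func main() {
	precomposed := "\u00E4" // ä as one code point
	decomposed := "a\u0308" // a followed by a combining diaeresis

	fmt.Println(hexOfString(precomposed)) // C3 A4
	fmt.Println(hexOfString(decomposed))  // 61 CC 88

	// The strings are canonically equivalent but not byte-equal.
	fmt.Println(precomposed == decomposed) // false
}
```

A program that compares input byte-for-byte against an expected C3 A4 would therefore reject the decomposed form unless it normalizes first, for example with the golang.org/x/text/unicode/norm package.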
@tkausl I know nothing about Unicode, but please try https://go-review.googlesource.com/#/c/27576/ to see if that fixes your problem. Thank you. Alex
@tkausl does it work if you change MB_COMPOSITE to MB_PRECOMPOSED?
Please note that merging CL 27576 will reopen #6303.
@alexbrainman Yes, with this change it works.
@tkausl thanks for confirming. MB_PRECOMPOSED will fix the issue. I work on vim-dev, and vim.exe is implemented using MB_PRECOMPOSED, not MB_COMPOSITE. I haven't heard of any issues with MB_PRECOMPOSED. Thanks.
I understand. But I think correct input character decoding is more important than ctrl+Z handling.
Thanks for checking. I will leave my https://go-review.googlesource.com/27576 as an alternative to @mattn's change. Given that I was against submitting CL 4310 in the first place (and I haven't changed my mind), I don't see how I can make the correct call here, so I am leaving it for someone else to decide which CL to go with. Plus, I know very little about Unicode. Alex
CL https://golang.org/cl/27576 mentions this issue.
@alexbrainman Replacing MB_COMPOSITE with MB_PRECOMPOSED and using GetConsoleCP() will fix this issue. I don't understand why you want to revert. If you know little about Unicode, let's add another reviewer. :)
That is what I suggested myself (#16857 (comment)). Alex
@hirochachacha could you please take a look?
@mattn Sure, I'll try, but don't expect too much.
What version of Go are you using (go version)?
1.7
What operating system and processor architecture are you using (go env)?
Windows, amd64
go env:
What did you do?
I wrote a program which just reads bytes from os.Stdin and prints their values as hex to Stdout. I started the program, typed an ä into the console and pressed Enter.
What did you expect to see?
I expected to see the correct UTF-8 sequence for an ä, which is C3 A4.
What did you see instead?
I see the hex sequence E2 80 9E.
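The reporter's program is not included in the issue. A minimal sketch of a reproducer of the kind described might look like the following, where the helper name hexOfBytes and the choice to read a single line are assumptions:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
)

// hexOfBytes formats b as space-separated, upper-case hex pairs.
// (Hypothetical helper for this illustration.)
func hexOfBytes(b []byte) string {
	return fmt.Sprintf("% X", b)
}

func main() {
	// Read one line, i.e. whatever was typed before pressing Enter.
	line, err := bufio.NewReader(os.Stdin).ReadBytes('\n')
	if err != nil && err != io.EOF {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	// Drop the line terminator so only the typed characters are shown.
	line = bytes.TrimRight(line, "\r\n")
	fmt.Println(hexOfBytes(line))
}
```

Run in a Windows console and given an ä, a correct decode prints C3 A4; the report above describes getting E2 80 9E instead.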
I found that GetACP() in internal/syscall/windows returns 1252 even though I can verify that the ä is encoded in CP850. Because of this wrongly returned codepage, Stdin.readConsole tries to decode the character from CP1252 to UTF-8 instead of from CP850 to UTF-8. As you can see in my post on Stack Overflow, when I read from stdin through cgo I get the byte 0x84, which is the value for an ä in CP850; it would have been 0xE4 if it were in CP1252. Decoding 0x84 with golang.org/x/text/encoding/charmap.CodePage850 yields the correct UTF-8 character. chcp tells me that the active codepage is 850.
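The mismatch can be reproduced without a Windows console. The sketch below uses two tiny hand-written lookup tables (assumptions for illustration that cover only the bytes discussed here, not full code pages) to show why the same byte 0x84 becomes C3 A4 under CP850 but E2 80 9E under CP1252. A real program would more likely use golang.org/x/text/encoding/charmap, as mentioned above.

```go
package main

import "fmt"

// Partial, hand-written tables for illustration only. Each maps a single
// byte to the Unicode code point it represents in that code page.
var (
	cp850Sample  = map[byte]rune{0x84: '\u00E4'}                 // ä
	cp1252Sample = map[byte]rune{0x84: '\u201E', 0xE4: '\u00E4'} // „ and ä
)

// decodeToUTF8Hex looks b up in table and returns the UTF-8 encoding of
// the resulting code point as hex, or "" if the sample table has no entry.
func decodeToUTF8Hex(b byte, table map[byte]rune) string {
	r, ok := table[b]
	if !ok {
		return ""
	}
	return fmt.Sprintf("% X", []byte(string(r)))
}

func main() {
	fmt.Println(decodeToUTF8Hex(0x84, cp850Sample))  // C3 A4
	fmt.Println(decodeToUTF8Hex(0x84, cp1252Sample)) // E2 80 9E
	fmt.Println(decodeToUTF8Hex(0xE4, cp1252Sample)) // C3 A4
}
```

The second line matches the sequence reported in this issue: the console delivers the CP850 byte 0x84, but it is interpreted with the CP1252 table, where 0x84 is the low double quotation mark U+201E.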