In Go 1.3.3, the encoding/xml parser only accepts UTF-8 input by default. In
encoding/xml/xml.go (around line 576), there are these lines:
enc := procInstEncoding(string(data))
if enc != "" && enc != "utf-8" && enc != "UTF-8" {
For documents with:
<?xml version="1.0" encoding="ISO-8859-1"?>
you get this error message:
Invalid body content: xml: encoding "ISO-8859-1" declared but Decoder.CharsetReader is nil
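For reference, here is a minimal sketch that should reproduce the decoder's part of that error. The Note type and the parseNote helper are hypothetical names used only for illustration, and the "Invalid body content:" prefix above presumably comes from the calling application rather than from encoding/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Note is a hypothetical payload type used only for this illustration.
type Note struct {
	Body string `xml:"body"`
}

// parseNote unmarshals doc with the default decoder settings,
// which means no CharsetReader is installed.
func parseNote(doc []byte) (Note, error) {
	var n Note
	err := xml.Unmarshal(doc, &n)
	return n, err
}

func main() {
	doc := []byte(`<?xml version="1.0" encoding="ISO-8859-1"?><note><body>hi</body></note>`)
	// The declared encoding is not UTF-8 and no CharsetReader is set,
	// so decoding is expected to fail before reaching <note>.
	if _, err := parseNote(doc); err != nil {
		fmt.Println("error:", err)
	}
}
```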
You can override the reader to support alternative encodings, but this means pre-parsing
the XML []byte yourself to find the encoding, setting up the reader, and then parsing the XML.
Could the package be adapted somehow so you could provide alternate readers ahead of
time, based on the encoding value? Something like this (pseudocode):
func init() {
    // AddCharsetReader is the proposed registration function; it does not exist in encoding/xml.
    xml.AddCharsetReader("iso-8859-1", ISO8859Reader)
}

func Parse(doc []byte) (SomeStruct, error) {
    var myobj SomeStruct
    if err := xml.Unmarshal(doc, &myobj); err != nil {
        return SomeStruct{}, err
    }
    return myobj, nil
}
Except that to do that, you have to know the encoding ahead of time. Our servers get
messages in either UTF-8 or ISO-8859-1. So we basically have to parse the incoming
stream for the encoding parameter, load the correct reader, and unmarshal. Feels clunky.
Look at the docs:
// CharsetReader, if non-nil, defines a function to generate
// charset-conversion readers, converting from the provided
// non-UTF-8 charset into UTF-8. If CharsetReader is nil or
// returns an error, parsing stops with an error. One of the
// CharsetReader's result values must be non-nil.
CharsetReader func(charset string, input io.Reader) (io.Reader, error)
Your hook gets passed in the charset. You don't need to parse it yourself.
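To make that concrete, here is a rough sketch of a hook that handles ISO-8859-1 using only the standard library. The names latin1Reader, charsetReader, Message, and parse are hypothetical, and the conversion relies on the fact that each ISO-8859-1 byte maps to the Unicode code point with the same value. It is one possible approach under those assumptions, not the only way to wire this up.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// latin1Reader converts ISO-8859-1 input to UTF-8 one byte at a time.
type latin1Reader struct {
	src     io.ByteReader
	pending []byte // UTF-8 bytes not yet handed to the caller
}

func (r *latin1Reader) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if len(r.pending) == 0 {
			b, err := r.src.ReadByte()
			if err != nil {
				if n > 0 {
					return n, nil // deliver what we have; the error resurfaces on the next call
				}
				return 0, err
			}
			var enc [utf8.UTFMax]byte
			size := utf8.EncodeRune(enc[:], rune(b))
			r.pending = append(r.pending[:0], enc[:size]...)
		}
		c := copy(p[n:], r.pending)
		n += c
		r.pending = r.pending[c:]
	}
	return n, nil
}

// charsetReader is the hook: the decoder passes in the declared charset,
// so the caller never has to pre-parse the document.
func charsetReader(charset string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(charset) {
	case "iso-8859-1", "latin1":
		return &latin1Reader{src: bufio.NewReader(input)}, nil
	}
	return nil, fmt.Errorf("unsupported charset: %q", charset)
}

// Message is a hypothetical payload type for this example.
type Message struct {
	Body string `xml:"body"`
}

// parse decodes doc; UTF-8 documents never trigger the hook.
func parse(doc []byte) (Message, error) {
	var m Message
	dec := xml.NewDecoder(bytes.NewReader(doc))
	dec.CharsetReader = charsetReader
	err := dec.Decode(&m)
	return m, err
}

func main() {
	// \xe9 is "é" encoded as a single ISO-8859-1 byte.
	doc := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><msg><body>caf\xe9</body></msg>")
	m, err := parse(doc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Body)
}
```

The trade-off is that you use a Decoder instead of calling xml.Unmarshal directly, but the same parse function then covers both the UTF-8 and the ISO-8859-1 messages.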
by pico303: