
encoding/xml: very low performance in xml parser #21823

Open
243083df opened this issue Sep 9, 2017 · 27 comments
Labels
NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Performance
Milestone

Comments

@243083df

243083df commented Sep 9, 2017

What version of Go are you using (go version)?

1.9

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Windows

What did you do?

I am trying to parse large files with a SAX-style approach in Go and get very poor performance.
I rewrote the code in C# and got much better performance.

file, err := os.Open(filename)
handle(err)
defer file.Close()
buffer := bufio.NewReaderSize(file, 1024*1024*256) // 256 MiB read buffer
decoder := xml.NewDecoder(buffer)
for {
	t, _ := decoder.Token()
	if t == nil {
		break
	}
	switch se := t.(type) {
	case xml.StartElement:
		if se.Name.Local == "House" {
			house := House{}
			err := decoder.DecodeElement(&house, &se)
			handle(err)
		}
	}
}
using (XmlReader reader = XmlReader.Create(filename))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                if (reader.Name == "House")
                {
                    //Code
                }
                break;
        }
    }
}

What did you expect to see?

A mature and fast XML parser in Go.

What did you see instead?

The bottleneck when SAX-parsing XML in Go is the CPU, not HDD I/O.

@ianlancetaylor
Contributor

Can you show us a complete, standalone example program? And provide your input file, or at least tell us about it?

What are the actual performance numbers?

Thanks.

@ianlancetaylor ianlancetaylor changed the title Very low performance in xml parser encoding/xml: very low performance in xml parser Sep 9, 2017
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Sep 9, 2017
@ianlancetaylor ianlancetaylor added the NeedsInvestigation and Performance labels Sep 9, 2017
@243083df
Author

243083df commented Sep 9, 2017

The file is very big, around 10-50 million records.
The XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<LIST>
<ELEMENT ATTRIBUTE1="" ATTRIBUTE2="" /><ELEMENT ATTRIBUTE1="" ATTRIBUTE2="" />
</LIST>

@a-h
Contributor

a-h commented Sep 10, 2017

I was curious (I use .NET and Go regularly), so I turned this into a reproduction at https://github.com/a-h/sax

On my MacOS machine, I found that the Go version was much slower (around 20 seconds) than the .NET Core 2.0 one (around 3 seconds) for a file with 10 million elements in it laid out as per the example above.

I put the timing output in the README.md. I noted that the CPU was at 100% during Go execution.

I added an SVG of the Go CPU profile output to the repo.

[screenshot: Go CPU profile graph]

@243083df
Author

I think @a-h's benchmark is enough to investigate. I used the same code.

@gopherbot

Change https://golang.org/cl/63390 mentions this issue: unicode: speed-up is16/is32

gopherbot pushed a commit that referenced this issue Sep 12, 2017
Avoid division in the common case. There are 5438 ranges in unicode/tables.go;
4110 of them have stride 1.
The stride-1 case got significantly faster; other strides are a bit slower.
Measured by

import (
	"testing"
	"unicode"
)

func BenchmarkDiv1(b *testing.B) {
	rtb := &unicode.RangeTable{
		R16: []unicode.Range16{
			{0xa800, 0xdfff, 1}, // or 3
		},
	}
	for i := 0; i < b.N; i++ {
		unicode.Is(rtb, rune(0xc700))
	}
}

Div1-6  15.6ns ± 1%   9.9ns ± 1%  -36.54%  (p=0.000 n=10+10)
Div3-6  15.5ns ± 1%  16.1ns ± 1%   +3.67%  (p=0.000 n=10+10)

Helps a bit with xml parsing from issue #21823

XMLsax-6   30.9s ± 0%   29.6s ± 0%  -4.15%  (p=0.000 n=10+9)

Change-Id: Ibac1a91d7b9474d0c134b0add83e56caa62daa20
Reviewed-on: https://go-review.googlesource.com/63390
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
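For context, the optimization in that CL is essentially a fast path for stride-1 ranges: membership then only needs a bounds check, and the per-range modulo can be skipped. A minimal sketch of the idea (not the actual unicode package code; it uses a linear scan where the real implementation binary-searches):

```
package main

import (
	"fmt"
	"unicode"
)

// in16 sketches a membership test over sorted Range16 entries.
// When Stride == 1 the range is a dense block, so passing the bounds
// check is enough and the division/modulo is avoided entirely.
func in16(ranges []unicode.Range16, r uint16) bool {
	for _, rng := range ranges { // linear scan for brevity
		if r < rng.Lo {
			return false
		}
		if r <= rng.Hi {
			if rng.Stride == 1 {
				return true // fast path: no division needed
			}
			return (r-rng.Lo)%rng.Stride == 0
		}
	}
	return false
}

func main() {
	dense := []unicode.Range16{{Lo: 0xa800, Hi: 0xdfff, Stride: 1}}
	sparse := []unicode.Range16{{Lo: 0xa800, Hi: 0xdfff, Stride: 3}}
	fmt.Println(in16(dense, 0xc700), in16(sparse, 0xc700)) // true false
}
```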
@243083df
Author

By the way, is the call to len(p) in https://github.com/golang/go/blob/master/src/bufio/bufio.go#L196 necessary?

@mattn
Member

mattn commented Sep 13, 2017

xml.NewDecoder already creates a bufio.Reader.

@243083df
Author

243083df commented Sep 14, 2017

@mattn Yes, but it creates the bufio.Reader without a minimum read size.
A bufio.Reader with a read buffer of around 64 MB speeds up parsing a 2 GB XML file from 2m40s to 2m5s (around 20%, though not consistently) on an HDD.

@robfordww

robfordww commented Nov 4, 2017

A lot of speed could be gained if the parser referred to the XML byte slice instead of copying everything. I am working on such a parser for Go, based on the rapidxml library in C++.

@saleem-mirza

I am experiencing the same issue. Parsing 1 GB of XML takes minutes, while the C# version completes in a few seconds (< 10).

@suntong

suntong commented Dec 31, 2017

So the Go SAX parser is about 6-7 times slower than the C# one, i.e. there is much room to improve. Watching the development on this...

Can anyone confirm whether unmarshalling into defined data structures, if possible, can speed things up? I have always used the SAX-parser approach; maybe it is time to go with the DOM-parser approach, as my files are not terribly big. I went with the SAX approach only because, intuitively, it should be faster than the DOM approach.
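For reference, a whole-document ("DOM-style") unmarshal for a file shaped like the sample above would look roughly like the sketch below. The struct and file names are made up for illustration, and since xml.Unmarshal is built on the same Decoder it is unlikely to be dramatically faster; it mainly trades memory (the whole tree is held at once) for convenience:

```
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

// Hypothetical types matching the <LIST><ELEMENT .../></LIST> sample above.
type Element struct {
	Attribute1 string `xml:"ATTRIBUTE1,attr"`
	Attribute2 string `xml:"ATTRIBUTE2,attr"`
}

type List struct {
	Elements []Element `xml:"ELEMENT"`
}

func main() {
	f, err := os.Open("data.xml") // hypothetical file name
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var list List
	// Decode reads and unmarshals the whole document in one call.
	if err := xml.NewDecoder(f).Decode(&list); err != nil {
		panic(err)
	}
	fmt.Println("elements:", len(list.Elements))
}
```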

@a-h
Contributor

a-h commented Jan 3, 2018

Just checked the performance of various versions using Docker containers to see if 1.10beta1 is likely to improve performance (I thought I saw a few perf improvements to Unicode handling in the changelog).

docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.10beta1 /bin/bash
docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.9.2 /bin/bash
docker run -it --rm -v `pwd`:/go/src/github.com/a-h/sax golang:1.8 /bin/bash

Results

1.10: 0m21.383s
1.9.2: 0m21.281s
1.8: 0m27.370s

So, looks like no improvement this version.

@suntong

suntong commented Jan 3, 2018

Well done!

Has anyone done a profile to see exactly where the pain point is?
Maybe that would help push things along a bit?

@a-h
Contributor

a-h commented Jan 4, 2018

@suntong - see https://github.com/a-h/sax/ and in particular the profile output at https://github.com/a-h/sax/blob/master/pprof001.svg

@a-h
Contributor

a-h commented Mar 25, 2018

It's possible to reduce the number of calls to UTF8 DecodeRune etc. by keeping a cache of the names the decoder has already seen, on the basis that it's very likely those names will be seen again. The cache could be limited to a reasonable size, e.g. 10,000 XML names, to stop it using too much RAM.

Over 1,000,000 XML elements, I saw an improvement from 1.632s to 1.431s.

$ go test -bench=Decoder -cpuprofile profile_cpu.out
processed authors:  1000000
goos: darwin
goarch: amd64
pkg: encoding/xml
BenchmarkDecoder-4             1        1437269859 ns/op        518663400 B/op   8000059 allocs/op
PASS
ok      encoding/xml    1.632s
$ go test -bench=Decoder -cpuprofile profile_cpu.out
processed authors:  1000000
goos: darwin
goarch: amd64
pkg: encoding/xml
BenchmarkDecoder-4             1        1211802156 ns/op        518656056 B/op   8000053 allocs/op
PASS
ok      encoding/xml    1.431s
func (d *Decoder) isName(s []byte) bool {
	// Check the cache first.
	if _, ok := d.names[string(s)]; ok {
		return true
	}
	// If it's not in the cache, add it if it's valid.
	v := isName(s)
	if v {
		d.names[string(s)] = true
	}
	return v
}

// Get name: /first(first|second)*/
// Do not set d.err if the name is missing (unless unexpected EOF is received):
// let the caller provide better context.
func (d *Decoder) name() (s string, ok bool) {
	d.buf.Reset()
	if !d.readName() {
		return "", false
	}

	// Now we check the characters.
	b := d.buf.Bytes()
	if !d.isName(b) {
		d.err = d.syntaxError("invalid XML name: " + string(b))
		return "", false
	}
	return string(b), true
}
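The size limit mentioned above isn't shown in the snippet; it could be a one-line guard before inserting into the cache. A sketch, with maxCachedNames as a made-up constant:

```
const maxCachedNames = 10000 // hypothetical cap to bound memory use

func (d *Decoder) isName(s []byte) bool {
	if _, ok := d.names[string(s)]; ok {
		return true
	}
	v := isName(s)
	// Only grow the cache while it is below the cap.
	if v && len(d.names) < maxCachedNames {
		d.names[string(s)] = true
	}
	return v
}
```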

@ianlancetaylor - is it worth me pursuing that as a change?

@a-h
Contributor

a-h commented Mar 25, 2018

My benchmark is:

func BenchmarkDecoder(b *testing.B) {
	b.ReportAllocs()
	count := 1000000
	buffer := strings.NewReader(`<authors>` + authors(count) + `</authors>`)
	for i := 0; i < b.N; i++ {
		buffer.Seek(0, io.SeekStart)
		decoder := NewDecoder(buffer)
		var authors = 0
		for {
			tok, _ := decoder.Token()
			if tok == nil {
				break
			}
			switch se := tok.(type) {
			case StartElement:
				if se.Name.Local == "author" {
					authors++
				}
			}
		}
		if authors != count {
			panic("failed to report correct number of authors")
		}
		fmt.Println("processed authors: ", authors)
	}
}

func authors(count int) string {
	buf := bytes.NewBufferString("")
	for i := 0; i < count; i++ {
		buf.WriteString(`<author name="Alan Watt" />`)
	}
	return buf.String()
}

@243083df
Author

@a-h I think it would be better to use a slice cache instead of ReadByte/UnreadByte.

@a-h
Contributor

a-h commented Mar 29, 2018

Do you mean something that reads into a buffer like this? https://github.com/a-h/lexical/blob/master/input/stream.go

@nussjustin
Contributor

@a-h The code could be further optimized like this:

// Get name: /first(first|second)*/
// Do not set d.err if the name is missing (unless unexpected EOF is received):
// let the caller provide better context.
func (d *Decoder) name() (s string, ok bool) {
	d.buf.Reset()
	if !d.readName() {
		return "", false
	}

	// Now we check the characters.
	b := d.buf.Bytes()
	if s, ok = d.names[string(b)]; ok {
		return s, ok
	}
	if !isName(b) {
		d.err = d.syntaxError("invalid XML name: " + string(b))
		return "", false
	}
	s = string(b)
	d.names[s] = s
	return s, true
}

This way we can avoid the isName check and reuse the string from the d.names map without allocation.

For the existing unmarshal benchmark this gives me

name         old time/op    new time/op    delta
Unmarshal-8    11.6µs ± 1%    11.7µs ± 3%     ~     (p=0.990 n=12+14)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    8.27kB ± 0%    9.13kB ± 0%  +10.40%  (p=0.000 n=15+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       190 ± 0%       166 ± 0%  -12.63%  (p=0.000 n=15+15)

No change on the time, but at work we have a tool that parses multiple XML files, each around 250MB+, and reusing the allocated names saves about 2 seconds per file (14s -> 12s).

In my case I shaved off another second by making copyValue accept both []byte and string, avoiding an allocation for each attribute value.
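The general shape of that change, sketched standalone here (not the actual encoding/xml patch), is a value setter that accepts either a string or a []byte and only converts between them when the destination actually requires it, so the per-attribute conversion allocation goes away:

```
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// copyValueSketch copies src into dst, accepting either a string or a
// []byte so that a caller holding a string (e.g. an attribute value)
// never has to convert it to []byte just to call this function.
func copyValueSketch(dst reflect.Value, src interface{}) error {
	// Build a string view only when a destination actually needs one.
	str := func() string {
		switch v := src.(type) {
		case string:
			return v // no allocation on this path
		case []byte:
			return string(v)
		}
		return ""
	}
	switch dst.Kind() {
	case reflect.String:
		dst.SetString(str())
	case reflect.Int, reflect.Int64:
		n, err := strconv.ParseInt(strings.TrimSpace(str()), 10, 64)
		if err != nil {
			return err
		}
		dst.SetInt(n)
	default:
		return fmt.Errorf("unsupported kind %v", dst.Kind())
	}
	return nil
}

func main() {
	var name string
	var year int64
	_ = copyValueSketch(reflect.ValueOf(&name).Elem(), "Alan Watt") // string in, no []byte round trip
	_ = copyValueSketch(reflect.ValueOf(&year).Elem(), []byte("2018"))
	fmt.Println(name, year)
}
```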

But again, this doesn't reflect in the existing benchmark:

name         old time/op    new time/op    delta
Unmarshal-8    11.6µs ± 1%    11.4µs ± 1%  -1.72%  (p=0.000 n=12+12)

name         old alloc/op   new alloc/op   delta
Unmarshal-8    8.27kB ± 0%    8.11kB ± 0%  -1.93%  (p=0.000 n=15+15)

name         old allocs/op  new allocs/op  delta
Unmarshal-8       190 ± 0%       186 ± 0%  -2.11%  (p=0.000 n=15+15)

@tamerh

tamerh commented Nov 15, 2018

Hi, any update on this or a workaround? I also observed that it is slower than Java.

@webern

webern commented Dec 23, 2018

I have also found my way here after seeing a very large box in my profile for encoding/xml (*Decoder) unmarshal. Go's XML parsing 'feels' a lot slower than C++'s rapidxml, pugixml and expat, though I have not benchmarked.

@tamerh

tamerh commented Jan 23, 2019

Hi all and @243083df, @l-we, @webern, @ajruckman, @qwantix, @evt

Regarding this issue you can try my library

https://github.com/tamerh/xml-stream-parser

@ilyabreev

Any progress on this issue?

@saleem-mirza

Actually, the programmer who coded this library is long gone, so we are clueless.

Disclaimer: this is not an official excuse; I made it up since the community is mysteriously silent on this issue.

@webern

webern commented Apr 27, 2019

@saleem-mirza this begs the question: what are the plans for Go to continue supporting XML natively as part of the standard library?

Currently the XML implementation suffers from several issues that make it substandard (when compared with Xerces, Java or libxml2, for example):

  1. It is very slow.
  2. It is non-validating (we need XSD validation).
  3. There is no support for loading XML when the schema is not known, i.e. we can load any JSON and get a map[string]interface{}, but we cannot do this with XML (see the sketch below).

I would love to work on something like this, but it would definitely be a full-time job, i.e. someone would have to be paid to build this.
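On point 3, the closest thing available today is walking the token stream and building a generic tree yourself; a rough sketch (node and parseAny are made-up names, and error handling is minimal):

```
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// node is a hypothetical schema-less representation, playing the role
// that map[string]interface{} plays for encoding/json.
type node struct {
	Name     string
	Attrs    map[string]string
	Children []*node
	Text     string
}

// parseAny builds a generic tree from any well-formed document.
func parseAny(d *xml.Decoder) (*node, error) {
	root := &node{Attrs: map[string]string{}}
	stack := []*node{root}
	for {
		tok, err := d.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			n := &node{Name: t.Name.Local, Attrs: map[string]string{}}
			for _, a := range t.Attr {
				n.Attrs[a.Name.Local] = a.Value
			}
			parent := stack[len(stack)-1]
			parent.Children = append(parent.Children, n)
			stack = append(stack, n)
		case xml.EndElement:
			stack = stack[:len(stack)-1]
		case xml.CharData:
			stack[len(stack)-1].Text += strings.TrimSpace(string(t))
		}
	}
	if len(root.Children) == 1 {
		return root.Children[0], nil // unwrap the single document element
	}
	return root, nil
}

func main() {
	doc := `<LIST><ELEMENT ATTRIBUTE1="a" ATTRIBUTE2="b"/></LIST>`
	n, err := parseAny(xml.NewDecoder(strings.NewReader(doc)))
	if err != nil {
		panic(err)
	}
	fmt.Println(n.Name, len(n.Children), n.Children[0].Attrs["ATTRIBUTE1"]) // LIST 1 a
}
```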

@tamerh

tamerh commented Jun 11, 2019

Hi all and @243083df, @l-we, @webern, @ajruckman, @qwantix, @evt

Regarding this issue you can try my library

https://github.com/tamerh/xml-stream-parser

For people who are interested in streaming parsing, I have refactored the library
to improve code quality and error handling, and it is now more than 30% faster compared to the previous version. Let me know if you have suggestions/feedback.

@gopherbot

Change https://golang.org/cl/218658 mentions this issue: encoding/xml: decoder to use buffered start attr

andrewkroh added a commit to andrewkroh/beats that referenced this issue Mar 12, 2020
This PR adds a new event log reader implementation that's behind a
feature flag for now. It achieves higher event throughput than the current
reader by not using XML and by heavily caching static metadata about events.
To enable it add `api` to each event log reader.

```
winlogbeat.event_logs:
- name: Security
  api: wineventlog-experimental
```

The existing reader requests each event as XML and then must unmarshal the XML
document. EvtFormatMessage is used to get the XML document from Windows. Then the
Go stdlib encoder/xml package is used to parse it. Both of these operations are
relatively slow (see golang/go#21823 about encoding/xml).

This new reader utilizes the publisher metadata APIs to fetch and cache metadata
about all event IDs associated with a provider. It does this the first time it
encounters a provider ID while reading events. __Risk: Caching this info could
lead to having stale information in memory if metadata changes via software
update (see Edge Cases).__ It caches the names of the event data parameters
and a templatized version of the message string.

To get the data for an event this reader receives EVT_VARIANT structs containing
the parameters rather than receiving and parsing XML. This is more efficient because
there are fewer and smaller memory allocations and no XML encoding or decoding.

To get the message for an event it utilizes the cached text/template it has
for the event ID and passes it the list of parameter values.

Edge Cases

There is no provider metadata installed on the host. Could happen for forwarded
events or reading from .evtx files.
- Mitigate by falling back to getting parameter names by the event XML and rendering
  the message with EvtFormatMessage for each event.

Software is updated and an event ID changes its event data parameters. Saw this
between Sysmon versions 9 and 10 with event ID 5.
- Mitigate by fingerprinting the number of event data parameters and their types.
- If the fingerprint changes, fetch the XML for the event and store the parameter
  names.

Benchmark Comparison

Comparing batch_size 500, that's a 1396% increase in events/sec, a -81% reduction in bytes allocated per event, and -86% decrease in the number of allocations.

PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -run TestBenchmarkRead -benchmem -benchtime 10s -benchtest -v .
--- PASS: TestBenchmarkRead (231.68s)
    --- PASS: TestBenchmarkRead/api=wineventlog (53.57s)
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=10 (12.19s)
            bench_test.go:128: 2067.28 events/sec        18283 B/event   182836 B/batch  251 allocs/event        2516 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=100 (16.73s)
            bench_test.go:128: 2144.50 events/sec        17959 B/event   1795989 B/batch         250 allocs/event        25020 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=500 (13.48s)
            bench_test.go:128: 1888.40 events/sec        17648 B/event   8824455 B/batch         250 allocs/event        125018 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog/batch_size=1000 (11.18s)
            bench_test.go:128: 2064.14 events/sec        17650 B/event   17650459 B/batch        250 allocs/event        250012 allocs/batch
    --- PASS: TestBenchmarkRead/api=wineventlog-experimental (98.28s)
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=10 (18.72s)
            bench_test.go:128: 16813.52 events/sec       3974 B/event    39744 B/batch   34 allocs/event         344 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=100 (25.39s)
            bench_test.go:128: 28300.30 events/sec       3634 B/event    363498 B/batch  33 allocs/event         3324 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=500 (26.40s)
            bench_test.go:128: 28266.73 events/sec       3332 B/event    1666041 B/batch         33 allocs/event         16597 allocs/batch
        --- PASS: TestBenchmarkRead/api=wineventlog-experimental/batch_size=1000 (27.77s)
            bench_test.go:128: 28387.74 events/sec       3330 B/event    3330690 B/batch         33 allocs/event         33127 allocs/batch
    --- PASS: TestBenchmarkRead/api=eventlogging (13.29s)
        bench_test.go:128: 56243.80 events/sec   8043 B/event    6513053 B/batch         31 allocs/event         25151 allocs/batch
PASS
ok      github.com/elastic/beats/v7/winlogbeat/eventlog 231.932s
andrewkroh added a commit to elastic/beats that referenced this issue Mar 16, 2020
andrewkroh added a commit to andrewkroh/beats that referenced this issue Mar 18, 2020
…#16849)

(cherry picked from commit d81ef73)
andrewkroh added a commit to elastic/beats that referenced this issue Mar 20, 2020
…#17022)

(cherry picked from commit d81ef73)