Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: html.Parse() panics #30600

Closed
empijei opened this issue Mar 5, 2019 · 7 comments
Closed

x/net/html: html.Parse() panics #30600

empijei opened this issue Mar 5, 2019 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@empijei
Copy link
Contributor

empijei commented Mar 5, 2019

NOTE: this is a bug found by fuzzing. Main bug to track the fuzzing effort: #27848

I'm using the x/net package on commit 16b79f2e4e9, master branch, Fri Mar 1 2019.

What did you do?

Parsed an html string with the code below

package main

import (
	"strings"
	"golang.org/x/net/html"
)

func main() {
	r := strings.NewReader(`<table><math><th><mO><select></table>`)
	html.Parse(r)
}

Got panic with "runtime error: index out of range"

Stack trace:

golang.org/x/net/html.(*nodeStack).pop(...)
        /gopath/src/golang.org/x/net/html/node.go:153
golang.org/x/net/html.(*parser).clearActiveFormattingElements(0xc00008a410)
        /gopath/src/golang.org/x/net/html/parse.go:372 +0x52
golang.org/x/net/html.inCellIM(0xc00008a410, 0x0)
        /gopath/src/golang.org/x/net/html/parse.go:1688 +0x343
golang.org/x/net/html.(*parser).parseCurrentToken(0xc00008a410)
        /gopath/src/golang.org/x/net/html/parse.go:2199 +0x80
golang.org/x/net/html.(*parser).parse(0xc00008a410, 0xc00000e028, 0x0)
        /gopath/src/golang.org/x/net/html/parse.go:2225 +0x50
golang.org/x/net/html.ParseFragment(0x51ec60, 0xc00000c0e0, 0x0, 0x2, 0xc00008a2f8, 0x0, 0xc000010050, 0xc0000445d0)
        /gopath/src/golang.org/x/net/html/parse.go:2306 +0x2b2
main.ParseFragment(0x4fe13a, 0x9, 0x506d3e, 0x2f)

The pop states that it will panic if s is empty:

// pop pops the stack. It will panic if s is empty.
func (s *nodeStack) pop() *Node {
	i := len(*s)
	n := (*s)[i-1]
	*s = (*s)[:i-1]
	return n
}

So I guess it should be the caller responsibility to check for it:

// Section 12.2.4.3.
func (p *parser) clearActiveFormattingElements() {
	for {
		n := p.afe.pop() // ← line 372 of parse.go
		if len(p.afe) == 0 || n.Type == scopeMarkerNode {
			return
		}
	}
}

Proposed fix:

// Section 12.2.4.3.
func (p *parser) clearActiveFormattingElements() {
	if len(p.afe) == 0 {
		return
	}
	for {
		n := p.afe.pop()
		if len(p.afe) == 0 || n.Type == scopeMarkerNode {
			return
		}
	}
}

If someone is ok with this I'll send a CL with the fix.

@gopherbot gopherbot added this to the Unreleased milestone Mar 5, 2019
@mazarin1
Copy link

mazarin1 commented Mar 5, 2019

I don't think it would be just enough to add check in clearActiveFormattingElements. nodeStack.pop getting called all over the parser. I think we need fix on both ends: add nil return on empty stack pop and check in callers to act appropriately

// pop pops the stack. It will panic if s is empty.
func (s *nodeStack) pop() *Node {
	i := len(*s)
        if i == 0{
            return nil
        }
	n := (*s)[i-1]
	*s = (*s)[:i-1]
	return n
}

sry writing from phone....

@empijei
Copy link
Contributor Author

empijei commented Mar 5, 2019

I believe there is a reason for pop being documented to panic, probably performance.

In the original code if len(p.afe) == 0 was already there, it was just missing the initial check. I'd be more inclined to do some spot fixes where needed, but this is just my idea to address it. I'm open to suggestions.

cc @bradfitz

@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2019

We don't seem to have a specific owner for x/net/html (on https://dev.golang.org/owners), so if you want to take any particular approach to a fix I doubt anyone will object.

(@mikioh is listed as the primary owner for the x/net repo overall, but @nigeltao and @namusyaka seem to have made most of the changes in the html package.)

@bcmills bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 12, 2019
@nigeltao
Copy link
Contributor

The x/net/html package implements the HTML5 parsing algorithm, which has a very, very detailed specification:

https://html.spec.whatwg.org/multipage/syntax.html#tree-construction

Please bear this in mind. I'd like to avoid making ad-hoc algorithmic changes if possible.

CC'ing @TomAnthony who has also recently made a fix to x/net/html.

@TomAnthony
Copy link

TomAnthony commented Apr 16, 2019

I only have familiarity with aspects of the algorithm, but my understanding is...

The pop shouldn't be empty, (the final call to pop leaving the open elements empty should be popping the html tag) and thus hitting the panic implies an error earlier in the tree construction.

I think in this case the the issue is with this part of the spec:

Where the steps above say to close the cell, they mean to run the following algorithm:

1. Generate implied end tags.

2. If the current node is not now a td element or a th element, then this is a parse error.

3. Pop elements from the stack of open elements stack until a td element or a th element has been popped from the stack.

4. Clear the list of active formatting elements up to the last marker.

5. Switch the insertion mode to "in row".

The relevant code is at line 1690:

if !p.elementInScope(tableScope, p.tok.DataAtom) {
	// Ignore the token.
	return true
}
// Close the cell and reprocess.
p.popUntil(tableScope, a.Td, a.Th)
p.clearActiveFormattingElements()
p.im = inRowIM

The check in step 2 of the spec is never completed, and in this case would be a failure. Then at step 3 we don't do anything (but we shouldn't be there according to the failure at step 2). My initial idea is that we combine these steps as popUntil reports whether it made any changes:

if !p.elementInScope(tableScope, p.tok.DataAtom) {
	// Ignore the token.
	return true
}
// Close the cell and reprocess.
if (p.popUntil(tableScope, a.Td, a.Th)) {
	p.clearActiveFormattingElements()
}
p.im = inRowIM

This passes the case above (and all tests) and produces the same output as both Chrome and Firefox for this snippet.

If this looks good I can prepare a PR with this and add a test case.

@TomAnthony
Copy link

I've just come back to this. In the spec there is a similar situation in body insertion mode (see An end tag token whose tag name is one of: "applet", "marquee", "object"):

1. Generate implied end tags.

2. If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.

3. Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.

4. Clear the list of active formatting elements up to the last marker.

This is handled in the code at line 1127, with a construct that is the same as my suggestion above:

if p.popUntil(defaultScope, p.tok.DataAtom) {
	p.clearActiveFormattingElements()
}

TomAnthony added a commit to TomAnthony/net that referenced this issue Apr 16, 2019
…spec

In section 12.2.6.4.15 of the spec, there is a condition that the current node is a td or th element, which is not implemented. This can lead to a panic when the open elements stack is popped whilst empty, as outlined in golang/go#30600. This commit implements that check.

Fixes golang/go#30600
TomAnthony added a commit to TomAnthony/net that referenced this issue Apr 16, 2019
…spec

In section 12.2.6.4.15 of the spec, there is a condition that the current node is a td or th element, which is not implemented. This can lead to a panic when the open elements stack is popped whilst empty, as outlined in golang/go#30600. This commit implements that check.

Fixes golang/go#30600
@gopherbot
Copy link

Change https://golang.org/cl/172377 mentions this issue: html: Add missing condition to 'in cell' insertion mode, required by spec

@golang golang locked and limited conversation to collaborators Apr 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants