x/net/html: ParseFragment fails to parse sub-table elements in the root position #7232

Open
gopherbot opened this issue Jan 29, 2014 · 2 comments

@gopherbot

by algorithmicimperative:

1. Use `html.ParseFragment` to parse a fragment of HTML where the root elements are
`<tbody>`, `<tr>` or `<td>` (and probably other table sub-elements)

For example:

s := `<td>first</td>
    <td>second</td>
    <td>third</td>
`
doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
    Type: html.ElementNode,
    Data: "body",
    DataAtom: atom.Body,
})


2. Check the result with `fmt.Printf("%#v\n", doc)`


What is the expected output?

`[]*html.Node` of 3 `td` elements


What do you see instead?

`[]*html.Node` of a single text node containing the `first second third` text.


Which operating system are you using? Linux


Which version are you using?  1.2



`ParseFragment` works fine with other semantically incorrect structures, like
top-level `<option>` elements, but it has trouble with table sub-elements.

If this isn't a bug and is failing by design, perhaps we need something like
`atom.DocumentFragment` that will receive any arbitrary HTML.
@rsc
Contributor

rsc commented Mar 3, 2014

Comment 1:

Labels changed: added repo-net.

Status changed to Accepted.

@andybalholm
Contributor

Comment 2:

It is working as intended, since it parses the fragment just as it would if the fragment were
enclosed between `<body>` and `</body>`. But it does surprise people.
A ParseFragment-like function that does not take a context, and tries to return the
parse tree that a user is likely to expect, would be nice. It would probably require
adding a new insertion mode to the parser, though, so it wouldn't be trivial.

@mikioh mikioh changed the title from "code.google.com/p/go.net/html: ParseFragment fails to parse sub-table elements in the root position" to "x/net/html: ParseFragment fails to parse sub-table elements in the root position" Dec 23, 2014
@mikioh mikioh added repo-net and removed repo-net labels Dec 23, 2014
@mikioh mikioh changed the title from "x/net/html: ParseFragment fails to parse sub-table elements in the root position" to "html: ParseFragment fails to parse sub-table elements in the root position" Jan 4, 2015
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title from "html: ParseFragment fails to parse sub-table elements in the root position" to "x/net/html: ParseFragment fails to parse sub-table elements in the root position" Apr 14, 2015
@rsc rsc modified the milestones: Unreleased, Unplanned Apr 14, 2015
@rsc rsc removed the repo-net label Apr 14, 2015