13 years, 5 months ago
(2011-10-14 15:30:20 UTC)
#2
It still tokenizes "<a" as "<a>", which doesn't match the change description.
Neither behavior matches html5lib, which is somewhat unusual here: it ignores
"<a" and "<a " entirely, but turns "<a x" into `<a x="">`. To match that, I think
we would need to check z.err between reading the tag name and reading the
attributes, and return an ErrorToken if it isn't nil.
http://codereview.appspot.com/5284042/diff/2002/src/pkg/html/token.go
File src/pkg/html/token.go (right):
http://codereview.appspot.com/5284042/diff/2002/src/pkg/html/token.go#newcode323
src/pkg/html/token.go:323: z.nextText()
should be z.nextBogusComment():
html5lib.parse(StringIO.StringIO("</*")).printTree()
#document
| <!-- * -->
| <html html>
| <html head>
| <html body>
Also, html5lib doesn't return any token at all for "</>". It is completely
ignored. What would be the best way to do that? My only thought is:
	if c == '>' {
		z.Next()
	}
but would a recursive call to Next work correctly? I think it would, but
requiring Next to be reentrant would be sort of an odd constraint on future
changes.
13 years, 5 months ago
(2011-10-14 22:53:49 UTC)
#4
> Also, html5lib doesn't return any token at all for "</>". It is
> completely ignored. What would be the best way to do that?
Let me think about it. I'll leave it as a TODO.
Issue 5284042: code review 5284042: html: tokenize "<a" as text and "a < b" as one whole (t...
(Closed)
Created 13 years, 5 months ago by nigeltao
Modified 13 years, 5 months ago
Comments: 2