proposal: encoding/xml: update character ranges for names to fifth edition (2008) specification #28124

Open
iand opened this issue Oct 10, 2018 · 3 comments

Comments

@iand
Contributor

iand commented Oct 10, 2018

Currently, XML name validation is based on the original 1998 specification, which defines a large set of codepoint ranges to accept. The fifth edition of the spec, published in 2008 and now the current version, widened and simplified these ranges.

The name production rules are now:

NameStartChar  ::=       ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | 
                           [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | 
                           [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | 
                           [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | 
                           [#x10000-#xEFFFF]
NameChar       ::=       NameStartChar | "-" | "." | [0-9] | #xB7 | 
                           [#x0300-#x036F] | [#x203F-#x2040]
Name           ::=       NameStartChar (NameChar)*
Names          ::=       Name (#x20 Name)*
Nmtoken        ::=       (NameChar)+
Nmtokens       ::=       Nmtoken (#x20 Nmtoken)*
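
As a rough illustration of how compact these productions are, here is a minimal sketch of a checker for them. It is not the encoding/xml implementation: the names `runeRange`, `inRanges`, `isNameStartChar`, `isNameChar`, and `isName` are hypothetical, and the linear scan is chosen for readability rather than speed. Rejecting invalid UTF-8 up front is also an assumption of this sketch, made because invalid bytes would otherwise decode to U+FFFD, which falls inside an accepted range.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeRange is an inclusive range of code points (hypothetical helper type).
type runeRange struct{ lo, hi rune }

// nameStartRanges transcribes the fifth-edition NameStartChar production.
var nameStartRanges = []runeRange{
	{':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
	{0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
	{0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
	{0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
	{0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF},
}

// nameExtraRanges holds the characters NameChar adds on top of NameStartChar.
var nameExtraRanges = []runeRange{
	{'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
	{0x0300, 0x036F}, {0x203F, 0x2040},
}

// inRanges reports whether r falls inside any of the given ranges.
func inRanges(r rune, ranges []runeRange) bool {
	for _, rr := range ranges {
		if r >= rr.lo && r <= rr.hi {
			return true
		}
	}
	return false
}

func isNameStartChar(r rune) bool { return inRanges(r, nameStartRanges) }

func isNameChar(r rune) bool {
	return isNameStartChar(r) || inRanges(r, nameExtraRanges)
}

// isName implements Name ::= NameStartChar (NameChar)*.
func isName(s string) bool {
	// Assumption of this sketch: empty and invalid UTF-8 input is rejected.
	if s == "" || !utf8.ValidString(s) {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if !isNameStartChar(r) {
				return false
			}
		} else if !isNameChar(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"a:b", "1abc", "élan", "x\u00B7y", "\u00D7"} {
		fmt.Printf("%q -> %v\n", s, isName(s))
	}
}
```

The whole table fits in about twenty lines, which is where the reduction in code size would come from.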

This may also address most of the requirements for XML 1.1 support (#25755), since the changes between 1.0 and 1.1 were the expansion of the name character ranges, the addition of two line-ending characters (U+0085 and U+2028), and the specification of additional normalisation rules.

The current ranges span 300 lines of code in the xml package, so changing this will also contribute to #26775.

If there is interest then I can submit a CL.

@bcmills
Contributor

bcmills commented Oct 23, 2018

CC @rsc

@bcmills bcmills added the NeedsDecision and FeatureRequest labels Oct 23, 2018
@bcmills bcmills added this to the Unplanned milestone Oct 23, 2018
@iand
Contributor Author

iand commented Jun 6, 2019

Is this something that could be considered for 1.14 @rsc? I'm happy to submit a CL if it's accepted as a desired feature.

@bcmills
Contributor

bcmills commented Jun 6, 2019

I'll mark it as a proposal so that the proposal group will see it to make a decision.

@bcmills bcmills added the Proposal label and removed the NeedsDecision label Jun 6, 2019
@bcmills bcmills modified the milestones: Unplanned, Proposal Jun 6, 2019
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 24, 2021
@rsc rsc changed the title from "encoding/xml: update character ranges for names to fifth edition (2008) specification" to "proposal: encoding/xml: update character ranges for names to fifth edition (2008) specification" Jun 22, 2022