Understanding the UTF-8 Lexer

NateTG
I was looking at src/lexer.l and noticed
U       [\x80-\xbf]
U2      [\xc2-\xdf]
U3      [\xe0-\xef]
U4      [\xf0-\xf4]
UNICODE {U2}{U}|{U3}{U}{U}|{U4}{U}{U}{U}
I guess U+0080 through U+009F are control codes that are unlikely to occur, but shouldn't U2 be [\xc0-\xdf]?

Sent from the OpenSCAD mailing list archive at Nabble.com.

_______________________________________________
OpenSCAD mailing list
[hidden email]
http://lists.openscad.org/mailman/listinfo/discuss_lists.openscad.org

Re: Understanding the UTF-8 Lexer

tp3
On 12.07.19 04:29, NateTG wrote:
> I guess U+0080 through U+009F are control codes
> that are unlikely to occur but, shouldn't U2 be
> [\xc0-\xdf]?

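No, I don't think so. I guess the reason is that C0
and C1 as lead bytes could only produce overlong
encodings of U+0000 through U+007F, which already have
single-byte sequences. U+0080 through U+009F are still
covered, since they are encoded as C2 80 through C2 9F.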

https://www.fileformat.info/info/unicode/utf8.htm
also shows C2 to DF.
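
A quick way to check this, as a rough sketch rather than anything taken from OpenSCAD itself: the Python snippet below mirrors the quoted lexer definitions as a byte regex and compares them with Python's own UTF-8 decoder. The helper names two_byte_value and is_valid_utf8 are made up for illustration.

```python
import re

# Byte-level mirror of the definitions quoted from src/lexer.l.
U = rb"[\x80-\xbf]"    # continuation byte
U2 = rb"[\xc2-\xdf]"   # lead byte of a 2-byte sequence
U3 = rb"[\xe0-\xef]"   # lead byte of a 3-byte sequence
U4 = rb"[\xf0-\xf4]"   # lead byte of a 4-byte sequence
UNICODE = re.compile(b"|".join([U2 + U, U3 + U + U, U4 + U + U + U]))


def two_byte_value(lead, cont):
    """Combine the payload bits of 110xxxxx 10yyyyyy (hypothetical helper)."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def is_valid_utf8(data):
    """True if Python's strict UTF-8 decoder accepts the bytes (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# C0 and C1 could only ever reach U+0000..U+007F, the single-byte range.
assert two_byte_value(0xC0, 0x80) == 0x00
assert two_byte_value(0xC1, 0xBF) == 0x7F

# C2 is the first lead byte that reaches a new code point, U+0080.
assert two_byte_value(0xC2, 0x80) == 0x80
assert "\u0080".encode("utf-8") == b"\xc2\x80"
assert "\u009f".encode("utf-8") == b"\xc2\x9f"

# The lexer pattern and the decoder agree on both cases.
assert UNICODE.fullmatch(b"\xc2\x80") is not None
assert UNICODE.fullmatch(b"\xc0\x80") is None
assert is_valid_utf8(b"\xc2\x80")
assert not is_valid_utf8(b"\xc0\x80")
```

So U+0080 through U+009F are matched by {U2}{U} as written, and widening U2 to [\xc0-\xdf] would only let overlong forms of ASCII through.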

ciao,
  Torsten.


Re: Understanding the UTF-8 Lexer

NateTG
Oh, I guess I misread the docs.  Thanks.