lexer: encode \u escapes using locale encoding #51
Super nice, tysm! Let me review this on Friday; I'm a bit busy this week.

Do you mind resolving the merge conflicts that appeared after the other tests were merged? Thanks in advance.

Thank you for the review. I will resolve the conflicts.

Alonely0 left a comment:
Pretty nice overall; mostly nitpicks on my part. I want to do some more testing, but this looks great, especially since it lets us support many more locales than gawk with very little effort. Nice job!

Needs some rebasing too.
Branch updated from 7fbe944 to be7c222.
Branch updated from be7c222 to 4105912.
Branch updated from 7fbe944 to 6643f57.
Detect the charset from LC_ALL/LC_CTYPE/LANG and encode \u escape sequences in the locale's multibyte encoding (UTF-8, ISO-8859-1, or ASCII-only for C/POSIX), matching gawk. Code points that are invalid or cannot be represented in that encoding become '?'.

Closes: uutils#40
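A minimal sketch of the behaviour described in the commit message above, under simplifying assumptions: the names Charset, charset_from_locale, detect_charset, and encode_escape are hypothetical and not the PR's actual code; only the three encodings named in the message are modelled; the first non-empty variable among LC_ALL, LC_CTYPE, and LANG wins; and a locale whose codeset is missing or unrecognised is treated as UTF-8, which is an assumption of this sketch rather than confirmed gawk behaviour.

```rust
use std::env;

/// Hypothetical charset model covering only the encodings named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Charset {
    Utf8,
    Latin1,
    Ascii,
}

/// Classify a locale string such as "en_US.UTF-8", "de_DE.ISO-8859-1" or "C".
fn charset_from_locale(locale: &str) -> Charset {
    // An unset/empty locale means the C locale, which is ASCII-only.
    if locale.is_empty() || locale == "C" || locale == "POSIX" {
        return Charset::Ascii;
    }
    // The codeset is the part after '.', minus any "@modifier" suffix.
    let codeset = locale
        .split_once('.')
        .map(|(_, rest)| rest.split('@').next().unwrap_or(""))
        .unwrap_or("");
    // Normalise spellings like "UTF-8"/"utf8" and "ISO-8859-1"/"iso88591".
    let norm: String = codeset
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    match norm.as_str() {
        "utf8" => Charset::Utf8,
        "iso88591" | "latin1" => Charset::Latin1,
        // Assumption: any other (or missing) codeset is treated as UTF-8 here.
        _ => Charset::Utf8,
    }
}

/// Use the first non-empty value of LC_ALL, LC_CTYPE, LANG.
fn detect_charset() -> Charset {
    let locale = ["LC_ALL", "LC_CTYPE", "LANG"]
        .iter()
        .filter_map(|name| env::var(name).ok())
        .find(|value| !value.is_empty())
        .unwrap_or_default();
    charset_from_locale(&locale)
}

/// Append the bytes for code point `cp` to `out`, writing b'?' when the
/// code point is invalid (e.g. a surrogate) or not representable in `charset`.
fn encode_escape(cp: u32, charset: Charset, out: &mut impl Extend<u8>) {
    match (char::from_u32(cp), charset) {
        (Some(c), Charset::Utf8) => {
            let mut buf = [0u8; 4];
            out.extend(c.encode_utf8(&mut buf).bytes());
        }
        (Some(c), Charset::Latin1) if (c as u32) <= 0xFF => out.extend([c as u8]),
        (Some(c), Charset::Ascii) if c.is_ascii() => out.extend([c as u8]),
        _ => out.extend([b'?']),
    }
}
```

In this sketch, \u00e9 becomes the two bytes C3 A9 under a UTF-8 locale, the single byte E9 under ISO-8859-1, and '?' under C/POSIX; a lone surrogate such as \ud800 becomes '?' in every locale.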
Branch updated from 6643f57 to 0a28365.
Alonely0 left a comment:
Super nice! I have to say, taking &mut impl Extend<u8> is something I had never thought of, although I worry it codegens worse than taking a vector and using push()/extend_from_slice_copy(). Still, that's just a nitpick; we can merge this without worrying much about it. ty!
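To make the trade-off raised in this comment concrete, here is a hedged sketch of the two signature styles; push_utf8_generic and push_utf8_vec are hypothetical helpers, not code from the PR. The generic version accepts any sink implementing Extend<u8> and is monomorphized per sink type, with writes going through the generic Extend::extend path; the concrete version only accepts a Vec<u8> but can call Vec::extend_from_slice directly. Whether the generated code actually differs would have to be checked by inspecting the assembly, so this illustrates the API difference, not a measured cost.

```rust
/// Generic sink: works with any collection that implements Extend<u8>
/// (e.g. Vec<u8> or VecDeque<u8>); monomorphized for each sink type.
fn push_utf8_generic(c: char, out: &mut impl Extend<u8>) {
    let mut buf = [0u8; 4];
    out.extend(c.encode_utf8(&mut buf).bytes());
}

/// Concrete sink: only accepts Vec<u8>, but can use Vec's slice-copy method.
fn push_utf8_vec(c: char, out: &mut Vec<u8>) {
    let mut buf = [0u8; 4];
    out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}
```

For a Vec<u8> both functions append the same bytes; the generic one additionally accepts other sinks such as std::collections::VecDeque<u8>, which is the flexibility that has to be weighed against any codegen cost.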