lexer: encode \u escapes using locale encoding by Franklin-Qi · Pull Request #51 · uutils/awk

Franklin-Qi · 2026-06-17T03:21:58Z

Detect charset from LC_ALL/LC_CTYPE/LANG and encode \u sequences into the locale multibyte encoding (UTF-8, ISO-8859-1, ASCII-only for C/POSIX), matching gawk. Unrepresentable or invalid code points become '?'.

Closes: #40
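
The behaviour described above can be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: the `Charset` enum and the functions `locale_name`, `locale_from_env`, `charset_of`, and `encode_code_point` are hypothetical names. It assumes the usual POSIX precedence (first non-empty of LC_ALL, LC_CTYPE, LANG) and assumes that any codeset other than UTF-8 or ISO-8859-1 is treated as ASCII-only.

```rust
use std::env;

/// Charsets handled by this sketch (hypothetical type, not the PR's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Charset {
    Utf8,
    Latin1,
    Ascii,
}

/// Return the first non-empty value among LC_ALL, LC_CTYPE, LANG.
/// The lookup is injected so the logic can be exercised without touching
/// the real process environment.
fn locale_name(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    ["LC_ALL", "LC_CTYPE", "LANG"]
        .iter()
        .filter_map(|var| lookup(*var))
        .find(|value| !value.is_empty())
}

/// Same as `locale_name`, but reading the real environment.
fn locale_from_env() -> Option<String> {
    locale_name(|var| env::var(var).ok())
}

/// Map a locale name such as "en_US.UTF-8" or "de_DE.ISO-8859-1@euro"
/// to a charset. Assumption: anything unrecognised, including "C",
/// "POSIX", and an unset locale, is treated as ASCII-only.
fn charset_of(locale: Option<&str>) -> Charset {
    let Some(locale) = locale else {
        return Charset::Ascii;
    };
    // The codeset sits between '.' and an optional '@modifier'.
    let codeset = locale
        .split_once('.')
        .map(|(_, rest)| rest.split('@').next().unwrap_or(rest))
        .unwrap_or("");
    // Normalise spellings like "UTF-8", "utf8", "ISO_8859-1".
    let normalized: String = codeset
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    match normalized.as_str() {
        "utf8" => Charset::Utf8,
        "iso88591" | "latin1" => Charset::Latin1,
        _ => Charset::Ascii,
    }
}

/// Append the encoding of code point `cp` in `charset` to `out`.
/// Invalid code points (surrogates, values above 0x10FFFF) and code
/// points the charset cannot represent are written as '?'.
fn encode_code_point(cp: u32, charset: Charset, out: &mut impl Extend<u8>) {
    match (charset, char::from_u32(cp)) {
        (Charset::Utf8, Some(c)) => {
            let mut buf = [0u8; 4];
            out.extend(c.encode_utf8(&mut buf).bytes());
        }
        (Charset::Latin1, Some(_)) if cp <= 0xFF => out.extend([cp as u8]),
        (Charset::Ascii, Some(_)) if cp <= 0x7F => out.extend([cp as u8]),
        _ => out.extend([b'?']),
    }
}
```

Under these assumptions, `\u00e9` yields the two bytes 0xC3 0xA9 in a UTF-8 locale, the single byte 0xE9 in an ISO-8859-1 locale, and '?' in the C/POSIX locale.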

Alonely0 · 2026-06-17T09:46:24Z

Super nice, tysm! Let me review this on Friday, I'm a bit busy this week.

Alonely0 · 2026-06-17T10:18:06Z

Do you mind solving the merge conflicts created after merging the other tests? Thanks in advance.

Franklin-Qi · 2026-06-18T02:40:30Z

Thank you for the review. I will address this conflict.

Alonely0

Pretty nice overall; mostly nitpicks on my part. I want to do some more testing, but this looks great, especially since it lets us support many more locales than gawk quite effortlessly. Nice job!

sylvestre · 2026-06-28T16:23:11Z

needs some rebasing too

Alonely0

Super nice! I have to say taking &mut impl Extend<u8> is something I hadn't ever thought of, although I worry it codegens worse than taking a vector and using push()/extend_from_slice_copy(). Still, that's just a nitpick and we can merge this without worrying much about that. ty!
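
For readers comparing the two signatures discussed in this comment, here is a minimal sketch of both shapes. The function names `push_utf8_generic` and `push_utf8_vec` are hypothetical and not taken from the PR, and the concrete variant uses the standard library's `Vec::extend_from_slice` as a stand-in for the `extend_from_slice_copy()` mentioned above. It makes no claim about which one compiles to better code; it only gives two equivalent functions whose generated assembly could be compared.

```rust
/// Generic sink: accepts Vec<u8>, VecDeque<u8>, or any other collection
/// that implements Extend<u8>. Bytes are fed through an iterator.
fn push_utf8_generic(c: char, out: &mut impl Extend<u8>) {
    let mut buf = [0u8; 4];
    out.extend(c.encode_utf8(&mut buf).bytes());
}

/// Concrete sink: only accepts Vec<u8>, but appends the encoded bytes
/// as a single slice copy.
fn push_utf8_vec(c: char, out: &mut Vec<u8>) {
    let mut buf = [0u8; 4];
    out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}
```

Both functions append the same bytes when given a `Vec<u8>`; the generic version trades the slice-based append for flexibility in the caller's choice of container.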

Alonely0 self-requested a review June 17, 2026 09:46

Alonely0 requested changes Jun 19, 2026

5 comment threads on lexer/src/locale_encoding.rs (all outdated)

cursor Bot force-pushed the feature-task#40-Lexer-Numeric-escaping-u-for-different-locales branch from 7fbe944 to be7c222 Compare June 30, 2026 12:56

Franklin-Qi mentioned this pull request Jun 30, 2026

lexer: encode \u escapes using locale encoding (rebase) Franklin-Qi/awk#1

Closed

cursor Bot force-pushed the feature-task#40-Lexer-Numeric-escaping-u-for-different-locales branch from be7c222 to 4105912 Compare June 30, 2026 12:59

cursor Bot deleted the feature-task#40-Lexer-Numeric-escaping-u-for-different-locales branch June 30, 2026 13:07

cursor Bot force-pushed the feature-task#40-Lexer-Numeric-escaping-u-for-different-locales branch 4 times, most recently from 7fbe944 to 6643f57 Compare June 30, 2026 13:10

lexer: encode \u escapes using locale encoding

0a28365

Detect charset from LC_ALL/LC_CTYPE/LANG and encode \u sequences into the locale multibyte encoding (UTF-8, ISO-8859-1, ASCII-only for C/POSIX), matching gawk. Unrepresentable or invalid code points become '?'. Closes: uutils#40

Franklin-Qi force-pushed the feature-task#40-Lexer-Numeric-escaping-u-for-different-locales branch from 6643f57 to 0a28365 Compare July 1, 2026 08:08

Alonely0 approved these changes Jul 2, 2026

Alonely0 merged commit 894c8be into uutils:main Jul 2, 2026
15 checks passed

Alonely0 merged 1 commit into uutils:main from Franklin-Qi:feature-task#40-Lexer-Numeric-escaping-u-for-different-locales
