zoneinfo: pure-Python POSIX TZ `Jn`/`n` day-of-year field accepts non-digit input via `int()` (C rejects) #152847

## Bug report

### Bug description

In `Lib/zoneinfo/_zoneinfo.py`, `_parse_dst_start_end()` validates the `Mm.w.d`
transition rule strictly with an `re.ASCII` fullmatch, but the `Jn` (Julian)
and `n` (0-based) day-of-year branches fall through to a bare `int(date)` with
no format guard:

```python
    else:
        if type == "J":
            n_is_julian = True
            date = date[1:]
        else:
            n_is_julian = False

        doy = int(date)          # <-- no ASCII / format check
        offset = _DayOffset(doy, n_is_julian)
```

`int()` accepts things the C accelerator's day-of-year parser rejects. The C
side (`Modules/_zoneinfo.c`, `parse_transition_rule`) reads the field with
`parse_digits(&ptr, 1, 3, &day)`, which consumes 1 to 3 ASCII digits via
`Py_ISDIGIT` and nothing else. So the two implementations disagree on the same
POSIX TZ string.

The most serious case is a **silent miscompile**, not a crash: `int('1_0')`
is `10` (PEP 515 underscore grouping), so a TZ string like
`AAA4BBB,J1_0,J300/2` builds a *valid but different* zone (DST starts on day
10) in pure Python, while the C accelerator raises `ValueError`. A program
that relies on the pure-Python fallback silently computes wrong local times
instead of reporting the malformed rule.

Other pure-accept / C-reject inputs for the day-of-year field: a leading `+`
(`J+1`), a leading space (`J 1`), 4-or-more-digit widths (`J0001`), and
non-ASCII digits (Arabic-Indic `J١`).
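The leniency can be confirmed in isolation. The sketch below only calls the built-in `int()` on the day-of-year text as it would look after the optional `J` is stripped; it does not touch `zoneinfo`, and the dictionary name is just for illustration.

```python
# Day-of-year fields as they reach int() once the optional "J" is stripped,
# mapped to the value int() returns for each.
lenient_inputs = {
    "1_0": 10,     # PEP 515 underscore grouping
    "+1": 1,       # explicit sign
    " 1": 1,       # surrounding whitespace is stripped
    "0001": 1,     # any number of leading zeros
    "\u0661": 1,   # ARABIC-INDIC DIGIT ONE
}

results = {text: int(text) for text in lenient_inputs}
for text, value in results.items():
    print(f"int({text!r}) -> {value}")
```

Every one of these is outside the 1 to 3 ASCII digits that the C parser reads.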

### Differential (main, before fix)

TZ template `AAA4BBB,<token>,J300/2`, only `<token>` varies; loaded through
both implementations via a crafted TZif v2+ footer:

| token   | C accelerator | pure-Python (before) |
|---------|---------------|----------------------|
| `J1_0`  | reject        | **accept — day 10** (silent miscompile) |
| `1_0`   | reject        | **accept — day 10** (silent miscompile) |
| `J+1`   | reject        | accept |
| `+1`    | reject        | accept |
| `J 1`   | reject        | accept |
| ` 1`    | reject        | accept |
| `J0001` | reject        | accept |
| `0001`  | reject        | accept |
| `J١` (Arabic-Indic 1) | reject | accept |
| `١`             | reject | accept |
| `J01`, `J001` | accept | accept (agree; 1-3 digit leading zeros are valid) |
| `J1`, `J365`, `0`, `365` | accept | accept (valid controls) |
| `J366`, `J400`, `J1234` | reject | reject (agree; range/width) |

The table lists 10 divergent inputs. The C accelerator consumes at most 3
digits, so `J0001` (4 digits) is rejected by C; any fix must reject it as well.
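For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to feed a TZ string to both implementations through `ZoneInfo.from_file()`. The helper names `make_tzif` and `try_load` are hypothetical, and the file layout (no transitions, a single local time type, TZif version 2) is an assumption chosen to keep the file small; it is not necessarily the harness used for the table above.

```python
import io
import struct
from datetime import datetime, timedelta

import zoneinfo._zoneinfo as py_zoneinfo  # pure-Python implementation

try:
    import _zoneinfo as c_zoneinfo        # C accelerator, if it was built
except ImportError:
    c_zoneinfo = None


def make_tzif(tz_string):
    """Build a minimal TZif v2 file whose footer carries tz_string."""
    def block():
        # Header counts: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt.
        header = b"TZif" + b"2" + bytes(15) + struct.pack(">6l", 0, 0, 0, 0, 1, 4)
        ttinfo = struct.pack(">lbb", -4 * 3600, 0, 0)  # utoff, isdst, abbrind
        return header + ttinfo + b"AAA\x00"

    footer = b"\n" + tz_string.encode("utf-8") + b"\n"
    # A v2 file holds a v1-style block, then the v2 block, then the footer.
    return block() + block() + footer


def try_load(zone_cls, tz_string):
    """Return the loaded zone, or the ValueError raised while loading."""
    try:
        return zone_cls.from_file(io.BytesIO(make_tzif(tz_string)))
    except ValueError as exc:
        return exc


if __name__ == "__main__":
    for token in ("J10", "J1_0", "J+1", "J0001"):
        tz = f"AAA4BBB,{token},J300/2"
        for name, mod in (("pure", py_zoneinfo), ("C", c_zoneinfo)):
            if mod is not None:
                print(f"{name:4} {token!r}: {try_load(mod.ZoneInfo, tz)!r}")
```

The outcome for the divergent tokens depends on whether the interpreter already includes the fix, so no expected output is given here.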

### CPython versions

main (3.16). The pure-Python parser has behaved this way since POSIX TZ
support was added.

### Fix

Add an `re.ASCII` digit guard matching C's `parse_digits(&ptr, 1, 3, &day)`
(1 to 3 ASCII digits) before `int()`, in the `J`/`n` branch only:

```python
        if re.fullmatch(r"\d{1,3}", date, re.ASCII) is None:
            raise ValueError(f"Invalid dst start/end date: {dststr}")
        doy = int(date)
```

This makes the pure-Python parser match C exactly on this field: it rejects
the 10 divergent inputs, still accepts the leading-zero `J01`/`J001` forms that
C accepts, and leaves the existing `_DayOffset` range check (`[julian, 365]`)
to reject out-of-range values, so no numeric-range behaviour changes. All 499
bundled IANA zones parse byte-identically through both implementations after
the fix.
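The guard can be exercised against the tokens from the differential table without patching CPython. The sketch below wraps it in a hypothetical stand-alone helper, `parse_day_of_year`; in `_zoneinfo.py` the check sits inline in `_parse_dst_start_end()`, and the range check shown here is only a stand-in for the one `_DayOffset` performs.

```python
import re

# Hypothetical helper for illustration; not a function that exists in CPython.
_DOY_RE = re.compile(r"\d{1,3}", re.ASCII)


def parse_day_of_year(field):
    """Return (day, is_julian) for a "Jn" or "n" field, or raise ValueError."""
    is_julian = field.startswith("J")
    digits = field[1:] if is_julian else field

    # The guard: exactly 1 to 3 ASCII digits, nothing before or after.
    if _DOY_RE.fullmatch(digits) is None:
        raise ValueError(f"Invalid dst start/end date: {field}")
    day = int(digits)

    # Stand-in for the [julian, 365] range check done by _DayOffset.
    min_day = 1 if is_julian else 0
    if not min_day <= day <= 365:
        raise ValueError(f"day must be in [{min_day}, 365], not: {day}")
    return day, is_julian


def accepts(field):
    """True if the field parses, False if it raises ValueError."""
    try:
        parse_day_of_year(field)
    except ValueError:
        return False
    return True


divergent = ["J1_0", "1_0", "J+1", "+1", "J 1", " 1",
             "J0001", "0001", "J\u0661", "\u0661"]
still_valid = ["J01", "J001", "J1", "J365", "0", "365"]
still_invalid = ["J366", "J400", "J1234"]

print([f for f in divergent if accepts(f)])        # tokens wrongly accepted
print([f for f in still_valid if not accepts(f)])  # tokens wrongly rejected
```

Both lists print empty when the guard behaves as described. `re.fullmatch` is used rather than an anchored `re.match` so that a trailing newline cannot slip past a `$` anchor.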


### Linked PRs
* gh-152848
* gh-152908
* gh-152909
* gh-152910
