factorize: dynamic scratch-table sizing (generalize grow_table to rehash array keys)

## Summary

Give `factorize` (#212) a **dynamically sized scratch hash table** instead of pre-sizing it to `n`
(the worst-case, all-unique count). The enabler is a small generalization of `grow_table` so it can
rehash **array-keyed** tables, not just the list/object-backed ones it supports today.

## Motivation

`factorize` builds a private scratch `FAMObject` and sizes it once with `grow_table(&scratch, n)`.
For the common grouping/pivot workload the number of distinct keys `k` is far smaller than `n`, so the
table is drastically over-allocated:

- `n = 1_000_000` → a table of ~2^21 slots ≈ **32 MB**, of which only `k` slots are ever touched.
- Those `k` touched slots are scattered across 32 MB, so lookups reach into L3 instead of staying
  L2/L1-resident.

Measured effect (int64, `n = 1M`, comparing the `n`-sized table against a small, cache-resident table):

| case | `n`-sized (32 MB) | small table | gain |
|---|---|---|---|
| k=100  | 2.62 ms | 2.20 ms | ~16% |
| k=5000 | 3.21 ms | 2.91 ms | ~9%  |

This also matters for closing the last gap to pandas. After adding a fast bit-based float hash,
`factorize` on `float64` is at near-parity with `pandas.factorize` (0.92–0.97x at the larger/higher-`k`
cases) but still slightly behind; a right-sized, cache-resident table is the most likely lever to flip
float (and low-`k` int) past pandas. For context, `factorize` already **beats** pandas decisively on
`<U` strings (4–9x), datetime64 (1.4–1.7x), and is at parity on int64.

## Why it isn't a free reuse of existing growth

`AutoMap`/`FrozenAutoMap` already grow dynamically, but **only for the list/object representation**.
`grow_table`'s rehash re-inserts each key via `insert_obj(self, PyList_GET_ITEM(self->keys, i), i, h)`,
which needs a Python-object key list. For any array-keyed table (`keys_array_type != 0` — every typed
int/float/string/datetime fast path) it bails:

```c
// src/auto_map.c, grow_table()
if (size_old) {
    if (self->keys_array_type) {
        PyErr_SetString(PyExc_NotImplementedError, "Cannot grow table for array keys");
        goto restore;
    }
    ...
}
```

`AutoMap.append`/`extend` reject array-backed keys the same way. So the existing growth is inseparable
from the boxed/object path — exactly the slow path `factorize` avoids. Reusing it as-is would throw away
the typed-hashing speed.

## Proposed approach

An open-addressing rehash doesn't need the key **value** — only to re-place each `(keys_pos, hash)`
entry by re-probing. Generalize `grow_table`'s array-keyed branch to do this. Two viable variants:

1. **Rehash by stored hash** — replay the `hash & mask` → SCAN → reprobe sequence to the first empty
   slot. No key access at all; dtype-agnostic.
2. **Rehash via the typed probes** — since the entries reference `self->keys` (an array) at valid
   `keys_pos` indices, call the existing `lookup_hash_*(new_table, key_at(keys_pos), h, kat)` to find
   the new slot; it re-probes and resolves same-hash collisions via `==`. Reuses code we already rely on.

Then wire `factorize`:
- Start the scratch table at a modest size (e.g. `min(n, 4096)`-sized) rather than `n`.
- Add a growth trigger at the top of the element loop: when the running distinct count `k` reaches the
  load threshold (`table_size * LOAD`), grow (rehash) before the next lookup, so no re-probe of the
  current element is needed.

This is safe for the rest of the codebase: real `FrozenAutoMap`s are pre-sized and never regrow, so the
new array-keyed rehash branch is dead for them and changes no existing behavior.

## Tradeoffs

- Helps the common low-cardinality case (`k ≪ n`): ~10–15% and better cache behavior.
- The rare all-unique case (`k ≈ n`) does O(log n) rehashes (total O(n) extra copies), a modest
  regression. Mitigated by the `min(n, 4096)` starting size so low-`k` needs zero or one growth, and
  high-`k` grows geometrically (amortized O(n), constant factor small).

## Acceptance criteria

- `grow_table` (or a dedicated helper) can rehash an array-keyed table correctly for all supported KATs
  (int/uint/float/unicode/string/datetime), verified by the existing `factorize` round-trip and
  randomized parity tests.
- `factorize` scratch table starts small and grows; no correctness change (all `test/test_factorize.py`
  and the full suite pass).
- Benchmark shows improvement on low-`k` `factorize` (int/float) with no material regression on the
  all-unique case; report float vs `pandas.factorize` before/after.

## References

- #212 (the `factorize` primitive)
- `src/auto_map.c`: `grow_table` (rehash guard), `lookup_hash_*` (typed probes), `factorize`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

factorize: dynamic scratch-table sizing (generalize grow_table to rehash array keys) #214

Summary

Motivation

Why it isn't a free reuse of existing growth

Proposed approach

Tradeoffs

Acceptance criteria

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

factorize: dynamic scratch-table sizing (generalize grow_table to rehash array keys) #214

Description

Summary

Motivation

Why it isn't a free reuse of existing growth

Proposed approach

Tradeoffs

Acceptance criteria

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions