Skip to content

Ngmix v2.0 (CI mirror of #740)#741

Open
cailmdaley wants to merge 146 commits into
developfrom
ngmix_v2.0
Open

Ngmix v2.0 (CI mirror of #740)#741
cailmdaley wants to merge 146 commits into
developfrom
ngmix_v2.0

Conversation

@cailmdaley

Copy link
Copy Markdown
Contributor

What this is

A same-repo mirror of #740 (@martinkilbinger's "Ngmix v2.0"), pushed to a branch on CosmoStat/shapepipe so that CI actually runs. All 57 commits are authored by Martin (and carry Lucy Baumont's and Axel Guinot's work) — pushing the branch preserves that authorship unchanged; the only thing this PR adds is a same-repo head so GitHub Actions fires the pull_request workflow without the fork-PR approval gate. #740 received no CI runs at all for this reason.

Substance is identical to #740 — see that PR for the full description. In short: upgrade ngmix to 2.4.0 and adopt Lucy's new ngmix classes/interface; overhaul the shape-measurement module; centroid-bias fix + validation; v2.0 production-run plumbing.

Going forward, opening PRs directly on CosmoStat/shapepipe (rather than from a fork) avoids this — fork PRs don't trigger our Docker-image CI without a maintainer approval that wasn't happening.

Closes/supersedes #740 once CI is green (leaving that call to Martin).

Review

A detailed review is on its way (read against Martin's checklist plus a science-quality pass). Headline from exercising the new fitter against real ngmix 2.4.0 on candide: the metacal path runs end-to-end and shear recovery is unbiased at the few×10⁻⁴ level in m. Full notes to follow.

— Claude on behalf of Cail

Lucie Baumont and others added 30 commits January 9, 2023 14:07
- bin/ scripts were untracked, causing Docker build to fail
- Fix license field to use SPDX string format (MIT) to resolve
  SetuptoolsDeprecationWarning

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cailmdaley and others added 7 commits June 30, 2026 10:59
conftest's on_candide() matches the hostname against ^(c\d|n\d{2}) and the
comment explicitly says to override it "for CI or odd hostnames" — but the
workflow never did. CI runs pytest via `docker run` with no --hostname, so
the container's hostname is its hex ID; ~4% of the time that ID starts with
c+digit, on_candide() false-positives, the candide-marked cluster tests run
instead of skipping, and they ERROR on the missing candide data
(FileNotFoundError) — a flaky red on an otherwise-green suite (275 passed,
2 should-be-skipped). Set the override the conftest already supports.

(Same one-liner should land on develop; the flake exists there too.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01DwHw42NeHoEWwsykwwDooP
…-container

Document how to run the container against a different pure-Python library build
(a worktree, an unreleased branch, this repo's own src/ against a stale image)
by prepending the host checkout to PYTHONPATH — the local-testing counterpart of
a git-ref dependency like `cs_util @ develop`. Includes the -o addopts='' tip for
when a pytest plugin the config references isn't in the image.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…shear

SCIENCE-AFFECTING — the one value change in the column-grammar refactor.

The HSM star/PSF moment columns are named E1/E2_*_HSM (e-type, the distortion
convention the star/PSF residual ρ/τ statistics expect), but the producers were
storing galsim's reduced shear `observed_shape.g1/.g2` (g-type). Name and value
disagreed. This swaps the five producer sites to `observed_shape.e1/.e2` so the
stored value is the true e-type distortion (|e| = 2|g|/(1+|g|²)), matching the
column name.

Sites (galsim.Shear.e1/.e2, distortion):
  psfex_interp.py                 _get_psfshapes (325-326), _get_starshapes (440-441)
  mccd_interpolation_script.py    PSF (222-223), STAR (280-281)
  shapepipe_auxiliary_mccd.py     PSF (421-422)

Closes Sacha's review point (#761): HSM is e-type, ngmix is g-type — name and
value must agree. Downstream consequence: ρ/τ-statistic numbers change. Fabian
/Axel to confirm the e-type convention is what the residual stats want.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…ed as T

Rename every HSM moment column to the ShapePipe-v2 grammar
ESTIMATOR_COMPONENT_OBJECT and store size as T = 2*sigma^2 (an area) sourced
from cs_util.size, across the PSFEx and MCCD producers and their consumers:

  E1/E2_PSF_HSM       -> HSM_E1/E2_PSF
  E1/E2_STAR_HSM      -> HSM_E1/E2_STAR
  SIGMA_PSF/STAR_HSM  -> HSM_T_PSF / HSM_T_STAR   (value now T, not sigma)
  FLAG_PSF/STAR_HSM   -> HSM_FLAG_PSF / HSM_FLAG_STAR

Size is converted at the earliest honest point: the producers (psfex_interp,
mccd_interpolation_script, shapepipe_auxiliary_mccd) store cs_size.sigma_to_T
under the HSM_T_* keys -- in the FITS columns AND the SHAPES SqliteDict -- so
every consumer reads T straight through. The hand-coded conversions in the
consumers drop out: merge_starcat's np.sqrt(size) / **2 and mccd_plot_utilities'
2*sigma^2 arithmetic become plain HSM_T_* reads.

One intended numeric change: the merge_starcat MCCD branch derives its output
size from the external PSF_MOM_LIST/STAR_MOM_LIST[:,2] (a sigma), now via
cs_size.sigma_to_T -- so that validation catalogue's size column changes from
sigma^2 to T = 2*sigma^2.

Extends test_psfex_interp with an HSM e-type property test (elliptical-PSF
fixture, since e vs g vanishes on round shapes) and a producer test pinning the
new grammar + T routing.

Downstream consumers (sp_validation) break on both the rename and the sigma->T
semantics; that ripple is absorbed by the separate #205 PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
Axel's review point #2: spread_model must never be cut on -- and it isn't used
as a pipeline step. Delete the module and unwire it:

  - delete spread_model_package/ and spread_model_runner.py (only the runner
    imported the package, so the removal is self-contained)
  - make_cat_runner: drop spread_model_runner from the input modules, drop the
    sexcat_sm input pattern/ext and the SM_* config, remove the save_sm_data()
    call, and collapse the now-dead 3/4/5-input branching (the len==5
    galsim-dual-shape path was dormant -- no galsim_shapes_runner exists)
  - example configs, module docstring, pipeline tutorial: drop spread_model

The make_cat-side writer (save_sm_data) and its SPREAD_* columns are removed in
the column-grammar commit that follows. summary_params_pre_v2.py keeps its
historical v1 runner list untouched.

Downstream, sp_validation cuts on SPREAD_MODEL (classification_galaxy_base,
do_spread_model default True) -- that paired change rides the separate #205 PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…olumns

Apply the ShapePipe-v2 grammar (ESTIMATOR_COMPONENT_OBJECT, uppercase; galaxy
the implicit default object with no token; ellipticity split into named scalar
G1/G2 for g-type and E1/E2 for e-type; one stored size T = 2*sigma^2 via
cs_util.size) to every column make_cat writes, and drop the spread_model writer:

  ngmix galaxy: NGMIX_*_GAL_<S> -> NGMIX_*_<S>  (drop GAL; realigns with what
                sp_validation already reads, e.g. NGMIX_T_NOSHEAR)
  galsim:       GALSIM_GAL_ELL[_ERR|_UNCORR]_<K> packed(N,2) -> scalar
                GALSIM_{G1,G2}[_ERR|_UNCORR]_<K>; GALSIM_PSF_ELL_<K> ->
                GALSIM_{G1,G2}_PSF_<K>; GALSIM_*_SIGMA_<K> -> GALSIM_T[_PSF]_<K>
                via cs_size.sigma_to_T. Also fixes a latent NameError (undefined
                n_obj -> len(self._obj_id)) in this dormant block.
  HSM sink:     _save_psf_data's PSF_ELL_<i> -> HSM_E1/E2_PSF_<i>, PSF_FWHM_<i>
                -> HSM_T_PSF_<i> (read straight from the SHAPES dict, now T),
                PSF_FLAG_<i> -> HSM_FLAG_PSF_<i>
  spread_model: delete the save_sm_data writer and its SPREAD_MODEL /
                SPREADERR_MODEL / SPREAD_CLASS columns

Updates the example .param allow-lists (final_cat, cat_matched) and the make_cat
/grammar/averaging property tests; adds test_galsim_grammar_properties. ngmix
PSF (PSF_ORIG/PSF_RECONV) columns were already grammar-correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
shapepipe's column grammar sources size (T = 2*sigma^2) from cs_util.size, which
landed in cs_util 0.2.x. Point the dependency at cs_util's develop branch via
[tool.uv.sources] (a git ref, mirroring the ngmix entry) so new cs_util helpers
are available without shipping a release and rebuilding the image each time, and
raise the floor cs_util>=0.2.1 to make the cs_util.size requirement explicit
rather than coincidental on PyPI's current latest. uv.lock pins the resolved
develop commit.

CI rebuilds the image from this pyproject/lock; local in-container runs use the
PYTHONPATH-shadow of an on-disk cs_util checkout until the dev image is rebuilt
(see CLAUDE.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
sachaguer and others added 7 commits July 1, 2026 08:51
* ngmix: give the PSF observation a finite weight map (#749)

make_ngmix_observation built psf_obs weightless, so ngmix defaulted to
unit weight on the unit-flux PSF stamp and the galaxy GPriorBA(0.4) prior
handed to the diagnostic PSF fitter swamped the likelihood. The recovered
PSF shape collapsed toward the prior (zero ellipticity), corrupting the
*_psf_orig diagnostic columns (#749), and the small finite-weight noise
budget present in the pre-v2.0 esheldon/aguinot code was lost in the
rewrite (the now-retired #774 — the same bug from the noise-budget angle).

Build psf_obs with a flat weight = 1/PSF_NOISE**2 (PSF_NOISE = 1e-5, the
esheldon/aguinot value Axel's reproduction used), forced float for parity
with the galaxy weight — the PSF analogue of the galaxy weight built just
below. Weight only; PSFEx/MCCD models already carry a little noise, so no
explicit stamp-noise injection is needed (Axel's guidance on #749). The
exact value is non-critical: validated flat across 1e-4..1e-6 on the twin.

Calibration is untouched: metacal fits this same psf_obs with a prior-free
AdmomFitter, which is insensitive to a flat weight's absolute scale, so the
calibrated shear is invariant (verified bit-for-bit on the twin).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017KQn6mXoL4XMgscZDsaTkB

* tests: guard the PSF-obs weight end-to-end (#749)

Three assertions in test_ngmix_weight_validation.py, on the real production
path now that this sits on the column grammar (#761):

- structural: make_ngmix_observation builds psf_obs with the finite
  1/PSF_NOISE**2 weight, never unit weight;
- recovery (#749 done-condition): through do_ngmix_metacal /
  average_original_psf, the *_psf_orig column recovers the injected PSF moment
  ellipticity to 1e-3 with the weight, and collapses below 10% without it (the
  contrast that proves the weight is load-bearing for the diagnostic columns);
- invariance: every metacal-type shear is bit-for-bit identical between the
  unit-weight (pre-fix) and shipped PSF_NOISE builds (metacal's prior-free
  AdmomFitter is insensitive to the flat weight's scale).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017KQn6mXoL4XMgscZDsaTkB

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
@cailmdaley cailmdaley mentioned this pull request Jul 1, 2026
12 tasks
cailmdaley and others added 8 commits July 1, 2026 18:33
ShapePipe has no galsim shape-measurement runner -- only ngmix_runner
measures shapes and writes NGMIX_* columns. The galsim family in make_cat
(_save_galsim_shapes, the process() "galsim" branch) could only be
populated by reading an external shape catalogue that nothing in the
pipeline produces; every production config pins SHAPE_MEASUREMENT_TYPE =
ngmix. #761 renamed these columns onto the v2 grammar, but they are not
creatable end-to-end -- confirmed from the sp_validation side, where the
mirror galsim consumer was removed as dead code (sp_validation #201).
Remove the dormant writer rather than carry grammar-renamed columns no
pipeline can exercise.

  - delete SaveCatalogue._save_galsim_shapes and its process() "galsim"
    dispatch branch; process modes are now "ngmix" or "psf"
  - drop the now-orphaned cs_util.size import (only the galsim block used it)
  - delete tests/module/test_galsim_grammar_properties.py, which drove the
    removed writer

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
ngmix is now the pipeline's only shape estimator, so make_cat_runner
validates SHAPE_MEASUREMENT_TYPE against ngmix alone and raises on
anything else. The knob is kept (rather than hardcoded) as the extension
point for a future estimator family, mirroring sp_validation's retained
`prefix`. The shape-write loop drops the galsim-vs-ngmix catalogue
selection -- shapes are always read from the single ngmix input. Docstring
and the cat_matched.param comment lose their galsim references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
The static grammar test covered ngmix, galsim, and HSM. With the galsim
writer gone, drop its regex branch and its valid/invalid example columns;
the frozen grammar now spans the two families ShapePipe actually produces
(ngmix, HSM). Also drop the docstring pointer to the deleted
test_galsim_grammar_properties.py.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
Restores the galsim shape-serialization path that commits e036a95,
df84064 and 1c69af5 removed:
  - _save_galsim_shapes() writer in make_cat.py (new-grammar form)
  - GALSIM in SHAPE_MEASUREMENT_TYPE (make_cat_runner, __init__)
  - test_galsim_grammar_properties.py and the galsim frozen-grammar case

Per Axel Guinot's 2026-07-02 review of #761: the galsim dead-path removal
is unrelated to the reconvolved-PSF / column-grammar purpose of this PR
("too many things at once"). The removal is good work and stands on its
own — it moves to a dedicated PR against ngmix_v2.0 rather than riding
here. This revert keeps #761 scoped to the PSF column grammar.

This reverts the net effect of e036a95, df84064 and 1c69af5.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GQSyaXLnLZtksntYAYD41E
Undoes the spread_model removal that this branch carried (85a18ab in
full, plus the save_sm_data half of 4f0756d), so #761 no longer touches
spread_model at all:
  - spread_model_package/ + spread_model_runner.py restored
  - make_cat_runner: spread_model_runner input, sexcat_sm pattern, SM_*
    config, and the save_sm_data() call restored
  - make_cat.save_sm_data() writer + SPREAD_MODEL / SPREADERR_MODEL /
    SPREAD_CLASS columns restored (re-added by hand alongside the new
    grammar, since 4f0756d deleted the writer while unifying the grammar)
  - example configs, .param allow-lists, pipeline tutorial restored

Per Axel Guinot's 2026-07-02 review: his position is that spread_model
must not be *cut on* for shear selection — not that it should be deleted
("it could still be useful for other things... it does not hurt to keep
it... this is not the right PR to remove it"). The earlier deletion
over-read that point. Whether spread_model should be removed in a
dedicated future PR is a separate question, addressed in a standalone
report; #761 is scoped to the PSF column grammar and leaves it in place.

The grammar unification of 4f0756d (ngmix / galsim / HSM-sink columns)
is untouched — only the spread_model writer it also deleted is restored.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01GQSyaXLnLZtksntYAYD41E
The HSM column-grammar rename (4b17f7e) renamed the SqliteDict SHAPES
key SIGMA_PSF_HSM -> HSM_T_PSF and changed its value from the Gaussian
sigma to T = 2 sigma^2. spread_model still read the old key, so with
spread_model restored to this PR it would KeyError at runtime.

Read HSM_T_PSF and convert back to sigma via cs_util.size.T_to_sigma,
which the rest of the module (get_model, the >0 guard, sigma_to_fwhm,
the galsim Gaussian) expects. Mean-of-sigma semantics are preserved:
each per-exp/ccd T is converted to sigma before averaging, exactly as
before. This completes the grammar rename for spread_model's one
consumer so #761 no longer ships it broken.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01JSoGrXLyNr74SjiFw5FVkB
ngmix v2.0: one column grammar across estimators (PSF source-of-truth, single size, E↔G)
_save_psf_data still read HSM_E1_PSF/HSM_E2_PSF from the multi-epoch
PSF sqlite, but every producer (psfex_interp._interpolate_me, mccd)
writes HSM_G1_PSF/HSM_G2_PSF since the return to g-type ellipticities
(#761) — a guaranteed KeyError on the first unflagged PSF epoch
whenever SAVE_PSF is enabled. Rename the consumer (and its output
columns) to the g-type grammar.

The frozen-grammar property test encoded the same stale E-type
convention for HSM, rubber-stamping the mismatch: fix the regex and
examples (HSM_E1_PSF is now an explicit invalid example), and align
the stale e-type docstrings in test_psfex_interp with the g-type
behavior the assertions actually pin.

Found by the final integrated review of #741.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01TVYoKzcNwYjuGiFi9CxsYP
@cailmdaley

Copy link
Copy Markdown
Contributor Author

I just did one more review and cleaned up some leftover e-type HSM consumers that were missed in #761. Also found a preexisting issue #785 that needn't block this PR. @martinkilbinger I think this is now good to go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ngmix v2.0 — a correct, validated v2.0 shape catalogue

7 participants