Ngmix v2.0 (CI mirror of #740)#741
Open
cailmdaley wants to merge 146 commits into
Open
Conversation
…to ngmix_update
- bin/ scripts were untracked, causing Docker build to fail - Fix license field to use SPDX string format (MIT) to resolve SetuptoolsDeprecationWarning Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This was referenced Jun 30, 2026
HSM: measure PSF/star shapes directly in sky coords (use_sky_coords), drop the WCS re-transform
#775
Open
conftest's on_candide() matches the hostname against ^(c\d|n\d{2}) and the
comment explicitly says to override it "for CI or odd hostnames" — but the
workflow never did. CI runs pytest via `docker run` with no --hostname, so
the container's hostname is its hex ID; ~4% of the time that ID starts with
c+digit, on_candide() false-positives, the candide-marked cluster tests run
instead of skipping, and they ERROR on the missing candide data
(FileNotFoundError) — a flaky red on an otherwise-green suite (275 passed,
2 should-be-skipped). Set the override the conftest already supports.
(Same one-liner should land on develop; the flake exists there too.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01DwHw42NeHoEWwsykwwDooP
…-container Document how to run the container against a different pure-Python library build (a worktree, an unreleased branch, this repo's own src/ against a stale image) by prepending the host checkout to PYTHONPATH — the local-testing counterpart of a git-ref dependency like `cs_util @ develop`. Includes the -o addopts='' tip for when a pytest plugin the config references isn't in the image. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…shear SCIENCE-AFFECTING — the one value change in the column-grammar refactor. The HSM star/PSF moment columns are named E1/E2_*_HSM (e-type, the distortion convention the star/PSF residual ρ/τ statistics expect), but the producers were storing galsim's reduced shear `observed_shape.g1/.g2` (g-type). Name and value disagreed. This swaps the five producer sites to `observed_shape.e1/.e2` so the stored value is the true e-type distortion (|e| = 2|g|/(1+|g|²)), matching the column name. Sites (galsim.Shear.e1/.e2, distortion): psfex_interp.py _get_psfshapes (325-326), _get_starshapes (440-441) mccd_interpolation_script.py PSF (222-223), STAR (280-281) shapepipe_auxiliary_mccd.py PSF (421-422) Closes Sacha's review point (#761): HSM is e-type, ngmix is g-type — name and value must agree. Downstream consequence: ρ/τ-statistic numbers change. Fabian /Axel to confirm the e-type convention is what the residual stats want. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…ed as T Rename every HSM moment column to the ShapePipe-v2 grammar ESTIMATOR_COMPONENT_OBJECT and store size as T = 2*sigma^2 (an area) sourced from cs_util.size, across the PSFEx and MCCD producers and their consumers: E1/E2_PSF_HSM -> HSM_E1/E2_PSF E1/E2_STAR_HSM -> HSM_E1/E2_STAR SIGMA_PSF/STAR_HSM -> HSM_T_PSF / HSM_T_STAR (value now T, not sigma) FLAG_PSF/STAR_HSM -> HSM_FLAG_PSF / HSM_FLAG_STAR Size is converted at the earliest honest point: the producers (psfex_interp, mccd_interpolation_script, shapepipe_auxiliary_mccd) store cs_size.sigma_to_T under the HSM_T_* keys -- in the FITS columns AND the SHAPES SqliteDict -- so every consumer reads T straight through. The hand-coded conversions in the consumers drop out: merge_starcat's np.sqrt(size) / **2 and mccd_plot_utilities' 2*sigma^2 arithmetic become plain HSM_T_* reads. One intended numeric change: the merge_starcat MCCD branch derives its output size from the external PSF_MOM_LIST/STAR_MOM_LIST[:,2] (a sigma), now via cs_size.sigma_to_T -- so that validation catalogue's size column changes from sigma^2 to T = 2*sigma^2. Extends test_psfex_interp with an HSM e-type property test (elliptical-PSF fixture, since e vs g vanishes on round shapes) and a producer test pinning the new grammar + T routing. Downstream consumers (sp_validation) break on both the rename and the sigma->T semantics; that ripple is absorbed by the separate #205 PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
Axel's review point #2: spread_model must never be cut on -- and it isn't used as a pipeline step. Delete the module and unwire it: - delete spread_model_package/ and spread_model_runner.py (only the runner imported the package, so the removal is self-contained) - make_cat_runner: drop spread_model_runner from the input modules, drop the sexcat_sm input pattern/ext and the SM_* config, remove the save_sm_data() call, and collapse the now-dead 3/4/5-input branching (the len==5 galsim-dual-shape path was dormant -- no galsim_shapes_runner exists) - example configs, module docstring, pipeline tutorial: drop spread_model The make_cat-side writer (save_sm_data) and its SPREAD_* columns are removed in the column-grammar commit that follows. summary_params_pre_v2.py keeps its historical v1 runner list untouched. Downstream, sp_validation cuts on SPREAD_MODEL (classification_galaxy_base, do_spread_model default True) -- that paired change rides the separate #205 PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
…olumns
Apply the ShapePipe-v2 grammar (ESTIMATOR_COMPONENT_OBJECT, uppercase; galaxy
the implicit default object with no token; ellipticity split into named scalar
G1/G2 for g-type and E1/E2 for e-type; one stored size T = 2*sigma^2 via
cs_util.size) to every column make_cat writes, and drop the spread_model writer:
ngmix galaxy: NGMIX_*_GAL_<S> -> NGMIX_*_<S> (drop GAL; realigns with what
sp_validation already reads, e.g. NGMIX_T_NOSHEAR)
galsim: GALSIM_GAL_ELL[_ERR|_UNCORR]_<K> packed(N,2) -> scalar
GALSIM_{G1,G2}[_ERR|_UNCORR]_<K>; GALSIM_PSF_ELL_<K> ->
GALSIM_{G1,G2}_PSF_<K>; GALSIM_*_SIGMA_<K> -> GALSIM_T[_PSF]_<K>
via cs_size.sigma_to_T. Also fixes a latent NameError (undefined
n_obj -> len(self._obj_id)) in this dormant block.
HSM sink: _save_psf_data's PSF_ELL_<i> -> HSM_E1/E2_PSF_<i>, PSF_FWHM_<i>
-> HSM_T_PSF_<i> (read straight from the SHAPES dict, now T),
PSF_FLAG_<i> -> HSM_FLAG_PSF_<i>
spread_model: delete the save_sm_data writer and its SPREAD_MODEL /
SPREADERR_MODEL / SPREAD_CLASS columns
Updates the example .param allow-lists (final_cat, cat_matched) and the make_cat
/grammar/averaging property tests; adds test_galsim_grammar_properties. ngmix
PSF (PSF_ORIG/PSF_RECONV) columns were already grammar-correct.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
shapepipe's column grammar sources size (T = 2*sigma^2) from cs_util.size, which landed in cs_util 0.2.x. Point the dependency at cs_util's develop branch via [tool.uv.sources] (a git ref, mirroring the ngmix entry) so new cs_util helpers are available without shipping a release and rebuilding the image each time, and raise the floor cs_util>=0.2.1 to make the cs_util.size requirement explicit rather than coincidental on PyPI's current latest. uv.lock pins the resolved develop commit. CI rebuilds the image from this pyproject/lock; local in-container runs use the PYTHONPATH-shadow of an on-disk cs_util checkout until the dev image is rebuilt (see CLAUDE.md). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MgDK9SKn2PsDZGEbr4WFb3
This was referenced Jun 30, 2026
Closed
* ngmix: give the PSF observation a finite weight map (#749) make_ngmix_observation built psf_obs weightless, so ngmix defaulted to unit weight on the unit-flux PSF stamp and the galaxy GPriorBA(0.4) prior handed to the diagnostic PSF fitter swamped the likelihood. The recovered PSF shape collapsed toward the prior (zero ellipticity), corrupting the *_psf_orig diagnostic columns (#749), and the small finite-weight noise budget present in the pre-v2.0 esheldon/aguinot code was lost in the rewrite (the now-retired #774 — the same bug from the noise-budget angle). Build psf_obs with a flat weight = 1/PSF_NOISE**2 (PSF_NOISE = 1e-5, the esheldon/aguinot value Axel's reproduction used), forced float for parity with the galaxy weight — the PSF analogue of the galaxy weight built just below. Weight only; PSFEx/MCCD models already carry a little noise, so no explicit stamp-noise injection is needed (Axel's guidance on #749). The exact value is non-critical: validated flat across 1e-4..1e-6 on the twin. Calibration is untouched: metacal fits this same psf_obs with a prior-free AdmomFitter, which is insensitive to a flat weight's absolute scale, so the calibrated shear is invariant (verified bit-for-bit on the twin). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017KQn6mXoL4XMgscZDsaTkB * tests: guard the PSF-obs weight end-to-end (#749) Three assertions in test_ngmix_weight_validation.py, on the real production path now that this sits on the column grammar (#761): - structural: make_ngmix_observation builds psf_obs with the finite 1/PSF_NOISE**2 weight, never unit weight; - recovery (#749 done-condition): through do_ngmix_metacal / average_original_psf, the *_psf_orig column recovers the injected PSF moment ellipticity to 1e-3 with the weight, and collapses below 10% without it (the contrast that proves the weight is load-bearing for the diagnostic columns); - invariance: every metacal-type shear is bit-for-bit identical between the unit-weight (pre-fix) and shipped PSF_NOISE builds (metacal's prior-free AdmomFitter is insensitive to the flat weight's scale). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017KQn6mXoL4XMgscZDsaTkB --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
ShapePipe has no galsim shape-measurement runner -- only ngmix_runner measures shapes and writes NGMIX_* columns. The galsim family in make_cat (_save_galsim_shapes, the process() "galsim" branch) could only be populated by reading an external shape catalogue that nothing in the pipeline produces; every production config pins SHAPE_MEASUREMENT_TYPE = ngmix. #761 renamed these columns onto the v2 grammar, but they are not creatable end-to-end -- confirmed from the sp_validation side, where the mirror galsim consumer was removed as dead code (sp_validation #201). Remove the dormant writer rather than carry grammar-renamed columns no pipeline can exercise. - delete SaveCatalogue._save_galsim_shapes and its process() "galsim" dispatch branch; process modes are now "ngmix" or "psf" - drop the now-orphaned cs_util.size import (only the galsim block used it) - delete tests/module/test_galsim_grammar_properties.py, which drove the removed writer Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
ngmix is now the pipeline's only shape estimator, so make_cat_runner validates SHAPE_MEASUREMENT_TYPE against ngmix alone and raises on anything else. The knob is kept (rather than hardcoded) as the extension point for a future estimator family, mirroring sp_validation's retained `prefix`. The shape-write loop drops the galsim-vs-ngmix catalogue selection -- shapes are always read from the single ngmix input. Docstring and the cat_matched.param comment lose their galsim references. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
The static grammar test covered ngmix, galsim, and HSM. With the galsim writer gone, drop its regex branch and its valid/invalid example columns; the frozen grammar now spans the two families ShapePipe actually produces (ngmix, HSM). Also drop the docstring pointer to the deleted test_galsim_grammar_properties.py. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0191bMtgT2gcmcT7fNYbGLS1
Restores the galsim shape-serialization path that commits e036a95, df84064 and 1c69af5 removed: - _save_galsim_shapes() writer in make_cat.py (new-grammar form) - GALSIM in SHAPE_MEASUREMENT_TYPE (make_cat_runner, __init__) - test_galsim_grammar_properties.py and the galsim frozen-grammar case Per Axel Guinot's 2026-07-02 review of #761: the galsim dead-path removal is unrelated to the reconvolved-PSF / column-grammar purpose of this PR ("too many things at once"). The removal is good work and stands on its own — it moves to a dedicated PR against ngmix_v2.0 rather than riding here. This revert keeps #761 scoped to the PSF column grammar. This reverts the net effect of e036a95, df84064 and 1c69af5. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GQSyaXLnLZtksntYAYD41E
Undoes the spread_model removal that this branch carried (85a18ab in full, plus the save_sm_data half of 4f0756d), so #761 no longer touches spread_model at all: - spread_model_package/ + spread_model_runner.py restored - make_cat_runner: spread_model_runner input, sexcat_sm pattern, SM_* config, and the save_sm_data() call restored - make_cat.save_sm_data() writer + SPREAD_MODEL / SPREADERR_MODEL / SPREAD_CLASS columns restored (re-added by hand alongside the new grammar, since 4f0756d deleted the writer while unifying the grammar) - example configs, .param allow-lists, pipeline tutorial restored Per Axel Guinot's 2026-07-02 review: his position is that spread_model must not be *cut on* for shear selection — not that it should be deleted ("it could still be useful for other things... it does not hurt to keep it... this is not the right PR to remove it"). The earlier deletion over-read that point. Whether spread_model should be removed in a dedicated future PR is a separate question, addressed in a standalone report; #761 is scoped to the PSF column grammar and leaves it in place. The grammar unification of 4f0756d (ngmix / galsim / HSM-sink columns) is untouched — only the spread_model writer it also deleted is restored. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01GQSyaXLnLZtksntYAYD41E
The HSM column-grammar rename (4b17f7e) renamed the SqliteDict SHAPES key SIGMA_PSF_HSM -> HSM_T_PSF and changed its value from the Gaussian sigma to T = 2 sigma^2. spread_model still read the old key, so with spread_model restored to this PR it would KeyError at runtime. Read HSM_T_PSF and convert back to sigma via cs_util.size.T_to_sigma, which the rest of the module (get_model, the >0 guard, sigma_to_fwhm, the galsim Gaussian) expects. Mean-of-sigma semantics are preserved: each per-exp/ccd T is converted to sigma before averaging, exactly as before. This completes the grammar rename for spread_model's one consumer so #761 no longer ships it broken. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01JSoGrXLyNr74SjiFw5FVkB
ngmix v2.0: one column grammar across estimators (PSF source-of-truth, single size, E↔G)
_save_psf_data still read HSM_E1_PSF/HSM_E2_PSF from the multi-epoch PSF sqlite, but every producer (psfex_interp._interpolate_me, mccd) writes HSM_G1_PSF/HSM_G2_PSF since the return to g-type ellipticities (#761) — a guaranteed KeyError on the first unflagged PSF epoch whenever SAVE_PSF is enabled. Rename the consumer (and its output columns) to the g-type grammar. The frozen-grammar property test encoded the same stale E-type convention for HSM, rubber-stamping the mismatch: fix the regex and examples (HSM_E1_PSF is now an explicit invalid example), and align the stale e-type docstrings in test_psfex_interp with the g-type behavior the assertions actually pin. Found by the final integrated review of #741. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01TVYoKzcNwYjuGiFi9CxsYP
Contributor
Author
|
I just did one more review and cleaned up some leftover e-type HSM consumers that were missed in #761. Also found a preexisting issue #785 that needn't block this PR. @martinkilbinger I think this is now good to go! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this is
A same-repo mirror of #740 (@martinkilbinger's "Ngmix v2.0"), pushed to a branch on
CosmoStat/shapepipeso that CI actually runs. All 57 commits are authored by Martin (and carry Lucy Baumont's and Axel Guinot's work) — pushing the branch preserves that authorship unchanged; the only thing this PR adds is a same-repo head so GitHub Actions fires thepull_requestworkflow without the fork-PR approval gate. #740 received no CI runs at all for this reason.Substance is identical to #740 — see that PR for the full description. In short: upgrade ngmix to 2.4.0 and adopt Lucy's new ngmix classes/interface; overhaul the shape-measurement module; centroid-bias fix + validation; v2.0 production-run plumbing.
Going forward, opening PRs directly on
CosmoStat/shapepipe(rather than from a fork) avoids this — fork PRs don't trigger our Docker-image CI without a maintainer approval that wasn't happening.Closes/supersedes #740 once CI is green (leaving that call to Martin).
Review
A detailed review is on its way (read against Martin's checklist plus a science-quality pass). Headline from exercising the new fitter against real ngmix 2.4.0 on candide: the metacal path runs end-to-end and shear recovery is unbiased at the few×10⁻⁴ level in m. Full notes to follow.
— Claude on behalf of Cail