feat(cli): add offline block-building benchmark subcommand#497
feat(cli): add offline block-building benchmark subcommand#497pablodeymo wants to merge 3 commits into
Conversation
`extend_proofs_greedily` kept its remaining candidate proofs in a `HashSet<usize>` and picked the best-coverage proof via `max_by_key` over the set's randomized iteration order, so equal-coverage ties were broken arbitrarily per process: the same store state could produce blocks with different aggregation bits from one run to the next. Iterate candidates in index order and break coverage ties toward the lowest index (pool insertion order), making block building reproducible for a given pool. Found by the offline block-building benchmark's same-seed determinism gate.
`ethlambda benchmark synthetic --mock-crypto` measures block building as executed when the node proposes, without a devnet: it builds a synthetic in-memory chain (seed-derived validators, fixed genesis time, no wall-clock dependence), seeds the pending attestation pool each slot the way gossip aggregates arrive, and drives produce_block_with_signatures — the same entry BlockChainServer::propose_block uses — importing each built block so every iteration builds one slot ahead of head like a live proposer. Supports the 'Optimize block building' roadmap item (#465) with reproducible offline measurements; design and roadmap (real-crypto pools, replay-from-datadir) in docs/plans/block-building-benchmark.md. Per-iteration select_payloads/compact/stf_simulate durations come from delta-ing the existing phase histogram's sample sums between iterations (exact: sums accumulate raw f64 seconds and each phase observes exactly once per build; a count-delta assertion turns accounting drift into a hard error). The report (human table or pipe-clean JSON on stdout, logs on stderr) has min/mean/p50/p90/max per phase, the unattributed preamble overhead, per-iteration block roots as a determinism checksum, and environment capture including the resolved leansig revision parsed from Cargo.lock at build time, since leansig tracks a moving branch. The CLI keeps every existing flat node invocation working unchanged: subcommand_negates_reqs + args_conflicts_with_subcommands with the seven node-required arguments as Option<T> + required = true preserves clap's native missing-argument errors for the node path while letting the subcommand parse without them (pinned by unit tests). NEW_PAYLOAD_CAP is exported from ethlambda-storage so the harness rejects --proofs-per-data batches the pending pool would silently evict whole. Adds a 'make bench' target and a seconds-fast mock-crypto smoke step to the CI Test job validating the JSON output contract.
Greptile SummaryThis PR adds
Confidence Score: 4/5Safe to merge; the node runtime path is unchanged except the tie-break fix, which only affects previously-random equal-coverage cases, and all existing CLI invocations are pinned by tests. The benchmark harness is additive and isolated from production code paths. The block_builder.rs change is small, well-justified, and covered by the determinism contract enforced by the harness itself. The CLI restructuring is thoroughly tested. Three comments were left: a suggested .max(0.0) guard on the overhead_seconds computation, a more order-independent Cargo.lock parser in build.rs, and a note about calling blocking benchmark work on the async executor thread. bin/ethlambda/src/benchmark/mod.rs (overhead clamp), bin/ethlambda/build.rs (Cargo.lock field-order assumption). All other files are straightforward.
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/block_builder.rs | Fixes non-deterministic tie-breaking in extend_proofs_greedily by switching remaining_indices from HashSet to Vec and using (count, Reverse(idx)) in max_by_key; the logic is correct and the O(n) retain is fine given pool caps. |
| bin/ethlambda/src/benchmark/mod.rs | Core benchmark harness; phase-delta accounting with hard count assertions is robust; overhead_seconds computed as wall minus sum-of-phases without a max(0) guard could theoretically surface a tiny negative value in the report. |
| bin/ethlambda/src/benchmark/corpus.rs | Deterministic synthetic corpus; participant_groups correctly partitions validators with step_by; splitmix64 PRNG avoids an external rand dependency cleanly. |
| bin/ethlambda/src/benchmark/report.rs | Statistics and report serialization are correct; population variance (÷n) is appropriate for a closed sample set; nearest-rank percentile is well-tested. |
| bin/ethlambda/src/cli.rs | CLI restructuring with subcommand_negates_reqs + args_conflicts_with_subcommands is idiomatic clap 4; six unit tests thoroughly pin backwards-compat for all invocation shapes. |
| bin/ethlambda/src/main.rs | Early benchmark dispatch mirrors the existing HIVE test-driver pattern; require_arg helper correctly surfaces CLI contract violations; node path unwraps are safe after the benchmark branch returns. |
| bin/ethlambda/build.rs | Leansig revision extraction from Cargo.lock via manual line parsing assumes name appears before source in each [[package]] block; TOML field order in Cargo.lock is stable in practice but not formally guaranteed; falls back gracefully to unknown. |
| crates/storage/src/store.rs | NEW_PAYLOAD_CAP visibility widened from private to pub; change is minimal and the added doc comment explains the rationale clearly. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[ethlambda start] --> B{CLI parse}
B -->|subcommand present| C[Set WARN log to stderr]
B -->|no subcommand| D[Set INFO log to stdout]
C --> E[metrics::init]
D --> E
E -->|benchmark subcommand| F[benchmark::run - synchronous]
E -->|no subcommand| G[unwrap required node args]
F --> H[run_synthetic]
H --> I{mock_crypto?}
I -->|no| J[Runtime error: not implemented]
I -->|yes| K[Build SyntheticCorpus + Store]
K --> L[Loop: warmup + measured slots]
L --> M[seed_pool slot-1]
M --> N[phase_snapshot BEFORE]
N --> O[produce_block_with_signatures]
O --> P[phase_snapshot AFTER]
P --> Q[phase_deltas: assert count==1 per phase]
Q --> R[on_block_without_verification import]
R --> S{measured slot?}
S -->|yes| T[Push Sample with overhead=wall-sum phases]
S -->|no warmup| U[Next slot]
T --> U
U --> L
L --> V[Build Report]
V --> W{format?}
W -->|human| X[println human table]
W -->|json| Y[println JSON]
X --> Z{output path?}
Y --> Z
Z -->|yes| AA[fs::write JSON to file]
Z -->|no| AB[Done]
G --> AC[node startup...]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[ethlambda start] --> B{CLI parse}
B -->|subcommand present| C[Set WARN log to stderr]
B -->|no subcommand| D[Set INFO log to stdout]
C --> E[metrics::init]
D --> E
E -->|benchmark subcommand| F[benchmark::run - synchronous]
E -->|no subcommand| G[unwrap required node args]
F --> H[run_synthetic]
H --> I{mock_crypto?}
I -->|no| J[Runtime error: not implemented]
I -->|yes| K[Build SyntheticCorpus + Store]
K --> L[Loop: warmup + measured slots]
L --> M[seed_pool slot-1]
M --> N[phase_snapshot BEFORE]
N --> O[produce_block_with_signatures]
O --> P[phase_snapshot AFTER]
P --> Q[phase_deltas: assert count==1 per phase]
Q --> R[on_block_without_verification import]
R --> S{measured slot?}
S -->|yes| T[Push Sample with overhead=wall-sum phases]
S -->|no warmup| U[Next slot]
T --> U
U --> L
L --> V[Build Report]
V --> W{format?}
W -->|human| X[println human table]
W -->|json| Y[println JSON]
X --> Z{output path?}
Y --> Z
Z -->|yes| AA[fs::write JSON to file]
Z -->|no| AB[Done]
G --> AC[node startup...]
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
bin/ethlambda/src/benchmark/mod.rs:184-185
The overhead is computed as `wall_seconds − Σphases`. Both values derive from `Instant` (the histogram observations record `elapsed.as_secs_f64()` durations captured inside `produce_block_with_signatures`), and the preamble work not covered by any phase should always make the remainder positive. However, floating-point subtraction is not associative: with many small phase values, the accumulated sum can round slightly above the wall measurement in pathological cases, producing a small negative `overhead_seconds` that would show up misleadingly in the JSON report and the statistics table. Clamping to zero costs nothing and prevents confusing outputs.
```suggestion
if measured {
let overhead_seconds = (wall_seconds - phases.values().sum::<f64>()).max(0.0);
```
### Issue 2 of 3
bin/ethlambda/build.rs:40-58
The parser sets `in_leansig_package = true` only after seeing `name = "leansig"`, then looks for `source = ...` in subsequent lines. TOML does not mandate field order within a table, so a future `cargo` version or lock-file reformatter that emits `source` before `name` would silently produce "unknown" for `ETHLAMBDA_LEANSIG_REV`. The failure is graceful but invisible — a benchmark report with `"leansig_rev": "unknown"` is still compared against other reports, potentially leading to false equivalence. A two-pass parse (collect both fields per package block before extracting the rev) would be more robust.
```suggestion
fn leansig_rev_from_lockfile() -> Option<String> {
let lockfile = std::fs::read_to_string(workspace_lockfile()?).ok()?;
// Collect both fields per [[package]] block before extracting the rev,
// so the result is independent of TOML field order within a table.
let mut pending_name: Option<String> = None;
let mut pending_source: Option<String> = None;
for line in lockfile.lines() {
let line = line.trim();
if line == "[[package]]" {
pending_name = None;
pending_source = None;
} else if let Some(name) = line.strip_prefix("name = ") {
pending_name = Some(name.trim_matches('"').to_string());
} else if let Some(source) = line.strip_prefix("source = ") {
pending_source = Some(source.trim_matches('"').to_string());
}
if pending_name.as_deref() == Some("leansig") {
if let Some(ref src) = pending_source {
// source = "git+https://github.com/leanEthereum/leanSig?branch=devnet4#<rev>"
let rev = src.rsplit('#').next()?;
return Some(rev.to_string());
}
}
}
None
}
```
### Issue 3 of 3
bin/ethlambda/src/main.rs:105-107
**Blocking work on the async executor thread**
`benchmark::run` is a purely synchronous, compute-intensive function (it loops for `warmup_slots + iterations` block-building cycles, each potentially involving CPU-heavy state transitions) called directly inside an `async fn main` decorated with `#[tokio::main]`. This parks the tokio worker thread for the entire benchmark duration. It is safe today because no other async tasks are alive at this point, but it violates the tokio contract that work on executor threads should yield promptly. Consider wrapping the call in `tokio::task::block_in_place(|| benchmark::run(benchmark_options))` or refactoring `main` to have a synchronous entry point for benchmark mode.
Reviews (1): Last reviewed commit: "feat(cli): add offline block-building be..." | Re-trigger Greptile
| if measured { | ||
| let overhead_seconds = wall_seconds - phases.values().sum::<f64>(); |
There was a problem hiding this comment.
The overhead is computed as
wall_seconds − Σphases. Both values derive from Instant (the histogram observations record elapsed.as_secs_f64() durations captured inside produce_block_with_signatures), and the preamble work not covered by any phase should always make the remainder positive. However, floating-point subtraction is not associative: with many small phase values, the accumulated sum can round slightly above the wall measurement in pathological cases, producing a small negative overhead_seconds that would show up misleadingly in the JSON report and the statistics table. Clamping to zero costs nothing and prevents confusing outputs.
| if measured { | |
| let overhead_seconds = wall_seconds - phases.values().sum::<f64>(); | |
| if measured { | |
| let overhead_seconds = (wall_seconds - phases.values().sum::<f64>()).max(0.0); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/benchmark/mod.rs
Line: 184-185
Comment:
The overhead is computed as `wall_seconds − Σphases`. Both values derive from `Instant` (the histogram observations record `elapsed.as_secs_f64()` durations captured inside `produce_block_with_signatures`), and the preamble work not covered by any phase should always make the remainder positive. However, floating-point subtraction is not associative: with many small phase values, the accumulated sum can round slightly above the wall measurement in pathological cases, producing a small negative `overhead_seconds` that would show up misleadingly in the JSON report and the statistics table. Clamping to zero costs nothing and prevents confusing outputs.
```suggestion
if measured {
let overhead_seconds = (wall_seconds - phases.values().sum::<f64>()).max(0.0);
```
How can I resolve this? If you propose a fix, please make it concise.| fn leansig_rev_from_lockfile() -> Option<String> { | ||
| let lockfile = std::fs::read_to_string(workspace_lockfile()?).ok()?; | ||
| let mut in_leansig_package = false; | ||
| for line in lockfile.lines() { | ||
| let line = line.trim(); | ||
| if line == "[[package]]" { | ||
| in_leansig_package = false; | ||
| } else if line == "name = \"leansig\"" { | ||
| in_leansig_package = true; | ||
| } else if in_leansig_package { | ||
| // source = "git+https://github.com/leanEthereum/leanSig?branch=devnet4#<rev>" | ||
| if let Some(source) = line.strip_prefix("source = ") { | ||
| let rev = source.trim_matches('"').rsplit('#').next()?; | ||
| return Some(rev.to_string()); | ||
| } | ||
| } | ||
| } | ||
| None | ||
| } |
There was a problem hiding this comment.
The parser sets
in_leansig_package = true only after seeing name = "leansig", then looks for source = ... in subsequent lines. TOML does not mandate field order within a table, so a future cargo version or lock-file reformatter that emits source before name would silently produce "unknown" for ETHLAMBDA_LEANSIG_REV. The failure is graceful but invisible — a benchmark report with "leansig_rev": "unknown" is still compared against other reports, potentially leading to false equivalence. A two-pass parse (collect both fields per package block before extracting the rev) would be more robust.
| fn leansig_rev_from_lockfile() -> Option<String> { | |
| let lockfile = std::fs::read_to_string(workspace_lockfile()?).ok()?; | |
| let mut in_leansig_package = false; | |
| for line in lockfile.lines() { | |
| let line = line.trim(); | |
| if line == "[[package]]" { | |
| in_leansig_package = false; | |
| } else if line == "name = \"leansig\"" { | |
| in_leansig_package = true; | |
| } else if in_leansig_package { | |
| // source = "git+https://github.com/leanEthereum/leanSig?branch=devnet4#<rev>" | |
| if let Some(source) = line.strip_prefix("source = ") { | |
| let rev = source.trim_matches('"').rsplit('#').next()?; | |
| return Some(rev.to_string()); | |
| } | |
| } | |
| } | |
| None | |
| } | |
| fn leansig_rev_from_lockfile() -> Option<String> { | |
| let lockfile = std::fs::read_to_string(workspace_lockfile()?).ok()?; | |
| // Collect both fields per [[package]] block before extracting the rev, | |
| // so the result is independent of TOML field order within a table. | |
| let mut pending_name: Option<String> = None; | |
| let mut pending_source: Option<String> = None; | |
| for line in lockfile.lines() { | |
| let line = line.trim(); | |
| if line == "[[package]]" { | |
| pending_name = None; | |
| pending_source = None; | |
| } else if let Some(name) = line.strip_prefix("name = ") { | |
| pending_name = Some(name.trim_matches('"').to_string()); | |
| } else if let Some(source) = line.strip_prefix("source = ") { | |
| pending_source = Some(source.trim_matches('"').to_string()); | |
| } | |
| if pending_name.as_deref() == Some("leansig") { | |
| if let Some(ref src) = pending_source { | |
| // source = "git+https://github.com/leanEthereum/leanSig?branch=devnet4#<rev>" | |
| let rev = src.rsplit('#').next()?; | |
| return Some(rev.to_string()); | |
| } | |
| } | |
| } | |
| None | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/build.rs
Line: 40-58
Comment:
The parser sets `in_leansig_package = true` only after seeing `name = "leansig"`, then looks for `source = ...` in subsequent lines. TOML does not mandate field order within a table, so a future `cargo` version or lock-file reformatter that emits `source` before `name` would silently produce "unknown" for `ETHLAMBDA_LEANSIG_REV`. The failure is graceful but invisible — a benchmark report with `"leansig_rev": "unknown"` is still compared against other reports, potentially leading to false equivalence. A two-pass parse (collect both fields per package block before extracting the rev) would be more robust.
```suggestion
fn leansig_rev_from_lockfile() -> Option<String> {
let lockfile = std::fs::read_to_string(workspace_lockfile()?).ok()?;
// Collect both fields per [[package]] block before extracting the rev,
// so the result is independent of TOML field order within a table.
let mut pending_name: Option<String> = None;
let mut pending_source: Option<String> = None;
for line in lockfile.lines() {
let line = line.trim();
if line == "[[package]]" {
pending_name = None;
pending_source = None;
} else if let Some(name) = line.strip_prefix("name = ") {
pending_name = Some(name.trim_matches('"').to_string());
} else if let Some(source) = line.strip_prefix("source = ") {
pending_source = Some(source.trim_matches('"').to_string());
}
if pending_name.as_deref() == Some("leansig") {
if let Some(ref src) = pending_source {
// source = "git+https://github.com/leanEthereum/leanSig?branch=devnet4#<rev>"
let rev = src.rsplit('#').next()?;
return Some(rev.to_string());
}
}
}
None
}
```
How can I resolve this? If you propose a fix, please make it concise.| if let Some(cli::Command::Benchmark(benchmark_options)) = options.command { | ||
| return benchmark::run(benchmark_options); | ||
| } |
There was a problem hiding this comment.
Blocking work on the async executor thread
benchmark::run is a purely synchronous, compute-intensive function (it loops for warmup_slots + iterations block-building cycles, each potentially involving CPU-heavy state transitions) called directly inside an async fn main decorated with #[tokio::main]. This parks the tokio worker thread for the entire benchmark duration. It is safe today because no other async tasks are alive at this point, but it violates the tokio contract that work on executor threads should yield promptly. Consider wrapping the call in tokio::task::block_in_place(|| benchmark::run(benchmark_options)) or refactoring main to have a synchronous entry point for benchmark mode.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/main.rs
Line: 105-107
Comment:
**Blocking work on the async executor thread**
`benchmark::run` is a purely synchronous, compute-intensive function (it loops for `warmup_slots + iterations` block-building cycles, each potentially involving CPU-heavy state transitions) called directly inside an `async fn main` decorated with `#[tokio::main]`. This parks the tokio worker thread for the entire benchmark duration. It is safe today because no other async tasks are alive at this point, but it violates the tokio contract that work on executor threads should yield promptly. Consider wrapping the call in `tokio::task::block_in_place(|| benchmark::run(benchmark_options))` or refactoring `main` to have a synchronous entry point for benchmark mode.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Resolves conflicts with #484 (shadow sim-cost flags): keep both new bin dependencies (ethlambda-crypto for shadow, ethlambda-metrics for the benchmark), and run init_shadow_cost after the relocated tracing-subscriber setup so its info log stays captured, matching main's ordering.
🗒️ Description / Motivation
Adds
ethlambda benchmark synthetic— an offline harness that measures block building exactly as executed when the node proposes, against a reproducible synthetic workload, with no devnet required."Optimize block building" (#465) is the top roadmap item, but today the only observability is Prometheus histograms on a live devnet: noisy, not reproducible, and unable to compare an optimization against a baseline. This PR is milestone M1 (mock-crypto mode) of the design in
docs/plans/block-building-benchmark.md; M2 adds real XMSS/leanVM pools and post-build seal-phase measurement, M3 adds replay-from-datadir.What Changed
bin/ethlambda/src/benchmark/{mod,corpus,report}.rsInMemoryBackend, per-slot pool seeding into the pending pool (promoted by the proposal tick, as in production), iteration loop drivingproduce_block_with_signatures+ block import; stats + human/JSON reportbin/ethlambda/src/cli.rssubcommand_negates_reqs+args_conflicts_with_subcommands; the 7 node-required args becomeOption<T>+required = true(clap's missing-arg errors preserved); unit tests pin flat-invocation compatbin/ethlambda/src/main.rscrates/blockchain/src/block_builder.rsextend_proofs_greedilybroke equal-coverage proof ties by randomizedHashSetiteration order — block aggregation bits differed run-to-run on a live node; ties now break to the lowest pool indexcrates/storage/{lib,store}.rsNEW_PAYLOAD_CAPso the harness rejects--proofs-per-databatches the pending pool would evict wholebin/ethlambda/build.rsCargo.lockinto reports (leansig tracks the movingdevnet4branch)Makefile,.github/workflows/ci.ymlmake bench; seconds-fast mock smoke step in the Test job validating the JSON contractdocs/plans/block-building-benchmark.mdCorrectness / Behavior Guarantees
cli.rsunit tests covering parsing, missing-arg errors, and mixed-invocation rejection.select_payloads/compact/stf_simulatecome from sample-sum deltas of the existing phase histogram (sums accumulate raw f64 seconds; each phase observes once per build); a count-delta assertion turns any accounting drift into a hard error. Skipped/unattributed time is reported asoverhead, never hidden.--proofs-per-databeyond the pool cap, warmup+iterations overflow, post-seed eviction.Tests Added / Run
make fmt,make lint— clean;cargo test --workspace --release— all passing (spec tests included).... --format json | jq -e '.schema_version == 1 and (.samples | length == 3)'.Related Issues / PRs
docs/plans/block-building-benchmark.md.✅ Verification Checklist
make fmt— cleanmake lint(clippy with-D warnings) — cleancargo test --workspace --release— all passing