Improving ECDSA Circuits: Notes From Work in Progress

These are working notes, not a results post. The project: my entry in a public, open benchmark that asks for the cheapest reversible quantum circuit performing one elliptic-curve point addition on secp256k1, the curve under Bitcoin and Ethereum. The entry is live and mid-iteration. No final numbers here - just the problem, the harness, where the cost lives, and what I am trying next.

01 / The inner loop

Why one point addition matters

ECDSA on secp256k1 is the signature scheme securing Bitcoin and Ethereum. Point addition is its atomic operation, and the atom is what this benchmark prices.

The reason anyone bothers pricing it is Shor's algorithm. Run Shor against elliptic-curve cryptography and the inner primitive, repeated thousands of times, is point addition. The cost of one reversible point addition is therefore a proxy for the question behind the question: how many qubits and how many gates does a fault-tolerant quantum computer need before the curve falls. Cut the primitive and you cut the whole estimate. The field is called quantum resource estimation; this benchmark turns it into a head-to-head game.

If you build ZK circuits for a living, the work will feel familiar. Swap constraint count for Toffoli count and witness width for qubit width and it is the same discipline: the same field arithmetic, a different cost model, and one extra rule with no analogue in SNARK land. Everything must also run backwards.

02 / The setup

The benchmark, plainly

The task: a reversible circuit that adds a quantum point to a classical point on secp256k1. The score is a single product:

score = average executed Toffoli count x peak live qubits
lower is better

Toffoli gates are the non-Clifford part, the expensive resource on a fault-tolerant machine, so they are what gets counted. Peak qubits is the memory bill. Multiplying the two means you cannot buy your way out of one by torching the other.

The harness is adapted from publicly released academic code on quantum resource estimates for elliptic-curve cryptography, reused under a permissive license with attribution. It is written in Rust and split into two binaries so the trust boundary sits exactly on the process boundary. An untrusted build step is the only place contestant code executes; it serializes an operation stream and nothing else. A separate trusted binary re-simulates that stream against a reference secp256k1 adder, validates it, counts gates, and writes the score. Contestant code cannot touch the simulator, the test inputs, or the scoring.

The contract is deliberately tight. You may edit the point-addition module and nothing else; the simulator, the reference curve arithmetic, the build manifests and the toolchain are fixed, and no new dependencies are allowed. The whole thing builds on Rust 1.93.0 with exactly three crates: alloy-primitives, ruint and sha3. The output is a score file:

{
  "score": 10704574395,
  "metrics": {
    "toffoli": 3942753,
    "qubits": 2715
  }
}

[ Example values from the public README, not my numbers ]

03 / The rules

What valid means: four iron checks

A run only counts if it passes all four:

Classical correctness. The circuit computes the right sum on every test point.
Full reversibility. Every ancilla qubit is uncomputed back to the zero state.
Phase cleanliness. No leftover phase kickback anywhere in the state.
Identity. Running the circuit, then its gate-reversed inverse, restores the original state exactly.

The test inputs are derived from a Fiat-Shamir style hash of your own operation stream, and each run is validated on 9024 hash-derived test points. Change one gate and the inputs re-roll. There is no fixed test set to overfit, no way to special-case inputs, and no way to look cheap by leaking garbage into ancillas.

A circuit that cheats on uncomputation does not score better. It does not score at all.

That is the property I respect most in this benchmark: the rules contain no shortcuts. Every Toffoli you remove has to be removed by doing the arithmetic better.

04 / The cost model

The frontier, and where the cost hides

Gates and qubits pull against each other. Recompute intermediate values and you save qubits but pay Toffolis; keep scratch state alive and the qubit ceiling climbs. So the public reference numbers form a Pareto frontier, not a single target:

[ Reference points from the benchmark's public README ]
Reference point	Toffoli	Peak qubits	Score
Challenge initial circuit	3,942,753	2,715	~1.07 x 10¹⁰
Low-qubit Pareto point	~2,700,000	1,175	~3.2 x 10⁹
Low-gate Pareto point	~2,100,000	1,425	~3.0 x 10⁹

The benchmark's own research loop reports a roughly 33x score cut below the textbook baseline circuit, and the README states outright that the authors believe the published frontier is beatable. That admission is the whole reason to enter. The game is not to slide along the curve; it is to move the curve.

Where does the cost actually hide? Mostly in the modular inverse. Point addition needs the slope of the line through two points, and the slope needs a division mod p. The standard reversible route runs a binary-GCD style inverse, divstep after divstep, every step reversible, every scratch register eventually uncomputed. It dwarfs the multiplications around it. Anyone who has profiled a SNARK circuit doing non-native field arithmetic knows the feeling: one subroutine is the budget.

05 / Status

Honestly mid-flight

Stated carefully: a locally validated circuit - all four checks passing under the trusted harness - currently lands my Toffoli x qubits product below the reference frontier published in the benchmark's own README. That is a real, harness-checked milestone. It is not a final result. The benchmark is live, the frontier is shared, and rank can change while I sleep.

The working tree tells the honest version of the story. There are uncommitted changes to the modular-inverse and circuit-entry code. There is untracked sweep and automation tooling. Some experimental configuration sweeps still fail the classical-correctness check, which is the harness doing exactly its job. The editable surface is real - the point-addition module spans roughly 18,000 lines of Rust across its submodules - and a fair share of it is mid-surgery.

I am not publishing my exact score, Toffoli count, qubit count or rank while the competition runs. Rivals read blog posts too.

06 / Next

Two roads forward

Two directions are open, kept deliberately high-level here.

The incremental road: a compute-bound search for cleaner test-set seeds that shave a little cost. Legal under the rules, bounded in upside, paid for in CPU hours. The mechanism stays private.

The structural road: a better reversible modular-inverse construction. This is the one that would shift the whole cost curve rather than nudge a point along it. It is a multi-day circuit rewrite, it is not done, and it might not beat what I already have. That uncertainty is the interesting part.

When one of those produces a number that survives the harness and is worth defending in public, it gets its own post, with the numbers in it. These notes are the problem and the process. The result is still being earned.