Cloud FPGA infrastructure for hardware-reasoning agents

Reward loops at the speed of open silicon.

Manhattan Reasoning runs the open-source Yosys and nextpnr toolchain on managed cloud FPGAs: synthesis, place-and-route, and on-device verification in seconds, not minutes. Built for agents learning hardware design through reinforcement learning with verifiable rewards.

100% open toolchain · Yosys · nextpnr
Seconds, not minutes, per build loop
8 NP-complete benchmark families
Built on the open hardware stack: Yosys · nextpnr · Verilator · Icarus · cocotb · SymbiYosys
// the bottleneck

RL agents starve on industry-grade wait times.

Reinforcement learning needs dense, fast feedback. A vendor cloud FPGA flow can take tens of minutes per build — fine for shipping a datacenter accelerator, fatal for an agent that needs thousands of reward signals an hour. We rebuilt the loop around lightweight open tooling.

Vendor cloud FPGA (e.g. AWS F2 / Vivado)
Minutes per build
  • × Proprietary, closed synthesis & PnR
  • × Long, opaque place-and-route times
  • × Per-instance licensing & cold starts
  • × Reward latency kills sample efficiency
Manhattan Reasoning (Yosys + nextpnr)
Seconds per build
  • Fully open, inspectable toolchain
  • Lightweight synth tuned for fast iteration
  • Warm pools — no cold start per rollout
  • Dense rewards → real sample efficiency
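To make the gap concrete, here is a back-of-the-envelope sketch. Both build times are illustrative assumptions, not measurements: 30 minutes stands in for "tens of minutes", and 3.1 seconds is the sum of the stage times in the example session shown later on this page.

```python
def builds_per_hour(seconds_per_build: float, workers: int = 1) -> float:
    """Reward signals per hour if each worker runs builds back to back."""
    return workers * 3600.0 / seconds_per_build

# Illustrative assumptions, not measured figures.
VENDOR_BUILD_S = 30 * 60   # "tens of minutes" per vendor build
OPEN_BUILD_S = 3.1         # synth 1.2 s + place&route 1.6 s + verify 0.3 s

vendor_rate = builds_per_hour(VENDOR_BUILD_S)
open_rate = builds_per_hour(OPEN_BUILD_S)
print(f"vendor flow: {vendor_rate:.0f} builds/hour per worker")
print(f"open flow:   {open_rate:.0f} builds/hour per worker")
```

Under these assumptions a single worker goes from a couple of rewards an hour to over a thousand.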

Fast build-and-reward loops

Synthesis, place-and-route, and bitstream generation on managed FPGAs, returned as structured reward signals over a single API call.

Verifiable rewards

Every submission is checked against a formal spec and on-device behavior — correctness, timing closure (Fmax), and resource usage (LUTs, FFs, DSPs).
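As a rough illustration of how correctness, timing, and resource usage could fold into one number, the sketch below gates the speed and area bonuses on correctness. The formula, weights, Fmax target, and LUT budget are all hypothetical; the platform's actual scoring is not described on this page.

```python
from dataclasses import dataclass

@dataclass
class BuildReport:
    correct: float   # fraction of graded instances passed, 0.0 to 1.0
    fmax_mhz: float  # achieved Fmax after place-and-route
    luts: int        # LUTs used by the design

def scalar_reward(r: BuildReport, target_mhz: float = 200.0, lut_budget: int = 1000,
                  w_correct: float = 0.5, w_speed: float = 0.3,
                  w_area: float = 0.2) -> float:
    """Hypothetical scalarization: correctness gates the speed and area bonuses."""
    if r.correct < 1.0:
        # Partial credit for correctness only; fast-but-wrong earns no bonus.
        return w_correct * r.correct
    speed = min(r.fmax_mhz / target_mhz, 1.0)      # saturates at the target
    area = max(1.0 - r.luts / lut_budget, 0.0)     # fewer LUTs scores higher
    return w_correct + w_speed * speed + w_area * area
```

Gating on correctness is one design choice among several; it keeps an agent from trading wrong answers for a higher clock.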

Open & reproducible

No black-box vendor flow. The same Yosys/nextpnr pipeline runs locally and in the cloud, so rollouts are deterministic and auditable.
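A local run of the same kind of flow might look like the sketch below, which only assembles the command lines. The target device (iCE40 HX8K, `ct256` package), the file names, and the fixed seed are assumptions for illustration; this page does not say which FPGA family the managed boards use.

```python
import shlex

def build_commands(verilog: str, top: str, pcf: str, seed: int = 1) -> list[list[str]]:
    """Return the yosys -> nextpnr -> icepack commands for one iCE40 build."""
    return [
        # Synthesize to a JSON netlist.
        ["yosys", "-q", "-p", f"synth_ice40 -top {top} -json build.json", verilog],
        # Place and route; pinning the seed keeps repeated runs comparable.
        ["nextpnr-ice40", "--hx8k", "--package", "ct256", "--seed", str(seed),
         "--json", "build.json", "--pcf", pcf, "--asc", "build.asc"],
        # Pack the textual bitstream into a binary one.
        ["icepack", "build.asc", "build.bin"],
    ]

cmds = build_commands("k_color.v", top="k_color", pcf="board.pcf")
for cmd in cmds:
    print(shlex.join(cmd))
    # To actually run the flow: subprocess.run(cmd, check=True)
```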

// the benchmark

The NP-Complete Suite

A graded benchmark of classic NP-complete problems, reframed as hardware-design tasks. Agents must reason from a formal specification to a synthesizable Verilog accelerator — then earn verifiable rewards for correctness, speed, and area.

8 problem families
3 reward axes
RLVR verifiable rewards
logic · difficulty ▰▰▱▱▱

Boolean Satisfiability

3-SAT · Cook–Levin canonical

Build a circuit that decides satisfiability of a 3-CNF formula and emits a witness assignment if one exists.

// spec: solve(φ) → (sat, x[])
module sat_solver #(parameter N=20)(
  input  clause clauses[M],
  output reg sat,
  output reg [N-1:0] assign_x
);
reward: correct·Fmax·area
verifier: formal
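When the circuit reports `sat`, the witness can be checked in software in linear time. The sketch below assumes a DIMACS-style clause encoding (literal `+i` for x_i, `-i` for its negation), which is not specified on this page. It does not cover the unsatisfiable case, which needs a proof or an independent solver.

```python
def satisfies(clauses: list[tuple[int, int, int]], assignment: list[bool]) -> bool:
    """Check a 3-CNF witness. Literal +i means x_i, -i means NOT x_i (1-indexed)."""
    def lit_true(lit: int) -> bool:
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value
    # Every clause needs at least one true literal.
    return all(any(lit_true(lit) for lit in clause) for clause in clauses)

# (x1 OR x2 OR NOT x3) AND (NOT x1 OR x2 OR x3)
phi = [(1, 2, -3), (-1, 2, 3)]
```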
routing · difficulty ▰▰▰▰▱

Traveling Salesman

TSP (decision) · tour ≤ K

Given a weighted graph, produce hardware that finds a Hamiltonian tour of total cost ≤ K, or proves none exists.

// dist[i][j] in BRAM, n ≤ 14
module tsp_decide (
  input  [15:0] dist [N][N],
  input  [31:0] K,
  output reg feasible,
  output [3:0] tour [N]
);
reward: correct·cycles
verifier: oracle
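A reported tour is cheap to validate even though finding one is not. The following sketch checks a candidate `tour` against the bound `K`; the function name and list-of-lists layout are illustrative assumptions, and it says nothing about how the suite's oracle handles the infeasible case.

```python
def tour_ok(dist: list[list[int]], tour: list[int], K: int) -> bool:
    """True if `tour` visits every city once and its closed length is <= K."""
    n = len(dist)
    if sorted(tour) != list(range(n)):
        return False  # a city is missing, repeated, or out of range
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost <= K

dist = [[0, 1, 4, 2],
        [1, 0, 3, 5],
        [4, 3, 0, 1],
        [2, 5, 1, 0]]
```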
graph · difficulty ▰▰▰▱▱

Graph K-Coloring

CHROMATIC-NUMBER · k=3

Assign one of k colors to each vertex so no edge is monochromatic. Stream the adjacency matrix; emit a valid coloring.

// adj is symmetric, k colors
module k_color #(parameter V=16, K=3)(
  input  adj [V][V],
  output reg ok,
  output [$clog2(K)-1:0] color [V]
);
reward: correct·area
verifier: checker
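A coloring checker of the kind this card implies can be very small. This is a minimal sketch, assuming the adjacency matrix and color vector arrive as plain Python lists; it is not the suite's own checker.

```python
def coloring_ok(adj: list[list[int]], color: list[int], k: int) -> bool:
    """True if every vertex has a color in [0, k) and no edge is monochromatic."""
    n = len(adj)
    if len(color) != n or any(not 0 <= c < k for c in color):
        return False
    # adj is symmetric, so checking the upper triangle is enough.
    return all(color[u] != color[v]
               for u in range(n) for v in range(u + 1, n) if adj[u][v])

triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]
```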
number · difficulty ▰▰▱▱▱

Subset Sum

SUBSET-SUM · target T

Decide whether any subset of a multiset of integers sums exactly to T, and return the selecting bitmask.

// classic DP → systolic array
module subset_sum #(parameter N=24)(
  input  [31:0] a [N],
  input  [31:0] T,
  output reg hit,
  output [N-1:0] pick
);
reward: correct·Fmax
verifier: oracle
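The "classic DP" in the spec comment has a compact software form, sketched below as a bitset recurrence alongside a check of the returned `pick` mask. It assumes non-negative integers and is practical only for small test values, since the bitset grows with the sum; the function names are illustrative.

```python
def subset_sum_reachable(a: list[int], T: int) -> bool:
    """Bitset DP: bit s of `reach` is set when some subset sums to s."""
    reach = 1  # only the empty sum 0 is reachable at first
    for x in a:
        reach |= reach << x  # either skip x or add it to every reachable sum
    return bool((reach >> T) & 1)

def pick_ok(a: list[int], T: int, pick: int) -> bool:
    """True if the elements selected by bitmask `pick` sum exactly to T."""
    return sum(x for i, x in enumerate(a) if (pick >> i) & 1) == T

a = [3, 34, 4, 12, 5, 2]
```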
graph · difficulty ▰▰▰▱▱

Minimum Vertex Cover

VERTEX-COVER · size ≤ k

Find a set of ≤ k vertices touching every edge. Reward scales with how tight a cover the agent can certify in hardware.

// edges streamed as (u,v) pairs
module vertex_cover #(parameter V=18)(
  input  edge e [E],
  input  [7:0] k,
  output reg covers,
  output [V-1:0] cover_set
);
reward: correct·tightness
verifier: checker
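Checking a claimed cover is a single pass over the edge list. The sketch below assumes `cover_set` is read back as an integer bitmask, matching the `[V-1:0]` output in the spec; the function name is hypothetical.

```python
def cover_ok(edges: list[tuple[int, int]], cover_set: int, k: int) -> bool:
    """True if at most k vertices are selected and every edge has a selected endpoint."""
    size = bin(cover_set).count("1")
    touches_all = all((cover_set >> u) & 1 or (cover_set >> v) & 1 for u, v in edges)
    return size <= k and touches_all

path = [(0, 1), (1, 2), (2, 3)]  # path graph 0-1-2-3
```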
number · difficulty ▰▰▱▱▱

0/1 Knapsack

KNAPSACK (decision) · value ≥ V

Maximize value under a weight budget. The agent designs a pipelined DP datapath and is scored on throughput.

// W budget, maximize value
module knapsack #(parameter N=20)(
  input  [15:0] w [N], v [N],
  input  [31:0] W,
  output [31:0] best,
  output [N-1:0] take
);
reward: value·Fmax
verifier: oracle
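For small instances, a software oracle can compute the optimum that `best` is compared against. The sketch below is the textbook 0/1 knapsack DP over capacities, offered as an assumption about what such an oracle might look like rather than the suite's implementation.

```python
def knapsack_best(w: list[int], v: list[int], W: int) -> int:
    """Maximum total value of items with total weight <= W (each item used at most once)."""
    best = [0] * (W + 1)
    for wi, vi in zip(w, v):
        # Iterate capacities downward so each item is counted at most once.
        for cap in range(W, wi - 1, -1):
            best[cap] = max(best[cap], best[cap - wi] + vi)
    return best[W]

weights = [1, 3, 4, 5]
values = [1, 4, 5, 7]
```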
graph · difficulty ▰▰▰▰▱

Hamiltonian Cycle

HAM-CYCLE · undirected

Decide whether the graph contains a cycle that visits every vertex exactly once, returning the cycle order when one exists.

// backtracking → FSM on FPGA
module ham_cycle #(parameter V=15)(
  input  adj [V][V],
  output reg exists,
  output [3:0] order [V]
);
reward: correct·cycles
verifier: checker
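A returned `order` can be validated by checking that it is a permutation of the vertices and that consecutive entries, including the wrap-around pair, are adjacent. This is a minimal sketch with illustrative names, intended for graphs with at least three vertices.

```python
def cycle_ok(adj: list[list[int]], order: list[int]) -> bool:
    """True if `order` is a permutation of the vertices forming a closed cycle in adj."""
    n = len(adj)
    if sorted(order) != list(range(n)):
        return False  # a vertex is missing, repeated, or out of range
    return all(adj[order[i]][order[(i + 1) % n]] for i in range(n))

square = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]  # cycle graph 0-1-2-3-0
```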
graph · difficulty ▰▰▰▰▰

Maximum Clique

CLIQUE · size ≥ k

Find the largest fully connected subgraph. This is the hardest family in the suite: the reward scores both correctness and clique size.

// bitset adjacency, popcount-heavy
module max_clique #(parameter V=20)(
  input  [V-1:0] adj [V],
  output [7:0] clique_size,
  output [V-1:0] members
);
reward: size·Fmax
verifier: checker
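The bitset layout in the spec maps directly onto integer masks in software. The sketch below checks that a reported `members` mask is a clique; it confirms validity only, not that the clique is maximum, and the function name is an assumption.

```python
def clique_ok(adj: list[int], members: int) -> bool:
    """adj[v] is a bitmask of v's neighbours; members is a bitmask of chosen vertices."""
    for v in range(len(adj)):
        if (members >> v) & 1:
            others = members & ~(1 << v)
            if others & ~adj[v]:
                return False  # some other member is not adjacent to v
    return True

# Edges: 0-1, 0-2, 1-2, 2-3
adj_bits = [0b0110, 0b0101, 0b1011, 0b0100]
clique_size = bin(0b0111).count("1")  # popcount of the members mask
```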
packing · difficulty ▰▰▰▱▱

Bin Packing

BIN-PACKING · bins ≤ B

Pack items into the fewest fixed-capacity bins. Agents trade area for parallelism in their packing datapath.

// capacity C, minimize bins
module bin_pack #(parameter N=24)(
  input  [15:0] size [N],
  input  [15:0] C,
  output [7:0] bins_used,
  output [7:0] bin_of [N]
);
reward: bins·area
verifier: checker
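Feasibility of a packing is easy to confirm from `bin_of` and `bins_used`. The sketch below checks capacity only; it does not prove that the bin count is minimal, and the names beyond those in the spec are illustrative.

```python
def packing_ok(size: list[int], C: int, bin_of: list[int], bins_used: int) -> bool:
    """True if every item sits in a bin in [0, bins_used) and no bin exceeds capacity C."""
    if len(bin_of) != len(size) or any(not 0 <= b < bins_used for b in bin_of):
        return False
    load = [0] * bins_used
    for s, b in zip(size, bin_of):
        load[b] += s
    return all(total <= C for total in load)

sizes = [4, 8, 1, 4, 2, 1]
```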
// the loop

Reinforcement learning with verifiable rewards.

Every rollout is a closed loop from policy to silicon and back — no human in the path, no ambiguous reward.

01

Agent emits Verilog

The policy reads a problem spec from the suite and proposes a synthesizable hardware module.

02

Cloud FPGA build

Yosys + nextpnr synthesize, place, route, and load the design onto a managed FPGA in seconds.

03

Verify on-device

Graded instances run on the device and are checked against a formal oracle for correctness; Fmax and resource usage are recorded alongside.

04

Dense reward

A structured, multi-axis reward returns to the trainer — verifiable, reproducible, and fast.

// the interface

One API call.
Spec in, reward out.

Drop Manhattan Reasoning into any RL trainer. Submit a candidate design against a benchmark instance and get back a structured reward with full build telemetry — correctness, timing, and area. No FPGA hardware to manage, no vendor toolchain to license.

Get an API key →
rollout.py — manhattan-reasoning
$ pip install manhattan-reasoning
>>> from manhattan import Env
>>> env = Env("npc-suite/graph-coloring", k=3)
>>> obs = env.reset()    # formal spec + instance
>>> rtl = policy(obs)    # agent emits Verilog
>>> r = env.step(rtl)
✓ synth        1.2s  yosys
✓ place&route  1.6s  nextpnr
✓ verify       0.3s  12/12 instances pass
reward = { correct: 1.0, fmax: 142MHz, luts: 318, ffs: 96, total: 0.87 }
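To show how the `reset()` / `step()` shape slots into a trainer, here is a best-of-n sampling sketch that runs without the service. `StubEnv` is a hypothetical local stand-in that scores a string rather than a real build, and the dict returned by `step()` is an assumption about the reward's structure based on the printout above.

```python
class StubEnv:
    """Hypothetical stand-in mirroring the reset()/step() shape; no FPGA involved."""
    def reset(self) -> str:
        return "spec: k_color, V=16, K=3"

    def step(self, rtl: str) -> dict:
        # Toy scoring rule for illustration only.
        correct = 1.0 if "module k_color" in rtl else 0.0
        return {"correct": correct, "total": 0.87 * correct}

def best_of_n(env, policy, n: int = 4) -> tuple[float, str]:
    """Sample n candidates for one spec and keep the highest-reward design."""
    obs = env.reset()
    results = []
    for _ in range(n):
        rtl = policy(obs)
        results.append((env.step(rtl)["total"], rtl))
    return max(results, key=lambda pair: pair[0])

reward, design = best_of_n(StubEnv(), lambda obs: "module k_color(); endmodule")
print(reward)
```

Swapping `StubEnv()` for a real environment object would leave `best_of_n` unchanged, provided the real `step()` returns a mapping with a `total` field.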

Train agents that reason in silicon.

We're onboarding research labs and frontier teams to the private beta of our cloud FPGA platform and the NP-Complete Suite.

Request beta access →