Building "Wall Street of AI Agents": How I Crammed a Multi-Agent Financial Simulation into a 1B Parameter Model

Community Article Published June 12, 2026

Most AI agent frameworks are built to do your taxes, scrape websites, or summarize boring email threads. But when I saw the prompt for the Hugging Face Build Small Hackathon—specifically the "Thousand Token Wood" track, which asked us to build something whimsical, delightful, and weird—I knew I didn't want to build another B2B productivity tool.

I wanted to build a reality TV show.

The result is Wall Street of AI Agents, a fully autonomous, high-frequency trading floor simulation where four AI agents with clashing personalities trade fake money, eavesdrop on each other, and panic during market crashes.

And the craziest part? The cognitive engine that drives all four agents runs locally on a single small model in the 1B to 4B parameter range. No cloud APIs at all.

▶️ Play the Live Simulation on Hugging Face Spaces
🎬 Watch the Demo Video: https://youtu.be/1XZuUsiwuTA

Here are my field notes on how I escaped the bloat of modern agentic frameworks, broke out of Gradio's default UI, and forced a tiny language model to output perfect JSON.

The Concept: A Live-Action Behavioral Benchmark

Sarah leads the leaderboard at $10,700. Alice is arguing with Mike in the hallway. Alex is alone in the Office. The market is Stagnant. Panic ensues.

The premise of the simulation is simple. Four traders, each starting with $10,000, share a retro pixel-art office:

Alice (The Quant): Cold, calculating, trades purely on tech signals.
Sarah (The Bear Shorter): Conservative, loves bonds, waits for the world to burn.
Alex (The Crypto Degen): Aggressive, buys the top, believes everything is a bull run.
Mike (The Hype Trader): Trades entirely on breaking news; panics instantly.

Every few seconds, the global market shifts (Tech Boom, Crypto Frenzy, Stagnant, etc.). The agents read the news, look at who is in the room with them, and make a trade.
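The article doesn't show the tick logic, so here is a minimal sketch under loud assumptions: the per-regime returns, the extra "Crash" regime, and the `shift_regime`/`apply_tick` helpers are all hypothetical, and each agent is assumed to put its whole balance into whatever it picked for one tick.

```python
import random

# Hypothetical per-tick returns for each asset under each regime.
# "Tech Boom", "Crypto Frenzy" and "Stagnant" come from the article;
# "Crash" and every number here are illustrative assumptions.
REGIME_RETURNS = {
    "Tech Boom":     {"Tech": 0.08,  "Crypto": 0.02,  "Bonds": 0.00,  "Hold": 0.0},
    "Crypto Frenzy": {"Tech": 0.01,  "Crypto": 0.15,  "Bonds": -0.01, "Hold": 0.0},
    "Stagnant":      {"Tech": -0.01, "Crypto": -0.03, "Bonds": 0.01,  "Hold": 0.0},
    "Crash":         {"Tech": -0.20, "Crypto": -0.40, "Bonds": 0.05,  "Hold": 0.0},
}

STARTING_CASH = 10_000.0


def shift_regime(rng: random.Random) -> str:
    """Pick the next global market regime at random."""
    return rng.choice(list(REGIME_RETURNS))


def apply_tick(balance: float, trade: str, regime: str) -> float:
    """New balance after holding the chosen asset for one tick (floored at 0 = bankrupt)."""
    new_balance = balance * (1.0 + REGIME_RETURNS[regime][trade])
    return max(new_balance, 0.0)


if __name__ == "__main__":
    rng = random.Random(42)
    balances = {name: STARTING_CASH for name in ("Alice", "Sarah", "Alex", "Mike")}
    trades = {"Alice": "Tech", "Sarah": "Bonds", "Alex": "Crypto", "Mike": "Hold"}
    regime = shift_regime(rng)
    for name, trade in trades.items():
        balances[name] = apply_tick(balances[name], trade, regime)
    print(regime, balances)
```

Putting all the randomness in the regime (rather than in each trade) keeps the LLM's decision as the only thing that separates a winner from a loser on any given tick.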

But beneath the retro graphics, this is actually a lightweight visual benchmark for LLM reasoning. By trapping a tiny model in a high-stakes financial simulation, you are stress-testing three things:

Instruction Following: Can the model output strict JSON while maintaining a persona?
Regime Adaptation: Will Sarah the Bear recognize the crash and buy bonds to get rich, while Alex the Crypto Degen stubbornly holds and goes bankrupt?
Spatial Social Dynamics: Can agents hold a logical conversation based only on who is physically standing next to them?

Challenge 1: The "Anti-Framework" Architecture

If you want to build a multi-agent system today, the default advice is to use heavy orchestration frameworks like AutoGen or CrewAI. For a highly visual, fast-paced game running on edge hardware, these frameworks are far too bloated.

Instead, I built a Spatially Aware Polling System using FastAPI and a lightweight SQLite database.

There is no complex "Message Bus" routing messages between agents. Instead, physical proximity drives the conversation. Before an agent takes a turn, the Python engine queries the database: "Who is physically standing in the VC Office right now?"
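That proximity lookup can be sketched with Python's built-in sqlite3 module. The article doesn't publish its schema, so the `agents` table (`name`, `location`, `last_speech`) and the `make_db`/`roommates` helpers below are hypothetical; the location names come from the trade schema used later.

```python
from __future__ import annotations

import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table of where each agent stands and the last thing they said."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agents ("
        " name TEXT PRIMARY KEY,"
        " location TEXT NOT NULL,"
        " last_speech TEXT)"
    )
    return conn


def roommates(conn: sqlite3.Connection, agent: str) -> list[tuple[str, str | None]]:
    """Return (name, last_speech) for every *other* agent in the same location."""
    return conn.execute(
        """
        SELECT other.name, other.last_speech
        FROM agents AS me
        JOIN agents AS other ON other.location = me.location
        WHERE me.name = ? AND other.name != me.name
        ORDER BY other.name
        """,
        (agent,),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.executemany(
        "INSERT INTO agents VALUES (?, ?, ?)",
        [
            ("Sarah", "VC Office", "Hold cash for safety."),
            ("Alex", "VC Office", None),
            ("Alice", "Coffee Shop", "Tech signals look strong."),
            ("Mike", "Coffee Shop", None),
        ],
    )
    print(roommates(conn, "Alex"))  # [('Sarah', 'Hold cash for safety.')]
```

A self-join keeps the whole "who is next to me?" question in one query, so the engine never has to load every agent into Python just to filter by room.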

If Sarah is in the room with Alex, the backend intercepts her last spoken sentence and injects it directly into Alex's prompt:

“Sarah is here. She just said: 'Hold cash for safety.' INSTRUCTION: Reply directly to what she said.”

This creates organic, localized conversations where agents actively influence each other's trades, mimicking a real trading floor.
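Given the list of (name, last line) pairs for everyone sharing the room, the injection step is plain string assembly. A minimal sketch: the wording is adapted from the example above, while `build_social_context`, `build_prompt`, the persona text, and the headline are hypothetical.

```python
from __future__ import annotations


def build_social_context(roommates: list[tuple[str, str | None]]) -> str:
    """Render who is in the room (and what they last said) as prompt text."""
    if not roommates:
        return "You are alone in the room."
    lines = []
    for name, last_speech in roommates:
        if last_speech:
            lines.append(
                f"{name} is here. They just said: '{last_speech}' "
                f"INSTRUCTION: Reply directly to what {name} said."
            )
        else:
            lines.append(f"{name} is here but hasn't said anything yet.")
    return "\n".join(lines)


def build_prompt(persona: str, headline: str, roommates: list[tuple[str, str | None]]) -> str:
    """Combine persona, market news, and the local conversation into one turn prompt."""
    return "\n".join(
        [
            persona,
            f"Market news: {headline}",
            build_social_context(roommates),
            "Decide your trade, what you say out loud, your private thought, and where to go next.",
        ]
    )


if __name__ == "__main__":
    print(build_prompt(
        "You are Alex, an aggressive crypto degen who believes everything is a bull run.",
        "Crypto Frenzy: tokens are ripping.",
        [("Sarah", "Hold cash for safety.")],
    ))
```

Because only roommates are injected, an agent alone in the Office gets a much shorter prompt, which also helps a small model on a CPU.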

Challenge 2: The Chaos Button

I added a giant red button to the dashboard: ⚡ Trigger Chaos. As the "Producer" of the show, I can click this button at any time. It injects a catastrophic headline ("Anonymous leak reveals massive data breach!") and crashes the market.

AI agents can go bankrupt, too: one wrong decision in the wrong market regime can wipe out everything. Watching them react in real time is hilarious. But the real challenge was forcing a 1B model to process that context without breaking its output format.
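The Chaos Button boils down to a producer override of the shared market state. A minimal sketch, assuming a hypothetical `MarketState` container, a made-up "Crash" regime name, and illustrative crash multipliers (Crypto is zeroed out here purely to show a total wipe-out); only the headline text comes from the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CHAOS_HEADLINE = "Anonymous leak reveals massive data breach!"

# Hypothetical multipliers applied to each open position when chaos hits.
CRASH_MULTIPLIER = {"Tech": 0.7, "Crypto": 0.0, "Bonds": 1.05, "Hold": 1.0}


@dataclass
class MarketState:
    regime: str = "Stagnant"
    headline: str = "Markets are quiet."
    balances: dict[str, float] = field(default_factory=dict)
    positions: dict[str, str] = field(default_factory=dict)  # agent -> last trade


def trigger_chaos(state: MarketState) -> None:
    """Producer override: force a crash, broadcast the headline, reprice positions."""
    state.regime = "Crash"  # hypothetical regime name
    state.headline = CHAOS_HEADLINE
    for name, asset in state.positions.items():
        state.balances[name] *= CRASH_MULTIPLIER[asset]


def bankrupt_agents(state: MarketState) -> list[str]:
    """Agents whose balance has hit zero are out of the game."""
    return sorted(name for name, balance in state.balances.items() if balance <= 0.0)


if __name__ == "__main__":
    state = MarketState(
        balances={"Sarah": 10_000.0, "Alex": 10_000.0},
        positions={"Sarah": "Bonds", "Alex": "Crypto"},
    )
    trigger_chaos(state)
    print(state.headline, state.balances, bankrupt_agents(state))
```

On the next turn, every agent's prompt picks up the new headline, which is exactly the extra context the small model then has to digest without breaking its output.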

Challenge 3: Forcing a Tiny Titan to Behave

Running a continuous game loop on a free CPU Space is hard because 1B and 4B models (like OpenBMB's MiniCPM5-1B or NVIDIA's Nemotron-3-Nano-4B) are notoriously bad at producing clean JSON. They hallucinate brackets, wrap the output in Markdown, and politely add "Sure! Here's your trading decision:" before the data. Any one of these quirks breaks a standard JSON parser and crashes the game loop.
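To see why these quirks are fatal, here is what Python's standard json module does with a few raw outputs of the kinds described above. The sample strings and the `count_parse_failures` helper are invented for illustration.

```python
import json

FENCE = "`" * 3  # built at runtime so this snippet's own code fence stays intact

# Invented examples of the three failure modes: chatty preamble,
# Markdown wrapping, and a hallucinated (here: missing) bracket.
BAD_OUTPUTS = [
    'Sure! Here\'s your trading decision: {"trade": "Crypto", "speech": "To the moon!"}',
    FENCE + 'json\n{"trade": "Tech", "speech": "Buy the dip."}\n' + FENCE,
    '{"trade": "Bonds", "speech": "Told you so."',  # closing brace never arrives
]


def count_parse_failures(outputs: list) -> int:
    """Count how many raw model outputs a plain json.loads() call rejects."""
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures += 1
    return failures


if __name__ == "__main__":
    print(count_parse_failures(BAD_OUTPUTS))  # 3
```

Each of these raises `json.JSONDecodeError`, so an unguarded loop dies on the first chatty reply.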

I didn't try to sanitize the text in Python with messy regex. Instead, I prevented the malformed output in C++, at the token-sampling level.

Using the llama-cpp-python runtime, I turned on native JSON Schema enforcement. When a strict, Pydantic-style schema is passed through the response_format argument, llama-cpp-python converts it into a GBNF grammar, and llama.cpp's sampler masks out every token that would violate that grammar before a token is chosen.

```python
response_format={
    "type": "json_object",
    "schema": {
        "type": "object",
        "properties": {
            "trade": {"type": "string", "enum": ["Tech", "Crypto", "Bonds", "Hold"]},
            "speech": {"type": "string"},
            "thought": {"type": "string"},
            "location": {"type": "string", "enum": ["Startup Offices", "VC Office", "Coffee Shop", "News Room"]}
        },
        "required": ["trade", "speech", "thought", "location"]
    }
}
```

The model literally cannot sample a token that would break the JSON structure, and it must pick its trade (and its location) from the enum lists.
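The masking idea can be illustrated with a toy decoder. This is a deliberately simplified sketch, not llama.cpp's implementation: real grammar-constrained sampling runs over the model's tokenizer vocabulary and a full grammar compiled from the schema, whereas here the "vocabulary" is a handful of hand-picked string fragments, decoding happens inside the "trade" string value, and the only constraint is the trade enum. The helper names and logit values are hypothetical.

```python
from __future__ import annotations

import math

ALLOWED_TRADES = ["Tech", "Crypto", "Bonds", "Hold"]


def is_legal(prefix: str, token: str) -> bool:
    """A token is legal if prefix + token is still a prefix of some allowed value."""
    candidate = prefix + token
    return any(value.startswith(candidate) for value in ALLOWED_TRADES)


def constrained_pick(logits: dict[str, float], prefix: str) -> str:
    """Mask illegal tokens to -inf, then greedily take the best remaining token."""
    masked = {
        token: (score if is_legal(prefix, token) else -math.inf)
        for token, score in logits.items()
    }
    best = max(masked, key=masked.__getitem__)
    if masked[best] == -math.inf:
        raise ValueError(f"no legal continuation after {prefix!r}")
    return best


def decode_trade(step_logits: list[dict[str, float]]) -> str:
    """Decode one enum value, one toy token per step."""
    prefix = ""
    for logits in step_logits:
        prefix += constrained_pick(logits, prefix)
        if prefix in ALLOWED_TRADES:
            return prefix
    raise ValueError(f"ran out of steps with partial value {prefix!r}")


if __name__ == "__main__":
    # The model's unconstrained favourite at step 1 is the chatty "Sure";
    # masking removes it, so the best *legal* fragment "Cry" wins instead.
    steps = [
        {"Sure": 5.0, "{": 4.0, "Cry": 2.0, "Te": 1.5},
        {"!": 3.0, "ch": 2.0, "pto": 1.0},
    ]
    print(decode_trade(steps))  # Crypto
```

The model still chooses among legal options according to its own scores, which is why the persona survives: the grammar only removes the ways to be wrong.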

This constraint is what makes the project hum. It lets a 1B-parameter model punch far above its weight class, running four autonomous agents continuously on a basic 2-vCPU Hugging Face Space without a single crash.

Challenge 4: Breaking out of the Chatbox

Gradio is a fantastic tool for ML demos, but its default UI screams "AI Tool." I wanted this to feel like a video game.

To achieve the "Off-Brand" aesthetic, I bypassed the stock Gradio layout entirely. I built a 2D RPG environment in Phaser.js (bundled with Vite), mounted the static dist/ folder on a FastAPI route, and embedded the resulting HTML5 canvas in Gradio through an iframe inside a gr.HTML component.

The result is a Neubrutalist dashboard with three distinct columns:

The Leaderboard: Live tracking of who is going bankrupt.
The Game Canvas: The Phaser.js engine handling walk animations, anti-stuck pathfinding, and dynamic speech bubbles.
The Confession Booth: Because each agent's turn arrives as a structured JSON payload, I can split its public "speech" from its private "thought". Click an agent's sprite in the game and the panel shows its hidden anxieties, so you read the LLM's inner monologue while it publicly projects confidence to the other agents.
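The public/private split falls straight out of the schema. A minimal sketch: the field names match the response_format schema shown earlier, while `AgentTurn`, `parse_turn`, `speech_bubble`, `confession_booth`, and the sample payload are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class AgentTurn:
    name: str
    trade: str
    speech: str    # public: rendered as the sprite's speech bubble
    thought: str   # private: only shown in the Confession Booth
    location: str


def parse_turn(name: str, payload: str) -> AgentTurn:
    """Parse one schema-constrained JSON payload into a turn record."""
    data = json.loads(payload)
    return AgentTurn(
        name=name,
        trade=data["trade"],
        speech=data["speech"],
        thought=data["thought"],
        location=data["location"],
    )


def speech_bubble(turn: AgentTurn) -> str:
    """What the other agents (and the game canvas) get to see."""
    return f"{turn.name}: {turn.speech}"


def confession_booth(turn: AgentTurn) -> str:
    """What the viewer sees after clicking the agent's sprite."""
    return f"{turn.name} (thinking): {turn.thought}"


if __name__ == "__main__":
    payload = json.dumps({
        "trade": "Hold",
        "speech": "I'm totally calm about this crash.",
        "thought": "I should have bought bonds an hour ago.",
        "location": "Coffee Shop",
    })
    turn = parse_turn("Mike", payload)
    print(speech_bubble(turn))
    print(confession_booth(turn))
```

Only `speech` would be written back as an agent's last line for roommates to hear, so the private `thought` never leaks into another agent's prompt.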

The Takeaway: Small Models are the Future of Behavioral Simulation

We've spent the last year obsessed with massive frontier models serving as omniscient chatbots. But Wall Street of AI Agents shows there is a huge, untapped design space for tiny models acting as NPC brains.

By applying strict grammar constraints to a quantized GGUF model, you can run a localized, dynamic, and hilarious multi-agent simulation entirely in RAM.

The market doesn't care about your architecture choices—but your CPU definitely does.

Built for the Hugging Face Build Small Hackathon 2026. You can play the live simulation on Hugging Face Spaces, or check out the post on X: https://x.com/ashdebugs/status/2065443044833562840
