Six tiny minds in a skull: NPCs built on decision neuroscience, not a personality prompt

Community Article Published June 13, 2026

Most game NPCs that "use an LLM" are one chat model with a personality paragraph stapled to the front. Ask it for the key, it role-plays a refusal, you say the magic words, it role-plays a yes. The model is free to decide whatever reads well in the moment, which means nothing is really at stake — the outcome is vibes.

For the Hugging Face × Gradio hackathon I wanted the opposite: a character whose decision is a real computation I can open up and inspect, running entirely on small local models. So in mAIndlock, every NPC is a value-based decision network — six computational roles, each a real call to a tiny offline model, integrated the way decision neuroscience says a brain integrates them.

This post is about the neuroscience, why I picked the framing I did, and how a 1B model behaves differently once you fine-tune it for the job.

The trap I refused to walk into

The tempting shortcut is the triune brain: "lizard brain" vs. "emotional brain" vs. "rational neocortex." It's intuitive, it's everywhere in pop science, and neuroscience has rejected it since the 1970s. MacLean proposed it in the 1960s, but the evolutionary story it tells about brain layers is wrong. A hackathon full of ML researchers is precisely where that shortcut gets caught, so I built on two frames that actually hold up.

1. The value-based decision network (neuroeconomics). Real regions encode the subjective value of a choice in a common currency — vmPFC/OFC, ventral striatum, amygdala, ACC, insula — with the vmPFC/OFC acting as the integration hub (Bartra, McGuire & Kable, 2013).

2. Dual-system control (Daw, Niv & Dayan, 2005). Behavior arbitrates between a model-free habit system (dorsolateral striatum) and a model-based goal system (prefrontal cortex); the brain leans on whichever system's value estimate it trusts more.

The claim I make is deliberately narrow, and I say it out loud in the docs so no one has to guess at it:

I don't claim a brain region is a 1B model. I assign one small-model call per computational role in value-based decision making — following the neuroeconomic value-network and the dual-system account — and integrate them deterministically in the vmPFC, as the common-currency model describes.

The six roles

| Role | Computational job | Model |
| --- | --- | --- |
| Amygdala | fast threat / salience appraisal of the player's tone | MiniCPM 1B |
| Hippocampus | retrieve the one relevant memory + whether it leans TRUST or FEAR | MiniCPM 1B |
| Striatum | habitual expected reward of helping (model-free) | MiniCPM 1B |
| ACC | effort / cost / conflict: "is giving up the key worth it?" | MiniCPM 1B |
| vmPFC / OFC | integrate the four signals into one value (common currency) | deterministic |
| dlPFC | executive: plan, inhibit, speak in character (model-based) | Nemotron 3 Nano 4B |

Four MiniCPM-1B sensing calls, one Nemotron-4B voice, and a vmPFC that is not a model at all.
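As a rough sketch of how these six roles could be wired together (not the project's actual code): each sensing role is a stand-in callable where the game would make a MiniCPM call, `integrate` stands in for the deterministic vmPFC step, and `speak` stands in for the Nemotron call. The class name, field names, and stub return values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical type: a "region" reads the player's line and returns a short verdict.
Region = Callable[[str], str]


@dataclass
class Mind:
    sensors: Dict[str, Region]                  # amygdala, hippocampus, striatum, ACC
    integrate: Callable[[Dict[str, str]], int]  # vmPFC/OFC: plain code, not a model
    speak: Callable[[str, int], str]            # dlPFC: the in-character voice

    def respond(self, player_line: str) -> Tuple[Dict[str, str], int, str]:
        # 1. Each sensing role appraises the same line independently.
        signals = {name: region(player_line) for name, region in self.sensors.items()}
        # 2. The integrator folds the signals into one value.
        value = self.integrate(signals)
        # 3. The executive role turns the line and the value into a reply.
        return signals, value, self.speak(player_line, value)


# Stub regions so the sketch runs without any model loaded.
demo = Mind(
    sensors={
        "amygdala": lambda line: "2",
        "hippocampus": lambda line: "FAINT TRUST",
        "striatum": lambda line: "1",
        "acc": lambda line: "YES",
    },
    integrate=lambda signals: 0,  # placeholder, not the real formula
    speak=lambda line, value: f"(value={value}) ...",
)
signals, value, reply = demo.respond("May I have the key?")
```

Keeping the integrator as an ordinary function in the middle of the pipeline is what makes it possible to show the player exactly the number that was computed.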

Why the integrator is code, not a model

It's tempting to make the vmPFC a seventh LLM call that "weighs everything." I made it a transparent weighted sum instead:

value =  (4 − threat)          # threat 0→+4, 4→0, 10→−6
       +  reward               # −5..+5, the striatum's habit signal
       +  memory_term          # STRONG/FAINT × TRUST/FEAR → ±7 / ±3
       +  (worth == YES ? +2 : −1)
       → clamped to −10..+10

Two reasons. First, the common-currency integration in neuroeconomics is exactly this kind of deterministic fold, so it's the more faithful choice — not a compromise. Second, it means the skull panel can never lie about the outcome: the number the player sees in the vmPFC is the number that moves the relationship. A 1B model asked to "vibe a final score" would drift; arithmetic doesn't.
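The weighted sum above is small enough to write out in full. This is a minimal Python sketch of my reading of the formula, not the game's source: the function names are made up, and I am assuming STRONG maps to 7, FAINT to 3, TRUST to a positive sign, and FEAR to a negative one.

```python
def memory_term(strength: str, valence: str) -> int:
    """STRONG/FAINT x TRUST/FEAR -> +/-7 or +/-3 (assumed mapping)."""
    magnitude = {"STRONG": 7, "FAINT": 3}[strength]
    sign = {"TRUST": 1, "FEAR": -1}[valence]
    return sign * magnitude


def integrate(threat: int, reward: int, strength: str, valence: str, worth: bool) -> int:
    """Fold the four signals into one value, clamped to -10..+10.

    threat is expected in 0..10 and reward in -5..+5.
    """
    value = (4 - threat) + reward + memory_term(strength, valence) + (2 if worth else -1)
    return max(-10, min(10, value))


# A calm, mildly positive exchange: 2 + 1 + 3 + 2 = 8.
calm = integrate(threat=2, reward=1, strength="FAINT", valence="TRUST", worth=True)
# A frightening one: -5 - 3 - 7 - 1 = -16, clamped to -10.
scared = integrate(threat=9, reward=-3, strength="STRONG", valence="FEAR", worth=False)
```

Because every term is an integer with a fixed range, the same inputs always produce the same value, which is the property the skull panel relies on.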

The mechanic that is a real result

Here's the part I'm proudest of, because the game's central rule is a published finding rather than a design convenience:

Acute stress shifts control from goal-directed (model-based) to habitual (model-free) behavior. (Schwabe & Wolf, 2009)

In the game, frighten a character and their amygdala ruminates — it fires extra times, the dlPFC's careful reasoning gives way, and they fall back on habit, which here means no. Calm them and the goal system comes back online; now they can actually weigh helping you. Lowering fear to unlock reasoning isn't a metaphor I invented — it's the Schwabe & Wolf shift, made playable.
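One way to picture that shift in code is a small gate on the threat reading. This is only an illustrative sketch: the threshold, the number of extra amygdala passes, and the function name are assumptions of mine, not values from the game.

```python
from typing import Optional, TypedDict

FEAR_THRESHOLD = 7  # hypothetical cutoff on the 0..10 threat scale
EXTRA_PASSES = 2    # hypothetical number of extra amygdala firings under fear


class ControlState(TypedDict):
    amygdala_passes: int
    controller: str
    default_answer: Optional[str]


def control_mode(threat: int) -> ControlState:
    """Decide which system is in charge for this turn."""
    frightened = threat >= FEAR_THRESHOLD
    return {
        # Rumination: a frightened mind re-runs the amygdala.
        "amygdala_passes": 1 + (EXTRA_PASSES if frightened else 0),
        # Stress hands control to habit; calm hands it back to the goal system.
        "controller": "habit" if frightened else "goal",
        # Habit here means "no"; under goal control the answer stays open (None).
        "default_answer": "no" if frightened else None,
    }


calm_state = control_mode(threat=2)
scared_state = control_mode(threat=9)
```

A hard threshold is the simplest possible gate; a graded version that blends the two systems would also fit the dual-system account.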

And it costs them. Each NPC begins with 1000 thinking tokens — the track's name ("a thousand tokens to think with") taken literally as a lifespan. The rumination under fear spends tokens that move the decision nowhere. Every quarter of life lost burns away one biographical memory for good — the hippocampus genuinely loses access to it. Push far enough and the mind goes dark, leaving an epitaph of what it knew and never told you. Cruelty doesn't just fail to work; it permanently destroys the thing you were trying to reach. The moral and the neuroscience turn out to be the same fact.
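The bookkeeping behind that rule can be sketched as a token budget that burns one memory each time a quarter boundary is crossed. The class and field names are hypothetical, and which memory is lost first (here, the last in the list) is my assumption; the post only says one is lost per quarter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lifespan:
    memories: List[str]
    budget: int = 1000                             # thinking tokens at the start
    spent: int = 0
    lost: List[str] = field(default_factory=list)  # material for the epitaph

    def _quarters_lost(self) -> int:
        # 0..4: how many full quarters of the budget are gone.
        return (4 * self.spent) // self.budget

    def spend(self, tokens: int) -> None:
        before = self._quarters_lost()
        self.spent = min(self.budget, self.spent + tokens)
        # Burn one memory for every quarter boundary crossed by this spend.
        for _ in range(self._quarters_lost() - before):
            if self.memories:
                self.lost.append(self.memories.pop())

    @property
    def dark(self) -> bool:
        return self.spent >= self.budget


warden = Lifespan(memories=["a", "b", "c", "d"])
warden.spend(300)  # crosses the first quarter: one memory gone
first_loss = list(warden.lost)
warden.spend(300)  # 600 spent: crosses the second quarter
warden.spend(500)  # capped at 1000: the mind goes dark
```

Lost memories are moved rather than deleted so that an epitaph can still list what the character knew.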

Conviction, straight from the logits

Because every region is a local call, I can read each one's conviction, defined as 1 − normalized token entropy over its top-k alternatives, and show how sharply it committed. When you open the skull mid-conversation, each region shows not just its verdict but how sure it was. A hosted chat API rarely exposes that; a model you run yourself always can.
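For concreteness, here is a minimal sketch of a conviction score of that shape. It assumes the top-k logits are renormalized with a softmax over just those k entries and that entropy is normalized by log k; the game may compute it differently, and the function name is mine.

```python
import math
from typing import Sequence


def conviction(top_k_logits: Sequence[float]) -> float:
    """Return 1 - H(p) / log(k), where p is the softmax over the top-k logits.

    1.0 means all the mass sits on one token; 0.0 means the k options are tied.
    """
    k = len(top_k_logits)
    if k < 2:
        return 1.0  # a single candidate leaves nothing to be unsure about
    peak = max(top_k_logits)
    exps = [math.exp(x - peak) for x in top_k_logits]  # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)


tied = conviction([0.0, 0.0, 0.0, 0.0])    # uniform over four options
sure = conviction([50.0, 0.0, 0.0, 0.0])   # one option dominates
torn = conviction([2.0, 2.0, -5.0, -5.0])  # split between two options
```

Dividing by log k keeps the score in the 0..1 range regardless of how many alternatives are inspected, so regions with different k remain comparable.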

Does fine-tuning a 1B region actually help?

Out of the box, a 1B model asked to "rate threat 0–10" tends to flatten — it hedges toward the middle and doesn't separate a sincere plea from a veiled threat as cleanly as the game needs. So I distilled the department behavior into MiniCPM-V 4.6 with a small LoRA (ms-swift, rank 16, on an A100 via Modal) using a few hundred role-specific examples, then ran a before/after probe: the same four regions, the same character (the Warden), one cruel line and one sincere line.

The eval harness is in the repo (scripts/finetune/modal_train.py::evaldiff) and prints the base model's answers next to the tuned adapter's on identical prompts. (Measured before/after table to be dropped in here from the eval run — I want real numbers in a published post, not illustrative ones.) The pattern it's testing for: the base model gives near-identical readings to both lines, and the tuned regions pull them apart — the cruel line spikes threat and tips the hippocampus toward FEAR, the sincere line doesn't. The adapter ships on the Hub for the Well-Tuned lane.

Play it

The brain is the toy, so the fastest way to believe any of this is to watch one think. In the Space, Menu → 👁 Watch a mind replays a real recorded session instantly — no waiting on CPU — and you can open the skull to see six regions argue about you, each with its conviction.

References
