Professor Wren's Story Rooms
"I was, apparently, dead for some years. The tapes have restored most of my nouns."
That is how Professor Alder Wren greets you.
Wren is a fictional English philologist and storytelling lecturer, reconstituted from a cupboard of old university tapes and installed in a green-phosphor terminal. You bring him a stubborn little idea. He helps you find its hook, its turn, and the ending it has been avoiding.
The result is Professor Wren's Story Rooms, my submission to An Adventure in Thousand Token Wood for the Hugging Face Build Small Hackathon.
Try the Professor Wren's Story Rooms Space.
A door disguised as a prompt
Most blank writing tools begin with an empty rectangle. Empty rectangles are terribly confident things. They imply that the writer already knows what ought to go inside them.
Professor Wren begins differently. He asks:
What tale has been nagging at you?
The answer need not be tidy. It may be a premise, an image, a question, or one obstinate thought that refuses to leave the room. Wren turns that fragment into a four-beat storyboard:
- Hook: the door through which the audience enters.
- Escalation: the turn that makes retreat inconvenient.
- Reversal: the discovery that changes what the story means.
- Payoff: the ending that earns its arrival.
Each scene receives a title, narrative direction, dramatic purpose, and visual prompt. Once the storyboard appears, the terminal remains open. You can confer with Wren about a weak hook, a predictable turn, an awkward sentence, or the scene that is being suspiciously polite.
Two small minds, one tweed jacket
The text pipeline uses only small language models in the 1B-4B parameter range.
MiniCPM5-1B plans the rooms
openbmb/MiniCPM5-1B creates the
initial storyboard. Instead of demanding one heroic slab of perfect JSON from
a 1B model, the planner divides the work into compact calls:
- story-level metadata;
- one narrative beat at a time;
- focused scene details;
- validation and repair when a field wanders off.
This decomposition became one of the most useful lessons of the build. A small model does not need to impersonate a giant one. Give it a narrow desk, a clear task, and enough paper.
Nemotron 4B becomes Professor Wren
nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
powers the conversation inside the finished storyboard. It receives the active
scene, current objective, prior conversation, and Wren's fictional character
prompt.
The model runs through vLLM with an FP8 KV cache, keeping the conversational runtime compact while preserving enough context for a useful editorial exchange.
No large language model is called at runtime. That makes the project an Off the Grid bonus-quest entry and a candidate for the Tiny Titan badge.
Making cold starts part of the fiction
Small models still need somewhere to wake up.
Both language models run on Modal serverless L4 GPUs. Their containers use:
- model downloads during image construction;
- CPU memory snapshots;
- GPU memory snapshots;
- an initialization warmup pass;
- a short scaledown window;
- app-triggered Nemotron prewarming on first load.
The technical goal is reduced cold-start latency. The design goal is stranger: make the remaining wait feel intentional.
While the planner wakes, the interface shows Professor Wren "reconstituting," "consulting the departmental tapes," and "looking for the tale's true door." A rotating collection of terminal artifacts fills the screen. Infrastructure latency becomes a small piece of theatre.
This does not make waiting disappear. It gives waiting a role.
Drawing with phosphor
The visual language is an old terminal dreaming in green and amber.
FLUX2-Klein-4b generates pseudo-ASCII illustrations through the Black Forest Labs API. The prompts emphasize concrete geometry: an object, its silhouette, its orientation, and its essential parts. The app also includes twenty prepared WebP artifacts for onboarding and fallback use, plus text-based ASCII as the last line of defence.
I wanted the pictures to feel less like decorative concept art and more like evidence recovered from Wren's archive: keys, microphones, diagrams, lenses, and curious instruments whose departmental purpose has been lost.
The architecture, without the cobwebs
The application has four principal parts:
React terminal interface
|
FastAPI gateway on Hugging Face Spaces
|
+-- MiniCPM5-1B on Modal: structured storyboard planning
|
+-- Nemotron 4B FP8 on Modal: Professor Wren conversation
|
+-- Black Forest Labs FLUX: pseudo-ASCII scene artwork
The React frontend handles onboarding, scene navigation, terminal conversation, and PDF export. FastAPI keeps provider credentials off the client and presents small application-specific endpoints. The two text models remain separate so each can be tuned and warmed for its own job.
Built with Codex, with the receipts
The application was built end to end with OpenAI Codex: architecture, backend routes, model deployments, frontend interaction, tests, debugging, documentation, and the inevitable round of discovering that two perfectly reasonable dependency pins regarded each other with hereditary suspicion.
For the OpenAI track, the reviewable agent trace is published as a Hugging Face dataset:
drdavidtang/build-small-agent-trace
The Git history also includes Codex co-author trailers on the relevant commits.
Badge trail
- An Adventure in Thousand Token Wood: an interactive four-beat story-building adventure.
- Off the Grid: all runtime language intelligence uses SLMs.
- Tiny Titan: the complete language-model range is 1B-4B.
- OpenBMB: MiniCPM5-1B plans the storyboard.
- NVIDIA: Nemotron-3-Nano-4B-FP8 powers Professor Wren.
- Modal: serverless GPUs, memory snapshots, warmup, and prewarming reduce cold starts.
- OpenAI: the project was built with Codex and includes a public agent trace.
What I learned in the wood
The first lesson was that small models reward good manners. Ask for one well-shaped thing at a time. Validate it. Repair only what is broken. Keep the context relevant.
The second was that product character can carry technical constraints without hiding them. Professor Wren's pauses, archival images, and terminal language are not merely decoration. They make model boundaries, startup time, and structured interaction feel like parts of one world.
The third was that a writing assistant need not pretend to be an oracle. Sometimes it is more useful to be a slightly eccentric professor who asks the right question, notices the missing turn, and then gets out of the sentence's way.
Professor Wren is waiting in his story rooms. He has found the nouns. Bring him an idea.
Project: Professor Wren's Story Rooms
Agent trace: Codex build trace
YouTube demo: Video demo



