For the past ten years, we’ve been writing software the way you write software in 2015. Define your data model. Tighten your validations. Build the UI. Ship it. Every line of code carries a promise that some combination of human and machine will later enforce: a date is a date, a clinician’s primary specialty is one of the eleven we recognize, the bill rate has to be a positive number.
That works. Trusted runs on it. It is how Works got built and got to scale.
But it is also a set of architectural assumptions that bake in a particular relationship between code and judgment. The code is supposed to enforce rules; humans are supposed to interpret edges. When AI is part of the system, that division of labor shifts. The model is now competent enough to enforce some of the rules and interpret some of the edges that used to require people. The question that follows is whether the architecture you built five years ago can accommodate the model that hasn’t shipped yet.
Last quarter we set out to answer that, on a thirty-day deadline.
We gave a small team a blank repository, the goal of rebuilding our intake-to-submission flow end to end, and one architectural rule: assume the models keep getting better. Build for that.
The success bar we set before writing a line of code: ninety percent of incoming clinicians should be able to complete a job submission---intake to application---without needing a human. That’s the number we’d eventually evaluate ourselves against.
The internal name was Scrubs. It is a working implementation, not a product. None of its code is in production today. What it gave us is a one-month proof of what the same business problem looks like when you build it from first principles for the AI that’s coming. Five of the choices we made along the way are worth writing down.
(Cursor stats said something like 80 to 90 percent of the code in Scrubs was written by AI, which directionally feels right. Most of the month was telling the system what I wanted and reading what came back.)
Flat data structures, because the model does the evaluation
Trusted’s main platform stores a clinician as roughly ninety-four distinct data attributes, nested between one and fifteen layers deep. Before you can add a work experience you have to make sure the location exists. Before the location exists you have to make sure the facility exists. Adding a credential touches a different subgraph. Every relationship is normalized, every foreign key enforced, every constraint guarded by validation. This is good software, in the way the early 2010s taught us to write good software.
Scrubs stores a clinician as a small number of flat tables. Locations are strings. Facilities are strings. Work experience is a near-flat record. There are no foreign keys to the unit catalog, no enum constraint on employment type, no normalized address tree.
This sounds like a regression. It is not, for one specific reason: the system that has to decide whether a clinician matches a job is an LLM, and an LLM does not need a normalized facility ID to know that “Memorial Hospital - West” and “Memorial Hospital West Campus” are the same place. It can read both strings and tell you yes or no, with its reasoning. The structure that exists to support deterministic SQL doesn’t exist to support an LLM doing semantic comparison.
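A minimal sketch of what that looks like in practice. The names here (`WorkExperience`, `facility_match_prompt`) are invented for illustration, not Scrubs’ actual schema, and the LLM client itself is elided; the point is that the facility is just a string, and the “join” is a prompt:

```ruby
# Hypothetical flat work-experience record: facility and location are
# plain strings, with no foreign keys and no normalized address tree.
WorkExperience = Struct.new(:facility, :location, :role, keyword_init: true)

experience = WorkExperience.new(
  facility: "Memorial Hospital - West",
  location: "Phoenix, AZ",
  role: "ICU RN"
)

# Instead of joining against a facility catalog, the system asks the model
# whether two free-text names refer to the same place. The prompt is the
# real interface; the call to the model is omitted here.
def facility_match_prompt(a, b)
  <<~PROMPT
    Do these two facility names refer to the same place?
    A: #{a}
    B: #{b}
    Answer yes or no, then give one sentence of reasoning.
  PROMPT
end

puts facility_match_prompt(experience.facility, "Memorial Hospital West Campus")
```

The deterministic version of this comparison is a facility table, a canonicalization pipeline, and a matching service; the model-delegated version is a paragraph of text.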
This is not always the right choice. There are good reasons to keep normalized structures: billing accuracy, regulatory reporting, downstream integrations with systems that demand referential consistency. Trusted’s main platform has all three. But in Scrubs, every place we asked “do we really need this entity, or could we store the string and let the model interpret it later?” the answer was usually that the string was fine. The data model collapsed from hundreds of related tables to a handful of flat ones. The system got faster, simpler, and easier to reason about as a result.
The bet underneath the choice is that the model gets better, not worse, at semantic interpretation over time. We are comfortable with that bet.
Constrain by tools, not by prompts
In Scrubs there are two agents that talk to clinicians. The first, the Activation Agent, gathers profile data from a person who has just arrived. Name, license, specialty, where they want to work, when they’re available. The second, the Advocate Agent, takes over once the profile crosses a threshold and starts surfacing actual jobs.
The Activation Agent literally cannot talk about jobs. Not because we told it not to in the prompt. Because it does not have access to the tools that look up jobs.
This is a consequential difference. The “we told the model not to do X” approach is fragile in the worst way: it works most of the time but fails in exactly the cases where the cost of failure is highest. A clever prompt, a confused turn, a new edge case, and the model helpfully invents an answer that ignores its instructions.
Tool-based constraints do not fail that way. If the agent has no search_jobs tool exposed to it, it cannot return job results. It will tell the user, in conversation, that it needs to learn a bit more before it can help with jobs. It will not fabricate a job, because no tool is available that returns one.
The shift from constraint-by-prompt to constraint-by-tool is one of the most underrated patterns in agentic software design. It treats the tool registry as a security boundary, not a convenience. It gives engineers a place to enforce business rules outside the model’s context window: when a clinician asks for the highest-paying jobs in Ohio, the server-side code filters by the clinician’s qualified roles before the results return to the agent. The tool enforced the policy. The model never had to remember it.
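Constraint-by-tool reduces to something this simple. The registry below is a hypothetical sketch, not Scrubs’ actual code; tool names, field names, and the filtering rule are illustrative stand-ins for the pattern the section describes:

```ruby
# Hypothetical per-agent tool registries. The Activation Agent has no
# job-search tool, so it cannot return job results no matter what the
# prompt says or how the conversation goes.
TOOLS = {
  activation_agent: [:update_profile, :verify_license, :ask_availability],
  advocate_agent:   [:update_profile, :search_jobs, :submit_application]
}.freeze

def tool_available?(agent, tool)
  TOOLS.fetch(agent).include?(tool)
end

# The business rule lives in server-side code, outside the model's context
# window: results are filtered by the clinician's qualified roles before
# the agent ever sees them.
def search_jobs(jobs, clinician, state:)
  jobs.select { |job| job[:state] == state && clinician[:qualified_roles].include?(job[:role]) }
      .sort_by { |job| -job[:pay] }
end
```

The model never sees the unqualified jobs, so it cannot surface them; no instruction to “only show qualified roles” has to survive a long conversation.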
This is the same idea behind the Form pattern we wrote about previously: the agent gets to do work, but only the work the tools allow. The tools are written by people. The model fills in the judgment.
Persistent agents, not sessions
Traditional software has a session. You log in, do a thing, log out. The system forgets you between sessions, or it remembers the database row but not the conversation that produced it.
Scrubs does not work that way. The same agent that talks to a clinician on day one is the agent that talks to them three months later. It has the full context: every conversation, every form they filled out, every shift they expressed interest in, every change they made to their license. When they text us at three in the morning about a new opening, the agent does not need to be reintroduced. It picks up where it left off.
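The mechanical difference from a session is small: nothing is discarded between interactions, and the full history is assembled into context on every turn. A sketch, with invented field names standing in for whatever Scrubs actually stores:

```ruby
# Hypothetical persistent-agent memory: one record per clinician, appended
# to for months, assembled into context on every turn rather than starting
# a fresh session.
def assemble_context(clinician)
  {
    profile:        clinician[:profile],
    conversations:  clinician[:conversations],   # every prior exchange
    submissions:    clinician[:submissions],     # shifts they pursued
    license_events: clinician[:license_events]   # renewals, changes
  }
end

def handle_message(clinician, text)
  clinician[:conversations] << { role: "clinician", text: text }
  context = assemble_context(clinician)
  # ...prompt the model with `context` plus the new message; elided here...
  context[:conversations].length
end
```

A message at three in the morning, three months after the last one, goes through exactly the same path as the second message of the first conversation.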
This is a small idea with large consequences.
The first is that the relationship layer---the thing recruiters used to provide---moves into the system. In Every Open Shift we argued that the relationship doesn’t disappear when an industry digitizes; it moves out of the broker and into the infrastructure that makes the transaction trustworthy at scale. Scrubs is what that move looks like in software. The relationship is the agent’s accumulated context.
The second is on the communication side. When the agent reaches out (a new ICU shift in Phoenix, a license expiring soon, a per diem opportunity that matches a preference the clinician mentioned six months ago) the message is written by the same agent that knows the history, in the same voice it has used with the clinician before. The communication queue handles the procedural rules: don’t text at midnight, don’t double-text, throttle if the clinician has not responded. The writing is the agent’s, with the agent’s context.
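The split between procedural rules and agent-written content is worth making concrete. The gate below is a hypothetical sketch of the queue’s job, with invented thresholds; Scrubs’ actual limits and scheduling logic are not shown:

```ruby
# Hypothetical outbound-message gate: the procedural rules live here, in
# deterministic code, while the message body is written by the agent.
# Quiet hours and throttle windows are illustrative, not Scrubs' values.
QUIET_HOURS = ((22..23).to_a + (0..7).to_a).freeze  # local hours we never text

def may_send?(now:, last_outbound_at:, clinician_replied_since:)
  return false if QUIET_HOURS.include?(now.hour)     # don't text at midnight
  return true  if last_outbound_at.nil?              # first-ever outreach
  gap_hours = (now - last_outbound_at) / 3600.0
  return gap_hours > 72 unless clinician_replied_since  # throttle if no response
  gap_hours > 4                                         # don't double-text
end
```

The agent decides what to say; this code decides whether saying anything right now is allowed. Neither has to trust the other to remember its half of the policy.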
The third is on customer expectation. People can tell when an “AI assistant” is actually a sequence of disconnected sessions wearing a friendly mask. They can also tell when something is remembering them. The latter compounds. The former wears out fast.
Conversational UI as the default
Trusted’s existing app is form-driven. You see fields, fill them in, click save, advance to the next screen.
Scrubs is conversation-driven. You see a chat. You type a message. The agent decides what to ask next based on what it knows about you, what it still needs, and what the conversation has been about so far.
This is less a religious commitment to chat-as-UI than a practical observation. When you are collecting information from someone whose life does not fit a form (which is most people), the conversational mode handles edge cases the form cannot. “I worked at three different ICUs as a traveler over the last eighteen months, should I list them separately?” is a question your form has to anticipate and build a UI affordance for. A conversation just answers it.
Conversational UI is not always better. Filling out a resume by question and answer is tedious. Reviewing your own profile is easier as a structured view than a chat history. The honest answer is that the future is hybrid: conversation where it works, forms where they work, generated on the fly by the same system. Scrubs leans hard on the conversational side because we wanted to find the edges of what was possible, not because every interaction should be a chat.
The deeper point is that the form-versus-chat decision is now a runtime choice, not a build-time one. The same agent can render a structured form when the structure helps and ask a question when it doesn’t. The static UI of the last decade was a constraint imposed by the limits of the systems we had. Those limits are gone.
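One way to picture the runtime choice. The heuristic below is invented for illustration (Scrubs’ actual decision is the agent’s, not a hard-coded rule): fields with a known, closed set of options render as an inline form; open-ended fields become conversation.

```ruby
# Hypothetical runtime form-vs-question decision. In a real agentic system
# the model makes this call; a rule this crude only sketches the shape of
# the output the UI layer would consume.
def next_step(field)
  if field[:options]&.any?
    { render: :form, field: field[:name], options: field[:options] }
  else
    { render: :question, text: "Tell me about your #{field[:name].to_s.tr('_', ' ')}." }
  end
end
```

The point is that both branches return at runtime from the same place, so the UI no longer has to be decided screen by screen at build time.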
Same effort applied to everyone
The economics of Trusted’s existing system, and of most healthcare staffing companies, encode a hard constraint: human recruiters are a scarce resource, so the business has to decide who gets their time. Qualification engines, intent signals, fit scores. The work of human recruiters is gated by an upstream filter that asks, in effect, is this person worth our time today?
Scrubs removes that gate. Every clinician who arrives gets the same agent-driven experience, the same depth of conversation, the same follow-up. A first-year nursing student gets the same machinery a twenty-year travel nurse does. They get different answers---the agent’s context, the jobs it surfaces, the timing of its outreach---but they get the same system.
This is only possible because the economics of running an agent are different from the economics of running a recruiter. The marginal cost of one more conversation is cents, not hours. The constraint that shaped the older system, scarce human time, no longer applies.
What this changes is the population we can serve. Students who graduate in eighteen months. Per diem nurses we do not have shifts for today but might next quarter. The experienced clinician who is “checking the market” but not actively looking. These are all people the older system rationally ignored. Scrubs can engage with all of them, build context over time, and re-engage when the moment is right. Not every conversation pays off. The cost of being there for the ones that do has collapsed.
This is, in the long run, a different business than the one Trusted is today.
What thirty days actually looks like
The thirty-day timeline is the headline. The 80% AI-written code is the part that matters.
Trusted’s main codebase is the result of nine years of engineering work: millions of lines, hundreds of database tables, dozens of integrations. Scrubs is none of that. It is a Rails app built on a starter template, with a small set of well-defined models and a thin view layer. The whole stack fits in one engineer’s head at once.
The 80% number isn’t magic and isn’t laziness. It is the result of the architectural decisions being made before the code got written. Once the data model, the agent boundaries, and the conversational UI were settled, the actual code those decisions implied was routine enough that an AI assistant could write most of it. Every controller, every model, every prompt scaffold, every test. The engineers’ time went mostly to deciding what should exist and reviewing what the AI produced. Keyboard time on actual code was a small fraction of either.
The practical takeaway for any team trying to replicate the velocity is that the order matters. Decide first. Generate second. Review third. The speed came from the generate step being mostly automatable once the decide step had been done well by humans. Skip the deciding---ask the model to make architectural choices it had no business making---and you get code that runs but feels designed by a committee of one. Skip the reviewing and you get the same code with no one accountable for it, which is technical debt with a faster clock. The savings are real, and they live entirely in the middle step.
We do not think this generalizes to all codebases. Trusted’s main platform has too much legacy weight, too many integration surfaces, too many edge cases the AI does not know about. But on a clean codebase with a clear architectural spine, AI-assisted development changes the unit economics of building software in a way that, nine months ago, we would have been skeptical about.
The corollary is that organizational design starts to become the ceiling. If a small team can produce a meaningful application in thirty days, the bottleneck shifts to how fast you can decide what to build and how confidently you can review what got built. Cycles you used to plan in quarters now want to be planned in weeks. We are still figuring out what that means.
What Scrubs doesn’t solve
Scrubs is a clean room. It does not handle the regulatory reporting, the billing integrations, the historical data migration, the vendor coordination that the main Trusted platform handles every day. A lot of the simplification (flat data, conversational-only UI, no normalized facilities) works because we said “we are not going to integrate with X for this exercise.”
There are clinician journeys Scrubs does not model. It can take a single nurse from intake to a single submission. It does not run a credentialing back office. It does not reconcile travel contracts. It does not handle compliance audits. The things that make healthcare workforce hard at production scale are the things Scrubs deliberately bracketed out.
What it told us is what the core conversational and matching loop looks like when built from scratch with these architectural assumptions. The hard parts of the rest of the platform are still hard. They will be re-imagined with similar logic, on a different timeline, with different tradeoffs.
There’s also the part Scrubs did tackle that isn’t finished. The first version of an agentic onboarding system is not the steady-state version. Users have been trained by a decade of bad chatbots to assume the worst of any chat interface they meet, and that trust has to be re-earned conversation by conversation. The interaction paradigms for agentic UX aren’t settled---we made choices in Scrubs about when to render a form inline, when to ask a clarifying question, when to act unilaterally, and we’ll revise most of those choices in production. The refinement tail on AI-native software is longer than the procedural-software equivalent, not shorter. Better to be honest about that now than ship a triumphalist story we’d have to walk back.
Where this goes
We are already working on the first integration: bringing a conversational chat surface into Trusted’s existing platform so it can do Scrubs-style intake without rebuilding the rest of the system. The hard part isn’t the chat. It’s giving the agent enough agency inside an existing codebase to actually do useful work without breaking the things that already work.
Resume parsing is another candidate. The OCR is good enough now that asking a clinician for anything beyond “upload your resume” is asking too much. The harder work is making sure the parsed result lands cleanly in our existing data structures, which were not designed for it.
Underneath both is the slower work of cleaning up our data structures, our test infrastructure, our deployment surface, so that the next time someone gives a small team thirty days to do something hard, the answer is not “yes, but you’ll have to build it in a separate repo because the main one is too brittle.”
--- Spenser, Engineering