Designing Healthcare Agents That Run Without Human Oversight
Forget chatbots. Here is what happens when AI agents wake themselves up to process prior authorization requests with no one watching in real time.

When the Agent Wakes Up on Its Own
A design postmortem on an event-triggered prior auth agent on GCP, and what shifts in your architecture when nobody is watching it run.
Your home security system has no user interface. No one walks up to the motion sensor and says "begin monitoring now." The sensors are always listening, processing ambient signals, making routing decisions in milliseconds. When motion crosses a threshold, the system evaluates: homeowner returning from work, delivery driver dropping a package, or unexpected entry requiring immediate escalation. Each scenario routes to a different outcome: silent dismissal, notification to your phone, or full alarm with dispatch. Every event gets logged whether it triggered an action or not, because the log is your audit trail for reconstructing what happened when you were not there to witness it.
The permission boundary was set at installation time, not at the moment of activation. The sensors can communicate with the central panel and the cloud service, nothing else. That constraint lives in the wiring and the network configuration, not in the software running on the devices.
A payer organization's prior authorization request stream is the motion sensor field. EDI feeds, clearinghouse APIs, provider portal submissions arrive continuously throughout the day. There is no operator clicking Run. There is no chat interface waiting for someone to type a question. The ambient agent either activates correctly on event arrival or the request sits unprocessed. In healthcare operations, the second outcome has direct patient consequences.
This article examines the design decisions that change when you remove the user from the activation loop. Specifically, where the trust boundary lives, how audit becomes the only feedback signal, why three nodes is the right pipeline shape, and what the confidence threshold actually does when there is no human to override it in real time.
What This Prototype Does
The prototype is an always-on, event-triggered prior authorization agent on Google Cloud Platform. A Pub/Sub topic receives prior authorization request messages from upstream systems. A Cloud Run subscriber service is always listening. When a message arrives, the subscriber validates the schema, instantiates a fresh LangGraph pipeline, runs the request through three nodes, writes an audit record to BigQuery, and returns to listening. No user clicks anything at any point.
The three-node pipeline reflects the three responsibilities the agent has on every event. CriteriaEvalNode calls Vertex AI Gemini with the clinical request and a structured output schema. ConfidenceRoutingNode applies the 0.80 threshold and assigns a determination tier. AuditWriteNode writes the structured determination to BigQuery; if the determination falls below the confidence threshold, a second record goes to the human review queue table. The subscriber then returns HTTP 200 to Pub/Sub and resumes listening for the next message.
The existing portfolio already demonstrates user-invoked prior authorization pipelines and agent-to-agent negotiation between provider and payer agents. This prototype adds the third activation pattern. Ambient event-driven activation is the pattern that matches how most prior auth work actually arrives at a payer operation.
The software terms in this article reference the server code at github.com/paullopez-ai/prior-auth-ambient-agent. The UI is at github.com/paullopez-ai/prior-auth-ambient-agent-ui.
The Activation Pattern Is the Architecture
The activation pattern determines how the agent starts running, and that choice cascades through every other architectural decision. Three patterns exist in the enterprise AI landscape. In user-invoked agents, someone opens a UI, types a request, clicks a button, and sees a result. In agent-to-agent systems, one agent receives a structured message from another agent over a typed protocol and returns a determination. In ambient event-driven systems, an event arrives from an upstream system through Pub/Sub, Kafka, SQS, EventBridge, or Service Bus, with no user and no calling agent present.

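Moving from user-invoked to ambient changes the input contract first. There is no UI form to constrain bad input or provide helpful error messages. The first step cannot be the LLM call; it has to be schema validation that can reject malformed events without triggering a retry. The retry semantics shift to infrastructure conventions, and the handler has to distinguish terminal errors (a malformed message that will never succeed) from transient ones (a model timeout that may succeed on redelivery). With Pub/Sub push delivery, only a success status acknowledges the message; any other response, 4xx included, is treated as a negative acknowledgment and redelivered with backoff. A terminal rejection therefore has to be recorded and then acknowledged, or diverted by a dead letter policy, while a transient failure returns an error status so that Pub/Sub retries it.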
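The validate-before-inference step can be sketched as follows. This is a minimal illustration, not the repository's code: the required field names, the Disposition enum, and the validate_event function are all hypothetical. The envelope shape (a "message" object carrying base64-encoded "data") follows the Pub/Sub push format. The sketch returns a disposition rather than an HTTP status, because how a terminal rejection is acknowledged depends on the delivery mode.

```python
import base64
import json
from enum import Enum

# Hypothetical required fields; the real message schema lives in the repository.
REQUIRED_FIELDS = ("request_id", "member_id", "procedure_code", "clinical_note")


class Disposition(Enum):
    PROCESS = "process"  # valid event: hand the payload to the pipeline
    REJECT = "reject"    # terminal: redelivering this message can never succeed


def validate_event(envelope):
    """Return (Disposition, payload_or_reason) for one Pub/Sub push envelope."""
    try:
        raw = base64.b64decode(envelope["message"]["data"], validate=True)
        payload = json.loads(raw)
    except (KeyError, TypeError, ValueError) as exc:
        # Covers a missing envelope key, bad base64, and invalid JSON.
        return Disposition.REJECT, f"undecodable message: {exc!r}"

    if not isinstance(payload, dict):
        return Disposition.REJECT, "payload is not a JSON object"

    # A field is bad if it is absent, not a string, or blank.
    bad = [
        name for name in REQUIRED_FIELDS
        if not isinstance(payload.get(name), str) or not payload[name].strip()
    ]
    if bad:
        return Disposition.REJECT, "missing or invalid fields: " + ", ".join(bad)

    return Disposition.PROCESS, payload
```

Only a PROCESS result reaches the model call. A REJECT result carries a reason string that can be logged or written to the audit table before the message is disposed of.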
State management changes fundamentally. Each event is independent with no conversation context and no shared memory between concurrent events. The audit log becomes the only after-the-fact feedback signal. There is no user to notice a wrong answer in real time and course-correct the next interaction. The TOGAF Phase C Intelligence Architecture gap becomes critical here: the system prompt is the specification, and the specification must be precise enough to run correctly without a human to interpret edge cases.
The design implication is structural: in an ambient system, the architecture sets the trust boundary at infrastructure through IAM, schema contracts, and audit logging because runtime supervision is no longer available.
Three Nodes, Not Four
The pipeline has three nodes. CriteriaEvalNode handles inference. ConfidenceRoutingNode handles the trust boundary decision. AuditWriteNode handles persistence. Each node has at most one external dependency. CriteriaEvalNode calls Vertex AI Gemini and can fail on model errors or timeout. ConfidenceRoutingNode is pure logic with no external calls, so it has no external failure mode. AuditWriteNode calls BigQuery and can fail on write errors without affecting the determination already made.

This decomposition is not architectural minimalism for its own sake. It reflects a specific design choice about where failure isolation matters most. A Gemini timeout should not cause a BigQuery retry. A BigQuery write failure should not repeat the Gemini call. Each node failure surfaces independently and routes to its own retry policy at the infrastructure layer.
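The failure isolation described here can be illustrated with a framework-free sketch. It deliberately does not use LangGraph, and every name in it (InferenceError, AuditWriteError, Determination, run_pipeline, the tier labels) is hypothetical. The model call and the table write are passed in as plain callables so the example runs without cloud credentials. The in-process retry of the audit write is an assumption made to show the isolation; the article places retry policy at the infrastructure layer.

```python
from dataclasses import dataclass, asdict


class InferenceError(Exception):
    """Raised when the model call fails (timeout, model error)."""


class AuditWriteError(Exception):
    """Raised when persisting the audit row fails."""


@dataclass
class Determination:
    decision: str
    confidence: float
    rationale: str
    tier: str = ""


def criteria_eval(request, call_model):
    # Only this step touches the model dependency.
    try:
        out = call_model(request)
    except Exception as exc:
        raise InferenceError(str(exc)) from exc
    return Determination(out["decision"], float(out["confidence"]), out["rationale"])


def confidence_routing(det, threshold=0.80):
    # Pure logic: no external calls.
    det.tier = "auto" if det.confidence >= threshold else "human_review"
    return det


def audit_write(det, write_row):
    # Only this step touches the audit sink dependency.
    try:
        write_row(asdict(det))
    except Exception as exc:
        raise AuditWriteError(str(exc)) from exc


def run_pipeline(request, call_model, write_row, audit_attempts=3):
    det = confidence_routing(criteria_eval(request, call_model))
    last_error = None
    for _ in range(audit_attempts):
        try:
            audit_write(det, write_row)  # retried alone; the model is not called again
            return det
        except AuditWriteError as exc:
            last_error = exc
    raise last_error
```

Because each step raises its own exception type, a caller can tell an inference failure from a persistence failure and apply a different policy to each. The determination is computed once and reused across audit write attempts.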
The existing payer-auth-intelligence prototype demonstrates the four-node pattern with an interrupt-before HumanReviewNode that pauses the pipeline for human input. That pattern is correct for user-invoked agents where a human is available to provide the input that resumes the pipeline. In an ambient system there is no human available in real time to resume the pipeline, so the interrupt pattern is replaced by writing the sub-threshold determination to a queue table that a human reviews later. The architectural primitive is different. The activation pattern dictates the orchestration pattern.
The Confidence Threshold Without a User
The confidence threshold serves a different function in an ambient context than in user-invoked systems. The threshold remains 0.80 for portfolio consistency with payer-auth-intelligence, but its operational meaning changes completely. In a user-invoked agent, the user is the final quality check. The model can produce a plausible-but-wrong answer at 0.85 confidence, and the user notices something is off and overrides it. The user completes the quality loop.
In an ambient agent, the user is not in the loop at the moment of decision. The confidence threshold is the only check that exists in real time. A determination at 0.81 confidence is auto-approved without anyone reviewing it. A determination at 0.79 confidence routes to the human review queue and waits for someone to examine it eventually. This shifts how the threshold should be chosen from a tuning exercise to a governance decision.
That governance decision embeds organizational risk tolerance directly into the runtime. Setting the threshold too high produces a review queue that overwhelms the human team. Setting it too low produces auto-approvals on cases that should have been routed for review. Classical confidence routing matters more in ambient contexts because there is no user available to override a fluent-but-wrong LLM output in real time.
The threshold should be a configurable environment variable (CONFIDENCE_THRESHOLD) documented with rationale, not a magic number in source code. The prototype implements this discipline to make the governance decision explicit and auditable.
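A minimal sketch of that discipline, assuming the variable is read once at startup and validated before any event is processed. The function names and tier labels are hypothetical. The 0.80 default and the CONFIDENCE_THRESHOLD name come from the article, and treating a score exactly at the threshold as passing follows the article's statement that only determinations below the threshold go to review.

```python
import os

DEFAULT_THRESHOLD = 0.80  # portfolio default named in the article


def load_threshold(env=None):
    """Read CONFIDENCE_THRESHOLD and fail fast on an unusable value."""
    env = os.environ if env is None else env
    raw = env.get("CONFIDENCE_THRESHOLD", str(DEFAULT_THRESHOLD))
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 < value <= 1.0:
        raise ValueError(f"CONFIDENCE_THRESHOLD must be in (0, 1], got {raw!r}")
    return value


def route(confidence, threshold):
    """Scores at or above the threshold proceed; anything below goes to review."""
    return "auto_determination" if confidence >= threshold else "human_review"
```

Failing at startup on a bad value matters more here than in a user-invoked system: a silently mis-parsed threshold would change routing for every event with no one watching.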
Audit Is the Feedback Loop
In a user-invoked agent the user provides feedback at the moment of decision. They accept the recommendation, override it, or escalate it. The feedback is immediate and tied to the request that produced it. In an ambient agent the feedback loop is the audit log.
Every event produces a BigQuery record regardless of outcome. The audit record includes the message ID, the model version, the confidence score, the rationale, the processing time, the token count, and the cost in USD. Auto-approvals and human review routes both write to the audit_records table. Sub-threshold determinations also write to the human_review_queue table, where they wait for a human to act on them.
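The two-table write can be sketched as a pure function that turns one record into rows keyed by table name. The table names and the record fields come from the description above. The column names, the tier labels, the "pending" status, and the rows_for helper are assumptions for illustration. The actual insert is left out; with the google-cloud-bigquery client it would typically be a call such as Client.insert_rows_json.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditRecord:
    message_id: str
    model_version: str
    confidence: float
    rationale: str
    processing_ms: int
    token_count: int
    cost_usd: float


def rows_for(record, threshold=0.80):
    """Map one record to the rows each table should receive."""
    below = record.confidence < threshold
    audit_row = asdict(record)
    audit_row["tier"] = "human_review" if below else "auto"

    # Every event is audited, regardless of outcome.
    rows = {"audit_records": [audit_row]}

    # Sub-threshold determinations get a second row in the review queue.
    if below:
        rows["human_review_queue"] = [{
            "message_id": record.message_id,
            "confidence": record.confidence,
            "rationale": record.rationale,
            "status": "pending",
        }]
    return rows
```

Keeping the mapping free of I/O makes the routing rule testable on its own, and the message ID ties a queue row back to its audit row.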
The audit log is not a compliance artifact bolted on after the system is built. In an ambient system the audit log is the only signal that tells operators whether the agent is making good decisions. Token count and cost trends can signal a shift in the input mix or in model behavior. Confidence distribution trends signal whether the threshold is calibrated correctly. Processing time percentiles signal whether the pipeline is degrading. The audit log is the production telemetry, the compliance record, and the quality evaluation dataset, all in one schema.
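As a rough illustration of reading those signals, the sketch below summarizes a batch of audit rows that have already been fetched. In practice this would more likely be a SQL query run in BigQuery; Python is used here only to keep the example self-contained. The row keys (confidence, processing_ms, token_count) and both function names are assumptions, and the percentile uses the simple nearest-rank method.

```python
import math


def nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(rows, threshold=0.80):
    """Summarize routing, latency, and token usage for a batch of audit rows."""
    if not rows:
        raise ValueError("no rows to summarize")
    n = len(rows)
    below = sum(1 for r in rows if r["confidence"] < threshold)
    return {
        "events": n,
        "review_rate": below / n,  # share of events routed to human review
        "p95_processing_ms": nearest_rank([r["processing_ms"] for r in rows], 95),
        "mean_tokens": sum(r["token_count"] for r in rows) / n,
    }
```

Comparing these summaries across time windows is what turns the audit table into a feedback loop: a rising review rate or a climbing p95 is visible without anyone having watched individual events.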
The BigQuery choice is deliberate. The audit table is append-only, high-volume, and queryable by compliance and analytics teams who already work in SQL. Firestore is document-oriented and not analytics-native. Cloud SQL adds operational overhead and is not built for the workload pattern. BigQuery is the correct enterprise audit sink for an ambient agent that may eventually process millions of events.
The Trust Boundary Lives in Terraform
In a user-invoked agent, the trust boundary is partially enforced by the UI. The user sees the determination, can override it, and is implicitly authenticated by the session that produced the request. In an ambient agent that supervision layer is gone. The trust boundary moves entirely to infrastructure.
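The subscriber service account in this prototype holds two roles: roles/pubsub.subscriber and roles/bigquery.dataEditor. It cannot publish to Pub/Sub. It cannot create BigQuery query jobs, although roles/bigquery.dataEditor does include read access to table data, so a strictly write-only posture would need a narrower custom role. The Vertex AI call in CriteriaEvalNode would also need a Vertex AI grant, such as roles/aiplatform.user, on whichever identity makes it. Nothing beyond these grants is reachable. The IAM configuration is written in Terraform before the pipeline code, so the constraint exists before the system is deployable. Any compliance reviewer can read iam.tf and understand exactly what the agent can and cannot do.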
This is the trust posture that ambient systems require. Permissions are set at infrastructure time, not at runtime. The agent's behavior is bounded by what its service account can do, not by what its prompt instructs it to do. A misbehaving prompt cannot escalate permissions because the permissions are not in the prompt's reach.
What Production Would Add
The prototype demonstrates the ambient activation pattern, the three-node pipeline, the confidence-gated routing, and the audit trail discipline. It does not yet include the infrastructure patterns that enterprise deployment requires. Dead letter queue handling would add a Pub/Sub dead letter topic for messages that fail processing repeatedly, with its own monitoring and alerting. Cloud Healthcare API integration would replace synthetic clinical notes embedded in Pub/Sub message payloads with FHIR Bundle references dereferenced at processing time.
Multi-region failover would deploy the system to two regions with Pub/Sub topic mirroring and BigQuery cross-region replication for disaster recovery. Cost controls would add per-service budget alerts, Vertex AI quota limits, and Cloud Run max-instance constraints scaled to business volume expectations. The prototype includes a $20/month billing alert recommendation in the README; production would implement comprehensive cost governance.
Observability beyond audit would add Cloud Trace, Cloud Profiler, and structured Cloud Logging with log-based metrics. The audit log carries forward the discipline learned from prior auth radar, but the stakes are higher in ambient systems because no user is there to notice missing audit data in real time. The audit log is the business record; production needs both business audit and operational telemetry.
What the Architecture Reveals About Ambient AI
The interesting part of this prototype is not the technology stack. Pub/Sub, Cloud Run, Vertex AI, and BigQuery are well-understood Google Cloud primitives. The interesting part is what changes in the architecture when you remove the user from the activation loop.
The first step has to be schema validation, not inference. The trust boundary moves from runtime supervision to infrastructure permissions. The confidence threshold becomes a governance artifact rather than a tuning parameter. The audit log becomes the only feedback signal. The pipeline shape simplifies because the interrupt-before pattern depends on a human being available to resume execution. The IAM configuration becomes the trust documentation.
Most enterprise AI work in healthcare arrives as events, not chat sessions. EDI feeds, clearinghouse APIs, scheduling system updates, claim status changes. The activation pattern that matches the work is ambient, not user-invoked. Architects who have only built request-response agents will design ambient systems with assumptions that do not survive the first production day.
Close
The well-designed system is not the one that responds when you tell it to; it is the one that knows what to do when you are not there. Ambient agents are not user-invoked agents with a different trigger. They are a different architectural primitive that requires different decisions about input contracts, trust boundaries, audit, and orchestration. The production work in healthcare AI increasingly looks like ambient activation, and the architects who can design for that pattern are the ones who can move from impressive demos to systems that run unattended.