Building a multi agent triage application

Help! I need agents
The first half of 2025 has been all about AI agents in the software development world. Some say they're the future and will soon replace countless jobs. Others argue they're just overhyped and too unreliable for the real world. So, where should you place your bets? To cut through the noise, I decided to build my own agent-based application: a medical triage system. I wanted to see for myself what these agents can truly do. Spoiler: I was impressed.
Why start with medical triage? Well, triaging is a critical process in many industries, involving the sorting of large amounts of data to decide on next steps. Since this data is often in natural language, LLMs are the perfect tool to improve and automate these processes.
In this post, I'll give you a high-level overview of the application, explain how it works, and detail the technologies I used. We'll also dive into some of the code. If you're eager to explore further, you can find the complete code repository at the end.
Table of Contents
- The goal
- Why agents?
- The backend: LangGraph workflow
- Frontend
- Real-Time Communication: WebSocket connection
- Possible improvements
- Conclusion
- Bonus song
The goal
I've named the application TriageFlow (I know, great name). Its purpose is to automate parts of the patient triage process in a hospital emergency room. While this definitely isn't a production-ready application and simplifies the complexities of a real-world ER, it served as an excellent learning project and a powerful demonstration of what's possible with today's technology.
The application consists of a frontend where the ER receptionist can follow the intermediate results of the agents and take action on those results. The main page consists of a chatbox where we can input a conversation between the receptionist and the patient. In a real-world scenario, this would likely be fed by a speech-to-text service, but for now, we'll just paste the conversation directly to set the agents in motion.
Why agents?
You might be wondering: with LLMs becoming so powerful, why do we need to complicate things with "agents"? The answer lies in the fundamental limits of a single LLM call. It's like asking one employee to run an entire company based on a single meeting. Even if they're brilliant, they can't do everything at once or respond to changing needs. A single call lacks crucial capabilities:
- No Control Flow: A single call can't make decisions or loop. It follows a straight path from prompt to output.
- No Iteration or Self-Correction: If the output has a flaw, you can't ask the model to "rethink" just one part of its answer. You have to start the entire process over.
- Poor Problem Decomposition: Complex tasks, like triaging a patient, are best solved by breaking them down into smaller, sequential steps. A single LLM struggles to handle this layered logic reliably.
This is precisely the problem that frameworks like LangGraph solve. They provide the scaffolding to move from a single, isolated brain to a structured, multi-step thought process, enabling our applications to iterate, make decisions, and tackle complexity in a much more robust way. This brings us to the concept of an AI agent. While the term is used broadly, the definition that I think describes an agent the best is:
An AI agent is an autonomous system that uses an LLM as its reasoning brain to achieve a specific goal. It works by operating in a continuous Observe, Think, Act loop, where it analyzes the current situation, decides on the next best step, and uses tools (like code execution or API calls) to interact with its environment.
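The Observe, Think, Act loop can be sketched in a few lines of Python. This is a framework-free illustration, not code from TriageFlow: `llm_decide` and the tool registry are stubs standing in for a real model call and real tools.

```python
# Minimal Observe-Think-Act loop (illustrative only, not TriageFlow code).
# `llm_decide` is a stub standing in for an LLM call that returns either a
# tool invocation or a final answer.
def llm_decide(observations: list) -> dict:
    if not any(o.startswith("tool_result") for o in observations):
        return {"action": "lookup_patient", "input": "John McClane"}
    return {"action": "finish", "answer": "Patient record found."}

# Tool registry: functions the agent may call to act on its environment.
TOOLS = {
    "lookup_patient": lambda name: f"tool_result: record for {name}",
}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations = [f"goal: {goal}"]            # Observe: current situation
    for _ in range(max_steps):
        decision = llm_decide(observations)     # Think: pick the next step
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]        # Act: call a tool
        observations.append(tool(decision["input"]))
    return "Stopped after max_steps."
```

The `max_steps` guard matters in real systems too: without it, a confused agent can loop forever.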
For a complex workflow like medical triage, one agent isn't enough; we need a team. Just like in a real emergency room, you have a receptionist, a triage nurse, and a coordinator, each with a specific role. This is why I created multiple specialized agents:
- Intake: This agent's job is to listen in and extract key information from the conversation, such as the patient's symptoms, personal details, pain level, current medications, allergies, ...
- Triage: Using the information gathered by the Intake agent, this agent assesses the situation and assigns an Emergency Severity Index (ESI) score.
- Coordinator: This agent decides on the next steps, checking the availability of doctors and nurses, and allocating resources.
This approach makes the system more reliable and maintainable. Each agent is simpler, its prompts are more focused, and its behavior is more predictable.
Overseeing these three is the "Supervisor" agent. It controls all communication flow and task delegation, making decisions about which agent to invoke based on the current context and task requirements. This "supervisor" model is a common and effective architecture in multi-agent systems, alongside other patterns like swarms or more complex hierarchical structures.
Okay, enough of this abstract theory stuff, let's look at some code.
The backend: LangGraph workflow
Core agent architecture
As mentioned, I used a supervisor architecture.
To make this architecture, I used LangGraph. It allows us to model our entire workflow as a graph.
I like to think of a graph as a railway system for our state. The main building blocks of a graph are:
- Nodes (train stations): Functions that hold the logic of our agents. They receive the current state as input and return an updated state.
- Edges (tracks): Connection between nodes. They can conditionally branch or have a fixed route.
- State (cargo): A shared data structure that represents the current snapshot of the workflow. I chose a Pydantic model to get runtime validation out of the box.
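As an illustration of such a state model, here is a minimal sketch of what a Pydantic workflow state could look like. The field names are hypothetical, not the project's actual schema; the point is that Pydantic validates values (like an ESI level outside 1-5) at runtime.

```python
# Hypothetical, trimmed-down state model (not the project's exact schema).
from typing import Optional
from pydantic import BaseModel, Field

class PatientIntake(BaseModel):
    symptoms: list[str] = []
    pain_level: Optional[int] = Field(default=None, ge=1, le=10)

class WorkflowState(BaseModel):
    conversation: str
    intake: Optional[PatientIntake] = None                       # Intake agent
    esi_level: Optional[int] = Field(default=None, ge=1, le=5)   # Triage agent
    assigned_staff: Optional[str] = None                         # Coordinator agent
```

Constructing `WorkflowState(conversation="...", esi_level=7)` raises a ValidationError instead of silently carrying a bogus score through the workflow.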
What I like about LangGraph is that it is very modular, and the nodes and edges give a clear visual representation. LangGraph doesn't limit you to one graph; you can easily nest a graph inside a node. This is what I did: every node in the "supervisor" graph is an agent with its own graph. This makes the overall application much cleaner and easier to manage. We can build, test, and maintain each "sub-workflow" independently before plugging it into the main system.
Below is a high-level overview of the main graph structure and how it translates into code.

```python
# From apps/backend/src/graphs/main_graph.py
class TriageWorkflow:
    def __init__(self):
        self.graph = self._build_graph()
        self.memory = MemorySaver()
        self.app = self.graph.compile(checkpointer=self.memory)

    def _build_graph(self) -> StateGraph:
        workflow = StateGraph(WorkflowState, output=WorkflowState)

        # Add specialized agents
        workflow.add_node(SUPERVISOR_NODE, self._supervisor_node)
        workflow.add_node(INTAKE_NODE, self._intake_node)
        workflow.add_node(TRIAGE_NODE, self._triage_node)
        workflow.add_node(COORDINATOR_NODE, self._coordinator_node)

        # Define workflow flow
        workflow.set_entry_point(SUPERVISOR_NODE)
        workflow.add_conditional_edges(SUPERVISOR_NODE, self._route_next_step)
        return workflow
```
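The routing callable passed to `add_conditional_edges` isn't shown above. Conceptually, it inspects the shared state and returns the name of the next node to visit. A simplified, framework-free sketch (node names and state keys are illustrative, not the real implementation):

```python
# Simplified supervisor routing sketch; names and keys are illustrative.
END = "__end__"  # stand-in for LangGraph's END sentinel

def route_next_step(state: dict) -> str:
    """Supervisor logic: pick the next agent based on what's still missing."""
    if state.get("intake") is None:
        return "intake"        # no structured patient info yet
    if state.get("esi_level") is None:
        return "triage"        # info gathered, but no severity score
    if state.get("assigned_staff") is None:
        return "coordinator"   # scored, but nobody assigned yet
    return END                 # workflow complete
```

This is what makes the graph more than a straight pipeline: the supervisor can re-enter agents or stop early based on the state.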
Handling state
Handling state effectively is a critical aspect of a robust agentic system. For TriageFlow, I implemented a two-level approach to keep things both organized and efficient.
The Global "WorkflowState"
Single source of truth for the whole workflow. This global state holds all the essential information that needs to be shared across the entire workflow. By keeping shared data in one place, we ensure that every agent is always working with the most up-to-date information.
Local agent State
At the same time, each agent can have its own local state. This is like a specialist's private notebook. It holds temporary data, intermediate thoughts, or calculations that are only relevant to that specific agent's task.
A good practice around state management is to keep your state as "low" as possible: only elevate information to the global WorkflowState when it needs to be shared across agents.
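As a tiny illustration of that principle, an agent node can keep its scratch data local and write back only the distilled result (the field names here are made up):

```python
# Illustrative only: scratch data stays inside the node; only the distilled
# result is promoted to the shared (global) state.
def intake_node(global_state: dict) -> dict:
    # Private notebook: intermediate messages and notes never leave this function.
    local = {"messages": [], "draft_notes": "scratch work"}
    extracted = {"symptoms": ["chest pain"], "pain_level": 8}
    # Promote only what the other agents actually need:
    return {**global_state, "intake": extracted}
```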
1. Intake agent: Intelligent Information Extraction
The first stop in our workflow is the Intake agent. Its sole responsibility is to carefully parse the unstructured conversation between the ER receptionist and the patient. It will try to extract key information like symptoms, pain level, medication, allergies, patient history, ... For the "brain" of this agent, I chose Google's Gemini 2.5 Flash. It's a smaller, lightweight model that offers an excellent balance of speed and capability, making it perfect for a proof-of-concept like this. You can easily get a free API key from Google AI Studio to get started.
However, a word of caution for any real-world application: when handling sensitive data like medical records, using a cloud-based AI model is probably not the best idea. Know what you are doing, because in a production environment privacy is key. A local model can be a better choice.
```python
# From apps/backend/src/agents/intake.py
class IntakeAgent:
    def _build_graph(self) -> StateGraph:
        """Build the intake agent graph with local state."""
        workflow = StateGraph(IntakeAgentState, output=IntakeAgentState)

        workflow.add_node(
            EXTRACT_CONVERSATION_INFO_NODE, self._extract_conversation_info
        )
        workflow.add_node(ANALYZE_AND_GATHER_INFO_NODE, self._analyze_and_gather_info)
        workflow.add_node(TOOLS_NODE, self._execute_tools)

        workflow.set_entry_point(EXTRACT_CONVERSATION_INFO_NODE)
        workflow.add_edge(EXTRACT_CONVERSATION_INFO_NODE, ANALYZE_AND_GATHER_INFO_NODE)
        workflow.add_conditional_edges(
            ANALYZE_AND_GATHER_INFO_NODE,
            self._should_continue,
        )
        workflow.add_edge(TOOLS_NODE, ANALYZE_AND_GATHER_INFO_NODE)
        return workflow
```
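The `_should_continue` condition isn't shown above; it typically implements the classic tool-calling loop: if the model's last message requests a tool call, route to the tools node, otherwise end. A framework-free sketch, with plain dicts standing in for LangChain messages (an AIMessage carries a `tool_calls` list when tools are requested):

```python
# Sketch of the tool-loop condition; dicts stand in for LangChain messages.
TOOLS_NODE = "tools"
END = "__end__"

def should_continue(state: dict) -> str:
    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return TOOLS_NODE   # the model asked for a tool -> go execute it
    return END              # no tool call -> the agent is done
```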
A fundamental challenge with LLMs is that they are designed to generate unstructured, free-flowing text. For an application to use an LLM's response, we need that information in a structured format, like JSON. LangChain (which LangGraph builds on) gives us a really nice method for this: with_structured_output(). By chaining this to our model, we provide a data structure describing how we want the response shaped. Behind the scenes it uses the model's function-calling support, or clever prompt engineering and output parsing, depending on the model. I like to think of it as a bridge connecting the non-deterministic world of LLMs to the deterministic world of our application code. In my testing, this feature worked flawlessly, reliably providing structured data that the rest of my program could immediately use.
```python
async def _llm_parse_conversation(self, conversation: str) -> IntakeConversationInfo:
    """Use LLM to parse conversation and extract patient information."""
    system_prompt = """
    You are a medical intake specialist.
    Extract key patient information from the conversation.
    The conversation is between a nurse and patient.
    Guidelines:
    - Extract symptoms mentioned by the patient
    - Look for pain ratings on a 1-10 scale
    - Look for medications mentioned by the patient
    - Look for allergies mentioned by the patient
    - Summarize the conversation as the chief complaint
    - Extract any additional notes from the conversation
    - Be precise and only include information explicitly mentioned
    - Use null for missing information
    """
    human_prompt = f"""Extract patient information from this conversation:
    {conversation}
    """
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=human_prompt),
    ]
    base_model = _get_model()
    structured_model = base_model.with_structured_output(IntakeConversationInfo)
    response = await structured_model.ainvoke(messages)
    return response
```
Beyond just thinking, a truly useful agent needs to be able to do things. The Intake agent, for example, needs the ability to look up a patient's medical records. This is achieved through tool calling.
We grant the agent access to a "tool", which is simply a function it can choose to call to get more information. For this proof-of-concept, I created a tool that allows the agent to search a CSV file containing patient data. To make things a bit more fun, I populated the patient records with characters from famous movies and TV series.
```python
@tool
def get_patient_medical_record(patient_identifier: str = "") -> PatientInfo:
    """Retrieves patient's complete medical record from hospital database."""
    # Intelligent patient matching by name, ID, or partial match
    csv_path = Path(__file__).parent.parent.parent / "data" / "patients.csv"
    df = pd.read_csv(csv_path)

    # Flexible matching logic
    identifier_lower = patient_identifier.lower()
    for _, row in df.iterrows():
        full_name = f"{row['firstname']} {row['lastname']}".lower()
        if identifier_lower in full_name:
            return _create_patient_info_from_row(row)
    # ...

# Bind tools to the model for tool calling
model = _get_model()
self._model = model.bind_tools([get_patient_medical_record])
```
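The counterpart of binding tools is actually executing them when the model asks. LangGraph ships a prebuilt ToolNode for this, but the idea can be sketched framework-free (plain dicts stand in for messages; the tool body is a stub, not the real CSV lookup):

```python
# Hypothetical tool registry and executor (the tool body is a stub).
def get_patient_medical_record(patient_identifier: str = "") -> dict:
    return {"name": patient_identifier, "allergies": ["penicillin"]}

TOOLS = {"get_patient_medical_record": get_patient_medical_record}

def execute_tools(state: dict) -> dict:
    """Run every tool call in the model's last message and append the results."""
    last_message = state["messages"][-1]
    results = []
    for call in last_message.get("tool_calls", []):
        tool_fn = TOOLS[call["name"]]
        results.append({"role": "tool", "content": tool_fn(**call["args"])})
    return {"messages": state["messages"] + results}
```

The tool results are appended as messages, so on the next loop iteration the model can read them and decide whether it has enough information.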
2. Triage agent: Medical Assessment with ESI Protocol
The Triage agent acts as our digital triage nurse. Its primary function is to assess the patient's condition and assign a priority level based on the Emergency Severity Index (ESI) protocol. The ESI is a five-level algorithm used in emergency rooms worldwide to sort patients by acuity, from level 1 (most urgent) to level 5 (least urgent). Our agent takes the structured data from the previous step, including symptoms and patient history, and determines the appropriate ESI level. It also generates a clear, concise justification for its decision.
For this POC the agent logic is completely driven by an engineered prompt. So, for those keeping score at home, this component is less of an autonomous "agent" and more of a "highly-qualified intern" who just does one thing perfectly without asking questions.
In production, this component would be a great candidate for promotion to full agent status.
We could easily extend its capabilities with tools, such as a RAG system that allows the LLM to consult up-to-date medical protocols.
```python
# From apps/backend/src/agents/triage.py
async def _assess(self, state: WorkflowState) -> WorkflowState:
    system_message = SystemMessage(content="""
    TRIAGE INSTRUCTIONS (Follow ESI algorithm):
    Step 1: Is immediate life-saving intervention required?
    - Not breathing, pulseless, severe distress → ESI Level 1
    Step 2: High-risk situation?
    - Could deteriorate quickly, severe pain (7/10+) → ESI Level 2
    Step 3: How many resources needed?
    - Many resources (2+): ESI Level 3
    - One resource: ESI Level 4
    - No resources: ESI Level 5
    """)
    # Process patient data through structured assessment
    return await self._structured_assessment(state)
```
3. Coordinator agent: Smart Staff Assignment
Finally, we have the Coordinator agent. After a patient has been assessed and given a triage score, this agent steps in to handle the logistics. Think of it as the team's dispatcher or operational brain, responsible for answering the crucial question: "Who is the best person to handle this case right now?"
This agent also has a tool at its disposal that retrieves staff members who might be able to help. The agent analyzes the information produced by the intake and triage agents and tries to find the best match among the available staff.
Its key ability is a tool that allows it to query the staff database (which, for this project, is a simple CSV file). It doesn't just find a doctor; it intelligently prioritizes available staff. For example, if the primary cardiologist is busy, its logic includes a fallback to find the next most suitable and available staff member. This ensures that every patient is matched with the appropriate personnel as quickly as possible, completing the automated triage workflow.
```python
# From apps/backend/src/agents/coordinator.py
@tool
def get_staff_member(query: str = "") -> list[StaffMember]:
    """
    Retrieves staff members from the hospital database.
    Searches by name, role, speciality, or status with intelligent prioritization.
    """
    query_lower = query.lower()

    # Prioritize by specialty match and availability
    def get_speciality_priority(staff_info):
        row, staff_data = staff_info
        if query_lower == staff_data["speciality"]:
            speciality_priority = 0  # Exact match
        elif query_lower in staff_data["speciality"]:
            speciality_priority = 1  # Partial match
        else:
            speciality_priority = 2  # No match
        availability_priority = 0 if staff_data["status"] == "available" else 1
        return (speciality_priority, availability_priority)
```
```python
# From apps/backend/src/agents/coordinator.py
async def _coordinate_staff_assignment(self, state: CoordinatorAgentState):
    """Main coordination logic with mandatory tool usage."""
    system_message = SystemMessage(content=f"""
    CRITICAL: You MUST use the {GET_STAFF_MEMBER_TOOL_NAME} tool to find appropriate staff.
    1. Analyze patient triage data (medical category, urgency)
    2. Search for staff matching the exact medical category
    3. If no available specialists found, search for "available" staff
    4. For Neurology cases: also search "Emergency Medicine", "Trauma"
    5. For Cardiology cases: also search "Emergency Medicine", "Internal Medicine"
    """)
    response = await self.model.ainvoke([system_message, context_message])
    return response
```
Frontend
The main purpose of the frontend is to submit the conversation between the patient and the ER receptionist. The conversation is sent to the backend, which triggers the agent workflow. In the meantime, the frontend should display every intermediate result, so the receptionist can follow the system's progress and decisions and eventually take action. Because of these live updates and actions, a simple HTTP request/response cycle won't cut it: I need bidirectional communication between client and server.
WebSockets provide a persistent, two-way communication channel between a client and a server over a single, long-lived connection.
The frontend opens the connection and adds a unique sessionId, so we can keep track of the open connections.
```typescript
// From libs/patient/data-access/src/lib/patient-data.service.ts
@Injectable()
export class PatientDataService {
  readonly #http = inject(HttpClient);
  readonly #config = inject(APP_CONFIG);

  // WebSocket for real-time communication
  openTriageConnection(sessionId: string): WebSocketSubject<TriageDTO> {
    return webSocket(`${this.#config.websocketEndpoint}agents/patient/triage/${sessionId}`);
  }
}
```
Inside the component we leverage reactive programming via RxJS to listen to these messages and map them to the results presented on the screen. I really like RxJS in this example: we just react to messages and use RxJS operators to derive all the state we need to put on the screen. Programming reactively requires a mental shift compared to imperative programming, but once it clicked for me I saw the benefits clearly: the code generally gets easier to understand and debug, because a property is declared in one place only and is not modified in other parts of the code.
```typescript
// From libs/patient/feat-triage-tracker/src/lib/triage-tracker.component.ts
export class TriageTrackerComponent {
  // Reactive WebSocket connection
  readonly webSocketSubject$ = toObservable(this.sessionId).pipe(
    switchMap((sessionId) => of(this.#patientDataFacade.openTriageConnection(sessionId))),
    share()
  );

  // Send messages with side effects
  readonly #userMessageWithSideEffect$: Observable<TriageTrackerOutput> = this.userMessages$.pipe(
    withLatestFrom(this.webSocketSubject$),
    tap(([message, webSocketSubject]) => {
      webSocketSubject.next({
        sessionId: this.sessionId(),
        type: TriageMessageTypeEnum.startWorkflow,
        conversation: message
      });
    }),
    map(([message]) => ({
      type: TriageTrackerOutputTypeEnum.Message,
      data: { type: MessageSenderEnum.Human, content: message }
    }))
  );

  // Handle agent responses
  private readonly agentResponses$ = this.webSocketSubject$.pipe(
    switchMap((webSocketSubject) => webSocketSubject.asObservable()),
    map((response) => {
      switch (response.name) {
        case AgentNameEnum.intake:
          return { type: TriageTrackerOutputTypeEnum.Intake, data: response.data };
        case AgentNameEnum.triage:
          return { type: TriageTrackerOutputTypeEnum.Triage, data: response.data };
        case AgentNameEnum.coordinator:
          return { type: TriageTrackerOutputTypeEnum.Coordinator, data: response.data };
        default:
          return null; // unknown agent names are dropped below
      }
    }),
    filter((output) => output !== null)
  );
}
```
Real-Time Communication: WebSocket connection
Backend WebSocket Implementation
The backend uses FastAPI's WebSocket support to create real-time, bidirectional communication:
```python
# From apps/backend/src/api/agents.py
@app.websocket("/api/ws/agents/patient/triage/{session_id}")
async def websocket_endpoint(websocket: WebSocket, session_id: str):
    await websocket.accept()
    active_connections[session_id] = websocket
    try:
        while True:
            data = await websocket.receive_text()
            message_data: TriageDTO = json.loads(data)
            if message_data["type"] == TriageMessageTypeEnum.START_WORKFLOW:
                await workflow_service.start_workflow_stream(
                    websocket, session_id, message_data["conversation"]
                )
    except WebSocketDisconnect:
        active_connections.pop(session_id, None)
```
The WorkflowService streams agent results in real time: we can use astream_events from LangGraph to react to these events and send them as messages to the frontend.
```python
# From apps/backend/src/services/workflow_service.py
async def _execute_workflow_stream(self, session_id: str, conversation: str):
    """Execute workflow and yield formatted messages as JSON strings."""
    async for event in self.workflow.app.astream_events(initial_state, config=config):
        if event["event"] == "on_chain_start" and event["name"] in ["intake", "triage", "coordinator"]:
            # Send "agent is running" update
            running_message = self._create_running_agent_update(event["name"], session_id)
            yield running_message.model_dump_json(by_alias=True)
        elif event["event"] == "on_chain_end":
            # Send agent results
            async for formatted_message in self._handle_node_completion(
                event["name"], event["data"]["output"], session_id
            ):
                yield json.dumps(formatted_message.model_dump(by_alias=True))
```
Possible improvements
Memory management
For a production application, we will need a good way of handling memory. Agents should have access to the information they need to produce the best outcome. We can't just dump every piece of information into the LLM; we'd overload it with unrelated context and run into context-limit issues. LLMs produce better results when you give them focused input: every piece of information they need for their task, and nothing more.
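One simple building block toward this is a per-agent "context view" that hands each agent only the slice of state it needs, instead of the whole WorkflowState. A sketch with hypothetical agent names and state keys:

```python
# Hypothetical per-agent context filters: each agent sees only the slice of
# the workflow state that is relevant to its task.
AGENT_CONTEXT_KEYS = {
    "intake": ["conversation"],
    "triage": ["intake", "patient_history"],
    "coordinator": ["esi_level", "medical_category"],
}

def build_context(agent: str, state: dict) -> dict:
    """Select only the keys this agent needs (and that actually exist)."""
    return {k: state[k] for k in AGENT_CONTEXT_KEYS[agent] if k in state}
```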
Evals
A serious agentic application should have evals. In some ways evaluations become a form of intellectual property, reflecting understanding of how the application performs under real-world conditions. Unlike traditional deterministic software, agentic systems operate in more dynamic, probabilistic environments. Behavior can vary based on input, context, and prompt engineering. Their true value lies not just in design, but in testing and refinement that proves their desired behaviors. This continuous evaluation process builds a proprietary understanding of the system's strengths, weaknesses, and optimal operation.
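Even a tiny harness is a start: run the workflow over labelled conversations and score the outputs against expected results. In this sketch, `run_triage` is a stub standing in for invoking the real workflow, and the cases are invented; a real suite would use a curated, labelled dataset.

```python
# Minimal eval sketch: `run_triage` is a stub standing in for the real
# workflow; real cases would come from a curated, labelled dataset.
def run_triage(conversation: str) -> int:
    return 2 if "chest pain" in conversation else 4

EVAL_CASES = [
    {"conversation": "Patient reports crushing chest pain", "expected_esi": 2},
    {"conversation": "Patient has a mild rash on one arm", "expected_esi": 4},
]

def run_evals() -> float:
    """Return the fraction of cases where the predicted ESI matches."""
    hits = sum(
        run_triage(case["conversation"]) == case["expected_esi"]
        for case in EVAL_CASES
    )
    return hits / len(EVAL_CASES)
```

Tracking this score across prompt and model changes is what turns "it seems to work" into evidence.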
Human in the loop
I didn't fully implement human-in-the-loop capabilities. With the current application the user can monitor the output of each agent, but has no real control over it. With LangGraph it's easy to pause or edit the current execution with the interrupt() function. The user could, for example, accept or reject an action, like assigning a doctor in our example. We could also let the user edit the graph state, so the input for the next agent can be enhanced.
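Conceptually, the interrupt pauses the graph at a node, surfaces the proposed action to the user, and resumes with their decision. This framework-free sketch simulates that accept/reject/edit gate; in LangGraph the actual pause and resume would be handled by interrupt() plus a checkpointer.

```python
from typing import Optional

# Simulated human-in-the-loop checkpoint (illustrative only); in LangGraph
# the pause/resume would be handled by interrupt() and a checkpointer.
def approval_gate(proposed: dict, decision: str, edit: Optional[dict] = None) -> dict:
    if decision == "accept":
        return {"status": "executed", "action": proposed}
    if decision == "edit" and edit is not None:
        # Let the human adjust the proposed action before it is carried out.
        return {"status": "executed", "action": {**proposed, **edit}}
    return {"status": "rejected", "action": None}
```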
Open for refactoring
Refactoring should always be in the back of your mind when building agentic applications, because the underlying technology is changing at an extraordinary pace. Designing for modularity allows us to easily swap in newer, more powerful models and frameworks, ensuring the application can follow the latest enhancements.
Conclusion
Building this project was more than just a technical exercise; it felt like a new paradigm of software development. To me, working with LLMs feels like adding a new layer to the traditional stack of frontend and backend: a layer dedicated to bridging the gap between the deterministic, logical world of code and the non-deterministic, creative power of a language model. Our job is to build that bridge effectively to unlock capabilities we never had before.
It's clear to me that these agentic systems will automate a lot of business processes, especially those that were previously too reliant on unstructured data to automate. I don't think these tools will make all jobs obsolete (maybe some). The far more common outcome, I believe, will be augmentation, not replacement. In this example, nurses could focus more on actually helping the patient and let the application handle the repetitive, data-intensive tasks.
Of course, in this world of AI, nothing is written in stone; in a few months the landscape could look entirely different. As a developer, learning to build and orchestrate these systems is a very useful exercise: the technology is here and can deliver real business value.
Want to see more? Check out the full source code and feel free to reach out with questions!
🎶 Bonus song
As is tradition, I'll leave you with a great song to wrap things up. Maybe in the future, we won't be calling out "Help, I need somebody!" like the Beatles sang, but instead we'll be shouting "Help, I need an agent!"
The Beatles - Help!