Understanding the AgentState Data Structure in ai‑hedge‑fund

The AgentState is a TypedDict that serves as the central runtime container for LLM messages, mutable analysis data, and workflow metadata in the ai‑hedge‑fund simulation, implemented across src/graph/state.py and src/data/models.py.

The AgentState data structure forms the backbone of the graph-based workflow in the virattt/ai-hedge-fund repository. It enables type-safe state management as data flows between analyst agents, risk managers, and the portfolio manager in this LLM-driven hedge fund simulation.

Core Components of AgentState

The architecture separates runtime state management from domain-specific data models. This separation allows LangGraph to handle state transitions while Pydantic models enforce data integrity for financial calculations.

The AgentState TypedDict (Runtime Container)

Located in src/graph/state.py, the AgentState class is a TypedDict that defines the shape of data passed between graph nodes:

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    data:     Annotated[dict[str, any], merge_dicts]
    metadata: Annotated[dict[str, any], merge_dicts]
  • messages – Accumulates BaseMessage objects from LangChain; agents append responses using operator.add for list concatenation.
  • data – Stores mutable analysis payloads, merged via the merge_dicts helper function. In practice, this holds the plain dictionary produced by serializing an AgentStateData object.
  • metadata – Contains workflow-wide flags and configuration, also merged incrementally.
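The way these per-field merge strategies combine a node's partial update with the existing state can be sketched without LangGraph. This is a minimal illustration, not the repository's code: apply_update and REDUCERS are hypothetical names, plain strings stand in for BaseMessage objects, and merge_dicts is assumed to be a shallow merge because its body is not shown here.

```python
import operator
from typing import Any, Callable


def merge_dicts(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    # Assumption: a shallow merge where keys from b win on conflict.
    return {**a, **b}


# One reducer per AgentState field, mirroring the Annotated hints above.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,
    "data": merge_dicts,
    "metadata": merge_dicts,
}


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with each updated field combined via its reducer."""
    new_state = dict(state)
    for key, value in update.items():
        new_state[key] = REDUCERS[key](state[key], value)
    return new_state


state = {
    "messages": ["start"],  # plain strings stand in for BaseMessage objects
    "data": {"tickers": ["AAPL"]},
    "metadata": {"show_reasoning": False},
}
# A node returns only the fields it wants to change.
update = {"messages": ["valuation done"], "data": {"note": "ok"}}
result = apply_update(state, update)
```

Fields that the update does not mention (metadata here) pass through unchanged, which is why a node can return only the fragment it produced.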

The AgentStateData Pydantic Model (Domain Payload)

Defined in src/data/models.py, AgentStateData is a BaseModel subclass that structures the actual financial data:

class AgentStateData(BaseModel):
    tickers:          list[str]
    portfolio:        Portfolio
    start_date:       str
    end_date:         str
    ticker_analyses:  dict[str, TickerAnalysis]
  • tickers – List of stock symbols under analysis (e.g., ["AAPL", "MSFT"]).
  • portfolio – Cash balance, margin requirement, open positions, and realized gains.
  • start_date, end_date – Bounds of the analysis window, given as date strings (e.g., "2023-01-01").
  • ticker_analyses – Mapping of symbols to TickerAnalysis objects, each containing analyst_signals from individual agents like valuation_analyst_agent or technical_analyst_agent.
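As a rough sketch of how a downstream node might read this payload once it has been serialized to a plain dictionary, the helper below flattens ticker_analyses into one signal per agent. The function name collect_signals is hypothetical, the sample values are illustrative, and the code assumes each analyst_signals entry is a dict with a "signal" key.

```python
from typing import Any


def collect_signals(data: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Map each ticker to {agent_id: signal} from the ticker_analyses payload."""
    summary: dict[str, dict[str, str]] = {}
    for ticker, analysis in data.get("ticker_analyses", {}).items():
        signals = analysis.get("analyst_signals", {})
        summary[ticker] = {agent: entry["signal"] for agent, entry in signals.items()}
    return summary


# Illustrative payload in the serialized (plain dict) form held by state["data"].
data = {
    "tickers": ["AAPL", "MSFT"],
    "ticker_analyses": {
        "AAPL": {
            "ticker": "AAPL",
            "analyst_signals": {
                "valuation_analyst_agent": {"signal": "buy", "confidence": 0.87},
                "technical_analyst_agent": {"signal": "hold", "confidence": 0.55},
            },
        }
    },
}
signals_by_ticker = collect_signals(data)
```

Tickers that no agent has analyzed yet (MSFT in this sample) are simply absent from the result.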

The AgentStateMetadata Pydantic Model (Workflow Configuration)

Also in src/data/models.py, this model controls execution behavior:

class AgentStateMetadata(BaseModel):
    show_reasoning: bool = False
    model_config = {"extra": "allow"}
  • show_reasoning – Boolean flag that agents check to determine whether to print intermediate LLM reasoning steps.
  • extra: "allow" – Permits additional keys like model_name and model_provider for dynamic LLM configuration overrides.
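Because the metadata reaches agents as a plain dictionary, optional keys such as model_name and model_provider may or may not be present. One way to read them defensively is sketched below; resolve_model and both default values are hypothetical and are not taken from the repository.

```python
from typing import Any

DEFAULT_MODEL_NAME = "example-model"          # hypothetical fallback
DEFAULT_MODEL_PROVIDER = "example-provider"   # hypothetical fallback


def resolve_model(metadata: dict[str, Any]) -> tuple[str, str]:
    """Return (model_name, model_provider), falling back to the defaults."""
    return (
        metadata.get("model_name", DEFAULT_MODEL_NAME),
        metadata.get("model_provider", DEFAULT_MODEL_PROVIDER),
    )


base = {"show_reasoning": False}
overridden = {
    "show_reasoning": True,
    "model_name": "custom-model",
    "model_provider": "custom-provider",
}
```

Calling resolve_model(base) yields the defaults, while resolve_model(overridden) yields the per-run override.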

How AgentState Flows Through the Workflow

Following the lifecycle of the AgentState data structure shows how the hedge fund simulation keeps a single, consistent view of the data as each agent node runs and contributes its results.

Initialization in src/main.py

The workflow begins with state construction:

from src.graph.state import AgentState
from src.data.models import AgentStateData, AgentStateMetadata, Portfolio

portfolio = Portfolio(
    cash=100_000,
    margin_requirement=0.5,
    positions={},
    realized_gains={},
)

initial_state: AgentState = {
    "messages": [],
    "data": AgentStateData(
        tickers=["AAPL", "MSFT"],
        portfolio=portfolio,
        start_date="2023-01-01",
        end_date="2023-12-31",
        ticker_analyses={},
    ).model_dump(),
    "metadata": AgentStateMetadata(show_reasoning=True).model_dump(),
}

The model_dump() calls serialize Pydantic objects into plain dictionaries required by the graph engine.

Agent Node Execution

Each analyst agent receives the current state and returns partial updates. In src/agents/valuation.py and similar files, functions follow this signature:

from src.graph.state import AgentState

def valuation_analyst_agent(state: AgentState, agent_id: str = "valuation_analyst_agent"):
    tickers = state["data"]["tickers"]

    # Perform valuation analysis for each ticker (elided here).
    # The hard-coded AAPL entry below stands in for the computed result.

    new_data = {
        "ticker_analyses": {
            "AAPL": {
                "ticker": "AAPL",
                "analyst_signals": {
                    "valuation_analyst_agent": {
                        "signal": "buy",
                        "confidence": 0.87,
                        "reasoning": "PE ratio below sector average"
                    }
                }
            }
        }
    }
    
    return {"data": new_data}

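Because merge_dicts (defined in src/graph/state.py) is registered as the reducer for the data field, LangGraph merges the returned fragment into the existing data dictionary instead of replacing it, so keys the agent did not return, such as tickers and portfolio, are left untouched.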
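The body of merge_dicts is not reproduced here, so whether two agents' fragments for the same ticker are combined depends on how deeply it merges. The sketch below contrasts the two possibilities; shallow_merge, deep_merge, and fragment are illustrative names, not the repository's code.

```python
from typing import Any


def shallow_merge(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    # Top-level keys from b replace those in a wholesale.
    return {**a, **b}


def deep_merge(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    # Recursively merge nested dicts; non-dict values from b win.
    merged = dict(a)
    for key, value in b.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def fragment(agent_id: str, signal: str) -> dict[str, Any]:
    """Build the kind of partial update an analyst node returns for AAPL."""
    return {
        "ticker_analyses": {
            "AAPL": {
                "ticker": "AAPL",
                "analyst_signals": {agent_id: {"signal": signal}},
            }
        }
    }


valuation = fragment("valuation_analyst_agent", "buy")
technical = fragment("technical_analyst_agent", "hold")

shallow = shallow_merge(valuation, technical)
deep = deep_merge(valuation, technical)

# Which agents' signals survive under each strategy.
shallow_agents = sorted(shallow["ticker_analyses"]["AAPL"]["analyst_signals"])
deep_agents = sorted(deep["ticker_analyses"]["AAPL"]["analyst_signals"])
```

Under a shallow merge only the last writer's ticker_analyses survives, so an agent would need to return the full nested structure. A recursive merge keeps both agents' signals.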

Metadata Access for Debugging

Agents conditionally output reasoning based on metadata flags:

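from src.graph.state import AgentState, show_agent_reasoning

def risk_management_agent(state: AgentState, agent_id: str = "risk_management_agent"):
    # Risk calculations (elided) would populate these two placeholders.
    risk_report = {}
    updated_risk_data = {}

    # Print intermediate reasoning only when the workflow flag is set.
    if state["metadata"]["show_reasoning"]:
        show_agent_reasoning(risk_report, agent_id)
    return {"data": updated_risk_data}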
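The same gating pattern can be tried in isolation. In this sketch show_agent_reasoning is a hypothetical stand-in that pretty-prints JSON (the real helper's output format is not shown here), maybe_show_reasoning is an invented wrapper, and the report payload is illustrative.

```python
import json
from typing import Any


def show_agent_reasoning(output: Any, agent_id: str) -> None:
    # Hypothetical stand-in for the helper imported from src/graph/state.py.
    print(f"===== {agent_id} =====")
    print(json.dumps(output, indent=2))


def maybe_show_reasoning(state: dict[str, Any], output: Any, agent_id: str) -> bool:
    """Print reasoning only when the metadata flag is set; report whether it printed."""
    if state.get("metadata", {}).get("show_reasoning", False):
        show_agent_reasoning(output, agent_id)
        return True
    return False


quiet = {"metadata": {"show_reasoning": False}}
verbose = {"metadata": {"show_reasoning": True}}
report = {"AAPL": {"max_position_size": 10000}}  # illustrative payload
```

Using .get with a False default means a state without the flag stays silent instead of raising KeyError.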

Summary

  • AgentState is a TypedDict defined in src/graph/state.py that serves as the runtime state container for the LangGraph workflow, separating concerns between messages, data, and metadata.
  • AgentStateData (Pydantic model in src/data/models.py) enforces type safety for financial data including tickers, portfolios, and analyst signals.
  • AgentStateMetadata controls workflow behavior such as reasoning visibility and LLM provider configuration.
  • The merge_dicts helper acts as the reducer for the data and metadata fields, so each node returns only the partial dictionary it produced rather than rewriting the whole state.
  • State flows from initialization in src/main.py through analyst agents to the final portfolio manager, with each node returning partial state updates that are automatically merged.

Frequently Asked Questions

How does AgentState handle concurrent updates from multiple analysts?

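The Annotated type hints in AgentState specify merge strategies: operator.add concatenates message lists, while the custom merge_dicts function in src/graph/state.py merges dictionaries. When several agents return updates in the same step, each update is folded into the state through these reducers rather than replacing it. Whether two agents' signals for the same ticker both survive depends on merge_dicts combining nested dictionaries recursively; a top-level-only merge would keep just the last ticker_analyses value written.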

What is the difference between AgentState and AgentStateData?

AgentState is the runtime container (TypedDict) required by LangGraph's StateGraph, while AgentStateData is a Pydantic BaseModel that defines the actual financial schema. The data field inside AgentState holds a serialized dictionary representation of AgentStateData, allowing the graph engine to process state transitions while maintaining type-safe data validation.

Can I extend AgentState to include custom fields for new analyst agents?

Yes, the AgentStateMetadata model uses Pydantic's extra: "allow" configuration, permitting additional keys for custom workflow flags. For new data fields, modify AgentStateData in src/data/models.py or nest custom dictionaries within the existing ticker_analyses structure without breaking the core graph logic in src/graph/state.py.

Where is the initial AgentState created in the codebase?

The initialization occurs in src/main.py within the run_hedge_fund function and related helpers. This module constructs the Portfolio object, instantiates AgentStateData and AgentStateMetadata, serializes them via model_dump(), and assembles the initial AgentState dictionary that is passed to the compiled workflow.
