# How to Optimize LLM Costs When Running the AI Hedge Fund

> Optimize LLM costs for your AI hedge fund. Discover strategies like configuring cheaper models, reducing retries, batching tickers, caching responses, and monitoring usage with llm_calls_count.

- Repository: [Virat Singh/ai-hedge-fund](https://github.com/virattt/ai-hedge-fund)
- Tags: performance
- Published: 2026-03-09

---

**You can optimize LLM costs in the AI Hedge Fund by configuring cheaper default models, reducing retry attempts, batching multiple tickers into single prompts, implementing response caching, and monitoring usage via the built-in `llm_calls_count` tracker.**

The virattt/ai-hedge-fund repository orchestrates multiple LLM-driven agents, such as Warren Buffett and Peter Lynch personas, that analyze financial data to generate trading signals. Because each agent may invoke a premium model like GPT-4 several times per run, operational costs can escalate rapidly without deliberate optimization.

## Model Selection Strategy

The most immediate cost lever is the model itself. In [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py), the `call_llm` function (lines 33-40) and `get_agent_model_config` (lines 24-47) resolve the model name from request state or fall back to the system default, currently `gpt-4.1` with the `OPENAI` provider.

To reduce costs, override the default with a cheaper alternative such as `gpt-3.5-turbo` or a local Ollama model. The repository already exposes a model catalog in [`src/llm/models.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/llm/models.py) (line 108) via the `LLM_ORDER` list, which can be surfaced in a CLI flag or UI dropdown.

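```python
# src/cli/input.py: example override (a sketch; the import path is assumed)
import argparse

from src.llm.models import AVAILABLE_MODELS

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model",
    choices=[m.model_name for m in AVAILABLE_MODELS],
    default="gpt-3.5-turbo",
    help="LLM model to use for all agents (defaults to a cheaper option)",
)
```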
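
To gauge how much a model switch might save before committing to it, a rough per-run estimate can help. The sketch below is illustrative only: the per-token prices, the token counts, and the `ModelPrice` and `estimate_run_cost` names are placeholder assumptions, not values taken from the repository or from any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price per 1,000 tokens in USD (placeholder numbers)."""
    input_per_1k: float
    output_per_1k: float


def estimate_run_cost(
    price: ModelPrice,
    n_agents: int,
    n_tickers: int,
    input_tokens: int,
    output_tokens: int,
) -> float:
    """Estimate the cost of one run, assuming one LLM call per agent per ticker."""
    calls = n_agents * n_tickers
    per_call = (
        input_tokens / 1000 * price.input_per_1k
        + output_tokens / 1000 * price.output_per_1k
    )
    return calls * per_call


# Placeholder prices: substitute your provider's current rates.
premium = ModelPrice(input_per_1k=0.01, output_per_1k=0.03)
budget = ModelPrice(input_per_1k=0.001, output_per_1k=0.002)

for name, price in [("premium", premium), ("budget", budget)]:
    cost = estimate_run_cost(
        price, n_agents=10, n_tickers=5, input_tokens=2000, output_tokens=500
    )
    print(f"{name}: ${cost:.2f} per run")
```

With these placeholder numbers the estimate comes to roughly $1.75 versus $0.15 per run. The absolute figures are made up, but the structure shows that cost scales with agents × tickers × tokens, so the per-token price multiplies through every call.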

## Retry Policy Tuning

The LLM wrapper implements automatic retries to handle transient failures. In [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py) (lines 58-78), the retry loop attempts up to three times before returning a default response. While this improves reliability, every retry is another billable request, so a call that keeps failing can cost up to three times as much as a single attempt.

For non-critical agents, such as news sentiment analysis, reduce `max_retries` to 1. Reserve higher retry counts for deterministic agents like valuation models.

```python
# src/agents/news_sentiment.py
response = call_llm(
    prompt,
    Sentiment,
    agent_name="news_sentiment",
    state=state,
    max_retries=1,  # reduced from the default of 3
    default_factory=lambda: Sentiment(decision="NEUTRAL"),
)
```
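
The effect of the retry cap on spend can be estimated with a little arithmetic. The sketch below assumes each attempt fails independently with the same probability, which is a simplification, and that the loop stops at the first success or once the cap is reached. The `expected_attempts` helper is hypothetical and not part of the repository.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected number of billable attempts for one logical LLM call.

    Assumes each attempt fails independently with probability p_fail and the
    loop stops at the first success or after max_retries attempts.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    # Attempt k+1 only happens if the first k attempts all failed (p_fail ** k).
    return sum(p_fail ** k for k in range(max_retries))


for p in (0.05, 0.2, 0.5):
    three = expected_attempts(p, max_retries=3)
    one = expected_attempts(p, max_retries=1)
    print(f"p_fail={p}: {three:.3f} attempts with 3 retries, {one:.3f} with 1")
```

Under this model the savings from a lower cap grow with the failure rate: when failures are rare the cap barely matters, and when they are frequent it bounds the worst case.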

## Prompt Engineering and Batching

Token consumption scales with prompt length. The `call_llm` function accepts any prompt object, so long contexts, such as analyst biographies or extensive ticker lists, directly increase costs.

Optimize by moving static context to a shared system prompt stored once per run. Additionally, batch multiple tickers into a single prompt rather than invoking the LLM once per ticker, which cuts that agent's API calls from *N* to 1.

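```python
# Join tickers so the prompt reads "AAPL, MSFT" rather than a Python list repr
ticker_list = ", ".join(tickers)
batched_prompt = f"""
For each ticker in {ticker_list}, provide a short buy/sell/hold decision.
Return JSON: {{ "ticker": "decision", ... }}
"""
# One call covers every ticker; DecisionsModel is the response schema
result = call_llm(
    batched_prompt,
    DecisionsModel,
    agent_name="portfolio_manager",
    state=state,
)
```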
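
A single prompt covering every ticker can become unwieldy for large universes, so a middle ground is to batch in fixed-size chunks. The following is a minimal sketch under that assumption; `chunked`, `build_batched_prompts`, and the batch size are hypothetical names and values chosen for illustration.

```python
from typing import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batched_prompts(tickers: list[str], batch_size: int) -> list[str]:
    """Build one prompt per chunk of tickers instead of one per ticker."""
    prompts = []
    for batch in chunked(tickers, batch_size):
        prompts.append(
            f"For each ticker in {', '.join(batch)}, provide a short "
            "buy/sell/hold decision.\n"
            'Return JSON: { "ticker": "decision", ... }'
        )
    return prompts


prompts = build_batched_prompts(
    ["AAPL", "MSFT", "NVDA", "GOOGL", "TSLA"], batch_size=2
)
print(f"{len(prompts)} prompts for 5 tickers")
```

Each resulting prompt would then be passed to `call_llm` as in the example above. Smaller batches trade some of the savings for shorter prompts and smaller responses to parse.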

## Response Caching Implementation

Unlike API data, which is cached in [`src/data/cache.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/data/cache.py), LLM responses are not cached by default. Repeated analyses, such as sentiment scoring for the same news article, trigger redundant, costly calls.

Implement a memo-cache keyed by `<model_name>|<prompt_hash>` to deduplicate identical requests. Use the existing cache pattern from [`src/data/cache.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/data/cache.py) as a model.

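```python
# src/utils/llm.py: add near the top of the file
import hashlib

from pydantic import BaseModel

# In-memory memo; entries live only for the current process
_llm_memo: dict[str, BaseModel] = {}


def memoized_call_llm(prompt, pydantic_model, llm_name="default", **kwargs):
    # llm_name is an illustrative parameter (not part of call_llm) so the key
    # follows the <model_name>|<prompt_hash> scheme described above.
    # str() is used because prompt objects are not always JSON-serializable.
    prompt_hash = hashlib.sha256(str(prompt).encode()).hexdigest()
    key = f"{llm_name}|{pydantic_model.__name__}|{prompt_hash}"
    if key in _llm_memo:
        return _llm_memo[key]
    result = call_llm(prompt, pydantic_model, **kwargs)
    _llm_memo[key] = result
    return result
```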
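
Before wiring a memo-cache into the real wrapper, it can be worth checking the pattern against a stub that counts invocations. The sketch below is self-contained and makes no network calls; `make_memoized` and `fake_call_llm` are hypothetical stand-ins, not functions from the repository.

```python
import hashlib
from typing import Callable


def make_memoized(call: Callable[[str, str], str]) -> Callable[[str, str], str]:
    """Wrap call(model_name, prompt) with an in-memory cache."""
    memo: dict[str, str] = {}

    def wrapper(model_name: str, prompt: str) -> str:
        prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()
        key = f"{model_name}|{prompt_hash}"
        if key not in memo:
            memo[key] = call(model_name, prompt)
        return memo[key]

    return wrapper


# Records every "billable" request the stub receives.
call_log: list[str] = []


def fake_call_llm(model_name: str, prompt: str) -> str:
    """Stand-in for a billable LLM request."""
    call_log.append(model_name)
    return f"{model_name}: NEUTRAL"


cached_call = make_memoized(fake_call_llm)
headline = "Score this headline: ACME beats earnings estimates"
cached_call("gpt-4.1", headline)
cached_call("gpt-4.1", headline)        # identical request, served from cache
cached_call("gpt-3.5-turbo", headline)  # different model, so a new request
print(f"billable calls: {len(call_log)}")
```

Including the model name in the key matters: a response cached from one model should not be returned when a different model is requested for the same prompt.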

## Usage Monitoring and Cost Tracking

The database schema tracks every LLM invocation. In [`app/backend/database/models.py`](https://github.com/virattt/ai-hedge-fund/blob/main/app/backend/database/models.py) (line 88), the `llm_calls_count` column increments with each call, while `estimated_cost` provides visibility into spend.

Query these fields after backtests to identify cost hotspots. Schedule alerts when a run exceeds predefined thresholds to prevent budget overruns.

```python
from sqlalchemy.orm import Session

from app.backend.database.models import HedgeFundFlowRunCycle


def print_llm_costs(session: Session, run_id: int) -> None:
    # Session.get() replaces the legacy Query.get() (SQLAlchemy 1.4+)
    cycle = session.get(HedgeFundFlowRunCycle, run_id)
    if cycle is None:
        print(f"No run cycle found with id {run_id}")
        return
    print(
        f"LLM calls: {cycle.llm_calls_count}, "
        f"API calls: {cycle.api_calls_count}, "
        f"Estimated cost: {cycle.estimated_cost}"
    )
```
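
The threshold check behind such an alert can be sketched without a database. In the example below, `RunCycleStats` is a hypothetical stand-in for a run-cycle row that carries only the `llm_calls_count` and `estimated_cost` fields, and the threshold values are arbitrary assumptions to be replaced with your own budget.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunCycleStats:
    """Stand-in for a run-cycle row with the two fields used for alerting."""
    llm_calls_count: int
    estimated_cost: Optional[float]


def budget_warnings(
    stats: RunCycleStats, max_calls: int, max_cost: float
) -> list[str]:
    """Return a human-readable warning for each threshold the run exceeded."""
    warnings = []
    if stats.llm_calls_count > max_calls:
        warnings.append(
            f"LLM calls {stats.llm_calls_count} exceed limit {max_calls}"
        )
    # estimated_cost may be unset, so treat None as unknown rather than zero.
    if stats.estimated_cost is not None and stats.estimated_cost > max_cost:
        warnings.append(
            f"Estimated cost {stats.estimated_cost:.2f} exceeds limit {max_cost:.2f}"
        )
    return warnings


for message in budget_warnings(
    RunCycleStats(llm_calls_count=120, estimated_cost=4.5),
    max_calls=100,
    max_cost=5.0,
):
    print(message)
```

Returning a list of messages rather than raising keeps the check easy to route to whatever alerting channel is in use, such as a log line, an email, or a chat webhook.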

## Summary

- **Select cheaper models** by overriding the default `gpt-4.1` fallback in [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py) with `gpt-3.5-turbo` or Ollama models via the `LLM_ORDER` catalog.
- **Reduce retry counts** from the default 3 to 1 for non-critical agents to eliminate redundant token billing.
- **Batch ticker processing** into single prompts rather than individual calls, cutting API volume from *N* to 1.
- **Implement response caching** using a `<model_name>|<prompt_hash>` key to avoid recomputing identical analyses.
- **Monitor usage** via the `llm_calls_count` and `estimated_cost` columns in [`app/backend/database/models.py`](https://github.com/virattt/ai-hedge-fund/blob/main/app/backend/database/models.py) to identify cost hotspots.

## Frequently Asked Questions

### What is the default LLM model used by the AI Hedge Fund?

The system defaults to `gpt-4.1` provided by OpenAI. This fallback is defined in [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py) within the `call_llm` function (lines 37-40) when no model is specified in the request state.

### How can I switch to a cheaper model without modifying agent code?

You can override the model selection via CLI flags or request metadata. The repository exposes an `LLM_ORDER` list in [`src/llm/models.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/llm/models.py) (line 108) that catalogs available models. Pass a cheaper option like `gpt-3.5-turbo` or an Ollama model name through the request state, and `get_agent_model_config` in [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py) (lines 24-47) will resolve it.

### Why does the AI Hedge Fund retry LLM calls, and how does this affect costs?

The `call_llm` wrapper in [`src/utils/llm.py`](https://github.com/virattt/ai-hedge-fund/blob/main/src/utils/llm.py) (lines 58-78) implements a retry loop that attempts up to three times before falling back to a default response. While this improves reliability against transient API failures, each retry generates a fresh billable token request. For non-critical agents like news sentiment analysis, reducing `max_retries` to 1 can significantly cut costs without materially impacting decision quality.

### How do I track which agents are driving the highest LLM costs?

The database schema in [`app/backend/database/models.py`](https://github.com/virattt/ai-hedge-fund/blob/main/app/backend/database/models.py) (line 88) includes an `llm_calls_count` column that increments with every LLM invocation, alongside an `estimated_cost` field. After running a backtest, query the `HedgeFundFlowRunCycle` table to identify which workflow cycles generated the most calls and associated costs, then target those specific agents for optimization.