Ollama vs Cloud LLM APIs for ai-hedge-fund: Architecture and Implementation Guide
The ai-hedge-fund repository abstracts LLM interactions through a unified chat() interface that dynamically routes requests to either a local Ollama instance or cloud APIs like OpenAI based on the LLM_BACKEND environment variable.
The ai-hedge-fund project supports both local and cloud-based language model inference, allowing developers to run financial analysis agents on-premises using Ollama or scale to production using cloud LLM APIs. This architectural flexibility ensures that trading agents remain agnostic to the underlying model provider while giving operators full control over latency, cost, and data privacy.
Architecture Overview
The codebase implements a clean separation between LLM backend logic and agent business logic. All LLM interactions flow through a single entry point in src/utils/llm.py, which delegates to provider-specific implementations based on runtime configuration.
Backend Selection Logic
The system determines which provider to use by reading the LLM_BACKEND environment variable at module load time. In src/utils/llm.py, the implementation defaults to "ollama" if the variable is unset:
# src/utils/llm.py
import os

# Read once, when the module is imported; later changes to the environment are not picked up.
LLM_BACKEND = os.getenv("LLM_BACKEND", "ollama")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OPENAI_API_URL = os.getenv("OPENAI_API_URL", "https://api.openai.com/v1")
The public chat() function acts as a router, selecting between _ollama_chat() and _openai_chat() internal methods:
from typing import Any, Dict, List

def chat(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Unified chat interface that selects the backend dynamically."""
    if LLM_BACKEND == "ollama":
        return _ollama_chat(messages)
    elif LLM_BACKEND == "openai":
        return _openai_chat(messages)
    else:
        raise ValueError(f"Unsupported LLM_BACKEND: {LLM_BACKEND}")
This design allows agents in src/agents/ to call chat() without modification regardless of the deployment environment.
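Because LLM_BACKEND is read at module load time, changing the variable after import has no effect on routing. If switching backends inside a running process matters, one option is to resolve the variable on every call and look the handler up in a registry. The following is a minimal sketch under that assumption, not the repository's implementation: register_backend() and the _BACKENDS dictionary are hypothetical names, and the two lambda handlers are stand-ins for _ollama_chat() and _openai_chat() so the example runs without a network.

```python
import os
from typing import Any, Callable, Dict, List

Messages = List[Dict[str, str]]
Backend = Callable[[Messages], Dict[str, Any]]

# Hypothetical registry mapping a backend name to its handler function.
_BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str, handler: Backend) -> None:
    """Associate a backend name with the function that serves it."""
    _BACKENDS[name.lower()] = handler


def chat(messages: Messages) -> Dict[str, Any]:
    """Resolve LLM_BACKEND on every call instead of once at import."""
    name = os.getenv("LLM_BACKEND", "ollama").lower()
    try:
        handler = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"Unsupported LLM_BACKEND: {name}") from None
    return handler(messages)


# Stub handlers standing in for _ollama_chat() and _openai_chat().
register_backend("ollama", lambda messages: {"backend": "ollama", "echo": messages})
register_backend("openai", lambda messages: {"backend": "openai", "echo": messages})
```

The trade-off is one environment lookup per call in exchange for being able to change backends without re-importing the module, which can be convenient in tests.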
Local Ollama Implementation
When LLM_BACKEND=ollama, the system communicates with a locally running Ollama server via HTTP requests to the Ollama API.
Ollama Lifecycle Management
The src/utils/ollama.py module provides comprehensive lifecycle management for local models. It handles installation detection, server startup, and model downloading through functions like is_ollama_installed(), start_ollama_server(), and ensure_ollama_and_model().
The ensure_ollama_and_model() function coordinates the entire setup process, checking for the Ollama binary, starting the server if necessary, and downloading missing models:
# Conceptual representation based on src/utils/ollama.py implementation
def ensure_ollama_and_model(model_name: str = "llama2"):
    if not is_ollama_installed():
        raise RuntimeError("Ollama not found")
    if not is_server_running():
        start_ollama_server()
    if model_name not in get_locally_available_models():
        download_model(model_name)
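The first two checks in that flow can be approximated with the standard library alone. The sketch below is an assumption about how such helpers might work, not a copy of src/utils/ollama.py: it looks for an ollama executable on PATH and probes the server by requesting /api/tags, the Ollama endpoint that lists local models. The binary, base_url, and timeout parameters are illustrative.

```python
import shutil
import urllib.request


def is_ollama_installed(binary: str = "ollama") -> bool:
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(binary) is not None


def is_server_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a malformed base_url.
        return False
```

A short timeout keeps startup fast when no server is listening, at the cost of a possible false negative on a heavily loaded machine.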
The _ollama_chat() function in src/utils/llm.py constructs the request payload and posts to the local Ollama endpoint:
import requests

def _ollama_chat(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    url = f"{OLLAMA_BASE_URL}/api/chat"
    # Ollama streams newline-delimited JSON by default; "stream": False returns a single object.
    payload = {"model": os.getenv("OLLAMA_MODEL", "llama2"), "messages": messages, "stream": False}
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()
Cloud API Implementation
For production deployments, the system supports cloud LLM providers through the same unified interface.
OpenAI Integration
When LLM_BACKEND=openai, requests route to the _openai_chat() function in src/utils/llm.py. This implementation retrieves API keys via src/utils/api_key.py and constructs authenticated requests to the OpenAI API:
import requests

# OPENAI_API_KEY is assumed to have been loaded beforehand via src/utils/api_key.py.
def _openai_chat(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    url = f"{OPENAI_API_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        "messages": messages,
    }
    response = requests.post(url, headers=headers, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()
The API key is retrieved securely through the centralized key management system in src/utils/api_key.py, separating sensitive credentials from business logic.
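To make the key lookup concrete, here is a minimal sketch of what a centralized helper could look like. It is an illustration only: the _ENV_VARS mapping, the optional env parameter, and the auth_headers() helper are hypothetical, and the real src/utils/api_key.py may store and resolve keys differently. The sketch assumes the key lives in the OPENAI_API_KEY environment variable and fails early with a clear message when it is missing.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from provider name to the environment variable holding its key.
_ENV_VARS: Dict[str, str] = {"openai": "OPENAI_API_KEY"}


def get_api_key(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Look up the key for a provider, raising if it is unknown or unset."""
    env = os.environ if env is None else env
    try:
        var_name = _ENV_VARS[provider.lower()]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


def auth_headers(provider: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the Bearer-token headers used for an authenticated JSON request."""
    return {
        "Authorization": f"Bearer {get_api_key(provider, env)}",
        "Content-Type": "application/json",
    }
```

Passing env explicitly keeps the helper easy to test without touching the real process environment.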
FastAPI Service Layer
The app/backend/services/ollama_service.py file exposes Ollama operations via HTTP endpoints, wrapping the utilities from src/utils/ollama.py for remote management. This allows containerized deployments to check model availability or trigger downloads through REST APIs rather than direct CLI access.
Configuration and Usage Examples
Switching Between Backends
To use Ollama locally (default behavior):
# Shell: select the Ollama backend
export LLM_BACKEND=ollama
export OLLAMA_MODEL=mistral
export OLLAMA_BASE_URL=http://localhost:11434

# Python: call the unified interface
from src.utils.llm import chat
response = chat([{"role": "user", "content": "Analyze AAPL valuation"}])
print(response["message"]["content"])
To use OpenAI cloud API:
# Shell: select the OpenAI backend
export LLM_BACKEND=openai
export OPENAI_MODEL=gpt-4o-mini
export OPENAI_API_KEY=your_key_here  # Or use api_key.py storage

# Python: same call, different backend; note that the response shape differs from Ollama's
from src.utils.llm import chat
result = chat([{"role": "user", "content": "Generate risk assessment"}])
print(result["choices"][0]["message"]["content"])
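The two examples read the reply from different places: Ollama returns the text under message.content, while OpenAI nests it under choices[0].message.content. Code that must work with either backend needs to handle both shapes. The helper below is a minimal sketch of one way to do that; extract_content() is a hypothetical name and is not part of the repository.

```python
from typing import Any, Dict


def extract_content(response: Dict[str, Any]) -> str:
    """Return the assistant text from an Ollama or OpenAI chat response."""
    if "choices" in response:
        # OpenAI chat completions: the reply is nested under choices[0].message
        return response["choices"][0]["message"]["content"]
    if "message" in response:
        # Ollama /api/chat (non-streaming): the reply sits under message
        return response["message"]["content"]
    raise ValueError("Unrecognized response shape")
```

With such a helper, an agent could call extract_content(chat(messages)) and stay independent of the backend that produced the response.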
Model Management with Ollama
For local deployments, ensure required models are available before running agents:
from src.utils.ollama import ensure_ollama_and_model
# Downloads model if missing, starts server if needed
ensure_ollama_and_model("llama3")
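To check whether a model is already present without triggering a download, one approach is to inspect the JSON returned by Ollama's /api/tags endpoint, which lists local models under a models array with a name field such as llama3:latest. The functions below are a sketch under that assumption; local_model_names() and has_model() are hypothetical helpers, and the sample payload is illustrative rather than captured output.

```python
from typing import Any, Dict, List


def local_model_names(tags_payload: Dict[str, Any]) -> List[str]:
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_payload.get("models", [])]


def has_model(tags_payload: Dict[str, Any], model_name: str) -> bool:
    """Check for a model, treating a bare name as its ':latest' tag."""
    wanted = model_name if ":" in model_name else f"{model_name}:latest"
    return wanted in local_model_names(tags_payload)


# Illustrative payload in the shape /api/tags is assumed to return.
sample = {"models": [{"name": "llama3:latest"}, {"name": "mistral:7b"}]}
print(has_model(sample, "llama3"))      # bare name matches llama3:latest
print(has_model(sample, "mistral"))     # no mistral:latest in the sample
```

Running a check like this before going offline confirms that the model a deployment expects is actually on disk.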
Summary
- Unified Interface: The chat() function in src/utils/llm.py provides a single entry point for all LLM interactions, routing to either Ollama or OpenAI based on the LLM_BACKEND environment variable.
- Local Flexibility: Ollama integration in src/utils/ollama.py supports full lifecycle management, including installation detection, server startup, and model downloading.
- Cloud Scalability: OpenAI integration uses standard HTTP requests with authentication via src/utils/api_key.py, supporting production deployments without code changes.
- Agent Agnosticism: Trading agents in src/agents/ call the unified interface, remaining decoupled from specific LLM providers.
- Service Exposure: FastAPI endpoints in app/backend/services/ollama_service.py wrap local Ollama operations for containerized environments.
Frequently Asked Questions
How does ai-hedge-fund decide which LLM provider to use?
The system reads the LLM_BACKEND environment variable in src/utils/llm.py once, when the module is loaded. If it is set to "ollama", chat() routes requests to a local server; if "openai", it sends them to the cloud API; any other value raises a ValueError. The default value is "ollama", so the system runs local-first when no configuration is provided.
Can I run ai-hedge-fund completely offline?
For LLM inference, yes. With LLM_BACKEND=ollama and your models already downloaded (for example by running ensure_ollama_and_model() while online), all inference happens locally via the Ollama server at OLLAMA_BASE_URL (default http://localhost:11434) and needs no internet connectivity. Any external data sources the agents query would still require network access.
What changes are required to switch from Ollama to OpenAI?
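No changes are required to how agents call chat(). Set LLM_BACKEND=openai before the process starts and provide your API key through OPENAI_API_KEY or src/utils/api_key.py. The chat() function then routes to _openai_chat(), and requests go to https://api.openai.com/v1/chat/completions instead of your local Ollama instance. Note that the two backends return differently shaped responses (response["message"]["content"] for Ollama, response["choices"][0]["message"]["content"] for OpenAI), so code that reads the reply must handle both.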
Where is the API key stored for cloud LLM access?
API keys are managed through src/utils/api_key.py, which provides a centralized get_api_key() function. For OpenAI, the key is retrieved via api_key.get_api_key("openai") and injected into the Authorization header as a Bearer token in src/utils/llm.py.