How to Manage LLM and Embedding Models with ModelsService in OpenRAG
The ModelsService in OpenRAG provides a centralized, async interface to discover and validate language and embedding models across OpenAI, Anthropic, Ollama, and IBM Watson X without consuming inference credits.
OpenRAG (langflow-ai/openrag) ships with a dedicated ModelsService that abstracts provider-specific APIs into a unified curation layer. Whether you are building a custom UI or automating model selection in scripts, this service handles credential validation, filtering, and default model detection. This guide walks through the architecture and implementation details found in src/services/models_service.py.
Understanding the ModelsService Architecture
The service acts as a singleton dependency injected into FastAPI routes via get_models_service in src/dependencies.py. It coordinates between provider constants and raw API responses to return sanitized model lists.
| Component | Role | Key File |
|---|---|---|
| ModelsService core | Retrieves and curates model lists per provider. Handles lightweight validation, sorting, and default-model detection. | src/services/models_service.py |
| Provider constants | Lists of known LLM IDs and default model identifiers for each provider. | src/config/model_constants.py |
| FastAPI models endpoint | Exposes RESTful GET /v1/models/{provider} that delegates to ModelsService. | src/api/v1/models.py |
| Dependency injection | Supplies a singleton ModelsService instance to FastAPI routes. | get_models_service in src/dependencies.py |
| Utility helpers | URL transformation for localhost and container-aware networking. | src/utils/container_utils.py |
The service validates API keys or endpoints by performing lightweight metadata calls rather than executing expensive inference requests.
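The singleton behaviour described above can be illustrated with a minimal sketch. It assumes the provider function caches one instance with functools.lru_cache; the article does not show how get_models_service is actually written in src/dependencies.py, and the ModelsService class here is an empty stand-in.

```python
from functools import lru_cache


class ModelsService:
    """Empty stand-in for the real class in src/services/models_service.py."""


@lru_cache(maxsize=1)
def get_models_service() -> ModelsService:
    # Assumption: the first call builds the service and later calls reuse it.
    return ModelsService()


first = get_models_service()
second = get_models_service()
print(first is second)  # True
```

In a FastAPI route, a provider like this would typically be declared as a parameter default such as service: ModelsService = Depends(get_models_service), so every request handler receives the same instance.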
Provider-Specific Implementation Details
Each provider implements a dedicated async method within ModelsService that handles authentication, filtering, and capability detection.
OpenAI Model Discovery
In src/services/models_service.py, the get_openai_models method calls GET https://api.openai.com/v1/models using the supplied key. It filters the raw catalog against OPENAI_VALIDATION_MODELS from src/config/model_constants.py, retaining only validated LLMs and any model whose ID starts with text-embedding for embeddings. The service marks gpt-4o as the default language model and text-embedding-3-small as the default embedding model.
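The filtering rule above can be sketched as a pure function over a list of model IDs. This is an illustration rather than the repository's code: the contents of OPENAI_VALIDATION_MODELS below are placeholder values, and curate_openai_models is a hypothetical helper name.

```python
# Placeholder values; the real list lives in src/config/model_constants.py.
OPENAI_VALIDATION_MODELS = {"gpt-4o", "gpt-4o-mini", "gpt-4"}
DEFAULT_LLM = "gpt-4o"
DEFAULT_EMBEDDING = "text-embedding-3-small"


def curate_openai_models(model_ids):
    """Split raw model IDs into curated language and embedding lists."""

    def entry(model_id, default_id):
        return {"value": model_id, "label": model_id, "default": model_id == default_id}

    # Keep only LLMs on the validation list.
    language = [entry(m, DEFAULT_LLM) for m in model_ids if m in OPENAI_VALIDATION_MODELS]
    # Keep any model whose ID starts with "text-embedding".
    embedding = [
        entry(m, DEFAULT_EMBEDDING) for m in model_ids if m.startswith("text-embedding")
    ]
    return {"language_models": language, "embedding_models": embedding}


raw_ids = ["gpt-4o", "whisper-1", "text-embedding-3-small", "dall-e-3"]
curated = curate_openai_models(raw_ids)
print([m["value"] for m in curated["language_models"]])   # ['gpt-4o']
print([m["value"] for m in curated["embedding_models"]])  # ['text-embedding-3-small']
```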
Anthropic Model Curation
For Anthropic, the service queries https://api.anthropic.com/v1/models and filters results against ANTHROPIC_VALIDATION_MODELS. The implementation flags claude-sonnet-4-5-20250929 as the default model. This validation occurs without invoking the Messages API, ensuring no token costs are incurred during discovery.
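To make the "metadata call, not inference" point concrete, the sketch below builds (but never sends) a request to the models listing endpoint using only the standard library. The x-api-key and anthropic-version headers are the ones Anthropic's HTTP API expects; build_anthropic_models_request is a hypothetical helper, and the real service presumably uses an async HTTP client instead of urllib.

```python
import urllib.request


def build_anthropic_models_request(api_key):
    """Prepare a GET request for the model catalog; no tokens are consumed."""
    return urllib.request.Request(
        "https://api.anthropic.com/v1/models",
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
        method="GET",
    )


request = build_anthropic_models_request("sk-ant-placeholder")
print(request.get_method(), request.full_url)
# GET https://api.anthropic.com/v1/models
```

A 200 response to this request indicates that the key is valid and the API is reachable, while an authentication error indicates a bad key. The Messages API is never involved.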
Ollama Local Model Detection
The Ollama implementation queries /api/tags to list local models, then inspects each model via /api/show to check its capabilities array. Language models must support both completion and tools, while embedding models require the embedding capability. The service uses a default pattern matching "gpt-oss" to pre-select recommended local models. For containerized deployments, src/utils/container_utils.py handles localhost URL rewriting.
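The capability rule can be sketched as follows. The function names are hypothetical, and treating the "gpt-oss" default pattern as a plain substring match is an assumption; the article does not say how the pattern is applied.

```python
def classify_ollama_model(capabilities):
    """Decide which lists a model belongs in, given its capabilities array."""
    caps = set(capabilities)
    return {
        # Language models need both completion and tool support.
        "language": {"completion", "tools"} <= caps,
        # Embedding models need the embedding capability.
        "embedding": "embedding" in caps,
    }


def is_default_candidate(model_name, pattern="gpt-oss"):
    # Assumption: the default pattern is matched as a substring of the name.
    return pattern in model_name


print(classify_ollama_model(["completion", "tools"]))
# {'language': True, 'embedding': False}
print(classify_ollama_model(["completion"]))
# {'language': False, 'embedding': False}
print(is_default_candidate("gpt-oss:20b"))  # True
```

Note that a model reporting only completion is excluded from both lists under this rule, which matches the requirement that language models support tool calling.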
IBM Watson X Integration
IBM Watson X support requires obtaining an IAM bearer token first, then calling the foundation-model specification endpoint twice—once for text-chat models and once for embedding models. The service treats the first model in each returned list as the default. All network errors are wrapped with structured logging and raised as generic exceptions that the API layer translates into HTTP 500 responses.
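Two pieces of this flow can be sketched without touching the network: the form body for the IAM token exchange, and the "first model is the default" rule. Both helper names are hypothetical and the model IDs are placeholders; the grant type string is the one IBM Cloud IAM uses for API-key exchange.

```python
from urllib.parse import parse_qs, urlencode


def build_iam_token_form(api_key):
    """Form-encoded body for exchanging an IBM Cloud API key for a bearer token."""
    return urlencode(
        {"grant_type": "urn:ibm:params:oauth:grant-type:apikey", "apikey": api_key}
    )


def mark_first_as_default(model_ids):
    """Build option objects, flagging only the first model as the default."""
    return [
        {"value": model_id, "label": model_id, "default": index == 0}
        for index, model_id in enumerate(model_ids)
    ]


form = build_iam_token_form("placeholder-key")
print(parse_qs(form)["apikey"])  # ['placeholder-key']

# Placeholder IDs standing in for one of the two endpoint responses.
options = mark_first_as_default(["example/chat-a", "example/chat-b"])
print([o["default"] for o in options])  # [True, False]
```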
How to Use ModelsService in Your Code
You can consume the service directly in async Python scripts or through the provided REST endpoint.
Direct Async Usage
Instantiate ModelsService and call provider-specific methods to retrieve structured model lists:
import asyncio

from src.services.models_service import ModelsService


async def list_openai_models():
    service = ModelsService()
    # Replace with your real OpenAI key (do NOT commit it)
    api_key = "sk-xxxx"
    models = await service.get_openai_models(api_key=api_key)
    print("LLM models:", models["language_models"])
    print("Embedding models:", models["embedding_models"])


asyncio.run(list_openai_models())
Each method returns a dictionary containing language_models and embedding_models, where each entry is a list of objects with value, label, and default boolean fields.
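The return shape described above can be written down as type hints, together with a small helper for reading the default out of either list. The type and function names are hypothetical, and falling back to the first entry when nothing is flagged is a design choice of this sketch, not documented behaviour.

```python
from typing import List, Optional, TypedDict


class ModelOption(TypedDict):
    value: str     # model ID
    label: str     # display name
    default: bool  # True for the recommended model


class ModelLists(TypedDict):
    language_models: List[ModelOption]
    embedding_models: List[ModelOption]


def pick_default(options: List[ModelOption]) -> Optional[str]:
    """Return the flagged default, else the first entry, else None."""
    for option in options:
        if option["default"]:
            return option["value"]
    return options[0]["value"] if options else None


models: ModelLists = {
    "language_models": [
        {"value": "gpt-4", "label": "gpt-4", "default": False},
        {"value": "gpt-4o", "label": "gpt-4o", "default": True},
    ],
    "embedding_models": [],
}
print(pick_default(models["language_models"]))   # gpt-4o
print(pick_default(models["embedding_models"]))  # None
```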
REST API Endpoint
The FastAPI endpoint in src/api/v1/models.py exposes a GET route at /v1/models/{provider}. Use standard HTTP clients to fetch curated lists:
curl -X GET "http://localhost:8000/v1/models/openai" \
  -H "Authorization: Bearer <your-api-key>"
The JSON response follows this structure:
{
  "language_models": [
    {"value": "gpt-4o", "label": "gpt-4o", "default": true},
    {"value": "gpt-4", "label": "gpt-4", "default": false}
  ],
  "embedding_models": [
    {"value": "text-embedding-3-small", "label": "text-embedding-3-small", "default": true}
  ]
}
Scripting with Custom Providers
For multi-provider scripts, branch logic based on the provider name and pass credentials accordingly:
import asyncio

from src.services.models_service import ModelsService


async def get_provider_models(provider, **creds):
    service = ModelsService()
    if provider == "ollama":
        return await service.get_ollama_models(endpoint=creds.get("endpoint"))
    if provider == "watsonx":
        return await service.get_ibm_models(
            endpoint=creds["endpoint"],
            api_key=creds["api_key"],
            project_id=creds["project_id"],
        )
    # Fail loudly for unknown providers instead of returning an unbound variable
    raise ValueError(f"Unsupported provider: {provider!r}")


# Example: Ollama on local machine
models = asyncio.run(get_provider_models("ollama", endpoint="http://localhost:11434"))
print(models)
Summary
- ModelsService centralizes model discovery across OpenAI, Anthropic, Ollama, and IBM Watson X in src/services/models_service.py.
- Credential validation uses lightweight metadata endpoints rather than inference calls, preventing accidental credit consumption.
- Provider-specific filters and default identifiers are defined in src/config/model_constants.py.
- The service returns standardized dictionaries with language_models and embedding_models arrays containing value, label, and default fields.
- FastAPI integration via src/api/v1/models.py provides RESTful access at /v1/models/{provider}.
Frequently Asked Questions
How does ModelsService validate provider credentials without consuming credits?
The service performs read-only metadata calls—such as OpenAI's /v1/models or Anthropic's /v1/models endpoints—rather than invoking chat completion or embedding generation APIs. These metadata requests validate API key permissions and connectivity without executing model inference or incurring token costs.
What is the difference between language models and embedding models in the service output?
language_models lists LLMs capable of chat completion and tool use, while embedding_models contains models specialized for vector generation. In the returned JSON or dictionary, each category is a separate array of objects with value (model ID), label (display name), and default (boolean flag) properties.
Can I use ModelsService with a local Ollama instance running in Docker?
Yes. Pass the container-accessible endpoint URL (e.g., http://host.docker.internal:11434) to get_ollama_models. The src/utils/container_utils.py module handles localhost-to-host networking translation automatically, ensuring the service can reach Ollama across container boundaries.
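The kind of translation involved can be sketched with the standard library. This is a hypothetical helper, not the actual function in src/utils/container_utils.py (whose name and detection logic the article does not show); it assumes host.docker.internal is the right target and ignores any user:password prefix in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_localhost(url, container_host="host.docker.internal"):
    """Point a localhost URL at the Docker host; leave other URLs untouched."""
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url
    # Preserve the port, if one was given.
    netloc = container_host if parts.port is None else f"{container_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(rewrite_localhost("http://localhost:11434"))
# http://host.docker.internal:11434
print(rewrite_localhost("http://ollama.internal:11434"))
# http://ollama.internal:11434
```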
Where are the default model identifiers configured?
Default models are defined in src/config/model_constants.py as constants like OPENAI_DEFAULT_MODEL and ANTHROPIC_DEFAULT_MODEL. The service references these values when setting the default: true flag on the appropriate model object in the returned lists.