# How to Manage LLM and Embedding Models with ModelsService in OpenRAG

> Discover and validate LLM and embedding models from OpenAI, Anthropic, Ollama, and IBM Watson X with OpenRAG's ModelsService, without spending inference credits.

- Repository: [langflow-ai/openrag](https://github.com/langflow-ai/openrag)
- Tags: how-to-guide
- Published: 2026-03-13

---

**The ModelsService in OpenRAG provides a centralized, async interface to discover and validate language and embedding models across OpenAI, Anthropic, Ollama, and IBM Watson X without consuming inference credits.**

OpenRAG (langflow-ai/openrag) ships with a dedicated **ModelsService** that abstracts provider-specific APIs into a unified curation layer. Whether you are building a custom UI or automating model selection in scripts, this service handles credential validation, filtering, and default model detection. This guide walks you through the architecture and implementation details found in [`src/services/models_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py).

## Understanding the ModelsService Architecture

The service acts as a singleton dependency injected into FastAPI routes via `get_models_service` in [`src/dependencies.py`](https://github.com/langflow-ai/openrag/blob/main/src/dependencies.py). It coordinates between provider constants and raw API responses to return sanitized model lists.

| Component | Role | Key File |
|-----------|------|----------|
| **ModelsService** core | Retrieves and curates model lists per provider. Handles lightweight validation, sorting, and default-model detection. | [`src/services/models_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py) |
| **Provider constants** | Lists of known LLM IDs and default model identifiers for each provider. | [`src/config/model_constants.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/model_constants.py) |
| **FastAPI models endpoint** | Exposes a RESTful GET route at `/v1/models/{provider}` that delegates to ModelsService. | [`src/api/v1/models.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/models.py) |
| **Dependency injection** | Supplies a singleton ModelsService instance to FastAPI routes. | `get_models_service` in [`src/dependencies.py`](https://github.com/langflow-ai/openrag/blob/main/src/dependencies.py) |
| **Utility helpers** | URL transformation for localhost and container-aware networking. | [`src/utils/container_utils.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/container_utils.py) |

The service validates API keys or endpoints by performing lightweight metadata calls rather than executing expensive inference requests.
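The singleton behaviour described above can be sketched with the standard library alone. The empty class and the `lru_cache` approach below are assumptions for illustration; the source only states that `get_models_service` supplies one shared instance, not how it does so.

```python
from functools import lru_cache


class ModelsService:
    """Stand-in for the real class in src/services/models_service.py."""


@lru_cache(maxsize=1)
def get_models_service() -> ModelsService:
    # Caching a zero-argument function returns the same object on every call,
    # which gives singleton behaviour without a module-level global.
    return ModelsService()


first = get_models_service()
second = get_models_service()
print(first is second)
```

In a FastAPI route, a provider function like this is normally passed as `Depends(get_models_service)` so that every request handler receives the shared instance.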

## Provider-Specific Implementation Details

`ModelsService` implements a dedicated async method for each provider, and that method handles authentication, filtering, and capability detection.

### OpenAI Model Discovery

In [`src/services/models_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py), the `get_openai_models` method calls `GET https://api.openai.com/v1/models` using the supplied key. It filters the raw catalog against `OPENAI_VALIDATION_MODELS` from [`src/config/model_constants.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/model_constants.py), retaining only validated LLMs and any model whose ID starts with `text-embedding` for embeddings. The service marks `gpt-4o` as the default language model and `text-embedding-3-small` as the default embedding model.
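The filtering rule described above can be sketched as a pure function. The allow-list contents and the function name `curate_openai_models` are hypothetical; the real list lives in `OPENAI_VALIDATION_MODELS`, and this sketch does not reproduce any sorting the service applies.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-ins for the constants in src/config/model_constants.py.
VALIDATION_MODELS = {"gpt-4o", "gpt-4o-mini"}
DEFAULT_LLM = "gpt-4o"
DEFAULT_EMBEDDING = "text-embedding-3-small"


def to_option(model_id: str, default_id: str) -> dict:
    """Build one entry in the value/label/default shape the service returns."""
    return {"value": model_id, "label": model_id, "default": model_id == default_id}


def curate_openai_models(model_ids: Iterable[str]) -> Dict[str, List[dict]]:
    """Split a raw catalog into curated language and embedding options."""
    language, embedding = [], []
    for model_id in model_ids:
        if model_id.startswith("text-embedding"):
            embedding.append(to_option(model_id, DEFAULT_EMBEDDING))
        elif model_id in VALIDATION_MODELS:
            language.append(to_option(model_id, DEFAULT_LLM))
        # Everything else (audio, image, legacy models) is dropped.
    return {"language_models": language, "embedding_models": embedding}


catalog = ["whisper-1", "gpt-4o", "text-embedding-3-small", "gpt-4o-mini"]
print(curate_openai_models(catalog))
```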

### Anthropic Model Curation

For Anthropic, the service queries `https://api.anthropic.com/v1/models` and filters results against `ANTHROPIC_VALIDATION_MODELS`. The implementation flags `claude-sonnet-4-5-20250929` as the default model. This validation occurs without invoking the Messages API, ensuring no token costs are incurred during discovery.
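To make the "no Messages API" point concrete, here is a minimal sketch of what a read-only listing request looks like. The header names follow Anthropic's public HTTP API conventions rather than the OpenRAG source, and the helper name and version value are assumptions.

```python
from typing import Dict, Tuple

ANTHROPIC_MODELS_URL = "https://api.anthropic.com/v1/models"


def build_anthropic_models_request(api_key: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (method, url, headers) for a model listing call; nothing is sent."""
    headers = {
        "x-api-key": api_key,
        # Anthropic's HTTP API expects a version header; the exact value
        # OpenRAG sends is an assumption here.
        "anthropic-version": "2023-06-01",
    }
    return "GET", ANTHROPIC_MODELS_URL, headers


method, url, headers = build_anthropic_models_request("sk-ant-placeholder")
print(method, url, sorted(headers))
```

Because the request is a plain GET against the model catalog, it carries no prompt and produces no tokens to bill.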

### Ollama Local Model Detection

The Ollama implementation queries `/api/tags` to list local models, then inspects each model via `/api/show` to check its `capabilities` array. Language models must support both `completion` and `tools`, while embedding models require the `embedding` capability. The service uses a default pattern matching `"gpt-oss"` to pre-select recommended local models. For containerized deployments, [`src/utils/container_utils.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/container_utils.py) handles localhost URL rewriting.
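The capability rules above reduce to two set checks. The following is a sketch under stated assumptions: the function names are hypothetical, and treating the `"gpt-oss"` pattern as a substring match is a guess, since the source does not say how the pattern is applied.

```python
from typing import Iterable, List


def model_roles(capabilities: Iterable[str]) -> List[str]:
    """Map the capabilities array from /api/show to the roles a model can fill."""
    caps = set(capabilities)
    roles = []
    # Language models must support both completion and tool calling.
    if {"completion", "tools"} <= caps:
        roles.append("language")
    # Embedding models only need the embedding capability.
    if "embedding" in caps:
        roles.append("embedding")
    return roles


def is_recommended(model_name: str, pattern: str = "gpt-oss") -> bool:
    """Assumed substring match against the default pattern."""
    return pattern in model_name


print(model_roles(["completion", "tools"]))
print(model_roles(["completion"]))
print(model_roles(["embedding"]))
```

A model that reports only `completion` yields no roles, which is why some locally installed models may not appear in either list.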

### IBM Watson X Integration

IBM Watson X support requires obtaining an IAM bearer token first, then calling the foundation-model specification endpoint twice: once for text-chat models and once for embedding models. The service treats the first model in each returned list as the default. All network errors are logged with structured logging and raised as generic exceptions, which the API layer translates into HTTP 500 responses.
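The "first model is the default" rule can be illustrated in a few lines. The helper name and the placeholder model IDs are hypothetical; only the positional default comes from the description above.

```python
from typing import List, Sequence


def first_as_default(model_ids: Sequence[str]) -> List[dict]:
    """Mark the first entry as default, mirroring the positional rule."""
    return [
        {"value": model_id, "label": model_id, "default": index == 0}
        for index, model_id in enumerate(model_ids)
    ]


# Placeholder IDs, not real Watson X model names.
print(first_as_default(["chat-model-a", "chat-model-b"]))
```

One consequence of this rule is that the default depends on the order the provider returns, so it may change between calls if the catalog changes.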

## How to Use ModelsService in Your Code

You can consume the service directly in async Python scripts or through the provided REST endpoint.

### Direct Async Usage

Instantiate `ModelsService` and call provider-specific methods to retrieve structured model lists:

```python
import asyncio
import os

from src.services.models_service import ModelsService


async def list_openai_models():
    service = ModelsService()
    # Read the key from the environment so it is never committed to source control
    api_key = os.environ["OPENAI_API_KEY"]
    models = await service.get_openai_models(api_key=api_key)
    print("LLM models:", models["language_models"])
    print("Embedding models:", models["embedding_models"])


asyncio.run(list_openai_models())
```

Each method returns a dictionary containing `language_models` and `embedding_models`, where each entry is a list of objects with `value`, `label`, and `default` boolean fields.

### REST API Endpoint

The FastAPI endpoint in [`src/api/v1/models.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/models.py) exposes a GET route at `/v1/models/{provider}`. Use standard HTTP clients to fetch curated lists:

```bash
curl -X GET "http://localhost:8000/v1/models/openai" \
     -H "Authorization: Bearer <your-api-key>"
```

The JSON response follows this structure:

```json
{
  "language_models": [
    {"value":"gpt-4o","label":"gpt-4o","default":true},
    {"value":"gpt-4","label":"gpt-4","default":false}
  ],
  "embedding_models": [
    {"value":"text-embedding-3-small","label":"text-embedding-3-small","default":true}
  ]
}
```
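A client consuming this response usually needs the default entry from each list. The sketch below parses the sample body shown above; `pick_default` is a hypothetical helper, and its fallback to the first entry is a design choice of this example rather than documented service behaviour.

```python
import json
from typing import List, Optional

RESPONSE_BODY = """
{
  "language_models": [
    {"value": "gpt-4o", "label": "gpt-4o", "default": true},
    {"value": "gpt-4", "label": "gpt-4", "default": false}
  ],
  "embedding_models": [
    {"value": "text-embedding-3-small", "label": "text-embedding-3-small", "default": true}
  ]
}
"""


def pick_default(options: List[dict]) -> Optional[str]:
    """Return the value flagged as default, else the first entry, else None."""
    for option in options:
        if option.get("default"):
            return option["value"]
    return options[0]["value"] if options else None


payload = json.loads(RESPONSE_BODY)
print(pick_default(payload["language_models"]))
print(pick_default(payload["embedding_models"]))
```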

### Scripting with Custom Providers

For multi-provider scripts, branch logic based on the provider name and pass credentials accordingly:

```python
import asyncio

from src.services.models_service import ModelsService


async def get_provider_models(provider, **creds):
    service = ModelsService()
    if provider == "ollama":
        return await service.get_ollama_models(endpoint=creds.get("endpoint"))
    if provider == "watsonx":
        return await service.get_ibm_models(
            endpoint=creds["endpoint"],
            api_key=creds["api_key"],
            project_id=creds["project_id"],
        )
    # Fail loudly instead of hitting an unbound variable for unknown providers
    raise ValueError(f"Unsupported provider: {provider!r}")


# Example: Ollama on local machine
models = asyncio.run(get_provider_models("ollama", endpoint="http://localhost:11434"))
print(models)
```
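When a script needs several providers at once, the lookups can run concurrently so that one unreachable provider does not block the rest. This is a sketch with stub coroutines standing in for the real service methods; `fetch_all` and both stubs are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def fetch_all(fetchers: Dict[str, Callable[[], Awaitable[dict]]]) -> Dict[str, object]:
    """Run every fetcher concurrently and keep exceptions as results."""
    names = list(fetchers)
    results = await asyncio.gather(
        *(fetchers[name]() for name in names),
        return_exceptions=True,  # a failing provider yields its exception instead of aborting
    )
    return dict(zip(names, results))


async def stub_ok() -> dict:
    # Stand-in for a successful provider call.
    return {"language_models": [], "embedding_models": []}


async def stub_down() -> dict:
    # Stand-in for a provider that cannot be reached.
    raise RuntimeError("provider unreachable")


results = asyncio.run(fetch_all({"ollama": stub_ok, "watsonx": stub_down}))
for name, outcome in results.items():
    print(name, "failed" if isinstance(outcome, Exception) else "ok")
```

With the real service, each stub would be replaced by a zero-argument callable such as `lambda: service.get_ollama_models(endpoint=...)`.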

## Summary

- **ModelsService** centralizes model discovery across OpenAI, Anthropic, Ollama, and IBM Watson X in [`src/services/models_service.py`](https://github.com/langflow-ai/openrag/blob/main/src/services/models_service.py).
- Credential validation uses lightweight metadata endpoints rather than inference calls, preventing accidental credit consumption.
- Provider-specific filters and default identifiers are defined in [`src/config/model_constants.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/model_constants.py).
- The service returns standardized dictionaries with `language_models` and `embedding_models` arrays containing `value`, `label`, and `default` fields.
- FastAPI integration via [`src/api/v1/models.py`](https://github.com/langflow-ai/openrag/blob/main/src/api/v1/models.py) provides RESTful access at `/v1/models/{provider}`.

## Frequently Asked Questions

### How does ModelsService validate provider credentials without consuming credits?

The service performs read-only metadata calls, such as the `/v1/models` endpoints of OpenAI and Anthropic, rather than invoking chat completion or embedding generation APIs. These metadata requests confirm that the API key is accepted and the provider is reachable without executing model inference or incurring token costs.

### What is the difference between language models and embedding models in the service output?

`language_models` lists LLMs capable of chat completion and tool use, while `embedding_models` contains models specialized for vector generation. In the returned JSON or dictionary, each category is a separate array of objects with `value` (model ID), `label` (display name), and `default` (boolean flag) properties.

### Can I use ModelsService with a local Ollama instance running in Docker?

Yes. Pass the container-accessible endpoint URL (e.g., `http://host.docker.internal:11434`) to `get_ollama_models`. The [`src/utils/container_utils.py`](https://github.com/langflow-ai/openrag/blob/main/src/utils/container_utils.py) module handles localhost-to-host networking translation automatically, ensuring the service can reach Ollama across container boundaries.
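The kind of translation involved can be sketched with `urllib.parse`. This is an illustration only, not the code in `container_utils.py`: the function name, the set of hosts treated as local, and the `host.docker.internal` target are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def rewrite_localhost(url: str, container_host: str = "host.docker.internal") -> str:
    """Swap a loopback hostname for one that is reachable from inside a container."""
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url  # already points at a non-loopback host
    # Rebuild the network location, keeping the port if one was given.
    # Credentials embedded in the URL are not preserved by this sketch.
    netloc = container_host if parts.port is None else f"{container_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(rewrite_localhost("http://localhost:11434"))
print(rewrite_localhost("http://ollama.internal:11434/api/tags"))
```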

### Where are the default model identifiers configured?

Default models are defined in [`src/config/model_constants.py`](https://github.com/langflow-ai/openrag/blob/main/src/config/model_constants.py) as constants like `OPENAI_DEFAULT_MODEL` and `ANTHROPIC_DEFAULT_MODEL`. The service references these values when setting the `default: true` flag on the appropriate model object in the returned lists.