# How to Use Vector Stores and Embeddings for RAG Implementations in Microsoft Agent Framework

> Learn to implement RAG in Microsoft Agent Framework using vector stores and embeddings. Generate vectors with OpenAIEmbeddingClient, persist documents in a vector store, and attach the store to an Agent for retrieval-augmented generation.

- Repository: [Microsoft/agent-framework](https://github.com/microsoft/agent-framework)
- Tags: how-to-guide
- Published: 2026-04-05

---

**Create a vector store with `client.client.vector_stores.create`, attach it to an `Agent` through `get_file_search_tool()` for automatic retrieval-augmented generation, and use the `OpenAIEmbeddingClient` when you need to generate dense vectors yourself.**

The **Microsoft Agent Framework** provides first-class support for Retrieval-Augmented Generation (RAG) by combining embedding clients with vector store backends. This architecture enables agents to retrieve relevant document chunks during chat turns, grounding LLM responses in external knowledge. The framework implements this pattern through three core components: an embedding client for vectorization, a vector store for persistence, and a file-search tool that exposes storage to the agent runtime.

## Understanding the RAG Architecture

The RAG pipeline in Agent Framework consists of three integrated components that handle the flow from raw text to retrieved context.

### The Embedding Client

The **`OpenAIEmbeddingClient`** generates dense embeddings from raw text using providers like OpenAI, Azure OpenAI, or Ollama. Located in [`python/packages/openai/agent_framework_openai/_embedding_client.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/openai/agent_framework_openai/_embedding_client.py) (lines 316-324), this client supports single text, batch processing, and custom dimension configurations. It exposes the `get_embeddings()` method which returns vector representations that can be stored or compared for similarity search.
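To make "compared for similarity search" concrete, the following is a minimal sketch of cosine similarity over plain Python lists. The three-dimensional vectors are made-up values standing in for real embeddings, and `cosine_similarity` is an illustrative helper, not a framework API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]
doc_far = [0.0, 1.0, 0.0]

# The document pointing in nearly the same direction as the query scores higher.
print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common default for text embeddings.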

### The Vector Store

Vector stores persist embeddings alongside original documents, enabling efficient similarity queries. The framework supports native provider stores (such as OpenAI File Search or Foundry) and custom backends like Redis Search. In [`python/samples/02-agents/providers/openai/client_with_file_search.py`](https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/providers/openai/client_with_file_search.py) (lines 22-36), the `create_vector_store` helper demonstrates store creation via `client.client.vector_stores.create`, uploading files, and linking them to the storage backend.
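The idea of keeping embeddings next to their source documents can be sketched with a toy in-memory store. `InMemoryVectorStore` is a hypothetical class written for illustration only; it is not how OpenAI File Search, Foundry, or Redis are implemented, and the two-dimensional vectors are invented.

```python
import math
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Toy store that keeps each embedding next to its source text."""

    records: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.records.append((text, vector))

    def search(self, query: list[float], top_k: int = 1) -> list[str]:
        """Return the top_k texts ranked by cosine similarity to the query."""

        def score(vector: list[float]) -> float:
            dot = sum(q * v for q, v in zip(query, vector))
            norms = math.hypot(*query) * math.hypot(*vector)
            return dot / norms if norms else 0.0

        ranked = sorted(self.records, key=lambda rec: score(rec[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


store = InMemoryVectorStore()
store.add("The weather is sunny today.", [0.9, 0.1])
store.add("Machine learning is fascinating.", [0.1, 0.9])
print(store.search([1.0, 0.0], top_k=1))
```

A real backend replaces the linear scan with an index so that queries stay fast as the number of documents grows.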

### The File-Search Tool

The **`get_file_search_tool()`** method in [`python/packages/openai/agent_framework_openai/_chat_client.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/openai/agent_framework_openai/_chat_client.py) (lines 1087-1128) creates a `FileSearchToolParam` configuration that attaches vector store IDs to an Agent. When the agent runs, this configuration is sent with the chat request, file search retrieves the most relevant chunks from the listed stores, and the model receives them as context for its answer.
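For orientation, a file-search tool entry in the OpenAI API is a small mapping with a `type` of `file_search` and a list of `vector_store_ids`, plus optional settings such as `max_num_results`. The sketch below builds such a mapping by hand. `build_file_search_tool` is a hypothetical helper, not the framework's implementation, and the exact fields the framework sets may differ.

```python
from typing import Any, Optional


def build_file_search_tool(
    vector_store_ids: list[str],
    max_num_results: Optional[int] = None,
) -> dict[str, Any]:
    """Build a dict shaped like an OpenAI file_search tool entry (illustrative)."""
    if not vector_store_ids:
        raise ValueError("at least one vector store ID is required")
    tool: dict[str, Any] = {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
    }
    # Only include the optional limit when the caller asked for one.
    if max_num_results is not None:
        tool["max_num_results"] = max_num_results
    return tool


# "vs_example_123" is a placeholder ID, not a real vector store.
print(build_file_search_tool(["vs_example_123"], max_num_results=5))
```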

## Implementing OpenAI File-Search RAG

The quickest path to RAG uses OpenAI's native vector store and file search capabilities. This approach handles embedding generation internally, requiring only document upload and tool configuration.

The following implementation from [`python/samples/02-agents/providers/openai/client_with_file_search.py`](https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/providers/openai/client_with_file_search.py) demonstrates the complete workflow:

```python
import asyncio

from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

# create_vector_store and delete_vector_store are helper functions
# defined in the sample file referenced above.


async def main() -> None:
    client = OpenAIChatClient()

    # ① Upload a document and create a vector store
    file_id, vector_store_id = await create_vector_store(client)

    try:
        # ② Build the agent with the file-search tool
        agent = Agent(
            client=client,
            instructions="You are a helpful assistant that can search through files.",
            tools=[client.get_file_search_tool(vector_store_ids=[vector_store_id])],
        )

        # ③ Run a query that requires document lookup
        response = await agent.run("What is the weather today? Do a file search.")
        print(f"Agent: {response}")
    finally:
        # Cleanup: remove the vector store and the uploaded file even if the run fails
        await delete_vector_store(client, file_id, vector_store_id)


asyncio.run(main())
```

Key implementation details:
- **`create_vector_store`** handles the `client.client.vector_stores.create` call and file upload
- **`get_file_search_tool`** accepts a list of `vector_store_ids` and returns a tool configuration the Agent invokes automatically during the chat turn
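The two helpers used by the example can be sketched as follows. This is an approximation under stated assumptions, not a verbatim copy of the sample: it assumes `client.client` is the underlying asynchronous OpenAI SDK client, and the file name, file contents, and store name are placeholders.

```python
from typing import Any


async def create_vector_store(client: Any) -> tuple[str, str]:
    """Upload one small document, create a vector store, and link the two."""
    # Assumption: client.client exposes the async OpenAI SDK surface.
    file = await client.client.files.create(
        file=("todays_weather.txt", b"The weather today is sunny with a high of 75F."),
        purpose="user_data",
    )
    vector_store = await client.client.vector_stores.create(name="knowledge_base")
    # Attach the file and wait until the service has finished processing it.
    result = await client.client.vector_stores.files.create_and_poll(
        vector_store_id=vector_store.id,
        file_id=file.id,
    )
    if result.last_error is not None:
        raise RuntimeError(f"File processing failed: {result.last_error.message}")
    return file.id, vector_store.id


async def delete_vector_store(client: Any, file_id: str, vector_store_id: str) -> None:
    """Delete the vector store first, then the uploaded file."""
    await client.client.vector_stores.delete(vector_store_id=vector_store_id)
    await client.client.files.delete(file_id=file_id)
```

Polling until the file is processed matters here: a query issued before indexing completes may return no matches.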

## Generating Embeddings Directly with OpenAIEmbeddingClient

For scenarios requiring explicit control over vectorization—such as custom indexing, caching, or hybrid search—use the `OpenAIEmbeddingClient` directly. This class provides batch processing, dimension customization, and provider-agnostic configuration.

Example implementation showcasing batch and custom dimension support:

```python
import asyncio
import os

from agent_framework.openai import OpenAIEmbeddingClient
from dotenv import load_dotenv

load_dotenv()  # read OPENAI_API_KEY from a local .env file, if present


async def main() -> None:
    client = OpenAIEmbeddingClient(
        model="text-embedding-3-small",
        api_key=os.getenv("OPENAI_API_KEY"),
    )

    # Single text embedding
    result = await client.get_embeddings(["Hello, world!"])
    print("Dimensions:", result[0].dimensions)

    # Batch processing: one embedding is returned per input string
    batch = await client.get_embeddings([
        "The weather is sunny today.",
        "Machine learning is fascinating.",
    ])
    print("Batch size:", len(batch))

    # Custom dimensions (e.g., for Azure OpenAI)
    custom = await client.get_embeddings(
        ["Custom dimensions example"],
        options={"dimensions": 256},
    )
    print("Custom dims:", custom[0].dimensions)


asyncio.run(main())
```

The `get_embeddings()` method accepts an optional `options` dictionary for provider-specific parameters like `dimensions`, making it compatible with both standard OpenAI and Azure OpenAI deployments.
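When a model or deployment does not accept the `dimensions` option, one alternative described in OpenAI's embeddings documentation for the `text-embedding-3` family is to truncate the vector yourself and rescale it to unit length. The sketch below shows that step in plain Python. `shorten_embedding` is a hypothetical helper and the input values are invented; whether truncation preserves enough quality depends on the model.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` components and rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # an all-zero vector cannot be rescaled
    return [x / norm for x in truncated]


full = [3.0, 4.0, 12.0, 0.5]  # made-up values, not a real embedding
short = shorten_embedding(full, 2)
print(len(short), short)
```

Re-normalizing keeps dot products and cosine similarities comparable across the shortened vectors.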

## Building Hybrid RAG with Redis (Vector + Keyword)

For production scenarios requiring both **semantic similarity** and **keyword matching**, the Redis context provider integrates with `OpenAITextVectorizer` to enable hybrid search. This configuration stores vectors in a RediSearch index while maintaining full-text search capabilities, all within the agent runtime.

The following example from [`python/samples/02-agents/context_providers/redis/redis_conversation.py`](https://github.com/microsoft/agent-framework/blob/main/python/samples/02-agents/context_providers/redis/redis_conversation.py) demonstrates the setup:

```python
import asyncio
import os

from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework.redis import RedisContextProvider
from dotenv import load_dotenv
from redisvl.extensions.cache.embeddings import EmbeddingsCache
from redisvl.utils.vectorize import OpenAITextVectorizer

load_dotenv()


async def main() -> None:
    # ① Configure vectorizer with Redis caching
    vectorizer = OpenAITextVectorizer(
        model="text-embedding-ada-002",
        api_config={"api_key": os.getenv("OPENAI_API_KEY")},
        cache=EmbeddingsCache(
            name="openai_embeddings_cache",
            redis_url="redis://localhost:6379",
        ),
    )

    # ② Initialize Redis context provider for hybrid search
    provider = RedisContextProvider(
        source_id="redis_context",
        redis_url="redis://localhost:6379",
        index_name="redis_conversation",
        prefix="redis_conversation",
        application_id="my_app",
        agent_id="my_agent",
        user_id="user123",
        redis_vectorizer=vectorizer,
        vector_field_name="vector",
        vector_algorithm="hnsw",
        vector_distance_metric="cosine",
    )

    # ③ Create Foundry client (any LLM client works)
    client = FoundryChatClient(
        project_endpoint=os.getenv("FOUNDRY_PROJECT_ENDPOINT"),
        model=os.getenv("FOUNDRY_MODEL"),
        # Placeholder: supply a real Azure credential object here,
        # for example azure.identity.AzureCliCredential().
        credential=None,
    )

    # ④ Wire provider to Agent; no explicit tools needed
    agent = Agent(
        client=client,
        name="HybridRAGAgent",
        instructions="Use stored context to answer queries.",
        tools=[],
        context_providers=[provider],
    )

    session = agent.create_session()
    result = await agent.run("What is the capital of France?", session=session)
    print("Agent:", result)

    # Cleanup: drop the Redis index created for this example
    await provider.redis_index.delete()


asyncio.run(main())
```

Critical configuration parameters in [`python/packages/redis/agent_framework_redis/_context_provider.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/redis/agent_framework_redis/_context_provider.py) (lines 61-80) include:
- **`vector_algorithm`**: Set to `hnsw` for approximate nearest neighbor search
- **`vector_distance_metric`**: Use `cosine` for semantic similarity matching
- **`redis_vectorizer`**: The `OpenAITextVectorizer` instance that embeds each query at retrieval time
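To illustrate what "hybrid" means conceptually, the sketch below blends a vector similarity score with a simple keyword-overlap score. It is not how Redis or `RedisContextProvider` computes relevance. `keyword_score`, `hybrid_score`, the `alpha` weight, and the similarity values are all assumptions made for this example.

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (case-insensitive)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_score(vector_similarity: float, keyword: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the vector signal, 1 - alpha the keyword signal."""
    return alpha * vector_similarity + (1.0 - alpha) * keyword


# Hypothetical candidates paired with made-up vector similarities to the query.
query = "capital of France"
candidates = [
    ("Paris is the capital of France", 0.80),
    ("Berlin is a large European city", 0.82),
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c[1], keyword_score(query, c[0])),
    reverse=True,
)
print(ranked[0][0])
```

In this toy data the second candidate has a slightly higher vector similarity, but the exact term matches in the first candidate move it to the top once the keyword signal is blended in.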

## Summary

- **Use `OpenAIEmbeddingClient`** ([`_embedding_client.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/openai/agent_framework_openai/_embedding_client.py), lines 316-324) to generate dense vectors with support for batch processing and custom dimensions
- **Create vector stores** via `client.client.vector_stores.create` for native OpenAI File Search, or configure `RedisContextProvider` for hybrid vector + keyword retrieval
- **Attach retrieval tools** using `OpenAIChatClient.get_file_search_tool()` ([`_chat_client.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/openai/agent_framework_openai/_chat_client.py), lines 1087-1128) to enable automatic context injection during agent execution
- **Implement hybrid RAG** by combining `OpenAITextVectorizer` with `RedisContextProvider` when you need both semantic similarity and full-text search capabilities

## Frequently Asked Questions

### How does the Agent Framework automatically retrieve documents during a chat turn?

When you attach a file-search tool via `get_file_search_tool()`, its configuration accompanies the chat requests the agent sends. According to the source code in [`python/packages/openai/agent_framework_openai/_chat_client.py`](https://github.com/microsoft/agent-framework/blob/main/python/packages/openai/agent_framework_openai/_chat_client.py) (lines 1087-1128), the method builds a `FileSearchToolParam` configuration that lists the vector store IDs. File search is a hosted OpenAI tool, so the search against those stores runs on the service side when the model decides to use it, and the retrieved chunks become part of the context the model uses to produce its final answer.

### Can I use Azure OpenAI instead of standard OpenAI for embeddings?

Yes. The `OpenAIEmbeddingClient` accepts configuration parameters that support Azure deployments. Pass your Azure endpoint and API key to the client constructor, and use the `options` parameter in `get_embeddings()` to specify Azure-specific settings such as custom dimensions or deployment names. The client interface remains identical regardless of the backend provider.

### What is the difference between using OpenAI's native vector store versus Redis for RAG?

OpenAI's native vector store (`client.client.vector_stores.create`) provides managed embedding generation and retrieval optimized for OpenAI models, requiring minimal configuration. Redis implementations (`RedisContextProvider`) offer hybrid search, which combines vector similarity with keyword matching, and full control over the embedding process through `OpenAITextVectorizer`. Choose Redis when you need on-premises storage, custom indexing logic, or retrieval that combines semantic and lexical matching.

### How do I configure custom embedding dimensions for Azure OpenAI?

When calling `OpenAIEmbeddingClient.get_embeddings()`, pass a dictionary to the `options` parameter containing the `dimensions` key set to your desired value (e.g., `options={"dimensions": 256}`). This maps to the `dimensions` parameter of the embeddings API, which the `text-embedding-3` model family supports for shortening vectors. Smaller vectors lower storage costs and speed up retrieval, usually at some cost in search quality, so check the chosen size against your own retrieval results.