How to Use Vector Stores and Embeddings for RAG Implementations in Microsoft Agent Framework

April 5, 2026 microsoft/agent-framework

Upload documents to a vector store created with client.client.vector_stores.create and attach the store to an Agent through get_file_search_tool() for automatic retrieval-augmented generation. Use the OpenAIEmbeddingClient when you need to generate dense vectors yourself.

The Microsoft Agent Framework provides first-class support for Retrieval-Augmented Generation (RAG) by combining embedding clients with vector store backends. This architecture enables agents to retrieve relevant document chunks during chat turns, grounding LLM responses in external knowledge. The framework implements this pattern through three core components: an embedding client for vectorization, a vector store for persistence, and a file-search tool that exposes storage to the agent runtime.

Understanding the RAG Architecture

The RAG pipeline in Agent Framework consists of three integrated components that handle the flow from raw text to retrieved context.

The Embedding Client

The OpenAIEmbeddingClient generates dense embeddings from raw text using providers like OpenAI, Azure OpenAI, or Ollama. Located in python/packages/openai/agent_framework_openai/_embedding_client.py (lines 316-324), this client supports single-text input, batch processing, and custom dimension settings. It exposes the get_embeddings() method, which returns vector representations that can be stored or compared for similarity search.

The Vector Store

Vector stores persist embeddings alongside original documents, enabling efficient similarity queries. The framework supports native provider stores (such as OpenAI File Search or Foundry) and custom backends like Redis Search. In python/samples/02-agents/providers/openai/client_with_file_search.py (lines 22-36), the create_vector_store helper demonstrates store creation via client.client.vector_stores.create, uploading files, and linking them to the storage backend.
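To make the idea of "embeddings stored alongside documents" concrete, here is a minimal in-memory sketch. The InMemoryVectorStore class and its method names are hypothetical and not part of Agent Framework, and the 3-dimensional vectors are toy stand-ins for real embeddings. It only illustrates the general pattern: keep each text next to its vector and rank by cosine similarity at query time.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Hypothetical illustration only; not an Agent Framework class."""

    _items: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        # Persist the original text next to its embedding.
        self._items.append((text, vector))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Score every stored item, then return the top_k most similar.
        scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add("The weather is sunny today.", [0.9, 0.1, 0.0])
store.add("Machine learning is fascinating.", [0.0, 0.2, 0.9])
hits = store.search([1.0, 0.0, 0.0], top_k=1)
print(hits[0][0])
```

A managed store such as OpenAI File Search or a Redis index replaces the linear scan with an index, but the contract (text in, ranked chunks out) is the same.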

The File-Search Tool

The get_file_search_tool() method in python/packages/openai/agent_framework_openai/_chat_client.py (lines 1087-1128) creates a FileSearchToolParam configuration that attaches vector store IDs to an Agent. When the agent runs, the framework automatically invokes the tool, fetches the most relevant chunks, and injects them into the chat request context.
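In the OpenAI Python SDK, FileSearchToolParam is a typed dictionary describing the Responses API file_search tool. The sketch below builds a plain dict of that shape. The build_file_search_tool helper is hypothetical, the store ID is a placeholder, and it is an assumption that get_file_search_tool() produces an equivalent structure, since the source only states that it creates a FileSearchToolParam configuration.

```python
from typing import Any, Optional


def build_file_search_tool(
    vector_store_ids: list[str],
    max_num_results: Optional[int] = None,
) -> dict[str, Any]:
    """Build a dict shaped like the file_search tool (hypothetical helper)."""
    if not vector_store_ids:
        raise ValueError("at least one vector store ID is required")
    tool: dict[str, Any] = {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
    }
    # max_num_results is optional; omit the key entirely when it is not set.
    if max_num_results is not None:
        tool["max_num_results"] = max_num_results
    return tool


tool = build_file_search_tool(["vs_placeholder_id"], max_num_results=5)
print(tool)
```

Seeing the shape makes it clear that the agent only carries store IDs; the documents themselves stay in the vector store.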

Implementing OpenAI File-Search RAG

The quickest path to RAG uses OpenAI's native vector store and file search capabilities. This approach handles embedding generation internally, requiring only document upload and tool configuration.

The following implementation from python/samples/02-agents/providers/openai/client_with_file_search.py demonstrates the complete workflow:

import asyncio

from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

# create_vector_store and its delete_vector_store counterpart are helper
# functions from the same sample file; they are not part of the framework API.


async def main() -> None:
    client = OpenAIChatClient()

    # ① Upload a document and create a vector store
    file_id, vector_store_id = await create_vector_store(client)

    # ② Build the agent with the file-search tool
    agent = Agent(
        client=client,
        instructions="You are a helpful assistant that can search through files.",
        tools=[client.get_file_search_tool(vector_store_ids=[vector_store_id])],
    )

    # ③ Run a query that requires document lookup
    response = await agent.run("What is the weather today? Do a file search.")
    print(f"Agent: {response}")

    # Cleanup: remove the uploaded file and the vector store
    await delete_vector_store(client, file_id, vector_store_id)


if __name__ == "__main__":
    asyncio.run(main())

Key implementation details:

create_vector_store handles the client.client.vector_stores.create call and file upload
get_file_search_tool accepts a list of vector_store_ids and returns a tool configuration the Agent invokes automatically during the chat turn
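The two helpers referenced above are not reproduced in this article. A plausible sketch follows, assuming client.client is an openai.AsyncOpenAI instance (the source implies this through the client.client.vector_stores.create call). The file name, file contents, and store name are illustrative values, and the repository's actual helpers may differ in detail.

```python
from typing import Any


async def create_vector_store(client: Any) -> tuple[str, str]:
    """Upload one sample file and index it in a new vector store.

    Assumption: client.client behaves like openai.AsyncOpenAI.
    """
    openai_client = client.client
    # Upload the raw document first; the vector store references it by ID.
    file = await openai_client.files.create(
        file=("todays_weather.txt", b"The weather today is sunny with a high of 75F."),
        purpose="user_data",
    )
    vector_store = await openai_client.vector_stores.create(name="knowledge_base")
    # Attach the file and wait until the service has finished indexing it.
    result = await openai_client.vector_stores.files.create_and_poll(
        vector_store_id=vector_store.id,
        file_id=file.id,
    )
    if result.last_error is not None:
        raise RuntimeError(f"Vector store indexing failed: {result.last_error.message}")
    return file.id, vector_store.id


async def delete_vector_store(client: Any, file_id: str, vector_store_id: str) -> None:
    """Remove the vector store, then the uploaded file."""
    openai_client = client.client
    await openai_client.vector_stores.delete(vector_store_id=vector_store_id)
    await openai_client.files.delete(file_id=file_id)
```

Polling until indexing completes matters here: a query issued before the file is processed would search an empty store.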

Generating Embeddings Directly with OpenAIEmbeddingClient

For scenarios requiring explicit control over vectorization—such as custom indexing, caching, or hybrid search—use the OpenAIEmbeddingClient directly. This class provides batch processing, dimension customization, and provider-agnostic configuration.

Example implementation showcasing batch and custom dimension support:

import asyncio
import os

from agent_framework.openai import OpenAIEmbeddingClient
from dotenv import load_dotenv

# Load OPENAI_API_KEY from a local .env file, if one exists
load_dotenv()


async def main() -> None:
    client = OpenAIEmbeddingClient(
        model="text-embedding-3-small",
        api_key=os.getenv("OPENAI_API_KEY"),
    )

    # Single text embedding (the input is still passed as a list)
    result = await client.get_embeddings(["Hello, world!"])
    print("Dimensions:", result[0].dimensions)

    # Batch processing: one call, one embedding per input text
    batch = await client.get_embeddings([
        "The weather is sunny today.",
        "Machine learning is fascinating.",
    ])
    print("Batch size:", len(batch))

    # Custom dimensions (supported by the text-embedding-3 model family)
    custom = await client.get_embeddings(
        ["Custom dimensions example"],
        options={"dimensions": 256},
    )
    print("Custom dims:", custom[0].dimensions)


if __name__ == "__main__":
    asyncio.run(main())

The get_embeddings() method accepts an optional options dictionary for provider-specific parameters like dimensions, making it compatible with both standard OpenAI and Azure OpenAI deployments.
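Caching is one of the reasons given above for calling the embedding client directly. The following sketch shows one way to avoid re-embedding identical texts. CachedEmbedder is a hypothetical wrapper, not an Agent Framework class, and it assumes an async callable that maps a list of texts to a list of plain float vectors, so results from get_embeddings() would need to be adapted to that shape first.

```python
import hashlib
from typing import Awaitable, Callable

# Assumed signature of the wrapped embedding function (hypothetical).
EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


class CachedEmbedder:
    """Hypothetical cache keyed by a SHA-256 hash of each text."""

    def __init__(self, embed_fn: EmbedFn) -> None:
        self._embed_fn = embed_fn
        self._cache: dict[str, list[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Collect texts that are not cached yet, without duplicates.
        missing: list[str] = []
        seen: set[str] = set()
        for text in texts:
            key = self._key(text)
            if key not in self._cache and key not in seen:
                seen.add(key)
                missing.append(text)
        # Embed all cache misses in a single batched call.
        if missing:
            vectors = await self._embed_fn(missing)
            for text, vector in zip(missing, vectors):
                self._cache[self._key(text)] = vector
        # Return vectors in the same order as the input texts.
        return [self._cache[self._key(text)] for text in texts]
```

Batching the misses keeps the number of provider calls at one per embed() invocation, regardless of how many inputs repeat.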

Building Hybrid RAG with Redis (Vector + Keyword)

For production scenarios requiring both semantic similarity and keyword matching, the Redis context provider integrates with OpenAITextVectorizer to enable hybrid search. This configuration stores vectors in a RediSearch index while maintaining full-text search capabilities, all within the agent runtime.

The following example from python/samples/02-agents/context_providers/redis/redis_conversation.py demonstrates the setup:

import asyncio
import os

from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework.redis import RedisContextProvider
from dotenv import load_dotenv
from redisvl.extensions.cache.embeddings import EmbeddingsCache
from redisvl.utils.vectorize import OpenAITextVectorizer

# Load API keys and endpoints from a local .env file, if one exists
load_dotenv()


async def main() -> None:
    # ① Configure vectorizer with Redis caching
    vectorizer = OpenAITextVectorizer(
        model="text-embedding-ada-002",
        api_config={"api_key": os.getenv("OPENAI_API_KEY")},
        cache=EmbeddingsCache(
            name="openai_embeddings_cache",
            redis_url="redis://localhost:6379",
        ),
    )

    # ② Initialize Redis context provider for hybrid search
    provider = RedisContextProvider(
        source_id="redis_context",
        redis_url="redis://localhost:6379",
        index_name="redis_conversation",
        prefix="redis_conversation",
        application_id="my_app",
        agent_id="my_agent",
        user_id="user123",
        redis_vectorizer=vectorizer,
        vector_field_name="vector",
        vector_algorithm="hnsw",
        vector_distance_metric="cosine",
    )

    # ③ Create Foundry client (any LLM client works)
    client = FoundryChatClient(
        project_endpoint=os.getenv("FOUNDRY_PROJECT_ENDPOINT"),
        model=os.getenv("FOUNDRY_MODEL"),
        credential=None,  # placeholder: supply a credential for your Foundry project
    )

    # ④ Wire provider to Agent; no explicit tools are needed
    agent = Agent(
        client=client,
        name="HybridRAGAgent",
        instructions="Use stored context to answer queries.",
        tools=[],
        context_providers=[provider],
    )

    session = agent.create_session()
    result = await agent.run("What is the capital of France?", session=session)
    print("Agent:", result)

    # Cleanup: drop the Redis index created for this run
    await provider.redis_index.delete()


if __name__ == "__main__":
    asyncio.run(main())

Critical configuration parameters in python/packages/redis/agent_framework_redis/_context_provider.py (lines 61-80) include:

vector_algorithm: Set to hnsw for approximate nearest neighbor search
vector_distance_metric: Use cosine for semantic similarity matching
redis_vectorizer: The OpenAITextVectorizer instance that embeds queries on-the-fly during retrieval
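To illustrate what "hybrid" means in practice, the sketch below blends a precomputed vector similarity with a simple keyword-overlap score. This is a generic weighted-sum illustration and not the scoring that RedisContextProvider or Redis performs internally, which the source does not describe. All function names and the alpha weight are hypothetical.

```python
def keyword_score(query: str, document: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms) / len(query_terms)


def hybrid_score(vector_similarity: float, lexical: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha weights the vector side, (1 - alpha) the keyword side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_similarity + (1.0 - alpha) * lexical


def rank(
    query: str,
    candidates: list[tuple[str, float]],
    alpha: float = 0.7,
) -> list[tuple[str, float]]:
    """Rank (document, vector_similarity) pairs by their blended score."""
    scored = [
        (doc, hybrid_score(similarity, keyword_score(query, doc), alpha))
        for doc, similarity in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    ("paris is the capital of france", 0.60),
    ("france has many rivers", 0.70),
]
ranked = rank("capital of france", candidates, alpha=0.5)
print(ranked[0][0])
```

With alpha=0.5 the first document wins because it matches every query term, even though its vector similarity is lower. With alpha=1.0 the ranking falls back to pure vector similarity.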

Summary

Use OpenAIEmbeddingClient (_embedding_client.py, lines 316-324) to generate dense vectors with support for batch processing and custom dimensions
Create vector stores via client.client.vector_stores.create for native OpenAI File Search, or configure RedisContextProvider for hybrid vector + keyword retrieval
Attach retrieval tools using OpenAIChatClient.get_file_search_tool() (_chat_client.py, lines 1087-1128) to enable automatic context injection during agent execution
Implement hybrid RAG by combining OpenAITextVectorizer with RedisContextProvider when you need both semantic similarity and full-text search capabilities

Frequently Asked Questions

How does the Agent Framework automatically retrieve documents during a chat turn?

When you attach a file-search tool via get_file_search_tool(), the Agent Framework intercepts the LLM's request to search and calls the underlying vector store API. According to the source code in python/packages/openai/agent_framework_openai/_chat_client.py (lines 1087-1128), the method builds a FileSearchToolParam configuration that the OpenAI client uses to fetch relevant chunks. The framework then injects these chunks into the prompt context before sending the final request to the LLM.

Can I use Azure OpenAI instead of standard OpenAI for embeddings?

Yes. The OpenAIEmbeddingClient accepts configuration parameters that support Azure deployments. Pass your Azure endpoint and API key to the client constructor, and use the options parameter in get_embeddings() to specify Azure-specific settings such as custom dimensions or deployment names. The client interface remains identical regardless of the backend provider.

What is the difference between using OpenAI's native vector store versus Redis for RAG?

OpenAI's native vector store (client.vector_stores.create) provides managed embedding generation and retrieval optimized for OpenAI models, requiring minimal configuration. Redis implementations (RedisContextProvider) offer hybrid search capabilities—combining vector similarity with keyword matching—and full control over the embedding process through OpenAITextVectorizer. Choose Redis when you need on-premise storage, custom indexing logic, or retrieval that combines semantic and lexical matching.

How do I configure custom embedding dimensions for Azure OpenAI?

When calling OpenAIEmbeddingClient.get_embeddings(), pass a dictionary to the options parameter containing the dimensions key set to your desired value (e.g., options={"dimensions": 256}). The dimensions parameter is supported by the text-embedding-3 model family and reduces vector size, which can lower storage costs and improve retrieval speed, usually at a modest cost in retrieval quality.
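If full-size vectors are already stored, OpenAI's embeddings guide describes an alternative for the text-embedding-3 models: truncate the vector and re-normalize it to unit length. A minimal sketch of that post-processing step follows. The shorten_embedding function is a hypothetical helper, not part of Agent Framework, and requesting the size up front through the dimensions option remains the simpler route.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` components and rescale to unit L2 norm."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        return truncated  # nothing to rescale for an all-zero vector
    return [x / norm for x in truncated]


short = shorten_embedding([3.0, 4.0, 12.0], 2)
print(short)
```

Re-normalizing matters because cosine and dot-product comparisons assume vectors of comparable length; a truncated vector that is not rescaled would skew the scores.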
