How to Use Vector Stores and Embeddings for RAG Implementations in Microsoft Agent Framework
Use the OpenAIEmbeddingClient when you need dense vectors directly. For managed retrieval, create a vector store via client.client.vector_stores.create and attach it to an Agent through get_file_search_tool() for automatic retrieval-augmented generation.
The Microsoft Agent Framework provides first-class support for Retrieval-Augmented Generation (RAG) by combining embedding clients with vector store backends. This architecture enables agents to retrieve relevant document chunks during chat turns, grounding LLM responses in external knowledge. The framework implements this pattern through three core components: an embedding client for vectorization, a vector store for persistence, and a file-search tool that exposes storage to the agent runtime.
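Retrieval operates on document chunks rather than whole files. Managed stores split uploaded files themselves, but when you index documents on your own, a chunking step comes first. The following is a minimal sketch of fixed-size chunking with overlap; the function name and the default sizes are illustrative assumptions, not part of the framework.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.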
Understanding the RAG Architecture
The RAG pipeline in Agent Framework consists of three integrated components that handle the flow from raw text to retrieved context.
The Embedding Client
The OpenAIEmbeddingClient generates dense embeddings from raw text using providers like OpenAI, Azure OpenAI, or Ollama. Located in python/packages/openai/agent_framework_openai/_embedding_client.py (lines 316-324), this client supports single text, batch processing, and custom dimension configurations. It exposes the get_embeddings() method which returns vector representations that can be stored or compared for similarity search.
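To make "compared for similarity search" concrete, here is a small sketch of cosine similarity over two vectors. Plain lists of floats stand in for the vectors returned by get_embeddings(); how you extract them from the result objects is not shown here and depends on the framework's embedding type.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate texts with similar meaning; values near 0 indicate unrelated texts.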
The Vector Store
Vector stores persist embeddings alongside original documents, enabling efficient similarity queries. The framework supports native provider stores (such as OpenAI File Search or Foundry) and custom backends like Redis Search. In python/samples/02-agents/providers/openai/client_with_file_search.py (lines 22-36), the create_vector_store helper demonstrates store creation via client.client.vector_stores.create, uploading files, and linking them to the storage backend.
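As a mental model for what a vector store does, the sketch below keeps texts next to their vectors in memory and answers top-k similarity queries by brute force. It is an illustration only: the class name is hypothetical, and real backends such as OpenAI File Search or Redis use persistent storage and indexed search instead of a linear scan.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryVectorStore:
    """Toy store: a list of (text, vector) rows searched linearly."""

    rows: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, vector: list[float]) -> None:
        self.rows.append((text, list(vector)))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Score every stored row, then keep the best top_k matches.
        scored = [(text, _cosine(query_vector, vec)) for text, vec in self.rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```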
The File-Search Tool
The get_file_search_tool() method in python/packages/openai/agent_framework_openai/_chat_client.py (lines 1087-1128) creates a FileSearchToolParam configuration that attaches vector store IDs to an Agent. When the agent runs, the framework automatically invokes the tool, fetches the most relevant chunks, and injects them into the chat request context.
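For orientation, the sketch below builds a dictionary in the shape the OpenAI Responses API uses for its file_search tool (a type, a list of vector store IDs, and an optional result limit). The helper name is hypothetical and this is not the framework's implementation of get_file_search_tool(); it only illustrates the kind of configuration that ends up attached to the request.

```python
from typing import Any, Optional


def build_file_search_tool(
    vector_store_ids: list[str],
    max_num_results: Optional[int] = None,
) -> dict[str, Any]:
    """Return a file_search tool configuration (illustrative helper)."""
    if not vector_store_ids:
        raise ValueError("at least one vector store ID is required")
    tool: dict[str, Any] = {
        "type": "file_search",
        "vector_store_ids": list(vector_store_ids),
    }
    # Only include the limit when the caller sets one.
    if max_num_results is not None:
        tool["max_num_results"] = max_num_results
    return tool
```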
Implementing OpenAI File-Search RAG
The quickest path to RAG uses OpenAI's native vector store and file search capabilities. This approach handles embedding generation internally, requiring only document upload and tool configuration.
The following implementation from python/samples/02-agents/providers/openai/client_with_file_search.py demonstrates the complete workflow:
import asyncio

from agent_framework import Agent
from agent_framework.openai import OpenAIChatClient

# create_vector_store() and delete_vector_store() are helpers defined in the sample file.


async def main() -> None:
    client = OpenAIChatClient()

    # ① Upload a document and create a vector store
    file_id, vector_store_id = await create_vector_store(client)

    # ② Build the agent with the file-search tool
    agent = Agent(
        client=client,
        instructions="You are a helpful assistant that can search through files.",
        tools=[client.get_file_search_tool(vector_store_ids=[vector_store_id])],
    )

    # ③ Run a query that requires document lookup
    response = await agent.run("What is the weather today? Do a file search.")
    print(f"Agent: {response}")

    # Cleanup
    await delete_vector_store(client, file_id, vector_store_id)


asyncio.run(main())
Key implementation details:
- create_vector_store handles the client.client.vector_stores.create call and file upload
- get_file_search_tool accepts a list of vector_store_ids and returns a tool configuration the Agent invokes automatically during the chat turn
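The two helpers called in the example are not reproduced in this article. The following is a sketch of what they might look like, built on OpenAI Python SDK calls (files.create, vector_stores.create, vector_stores.files.create_and_poll). The file name, file content, store name, and expiry policy are assumptions for illustration, and the sample in the repository may differ.

```python
from typing import Any


async def create_vector_store(client: Any) -> tuple[str, str]:
    """Upload one small file and index it in a new vector store.

    `client` is expected to expose the underlying OpenAI SDK client as
    `client.client`, as the sample's `client.client.vector_stores.create` implies.
    """
    # Hypothetical document content, used only for this sketch.
    file = await client.client.files.create(
        file=("todays_weather.txt", b"The weather today is sunny with a high of 75F."),
        purpose="assistants",
    )
    vector_store = await client.client.vector_stores.create(
        name="knowledge_base",
        expires_after={"anchor": "last_active_at", "days": 1},
    )
    # Attach the file and wait until the service has finished indexing it.
    result = await client.client.vector_stores.files.create_and_poll(
        vector_store_id=vector_store.id,
        file_id=file.id,
    )
    if result.last_error is not None:
        raise RuntimeError(f"Vector store file processing failed: {result.last_error}")
    return file.id, vector_store.id


async def delete_vector_store(client: Any, file_id: str, vector_store_id: str) -> None:
    """Remove the vector store first, then the uploaded file."""
    await client.client.vector_stores.delete(vector_store_id=vector_store_id)
    await client.client.files.delete(file_id=file_id)
```

Polling until indexing completes matters here: a query issued before the file is processed would search an empty store.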
Generating Embeddings Directly with OpenAIEmbeddingClient
For scenarios requiring explicit control over vectorization—such as custom indexing, caching, or hybrid search—use the OpenAIEmbeddingClient directly. This class provides batch processing, dimension customization, and provider-agnostic configuration.
Example implementation showcasing batch and custom dimension support:
import asyncio
import os

from agent_framework.openai import OpenAIEmbeddingClient
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    client = OpenAIEmbeddingClient(
        model="text-embedding-3-small",
        api_key=os.getenv("OPENAI_API_KEY"),
    )

    # Single text embedding
    result = await client.get_embeddings(["Hello, world!"])
    print("Dimensions:", result[0].dimensions)

    # Batch processing
    batch = await client.get_embeddings([
        "The weather is sunny today.",
        "Machine learning is fascinating.",
    ])
    print("Batch size:", len(batch))

    # Custom dimensions (e.g., for Azure OpenAI)
    custom = await client.get_embeddings(
        ["Custom dimensions example"],
        options={"dimensions": 256},
    )
    print("Custom dims:", custom[0].dimensions)


asyncio.run(main())
The get_embeddings() method accepts an optional options dictionary for provider-specific parameters like dimensions, making it compatible with both standard OpenAI and Azure OpenAI deployments.
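Caching is one of the reasons given above for calling the embedding client directly. The sketch below wraps any async embedding function in an in-memory cache keyed by a hash of the text, so repeated inputs are not sent to the provider twice. CachedEmbedder and the embed_fn signature are hypothetical; adapting get_embeddings() results to plain lists of floats is left to the caller.

```python
import hashlib
from typing import Awaitable, Callable

# Any coroutine that maps a list of texts to a list of vectors.
EmbedFn = Callable[[list[str]], Awaitable[list[list[float]]]]


class CachedEmbedder:
    """Memoize embeddings per text for the lifetime of the process."""

    def __init__(self, embed_fn: EmbedFn) -> None:
        self._embed_fn = embed_fn
        self._cache: dict[str, list[float]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    async def embed(self, texts: list[str]) -> list[list[float]]:
        # Collect texts not yet cached, de-duplicated and in input order.
        missing: list[str] = []
        seen: set[str] = set()
        for text in texts:
            key = self._key(text)
            if key not in self._cache and key not in seen:
                seen.add(key)
                missing.append(text)

        # One batched call for everything that is missing.
        if missing:
            vectors = await self._embed_fn(missing)
            for text, vector in zip(missing, vectors):
                self._cache[self._key(text)] = vector

        return [self._cache[self._key(text)] for text in texts]
```

A cache like this is only valid for a single model and dimension setting; changing either requires a fresh cache.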
Building Hybrid RAG with Redis (Vector + Keyword)
For production scenarios requiring both semantic similarity and keyword matching, the Redis context provider integrates with OpenAITextVectorizer to enable hybrid search. This configuration stores vectors in a RediSearch index while maintaining full-text search capabilities, all within the agent runtime.
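To illustrate the idea behind hybrid retrieval, the sketch below blends a vector similarity score with a simple keyword-overlap score using a weight alpha. This is a conceptual model only: both function names and the linear blend are assumptions, and Redis applies its own full-text scoring and query logic rather than this formula.

```python
def keyword_score(query: str, document: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    document_terms = set(document.lower().split())
    return len(query_terms & document_terms) / len(query_terms)


def hybrid_score(vector_similarity: float, keyword: float, alpha: float = 0.7) -> float:
    """Weighted blend: alpha for the semantic score, (1 - alpha) for keywords."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_similarity + (1.0 - alpha) * keyword
```

A higher alpha favours semantic matches, while a lower alpha favours exact terms such as product codes or names.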
The following example from python/samples/02-agents/context_providers/redis/redis_conversation.py demonstrates the setup:
import asyncio
import os

from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient
from agent_framework.redis import RedisContextProvider
from dotenv import load_dotenv
from redisvl.extensions.cache.embeddings import EmbeddingsCache
from redisvl.utils.vectorize import OpenAITextVectorizer

load_dotenv()


async def main() -> None:
    # ① Configure vectorizer with Redis caching
    vectorizer = OpenAITextVectorizer(
        model="text-embedding-ada-002",
        api_config={"api_key": os.getenv("OPENAI_API_KEY")},
        cache=EmbeddingsCache(
            name="openai_embeddings_cache",
            redis_url="redis://localhost:6379",
        ),
    )

    # ② Initialize Redis context provider for hybrid search
    provider = RedisContextProvider(
        source_id="redis_context",
        redis_url="redis://localhost:6379",
        index_name="redis_conversation",
        prefix="redis_conversation",
        application_id="my_app",
        agent_id="my_agent",
        user_id="user123",
        redis_vectorizer=vectorizer,
        vector_field_name="vector",
        vector_algorithm="hnsw",
        vector_distance_metric="cosine",
    )

    # ③ Create Foundry client (any LLM client works)
    client = FoundryChatClient(
        project_endpoint=os.getenv("FOUNDRY_PROJECT_ENDPOINT"),
        model=os.getenv("FOUNDRY_MODEL"),
        credential=None,
    )

    # ④ Wire provider to Agent; no explicit tools needed
    agent = Agent(
        client=client,
        name="HybridRAGAgent",
        instructions="Use stored context to answer queries.",
        tools=[],
        context_providers=[provider],
    )

    session = agent.create_session()
    result = await agent.run("What is the capital of France?", session=session)
    print("Agent:", result)

    # Cleanup
    await provider.redis_index.delete()


asyncio.run(main())
Critical configuration parameters in python/packages/redis/agent_framework_redis/_context_provider.py (lines 61-80) include:
- vector_algorithm: Set to hnsw for approximate nearest neighbor search
- vector_distance_metric: Use cosine for semantic similarity matching
- redis_vectorizer: The OpenAITextVectorizer instance that embeds queries on-the-fly during retrieval
Summary
- Use OpenAIEmbeddingClient (_embedding_client.py, lines 316-324) to generate dense vectors with support for batch processing and custom dimensions
- Create vector stores via client.client.vector_stores.create for native OpenAI File Search, or configure RedisContextProvider for hybrid vector + keyword retrieval
- Attach retrieval tools using OpenAIChatClient.get_file_search_tool() (_chat_client.py, lines 1087-1128) to enable automatic context injection during agent execution
- Implement hybrid RAG by combining OpenAITextVectorizer with RedisContextProvider when you need both semantic similarity and full-text search capabilities
Frequently Asked Questions
How does the Agent Framework automatically retrieve documents during a chat turn?
When you attach a file-search tool via get_file_search_tool(), the Agent Framework intercepts the LLM's request to search and calls the underlying vector store API. According to the source code in python/packages/openai/agent_framework_openai/_chat_client.py (lines 1087-1128), the method builds a FileSearchToolParam configuration that the OpenAI client uses to fetch relevant chunks. The framework then injects these chunks into the prompt context before sending the final request to the LLM.
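Conceptually, the injection step amounts to placing the retrieved chunks ahead of the user's question. With the file-search tool this happens for you, so the sketch below is only an illustration of the idea; the function name and the prompt wording are assumptions and do not reflect the framework's internal prompt format.

```python
def build_grounded_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Combine the top retrieved chunks and the question into one prompt."""
    selected = chunks[:max_chunks]
    # Number the chunks so an answer can point back to its source.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(selected, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```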
Can I use Azure OpenAI instead of standard OpenAI for embeddings?
Yes. The OpenAIEmbeddingClient accepts configuration parameters that support Azure deployments. Pass your Azure endpoint and API key to the client constructor, and use the options parameter in get_embeddings() to specify Azure-specific settings such as custom dimensions or deployment names. The client interface remains identical regardless of the backend provider.
What is the difference between using OpenAI's native vector store versus Redis for RAG?
OpenAI's native vector store (client.client.vector_stores.create) provides managed embedding generation and retrieval optimized for OpenAI models, requiring minimal configuration. Redis implementations (RedisContextProvider) offer hybrid search, combining vector similarity with keyword matching, and give you full control over the embedding process through OpenAITextVectorizer. Choose Redis when you need on-premise storage, custom indexing logic, or retrieval that combines semantic and lexical matching.
How do I configure custom embedding dimensions for Azure OpenAI?
When calling OpenAIEmbeddingClient.get_embeddings(), pass a dictionary to the options parameter containing the dimensions key set to your desired value (e.g., options={"dimensions": 256}). This maps to the dimensions parameter of the embeddings API, which is supported by the text-embedding-3 model family and not by text-embedding-ada-002. Smaller vectors lower storage costs and speed up retrieval, usually at the cost of some retrieval accuracy, so validate search quality on your own data before settling on a size.
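The storage effect of reducing dimensions is simple arithmetic. The sketch below estimates raw vector storage, assuming 4-byte float32 values and ignoring index overhead and metadata; the function name is hypothetical.

```python
def embedding_storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw bytes needed to store the vectors alone (no index, no metadata)."""
    return num_vectors * dimensions * bytes_per_value


# One million vectors at 1536 dimensions (the text-embedding-3-small default)
# versus the same corpus reduced to 256 dimensions.
full_size = embedding_storage_bytes(1_000_000, 1536)
reduced_size = embedding_storage_bytes(1_000_000, 256)
print(full_size, reduced_size, full_size // reduced_size)
```

Under these assumptions the reduced vectors need one sixth of the space, roughly 1 GB instead of 6 GB.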