Implementing RAG with Vector Databases like Pinecone and Weaviate: A Complete Guide

Implementing RAG with vector databases like Pinecone and Weaviate involves chunking documents into manageable segments, generating vector embeddings using OpenAI's text-embedding-ada-002, storing these vectors with metadata in your chosen database, retrieving relevant context through similarity search, and injecting that retrieved context into LLM prompts to generate accurate, grounded answers.

This guide distills production-ready patterns from the openai/openai-cookbook repository, providing concrete implementation details for both managed and self-hosted vector database architectures. Whether you choose Pinecone for its serverless scalability or Weaviate for its built-in vectorization modules, the core Retrieval-Augmented Generation pipeline remains consistent: embed, store, retrieve, and generate.

Document Ingestion and Chunking

The first stage of any RAG pipeline transforms raw documents into vector-ready chunks. Long documents must be segmented to fit within embedding model token limits and LLM context windows.

Chunking strategy typically uses recursive character splitting to maintain semantic coherence while respecting token boundaries. For the OpenAI text-embedding-ada-002 model, chunks of 500 tokens or fewer work effectively, though the specific size depends on your content density.
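
As a rough illustration of recursive character splitting, the sketch below tries coarse separators first (paragraphs, then lines, sentences, and words) and only falls back to a hard cut when nothing else fits. The function name `recursive_split`, the separator list, and the use of a character budget as a stand-in for tokens (about 4 characters per token for English text is a common rule of thumb) are assumptions made for this example, not code from the cookbook.

```python
def recursive_split(text, max_chars=2000, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars, preferring coarse separators."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    # Pick the coarsest separator that actually occurs in the text.
    for sep in separators:
        if sep in text:
            parts = text.split(sep)
            break
    else:
        # No separator available: fall back to fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    finer = separators[separators.index(sep) + 1:]
    chunks, current = [], ""
    for part in parts:
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= max_chars:
            current = candidate  # keep packing parts into the current chunk
            continue
        if current:
            chunks.append(current)  # the separator at a chunk boundary is dropped
        if len(part) > max_chars:
            # A single part is still too large: recurse with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
            current = ""
        else:
            current = part
    if current:
        chunks.append(current)
    return chunks
```

With the default budget of 2000 characters, chunks land near the 500-token guideline for typical English prose; a tokenizer-based length check would be more exact if your content is dense or multilingual.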

Embedding generation converts each chunk into a 1536-dimensional vector:

import openai

def embed_chunks(chunks):
    response = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=chunks
    )
    return [item.embedding for item in response.data]

Both Pinecone and Weaviate expect these vectors alongside metadata—such as source document names, page numbers, or chunk indices—which enables filtered retrieval and source attribution in the final output.
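
One way to keep that metadata attached to each chunk is to build a small record per chunk before embedding. This is a minimal sketch; the helper name `build_records`, the `chunk_index` and `page` keys, and the `source-index` ID scheme are hypothetical choices for illustration.

```python
def build_records(chunks, source, page=None):
    """Pair each chunk with an ID and a metadata dict for later filtering."""
    records = []
    for i, chunk in enumerate(chunks):
        metadata = {"text": chunk, "source": source, "chunk_index": i}
        if page is not None:
            metadata["page"] = page  # only stored when the caller knows it
        # Hypothetical ID scheme: source name plus chunk position.
        records.append({"id": f"{source}-{i}", "metadata": metadata})
    return records
```

Keeping the chunk text inside the metadata mirrors the upsert examples in this guide, where the retrieved metadata is what gets injected into the prompt.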

Vector Store Setup and Configuration

While both databases serve the same RAG function, their initialization patterns differ significantly. Pinecone operates as a fully managed service with explicit index management, while Weaviate offers flexible deployment options including self-hosting and schema-driven configuration.

Pinecone Initialization

Pinecone requires explicit API key initialization and index creation before data ingestion. The index must be provisioned with the correct dimension (1536 for OpenAI embeddings) and distance metric (typically cosine similarity).

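import os
from pinecone import Pinecone, ServerlessSpec

# Pinecone client v3+ interface; older releases used pinecone.init() instead
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

index_name = "rag-demo"
# list_indexes().names() returns the names of existing indexes
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # matches text-embedding-ada-002 output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2")
    )

index = pc.Index(index_name)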

Weaviate Initialization

Weaviate uses a schema-first approach where you define classes with specified properties and vectorization modules. The text2vec-openai module allows Weaviate to handle embedding generation automatically during data import, eliminating manual embedding steps.

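import os
import weaviate

client = weaviate.Client(
    url="https://your-instance.weaviate.network",
    auth_client_secret=weaviate.auth.AuthApiKey(
        api_key=os.getenv("WEAVIATE_API_KEY")
    ),
    # text2vec-openai needs an OpenAI key to vectorize on import and at query time
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
)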

class_obj = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]}
    ]
}
client.schema.create_class(class_obj)

Upserting Vectors and Metadata

After initialization, you must persist your document chunks. Both systems support batch operations and metadata storage, though their APIs differ in structure.

Pinecone Upsert Pattern

Pinecone requires manual embedding generation followed by explicit upsert operations. Each vector requires a unique ID, the embedding vector itself, and a metadata dictionary:

documents = [
    {"text": "The Eiffel Tower is located in Paris.", "source": "wiki-001"},
    {"text": "Python was created by Guido van Rossum.", "source": "wiki-002"}
]

embeddings = embed_chunks([d["text"] for d in documents])

vectors = [
    (f"id-{i}", embedding, {"text": doc["text"], "source": doc["source"]})
    for i, (embedding, doc) in enumerate(zip(embeddings, documents))
]

index.upsert(vectors=vectors, batch_size=100)

Weaviate Batch Import

When using the text2vec-openai module, Weaviate handles vectorization internally. You import raw text objects and the database generates embeddings automatically:

client.batch.configure(timeout_retries=3, dynamic=True)

with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            data_object={"text": doc["text"], "source": doc["source"]},
            class_name="Document"
        )

For manual embedding control in Weaviate (matching Pinecone's approach), you would supply the vector parameter directly in add_data_object, though this bypasses the convenience of the built-in module.
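
A minimal sketch of that manual path, assuming the v3 Python client used in this guide and embeddings computed beforehand (for example with embed_chunks). The helper name `import_with_vectors` is hypothetical; `batch` is expected to be the object yielded by `with client.batch as batch:`.

```python
def import_with_vectors(batch, documents, embeddings, class_name="Document"):
    """Add objects to a Weaviate batch, supplying precomputed vectors."""
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")
    for doc, embedding in zip(documents, embeddings):
        batch.add_data_object(
            data_object={"text": doc["text"], "source": doc["source"]},
            class_name=class_name,
            vector=embedding,  # supplied vector is stored instead of a generated one
        )
    return len(documents)
```

If every object carries its own vector, setting the class's "vectorizer" to "none" in the schema is a common choice so that Weaviate never calls an embedding provider on import.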

Query Embedding and Retrieval

The retrieval phase converts user queries into embeddings and queries the vector store for semantically similar chunks.

Similarity Search in Pinecone

Pinecone uses a direct query method where you embed the query and specify the number of results (top_k) and metadata inclusion:

def retrieve_context(query, top_k=5):
    query_embedding = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    ).data[0].embedding
    
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    return [match["metadata"]["text"] for match in results["matches"]]
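
The metadata stored at upsert time can also narrow the search. The sketch below restricts matches to a single source using Pinecone's metadata filter syntax and drops low-scoring matches afterwards. The helper name `retrieve_filtered` and the `min_score` cutoff are assumptions added for illustration; the index is passed in explicitly so the function does not depend on a global.

```python
def retrieve_filtered(index, query_embedding, source, top_k=5, min_score=0.0):
    """Query a Pinecone index, restricting matches to one source document."""
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True,
        filter={"source": {"$eq": source}},  # only vectors whose metadata matches
    )
    # Keep the chunk text of matches that clear the (hypothetical) score cutoff.
    return [
        match["metadata"]["text"]
        for match in results["matches"]
        if match["score"] >= min_score
    ]
```

A sensible cutoff depends on the metric and the embedding model, so it is worth inspecting real scores from your own index before fixing a value.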

Similarity and Hybrid Search in Weaviate

Weaviate uses a GraphQL-style query interface. The with_near_vector method performs similarity search, while with_hybrid combines vector similarity with BM25 keyword matching in a single query:

def retrieve_context(query, top_k=5):
    query_embedding = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    ).data[0].embedding
    
    # Pure vector similarity
    result = client.query.get(
        "Document", ["text", "source"]
    ).with_near_vector({
        "vector": query_embedding,
        "certainty": 0.7
    }).with_limit(top_k).do()
    
    return [hit["text"] for hit in result["data"]["Get"]["Document"]]

For hybrid search (vector + lexical), use with_hybrid(query=query, alpha=0.5), where alpha balances keyword versus vector weight (0 is pure keyword search, 1 is pure vector search), as demonstrated in examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb.
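
To build intuition for what alpha does, the following sketch blends two score sets with a simple weighted sum after min-max normalization. It is an illustration of the weighting idea only: Weaviate performs its fusion server-side and its actual algorithm may differ. The function name `blend_scores` and the sample scores are made up for this example.

```python
def blend_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend two {doc_id: score} dicts; alpha=1 is vector-only, alpha=0 is keyword-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    def normalize(scores):
        # Rescale to [0, 1] so the two score ranges are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc_id: 1.0 for doc_id in scores}
        return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

    vec, kw = normalize(vector_scores), normalize(keyword_scores)
    blended = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


vector_scores = {"a": 0.9, "b": 0.5}      # e.g. cosine similarities
keyword_scores = {"b": 12.0, "c": 3.0}    # e.g. BM25 scores
print(blend_scores(vector_scores, keyword_scores, alpha=1.0)[0][0])  # a
print(blend_scores(vector_scores, keyword_scores, alpha=0.0)[0][0])  # b
```

Moving alpha toward 0 favors documents that match the query's exact terms, which tends to help with identifiers and proper names; moving it toward 1 favors semantic similarity.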

Prompt Construction and Generation

Once you retrieve relevant chunks, you concatenate them into a context block and inject them into the LLM prompt. This pattern is identical regardless of which vector database you use:

def generate_answer(query, context_chunks):
    context = "\n\n".join(context_chunks)
    
    prompt = f"""You are a helpful assistant. Use the following context to answer the question accurately. If the context doesn't contain the answer, say "I don't have enough information."

Context:
{context}

Question: {query}
Answer:"""
    
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )
    
    return response.choices[0].message.content

The examples/chatgpt/rag-quickstart/pinecone-retool/gpt-action-pinecone-retool-rag.ipynb file in the cookbook demonstrates this end-to-end flow integrated as a ChatGPT action.
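
Because the context block grows with every retrieved chunk, it can help to cap its size before building the prompt. The sketch below keeps chunks in rank order until a character budget is reached. The helper name `build_context` and the 6000-character default are assumptions for illustration; a token-based budget would be more precise.

```python
def build_context(chunks, max_chars=6000, separator="\n\n"):
    """Join retrieved chunks in rank order until the character budget is used."""
    selected, used = [], 0
    for chunk in chunks:
        # The separator only costs space once there is a previous chunk.
        cost = len(chunk) + (len(separator) if selected else 0)
        if used + cost > max_chars:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return separator.join(selected)
```

The result can be passed to the prompt template in place of the plain join. Stopping at the first chunk that does not fit preserves the retriever's ranking rather than skipping ahead to smaller, lower-ranked chunks.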

Complete Code Examples

Below are minimal, self-contained implementations you can adapt immediately.

Pinecone RAG Implementation

import os
import openai
from pinecone import Pinecone, ServerlessSpec

# Initialize clients (Pinecone client v3+ interface)

openai.api_key = os.getenv("OPENAI_API_KEY")
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Setup index

index_name = "quickstart"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2")
    )
index = pc.Index(index_name)

# Sample data ingestion

docs = [
    {"text": "The capital of France is Paris.", "source": "geography"},
    {"text": "The Python programming language was released in 1991.", "source": "history"}
]

embeddings = openai.embeddings.create(
    model="text-embedding-ada-002",
    input=[d["text"] for d in docs]
).data

vectors = [
    (f"doc-{i}", e.embedding, d) 
    for i, (e, d) in enumerate(zip(embeddings, docs))
]
index.upsert(vectors=vectors)

# RAG pipeline

def answer_question(query):
    q_embed = openai.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    
    results = index.query(vector=q_embed, top_k=3, include_metadata=True)
    context = "\n".join([m["metadata"]["text"] for m in results["matches"]])
    
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    return resp.choices[0].message.content

print(answer_question("When was Python released?"))

Weaviate RAG Implementation

import os
import openai
import weaviate

# Initialize client

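client = weaviate.Client(
    url=os.getenv("WEAVIATE_URL"),
    auth_client_secret=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
    # text2vec-openai needs an OpenAI key to vectorize imported objects
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
)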

# Define schema (run once)

schema = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "category", "dataType": ["string"]}
    ]
}
client.schema.create_class(schema)

# Import data (auto-embedded by Weaviate)

with client.batch as batch:
    batch.add_data_object(
        data_object={"content": "Machine learning is a subset of AI.", "category": "tech"},
        class_name="Article"
    )

# RAG pipeline

def answer_question(query):
    query_vec = openai.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    
    result = client.query.get(
        "Article", ["content"]
    ).with_near_vector({"vector": query_vec}).with_limit(3).do()
    
    context = "\n".join([item["content"] for item in result["data"]["Get"]["Article"]])
    
    prompt = f"Based on this context:\n{context}\n\nAnswer: {query}"
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

print(answer_question("What is machine learning?"))

Key Repository Files

The openai-cookbook repository contains detailed implementations that extend these minimal examples:

File Path | Description
examples/vector_databases/pinecone/Using_Pinecone_for_embeddings_search.ipynb | Complete Pinecone workflow including index management, upsert operations, and similarity queries
examples/vector_databases/pinecone/Using_vision_modality_for_RAG_with_Pinecone.ipynb | Extends RAG to multimodal embeddings using CLIP and vision models
examples/chatgpt/rag-quickstart/pinecone-retool/gpt-action-pinecone-retool-rag.ipynb | Production RAG setup integrated as a ChatGPT GPT action
examples/vector_databases/weaviate/Using_Weaviate_for_embeddings_search.ipynb | Weaviate fundamentals including schema creation and batch imports
examples/vector_databases/weaviate/question-answering-with-weaviate-and-openai.ipynb | Full Q&A pipeline using Weaviate's qna-openai generative module
examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb | Implementation of hybrid search combining BM25 and vector similarity
examples/utils/embeddings_utils.py | Shared utility functions for embedding generation used across multiple notebooks

Summary

  • Implementing RAG with vector databases like Pinecone and Weaviate follows a consistent four-step pattern: chunk documents, generate embeddings, retrieve similar vectors, and augment LLM prompts with retrieved context.
  • Pinecone offers explicit control over the embedding process and index configuration through direct API calls, making it ideal for managed, high-throughput production environments.
  • Weaviate provides flexible deployment options and built-in vectorization modules (text2vec-openai) that can automate embedding generation, plus built-in hybrid search combining lexical and vector relevance.
  • Both systems support metadata filtering, batch operations, and upsert semantics for incremental data updates, with implementations documented in the OpenAI Cookbook's vector database examples.

Frequently Asked Questions

What is the difference between Pinecone and Weaviate for RAG implementations?

Pinecone operates as a fully managed vector database service requiring you to generate embeddings externally using OpenAI's API before upserting vectors. Weaviate offers both managed and self-hosted options, with optional built-in modules like text2vec-openai that can generate embeddings automatically during data import, simplifying the pipeline but requiring schema configuration. Additionally, Weaviate supports hybrid search (combining BM25 keyword matching with vector similarity), while Pinecone focuses primarily on pure vector similarity search.

How do I choose the right chunk size for RAG document processing?

The optimal chunk size balances context completeness against embedding model token limits and LLM context window constraints. For text-embedding-ada-002, chunks of 400-500 tokens typically preserve semantic meaning while staying well below the model's 8191 token limit. Smaller chunks (100-200 tokens) improve retrieval precision for specific facts but may lose broader context, while larger chunks (1000+ tokens) risk diluting relevance signals and consume more of the LLM context window per retrieved result. The openai-cookbook examples demonstrate using LangChain's RecursiveCharacterTextSplitter to break text at natural boundaries like paragraphs and sentences.
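
When comparing chunk sizes, a fixed-size window with overlap is a simple baseline to test against. The sketch below counts whitespace-separated words as a rough proxy for tokens, which is an assumption for illustration (real token counts come from the model's tokenizer); the function name `window_chunks` is likewise hypothetical.

```python
def window_chunks(text, size=400, overlap=50):
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(window_chunks(sample, size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of storing some text twice.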

Can I use hybrid search with Pinecone for better RAG results?

Yes, though not in the same way as Weaviate. Pinecone does not offer a BM25-based query method like Weaviate's with_hybrid; instead, it supports hybrid retrieval through sparse-dense vectors, where you generate sparse (keyword-style) values yourself and store them alongside the dense embeddings. If you want hybrid retrieval without managing sparse vectors, which is useful when queries contain specific terminology or proper names that vector search might miss, Weaviate's with_hybrid query method is demonstrated in examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb. Alternatively, you can add a reranking layer or combine Pinecone results with a separate keyword search index manually.
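
For the manual route of combining vector results with a separate keyword index, reciprocal rank fusion (RRF) is one widely used merging technique because it needs only the rank positions, not comparable scores. This is a minimal sketch; the function name and the sample ID lists are made up, and k=60 is the constant commonly used with RRF rather than a value taken from the cookbook.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["a", "b", "c"]    # IDs from a vector query, best first
keyword_hits = ["b", "d", "a"]   # IDs from a keyword index, best first
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['b', 'a', 'd', 'c']
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept further down.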

How do I handle incremental updates when my document corpus changes?

Both databases support upsert semantics for incremental updates. In Pinecone, use index.upsert() with existing IDs to overwrite vectors or index.delete() followed by fresh upserts for modified documents. In Weaviate, use client.data_object.update() or re-import with the same UUID to replace objects. For production pipelines, maintain a mapping between source document IDs and vector IDs to efficiently invalidate stale embeddings when source content changes, as shown in the batch processing patterns within Using_Pinecone_for_embeddings_search.ipynb and Using_Weaviate_for_embeddings_search.ipynb.
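
One possible shape for that mapping is a deterministic vector ID per chunk plus a content hash, so that only changed chunks are re-embedded and leftover chunks are deleted. The sketch below is an assumption-laden illustration: the `doc_id#index` ID scheme, the helper names, and the idea of keeping `stored_hashes` in your own store are all hypothetical.

```python
import hashlib


def chunk_id(doc_id, chunk_index):
    """Deterministic vector ID, so re-ingesting a document overwrites its old vectors."""
    return f"{doc_id}#{chunk_index}"


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(doc_id, new_chunks, stored_hashes):
    """Compare new chunks against {vector_id: hash} from the previous run.

    Returns (to_upsert, to_delete): IDs whose content changed or is new,
    and IDs that no longer exist in the updated document.
    """
    new_hashes = {
        chunk_id(doc_id, i): content_hash(chunk)
        for i, chunk in enumerate(new_chunks)
    }
    to_upsert = [vid for vid, h in new_hashes.items() if stored_hashes.get(vid) != h]
    to_delete = [vid for vid in stored_hashes if vid not in new_hashes]
    return to_upsert, to_delete
```

The IDs in to_upsert are the only chunks that need new embeddings, and the IDs in to_delete can be removed from the index in one call.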
