# Implementing RAG with Vector Databases like Pinecone and Weaviate: A Complete Guide

> Learn to implement RAG with vector databases like Pinecone and Weaviate. This guide explains chunking, embeddings, storage, retrieval, and prompt augmentation for grounded LLM answers.

- Repository: [OpenAI/openai-cookbook](https://github.com/openai/openai-cookbook)
- Tags: how-to-guide
- Published: 2026-03-02

---

**Implementing RAG with vector databases like Pinecone and Weaviate involves chunking documents into manageable segments, generating vector embeddings using OpenAI's `text-embedding-ada-002`, storing these vectors with metadata in your chosen database, retrieving relevant context through similarity search, and injecting that retrieved context into LLM prompts to generate accurate, grounded answers.**

This guide distills production-ready patterns from the [openai/openai-cookbook](https://github.com/openai/openai-cookbook) repository, providing concrete implementation details for both managed and self-hosted vector database architectures. Whether you choose Pinecone for its serverless scalability or Weaviate for its built-in vectorization modules, the core Retrieval-Augmented Generation pipeline remains consistent: embed, store, retrieve, and generate.

## Document Ingestion and Chunking

The first stage of any RAG pipeline transforms raw documents into vector-ready chunks. Long documents must be segmented to fit within embedding model token limits and LLM context windows.

**Chunking strategy** typically uses recursive character splitting to maintain semantic coherence while respecting token boundaries. For the OpenAI `text-embedding-ada-002` model, chunks of 500 tokens or fewer work effectively, though the specific size depends on your content density.
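As a rough illustration of recursive character splitting, the sketch below tries coarse separators first (paragraphs, then lines, then sentences, then words) and only falls back to a hard cut when nothing else fits. The function name `recursive_split`, the separator list, and the `max_chars=2000` default are assumptions for this example; the character budget stands in for a token budget using the common heuristic of roughly four characters per English token, so a real pipeline would count tokens with the model's tokenizer instead.

```python
def recursive_split(text, max_chars=2000, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars, preferring coarse separators.

    Hypothetical helper: max_chars approximates a token budget (~4 chars/token).
    """
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []

    for sep in separators:
        if sep not in text:
            continue
        # Greedily merge neighbouring parts while they still fit in the budget
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = f"{current}{sep}{part}" if current else part
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)

        # Any piece that is still too long is split again with finer separators
        finer = separators[separators.index(sep) + 1:]
        result = []
        for chunk in chunks:
            result.extend(recursive_split(chunk, max_chars, finer))
        return result

    # No separator left: fall back to a hard cut at the character budget
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

Note that a separator falling exactly on a chunk boundary is dropped, which is usually acceptable for embedding purposes.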

**Embedding generation** converts each chunk into a 1536-dimensional vector:

```python
import openai

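def embed_chunks(chunks):
    """Embed a list of text chunks in a single API call (openai>=1.0 interface)."""
    response = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=chunks
    )
    # One embedding object is returned per input chunk
    return [item.embedding for item in response.data]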
```

Both Pinecone and Weaviate expect these vectors alongside **metadata**—such as source document names, page numbers, or chunk indices—which enables filtered retrieval and source attribution in the final output.
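A minimal sketch of what such a record might look like before it reaches either database is shown below. The helper name `build_records`, the `source#chunk-N` ID scheme, and the field names are illustrative assumptions, not something either database prescribes; the point is that a deterministic ID plus attribution metadata travels with every chunk.

```python
def build_records(source, chunks):
    """Pair each chunk with a deterministic ID and attribution metadata.

    Hypothetical helper: the ID format and metadata keys are example choices.
    """
    records = []
    for i, chunk in enumerate(chunks):
        records.append({
            "id": f"{source}#chunk-{i}",  # stable across re-ingestion runs
            "text": chunk,
            "metadata": {"source": source, "chunk_index": i},
        })
    return records


records = build_records("handbook.pdf", ["First chunk.", "Second chunk."])
print(records[1]["id"])  # handbook.pdf#chunk-1
```

Because the IDs are derived from the source name and chunk position, re-ingesting the same document overwrites its earlier vectors instead of duplicating them.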

## Vector Store Setup and Configuration

While both databases serve the same RAG function, their initialization patterns differ significantly. Pinecone operates as a fully managed service with explicit index management, while Weaviate offers flexible deployment options including self-hosting and schema-driven configuration.

### Pinecone Initialization

Pinecone requires explicit API key initialization and index creation before data ingestion. The index must be provisioned with the correct dimension (1536 for OpenAI embeddings) and distance metric (typically cosine similarity).

```python
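import os
from pinecone import Pinecone, ServerlessSpec

# Pinecone client v3+ style: create a client object instead of calling pinecone.init()
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

index_name = "rag-demo"
# list_indexes() returns index descriptions; .names() yields just the names
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # must match the embedding model's output size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2")
    )

index = pc.Index(index_name)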
```

### Weaviate Initialization

Weaviate uses a schema-first approach where you define **classes** with specified properties and vectorization modules. The `text2vec-openai` module allows Weaviate to handle embedding generation automatically during data import, eliminating manual embedding steps.

```python
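import os
import weaviate

# Weaviate Python client v3 syntax (weaviate.Client); the v4 client uses a different API
client = weaviate.Client(
    url="https://your-instance.weaviate.network",
    auth_client_secret=weaviate.auth.AuthApiKey(
        api_key=os.getenv("WEAVIATE_API_KEY")
    ),
    # text2vec-openai calls OpenAI on your behalf, so it needs your OpenAI key
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
)

class_obj = {
    "class": "Document",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "text", "dataType": ["text"]},
        # "string" is deprecated in recent Weaviate versions; "text" replaces it
        {"name": "source", "dataType": ["text"]}
    ]
}
client.schema.create_class(class_obj)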
```

## Upserting Vectors and Metadata

After initialization, you must persist your document chunks. Both systems support batch operations and metadata storage, though their APIs differ in structure.

### Pinecone Upsert Pattern

Pinecone requires manual embedding generation followed by explicit upsert operations. Each vector requires a unique ID, the embedding vector itself, and a metadata dictionary:

```python
documents = [
    {"text": "The Eiffel Tower is located in Paris.", "source": "wiki-001"},
    {"text": "Python was created by Guido van Rossum.", "source": "wiki-002"}
]

embeddings = embed_chunks([d["text"] for d in documents])

vectors = [
    (f"id-{i}", embedding, {"text": doc["text"], "source": doc["source"]})
    for i, (embedding, doc) in enumerate(zip(embeddings, documents))
]

index.upsert(vectors=vectors, batch_size=100)

```
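Since a dimension mismatch or a repeated ID only surfaces once the request reaches the index, a small pre-flight check can catch these problems locally. The sketch below is an assumption-laden illustration: `validate_vectors` is a hypothetical helper, and it expects the same `(id, values, metadata)` tuples built above.

```python
def validate_vectors(vectors, expected_dim=1536):
    """Check (id, values, metadata) tuples before sending them to the index.

    Hypothetical helper: raises on duplicate IDs, wrong dimension,
    or non-dict metadata, and returns the number of vectors checked.
    """
    seen = set()
    for vec_id, values, metadata in vectors:
        if vec_id in seen:
            raise ValueError(f"duplicate vector id: {vec_id}")
        seen.add(vec_id)
        if len(values) != expected_dim:
            raise ValueError(
                f"{vec_id}: expected {expected_dim} dimensions, got {len(values)}"
            )
        if not isinstance(metadata, dict):
            raise TypeError(f"{vec_id}: metadata must be a dict")
    return len(seen)
```

Calling `validate_vectors(vectors)` just before `index.upsert(...)` keeps the check out of the hot path while still failing fast on malformed batches.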

### Weaviate Batch Import

When using the `text2vec-openai` module, Weaviate handles vectorization internally. You import raw text objects and the database generates embeddings automatically:

```python
# batch_size sets the starting size; dynamic=True lets the client adjust it
client.batch.configure(batch_size=100, dynamic=True, timeout_retries=3)

with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            data_object={"text": doc["text"], "source": doc["source"]},
            class_name="Document"
        )

```

For manual embedding control in Weaviate (matching Pinecone's approach), you would supply the `vector` parameter directly in `add_data_object`, though this bypasses the convenience of the built-in module.
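A sketch of that manual path, assuming the v3 Python client's batch interface, might look like the following. The function name `import_with_vectors` is hypothetical, and the function takes an already opened batch object so that it stays independent of how the client was created.

```python
def import_with_vectors(batch, documents, embeddings, class_name="Document"):
    """Add objects to an open Weaviate (v3 client) batch with precomputed vectors.

    Hypothetical helper: `batch` is the object yielded by `with client.batch as batch`.
    Returns the number of objects queued.
    """
    if len(documents) != len(embeddings):
        raise ValueError("documents and embeddings must have the same length")

    for doc, vector in zip(documents, embeddings):
        batch.add_data_object(
            data_object={"text": doc["text"], "source": doc["source"]},
            class_name=class_name,
            vector=vector,  # supplied by the caller instead of text2vec-openai
        )
    return len(documents)
```

Inside a `with client.batch as batch:` block, this would be called as `import_with_vectors(batch, documents, embed_chunks([d["text"] for d in documents]))`. When every object carries its own vector, the class can also be created with `"vectorizer": "none"`.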

## Retrieval: Similarity and Hybrid Search

The retrieval phase converts user queries into embeddings and queries the vector store for semantically similar chunks.
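To make "semantically similar" concrete, the sketch below ranks a handful of stored vectors against a query by cosine similarity using brute force. It is only an illustration with made-up two-dimensional vectors and hypothetical function names; real vector databases use approximate nearest-neighbor indexes to avoid comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in stored.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy 2-dimensional "embeddings" purely for illustration
stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in top_k([1.0, 0.1], stored)])  # ['a', 'c']
```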

### Similarity Search in Pinecone

Pinecone uses a direct query method where you embed the query and specify the number of results (`top_k`) and metadata inclusion:

```python
def retrieve_context(query, top_k=5):
    query_embedding = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    ).data[0].embedding
    
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    return [match["metadata"]["text"] for match in results["matches"]]

```

### Similarity and Hybrid Search in Weaviate

Weaviate uses a GraphQL-style query interface. The `with_near_vector` method performs similarity search, while `with_hybrid` combines vector similarity with BM25 keyword matching in a single query, without requiring a separate keyword index:

```python
def retrieve_context(query, top_k=5):
    query_embedding = openai.embeddings.create(
        model="text-embedding-ada-002",
        input=query
    ).data[0].embedding
    
    # Pure vector similarity
    result = client.query.get(
        "Document", ["text", "source"]
    ).with_near_vector({
        "vector": query_embedding,
        "certainty": 0.7
    }).with_limit(top_k).do()
    
    return [hit["text"] for hit in result["data"]["Get"]["Document"]]

```

For **hybrid search** (vector + lexical), use `with_hybrid(query=query, alpha=0.5)`, where `alpha` sets the balance between the two signals: 0 is pure keyword (BM25) search and 1 is pure vector search. This is demonstrated in `examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb`.
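The effect of `alpha` can be illustrated with a simplified score blend. The sketch below is not Weaviate's actual fusion implementation; it assumes min-max normalized scores and a plain weighted sum, with hypothetical function names and made-up scores, just to show how moving `alpha` shifts the ranking between keyword and vector evidence.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend_scores(vector_scores, keyword_scores, alpha=0.5):
    """Weighted sum of normalized scores: alpha=1 is pure vector, alpha=0 pure keyword."""
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    # Sort best first; ties are broken by id for a stable order
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}   # e.g. cosine similarities
keyword_scores = {"b": 12.0, "c": 3.0}           # e.g. BM25 scores

print(blend_scores(vector_scores, keyword_scores, alpha=1.0)[0][0])  # a
print(blend_scores(vector_scores, keyword_scores, alpha=0.0)[0][0])  # b
```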

## Prompt Construction and Generation

Once you retrieve relevant chunks, you concatenate them into a context block and inject it into the LLM prompt. This pattern is identical regardless of which vector database you use:

```python
def generate_answer(query, context_chunks):
    context = "\n\n".join(context_chunks)
    
    prompt = f"""You are a helpful assistant. Use the following context to answer the question accurately. If the context doesn't contain the answer, say "I don't have enough information."

Context:
{context}

Question: {query}
Answer:"""
    
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0
    )
    
    return response.choices[0].message.content

```

The `examples/chatgpt/rag-quickstart/pinecone-retool/gpt-action-pinecone-retool-rag.ipynb` file in the cookbook demonstrates this end-to-end flow integrated as a ChatGPT action.
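When many chunks are retrieved, the context block may need a size cap and source labels so the model can attribute its answer. The following is a minimal sketch under stated assumptions: `build_context` is a hypothetical helper, chunks arrive as dicts with `text` and `source` keys ordered best first, and the budget is measured in characters as a stand-in for tokens.

```python
def build_context(chunks, max_chars=6000):
    """Join retrieved chunks into a labelled context block within a size budget.

    Hypothetical helper: stops adding chunks once the next one would exceed max_chars.
    """
    parts, used = [], 0
    for n, chunk in enumerate(chunks, start=1):
        block = f"[{n}] (source: {chunk['source']})\n{chunk['text']}"
        cost = len(block) + (2 if parts else 0)  # 2 chars for the "\n\n" joiner
        if used + cost > max_chars:
            break
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

The resulting string can be passed to `generate_answer` in place of the plain joined chunks, and the numbered labels give the model something concrete to cite.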

## Complete Code Examples

Below are minimal, self-contained implementations you can adapt immediately.

### Pinecone RAG Implementation

```python
import os
import openai
from pinecone import Pinecone, ServerlessSpec

# Initialize clients (openai>=1.0 and Pinecone client v3+ assumed)
openai.api_key = os.getenv("OPENAI_API_KEY")
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# Set up the index (cloud and region are example values)
index_name = "quickstart"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2")
    )
index = pc.Index(index_name)

# Sample data ingestion

docs = [
    {"text": "The capital of France is Paris.", "source": "geography"},
    {"text": "The Python programming language was released in 1991.", "source": "history"}
]

embeddings = openai.embeddings.create(
    model="text-embedding-ada-002",
    input=[d["text"] for d in docs]
).data

vectors = [
    (f"doc-{i}", e.embedding, d) 
    for i, (e, d) in enumerate(zip(embeddings, docs))
]
index.upsert(vectors=vectors)

# RAG pipeline

def answer_question(query):
    q_embed = openai.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    
    results = index.query(vector=q_embed, top_k=3, include_metadata=True)
    context = "\n".join([m["metadata"]["text"] for m in results["matches"]])
    
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    return resp.choices[0].message.content

print(answer_question("When was Python released?"))

```

### Weaviate RAG Implementation

```python
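import os
import openai
import weaviate

# Initialize client (Weaviate Python client v3 syntax)
client = weaviate.Client(
    url=os.getenv("WEAVIATE_URL"),
    auth_client_secret=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY")),
    # text2vec-openai needs your OpenAI key to embed objects at import time
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
)

# Define schema (run once; creating an existing class raises an error)
schema = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]}
    ]
}
client.schema.create_class(schema)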

# Import data (auto-embedded by Weaviate)

with client.batch as batch:
    batch.add_data_object(
        data_object={"content": "Machine learning is a subset of AI.", "category": "tech"},
        class_name="Article"
    )

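# RAG pipeline
def answer_question(query):
    # Embed the query client-side; this assumes the class vectorizer uses the same
    # embedding model (otherwise with_near_text lets Weaviate embed the query)
    query_vec = openai.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding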
    
    result = client.query.get(
        "Article", ["content"]
    ).with_near_vector({"vector": query_vec}).with_limit(3).do()
    
    context = "\n".join([item["content"] for item in result["data"]["Get"]["Article"]])
    
    prompt = f"Based on this context:\n{context}\n\nAnswer: {query}"
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

print(answer_question("What is machine learning?"))

```

## Key Repository Files

The `openai-cookbook` repository contains detailed implementations that extend these minimal examples:

| File Path | Description |
|-----------|-------------|
| `examples/vector_databases/pinecone/Using_Pinecone_for_embeddings_search.ipynb` | Complete Pinecone workflow including index management, upsert operations, and similarity queries |
| `examples/vector_databases/pinecone/Using_vision_modality_for_RAG_with_Pinecone.ipynb` | Extends RAG to multimodal embeddings using CLIP and vision models |
| `examples/chatgpt/rag-quickstart/pinecone-retool/gpt-action-pinecone-retool-rag.ipynb` | Production RAG setup integrated as a ChatGPT GPT action |
| `examples/vector_databases/weaviate/Using_Weaviate_for_embeddings_search.ipynb` | Weaviate fundamentals including schema creation and batch imports |
| `examples/vector_databases/weaviate/question-answering-with-weaviate-and-openai.ipynb` | Full Q&A pipeline using Weaviate's `qna-openai` question-answering module |
| `examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb` | Implementation of hybrid search combining BM25 and vector similarity |
| [`examples/utils/embeddings_utils.py`](https://github.com/openai/openai-cookbook/blob/main/examples/utils/embeddings_utils.py) | Shared utility functions for embedding generation used across multiple notebooks |

## Summary

- **Implementing RAG with vector databases like Pinecone and Weaviate** follows a consistent pattern: chunk documents, generate embeddings, store the vectors with metadata, retrieve similar chunks, and augment LLM prompts with the retrieved context.
- **Pinecone** offers explicit control over the embedding process and index configuration through direct API calls, which suits teams that want a fully managed service.
- **Weaviate** provides flexible deployment options and built-in vectorization modules (`text2vec-openai`) that can automate embedding generation, plus built-in hybrid search combining lexical (BM25) and vector relevance.
- Both systems support metadata filtering, batch operations, and upsert semantics for incremental data updates, with implementations documented in the OpenAI Cookbook's vector database examples.

## Frequently Asked Questions

### What is the difference between Pinecone and Weaviate for RAG implementations?

**Pinecone** operates as a fully managed vector database service; in the pattern shown in this guide, you generate embeddings externally using OpenAI's API before upserting vectors. **Weaviate** offers both managed and self-hosted options, with optional built-in modules like `text2vec-openai` that can generate embeddings automatically during data import, simplifying the pipeline but requiring schema configuration. Both can combine keyword and vector relevance, but in different ways: Weaviate exposes hybrid search (BM25 keyword matching plus vector similarity) as a single built-in query, while Pinecone's hybrid retrieval relies on sparse-dense vectors, so you supply the sparse (keyword) representation yourself.

### How do I choose the right chunk size for RAG document processing?

The optimal chunk size balances context completeness against embedding model token limits and LLM context window constraints. For `text-embedding-ada-002`, chunks of 400-500 tokens typically preserve semantic meaning while staying well below the model's 8191 token limit. Smaller chunks (100-200 tokens) improve retrieval precision for specific facts but may lose broader context, while larger chunks (1000+ tokens) risk diluting relevance signals and consume more of the LLM's context window per retrieved chunk. The `openai-cookbook` examples demonstrate using LangChain's `RecursiveCharacterTextSplitter` to break at natural boundaries like paragraphs and sentences.

### Can I use hybrid search with Pinecone for better RAG results?

**Yes, but not in the same way as Weaviate.** Pinecone does not maintain a built-in BM25 index over your stored text; its hybrid retrieval works through sparse-dense vectors, where you generate a sparse (keyword-style) vector for each chunk and each query alongside the dense embedding. Hybrid retrieval is useful when queries contain specific terminology or proper names that vector search might miss. If you want it without managing sparse vectors yourself, **Weaviate** offers the `with_hybrid` query method demonstrated in `examples/vector_databases/weaviate/hybrid-search-with-weaviate-and-openai.ipynb`. Alternatively, you can implement a reranking layer or combine Pinecone results with a separate keyword search index manually.
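One way to merge results from a vector index and a separate keyword index on the client side is reciprocal rank fusion (RRF), which only needs the two ranked ID lists rather than comparable scores. The sketch below is an illustration with a hypothetical function name and made-up result lists; `k=60` is a commonly used default for the smoothing constant, not a value taken from either database.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID earns 1 / (k + rank) from every list it appears in (rank starts at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Ties are broken by id so the output order is deterministic
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]    # e.g. IDs returned by a vector query, best first
keyword_hits = ["b", "d", "a"]   # e.g. IDs returned by a keyword search, best first

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # ['b', 'a', 'd', 'c']
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept as lower-ranked candidates.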

### How do I handle incremental updates when my document corpus changes?

Both databases support **upsert** semantics for incremental updates. In **Pinecone**, use `index.upsert()` with existing IDs to overwrite vectors or `index.delete()` followed by fresh upserts for modified documents. In **Weaviate**, use `client.data_object.update()` or re-import with the same UUID to replace objects. For production pipelines, maintain a mapping between source document IDs and vector IDs to efficiently invalidate stale embeddings when source content changes, as shown in the batch processing patterns within `Using_Pinecone_for_embeddings_search.ipynb` and `Using_Weaviate_for_embeddings_search.ipynb`.
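A minimal sketch of such a mapping, assuming you persist a content hash per source document between runs, is shown below. The names `content_hash` and `plan_sync` are hypothetical, and the actual upsert and delete calls are left to whichever database client you use.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(previous, current):
    """Decide which documents need re-embedding and which vectors are stale.

    previous: {doc_id: hash} saved from the last ingestion run
    current:  {doc_id: text} as the corpus looks now
    Returns (ids_to_upsert, ids_to_delete), both sorted.
    """
    to_upsert = [
        doc_id for doc_id, text in current.items()
        if previous.get(doc_id) != content_hash(text)  # new or changed
    ]
    to_delete = [doc_id for doc_id in previous if doc_id not in current]
    return sorted(to_upsert), sorted(to_delete)


previous = {
    "a": content_hash("old text for a"),
    "b": content_hash("unchanged text for b"),
    "c": content_hash("document that was removed"),
}
current = {"a": "new text for a", "b": "unchanged text for b", "d": "brand new document"}

print(plan_sync(previous, current))  # (['a', 'd'], ['c'])
```

Only documents in the first list need new embeddings, which keeps embedding API usage proportional to what actually changed.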