Building Temporal Agents with Knowledge Graphs for Retrieval: A Complete Guide
The OpenAI Cookbook demonstrates how to build temporal agents by storing time-aware facts in a SQLite knowledge graph that marks superseded statements as invalid and tracks how facts evolve through valid_at, invalid_at, and expired_at timestamps.
This guide explores the temporal knowledge graph implementation in the openai/openai-cookbook repository, showing how to transform unstructured text into a queryable temporal database. The architecture enables AI agents to answer time-sensitive questions like "What did the company claim before Q2 2024?" while filtering out expired predictions and superseded facts.
Architecture Overview
The implementation consists of three distinct layers that work together to enable temporal reasoning. Each layer handles a specific concern: data modeling, utility functions, and persistent storage.
Data Model Layer
The foundation resides in models.py, which defines Pydantic models for entities, events, and their temporal semantics. The TemporalEvent class captures when statements are valid, when they expire, and how they relate to graph entities through subject-predicate-object triplets.
Utility Layer
The utils.py file provides safe date handling and ISO conversion helpers such as parse_date_str and safe_iso. These ensure consistent timestamp formatting across the ingestion pipeline.
Persistence Layer
The db_interface.py module implements a SQLite-backed knowledge graph that materializes relationships between transcripts, chunks, events, triplets, and entities. This layer supports CRUD operations and batch temporal updates without requiring external database services.
Temporal Data Model Design
Effective temporal retrieval requires explicit modeling of time validity and statement types. Every fact stored in the graph is described along four dimensions: its validity window, its temporal type, its statement type, and its links to entities in the graph.
Time Validity Fields
Every TemporalEvent carries timestamps that define its lifecycle:
- valid_at: When the statement becomes true or applicable
- invalid_at: When the statement ceases to be valid (set when superseded)
- expired_at: Automatically computed by the set_expired_at validator when invalid_at is populated
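The set_expired_at behavior can be sketched without dependencies by using a standard-library dataclass whose __post_init__ plays the role of the Pydantic validator. The class name EventValidity is hypothetical. The choice to stamp expired_at with the current UTC time, instead of copying invalid_at, is also an assumption. It keeps "when the statement stopped being true" separate from "when the system recorded that".

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventValidity:
    """Hypothetical stand-in for the validity fields on TemporalEvent."""

    valid_at: datetime.datetime
    invalid_at: Optional[datetime.datetime] = None
    expired_at: Optional[datetime.datetime] = None

    def __post_init__(self) -> None:
        # Plays the role of the set_expired_at validator: once invalid_at is
        # populated, record when the system marked the statement as expired.
        # Assumption: expired_at is the current UTC time, not a copy of invalid_at.
        if self.invalid_at is not None and self.expired_at is None:
            self.expired_at = datetime.datetime.now(datetime.timezone.utc)


utc = datetime.timezone.utc
current = EventValidity(valid_at=datetime.datetime(2024, 4, 30, tzinfo=utc))
superseded = EventValidity(
    valid_at=datetime.datetime(2024, 4, 30, tzinfo=utc),
    invalid_at=datetime.datetime(2024, 7, 1, tzinfo=utc),
)
```

Here current.expired_at stays None, while superseded.expired_at is filled in at construction time.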
Statement Classification
The model distinguishes between ATEMPORAL, STATIC, and DYNAMIC temporal types, alongside FACT, OPINION, and PREDICTION statement types. This classification allows agents to discard outdated predictions while preserving historical facts for audit trails.
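A small policy function is one way to turn the classification into retrieval behavior. The enum values and the rules in keep_for_answer are assumptions for illustration, not the cookbook's code. Under these rules, still-valid and atemporal statements are always returned. Expired predictions are never returned. Expired facts or opinions appear only for historical questions.

```python
from enum import Enum


# Hypothetical mirrors of the enums in models.py; the string values are assumed.
class TemporalType(str, Enum):
    ATEMPORAL = "ATEMPORAL"
    STATIC = "STATIC"
    DYNAMIC = "DYNAMIC"


class StatementType(str, Enum):
    FACT = "FACT"
    OPINION = "OPINION"
    PREDICTION = "PREDICTION"


def keep_for_answer(
    temporal_type: TemporalType,
    statement_type: StatementType,
    expired: bool,
    historical_query: bool,
) -> bool:
    """Hypothetical retrieval policy built on the two classifications."""
    if temporal_type is TemporalType.ATEMPORAL or not expired:
        return True  # timeless or still-valid statements are always eligible
    if statement_type is StatementType.PREDICTION:
        return False  # a superseded forecast is dropped, even for history questions
    # Expired facts and opinions remain in the graph for the audit trail and
    # are surfaced only when the question asks about the past.
    return historical_query
```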
Graph Relationships
Events connect to the knowledge graph through triplets stored as JSON arrays. Each triplet represents a subject-predicate-object edge that links the event to canonical entities registered in the entities table.
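UUIDs are not JSON-serializable, so storing an event's triplet references as a JSON array needs an explicit conversion. The sketch below assumes the array holds triplet IDs as strings in a TEXT column. The helper names and the one-column events table are hypothetical.

```python
import json
import sqlite3
import uuid


def encode_triplet_ids(triplet_ids: list[uuid.UUID]) -> str:
    # UUID objects are not JSON-serializable, so store their string form.
    return json.dumps([str(t) for t in triplet_ids])


def decode_triplet_ids(raw: str) -> list[uuid.UUID]:
    # Turn the stored strings back into UUID objects.
    return [uuid.UUID(t) for t in json.loads(raw)]


# Round-trip through a TEXT column in a hypothetical minimal events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, triplets TEXT)")
ids = [uuid.uuid4(), uuid.uuid4()]
conn.execute(
    "INSERT INTO events VALUES (?, ?)", (str(uuid.uuid4()), encode_triplet_ids(ids))
)
(raw,) = conn.execute("SELECT triplets FROM events").fetchone()
restored = decode_triplet_ids(raw)
```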
SQLite Persistence Implementation
The database layer in db_interface.py provides zero-configuration storage, well suited to prototyping and small-scale deployments.
Schema Design
The relational schema mirrors classic knowledge graph patterns:
- transcripts: Source documents with company and quarter metadata
- chunks: Segmented text portions linked to parent transcripts
- events: Temporal statements with embedding vectors and validity windows
- triplets: Edge records connecting events to entities
- entities: Canonical nodes with optional aliases and descriptions
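The five tables above can be sketched as SQLite DDL. Only the table names come from the list. Every column name, type, and foreign key below is an assumption, including resolved_id for alias resolution and invalidated_by for the audit trail. Compare against db_interface.py before relying on any of them.

```python
import sqlite3

# Hypothetical DDL: table names follow the list above; columns are assumed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id TEXT PRIMARY KEY,
    text TEXT NOT NULL,
    company TEXT,
    date TEXT,
    quarter TEXT
);
CREATE TABLE IF NOT EXISTS chunks (
    id TEXT PRIMARY KEY,
    transcript_id TEXT NOT NULL REFERENCES transcripts(id),
    text TEXT NOT NULL,
    metadata TEXT
);
CREATE TABLE IF NOT EXISTS events (
    id TEXT PRIMARY KEY,
    chunk_id TEXT NOT NULL REFERENCES chunks(id),
    statement TEXT NOT NULL,
    embedding BLOB,
    triplets TEXT,            -- JSON array of triplet IDs
    statement_type TEXT,
    temporal_type TEXT,
    valid_at TEXT,            -- UTC ISO-8601 strings
    invalid_at TEXT,
    expired_at TEXT,
    invalidated_by TEXT       -- ID of the superseding event
);
CREATE TABLE IF NOT EXISTS entities (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT,
    description TEXT,
    resolved_id TEXT REFERENCES entities(id)  -- canonical entity for aliases
);
CREATE TABLE IF NOT EXISTS triplets (
    id TEXT PRIMARY KEY,
    event_id TEXT NOT NULL REFERENCES events(id),
    subject_id TEXT NOT NULL REFERENCES entities(id),
    predicate TEXT NOT NULL,
    object_id TEXT NOT NULL REFERENCES entities(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```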
Core Database Operations
The interface exposes specific functions for graph construction:
- make_connection(memory=True): Creates an in-memory SQLite instance for demos
- insert_transcript(): Ingests source documents with metadata
- insert_chunk(): Stores text segments for processing
- insert_event(): Persists temporal statements with JSON-encoded triplets
- insert_entity() / insert_canonical_entity(): Registers nodes in the graph
- update_events_batch(): Marks events as invalid and sets expiration timestamps
- update_entity_references(): Merges entity aliases into canonical forms
Building the Knowledge Graph Pipeline
The complete data flow moves from raw text to queryable temporal facts through six distinct stages.
1. Document Ingestion
Transcripts enter the system via insert_transcript, which stores each document under a unique ID together with metadata such as the company name and earnings quarter.
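If metadata has to be recovered from the text itself, a regular expression is often enough for the quarter label. extract_quarter is a hypothetical helper, not part of db_interface.py. It normalizes matches to the "Q1-2024" style used in the practical guide.

```python
import re
from typing import Optional

# Matches labels such as "Q1 2024", "Q3-2023" or "q4 2025" (assumed formats).
_QUARTER_RE = re.compile(r"\bQ([1-4])[\s-]*(\d{4})\b", re.IGNORECASE)


def extract_quarter(text: str) -> Optional[str]:
    """Return the first quarter label in `text`, normalized to 'Q<n>-<yyyy>'."""
    match = _QUARTER_RE.search(text)
    if match is None:
        return None
    quarter, year = match.groups()
    return f"Q{quarter}-{year}"
```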
2. Text Chunking
Large documents are split into manageable pieces, and each piece is stored with insert_chunk, which preserves the link to its parent transcript for provenance tracking.
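The splitting step is not shown in this guide, so the chunker below is a hypothetical sketch. It packs whole sentences into chunks of at most max_chars characters. It emits dictionaries with the same keys as the insert_chunk example: id, transcript_id, text, and metadata. The real pipeline may chunk by tokens or speaker turns instead.

```python
import re
import uuid


def chunk_transcript(transcript_id: uuid.UUID, text: str, max_chars: int = 500) -> list[dict]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    bodies: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            bodies.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        bodies.append(current)
    # Each record keeps transcript_id so every chunk can be traced to its source.
    return [
        {"id": uuid.uuid4(), "transcript_id": transcript_id, "text": body, "metadata": None}
        for body in bodies
    ]
```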
3. Information Extraction
Each chunk passes through an LLM or other extraction layer to produce:
- A textual statement
- TemporalType classification (STATIC, DYNAMIC, etc.)
- StatementType classification (FACT, OPINION, PREDICTION)
- An embedding vector for semantic similarity search
- Triplets representing subject-predicate-object relationships
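Asking the model for JSON and validating it before anything reaches the database keeps bad labels out of the graph. The response shape ({"statements": [...]}), the ExtractedStatement class, and parse_extraction are hypothetical. The embedding is omitted because it usually comes from a separate embeddings call.

```python
import json
from dataclasses import dataclass

# Allowed labels, taken from the taxonomy described above.
TEMPORAL_TYPES = {"ATEMPORAL", "STATIC", "DYNAMIC"}
STATEMENT_TYPES = {"FACT", "OPINION", "PREDICTION"}


@dataclass
class ExtractedStatement:
    statement: str
    temporal_type: str
    statement_type: str
    # (subject, predicate, object) surface strings, resolved to entity IDs later
    triplets: list[tuple[str, str, str]]


def parse_extraction(raw_json: str) -> list[ExtractedStatement]:
    """Validate an LLM response shaped like {"statements": [...]} (assumed format)."""
    results = []
    for item in json.loads(raw_json)["statements"]:
        if item["temporal_type"] not in TEMPORAL_TYPES:
            raise ValueError(f"unknown temporal_type: {item['temporal_type']!r}")
        if item["statement_type"] not in STATEMENT_TYPES:
            raise ValueError(f"unknown statement_type: {item['statement_type']!r}")
        triplets = [tuple(t) for t in item.get("triplets", [])]
        if any(len(t) != 3 for t in triplets):
            raise ValueError("each triplet must be [subject, predicate, object]")
        results.append(
            ExtractedStatement(
                item["statement"], item["temporal_type"], item["statement_type"], triplets
            )
        )
    return results
```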
4. Graph Persistence
Extracted components materialize in the database through coordinated inserts:
- insert_event writes the temporal statement with its validity window
- insert_triplet stores each graph edge with resolved entity IDs
- insert_entity registers new nodes or references existing canonical entities
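Coordinating these inserts in a single transaction means a failure partway through leaves no orphaned entities or edges. The sketch uses sqlite3's connection context manager, which commits on success and rolls back on an exception. The minimal tables and the helpers entity_id_for and persist_statement are hypothetical stand-ins for insert_event, insert_triplet, and insert_entity.

```python
import json
import sqlite3
import uuid

# Hypothetical minimal tables; the real schema lives in db_interface.py.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE events (id TEXT PRIMARY KEY, statement TEXT NOT NULL, triplets TEXT NOT NULL);
    CREATE TABLE triplets (
        id TEXT PRIMARY KEY, event_id TEXT, subject_id TEXT, predicate TEXT, object_id TEXT
    );
    """
)


def entity_id_for(conn: sqlite3.Connection, name: str) -> str:
    """Reuse the entity registered under this name, or register a new one."""
    row = conn.execute("SELECT id FROM entities WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    new_id = str(uuid.uuid4())
    conn.execute("INSERT INTO entities (id, name) VALUES (?, ?)", (new_id, name))
    return new_id


def persist_statement(conn: sqlite3.Connection, statement: str, triples: list) -> str:
    """Write one event plus its entities and edges in a single transaction."""
    event_id = str(uuid.uuid4())
    with conn:  # commits on success; rolls back every insert below on error
        triplet_ids = []
        for subject, predicate, obj in triples:
            triplet_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO triplets VALUES (?, ?, ?, ?, ?)",
                (triplet_id, event_id, entity_id_for(conn, subject), predicate,
                 entity_id_for(conn, obj)),
            )
            triplet_ids.append(triplet_id)
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?)",
            (event_id, statement, json.dumps(triplet_ids)),
        )
    return event_id


persist_statement(conn, "Acme Corp acquired Beta Inc.", [("Acme Corp", "acquired", "Beta Inc")])
persist_statement(conn, "Acme Corp hired a new CFO.", [("Acme Corp", "hired", "CFO")])
```

The second call reuses the existing "Acme Corp" entity instead of creating a duplicate node.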
5. Temporal Updates
When new information invalidates previous facts, the system calls update_events_batch to set invalid_at and expired_at timestamps, recording the ID of the invalidating event for audit trails.
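Batch invalidation maps naturally onto executemany inside one transaction. invalidate_events is a hypothetical analogue of update_events_batch, which in the guide takes event objects rather than an ID mapping. The invalidated_by column name is assumed from the audit-trail description above.

```python
import datetime
import sqlite3


def invalidate_events(
    conn: sqlite3.Connection, superseded: dict, invalidated_at: datetime.datetime
) -> None:
    """Hypothetical analogue of update_events_batch.

    `superseded` maps each old event ID to the ID of the event that replaces it.
    """
    expired_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
    rows = [
        (invalidated_at.isoformat(), expired_at, new_id, old_id)
        for old_id, new_id in superseded.items()
    ]
    with conn:
        # One UPDATE per row, one transaction for the whole batch. The
        # "invalid_at IS NULL" guard keeps the first recorded invalidation.
        conn.executemany(
            """
            UPDATE events
            SET invalid_at = ?, expired_at = ?, invalidated_by = ?
            WHERE id = ? AND invalid_at IS NULL
            """,
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, statement TEXT,"
    " invalid_at TEXT, expired_at TEXT, invalidated_by TEXT)"
)
conn.executemany(
    "INSERT INTO events (id, statement) VALUES (?, ?)",
    [("e1", "Revenue grew 12% YoY."), ("e2", "Revenue actually grew 9% YoY.")],
)
invalidate_events(conn, {"e1": "e2"}, datetime.datetime(2024, 7, 1, tzinfo=datetime.timezone.utc))
```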
6. Time-Aware Retrieval
Queries filter on valid_at and expired_at boundaries alongside temporal_type constraints. Helper functions like has_events and get_all_unique_predicates enable discovery of available graph relationships without writing raw SQL.
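An "as of" lookup reduces to a half-open interval check on the validity window. This sketch uses a bare events table with only the columns it needs, and statements_as_of is hypothetical. The string comparisons are correct only if every timestamp is stored in the same UTC ISO-8601 form. That is presumably what helpers like safe_iso are for.

```python
import sqlite3


def statements_as_of(conn: sqlite3.Connection, as_of: str) -> list[str]:
    """Return statements whose validity window contains the ISO-8601 time `as_of`.

    Windows are half-open: valid from valid_at up to, but not including, invalid_at.
    """
    rows = conn.execute(
        """
        SELECT statement FROM events
        WHERE valid_at <= :as_of
          AND (invalid_at IS NULL OR invalid_at > :as_of)
        ORDER BY valid_at
        """,
        {"as_of": as_of},
    ).fetchall()
    return [statement for (statement,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (statement TEXT, valid_at TEXT, invalid_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("Revenue grew 12% YoY.", "2024-04-30T00:00:00+00:00", "2024-07-01T00:00:00+00:00"),
        ("Revenue actually grew 9% YoY.", "2024-07-01T00:00:00+00:00", None),
    ],
)
```

Because the window is half-open, a query at exactly 2024-07-01 returns only the correction, never both versions.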
Practical Implementation Guide
The following examples walk through the complete workflow using the API in examples/partners/temporal_agents_with_knowledge_graphs/.
Setting Up the Database
```python
import datetime
import uuid

from db_interface import make_connection, insert_transcript, insert_chunk, insert_entity, insert_event, update_events_batch
from models import TemporalEvent, TemporalType, StatementType

# Create an in-memory database for rapid prototyping
conn = make_connection(memory=True, refresh=True)
```
Ingesting Source Documents
```python
transcript_id = uuid.uuid4()
insert_transcript(
    conn,
    {
        "id": transcript_id,
        "text": "Q1 2024 earnings call transcript...",
        "company": "Acme Corp",
        "date": datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
        "quarter": "Q1-2024",
    },
)

chunk_id = uuid.uuid4()
insert_chunk(
    conn,
    {
        "id": chunk_id,
        "transcript_id": transcript_id,
        "text": "Revenue grew 12% YoY.",
        "metadata": None,
    },
)
```
Registering Canonical Entities
```python
entity_id = uuid.uuid4()
insert_entity(
    conn,
    {
        "id": entity_id,
        "name": "Acme Corp",
        "type": "Company",
        "description": "A fictional manufacturing company.",
    },
)
```
Creating Temporal Events
```python
event = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue grew 12% YoY.",
    embedding=[0.0] * 256,  # Placeholder for actual embedding vector
    triplets=[entity_id],  # Links to subject entity
    valid_at=datetime.datetime(2024, 4, 30, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.STATIC,
    statement_type=StatementType.FACT,
)
insert_event(conn, event.dict())
```
Handling Corrections and Updates
```python
# Create a correction event that supersedes the previous claim.
# The correction is currently valid, so its invalid_at stays None.
correction = TemporalEvent(
    id=uuid.uuid4(),
    chunk_id=chunk_id,
    statement="Revenue actually grew 9% YoY.",
    embedding=[0.0] * 256,
    triplets=[entity_id],
    valid_at=datetime.datetime(2024, 7, 1, tzinfo=datetime.timezone.utc),
    invalid_at=None,
    temporal_type=TemporalType.DYNAMIC,
    statement_type=StatementType.FACT,
)

# Persist the correction
insert_event(conn, correction.dict())

# Invalidate the original event: rebuild it with invalid_at populated so the
# set_expired_at validator fills in expired_at, then write the change back
superseded = TemporalEvent(**{**event.dict(), "invalid_at": correction.valid_at})
update_events_batch(conn, [superseded])
```
Querying Current Valid Facts
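```python
import datetime

import pandas as pd

def get_valid_facts(conn):
    # Facts whose validity window has not closed. Assumes timestamps are
    # stored as UTC ISO-8601 strings, so string comparison orders them correctly.
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    query = """
        SELECT statement, valid_at, invalid_at
        FROM events
        WHERE statement_type = 'FACT'
          AND (invalid_at IS NULL OR invalid_at > ?)
    """
    return pd.read_sql_query(query, conn, params=(now,))

print(get_valid_facts(conn))
```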
Extending the Architecture
The SQLite implementation provides a foundation that supports several production enhancements.
Vector Store Integration: Replace the BLOB embedding column with a dedicated vector database such as Pinecone, or a vector index library such as FAISS, for scalable similarity search across millions of events.
Graph Query Layer: Add a Cypher or GraphQL interface atop the relational tables to expose richer traversals beyond the current triplet storage format.
Agent Tooling: Wrap the persistence layer in OpenAI function-calling (tool) schemas so that agents built on the OpenAI API can invoke insert_event, update_events_batch, and retrieval functions directly during a conversation.
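For agent tooling, each retrieval function needs a JSON Schema description that the model can call. The tool name query_temporal_facts, its parameters, and the dispatch_tool_call helper are hypothetical. The dictionary follows the Chat Completions tools format. The model returns tool arguments as a JSON-encoded string, which must be decoded before any Python code is called.

```python
import json

# Hypothetical tool definition in the Chat Completions "tools" format; the
# function name and parameters are illustrative, not taken from the cookbook.
QUERY_FACTS_TOOL = {
    "type": "function",
    "function": {
        "name": "query_temporal_facts",
        "description": "Return statements from the temporal knowledge graph that were valid at a given time.",
        "parameters": {
            "type": "object",
            "properties": {
                "as_of": {
                    "type": "string",
                    "description": "ISO-8601 timestamp, e.g. 2024-06-30T00:00:00Z",
                },
                "statement_types": {
                    "type": "array",
                    "items": {"type": "string", "enum": ["FACT", "OPINION", "PREDICTION"]},
                },
            },
            "required": ["as_of"],
        },
    },
}


def dispatch_tool_call(name: str, arguments_json: str, handlers: dict):
    """Route a model tool call (name plus JSON-encoded arguments) to a Python handler."""
    args = json.loads(arguments_json)
    return handlers[name](**args)
```

The definition would be passed as tools=[QUERY_FACTS_TOOL] to client.chat.completions.create, and each returned tool call would be routed through the dispatcher to a retrieval function.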
Summary
- The openai/openai-cookbook provides a complete reference implementation for building temporal agents with knowledge graphs for retrieval using Python and SQLite.
- The TemporalEvent model in models.py tracks validity through valid_at, invalid_at, and the automatically computed expired_at timestamps.
- The db_interface.py module offers CRUD operations for transcripts, chunks, entities, and events, plus batch updates for temporal invalidation.
- Time-aware queries filter on validity windows to answer historical questions while excluding stale predictions.
- The architecture supports migration to vector stores and graph databases for production-scale deployments.
Frequently Asked Questions
How does the system handle facts that change over time?
When a correction or update arrives, the agent creates a new TemporalEvent with updated validity timestamps. The system then calls update_events_batch to set invalid_at on the superseded event, which triggers the set_expired_at validator to record when the old fact ceased to be valid. This preserves the complete audit trail while ensuring retrieval queries only return currently valid statements.
What database does the example use, and can it scale?
The reference implementation uses SQLite via the make_connection function in db_interface.py, providing zero-configuration deployment suitable for demos and small-scale applications. The relational schema ports naturally to PostgreSQL or MySQL and could be remodeled for a dedicated graph database. The embedding storage can move to a vector database such as Pinecone for high-volume similarity search.
How are entities managed to avoid duplication?
The system uses insert_canonical_entity to create master entity records and update_entity_references to merge aliases into canonical forms. Each entity receives a unique UUID that serves as the stable node identifier in subject-predicate-object triplets stored within event records.
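Alias merging amounts to building a map from every entity ID to its canonical ID. merge_aliases is a hypothetical analogue of update_entity_references. It first collapses exact-name duplicates after case-folding and trimming. It then applies explicitly declared alias groups, treating the first name in each group as canonical. A production system would typically add fuzzy or embedding-based matching on top.

```python
from collections import defaultdict


def merge_aliases(entities: dict[str, str], alias_groups: list[list[str]]) -> dict[str, str]:
    """Map every entity ID to the ID of its canonical entity.

    entities: entity ID -> surface name.
    alias_groups: names that denote the same real-world entity; the first
    name in each group is treated as canonical.
    """
    def norm(name: str) -> str:
        return name.casefold().strip()

    ids_by_name: dict[str, list[str]] = defaultdict(list)
    for entity_id, name in entities.items():
        ids_by_name[norm(name)].append(entity_id)

    resolved = {entity_id: entity_id for entity_id in entities}
    # Exact-name duplicates collapse onto the first ID seen for that name.
    for ids in ids_by_name.values():
        for entity_id in ids[1:]:
            resolved[entity_id] = ids[0]
    # Declared aliases collapse onto the first name in their group.
    for group in alias_groups:
        canonical_ids = ids_by_name.get(norm(group[0]))
        if not canonical_ids:
            continue
        for alias in group[1:]:
            for entity_id in ids_by_name.get(norm(alias), []):
                resolved[entity_id] = canonical_ids[0]
    return resolved
```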
What types of temporal statements can the graph store?
The model supports three temporal types (ATEMPORAL, STATIC, DYNAMIC) and three statement types (FACT, OPINION, PREDICTION). This taxonomy allows agents to apply different retrieval logic—for example, ignoring expired predictions while retaining historical facts for trend analysis, as implemented in the validity filtering queries of temporal_agents.ipynb.