# Core Concepts

Understanding the key concepts and architecture of GraphRAG SDK.

## Architecture Overview

GraphRAG SDK is built on four core abstractions:
```
┌─────────────────────────────────────────────────┐
│                Graph (User API)                 │
│            createGraph() + query()              │
└─────────────────────────────────────────────────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
          ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌─────────┐
    │ Provider │  │ Storage  │  │   AI    │
    │ +Pipeline│  │ Factories│  │   SDK   │
    └──────────┘  └──────────┘  └─────────┘
```

### 1. Graph
`createGraph()` wires up the provider with models and a namespace. `query()` runs a search strategy and generates an answer.

```ts
const graph = createGraph({ graph: provider, model, embedding, namespace });
await graph.insert(documents);
const { text } = await query({ graph, search: strategy, prompt: "..." });
```

### 2. Provider + Pipeline
The provider (e.g. `lexicalGraph()`) defines how the graph is built. It accepts a pipeline of composable processors that run during `insert()`. The `insert()` pipeline:

1. Store chunks → KV + vector DB (always, built-in)
2. Run pipeline processors in order (configurable)

Each processor builds on what previous processors created:
| Processor | Reads → Produces |
|---|---|
| `chunkEntityGeneration()` | Chunks → entity nodes, relationship edges |
| `entityCommunityGeneration()` | Entity graph → communities, reports |
| `similarChunkConnection()` | Chunk embeddings → `SIMILAR_TO` edges |
| `hypotheticalQuestionGeneration()` | Chunks → question embeddings |
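The in-order execution can be sketched as a tiny sequential runner. This is a toy illustration with made-up `Context` and `Processor` types, not the SDK's implementation (real processors are async; plain functions keep the sketch short):

```ts
// Toy sketch of ordered pipeline execution (hypothetical types, not SDK code).
// Each processor receives the shared context and extends it in place.
type Context = { chunks: string[]; entities: string[]; log: string[] };
type Processor = (ctx: Context) => void;

function runPipeline(ctx: Context, processors: Processor[]): Context {
  for (const p of processors) p(ctx); // strictly in order
  return ctx;
}

const extractEntities: Processor = (ctx) => {
  ctx.entities = ctx.chunks.filter((c) => c.includes("TechCorp"));
  ctx.log.push("entities");
};
const buildCommunities: Processor = (ctx) => {
  // sees what extractEntities produced
  ctx.log.push(`communities:${ctx.entities.length}`);
};

const result = runPipeline(
  { chunks: ["TechCorp hired Sarah Chen", "It rained today"], entities: [], log: [] },
  [extractEntities, buildCommunities],
);
// result.log is ["entities", "communities:1"]
```

Because later processors read what earlier ones wrote, reordering them (e.g. running community generation before entity extraction) would change the result; the SDK runs them in the order they are configured.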
### 3. Storage Factories

Storage backends are passed as factory functions that accept a namespace and return a store instance:

```ts
graphStoreFactory: (ns) => neo4jGraph({ uri: "bolt://localhost", namespace: ns }),
vectorStoreFactory: (ns) => qdrantVector({ url: "http://localhost:6333", collection: ns }),
kvStoreFactory: (ns) => redisKV({ url: "redis://localhost:6379", prefix: ns }),
```

Three storage types:
- **GraphStore** — nodes, edges, clustering, community schema
- **VectorStore** — embeddings with multiple named indexes (chunks, entities, questions)
- **KVStore** — key-value metadata and chunk content
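A factory is just a function that closes over backend configuration and receives the namespace when the graph is created. A toy sketch of the pattern, using a made-up Map-backed store rather than a real backend:

```ts
// Toy factory pattern: each namespace gets its own isolated store.
// (Illustrative only; real factories return Neo4j/Qdrant/Redis clients.)
type KV = { get(k: string): string | undefined; set(k: string, v: string): void };

function inMemoryKV() {
  const stores = new Map<string, Map<string, string>>();
  // the factory the SDK calls with the graph's namespace
  return (ns: string): KV => {
    if (!stores.has(ns)) stores.set(ns, new Map());
    const m = stores.get(ns)!;
    return { get: (k) => m.get(k), set: (k, v) => m.set(k, v) };
  };
}

const kvStoreFactory = inMemoryKV();
const a = kvStoreFactory("project-a");
const b = kvStoreFactory("project-b");
a.set("doc:1", "hello");
// b is isolated: b.get("doc:1") is undefined
```

The same closure pattern applies to graph and vector store factories; only the store instance returned differs.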
### 4. AI SDK Integration

GraphRAG SDK uses the Vercel AI SDK for:

- LLM calls (answer generation, entity extraction, community reports)
- Embeddings (vector representations for search)

Any provider supported by the AI SDK works: OpenAI, Anthropic, Google, etc.
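For example, wiring OpenAI models from the AI SDK into `createGraph()` might look like the sketch below; the model IDs are illustrative, and `provider` stands for any graph provider such as `lexicalGraph()`:

```ts
import { openai } from "@ai-sdk/openai";

const graph = createGraph({
  graph: provider,
  model: openai("gpt-4o"),                               // LLM for extraction + answers
  embedding: openai.embedding("text-embedding-3-small"), // vectors for search
  namespace: "docs",
});
```

Swapping in another provider (e.g. `@ai-sdk/anthropic` or `@ai-sdk/google`) only changes the `model`/`embedding` lines.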
## The Insert Pipeline

When you call `graph.insert()`:
```
Input Text
    │
    ▼
Chunking (configurable size + overlap)
    │
    ▼
Store chunks → KV store + vector DB ('chunks' index)
    │
    ▼
Run pipeline processors in order:
  ├─ chunkEntityGeneration()           → entities + relations in graph store
  ├─ entityCommunityGeneration()       → Leiden clustering + community reports
  ├─ similarChunkConnection()          → SIMILAR_TO edges between chunks
  └─ hypotheticalQuestionGeneration()  → question embeddings in 'questions' index
```

Pipeline metadata is persisted in the KV store. On subsequent inserts, the pipeline config is validated; mismatched configs throw `PipelineConfigMismatchError`.
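The chunking step (configurable size + overlap) can be sketched character-by-character. This is a simplified illustration; the SDK's actual chunker and its defaults may differ:

```ts
// Character-based chunking with overlap (illustrative, not the SDK's chunker).
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

const chunks = chunkText("abcdefghij", 4, 2);
// windows: "abcd", "cdef", "efgh", "ghij"
```

The overlap means the tail of each chunk reappears at the head of the next, so a sentence split by a chunk boundary is still fully contained in at least one chunk.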
## Search Strategies

`query()` accepts a search strategy that retrieves context from the graph. Each strategy leverages different parts of the pipeline:
| Strategy | What it searches | Requires processor |
|---|---|---|
| `naiveSearch()` | Chunk embeddings | (none; built-in) |
| `entityEnhancedSearch()` | Chunks → `HAS_ENTITY` → entities → `RELATES_TO` | `chunkEntityGeneration()` |
| `localCommunitySearch()` | Entity graph + neighborhoods + communities | `chunkEntityGeneration()` + `entityCommunityGeneration()` |
| `globalCommunitySearch()` | Community reports | `entityCommunityGeneration()` |
| `similarChunkTraversalSearch()` | Chunk similarity graph (BFS) | `similarChunkConnection()` |
| `hypotheticalQuestionSearch()` | Question embeddings | `hypotheticalQuestionGeneration()` |
```ts
const { text } = await query({
  graph,
  search: localCommunitySearch({ topK: 10 }),
  prompt: "Who is Sarah Chen?",
});
```

Search strategies can override the model or embedding:
```ts
const { text } = await query({
  graph,
  search: localCommunitySearch({ model: openai("gpt-4o") }),
  prompt: "Complex question requiring stronger model",
});
```

## Entities and Relationships
### Entities

Entities are nodes in the knowledge graph extracted by `chunkEntityGeneration()`:

- **Name** — human-readable identifier (e.g. "TECHCORP")
- **Type** — category (e.g. "organization", "person")
- **Description** — LLM-generated summary
- **Source** — which chunks this entity was extracted from
### Relationships

Relationships are edges connecting entities:

- **Source → Target** — directed connection
- **Description** — what the relationship means
- **Weight** — importance/frequency score
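The attributes above map naturally onto TypeScript shapes. The types below are an illustration of the described data model, not the SDK's exported types:

```ts
// Illustrative shapes for extracted graph data (hypothetical, not SDK exports).
interface Entity {
  name: string;            // e.g. "TECHCORP"
  type: string;            // e.g. "organization"
  description: string;     // LLM-generated summary
  sourceChunkIds: string[]; // which chunks this entity came from
}

interface Relationship {
  source: string;      // entity name
  target: string;      // entity name
  description: string; // what the relationship means
  weight: number;      // importance/frequency score
}

const techcorp: Entity = {
  name: "TECHCORP",
  type: "organization",
  description: "A technology company.",
  sourceChunkIds: ["chunk-1"],
};

const employs: Relationship = {
  source: "TECHCORP",
  target: "SARAH CHEN",
  description: "TechCorp employs Sarah Chen.",
  weight: 2,
};
```

Because relationships reference entities by name, repeated mentions across chunks accumulate onto the same node rather than creating duplicates.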
## Storage Interfaces

### GraphStore

```ts
interface GraphStore {
  upsertNode(nodeId: string, nodeData: GNodeData): Promise<void>;
  getNode(nodeId: string): Promise<GNodeData | null>;
  upsertEdge(sourceId: string, targetId: string, edgeData: GEdgeData): Promise<void>;
  getEdge(sourceId: string, targetId: string): Promise<GEdgeData | null>;
  getNodeEdges(nodeId: string): Promise<Array<[string, string]> | null>;
  clustering(algorithm: string): Promise<void>;
  communitySchema(): Promise<Record<string, SingleCommunitySchema>>;
  close(): Promise<void>;
}
```

### VectorStore
```ts
interface VectorStore {
  createIndex(params: CreateIndexParams): Promise<void>;
  listIndexes(): Promise<string[]>;
  upsert(params: UpsertVectorParams): Promise<string[]>;
  query(params: QueryVectorParams): Promise<QueryResult[]>;
}
```

Supports multiple named indexes within a single store (e.g. `chunks`, `entities`, `questions`).
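To illustrate the multiple-named-indexes idea, here is a toy in-memory store with the same general shape. The parameter types are simplified stand-ins for `CreateIndexParams` and friends, and the methods are synchronous here for brevity (the real interface returns Promises):

```ts
// Toy multi-index vector store with cosine-similarity search (illustrative only).
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function cosine(a: Vec, b: Vec): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

class InMemoryVectorStore {
  private indexes = new Map<string, Map<string, Vec>>();

  createIndex(name: string): void {
    if (!this.indexes.has(name)) this.indexes.set(name, new Map());
  }

  listIndexes(): string[] {
    return Array.from(this.indexes.keys());
  }

  upsert(index: string, items: Array<{ id: string; vector: Vec }>): string[] {
    const idx = this.indexes.get(index);
    if (!idx) throw new Error(`unknown index: ${index}`);
    for (const { id, vector } of items) idx.set(id, vector);
    return items.map((i) => i.id);
  }

  query(index: string, vector: Vec, topK: number): Array<{ id: string; score: number }> {
    const idx = this.indexes.get(index);
    if (!idx) throw new Error(`unknown index: ${index}`);
    return Array.from(idx.entries())
      .map(([id, v]) => ({ id, score: cosine(vector, v) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.createIndex("chunks");
store.createIndex("questions");
store.upsert("chunks", [
  { id: "c1", vector: [1, 0] },
  { id: "c2", vector: [0, 1] },
]);
const hits = store.query("chunks", [0.9, 0.1], 1);
// hits[0].id is "c1"
```

Keeping chunks, entities, and questions in separate named indexes lets each search strategy query only the index its processor populated.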
### KVStore

```ts
interface KVStore<T = any> {
  getById(id: string): Promise<T | null>;
  getByIds(ids: string[]): Promise<Array<T | null>>;
  upsert(data: Record<string, T>): Promise<void>;
  drop(): Promise<void>;
}
```

## Multi-Tenancy with Namespaces
Use namespaces to isolate data in shared storage:
```ts
const graph1 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-a",
});

const graph2 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-b",
});
```

The namespace is passed to the storage factories, so each namespace gets its own isolated stores.
## Pipeline Metadata Validation

When a pipeline is configured and `insert()` runs for the first time, the pipeline config is serialized and stored. On subsequent inserts:

- Same config → proceed (append to existing graph)
- Empty pipeline → proceed (reuse cached graph, just add chunks)
- Different config → throw `PipelineConfigMismatchError`

This prevents accidentally corrupting a graph with incompatible processor configs.
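The validation can be sketched roughly as follows. This is a hypothetical reconstruction, not the SDK's code; the serialization format and the error's shape are assumptions:

```ts
// Hypothetical sketch of pipeline-config validation (not the SDK's actual code).
class PipelineConfigMismatchError extends Error {}

function validatePipelineConfig(
  stored: string | null, // serialized config persisted in the KV store
  incoming: object[],    // processor configs for this insert() call
): string {
  const serialized = JSON.stringify(incoming);
  if (stored === null) return serialized;   // first insert: persist the config
  if (incoming.length === 0) return stored; // empty pipeline: reuse cached graph
  if (stored === serialized) return stored; // same config: append to the graph
  throw new PipelineConfigMismatchError("pipeline config differs from stored config");
}

const first = validatePipelineConfig(null, [{ processor: "chunkEntityGeneration" }]);
const again = validatePipelineConfig(first, [{ processor: "chunkEntityGeneration" }]);
// a different config against the stored one throws PipelineConfigMismatchError
```

In application code, catching the error is the signal to either reuse the original pipeline config or write to a fresh namespace.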
## Next Steps

- API Reference - Detailed API documentation
- Storage Options - Configure storage backends