# Core Concepts

Understanding the key concepts and architecture of GraphRAG SDK.

## Architecture Overview

GraphRAG SDK is built on four core abstractions:
```
┌─────────────────────────────────────────────────┐
│                Graph (User API)                 │
│            createGraph() + query()              │
└─────────────────────────────────────────────────┘
                        │
          ┌─────────────┼─────────────┐
          │             │             │
          ▼             ▼             ▼
    ┌──────────┐  ┌──────────┐  ┌─────────┐
    │ Provider │  │ Storage  │  │   AI    │
    │ +Pipeline│  │ Factories│  │   SDK   │
    └──────────┘  └──────────┘  └─────────┘
```

### 1. Graph
`createGraph()` wires up the provider with models and a namespace. `query()` runs a search strategy and generates an answer.

```ts
const graph = createGraph({ graph: provider, model, embedding, namespace });
await graph.insert(documents);
const { text } = await query({ graph, search: strategy, prompt: "..." });
```

### 2. Provider + Pipeline
The provider (e.g. `lexicalGraph()`) defines how the graph is built. It accepts a pipeline of composable processors that run during `insert()`. The `insert()` pipeline:

1. Store chunks → KV + vector DB (always, built-in)
2. Run pipeline processors in order (configurable)

Each processor builds on what previous processors created:
| Processor | Reads → Produces |
|---|---|
| `chunkEntityGeneration()` | Chunks → entity nodes, relationship edges |
| `entityCommunityGeneration()` | Entity graph → communities, reports |
| `similarChunkConnection()` | Chunk embeddings → `SIMILAR_TO` edges |
| `hypotheticalQuestionGeneration()` | Chunks → question embeddings |
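The in-order execution can be sketched as a tiny sequential runner. This is a toy illustration with made-up `Context` and `Processor` types, not the SDK's implementation (real processors are async; plain functions keep the sketch short):

```ts
// Toy sketch of ordered pipeline execution (hypothetical types, not SDK code).
// Each processor receives the shared context and extends it in place.
type Context = { chunks: string[]; entities: string[]; log: string[] };
type Processor = (ctx: Context) => void;

function runPipeline(ctx: Context, processors: Processor[]): Context {
  for (const p of processors) p(ctx); // strictly in order
  return ctx;
}

const extractEntities: Processor = (ctx) => {
  ctx.entities = ctx.chunks.filter((c) => c.includes("TechCorp"));
  ctx.log.push("entities");
};
const buildCommunities: Processor = (ctx) => {
  // sees what extractEntities produced
  ctx.log.push(`communities:${ctx.entities.length}`);
};

const result = runPipeline(
  { chunks: ["TechCorp hired Sarah Chen", "It rained today"], entities: [], log: [] },
  [extractEntities, buildCommunities],
);
// result.log is ["entities", "communities:1"]
```

Because later processors read what earlier ones wrote, reordering them (e.g. running community generation before entity extraction) would change the result; the SDK runs them in the order they are configured.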
### 3. Storage Factories

Storage backends are passed as factory functions that accept a namespace and return a store instance:

```ts
graphStoreFactory: (ns) => neo4jGraph({ uri: "bolt://localhost", namespace: ns }),
vectorStoreFactory: (ns) => qdrantVector({ url: "http://localhost:6333", collection: ns }),
kvStoreFactory: (ns) => redisKV({ url: "redis://localhost:6379", prefix: ns }),
```

Three storage types:
- **GraphStore** — nodes, edges, clustering, community schema
- **VectorStore** — embeddings with multiple named indexes (chunks, entities, questions)
- **KVStore** — key-value metadata and chunk content
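A factory is just a function that closes over backend configuration and receives the namespace when the graph is created. A toy sketch of the pattern, using a made-up Map-backed store rather than a real backend:

```ts
// Toy factory pattern: each namespace gets its own isolated store.
// (Illustrative only; real factories return Neo4j/Qdrant/Redis clients.)
type KV = { get(k: string): string | undefined; set(k: string, v: string): void };

function inMemoryKV() {
  const stores = new Map<string, Map<string, string>>();
  // the factory the SDK calls with the graph's namespace
  return (ns: string): KV => {
    if (!stores.has(ns)) stores.set(ns, new Map());
    const m = stores.get(ns)!;
    return { get: (k) => m.get(k), set: (k, v) => m.set(k, v) };
  };
}

const kvStoreFactory = inMemoryKV();
const a = kvStoreFactory("project-a");
const b = kvStoreFactory("project-b");
a.set("doc:1", "hello");
// b is isolated: b.get("doc:1") is undefined
```

The same closure pattern applies to graph and vector store factories; only the store instance returned differs.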
### 4. AI SDK Integration

GraphRAG SDK uses the Vercel AI SDK for:

- LLM calls (answer generation, entity extraction, community reports)
- Embeddings (vector representations for search)

Any provider supported by the AI SDK works: OpenAI, Anthropic, Google, etc.
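For example, wiring OpenAI models from the AI SDK into `createGraph()` might look like the sketch below; the model IDs are illustrative, and `provider` stands for any graph provider such as `lexicalGraph()`:

```ts
import { openai } from "@ai-sdk/openai";

const graph = createGraph({
  graph: provider,
  model: openai("gpt-4o"),                               // LLM for extraction + answers
  embedding: openai.embedding("text-embedding-3-small"), // vectors for search
  namespace: "docs",
});
```

Swapping in another provider (e.g. `@ai-sdk/anthropic` or `@ai-sdk/google`) only changes the `model`/`embedding` lines.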
## The Insert Pipeline

When you call `graph.insert()`:
```
Input Text
    │
    ▼
Chunking (configurable size + overlap)
    │
    ▼
Store chunks → KV store + vector DB ('chunks' index)
    │
    ▼
Run pipeline processors in order:
  ├─ chunkEntityGeneration()           → entities + relations in graph store
  ├─ entityCommunityGeneration()       → Leiden clustering + community reports
  ├─ similarChunkConnection()          → SIMILAR_TO edges between chunks
  └─ hypotheticalQuestionGeneration()  → question embeddings in 'questions' index
```

Pipeline metadata is persisted in the KV store. On subsequent inserts, the pipeline config is validated; mismatched configs throw `PipelineConfigMismatchError`.
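The chunking step (configurable size + overlap) can be sketched character-by-character. This is a simplified illustration; the SDK's actual chunker and its defaults may differ:

```ts
// Character-based chunking with overlap (illustrative, not the SDK's chunker).
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

const chunks = chunkText("abcdefghij", 4, 2);
// windows: "abcd", "cdef", "efgh", "ghij"
```

The overlap means the tail of each chunk reappears at the head of the next, so a sentence split by a chunk boundary is still fully contained in at least one chunk.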
## Search Strategies

`query()` accepts a search strategy that retrieves context from the graph. Each strategy leverages different parts of the pipeline:
| Strategy | What it searches | Requires processor |
|---|---|---|
| `naiveSearch()` | Chunk embeddings | (none; built-in) |
| `entityEnhancedSearch()` | Chunks → `HAS_ENTITY` → entities → `RELATES_TO` | `chunkEntityGeneration()` |
| `localCommunitySearch()` | Entity graph + neighborhoods + communities | `chunkEntityGeneration()` + `entityCommunityGeneration()` |
| `globalCommunitySearch()` | Community reports | `entityCommunityGeneration()` |
| `similarChunkTraversalSearch()` | Chunk similarity graph (BFS) | `similarChunkConnection()` |
| `hypotheticalQuestionSearch()` | Question embeddings | `hypotheticalQuestionGeneration()` |
```ts
const { text } = await query({
  graph,
  search: localCommunitySearch({ topK: 10 }),
  prompt: "Who is Sarah Chen?",
});
```

Search strategies can override the model or embedding:
```ts
const { text } = await query({
  graph,
  search: localCommunitySearch({ model: openai("gpt-4o") }),
  prompt: "Complex question requiring stronger model",
});
```

## Entities and Relationships
### Entities

Entities are nodes in the knowledge graph extracted by `chunkEntityGeneration()`:

- **Name** — human-readable identifier (e.g. "TECHCORP")
- **Type** — category (e.g. "organization", "person")
- **Description** — LLM-generated summary
- **Source** — which chunks this entity was extracted from
### Relationships

Relationships are edges connecting entities:

- **Source → Target** — directed connection
- **Description** — what the relationship means
- **Weight** — importance/frequency score
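The attributes above map naturally onto TypeScript shapes. The types below are an illustration of the described data model, not the SDK's exported types:

```ts
// Illustrative shapes for extracted graph data (hypothetical, not SDK exports).
interface Entity {
  name: string;            // e.g. "TECHCORP"
  type: string;            // e.g. "organization"
  description: string;     // LLM-generated summary
  sourceChunkIds: string[]; // which chunks this entity came from
}

interface Relationship {
  source: string;      // entity name
  target: string;      // entity name
  description: string; // what the relationship means
  weight: number;      // importance/frequency score
}

const techcorp: Entity = {
  name: "TECHCORP",
  type: "organization",
  description: "A technology company.",
  sourceChunkIds: ["chunk-1"],
};

const employs: Relationship = {
  source: "TECHCORP",
  target: "SARAH CHEN",
  description: "TechCorp employs Sarah Chen.",
  weight: 2,
};
```

Because relationships reference entities by name, repeated mentions across chunks accumulate onto the same node rather than creating duplicates.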
## Storage Interfaces

### GraphStore

```ts
interface GraphStore {
  upsertNode(nodeId: string, nodeData: GNodeData): Promise<void>;
  getNode(nodeId: string): Promise<GNodeData | null>;
  upsertEdge(sourceId: string, targetId: string, edgeData: GEdgeData): Promise<void>;
  getEdge(sourceId: string, targetId: string): Promise<GEdgeData | null>;
  getNodeEdges(nodeId: string): Promise<Array<[string, string]> | null>;
  clustering(algorithm: string): Promise<void>;
  communitySchema(): Promise<Record<string, SingleCommunitySchema>>;
  close(): Promise<void>;
}
```

### VectorStore
```ts
interface VectorStore {
  createIndex(params: CreateIndexParams): Promise<void>;
  listIndexes(): Promise<string[]>;
  upsert(params: UpsertVectorParams): Promise<string[]>;
  query(params: QueryVectorParams): Promise<QueryResult[]>;
}
```

Supports multiple named indexes within a single store (e.g. `chunks`, `entities`, `questions`).
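To illustrate the multiple-named-indexes idea, here is a toy in-memory store with the same general shape. The parameter types are simplified stand-ins for `CreateIndexParams` and friends, and the methods are synchronous here for brevity (the real interface returns Promises):

```ts
// Toy multi-index vector store with cosine-similarity search (illustrative only).
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

function cosine(a: Vec, b: Vec): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

class InMemoryVectorStore {
  private indexes = new Map<string, Map<string, Vec>>();

  createIndex(name: string): void {
    if (!this.indexes.has(name)) this.indexes.set(name, new Map());
  }

  listIndexes(): string[] {
    return Array.from(this.indexes.keys());
  }

  upsert(index: string, items: Array<{ id: string; vector: Vec }>): string[] {
    const idx = this.indexes.get(index);
    if (!idx) throw new Error(`unknown index: ${index}`);
    for (const { id, vector } of items) idx.set(id, vector);
    return items.map((i) => i.id);
  }

  query(index: string, vector: Vec, topK: number): Array<{ id: string; score: number }> {
    const idx = this.indexes.get(index);
    if (!idx) throw new Error(`unknown index: ${index}`);
    return Array.from(idx.entries())
      .map(([id, v]) => ({ id, score: cosine(vector, v) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const store = new InMemoryVectorStore();
store.createIndex("chunks");
store.createIndex("questions");
store.upsert("chunks", [
  { id: "c1", vector: [1, 0] },
  { id: "c2", vector: [0, 1] },
]);
const hits = store.query("chunks", [0.9, 0.1], 1);
// hits[0].id is "c1"
```

Keeping chunks, entities, and questions in separate named indexes lets each search strategy query only the index its processor populated.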
### KVStore

```ts
interface KVStore<T = any> {
  getById(id: string): Promise<T | null>;
  getByIds(ids: string[]): Promise<Array<T | null>>;
  upsert(data: Record<string, T>): Promise<void>;
  drop(): Promise<void>;
}
```

## Multi-Tenancy with Namespaces
Use namespaces to isolate data in shared storage:
```ts
const graph1 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-a",
});

const graph2 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-b",
});
```

The namespace is passed to the storage factories, so each namespace gets its own isolated stores.
## Pipeline Metadata Validation

When a pipeline is configured and `insert()` runs for the first time, the pipeline config is serialized and stored. On subsequent inserts:

- Same config → proceed (append to existing graph)
- Empty pipeline → proceed (reuse cached graph, just add chunks)
- Different config → throw `PipelineConfigMismatchError`

This prevents accidentally corrupting a graph with incompatible processor configs.
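The validation can be sketched roughly as follows. This is a hypothetical reconstruction, not the SDK's code; the serialization format and the error's shape are assumptions:

```ts
// Hypothetical sketch of pipeline-config validation (not the SDK's actual code).
class PipelineConfigMismatchError extends Error {}

function validatePipelineConfig(
  stored: string | null, // serialized config persisted in the KV store
  incoming: object[],    // processor configs for this insert() call
): string {
  const serialized = JSON.stringify(incoming);
  if (stored === null) return serialized;   // first insert: persist the config
  if (incoming.length === 0) return stored; // empty pipeline: reuse cached graph
  if (stored === serialized) return stored; // same config: append to the graph
  throw new PipelineConfigMismatchError("pipeline config differs from stored config");
}

const first = validatePipelineConfig(null, [{ processor: "chunkEntityGeneration" }]);
const again = validatePipelineConfig(first, [{ processor: "chunkEntityGeneration" }]);
// a different config against the stored one throws PipelineConfigMismatchError
```

In application code, catching the error is the signal to either reuse the original pipeline config or write to a fresh namespace.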
## Next Steps

- API Reference - Detailed API documentation
- Storage Options - Configure storage backends