Core Concepts

Understanding the key concepts and architecture of GraphRAG SDK.

Architecture Overview

GraphRAG SDK is built on four core abstractions:

┌─────────────────────────────────────────────────┐
│              Graph (User API)                   │
│          createGraph() + query()                │
└─────────────────────────────────────────────────┘

        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
  ┌──────────┐  ┌──────────┐  ┌─────────┐
  │ Provider │  │ Storage  │  │   AI    │
  │ +Pipeline│  │ Factories│  │  SDK    │
  └──────────┘  └──────────┘  └─────────┘

1. Graph

createGraph() wires the provider up with the models and a namespace. query() runs a search strategy and generates an answer from the retrieved context.

```typescript
const graph = createGraph({ graph: provider, model, embedding, namespace });
await graph.insert(documents);
const { text } = await query({ graph, search: strategy, prompt: "..." });
```

2. Provider + Pipeline

The provider (e.g. lexicalGraph()) defines how the graph is built. It accepts a pipeline of composable processors that run during insert():

insert() pipeline:
  1. Store chunks → KV + vector DB          (always, built-in)
  2. Run pipeline processors in order       (configurable)

Each processor builds on what previous processors created:

Processor                          Reads → Produces
chunkEntityGeneration()            Chunks → entity nodes, relationship edges
entityCommunityGeneration()        Entity graph → communities, reports
similarChunkConnection()           Chunk embeddings → SIMILAR_TO edges
hypotheticalQuestionGeneration()   Chunks → question embeddings
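The ordering matters: each processor can only consume what earlier processors (or the built-in chunk step) produced. A minimal sketch of such an ordered runner; the `Processor` shape and `runPipeline` helper here are illustrative, not the SDK's actual API:

```typescript
// A shared context that accumulates what each processor produces.
type Context = { artifacts: string[] };

interface Processor {
  name: string;
  run(ctx: Context): Promise<void> | void;
}

// Run processors strictly in order, so later ones see earlier results.
async function runPipeline(processors: Processor[], ctx: Context): Promise<Context> {
  for (const p of processors) {
    await p.run(ctx);
    ctx.artifacts.push(p.name); // record what has been built so far
  }
  return ctx;
}
```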

3. Storage Factories

Storage backends are passed as factory functions that accept a namespace and return a store instance:

```typescript
graphStoreFactory: (ns) => neo4jGraph({ uri: "bolt://localhost", namespace: ns }),
vectorStoreFactory: (ns) => qdrantVector({ url: "http://localhost:6333", collection: ns }),
kvStoreFactory: (ns) => redisKV({ url: "redis://localhost:6379", prefix: ns }),
```

Three storage types:

  • GraphStore — nodes, edges, clustering, community schema
  • VectorStore — embeddings with multiple named indexes (chunks, entities, questions)
  • KVStore — key-value metadata and chunk content

4. AI SDK Integration

GraphRAG SDK uses the Vercel AI SDK for:

  • LLM calls (answer generation, entity extraction, community reports)
  • Embeddings (vector representations for search)

Any provider supported by the AI SDK works: OpenAI, Anthropic, Google, etc.
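For example (the model identifiers below are illustrative; any AI SDK language model or embedding model can be passed to createGraph):

```typescript
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

const model = openai("gpt-4o-mini");                          // answer generation, extraction
const embedding = openai.embedding("text-embedding-3-small"); // vector search

// Or mix providers — e.g. a stronger model for a specific query:
const strongerModel = anthropic("claude-3-5-sonnet-latest");
```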

The Insert Pipeline

When you call graph.insert():

Input Text
    ↓
Chunking (configurable size + overlap)
    ↓
Store chunks → KV store + vector DB ('chunks' index)
    ↓
Run pipeline processors in order:
    ├─ chunkEntityGeneration()     → entities + relations in graph store
    ├─ entityCommunityGeneration() → Leiden clustering + community reports
    ├─ similarChunkConnection()    → SIMILAR_TO edges between chunks
    └─ hypotheticalQuestionGeneration() → question embeddings in 'questions' index

Pipeline metadata is persisted in the KV store. On subsequent inserts, the pipeline config is validated — mismatched configs throw PipelineConfigMismatchError.
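The chunking step can be pictured as a sliding character window. A self-contained sketch, assuming a simple fixed-size window with overlap (the SDK's actual chunker and its options may differ, e.g. token-based sizing):

```typescript
// Split text into windows of `size` characters, each overlapping the
// previous one by `overlap` characters.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap ensures that a sentence cut at a chunk boundary still appears intact in at least one chunk.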

Search Strategies

query() accepts a search strategy that retrieves context from the graph. Each strategy leverages different parts of the pipeline:

Strategy                        What it searches                              Requires processor
naiveSearch()                   Chunk embeddings                              (none — built-in)
entityEnhancedSearch()          Chunks → HAS_ENTITY → entities → RELATES_TO   chunkEntityGeneration()
localCommunitySearch()          Entity graph + neighborhoods + communities    chunkEntityGeneration() + entityCommunityGeneration()
globalCommunitySearch()         Community reports                             entityCommunityGeneration()
similarChunkTraversalSearch()   Chunk similarity graph (BFS)                  similarChunkConnection()
hypotheticalQuestionSearch()    Question embeddings                           hypotheticalQuestionGeneration()

```typescript
const { text } = await query({
  graph,
  search: localCommunitySearch({ topK: 10 }),
  prompt: "Who is Sarah Chen?",
});
```

Search strategies can override the model or embedding:

```typescript
const { text } = await query({
  graph,
  search: localCommunitySearch({ model: openai("gpt-4o") }),
  prompt: "Complex question requiring stronger model",
});
```

Entities and Relationships

Entities

Entities are nodes in the knowledge graph extracted by chunkEntityGeneration():

  • Name — human-readable identifier (e.g. "TECHCORP")
  • Type — category (e.g. "organization", "person")
  • Description — LLM-generated summary
  • Source — which chunks this entity was extracted from

Relationships

Relationships are edges connecting entities:

  • Source → Target — directed connection
  • Description — what the relationship means
  • Weight — importance/frequency score
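Put together, the two records might be typed like this (field names are hypothetical, mirroring the bullets above rather than the SDK's exact schema):

```typescript
// Hypothetical entity shape: a node extracted from chunks.
interface Entity {
  name: string;             // human-readable identifier, e.g. "TECHCORP"
  type: string;             // category, e.g. "organization", "person"
  description: string;      // LLM-generated summary
  sourceChunkIds: string[]; // which chunks this entity came from
}

// Hypothetical relationship shape: a directed, weighted edge.
interface Relationship {
  source: string;      // entity name (edge origin)
  target: string;      // entity name (edge destination)
  description: string; // what the relationship means
  weight: number;      // importance / frequency score
}
```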

Storage Interfaces

GraphStore

```typescript
interface GraphStore {
  upsertNode(nodeId: string, nodeData: GNodeData): Promise<void>;
  getNode(nodeId: string): Promise<GNodeData | null>;
  upsertEdge(sourceId: string, targetId: string, edgeData: GEdgeData): Promise<void>;
  getEdge(sourceId: string, targetId: string): Promise<GEdgeData | null>;
  getNodeEdges(nodeId: string): Promise<Array<[string, string]> | null>;
  clustering(algorithm: string): Promise<void>;
  communitySchema(): Promise<Record<string, SingleCommunitySchema>>;
  close(): Promise<void>;
}
```
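A minimal in-memory implementation helps show what each method is responsible for. This is a sketch, not one of the SDK's real backends; `clustering()` and `communitySchema()` are stubbed, since real stores (e.g. Neo4j) implement those natively, and the data types are loosely typed placeholders:

```typescript
type GNodeData = Record<string, unknown>;
type GEdgeData = Record<string, unknown>;
type SingleCommunitySchema = Record<string, unknown>;

class MemoryGraphStore {
  private nodes = new Map<string, GNodeData>();
  private edges = new Map<string, GEdgeData>(); // key: "source->target"

  async upsertNode(nodeId: string, nodeData: GNodeData): Promise<void> {
    this.nodes.set(nodeId, nodeData);
  }
  async getNode(nodeId: string): Promise<GNodeData | null> {
    return this.nodes.get(nodeId) ?? null;
  }
  async upsertEdge(sourceId: string, targetId: string, edgeData: GEdgeData): Promise<void> {
    this.edges.set(`${sourceId}->${targetId}`, edgeData);
  }
  async getEdge(sourceId: string, targetId: string): Promise<GEdgeData | null> {
    return this.edges.get(`${sourceId}->${targetId}`) ?? null;
  }
  async getNodeEdges(nodeId: string): Promise<Array<[string, string]> | null> {
    const out: Array<[string, string]> = [];
    for (const key of this.edges.keys()) {
      const [source, target] = key.split("->");
      if (source === nodeId || target === nodeId) out.push([source, target]);
    }
    return out.length ? out : null;
  }
  async clustering(_algorithm: string): Promise<void> {} // stub: no-op
  async communitySchema(): Promise<Record<string, SingleCommunitySchema>> {
    return {}; // stub: no communities without clustering
  }
  async close(): Promise<void> {}
}
```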

VectorStore

```typescript
interface VectorStore {
  createIndex(params: CreateIndexParams): Promise<void>;
  listIndexes(): Promise<string[]>;
  upsert(params: UpsertVectorParams): Promise<string[]>;
  query(params: QueryVectorParams): Promise<QueryResult[]>;
}
```

Supports multiple named indexes within a single store (e.g. chunks, entities, questions).
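A toy in-memory version makes the named-index idea concrete. The parameter shapes below are simplified stand-ins for the SDK's `CreateIndexParams` and friends, and cosine similarity is one common choice of distance:

```typescript
interface StoredVector { id: string; vector: number[] }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class MemoryVectorStore {
  // One vector list per named index (e.g. "chunks", "entities", "questions").
  private indexes = new Map<string, StoredVector[]>();

  async createIndex(params: { name: string }): Promise<void> {
    if (!this.indexes.has(params.name)) this.indexes.set(params.name, []);
  }
  async listIndexes(): Promise<string[]> {
    return [...this.indexes.keys()];
  }
  async upsert(params: { index: string; vectors: StoredVector[] }): Promise<string[]> {
    const idx = this.indexes.get(params.index) ?? [];
    idx.push(...params.vectors);
    this.indexes.set(params.index, idx);
    return params.vectors.map((v) => v.id);
  }
  async query(params: { index: string; vector: number[]; topK: number }) {
    const idx = this.indexes.get(params.index) ?? [];
    return idx
      .map((v) => ({ id: v.id, score: cosine(v.vector, params.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, params.topK);
  }
}
```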

KVStore

```typescript
interface KVStore<T = any> {
  getById(id: string): Promise<T | null>;
  getByIds(ids: string[]): Promise<Array<T | null>>;
  upsert(data: Record<string, T>): Promise<void>;
  drop(): Promise<void>;
}
```
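This is the smallest of the three interfaces; an in-memory sketch satisfying it fits in a few lines (again a stand-in, not one of the SDK's real backends):

```typescript
class MemoryKVStore<T = unknown> {
  private data = new Map<string, T>();

  async getById(id: string): Promise<T | null> {
    return this.data.has(id) ? (this.data.get(id) as T) : null;
  }
  async getByIds(ids: string[]): Promise<Array<T | null>> {
    // Missing ids come back as null, preserving positional alignment.
    return Promise.all(ids.map((id) => this.getById(id)));
  }
  async upsert(data: Record<string, T>): Promise<void> {
    for (const [k, v] of Object.entries(data)) this.data.set(k, v);
  }
  async drop(): Promise<void> {
    this.data.clear();
  }
}
```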

Multi-Tenancy with Namespaces

Use namespaces to isolate data in shared storage:

```typescript
const graph1 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-a",
});

const graph2 = createGraph({
  graph: lexicalGraph({ ... }),
  namespace: "project-b",
});
```

The namespace is passed to storage factories, so each namespace gets its own isolated stores.
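Why this isolates data can be shown with a toy factory: each call receives the namespace and closes over its own storage, so writes through one namespace are invisible to another (the in-memory store here is a stand-in for the real backends):

```typescript
// Each invocation creates a fresh store scoped to one namespace.
function kvStoreFactory(ns: string) {
  const data = new Map<string, string>();
  return {
    namespace: ns,
    set: (k: string, v: string) => void data.set(k, v),
    get: (k: string) => data.get(k),
  };
}

const storeA = kvStoreFactory("project-a");
const storeB = kvStoreFactory("project-b");
storeA.set("doc:1", "hello");
// storeB sees nothing written through storeA.
```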

Pipeline Metadata Validation

When a pipeline is configured and insert() runs for the first time, the pipeline config is serialized and stored. On subsequent inserts:

  • Same config → proceed (append to existing graph)
  • Empty pipeline → proceed (reuse cached graph, just add chunks)
  • Different config → throw PipelineConfigMismatchError

This prevents accidentally corrupting a graph with incompatible processor configs.
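The three rules above can be sketched as one check: serialize the config on first insert, then compare on later inserts. The error name matches the SDK's `PipelineConfigMismatchError`; everything else here (the function, the config shape) is illustrative:

```typescript
class PipelineConfigMismatchError extends Error {}

// Returns the config string to persist; throws if configs conflict.
function validatePipelineConfig(
  stored: string | null,
  current: { processors: string[] },
): string {
  const serialized = JSON.stringify(current);
  if (stored === null) return serialized;             // first insert: persist
  if (current.processors.length === 0) return stored; // empty pipeline: reuse
  if (stored !== serialized) {
    throw new PipelineConfigMismatchError("pipeline config changed");
  }
  return stored;                                      // same config: proceed
}
```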

Released under the Elastic License 2.0. Made with ❤️ by Narek.