Entity Enhanced Search
The Entity Enhanced Search strategy combines chunk vector search with entity graph traversal to retrieve richer context. Chunks provide the base text; entity relationships reveal connections between real-world concepts spread across different chunks.
Status: Available Now
Overview
Entity Enhanced Search improves on basic vector search by traversing the entity graph:
- Vector search for the top-K most similar chunks
- Traverse HAS_ENTITY edges to find entities linked to those chunks
- Follow RELATES_TO edges (BFS) to discover related entities
- Assemble context with chunk text + entity descriptions + relationships
This retrieves both the original text AND the relational context between concepts, enabling the LLM to answer questions that require understanding connections across multiple chunks.
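The first two steps in the list above can be sketched with plain in-memory data. This is a minimal illustration, not the SDK's implementation: the `ChunkVector` and `HasEntityEdge` shapes and the `topKChunks` and `seedEntities` helpers are hypothetical names, and cosine similarity is assumed as the ranking metric.

```typescript
// Hypothetical in-memory shapes; the SDK's real store interfaces may differ.
interface ChunkVector { id: string; vector: number[] }
interface HasEntityEdge { entityId: string; chunkId: string } // entity -> chunk

// Cosine similarity between two vectors of equal length (assumed metric).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Step 1: rank chunks by similarity to the query vector and keep the top K ids.
function topKChunks(query: number[], chunks: ChunkVector[], topK: number): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((c) => c.id);
}

// Step 2: collect entities that have a HAS_ENTITY edge to any matched chunk.
function seedEntities(chunkIds: string[], edges: HasEntityEdge[]): string[] {
  const matched = new Set(chunkIds);
  const seeds: string[] = [];
  edges.forEach((e) => {
    if (matched.has(e.chunkId) && seeds.indexOf(e.entityId) === -1) {
      seeds.push(e.entityId);
    }
  });
  return seeds;
}
```

The seed entities returned by the second helper are the starting points for the RELATES_TO traversal in step three.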
Based on the Graph-Enhanced Vector Search pattern from the Neo4j GraphRAG Pattern Catalog.
Installation
pnpm add @graphrag-sdk/lexical
Quick Start
import { createGraph, query } from 'graphrag-sdk';
import { lexicalGraph, chunkEntityGeneration, entityEnhancedSearch } from '@graphrag-sdk/lexical';
import { inMemoryGraph, inMemoryVector, inMemoryKV } from '@graphrag-sdk/in-memory-storage';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [chunkEntityGeneration()],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});
await graph.insert([
  'TechCorp implements responsible AI through bias audits and model cards.',
  'Sarah Chen leads the AI ethics committee at TechCorp.',
  'The ethics committee publishes quarterly transparency reports.',
]);
const { text } = await query({
  graph,
  search: entityEnhancedSearch({ topK: 5, maxDepth: 2 }),
  prompt: 'How does TechCorp approach responsible AI?',
});
console.log(text);
Prerequisites
The chunkEntityGeneration() processor must be in the pipeline. It creates:
- Chunk nodes in the graph store (nodeType: 'chunk', with content, fullDocId)
- Entity nodes in the graph store (with entity_type, description, source_id)
- HAS_ENTITY edges from entity nodes to chunk nodes
- RELATES_TO edges between entity nodes (with weight, description)
- Chunk embeddings in the chunks vector index
- Entity embeddings in the entities vector index
Configuration
entityEnhancedSearch(options)
interface EntityEnhancedSearchOptions {
  topK?: number; // default: 10
  maxDepth?: number; // default: 2
  maxTokens?: number; // default: 12000
  maxChunkTokens?: number; // default: 6000
  maxEntityTokens?: number; // default: 6000
  model?: LanguageModel;
  embedding?: EmbeddingModel;
}
Parameters:
| Option | Default | Description |
|---|---|---|
| topK | 10 | Number of top chunks from vector search |
| maxDepth | 2 | Depth of RELATES_TO traversal from seed entities |
| maxTokens | 12000 | Total context budget (tokens) |
| maxChunkTokens | 6000 | Budget for the chunk text section |
| maxEntityTokens | 6000 | Budget for entity descriptions + relationships |
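To make the three budget options in the table concrete, here is a minimal sketch of how a per-section budget could trim content. It rests on stated assumptions rather than the SDK's actual behaviour: tokens are estimated at roughly four characters each instead of using a real tokenizer, `maxTokens` is assumed to cap the sum of both sections, and `estimateTokens`, `fitToBudget` and `splitBudget` are hypothetical helper names.

```typescript
interface Budgets { maxTokens: number; maxChunkTokens: number; maxEntityTokens: number }

// Rough estimate (about 4 characters per token); an assumption, not the SDK's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep items in order until adding the next one would exceed the budget.
function fitToBudget(items: string[], budget: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (let i = 0; i < items.length; i++) {
    const cost = estimateTokens(items[i]);
    if (used + cost > budget) break;
    kept.push(items[i]);
    used += cost;
  }
  return kept;
}

// Assumed interaction: each section is capped by its own budget, and the
// entity section may only use what is left of maxTokens after the chunks.
function splitBudget(chunks: string[], entities: string[], b: Budgets) {
  const keptChunks = fitToBudget(chunks, Math.min(b.maxChunkTokens, b.maxTokens));
  const usedByChunks = keptChunks.reduce((sum, c) => sum + estimateTokens(c), 0);
  const entityBudget = Math.min(b.maxEntityTokens, b.maxTokens - usedByChunks);
  return { chunks: keptChunks, entities: fitToBudget(entities, entityBudget) };
}
```

Under this reading, raising `maxChunkTokens` at the expense of `maxEntityTokens` keeps more source text and fewer entity rows, and the reverse favours relational context.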
topK
Controls how many seed chunks are retrieved from vector search:
// Focused retrieval
entityEnhancedSearch({ topK: 5 })
// Broad retrieval
entityEnhancedSearch({ topK: 20 })
maxDepth
Controls how far to traverse RELATES_TO edges from seed entities:
- 0: Only entities directly connected to matched chunks
- 1: Entities + their immediate neighbors
- 2: Two hops of relationship traversal (recommended)
- 3+: Deeper traversal (may include less relevant entities)
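The depth levels above correspond to a depth-limited breadth-first search. The following is a sketch under assumptions, not the library's code: `RelatesToEdge` and `bfsEntities` are hypothetical names, and RELATES_TO edges are treated as undirected here, which the SDK may or may not do.

```typescript
// Hypothetical edge shape for illustration.
interface RelatesToEdge { source: string; target: string }

// Returns each reachable entity mapped to its hop count from the nearest seed.
function bfsEntities(
  seeds: string[],
  edges: RelatesToEdge[],
  maxDepth: number,
): Map<string, number> {
  // Build an adjacency list, following edges in both directions (assumption).
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a) || [];
    list.push(b);
    neighbors.set(a, list);
  };
  edges.forEach((e) => {
    link(e.source, e.target);
    link(e.target, e.source);
  });

  // Seeds sit at depth 0; each loop iteration expands one more hop.
  const depthOf = new Map<string, number>();
  seeds.forEach((s) => depthOf.set(s, 0));
  let frontier = seeds.slice();
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    frontier.forEach((id) => {
      (neighbors.get(id) || []).forEach((n) => {
        if (!depthOf.has(n)) {
          depthOf.set(n, depth);
          next.push(n);
        }
      });
    });
    frontier = next;
  }
  return depthOf;
}
```

With `maxDepth` set to 0 the loop never runs and only the seed entities come back; each extra level adds one ring of neighbors, which is why deeper settings tend to pull in less relevant entities.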
// Shallow: only direct entities
entityEnhancedSearch({ maxDepth: 0 })
// Deep: discover distant connections
entityEnhancedSearch({ maxDepth: 3 })
Token Budgets
The context is split into two sections, each with its own budget:
// More text, fewer entities
entityEnhancedSearch({ maxChunkTokens: 8000, maxEntityTokens: 4000 })
// More entities, less text
entityEnhancedSearch({ maxChunkTokens: 4000, maxEntityTokens: 8000 })
How It Works
1. Query Processing
Input: User question
  ↓
Embed question
  ↓
Vector search → Top K chunks (from `chunks` index)
  ↓
For each chunk, traverse HAS_ENTITY edges → seed entities
  ↓
BFS on RELATES_TO edges (up to maxDepth) → neighbor entities
  ↓
Load chunk text from KV store
  ↓
Load entity descriptions + relationship data
  ↓
Assemble structured context (Sources + Entities + Relationships)
  ↓
Send context + question to LLM
  ↓
Result: Answer
2. Entity Traversal
Starting from matched chunks, the algorithm discovers entities and their connections:
Matched Chunks (vector search)
  ↓ HAS_ENTITY
Seed Entities (directly linked to chunks)
  ↓ RELATES_TO (depth 1)
Neighbor Entities
  ↓ RELATES_TO (depth 2)
Second-hop Entities
Context Format
The assembled context contains three CSV-formatted sections:

-----Sources-----
```csv
id,content
0,"TechCorp implements responsible AI through bias audits..."
1,"Sarah Chen leads the AI ethics committee at TechCorp..."
```

-----Entities-----
```csv
id,entity,type,description,rank
0,TECHCORP,ORGANIZATION,"Technology company implementing responsible AI",3
1,SARAH CHEN,PERSON,"Leader of the AI ethics committee at TechCorp",2
2,AI ETHICS COMMITTEE,ORGANIZATION,"Committee publishing quarterly transparency reports",2
```

-----Relationships-----
```csv
id,source,target,description,weight,rank
0,SARAH CHEN,AI ETHICS COMMITTEE,"Leads the committee",1.0,2
1,AI ETHICS COMMITTEE,TECHCORP,"Part of TechCorp",1.0,3
```

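As a rough illustration of how a three-section context like this could be assembled, the sketch below builds the section titles, header rows and data rows from plain arrays. The `EntityRow` and `RelationRow` shapes and the `buildContext` helper are hypothetical, the inner csv fence markers are left out for brevity, and quoting follows ordinary CSV escaping (embedded quotes doubled), which is an assumption about the real formatter. Numeric weights print as JavaScript formats them, so 1.0 appears as 1.

```typescript
// Hypothetical row shapes mirroring the column names of the context sections.
interface EntityRow { entity: string; type: string; description: string; rank: number }
interface RelationRow { source: string; target: string; description: string; weight: number; rank: number }

// Quote a free-text column and double any embedded quotes.
function quote(text: string): string {
  return '"' + text.replace(/"/g, '""') + '"';
}

// One section: a title line, a header row, then the data rows.
function section(title: string, header: string, rows: string[]): string {
  return ['-----' + title + '-----', header].concat(rows).join('\n');
}

function buildContext(chunks: string[], entities: EntityRow[], relations: RelationRow[]): string {
  const sources = chunks.map((text, i) => i + ',' + quote(text));
  const ents = entities.map((e, i) =>
    [i, e.entity, e.type, quote(e.description), e.rank].join(','),
  );
  const rels = relations.map((r, i) =>
    [i, r.source, r.target, quote(r.description), r.weight, r.rank].join(','),
  );
  return [
    section('Sources', 'id,content', sources),
    section('Entities', 'id,entity,type,description,rank', ents),
    section('Relationships', 'id,source,target,description,weight,rank', rels),
  ].join('\n');
}
```

Keeping ids positional (0, 1, 2, ...) within each section matches the sample context, where ids restart in every section.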
## Graph Structure
### Nodes
Chunk ◀──HAS_ENTITY── Entity ──RELATES_TO──▶ Entity
**Chunk node:**
- `nodeType: 'chunk'`
- `content: string`
- `fullDocId: string`
- `chunkOrderIndex: number`
**Entity node:**
- `entity_type: string`
- `description: string`
- `source_id: string` (chunk IDs, delimited)
### Edges
**`HAS_ENTITY`** (entity β chunk):
- `edgeType: 'HAS_ENTITY'`
- `source_id: string`
**`RELATES_TO`** (entity β entity):
- `edgeType: 'RELATES_TO'`
- `weight: number`
- `description: string`
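The node and edge properties listed above can be written down as TypeScript types. This is only a sketch: the interface names and the two helpers are hypothetical, and the SDK's own type definitions may carry additional fields.

```typescript
// Property names come from the lists above; interface names are illustrative.
interface ChunkNode {
  nodeType: 'chunk';
  content: string;
  fullDocId: string;
  chunkOrderIndex: number;
}

interface EntityNode {
  entity_type: string;
  description: string;
  source_id: string; // chunk IDs, delimited
}

interface HasEntityEdge { edgeType: 'HAS_ENTITY'; source_id: string }
interface RelatesToEdge { edgeType: 'RELATES_TO'; weight: number; description: string }
type GraphEdge = HasEntityEdge | RelatesToEdge;

// Type guard: only chunk nodes carry nodeType 'chunk' in this sketch.
function isChunkNode(node: ChunkNode | EntityNode): node is ChunkNode {
  return (node as ChunkNode).nodeType === 'chunk';
}

// Example of narrowing on edgeType: sum the weights of RELATES_TO edges only.
function totalRelationWeight(edges: GraphEdge[]): number {
  let total = 0;
  edges.forEach((e) => {
    if (e.edgeType === 'RELATES_TO') total += e.weight;
  });
  return total;
}
```

Using `edgeType` as a discriminant lets the compiler check that `weight` is only read from RELATES_TO edges.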
## Usage Examples
### Basic Usage
```typescript
const { text } = await query({
  graph,
  search: entityEnhancedSearch(),
  prompt: 'What is the relationship between TechCorp and Sarah Chen?',
});
```
Tuning Depth and Token Budgets
// Deep traversal with large entity budget
const { text } = await query({
  graph,
  search: entityEnhancedSearch({
    topK: 10,
    maxDepth: 3,
    maxChunkTokens: 4000,
    maxEntityTokens: 8000,
  }),
  prompt: 'Map out all the organizational relationships.',
});
With Custom Storage
import { neo4jGraph } from '@graphrag-sdk/neo4j';
import { qdrantVector } from '@graphrag-sdk/qdrant';
import { redisKV } from '@graphrag-sdk/redis';
const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: () => neo4jGraph({ url: 'bolt://localhost:7687' }),
    vectorStoreFactory: () => qdrantVector({ url: 'http://localhost:6333' }),
    kvStoreFactory: () => redisKV({ host: 'localhost', port: 6379 }),
    pipeline: [chunkEntityGeneration()],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'production',
});
Return Metadata
The RetrievalResult.metadata includes:
metadata: {
  chunkCount: number; // chunks included in context
  entityCount: number; // entities found via HAS_ENTITY
  relationshipCount: number; // RELATES_TO edges traversed
  neighborCount: number; // entities discovered via RELATES_TO
}
Error Handling
- If no chunks match the query, returns FAIL_RESPONSE with zeroed metadata
- If no HAS_ENTITY edges exist (entities not extracted), degrades gracefully to chunk-only context (like naiveSearch)
- If no RELATES_TO edges are found, returns chunks + entity descriptions without relationships
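A caller can tell these cases apart from the returned metadata counters. The sketch below assumes that each fallback leaves the corresponding counter at zero, which is inferred from the descriptions above rather than confirmed; `ContextLevel` and `contextLevel` are hypothetical names.

```typescript
// Field names match the metadata described for RetrievalResult.metadata.
interface RetrievalMetadata {
  chunkCount: number;
  entityCount: number;
  relationshipCount: number;
  neighborCount: number;
}

type ContextLevel = 'no-match' | 'chunks-only' | 'chunks-and-entities' | 'full';

// Classify how much of the strategy contributed to the assembled context.
function contextLevel(m: RetrievalMetadata): ContextLevel {
  if (m.chunkCount === 0) return 'no-match'; // assumed to accompany FAIL_RESPONSE
  if (m.entityCount === 0) return 'chunks-only'; // no HAS_ENTITY edges found
  if (m.relationshipCount === 0) return 'chunks-and-entities'; // no RELATES_TO edges
  return 'full';
}
```

A result classified as 'chunks-only' on a corpus that should contain entities is a hint that chunkEntityGeneration() was missing from the pipeline at insert time.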
When to Use
Good For
- Relationship queries: "How are X and Y related?"
- Multi-entity questions: questions involving multiple concepts across chunks
- Enriching context: when chunk text alone misses important connections
- Entity-centric domains: documents with many named entities and relationships
Not Ideal For
- Simple fact lookups: use naiveSearch() instead
- Global/thematic questions: use globalCommunitySearch() instead
- Question-answer mismatch: use hypotheticalQuestionSearch() when question-answer similarity is low
Comparison with Other Search Strategies
| Strategy | Starts from | Graph traversal | Best for |
|---|---|---|---|
| naiveSearch() | Chunk embeddings | None | Simple fact lookups |
| entityEnhancedSearch() | Chunk embeddings | HAS_ENTITY → RELATES_TO (depth 0-2) | Relational context across chunks |
| hypotheticalQuestionSearch() | Question embeddings | None | Question-answer retrieval with low similarity |
| localCommunitySearch() | Entity embeddings | Edges + community reports | Community-level summaries |
| globalCommunitySearch() | Community reports | None (map-reduce) | High-level corpus questions |
Next Steps
- Algorithm Overview: Compare all algorithms
- Hypothetical Question Search: Question-to-question retrieval
- Storage Options: Choose your backend
- API Reference: GraphProvider interface
Source Code
View the implementation: