Entity Enhanced Search ​

The Entity Enhanced Search strategy combines chunk vector search with entity graph traversal to retrieve richer context. Chunks provide the base text; entity relationships reveal connections between real-world concepts spread across different chunks.

Status: ✅ Available Now

Overview ​

Entity Enhanced Search improves on basic vector search by traversing the entity graph:

  1. Vector search for top-K most similar chunks
  2. Traverse HAS_ENTITY edges to find entities linked to those chunks
  3. Follow RELATES_TO edges (BFS) to discover related entities
  4. Assemble context with chunk text + entity descriptions + relationships

This retrieves both the original text AND the relational context between concepts, enabling the LLM to answer questions that require understanding connections across multiple chunks.

Based on the Graph-Enhanced Vector Search pattern from the Neo4j GraphRAG Pattern Catalog.

Installation ​

```bash
pnpm add @graphrag-sdk/lexical
```

Quick Start ​

```typescript
import { createGraph, query } from 'graphrag-sdk';
import { lexicalGraph, chunkEntityGeneration, entityEnhancedSearch } from '@graphrag-sdk/lexical';
import { inMemoryGraph, inMemoryVector, inMemoryKV } from '@graphrag-sdk/in-memory-storage';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [chunkEntityGeneration()],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});

await graph.insert([
  'TechCorp implements responsible AI through bias audits and model cards.',
  'Sarah Chen leads the AI ethics committee at TechCorp.',
  'The ethics committee publishes quarterly transparency reports.',
]);

const { text } = await query({
  graph,
  search: entityEnhancedSearch({ topK: 5, maxDepth: 2 }),
  prompt: 'How does TechCorp approach responsible AI?',
});

console.log(text);
```

Prerequisites ​

The chunkEntityGeneration() processor must be in the pipeline. It creates:

  • Chunk nodes in the graph store (nodeType: 'chunk', with content, fullDocId)
  • Entity nodes in the graph store (with entity_type, description, source_id)
  • HAS_ENTITY edges from entity nodes to chunk nodes
  • RELATES_TO edges between entity nodes (with weight, description)
  • Chunk embeddings in the chunks vector index
  • Entity embeddings in the entities vector index

Configuration ​

entityEnhancedSearch(options) ​

```typescript
interface EntityEnhancedSearchOptions {
  topK?: number;            // default: 10
  maxDepth?: number;        // default: 2
  maxTokens?: number;       // default: 12000
  maxChunkTokens?: number;  // default: 6000
  maxEntityTokens?: number; // default: 6000
  model?: LanguageModel;
  embedding?: EmbeddingModel;
}
```

Parameters:

| Option | Default | Description |
| --- | --- | --- |
| topK | 10 | Number of top chunks from vector search |
| maxDepth | 2 | Depth of RELATES_TO traversal from seed entities |
| maxTokens | 12000 | Total context budget (tokens) |
| maxChunkTokens | 6000 | Budget for the chunk text section |
| maxEntityTokens | 6000 | Budget for entity descriptions + relationships |
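As a rough illustration of how these defaults combine, the sketch below fills in missing options and flags a configuration whose two section budgets add up to more than `maxTokens`. The names `BudgetOptions`, `resolveOptions`, and `budgetWarnings` are hypothetical helpers, not SDK exports, and how the SDK actually reconciles `maxTokens` with the section budgets is an assumption here.

```typescript
// Hypothetical mirror of the documented numeric options (model/embedding omitted).
interface BudgetOptions {
  topK?: number;
  maxDepth?: number;
  maxTokens?: number;
  maxChunkTokens?: number;
  maxEntityTokens?: number;
}

// Default values copied from the parameter table.
const DEFAULTS: Required<BudgetOptions> = {
  topK: 10,
  maxDepth: 2,
  maxTokens: 12000,
  maxChunkTokens: 6000,
  maxEntityTokens: 6000,
};

// Fill every missing option with its documented default.
function resolveOptions(options: BudgetOptions = {}): Required<BudgetOptions> {
  return {
    topK: options.topK ?? DEFAULTS.topK,
    maxDepth: options.maxDepth ?? DEFAULTS.maxDepth,
    maxTokens: options.maxTokens ?? DEFAULTS.maxTokens,
    maxChunkTokens: options.maxChunkTokens ?? DEFAULTS.maxChunkTokens,
    maxEntityTokens: options.maxEntityTokens ?? DEFAULTS.maxEntityTokens,
  };
}

// Assumption: the two section budgets are expected to fit inside maxTokens.
function budgetWarnings(o: Required<BudgetOptions>): string[] {
  const warnings: string[] = [];
  const sections = o.maxChunkTokens + o.maxEntityTokens;
  if (sections > o.maxTokens) {
    warnings.push(`section budgets (${sections}) exceed maxTokens (${o.maxTokens})`);
  }
  return warnings;
}

console.log(resolveOptions({ topK: 5 }));
console.log(budgetWarnings(resolveOptions({ maxChunkTokens: 8000 })));
```

With the documented defaults the two section budgets sum to exactly 12000, so no warning is produced.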

topK ​

Controls how many seed chunks are retrieved from vector search:

```typescript
// Focused retrieval
entityEnhancedSearch({ topK: 5 })

// Broad retrieval
entityEnhancedSearch({ topK: 20 })
```

maxDepth ​

Controls how far to traverse RELATES_TO edges from seed entities:

  • 0: Only entities directly connected to matched chunks
  • 1: Entities + their immediate neighbors
  • 2: Two hops of relationship traversal (recommended)
  • 3+: Deeper traversal (may include less relevant entities)

```typescript
// Shallow: only direct entities
entityEnhancedSearch({ maxDepth: 0 })

// Deep: discover distant connections
entityEnhancedSearch({ maxDepth: 3 })
```

Token Budgets ​

The context is split into two sections, each with its own budget:

```typescript
// More text, fewer entities
entityEnhancedSearch({ maxChunkTokens: 8000, maxEntityTokens: 4000 })

// More entities, less text
entityEnhancedSearch({ maxChunkTokens: 4000, maxEntityTokens: 8000 })
```
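To make the effect of a section budget concrete, here is a minimal sketch of one way to trim a ranked list to a token budget. It assumes a crude estimate of four characters per token and a "stop at the first item that does not fit" policy. Both are illustrative assumptions, and `estimateTokens` and `fitToBudget` are hypothetical names rather than SDK functions.

```typescript
// Rough token estimate (assumption: about 4 characters per token).
// A real implementation would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep items in ranked order until the next one would exceed the budget.
function fitToBudget(items: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = estimateTokens(item);
    if (used + cost > maxTokens) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}

// Three chunks of 40 characters each, so about 10 tokens apiece.
const rankedChunks = ['a'.repeat(40), 'b'.repeat(40), 'c'.repeat(40)];
console.log(fitToBudget(rankedChunks, 25).length); // 2
```

Under this policy, raising `maxChunkTokens` admits more chunk text, while raising `maxEntityTokens` admits more entity and relationship rows.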

How It Works ​

1. Query Processing ​

```text
Input: User question
  ↓
Embed question
  ↓
Vector search → Top K chunks (from `chunks` index)
  ↓
For each chunk, traverse HAS_ENTITY edges → seed entities
  ↓
BFS on RELATES_TO edges (up to maxDepth) → neighbor entities
  ↓
Load chunk text from KV store
  ↓
Load entity descriptions + relationship data
  ↓
Assemble structured context (Sources + Entities + Relationships)
  ↓
Send context + question to LLM
  ↓
Result: Answer
```
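The first steps of this flow (top-K chunk selection, then seed entities via HAS_ENTITY) can be sketched with plain in-memory data. This is a toy illustration under stated assumptions: cosine similarity as the ranking metric, two-dimensional vectors, and the hypothetical helpers `cosine`, `topKChunks`, and `seedEntities`. None of these are SDK internals.

```typescript
type Vec = number[];

// One HAS_ENTITY edge, stored as entity -> chunk (hypothetical field names).
interface HasEntityEdge {
  entityId: string;
  chunkId: string;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank every chunk against the query vector and keep the k best ids.
function topKChunks(query: Vec, chunkVectors: Map<string, Vec>, k: number): string[] {
  return Array.from(chunkVectors.entries())
    .map(([id, vec]) => ({ id, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.id);
}

// Collect the distinct entities linked to the matched chunks.
function seedEntities(chunkIds: string[], edges: HasEntityEdge[]): string[] {
  const wanted = new Set(chunkIds);
  const seeds: string[] = [];
  for (const edge of edges) {
    if (wanted.has(edge.chunkId) && seeds.indexOf(edge.entityId) === -1) {
      seeds.push(edge.entityId);
    }
  }
  return seeds;
}

const chunkVectors = new Map<string, Vec>([
  ['c0', [1, 0]],
  ['c1', [0.9, 0.1]],
  ['c2', [0, 1]],
]);
const hasEntity: HasEntityEdge[] = [
  { entityId: 'TECHCORP', chunkId: 'c0' },
  { entityId: 'SARAH CHEN', chunkId: 'c1' },
  { entityId: 'TECHCORP', chunkId: 'c1' },
  { entityId: 'AI ETHICS COMMITTEE', chunkId: 'c2' },
];

const matched = topKChunks([1, 0], chunkVectors, 2);
console.log(matched);                          // [ 'c0', 'c1' ]
console.log(seedEntities(matched, hasEntity)); // [ 'TECHCORP', 'SARAH CHEN' ]
```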

2. Entity Traversal ​

Starting from matched chunks, the algorithm discovers entities and their connections:

```text
Matched Chunks (vector search)
  ↓ HAS_ENTITY
Seed Entities (directly linked to chunks)
  ↓ RELATES_TO (depth 1)
Neighbor Entities
  ↓ RELATES_TO (depth 2)
Second-hop Entities
```
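A depth-limited breadth-first search captures the expansion described above. The sketch below is illustrative only: it assumes RELATES_TO edges are followed in both directions, and the `RelatesTo` shape, the `expandEntities` helper, and the `BIAS AUDITS` entity are hypothetical.

```typescript
// Minimal RELATES_TO edge (hypothetical shape, properties omitted).
interface RelatesTo {
  source: string;
  target: string;
}

// Returns each reachable entity with the depth at which it was first seen.
// Seeds are depth 0; traversal stops after maxDepth hops.
function expandEntities(seeds: string[], edges: RelatesTo[], maxDepth: number): Map<string, number> {
  // Build an adjacency list, treating edges as undirected (assumption).
  const neighbors = new Map<string, string[]>();
  const link = (from: string, to: string): void => {
    const list = neighbors.get(from) ?? [];
    list.push(to);
    neighbors.set(from, list);
  };
  for (const edge of edges) {
    link(edge.source, edge.target);
    link(edge.target, edge.source);
  }

  const depthOf = new Map<string, number>();
  let frontier: string[] = [];
  for (const seed of seeds) {
    if (!depthOf.has(seed)) {
      depthOf.set(seed, 0);
      frontier.push(seed);
    }
  }

  // Expand one hop per iteration; visited entities are never re-queued.
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of neighbors.get(id) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}

const relatesTo: RelatesTo[] = [
  { source: 'SARAH CHEN', target: 'AI ETHICS COMMITTEE' },
  { source: 'AI ETHICS COMMITTEE', target: 'TECHCORP' },
  { source: 'TECHCORP', target: 'BIAS AUDITS' },
];

console.log(expandEntities(['SARAH CHEN'], relatesTo, 0).size); // 1
console.log(expandEntities(['SARAH CHEN'], relatesTo, 2).size); // 3
```

With `maxDepth: 3` the same seed also reaches `BIAS AUDITS`, which shows how deeper traversal pulls in entities further from the matched chunks.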

Context Format ​

The assembled context contains three CSV-formatted sections:

-----Sources-----

```csv
id,content
0,"TechCorp implements responsible AI through bias audits..."
1,"Sarah Chen leads the AI ethics committee at TechCorp..."
```

-----Entities-----

```csv
id,entity,type,description,rank
0,TECHCORP,ORGANIZATION,"Technology company implementing responsible AI",3
1,SARAH CHEN,PERSON,"Leader of the AI ethics committee at TechCorp",2
2,AI ETHICS COMMITTEE,ORGANIZATION,"Committee publishing quarterly transparency reports",2
```

-----Relationships-----

```csv
id,source,target,description,weight,rank
0,SARAH CHEN,AI ETHICS COMMITTEE,"Leads the committee",1.0,2
1,AI ETHICS COMMITTEE,TECHCORP,"Part of TechCorp",1.0,3
```
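For readers who want to reproduce a section in this shape, the following sketch builds one from a header and rows. It is an approximation under assumptions: fields are quoted only when they contain a comma, quote, or newline (the sample above quotes every description), inner code fences are omitted, and `csvField` and `csvSection` are hypothetical helpers.

```typescript
// Quote a field only when it contains a comma, double quote, or newline,
// doubling any embedded quotes (standard CSV escaping).
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build one titled section: a dashed title line, a header row, then data rows.
function csvSection(title: string, header: string[], rows: (string | number)[][]): string {
  const lines = [header.join(',')].concat(rows.map((row) => row.map(csvField).join(',')));
  return `-----${title}-----\n${lines.join('\n')}`;
}

const entitiesSection = csvSection(
  'Entities',
  ['id', 'entity', 'type', 'description', 'rank'],
  [
    [0, 'TECHCORP', 'ORGANIZATION', 'Technology company implementing responsible AI', 3],
    [1, 'SARAH CHEN', 'PERSON', 'Leader of the AI ethics committee at TechCorp', 2],
  ],
);

console.log(entitiesSection);
```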

## Graph Structure

### Nodes

```text
Chunk ◀──HAS_ENTITY── Entity ──RELATES_TO──▶ Entity
```


**Chunk node:**
- `nodeType: 'chunk'`
- `content: string`
- `fullDocId: string`
- `chunkOrderIndex: number`

**Entity node:**
- `entity_type: string`
- `description: string`
- `source_id: string` (chunk IDs, delimited)

### Edges

**`HAS_ENTITY`** (entity → chunk):
- `edgeType: 'HAS_ENTITY'`
- `source_id: string`

**`RELATES_TO`** (entity → entity):
- `edgeType: 'RELATES_TO'`
- `weight: number`
- `description: string`
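The property lists above can be written down as TypeScript types, which is a convenient way to check data against them. This is a sketch, not the SDK's own type definitions: the interface names, the `describeEdge` helper, and the sample ID `chunk-0` are hypothetical, and only the documented properties are modelled.

```typescript
// Node shapes, using the property names listed above.
interface ChunkNode {
  nodeType: 'chunk';
  content: string;
  fullDocId: string;
  chunkOrderIndex: number;
}

interface EntityNode {
  entity_type: string;
  description: string;
  source_id: string; // chunk IDs, delimited (delimiter not specified here)
}

// Edge shapes, discriminated by edgeType.
interface HasEntityEdge {
  edgeType: 'HAS_ENTITY';
  source_id: string;
}

interface RelatesToEdge {
  edgeType: 'RELATES_TO';
  weight: number;
  description: string;
}

type GraphEdge = HasEntityEdge | RelatesToEdge;

// Narrow on edgeType to read the properties specific to each edge kind.
function describeEdge(edge: GraphEdge): string {
  switch (edge.edgeType) {
    case 'HAS_ENTITY':
      return `HAS_ENTITY (source ${edge.source_id})`;
    case 'RELATES_TO':
      return `RELATES_TO (weight ${edge.weight}): ${edge.description}`;
  }
}

const sampleChunk: ChunkNode = {
  nodeType: 'chunk',
  content: 'Sarah Chen leads the AI ethics committee at TechCorp.',
  fullDocId: 'doc-0', // hypothetical ID
  chunkOrderIndex: 0,
};
const sampleEntity: EntityNode = {
  entity_type: 'PERSON',
  description: 'Leader of the AI ethics committee at TechCorp',
  source_id: 'chunk-0', // hypothetical ID
};

console.log(sampleChunk.nodeType, sampleEntity.entity_type);
console.log(describeEdge({ edgeType: 'RELATES_TO', weight: 1, description: 'Leads the committee' }));
```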

## Usage Examples

### Basic Usage

```typescript
const { text } = await query({
  graph,
  search: entityEnhancedSearch(),
  prompt: 'What is the relationship between TechCorp and Sarah Chen?',
});
```

Tuning Depth and Token Budgets ​

```typescript
// Deep traversal with large entity budget
const { text } = await query({
  graph,
  search: entityEnhancedSearch({
    topK: 10,
    maxDepth: 3,
    maxChunkTokens: 4000,
    maxEntityTokens: 8000,
  }),
  prompt: 'Map out all the organizational relationships.',
});
```

With Custom Storage ​

```typescript
import { createGraph } from 'graphrag-sdk';
import { lexicalGraph, chunkEntityGeneration } from '@graphrag-sdk/lexical';
import { openai } from '@ai-sdk/openai';
import { neo4jGraph } from '@graphrag-sdk/neo4j';
import { qdrantVector } from '@graphrag-sdk/qdrant';
import { redisKV } from '@graphrag-sdk/redis';

// Same pipeline as the Quick Start, backed by external stores
const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: () => neo4jGraph({ url: 'bolt://localhost:7687' }),
    vectorStoreFactory: () => qdrantVector({ url: 'http://localhost:6333' }),
    kvStoreFactory: () => redisKV({ host: 'localhost', port: 6379 }),
    pipeline: [chunkEntityGeneration()],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'production',
});
```

Return Metadata ​

The RetrievalResult.metadata includes:

```typescript
metadata: {
  chunkCount: number;         // chunks included in context
  entityCount: number;        // entities found via HAS_ENTITY
  relationshipCount: number;  // RELATES_TO edges traversed
  neighborCount: number;      // entities discovered via RELATES_TO
}
```

Error Handling ​

  • If no chunks match the query, returns FAIL_RESPONSE with zeroed metadata
  • If no HAS_ENTITY edges exist (entities not extracted), degrades gracefully to chunk-only context (like naiveSearch)
  • If no RELATES_TO edges are found, returns chunks + entity descriptions without relationships
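The metadata counts are enough to tell which of these cases occurred. The sketch below classifies a result from its metadata. `RetrievalMetadata` restates the documented fields, while `ContextMode` and `contextMode` are hypothetical names introduced for illustration, and how you obtain the metadata object from a query is not shown here.

```typescript
// Mirrors the documented metadata fields.
interface RetrievalMetadata {
  chunkCount: number;
  entityCount: number;
  relationshipCount: number;
  neighborCount: number;
}

type ContextMode = 'no-match' | 'chunks-only' | 'chunks-and-entities' | 'full';

// Map the counts onto the degradation cases listed above.
function contextMode(metadata: RetrievalMetadata): ContextMode {
  if (metadata.chunkCount === 0) return 'no-match';               // FAIL_RESPONSE case
  if (metadata.entityCount === 0) return 'chunks-only';           // no HAS_ENTITY edges
  if (metadata.relationshipCount === 0) return 'chunks-and-entities'; // no RELATES_TO edges
  return 'full';
}

console.log(contextMode({ chunkCount: 0, entityCount: 0, relationshipCount: 0, neighborCount: 0 })); // no-match
console.log(contextMode({ chunkCount: 3, entityCount: 0, relationshipCount: 0, neighborCount: 0 })); // chunks-only
console.log(contextMode({ chunkCount: 3, entityCount: 4, relationshipCount: 5, neighborCount: 2 })); // full
```

A result that keeps coming back as `chunks-only` is a hint that `chunkEntityGeneration()` may be missing from the pipeline.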

When to Use

✅ Good For

  • Relationship queries: "How are X and Y related?"
  • Multi-entity questions: Questions involving multiple concepts across chunks
  • Enriching context: When chunk text alone misses important connections
  • Entity-centric domains: Documents with many named entities and relationships

❌ Not Ideal For

  • Simple fact lookups: Use naiveSearch() instead
  • Global/thematic questions: Use globalCommunitySearch() instead
  • Question-answer mismatch: Use hypotheticalQuestionSearch() when Q↔A similarity is low

Comparison with Other Search Strategies ​

| Strategy | Starts from | Graph traversal | Best for |
| --- | --- | --- | --- |
| naiveSearch() | Chunk embeddings | None | Simple fact lookups |
| entityEnhancedSearch() | Chunk embeddings | HAS_ENTITY → RELATES_TO (depth 0-2) | Relational context across chunks |
| hypotheticalQuestionSearch() | Question embeddings | None | Q↔A retrieval with low similarity |
| localCommunitySearch() | Entity embeddings | Edges + community reports | Community-level summaries |
| globalCommunitySearch() | Community reports | None (map-reduce) | High-level corpus questions |

Released under the Elastic License 2.0. Made with ❤️ by Narek.