# Hypothetical Question Search
The Hypothetical Question Search strategy matches user questions against pre-generated hypothetical question embeddings instead of chunk text embeddings. Question-to-question similarity is typically higher than question-to-text similarity, yielding better retrieval for question-shaped queries.
Status: ✅ Available Now
## Overview
Standard vector search embeds the user's question and compares it against chunk text embeddings. The problem: questions and their answers often live in different regions of the embedding space. Hypothetical Question Search solves this by:
- Pre-generating questions that each chunk can answer (during indexing)
- Embedding those questions into a dedicated `questions` vector index
- At query time, searching question-to-question instead of question-to-text
- Deduplicating matched questions back to their parent chunks
Based on the Hypothetical Question Retriever pattern from the Neo4j GraphRAG Pattern Catalog.
## Installation
```bash
pnpm add @graphrag-sdk/lexical
```

## Quick Start

```ts
import { createGraph, query } from 'graphrag-sdk';
import {
  lexicalGraph,
  chunkEntityGeneration,
  hypotheticalQuestionGeneration,
  hypotheticalQuestionSearch,
} from '@graphrag-sdk/lexical';
import { inMemoryGraph, inMemoryVector, inMemoryKV } from '@graphrag-sdk/in-memory-storage';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [
      chunkEntityGeneration(),
      hypotheticalQuestionGeneration({ questionsPerChunk: 3 }),
    ],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});

await graph.insert([
  'TechCorp implements responsible AI through bias audits and model cards.',
  'Sarah Chen leads the AI ethics committee at TechCorp.',
  'The ethics committee publishes quarterly transparency reports.',
]);

// Search matches question embeddings, not chunk text
const { text } = await query({
  graph,
  search: hypotheticalQuestionSearch({ topK: 10 }),
  prompt: 'How does TechCorp approach responsible AI?',
});

console.log(text);
```

## Prerequisites
The `hypotheticalQuestionGeneration()` processor must be in the pipeline. It creates:

- Question embeddings in the `questions` vector index (each with `{ question, chunkId }` metadata)
- Question nodes in the graph store (`nodeType: 'question'`, with `chunkId`)
- `HAS_QUESTION` edges from question nodes to parent chunk nodes
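The three artifacts above can be sketched as plain record shapes. This is illustrative only: `QuestionVectorEntry`, `QuestionNode`, `HasQuestionEdge`, and `recordsForChunk` are hypothetical names for this sketch, not SDK exports.

```typescript
// Illustrative shapes for what the question-generation processor writes.
// The concrete store APIs are not shown; only the record layout is.
interface QuestionVectorEntry {
  embedding: number[];
  metadata: { question: string; chunkId: string };
}

interface QuestionNode {
  nodeType: 'question';
  chunkId: string;
}

interface HasQuestionEdge {
  type: 'HAS_QUESTION';
  from: string; // question node id
  to: string;   // parent chunk node id
}

// One chunk with N generated questions yields N of each record.
function recordsForChunk(chunkId: string, questions: string[]) {
  return questions.map((question, i) => ({
    vector: {
      embedding: [] as number[], // filled in by the embedding model in practice
      metadata: { question, chunkId },
    } satisfies QuestionVectorEntry,
    node: { nodeType: 'question', chunkId } satisfies QuestionNode,
    edge: {
      type: 'HAS_QUESTION',
      from: `${chunkId}-q${i}`, // hypothetical question node id scheme
      to: chunkId,
    } satisfies HasQuestionEdge,
  }));
}
```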
## Configuration

### `hypotheticalQuestionSearch(options)`

```ts
interface HypotheticalQuestionSearchOptions {
  topK?: number;      // default: 10
  maxTokens?: number; // default: 12000
  model?: LanguageModel;
  embedding?: EmbeddingModel;
}
```

Parameters:
| Option | Default | Description |
|---|---|---|
| `topK` | 10 | Number of top matching questions from vector search |
| `maxTokens` | 12000 | Context window budget for assembled chunks |
### topK
Controls how many pre-generated questions are retrieved:
```ts
// Focused: fewer questions, faster
hypotheticalQuestionSearch({ topK: 5 })

// Broad: more questions, better recall
hypotheticalQuestionSearch({ topK: 20 })
```

Multiple matched questions may point to the same parent chunk. The strategy deduplicates by `chunkId`, keeping the highest similarity score per chunk.
### maxTokens
Controls the total token budget for the assembled context:
```ts
// Smaller context window
hypotheticalQuestionSearch({ maxTokens: 8000 })

// Larger context window
hypotheticalQuestionSearch({ maxTokens: 16000 })
```

## How It Works
### 1. Indexing (Pre-generation)

```
Input: Chunks from document processing
  ↓
For each chunk, LLM generates N hypothetical questions
  ↓
Embed each question
  ↓
Store in `questions` vector index with { question, chunkId } metadata
  ↓
Create Question nodes + HAS_QUESTION edges in graph
```

### 2. Query Processing
```
Input: User question
  ↓
Embed question
  ↓
Vector search on `questions` index → Top K matching questions
  ↓
Deduplicate by chunkId (keep highest score per chunk)
  ↓
Load parent chunk text from KV store
  ↓
Assemble context (ordered by score, truncated to maxTokens)
  ↓
Send context + question to LLM
  ↓
Result: Answer
```

## Why It Works
Consider this chunk: "The Leiden algorithm detects communities by optimizing modularity through iterative node movement between partitions."
A user might ask: "How do you find clusters in a graph?"
The cosine similarity between the question and the chunk text is low: different vocabulary, different framing. But if we pre-generate the hypothetical question "How does the Leiden algorithm detect communities?", the question-to-question similarity is much higher.
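The vocabulary-gap effect can be demonstrated with a toy bag-of-words embedding. This is a contrived stand-in for a real embedding model; the vocabulary list and the sample hypothetical question are chosen so the effect is visible even in this tiny setup:

```typescript
// Toy embedding: one dimension per vocabulary word. A real model captures
// semantics, but the vocabulary-overlap effect shown here is the same idea.
const VOCAB = ['find', 'clusters', 'graph', 'leiden', 'detect', 'communities', 'modularity'];
const embed = (text: string): number[] =>
  VOCAB.map((w) => (text.toLowerCase().includes(w) ? 1 : 0));

const cosine = (a: number[], b: number[]): number => {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm(a) && norm(b) ? dot / (norm(a) * norm(b)) : 0;
};

const userQ = 'How do you find clusters in a graph?';
const chunkText = 'The Leiden algorithm detects communities by optimizing modularity.';
const hypoQ = 'How can you find clusters or communities in a graph?';

const qToText = cosine(embed(userQ), embed(chunkText));
const qToQ = cosine(embed(userQ), embed(hypoQ));
// qToQ > qToText: the pre-generated question shares the user's vocabulary,
// while the chunk text shares none of it.
```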
## Graph Structure

```
Question ──HAS_QUESTION──▶ Chunk ──PART_OF──▶ Document
```

Question node:

- `nodeType: 'question'`
- `chunkId: string` (reference to parent chunk)

Question vector index entry:

- `embedding: number[]` (question embedding)
- `metadata: { question: string, chunkId: string }`
Matches the Neo4j retrieval query pattern:

```cypher
MATCH (node)<-[:HAS_QUESTION]-(chunk)
WITH chunk, max(score) AS score
RETURN chunk.text AS text, score
```

## Usage Examples
### Basic Usage
```ts
const { text } = await query({
  graph,
  search: hypotheticalQuestionSearch(),
  prompt: 'What compliance requirements does TechCorp follow?',
});
```

### Generating More Questions Per Chunk
More questions per chunk increases recall but costs more during indexing:
```ts
const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [
      chunkEntityGeneration(),
      hypotheticalQuestionGeneration({ questionsPerChunk: 5 }), // More questions
    ],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});
```

### Combining with Entity Enhanced Search
You can use different search strategies for different queries:
```ts
// Use hypothetical question search for Q&A
const qaResult = await query({
  graph,
  search: hypotheticalQuestionSearch({ topK: 10 }),
  prompt: 'What is TechCorp?',
});

// Use entity enhanced search for relationship queries
const relResult = await query({
  graph,
  search: entityEnhancedSearch({ topK: 5, maxDepth: 2 }),
  prompt: 'How are TechCorp and Sarah Chen related?',
});
```

## Return Metadata
The `RetrievalResult.metadata` includes:

```ts
metadata: {
  matchedQuestions: number; // questions found by vector search
  uniqueChunks: number;     // deduplicated parent chunks
}
```

## Error Handling
- If the `questions` vector index doesn't exist (the processor wasn't run), returns `FAIL_RESPONSE` with `{ matchedQuestions: 0, uniqueChunks: 0 }`
- If no matching questions are found, returns the `FAIL_RESPONSE` context
## When to Use

### ✅ Good For

- **Q&A retrieval**: When users ask questions and the answers are in chunk text
- **Low question-to-answer similarity**: When questions use different vocabulary than the source text
- **FAQ-style knowledge bases**: Documents that naturally answer specific questions
- **Technical documentation**: Where users ask about concepts described with different terminology

### ❌ Not Ideal For

- **Relationship queries**: Use `entityEnhancedSearch()` instead
- **Global/thematic questions**: Use `globalCommunitySearch()` instead
- **Cost-sensitive indexing**: Pre-generating questions adds LLM calls during indexing
- **Frequently updated corpora**: Questions must be regenerated when chunks change
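The last point carries a maintenance cost worth sketching: when a chunk's text changes, its pre-generated questions are stale and must be dropped and regenerated. The `QuestionEntry` shape and `refreshChunk` helper below are illustrative names, not SDK API:

```typescript
interface QuestionEntry {
  question: string;
  chunkId: string;
}

// On chunk update: remove the chunk's stale question entries, then append
// freshly regenerated ones (regeneration itself needs new LLM + embedding calls).
function refreshChunk(
  index: QuestionEntry[],
  chunkId: string,
  regenerated: string[],
): QuestionEntry[] {
  const kept = index.filter((e) => e.chunkId !== chunkId);
  return kept.concat(regenerated.map((question) => ({ question, chunkId })));
}
```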
## Comparison with Other Search Strategies

| Strategy | Searches against | Deduplication | Best for |
|---|---|---|---|
| `naiveSearch()` | Chunk text embeddings | None | Simple fact lookups |
| `hypotheticalQuestionSearch()` | Pre-generated question embeddings | By parent chunk (max score) | Q&A where Q→A similarity is low |
| `entityEnhancedSearch()` | Chunk embeddings → entity graph | None | Enriching with entity relationships |
| `similarChunkTraversalSearch()` | Chunk embeddings + graph traversal | By visited node | Connected context across documents |
| `localCommunitySearch()` | Entity embeddings | None | Community-level summaries |
| `globalCommunitySearch()` | Community reports | None | High-level corpus questions |
## Performance Characteristics

### Indexing Cost
| Questions Per Chunk | LLM Calls | Embedding Calls |
|---|---|---|
| 3 (default) | 1 per chunk | 3 per chunk |
| 5 | 1 per chunk | 5 per chunk |
| 10 | 1 per chunk | 10 per chunk |
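The table above implies a simple back-of-envelope call count for an indexing run. The `indexingCalls` helper is an illustration of that arithmetic, not a measured benchmark:

```typescript
// Per the table: one LLM call per chunk (generating all of its questions
// in a single call) and one embedding call per generated question.
function indexingCalls(chunkCount: number, questionsPerChunk: number) {
  return {
    llmCalls: chunkCount,
    embeddingCalls: chunkCount * questionsPerChunk,
  };
}

// e.g. 1,000 chunks at the default 3 questions per chunk:
const estimate = indexingCalls(1000, 3); // { llmCalls: 1000, embeddingCalls: 3000 }
```

Note that raising `questionsPerChunk` grows embedding calls linearly while LLM calls stay flat, though each LLM call produces more output tokens.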
### Query Speed
| Operation | Complexity |
|---|---|
| Vector search on questions index | O(log N) |
| Deduplication | O(K) where K = topK |
| Chunk retrieval from KV | O(unique chunks) |
## Next Steps

- **Algorithm Overview**: Compare all algorithms
- **Entity Enhanced Search**: Graph-enhanced retrieval
- **Storage Options**: Choose your backend
- **API Reference**: `GraphProvider` interface
## Source Code
View the implementation: