Hypothetical Question Search ​

The Hypothetical Question Search strategy matches user questions against pre-generated hypothetical question embeddings instead of chunk text embeddings. Because question-to-question similarity is typically higher than question-to-text similarity, this usually yields better retrieval for Q&A workloads.

Status: βœ… Available Now

Overview ​

Standard vector search embeds the user's question and compares it against chunk text embeddings. The problem: questions and their answers often live in different regions of the embedding space. Hypothetical Question Search solves this by:

  1. Pre-generating questions that each chunk can answer (during indexing)
  2. Embedding those questions into a dedicated questions vector index
  3. Searching question-to-question instead of question-to-text at query time
  4. Deduplicating matched questions back to their parent chunks

Based on the Hypothetical Question Retriever pattern from the Neo4j GraphRAG Pattern Catalog.

Installation ​

bash
pnpm add @graphrag-sdk/lexical

Quick Start ​

typescript
import { createGraph, query } from 'graphrag-sdk';
import {
  lexicalGraph,
  chunkEntityGeneration,
  hypotheticalQuestionGeneration,
  hypotheticalQuestionSearch,
} from '@graphrag-sdk/lexical';
import { inMemoryGraph, inMemoryVector, inMemoryKV } from '@graphrag-sdk/in-memory-storage';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [
      chunkEntityGeneration(),
      hypotheticalQuestionGeneration({ questionsPerChunk: 3 }),
    ],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});

await graph.insert([
  'TechCorp implements responsible AI through bias audits and model cards.',
  'Sarah Chen leads the AI ethics committee at TechCorp.',
  'The ethics committee publishes quarterly transparency reports.',
]);

// Search matches question embeddings, not chunk text
const { text } = await query({
  graph,
  search: hypotheticalQuestionSearch({ topK: 10 }),
  prompt: 'How does TechCorp approach responsible AI?',
});

console.log(text);

Prerequisites ​

The hypotheticalQuestionGeneration() processor must be in the pipeline. It creates:

  • Question embeddings in the questions vector index (each with { question, chunkId } metadata)
  • Question nodes in the graph store (nodeType: 'question', with chunkId)
  • HAS_QUESTION edges from question nodes to parent chunk nodes

Configuration ​

hypotheticalQuestionSearch(options) ​

typescript
interface HypotheticalQuestionSearchOptions {
  topK?: number;        // default: 10
  maxTokens?: number;   // default: 12000
  model?: LanguageModel;
  embedding?: EmbeddingModel;
}

Parameters:

| Option | Default | Description |
| --- | --- | --- |
| `topK` | `10` | Number of top matching questions from vector search |
| `maxTokens` | `12000` | Context window budget for assembled chunks |

topK ​

Controls how many pre-generated questions are retrieved:

typescript
// Focused β€” fewer questions, faster
hypotheticalQuestionSearch({ topK: 5 })

// Broad β€” more questions, better recall
hypotheticalQuestionSearch({ topK: 20 })

Multiple matched questions may point to the same parent chunk. The strategy deduplicates by chunkId, keeping the highest similarity score per chunk.
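
The deduplication step can be sketched as a pure function (the `QuestionMatch` shape here is illustrative, not the SDK's internal type):

```typescript
// Hypothetical shape of a vector-search hit on the questions index.
interface QuestionMatch {
  question: string;
  chunkId: string;
  score: number; // similarity score
}

// Collapse matched questions to unique parent chunks,
// keeping the highest score seen for each chunkId.
function dedupeByChunk(matches: QuestionMatch[]): Map<string, number> {
  const best = new Map<string, number>();
  for (const m of matches) {
    const prev = best.get(m.chunkId);
    if (prev === undefined || m.score > prev) best.set(m.chunkId, m.score);
  }
  return best;
}
```

With `topK: 10` and 3 questions per chunk, it is common for the 10 matches to collapse to only 4–5 unique chunks.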

maxTokens ​

Controls the total token budget for the assembled context:

typescript
// Smaller context window
hypotheticalQuestionSearch({ maxTokens: 8000 })

// Larger context window
hypotheticalQuestionSearch({ maxTokens: 16000 })
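
Context assembly can be sketched as a greedy, score-ordered fill against the token budget (illustrative only; the real strategy's tokenizer and assembly details may differ):

```typescript
// Hypothetical chunk shape after deduplication and KV lookup.
interface ScoredChunk {
  text: string;
  score: number;
}

// Greedy context assembly: take chunks in descending score order
// until the token budget is exhausted. Uses a rough 4-chars-per-token
// estimate; a real implementation would use an actual tokenizer.
function assembleContext(chunks: ScoredChunk[], maxTokens: number): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);
  const sorted = [...chunks].sort((a, b) => b.score - a.score);
  const parts: string[] = [];
  let used = 0;
  for (const c of sorted) {
    const cost = estimateTokens(c.text);
    if (used + cost > maxTokens) break;
    parts.push(c.text);
    used += cost;
  }
  return parts.join('\n\n');
}
```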

How It Works ​

1. Indexing (Pre-generation) ​

Input: Chunks from document processing
  ↓
For each chunk, LLM generates N hypothetical questions
  ↓
Embed each question
  ↓
Store in `questions` vector index with { question, chunkId } metadata
  ↓
Create Question nodes + HAS_QUESTION edges in graph
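
The pre-generation step above can be sketched as follows, with the LLM and embedder stubbed out (function names and shapes here are illustrative, not the SDK's internals):

```typescript
// Stub signatures standing in for a language model and an embedding model.
type Embed = (text: string) => Promise<number[]>;
type GenerateQuestions = (chunkText: string, n: number) => Promise<string[]>;

// Shape of an entry written to the `questions` vector index.
interface QuestionIndexEntry {
  embedding: number[];
  metadata: { question: string; chunkId: string };
}

// For one chunk: generate N hypothetical questions, embed each,
// and produce index entries that point back to the parent chunk.
async function indexChunk(
  chunkId: string,
  chunkText: string,
  questionsPerChunk: number,
  generate: GenerateQuestions,
  embed: Embed,
): Promise<QuestionIndexEntry[]> {
  const questions = await generate(chunkText, questionsPerChunk);
  return Promise.all(
    questions.map(async (question) => ({
      embedding: await embed(question),
      metadata: { question, chunkId },
    })),
  );
}
```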

2. Query Processing ​

Input: User question
  ↓
Embed question
  ↓
Vector search on `questions` index β†’ Top K matching questions
  ↓
Deduplicate by chunkId (keep highest score per chunk)
  ↓
Load parent chunk text from KV store
  ↓
Assemble context (ordered by score, truncated to maxTokens)
  ↓
Send context + question to LLM
  ↓
Result: Answer

Why It Works ​

Consider this chunk: "The Leiden algorithm detects communities by optimizing modularity through iterative node movement between partitions."

A user might ask: "How do you find clusters in a graph?"

The cosine similarity between the question and chunk text is low β€” different vocabulary, different framing. But if we pre-generate the hypothetical question "How does the Leiden algorithm detect communities?", the question-to-question similarity is much higher.
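
The intuition can be made concrete with a toy cosine-similarity check. The vectors below are made up for illustration, not real embeddings:

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy 3-dimensional "embeddings" (illustrative values only):
const userQuestion = [0.9, 0.1, 0.2];   // "How do you find clusters in a graph?"
const chunkText = [0.2, 0.8, 0.5];      // the Leiden chunk text
const hypoQuestion = [0.85, 0.2, 0.25]; // pre-generated hypothetical question

// Question-to-question similarity exceeds question-to-text similarity.
console.log(cosine(userQuestion, hypoQuestion) > cosine(userQuestion, chunkText)); // true
```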

Graph Structure ​

Question ──HAS_QUESTION──▢ Chunk ──PART_OF──▢ Document

Question node:

  • nodeType: 'question'
  • chunkId: string (reference to parent chunk)

Question vector index entry:

  • embedding: number[] (question embedding)
  • metadata: { question: string, chunkId: string }

Matches the Neo4j retrieval query pattern:

cypher
MATCH (node)<-[:HAS_QUESTION]-(chunk)
WITH chunk, max(score) AS score
RETURN chunk.text AS text, score

Usage Examples ​

Basic Usage ​

typescript
const { text } = await query({
  graph,
  search: hypotheticalQuestionSearch(),
  prompt: 'What compliance requirements does TechCorp follow?',
});

Generating More Questions Per Chunk ​

More questions per chunk increases recall but costs more during indexing:

typescript
const graph = createGraph({
  graph: lexicalGraph({
    graphStoreFactory: inMemoryGraph,
    vectorStoreFactory: inMemoryVector,
    kvStoreFactory: inMemoryKV,
    pipeline: [
      chunkEntityGeneration(),
      hypotheticalQuestionGeneration({ questionsPerChunk: 5 }), // More questions
    ],
  }),
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  namespace: 'docs',
});

You can use different search strategies for different queries:

typescript
// Use hypothetical question search for Q&A
const qaResult = await query({
  graph,
  search: hypotheticalQuestionSearch({ topK: 10 }),
  prompt: 'What is TechCorp?',
});

// Use entity enhanced search for relationship queries
const relResult = await query({
  graph,
  search: entityEnhancedSearch({ topK: 5, maxDepth: 2 }),
  prompt: 'How are TechCorp and Sarah Chen related?',
});

Return Metadata ​

The RetrievalResult.metadata includes:

typescript
metadata: {
  matchedQuestions: number;  // questions found by vector search
  uniqueChunks: number;     // deduplicated parent chunks
}

Error Handling ​

  • If the questions vector index doesn't exist (processor wasn't run), returns FAIL_RESPONSE with { matchedQuestions: 0, uniqueChunks: 0 }
  • If no matching questions are found, returns FAIL_RESPONSE context

When to Use ​

βœ… Good For ​

  • Q&A retrieval β€” When users ask questions and the answers are in chunk text
  • Low question-to-answer similarity β€” When questions use different vocabulary than the source text
  • FAQ-style knowledge bases β€” Documents that naturally answer specific questions
  • Technical documentation β€” Where users ask about concepts described with different terminology

❌ Not Ideal For ​

  • Relationship queries β€” Use entityEnhancedSearch() instead
  • Global/thematic questions β€” Use globalCommunitySearch() instead
  • Cost-sensitive indexing β€” Pre-generating questions adds LLM calls during indexing
  • Frequently updated corpora β€” Questions must be regenerated when chunks change

Comparison with Other Search Strategies ​

| Strategy | Searches against | Deduplication | Best for |
| --- | --- | --- | --- |
| `naiveSearch()` | Chunk text embeddings | None | Simple fact lookups |
| `hypotheticalQuestionSearch()` | Pre-generated question embeddings | By parent chunk (max score) | Q&A where Q↔A similarity is low |
| `entityEnhancedSearch()` | Chunk embeddings → entity graph | None | Enriching with entity relationships |
| `similarChunkTraversalSearch()` | Chunk embeddings + graph traversal | By visited node | Connected context across documents |
| `localCommunitySearch()` | Entity embeddings | None | Community-level summaries |
| `globalCommunitySearch()` | Community reports | None | High-level corpus questions |

Performance Characteristics ​

Indexing Cost ​

| Questions per chunk | LLM calls | Embedding calls |
| --- | --- | --- |
| 3 (default) | 1 per chunk | 3 per chunk |
| 5 | 1 per chunk | 5 per chunk |
| 10 | 1 per chunk | 10 per chunk |

Query Speed ​

| Operation | Complexity |
| --- | --- |
| Vector search on questions index | O(log N) |
| Deduplication | O(K) where K = topK |
| Chunk retrieval from KV | O(unique chunks) |

Source Code ​

View the implementation in the @graphrag-sdk/lexical package source.

Released under the Elastic License 2.0. Made with ❀️ by Narek.