Multi-Hop Abstract
Generate conceptual questions requiring reasoning across contexts
Overview
Multi-hop abstract synthesizers generate complex, conceptual questions that require information from multiple context chunks to answer. The term "multi-hop" refers to the need to "hop" across multiple nodes in your knowledge graph (i.e., multiple document chunks) to find all the information needed for a complete answer.
These questions test both:
- Retrieval breadth - Can your system find all relevant chunks?
- Information synthesis - Can your system combine information from multiple sources into a coherent answer?
Unlike single-hop questions (answerable from one chunk), multi-hop abstract questions focus on understanding relationships, patterns, and high-level concepts that span multiple parts of your documentation.
Creating the Synthesizer
```ts
import { createSynthesizer, synthesize } from '@open-evals/generator'
import { openai } from '@ai-sdk/openai'

const synthesizer = createSynthesizer(openai.chat('gpt-4o'), 'multi-hop-abstract')

// Use in synthesis
const samples = await synthesize({
  graph: knowledgeGraph,
  synthesizers: [[synthesizer, 100]], // 100% of samples
  personas,
  count: 100,
})
```
What Makes a Question "Multi-Hop"?
A question is multi-hop when answering it requires retrieving and synthesizing information from 2 or more distinct context chunks from your knowledge graph. Each "hop" represents retrieving information from a different node in the graph.
Example scenario:
- Chunk A discusses TypeScript's type checking
- Chunk B discusses IDE autocomplete features
- Chunk C discusses refactoring tools
- Multi-hop question: "How does TypeScript's type system improve code quality?" (requires information from all 3 chunks; a minimal sketch of this scenario follows below)
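To make the scenario concrete, here is a minimal sketch of those chunks as graph nodes. The `ChunkNode` shape and the chunk text are illustrative assumptions, not the library's actual graph schema:

```ts
// Hypothetical chunk nodes -- the shape is illustrative, not the
// library's actual data model.
interface ChunkNode {
  id: string
  text: string
}

const chunks: ChunkNode[] = [
  { id: 'A', text: "TypeScript's compiler performs static type checking..." },
  { id: 'B', text: 'IDE autocomplete is driven by TypeScript type information...' },
  { id: 'C', text: 'Refactoring tools rely on types to find breaking changes...' },
]

// A single-hop question is answerable from one node:
//   "What does the TypeScript compiler check?" -> chunk A only
// The multi-hop question spans all three nodes:
//   "How does TypeScript's type system improve code quality?" -> A + B + C
```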
Generated Questions
Multi-hop abstract questions are:
- Conceptual - Focus on understanding relationships and patterns, not isolated facts
- Multi-Source - Require information from 2+ chunks to answer completely
- Abstract - Ask about "how" and "why" rather than specific details
- Synthesis-Heavy - Require combining and reasoning across information, not just retrieval
Example questions generated:
```ts
// TypeScript documentation
"How does TypeScript's type system improve code quality?"
"What are the tradeoffs between different typing approaches?"
"How do generics relate to type safety and reusability?"

// Architecture documentation
"How do microservices compare to monolithic architectures?"
"What factors influence database selection for a project?"
"How does caching improve system performance?"
```
When to Use
Multi-hop abstract synthesizers are ideal for:
- Reasoning Testing - Test if your system can synthesize information across sources
- Conceptual Understanding - Verify your system understands relationships, not just facts
- Advanced Capabilities - Challenge your system with questions requiring deeper analysis
- Realistic Scenarios - Many user questions are conceptual, not purely factual
Characteristics
Complexity
- Medium-High - Require understanding relationships across multiple sources
- Multiple Hops - Need 2-3+ distinct chunks from different parts of your knowledge graph
- Conceptual - Focus on "why" and "how" rather than "what"
- Synthesis Required - Can't be answered by simply concatenating facts; requires reasoning
Testing Focus
- Retrieval Breadth - Does your RAG system find all relevant chunks across different topics? (a simple breadth check is sketched below this list)
- Information Synthesis - Can your LLM combine information from multiple sources coherently?
- Relationship Understanding - Does your system grasp how concepts connect and relate?
- Context Ranking - Are the most relevant chunks from each topic retrieved?
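As one way to exercise these testing focuses, the sketch below scores retrieval breadth: the fraction of a multi-hop sample's source chunks that your RAG system actually retrieved. The `Sample` shape and the `overlaps` heuristic are assumptions for illustration; real setups often match on chunk IDs or embedding similarity instead:

```ts
// Hypothetical sample shape -- adjust to your framework's actual types.
interface Sample {
  query: string
  referenceContexts: string[] // the chunks the question was generated from
  retrievedContexts: string[] // what your RAG system actually returned
}

// Crude containment heuristic: does any retrieved chunk contain the
// start of the reference chunk? Swap in your own matching logic.
const overlaps = (reference: string, retrieved: string[]) =>
  retrieved.some((r) => r.includes(reference.slice(0, 80)))

// Retrieval breadth: a multi-hop question only scores 1.0 if every
// hop's chunk was found by the retriever.
function retrievalBreadth(sample: Sample): number {
  const found = sample.referenceContexts.filter((ref) =>
    overlaps(ref, sample.retrievedContexts),
  )
  return found.length / sample.referenceContexts.length
}
```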
Sample Output
```ts
{
  query: "How does TypeScript's type system improve code quality?",
  reference: "TypeScript's type system improves code quality in several ways: 1) It catches type-related errors at compile time rather than runtime, 2) It provides better IDE support through autocomplete and inline documentation, 3) It makes code more self-documenting through explicit types, and 4) It enables safer refactoring by catching breaking changes early.",
  retrievedContexts: [
    // Hop 1: Chunk about compile-time checking
    "TypeScript provides static type checking, which catches type errors during development...",
    // Hop 2: Chunk about compiler features
    "The TypeScript compiler analyzes your code and reports type mismatches...",
    // Hop 3: Chunk about IDE integration
    "IDE support for TypeScript includes intelligent autocomplete and refactoring tools..."
  ],
  metadata: {
    persona: "Senior Architect",
    queryType: "abstract",
    queryLength: "long",
    queryStyle: "technical"
  }
}
```
Notice how the complete answer requires information from 3 different chunks (3 hops), each covering a different aspect of how TypeScript improves code quality.
Best Practices
- Balance with simpler questions - Use multi-hop abstract as 20-30% of your test suite alongside single-hop questions (a sketch of this mix follows below)
- Use appropriate retrieval settings - Increase topK to 5-8 to ensure all relevant chunks are retrieved
- Test synthesis capabilities - These questions are excellent for evaluating how well your LLM combines information from multiple sources
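A minimal sketch of the balance guideline above. It assumes a single-hop synthesizer type named 'single-hop-specific' exists in your generator; that name and the exact 30/70 split are illustrative assumptions, not confirmed API:

```ts
import { createSynthesizer, synthesize } from '@open-evals/generator'
import { openai } from '@ai-sdk/openai'

const multiHop = createSynthesizer(openai.chat('gpt-4o'), 'multi-hop-abstract')
// 'single-hop-specific' is an assumed synthesizer type for illustration;
// check your generator's supported types.
const singleHop = createSynthesizer(openai.chat('gpt-4o'), 'single-hop-specific')

const samples = await synthesize({
  graph: knowledgeGraph,
  // 30% multi-hop abstract, 70% single-hop -- per the balance guideline above
  synthesizers: [
    [multiHop, 30],
    [singleHop, 70],
  ],
  personas,
  count: 100,
})
```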