Generator

Generate synthetic test data for LLM evaluation

Overview

The @open-evals/generator package helps you create realistic, diverse test datasets automatically. Instead of manually writing hundreds of test cases, you provide your domain knowledge and let the generator create comprehensive test suites.

The package builds a knowledge graph from your documents, generates diverse user personas, and synthesizes test samples that cover different query types and complexity levels.

Installation

npm install @open-evals/generator @open-evals/rag

Why Generate Test Data?

The Data Problem

When building and evaluating LLM applications, you face a critical challenge: you don't have test data yet.

  • No User Queries - Your application is new or you haven't collected real user interactions
  • Manual Creation is Expensive - Writing test cases by hand is time-consuming, costly, and difficult to scale
  • Coverage is Limited - Manually created datasets often miss edge cases and lack diversity
  • Evaluation is Impossible - Without test data, you can't measure performance or detect regressions

Synthetic Data Generation (SDG) solves this problem by automatically creating realistic, diverse test datasets from your domain knowledge.

How SDG Helps

The generator addresses these challenges by:

  • Ensuring Coverage - Automatically generates tests that cover all parts of your knowledge base, including edge cases you might miss
  • Creating Diversity - Generates queries from different personas with varying expertise levels and communication styles
  • Scaling Easily - Produces hundreds of realistic test samples in minutes instead of days of manual work
  • Maintaining Quality - Uses LLMs to generate both questions and reference answers, ensuring consistency and accuracy
  • Saving Time & Money - Eliminates expensive manual dataset creation and accelerates your development cycle

How It Works

The generator follows a four-step process:

1. Build a Knowledge Graph

Transform your documents into a structured graph with relationships between concepts:

import {
  graph,
  transform,
  DocumentNode,
  chunk,
  embed,
  relationship,
} from '@open-evals/generator'
import { RecursiveCharacterSplitter } from '@open-evals/rag'
import { openai } from '@ai-sdk/openai'

// content holds the raw text of the source document, loaded beforehand
const documents = [new DocumentNode('typescript-guide.md', content, {})]

const knowledgeGraph = await transform(graph(documents))
  .pipe(chunk(new RecursiveCharacterSplitter()))
  .pipe(embed(openai.embedding('text-embedding-3-small')))
  .pipe(relationship())
  .apply()

2. Generate Personas

Create diverse user profiles that represent different types of users:

import { generatePersonas } from '@open-evals/generator'

const personas = await generatePersonas(knowledgeGraph, openai.chat('gpt-4o'), {
  count: 5,
})
// Generates personas like:
// - "Junior Developer learning TypeScript"
// - "Senior Architect evaluating type systems"
// - "Technical Writer documenting features"

3. Configure Synthesizers

Choose what types of questions to generate:

import { createSynthesizer } from '@open-evals/generator'

const synthesizers = [
  [createSynthesizer(openai.chat('gpt-4o'), 'single-hop-specific'), 50], // 50 simple questions
  [createSynthesizer(openai.chat('gpt-4o'), 'multi-hop-abstract'), 25], // 25 complex questions
  [createSynthesizer(openai.chat('gpt-4o'), 'multi-hop-specific'), 25], // 25 detailed complex questions
]

4. Synthesize Test Samples

Generate your complete test dataset:

import { synthesize } from '@open-evals/generator'

const testSamples = await synthesize({
  graph: knowledgeGraph,
  synthesizers,
  personas,
  count: 10,
  config: { generateGroundTruth: true },
})
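
Once generated, you will usually want to persist the dataset so the same samples can be reused across evaluation runs. A minimal sketch, assuming testSamples is a plain array of serializable objects as returned above, writing one JSON object per line:

import { writeFile } from 'node:fs/promises'

// One sample per line (JSONL) streams well into most eval runners
const jsonl = testSamples.map((sample) => JSON.stringify(sample)).join('\n')
await writeFile('test-dataset.jsonl', jsonl, 'utf-8')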

Each sample includes:

  • query - The generated question
  • reference - Ground truth answer
  • retrievedContexts - Relevant document chunks
  • metadata - Persona, difficulty, query type
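
Put together, each sample is an object of roughly this shape (a sketch inferred from the field list above; the exact type exported by the package may differ):

interface TestSample {
  query: string // the generated question
  reference: string // ground-truth answer
  retrievedContexts: string[] // relevant document chunks
  metadata: {
    persona: string
    difficulty: string
    queryType: string
  }
}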

Key Features

Knowledge Graph Construction

Build structured representations of your domain knowledge with automatic chunking, embedding, and relationship detection:

  • Document nodes - Your source documents
  • Chunk nodes - Semantically meaningful pieces
  • Relationships - Connections between concepts
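
Conceptually, the result is a set of typed nodes connected by scored edges. A rough sketch of that structure (illustrative only; the package's actual types may differ):

interface GraphNode {
  id: string
  type: 'document' | 'chunk'
  content: string
  embedding?: number[] // added by the embed transform
  metadata: Record<string, unknown>
}

interface GraphRelationship {
  source: string // id of the source node
  target: string // id of the target node
  score: number // e.g. embedding similarity between chunks
}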

Transform Pipeline

Process documents through a series of transforms using a fluent API:

await transform(graph)
  .pipe(summarize(llm)) // Generate summaries
  .pipe(chunk(splitter)) // Split into chunks
  .pipe(embed(embedModel)) // Create embeddings
  .pipe(relationship()) // Detect relationships
  .apply()
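
Order matters in this chain: chunking has to happen before embedding, since embeddings are computed per chunk, and relationship detection typically runs last because it compares those embeddings to connect related chunks.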

Diverse Query Types

Generate different complexity levels:

  • Single-Hop - Simple questions answerable from one context chunk
  • Multi-Hop - Complex questions requiring multiple pieces of information
  • Abstract - High-level conceptual questions
  • Specific - Detailed technical questions
  • Custom - Question types you define yourself
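
To make the distinction concrete, here are hand-written illustrations of the first three types for a TypeScript knowledge base (examples for this guide, not generator output):

// Illustrative only — actual generated queries will vary
const exampleQueries = {
  'single-hop-specific': 'What does the readonly modifier do on a class property?',
  'multi-hop-abstract': 'How do generics and type inference work together to keep APIs flexible but safe?',
  'multi-hop-specific': 'How can mapped types and conditional types be combined to make every property of an interface optional except one?',
}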

Persona-Based Generation

Each test sample is generated from a specific persona's perspective, ensuring diverse question styles, expertise levels, and use cases.
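
A persona can be pictured as a small profile, along these lines (a sketch for illustration; the type returned by generatePersonas may differ):

const persona = {
  name: 'Junior Developer learning TypeScript',
  description:
    'Six months of JavaScript experience, new to static typing; asks beginner-level how-do-I questions in informal language.',
}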

Architecture

The generator is built on three core abstractions:

  • Knowledge Graph - Stores and queries your domain knowledge
  • Transforms - Composable functions that enrich the graph
  • Synthesizers - Generate test samples from scenarios

This modular design lets you customize each step while maintaining a simple, declarative API.
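
For example, if a transform is an async function that receives the graph, enriches it, and returns it (which is what the pipe chain above suggests, though you should confirm against the package's transform signature), a custom step can slot straight into the pipeline:

// Sketch: assumes a transform is (graph) => Promise<graph>, matching .pipe(),
// and that the graph exposes its nodes as an iterable array
const annotateLength = () => async (graph) => {
  for (const node of graph.nodes) {
    // Hypothetical enrichment: record each node's word count in its metadata
    node.metadata = { ...node.metadata, wordCount: node.content.split(/\s+/).length }
  }
  return graph
}

await transform(knowledgeGraph).pipe(annotateLength()).apply()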
