Open Evals

An open-source framework for evaluating and testing LLM applications with built-in metrics and synthetic data generation.

Introduction

Open Evals is a comprehensive evaluation framework designed for testing and validating Large Language Model (LLM) applications. Built with TypeScript and offering full support for the Vercel AI SDK, it provides everything you need to ensure your AI applications work reliably in production.

Key Features

Flexible Evaluation Framework

Evaluate your LLM applications with a powerful, extensible metric system that supports both LLM-based and embedding-based evaluation approaches.
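As a rough sketch of what this looks like, a metric can be modelled as anything that scores a sample; the `Sample` and `Metric` shapes below, and the embedding-based similarity metric built on the AI SDK, are illustrative assumptions rather than the package's actual API:

```ts
import { openai } from "@ai-sdk/openai";
import { embed, cosineSimilarity } from "ai";

// Hypothetical shapes: a sample under test and a metric that scores it from 0 to 1.
interface Sample {
  input: string;
  response: string;
  reference?: string;
  context?: string[];
}

interface Metric {
  name: string;
  score(sample: Sample): Promise<number>;
}

// Embedding-based example: semantic similarity between the response and a reference answer.
const semanticSimilarity: Metric = {
  name: "semantic-similarity",
  async score({ response, reference }) {
    if (!reference) return 0;
    const model = openai.embedding("text-embedding-3-small");
    const [a, b] = await Promise.all([
      embed({ model, value: response }),
      embed({ model, value: reference }),
    ]);
    return cosineSimilarity(a.embedding, b.embedding);
  },
};
```

An LLM-based metric follows the same shape, with the scoring delegated to a judge model instead of an embedding comparison.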

Built-in Metrics

Get started quickly with production-ready metrics:

  • Faithfulness - Measure how well responses are grounded in provided context
  • Factual Correctness - Evaluate the factual accuracy of generated responses
  • Custom Metrics - Easily create your own domain-specific evaluation metrics (see the sketch after this list)

More metrics will be added in the future.
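For example, a custom LLM-judged metric might be assembled on top of the AI SDK roughly like this; the metric shape, prompt, and scoring scheme are assumptions made for the sketch, not the package's documented interface:

```ts
import { openai } from "@ai-sdk/openai";
import { generateObject } from "ai";
import { z } from "zod";

// Hypothetical custom metric: an LLM judge compares the response to a reference answer.
const referenceFactuality = {
  name: "reference-factuality",
  async score(sample: { input: string; response: string; reference: string }) {
    const { object } = await generateObject({
      model: openai("gpt-4o-mini"),
      schema: z.object({
        verdict: z.enum(["correct", "partially_correct", "incorrect"]),
        reasoning: z.string(),
      }),
      prompt:
        `Question: ${sample.input}\n` +
        `Reference answer: ${sample.reference}\n` +
        `Candidate answer: ${sample.response}\n` +
        `Judge the candidate against the reference.`,
    });
    // Map the verdict onto a 0–1 score.
    if (object.verdict === "correct") return 1;
    if (object.verdict === "partially_correct") return 0.5;
    return 0;
  },
};
```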

Synthetic Test Data Generation

Generate realistic test datasets using:

  • Knowledge Graphs - Build structured representations of your domain
  • Personas - Create diverse user profiles for comprehensive testing
  • Scenarios - Generate single-hop and multi-hop query scenarios
  • Synthesizers - Automatically generate test samples from your knowledge graph with personas and scenarios (see the sketch below)
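One way these pieces could fit together is sketched below; the `Persona` and `Scenario` shapes and the `synthesize` helper are hypothetical names used for illustration, not the package's actual API:

```ts
import { openai } from "@ai-sdk/openai";
import { generateObject } from "ai";
import { z } from "zod";

// Hypothetical building blocks; these names are illustrative only.
interface Persona {
  name: string;
  description: string;
}

interface Scenario {
  kind: "single-hop" | "multi-hop";
  persona: Persona;
  chunks: string[]; // document chunks (knowledge-graph nodes) the query should draw on
}

// A synthesizer turns a scenario into a test sample via an LLM.
async function synthesize(scenario: Scenario) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({ query: z.string(), referenceAnswer: z.string() }),
    prompt:
      `You are ${scenario.persona.name}: ${scenario.persona.description}\n` +
      `Write one ${scenario.kind} question that can only be answered from the context below, ` +
      `together with its reference answer.\n\nContext:\n${scenario.chunks.join("\n---\n")}`,
  });
  return { ...object, referenceContext: scenario.chunks };
}

// Usage: pair personas with chunk sets pulled from a knowledge graph of your documents.
async function generateDataset(chunkSets: string[][]) {
  const persona: Persona = { name: "new user", description: "Asks basic, practical questions." };
  return Promise.all(
    chunkSets.map((chunks) => synthesize({ kind: "single-hop", persona, chunks })),
  );
}
```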

Getting Started

Open Evals is organized into focused packages.

Why Open Evals?

  • Type-Safe - Built with TypeScript for a first-rate developer experience
  • AI SDK - First-class support for the Vercel AI SDK
  • Open Source - Free to use and extend for any project
