Tho Le

A Data Scientist. Looking for knowledge!

LLM Evaluation

17 Feb 2025 » ai, llms, evaluation
  • How to Measure the Reliability of a Large Language Model’s Response
  • (DeepLearning.AI Course) Evaluating AI Agents.
    • Use cases: evaluating a shopping assistant, a coding agent, or a research assistant. Each needs a structured evaluation process that assesses every component of the agent as well as its end-to-end performance.
    • This helps you identify areas for improvement, similar to error analysis in supervised learning.
    • Code-based evals: write code explicitly to test a certain step.
    • LLM-as-a-Judge evals: prompt an LLM to score more open-ended outputs that are hard to check with code.
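A code-based eval can be as simple as a deterministic check over the agent's output. As a minimal sketch (the order-ID format `ORD-1234` and both function names are hypothetical, for illustration only):

```python
import json
import re

def eval_response_is_json(response: str) -> bool:
    """Code-based eval: does the agent's output parse as valid JSON?"""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False

def eval_cites_order_id(response: str) -> bool:
    """Code-based eval for a shopping assistant: does the reply
    mention an order ID (hypothetical format: ORD- plus 4 digits)?"""
    return re.search(r"ORD-\d{4}", response) is not None
```

Checks like these are cheap to run on every test case, which is why they are preferred whenever the expected output is structured enough to verify in code.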