Free activities and events in New York City

Webinar “Finding the Right Datasets and Metrics for Evaluating LLM Performance”

Published: July 12, 2024; Author: Julia Sonrisa

July 17, 2024, 12:00 PM-12:30 PM EDT

Evaluation is one of the more difficult challenges in bringing an LLM-based application to production. Traditional evaluation approaches require a labeled dataset representative of the data you’ll see in production. Because labeled datasets are hard to come by, many teams settle for a handful of manually created examples or rely on off-the-shelf academic datasets that rarely align with their specific production use case.

In this 30-minute webinar, we’ll discuss the tradeoffs and risks of evaluating LLMs with labeled and with unlabeled data. For supervised (labeled) evaluation, we’ll compare traditional metrics computed on off-the-shelf academic datasets against small sets of manually created, tailored examples. We’ll also cover two complementary approaches that work on unlabeled data and are therefore suited to production traffic: performance estimation techniques and unsupervised evaluation using semantically meaningful metrics.
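
To make that distinction concrete, here is a minimal sketch (our illustration, not material from the webinar): an exact-match score against labeled references, and an unsupervised semantic-relevance score computed on unlabeled prompt/response pairs. It assumes the sentence-transformers package; the model name and metric choices are illustrative assumptions, not the specific techniques the speaker will present.

    from sentence_transformers import SentenceTransformer, util

    # --- Supervised (labeled) evaluation: compare model outputs to references ---
    references = ["Paris is the capital of France."]
    outputs = ["The capital of France is Paris."]
    # Exact match is brittle: this correct answer scores 0 because the wording differs.
    exact_match = sum(o == r for o, r in zip(outputs, references)) / len(outputs)
    print(f"Exact match on labeled examples: {exact_match:.2f}")

    # --- Unsupervised evaluation on unlabeled production data: score each response
    # against its prompt with a semantically meaningful metric (embedding similarity) ---
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    prompts = ["What is the capital of France?"]
    responses = ["The capital of France is Paris."]
    prompt_emb = model.encode(prompts, convert_to_tensor=True)
    response_emb = model.encode(responses, convert_to_tensor=True)
    relevance = util.cos_sim(prompt_emb, response_emb).diagonal()
    print(f"Prompt/response semantic relevance: {relevance[0].item():.2f}")

A semantic metric like the one above needs no labels, so it can run continuously on production traffic, while the labeled exact-match score only covers the curated examples.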

Who should attend:

Anyone interested in building applications with LLMs, AI observability, model monitoring, MLOps, or DataOps. This workshop is designed to be approachable for most skill levels. Familiarity with machine learning and Python will be useful, but it’s not required to attend.

About the Speaker:

Bernese Herman is a Sr. Data Scientist at WhyLabs, where she builds model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernese built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernese serves as faculty for the University of Washington’s Master’s program in Data Science and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

About WhyLabs:

WhyLabs, Inc. (www.whylabs.ai / @whylabs) enables teams to harness the power of AI with precision and control. From Fortune 100 companies to AI-first startups, teams have adopted WhyLabs’ tools to secure and monitor real-time predictive and generative AI applications. WhyLabs’ open-source tools and SaaS observability platform surface bad actors, bias, hallucinations, performance decay, data drift, and data quality issues. With WhyLabs, teams reduce manual operations by over 80% and cut down time-to-resolution of AI incidents by 20x.

Learn more about WhyLabs and our open-source projects:

  • LangKit: An open-source toolkit for monitoring LLMs
  • whylogs: the open-source standard for data logging
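
As a quick illustration of the tools listed above, here is a minimal sketch of logging LLM prompt/response metrics with LangKit and whylogs. It follows the projects’ documented usage pattern, but treat the exact calls as an assumption on our part rather than content from the webinar.

    import whylogs as why
    from langkit import llm_metrics  # registers LangKit's LLM text metrics

    # Build a whylogs schema that computes LangKit's LLM metrics during logging.
    schema = llm_metrics.init()

    record = {
        "prompt": "What is the capital of France?",
        "response": "The capital of France is Paris.",
    }

    # Profile the record; the resulting profile can be inspected locally or
    # uploaded to an observability platform for monitoring over time.
    results = why.log(record, schema=schema)
    profile_view = results.view()
    print(profile_view.to_pandas().head())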

Time: 12:00-12:30 PM EDT

Free!

Registration

