AI Safety: Public Benchmarks for a Better Future
Mike Kuniavsky
How can a benchmark create a safer AI ecosystem? How can we measure the trust and safety of closed AI tools in an open environment? MLCommons, a community-driven nonprofit industry consortium, is building quantitative tools to guide responsible AI development. Our goal is to provide a neutral, consistent, open, and accurate measurement of hazards generated by large language models, enabling policymakers and developers to design safer AI products and services and supporting AI research.
In this talk, Mike will describe the motivation behind MLCommons' AI Safety Benchmark and its architecture.
MLCommons is an Artificial Intelligence engineering consortium built on a philosophy of open collaboration to improve AI systems. Through our collective engineering efforts with industry and academia, we continually measure and improve the accuracy, safety, speed, and efficiency of AI technologies, helping companies and universities around the world build better AI systems that will benefit society.