📣 Scale is excited to introduce the latest addition to the SEAL Leaderboards: Adversarial Robustness! Designed to uncover potential risks that may not be apparent in standard testing, this leaderboard evaluates top models against 1,000 adversarial prompts, covering critical areas like illegal activities, harm, and hate speech.

Here's what sets the leaderboard apart:

✅ It measures harm that is universally recognized as problematic, rather than issues that might be deemed harmful by some but not others.

✅ Its evaluation dataset was created by red teamers selected for their creativity, different approaches to model prompting, and unique opinions.

✅ We implemented a multi-tiered review system to ensure thorough assessment and accurate categorization of potentially harmful content.

✅ We openly publish our harm categories and encourage contributions from the community to refine and add detail to these definitions.

By releasing the Adversarial Robustness leaderboard, we remain committed to advancing AI safety standards industry-wide, empowering the AI community to build safer, more trustworthy models.

Explore our methodology and results: https://lnkd.in/g7hW476N