Policymakers are debating the risks that AI systems pose if intentionally misused, from covert propaganda to biological weapons. But there is a lot of uncertainty about how advanced AI systems will actually be used by bad actors. Girish Sastry and I published a new paper last week in the Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (#AIES) that lays out strategies for reducing this uncertainty. We structure it around the “PPOu Framework.” https://lnkd.in/euhYzENV

The PPOu Framework has three stages for reducing uncertainty about whether an AI system (call it "X") will be intentionally used for a specific harmful behavior (call it "Y"):

Stage 1. Plausibility: Can system X do Y, even just once? To see whether the system can be used for the behavior of concern, researchers red team/stress test it, and may scale those efforts using other models.

Stage 2. Performance: How well does X do Y? Researchers test system capabilities against static benchmarks, run experiments to measure system performance, and model the marginal utility the system offers bad actors.

Stage 3. Observed Use: Is X used for Y in the real world? Collecting evidence of misuse can involve trust and safety monitoring, investigations by journalists and open-source researchers, and building incident databases.

Haven't heard of these methods? Check out the paper! We give an overview of each method, along with its benefits, limitations, and areas for improvement.

We hope this will help the public and policymakers understand the research landscape on malicious use, help companies identify questions to ask as they develop new systems, and give academics entry points for their own work.

Happy to chat about malicious use research and ideas. I'm also open to thoughts on where the framework breaks down. We'll share a policymaker-friendly version in the coming weeks as well.

Center for Security and Emerging Technology (CSET), Georgetown University Walsh School of Foreign Service