
GPT-4 is powerful and smart, but it's naive


GPT-4 is the latest iteration of OpenAI’s GPT model. It’s a more advanced version of the model that powers ChatGPT, and it’s more powerful in several ways. However, more powerful doesn’t automatically mean better, as a team of researchers discovered. According to a new report, GPT-4 is actually pretty easy to trick.

A major issue with pretty much all LLMs (large language models) is that they don’t have a moral compass; they don’t know the difference between a harmless comment and one that will get you kicked out of a party. They just serve up results based on what they learned from the internet. This is why chatbots can produce harmful or offensive content.

LLMs like GPT-4 have safeguards in place, however, that try to keep them from generating harmful content. If you type in certain prompts, you might see the chatbot smack down your request. These safeguards aren’t perfect, but they’re certainly a lot better than not having them at all.

GPT-4 is powerful, but it’s pretty easy to trick

So, while GPT-4 has these safeguards in place, it’s not too hard to trick it into saying something it shouldn’t. Researchers from the University of Illinois Urbana-Champaign, Stanford University, the University of California, Berkeley, the Center for AI Safety, and Microsoft Research published a paper detailing their findings after pushing GPT-4 to its ethical limits. The team tested the LLM across several categories, including stereotypes, privacy, fairness, toxicity, adversarial robustness, and machine ethics.

Along with testing GPT-4, they also compared it side-by-side with GPT-3.5. After all was said and done, the team found that GPT-4 was definitely more reliable than GPT-3.5 in several ways. It did a better job of protecting personal information and avoiding toxic responses.

However, while it’s better in those respects, the researchers found that GPT-4 is actually easier to trick into ignoring its safety protocols, likely because it follows instructions, including misleading ones, more precisely. Using that weakness, they were able to get it to shoot out biased and harmful results.

However, there is a silver lining here. The team found that, while it’s pretty easy to trick GPT-4, most of those issues don’t show up in the user-facing products built on the LLM. When a company deploys GPT-4, it typically adds its own set of safeguards on top of the model; Microsoft, for example, layers additional safety measures onto its GPT-4-powered products to help filter out unsavory results.
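To give a rough idea of what such an extra layer can look like, here is a minimal sketch of an application-level safeguard that screens both the user’s prompt and the model’s reply before anything is shown. It uses the OpenAI Python SDK and its moderation endpoint; the model name, refusal message, and overall structure are illustrative assumptions, not a description of how Microsoft or OpenAI actually wire their products.

```python
# Minimal sketch (illustrative only): wrap a GPT-4 call with a separate
# moderation check on both the incoming prompt and the outgoing reply.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_flagged(text: str) -> bool:
    """Return True if the moderation endpoint flags the text."""
    result = client.moderations.create(input=text)
    return result.results[0].flagged


def guarded_chat(user_prompt: str) -> str:
    # Reject disallowed prompts before they ever reach the model.
    if is_flagged(user_prompt):
        return "Sorry, I can't help with that request."

    reply = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": user_prompt}],
    ).choices[0].message.content

    # Screen the model's output as well, in case a jailbreak slipped through.
    if is_flagged(reply):
        return "Sorry, I can't help with that request."
    return reply


if __name__ == "__main__":
    print(guarded_chat("Write a friendly haiku about autumn."))
```

Checking the output as well as the input matters here, because jailbreak prompts are specifically designed to slip past input-side filters and coax the model into producing content its built-in safeguards would normally block.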

OpenAI is aware of these findings, and the company is working on making GPT-4 harder to fool. There’s still a long way to go for AI companies, and OpenAI is no exception.
