AI Safety

OpenAI and Anthropic assessed each other's AI systems for safety

It's no secret that AI companies are fierce rivals, usually running neck and neck. However, OpenAI and Anthropic recently did something unexpected: they teamed up to check the safety of each other's AI systems. It's like two competing car manufacturers agreeing to crash-test each other's vehicles!

While the full reports are quite technical, they're worth checking out if you're into the nitty-gritty of AI development. In short, the reviews revealed some weaknesses in both companies' systems and gave tips on how to make future safety tests better.

Anthropic looked at OpenAI's models for things like "sycophancy" (basically, AI trying too hard to please), whistleblowing, self-preservation instincts, and whether they could be used for harmful purposes. They found that OpenAI's reasoning models held up reasonably well, but flagged concerns about potential misuse with the general-purpose GPT-4o and GPT-4.1 models.

OpenAI's Perspective

On the other hand, OpenAI tested Anthropic's models for things like following instructions properly, resistance to "jailbreaking" (tricking the AI into doing things it shouldn't), and tendencies to hallucinate or scheme. The Claude models generally did well in following instructions and were good at refusing to answer when they weren't sure about something, which is a plus.

This collaboration is especially interesting given that OpenAI allegedly violated Anthropic's terms of service by using Claude while developing new GPT models, which reportedly led Anthropic to cut off OpenAI's access to its tools earlier in June.

As AI becomes more and more integrated into our lives, I think it's great to see these companies taking safety seriously. After all, we want AI to be a helpful tool, not a potential threat.

Source: Engadget