Tag: safety

Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

The hypothetical scenarios the researchers presented Opus 4 with that elicited the whistleblowing behavior involved many human lives at stake and absolutely unambiguous...

Anthropic’s Claude Is Good at Poetry—and Bullshitting

The researchers of Anthropic’s interpretability group know that Claude, the company’s large language model, is not a human being, or even a conscious...