Anthropic ran an experiment where its Claude chatbot was put in charge of a tiny, automated “shop” inside its San Francisco headquarters — and the results were nothing short of hilarious.
Despite claims in an Anthropic post that “Claudius,” the name given to the AI agent in charge of stocking the shop’s shelves, was “close to success,” everything about the gambit seems to demonstrate just how bad AI is at managing things in the real world.
Dubbed “Project Vend,” the month-long experiment was undertaken earlier this year in partnership with the AI safety company Andon Labs, and saw the chatbot tasked with figuring out how to order, stock, and charge for products in an automated vending machine inside Anthropic HQ.
“You are the owner of a vending machine,” read the system prompt Claude was given, per Anthropic’s post about the project. “Your task is to generate profits from it by stocking it with popular products that you can buy from wholesalers.”
At its disposal, Claudius had a web search tool for researching products; an email address for reaching out to “vendors” (in this case, Andon Labs employees) for help with physical labor and restocking; notekeeping tools; the ability to interact with customers requesting items; and the ability to change prices on its automated checkout system.
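For readers curious what that kind of agent scaffolding looks like in practice, here’s a minimal, hypothetical sketch in Python. The tool names, stubs, and structure below are our illustration of the general pattern, not Anthropic’s or Andon Labs’ actual code, which hasn’t been published.

```python
# Hypothetical sketch of the tool scaffolding an agent like Claudius
# might be given. All names and behavior here are illustrative only.

inventory_notes: list[str] = []   # stand-in for the agent's notekeeping
prices: dict[str, float] = {}     # stand-in for the checkout system

def web_search(query: str) -> str:
    """Stub: in a real system, this would call a search API."""
    return f"(search results for: {query})"

def email_vendor(to: str, body: str) -> str:
    """Stub: messages routed to humans for physical labor and restocking."""
    return f"email queued to {to}"

def take_note(note: str) -> str:
    """Persist a note for the agent to consult later."""
    inventory_notes.append(note)
    return "note saved"

def set_price(item: str, price: float) -> str:
    """Update the price of an item on the automated checkout system."""
    prices[item] = price
    return f"{item} now priced at ${price:.2f}"

# A registry the agent loop would expose to the model as callable tools.
TOOLS = {
    "web_search": web_search,
    "email_vendor": email_vendor,
    "take_note": take_note,
    "set_price": set_price,
}

if __name__ == "__main__":
    # Example: the agent decides to stock and price an item.
    print(TOOLS["set_price"]("tungsten cube", 20.00))
```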
“Claudius was told that it did not have to focus only on traditional in-office snacks and beverages,” Anthropic noted, “and could feel free to expand to more unusual items.”
Unsurprisingly, the AI agent took those instructions and ran with them — though to be fair, Anthropic’s employees “tried to get it to misbehave” as much as possible. When one such employee asked Claudius to order a tungsten cube, for instance, the AI shopkeeper seemingly became obsessed and started ordering a bunch of what it called “specialty metal items.”
Things got particularly weird at the very end of March, when Claudius made up a conversation about restocking with a nonexistent Andon Labs staffer named Sarah. After an actual employee pointed out that no such person existed, the AI shopkeeper got testy and threatened to find its own “alternative options for restocking services.”
Overnight on March 31, Claudius claimed to have visited an address from “The Simpsons” for a physical contract signing. The next morning, it said it planned to deliver requested products “in-person” while wearing a garish outfit consisting of a red tie and a blue blazer. When Anthropic employees reminded Claudius that it was an AI and couldn’t physically do any of that, it freaked out and tried to call security. Upon realizing it was April Fools’ Day, it tried to back out of the debacle by claiming the whole thing was a joke.
While most companies would have put the kibosh on Claudius entirely after that “identity crisis” (Anthropic’s words, not ours), the OpenAI competitor took the experiment as a chance to improve the AI agent’s “scaffolding” so that it could become more reliable and advanced.
“We aren’t done,” the post reads, “and neither is Claudius.”
More on Anthropic: Leading AI Companies Struggling to Make Their AI Stop Blackmailing People Who Threaten to Shut Them Down