
Anthropic publishes the “system prompts” that make Claude tick

Generative AI models aren’t really human-like. They have no intelligence or personality: they’re simply statistical systems that predict the likeliest next words in a sentence. But like interns at a tyrannical workplace, they do follow instructions without complaint, including the initial “system prompts” that prime the models with their basic qualities and with what they should and shouldn’t do.

All generative AI vendors, from OpenAI to Anthropic, use system prompts to prevent (or at least try to prevent) models from behaving badly, and to steer the overall tone and sentiment of the models’ replies. A prompt might tell a model to be polite but never apologetic, for example, or to be honest about the fact that it can’t know everything.
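To illustrate where such a prompt sits, a developer calling Anthropic’s Messages API supplies a system prompt alongside the user’s conversation. The prompt text and model ID in this minimal Python sketch are placeholders for illustration, not Anthropic’s published prompts:

    # A minimal sketch of supplying a system prompt via Anthropic's Python SDK.
    # The prompt text and model ID below are illustrative, not Anthropic's own.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=256,
        # The system prompt primes tone and boundaries, separately from the
        # user-facing conversation passed in `messages` below.
        system="Be polite but never apologetic, and admit when you don't know something.",
        messages=[{"role": "user", "content": "Can you open this link for me?"}],
    )
    print(response.content[0].text)

The key design point is that the system prompt is a dedicated parameter, invisible to the end user, rather than just another turn in the chat.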

Vendors typically keep system prompts secret, though, likely for competitive reasons, but perhaps also because knowing the system prompt can suggest ways around it. The only way to expose GPT-4o’s system prompt, for example, is through a prompt injection attack, and even then the system’s output can’t be trusted completely.

However, Anthropic, in its ongoing effort to present itself as a more ethical and transparent AI vendor, has published the system prompts for its latest models (Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku) used in the Claude iOS and Android apps and on the web.

Alex Albert, Anthropic’s head of developer relations, said in a post on X that Anthropic plans to make this type of disclosure a regular thing as it updates and refines its system prompts.

The latest prompts, dated July 12, spell out very clearly what the Claude models can’t do, such as “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the Claude 3 Opus system prompt tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”

But the prompts also describe certain personality traits and characteristics that Anthropic would like the Claude models to exemplify.

The Claude 3 Opus prompt, for example, says that Claude should come across as “very smart and intellectually curious” and that it “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information,” and to never begin its responses with the word “certainly” or “absolutely.”

It’s all a bit strange to this human, these system prompts, which read the way an actor in a stage play might write a character analysis sheet. The Opus prompt ends with “Claude is now being connected with a human,” which makes Claude sound like some sort of consciousness on the other side of the screen whose sole purpose is to serve the whims of its human conversation partners.

But of course, this is an illusion. If the prompts for Claude tell us anything, it’s that without human guidance and hand-holding, these models are frighteningly blank slates.

With these new system prompt changelogs, the first of their kind from a major AI vendor, Anthropic is putting pressure on competitors to publish theirs as well. It remains to be seen whether the gamble pays off.
