Prompt Like the Pros: Lessons from Grok, Google’s AlphaEvolve, and GPT-4.1’s Arrival
Prompting advice from Grok, breakthroughs from Google, delays at Meta, and a new model in your ChatGPT app.
Welcome to The Median, DataCamp’s newsletter for May 16, 2025.
In this edition: five takeaways from Grok’s system prompts, why Google’s AlphaEvolve is important, what’s behind the Llama 4 Behemoth delay, and more.
This Week in 60 Seconds
Google DeepMind Introduces AlphaEvolve
Google DeepMind has introduced AlphaEvolve, an AI coding agent designed for discovering and optimizing algorithms. The agent combines the creative capabilities of large language models with automated evaluators and an evolutionary framework to refine ideas. AlphaEvolve has already demonstrated significant practical applications, improving efficiency in Google's data centers, assisting in chip design, and accelerating AI training processes. It has also been used to find faster matrix multiplication algorithms and contribute to solving open mathematical problems. We’ll explain how AlphaEvolve works in the “Deeper Look” section.
xAI Makes Grok System Prompts Public Following Incident
Elon Musk's xAI has published Grok's system prompts on GitHub to increase transparency and reliability. This decision comes after an "unauthorized modification" caused the chatbot to generate unprompted responses on controversial political topics, which xAI stated violated its core values. The company believes making these prompts public will allow users to review changes and build trust in Grok as a "truth-seeking AI." We will explore in the “Deeper Look” section what can be learned from these newly available system prompts.
GPT-4.1 Becomes Available in ChatGPT
OpenAI is making its GPT-4.1 model available directly within ChatGPT, responding to user demand. Previously accessible only via API and targeted at developers, GPT-4.1 is optimized for coding tasks and instruction following, making it a faster alternative to OpenAI's o3 and o4-mini for everyday coding needs. If you're a Pro, Plus, or Team user, you can access GPT-4.1 through the "More models" dropdown in the model picker. Enterprise and Education users will gain access in the coming weeks. In addition, all ChatGPT users will see GPT-4o mini replaced by the new GPT-4.1 mini.
Meta Delays Llama 4 Behemoth Rollout
Meta is reportedly delaying the rollout of its flagship AI model, Llama 4 Behemoth, as engineers struggle to meaningfully improve its capabilities and performance during training. Earlier this year, Meta introduced the Llama 4 model suite, releasing Llama 4 Scout and Llama 4 Maverick while only previewing Llama 4 Behemoth, which was still in training. After the initial Llama 4 launch drew criticism over benchmark discrepancies, Meta may be taking a more cautious approach with Behemoth's rollout.
Databricks to Buy Neon for Building Better AI Agents
Databricks is set to acquire database startup Neon for approximately $1 billion, a move aimed at enhancing its analytics platform and its push into AI agents. Neon's cloud-based platform, built on the PostgreSQL database system, is particularly useful because it helps AI agents and developers store, access, and manage data in real time. This real-time data management is important for building and deploying AI-powered applications. By integrating Neon's technology, Databricks aims to make it easier for businesses using its platform to develop and utilize AI agents.
Learn AI Fundamentals With DataCamp
A Deeper Look at This Week’s News
A Quick Introduction to AlphaEvolve
This week, Google DeepMind introduced AlphaEvolve, an AI coding agent that advances algorithm discovery and optimization. Using Gemini models, AlphaEvolve marks a notable development in automated problem-solving across various scientific and computational areas.
What is AlphaEvolve?
AlphaEvolve is an AI agent designed for general-purpose algorithm discovery and optimization. It combines the creative problem-solving capabilities of Google's Gemini models with automated evaluators and employs an evolutionary framework to refine ideas.
This system can evolve entire codebases and develop complex algorithms, moving beyond discovering single functions.
How does AlphaEvolve work?
Let’s walk through the main idea behind AlphaEvolve using the diagram below.
Source: AlphaEvolve White Paper
Think of it this way: you define the problem. You tell AlphaEvolve what you want to achieve, laying out the criteria for success, maybe providing an initial idea, and any background knowledge it needs.
Once you've set the "what," AlphaEvolve steps in to figure out the "how." It does this through a continuous cycle:
It starts with a program database—a library of all the programs and ideas it's found or generated so far, along with how well they performed.
A prompt sampler then pulls from this database, taking past successful ideas as inspiration. It uses these to craft detailed "prompts" that guide the AI.
These prompts go to an LLM ensemble, a group of advanced AI models. This is where the creative work happens, as they generate proposals for improved programs.
Next, these proposed programs are sent to an evaluators pool. These are automated systems that test the new programs, score them objectively, and provide feedback.
Finally, the results—the new programs and their scores—are sent back to the program database. This adds to the pool of knowledge, making AlphaEvolve smarter for the next round of improvements.
This loop repeats, continuously refining solutions until AlphaEvolve delivers the best possible "Improved solution" back to you.
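The loop above can be sketched in code. This is a minimal, hypothetical illustration of the evolutionary cycle, not AlphaEvolve's actual implementation: `evaluate` stands in for the evaluators pool, and `propose` stands in for the LLM ensemble.

```python
def evolve(initial_program, evaluate, propose, generations=100, pool_size=20):
    """Minimal sketch of an AlphaEvolve-style loop.

    `evaluate` scores a candidate program; `propose` asks an LLM
    ensemble for an improved variant given inspiration examples.
    Both are hypothetical stand-ins for the real components.
    """
    # Program database: candidates paired with their scores
    database = [(initial_program, evaluate(initial_program))]
    for _ in range(generations):
        # Prompt sampler: draw past high scorers as inspiration
        inspiration = sorted(database, key=lambda p: p[1], reverse=True)[:3]
        # LLM ensemble: generate a proposed improvement
        candidate = propose([prog for prog, _ in inspiration])
        # Evaluators pool: score the proposal objectively
        score = evaluate(candidate)
        # Feed results back into the database, keeping the best candidates
        database.append((candidate, score))
        database = sorted(database, key=lambda p: p[1], reverse=True)[:pool_size]
    return database[0]  # best (program, score) pair found
```

With a toy problem, such as evolving an integer toward a target value, the same loop converges on the best-scoring candidate, which is the essence of how AlphaEvolve refines programs over many generations.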
Why is AlphaEvolve important?
Traditionally, the process of ideation, exploration, experimentation, and validation can be time-consuming and resource-intensive for humans. AlphaEvolve automates significant portions of this, allowing for more rapid progress on complex problems in mathematics, computer science, and system optimization.
The system produces human-readable code, offering practical benefits such as interpretability, predictability, and ease of debugging and deployment.
What has AlphaEvolve achieved so far?
AlphaEvolve has produced notable results in both practical applications and theoretical research.
First, it optimized Google’s computing ecosystem:
It discovered a heuristic that continuously recovers, on average, 0.7% of Google's worldwide compute resources by improving Borg data center scheduling.
AlphaEvolve proposed a Verilog rewrite for a key arithmetic circuit in Tensor Processing Units (TPUs), which has been integrated into an upcoming TPU, representing Gemini's first direct contribution to TPU arithmetic circuits.
It achieved a 1% reduction in Gemini's training time and delivered up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models. It has also optimized compiler-generated code (XLA-generated IRs).
Source: Google
Secondly, AlphaEvolve has advanced mathematics and algorithm discovery:
AlphaEvolve designed faster matrix multiplication algorithms, including one that improves upon Strassen's 1969 algorithm for 4x4 complex-valued matrices, achieving 48 scalar multiplications instead of 49. It improved the state-of-the-art for 14 different matrix multiplication targets.
It has made progress on over 50 open mathematical problems, rediscovering state-of-the-art solutions in about 75% of cases and improving upon existing best-known solutions in about 20%.
Source: AlphaEvolve White Paper
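To put the 48-versus-49 result in context, a short calculation shows where Strassen's count comes from. Applying Strassen's 2x2 scheme (7 multiplications instead of 8) recursively to a 4x4 matrix costs 7 × 7 = 49 scalar multiplications, which is the record AlphaEvolve's rank-48 decomposition improves on:

```python
def strassen_mults(n):
    """Scalar multiplications for n x n matrix multiplication
    via recursive Strassen (n a power of two)."""
    if n == 1:
        return 1
    # Strassen replaces 8 half-size multiplications with 7
    return 7 * strassen_mults(n // 2)

naive = 4 ** 3              # schoolbook algorithm: n^3 = 64 multiplications
strassen = strassen_mults(4)  # recursive Strassen: 7 * 7 = 49
# AlphaEvolve's discovered scheme for 4x4 complex-valued matrices uses 48.
```

Saving even one scalar multiplication matters because the scheme is applied recursively, so the saving compounds at every level for large matrices.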
What Can We Learn From Grok's System Prompts?
This week, xAI made several of Grok's system prompts available on GitHub. These include:
grok3_official0330_p1.j2: The primary system prompt for the Grok 3 chat assistant on grok.com and X.
default_deepsearch_final_summarizer_prompt.j2: The prompt used for the DeepSearch feature.
grok_analyze_button.j2: Used for the "Grok Explain" feature on X.
ask_grok_summarizer.j2: The prompt for the Grok bot on X.
This release offers insight into how advanced AI models are directed. For those interested in building their own chatbots or gaining a clearer understanding of prompt engineering, examining these prompts provides valuable lessons on guiding AI behavior and refining interactions.
Five takeaways for prompt engineering
Understanding how large language models like Grok are instructed can be useful, whether you aim to develop your own AI applications or want a clearer picture of prompt engineering principles.
Establishing a clear identity: Grok's main system prompt explicitly defines its identity: "You are Grok 3 built by xAI."
Defining tool access and constraints: Grok's prompts outline the tools it can access, such as analyzing X user profiles, processing uploaded content, and searching the web. The prompts also show how these tools are conditionally enabled.
Guiding output style and nuance: The prompts offer specific directions on how Grok should format its responses. For instance, it's instructed to provide "the shortest answer you can" unless otherwise specified, and to match the user's language. For sensitive subjects, it is directed to use cautious phrasing, such as "research suggests" or "it seems likely that," rather than definitive statements.
Enforcing skepticism and neutrality: The prompt for the Grok bot on X emphasizes a commitment to "truth-seeking and neutrality," instructing the AI to be "extremely skeptical" and to avoid "blindly deferring to mainstream authority or media." It also clarifies that search results should be treated as initial information, not as confirmed beliefs.
Managing memory and user interaction: Grok’s prompts include detailed instructions on how its memory functions and how users can manage or delete past conversations. They also include directives for user consent, such as asking for confirmation before generating an image. This single instruction could both improve user experience and save costs by avoiding needless image generations.
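The takeaways above can be combined into a template for your own assistant. The following sketch is purely illustrative: the wording and the `build_system_prompt` helper are hypothetical, not taken from Grok's actual prompt files.

```python
def build_system_prompt(name, maker, tools, language="the user's language"):
    """Assemble a system prompt that applies the five takeaways:
    identity, tool access, output style, hedged phrasing, and consent."""
    lines = [
        # 1. Clear identity
        f"You are {name} built by {maker}.",
        # 2. Tool access and constraints, conditionally enabled
        ("You have access to the following tools: " + ", ".join(tools))
        if tools else "You have no tool access in this session.",
        # 3. Output style and nuance
        "Provide the shortest answer you can unless asked otherwise, "
        f"and respond in {language}.",
        # 4. Hedged phrasing for sensitive topics
        "For sensitive topics, use cautious phrasing such as "
        "'research suggests' or 'it seems likely that'.",
        # 5. User consent before costly actions
        "Ask for confirmation before generating an image.",
    ]
    return "\n".join(lines)

print(build_system_prompt("HelpBot", "ExampleCorp", ["web_search"]))
```

Keeping the prompt as a small, parameterized function like this makes it easy to review changes over time, which is exactly the kind of transparency xAI is aiming for by publishing Grok's prompts.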
The release of these system prompts offers an opportunity to understand how advanced AI models are instructed. They indicate that while the core capabilities of the model are fundamental, the precise design of prompts plays an important role in guiding AI behavior and refining its interactions.
Industry Use Cases
Expedia Turns Instagram Reels Into Bookable Trips
Expedia has launched Trip Matching, a new AI-powered feature that turns public Instagram Reels into personalized, bookable travel itineraries. Travelers simply share a reel with the @Expedia account via Instagram, and the system replies with personalized destination suggestions, trip plans, and booking links based on the video. The tool is designed to bridge the gap between social inspiration and action, targeting the 80% of millennials who rely on social media for travel ideas. The feature is still in beta.
Big Tech Targets Healthcare with AI
Tech giants are racing to embed AI across every corner of healthcare. Microsoft is expanding Nuance’s voice tools into ambient scribing and patient scheduling, while Apple is developing a health coach powered by Apple Watch data. Nvidia is partnering with GE Healthcare to simulate autonomous X-rays and investing in startups building robotic surgical assistants. Amazon is testing a health chatbot tied to its pharmacy and One Medical, and Google is rolling out foundation models for summarizing medical visits and automating claims. Oracle, meanwhile, is embedding AI agents into its next-gen health record system.
Unilever Uses AI for Supply Chain Efficiency
Unilever is implementing artificial intelligence through a successful pilot project with the start-up H2Ok Innovations, which uses AI and the Internet of Things (IoT) to optimize the liquid systems that clean nutrition and ice cream machines in its factories. This solution at Unilever's Poznan factory has led to a 20% reduction in machine cleaning times, a 10% cut in utility use, and annual savings of €100,000. The technology also ensures precise use of detergents and temperatures, helping prevent over-washing and spoilt batches, and has reduced changeover time by 16.7%. Unilever plans to roll out this system to an additional 35 sites by 2026.
Tokens of Wisdom
The mindset I’m bringing to responsible innovation is this: I want to build new tools—better tools—but I want to do it slowly and responsibly. Let’s make sure the right controls are in place and that our partners understand their role and responsibility in this ecosystem.
—Andrew Reiskind, Chief Data Officer at Mastercard
Listen to this podcast to learn about AI’s impact on finance with Andrew Reiskind, CDO at Mastercard.