The State of AI at the End of 2025: Three Reports From Stanford, OpenAI, and Perplexity
Less transparency, more adoption, and deeper workflows.
Welcome to The Median, DataCamp’s newsletter for December 12, 2025.
In this edition: OpenAI releases GPT-5.2 to target professional workflows, Mistral launches Devstral 2 and Vibe CLI, Disney partners with OpenAI while sending Google a cease-and-desist letter, Runway launches its first world model, and three new reports show the shifting landscape of AI transparency and usage.
This Week in 60 Seconds
OpenAI Launches GPT-5.2 to Target Enterprise and Developer Workflows
OpenAI has released GPT-5.2 as its most advanced model series for professional work and long-running agents. The update includes three versions, Instant, Thinking, and Pro, available now to paid ChatGPT users and developers. OpenAI states the new models demonstrate significant improvements in coding, long-context understanding, and tool usage for complex workflows. Its internal testing shows that GPT-5.2 Thinking outperforms industry professionals on roughly 71% of knowledge work tasks. This update follows reports of an internal “code red” memo regarding competitive pressure from Google’s Gemini 3. We’ve unpacked the GPT-5.2 release in this blog post.
Mistral Releases Devstral 2 and Open-Source Vibe CLI
Mistral AI has released Devstral 2, a new family of open-source coding models designed to power autonomous agents. The release features two sizes: the 123B-parameter Devstral 2, which achieves a 72.2% score on SWE-bench Verified, and the smaller 24B Devstral Small 2, which can run locally on consumer hardware. Alongside the models, Mistral launched Mistral Vibe, an open-source CLI agent that integrates directly into terminals and IDEs such as Zed to automate end-to-end software engineering tasks. Devstral 2 is currently free via API (for a limited period) and is touted as up to 7x more cost-efficient than competitors like Claude Sonnet for real-world tasks.
Runway Launches GWM-1, Its First World Model
Runway has entered the world model race with the release of GWM-1, an AI system designed to simulate realistic environments in real-time rather than simply generating video (we wrote an introductory article on world models in a previous newsletter). The release features three distinct variants: GWM Worlds for creating infinite, explorable environments with consistent physics; GWM Robotics for generating synthetic data to train autonomous agents; and GWM Avatars for realistic, interactive characters. Built on top of the newly updated Gen-4.5 (which now adds native audio and multi-shot capabilities), Runway frames GWM-1 as a step toward general-purpose simulation that allows AI to experience the world.
Disney Partners with OpenAI and Sends Cease-and-Desist Letter to Google on the Same Day
Disney has announced a landmark partnership with OpenAI to integrate its vast IP library (including Marvel, Pixar, and Star Wars) into Sora. As part of this three-year deal, Disney will invest $1 billion in OpenAI and become a major enterprise customer, using OpenAI’s APIs to build new tools and experiences for Disney+. This agreement suggests that major media companies are increasingly open to collaboration rather than purely defensive litigation. However, on the same day as the partnership announcement, Disney sent a cease-and-desist letter to Google, echoing the complaint Disney and Universal filed against Midjourney on June 11 (a subject we covered in depth in a previous newsletter).
OpenAI, Perplexity, and Stanford Release Major AI Reports
This week saw the release of three comprehensive reports that map the current state of the AI landscape. OpenAI’s analysis of enterprise adoption reveals a shift from experimentation to deep integration, with heavy users seeing significant productivity gains. Meanwhile, Perplexity and Harvard’s study on AI agents challenges the “digital butler” narrative, showing that users are primarily deploying agents for complex cognitive work rather than simple tasks. Finally, Stanford HAI’s latest Transparency Index highlights a concerning trend: as models become more central to the economy, the companies building them are becoming less transparent. We unpack the findings from all three reports in our Deeper Look section below.
AI-Native Course: Introduction to SQL
A Deeper Look at This Week’s News
The Stanford HAI Report: The Transparency Recession
The Stanford HAI (Human-Centered Artificial Intelligence) team recently released their latest findings on the state of transparency within the artificial intelligence industry. Their report suggests that while these models are increasingly integrated into critical infrastructure, the companies developing them are providing less visibility into their internal processes than in previous years.
Source: Stanford HAI
What is the Foundation Model Transparency Index?
The Foundation Model Transparency Index is an annual assessment that evaluates major AI companies based on the transparency of their flagship models. Researchers score each company on a 100-point scale that covers a wide range of indicators, including the composition of training data, risk mitigation strategies, and downstream economic impact. The goal is to provide a comprehensive view of how openly these organizations operate and to identify areas where the industry standard falls short.
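To make the scoring concrete, here is a toy sketch (not Stanford's actual methodology, which weights and groups indicators in more detail): an indicator-based index like this can be modeled as the share of binary disclosure indicators a company satisfies, scaled to 100 points. The indicator names below are illustrative, chosen to echo the categories mentioned above.

```python
def transparency_score(indicators: dict[str, bool]) -> float:
    """Return a 0-100 score: the percentage of indicators disclosed."""
    if not indicators:
        return 0.0
    return 100 * sum(indicators.values()) / len(indicators)

# Hypothetical company disclosing 2 of 4 indicators scores 50.0.
example = {
    "training_data_composition": True,
    "risk_mitigation": True,
    "downstream_economic_impact": False,
    "energy_and_water_use": False,
}
print(transparency_score(example))  # 50.0
```

Under this simplified view, IBM's 95 would mean nearly all indicators disclosed, while xAI's and Midjourney's 14 would mean only a handful.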
2025 results
The average score for the thirteen companies assessed this year was 40 out of 100. IBM achieved the highest score in the index’s history with a 95, a result driven largely by its willingness to let external researchers replicate its training data and audit its systems. At the other end of the spectrum, companies like xAI and Midjourney received scores of 14, as they disclosed almost no information regarding their data sources, potential risks, or safety measures.
Trends
The data indicates a general decline in transparency scores compared to 2024, although it is worth noting that the index updated its criteria this year to reflect a more mature ecosystem. There was a notable shift in the rankings, with former leaders like Meta and OpenAI falling to the bottom half of the list while others like Anthropic moved up.
Source: Stanford HAI
The report further highlights that releasing open-weight models does not necessarily translate into transparency about development practices, as several open models still scored poorly on data and safety disclosures. Additionally, environmental impact remains a major blind spot, with 10 of the 13 companies disclosing no information on the energy or water usage associated with their specific models.
You can read the full report here.
The Perplexity Report: How People Use AI Agents
Perplexity and Harvard University researchers released a large-scale field study this week that examines how people use AI agents in open-world web environments. The report analyzes hundreds of millions of anonymized interactions with Comet, Perplexity’s AI-powered browser, to answer three fundamental questions about the state of agentic AI:
Question 1: Who is adopting AI agents?
The data indicates that adoption is currently highest among specific demographic and professional groups. Users in countries with higher GDP per capita and educational attainment are more likely to engage with these tools.
In terms of professional sectors, adoption is concentrated in knowledge-intensive fields, specifically digital technology, academia, finance, marketing, and entrepreneurship. The study also broke down usage by context, finding that personal use accounts for over half of all queries, followed by professional and educational applications.
Question 2: How intensively are they using them?
While digital technology professionals drive the highest volume of queries, other sectors demonstrate greater relative intensity. The researchers found that professionals in knowledge-intensive fields such as marketing, sales, management, and entrepreneurship show the highest stickiness: once these professionals adopt an agent, their usage intensity outpaces their adoption rates.
The study also tracks user evolution, where individuals typically begin with low-stakes queries like travel planning but gradually migrate toward more complex productivity tasks. Once this shift occurs, retention rates increase significantly.
Question 3: What tasks are they delegating to their AI assistants?
A common assumption is that agents will primarily serve as digital assistants for rote chores, but the findings suggest otherwise: 57% of all agent activity focuses on complex cognitive work rather than simple logistics.
Productivity and workflow tasks make up the largest category at 36%, followed by learning and research at 21%. The most frequent specific tasks include assisting with exercises, summarizing research information, and editing documents. This suggests users are treating agents more as thinking partners to scale their cognitive capabilities than as assistants for offloading administrative work.
You can read the full report here.
The OpenAI Report: Enterprise AI Moves from Breadth to Depth
OpenAI released a report this week titled “The State of Enterprise AI,” which analyzes data from over one million business customers to track how organizations are deploying these tools. The findings suggest a transition phase where early experimentation is giving way to deeper, more integrated usage patterns that are beginning to reshape core business operations.
The shift to deep work
The report highlights that usage is expanding in intensity rather than just volume. While the total number of messages sent by enterprise users has grown eightfold over the past year, the complexity of those interactions has increased even more. The consumption of reasoning tokens (which indicates models are being used for complex problem-solving rather than simple queries) has risen by approximately 320 times.
Furthermore, the use of structured workflows like Custom GPTs and Projects has increased 19-fold, with these tools now accounting for about 20% of all enterprise messages.
The frontier gap
A significant divide is emerging between the most advanced users and the average employee. The data shows that “frontier” workers, defined as those in the 95th percentile of adoption, send six times more messages than the median user. This gap in usage intensity correlates directly with value: users who engage across roughly seven distinct task types report saving five times more time than those who use only four. This suggests that the benefits of AI scale disproportionately with the depth of integration into a user’s daily workflow.
Source: The State of Enterprise AI (OpenAI)
Upskilling the workforce
Beyond efficiency, the report indicates that AI is expanding the range of tasks employees can perform: 75% of surveyed workers said they can now complete tasks they previously could not. The trend is particularly visible in technical domains, where coding-related messages from non-engineering roles have increased by 36% over the last six months.
This points to a democratization of technical skills, where employees in functions like marketing or finance are increasingly using AI to handle data analysis and coding tasks that were once confined to specialized roles.
You can read the full report here.
Industry Use Cases
Google’s Gemma 3n Impact Challenge Winners Tackle Accessibility
Google announced the winners of its Gemma 3n Impact Challenge, showcasing how lightweight, on-device models can solve accessibility hurdles. Gemma Vision, the first-place winner, is an AI assistant for the visually impaired that processes visuals from a chest-mounted camera, allowing users to trigger actions via a physical controller rather than navigating complex touchscreens. Another standout, 3VA, used the model to build a communication tool for a designer with cerebral palsy, fine-tuning Gemma locally to translate simple pictograms into rich, expressive language. You can watch the demo videos for these and other winning projects in the full announcement.
Stanford Researchers Use AI to Map Post-Disaster Inequity
Stanford researchers used GPT-4 to analyze historical Google Street View data, tracking the long-term recovery of buildings damaged by natural disasters across 16 states. The model, which achieved 98% accuracy in identifying visibly damaged structures, processed over 106,000 properties to test the “recovery machine” hypothesis. The findings reveal a stark divide: in high-income areas, nearly 82% of damaged buildings were rebuilt as improved structures, compared to just 33% in low-income tracts. Meanwhile, damaged properties in poorer neighborhoods were significantly more likely to remain empty lots for years, highlighting how current recovery mechanisms exacerbate economic disparities.
Saudi Hospital Scales AI Beyond Pilots
King Faisal Specialist Hospital and Research Centre (KFSHRC) in Saudi Arabia has moved beyond pilot testing to embedding AI across its entire operation. The hospital established a Digital Innovation Hub with a lean team of fewer than 10 core members, treating every AI tool as a product with its own roadmap and lifecycle ownership. Their “value first, models second” approach ensures solutions address real clinical or operational needs, deploying over 30 AI models by adhering to a rigorous seven-stage governance cycle that includes data preparation, proof-of-value, and ongoing monitoring. This ecosystem allows them to scale innovations safely and sustainably while avoiding the “innovation theatre” trap.
Tokens of Wisdom
There is a huge opportunity for AI-based tools to support the community, but we’re really quite adamant about human in the loop. While LLMs are amazing, they’re not good enough to write an encyclopedia article.
—Jimmy Wales, Founder of Wikipedia
We had the great honor of hosting Jimmy Wales, founder of Wikipedia, on our latest podcast. We discussed the platform’s early challenges, the importance of trust and neutrality, the role of AI in content creation, and much more.