Disrupting the Future: 10 Bold Predictions for Data Science and AI in 2025!
On agents, open source models, safety, and more
At an AI conference at the close of the year, I was in the speakers’ lounge finishing up my work when three loud AI executives entered just before the penultimate panel for the day on “the future of AI”. After a quick glance in my direction (likely to ensure I was a harmless NPC), one of them loudly proclaimed “this must be my what…30th? 35? conference this year.”
After a pause he added, “…and you know what, they are all starting to sound the same.”
While musing about installing guardrails in my eardrums to filter out the humblebrag, I admit: he had a point. There is disturbing ‘sameness’ in AI narratives. It sounds something like:
AI agents and agentic workflows are the next wave.
AI pilots are aplenty. AI in production is dicey.
AI will not take your job, people who know AI will.
AI governance is important. Something something EU AI Act.
As we cross over into 2025, despite churning AI research publications at a rate of over 240,000 a year, and reproducibility crises aside, I wonder how many of them are truly groundbreaking rather than chasing the next incremental improvement of yet another non-standardized benchmark dataset. Likewise, new narratives seem as scarce as breakthrough AI research.
With that in mind, my predictions for 2025 attempt to provide a view of the tensions of AI, taking an unpopular but balanced view as someone whose work depends not on selling AI, but on implementing AI well — and living through the consequences of our decisions.
1. Agents are both the hype of 2025 and the caution of 1995
It is impossible to talk about the future of AI without a reference to the overwhelming amount of hype around agents, so let’s start by putting agents in proper perspective.
Firstly, agents represent a promising use case developed on top of generative AI (or ‘GenAI’ for short). A key underappreciated aspect of GenAI is that it is not just ‘generative’, it is also general. A single model can do multiple tasks, including things it was not explicitly trained to do.
As such, models trained on language also perform ‘reasoning’, and by chaining multiple calls of multiple models with different combinations of data, capabilities and provided context, it that allows semi-autonomous activity to be executed. The implications are profound:
Agents may be the new SaaS — Service as a Software.
They allow programs to be developed that are able to perform tasks, that would have otherwise required explicit and intentional development effort to do. And while autonomy is limited, that is the promise of agents in a nutshell.
But this is far from the first time agents are on the hype cycle.
At the same time, we need to take a clear eyed look beyond the hype. And for that I point to the above diagram — a snapshot of Gartner’s hype cycle for emerging technologies. In 1995.
Yes, 1995. When video conferencing and wifi were considered ‘emerging technologies’.
Generations of data science and AI professionals have grown up with less exposure to good old fashioned AI (or ‘GOFAI’). But the idea of agents has always been core to AI even before 1995.
In Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, cited as the world’s most popular AI textbook and in use by over 1,550 universities, the idea of intelligent agents is literally on the second page of the preface.
So with the promise of agents, it is important to remember this is hardly the first time the world has attempted to create value from intelligent agents. The fundamentals of the technology has advanced, but it brings new issues for AI security and AI safety to solve, which takes time.
Essentially, we have upped capability but traded one failure mode for another. We have gone from brittle, narrow, handcrafted workflows and tightly defined knowledge, to broader, probabilistic workflows based on orchestrating, stacking and chaining failure prone reasoning and classification. And for good measure giving them memory and tools.
Like parenting a five year old learning to navigate the physical world, we could talk about how smart they will be in the future, but the key consideration today is working out how we communicate to them, what they are allowed to do, and what tools to keep out of their reach until they are sufficiently mature.
And until then teaching people around them not to take them too seriously.
2. Specialist and open source models give big providers a serious challenge
News on AI companies have disproportionately been focused on large companies like OpenAI, Anthropic, and Google. But thanks to a combination of open source releases and an early model leak that allowed widespread experimentation, open source and specialist models are currently poised to provide credible and meaningfully differentiated alternatives.
But considering frontier models can take up to an estimated $191 million to train, how is this possible?
The answer lies not just in the vague assertion that ‘open source is catching up’, but in the rise of distinct strategies around specialized models such as Qwen 2.5’s Party of Foundation Models.
Unlike the ‘T-shirt sizing’ approach of Llama 3’s models weighing in at 8B, 70B and 405B parameters, the Qwen model suite adopts a different approach and has separate models at different sizes for math, coding and language.
Being optimized for a narrower set of tasks is fundamentally more efficient from the outset in terms of training data, and gets better due to the ability to layer on task-specific optimizations.
Here’s to greener, more efficient and fit for purpose models.
3. Model cards soon give rise to agent cards and data cards.
For anyone unfamiliar, model cards were envisioned as a standardized report card for trained AI models to share information about performance, safety and suitability for various use cases. This would serve many stakeholders across the complex AI value chain, from policy makers, to operations teams and users.
Currently, model cards are working through a slew of issues, with a major one being the proliferation of non-comparable benchmarks. If models were students, it would be akin to one taking the SAT, another taking the GRE and a third GMAT, and all of them saying they topped the class — while being selective when sharing which class they meant.
Despite these issues, it is an area of intense focus and holds much promise.
Just as it is in the interest of educational institutions to ensure their students’ qualifications are recognized by employers, it is likewise in the interest of model providers to signal the quality of their models in ways recognized by deployers.
It therefore makes sense to see where this goes in a world that needs a business model for data providers and agents. The next wave of model cards will likely be supplemented by agent cards and data cards.
Agent cards are straightforward — model cards were always intended to transparently showcase capabilities, and agent cards would be the logical extension. It would include components like allowed actions, tool usage, data it is allowed to access and how it implements access rights, safety and security tests it has passed, and what it knows and remembers. In short it would be a resume of sorts for AI agents.
Data cards have a more complex history. Firstly, they are not new and have appeared in various forms and under different names as a vehicle to facilitate data sharing and usage. More recently, the ‘data mesh’ concept had data-as-a-product as a core principle.
It is not new in terms of naming either. Google attempted to propose the idea of a data card playbook. It is great open content and a highly laudable attempt — though it would be better if Google’s own models followed it.
Regardless of history, generative AI would be well served by data cards for a different reason. Having a standard for data transparency and provenance would be key to recognizing the creators who serve as data providers and include them in the value created by generative AI.
As a side note, in a world where the herd follows the likes of Meta (ahem), OpenAI and Anthropic, I was surprised to come across one of the best examples of data used in models came from none other than IBM. While their Granite models were relatively small and not built to beat current SOTA benchmarks, they were highly detailed in terms of training procedure and individual data sets used for model training.
4. The urgent evolution of unstructured data management.
For years we have been talking about how data is mostly unstructured, with estimates of the unstructured data falling in the range of 80–90%.
In parallel, in that same not-so-distant pre-generative AI past, we find that over 80% of companies were not able to take advantage of unstructured data.
See the problem? That’s right — for all the evolution of data warehouses, lakes, icebergs, and even lakehouses, the vast majority of today’s data management solutions are unprepared for effectively enabling generative AI.
The gap between querying JSON files and multiple knowledge representations and embeddings is large, and companies who are building large generative AI pushes are either building or acquiring it.
5. Impact of Generative AI on ‘classical’ AI.
It is important to remember that the transformer models that power GenAI are not just ‘generative’, they are also both ‘pre-trained’, and ‘general’. This carries massive implications for the teams that develop and deploy them.
Let’s look at these three changes in turn from the lens of DS and AI organizations:
Generative AI means Operations need to manage user generated content.
MLOps teams have taking on new responsibilities around generative AI. But some of these shifts are unnecessarily confusing, partly due to vendors coining their own terms to capture mindshare. I imagine them sitting around a conference table going: Hey, why don’t we upsize MLOps with a fancy new term like LLMOps. Or maybe AIOps. Heck just throw in AgentOps while we’re at it.
Regardless of where the bag of words land, the shift in model management skills is important. But it also misses the main point.
The defining feature of GenAI operations is not just managing bigger models, it is managing user generated content.
The work has shifted from narrow, purpose-built models, to chat applications that are now general purpose AI systems. And when users are given free rein in prompting, they invariably generate a swath of content they should not. This new world requires skills that traditionally sat more comfortably with social media platforms than enterprise teams — content filtering, moderation, user content policies and incident reporting — but this is what operations must now wrangle with.
Data scientists are moving from model training to model selection and evaluation.
What happens to teams when members that use to spend most of their time training models, now primarily use pre-trained models?
The answer is choosing which pre-trained model to fit to each use case (model selection) and doing so through understanding relevant performance, safety and security dimensions (model evaluation).
Transiting from model trainers to model assessors is not a trivial thing — it requires new knowledge, new tooling, more metrics than ever and an understanding of the difference between a general benchmark on a provider website and what one may experience when putting models in production (hint: not quite the same).
General purpose models provide both an opportunity to simplify the future and a burden to revisit the past.
Revisiting our first point, generative AI has impacted classical AI by introducing models that are not just “generative” but “general,” capable of performing diverse tasks beyond traditional, task-specific AI systems.
However, while generative AI models carry the possibility of simplifying complex model pipelines, they do not uniformly outperform statistical or classical machine learning models across all its dimensions. And even if they are superior in performance, one may choose to not adopt them for reasons of efficiency, interpretability, or consistency. Pretraining also raises concerns about biases and ethical implications due to no longer having full visibility of upstream data and training, making responsible development and usage more important than ever.
The work from comparing and updating models and pipelines between generative AI and classical models represents a new type of work imposed on existing data science and AI work portfolios that did not exist before.
6. The adjacency of AI safety and AI security means data science and cybersecurity must work together
One of the issues I seldom hear discussed is the important intersection between data science and cybersecurity, with much of the content coming from either side of the house but seldom both. However the early attempts at defining what sits on each side have been somewhat academic and not something that would add any sort of clarity to actual company departments.
However, pressure has been building up — no less than 10 nations have set up AI safety institutes since 2023, and AI security standards in both international bodies and the large security community are rapidly reaching implementation maturity.
There have also been excellent and recent publications in the space such as the paper AI Risk Management Should Incorporate Both Safety and Security from Princeton and a coalition of 16 other academic and industry researchers. It looks like AI safety and AI security are finally ready to join hands and step up as an effective joint force in 2025.
And I volunteer my own ELI5 summary which I hope will be freeing in its clarity and simplicity:
AI Security is about keeping AI systems safe from bad people; AI Safety is about keeping people safe from bad AI systems.
7. One trillion dollars of AI infrastructure investment in search of ROI redefines the business model for multimodal
Two Goldman Sachs reports recently mused on the ROI of a trillion dollars of capital expenditure spent on generative AI infrastructure and whether it was too much spend for too little benefit.
The dance of narratives of financial investment in AI is always interesting, with investment executives contorting themselves in market speak to justify FOMO.
The equation for the future of AI compute may be highly complex:
- Model sizes are up, with parameter count is up 2.8x a year since 2018. But scaling laws are being called into question.
- Model numbers are up, but we should be careful to not confuse model counts with fine-tuning. It is worth noting that of the 1.2m+ models on HuggingFace, over 50,000 of them may be fine-tuned variants of the Llama model family alone.
- Dependence of general purpose GPUs will likely be down in the medium term, with AMD (launching the M1300X last year), and hyperscalers (Microsoft Azure launched the Maia AI Accelerator) seeking to provide alternatives.
- Inference is being driven up by end user adoption, especially of high cost multimodal models, and back-end use of agentic workflows.
- Efficiencies are also up, with an steep rise in optimization techniques both across the GenAI pipeline and down the deployment stack. They are too numerous to enumerate but include components covering models (SLMs), prompt pipelines (compression), deployment (quantization) and infrastructure (performance scalability and optimization).
…but perhaps the more important issue is that for every number in the above list, someone not paying much today will have to pay for it tomorrow.
Much of that must hit the enterprise and end users, and we should expect to be the subject of pricing experiments aplenty in 2025.
EDIT: Since I drafted this, OpenAI has introduced a $200USD/month pro tier.
8. AI forces companies to re-confront citizen development
Citizen development has always been a grey area, an oxymoronic no-man’s land where labels and categories attempted to tame. Ultimately, somewhere between the ubiquitous Excel macro and a full stack application deployed in a production environment, lines need to be drawn on where proper technology risk controls are applied.
The trend towards higher level languages, libraries and frameworks with more abstraction has been a constant feature of software, as has the complementary trend towards low-code/no-code development. But it has been generative AI that is bringing perhaps its most powerful challenge yet.
Natural language being a programming language is the latest and largest challenge that seeks to bring down the walls between user and developer.
Regardless of what people think about generative AI, one if its most important features is how it has decisively crossed over from the realm of data scientists to become a consumer technology. And organizations now find themselves exposed to the risk of a new generation of citizen developers actively playing with — or armed with — AI and they have little choice but to confront it, or bear the risk of inaction.
9. Impending AI regulations creates an AI compliance industry
At the time of writing, there are no less than 1,800 national policies and strategies in play worldwide.
These have also spilled over into the legal realm in a number of areas previously less impacted prior to generative AI — In 2022, there were 110 AI-related legal cases in United States state and federal courts, roughly seven times more than in 2016. The majority of these cases originated in California, New York, and Illinois, and concerned issues relating to civil, intellectual property, and contract law.
What all these means is:
Discussions on responsible AI that were primarily among practitioners have now decisively moved from the lab and the office to the boardroom and the courtroom.
And this is a good thing.
With AI rapidly becoming a consumer technology and the principles behind it simple enough to broadly understand, it is time to put to bed the myth that only technology companies can understand it and embrace broader regulation.
However, this also means companies the world over need to evolve to comply with new laws, regulations, or at least internal policies. And this is no easy task.
To fill this gap, there will undoubtedly be a whole industry growing up to help them. Driven by the motivation of high principles mixed in with baser interests of profits and prestige.
10. New ways to think about AI and new business models
As we wind up, some ideas stand the test of time, and the way we get real value from AI is one. The breathless proclamations of the wildly varying ‘market size’ of AI mean little in terms of actual value for your workplace.
The AI market measures dollars spent, not value gained.
As I wrote in a similar article covering data science and AI predictions for 2020, getting real value from data science and AI is still a long and difficult journey.
And the root cause of it has little to do with AI as a technology. Or to be more specific, AI is a physical technology which evolve at the pace of science, but the bottleneck are often social technologies such as incentives, mindsets and institutions, which can only evolve at the pace at which humans can change — far, far slower.
To all friends and readers who have made it this far, I believe we have yet to see groundbreaking applications, and it is less a failure of technology but more a failure of imagination and incentives.
The most important models we can train are mental models, and the most important models to deploy are business models.