The Proprietary Data Theme
June 15, 2026

Pricing power may move to proprietary data.
You see the same AI trade everywhere right now. Chips, data centers, power, cooling, cloud, networking, and frontier models. Those areas still matter. Hyperscaler spending has not slowed, and AI infrastructure demand still looks strong.
But you should also pay attention to another part of the stack: the data layer. I do not mean broad internet data such as scraped web pages, social media posts, forums, or reviews. I mean data tied to real industries: insurance claims, medical outcomes, clinical trials, satellite imagery, credit files, employment records, commercial real estate comps, official sports feeds, and scientific journals.
These datasets matter because companies do not gather them overnight. They sit inside real workflows. They come from claims, tests, trials, records, property transactions, employment checks, official events, and other real activity. That makes them harder to copy.
Why generic data is not enough
The first mistake is treating all data the same. Not all data is equal. Generic data has broad uses. It includes web articles, forums, social media posts, reviews, and other public content.
Reddit is a good example. It has valuable conversation, niche communities, and real opinions written by real people. But Reddit data is broad. It helps general AI models sound more human.
That differs from hard-to-gather enterprise data. A general AI model can explain underwriting, but it does not own decades of claims history. A general AI model can explain oncology, but it does not own patient-level clinical, genomic, trial, and outcomes data. A general model can explain cap rates, but it does not own verified commercial real estate comps built over decades.
Sports betting is another clear example. A general model can explain live odds, but it does not own official real-time league data. Generic data helps general models; proprietary data helps specialized enterprise AI.
The model layer gets crowded
AI infrastructure was the first big trade. Chips, data centers, power, cooling, and networking all worked because AI requires massive compute. That trade still matters because hyperscalers keep spending. But where does durable pricing power settle?
I do not think consumer chatbots capture all of it. Competition keeps rising. Models keep improving. Open source keeps closing the gap. The better question is simple: what happens when models start to look alike?
OpenAI, Anthropic, Google, Meta, xAI, and open source developers all push in the same direction. They want faster models, cheaper inference, and better performance. That helps customers. It also pressures the model layer.
If model performance converges, scarce inputs gain value. The scarce input is data.
Enterprise AI needs industry data
Enterprises do not want a generic chatbot. They want AI built around their own workflows. Insurers want underwriting and claims automation. Pharma companies want help with trial design and real-world evidence.
Law firms want legal research with sources. Banks want fraud detection and credit decisioning. Real estate firms want better property intelligence. Sportsbooks want official real-time data. Governments want geospatial and defense analytics.
Each use case needs specialized data. The model matters, but the data grounds the model. If a company owns a dataset no one else has, you should pay attention. If that dataset makes an AI product more accurate, more useful, or safer for enterprise use, the company has leverage.
That is the whole point. The model layer gets more competitive. The data layer stays scarce.
The screen
Almost every company claims to have data.
That does not make every company a good proprietary data investment. You need a tighter screen. First, the data has to be hard to recreate. It should come from real-world activity, such as claims, tests, trials, records, property transactions, employment checks, official events, and satellite captures.
Second, the data has to matter for specialized enterprise AI. You are not looking for data that makes a chatbot sound better. You are looking for data that improves high-value workflows. Third, the data has to generate revenue through licensing, AI tools, APIs, analytics, workflow agents, pricing power, or better retention.
Fourth, the stock has to make sense. A strong dataset does not fix a bad balance sheet. It does not fix heavy dilution. It does not fix a stock already priced for a perfect outcome.
The watchlist
IQVIA is the healthcare and pharma data platform. You get exposure to prescription data, claims data, real-world evidence, and clinical trial data. Healthcare AI needs trusted data, compliant data, and data tied to real outcomes. That makes IQVIA one of the cleaner ways to study this theme.
Verisk is the insurance data moat. You get claims, loss, property, peril, catastrophe, and risk data. Insurers use this data for underwriting, pricing, claims, and risk models. A frontier AI lab does not recreate that history from public web data.
CoStar is the commercial real estate data moat. Public listings are easy to find. Verified commercial real estate comps are harder. CoStar has spent decades building property data, market data, apartment data, and real estate intelligence. That data gains value if AI becomes a bigger part of real estate workflows.
Gartner is the contrarian name. At first glance, AI looks like a threat because AI tools summarize research. That pressures research seats and advisory products. But Gartner also owns decades of proprietary research, benchmarks, Magic Quadrants, Hype Cycles, and client workflow data.
You need to answer one question with Gartner. Does AI weaken Gartner’s front end, or does it increase the value of Gartner’s corpus?
Wiley is the scientific publishing AI licensing play. Wiley owns peer-reviewed scientific content. AI companies need high-quality, permissioned content from trusted journals. Wiley is also working to grow AI-related revenue, and the Emerald acquisition adds more research content.
Tempus is the higher-growth clinical and genomic data play. The data fits the theme well because it includes clinical data, genomic data, oncology data, and outcomes data. That data is scarce. The stock carries more risk, so valuation, profitability, reimbursement, and execution all matter here.
Equifax owns credit, identity, income, and employment data. The Work Number is the key asset. Verified income and employment data is hard to infer. Lenders, employers, and benefit systems need accurate information, even though regulation limits what Equifax can do with the data.
Morningstar is interesting because of PitchBook. Morningstar has fund data and ratings, but PitchBook is the stronger proprietary data asset. Private company data is fragmented and hard to collect. It matters for deal sourcing, private credit, valuation work, and research.
Genius Sports and Sportradar own official sports data rights. This category looks less obvious at first, but official sports data has real scarcity. It is live, licensed, and tied to league contracts. It matters for betting, media, prediction markets, and integrity products. AI improves the products, but the products still need the official feed.
Planet Labs owns a physical world data archive. The company images the Earth over time, and that historical record has value. No one goes back and captures yesterday. That matters for defense, agriculture, climate, insurance, supply chains, and geospatial analytics.
The stock, PL, has already had a large move.
How to bucket the list
The high quality data compounders are IQVIA, Verisk, CoStar, and Morningstar. These companies have real businesses, real customers, and durable data assets.
The contrarian rerating names are Gartner, Wiley, and Equifax. Their data assets deserve more attention.
The red flags
A company with data does not deserve a premium by default.
Generic data is the first red flag. If the data is public, scraped, or easy to copy, the moat is weak.
Internal-only use is another. Some companies have massive datasets, but they use the data only to improve their own business rather than licensing it or building products on it.
AI also threatens some business models, especially in publishing, legal, research, and information services. AI can increase the value of the source data while also attacking the user interface.
Valuation matters. A great dataset at a bad price still creates risk. Balance sheet risk matters too. Debt, cash burn, and dilution change the trade.
The final point
The proprietary data theme is not about buying every company with data. You want scarce data. You want vertical data. You want data tied to real workflows. You want data AI labs do not already have.
The best AI data stocks might not look like AI stocks. They might look like insurance data companies, healthcare analytics companies, real estate platforms, research publishers, credit bureaus, sports data vendors, or satellite companies.
Models may commoditize. Scarce data does not.
-Pierce
Note: This article does not provide investment advice. The stocks mentioned should not be taken as recommendations. Your investments are solely your decisions.