About & methodology
How the dataset works and what the numbers mean.
What it is
Schema.org, in collaboration with Google, publishes a public dataset on the real-world usage of vocabulary terms (types like Person or properties like price) across millions of web domains. This dashboard offers a visual, observatory-style reading of it.
How the data is gathered
- Gather: term frequencies are measured within Google's public web crawling infrastructure and aggregated per domain, not per page. A term used on 100 pages of the same site counts as one domain.
- Group: instead of exact (noisy) counts, each term is placed in a popularity tier based on how many domains use it, for stability and privacy.
- Publish: a new file is published on GitHub every month.
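The per-domain counting in the Gather step can be illustrated with a minimal sketch. This is not the pipeline Google runs; the `Sighting` shape and the `domainsPerTerm` function are hypothetical names chosen for this example, and the input is assumed to already carry a domain for each page.

```typescript
// Hypothetical record: one sighting of a term on one page of one domain.
interface Sighting {
  domain: string;
  page: string;
  term: string;
}

// For each term, count the distinct domains that use it.
// Repeated use on many pages of the same domain adds nothing to the count.
function domainsPerTerm(sightings: Sighting[]): Map<string, number> {
  const domainsByTerm = new Map<string, Set<string>>();
  for (const s of sightings) {
    let domains = domainsByTerm.get(s.term);
    if (!domains) {
      domains = new Set<string>();
      domainsByTerm.set(s.term, domains);
    }
    domains.add(s.domain);
  }
  const counts = new Map<string, number>();
  domainsByTerm.forEach((domains, term) => counts.set(term, domains.size));
  return counts;
}

// Illustrative input only: "Person" appears on two pages of example.com
// and on one page of example.org, so it counts as two domains.
const sample: Sighting[] = [
  { domain: "example.com", page: "/a", term: "Person" },
  { domain: "example.com", page: "/b", term: "Person" },
  { domain: "example.org", page: "/", term: "Person" },
  { domain: "example.org", page: "/", term: "price" },
];
const sampleCounts = domainsPerTerm(sample);
```

Collecting domains in a set per term is what makes page-level repetition irrelevant: only the size of the set is reported.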
The domain tiers
Each term is classified into one of these tiers, from the lowest to the highest number of unique domains:
The < 1K tier includes both brand-new terms and highly specialized ones (e.g. medical or government): it does not mean they are ignored.
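As a rough sketch of how a domain count could map to a tier: only the < 1K tier is named in the text above, so every other boundary and label below is a placeholder assumption, as are the names `Tier`, `EXAMPLE_TIERS` and `tierFor`. Substitute the real boundaries from the published dataset before relying on it.

```typescript
// A tier is identified by a label and an exclusive upper bound on domain count.
interface Tier {
  label: string;
  below: number;
}

// ASSUMPTION: apart from "< 1K", these boundaries are illustrative placeholders,
// not the dataset's actual tiers. Tiers must be listed in ascending order.
const EXAMPLE_TIERS: Tier[] = [
  { label: "< 1K", below: 1000 },
  { label: "1K-10K", below: 10000 },
  { label: "10K-100K", below: 100000 },
];
const EXAMPLE_TOP_LABEL = "100K+";

// Return the label of the first tier whose upper bound exceeds the count,
// or the top label when the count is at or above every bound.
function tierFor(
  domainCount: number,
  tiers: Tier[] = EXAMPLE_TIERS,
  topLabel: string = EXAMPLE_TOP_LABEL
): string {
  for (const tier of tiers) {
    if (domainCount < tier.below) {
      return tier.label;
    }
  }
  return topLabel;
}
```

With these placeholder boundaries, `tierFor(42)` returns `"< 1K"` and `tierFor(1000)` returns `"1K-10K"`, since each upper bound is exclusive.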
Important notes
- The data does not distinguish between JSON-LD, Microdata or RDFa: they are counted together.
- The statistics reflect the web as indexed by Google; no crawl covers the entire web.
- The format is open: other crawlers can contribute their own statistics in the same format.
Get the data
Download the raw data for the latest month:
This project
A static site built with Next.js: data is embedded at build time and served from the CDN, with no database or runtime functions. A monthly GitHub Action downloads any new snapshots and rebuilds the site, keeping costs near zero.
Available months: May 2026.