About & methodology

How the dataset works and what the numbers mean.

What it is

Schema.org, in collaboration with Google, publishes a public dataset on the real-world usage of vocabulary terms (types like Person or properties like price) across millions of web domains. This dashboard offers a visual, observatory-style reading of it.

How the data is gathered

  • Gather: term frequencies are measured within Google's public web crawling infrastructure and aggregated per domain, not per page. A term used on 100 pages of the same site still counts as one domain.
  • Group: instead of exact (noisy) numbers, each term is placed in a popularity tier based on how many domains use it, for stability and privacy.
  • Publish: a new file is published on GitHub every month.
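
To make the per-domain counting concrete, here is a minimal sketch in Python. It is not the pipeline Google runs: the (domain, page, term) tuple layout and the function name domains_per_term are assumptions made for illustration only.

```python
from collections import defaultdict


def domains_per_term(observations):
    """Count distinct domains per term.

    `observations` is an iterable of (domain, page, term) tuples.
    This input layout is hypothetical, chosen only for illustration.
    """
    seen = defaultdict(set)
    for domain, _page, term in observations:
        # A set ignores repeats, so many pages on one site add one domain.
        seen[term].add(domain)
    return {term: len(domains) for term, domains in seen.items()}


observations = [
    ("example.com", "/a", "Person"),
    ("example.com", "/b", "Person"),  # same site, different page
    ("example.org", "/", "Person"),
    ("example.com", "/a", "price"),
]
counts = domains_per_term(observations)
print(counts)  # {'Person': 2, 'price': 1}
```

Person appears on two pages of example.com but contributes a single domain there, which is the behavior the Gather step describes.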

The domain tiers

Each term is classified into one of these tiers, from the lowest to the highest number of unique domains:

  • < 1K
  • 1K - 10K
  • 10K - 100K
  • 100K - 1M
  • 1M - 10M
  • 10M+

The < 1K tier includes both brand-new terms and highly specialized ones (e.g. medical or government), so a term in this tier is not necessarily being ignored.
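
As an illustration of how a domain count maps to a tier, the sketch below uses the six tier labels listed on this page. The exact boundary handling is an assumption (each lower bound is treated as inclusive, so exactly 1,000 domains falls in 1K - 10K), and the names TIER_BOUNDS, TIER_LABELS and tier_for are hypothetical. The published data contains only the tier, so the sketch shows the idea rather than a way to recover counts.

```python
import bisect

# Lower bounds of every tier above the first (assumed inclusive).
TIER_BOUNDS = [1_000, 10_000, 100_000, 1_000_000, 10_000_000]
TIER_LABELS = ["< 1K", "1K - 10K", "10K - 100K", "100K - 1M", "1M - 10M", "10M+"]


def tier_for(domain_count: int) -> str:
    """Return the tier label for a number of unique domains."""
    if domain_count < 0:
        raise ValueError("domain_count must be non-negative")
    # bisect_right counts how many bounds are <= domain_count,
    # which is exactly the index of the matching label.
    return TIER_LABELS[bisect.bisect_right(TIER_BOUNDS, domain_count)]


for n in (12, 1_000, 250_000, 10_000_000):
    print(n, "->", tier_for(n))
```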

Important notes

  • The data does not distinguish between JSON-LD, Microdata or RDFa: they are counted together.
  • The statistics reflect the web as indexed by Google; no crawl covers the entire web.
  • The format is open: other crawlers can contribute their own statistics in the same format.

Get the data

Download the raw data for the latest month:

This project

A static site built with Next.js: data is embedded at build time and served from the CDN, with no database or runtime functions. A monthly GitHub Action downloads any new snapshots and rebuilds the site, keeping costs near zero.

Available months: May 2026.

Sources