About & methodology

How the dataset works and what the numbers mean.

What it is

Schema.org, in collaboration with Google, publishes a public dataset on the real-world usage of vocabulary terms (types like Person or properties like price) across millions of web domains. This dashboard offers a visual, observatory-style reading of it.

How the data is gathered

Gather: term frequencies are measured within Google's public web crawling infrastructure and aggregated per domain (not per page): using a term on 100 pages of the same site counts as one domain.
Group: instead of exact (noisy) numbers, each term's domain count is reported as a popularity tier, for stability and privacy.
Publish: a new file is published on GitHub every month.
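
The per-domain counting in the Gather step can be illustrated with a small Python sketch. This is not the pipeline that produces the dataset, only a toy model of the rule "100 pages on one site count as one domain". The function name `domains_per_term`, the input shape (pairs of page URL and term), and the use of the URL hostname as a stand-in for "domain" are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domains_per_term(observations):
    """Count distinct domains per term.

    `observations` is an iterable of (page_url, term) pairs.
    Assumption: the URL hostname stands in for "domain"; the real
    dataset may define domains differently.
    """
    seen = defaultdict(set)
    for page_url, term in observations:
        host = urlparse(page_url).hostname
        if host:
            # A set keeps each host once, however many pages use the term.
            seen[term].add(host.lower())
    return {term: len(hosts) for term, hosts in seen.items()}


# Hypothetical sample: three Product pages on one shop, one on another.
sample = [
    ("https://shop.example/p/1", "Product"),
    ("https://shop.example/p/2", "Product"),
    ("https://shop.example/p/3", "Product"),
    ("https://other.example/item", "Product"),
    ("https://blog.example/about", "Person"),
]

counts = domains_per_term(sample)
print(counts)
```

In this sample, Product appears on four pages but only two hosts, so it counts as two domains.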

The domain tiers

Each term is classified into one of these tiers, from the lowest to the highest number of unique domains:

< 1K
1K - 10K
10K - 100K
100K - 1M
1M - 10M
10M+

The < 1K tier includes both brand-new terms and highly specialized ones (e.g. medical or government): it does not mean they are ignored.
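
As a rough illustration of the tiers, the Python sketch below maps a domain count to a tier label. The published files already contain the tier, so this is only useful for reasoning about the boundaries. The function name `tier_for` is hypothetical, and treating each lower bound as inclusive (for example, exactly 1,000 domains falls in "1K - 10K") is an assumption, since the page does not say how boundary values are handled.

```python
# (exclusive upper bound, label) pairs, lowest tier first.
# Assumption: lower bounds are inclusive, upper bounds exclusive.
TIERS = [
    (1_000, "< 1K"),
    (10_000, "1K - 10K"),
    (100_000, "10K - 100K"),
    (1_000_000, "100K - 1M"),
    (10_000_000, "1M - 10M"),
]


def tier_for(domain_count):
    """Return the tier label for a number of unique domains."""
    if domain_count < 0:
        raise ValueError("domain_count must be non-negative")
    for upper_bound, label in TIERS:
        if domain_count < upper_bound:
            return label
    return "10M+"


for n in (42, 1_000, 250_000, 10_000_000):
    print(n, "->", tier_for(n))
```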

Important notes

The data does not distinguish between JSON-LD, Microdata or RDFa: they are counted together.
The statistics reflect the web as indexed by Google; no crawl covers the entire web.
The format is open: other crawlers can contribute their own statistics in the same format.

Get the data

Download the raw data for the latest month:

CSV (July 2026)
JSON (July 2026)

This project

This is an independent community project. It is not affiliated with, sponsored by, or endorsed by Schema.org or its founders.

Available months: May 2026, June 2026, July 2026.

SchemaStatsBot

When someone uses the validator, our crawler fetches the submitted page on their behalf. It respects robots.txt, does not bulk-crawl the web, and can be blocked at any time.
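
Site owners who want to check what a robots.txt rule would mean for the bot can test it locally with Python's standard `urllib.robotparser`. The sketch below is a minimal example under two assumptions: that the crawler identifies itself with the user-agent token `SchemaStatsBot` (taken from the name above; the bot documentation is the authority on the exact token), and that it applies rules the way the standard-library parser does. The helper `is_allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="SchemaStatsBot"):
    """Return True if `robots_txt` permits `user_agent` to fetch `url`.

    Assumption: "SchemaStatsBot" is the user-agent token the crawler
    matches against; check the bot documentation for the real value.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules that block only this bot from the whole site.
BLOCKING_RULES = """\
User-agent: SchemaStatsBot
Disallow: /
"""

print(is_allowed(BLOCKING_RULES, "https://example.com/page"))
print(is_allowed(BLOCKING_RULES, "https://example.com/page", "OtherBot"))
```

With these rules, the first check reports the bot as blocked, while a crawler with a different user-agent token is unaffected.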

Bot documentation