Dataset Schema Generator — Describe and Distribute Your Data

Generate valid Dataset JSON‑LD with creator, license, coverage, and file distributions (CSV/JSON). Help search engines understand your datasets and surface them in dataset search.

Why many dataset pages underperform

Pain points we solve

Datasets lack a clear description of scope and variables.
No creator or license is declared, creating ambiguity for reuse.
Downloads are listed but not modeled as distributions with formats.
Temporal or spatial coverage is missing, reducing context.

How SwiftSchema helps

Solution

The Dataset generator guides the essentials: name, precise description, creator (Organization or Person), license, and publication date with keywords.

It supports multiple distributions using DataDownload entries (each with contentUrl and encodingFormat), plus temporal/spatial coverage and variableMeasured.

How it works

Choose Dataset in the generator below.
Enter name, a concise description of scope and variables, and a creator.
Declare license and datePublished; add keywords for discoverability.
For downloads, add distributions with contentUrl and encodingFormat (CSV, JSON).
Include temporalCoverage, spatialCoverage, and variableMeasured where applicable; validate in the Rich Results Test.

What is Dataset structured data?

Dataset schema describes datasets so they’re discoverable via Google Dataset Search, knowledge panels, and research tools. It captures what the dataset contains, who created it, how it’s licensed, how to access it (distributions), and contextual metadata (temporal/spatial coverage, variables, formats). Quality metadata increases trust and reuse, which helps researchers, journalists, and API consumers find and work with the data.

Essential properties

`name`: human-readable title.
`description`: summary covering scope, methodology, and coverage; avoid marketing fluff.
`creator` / `publisher`: Organization or Person responsible for creating and publishing the dataset.
`license`: URL to license terms (CC BY, proprietary, etc.).
`datePublished` / `dateModified`: publication and update timestamps.
`keywords`: tags for search (topic, domain, dataset type).
`distribution`: array of DataDownload objects with `contentUrl`, `encodingFormat`, `name`, and `contentSize` where applicable.
`temporalCoverage` / `spatialCoverage`: coverage period and geography.
`variableMeasured`: list of variables/columns.
`identifier` / `sameAs`: DOIs, catalog IDs, or canonical references.
`version`: optional; note dataset version or release tag.

Content prep checklist

Write a concise overview addressing what, who, where, when, how (methodology), and why the data exists.
Gather metadata: creator, publisher, license, release date, update cadence, contact info.
Compile download URLs (CSV, JSON, Parquet, API endpoints) and note their content type, size, and access method (open, requires login).
Document key variables/columns, units, and data dictionary references.
Capture temporal span (e.g., `2018-01/2024-12`) and geographic coverage (city, country, global).
Include contact or access instructions for restricted datasets.
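
The temporal span in the checklist above is an ISO 8601 interval at month precision. As a minimal sketch, assuming you hold the first and last observation dates as Python `date` objects, a small helper (the name `temporal_coverage` is hypothetical) can produce the string:

```python
from datetime import date

def temporal_coverage(start: date, end: date) -> str:
    """Format a month-precision interval such as 2018-01/2024-12."""
    if end < start:
        raise ValueError("end must not precede start")
    # date objects accept strftime codes in format specs.
    return f"{start:%Y-%m}/{end:%Y-%m}"

print(temporal_coverage(date(2018, 1, 1), date(2024, 12, 31)))
```

Month precision matches the example in this guide; if your data has a finer cadence, full dates work the same way.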

Implementation workflow

Update the dataset page with the metadata above and ensure downloads are accessible.
Generate Dataset JSON‑LD with `name`, `description`, `creator`, `license`, `datePublished`, and `keywords`.
Add distributions for each file type; include `contentUrl` and `encodingFormat`, plus optional `name`, `contentSize`, and `requiresSubscription`.
Include coverage metadata (`temporalCoverage`, `spatialCoverage`, `variableMeasured`) when available.
Embed the JSON‑LD on the dataset landing page; avoid duplicating across multiple pages unless they represent distinct datasets.
Validate using Google’s Rich Results Test and the Schema Markup Validator (validator.schema.org).
Update regularly: bump `dateModified` and `version` when data changes; refresh distributions if URLs or sizes change.
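
If you maintain dataset pages from a script rather than by hand, the generate-and-embed steps above might look roughly like this. It is a sketch, not the generator's actual implementation: the function names `build_dataset` and `to_script_tag` are hypothetical, the creator is assumed to be an Organization, and the sample values are borrowed from the minimal example in this guide.

```python
import json

def build_dataset(name, description, creator_name, license_url,
                  date_published, keywords, downloads):
    """Assemble a Dataset node; downloads is a list of (url, mime_type) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Assumption: the creator is an Organization; use "Person" if not.
        "creator": {"@type": "Organization", "name": creator_name},
        "license": license_url,
        "datePublished": date_published,
        "keywords": list(keywords),
        "distribution": [
            {"@type": "DataDownload", "contentUrl": url, "encodingFormat": mime}
            for url, mime in downloads
        ],
    }

def to_script_tag(dataset):
    """Serialize the node and wrap it for embedding in the landing page."""
    # Escape "</" so a string value cannot close the script element early.
    payload = json.dumps(dataset, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

dataset = build_dataset(
    name="City Bike Trips 2024",
    description="Aggregated daily bike trip counts by station for 2024.",
    creator_name="Example City DOT",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    date_published="2025-01-10",
    keywords=["bicycles", "mobility"],
    downloads=[
        ("https://example.com/datasets/city-bike-trips-2024.csv", "text/csv"),
        ("https://example.com/datasets/city-bike-trips-2024.json", "application/json"),
    ],
)
print(to_script_tag(dataset))
```

Paste the printed tag into the landing page template, then run the validation step as usual.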

Linking datasets to other resources

Use `isPartOf` to link dataset collections (e.g., “Traffic Data Portal”) or parent publications.
Reference supporting documentation (methodology PDF, data dictionary) via `citation` or `distribution`.
Link to related APIs using `distribution` entries with `contentUrl` pointing to API endpoints.
Provide `sameAs` links to GitHub repos, DOIs, or government catalog entries so search engines can cross-reference.
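
To make the linking properties concrete, here is one possible shape for them, written as a Python helper that adds the links to an existing Dataset dictionary. The helper name `with_links`, the DOI, and the URLs are all made up for illustration, and typing the parent collection as a `Dataset` is an assumption; pick whichever CreativeWork type fits your collection.

```python
def with_links(dataset, collection_name, doi_url, same_as, docs_url):
    """Return a copy of the dataset node with cross-reference properties added."""
    linked = dict(dataset)
    # Assumption: the parent collection is modeled as a Dataset node.
    linked["isPartOf"] = {"@type": "Dataset", "name": collection_name}
    linked["identifier"] = doi_url
    linked["sameAs"] = list(same_as)
    # Supporting documentation referenced as a plain URL.
    linked["citation"] = docs_url
    return linked

base = {"@type": "Dataset", "name": "City Bike Trips 2024"}
linked = with_links(
    base,
    collection_name="Traffic Data Portal",
    doi_url="https://doi.org/10.1234/example",  # hypothetical DOI
    same_as=["https://github.com/example/city-bike-trips"],  # hypothetical repo
    docs_url="https://example.com/datasets/city-bike-trips-2024/methodology.pdf",
)
print(linked["isPartOf"]["name"])
```

Returning a copy keeps the base node reusable when several pages share the same core metadata.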

Versioning and maintenance

Include `version` or mention versions in the description; update `dateModified` whenever new data is added.
Maintain change logs noting schema updates or variable changes.
If you retire a dataset, mark it as deprecated in the content and remove distribution links or update them to note archival status.
Monitor download endpoints for uptime; broken `contentUrl` links hurt trust.
Use stable `@id` values if referencing the dataset elsewhere; assign the ID once and reuse it rather than regenerating it with each release.
Keep `sameAs` links current when moving repos/catalog entries; add redirects when possible.
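
A release step along these lines can keep `version` and `dateModified` in sync while leaving `@id` untouched. This is a sketch under assumptions: the helper name `bump_release` and the `@id` value are hypothetical, and your own pipeline may store metadata differently.

```python
from datetime import date

def bump_release(dataset: dict, new_version: str, modified: date) -> dict:
    """Return a copy with version and dateModified updated; @id is preserved."""
    updated = dict(dataset)
    updated["version"] = new_version
    updated["dateModified"] = modified.isoformat()
    return updated

current = {
    "@id": "https://example.com/datasets/city-bike-trips-2024#dataset",  # hypothetical
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "version": "1.0",
    "datePublished": "2025-01-10",
}
released = bump_release(current, "1.1", date(2025, 4, 1))
print(released["version"], released["dateModified"])
```

`datePublished` stays at the original release date; only the modification date moves forward.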

Troubleshooting checklist

Missing distributions: ensure each DataDownload entry has `contentUrl` and `encodingFormat`.
Vague descriptions: provide detailed summaries; avoid boilerplate marketing copy.
Absent license: link to a public license or usage terms.
No coverage metadata: include temporal and spatial coverage when the dataset is time- or location-specific.
Unlinked identifiers: add DOIs or catalog IDs in `identifier` or `sameAs`.
No publisher/creator: include at least one of them; prefer both.
Encoding format missing: specify MIME types (`text/csv`, `application/json`, `application/zip`).
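
Several items on this checklist can be caught automatically before you reach a validator. The function below is a rough sketch of such a pre-check; `lint_dataset` is a hypothetical name, and the rules cover only the structural items above (it cannot judge whether a description is vague).

```python
def lint_dataset(dataset: dict) -> list[str]:
    """Return a list of human-readable problems found in a Dataset node."""
    problems = []
    for key in ("name", "description"):
        if not dataset.get(key):
            problems.append(f"missing required property: {key}")
    if not dataset.get("license"):
        problems.append("absent license")
    if not (dataset.get("creator") or dataset.get("publisher")):
        problems.append("no creator or publisher")
    if not (dataset.get("identifier") or dataset.get("sameAs")):
        problems.append("no identifier or sameAs")
    distributions = dataset.get("distribution") or []
    if isinstance(distributions, dict):
        # A single DataDownload may be given as an object instead of a list.
        distributions = [distributions]
    for index, entry in enumerate(distributions):
        for key in ("contentUrl", "encodingFormat"):
            if not entry.get(key):
                problems.append(f"distribution[{index}] missing {key}")
    return problems

sample = {
    "name": "City Bike Trips 2024",
    "description": "Aggregated daily bike trip counts by station for 2024.",
    "distribution": [{"@type": "DataDownload", "contentUrl": "https://example.com/a.csv"}],
}
for problem in lint_dataset(sample):
    print(problem)
```

An empty list means the structural checks passed; it does not replace running the page through a validator.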

Common Errors & Fixes

Missing distribution details: include `contentUrl`, `encodingFormat`, and descriptive names for each file.
Vague description: summarize content, collection method, and coverage clearly.
Absent license: link to a public license or usage terms.
No creator/publisher: specify the owning organization or person for credibility.
Outdated metadata: update `dateModified` and distributions when data changes.

On-page parity checklist

Dataset title and summary on-page match `name` and `description`.
Creator/publisher on-page match `creator` / `publisher` in schema; license link is visible and matches `license`.
Download links on-page match `distribution.contentUrl` and formats; sizes (if shown) align.
Temporal/spatial coverage on-page matches `temporalCoverage` / `spatialCoverage`.
Variables/columns documented on-page align with `variableMeasured`.
DOI or catalog IDs on-page match `identifier` / `sameAs`.
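
The download-link item is the easiest parity check to automate. As a sketch, assuming you have already collected the download URLs visible on the page into a set (how you extract them is up to you), a comparison like this shows which side is out of date; `parity_gaps` is a hypothetical helper name.

```python
def parity_gaps(page_links: set[str], dataset: dict) -> dict[str, set[str]]:
    """Compare on-page download links with distribution contentUrl values."""
    schema_urls = {
        entry["contentUrl"]
        for entry in dataset.get("distribution", [])
        if "contentUrl" in entry
    }
    return {
        "missing_from_schema": page_links - schema_urls,
        "missing_from_page": schema_urls - page_links,
    }

page_links = {
    "https://example.com/datasets/city-bike-trips-2024.csv",
    "https://example.com/datasets/city-bike-trips-2024.parquet",
}
node = {
    "distribution": [
        {"contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv"},
        {"contentUrl": "https://example.com/datasets/city-bike-trips-2024.json"},
    ]
}
print(parity_gaps(page_links, node))
```

Both sets empty means the page and the schema list the same files.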

Validation and QA

Run the Rich Results Test and the Schema Markup Validator after schema updates.
Check that `contentUrl` endpoints return 200 and are publicly accessible (or clearly marked as restricted).
Verify `encodingFormat` values are valid MIME types and correspond to the actual files.
Spot-check `dateModified` against file timestamps; keep it fresh when data updates.
Crawl periodically to detect broken download links or stale versions.
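
The `encodingFormat` check can run offline by comparing each declared MIME type with the file extension in `contentUrl`. The sketch below assumes a small hand-maintained extension table (`EXPECTED_FORMATS`, covering only the three types named in this guide) and a hypothetical helper name; it does not fetch the files, so the HTTP 200 check still needs a separate crawl.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: extend this table with the formats your catalog publishes.
EXPECTED_FORMATS = {
    ".csv": "text/csv",
    ".json": "application/json",
    ".zip": "application/zip",
}

def format_mismatches(dataset: dict) -> list[str]:
    """List distributions whose encodingFormat disagrees with the URL extension."""
    mismatches = []
    for entry in dataset.get("distribution", []):
        url = entry.get("contentUrl", "")
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        expected = EXPECTED_FORMATS.get(suffix)
        declared = entry.get("encodingFormat")
        # Unknown extensions (for example API endpoints) are skipped.
        if expected is not None and declared != expected:
            mismatches.append(f"{url}: declared {declared!r}, expected {expected!r}")
    return mismatches

checked = {
    "distribution": [
        {"contentUrl": "https://example.com/data/trips.csv", "encodingFormat": "text/csv"},
        {"contentUrl": "https://example.com/data/trips.json", "encodingFormat": "text/csv"},
    ]
}
for line in format_mismatches(checked):
    print(line)
```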

Required properties

name
description

Recommended properties

url
sameAs[]
identifier
creator.name
publisher.name
license
keywords[]
datePublished
dateModified
distribution.contentUrl
distribution.encodingFormat
distribution.contentSize
distribution.name
temporalCoverage
spatialCoverage.name
variableMeasured[]
isPartOf.name

Minimal Dataset JSON-LD

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "City Bike Trips 2024",
  "description": "Aggregated daily bike trip counts by station for 2024.",
  "url": "https://example.com/datasets/city-bike-trips-2024",
  "creator": {
    "@type": "Organization",
    "name": "Example City DOT"
  },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "datePublished": "2025-01-10",
  "keywords": [
    "bicycles",
    "mobility",
    "trips",
    "transportation"
  ],
  "variableMeasured": [
    "trips",
    "station_id"
  ],
  "temporalCoverage": "2024-01/2024-12",
  "spatialCoverage": {
    "@type": "Place",
    "name": "Example City"
  },
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
      "encodingFormat": "text/csv"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/datasets/city-bike-trips-2024.json",
      "encodingFormat": "application/json"
    }
  ]
}

FAQs

What’s essential for Dataset structured data?

Provide a clear `name` and `description`. Add a `creator`, `license`, and at least one `distribution` with `contentUrl` and `encodingFormat` when you can.

How do I represent multiple file formats?

Use multiple `distribution` entries, each with its own `contentUrl` and `encodingFormat` (for example, CSV, JSON).

Can I include a DOI?

Yes. Put the DOI in `identifier` and/or `sameAs`. Ensure it resolves publicly.

What about time and location?

Use `temporalCoverage` for the time span (e.g., `2018-01/2024-12`) and `spatialCoverage` with a place `name` for geographic scope.
