Dataset Schema Generator — Describe and Distribute Your Data
Generate valid Dataset JSON‑LD with creator, license, coverage, and file distributions (CSV/JSON), so search engines and readers can understand and find your data.
Why many dataset pages underperform
Pain points we solve
- Datasets lack a clear description of scope and variables.
- No creator or license is declared, creating ambiguity for reuse.
- Downloads are listed but not modeled as distributions with formats.
- Temporal or spatial coverage is missing, reducing context.
How SwiftSchema helps
Solution
The Dataset generator guides the essentials: name, precise description, creator (Organization or Person), license, and publication date with keywords.
It supports multiple distributions using DataDownload entries (each with contentUrl and encodingFormat), plus temporal/spatial coverage and variableMeasured.
How it works
- Choose Dataset in the generator below.
- Enter name, a concise description of scope and variables, and a creator.
- Declare license and datePublished; add keywords for discoverability.
- For downloads, add distributions with contentUrl and encodingFormat (CSV, JSON).
- Include temporalCoverage, spatialCoverage, and variableMeasured where applicable; validate in the Rich Results Test.
Paste once per dataset. Validate. Ship.
What is Dataset structured data?
Dataset schema describes datasets so they’re discoverable via Google Dataset Search, knowledge panels, and research tools. It captures what the dataset contains, who created it, how it’s licensed, how to access it (distributions), and contextual metadata (temporal/spatial coverage, variables, formats). Quality metadata increases trust and reuse, which helps researchers, journalists, and developers who consume the data through public APIs.
Essential properties
- name: human-readable title.
- description: summary covering scope, methodology, and coverage; avoid marketing fluff.
- creator/publisher: Organization or Person responsible for the dataset.
- license: URL to license terms (CC BY, proprietary, etc.).
- datePublished/dateModified: publication and update timestamps.
- keywords: tags for search (topic, domain, dataset type).
- distribution: array of DataDownload objects with contentUrl, encodingFormat, name, and contentSize where applicable.
- temporalCoverage/spatialCoverage: coverage period and geography.
- variableMeasured: list of variables/columns.
- identifier/sameAs: DOIs, catalog IDs, or canonical references.
Content prep checklist
- Write a concise overview addressing what, who, where, when, how (methodology), and why the data exists.
- Gather metadata: creator, publisher, license, release date, update cadence, contact info.
- Compile download URLs (CSV, JSON, Parquet, API endpoints) and note their content type, size, and access method (open, requires login).
- Document key variables/columns, units, and data dictionary references.
- Capture temporal span (e.g., 2018-01/2024-12) and geographic coverage (city, country, global).
- Include contact or access instructions for restricted datasets.
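The inventory gathered in this checklist maps fairly directly onto JSON‑LD values. Below is a minimal Python sketch of that mapping, not part of the generator itself. The helper names (data_download, temporal_coverage), the hand-maintained media type table, and the "KB" formatting for contentSize are all assumptions made for illustration.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hand-maintained table (an assumption for this sketch); extend it for other formats you publish.
MEDIA_TYPES = {".csv": "text/csv", ".json": "application/json"}


def data_download(content_url, name=None, size_bytes=None):
    """Build one DataDownload entry, inferring encodingFormat from the file extension."""
    suffix = PurePosixPath(urlparse(content_url).path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"no media type known for {content_url!r}")
    entry = {
        "@type": "DataDownload",
        "contentUrl": content_url,
        "encodingFormat": MEDIA_TYPES[suffix],
    }
    if name:
        entry["name"] = name
    if size_bytes is not None:
        # contentSize is free text in schema.org; whole kilobytes is a convention chosen here.
        entry["contentSize"] = f"{round(size_bytes / 1024)} KB"
    return entry


def temporal_coverage(start, end):
    """Format a month-level ISO 8601 interval such as 2018-01/2024-12."""
    return f"{start:%Y-%m}/{end:%Y-%m}"


print(data_download("https://example.com/datasets/city-bike-trips-2024.csv", size_bytes=2048))
print(temporal_coverage(date(2018, 1, 1), date(2024, 12, 31)))
```

Raising on an unknown extension is a deliberate choice: a missing encodingFormat is easier to fix at build time than after the page ships.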
Implementation workflow
- Update the dataset page with the metadata above and ensure downloads are accessible.
- Generate Dataset JSON‑LD with name, description, creator, license, datePublished, keywords.
- Add distributions for each file type; include contentUrl, encodingFormat, and optionally name, contentSize, requiresSubscription.
- Include coverage metadata (temporalCoverage, spatialCoverage, variableMeasured) when available.
- Embed the JSON‑LD on the dataset landing page; avoid duplicating across multiple pages unless they represent distinct datasets.
- Validate using Google’s Rich Results Test and the Schema Markup Validator (validator.schema.org).
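The embed step in this workflow can be sketched in a few lines of Python. This is an illustrative helper, assuming the metadata already lives in a dict; the function name render_jsonld_script is hypothetical and the sample values are the ones used on this page.

```python
import json


def render_jsonld_script(dataset):
    """Serialize a Dataset dict into a script tag ready to paste into the landing page."""
    payload = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "</" so a string value such as "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape for "/", so parsers still read the original value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "description": "Aggregated daily bike trip counts by station for 2024.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "datePublished": "2025-01-10",
}

print(render_jsonld_script(dataset))
```

Paste the printed tag into the dataset landing page, then run the page URL through the validators.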
Linking datasets to other resources
- Use isPartOf to link dataset collections (e.g., “Traffic Data Portal”) or parent publications.
- Reference supporting documentation (methodology PDF, data dictionary) via citation or distribution.
- Link to related APIs using distribution entries with contentUrl pointing to API endpoints.
- Provide sameAs links to GitHub repos, DOIs, or government catalog entries so search engines can cross-reference.
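As a rough illustration of these linking properties, the Python sketch below adds them to an existing metadata dict. The function name with_links and every URL are placeholders invented for the example. Typing the collection as DataCatalog is one reasonable choice, since isPartOf accepts any CreativeWork.

```python
def with_links(dataset, collection_name, docs_url, api_url, same_as):
    """Return a copy of the Dataset dict with collection, documentation, API, and sameAs links."""
    linked = dict(dataset)  # shallow copy so the caller's dict is left untouched
    linked["isPartOf"] = {"@type": "DataCatalog", "name": collection_name}
    linked["citation"] = {
        "@type": "CreativeWork",
        "name": "Methodology",
        "url": docs_url,
    }
    # Model the API as one more distribution whose contentUrl is the endpoint.
    linked["distribution"] = list(dataset.get("distribution", [])) + [
        {
            "@type": "DataDownload",
            "name": "JSON API",
            "contentUrl": api_url,
            "encodingFormat": "application/json",
        }
    ]
    linked["sameAs"] = list(same_as)
    return linked


base = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

linked = with_links(
    base,
    collection_name="Traffic Data Portal",
    docs_url="https://example.com/datasets/city-bike-trips-2024/methodology.pdf",
    api_url="https://example.com/api/bike-trips",
    same_as=["https://github.com/example/city-bike-trips"],
)
print(sorted(linked))
```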
Versioning and maintenance
- Include version or mention versions in the description; update dateModified whenever new data is added.
- Maintain change logs noting schema updates or variable changes.
- If you retire a dataset, mark it as deprecated in the content and remove distribution links or update them to note archival status.
- Monitor download endpoints for uptime; broken contentUrl links hurt trust.
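One way to keep these maintenance steps consistent is to script them. The Python sketch below is an assumption about how that could look, with hypothetical helper names (record_release, retire); it only edits the metadata dict and leaves link monitoring to your existing uptime tooling.

```python
from datetime import date


def record_release(dataset, version, released_on):
    """Return a copy of the metadata stamped with a new version and dateModified."""
    updated = dict(dataset)
    updated["version"] = version
    updated["dateModified"] = released_on.isoformat()
    return updated


def retire(dataset, note):
    """Return a copy marked as archived: distributions dropped, note prepended to the description."""
    retired = {key: value for key, value in dataset.items() if key != "distribution"}
    retired["description"] = f"{note} {dataset['description']}"
    return retired


base = {
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "description": "Aggregated daily bike trip counts by station for 2024.",
    "datePublished": "2025-01-10",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
            "encodingFormat": "text/csv",
        }
    ],
}

print(record_release(base, "1.1", date(2025, 3, 1))["dateModified"])
print(retire(base, "Archived; no longer updated.")["description"])
```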
Troubleshooting checklist
- Missing distributions: ensure each DataDownload entry has contentUrl and encodingFormat.
- Vague descriptions: provide detailed summaries; avoid boilerplate marketing copy.
- Absent license: link to a public license or usage terms.
- No coverage metadata: include temporal and spatial coverage when the dataset is time- or location-specific.
- Unlinked identifiers: add DOIs or catalog IDs in identifier or sameAs.
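The checklist above can be turned into a quick pre-publish check. The Python sketch below is a hypothetical helper (audit_dataset), not a replacement for the validators. The 50-character description floor is an arbitrary default for this sketch, so adjust it to the guidelines you target.

```python
def audit_dataset(dataset, min_description_chars=50):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []

    if len(dataset.get("description", "").strip()) < min_description_chars:
        problems.append("description: missing or too short to describe scope and method")

    if not dataset.get("license"):
        problems.append("license: no license or usage-terms URL")

    distributions = dataset.get("distribution") or []
    if isinstance(distributions, dict):
        distributions = [distributions]  # a single DataDownload may appear without a list
    if not distributions:
        problems.append("distribution: no DataDownload entries")
    for index, entry in enumerate(distributions):
        for key in ("contentUrl", "encodingFormat"):
            if not entry.get(key):
                problems.append(f"distribution[{index}]: missing {key}")

    if not (dataset.get("temporalCoverage") or dataset.get("spatialCoverage")):
        problems.append("coverage: neither temporalCoverage nor spatialCoverage is set")

    if not (dataset.get("identifier") or dataset.get("sameAs")):
        problems.append("identifier: no identifier or sameAs link")

    return problems


draft = {
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "description": "Bike data.",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
        }
    ],
}

for problem in audit_dataset(draft):
    print(problem)
```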
Common Errors & Fixes
- Missing distribution details: include contentUrl, encodingFormat, and descriptive names for each file.
- Vague description: summarize content, collection method, and coverage clearly.
- Absent license: link to a public license or usage terms.
- No creator/publisher: specify the owning organization or person for credibility.
- Outdated metadata: update dateModified and distributions when data changes.
Required properties
name, description
Recommended properties
url, sameAs[], identifier, creator.name, license, keywords[], datePublished, distribution.contentUrl, distribution.encodingFormat, temporalCoverage, spatialCoverage.name, variableMeasured[], isPartOf.name
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "City Bike Trips 2024",
  "description": "Aggregated daily bike trip counts by station for 2024.",
  "url": "https://example.com/datasets/city-bike-trips-2024",
  "creator": {
    "@type": "Organization",
    "name": "Example City DOT"
  },
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "datePublished": "2025-01-10",
  "keywords": [
    "bicycles",
    "mobility",
    "trips",
    "transportation"
  ],
  "variableMeasured": [
    "trips",
    "station_id"
  ],
  "temporalCoverage": "2024-01/2024-12",
  "spatialCoverage": {
    "@type": "Place",
    "name": "Example City"
  },
  "distribution": [
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
      "encodingFormat": "text/csv"
    },
    {
      "@type": "DataDownload",
      "contentUrl": "https://example.com/datasets/city-bike-trips-2024.json",
      "encodingFormat": "application/json"
    }
  ]
}