    SwiftSchema

    Intuitive Schema Generation at Your Fingertips

    Dataset Schema Generator — Describe and Distribute Your Data

    Generate valid Dataset JSON‑LD with creator, license, coverage, and file distributions (CSV/JSON). Improve understanding and discovery.

    Why many dataset pages underperform

    Pain points we solve

    • Datasets lack a clear description of scope and variables.
    • No creator or license is declared, creating ambiguity for reuse.
    • Downloads are listed but not modeled as distributions with formats.
    • Temporal or spatial coverage is missing, reducing context.

    How SwiftSchema helps

    Solution

    The Dataset generator walks you through the essentials: name, a precise description, creator (Organization or Person), license, publication date, and keywords.

    It supports multiple distributions using DataDownload entries (each with contentUrl and encodingFormat), plus temporal/spatial coverage and variableMeasured.

    How it works

    1. Choose Dataset in the generator below.
    2. Enter name, a concise description of scope and variables, and a creator.
    3. Declare license and datePublished; add keywords for discoverability.
    4. For downloads, add distributions with contentUrl and encodingFormat (CSV, JSON).
    5. Include temporalCoverage, spatialCoverage, and variableMeasured where applicable; validate in the Rich Results Test.
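
    The steps above can be sketched in Python as a small builder. This is a minimal sketch, not SwiftSchema's implementation: the function name `build_dataset` and its parameters are hypothetical, and the sample values are borrowed from the example further down this page.

```python
import json


def build_dataset(name, description, creator_name, license_url,
                  date_published, keywords, downloads,
                  creator_type="Organization"):
    """Assemble a Dataset JSON-LD dict (hypothetical helper for illustration).

    `downloads` is a list of (content_url, encoding_format) pairs.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": creator_type, "name": creator_name},
        "license": license_url,
        "datePublished": date_published,
        "keywords": list(keywords),
        # One DataDownload entry per downloadable file.
        "distribution": [
            {"@type": "DataDownload", "contentUrl": url, "encodingFormat": fmt}
            for url, fmt in downloads
        ],
    }


dataset = build_dataset(
    name="City Bike Trips 2024",
    description="Aggregated daily bike trip counts by station for 2024.",
    creator_name="Example City DOT",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    date_published="2025-01-10",
    keywords=["bicycles", "mobility", "trips", "transportation"],
    downloads=[
        ("https://example.com/datasets/city-bike-trips-2024.csv", "text/csv"),
        ("https://example.com/datasets/city-bike-trips-2024.json", "application/json"),
    ],
)
print(json.dumps(dataset, indent=2))
```

    Coverage and variableMeasured are left out here on purpose; they can be added as extra keys on the returned dict when they apply.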
    Generate Dataset JSON‑LD

    Paste once per dataset. Validate. Ship.

    What is Dataset structured data?

    Dataset schema describes datasets so they can be discovered through Google Dataset Search and other research tools. It captures what the dataset contains, who created it, how it’s licensed, how to access it (distributions), and contextual metadata (temporal/spatial coverage, variables, formats). Quality metadata increases trust and reuse, which helps researchers, journalists, and developers who consume the data.

    Essential properties

    1. `name`: human-readable title.
    2. `description`: summary covering scope, methodology, and coverage; avoid marketing fluff.
    3. `creator` / `publisher`: Organization or Person responsible for the dataset.
    4. `license`: URL to license terms (CC BY, proprietary, etc.).
    5. `datePublished` / `dateModified`: publication and update timestamps.
    6. `keywords`: tags for search (topic, domain, dataset type).
    7. `distribution`: array of DataDownload objects with `contentUrl`, `encodingFormat`, `name`, `contentSize` where applicable.
    8. `temporalCoverage` / `spatialCoverage`: coverage period and geography.
    9. `variableMeasured`: list of variables/columns.
    10. `identifier` / `sameAs`: DOIs, catalog IDs, or canonical references.

    Content prep checklist

    • Write a concise overview addressing what, who, where, when, how (methodology), and why the data exists.
    • Gather metadata: creator, publisher, license, release date, update cadence, contact info.
    • Compile download URLs (CSV, JSON, Parquet, API endpoints) and note their content type, size, and access method (open, requires login).
    • Document key variables/columns, units, and data dictionary references.
    • Capture temporal span (e.g., `2018-01/2024-12`) and geographic coverage (city, country, global).
    • Include contact or access instructions for restricted datasets.
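
    The temporal span in the checklist is an ISO 8601 interval written as start/end. A minimal sketch of producing that string from two dates follows; the helper name `temporal_coverage` is hypothetical, and month precision is assumed because that is what the example above uses.

```python
from datetime import date


def temporal_coverage(start: date, end: date) -> str:
    """Format a month-precision ISO 8601 interval such as 2018-01/2024-12."""
    if end < start:
        raise ValueError("end must not precede start")
    return f"{start:%Y-%m}/{end:%Y-%m}"


coverage = temporal_coverage(date(2018, 1, 1), date(2024, 12, 31))
print(coverage)
```

    If the dataset is published at day granularity, the same idea works with `%Y-%m-%d` in both halves of the interval.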

    Implementation workflow

    1. Update the dataset page with the metadata above and ensure downloads are accessible.
    2. Generate Dataset JSON‑LD with `name`, `description`, `creator`, `license`, `datePublished`, `keywords`.
    3. Add distributions for each file type; include `contentUrl`, `encodingFormat`, optional `name`, `contentSize`, `requiresSubscription`.
    4. Include coverage metadata (`temporalCoverage`, `spatialCoverage`, `variableMeasured`) when available.
    5. Embed the JSON‑LD on the dataset landing page; avoid duplicating across multiple pages unless they represent distinct datasets.
    6. Validate using Google’s Rich Results Test and the Schema Markup Validator.
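
    Embedding the JSON‑LD on the landing page usually means placing it inside a `script` element of type `application/ld+json`. The sketch below shows one way to render that element from a dict; the helper name `to_script_tag` is hypothetical, and escaping `<` is a precaution assumed here so that text containing `</script>` cannot close the element early.

```python
import json


def to_script_tag(dataset: dict) -> str:
    """Wrap a JSON-LD dict in a script element for the page's HTML."""
    payload = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "<" so a value containing "</script>" cannot end the element early.
    # "\u003c" is still valid JSON and parses back to "<".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = to_script_tag({
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "City Bike Trips 2024",
    "description": "Counts with a stray </script> in the text.",
})
print(tag)
```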

    Linking datasets to other resources

    • Use `isPartOf` to link dataset collections (e.g., “Traffic Data Portal”) or parent publications.
    • Reference supporting documentation (methodology PDF, data dictionary) via `citation` or `distribution`.
    • Link to related APIs using `distribution` entries with `contentUrl` pointing to API endpoints.
    • Provide `sameAs` links to GitHub repos, DOIs, or government catalog entries so search engines can cross-reference.
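
    As a rough illustration of the linking properties above, the sketch below attaches `isPartOf`, `sameAs`, and `citation` to an existing Dataset dict. The helper name `link_dataset`, the generic `CreativeWork` type for the parent, and every URL are assumptions made for the example.

```python
def link_dataset(dataset, collection_name=None, same_as=(), citations=()):
    """Return a copy of `dataset` with cross-reference properties added."""
    linked = dict(dataset)
    if collection_name:
        # A generic CreativeWork is assumed for the parent collection.
        linked["isPartOf"] = {"@type": "CreativeWork", "name": collection_name}
    if same_as:
        linked["sameAs"] = list(same_as)
    if citations:
        linked["citation"] = list(citations)
    return linked


linked = link_dataset(
    {"@type": "Dataset", "name": "City Bike Trips 2024"},
    collection_name="Traffic Data Portal",
    # Placeholder URLs, not real resources.
    same_as=["https://github.com/example/city-bike-trips"],
    citations=["https://example.com/datasets/city-bike-trips-2024/methodology.pdf"],
)
```

    Returning a copy keeps the original dict untouched, which makes it easier to compare the markup before and after linking.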

    Versioning and maintenance

    • Include `version` or mention versions in the description; update `dateModified` whenever new data is added.
    • Maintain change logs noting schema updates or variable changes.
    • If you retire a dataset, mark it as deprecated in the content and remove distribution links or update them to note archival status.
    • Monitor download endpoints for uptime; broken contentUrl links hurt trust.
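
    One way to keep `version` and `dateModified` in step is to set them together whenever a release goes out. The sketch below assumes a hypothetical helper `record_release` and a `datePublished` value stored as an ISO date string; the version label is illustrative.

```python
from datetime import date


def record_release(dataset: dict, version: str, modified_on: date) -> dict:
    """Return a copy of `dataset` stamped with a new version and dateModified."""
    published = dataset.get("datePublished")
    if published and modified_on < date.fromisoformat(published):
        raise ValueError("dateModified cannot precede datePublished")
    updated = dict(dataset)
    updated["version"] = version
    updated["dateModified"] = modified_on.isoformat()
    return updated


original = {"@type": "Dataset", "name": "City Bike Trips 2024",
            "datePublished": "2025-01-10"}
released = record_release(original, "1.1.0", date(2025, 4, 1))
```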

    Troubleshooting checklist

    • Missing distributions: ensure each DataDownload entry has `contentUrl` and `encodingFormat`.
    • Vague descriptions: provide detailed summaries; avoid boilerplate marketing copy.
    • Absent license: link to a public license or usage terms.
    • No coverage metadata: include temporal and spatial coverage when the dataset is time- or location-specific.
    • Unlinked identifiers: add DOIs or catalog IDs in `identifier` or `sameAs`.
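
    The first item in this checklist is easy to check mechanically. The following is a minimal sketch of such a check, assuming a hypothetical function `lint_distributions` that takes the Dataset dict and returns a list of problem descriptions.

```python
def lint_distributions(dataset: dict) -> list:
    """Return human-readable problems found in the distribution entries."""
    entries = dataset.get("distribution", [])
    if isinstance(entries, dict):  # tolerate a single entry given as an object
        entries = [entries]
    if not entries:
        return ["no distribution entries"]
    problems = []
    for index, entry in enumerate(entries):
        for key in ("contentUrl", "encodingFormat"):
            if not entry.get(key):
                problems.append(f"distribution[{index}] is missing {key}")
    return problems


problems = lint_distributions({
    "@type": "Dataset",
    "distribution": [
        {"@type": "DataDownload", "contentUrl": "https://example.com/a.csv"},
    ],
})
print(problems)
```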

    Common Errors & Fixes

    • Missing distribution details: include `contentUrl`, `encodingFormat`, and descriptive names for each file.
    • Vague description: summarize content, collection method, and coverage clearly.
    • Absent license: link to a public license or usage terms.
    • No creator/publisher: specify the owning organization or person for credibility.
    • Outdated metadata: update `dateModified` and distributions when data changes.

    Required properties

    • name
    • description

    Recommended properties

    • url
    • sameAs[]
    • identifier
    • creator.name
    • license
    • keywords[]
    • datePublished
    • distribution.contentUrl
    • distribution.encodingFormat
    • temporalCoverage
    • spatialCoverage.name
    • variableMeasured[]
    • isPartOf.name
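
    The two lists above can be treated as dotted paths and checked against a Dataset dict. This is a sketch under assumptions: `has_path` and `audit` are hypothetical names, the `[]` suffixes are dropped from the paths, and an array only counts as present when it is non-empty and every element supplies the path.

```python
REQUIRED = ["name", "description"]
RECOMMENDED = [
    "url", "sameAs", "identifier", "creator.name", "license", "keywords",
    "datePublished", "distribution.contentUrl", "distribution.encodingFormat",
    "temporalCoverage", "spatialCoverage.name", "variableMeasured",
    "isPartOf.name",
]


def has_path(node, path):
    """True if the dotted `path` resolves to a non-empty value in `node`."""
    if isinstance(node, list):
        # Arrays must be non-empty and every element must provide the path.
        return bool(node) and all(has_path(item, path) for item in node)
    if not path:
        return node not in (None, "", [], {})
    if not isinstance(node, dict):
        return False
    key, _, rest = path.partition(".")
    return key in node and has_path(node[key], rest)


def audit(dataset):
    """List which required and recommended paths are absent."""
    return {
        "missing_required": [p for p in REQUIRED if not has_path(dataset, p)],
        "missing_recommended": [p for p in RECOMMENDED if not has_path(dataset, p)],
    }


report = audit({
    "name": "City Bike Trips 2024",
    "description": "Aggregated daily bike trip counts by station for 2024.",
    "creator": {"@type": "Organization", "name": "Example City DOT"},
    "distribution": [
        {"@type": "DataDownload",
         "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv"},
    ],
})
```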
    Minimal Dataset JSON-LD
    {
      "@context": "https://schema.org",
      "@type": "Dataset",
      "name": "City Bike Trips 2024",
      "description": "Aggregated daily bike trip counts by station for 2024.",
      "url": "https://example.com/datasets/city-bike-trips-2024",
      "creator": {
        "@type": "Organization",
        "name": "Example City DOT"
      },
      "license": "https://creativecommons.org/licenses/by/4.0/",
      "datePublished": "2025-01-10",
      "keywords": [
        "bicycles",
        "mobility",
        "trips",
        "transportation"
      ],
      "variableMeasured": [
        "trips",
        "station_id"
      ],
      "temporalCoverage": "2024-01/2024-12",
      "spatialCoverage": {
        "@type": "Place",
        "name": "Example City"
      },
      "distribution": [
        {
          "@type": "DataDownload",
          "contentUrl": "https://example.com/datasets/city-bike-trips-2024.csv",
          "encodingFormat": "text/csv"
        },
        {
          "@type": "DataDownload",
          "contentUrl": "https://example.com/datasets/city-bike-trips-2024.json",
          "encodingFormat": "application/json"
        }
      ]
    }

    FAQs

    What’s essential for Dataset structured data?
    Provide a clear `name` and `description`. Add a `creator`, `license`, and at least one `distribution` with `contentUrl` and `encodingFormat` when you can.

    How do I represent multiple file formats?
    Use multiple `distribution` entries, each with its own `contentUrl` and `encodingFormat` (for example, CSV, JSON).

    Can I include a DOI?
    Yes. Put the DOI in `identifier` and/or `sameAs`. Ensure it resolves publicly.

    What about time and location?
    Use `temporalCoverage` for the time span (e.g., `2018-01/2024-12`) and `spatialCoverage` with a place `name` for geographic scope.
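
    For the DOI question, one possible approach is to normalize the DOI to its resolver URL and store it in both properties. The helper name `add_doi` is hypothetical, and the DOI shown is a placeholder rather than a registered identifier.

```python
def add_doi(dataset: dict, doi: str) -> dict:
    """Return a copy of `dataset` carrying the DOI in identifier and sameAs."""
    bare = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    url = f"https://doi.org/{bare}"

    updated = dict(dataset)
    updated["identifier"] = url
    existing = updated.get("sameAs", [])
    if isinstance(existing, str):  # sameAs may hold a single URL
        existing = [existing]
    # Keep existing links and avoid adding the DOI URL twice.
    updated["sameAs"] = list(existing) + ([url] if url not in existing else [])
    return updated


# Placeholder DOI for illustration only.
with_doi = add_doi({"@type": "Dataset", "name": "City Bike Trips 2024"},
                   "doi:10.1234/example.5678")
```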

    Generate Dataset schema

    Fill in page details, copy JSON or Script, and validate.

      📊 Dataset Schema Generator

      SwiftSchema's Dataset Schema Generator is built for organizations, institutions, and researchers who publish data. It produces structured data that describes each dataset thoroughly, making it easier to find in search. Use it to improve the discoverability of your datasets and to encourage openness, collaboration, and ongoing research.
