Get Your Open Data onto Data.gov
Data.gov Harvesting 101
Phil Ashlock & Rebecca Williams, Data.gov
Overview
Today we will discuss:
- History
- The Data.gov Harvesting Model
- Harvest Source Types
- Federal Data with Project Open Data
- Federal Geospatial Data
- Non-Federal Data
- Coordinate with the Data.gov Team
- Ready? Contact the Data.gov team today
- Resources
The Data.gov Harvesting Model
- Data.gov does not host data; it only aggregates metadata
- Data.gov pulls metadata from harvest sources and synchronizes them to catalog.data.gov
- Data.gov is not used to make changes to metadata
- All changes to metadata are made at the harvest source rather than at Data.gov
- If a metadata record is no longer listed by the harvest source, it will be deleted from catalog.data.gov during the next harvest job
- Most harvest sources on catalog.data.gov are synchronized every 24 hours
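
The synchronization rules above can be sketched as a set comparison between the record identifiers a harvest source currently lists and those already in the catalog. This is only an illustration of the model, not the actual Data.gov harvester; the function name `plan_sync` and the sample identifiers are hypothetical.

```python
def plan_sync(source_ids, catalog_ids):
    """Decide what one harvest job would do, given the identifiers listed
    by the harvest source and those currently in the catalog."""
    source, catalog = set(source_ids), set(catalog_ids)
    return {
        "add": sorted(source - catalog),      # new at the source
        "refresh": sorted(source & catalog),  # still listed, re-read from the source
        "delete": sorted(catalog - source),   # no longer listed by the source
    }


# Hypothetical identifiers: "d" has been dropped by the source.
plan = plan_sync(["a", "b", "c"], ["b", "c", "d"])
print(plan["delete"])  # ['d']
```

Because the source is always treated as authoritative, an edit made directly in the catalog would simply be overwritten by the next job.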
Harvest Source Types
- Federal Data with Project Open Data
- Federal Geospatial Data
- Non-Federal Data
Federal Data with Project Open Data
- The Open Data Policy requires agencies to list and describe all data that can be made publicly available through a data.json file hosted on their website at agency.gov/data.json
- The data.json file must validate as JSON in order to be harvested by Data.gov
- The data.json file must follow the metadata schema and minimum requirements defined by Project Open Data
- Each record listed in the data.json file must meet the metadata requirements to be harvested by Data.gov, but an invalid record does not prevent valid records from being harvested
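
A rough pre-flight check along these lines may help before contacting the Data.gov team. It is a minimal sketch, not the official validator: the `REQUIRED` tuple is an assumed, abbreviated field list (the Project Open Data schema documentation is authoritative), and the code assumes the file is either a bare list of records or an object with a `dataset` list.

```python
import json

# Assumed, abbreviated list of required fields; consult the Project Open
# Data schema documentation for the authoritative requirements.
REQUIRED = ("title", "description", "keyword", "modified", "publisher",
            "contactPoint", "identifier", "accessLevel")


def split_records(text):
    """Parse data.json text and separate harvestable records from invalid ones.

    Raises ValueError (json.JSONDecodeError) if the file is not valid JSON,
    in which case nothing can be harvested at all.
    """
    catalog = json.loads(text)
    records = catalog["dataset"] if isinstance(catalog, dict) else catalog
    valid, invalid = [], []
    for record in records:
        missing = [field for field in REQUIRED if not record.get(field)]
        if missing:
            # An invalid record is set aside; it does not block the others.
            invalid.append((record.get("identifier"), missing))
        else:
            valid.append(record)
    return valid, invalid
```

The function returns both lists so that the invalid records can be reported back to whoever maintains the file while the valid ones proceed.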
Coordinate with Data.gov
- Contact the Data.gov team to let them know you’d like to get started
- The Data.gov team will create a new Harvest Source for your metadata
- Testing. The Data.gov team will test to ensure the harvester works properly
- Go Live!
- Updates every 24 hours
- Receive error reports
Federal Geospatial Data
- Executive Order 12906 and OMB Circular A-16, revised (2002)
- The FGDC has endorsed several geospatial metadata standards, as directed by OMB Circular A-119, such as the Content Standard for Digital Geospatial Metadata (CSDGM) and ISO 19115:2003. Of these, the ISO standard is recommended.
- GeoPlatform.gov and Data.gov
Geospatial Harvest Sources
- Geospatial metadata should be provided through a consolidated geospatial harvest source, preferably a single CSW endpoint for the entire department. All of this metadata will also be published to GeoPlatform.gov
- Any metadata that is not made available through this geospatial harvest source should be made available as a JSON file. This should be separate from the main data.json file and not include any datasets provided through the geospatial harvest source. This file should be made available at agency.gov/data-nonspatial-harvest.json
- If an agency has a geospatial dataset in the data-nonspatial-harvest.json that should be part of GeoPlatform.gov but is not included in the CSW harvest source, it should include “geospatial” as a value for the “theme” field.
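
To illustrate the “theme” convention, the sketch below picks out records in a data-nonspatial-harvest.json listing that are flagged for GeoPlatform.gov. The function name is hypothetical, and accepting either a single string or a list for “theme”, as well as matching case-insensitively, are assumptions made for robustness rather than documented harvester behavior.

```python
def is_flagged_geospatial(record):
    """Return True if the record lists "geospatial" as a theme value."""
    theme = record.get("theme", [])
    if isinstance(theme, str):
        theme = [theme]  # tolerate a single string instead of a list
    return any(value.strip().lower() == "geospatial" for value in theme)


# Hypothetical records from a data-nonspatial-harvest.json file.
records = [
    {"identifier": "roads-2013", "theme": ["transportation", "geospatial"]},
    {"identifier": "budget-2013", "theme": ["finance"]},
    {"identifier": "staff-directory"},
]
flagged = [r["identifier"] for r in records if is_flagged_geospatial(r)]
print(flagged)  # ['roads-2013']
```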
Non-Federal Data
- For non-federal sources, federal-specific fields are not required (bureauCode, programCode, dataQuality, primaryITInvestmentUII, and systemOfRecords) as described in the USG note on the schema documentation.
- The other Common Core Required and Required if Applicable fields are required, but some fields can be left out on a case-by-case basis in consultation with the Data.gov team, if the source cannot provide that particular field.
- Non-federal sources need to have a posted Data Policy, Terms of Use, or similar information, in order to make it clear to Data.gov users when they are viewing datasets that are not covered by federal statutory and regulatory requirements.
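
One way to picture the relaxed requirements is as a filter over whatever list of required fields a check would normally use. In this sketch the federal-specific names come from the list above; the function name, the `waived` parameter (standing in for fields excused in consultation with the Data.gov team), and the sample field list are hypothetical.

```python
# Federal-specific fields that non-federal sources do not need to provide.
FEDERAL_ONLY = frozenset({
    "bureauCode", "programCode", "dataQuality",
    "primaryITInvestmentUII", "systemOfRecords",
})


def applicable_fields(required, federal=True, waived=()):
    """Return the required fields that still apply to a given source."""
    skip = set(waived)
    if not federal:
        skip |= FEDERAL_ONLY
    return [field for field in required if field not in skip]


# Illustrative field list only; not the full schema.
required = ["title", "identifier", "bureauCode", "programCode", "license"]
print(applicable_fields(required, federal=False, waived=["license"]))
# ['title', 'identifier']
```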
Coordinate with Data.gov
- Contact the Data.gov team to let them know you’d like to get started
- The Data.gov team will create a new Harvest Source for your metadata
- Testing. The Data.gov team will test to ensure the harvester works properly.
- Updates every 24 hours
Tools & Resources
- Project Open Data Documentation
- Validator
- Dashboard
- Conversion tools
- Changeset preview
- Inventory.data.gov
- CKAN experience
- Data.gov Team
People to help you
- Schedule a time to meet with us
- Let's walk through the process
- Look at your metadata