These instructions outline the workflow of adding a new dataset to the Earth Engine catalog. See docs on making simple edits for pointers on creating pull requests in GitHub.
Publisher catalogs should be used whenever possible.
Previously, adding user-uploaded assets to the public catalog involved mirroring these assets into public Earth Engine folders. However, the source user-uploaded assets still needed to be kept in user folders for as long as the dataset was present in the catalog.
To add a new dataset:
- File a request and get confirmation that the dataset will be accepted.
- Write a jsonnet file describing the dataset. Write example JS scripts.
- Create and submit a GitHub pull request with these files.
See also dataset acceptance criteria.
Large-scale data uploads and writing good dataset descriptions can be complicated, so please follow these timing rules if you have a dataset launch deadline:
- 4 weeks before the launch: notify the Earth Engine Data team about the upcoming dataset
- 2 weeks before the launch: upload test data to Earth Engine and send it for review
- 1 week before the launch: create a pull request for the dataset description and send it for review
Note: For large vector datasets, it may be preferable to ingest them into BigQuery. See Guide: Ingesting Geospatial Vector Datasets into BigQuery for detailed instructions.
- File a bug to add a new dataset or to update an existing one. Reference the existing user-uploaded asset id and make sure the asset is publicly readable.
- Get a general confirmation from the Earth Engine Data team that the dataset will be accepted.
- Only for mirrored datasets: choose a public dataset id that the data will be mirrored to.
- Only for mirrored datasets: wait until the Earth Engine Data team configures asset mirroring.
- Create a jsonnet file describing the dataset, using any of the existing files as a starting point. See also the template files with field annotations. The order of fields located at the same level does not matter.
- The text fields support Markdown syntax. If editing Markdown in jsonnet fields becomes too cumbersome, you can use `importstr` to import text from a separate .md file (see example). Use tools like markdownlivepreview.com to preview your Markdown content.
- Make sure the `gee:terms_of_use` field describes the data license and the `links` field contains `ee.link.license()` pointing at the URL with the licensing terms.
- The new dataset will not be activated at first. To indicate this, set `'gee:status': 'beta'` at the top level.
- Add a pointer to the new file to the `catalog.jsonnet` file in the same directory.
- In the examples/ directory, create a JavaScript file that will be used as the main example.
- In the same directory, create another JavaScript file that generates a 256x256 preview thumbnail. This thumbnail will be used in the catalog to identify the dataset, so choose a representative and good-looking visualization. Make sure to hide the basemap (e.g., by using a single-color background).
- Create a GitHub pull request with all the files you changed or added.
- This will trigger automatic syntax and validity checks. Their results can be seen in the "Checks" section of the pull request UI. Fix as many issues as you can, and ask the Earth Engine Data team for help with the rest.
- When all the checks pass, ask the Earth Engine Data team to review the PR.
- When the PR is approved, submit it. Wait for the Earth Engine Data team to activate the dataset (this usually just means adding the thumbnails generated by the preview script).
- If your jsonnet file specifies classification band colors or image attributes that you'd like to preserve on catalog datasets, make sure the Earth Engine Data team runs the mirroring job again to set those fields.
- Review the dataset page in the HTML catalog to make sure everything looks good.
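The jsonnet-related steps above can be condensed into a single skeleton file. This is an illustrative sketch only: the import names and helper calls follow the catalog repository's template files, and every EXAMPLE value is a placeholder, not a real dataset id.

```jsonnet
// Sketch of a catalog jsonnet file; EXAMPLE values are placeholders and the
// imports/helpers are assumed to match the repository's template files.
local ee_const = import 'earthengine_const.libsonnet';
local ee = import 'earthengine.libsonnet';
local spdx = import 'spdx.libsonnet';

local id = 'EXAMPLE/DATASET_V1';
local subdir = 'EXAMPLE';
local license = spdx.cc_by_4_0;

{
  stac_version: ee_const.stac_version,
  type: ee_const.stac_type.collection,
  id: id,
  title: 'Example Dataset V1',
  'gee:type': ee_const.gee_type.image_collection,
  'gee:status': 'beta',  // new datasets start out inactive
  description: |||
    A short Markdown description of the dataset.
  |||,
  license: license.id,
  links: ee.standardLinks(subdir, id) + [
    // Points at the URL with the licensing terms.
    ee.link.license(license.reference),
  ],
  'gee:terms_of_use': |||
    This dataset is licensed under
    [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
  |||,
}
```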
One of the benefits of Earth Engine for end users is uniform presentation of data with few surprises. This is achieved by making sure the datasets are normalized to a common form as much as possible during the ingestion/preparation phase, which means a little bit more work up front for data producers.
Here is some advice for data normalization.
- For global datasets, prefer single assets over tiled mosaics.
- Images with the same band signatures should be in the same image collection. However, collections should be homogeneous: if not all assets in a collection have the same band names and types, either reingest the assets to make the bands the same or use multiple collections.
- Use human-readable band names, not the default 'b1', 'b2', etc.
- Set UTC start and end times on all assets.
- Make sure bands with non-continuous values (e.g., classification or bitmask bands) are ingested with the pyramiding policy MODE, not the default policy MEAN.
- Don't mix continuous and classification values in the same band; create two separate bands in such cases.
- If your dataset has multiple versions, create successor/predecessor links using a versioning approach similar to this one: put a version map into a file named `dataset.libsonnet`, then use this map in every jsonnet file. Mark all but the most recent version with `'gee:status': 'deprecated'`.
- Don't create new single-dataset keywords. If you feel a new keyword would make sense, propose other existing datasets where it should also be added.
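As a sketch of the versioning approach above, a `dataset.libsonnet` file could hold the version map that every per-version jsonnet file imports. The ids and field names here are illustrative assumptions, not the repository's exact helper API:

```jsonnet
// Hypothetical dataset.libsonnet shared by all versions of EXAMPLE/DATASET.
// Each per-version jsonnet file imports this map and uses it to build its
// successor/predecessor links and deprecation status.
{
  versions: [
    // All but the most recent version are marked deprecated.
    { id: 'EXAMPLE/DATASET_V1', version: 'V1', deprecated: true },
    { id: 'EXAMPLE/DATASET_V2', version: 'V2', deprecated: false },
  ],
}
```

A per-version file would then start with `local dataset = import 'dataset.libsonnet';` and derive its version links from this single shared map, so adding a new version only touches one place.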
- If you are getting the error "text block not terminated with |||", the problem is with indentation. You can temporarily work around it by switching, e.g.,
description: |||
Badly
indented
text
|||
to
description: |||
one good line
|||,
and then gradually reintroducing the real lines. Make sure to strip trailing whitespace from empty lines and watch out for tab characters.