## Spatial vector data

### About

Spatial vector files are geospatial files that represent geographic features as points, lines, and polygons. Common spatial vector formats include GeoJSON (.geojson, .json), ESRI shapefiles (.shp, usually shared as a .zip archive together with their .shx, .dbf, and .prj sidecar files), GeoPackage (.gpkg), GeoParquet (.parquet), and Keyhole Markup Language (.kml, .kmz).

### Processing `spatialVector` entities

In addition to the usual metadata (description, attributes, and physical), a `spatialVector` entity needs two more pieces of information about the file: its *geometry* (for example, point, line, or polygon) and its *coordinate reference system*.

The quickest way to get these is to ask the submitter, or to open the vector file in QGIS (or other GIS software) and inspect its properties. Otherwise, we'll need to do some extra sleuthing and file processing in R. Below we go over some techniques for gathering this information from vector files, then show how to create a `spatialVector` entity in an EML document.

We'll start by setting our node, reading in the data package and its EML, and getting the PID of our geospatial file:

```{r, eval=FALSE}
library(sf)
library(dataone)
library(datapack)
library(uuid)
library(arcticdatautils)
library(EML)

### Set up node and gather data package
d1c <- dataone::D1Client("...", "urn:node:...") # Set the Member Node
resourceMapId <- "..." # Data package PID (resource map ID)
dp <- getDataPackage(d1c, identifier = resourceMapId, lazyLoad = TRUE, quiet = FALSE) # Gather data package

### Load in metadata EML
metadataId <- selectMember(dp, name = "sysmeta@formatId", value = "https://eml.ecoinformatics.org/eml-2.2.0") # Get metadata PID
doc <- read_eml(getObject(d1c@mn, metadataId)) # Read in metadata EML file

### Get the PID of the spatial vector file
spatial_vector_pid <- selectMember(dp, "sysmeta@fileName", "exampleFile.zip")
```
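
`selectMember()` returns a character vector of every matching PID, so a misspelled file name quietly yields `character(0)` and later calls fail with less obvious errors. A small guard can stop early when a lookup did not return exactly one PID. `expect_one_pid()` is a hypothetical helper, not part of `datapack` or `arcticdatautils`:

```r
# Hypothetical helper (not part of datapack or arcticdatautils):
# stop unless a selectMember() lookup returned exactly one PID
expect_one_pid <- function(pid, label = "lookup") {
  if (length(pid) != 1) {
    stop(sprintf("%s returned %d PIDs; expected exactly 1", label, length(pid)),
         call. = FALSE)
  }
  invisible(pid)
}
```

For example, call `expect_one_pid(spatial_vector_pid, "spatial vector PID")` right after the lookup above, and likewise for `metadataId`.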

#### Reading in the vector file

We'll first need to read in the vector file to extract the necessary metadata.

##### ESRI shapefiles

For zipped ESRI shapefiles, we can use `arcticdatautils::read_zip_shapefile()`, which takes the member node and the file's PID and reads the shapefile into R:

```{r, eval=FALSE}
shapefile <- arcticdatautils::read_zip_shapefile(d1c@mn, spatial_vector_pid)
```

##### GeoJSON, GeoPackage, and GeoParquet files

For GeoJSON, GeoPackage, and GeoParquet files, there is no `arcticdatautils` function that reads the file from the node, so you'll need to download the file locally first. We can then read these vector files with `sf::st_read()`. Reading GeoParquet this way depends on your GDAL build including the Parquet driver; if it doesn't, the `sfarrow` package's `st_read_parquet()` is one alternative.

```{r, eval=FALSE}
geojson_file <- sf::st_read("~/path/to/vectorFile.json")
geopackage_file <- sf::st_read("~/path/to/vectorFile.gpkg")
geoparquet_file <- sf::st_read("~/path/to/vectorFile.parquet")
```
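
One way to get a local copy without leaving R is to fetch the object's bytes with `dataone::getObject()` (the same call used above to read the EML) and write them to a temporary file. `write_to_tempfile()` is a hypothetical helper sketched under that assumption; it keeps the original extension so the file still looks like a `.gpkg`, `.json`, etc. when `sf` opens it:

```r
# Hypothetical helper: write an object's raw bytes to a temporary file
# with the given extension and return the file path
write_to_tempfile <- function(bytes, ext) {
  stopifnot(is.raw(bytes))
  path <- tempfile(fileext = ext)
  writeBin(bytes, path)
  path
}

# Example usage (requires the setup chunk above):
# geopackage_path <- write_to_tempfile(dataone::getObject(d1c@mn, spatial_vector_pid), ".gpkg")
# geopackage_file <- sf::st_read(geopackage_path)
```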

#### Exploring vector file for metadata

Once the file is read in, whether it came from a shapefile, GeoJSON, GeoPackage, or GeoParquet file, we can use `sf` again to find its *coordinate reference system* and *geometry*. In the code below, `vector_file` stands for whichever object you read in above.

```{r, eval=FALSE}
vector_file <- shapefile # or geojson_file, geopackage_file, geoparquet_file

### Get coordinate reference system
sf::st_crs(vector_file)

### Find the geometry
sf::st_geometry(vector_file) # the printout includes the geometry type, e.g. POLYGON or MULTIPOLYGON
unique(sf::st_geometry_type(vector_file)) # geometry type(s) of the individual features
```
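
The `geometry` field of an EML `spatialVector` uses its own names (such as `Point`, `LineString`, `Polygon`, `MultiPolygon`), while `sf` reports upper-case simple-feature types. The lookup below is a sketch that assumes the EML geometry enumeration uses those names plus `MultiGeometry`; double-check it against the EML spatialVector schema before relying on it. `sf` often reads shapefile polygons as `MULTIPOLYGON`, so the mapped value may differ from the `Polygon` used in the examples in this section.

```r
# Sketch: map sf geometry types to the names used by EML's spatialVector
# geometry field (mapping assumed from the EML GeometryType enumeration)
sf_to_eml_geometry <- function(sf_types) {
  lookup <- c(
    POINT = "Point",
    LINESTRING = "LineString",
    POLYGON = "Polygon",
    MULTIPOINT = "MultiPoint",
    MULTILINESTRING = "MultiLineString",
    MULTIPOLYGON = "MultiPolygon",
    GEOMETRYCOLLECTION = "MultiGeometry"
  )
  types <- unique(as.character(sf_types))  # st_geometry_type() returns a factor
  unknown <- setdiff(types, names(lookup))
  if (length(unknown) > 0) {
    stop("No EML geometry mapping for: ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  unname(lookup[types])
}

# Example usage:
# sf_to_eml_geometry(sf::st_geometry_type(vector_file))
```

If this returns more than one value, the file mixes geometry types, and it's worth checking with the submitter which one to report.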

EML identifies the coordinate reference system by name (for example, `GCS_North_American_1983`), and the name must be one that EML recognizes. To look up the allowed names, we can use `arcticdatautils::get_coord_list()`.
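
The list is long, so it can help to filter it for part of the name reported by `sf::st_crs()` (for example, `North_American_1983` for NAD83). `find_coord_sys()` is a hypothetical helper; because the column names of the data frame returned by `get_coord_list()` aren't given here, it searches every column rather than assuming a particular one:

```r
# Hypothetical helper: keep rows of a coordinate-system table in which any
# column contains `pattern` (matched literally, not as a regular expression)
find_coord_sys <- function(coord_df, pattern) {
  coord_df <- as.data.frame(coord_df, stringsAsFactors = FALSE)
  hit <- apply(coord_df, 1, function(row) any(grepl(pattern, row, fixed = TRUE)))
  coord_df[hit, , drop = FALSE]
}

# Example usage (assumes get_coord_list() returns a data frame):
# find_coord_sys(arcticdatautils::get_coord_list(), "North_American_1983")
```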

##### Additional files

For `.kml` and `.kmz` files, or other vector files not mentioned above, `sf::st_read()` may still work if your GDAL installation includes a matching driver (GDAL has KML and LIBKML drivers, for example); other R packages may also help. Uploading the file into QGIS or other GIS software is another quick way to retrieve this metadata.
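
KML, KMZ, and GeoPackage files can contain more than one layer, and `sf::st_read()` reads only the first layer (with a warning) unless a layer is named. The sketch below lists the layers with `sf::st_layers()` and reports the CRS and geometry type(s) of each one; `describe_layers()` is a hypothetical helper, and the path in the usage line is a placeholder:

```r
library(sf)

# Hypothetical helper: read every layer of a (possibly multi-layer) vector
# file and report each layer's CRS input string and geometry type(s)
describe_layers <- function(path) {
  layer_names <- sf::st_layers(path)$name
  rows <- lapply(layer_names, function(lyr) {
    x <- sf::st_read(path, layer = lyr, quiet = TRUE)
    data.frame(
      layer = lyr,
      crs = sf::st_crs(x)$input,
      geometry = paste(unique(as.character(sf::st_geometry_type(x))), collapse = ", "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Example usage (placeholder path):
# describe_layers("~/path/to/vectorFile.kml")
```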

#### Edit format ID

Next, we'll check the file's format ID and, if necessary, change it to reflect the correct file type. For example, to set it to a zipped ESRI shapefile:

```{r, eval=FALSE}
spatial_vector_pid <- selectMember(dp, "sysmeta@fileName", "exampleFile.zip")
sysmeta <- dataone::getSystemMetadata(d1c@mn, spatial_vector_pid) # Get current system metadata
sysmeta@formatId # Check the current format ID
sysmeta@formatId <- "application/vnd.shp+zip" # Zipped ESRI shapefile

dataone::updateSystemMetadata(d1c@mn, spatial_vector_pid, sysmeta)
```

You can look up valid format IDs in the [DataONE format list](https://cn.dataone.org/cn/v2/formats).
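
The format list can also be searched from R. The sketch below assumes that `dataone::listFormats()`, called on a coordinating node such as `dataone::CNode("PROD")`, returns the registered formats as a data frame; the hypothetical `find_formats()` searches every column so it doesn't depend on specific column names:

```r
# Hypothetical helper: keep rows of a format table in which any column
# matches `pattern` (case-insensitive regular expression)
find_formats <- function(formats_df, pattern) {
  formats_df <- as.data.frame(formats_df, stringsAsFactors = FALSE)
  hit <- apply(formats_df, 1, function(row) any(grepl(pattern, row, ignore.case = TRUE)))
  formats_df[hit, , drop = FALSE]
}

# Example usage (assumes listFormats() returns a data frame):
# cn <- dataone::CNode("PROD")
# find_formats(dataone::listFormats(cn), "shp|geojson|geopackage|kml")
```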

#### Creating `spatialVector` entity

Next, we'll create our `spatialVector` entity with `arcticdatautils::pid_to_eml_entity()` and add it to the EML doc.

This entity also needs an attribute list. If one was already created in the web editor, you can copy it over; otherwise, you can create one in R for this file. The example code below assumes we're copying the attribute list from the existing `otherEntity` for the ESRI shapefile, where `i` is that entity's position in `doc$dataset$otherEntity`. If the dataset has only one `otherEntity`, `read_eml()` returns it on its own rather than inside a list, so drop the `[[i]]` and use `doc$dataset$otherEntity$attributeList` instead, as in the example script below.

```{r, eval=FALSE}
spatialVector <- arcticdatautils::pid_to_eml_entity(d1c@mn,
                                                    spatial_vector_pid,
                                                    entity_type = "spatialVector",
                                                    entityName = "exampleFile.zip",
                                                    entityDescription = "spatial vector description",
                                                    attributeList = doc$dataset$otherEntity[[i]]$attributeList,
                                                    geometry = "Polygon",
                                                    spatialReference = list(horizCoordSysName = "GCS_North_American_1983"))

doc$dataset$spatialVector[[1]] <- spatialVector # assumes the dataset has no spatialVector yet

doc$dataset$otherEntity[[i]] <- NULL # remove the file's previous otherEntity
```
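
To avoid hard-coding `i`, one option is to wrap a lone `otherEntity` in a list first, so that `[[i]]` always refers to a whole entity, and then look `i` up by file name. This is a sketch; `normalize_other_entities()` and `find_entity_index()` are hypothetical helpers, not part of `EML` or `arcticdatautils`:

```r
# Hypothetical helper: make doc$dataset$otherEntity a list of entities even
# when read_eml() returned a single, unwrapped otherEntity
normalize_other_entities <- function(doc) {
  ents <- doc$dataset$otherEntity
  if (!is.null(ents) && !is.null(ents$entityName)) {
    doc$dataset$otherEntity <- list(ents)
  }
  doc
}

# Hypothetical helper: position(s) of the otherEntity whose entityName matches `name`
find_entity_index <- function(doc, name) {
  entity_names <- vapply(doc$dataset$otherEntity, function(e) {
    if (is.null(e$entityName)) NA_character_ else as.character(e$entityName)[1]
  }, character(1))
  which(entity_names == name)
}

# Example usage, run before creating the spatialVector:
# doc <- normalize_other_entities(doc)
# i <- find_entity_index(doc, "exampleFile.zip")
```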

Finally, we'll run `eml_validate(doc)` to check that the updated document is valid EML.
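
`eml_validate()` returns `TRUE` or `FALSE`; when validation fails, the schema error messages are attached to the result as its `"errors"` attribute. A short check can print those messages so they're easier to act on; `report_validation()` is a hypothetical helper:

```r
# Hypothetical helper: print the schema errors attached to an
# eml_validate() result and return a plain TRUE/FALSE
report_validation <- function(result) {
  errors <- attr(result, "errors")
  valid <- isTRUE(as.logical(result))
  if (valid) {
    message("EML is valid")
  } else {
    message("EML is NOT valid:")
    for (err in errors) message("  - ", err)
  }
  valid
}

# Example usage:
# report_validation(eml_validate(doc))
```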

### Example script

Here is an example script that combines these steps for an ESRI shapefile:

```{r, eval=FALSE}
library(dataone)
library(datapack)
library(arcticdatautils)
library(EML)
library(sf)

### Set up node and gather data package
d1c <- dataone::D1Client("PROD", "urn:node:ARCTIC") # Set the Member Node
resourceMapId <- "..." # Data package PID (resource map ID)
dp <- getDataPackage(d1c, identifier = resourceMapId, lazyLoad = TRUE, quiet = FALSE) # Gather data package

### Load in metadata EML
metadataId <- selectMember(dp, name = "sysmeta@formatId", value = "https://eml.ecoinformatics.org/eml-2.2.0") # Get metadata PID
doc <- read_eml(getObject(d1c@mn, metadataId)) # Read in metadata EML file

### Creating spatial vector

# Read in shapefile
shp_pid <- selectMember(dp, "sysmeta@fileName", "PeatTess.zip")
shapefile <- arcticdatautils::read_zip_shapefile(d1c@mn, shp_pid)

# Get coordinate reference system
sf::st_crs(shapefile) # -> GCS_North_American_1927

# Find geometry of shapefile
sf::st_geometry(shapefile) # -> polygon

### Edit format ID
sysmeta <- getSystemMetadata(d1c@mn, shp_pid)
sysmeta@formatId <- "application/vnd.shp+zip"

updateSystemMetadata(d1c@mn, shp_pid, sysmeta)

### Create spatial vector entity
spatialVector <- pid_to_eml_entity(d1c@mn,
                                   shp_pid,
                                   entity_type = "spatialVector",
                                   entityName = "PeatTess.zip",
                                   entityDescription = "1km tessellation of the Alaska peatland map",
                                   attributeList = doc$dataset$otherEntity$attributeList, # only one otherEntity, so no [[i]]
                                   geometry = "Polygon",
                                   spatialReference = list(horizCoordSysName = "GCS_North_American_1927"))

# Add spatial vector to doc
doc$dataset$spatialVector[[1]] <- spatialVector

# Remove the corresponding otherEntity (the only one in this package)
doc$dataset$otherEntity <- NULL

eml_validate(doc)
```