Processing large scale satellite imagery with openEO Platform and R

R-bloggers 2022-11-25

[This article was first published on r-spatial, and kindly contributed to R-bloggers.]


Summary: openEO is an open source, community-based API for cloud-based processing of Earth Observation data. This blog introduces the R openeo client, and demonstrates a sample analysis using openEO Platform for processing.

openEO

OpenEO is an open source project that aims to make large-scale, cloud-based satellite image processing easier and more open. This blog post shows how the openeo client, available from CRAN and with online documentation here, can be used to explore datasets, compute on them, and download results.

From image collection to data cube

Image collections are collections of satellite images that are all processed in a uniform way. They are typically organized in tiles (or granules): raster files with all pixels obtained over an area observed in a single overpass of the satellite. Satellite image analysis often involves the analysis of a large number of tiles, obtained from different overpasses. The concept of data cubes makes it easier to analyse such data: a user defines the spatial extent and resolution, and the temporal extent and resolution, and all data is then reprocessed (aggregated) into the regular space and time resolutions. Cloud platforms such as openEO let the user define a data cube, and reprocess the data into this form before the final analysis steps are done. More explanation about data cubes in openEO is found here. The sample session below involves

  • selecting an image collection, along with spatial and temporal extents
  • selecting spectral bands
  • reducing the spectral dimension by computing a single index
  • aggregating the tiles to monthly median values
  • downloading and plotting the resulting image time series

Sample session

An openEO session is started by loading the library, connecting to a back-end (here: openeo.cloud), and authenticating with the back-end:

library(openeo)
con = connect("openeo.cloud")
login()

The login will prompt for an authentication ID, which can come from your organisation or from an openID-based mechanism such as Google or GitHub. For logging on you need an account on the back-end, as cloud computing in general is not a free resource. The openeo.cloud website has instructions on how to apply for an (ESA-)sponsored account with limited compute credits.

When these commands have been carried out, and you are using RStudio as an IDE, RStudio will show an overview of the image collections available on this back-end, which looks like this:

and which can help search for a particular collection. A more extensive viewer of the data collections is obtained in RStudio by

collection_viewer()

which opens a view window like this:

The same view is obtained online using this link. The set of image collections can also be obtained programmatically by

collections = list_collections(con)
names(collections) |> head()
## [1] "SENTINEL1_GRD_SIGMA0"     "S1_GRD_SIGMA0_ASCENDING" 
## [3] "S1_GRD_SIGMA0_DESCENDING" "TERRASCOPE_S2_FAPAR_V2"  
## [5] "TERRASCOPE_S2_NDVI_V2"    "TERRASCOPE_S2_LAI_V2"
length(collections)
## [1] 77

which can then be further processed in R.
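For example, the returned collection names can be searched with ordinary R tools. A minimal sketch (using a small stand-in character vector so it runs without a back-end connection; in a live session you would use `names(collections)` directly):

```r
# Stand-in for names(collections), so this snippet runs without a back-end
coll_names = c("SENTINEL1_GRD_SIGMA0", "SENTINEL2_L2A",
               "TERRASCOPE_S2_NDVI_V2")
# Find all Sentinel-2 related collections by name
grep("S2|SENTINEL2", coll_names, value = TRUE)
```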

We will work with Sentinel level 2A data, available in the collection

collection = "SENTINEL2_L2A"
coll_meta = describe_collection(collection)
names(coll_meta)
##  [1] "cube:dimensions" "description"     "extent"          "id"             
##  [5] "keywords"        "license"         "links"           "providers"      
##  [9] "stac_extensions" "stac_version"    "summaries"       "title"

information about the names and extents of data cube dimensions is for instance obtained by

coll_meta$`cube:dimensions`
## Dimension:    bands 
## Type:         bands 
## Values:       [B01,B02,B03,B04,B05,B06,B07,B08,B8A,B11,B12,SCL,relativeAzimuthAngles,sunZenithAngles,viewZenithAngles,B09,AOT,SNW,CLD,CLP,CLM,sunAzimuthAngles,viewAzimuthMean,viewZenithMean,dataMask]
## 
## Dimension:    t 
## Type:         temporal 
## Extent:       [2015-07-06T00:00:00Z,NULL] 
## 
## Dimension:    x 
## Type:         spatial 
## Axis:         x 
## Extent:       [-180,180] 
## 
## Dimension:    y 
## Type:         spatial 
## Axis:         y 
## Extent:       [-56,83]

Next, select a spatial region, a set of spectral bands and a time period to work on, by specifying a few R objects:

library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE
bbox = st_bbox(c(xmin = 7, xmax = 7.01, ymin = 52, ymax = 52.01), crs = 'EPSG:4326')
bands = c("B04", "B08")
time_range = list("2018-01-01", "2019-01-01")

We can then start building a process graph, the object that contains the work to be done on the back-end side. First we load the available processes from the back-end:

p = openeo::processes(con)

The object p now contains the processes available on the back-end we are using, and is used to define further tasks. We can explore the available processes by viewing them with

process_viewer()

which is shown below, or browse them online here.

We can constrain the image collection that we want to work on by its name, extents in space and time, and bands (if no constraints are given, the full extent is used). Using a member function of p here guarantees that we do not use processes not available on the back-end:

data = p$load_collection(id = collection, 
                         spatial_extent = bbox,
                         temporal_extent = time_range, 
                         bands = bands)

We will compute NDVI, the normalized difference vegetation index, from the two selected bands, and use the NDVI function in reduce_dimension to reduce the bands dimension:

ndvi = function(data, context) {
  red = data[1]
  nir = data[2]
  (nir - red) / (nir + red)
}
calc_ndvi = p$reduce_dimension(data = data,
                               dimension = "bands",
                               reducer = ndvi)

Although ndvi is defined as an R function, in effect the openeo R client translates this function into openEO native processes. This cannot be done with arbitrarily complex functions; passing on R functions to be processed by an R instance in the back-end is done using user-defined functions, the topic of a future blog post.
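To see what this reducer computes, the same function can be evaluated locally on a pair of example reflectances (purely illustrative numbers; on the back-end it is applied per pixel along the bands dimension):

```r
# Same reducer as above, evaluated locally; data holds c(red, nir)
ndvi = function(data, context) {
  red = data[1]
  nir = data[2]
  (nir - red) / (nir + red)
}
ndvi(c(0.05, 0.45))  # (0.45 - 0.05) / (0.45 + 0.05) = 0.8
```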

We will now process the NDVI values into a data cube with monthly values, by picking for each pixel the median value of all pixels over the month (Sentinel-2 has an image roughly every 5 days). This is done by aggregate_temporal_period:

temp_period = p$aggregate_temporal_period(data = calc_ndvi, period = "month",
                             reducer = function(data, context) { p$median(data) }, 
                             dimension = "t")
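Conceptually, the monthly aggregation works like the following local sketch on a toy NDVI time series (hypothetical values at 5-day spacing; the back-end does this per pixel, as part of the process graph):

```r
# Toy NDVI time series at 5-day spacing, reduced to monthly medians
dates = as.Date("2018-01-01") + seq(0, 55, by = 5)
ndvi_vals = c(0.2, 0.3, 0.25, 0.4, 0.35, 0.5, 0.45, 0.6, 0.55, 0.5, 0.65, 0.6)
tapply(ndvi_vals, format(dates, "%Y-%m"), median)
## 2018-01 2018-02 
##    0.35    0.60
```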

Finally, we can define how we want to save the results (in which file format), by the save_result process

result = p$save_result(data = temp_period, format="NetCDF")

and request the results synchronously by compute_result:

# synchronous:
compute_result(result, format = "NetCDF", output_file = "ndvi.nc", con = con)
## [1] "ndvi.nc"

All commands before compute_result() can be executed without authentication; only compute_result asks for "real" computations on imagery, and requires authentication, so that the compute costs can be accounted for.

compute_result downloads the file locally, and we can now import it and plot it, e.g. with ggplot2

library(stars)
## Loading required package: abind
r = read_stars("ndvi.nc")
library(ggplot2)
ggplot() + geom_stars(data = r) +
        facet_wrap(~t) + coord_equal() +
        theme_void() +
        scale_x_discrete(expand = c(0, 0)) +
        scale_y_discrete(expand = c(0, 0)) +
        scale_fill_viridis_c()

or with mapview (where the "real" mapview obviously gives an interactive plot):

library(mapview)
mapview(r)

Batch jobs

The example above was deliberately kept very small; for larger jobs the synchronous call to compute_result will time out, and a batch job can be started with

job = create_job(graph = result, title = "ndvi.nc", description = "ndvi 2018")
start_job(job = job) # use the id of the job (job$id) to start the job
job_list = list_jobs() # here you can see all your jobs and their status
status(job)

The returned status is either queued, running, error or finished. When it is finished, the results can be downloaded by

dwnld = download_results(job = job, folder = "./") 
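For longer-running jobs, one can poll the status until it settles. A sketch of a small helper (wait_for_job is hypothetical, not part of the openeo package; the simulated back-end below only serves to make the example self-contained — in a real session you would call `wait_for_job(function() status(job))`):

```r
# Hypothetical helper: poll a status-returning function until the job
# leaves the queued/running states, then return the final status
wait_for_job = function(get_status, interval = 30) {
  repeat {
    s = get_status()
    if (!s %in% c("queued", "running")) return(s)
    Sys.sleep(interval)  # wait before polling again
  }
}

# Simulated back-end that reports "finished" on the third poll
states = c("queued", "running", "finished")
i = 0
final = wait_for_job(function() { i <<- i + 1; states[min(i, 3)] },
                     interval = 0)
final
## [1] "finished"
```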

In case the status is error, error logs are obtained by

logs(job = job)

Further reading

The CRAN landing page has six (6!) vignettes on getting started, sample data retrieval, architecture, process graph building, and implementation details.

Upcoming…

The use of R functions that can be sent to the back-end and executed there by an R engine is under development. This is a big milestone, as it would provide arbitrary R functionality (given that arbitrary R packages would also be provided by the back-end), and is something that, for instance, Google Earth Engine cannot provide. The current form in which this works is still a bit user-unfriendly (for the curious: examples are found in this repo, and still contain a mix of Python and R code). When this has been cleaned up and undergone more testing it will be the topic of a follow-up blog post.
