
ds_download

ds_download is a Python package designed to facilitate the downloading and processing of Sentinel-2 satellite imagery, including creating composites and calculating various spectral indices. This package automates interactions with the Sentinel API and handles composite creation through cloud-based storage like MinIO and Google Cloud Storage.

Features

  • Download Sentinel-2 Data: Fetches Sentinel-2 products via the Copernicus Open Access Hub API and Google Cloud.
  • Composite Creation: Automatically creates pixel-wise median composites for specified date ranges.
  • Spectral Indices Calculation: Supports the computation of various indices like NDVI, NDWI, NDSI, EVI, and more.
  • MongoDB and MinIO Integration: Stores metadata in MongoDB and handles data through MinIO for distributed processing.
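To illustrate the kind of spectral index the package computes, NDVI is a simple pixel-wise band ratio. A minimal NumPy sketch, independent of this package's internal API (the function name `ndvi` is illustrative, not part of the library):

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), computed pixel-wise.

    For Sentinel-2, NIR corresponds to band B08 and Red to band B04.
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    # Guard against division by zero where both bands are 0
    return np.where(denom == 0, 0.0, (nir - red) / np.where(denom == 0, 1, denom))
```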

Requirements

  • Python 3.10+
  • A working MongoDB and MinIO instance
  • Access to Google Cloud

Installation

  1. Clone the repository:

    git clone https://github.com/KhaosResearch/sentinel2-download.git
    cd ds_download
  2. Install the required dependencies:

    pip install -r requirements.txt
  3. Since the library is not available on PyPI, install it locally using pip:

    pip install .
  4. Set up your environment variables in a .env file.
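The exact variable names depend on the package's settings module; a hypothetical .env sketch (all keys and values below are placeholders, not the package's documented names):

```
# Hypothetical keys — check the package's configuration code for the real names
MINIO_ENDPOINT=minio.example.com:9000
MINIO_ACCESS_KEY=<access-key>
MINIO_SECRET_KEY=<secret-key>
MONGO_URI=mongodb://<user>:<password>@<host>:27017
GOOGLE_APPLICATION_CREDENTIALS=/path/to/gcloud-credentials.json
```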

Usage

For usage examples, refer to main_script.py and main_script_dask.py in the repository. These scripts demonstrate how to use the library both locally and in a Dask cluster environment.

Download Sentinel-2 Products

from ds_download.download_using_sentinel_api import download_product_using_sentinel_api
from datetime import datetime

# Download products for a specific tile and date range
start_date = datetime(2021, 4, 1)
end_date = datetime(2021, 4, 30)
tile_id = "29SLC"

download_product_using_sentinel_api(
    calculate_raw_indexes=False,
    calculate_intermediate_products=True,
    from_date=start_date,
    to_date=end_date,
    tile_id=tile_id
)

Create a Composite

from ds_download.compute_composite import create_composite_by_tile_and_date
from datetime import datetime

# Create a composite for a specific tile and date range
tile_id = "29SLC"
start_date = datetime(2021, 4, 1)
end_date = datetime(2021, 4, 30)

create_composite_by_tile_and_date(
    calculate_raw_indexes=True,
    calculate_intermediate_products=False,
    tile=tile_id,
    start_date=start_date,
    end_date=end_date,
    min_useful_data_percentage=30
)

Dask Integration for Distributed Processing

from dask.distributed import Client
from ds_download.download_using_sentinel_api import download_product_using_sentinel_api
from ds_download.compute_composite import create_composite_by_tile_and_date

client = Client("<dask-scheduler-host>:<dask-scheduler-port>")

def process_month(year, month, tile):
    # Define logic for processing
    ...

years = [2021]
months = range(1, 13)
tiles = ["29SLC"]

futures = [
    client.submit(process_month, year, month, tile)
    for year in years
    for month in months
    for tile in tiles
]
results = client.gather(futures)

for result in results:
    print(result)
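A process_month implementation would typically derive the month's date bounds and then call download_product_using_sentinel_api and create_composite_by_tile_and_date as shown above. The date arithmetic alone can be sketched with the standard library (`month_bounds` is a hypothetical helper; the wiring to the library calls is left as comments):

```python
import calendar
from datetime import datetime

def month_bounds(year: int, month: int) -> tuple[datetime, datetime]:
    """Return the first and last day of a month as datetimes."""
    last_day = calendar.monthrange(year, month)[1]
    return datetime(year, month, 1), datetime(year, month, last_day)

# Inside process_month you might then do:
#   start, end = month_bounds(year, month)
#   download_product_using_sentinel_api(..., from_date=start, to_date=end, tile_id=tile)
#   create_composite_by_tile_and_date(..., tile=tile, start_date=start, end_date=end)
```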

Deploying a Dask Cluster in Kubernetes

To deploy a Dask cluster compatible with the ds_download library in Kubernetes, you can use Helm with custom values. This section will guide you through deploying the Dask cluster and the necessary secrets to integrate with MinIO, MongoDB, and Google Cloud.

Prerequisites

  • Kubernetes cluster up and running.
  • Helm installed.
  • Access to your Kubernetes cluster context.

Step 1: Deploy Dask using Helm

Refer to the dask_values_product_download.yaml file provided in the repository to configure the Dask cluster for use with ds_download. Be sure to fill in all placeholder values (e.g., MinIO credentials, MongoDB credentials) before deploying. Additionally, since the Kubernetes nodes need access to the ds_download library, you will need to build the package and upload it to a package repository of your own.

Deploy the Dask cluster using Helm:

helm repo add dask https://helm.dask.org/
helm install dask-cluster dask/dask -f dask_values_product_download.yaml
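Assuming the stock dask/dask chart, a minimal fragment of such a values file might look like the following (the package index URL is a placeholder; EXTRA_PIP_PACKAGES is an environment variable the official Dask Docker images honor at startup):

```yaml
worker:
  replicas: 4
  env:
    # The Dask images pip-install anything listed here on startup,
    # which is one way to make ds_download available on each worker.
    - name: EXTRA_PIP_PACKAGES
      value: "ds_download --extra-index-url <your-package-repo-url>"
scheduler:
  env:
    - name: EXTRA_PIP_PACKAGES
      value: "ds_download --extra-index-url <your-package-repo-url>"
```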

Step 2: Create Kubernetes Secrets

To allow Dask workers to authenticate with Google Cloud, MongoDB, and MinIO, you need to create the appropriate Kubernetes secrets. Here's an example of how to create a secret for Google Cloud credentials:

kubectl create secret generic gcloud-credentials --from-file=path_to_your_gcloud_credentials.json

Ensure that the secret name matches the one specified in your values.yaml file.
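The same pattern works for the MinIO and MongoDB credentials, for example using literal key-value pairs (the secret and key names here are illustrative; use whatever names your values file references):

```shell
kubectl create secret generic minio-credentials \
  --from-literal=MINIO_ACCESS_KEY=<access-key> \
  --from-literal=MINIO_SECRET_KEY=<secret-key>

kubectl create secret generic mongo-credentials \
  --from-literal=MONGO_URI=<connection-string>
```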

Step 3: Verify Deployment

After deploying the Dask cluster and creating the necessary secrets, verify that the scheduler and workers are running:

kubectl get pods

You should see pods for the Dask scheduler and multiple workers, indicating that the cluster is successfully deployed.

Now, your Dask cluster is ready to work with the ds_download library for distributed processing of Sentinel-2 data.
