
Cost reduction measures (storage, data transfer, processing) #708

@spwoodcock

Description


S3 upload vs download

  • Upload should use S3 acceleration to improve user experience: Add S3 transfer acceleration for imagery uploads #699
  • Download should use CloudFront in front of S3 to save costs: it includes 1TB of free egress per month, and caching serves subsequent downloads without hitting S3.
  • We probably need two env vars to configure the upload and download S3 URLs, with a fallback to a single endpoint for both in basic use cases.
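The two-env-var idea with a single-endpoint fallback could be sketched like this (the variable names `S3_UPLOAD_ENDPOINT`, `S3_DOWNLOAD_ENDPOINT`, and `S3_ENDPOINT` are hypothetical, not the actual drone-tm config):

```python
import os

def s3_endpoints() -> tuple[str, str]:
    """Return (upload_url, download_url), falling back to a single endpoint.

    Env var names here are illustrative placeholders for the proposal above.
    """
    default = os.getenv("S3_ENDPOINT", "https://s3.us-east-1.amazonaws.com")
    upload = os.getenv("S3_UPLOAD_ENDPOINT", default)      # e.g. transfer-accelerated endpoint
    download = os.getenv("S3_DOWNLOAD_ENDPOINT", default)  # e.g. CloudFront distribution URL
    return upload, download
```

In basic self-hosted setups only `S3_ENDPOINT` would be set and both paths hit the same endpoint; on AWS the two overrides point at acceleration and CloudFront respectively.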

Processing vs Egress costs

  • It might seem smart to run processing on-prem to save costs, but if we host the imagery in AWS, the egress fees to download that imagery quickly add up, outweighing any savings.
  • We have two options:
    • All in AWS: S3 + burst compute capability (Karpenter). Overspill to on-prem for large projects if needed.
    • On-prem + Cloudflare R2: switch from AWS S3 to Cloudflare R2 for free egress. On-prem processing as planned.

Cost comparison

Assumptions:

  • 5TB imagery per month
  • Each dataset is uploaded and read twice for processing
  • Final outputs = 1TB of storage
  • Region us-east-1

All AWS approach:

  • S3 storage: ~$115
  • PUT/GET requests: $5-10
  • Processing data transfer (10TB reads): $100
  • Compute:
    • r7i.8xlarge Spot instance (32 vCPU, 256GB RAM)
    • 150hrs/month? ($0.70/hr): $105
  • UI imagery display (CloudFront): possibly free if under 1TB?
    Total: ~$320/month

On-prem + Cloudflare R2:

  • R2 storage: ~$75
  • Processing data transfer (10TB reads): $0
  • Compute: server has an up-front cost, but is effectively free if run by volunteers?
  • AWS overspill: this is actually a bad idea, as egress costs on the R2 --> AWS path will soon add up (the key here is that egress from R2 to the public internet is free, but to AWS it will cost). Use only in exceptional circumstances / for large projects.
    Total: $75/month
    (note this doesn't factor in any hardware cost, electricity cost, or potential overspill cost for AWS processing...)
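The two monthly estimates above can be sanity-checked with a back-of-envelope script. The unit prices are the rough figures assumed in this issue (us-east-1 S3 standard ~$0.023/GB-month, R2 ~$0.015/GB-month, $0.01/GB transfer), not live AWS/Cloudflare pricing:

```python
# Back-of-envelope check of the two monthly cost scenarios.
STORAGE_TB = 5
PROCESSING_READS_TB = 10  # each dataset is read twice for processing

aws = {
    "s3_storage": STORAGE_TB * 1000 * 0.023,                   # ~$115
    "requests": 7.5,                                           # midpoint of the $5-10 estimate
    "processing_transfer": PROCESSING_READS_TB * 1000 * 0.01,  # $0.01/GB reads
    "spot_compute": 150 * 0.70,                                # r7i.8xlarge Spot, 150 hrs
}

r2 = {
    "r2_storage": STORAGE_TB * 1000 * 0.015,  # ~$75
    "processing_transfer": 0.0,               # R2 egress to the public internet is free
    "compute": 0.0,                           # on-prem hardware/electricity not counted
}

print(f"All-AWS:      ~${sum(aws.values()):.0f}/month")
print(f"On-prem + R2: ~${sum(r2.values()):.0f}/month")
```

The AWS side lands a little above the quoted $320 depending on where the request costs fall in the $5-10 band, but the roughly 4x gap versus R2 holds either way.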

Archiving old data

  • Using an S3 lifecycle rule, we can archive old imagery to AWS Glacier:
    • 90 days: Glacier Instant Retrieval (cheaper than standard S3, essentially the same access speed, but high retrieval fees).
    • 180 days: Glacier Deep Archive (takes time to restore on request).
  • E.g. filter by projects/*/imagery/* for the raw drone imagery.
  • Glacier is cheap: e.g. 20TB archived ≈ $20/month in Deep Archive.
  • Retrieval takes 12-24hrs for Deep Archive (on request?), but this is fine for old projects.
    Update: just enable S3 Intelligent-Tiering to do this automatically! Script in the k8s-infra repo.
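A minimal sketch of the lifecycle rule described above. Note that S3 lifecycle filters are prefix-based, so the projects/*/imagery/* wildcard can't be expressed directly; this sketch uses a common projects/ prefix instead (a tag-based filter would be the alternative). The bucket name and rule ID are placeholders:

```shell
# Hypothetical bucket name; transition days match the proposal above.
aws s3api put-bucket-lifecycle-configuration \
  --bucket drone-tm-imagery \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-imagery",
      "Status": "Enabled",
      "Filter": {"Prefix": "projects/"},
      "Transitions": [
        {"Days": 90,  "StorageClass": "GLACIER_IR"},
        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
      ]
    }]
  }'
```

If we go the Intelligent-Tiering route instead, this rule becomes unnecessary: Intelligent-Tiering's optional archive access tiers handle the same transitions automatically.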

Example of cost per-project (full AWS)

Assuming:

  • 200GB of imagery total
  • 40 task areas × 5GB
  • Fast orthos per task + final ortho and 3d product generation

Costs:

  • S3 upload + accelerate: ~$1
  • UI display via S3+Cloudfront: ~$0
  • 2x S3 data transfer for processing: 400GB × $0.01/GB = $4
  • Compute costs for r7i.8xlarge to run ODM processing:
    • 40 task areas × ~10 minutes each = ~6.7 hours
    • Full 200GB dataset: ~24 hours
    • Total = ~30hrs = $21

Total: ~$26 per project
Say we scale this to 2TB of imagery on a large project: ~$260?
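The per-project numbers above scale roughly linearly, which can be captured in a small cost model. The function name and parameters are illustrative, and the unit prices are this issue's assumed figures, not live AWS pricing:

```python
# Rough per-project cost model for the full-AWS example above.
def project_cost(imagery_gb: float, task_areas: int) -> float:
    upload = 1.0 * (imagery_gb / 200)        # S3 upload + acceleration, ~$1 per 200GB
    transfer = 2 * imagery_gb * 0.01         # data read twice for processing at $0.01/GB
    hours = task_areas * (10 / 60) \
        + (imagery_gb / 200) * 24            # fast orthos per task + full-dataset run
    compute = hours * 0.70                   # r7i.8xlarge Spot rate
    return upload + transfer + compute

print(f"200GB / 40 tasks: ~${project_cost(200, 40):.0f}")
print(f"2TB / 400 tasks:  ~${project_cost(2000, 400):.0f}")
```

The 2TB case assumes task count scales 10x with the imagery; in practice larger task areas would shift cost from per-task overhead into the full-dataset run.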

Summary

  1. Uploads → S3 Transfer Acceleration
  2. UI downloads → CloudFront
  3. Processing should happen where the data lives
  4. On-prem processing only makes sense if storage is also outside AWS
  5. Cloudflare R2 is viable only if we fully commit to non-AWS compute
  6. Mixing clouds introduces egress costs that erase savings

Metadata

Labels

  • devops — Related to deployment or configuration
  • docs — Improvements or additions to documentation
  • priority:high — Should be addressed as a priority
  • repo:drone-tm
