
Cost reduction measures (storage, data transfer, processing) #708

@spwoodcock

Description


S3 upload vs download

  • Upload should use S3 acceleration to improve user experience: Add S3 transfer acceleration for imagery uploads #699
  • Download should use CloudFront in front of S3 to save costs: it includes 1TB of free egress per month, and caching serves subsequent downloads without hitting S3.
  • We probably need two env vars to configure the upload and download S3 URLs, with a fallback to a single endpoint for both in basic use cases.
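The two-env-var idea with a single-endpoint fallback could be sketched like this (the variable names `S3_UPLOAD_ENDPOINT`, `S3_DOWNLOAD_ENDPOINT`, and `S3_ENDPOINT` are hypothetical, not the actual drone-tm config):

```python
import os

def s3_endpoints() -> tuple[str, str]:
    """Return (upload_url, download_url), falling back to a single endpoint.

    Env var names here are illustrative placeholders for the proposal above.
    """
    default = os.getenv("S3_ENDPOINT", "https://s3.us-east-1.amazonaws.com")
    upload = os.getenv("S3_UPLOAD_ENDPOINT", default)      # e.g. transfer-accelerated endpoint
    download = os.getenv("S3_DOWNLOAD_ENDPOINT", default)  # e.g. CloudFront distribution URL
    return upload, download
```

In basic self-hosted setups only `S3_ENDPOINT` would be set and both paths hit the same endpoint; on AWS the two overrides point at acceleration and CloudFront respectively.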

Processing vs Egress costs

  • It might seem smart to run processing on-prem to save costs, but if we host the imagery in AWS, the egress fees to download that imagery quickly add up, outweighing any savings.
  • We have two options:
    • All in AWS: S3 + burst compute capability (Karpenter). Overspill to on-prem for large projects if needed.
    • On-prem + Cloudflare R2: switch from AWS S3 to Cloudflare R2 for free egress. On-prem processing as planned.

Cost comparison

Assumptions:

  • 5TB imagery per month
  • Each dataset is uploaded and read twice for processing
  • Final outputs = 1TB of storage
  • Region us-east-1

All AWS approach:

  • S3 storage: ~$115
  • PUT/GET requests: $5-10
  • Processing data transfer (10TB reads): $100
  • Compute:
    • r7i.8xlarge Spot instance (32 vCPU, 256GB RAM)
    • 150hrs/month? ($0.70/hr): $105
  • UI imagery display (CloudFront): possibly free if under 1TB?
    Total: ~$320/month

On-prem + Cloudflare R2:

  • R2 storage: ~$75
  • Processing data transfer (10TB reads): $0
  • Compute: server has an up-front cost, but is effectively free if run by volunteers?
  • AWS overspill: this is actually a bad idea, as egress costs on the R2 --> AWS path will soon add up (the key here is that egress from R2 to the public internet is free, but to AWS it will cost). Use only in exceptional circumstances / for large projects.
    Total: $75/month
    (note this doesn't factor in any hardware cost, electricity cost, or potential overspill cost for AWS processing...)
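The two monthly estimates above can be sanity-checked with a back-of-envelope script. The unit prices are the rough figures assumed in this issue (us-east-1 S3 standard ~$0.023/GB-month, R2 ~$0.015/GB-month, $0.01/GB transfer), not live AWS/Cloudflare pricing:

```python
# Back-of-envelope check of the two monthly cost scenarios.
STORAGE_TB = 5
PROCESSING_READS_TB = 10  # each dataset is read twice for processing

aws = {
    "s3_storage": STORAGE_TB * 1000 * 0.023,                   # ~$115
    "requests": 7.5,                                           # midpoint of the $5-10 estimate
    "processing_transfer": PROCESSING_READS_TB * 1000 * 0.01,  # $0.01/GB reads
    "spot_compute": 150 * 0.70,                                # r7i.8xlarge Spot, 150 hrs
}

r2 = {
    "r2_storage": STORAGE_TB * 1000 * 0.015,  # ~$75
    "processing_transfer": 0.0,               # R2 egress to the public internet is free
    "compute": 0.0,                           # on-prem hardware/electricity not counted
}

print(f"All-AWS:      ~${sum(aws.values()):.0f}/month")
print(f"On-prem + R2: ~${sum(r2.values()):.0f}/month")
```

The AWS side lands a little above the quoted $320 depending on where the request costs fall in the $5-10 band, but the roughly 4x gap versus R2 holds either way.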

Archiving old data

  • Using an S3 lifecycle rule, we can archive old imagery to AWS Glacier:
    • 90 days: Glacier Instant Retrieval (cheaper than standard S3, essentially the same access speed, but high retrieval fees).
    • 180 days: Glacier Deep Archive (takes time to restore on request).
  • E.g. filter by projects/*/imagery/* for the raw drone imagery.
  • Glacier is cheap: e.g. 20TB archived ≈ $20/month in Deep Archive.
  • Retrieval takes 12-24hrs for Deep Archive (on request?), but this is fine for old projects.
    Update: just enable S3 Intelligent-Tiering to do this automatically! Script in the k8s-infra repo.
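A minimal sketch of the lifecycle rule described above. Note that S3 lifecycle filters are prefix-based, so the projects/*/imagery/* wildcard can't be expressed directly; this sketch uses a common projects/ prefix instead (a tag-based filter would be the alternative). The bucket name and rule ID are placeholders:

```shell
# Hypothetical bucket name; transition days match the proposal above.
aws s3api put-bucket-lifecycle-configuration \
  --bucket drone-tm-imagery \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-imagery",
      "Status": "Enabled",
      "Filter": {"Prefix": "projects/"},
      "Transitions": [
        {"Days": 90,  "StorageClass": "GLACIER_IR"},
        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
      ]
    }]
  }'
```

If we go the Intelligent-Tiering route instead, this rule becomes unnecessary: Intelligent-Tiering's optional archive access tiers handle the same transitions automatically.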

Example of cost per-project (full AWS)

Assuming:

  • 200GB of imagery total
  • 40 task areas × 5GB
  • Fast orthos per task + final ortho and 3d product generation

Costs:

  • S3 upload + accelerate: ~$1
  • UI display via S3+Cloudfront: ~$0
  • 2x S3 data transfer for processing: 400GB × $0.01/GB = $4
  • Compute costs for r7i.8xlarge to run ODM processing:
    • 40 task areas × ~10 minutes each = ~6.7 hours
    • Full 200GB dataset: ~24 hours
    • Total = ~30hrs = $21

Total: ~$26 per project
Say we scale this to 2TB of imagery on a large project: ~$260?
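The per-project numbers above scale roughly linearly, which can be captured in a small cost model. The function name and parameters are illustrative, and the unit prices are this issue's assumed figures, not live AWS pricing:

```python
# Rough per-project cost model for the full-AWS example above.
def project_cost(imagery_gb: float, task_areas: int) -> float:
    upload = 1.0 * (imagery_gb / 200)        # S3 upload + acceleration, ~$1 per 200GB
    transfer = 2 * imagery_gb * 0.01         # data read twice for processing at $0.01/GB
    hours = task_areas * (10 / 60) \
        + (imagery_gb / 200) * 24            # fast orthos per task + full-dataset run
    compute = hours * 0.70                   # r7i.8xlarge Spot rate
    return upload + transfer + compute

print(f"200GB / 40 tasks: ~${project_cost(200, 40):.0f}")
print(f"2TB / 400 tasks:  ~${project_cost(2000, 400):.0f}")
```

The 2TB case assumes task count scales 10x with the imagery; in practice larger task areas would shift cost from per-task overhead into the full-dataset run.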

Summary

  1. Uploads → S3 Transfer Acceleration
  2. UI downloads → CloudFront
  3. Processing should happen where the data lives
  4. On-prem processing only makes sense if storage is also outside AWS
  5. Cloudflare R2 is viable only if we fully commit to non-AWS compute
  6. Mixing clouds introduces egress costs that erase savings

Metadata

Labels

  • devops — Related to deployment or configuration
  • docs — Improvements or additions to documentation
  • priority:high — Should be addressed as a priority
  • repo:drone-tm
