Status: Closed
Labels: devops (related to deployment or configuration), docs (improvements or additions to documentation), priority:high (should be addressed as a priority), repo:drone-tm
Description
S3 upload vs download
- Upload should use S3 acceleration to improve user experience: Add S3 transfer acceleration for imagery uploads #699
- Download should use CloudFront in front of S3 to save costs. It has 1TB of free Egress per month and will cache subsequent downloads.
- We probably need two env vars to configure the upload and download S3 URLs, with a fallback to a single endpoint for both in basic use cases.
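The env var fallback described above could be sketched as follows; the variable names (`S3_UPLOAD_ENDPOINT`, `S3_DOWNLOAD_ENDPOINT`, `S3_ENDPOINT`) are assumptions for illustration, not existing drone-tm config:

```python
import os

def resolve_s3_endpoints(env=os.environ):
    """Return (upload_endpoint, download_endpoint) with a shared fallback.

    S3_UPLOAD_ENDPOINT / S3_DOWNLOAD_ENDPOINT override the single
    S3_ENDPOINT, which covers the basic one-endpoint use case.
    """
    default = env.get("S3_ENDPOINT", "https://s3.amazonaws.com")
    upload = env.get("S3_UPLOAD_ENDPOINT", default)
    download = env.get("S3_DOWNLOAD_ENDPOINT", default)
    return upload, download
```

In production the upload endpoint would point at the Transfer Acceleration hostname and the download endpoint at the CloudFront distribution.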
Processing vs Egress costs
- It might seem smart to run processing on-prem to save costs, but if we host the imagery in AWS, the Egress fees to download the imagery quickly add up, outweighing any savings.
- We have two options:
- All in AWS: S3 + burst compute capabilities (Karpenter). Overspill to on-prem for large projects if needed.
- On-prem + Cloudflare R2: switch from AWS S3 to Cloudflare R2 to have free Egress costs. On-prem processing as planned.
Cost comparison
Assumptions:
- 5TB imagery per month
- Each dataset is uploaded and read twice for processing
- Final outputs = 1TB of storage
- Region us-east-1
All AWS approach:
- S3 storage: ~$115
- PUT/GET requests: $5-10
- Processing data transfer (10TB reads): $100
- Compute:
  - r7i.8xlarge Spot instance (32 vCPUs, 256GB RAM)
  - 150hrs/month? ($0.70/h): $105
- UI imagery display (CloudFront): possibly free if under 1TB?
Total: $320/month
On-prem + Cloudflare R2:
- R2 storage: ~$75
- Processing data transfer (10TB reads): $0
- Compute: server up-front cost, but effectively free if volunteers?
- AWS overspill: this is actually a bad idea, as Egress costs on the R2 --> AWS network path will soon add up (the key here is that Egress from R2 to the public internet is free, but transfer into AWS compute will cost). Only in exceptional circumstances / large projects.
Total: $75/month
(note this doesn't factor any hardware cost, electricity cost, and potential overspill cost for AWS processing...)
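The two monthly totals above can be recomputed from the stated assumptions; all unit prices below are approximate us-east-1 list prices used for this back-of-envelope estimate, not quoted rates:

```python
# Rough recompute of the monthly cost comparison (decimal TB = 1000 GB).

def aws_monthly(storage_tb=5, reads_tb=10, compute_hours=150):
    s3_storage = storage_tb * 1000 * 0.023   # ~$0.023/GB-month S3 Standard
    requests = 7.5                           # midpoint of the $5-10 PUT/GET estimate
    transfer = reads_tb * 1000 * 0.01        # ~$0.01/GB for processing reads
    compute = compute_hours * 0.70           # r7i.8xlarge Spot at ~$0.70/h
    return s3_storage + requests + transfer + compute

def r2_monthly(storage_tb=5):
    return storage_tb * 1000 * 0.015         # ~$0.015/GB-month R2, $0 egress
```

With the defaults this gives roughly $328/month for all-AWS versus $75/month for R2, matching the ~$320 vs $75 figures above (hardware, electricity, and overspill still excluded).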
Archiving old data
- Using S3 lifecycle rule, we can archive old imagery in AWS glacier:
- 90 days: Glacier Instant Retrieval (cheaper than standard S3, basically the same access speed, but higher retrieval fees).
- 180 days: Glacier deep archive (takes time to recover on request).
- E.g. filter by `projects/*/imagery/*` for the raw drone imagery.
- Glacier is cheap. E.g. 20TB archived = ~$20/month in deep archive.
- Retrieval takes 12-24hrs for deep archive (on request?), but this is fine for old projects.
Update: just enable S3 intelligent tiering to do this automatically! Script in k8s-infra repo.
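A minimal sketch of the two-stage Glacier lifecycle rule described above, using boto3. The bucket name and prefix are placeholders; note that S3 lifecycle filters match on key *prefix* only (no wildcards), so the `projects/*/imagery/*` pattern would need the imagery keys to share a common prefix such as `projects/`:

```python
# Build a lifecycle configuration that transitions raw imagery to
# Glacier Instant Retrieval at 90 days and Deep Archive at 180 days.

def imagery_lifecycle_rules(prefix="projects/"):
    return {
        "Rules": [
            {
                "ID": "archive-raw-imagery",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }

# Applying it would look like this (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="drone-tm-imagery",  # placeholder bucket name
#     LifecycleConfiguration=imagery_lifecycle_rules(),
# )
```

Intelligent-Tiering (per the update above) replaces the fixed-day transitions with automatic access-pattern-based ones, at a small per-object monitoring fee.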
Example of cost per-project (full AWS)
Assuming:
- 200GB of imagery total
- 40 task areas × 5GB
- Fast orthos per task + final ortho and 3d product generation
Costs:
- S3 upload + accelerate: ~$1
- UI display via S3+Cloudfront: ~$0
- 2x S3 data transfer for processing: 400GB × $0.01/GB = $4
- Compute costs for r7i.8xlarge to run ODM processing:
  - 40 task areas × ~10 minutes each = ~6.5 hours
  - Full 200GB dataset: ~24 hours
  - Total = ~30hrs × $0.70/h = $21
Total: $26 per project
Say we scale this to 2TB of imagery on a large project: ~$260?
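The per-project estimate scales roughly linearly with imagery size, which is where the ~$260 figure for 2TB comes from. A sketch of that model, using the same assumed unit prices as above:

```python
# Per-project cost model: upload acceleration, 2x processing reads,
# and Spot compute, all scaled linearly from the 200GB baseline.

def project_cost(imagery_gb=200, spot_rate=0.70):
    upload_accel = 1.0 * imagery_gb / 200    # ~$1 per 200GB uploaded (rough)
    transfer = 2 * imagery_gb * 0.01         # dataset read twice at ~$0.01/GB
    compute_hours = 30 * imagery_gb / 200    # ~30h of r7i.8xlarge per 200GB
    compute = compute_hours * spot_rate
    return upload_accel + transfer + compute
```

This reproduces the $26 figure for 200GB and ~$260 for 2TB; in practice compute probably scales worse than linearly for the full-dataset ortho, so the large-project number is a lower bound.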
Summary
- Uploads → S3 Transfer Acceleration
- UI downloads → CloudFront
- Processing should happen where the data lives
- On-prem processing only makes sense if storage is also outside AWS
- Cloudflare R2 is viable only if we fully commit to non-AWS compute
- Mixing clouds introduces egress costs that erase savings