
Commit 6e9a655

Merge pull request #128 from SchmidtDSE/update-2025
Add support for 2025 data.
2 parents 5a6ddba + bd05b99 commit 6e9a655

4 files changed: +50 -3 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -179,4 +179,4 @@ Annotated version history:
 - `0.0.2`: License under BSD.
 - `0.0.1`: Initial release.
 
-The community files were last updated on Jan 7, 2025.
+The community files were last updated on Oct 31, 2025.

snapshot/README.md

Lines changed: 11 additions & 1 deletion
@@ -22,6 +22,13 @@ Note that, while provided as a service to the community, these Avro files and di
 ### Manual execution
 In order to build the Avro files yourself by requesting, joining, and indexing original upstream API data, you can simply execute `bash execute_all.sh` after local setup. These will build these files on S3 but they may be deployed to an SFTP server trivially.
 
+For updating just a specific year, use `get_year.sh`:
+```bash
+bash get_year.sh 2025
+```
+
+This will fetch species, catch, and haul data for the specified year and upload to S3. After running this, you'll need to run the remaining pipeline steps (render_flat.py, indexing, etc.) to complete the update.
+
 ## Local setup
 Local environment setup varies depending on how these files are used.
 

@@ -37,8 +44,11 @@ To perform manual execution, these scripts expect to use [AWS S3](https://aws.am
 - `AWS_ACCESS_KEY`: This is the access key used to upload completed payloads to AWS S3 or to request those data as part of distributed indexing and processing.
 - `AWS_ACCESS_SECRET`: This is the secret associated with the access key used to upload completed payloads to AWS S3 or to request those data as part of distributed indexing and processing.
 - `BUCKET_NAME`: This is the name of the bucket where completed uploads should be uploaded or requested within S3.
+- `SFTP_HOST`: The SFTP server hostname for deploying files to data.pyafscgap.org.
+- `SFTP_USER`: The SFTP username for authentication.
+- `SFTP_PASS`: The SFTP password for authentication.
 
-These may be set within `.bashrc` files or similar through `EXPORT` commands. Finally, these scripts expect [Coiled](https://www.coiled.io/) to perform distributed tasks.
+These may be set within `.bashrc` files or similar through `export` commands. A `setup_env.sh` file in the parent directory can also be used, though it should not be committed to version control. Finally, these scripts expect [Coiled](https://www.coiled.io/) to perform distributed tasks.
 
 ## Testing
 Unit tests can be executed by running `nose2` within the `snapshot` directory.
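
Reviewer note: the environment variables listed in this hunk could be checked up front before a long-running job starts. The following is a minimal sketch, not code from this repository. The helper name `missing_env_vars` is hypothetical, and whether every script needs all six variables is an assumption.

```python
import os
from typing import List, Mapping

# Variable names as listed in snapshot/README.md. Treating all six as
# required for every script is an assumption made for this sketch.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY",
    "AWS_ACCESS_SECRET",
    "BUCKET_NAME",
    "SFTP_HOST",
    "SFTP_USER",
    "SFTP_PASS",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Illustration with a hand-built mapping instead of the real environment.
example_env = {"BUCKET_NAME": "example-bucket", "SFTP_HOST": ""}
for name in missing_env_vars(example_env):
    print(f"Missing environment variable: {name}")
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to unit test.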

snapshot/get_year.sh

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+# Check if BUCKET_NAME is set
+if [ -z "$BUCKET_NAME" ]; then
+    echo "Error: BUCKET_NAME environment variable is not set"
+    echo "Please run: source ../../setup_env.sh"
+    exit 1
+fi
+
+# Check if year argument is provided
+if [ -z "$1" ]; then
+    echo "Error: Year argument is required"
+    echo "Usage: bash get_year.sh <year>"
+    echo "Example: bash get_year.sh 2025"
+    exit 1
+fi
+
+YEAR=$1
+
+echo "-- Getting species --"
+python request_source.py species "$BUCKET_NAME" species
+status=$?; [ "$status" -ne 0 ] && exit "$status"  # capture $? first: the [ test overwrites it
+
+echo "-- Getting catch --"
+python request_source.py catch "$BUCKET_NAME" catch
+status=$?; [ "$status" -ne 0 ] && exit "$status"
+
+echo "-- Getting $YEAR --"
+python request_source.py haul "$BUCKET_NAME" haul "$YEAR"
+status=$?; [ "$status" -ne 0 ] && exit "$status"
+
+echo "Done with getting $YEAR."
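
Reviewer note: the script runs three fetch steps in order and stops at the first failure. In shell, `$?` always holds the status of the most recent command, including a `[ ... ]` test, so the status has to be captured before it is tested. The sketch below shows the same fail-fast sequence in Python. The argument order mirrors the script, while `build_commands` and `run_all` are hypothetical names that are not part of this repository.

```python
import subprocess
import sys
from typing import List, Sequence


def build_commands(bucket: str, year: int) -> List[List[str]]:
    """Build the three request_source.py invocations used by get_year.sh."""
    script = "request_source.py"
    return [
        [sys.executable, script, "species", bucket, "species"],
        [sys.executable, script, "catch", bucket, "catch"],
        [sys.executable, script, "haul", bucket, "haul", str(year)],
    ]


def run_all(commands: Sequence[Sequence[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Stop here and hand back the failing step's own exit code.
            return completed.returncode
    return 0


# Illustration only: print the planned commands without executing them.
for planned in build_commands("example-bucket", 2025):
    print(" ".join(planned[1:]))
```

A caller could finish with `sys.exit(run_all(build_commands(bucket, year)))` so that the failing step's exit code reaches the surrounding shell or CI job.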

snapshot/render_flat.py

Lines changed: 6 additions & 1 deletion
@@ -448,7 +448,12 @@ def get_haul_record(year: int, survey: str, haul: int) -> typing.Optional[dict]:
         return None
 
     haul_records = get_avro(haul_loc)
-    assert len(haul_records) == 1
+    if len(haul_records) != 1:
+        raise ValueError(
+            f"Expected exactly 1 haul record but found "
+            f"{len(haul_records)} records for year={year}, "
+            f"survey={survey}, haul={haul}, file={haul_loc}"
+        )
     haul_record = haul_records[0]
     return haul_record
 
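
Reviewer note: replacing the `assert` with an explicit `ValueError` matters because Python removes `assert` statements when run with `-O`, and a bare assertion failure carries no context about which haul was affected. The sketch below isolates the same check in a small helper. `require_single` and the sample records are hypothetical and only illustrate the pattern.

```python
from typing import Sequence


def require_single(records: Sequence[dict], **context: object) -> dict:
    """Return the only record, or raise ValueError naming the lookup context."""
    if len(records) != 1:
        details = ", ".join(f"{key}={value}" for key, value in context.items())
        raise ValueError(
            f"Expected exactly 1 haul record but found "
            f"{len(records)} records for {details}"
        )
    return records[0]


# Hypothetical duplicate records, for illustration only.
duplicated = [{"haul": 7}, {"haul": 7}]
try:
    require_single(duplicated, year=2025, survey="example", haul=7)
except ValueError as error:
    message = str(error)
    print(message)
```

The error message names the year, survey, and haul, so a failed update run points directly at the record that needs inspection.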
