This repository was archived by the owner on Dec 19, 2024. It is now read-only.

apache-beam-exploration

This repository has been archived.

Repo for exploring the use of Apache Beam as the orchestrator for OGDC recipes.

The repo currently follows along with the Apache Beam "getting started" materials: https://beam.apache.org/get-started/

System check

To start, run the built-in copy of the word-count example with the following command to confirm that Apache Beam is installed correctly.

python -m apache_beam.examples.wordcount_minimal \
  --input data/words.txt \
  --output data/wordcounts_official_example.txt

This outputs a file wordcounts_official_example.txt-00000-of-00001. Why doesn't it match the requested output file name?
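
A likely answer to the question above: Beam's `WriteToText` transform treats the `--output` value as a file-name prefix rather than a full file name, and appends a shard suffix so that parallel workers can each write their own file. The documented default suffix template is `-SSSSS-of-NNNNN`, where the run of `S` becomes the zero-padded shard index and the run of `N` the zero-padded shard count. The sketch below imitates that expansion in plain Python; `shard_file_name` is a hypothetical helper written for illustration, not Beam's own code.

```python
import re


def shard_file_name(prefix: str, shard: int, num_shards: int,
                    template: str = "-SSSSS-of-NNNNN") -> str:
    """Imitate Beam-style shard naming (illustrative only, not Beam's code).

    Each run of 'S' in the template is replaced by the zero-padded shard
    index, and each run of 'N' by the zero-padded shard count.
    """
    def expand(match: re.Match) -> str:
        run = match.group(0)
        value = shard if run[0] == "S" else num_shards
        return str(value).zfill(len(run))

    return prefix + re.sub(r"S+|N+", expand, template)


# One shard out of one, with the default template:
print(shard_file_name("data/wordcounts_official_example.txt", 0, 1))
# data/wordcounts_official_example.txt-00000-of-00001

# An empty template leaves the prefix untouched:
print(shard_file_name("data/wordcounts_official_example.txt", 0, 1, template=""))
# data/wordcounts_official_example.txt
```

If an exact output file name is wanted, the `WriteToText` documentation describes passing `shard_name_template=''`, which should produce a single file named exactly after the prefix. That costs parallel writes, so it is probably only sensible for small outputs like this one.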

Our own implementation of the example

python -m wordcount_example \
  --input data/words.txt \
  --output data/wordcounts_our_example.txt

The output file looks the same as the output file from the above example. There is significantly less log output, however. Why is that?
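
One plausible explanation for the difference in log output: the official `wordcount_minimal` module raises the root logger to `INFO` in its `__main__` block, while Python's root logger defaults to `WARNING`. If `wordcount_example` never changes the level (an assumption; the module is not shown here), Beam's `INFO` messages would be dropped. The sketch below shows the effect with the standard `logging` module only; `capture_log` is a hypothetical helper for illustration.

```python
import io
import logging


def capture_log(level: int) -> str:
    """Return what a logger set to `level` emits for one INFO and one WARNING message."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo-{level}")
    logger.propagate = False          # keep the demo isolated from the root logger
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.setLevel(level)
    logger.info("pipeline step finished")
    logger.warning("something looks off")
    return stream.getvalue()


# At WARNING (Python's default for the root logger), the INFO message is dropped.
print(repr(capture_log(logging.WARNING)))
# 'something looks off\n'

# At INFO, both messages appear.
print(repr(capture_log(logging.INFO)))
# 'pipeline step finished\nsomething looks off\n'
```

If this is the cause, adding `logging.getLogger().setLevel(logging.INFO)` before the pipeline runs in `wordcount_example` should bring the log output in line with the official example.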

Seal tag data spike

python -m seal_csv_to_gpkg

Useful resources