szabilukacs/Romania-weather

Romania-weather

Python CI Docker Python Postgres

Tests Code style: black Lint License: MIT


📌 Overview

Romania-weather is a data engineering learning project for practicing modern ETL pipelines, with cloud deployment planned as a later extension.

The goal is to simulate a real-world scenario:

  • Extract weather data from the Meteostat API (historical) and the OpenWeatherMap API (current).
  • Store both historical and fresh/current weather data from all meteorological stations in Romania.
  • Load the data into a PostgreSQL database.
  • Provide an interactive Streamlit dashboard to explore, summarize, and visualize the results.

This project demonstrates ETL pipelines, containerization, and visualization, with workflow orchestration planned as a next step.


🎯 Motivation & Learning Goals

  • Learn how to build end-to-end ETL pipelines.
  • Practice containerization with Docker.
  • Work with relational databases (PostgreSQL) in a production-like setup.
  • Explore Airflow for workflow orchestration.
  • Experiment with PySpark for scalable data processing.
  • Eventually deploy to cloud platforms (AWS/GCP/Azure).

This project is not just about weather data – it’s about practicing the core skills of a data engineer.


🚀 Features

  • Historical data ingestion → Load all past weather data for Romania’s stations.
  • Incremental updates → Periodically fetch and append new weather observations.
  • PostgreSQL integration → Data stored in structured, queryable format.
  • Streamlit dashboard → Visualize trends, statistics, and comparisons.
  • Dockerized setup → Reproducible environment for local development.
  • 🔜 Airflow DAGs → Orchestrate historical + current data pipelines.
  • 🔜 PySpark refactor → Transform data with distributed processing.
  • 🔜 Cloud deployment → Run pipelines and dashboard in the cloud.
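
The incremental update idea above can be sketched as a per-station watermark: only observations newer than the latest timestamp already stored are appended. This is a minimal illustration, not the project's actual code. The function name `select_new_observations`, the row layout, and the station IDs are all hypothetical.

```python
from datetime import datetime


def select_new_observations(observations, last_loaded):
    """Keep only rows newer than the latest timestamp stored per station.

    `last_loaded` maps station_id -> latest timestamp already in the database.
    Stations missing from the mapping are treated as never loaded.
    """
    fresh = []
    for row in observations:
        watermark = last_loaded.get(row["station_id"])
        if watermark is None or row["time"] > watermark:
            fresh.append(row)
    return fresh


# Hypothetical sample data; real station IDs and columns may differ.
last_loaded = {"ST001": datetime(2024, 5, 1, 12)}
observations = [
    {"station_id": "ST001", "time": datetime(2024, 5, 1, 12), "temp": 14.2},
    {"station_id": "ST001", "time": datetime(2024, 5, 1, 13), "temp": 15.0},
    {"station_id": "ST002", "time": datetime(2024, 5, 1, 13), "temp": 11.3},
]

new_rows = select_new_observations(observations, last_loaded)
print(len(new_rows))
```

In a real pipeline the watermark would more likely come from a `MAX(time)` query per station, which keeps repeated runs from inserting duplicates.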

🏗️ Architecture

  • main.py → One-time job for historical data.
  • current_data.py → Scheduled job for fresh data.
  • dashboard/app.py → Streamlit dashboard to explore results.

🖼️ Screenshots (Streamlit App)

Below are a few screenshots of the Streamlit dashboard.
These demonstrate how historical and current weather data are displayed, compared, and summarized.

👉 Note: In the constants.py file you can configure which regions are included.
For the examples below, only the Hargita region was selected.
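
A region filter of this kind might look roughly like the sketch below. The actual contents of constants.py are not shown here, so the name `SELECTED_REGIONS`, the `filter_stations` helper, and the station records are assumptions made for illustration only.

```python
# Hypothetical shape of a region setting; the real constants.py may differ.
SELECTED_REGIONS = {"Hargita"}


def filter_stations(stations, regions=SELECTED_REGIONS):
    """Return only the stations whose region is in the configured set."""
    return [station for station in stations if station["region"] in regions]


# Made-up station records used purely to show the filtering step.
stations = [
    {"id": "ST001", "name": "Station A", "region": "Hargita"},
    {"id": "ST002", "name": "Station B", "region": "Cluj"},
]

selected = filter_stations(stations)
print([station["id"] for station in selected])
```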


🌦 Station Selection Dropdown

Map of Romania showing the selected meteorological stations.

[Screenshot: Map with current data]


📈 Historical vs Current Weather

Monthly comparison of past trends with the latest fetched data.

[Screenshot: Historical vs Current]
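
One way to read the monthly comparison is as a baseline plus a deviation: average each calendar month over all historical years, then subtract that baseline from the latest value. The sketch below assumes this interpretation. The dashboard's real calculation is not documented here, and the function names and sample temperatures are invented for the example.

```python
from collections import defaultdict
from statistics import mean


def monthly_baseline(history):
    """Average temperature per calendar month across all historical years.

    `history` maps (year, month) -> mean temperature for that month.
    """
    buckets = defaultdict(list)
    for (_year, month), temp in history.items():
        buckets[month].append(temp)
    return {month: mean(values) for month, values in buckets.items()}


def deviation_from_baseline(month, current_temp, baseline):
    """Difference between the current value and the historical monthly mean."""
    return current_temp - baseline[month]


# Invented sample values, not real measurements.
history = {(2021, 7): 18.0, (2022, 7): 19.0, (2023, 7): 20.0, (2023, 1): -4.0}

baseline = monthly_baseline(history)
diff = deviation_from_baseline(7, 21.5, baseline)
print(baseline[7], diff)
```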


📊 Aggregated Statistics

Summaries such as averages, min/max values, and long-term weather patterns.

[Screenshot: Aggregated Stats]
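
Summaries like these are typically a single `GROUP BY` query. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The table name `weather` and its columns are assumptions, not the project's actual schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the aggregate query itself is
# standard SQL that PostgreSQL also accepts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station_id TEXT, day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        # Invented sample rows, not real measurements.
        ("ST001", "2024-01-01", -5.0),
        ("ST001", "2024-07-01", 21.0),
        ("ST002", "2024-01-01", -2.0),
        ("ST002", "2024-07-01", 24.0),
    ],
)

# Average, minimum, and maximum temperature per station.
summary = conn.execute(
    """
    SELECT station_id, AVG(temp), MIN(temp), MAX(temp)
    FROM weather
    GROUP BY station_id
    ORDER BY station_id
    """
).fetchall()

for station_id, avg_temp, min_temp, max_temp in summary:
    print(station_id, avg_temp, min_temp, max_temp)
```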

🔧 Tech Stack

  • Python 3.11+
  • PostgreSQL 15
  • Streamlit (dashboard)
  • Docker & Docker Compose (containers)
  • Airflow (optional, orchestration)
  • PySpark (optional, scalable ETL)

📅 Roadmap

  • Add data quality checks before loading
  • Deploy to a cloud environment (AWS / GCP / Azure)
  • Integrate with PySpark for scalable transformations
  • Add Airflow DAGs for production scheduling
  • Extend dashboard with forecasting models
  • Add user authentication for dashboard access

🛠 Development Notes

  • Built with Python, PostgreSQL, and Streamlit
  • Uses the Meteostat API for historical data and the OpenWeatherMap API for current weather data
  • ETL pipeline includes:
    • Extract: fetch weather data for all stations in Romania
    • Transform: clean & validate raw data
    • Load: insert into Postgres with COPY and INSERT strategies
  • Dockerized setup for reproducible local environment
  • Airflow (Docker-based) for scheduling and orchestration (optional; DAGs are still on the roadmap)
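
The Transform and Load steps above can be sketched as a validation pass followed by building a CSV buffer in the shape PostgreSQL's `COPY ... FROM STDIN WITH (FORMAT csv)` expects. This is a minimal sketch under stated assumptions: the column names, the helper names, and the plausibility bounds of -60 to 60 °C are all hypothetical, and the database call is shown only as a comment because the project's driver is not confirmed here.

```python
import csv
import io


def clean_rows(raw_rows, min_temp=-60.0, max_temp=60.0):
    """Drop rows without a station or timestamp, or with implausible temperatures.

    The bounds are illustrative assumptions. A missing temperature is kept
    and later written as an empty field.
    """
    cleaned = []
    for row in raw_rows:
        if not row.get("station_id") or not row.get("time"):
            continue
        temp = row.get("temp")
        if temp is not None and not (min_temp <= temp <= max_temp):
            continue
        cleaned.append(row)
    return cleaned


def to_copy_buffer(rows, columns=("station_id", "time", "temp")):
    """Serialize rows as CSV; empty unquoted fields are read as NULL by COPY."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if row.get(col) is None else row[col] for col in columns])
    buffer.seek(0)
    return buffer


# Invented sample rows covering the valid and rejected cases.
raw = [
    {"station_id": "ST001", "time": "2024-05-01 12:00", "temp": 14.2},
    {"station_id": "ST001", "time": "2024-05-01 13:00", "temp": None},
    {"station_id": "", "time": "2024-05-01 13:00", "temp": 10.0},
    {"station_id": "ST002", "time": "2024-05-01 13:00", "temp": 999.0},
]

cleaned = clean_rows(raw)
payload = to_copy_buffer(cleaned).getvalue()
print(payload)

# With psycopg2 (an assumption about the driver), the buffer could be loaded via:
#   cursor.copy_expert(
#       "COPY weather (station_id, time, temp) FROM STDIN WITH (FORMAT csv)", buffer
#   )
```

COPY is generally the faster choice for the one-time historical load, while plain INSERT statements are often simpler for the small batches produced by incremental updates.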

🎓 Learning Outcomes

This project is primarily educational, focusing on Data Engineering concepts:

  • Designing ETL pipelines with Python & SQL
  • Working with API data ingestion (Meteostat, OpenWeatherMap)
  • Practicing data validation & transformation
  • Using Docker & Docker Compose for environment management
  • Setting up Airflow for scheduling workflows
  • Building dashboards with Streamlit for visualization
  • Preparing for Cloud & Big Data tools (PySpark, AWS/GCP/Azure)

About

Fetches and visualizes historical and current weather data from Romanian meteorological stations in an interactive dashboard.
