A suite of data mining, analytics, and visualization solutions to create an awesome dashboard for the Museum Barberini, Potsdam, helping them analyze and assess customer, advertising, and social media data!

This solution was originally developed as part of a Data Analytics project run in cooperation between the Museum Barberini (MB) and the Hasso Plattner Institute (HPI) in 2019/20 (see Credits below). The project comprises a data mining pipeline that runs regularly on a server and feeds several visualization dashboards hosted in a Power BI app. For more information, see also the following resources:
- System architecture slides
- Original project description (organizational) (German)
- Original project description (technical) (German)
- Final press release (German)
- Official presentation video (mirror on YouTube)
While this solution has been tailored to the individual needs of the MB and the overall project follows the structure of a majestic monolith, we think that it contains some features and components with great potential for reuse in other solutions. In particular, these features include the following highlights:
- **Gomus binding:** Connectors and scrapers for accessing various data sources from the museum management system go~mus. See `src/gomus` and the relevant documentation.
- **Apple App Store Reviews binding:** Scraper for fetching all user reviews of an app in the Apple App Store. See `src/apple_appstore` and the relevant documentation.
- **Visitor Prediction:** Machine learning (ML) based solution to predict the future number of museum visitors by extrapolating historic visitor data. See `src/visitor_prediction`. Credits go to Georg Tennigkeit (@georgt99).
- **Postal Code Cleansing:** Collection of heuristics to correct erroneous address information entered by humans. See `src/_utils/cleanse_data.py`. Credits go to Laura Holz (@lauraholz).
- **Power BI Crash Tests:** Load & crash tests for Power BI visualization reports. See https://github.com/LinqLover/pbi-crash-tests. Credits go to Christoph Thiede (@LinqLover).
Development is currently being continued on GitLab (private repo), but a mirror of the repository is available on GitHub.
If you are interested in reusing any part of our solution and have further questions, ideas, or bug reports, please do not hesitate to contact us!
- UNIX system

Please note that these instructions are optimized for Ubuntu/amd64. If you use a different configuration, you may need to adjust the toolchain installation (see `install_toolchain.sh`).
- Clone the repository using git:

  ```
  git clone https://github.com/Museum-Barberini/Barberini-Analytics.git
  ```

  For best convenience, clone it into `/root/barberini-analytics`.

- Copy the `secrets` folder (which is not part of the repository) into `/etc/barberini-analytics`. From the `secret_files` subdirectory, you may omit files denoted as caches in the documentation.

- Set up the toolchain. See `scripts/setup/install_toolchain.sh` for how to do this. If you use Ubuntu/amd64, you can run the script directly. Use `sudo` to run the commands!

- Set up the docker network and add the current user to the `docker` user group. Do not run this script with `sudo`!

  ```
  ./scripts/setup/setup_docker.sh
  ```

- Make sure to set the timezone of the machine to match the timezone of the gomus server:

  ```
  sudo timedatectl set-timezone Europe/Berlin
  ```
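As a quick sanity check for the timezone step above, you can compare two IANA timezone names by their current UTC offset (a sketch; the helper names `tz_offset` and `same_offset` are ours, not part of the repository):

```shell
#!/bin/sh
# Sketch: compare two timezones by their current UTC offset, so a mismatch
# between this machine and the gomus server is easy to spot.

# tz_offset prints the current UTC offset (e.g. +0200) of the given zone.
tz_offset() {
    TZ="$1" date +%z
}

# same_offset succeeds when both zones currently share the same UTC offset.
same_offset() {
    [ "$(tz_offset "$1")" = "$(tz_offset "$2")" ]
}

# Compare the machine's zone (falling back to UTC if timedatectl is absent)
# against the gomus server's assumed zone.
if same_offset "$(timedatectl show -p Timezone --value 2>/dev/null || echo UTC)" "Europe/Berlin"; then
    echo "timezone matches the gomus server"
else
    echo "timezone mismatch - run timedatectl as shown above"
fi
```

Note that this only compares offsets, which is sufficient to catch the common misconfiguration (a server left on UTC).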
To use TLS encryption, we recommend using Let's Encrypt and certbot. Installation:

```
./scripts/setup/setup_letsencrypt.sh
```

Alternatively, just make sure that the following files are present and up to date in `/var/barberini-analytics/db-data`:

- `server.crt`
- `server.key`

See configuration for more information.
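A small sketch to verify those certificate files before starting the database (`check_tls_files` is a hypothetical helper, not part of the repository; the default path is the one from the instructions):

```shell
#!/bin/sh
# check_tls_files verifies that both TLS files the database expects exist.
# The directory defaults to the documented path but can be overridden.
check_tls_files() {
    db_data="${1:-/var/barberini-analytics/db-data}"
    for f in server.crt server.key; do
        if [ ! -f "$db_data/$f" ]; then
            echo "missing: $db_data/$f"
            return 1
        fi
    done
    echo "ok"
}
```

Run it as `check_tls_files` (or `check_tls_files /some/other/dir`) before `make startup-db`.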
```
mkdir -p /var/barberini-analytics/db-data
make startup-db
```

Run `scripts/setup/setup_db.sh`.
This has not been tested for a long time!

```
ssh -C <oldremote> "docker exec barberini_analytics_db pg_dump -U postgres -C barberini | bzip2" | bunzip2 | docker exec -i barberini_analytics_db psql -U postgres
scp <oldremote>:/var/barberini-analytics/db-data/applied_migrations.txt /var/barberini-analytics/db-data/
```
```
./scripts/setup/setup_db_config.sh
```

Run `sudo scripts/setup/setup_cron.sh`.
If you cloned the repository into a different folder than `/root/barberini-analytics`, you may want to adapt the paths in `scripts/setup/.crontab` first.
If no crontab existed before, create it using `crontab -e`.
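Adapting those paths can be done with a simple `sed` rewrite (a sketch; `rewrite_crontab_paths` is our name, not a repository script):

```shell
#!/bin/sh
# rewrite_crontab_paths prints the crontab template with the default clone
# location replaced by the actual one.
rewrite_crontab_paths() {
    crontab_file="$1"
    new_path="$2"
    sed "s|/root/barberini-analytics|$new_path|g" "$crontab_file"
}
```

For example: `rewrite_crontab_paths scripts/setup/.crontab /home/me/barberini-analytics | crontab -`.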
These instructions assume that you want to use a custom GitLab CI runner:

- Go to the GitLab CI/CD settings of your repository (e.g., https://gitlab.com/Museum-Barberini/Barberini-Analytics/-/settings/ci_cd#js-runners-settings), locate "New project runner", and choose "Show runner installation and registration instructions" from the menu. Follow the instructions.

  Follow these instructions instead to install the runner via apt with an update path. See https://gitlab.com/gitlab-org/gitlab/-/issues/424394 for the inconsistency in the docs.

  Configuration:

  - To set up a new runner, use these options:
    - executor type: `shell`
  - To reuse the config of an existing runner, you may need to somehow cancel this dialog and reuse your existing `/etc/gitlab-runner/config.toml` file instead.

  Check whether the runner is displayed in the GitLab CI/CD settings.
- Add the gitlab-runner user to the docker group:

  ```
  sudo usermod -aG docker gitlab-runner
  ```

- Fix shell profile loading: Check whether `/home/gitlab-runner/.bash_logout` tries to clear the console, and if so, comment out the respective line. See https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information.

- Customize the runner config (`/etc/gitlab-runner/config.toml`) depending on your needs. This is what we use:

  ```diff
  -concurrent = 1
  +concurrent = 2
   # ...
   [[runners]]
  +  # WORKAROUND for permission issues. See: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2221
  +  # NOTE: Update the string to match the runner user's password. This requires the user to be in the sudo group.
  +  pre_clone_script = "echo <password_here> | sudo -S chown -f gitlab-runner:gitlab-runner -R /home/gitlab-runner/builds"
  ```

- Trigger a pipeline run to check whether the runner works.
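Before triggering a pipeline, a quick way to confirm that the docker group change above took effect (a sketch; `user_in_group` is a hypothetical helper built on the standard `id -nG`):

```shell
#!/bin/sh
# user_in_group succeeds when the given user is a member of the given group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if user_in_group gitlab-runner docker; then
    echo "gitlab-runner can talk to docker"
else
    echo "group change missing - re-run usermod, then restart the runner"
fi
```

Remember that group changes only apply to sessions started after `usermod`, so a restart of the runner service may be needed.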
See CONFIGURATION.md.
Run `scripts/setup/setup_dev.sh` to set up the development environment.

Have a look at our beautiful Makefile!

To access the luigi docker container, do:

```
make startup connect
```

Close the session by executing:

```
make shutdown
```

```
make docker-do do='make luigi-scheduler'
make luigi-frontend
```

This will also start a webserver on http://localhost:8000 where you can trace all running tasks.

```
make docker-do do='make luigi'
```

Or, if you want to run a specific task, in `make connect`:

```
make luigi-task LTASK=<task> LMODULE=<module> [LARGS=<task_args>] [MINIMAL=True]
```

If you see this error:

```
To modify production database manually, set BARBERINI_ANALYTICS_CONTEXT to the PRODUCTION constant.
```

then you want either to set the `BARBERINI_ANALYTICS_CONTEXT` environment variable to `PRODUCTION` or to run the task against a test database:

```
export POSTGRES_DB=barberini_test
```

In `make connect`:

```
make test
./scripts/tests/run_minimal_mining_pipeline.sh
```

- Windows 10
Download and install Power BI: https://powerbi.microsoft.com/downloads
See DOCUMENTATION.md.
See MAINTENANCE.md.
Authors: Laura Holz, Selina Reinhard, Leon Schmidt, Georg Tennigkeit, Christoph Thiede, Tom Wollnik (bachelor project BP-FN1 @ HPI, 2019/20).
Organizations: Hasso Plattner Institute, Potsdam; Museum Barberini; Hasso Plattner Foundation.
