A suite of data mining, analytics, and visualization solutions to create an awesome dashboard for the Museum Barberini, Potsdam, helping them analyze and assess customer, advertising, and social media data!

This solution was originally developed as part of a Data Analytics project run in cooperation between the Museum Barberini (MB) and the Hasso Plattner Institute (HPI) in 2019/20 (see Credits below). The project comprises a data mining pipeline that runs regularly on a server and feeds several visualization dashboards hosted in a Power BI app. For more information, see also the following resources:
- System architecture slides
- Original project description (organizational) (German)
- Original project description (technical) (German)
- Final press release (German)
- Official presentation video (mirror on YouTube)
While this solution has been tailored to the individual needs of the MB and the overall project follows the structure of a majestic monolith, we think that it contains some features and components with great potential for reuse in other solutions. In particular, these features include the following highlights:
- **Gomus binding:** Connectors and scrapers for accessing various data sources from the museum management system go~mus. See `src/gomus` and the relevant documentation.
- **Apple App Store Reviews binding:** Scraper for fetching all user reviews of an app in the Apple App Store. See `src/apple_appstore` and the relevant documentation.
- **Visitor Prediction:** Machine learning (ML) based solution to predict the future number of museum visitors by extrapolating historic visitor data. See `src/visitor_prediction`. Credits go to Georg Tennigkeit (@georgt99).
- **Postal Code Cleansing:** Collection of heuristics to correct erroneous address information entered by humans. See `src/_utils/cleanse_data.py`. Credits go to Laura Holz (@lauraholz).
- **Power BI Crash Tests:** Load & crash tests for Power BI visualization reports. See https://github.com/LinqLover/pbi-crash-tests. Credits go to Christoph Thiede (@LinqLover).
Development is currently being continued on GitLab (private repo), but a mirror of the repository is available on GitHub.
If you are interested in reusing any part of our solution and have further questions, ideas, or bug reports, please do not hesitate to contact us!
- UNIX system

Please note that these instructions are optimized for Ubuntu/amd64. If you use a different configuration, you may need to adjust the toolchain installation (see `install_toolchain.sh`).
- Clone the repository using git:

  ```
  git clone https://github.com/Museum-Barberini/Barberini-Analytics.git
  ```

  For best convenience, clone it into `/root/barberini-analytics`.

- Copy the `secrets` folder (which is not part of the repository) into `/etc/barberini-analytics`. From the `secret_files` subdirectory, you may omit files denoted as caches in the documentation.

- Set up the toolchain. See `scripts/setup/install_toolchain.sh` for how to do this. If you use Ubuntu/amd64, you can run the script directly. Use `sudo` to run the commands!

- Set up the docker network and add the current user to the `docker` user group. Do not run this script with `sudo`!

  ```
  ./scripts/setup/setup_docker.sh
  ```

- Make sure to set the timezone of the machine to match the timezone of the gomus server:

  ```
  sudo timedatectl set-timezone Europe/Berlin
  ```
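As a quick sanity check for the timezone step above, you can compare two IANA timezone names by their current UTC offset (a sketch; the helper names `tz_offset` and `same_offset` are ours, not part of the repository):

```shell
#!/bin/sh
# Sketch: compare two timezones by their current UTC offset, so a mismatch
# between this machine and the gomus server is easy to spot.

# tz_offset prints the current UTC offset (e.g. +0200) of the given zone.
tz_offset() {
    TZ="$1" date +%z
}

# same_offset succeeds when both zones currently share the same UTC offset.
same_offset() {
    [ "$(tz_offset "$1")" = "$(tz_offset "$2")" ]
}

# Compare the machine's zone (falling back to UTC if timedatectl is absent)
# against the gomus server's assumed zone.
if same_offset "$(timedatectl show -p Timezone --value 2>/dev/null || echo UTC)" "Europe/Berlin"; then
    echo "timezone matches the gomus server"
else
    echo "timezone mismatch - run timedatectl as shown above"
fi
```

Note that this only compares offsets, which is sufficient to catch the common misconfiguration (a server left on UTC).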
To use TLS encryption, we recommend using Let's Encrypt and certbot. Installation:

```
./scripts/setup/setup_letsencrypt.sh
```

Alternatively, just make sure that the following files are present and up to date in `/var/barberini-analytics/db-data`:

- `server.crt`
- `server.key`

See configuration for more information.
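A small sketch to verify those certificate files before starting the database (`check_tls_files` is a hypothetical helper, not part of the repository; the default path is the one from the instructions):

```shell
#!/bin/sh
# check_tls_files verifies that both TLS files the database expects exist.
# The directory defaults to the documented path but can be overridden.
check_tls_files() {
    db_data="${1:-/var/barberini-analytics/db-data}"
    for f in server.crt server.key; do
        if [ ! -f "$db_data/$f" ]; then
            echo "missing: $db_data/$f"
            return 1
        fi
    done
    echo "ok"
}
```

Run it as `check_tls_files` (or `check_tls_files /some/other/dir`) before `make startup-db`.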
```
mkdir -p /var/barberini-analytics/db-data
make startup-db
```

Run `scripts/setup/setup_db.sh`.
This has not been tested for a long time!

```
ssh -C <oldremote> "docker exec barberini_analytics_db pg_dump -U postgres -C barberini | bzip2" | bunzip2 | docker exec -i barberini_analytics_db psql -U postgres
scp <oldremote>:/var/barberini-analytics/db-data/applied_migrations.txt /var/barberini-analytics/db-data/
```
```
./scripts/setup/setup_db_config.sh
```

Run `sudo scripts/setup/setup_cron.sh`.
If you cloned the repository into a different folder than `/root/barberini-analytics`, you may want to adapt the paths in `scripts/setup/.crontab` first.
If no crontab existed before, create it using `crontab -e`.
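Adapting those paths can be done with a simple `sed` rewrite (a sketch; `rewrite_crontab_paths` is our name, not a repository script):

```shell
#!/bin/sh
# rewrite_crontab_paths prints the crontab template with the default clone
# location replaced by the actual one.
rewrite_crontab_paths() {
    crontab_file="$1"
    new_path="$2"
    sed "s|/root/barberini-analytics|$new_path|g" "$crontab_file"
}
```

For example: `rewrite_crontab_paths scripts/setup/.crontab /home/me/barberini-analytics | crontab -`.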
These instructions assume that you want to use a custom GitLab CI runner:

- Go to the GitLab CI/CD settings of your repository (e.g., https://gitlab.com/Museum-Barberini/Barberini-Analytics/-/settings/ci_cd#js-runners-settings), locate "New project runner", and choose "Show runner installation and registration instructions" from the menu. Follow the instructions.

  Follow these instructions instead to install the runner via apt with an update path. See https://gitlab.com/gitlab-org/gitlab/-/issues/424394 for the inconsistency in the docs.

  Configuration:

  - To set up a new runner, use these options:
    - executor type: `shell`
  - To reuse the config of an existing runner, you may need to somehow cancel this dialog and reuse your existing `/etc/gitlab-runner/config.toml` file instead.

  Check whether the runner is displayed in the GitLab CI/CD settings.
- Add the gitlab-runner user to the docker group:

  ```
  sudo usermod -aG docker gitlab-runner
  ```

- Fix shell profile loading: Check whether `/home/gitlab-runner/.bash_logout` tries to clear the console, and if so, comment out the respective line. See https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information.

- Customize the runner config (`/etc/gitlab-runner/config.toml`) depending on your needs. This is what we use:

  ```diff
  -concurrent = 1
  +concurrent = 2
   # ...
   [[runners]]
  +  # WORKAROUND for permission issues. See: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2221
  +  # NOTE: Update the string to match the runner user's password. This requires the user to be in the sudo group.
  +  pre_clone_script = "echo <password_here> | sudo -S chown -f gitlab-runner:gitlab-runner -R /home/gitlab-runner/builds"
  ```

- Trigger a pipeline run to check whether the runner works.
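Before triggering a pipeline, a quick way to confirm that the docker group change above took effect (a sketch; `user_in_group` is a hypothetical helper built on the standard `id -nG`):

```shell
#!/bin/sh
# user_in_group succeeds when the given user is a member of the given group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if user_in_group gitlab-runner docker; then
    echo "gitlab-runner can talk to docker"
else
    echo "group change missing - re-run usermod, then restart the runner"
fi
```

Remember that group changes only apply to sessions started after `usermod`, so a restart of the runner service may be needed.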
See CONFIGURATION.md.
Run `scripts/setup/setup_dev.sh` to set up the development environment.

Have a look at our beautiful Makefile!

To access the luigi docker container, do:

```
make startup connect
```

Close the session by executing:

```
make shutdown
```

```
make docker-do do='make luigi-scheduler'
make luigi-frontend
```

This will also start a webserver on http://localhost:8000 where you can trace all running tasks.

```
make docker-do do='make luigi'
```

Or, if you want to run a specific task, in `make connect`:

```
make luigi-task LTASK=<task> LMODULE=<module> [LARGS=<task_args>] [MINIMAL=True]
```

If you see this error:

```
To modify production database manually, set BARBERINI_ANALYTICS_CONTEXT to the PRODUCTION constant.
```

then you want either to set the `BARBERINI_ANALYTICS_CONTEXT` environment variable to `PRODUCTION` or to run the task against a test database:

```
export POSTGRES_DB=barberini_test
```

In `make connect`:

```
make test
./scripts/tests/run_minimal_mining_pipeline.sh
```

- Windows 10
Download and install Power BI: https://powerbi.microsoft.com/downloads
See DOCUMENTATION.md.
See MAINTENANCE.md.
Authors: Laura Holz, Selina Reinhard, Leon Schmidt, Georg Tennigkeit, Christoph Thiede, Tom Wollnik (bachelor project BP-FN1 @ HPI, 2019/20).
Organizations: Hasso Plattner Institute, Potsdam; Museum Barberini; Hasso Plattner Foundation.
