189 commits
10fa806
Start a modified document describing the Dataset graph pattern.
mbjones Mar 19, 2021
456bb45
Port the essential services from Docker to Kubernetes
ThomasThelen Mar 20, 2021
683d625
Add the Dockerfiles for building the K8 requirements and add a Readme…
ThomasThelen Mar 22, 2021
7e7e402
Run 2to3 -w on all .py files
amoeba Feb 9, 2021
fc45196
Add a basic Dockerfile to run tests
amoeba Feb 9, 2021
d85e04a
Upgrade for Python3
amoeba Feb 10, 2021
70e63dc
Add py.bak files to gitignore
amoeba Feb 10, 2021
ff27fdf
Fix dockerfile merge conflicts
ThomasThelen Mar 23, 2021
b414825
Add Pipfile for worker service
amoeba Mar 2, 2021
75fd8e5
Add the docker folder to the helm ignore
ThomasThelen Mar 23, 2021
9ef251e
Merge pull request #22 from DataONEorg/feature_kubernetes_deployment
amoeba Mar 24, 2021
2fd8b20
Add docs for how to install d1lod on macOS+pyenv
amoeba Mar 25, 2021
e385b02
Port K8 to Python 3
ThomasThelen Mar 25, 2021
7142084
Merge branch 'develop' into feature_1_python3
ThomasThelen Mar 25, 2021
ff7a02b
Add instructions for adding authentication to the SPARQL endpoint
ThomasThelen Mar 25, 2021
0077585
Merge pull request #25 from DataONEorg/feature_1_python3
amoeba Mar 26, 2021
d598ad8
Merge pull request #26 from DataONEorg/feature_auth_sparql
amoeba Mar 26, 2021
b7ef1bc
WIP: Partially update mappings document
amoeba Apr 8, 2021
8e6cda2
Finish first draft of mappings documents
amoeba Apr 10, 2021
e018a08
Remove 2to3.txt from Python 2 to 3 migration
amoeba Apr 13, 2021
d9e67c1
Remove unneeded void.ttl file from d1lod folder
amoeba Apr 13, 2021
05741d9
Update d1lod package readme and setup.py
amoeba Apr 13, 2021
8ca22e0
Tweak mappings doc and add full example
amoeba Apr 21, 2021
da57b25
Fix URL in mappings doc
amoeba Apr 22, 2021
4cfebac
Create initial prototype of web app
amoeba Apr 22, 2021
109bc54
WIP Begin overhaul of classes for Slinky
amoeba Apr 24, 2021
8378a34
Merge branch 'feature_web_app' into feature_update_graph_pattern
amoeba Apr 24, 2021
478b62f
Completely refactor d1lod package
amoeba Apr 30, 2021
46009a7
Remove top-level makefile
amoeba Apr 30, 2021
062c562
Convert web front-end to use a SlinkyClient
amoeba Apr 30, 2021
2fca953
Remove unused variable in SparqlTripleStore.get_ua_string
amoeba Apr 30, 2021
a782ff9
Fix broken URIs in eml_processor
amoeba Apr 30, 2021
e1dd51f
Add note about Exceptions in d1lod readme
amoeba May 1, 2021
85163f6
Wrap up last bit of work for full EML processing
amoeba May 4, 2021
fd6f7cb
Create a Blazegraph connector to replace Virtuoso
amoeba May 4, 2021
ce15120
Alignment graphic for OBOE/SSN-EXT/schema.org.
mbjones May 12, 2021
ac73868
Add a main method in cli.py so the cli can be debugged
amoeba May 12, 2021
104e190
Create an ISOProcessor class to process ISO docs
amoeba May 12, 2021
777cf1f
Make SlinkyClient's choice of store an argument
amoeba May 12, 2021
04f9407
Add more tests to Blazegraph and SparqlStore
amoeba May 12, 2021
be4d94a
Hook up new classes for easy testing
amoeba May 15, 2021
9783fb6
Create new Virtuoso-specific store model
amoeba May 18, 2021
08cbd7a
Change BlazegraphStore's default port
amoeba May 18, 2021
025933b
Remove unused Exception from client.py
amoeba May 18, 2021
cb2ff80
Do a cleanup pass over the entire test suite
amoeba May 18, 2021
414fce3
Adjust logic for when update_job runs or doesn't
amoeba May 18, 2021
9daeef9
Re-use module-level global in jobs add_dataset_job
amoeba May 18, 2021
65879b9
Add --debug argument to work command in cli
amoeba May 18, 2021
729d615
Make get_new_datasets_since query range-exclusive
amoeba May 18, 2021
776f842
Use response.content instead of response.text
amoeba May 19, 2021
dd233d9
Add blazegraph to d1lod package's docker compose
amoeba May 19, 2021
21e4323
Begin work refactoring setups/environments
amoeba May 19, 2021
d1af30a
Remove unused code from cli.py
amoeba May 26, 2021
aab1c30
Add SPARQL DELETE support to VirtuosoStore
amoeba May 26, 2021
a6ef121
Prevent EMLProcessor from re-inserting identifier blank nodes
amoeba May 26, 2021
ae07392
Make VirtuosoStore's count method support patterns
amoeba May 26, 2021
1c315ed
Capitalize 'select' in VirtuosoStore.all
amoeba May 26, 2021
9b35c2a
Add remaining pieces of VirtuosoStore delete impl
amoeba May 26, 2021
9958cf3
Remove extra trailing slash from VirtuosoStore endpoint
amoeba May 26, 2021
49d78bd
Fix bug in Processor's handling of sysmeta 'obsoletes'
amoeba May 26, 2021
7eee7bd
Add a datatype for isAccessibleForFree triples (boolean)
amoeba May 26, 2021
2a8586b
Fix bug in schema:byteSize routine
amoeba May 26, 2021
c3eadc4
Guard against unset accessPolicy in Processor
amoeba May 26, 2021
7f64dbe
Add .strip() calls to all ElementTree .text calls
amoeba May 26, 2021
c954903
Remove unused code from ISOProcessor
amoeba May 26, 2021
1d43a69
Fix logic bug in handling datePublished
amoeba May 26, 2021
ee1907c
Add schema:distribution triples
amoeba May 28, 2021
9eaec1e
Add insert, insertall, clear, and count commands to CLI
amoeba May 28, 2021
6c60a8f
Add architecture diagram to readme
amoeba May 28, 2021
da28ab4
Finish up support for semantic annotations
amoeba Jun 3, 2021
dffb273
Finish spdx:Checksum support
amoeba Jun 3, 2021
ee4e1a6
Tweak style of slinky-architecture diagram a tad
amoeba Jun 3, 2021
fa74fa5
Move lookup* functions around in eml_processor
amoeba Jun 4, 2021
c154540
Clean up whitespace in readme
amoeba Jun 4, 2021
7482b19
Add in support for SOSO PropertyValue model for attributes
amoeba Jun 4, 2021
6d15143
Rename variable in test_eml220_processor
amoeba Jun 4, 2021
d4ac71c
Add count and format options to cli's get method
amoeba Jun 4, 2021
30e06a6
Remove unused pagination code in filtered_d1_client
amoeba Jun 5, 2021
aeec699
Finish up implementation of EML attributes
amoeba Jun 5, 2021
471dda8
Remove RQ Dashboard from compose file
amoeba Jun 5, 2021
8f1217d
Change update schedule from 5min to 1min
amoeba Jun 5, 2021
59ca2cb
Remove test for double-processing
amoeba Jun 5, 2021
91aa958
Re-organize code between client and jobs module
amoeba Jun 5, 2021
d89cf23
Add start of test suite for client
amoeba Jun 5, 2021
aa24cf2
Fix broken imports from previous refactor
amoeba Jun 5, 2021
3928b6d
Fix bug in FilteredCoordinatingNodeClient logic
amoeba Jun 5, 2021
81f81c3
Fix test regressions in for FilteredD1Client
amoeba Jun 23, 2021
72b00fa
Switch d1lod test suite's docker-compose to use official VOS image
amoeba Jun 23, 2021
60ca5d9
Remove the persistent volume decleration & rename d1lod folder
ThomasThelen Jun 3, 2021
7ba9b9c
Fix invalid EML doc in d1lod test suite
amoeba Jul 8, 2021
55ca3bf
Use a ClusterIP rather than NodePort for the Virtuoso service
ThomasThelen Jul 8, 2021
04c2487
Reorder ClusterIP and add instructions for connecting
ThomasThelen Jul 9, 2021
8d664dc
Merge remote-tracking branch 'origin/develop' into feature_update_gra…
ThomasThelen Aug 19, 2021
4b78c5c
Create two separate worker deployments that can be individually scaled
ThomasThelen Nov 5, 2021
16c9632
Add a step to the Dockerfile to install d1lod to the image
ThomasThelen Nov 5, 2021
50aa722
Refactor the Scheduler and SlinkyClient interactions to support servi…
ThomasThelen Nov 5, 2021
8e03b92
Add __init__.py to the iso folder to let the python packager know we …
ThomasThelen Nov 5, 2021
3051b99
Refactor the scheduler to always pull an image to avoid using old cac…
ThomasThelen Nov 5, 2021
832c7a2
Change the name of 'redis-main' deployment to just 'redis'.
ThomasThelen Nov 5, 2021
ae271ea
Remove the 'docker' folder since the d1lod image is now being used by…
ThomasThelen Nov 5, 2021
b938f59
Remove helm chart fils and simplify the deployment directory structure
ThomasThelen Nov 5, 2021
913791c
Combine the enable-update feature with the virtuoso image
ThomasThelen Nov 5, 2021
49e9c72
Remove debug flags from the worker deployments
ThomasThelen Nov 5, 2021
6d6306c
Add a ReadinessProbbe to the Virtuoso deployment
ThomasThelen Nov 6, 2021
a266898
Add ReadinessProbe to redis
ThomasThelen Nov 6, 2021
145b22f
Reduce startup time
ThomasThelen Nov 6, 2021
5f15ce7
Add a Makefile for ordered deployments
ThomasThelen Nov 6, 2021
97c5441
Use CephFS for storage
ThomasThelen Nov 6, 2021
f58aaff
Use the slinky dockerhub account for pulling images
ThomasThelen Nov 6, 2021
72ddb38
Create kubernetes architecture diagrams and update the Readme
ThomasThelen Nov 6, 2021
01461c7
Add a unit test for checking the problematic EML document
ThomasThelen Nov 6, 2021
1dcad7f
Fix eml path
ThomasThelen Nov 6, 2021
10e6b12
Preserve the ElementTree.Element identifier
ThomasThelen Nov 6, 2021
1ad7647
Merge pull request #43 from DataONEorg/feature_update_graph_pattern
amoeba Nov 10, 2021
5857673
Merge branch 'develop' into deployment_upgrades
ThomasThelen Nov 10, 2021
5e15885
Merge branch 'develop' into 37_fix
ThomasThelen Nov 10, 2021
9d1134d
Merge pull request #48 from DataONEorg/deployment_upgrades
amoeba Nov 11, 2021
9fde47b
Merge pull request #50 from DataONEorg/37_fix
amoeba Nov 11, 2021
6da693c
Make the worker and scheduler wait for redis and virtuoso before star…
ThomasThelen Nov 18, 2021
66f1f3f
Add more deployment options to the makefile
ThomasThelen Nov 18, 2021
ea6feb4
Use a configMap for storing the networking environmental variables
ThomasThelen Nov 18, 2021
3ff13e3
Remove the cli, development, and production environemnts and use env …
ThomasThelen Nov 18, 2021
bfb607b
Generalize the graph database endpoint so that others like blazegraph…
ThomasThelen Nov 18, 2021
aef613b
Remove initContainer
ThomasThelen Dec 8, 2021
dd21017
Add missing unit test changes
ThomasThelen Dec 8, 2021
666e0e0
Fix import
ThomasThelen Dec 8, 2021
65b3f33
Remove virtuoso env var from the dockerfile
ThomasThelen Dec 8, 2021
c921777
Use the REDIS_HOST env var for running the scheduler
ThomasThelen Dec 9, 2021
d53fafa
Remove --debug flags
ThomasThelen Dec 9, 2021
0be270f
Update the readme with dockerized testing instructions
ThomasThelen Dec 9, 2021
ebf2c1f
Add BLAZEGRAPH_ environmental variables for unit testing
ThomasThelen Dec 9, 2021
e3a8e62
Remove the legacy 'Graph' class
ThomasThelen Dec 15, 2021
e6a62be
Change the default Redis location to localhost
ThomasThelen Dec 16, 2021
def0f9c
Remove unused reference to the Graph class in tests
ThomasThelen Dec 17, 2021
3d82991
Add a flag to the cli arguments to enable using LocalStore
ThomasThelen Dec 17, 2021
57dd95e
Always use localstore for 'get'
ThomasThelen Jan 18, 2022
81ff31a
Add example turtle output
amoeba Feb 3, 2022
d6b8067
Force CLI's get method to use local RDF store
amoeba Feb 24, 2022
7033442
Remove http:// from REDIS_HOST fallback value
amoeba Feb 24, 2022
243a823
Merge pull request #54 from DataONEorg/feature_deployment_ordering
amoeba Feb 24, 2022
55c0a21
Remove legacy codebase
amoeba Feb 24, 2022
6dd715c
Apply Black formatting to repo
amoeba Feb 25, 2022
5a6b6f5
Add note in d1lod readme about using black
amoeba Feb 25, 2022
bee2a66
Set up flake8 and fix issues in d1lod package
amoeba Feb 25, 2022
cf439e3
Split out unit and integration tests
amoeba Feb 25, 2022
db2fce1
Merge pull request #21 from DataONEorg/feature_14_graph_pattern
amoeba Feb 26, 2022
7d2bdbc
Fix bug in adding checksumAlgorithm triples
amoeba Feb 26, 2022
5723f5d
Re-add the starlette web frontend to the slinky stack
ThomasThelen Mar 3, 2022
199a973
Remove unused test api endpoint
ThomasThelen Mar 3, 2022
b70aea5
Merge pull request #68 from DataONEorg/web-frontend
ThomasThelen Mar 4, 2022
715bd52
Add support for EML 2.0.0 and up
ThomasThelen Jun 11, 2022
13e97cc
Remove debug print statements
ThomasThelen Jul 9, 2022
40e8016
Fix the EML 2.2.0 format ID
ThomasThelen Jul 9, 2022
0833ce5
Merge pull request #69 from DataONEorg/expanded_eml_support
amoeba Jul 11, 2022
fe564a4
Tidy update python imports
amoeba Aug 1, 2022
69dab0b
Refactor LocalClient impl and handling
amoeba Aug 1, 2022
d806ccb
Run black on test_eml_processor.py
amoeba Aug 1, 2022
cc77a5e
Separate our compose files and add devcontainer
amoeba Aug 1, 2022
9ec686b
Add sparql proxy to web service
amoeba Aug 1, 2022
a6dc3b9
Update k8s yaml files to match recent changes
amoeba Aug 1, 2022
7cfe25c
Update Slinky readme to cover docker compose
amoeba Aug 1, 2022
d2f43b1
Create first working but rough version of Helm chart
amoeba Aug 2, 2022
86fc579
Finish a first full version of Slinky helm chart
amoeba Aug 2, 2022
4ca3789
Fix issues in Helm chart and related files
amoeba Aug 3, 2022
8844e3d
Fix issue in Helm chart where sparql queries don't work
amoeba Aug 3, 2022
c244c3f
Make Slinky web proxy actually proxy queries
amoeba Aug 3, 2022
50fcd59
Update root docker-compose file to match repo
amoeba Aug 3, 2022
1a06a10
Update redlands install docs
amoeba Aug 6, 2022
7aa9d94
Add full support for SOSO award structure
amoeba Aug 7, 2022
d8fbec1
Fix predicate in EML220 processor for funding
amoeba Aug 7, 2022
102941f
Update readme an arch diagram
amoeba Aug 7, 2022
77309b3
Tweak identifier logic in processor.py
amoeba Aug 8, 2022
8f62b43
Add units support to variableMeasured
amoeba Aug 8, 2022
931d266
Fix bug in handling of userId
amoeba Aug 8, 2022
c5e2979
Refactor slinky to use rdflib.Namespace
amoeba Aug 8, 2022
17718d0
Update all SDO references to http://
amoeba Aug 8, 2022
6aefcec
Add missing str.strip() call in eml220_processor
amoeba Aug 8, 2022
f2faac5
Add support for seriesIds
amoeba Aug 8, 2022
4600081
Add pytest GHA workflow
amoeba Aug 9, 2022
da665c5
Make CLI's get command output more compact
amoeba Aug 9, 2022
ac31bd5
Make GHA Workflow use :latest tag
amoeba Aug 9, 2022
9ced01e
Add sphinx documentation to the python package
amoeba Aug 10, 2022
a47c4b4
Rename python package to slinky
amoeba Aug 10, 2022
32decbf
Add GHA workflow for sphinx build
amoeba Aug 10, 2022
dc93656
Switch sphinx GHA workflow to only trigger on main
amoeba Aug 10, 2022
f878c5a
v0.3.0
amoeba Aug 10, 2022
88a8900
Remove the old k8 deployment files in favor of the Helm deployment
ThomasThelen Aug 12, 2022
56eb842
Add the dynamically provisioned PVC configuration to the Helm chart
ThomasThelen Aug 12, 2022
63658d4
bitnami/redis -> bitnamilegacy/redis
artntek Oct 30, 2025
27 changes: 27 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,27 @@
name: docs

on:
  push:
    branches:
      - main

jobs:
  document:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/dataoneorg/slinky:latest
    steps:
      - name: Install rsync 📚
        run: |
          apt-get update && apt-get install -y rsync
      - uses: actions/checkout@v3
      - name: pip install
        working-directory: ./slinky
        run: python -m pip install .[docs]
      - name: Build documentation
        run: sphinx-build . _build
        working-directory: ./slinky/docs
      - name: Deploy 🚀
        uses: JamesIves/github-pages-deploy-action@v4
        with:
          folder: ./slinky/docs/_build
17 changes: 17 additions & 0 deletions .github/workflows/pytest.yaml
@@ -0,0 +1,17 @@
name: pytest

on: push

jobs:
  test:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/dataoneorg/slinky:latest
    steps:
      - uses: actions/checkout@v3
      - name: pip install
        working-directory: ./slinky
        run: python -m pip install .[test]
      - name: pytest
        working-directory: ./slinky
        run: python -m pytest
5 changes: 5 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*.py.bak

# C extensions
*.so
@@ -22,6 +23,7 @@ var/
*.egg-info/
.installed.cfg
*.egg
_build

# PyInstaller
# Usually these files are written by a python script from a template
@@ -66,3 +68,6 @@ webapps/
.idea/
.venv/
*.logs

# macOS Specifics
.DS_Store
11 changes: 0 additions & 11 deletions Makefile

This file was deleted.

123 changes: 66 additions & 57 deletions README.md
@@ -1,90 +1,99 @@
# Slinky, the DataONE Graph Store

## Overview
Service for the DataONE Linked Open Data graph.
[![pytest](https://github.com/dataoneorg/slinky/actions/workflows/pytest.yaml/badge.svg)](https://github.com/dataoneorg/slinky/actions/workflows/pytest.yaml)

A Linked Open Data interface to [DataONE](https://dataone.org) designed to run on [Kubernetes](https://kubernetes.io).

This repository contains a deployable service that continuously updates the [DataOne](https://www.dataone.org/) [Linked Open Data](http://linkeddata.org/) graph. It was originally developed as a provider of data for the [GeoLink](http://www.geolink.org/) project, but now is a core component of the DataONE services. The service uses [Docker Compose](https://docs.docker.com/compose/) to manage a set of [Docker](https://www.docker.com/) containers that run the service. The service is intended to be deployed to a virtual machine and run with [Docker Compose](https://docs.docker.com/compose/).
## Overview

The main infrastructure of the service is composed of four [Docker Compose](https://docs.docker.com/compose/) services:
Slinky is essentially a background job system hooked up to an RDF triplestore that converts DataONE's holdings into Linked Open Data.

1. `web`: An [Apache httpd](https://httpd.apache.org/) front-end serving static files and also reverse-proxying to an [Apache Tomcat](http://tomcat.apache.org/) server running a [GraphDB](http://graphdb.ontotext.com/display/GraphDB6/Home) Lite instance which is bundled with [OpenRDF Sesame](http://rdf4j.org) Workbench.
2. `scheduler`: An [APScheduler](https://apscheduler.readthedocs.org) process that schedules jobs (e.g., update graph with new datasets) on the `worker` at specified intervals
3. `worker`: An [RQ](http://python-rq.org/) worker process to run scheduled jobs
4. `redis`: A [Redis](http://redis.io) instance to act as a persistent store for the `worker` and for saving application state
It's made up of five main components:

In addition to the core infrastructure services (above), a set of monitoring/logging services are spun up by default. As of writing, these are mostly being used for development and testing but they may be useful in production:
1. `web`: Provides a public-facing API over Slinky
2. `virtuoso`: Acts as the backend graph store
3. `scheduler`: An [RQScheduler](https://github.com/rq/rq-scheduler) process that enqueues repeated jobs in a cron-like fashion
4. `worker`: One or more [RQ](http://python-rq.org/) processes that run enqueued jobs
5. `redis`: A [Redis](http://redis.io) instance to act as a persistent store for the `worker` and for saving application state

1. `elasticsearch`: An [ElasticSearch](https://www.elastic.co/products/elasticsearch) instance to store, index, and support analysis of logs
2. `logstash`: A [Logstash](https://www.elastic.co/products/logstash) instance to facilitate the log pipeline
3. `kibana`: A [Kibana](https://www.elastic.co/products/kibana) instance to search and visualize logs
4. `logspout`: A [Logspout](https://github.com/gliderlabs/logspout) instance to collect logs from the [Docker](https://www.docker.com/) containers
5. `cadvisor`: A [cAdvisor](https://github.com/google/cadvisor) instance to monitor resource usage on each [Docker](https://www.docker.com/) container
6. `rqdashboard`: An [RQ Dashboard](https://github.com/nvie/rq-dashboard) instance to monitor jobs.
![slinky architecture diagram showing the components in the list above connected with arrows](./docs/slinky-architecture.png)

As the service runs, the graph store will be continuously updated as datasets are added/updated on [DataONE](https://www.dataone.org/). Another scheduled job exports the statements in the graph store and produces a Turtle dump of all statements at [http://dataone.org/d1lod.ttl](http://dataone.org/d1lod.ttl).
As the service runs, the graph store will be continuously updated as datasets are added/updated on [DataONE](https://www.dataone.org/).
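
The scheduler/worker/queue flow described above can be sketched in plain Python. This is an in-memory stand-in for the Redis-backed RQ pipeline, for illustration only; the real services use `rq` and `rq-scheduler`, and the job and function names here are made up:

```python
from collections import deque

# In-memory stand-in for the Redis-backed RQ queue (illustration only).
queue = deque()


def schedule_update_job(since):
    """Scheduler side: enqueue a job describing which datasets to update."""
    queue.append({"job": "update_datasets", "since": since})


def work():
    """Worker side: drain the queue, returning what was 'processed'."""
    processed = []
    while queue:
        job = queue.popleft()
        if job["job"] == "update_datasets":
            processed.append(job["since"])
    return processed


schedule_update_job("2022-08-01T00:00:00Z")
schedule_update_job("2022-08-02T00:00:00Z")
print(work())  # → ['2022-08-01T00:00:00Z', '2022-08-02T00:00:00Z']
```

In the real deployment, the queue lives in the `redis` service, so the `scheduler` and any number of `worker` replicas can share it across pods.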

### Contents of This Repository

```text
.
├── d1lod # Python package which supports other services
├── docs # Detailed documentation beyond this file
├── logspout # Custom Dockerfile for logspout
├── logstash # Custom Dockerfile for logstash
├── redis # Custom Dockerfile for Redis
├── rqdashboard # Custom Dockerfile for RQ Dashboard
├── scheduler # Custom Dockerfile for APScheduler process
├── web # Apache httpd + Tomcat w/ GraphDB
├── worker # Custom Dockerfile for RQWorker process
└── www # Local volume holding static files
├── slinky # Python package used by services
├── docs # Documentation
├── helm # A Helm chart for deploying on Kubernetes
```

Note: In order to run the service without modification, you will need to create a 'webapps' directory in the root of this repository containing 'openrdf-workbench.war' and 'openrdf-sesame.war':
## What's in the graph?

```
.
├── webapps
│   ├── openrdf-sesame.war
└   └── openrdf-workbench.war
```
For an overview of what concepts the graph contains, see the [mappings](/docs/mappings.md) documentation.

These aren't included in the repository because we're using GraphDB Lite which doesn't have a public download URL. These WAR files can just be the base Sesame WAR files which support a variety of backend graph stores but code near https://github.com/ec-geolink/d1lod/blob/master/d1lod/d1lod/sesame/store.py#L90 will need to be modified correspondingly.
## Deployment

Slinky is primarily designed for deployment on the DataONE [Kubernetes](https://kubernetes.io/) cluster.
However, a [Docker Compose](https://docs.docker.com/compose/) file has been provided for anyone who doesn't have a cluster readily available but still wants to run Slinky.

## What's in the graph?
### Deployment on Kubernetes

For an overview of what concepts the graph contains, see the [mappings](/docs/mappings.md) documentation.
To make installing Slinky straightforward, we provide a [Helm](https://helm.sh) chart.

Pre-requisites are:

## Getting up and running
- A [Kubernetes](https://kubernetes.io) cluster
- [Helm](https://helm.sh)

Assuming you are set up to use [Docker](https://www.docker.com/) (see the [User Guide](https://docs.docker.com/engine/userguide/) to get set up):
Install the Chart by running:

```sh
cd helm
helm install $YOUR_NAME .
```
git clone https://github.com/DataONEorg/slinky
cd slinky
# Create a webapps folder with openrdf-sesame.war and openrdf-workbench.war (See above note)
docker-compose up # May take a while

See the [README](./helm/README.md) for more information, including how to customize installation of the Chart to support Ingress and persistent storage.

### Local Deployment with Docker Compose

To deploy Slinky locally using [Docker Compose](https://docs.docker.com/compose/), run:

```sh
docker compose up
```

After running the above `docker-compose` command, the services should be started and available (if appropriate) on their respective ports:
1. Apache httpd → `$DOCKER_HOST:80`
2. OpenRDF Workbench → `$DOCKER_HOST:8080/openrdf-workbench/`
3. Kibana (logs) → `$DOCKER_HOST:5601`
4. cAdvisor → `$DOCKER_HOST:8888`
After a few minutes, you should be able to visit http://localhost:9181 to see the worker management interface and see work being done or http://localhost:8080 to send SPARQL queries to the endpoint.
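
A query can be sent to the endpoint over HTTP once the stack is up. The sketch below builds such a request with the standard library; the endpoint path (`/sparql`) and the result format parameter are assumptions based on common Virtuoso defaults, so adjust them to match your deployment:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for a local Slinky stack; the /sparql path and port
# are common Virtuoso defaults, not guaranteed by this repo.
ENDPOINT = "http://localhost:8080/sparql"


def build_query_request(query):
    """Return a GET Request carrying `query` as a URL parameter."""
    params = urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return Request(
        ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )


req = build_query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)
# Against a running stack: urllib.request.urlopen(req).read()
```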

Where `$DOCKER_HOST` is `localhost` if you're running [Docker](https://www.docker.com/) natively or some IP address if you're running [Docker Machine](https://docs.docker.com/machine/). Consult the [Docker Machine](https://docs.docker.com/machine/) documentation to find this IP address. When deployed on a Linux machine, [Docker](https://www.docker.com/) is able to bind to localhost under the default configuration.
### Virtuoso

The Virtuoso deployment is a custom image that includes a runtime script
for enabling SPARQL updates. The script runs in a separate process
alongside the Virtuoso startup script and completes once the Virtuoso
server comes online. This subsystem is fully automated and shouldn't need
manual intervention during deployments.

## Testing
#### Protecting the Virtuoso SPARQL Endpoint

Tests are written using [PyTest](http://pytest.org/latest/). Install [PyTest](http://pytest.org/latest/) with
In order to protect the `sparql/` endpoint that Virtuoso exposes, follow
[this](http://vos.openlinksw.com/owiki/wiki/VOS/VirtSPARQLProtectSQLDigestAuthentication)
guide from Open Link. While performing 'Step 6', use the `Browse` button
to locate the authentication function rather than copy+pasting
`DB.DBA.HP_AUTH_SQL_USER;`, which is suggested by the guide. _This
should be done for all new production deployments_.

### Scaling Workers

To scale the number of workers processing datasets beyond the default, run:

```sh
kubectl scale --replicas=3 deployments/{dataset-pod-name}
```
```
pip install pytest
cd d1lod
py.test
```

As of writing, only tests for the supporting Python package (in directory './d1lod') have been written.
Note: The test suite assumes you have an instance of [OpenRDF Sesame](http://rdf4j.org) running at http://localhost:8080, which means the Workbench is located at http://localhost:8080/openrdf-workbench and the Sesame interface is available at http://localhost:8080/openrdf-sesame.
## Testing

A test suite is provided for the `slinky` Python package used by workers.
Tests are written using [pytest](http://pytest.org).

See the [slinky README](./slinky/README.md) for more information.
11 changes: 0 additions & 11 deletions d1lod/Makefile

This file was deleted.

24 changes: 0 additions & 24 deletions d1lod/README.md

This file was deleted.

18 changes: 0 additions & 18 deletions d1lod/d1lod/__init__.py

This file was deleted.
