Data tier MVP requirements  #20

@tdudgeon

This issue summarises the requirements for an MVP data tier.
This is a server-side application that provides data and services to mini-apps such as the pose viewer.

Mini-apps licensing and security

The data tier is the glue in the mini-apps ecosystem.

Authentication

Use of mini-apps and the data tier should be restricted to people who have logged in. Authentication and authorisation will be handled by the existing Keycloak environment, which provides SSO across these and other IM apps.
Users will be able to register themselves.

Data visibility

In the public version, data for all users is publicly visible. Any user will be able to get a list of registered users and see and use their datasets. A private instance will allow data access to be restricted so that data can be kept private.

Licensing

The mini-app components will be licensed with a permissive license allowing them to be used freely.
The individual mini-app applications will be available under two licenses:

  1. an open license that allows the application to be deployed and used without charge but requires all data to be public
  2. a commercial license that allows the application to be deployed and used in a public or private setting and allows data to be kept private.

Key functions

User management

  1. A user can register
  2. A user can see a list of other users

Dataset upload

  1. A user can upload a dataset from a local file
  2. A user can load a dataset from an HTTP(S) or FTP URL

Data types (e.g. a media type such as chemical/x-mdl-sdfile) would be determined where possible from the file extension or Content-Type header, but the user can override this. The user can also give the dataset a simple name, a detailed description and zero or more labels that can be used as filters. The creation date and a hash code of the data should be recorded.
Data is immutable. If it needs to be changed a new version is created.
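
A minimal sketch of what an immutable, versioned dataset record could look like. Everything here is an assumption made for illustration: the class and function names are hypothetical, SHA-256 is just one possible choice for the "hash code", and the issue does not say how versions are numbered.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical immutable record for one version of a dataset."""
    name: str
    media_type: str
    content: bytes
    version: int
    sha256: str          # assumed hash algorithm; the issue only says "hash code"
    created: datetime
    description: str = ""
    labels: tuple = ()


def create_dataset(name, media_type, content, description="", labels=()):
    """Build version 1, recording the creation date and content hash."""
    return DatasetVersion(
        name=name, media_type=media_type, content=content, version=1,
        sha256=hashlib.sha256(content).hexdigest(),
        created=datetime.now(timezone.utc),
        description=description, labels=tuple(labels),
    )


def new_version(previous, content):
    """Return a new record with changed data; the previous one is untouched."""
    return replace(
        previous, content=content, version=previous.version + 1,
        sha256=hashlib.sha256(content).hexdigest(),
        created=datetime.now(timezone.utc),
    )
```

Because the dataclass is frozen, assigning to a field of an existing record raises an error, so a change can only be expressed by calling new_version.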

Initial data types to be supported:

  • SDF
  • PDB
  • CSV/TAB
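
The media type detection described above could be sketched roughly as follows. The function name, the precedence (user override, then Content-Type, then file extension) and every media type in the table other than chemical/x-mdl-sdfile are assumptions, not decisions recorded in this issue.

```python
import os

# Assumed mapping for the initial data types; only chemical/x-mdl-sdfile
# is named in the issue, the other values are illustrative.
EXTENSION_TYPES = {
    ".sdf": "chemical/x-mdl-sdfile",
    ".pdb": "chemical/x-pdb",
    ".csv": "text/csv",
    ".tab": "text/tab-separated-values",
}


def guess_media_type(filename=None, content_type=None, override=None):
    """Return a media type, or None if it cannot be determined."""
    if override:
        # The user's explicit choice always wins.
        return override
    if content_type:
        # Drop parameters such as "; charset=utf-8".
        base = content_type.split(";", 1)[0].strip().lower()
        # A generic binary type carries no information, so fall through.
        if base and base != "application/octet-stream":
            return base
    if filename:
        ext = os.path.splitext(filename)[1].lower()
        return EXTENSION_TYPES.get(ext)
    return None
```

Returning None rather than guessing leaves room for the upload UI to ask the user for the type.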

Dataset fetch

Each dataset has a URL that can be used to fetch the dataset from any mini-app.
This includes datasets from other users (assuming that the dataset is visible to the user).

A dataset can be fetched in a number of formats that can be requested using the Accept header.
For instance, a dataset that was uploaded from an SDF file could be fetched in SDF format (chemical/x-mdl-sdfile) or in Squonk JSON format (application/x-squonk-dataset-molecule, along with its corresponding metadata).
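
As an illustration of format selection, here is a simplified sketch of server-side content negotiation based on the standard HTTP Accept header. The function names are hypothetical, and the parsing is deliberately reduced: it honours q-values and "*/*" but ignores partial wildcards such as "chemical/*".

```python
def parse_accept(header):
    """Return media types from an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    ranked = sorted(entries, key=lambda e: -e[1])
    return [m for m, q in ranked if q > 0]


def choose_format(accept_header, available):
    """Pick the first acceptable media type the dataset can be served as."""
    for wanted in parse_accept(accept_header or "*/*"):
        if wanted == "*/*":
            return available[0]
        if wanted in available:
            return wanted
    return None  # the caller would answer 406 Not Acceptable
```

For a dataset uploaded as SDF, available might be ["chemical/x-mdl-sdfile", "application/x-squonk-dataset-molecule"], with the first entry acting as the default when the client expresses no preference.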

Explore datasets

User can list their datasets, including applying filters for data types, date and labels.
User can do the same for datasets from other users if they are visible to them.

This API should support exploring datasets from any application. For example, another mini-app will want to be able to list and filter the datasets that are available.
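
The filtering behind such a listing could look something like the sketch below. The DatasetSummary fields and the filter parameter names are hypothetical, and the choice that a dataset must carry all requested labels (rather than any of them) is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetSummary:
    """Hypothetical listing entry for a dataset."""
    owner: str
    name: str
    media_type: str
    created: date
    labels: frozenset = frozenset()


def filter_datasets(datasets, media_types=None, created_from=None,
                    created_to=None, labels=None):
    """Return the datasets matching every filter that was supplied."""
    wanted_labels = set(labels or ())
    result = []
    for ds in datasets:
        if media_types and ds.media_type not in media_types:
            continue
        if created_from and ds.created < created_from:
            continue
        if created_to and ds.created > created_to:
            continue
        # Assumed semantics: the dataset must have all requested labels.
        if not wanted_labels <= ds.labels:
            continue
        result.append(ds)
    return result
```

In an HTTP API these filters would most likely arrive as query parameters, but the exact shape of the endpoint is not specified here.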

Dataset share

In the public version all datasets are visible to all users.
In a private environment datasets are private by default but can be made public or shared with one or more users.
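
These visibility rules are small enough to express as a single check. The sketch below uses hypothetical names (Sharing, is_visible, public_instance) and assumes that owners can always see their own datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sharing:
    """Hypothetical sharing settings attached to a dataset."""
    owner: str
    public: bool = False                 # private by default
    shared_with: frozenset = frozenset()


def is_visible(sharing, user, public_instance):
    """Decide whether `user` may see a dataset with these settings."""
    if public_instance:
        # Public version: everything is visible to every user.
        return True
    if user == sharing.owner or sharing.public:
        return True
    return user in sharing.shared_with
```

The same check would apply to both fetching and listing, so that a dataset a user cannot fetch never appears in their search results.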

Services

The data tier should allow execution of services using the datasets as input.
The result of execution is a new dataset, with a record of how it was generated.

Exact details need clarifying, but these sorts of services should be possible (though not initially):

  1. Squonk/Pipelines services
  2. Dataset manipulations (e.g. merge/filter datasets, probably using Pandas data frames)
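
Since the details are still open, the following is only a rough sketch of how a service result might be paired with a record of how it was generated. The Provenance fields, run_service and the toy merge_lines function are all hypothetical, and SHA-256 is an assumed way of identifying the inputs.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    """Hypothetical record of how a derived dataset was generated."""
    service: str
    parameters: tuple    # sorted (key, value) pairs
    input_hashes: tuple  # one hash per input dataset, in call order
    executed: datetime


def run_service(service, func, inputs, **parameters):
    """Run `func` on the input payloads and return (output, provenance)."""
    output = func(*inputs, **parameters)
    record = Provenance(
        service=service,
        parameters=tuple(sorted(parameters.items())),
        input_hashes=tuple(hashlib.sha256(d).hexdigest() for d in inputs),
        executed=datetime.now(timezone.utc),
    )
    return output, record


def merge_lines(first, second):
    """Toy stand-in for a dataset manipulation: concatenate two payloads."""
    return first.rstrip(b"\n") + b"\n" + second
```

Calling run_service("merge", merge_lines, [data_a, data_b]) would return the merged bytes together with a record naming the service and the hashes of both inputs.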
