# DMS (Data Management System)

This document introduces the technical design of a Data Management System (DMS). It also covers Data Portals, since a data portal is one of the major solutions that can be built on a DMS.

# Domain Model

  • Project: a data project. It has a single dataset, in the same way that a GitHub or GitLab “project” has a single repo. Traditionally (in CKAN, for example) the project has been implicit and identified with the dataset. There are, however, important differences: a project includes a dataset but can also include related functionality such as issues, workflows, etc.
  • Dataset: a set of data, made up of zero or more resources.
  • Resource (or File): a single data object.
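
The relationships between these three entities can be sketched as plain data classes. This is only an illustration under assumptions: the field names (`name`, `path`, `issues`) are hypothetical and are not part of any existing DMS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A single data object (a file). Field names are illustrative only."""
    name: str
    path: str


@dataclass
class Dataset:
    """A set of data: zero or more resources."""
    name: str
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Project:
    """A data project: exactly one dataset plus related functionality."""
    name: str
    dataset: Dataset
    # Hypothetical placeholder for project-level extras such as issues.
    issues: List[str] = field(default_factory=list)


project = Project(
    name="gdp",
    dataset=Dataset(name="gdp", resources=[Resource("gdp.csv", "data/gdp.csv")]),
)
```

Keeping `Project` separate from `Dataset` leaves room for issues, workflows and similar features without overloading the dataset itself.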

Revisioning

  • Revision
  • Tag
  • (Branch)

Presentation

  • View
  • Showcase
  • Data API

Identity and Permissions

  • Account
  • Profile
  • Permission

Data Factory

  • Task
  • DAG (Pipeline)
  • Run (Job)
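
One possible reading of how the three Data Factory terms relate: a Task is a unit of work, a DAG declares which tasks depend on which, and a Run is one execution of a DAG. The sketch below assumes that reading; the class layout and method names (`add`, `order`, `execute`) are hypothetical, and it uses the standard-library `graphlib` module (Python 3.9+) to order tasks.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Task:
    """A named unit of work."""
    name: str
    action: Callable[[], None]


@dataclass
class DAG:
    """A pipeline: tasks plus the tasks each one must wait for."""
    tasks: Dict[str, Task] = field(default_factory=dict)
    upstream: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, task: Task, after: Iterable[str] = ()) -> None:
        self.tasks[task.name] = task
        self.upstream[task.name] = set(after)

    def order(self) -> List[str]:
        # TopologicalSorter takes a mapping of node -> predecessors
        # and yields predecessors before the nodes that depend on them.
        return list(TopologicalSorter(self.upstream).static_order())


@dataclass
class Run:
    """One execution (job) of a DAG; records tasks as they complete."""
    dag: DAG
    completed: List[str] = field(default_factory=list)

    def execute(self) -> None:
        for name in self.dag.order():
            self.dag.tasks[name].action()
            self.completed.append(name)


log: List[str] = []
dag = DAG()
dag.add(Task("extract", lambda: log.append("extract")))
dag.add(Task("clean", lambda: log.append("clean")), after=["extract"])
dag.add(Task("load", lambda: log.append("load")), after=["clean"])

run = Run(dag)
run.execute()
```

A real scheduler would also need to handle failures, retries and parallel execution, which this sketch leaves out.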

# Actions / Flows [component]

  • View Dataset: [Showcase page] a page displaying the dataset (or a resource)
    • View a Revision / Tag / Branch:
  • Add / Upload: …
  • Tag

# Components

  • MetaStoreService: stores dataset metadata (and revisions)
  • HubStore: stores users and organizations, and their connections to datasets.
  • SearchStore + Service: search index and API
  • BlobStore: stores blobs (for files)
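
To make the division of responsibilities concrete, here is a minimal in-memory sketch of two of these components. Everything in it is an assumption made for illustration: the class names, the method names (`put`, `get`, `save`, `fetch`), the use of SHA-256 content addressing for blobs, and the choice to number revisions from zero. It shows how metadata in a MetaStore might point at file contents held in a BlobStore.

```python
import hashlib
from typing import Dict, List


class InMemoryBlobStore:
    """Stores raw bytes keyed by their SHA-256 digest (an assumed scheme)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class InMemoryMetaStore:
    """Keeps every saved revision of each dataset's metadata, oldest first."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[dict]] = {}

    def save(self, dataset_id: str, metadata: dict) -> int:
        revisions = self._revisions.setdefault(dataset_id, [])
        revisions.append(dict(metadata))  # shallow copy of the caller's dict
        return len(revisions) - 1  # revision number, starting at 0

    def fetch(self, dataset_id: str, revision: int = -1) -> dict:
        # The default of -1 returns the latest revision.
        return self._revisions[dataset_id][revision]


blobs = InMemoryBlobStore()
meta = InMemoryMetaStore()

content = b"year,gdp\n2020,1\n"
key = blobs.put(content)
rev = meta.save("gdp", {"name": "gdp", "resources": [{"path": "gdp.csv", "blob": key}]})
```

With this split, uploading a new file version adds a blob and a new metadata revision, while earlier revisions keep pointing at the earlier blob.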