Data Warehouse Knowledge Base

The Data Model

A visual representation of the data platform model depicting the core model and metadata, described in detail below.

This diagram provides an overview of the structure and relationships within the Data Platform’s hierarchical data model. 

Each black dot connector represents a one-to-many relationship, where each item at a higher level can be associated with multiple items at the next level. For example, one Data Source Organization can be attached to many Data Source Documents. 

The diagram includes both core model and metadata entities, which are described below.

Core model: Data Sources and Data Collections

Data Sources

Data is attributed to both the Data Source Organization and Data Source Document entities: 

  • A Data Source Organization is an organization providing data. For example, FEWS NET or the Ministry of Agriculture - Kenya. 

  • A Data Source Document specifies a source of data of a given Document Type that is provided by a Data Source Organization. For example, an Excel spreadsheet from the Ministry of Finance or a survey by FEWS NET staff. This allows you to distinguish between different kinds of data provided by the same Data Source Organization.
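The one-to-many relationship between these two entities can be sketched as follows. This is a minimal illustration only: the class names, field names, and document names are hypothetical and are not the platform's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSourceDocument:
    name: str
    document_type: str  # e.g. "Market Product"


@dataclass
class DataSourceOrganization:
    name: str
    # One organization can be attached to many documents (one-to-many).
    documents: list[DataSourceDocument] = field(default_factory=list)

    def add_document(self, name: str, document_type: str) -> DataSourceDocument:
        """Create a document and attach it to this organization."""
        doc = DataSourceDocument(name=name, document_type=document_type)
        self.documents.append(doc)
        return doc


# Hypothetical document names, used only to show two kinds of data
# coming from the same organization.
org = DataSourceOrganization("FEWS NET")
org.add_document("Staff market survey", "Market Product")
org.add_document("Crop estimate spreadsheet", "Preliminary Crop Production Estimate")
```

Keeping the Document Type on the document rather than on the organization is what lets the same organization supply several distinct kinds of data.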

Data Collections

Each instance of data collected via a Data Source Document is a Data Collection.

A Data Collection may contain:

  • Data Collection Assets, such as a file or an external URL; and/or

  • Data Points, which are the values of one or more variables for a Data Series for a specific Data Collection Period. For example, a Data Point could be the Price of a 90kg bag of Maize at Garissa Market for April 2013.

    • The Data Collection Period specifies the period for which Data Points are collected as specified by a Data Source Document. For example, a Data Source Document with data collected on a monthly schedule could have Data Collection Periods for January 2024, February 2024, March 2024, etc.
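As a rough sketch of how these pieces fit together, the structure below models a Data Collection holding assets and Data Points. All class and field names are assumptions made for illustration, and the price value is a placeholder rather than real data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DataCollectionPeriod:
    start: date
    end: date


@dataclass
class DataPoint:
    data_series: str                # label of the Data Series it belongs to
    period: DataCollectionPeriod    # the period the values apply to
    values: dict[str, float]        # one or more variables


@dataclass
class DataCollection:
    data_source_document: str
    assets: list[str] = field(default_factory=list)          # files or external URLs
    data_points: list[DataPoint] = field(default_factory=list)


april_2013 = DataCollectionPeriod(date(2013, 4, 1), date(2013, 4, 30))

collection = DataCollection(data_source_document="Monthly market prices")  # hypothetical name
collection.data_points.append(
    DataPoint(
        data_series="Maize, 90kg bag, Garissa Market",
        period=april_2013,
        values={"price": 0.0},  # placeholder value, not real data
    )
)
```

A collection may hold only assets, only Data Points, or both, which is why both lists default to empty.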

Permissions and visibility

Uploaded Data Collections are automatically assigned a Draft status. A Data Collection must be updated to Submitted before any of its Data Points can be included in the Data Sets they belong to.

Once submitted, Data Collections must subsequently be Published to be visible to most users.

All the Data Points in a Data Collection must be published together. If any Data Points in the Data Collection are incomplete or incorrect, the Data Collection should be given the status Under Review. Data Collections placed under review are excluded from Data Sets. Once the corrections are made, the Data Collection can be resubmitted and published. See Generic Data Management Processes for more information on the country-level data review process.

Users need to have specific permissions granted to view Data Collections that have a status of Draft, Submitted, or Under Review. See Users and Permissions for more details.
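The status rules in this section can be summarized as a small state machine. This is a sketch under stated assumptions: the transition table is inferred from the text (which does not list every permitted move), and the function names and the single `has_special_permission` flag are hypothetical simplifications of the real permission system.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"
    SUBMITTED = "Submitted"
    UNDER_REVIEW = "Under Review"
    PUBLISHED = "Published"


# Assumed transition table, inferred from the text above.
ALLOWED = {
    Status.DRAFT: {Status.SUBMITTED},
    Status.SUBMITTED: {Status.PUBLISHED, Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.SUBMITTED},
    Status.PUBLISHED: {Status.UNDER_REVIEW},  # assumption: published data can be sent back
}

# Statuses that require specific permissions to view.
RESTRICTED = {Status.DRAFT, Status.SUBMITTED, Status.UNDER_REVIEW}


def change_status(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move is not in the assumed table."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {new.value}")
    return new


def included_in_data_sets(status: Status) -> bool:
    """Draft and Under Review collections are left out of Data Sets."""
    return status in {Status.SUBMITTED, Status.PUBLISHED}


def can_view(status: Status, has_special_permission: bool) -> bool:
    """Published collections are broadly visible; other statuses need permission."""
    return status not in RESTRICTED or has_special_permission
```

Because the status belongs to the Data Collection rather than to individual Data Points, all Data Points in a collection move through these states together.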

Data Series

Time series of Data Points with the same metadata make up Data Series. For example, a Data Series could be the Wholesale Price of a locally-produced 90kg bag of Maize in KES at Garissa Market.
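To illustrate how Data Points with identical metadata form a Data Series, the sketch below groups flat records by a metadata key. The record layout, the choice of key fields, and the values are all hypothetical placeholders, not the platform's actual schema or data.

```python
from collections import defaultdict

# Metadata fields that, taken together, identify a Data Series (assumed).
SERIES_KEYS = ("market", "product", "unit", "price_type", "currency")

# Placeholder records; the values are not real prices.
data_points = [
    {"market": "Garissa Market", "product": "Maize", "unit": "90kg bag",
     "price_type": "Wholesale", "currency": "KES", "period": "2013-04", "value": 2.0},
    {"market": "Garissa Market", "product": "Maize", "unit": "90kg bag",
     "price_type": "Wholesale", "currency": "KES", "period": "2013-03", "value": 1.0},
    {"market": "Garissa Market", "product": "Maize", "unit": "90kg bag",
     "price_type": "Retail", "currency": "KES", "period": "2013-04", "value": 3.0},
]


def group_into_series(points):
    """Group Data Points by shared metadata and order each group by period."""
    series = defaultdict(list)
    for point in points:
        key = tuple(point[k] for k in SERIES_KEYS)
        series[key].append((point["period"], point["value"]))
    for observations in series.values():
        observations.sort()  # "YYYY-MM" strings sort chronologically
    return dict(series)


series = group_into_series(data_points)
wholesale = series[("Garissa Market", "Maize", "90kg bag", "Wholesale", "KES")]
```

Here the two Wholesale records fall into one Data Series, while the Retail record starts a separate one because its metadata differs.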

Metadata

Data Source Metadata

The following Data Source metadata are used to specify the domain, usage policy, and collection schedule for a Data Source Document:

  • Topic: A broad topic, such as Markets & Trade or Nutrition, used to categorize Document Types. For example, the domains MarketProduct, ExchangeRate, and TradeFlowQuantity all belong to an overarching Topic called Markets & Trade.

  • Document Type: A type of data provided by Data Source Documents. For example, Market Product, Preliminary Crop Production Estimate, SMART, 30x30, etc.

    • In the Crop Production domain, the Document Type is used to distinguish between crop:preliminary and crop:final Data Source Documents with the same name.

  • Data Usage Policy: A policy specifying how the data provided by a Data Source Document may be used. The application uses the Data Usage Policy to control user access to data. Regardless of whether the Data Source Document has a License specifying the full terms and conditions controlling the data, a Data Usage Policy must be set.

  • Data Source Document Schedule: The Collection Schedule that a Data Source Document follows for a specific period, such as Weekly, Monthly, or Ad Hoc. Usually the period is open-ended, but the Collection Schedule can be changed in response to a crisis.
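Because a Data Source Document Schedule applies for a specific period, finding the Collection Schedule in effect on a given date is a simple range lookup. The following is a minimal sketch; the class, the function, and the dates are hypothetical, and an end date of `None` is used here to represent an open-ended period.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocumentSchedule:
    collection_schedule: str       # e.g. "Weekly", "Monthly", "Ad Hoc"
    start: date
    end: Optional[date] = None     # None means the period is open-ended


def schedule_on(schedules: list[DocumentSchedule], day: date) -> Optional[str]:
    """Return the Collection Schedule in effect on `day`, if any."""
    for schedule in schedules:
        if schedule.start <= day and (schedule.end is None or day <= schedule.end):
            return schedule.collection_schedule
    return None


# Hypothetical history: monthly collection, switched to weekly during a crisis.
schedules = [
    DocumentSchedule("Monthly", date(2020, 1, 1), date(2023, 12, 31)),
    DocumentSchedule("Weekly", date(2024, 1, 1)),
]

before_crisis = schedule_on(schedules, date(2023, 6, 15))
during_crisis = schedule_on(schedules, date(2024, 2, 1))
```

Storing the schedule with its own date range, rather than as a single field on the document, preserves the history of schedule changes.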

Data Series Metadata

The following metadata are used to specify the geographic area and indicator for a Data Series.

  • Geographic Unit: A geographic area or point, such as a portion of a country or another region delineated for a specific purpose. For example, an Administrative Unit (Province, District, County, etc.), a Livelihood Zone, or a Market.

  • Indicator Group: A collection of Indicators that measure a common dimension of a specified data domain, e.g. Morbidity, Mortality, Stunting.

  • Indicator: A specific measure, belonging to an Indicator Group, that is tracked by a Data Series.