Metadata Reconciliation
The BSS spreadsheets contain many similar items that use slightly different terminology. We want to eliminate these duplicates as we load the Baseline Storage Sheets (BSS), to ensure that they are comparable across countries and regions.
Metadata Reconciliation Tables
To enable this we have two additional metadata items: Wealth Characteristic Label and Activity Label. These items are used to translate the items in Column A of the ‘WB’ and ‘Data’ worksheets respectively, converting them into specific deduplicated items across many different metadata types.
These two tables are used by the data ingestion pipelines to recognize the data in the BSS and convert it to a standard form. Completing the review of these metadata items is critical to being able to automatically recognize the data in the BSS and automate the ingestion of it.
These metadata items contain some standard columns:
label: the exact text from Column A of the “WB” or “Data” worksheet
langs: the primary language(s) detected for the BSS where the label is used
datapoint_count_sum: The number of data points (i.e. non-blank cells) that use that label
unique_filename_count: The number of BSS that contain data with this label
min_row_number: The smallest row number in any BSS that contains this label
max_row_number: The largest row number in any BSS that contains this label
filename_for_min_row: The name of the BSS where the label has the smallest row number
filename_for_max_row: The name of the BSS where the label has the largest row number
status: Indicates whether the label has been reviewed and the correct lookups have been assigned and the record is ready to be used.
The Wealth Characteristic Label worksheet contains the following additional columns:
wealth_characteristic_id: the name of the standard WealthGroupCharacteristic that this label represents. For example, Donkey number owned matches the
number owned
WealthGroupCharacteristic, and Land area cultivated (hectares) matches theland area cultivated
WealthGroupCharacteristic.product_name: the common name of the standard Product that this label represents, if appropriate. For example, Donkey number owned matches the
L02132: Donkeys
Product.unit_of_measure_id: the name of the standard Unit Of Measure that this label represents. For example, Land area cultivated (hectares) matches the
ha: Hectare
Unit Of Measure.
The Activity Label worksheet contains the following additional columns:
is_start: Indicates whether this label indicates the start of a new Livelihood Strategy. In the BSS these cells often have a light green background. For example Sorghum Deyr: kg produced or any other crop name followed by kg produced indicates the start of a new Activity.
strategy_type: the name of the Livelihood Strategy subtype, such as
MilkProduction
, orOtherCashIncome
.attribute: the standard name for this attribute that matches the name of the field in the data model for the Livelihood Activity that will store the data from this row. For example Sorghum Deyr: kg produced maps to the
quantity_produced
field.product_name: the common name of the standard Product that this label represents, if appropriate. For example, Donkey number owned matches the
L02132: Donkeys
Product.unit_of_measure_id: the name of the standard Unit Of Measure that this label represents. For example, Land area cultivated (hectares) matches the
ha: Hectare
Unit Of Measure.currency_id: the iso4217a3 code for the currency if one is specified in the label. This field is not often used.
season: the name or alias of the season, if one is specified in the label.
additional_identifier: an additional identifier required to distinguish between Livelihood Strategies. For example, there may be two Strategies for growing maize, with labels Maize rainfed: kg produced and maize irrigated: kg produced. In this case the additional identifiers will be
rainfed
andirrigated
, the Strategy Type will beCropProduction
, the Unit Of Measure will bekg: Kilogram
and the product will beR01122: Maize/corn grain
Metadata Reconciliation Process
Find the row in the ActivityLabel or WealthCharacteristicLabel worksheet in the Reference Data spreadsheet.
Open the files listed in the filename_for_min_row and filename_for_max_row columns so that you can see the label in context to decide whether it is the start of a new Livelihood Activity, etc.
Fill in the attributes in the Reference Data spreadheet. Most attributes are constrained by validation to selecting from a pre-approved list of values. When you find the need for a Product, for example, that isn’t already available then reach out to the Hub and they add the name to the list of available products after checking the appropriate code, etc.
Mark the metadata row as Complete.