Data Collection Assets Overview
Data Collection assets extend FEWS NET’s data catalog beyond time series. An asset can be added to a Data Collection either as an uploaded file or as a link to an external resource.
A Data Collection can contain Data Points and/or Data Collection assets. For example, qualitative documents can be uploaded as assets alongside structured Data Points. Alternatively, a Data Collection may consist solely of a file containing survey data that is not suitable for ingestion as Data Points, and is therefore stored as an asset.
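The relationship described above can be pictured with a minimal Python sketch. The class and field names (DataCollection, DataPoint, Asset, uploaded_file, external_url) are hypothetical and are not FDW's actual schema. The rule that an asset has exactly one location is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataPoint:
    """One structured observation (hypothetical fields)."""
    period: str
    value: float


@dataclass
class Asset:
    """A cataloged resource: an uploaded file or a link to an external resource."""
    title: str
    uploaded_file: Optional[str] = None  # name of a file uploaded to FDW
    external_url: Optional[str] = None   # link to a resource stored elsewhere

    def __post_init__(self) -> None:
        # Assumption for this sketch: exactly one location is set.
        if (self.uploaded_file is None) == (self.external_url is None):
            raise ValueError("set exactly one of uploaded_file or external_url")

    @property
    def is_external(self) -> bool:
        return self.external_url is not None


@dataclass
class DataCollection:
    """A container for Data Points and/or assets."""
    name: str
    data_points: List[DataPoint] = field(default_factory=list)
    assets: List[Asset] = field(default_factory=list)


# Structured Data Points with qualitative documents alongside them.
mixed = DataCollection(
    name="Example collection with Data Points",
    data_points=[DataPoint("2020-01", 1.0), DataPoint("2020-02", 2.0)],
    assets=[
        Asset("Source document", uploaded_file="source.pdf"),
        Asset("Methodology note", external_url="https://example.org/methodology"),
    ],
)

# A collection that consists solely of a survey file stored as an asset.
asset_only = DataCollection(
    name="Example survey collection",
    assets=[Asset("Survey data", uploaded_file="survey.xlsx")],
)
```

Both shapes are valid under this sketch: a Data Collection with an empty data_points list is simply an asset-only collection.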
Using assets to catalog FEWS NET’s data stored both in FDW and in external locations has several key advantages:
Users can easily find uploaded and externally stored resources and their associated metadata in the FEWS NET Data Explorer (FDE).
The FEWS NET API can run queries on the full data inventory regardless of where the data is stored.
Tools such as the Data Freshness and Data Inventory Dashboards can be updated to support visual exploration of externally stored data in addition to data stored in FDW.
Supporting documentation can be organized and preserved alongside ingested data, providing richer context for that data.
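To illustrate why a single catalog matters, the sketch below searches an in-memory inventory in which every record carries its storage location. This is not the FEWS NET API. The InventoryRecord type, the search_inventory function, and the sample records are all hypothetical and exist only to show that one query can cover FDW-stored and externally stored resources once both are cataloged.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InventoryRecord:
    """One catalog entry (hypothetical structure)."""
    collection: str
    title: str
    kind: str      # "data_points" or "asset"
    location: str  # "fdw" or "external"


def search_inventory(records: Iterable[InventoryRecord], term: str) -> List[InventoryRecord]:
    """Return records whose title or collection name contains the term.

    The storage location is deliberately not part of the filter.
    """
    needle = term.lower()
    return [
        r for r in records
        if needle in r.title.lower() or needle in r.collection.lower()
    ]


# Illustrative records only.
records = [
    InventoryRecord("Crop production (example)", "Standardized figures", "data_points", "fdw"),
    InventoryRecord("Crop production (example)", "Source report PDF", "asset", "fdw"),
    InventoryRecord("Nutrition survey (example)", "Enumerator training materials", "asset", "external"),
    InventoryRecord("Crop production (example)", "Crop calendar reference", "asset", "external"),
]

matches = search_inventory(records, "crop")
for record in matches:
    print(f"{record.title} [{record.kind}, {record.location}]")
```

Because the location is just another metadata field, the same search returns uploaded and externally linked resources together.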
For details on adding and editing Data Collection assets, see Managing Data Collection Assets.
Use cases
Possible Data Collection assets include survey designs, training materials for enumerators, scripts used to clean or process data, code books, and profile reports and fact sheets associated with baselines. Several use cases are described below.
Supporting documentation for tabular data
Crop production data
Crop production data is often extracted from PDF documents and standardized into a tabular format for ingestion into FDW.
By uploading the source PDF as a Data Collection asset alongside the ingested Data Points, we document the standardization process, retain the original language and formatting of the data, and allow users to reference the raw source material for independent validation or further analysis.
Market price projections
Price projections are developed using two files: a technical projections Excel file and an integrated price projections worksheet in Word.
Data Points are added to four Data Series in FDW: one for each of the three technical projection models and one for the integrated projection.
The Excel file contains the details of the data used to develop these Data Points. The matching Word document gives more information on the assumptions and the algorithms used by the FEWS NET Markets and Trade team to develop integrated price projections.
By uploading both the Excel and Word files as Data Collection assets, we can run internal retrospective analyses of the accuracy of the chosen integrated projection.
Data outside standard ingestion formats
Nutrition survey
Nutrition surveys often vary in structure, requiring effort to standardize and load the data into FDW’s Nutrition Domain.
If the survey structure varies month to month, it may be more practical to upload the PDF of tabular data as a Data Collection asset than to extract Data Points for ingestion.
Supporting documentation, such as the form design, collection methodology, and enumerator training materials, can also be uploaded to the same Data Collection. Cataloging these materials together preserves the context of the data, making it easier to assess its reliability.
When not to use Data Collection assets
Semi-structured Data Series vs. assets
The Semi-structured Domain in FDW allows you to upload tabular data without meeting the additional metadata requirements of FEWS NET’s standardized domains. If the data will be updated on a schedule, you will likely want to ingest it as multiple Data Collections with Data Points, which lets you use the FEWS NET API to extract the data across those Data Collections.
If the data does not lend itself to a time series analysis, it may be more appropriate to upload it as a set of assets.
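The guidance above can be summarized as a small decision helper. This is a simplified sketch, not an FDW rule: the function name, its parameters, and the "either" outcome for cases the guidance does not settle are assumptions made for illustration.

```python
def suggested_storage(
    is_tabular: bool,
    has_scheduled_updates: bool,
    suits_time_series: bool,
) -> str:
    """Suggest 'data_points', 'assets', or 'either' for a dataset.

    Hypothetical helper that mirrors the guidance in this section.
    """
    # Assumption: data that is not tabular, or that does not lend itself
    # to time series analysis, is better cataloged as assets.
    if not is_tabular or not suits_time_series:
        return "assets"
    # Tabular data with scheduled updates benefits from ingestion as
    # Data Points, so it can be extracted across Data Collections.
    if has_scheduled_updates:
        return "data_points"
    # The guidance does not settle this case; treat it as a judgment call.
    return "either"


print(suggested_storage(is_tabular=True, has_scheduled_updates=True, suits_time_series=True))
print(suggested_storage(is_tabular=True, has_scheduled_updates=True, suits_time_series=False))
```

The helper only encodes the two conditions named in this section. Real decisions will also depend on the effort required to standardize the data.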
Files not specific to one Data Collection
Supporting documentation such as reports, training materials, and process documentation may apply to multiple Data Collections. Only add assets that are specific to a given Data Collection.
For example, if a market functioning map is updated every quarter with new data, the ArcGIS file used to generate it serves multiple Data Collections and should not be attached to any single one of them.