Crop Production Estimates
All crop production data are legitimately called “estimates,” including final corrections and revisions of early estimates. There is no better category of crop data information. FDW’s objective is to capture and archive the best and hopefully final estimates.
Crop production data is extracted from seasonal and annual crop assessment reports and surveys produced by Source Organizations. The data is prepared and uploaded to FDW once or twice a year per country. This page describes the process for preparing the data for upload using spreadsheet templates.
Data preparation
Most crop production estimate uploads will use an existing country-specific template with several key tabs filled out (DATA, ADMIN, CROPS, SEASONS, NOTES).
This will help determine the types of metadata you will use to complete the steps described below. Inserting a term that does not match one of those choices will produce an error upon upload.
If, after reading the description of the steps below, you are uncertain how to proceed in any of them, check with FEWS NET Data Management staff.
Extracting source data
Retrieve a copy of the country’s spreadsheet template, usually named <CountryName>AgStats. You will enter the new source crop data in the tab named DATA. There may already be data in the country template, which should provide concrete examples for all the steps that follow.
Most of the fields in the country template provide key data that instruct FDW how to document and archive the data. A few fields are intended to inform you, the data enterer, about how to best interpret, document, and enter the data. The template fields are described below.
Review the source document(s) containing the new crop data to determine how you will extract, process, and add the source tabular data into the upload file format shown in the DATA tab in this Excel Crop Production Data workbook. Examine the structure and format of the tabular data in the source document.
The fundamental variables that you want to extract are: reporting unit name, year, season, crop name, crop production system, area planted, area harvested, yield, and quantity.
The fundamental units of measure should be “hectares” for area, “metric tons per hectare” for yield, and metric tons for quantity. If not, you should convert them in an editing process.
If the structure of all source tables is the same, then you will need less error-checking than if the tables vary in structure or format. If the tables vary in structure, naming convention, composition of variables, or in other important manners, you will need to consider extra quality-control steps in your extraction and editing (see step 7).
If the source document is in a .pdf format, you will need an application to convert the PDF tables into spreadsheet tables. The current preferred application is ABBYY FineReader 16. It contains a full-featured suite of PDF editing tools and uses optical-character recognition (OCR) to convert the PDF content.
To convert the PDF tables, open the .pdf file in FineReader, select Save As, and choose a spreadsheet as the format.
Tip: If the source PDF document contains a lot of text, and/or other topics in addition to the tabular data you want, consider deleting the unwanted pages.
Immediately check the quality of the saved spreadsheets. If the conversion to spreadsheet format has made a number of recognition errors, or has not correctly identified or structured all the tables you want, you can then work with the OCR-editing application offered-up at the same time by FineReader to edit the output before then selecting Recognize (any edited pages) and, again, Save As to complete the conversion.
If the source data are offered in any of a variety of other non-PDF online applications that allow you to define a CSV or spreadsheet output format, prioritize that choice for your download of source data.
If the source document tables vary unpredictably in the way they are structured, or how they present the fundamental variables of crop production, it will be important to define a secondary process for error-checking and quality control steps immediately after the extraction and conversion. In many cases, this will consist of visually checking each extracted table to see if it accurately duplicates the source table.
Use all available tools and alternatives available to you to avoid manually copying and pasting the data into the FDW upload spreadsheet.
Working with templates
The order of columns in a crop data upload spreadsheet is not necessarily fixed, except for DNL (Do Not Load) columns which must always be located to the right of the columns intended for upload. However, it is helpful to follow the most common pattern described below to ensure consistent and complete documentation of the crop data to be uploaded.
Note that some of the expected content of the fields described below may differ slightly between crop data and other domains.
Source_organization
The first column is typically Source_Organization. It correlates to the entity that gathers, organizes, and originally offers access to the crop data. It is most often a Ministry of Agriculture, statistical bureau, or a multi-entity crop assessment mission; this latter term will most often indicate a UN-sponsored Crop and Food Supply Assessment Mission (CFSAM) organized by either, or both, the Food and Agriculture Organization (FAO) and the World Food Program (WFP).
The source_organization name is comprised of the entity’s name, followed by its acronym in parentheses, a comma, and the name of the country. If possible, use a source_organization name already found in the FDW Crop domain metadata for the country involved.
Source_document
The second column is Source_Document. It describes the nature or type of activity or publication which publishes the crop data (note: this differs from other FDW domains).
A common entry here would be “Official crop statistics, CountryName”. Others include “Agricultural Census,” “Annual crop assessment,” “Statistical Yearbook,” or “CFSAM.” As in the first column, this descriptor is followed by a comma and the name of the country. If possible, use a source_document name that is already found in the FDW Crop domain metadata for the country involved.
Publication
The third column is often referred to as Publication. It should provide enough specific details to be able to identify the actual document or location from which the crop data were extracted.
Common entries found in this column might include “2017/2018 Crop Assessment Report”, or “2018 Statistical Yearbook”, or “2017/2018 Crop and Food Supply Assessment Mission FSAM to Sudan”, or a URL address. Again, the name is followed by a comma and the country name.
Survey type
The fourth column is usually titled Survey_type. For crop data, the most common entry here will be “crop_final.” This indicates that the data are complete for the period described.
If a country produces multiple harvest estimates during the cropping season, or habitually produces an initial estimate followed one year later by a final estimate, a different descriptive term, usually already found in the country’s crop domain metadata, may be entered here.
FEWS NET practice is generally not to capture intermediate estimates during a season, but if a country normally publishes two estimates per harvest, and the only estimate that can be found is the first, FEWS would normally put an entry like “1st round estimate” in this field to document the status of the data entered.
Country
In the Country column, enter the country's [two-letter intl code name] two-letter code.
Zone
The Zone column is only used to provide additional information about where the reporting unit is located in the country. If it has utility, the zone field will likely already be filled-in in the country template. This field is not uploaded to the FDW.
FNID
The FNID column contains FNID codes that have already been assigned to every crop-reporting and/or administrative unit in the country. These ID codes should be selected from annual boundary sets which are found in the FDW Spatial domain. They change ID code every year in which a country’s boundaries change their geographic shape/location. If the newly-extracted data reveal a new administrative unit, you should notify FEWS NET Data specialists to have it added. Consult with Hub personnel to receive a list of FNIDs for the year(s) represented by the data.
Administration units
The Admin1 column contains the name of the first subnational crop reporting/administration unit, i.e., the first administrative unit after the national level.
After you have uploaded the new data into the template, you, or other FEWS NET personnel will update the file with FNIDs and the official Admin names as defined in FEWS NET administration shapefile hierarchy.
The Admin2, Admin3, and Admin4 columns contain the name of the next subnational administration unit names in the country’s reporting hierarchy. If the source data report only at a higher Admin level (e.g. Admin1), this field, and all others, will be empty. If you have names for Admin2, you will also have an administrative unit “parent” name in the Admin1 field. Enter the names as they appear in the source file and continue similarly for Admin3 and Admin4 if there are such.
For any of the administrative unit columns, some countries may have a different set of crop-reporting units that differ from the administrative units and are only used for crop statistics. If that is the case, what is required in one of the columns are the crop reporting unit names, which will likely have been previously coded and assigned for that purpose; annual boundary sets for them should be found in the Spatial domain.
Year
In the Data Upload column for Year, you will enter the agricultural “year” of the country's new crop data. The field is restricted to four digits in the FDW, so you will enter a four-digit (e.g., 1999) year descriptor in this column.
It is extremely important that this year be consistent with FEWS NET’s practice for identifying crop years. For countries whose annual agricultural year crosses over a December 31 date, identifying which year to insert here can be complex. Looking at the SEASON tab should help you understand which year to enter here, or it may be clear from the data already in the DATA tab. Please consult the FEWS NET Database Manager if the year to enter is not crystal clear.
If the source document identifies the year in a different format (e.g. 2017/18), also enter the source document year as it is found in a DNL_Source_year column at the right of the spreadsheet.
Additional dates
In the country template, you may next encounter several fields regarding start and end periods of the season and you may also find multiple columns describing the crop development stage dates (“phenology”), which are relative to the specific crop data being entered on each row. These are entered automatically upon upload and are not your responsibility.
Look for the Season column, where you will enter the season name(s) of the crop data being uploaded. The SEASON tab in the workbook provides additional information regarding how seasons are named for this country, and any season name entered for new data should correspond to one of those.
Crop names and codes
Enter CPCv2 crop codes in the column/field labeled “CPCv2” or “Crop.” These have already been defined for each crop in the FDW, and existing crop names for this country can be found in the country template. CPC codes can be found in FDW under Metadata management>Common>Classified products.
Source document crop names for every crop should be retained and placed in a column to the far right of the sheet generally named “DNL_Source_Crop.”
FEWS NET practice is to upload all crop data offered in the source document, except for some “fruit tree crops” (oranges, apples, pears, etc.). Coffee, cocoa, bananas, and plantains are “tree” crops whose data is often uploaded. Commercially-grown “flower” estimates are not collected. If unsure about what to add, ask the field analyst or FEWS NET Database Manager.
If you find that existing FDW crop names and code metadata provide less information about the type of crop being reported now in the source document (e.g. existing crop data for this country includes a name/code of “R01701AA” for common beans, while the new data describes the crop as “Red beans”, or “Vigna beans”), ask the field analyst or FEWS NET Database Manager which name/code should be entered.
All crop names need to pre-exist and be coded in the FDW before being uploaded.
Crop production system
In the Crop domain upload template, the column/field sometimes labelled Dominant Livelihood Zone is supposed to instead be called the Crop Production System.
Crop production system descriptors entered here would include metadata such as “rainfed,” “irrigated,” “commercial,” small-farmer,” or, very frequently, “all.” All these terms should be already present in the FDW Crop Domain metadata for the specific country you are working on, and you should enter those if they fit. If not, contact the Data Management staff who will help in defining/creating new metadata descriptors.
Each entry in this column should be followed by a “(PS)” to indicate a production system descriptor.
Crop estimate data
In the following columns/fields, you will enter crop estimate data provided in the source document.
Source documents generally distinguish between a zero value and “no data collected/no data available/no data reported.” Try to retain those differences as you move the data into the data upload sheet (“0” for a zero-value, “NA” for no crop grown, “NC” for no data collected/reported).
Enter the area planted in hectares (ha) in the Area planted column/field. The source document may refer to the area “planted” by other names, e.g., “Area under [crop]”. If the source file provides this data in a different unit (e.g. “acres”), make the conversion, but add a DNL column (“DNL_Source_area planted”) for the source unit data for information purposes. If the area data estimate does not mention “planted” or “harvested” in some manner, ask the Database Manager to assist in determining whether the area estimate is ”planted” or “harvested”.
As above, enter Area harvested in hectares (ha), if area harvested data are provided.
Enter the yield in MT/ha into the Yield column. Enter the actual yield figure given in the source document, unless: a) you have to convert it first to MT/ha, b) there is an obvious error in the yield figure, or c) no yield data are given. In the latter two cases, compute yield as “quantity” divided by “area planted” (or “area harvested” if that alone is given.
Enter the quantity produced in the Quantity column/field as Metric tons (MT) only, i.e., not bags, kgs, or sacs. If the source file quantity estimate is in a different weight or volume unit, make the conversion, but add a DNL column for the source quantity estimate for information purposes.
Quality assurance tips
You should try to ensure the accuracy of your capture and annotation of source data. Spot-checking and comparing results is a basic method for identifying systematic errors.
Frequent errors include duplications of already entered crop data. This may result if the source document provides revisions of the previous year data, which was entered into the FDW last year. There are a variety of methods you should use to identify duplicates, as all duplicates will produce an error upon upload into the FDW.
File names should be kept consistent by naming the final file with the naming convention CountryCode_AgStats (e.g., El Salvador file name would be SV_AgStats).