The error matrix is the most commonly used accuracy assessment tool, particularly in land cover classification. In accuracy assessment, the land cover classification shown on the map is typically compared with the true land cover condition, represented by ‘ground truth’ or ‘reference data’. Since true ground truth is rarely attainable in practice, researchers use reference data, such as data of higher quality than the map being assessed. It is also not possible to obtain a reference land cover classification for the entire region of interest, so a statistical sampling method is used to draw a ‘sample’, i.e. a subset of the mapped region, on which the accuracy assessment is carried out. The most common method of accuracy assessment at present is the error matrix, also called a confusion matrix, which displays the proportion of area that is correctly classified and misclassified for each land cover type. It is used to estimate overall accuracy, user and producer accuracy, errors of omission and commission, and Kappa statistics.
What is an Error Matrix?
“An error matrix is a square array of numbers organized in rows and columns that express the number of sample units (i.e., pixels, clusters of pixels, or polygons) assigned to a particular category relative to the actual category as indicated by the reference data.” Congalton (2004)
In an error matrix, the columns represent the reference data and the rows represent the map generated from remotely sensed data. Reference data are assumed to be correct and are collected from a variety of sources, including photographic interpretation, ground or field observation, and ground or field measurement. The error matrix is used to find the overall accuracy, user and producer accuracy, errors of commission and omission, and Kappa statistics.
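The layout described above can be illustrated with a minimal sketch that tallies sample units into an error matrix, with map classes as rows and reference classes as columns. The function name `build_error_matrix`, the class names, and the six sample units are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_error_matrix(map_labels, reference_labels, classes):
    """Tally sample units into a square matrix.

    Rows are the classes assigned on the map; columns are the classes
    given by the reference data (the layout described in the text).
    """
    if len(map_labels) != len(reference_labels):
        raise ValueError("each sample unit needs one map label and one reference label")
    # Count how often each (map class, reference class) pair occurs.
    counts = Counter(zip(map_labels, reference_labels))
    return [[counts[(m, r)] for r in classes] for m in classes]


# Hypothetical sample units, for illustration only.
classes = ["forest", "water", "urban"]
map_labels = ["forest", "forest", "water", "urban", "urban", "forest"]
reference_labels = ["forest", "water", "water", "urban", "forest", "forest"]

matrix = build_error_matrix(map_labels, reference_labels, classes)
for class_name, row in zip(classes, matrix):
    print(class_name, row)
```

Correctly classified sample units fall on the diagonal; every off-diagonal cell records a disagreement between the map and the reference data.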
Overall Accuracy: The overall accuracy is the ratio of the number of correctly classified sites (the sum of the diagonal elements) to the total number of reference sites. It tells us what proportion of the reference sites were mapped correctly.
User Accuracy: The accuracy from the perspective of a map user, who needs to know how often a given map class is actually present on the ground. It is the total number of correct sample units in a category divided by the total number of sample units classified into that category on the map (i.e., the row total).
Producer Accuracy: The accuracy from the perspective of the map producer, who needs to know how well the reference sites of a given category were classified. It is the total number of correct sample units in a category divided by the total number of sample units of that category from the reference data (i.e., the column total).
Both user and producer accuracy provide critically important information on class-specific accuracy, and these accuracy parameters are the complements of the commission and omission errors, respectively.
Omission Error: The rate at which reference sites of a class were erroneously omitted from that class in the map.
Commission Error: The rate at which sites were erroneously included in a class in the classified map although the reference data assign them to a different class.
Kappa Statistics: A measure used to evaluate the accuracy of a classification. It evaluates how well the classification performed compared with randomly assigning classes to the sample units.
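The measures defined above can be computed directly from an error matrix, as in the following minimal sketch. It assumes rows are map classes and columns are reference classes, and that every row and column total is non-zero. The text does not give a formula for Kappa, so the sketch uses the usual Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement); the function name and the 3 x 3 matrix are hypothetical.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square error matrix.

    Assumes rows = map classes, columns = reference classes, and that no
    row or column total is zero.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = sum(diagonal) / total
    users = [diagonal[i] / row_totals[i] for i in range(n)]
    producers = [diagonal[j] / col_totals[j] for j in range(n)]

    # Agreement expected by chance, from the row and column marginals.
    chance = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - chance) / (1 - chance)

    return {
        "overall": overall,
        "users": users,
        "producers": producers,
        "commission": [1 - u for u in users],    # complement of user accuracy
        "omission": [1 - p for p in producers],  # complement of producer accuracy
        "kappa": kappa,
    }


# Hypothetical error matrix for three classes, for illustration only.
example = [
    [45, 4, 1],
    [6, 40, 4],
    [2, 3, 45],
]
result = accuracy_metrics(example)
print(round(result["overall"], 3), round(result["kappa"], 3))
```

In this example 130 of the 150 sample units lie on the diagonal, so the overall accuracy is about 0.867, while the chance agreement of one third brings Kappa down to 0.8.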
Error Budget Analysis
Typically, error matrices report the overall errors that all classes together show in the classification. Only in a few instances are the errors partitioned into their components. This has led to a lack of information on which class or component has been classified accurately and which has contributed most to the error. Some errors may be small while others may be large, and the misclassification of each component adds to the total error budget. It is therefore important to carry out an error budget analysis as well.
A common approach to examining an error budget is to create a special error budget analysis table. Congalton (2004) suggests that this table be generated column by column, beginning with a listing of the possible sources of error. Once the various components that make up the total error are listed, each component can be assessed to determine its contribution to the overall error. Finally, an error index can be created by multiplying the error contribution potential by the error control difficulty. In this way a priority can be set for dealing with individual errors.
Congalton (2004) suggested the following template to conduct error budget analysis:
|Error Source|Error Contribution Potential|Error Control Difficulty|Error Index|Error Priority|
|Error Contribution Potential:|Relative potential for this source as a contributing factor to the total error (1 = low, 2 = medium, and 3 = high).|
|Error Control Difficulty:|Given the current knowledge about this source, how difficult it is to control the error contribution (1 = not very difficult to 5 = very difficult).|
|Error Index:|An index that represents the combination of error potential and error difficulty.|
|Error Priority:|Order in which methods should be implemented to understand, control, reduce, and/or report the error due to this source, based on the error index.|
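The template can be filled in programmatically, as in the minimal sketch below. The error index is the product of contribution potential and control difficulty, as described above. Two points are assumptions rather than part of the source: the error sources and their scores are hypothetical, and sources are ranked with the highest index first, since the text only says that priority is based on the error index. Tied indices keep their input order.

```python
def error_budget(sources):
    """Build an error budget table from (name, potential, difficulty) tuples.

    potential: 1 (low) to 3 (high); difficulty: 1 (not very difficult)
    to 5 (very difficult). Returns rows sorted by error index, highest
    first, which is an assumed ordering for the priority column.
    """
    rows = []
    for name, potential, difficulty in sources:
        if not 1 <= potential <= 3:
            raise ValueError(f"{name}: potential must be between 1 and 3")
        if not 1 <= difficulty <= 5:
            raise ValueError(f"{name}: difficulty must be between 1 and 5")
        rows.append({
            "source": name,
            "potential": potential,
            "difficulty": difficulty,
            "index": potential * difficulty,  # error index as defined in the text
        })

    # Assumption: a larger index means the source is addressed earlier.
    ranked = sorted(rows, key=lambda row: row["index"], reverse=True)
    for priority, row in enumerate(ranked, start=1):
        row["priority"] = priority
    return ranked


# Hypothetical error sources and scores, for illustration only.
budget = error_budget([
    ("Image registration", 2, 2),
    ("Classification scheme", 3, 4),
    ("Reference data collection", 2, 4),
])
for row in budget:
    print(row["priority"], row["source"], row["index"])
```

If a project prefers a different rule, for example tackling easily controlled sources first, only the sort key needs to change.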