EO Training


Categories: Agriculture, Land


Introduction

In this tutorial, we will use the RUS environment to run a supervised pixel-/object-based classification with the Random Forest algorithm, using Sentinel-2 imagery as input data, over an agricultural area near Seville, Spain.

Reliable information on agriculture and crops is required to support decision-making in a range of applications. Different methods can be used to gather this information, but satellite earth observation offers a suitable approach thanks to the coverage and type of data it provides.

A few years ago, the European Union (EU) started an ambitious program, Copernicus, which includes the launch of a new family of earth observation satellites known as the Sentinels. Amongst other applications, this new generation of remote sensing satellites will improve the observation, identification, mapping, assessment, and monitoring of crop dynamics at a range of spatial and temporal resolutions.

Note: This exercise was developed by the Serco EO Training Team within the framework of the RUS contract. The final outcome of this tutorial has not been validated and the information contained in this tutorial does not purport to constitute professional advice.

Data used

7 Sentinel-2A images acquired from June 1st until July 31st 2017 [downloadable at https://scihub.copernicus.eu/ using the .meta4 file provided in the AuxData folder of this exercise]

Pre-processed auxiliary data available on request

Software used

Reference folder structure

Folder structure referenced in the exercise is as follows:

  • /EO4GEO0621_CropMapping_Seville
    • /AuxData – includes auxiliary data needed for the processing – provided upon request
    • /Original – should contain your downloaded input products
    • /Processing – should contain all intermediate and final results of the processing, organized in subfolders

Data download – ESA SciHUB

First, we will download the Sentinel-2 scenes from the Copernicus Open Access Hub using the online interface. Go to Applications → Network → Firefox Web Browser, or open https://scihub.copernicus.eu/ directly.

Go to Open HUB. If you do not have an account please sign up in the upper right corner, fill in the details and click register.

You will receive a confirmation email in the account you have specified: open the email and click on the link to finalize the registration.
Once your account is activated – or if you already have an account – log in.

Switch the rectangle drawing mode to pan mode by clicking on the icon in the upper right corner of the map (green arrow) and navigate to the Seville city area (approximate area – blue rectangle).

In this exercise, we will use 7 Sentinel-2 images (See NOTE 1) acquired during 2017. The following table shows the date and reference of the images that will be used:


Switch back to drawing mode (box) and draw a search rectangle approximately as indicated below. Open the search menu by clicking on the left part of the search bar and specify the parameters below. Then press the search button.

In the search bar type: filename:T30STG

Sensing period: From 2017/06/01 to 2017/07/31
Check Mission: Sentinel-2
Product type: S2MSI2Ap

The search will return the 7 products that will be used in the exercise. (Check the product names against the table above). Select all the relevant products and click Add selected products to cart at the bottom of the results list.

NOTE 1: Due to the ESA policy on the availability of Sentinel data on the Copernicus Open Access Hub, and to ensure continued access to all Sentinel data at all times, a Long-Term Archive (LTA) has been implemented that rolls the oldest data out of online access. More information about the LTA can be found in the following links:

To download a product from the LTA, click on the Download Product icon. A confirmation message will appear informing you that your request has been queued (image on the left), and you will also see a clock icon next to the Offline sign. You can try to do this for all the products until you receive a notification that the queue is full (image on the right), in which case you will have to retry the request a few minutes later.

You will have to check your Cart manually from time to time to know when a product is available for download (no automatic notification is sent). Once online, the product will remain available for 4 days before being rolled back to the LTA.
Please note that every user account is only allowed to request 1 offline product every 30 minutes, and only if there is free space in the queue. The number of concurrent requests for offline products from all users is limited, so you may receive an error when trying to download. If so, try again later.

To improve the data acquisition process, we will use a download manager (See NOTE 2) to take care of downloading all products that will be used in this exercise. Once all the products are online, go to the Cart (next to the text search field) and click Download Cart. A products.meta4 file containing the links to the selected Sentinel products will be downloaded. Move it to the following path: /shared/Training/EO4GEO0621_CropMapping_Seville/Original/

NOTE 2: A download manager is a computer program dedicated to the task of downloading possibly unrelated stand-alone files from (and sometimes to) the Internet for storage. For this exercise, we will use aria2. Aria2 is a lightweight multi-protocol & multi-source command-line download utility. More info at: https://aria2.github.io/

Before using the download manager and the .meta4 file, let us test whether aria2 is properly installed in the Virtual Machine. To do this, open the Command Line (at the bottom of your desktop window) and type:

aria2c

If aria2 is properly installed, the response should be as follows.

If the response is ‘-bash: aria2c: command not found’, it means aria2 is not installed (See NOTE 3).

NOTE 3: If (and only if) the response is ‘-bash: aria2c: command not found’, you need to install aria2. In the command line, type: sudo apt-get install aria2
When requested, type: Y
Once finished, test the installation as explained before.

Once aria2 is ready to use, we can start the download process. For that, we need to navigate to the folder where the products.meta4 is stored. Adapt the following command to your path, type it in the terminal and run it.

cd .../Training/EO4GEO0621_CropMapping_Seville/Original/

Next, type the following command (in a single line) to run the download tool. Replace username and password (leave the quotation marks) with your login credentials for Copernicus Open Access Hub (COAH). Do not clear your cart in the COAH until the download process is finished.

aria2c --http-user='username' --http-passwd='password' --check-certificate=false --max-concurrent-downloads=2 -M products.meta4

The Sentinel products will be saved in the same path where the products.meta4 is stored. Move the Sentinel-2 images to their corresponding folder and do not forget to unzip the Sentinel-2 products after that (once unzipped, you can delete the zipped products to save storage capacity).

R – Processing

At this point, we are ready to start the next part of our analysis. This exercise will be done using R (See NOTE 4) code implemented in a Jupyter Notebook (See NOTE 5).

NOTE 4: R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering…) and graphical techniques, and is highly extensible. R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics. It is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form.

NOTE 5: Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages. Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App, which contain both computer code and rich text elements (paragraph, equations, figures, links, etc…). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc..) as well as executable documents which can be run to perform data analysis. More info at: www.jupyter.org

We will open Jupyter Notebook (note it must be installed) by launching it from Terminal. For that, open Terminal in your RUS Virtual Machine and copy-paste the following script. Then, press enter to run it:

jupyter lab

Jupyter lab will open in your browser. On the left side of the window, navigate to the folder where your Training folder is saved and enter the AuxData folder. Here we will create a new R notebook. The new notebook will open immediately.

In a new code cell, copy-paste the following code to start analyzing the Sentinel-2 products. We start by loading the R packages needed for this processing; if they are not installed, they will be installed first. After that, we set the working directory to our AuxData folder.

# Load R packages (install any that are missing first)
pck <- c("tidyr","rgdal","ggplot2","randomForest","RColorBrewer", "caret", "reshape2", "raster", "e1071", "rasterVis", "gdalUtils", "tcltk", "tools")
new_pck <- pck[!pck %in% installed.packages()[,"Package"]]
if (length(new_pck) > 0) {install.packages(new_pck)}

loaded <- lapply(pck, require, character.only=TRUE)

if (all(unlist(loaded))) {print('All packages loaded')}

# Specify working directory
setwd("/shared/Training/EO4GEO0621_CropMapping_Seville/AuxData/")

Before starting with the analysis, load the auxiliary data we will need later on. In this case, we need:

  • Study area shapefile
  • Training shapefile
  • Validation shapefile
Study_area <- readOGR("Study_Area.shp")
training <- readOGR("Training.shp")
validation <- readOGR("Validation.shp")

In the next chapter, we will set all the required parameters that will be used later on to run the supervised classification. Run the section and reply to the questions accordingly.

# Where are your S2 images stored? Set the path.
S2_path <- tk_choose.dir(default= ".../Training/EO4GEO0621_CropMapping_Seville/Original/", caption="Select directory for S2 products")

# How many trees do you want to use in the RF classification? 
n_trees <- as.numeric(readline('How many trees do you want to use in the RF classification? (e.g. 500): '))

# How many images do you have? 
t_images <- 7

# How many images do you want to use?
n_images <- as.numeric(readline('How many images do you want to use? (1:7): '))

# For S2, bands to be used for classification (all except B1-10 OR B23481112)
band_input <- toTitleCase(readline('Which S2 bands do you want to use? All (input: All) or 2,3,4,8,11,12 (input: Reduced) '))

if (band_input=="All") {bands <- c("B((0[2348]_10m)|(((0[567])|(1[12])|(8A))_20m)).jp2$")}
if (band_input=="Reduced") {bands <- c("B((0[2348]_10m)|((1[12])_20m)).jp2$")}
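To see what these regular expressions actually select, here is a quick base-R check against hypothetical band file names (illustrative only, modeled on the Sentinel-2 SAFE band-naming convention):

```r
# Band-selection patterns from the cell above
bands_all     <- "B((0[2348]_10m)|(((0[567])|(1[12])|(8A))_20m)).jp2$"
bands_reduced <- "B((0[2348]_10m)|((1[12])_20m)).jp2$"

# Hypothetical band file names (SAFE-style)
files <- c("T30STG_20170706T105621_B02_10m.jp2",  # blue, 10 m
           "T30STG_20170706T105621_B08_10m.jp2",  # NIR, 10 m
           "T30STG_20170706T105621_B05_20m.jp2",  # red edge, 20 m
           "T30STG_20170706T105621_B11_20m.jp2",  # SWIR, 20 m
           "T30STG_20170706T105621_B01_60m.jp2")  # coastal aerosol, excluded

files[grepl(bands_all, files)]      # selects everything except B01
files[grepl(bands_reduced, files)]  # selects only B02, B08 and B11
```

Note that the "All" option also keeps the 20 m red-edge and B8A bands, which will be resampled to 10 m in the next step.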


All the processing for Sentinel-2 images will be done in R. In this chapter we will:

  • Load the S2 products in R
  • Crop the loaded products using the study area shapefile as mask
  • Resample all products to a common spatial resolution (10 meters)
  • Create a raster stack of all cropped products
  • Visualize a true/false color RGB composition of one of the cropped Sentinel-2 products as an example
# Load S2 images
S2 <- list.files(S2_path, full.names = TRUE, pattern = ".SAFE")
S2 <- S2[1:n_images]
S2 <- list.files(S2, recursive = TRUE, full.names = TRUE, pattern=bands)
S2 <- lapply(1:length(S2), function (x) {raster(S2[x])})

# Crop - Resample - Stack
S2_crop <- lapply(S2, function(S2) if(xres(S2)==10) {crop(S2, Study_area, snap="out")} else {crop(S2, Study_area, snap="near")}) 
S2_rsp <- lapply(S2_crop, FUN = function(S2_crop) {if (xres(S2_crop)==20) {disaggregate(S2_crop, fact=2, method="")} else {S2_crop}})
S2_images <- stack(S2_rsp)

# Plot True/False color
plotRGB(S2_images, r=3, g=2, b=1, scale=maxValue(S2_images[[1]]), stretch="lin")
plotRGB(S2_images, r=4, g=3, b=2, scale=maxValue(S2_images[[1]]), stretch="lin")
True color RGB composition
False color RGB composition
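A note on the resampling step above: disaggregate(..., fact=2) splits each 20 m cell into a 2×2 block of 10 m cells carrying the same value (nearest-neighbour duplication, since no interpolation method is given). The idea can be sketched on a plain matrix with base R:

```r
# A 2x2 "20 m" grid of pixel values
m20 <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

# Duplicate every cell into a 2x2 block, mimicking disaggregate(fact = 2)
m10 <- kronecker(m20, matrix(1, nrow = 2, ncol = 2))

dim(m10)       # 4 x 4: twice the resolution in each direction
m10[1:2, 1:2]  # the four 10 m cells covering the first 20 m cell all equal 1
```

This duplication does not add information; it only brings all bands to a common 10 m grid so they can be stacked.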

Once our input images for the classification are ready, we need to start working on our training data and make it ready for the classifier. For this, we will:

  • Rasterize training data to allow further processing
  • Add the output training raster to the stack of Sentinel products we have previously created.
  • Extract band information for training pixels and create a dataframe out of it. This output will be used in the next step as input to train the model
# Plot training data
plotRGB(S2_images, r=3, g=2, b=1, scale=maxValue(S2_images[[1]]), stretch="lin")
plot(training, add=TRUE, col="orange")  

# Rasterize training data
training_r <- rasterize(training, S2_images[[1]], field=training$Crop_ID)
names(training_r) <- "Class"

# Add training raster to stack
S_images_t <- addLayer(S2_images, training_r) 

# Extract band information and create df
training_S <- raster::extract(S_images_t, training, df=TRUE)
True color RGB composition and training polygons

The Random Forest (Breiman, 2001) algorithm is a machine learning technique that can be used for classification or regression. In contrast to parametric classifiers (e.g. Maximum Likelihood), a machine learning approach does not start from a data model but instead learns the relationship between the training and the response dataset. The Random Forest classifier is an aggregated model, meaning it combines the output of many individual models (trees) to calculate the response variable.

Decision trees are predictive models that recursively split a dataset into regions by using a set of binary rules to calculate a target value for classification or regression purposes. Given a training set with n number of samples and m number of variables, a random subset of samples n is selected with replacement and used to construct a tree. At each node of the tree, a random selection of variables m is used and, out of these variables, only the one providing the best split is used to create two sub-nodes.

By combining trees, the forest is created. Each pixel of a satellite image is classified by all the trees of the forest, producing as many classifications as number of trees. Each tree votes for a class membership and then, the class with the maximum number of votes is selected as the final class.
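The voting mechanism can be illustrated with a few lines of base R (a conceptual sketch with made-up votes, not part of the processing chain):

```r
# Hypothetical class votes cast by a 10-tree forest for one pixel
votes <- c("Rice", "Rice", "Cotton", "Rice", "Tomato",
           "Rice", "Cotton", "Rice", "Rice", "Rice")

# Tally the votes and select the majority class
tally <- table(votes)
final_class <- names(tally)[which.max(tally)]
final_class  # "Rice" wins with 7 of 10 votes
```

The randomForest package performs exactly this aggregation internally when predicting in classification mode.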

When classifying using RF, the following steps will be followed:

  • Train the RF model
  • Run the classification using the trained model
# Train model
RF <- randomForest(x=training_S[ ,c(2:(length(training_S)-1))], y=as.factor(training_S$Class), importance = TRUE, ntree=n_trees)

# Classification
LC <- predict(S2_images, model=RF, na.rm=TRUE)

# Visualize output

Once the classification is run, we can visualize and assess the result.

# Convert pixel values to factor levels and attach class names
LC <- as.factor(LC)
LC_l <- levels(LC)[[1]]
LC_l[["Crop"]] <- c("Cotton","Others","Rice","SugarBeet","Tomato", "Water")
levels(LC) <- LC_l

# Plot classification
rasterVis::levelplot(LC, col.regions=c("lightsalmon1","azure2","goldenrod1","mediumseagreen","firebrick1","blue"),main=paste("RF_S2_", n_images, "_images", if (nchar(bands)>40) {"_B_2:9.11.12"}, if (nchar(bands)<40) {"_B_2."}, sep=""))

# Save classification as GeoTIFF
writeRaster(LC, filename = paste(".../Training/EO4GEO0621_CropMapping_Seville/Processing/", paste("RF_S2_", n_images, "_images", if (nchar(bands)>40) {"_B_291112"}, if (nchar(bands)<40) {"_B_23481112"}, sep = "")), format="GTiff", overwrite=TRUE)
Supervised classification. Sentinel-2 data – Random Forest

Finally, to validate the output and check the performance of the model, we will run a classic accuracy assessment using independent validation data. For this we first need to:

  • Rasterize the validation shapefile
  • Extract pixel values in the validation and classified rasters
  • Derive the confusion matrix

In addition, we will save the overall accuracy in a dataframe to be able to compare the performance of different classification scenarios.
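As a reminder of what confusionMatrix reports: the overall accuracy is simply the share of validation pixels lying on the diagonal of the confusion matrix. A minimal base-R sketch with made-up counts:

```r
# Hypothetical confusion matrix: rows = classified, columns = reference
cm <- matrix(c(50,  2,  1,
                3, 40,  4,
                0,  5, 45),
             nrow = 3, byrow = TRUE,
             dimnames = list(classified = c("Rice", "Cotton", "Tomato"),
                             reference  = c("Rice", "Cotton", "Tomato")))

# Overall accuracy = correctly classified pixels / all validation pixels
overall_accuracy <- sum(diag(cm)) / sum(cm)
overall_accuracy  # 135 / 150 = 0.9
```

caret's confusionMatrix additionally reports per-class statistics and the kappa coefficient, but the overall accuracy we store below is this same diagonal ratio.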

# Rasterize validation data
validation_r <- rasterize(validation,S2_images[[1]], field=validation$Crop_ID)

# Extract values at validation pixels 
test <- raster::extract(validation_r, validation, df=TRUE)
prediction <- raster::extract(LC, validation, df=TRUE)

# Create confusion matrix
CM <- caret::confusionMatrix(data=as.factor(prediction$layer), reference=as.factor(test$layer))

# Save result in dataframe
if (!exists("accatable_RF")) {
  accatable_RF <- as.data.frame(matrix(nrow=5, ncol=t_images, dimnames = list(c("S1", "S2", "S1_S2", "S2*", "S1_S2*"), as.character(seq(1, t_images)))))
}
accatable_RF["S2", n_images] <- CM$overall["Accuracy"]

