1 R & RStudio
1.1 History and overview
R is a dialect of S, a language developed in the 1970s at what was then Bell Telephone Laboratories to support statistical analysis, initially as an internal environment. S was originally implemented as a set of FORTRAN libraries, and during the 1980s it was completely rewritten in the C programming language. The main limitation of S is that it is only available through the commercial product S-PLUS. For this reason, a movement began in the 1990s that eventually led to the creation of R, whose first stable release (1.0.0) appeared in 2000. R has been released under the GNU General Public License (GPL) since 1995 and has become one of the most widely used software tools in the world for statistical analysis.
Figure 1.1: Spread of statistical software (source: https://r4stats.com).
1.2 Install and usage
I recommend using the RStudio IDE, which is available for free on the official posit.co website.
The paid version offers priority support, which is generally not essential for most common use cases.
The first step to setting up a working R environment is to download and install the latest version of R for your operating system from the CRAN website.
Next, proceed with the download and installation of the latest version of RStudio for your system.
Finally, in addition to the base R packages included with the installation, you will need to add the packages required to build your own working environment.
To that end, the following tutorial will suggest a few essential packages to reproduce the geostatistical analysis presented, directly within your R environment.
For more advanced users, it is also possible to install the R + RStudio stack as a Docker container.
Several projects exist on GitHub, such as RStudio R-Docker or the official RStudio version.
For greater simplicity, we recommend using the portal of the Rocker Project.
On your machine (certainly with Linux and macOS, and likely also from a Windows terminal), you can run a container with the following command:
docker run --rm -ti -e PASSWORD=yourpassword -p 8787:8787 rocker/rstudio
This command assumes that the Docker Engine has already been installed, and it will download the rocker/rstudio image if it is not already available on your machine.
To access RStudio, simply open a web browser and navigate to localhost:8787: the RStudio IDE will open directly in your browser.
For more advanced configurations (such as binding to persistent folders, setting the main user, or customizing the port), please refer to the relevant online resources. For example, you can change the base image by selecting rocker/geospatial from this web page.
For more information and customization options, see the related quickstart guide.
1.3 Resources on R
The web is full of resources to deepen your understanding of R installation, usage, management, and advanced programming.
It’s easy to come across video tutorials with clear explanations that help you learn both the technical aspects of R and domain-specific applications such as statistics, geospatial data, geostatistics, machine learning, and more.
The primary reference is the official CRAN repository, which includes at least two essential introductory manuals:
In particular, with reference to the focus of this tutorial, the following readings and resources are highly recommended:
- R Programming for Data Science. This is an excellent resource for getting started with R. It covers all the fundamental topics such as the main data types, data import/export operations, and data frame manipulation and subsetting. In its second part, the book clearly and thoroughly explains how to write functions, construct loops, and perform parallel computing in R.
- Geocomputation with R. This book introduces modern techniques for geospatial data analysis and manipulation, including visualization and processing workflows.
1.4 Libraries
Below is a summary of the environment and R version used for this tutorial:
version
#> _
#> platform x86_64-pc-linux-gnu
#> arch x86_64
#> os linux-gnu
#> system x86_64, linux-gnu
#> status
#> major 4
#> minor 2.2
#> year 2022
#> month 10
#> day 31
#> svn rev 83211
#> language R
#> version.string R version 4.2.2 (2022-10-31)
#> nickname Innocent and Trusting
The setup uses a Docker-based RStudio environment running on a Linux system with R version 4.2.2 (2022-10-31).
The following is a list of the main packages used to develop these lecture notes.
Library | Version | Description | Key Features |
---|---|---|---|
readxl | 1.4.5 | Read Excel files without external dependencies | Simplified import of Excel files into R |
readr | 2.1.5 | Read text files (CSV, TSV, etc.) efficiently | Fast reading of structured text data (CSV, TSV, etc.) |
ggplot2 | 3.5.2 | Create advanced and visually appealing plots | Grammar of graphics for reproducible, layered plotting |
dplyr | 1.1.4 | Intuitive manipulation and transformation of tabular data | Filter, group, mutate and join data frames easily |
lubridate | 1.9.4 | Manage and parse date-time formats | Work with timestamps and time zones in a clean way |
tmap | 4.1 | Design thematic and cartographic maps | Facilitates creation of both static and interactive maps |
sf | 1.0-21 | Handle vector spatial data with modern classes | Supports Simple Features standard, spatial joins, and transformations |
terra | 1.8-54 | Manipulate raster data in a modern and efficient way | Raster algebra, resampling, reclassification, and more |
stars | 0.6-8 | Support for multi-dimensional spatiotemporal data (e.g., NetCDF, HDF5) | Work with cubes, temporal series, and environmental rasters |
gstat | 2.1-3 | Perform classical geostatistical analysis (e.g., variograms, kriging) | Model spatial autocorrelation, fit variograms, perform kriging |
1.4.1 Install library
Packages are installed from the R console, either from the official CRAN repository (as in the case of gstat) or from a third-party repository (such as GitHub or others, as in the case of gstlearn):
install.packages("gstat")
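When several packages are needed, it can be convenient to check which ones are still missing before installing anything. The following is a minimal sketch: the helper names find_missing() and install_if_missing() are hypothetical, the package list is taken from the table above, and the CRAN mirror URL is an assumption (any mirror would do).

```r
# Return the packages in `pkgs` that are not installed yet
find_missing <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only the missing packages (this step needs an internet connection).
# The default mirror is an assumption; replace it with your preferred one.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- find_missing(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}

# Packages listed in the table above
required <- c("readxl", "readr", "ggplot2", "dplyr", "lubridate",
              "tmap", "sf", "terra", "stars", "gstat")

# Packages that install_if_missing(required) would install
find_missing(required)
```

Calling install_if_missing(required) would then install only the packages reported by find_missing(required).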
1.4.2 Load library
At startup, R loads only a few base libraries (such as base and graphics).
Therefore, it is necessary to load any additional libraries required to carry out an analytical workflow specific to a given domain.
The libraries installed in the previous step are loaded into the R environment with library().
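A minimal sketch of this loading step is shown here. The selection of packages is an assumption (a subset of the table above), and require() is used instead of library() only so that the snippet reports missing packages rather than stopping with an error.

```r
# Packages assumed to have been installed in the previous step
pkgs <- c("readr", "dplyr", "ggplot2", "sf", "terra", "gstat")

# Attach each package; require() returns FALSE when a package is not installed
loaded <- vapply(pkgs, function(p) {
  suppressPackageStartupMessages(
    require(p, character.only = TRUE, quietly = TRUE)
  )
}, logical(1))

# Named logical vector: TRUE for every package that was attached
loaded
```

In an interactive session where all packages are known to be installed, the usual form is simply one library() call per package, for example library(gstat).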
1.4.3 Help
help(package = "gstat")
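Besides the package index, help is also available for individual functions. The short sketch below uses only functions from the packages attached by default; read.csv is chosen purely as an illustration.

```r
# List the functions on the search path whose name starts with "read."
read_funs <- apropos("^read\\.")
head(read_funs)

# Open the help page of a single function (equivalent to ?read.csv)
help("read.csv")
```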
1.5 File system
R provides a set of functions to interact with the file system, which is essential for managing working directories and accessing data files.
- getwd() – returns the current working directory
- setwd("path/to/your/folder") – sets the working directory
- list.files() – lists files in the current working directory
- dir.create("myfolder") – creates a new folder
- file.exists("myfile.csv") – checks whether a file exists
- unlink("myfile.csv") – deletes a file (or a directory, when called with recursive = TRUE)
You can also use file.path() to create paths that are compatible across operating systems.
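The functions above can be combined as in the following sketch. It works inside the session's temporary directory so that no project files are touched; the folder and file names are illustrative.

```r
# Build portable paths inside the session's temporary directory
base_dir <- tempdir()
data_dir <- file.path(base_dir, "myfolder")
csv_path <- file.path(data_dir, "myfile.csv")

# Create the folder and a small file inside it
dir.create(data_dir, showWarnings = FALSE)
writeLines("id;value", csv_path)

# Check that the file exists and list the folder content
created <- file.exists(csv_path)
created
list.files(data_dir)

# Clean up: delete the file first, then the folder
unlink(csv_path)
unlink(data_dir, recursive = TRUE)
dir.exists(data_dir)   # FALSE
```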
1.6 Basic operations
R can be used as a calculator and supports basic arithmetic, logical, and relational operations.
# Arithmetic
2 + 3 # addition
5 * 10 # multiplication
10 / 3 # division
2^3 # exponentiation
# Logical
TRUE & FALSE
TRUE | FALSE
!TRUE
# Relational
5 > 3
5 == 5
"cat" != "dog"
# Vectors and data frames are core structures for handling data in R:
x <- c(1, 2, 3, 4)
mean(x)
sum(x)
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))
df$name
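Building on the objects defined above, the following sketch shows vectorised arithmetic and subsetting, two operations that are used constantly when handling data. The column name id_squared is introduced here only for illustration.

```r
x <- c(1, 2, 3, 4)
doubled <- x * 2        # vectorised arithmetic: 2 4 6 8
large <- x[x > 2]       # logical indexing: 3 4

df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))

# Keep only the rows that satisfy a condition
df_sub <- df[df$id >= 2, ]

# Add a new column computed from an existing one: 1 4 9
df$id_squared <- df$id^2

# Compact overview of the resulting data frame
str(df)
```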
1.7 Import / Read data
1.7.1 Tabular data
R provides several functions to import tabular data, such as text or CSV files:
- read.csv2("file.csv") or read_csv2("file.csv") (from the readr package) – for CSV files with semicolon separators and decimal commas
- readxl::read_excel("file.xlsx") – for Excel files (you can also specify a sheet with sheet = 2)
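As a small illustration, the sketch below writes a semicolon-separated file to a temporary location and reads it back with read.csv2(). The station names and rainfall values are made-up example data.

```r
# Create a small semicolon-separated file with decimal commas (example data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("station;rain_mm",
             "A;12,5",
             "B;7,25"), csv_path)

# read.csv2() assumes sep = ";" and dec = ","
rain <- read.csv2(csv_path)
str(rain)

# The decimal commas were parsed as numbers
mean(rain$rain_mm)   # 9.875
```

With readr installed, readr::read_csv2(csv_path) reads the same file and returns a tibble instead of a base data frame.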
1.7.2 Geospatial data
To read geospatial data formats, R offers modern functions and drivers:
- sf::st_read("file.shp") – reads vector data (e.g., shapefiles, GeoPackage, etc.)
- terra::rast("file.tif") – reads raster data (GeoTIFF and similar formats)
- stars::read_stars("file.tif") – also reads raster or multidimensional data formats
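To try these functions without external data, the sketch below reads the example files that ship with each package. It assumes the packages are installed; each step is skipped otherwise.

```r
# Vector data: the "nc" shapefile bundled with sf
if (requireNamespace("sf", quietly = TRUE)) {
  nc_path <- system.file("shape/nc.shp", package = "sf")
  nc <- sf::st_read(nc_path, quiet = TRUE)
  print(dim(nc))                 # number of features and columns
  print(sf::st_crs(nc)$input)    # coordinate reference system
}

# Raster data: the "elev" GeoTIFF bundled with terra
if (requireNamespace("terra", quietly = TRUE)) {
  elev_path <- system.file("ex/elev.tif", package = "terra")
  elev <- terra::rast(elev_path)
  print(elev)                    # extent, resolution and CRS
}

# Multi-band raster: the Landsat sample bundled with stars
if (requireNamespace("stars", quietly = TRUE)) {
  l7_path <- system.file("tif/L7_ETMs.tif", package = "stars")
  l7 <- stars::read_stars(l7_path)
  print(dim(l7))                 # x, y and band dimensions
}
```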
1.8 Export / Write data
1.8.1 Tabular data
Exporting basic data structures like data frames:
- write.csv2(df, "file.csv") – writes a data frame to a CSV file with semicolon separators
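A minimal sketch of an export, written to a temporary file so that nothing is overwritten. The argument row.names = FALSE is an optional choice made here to avoid an extra column of row numbers in the output.

```r
df <- data.frame(id = 1:3, name = c("Alice", "Bob", "Charlie"))

# Write the data frame to a temporary CSV file with semicolon separators
out_path <- tempfile(fileext = ".csv")
write.csv2(df, out_path, row.names = FALSE)

# Inspect the raw text that was written
readLines(out_path)

# Read the file back to check the round trip
df_back <- read.csv2(out_path)
str(df_back)
```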
1.8.2 Geospatial data
For spatial data, use the appropriate functions based on the data model:
- sf::st_write(obj, "output.gpkg") – writes vector data (GeoPackage, shapefile, etc.)
- terra::writeRaster(r, "output.tif") – writes raster data
- stars::write_stars(obj, "output.tif") – writes multidimensional raster data
Note that the file extension plays a crucial role in defining both the format and the appropriate driver used to write the data.
For example:
- Vector data:
  - .shp → ESRI Shapefile driver
  - .gpkg → GeoPackage driver
- Raster data:
  - .tif → GeoTIFF driver
  - .asc → ASCII Grid (ESRI format)
  - .nc → NetCDF (for multidimensional data)
Choosing the correct extension ensures compatibility with GIS software and downstream processing tools.
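The role of the extension can be seen in the following sketch, which builds three points, writes them to a temporary .gpkg file, and reads them back. The coordinates are made-up values in longitude/latitude (EPSG:4326), and the code is skipped if sf is not installed.

```r
if (requireNamespace("sf", quietly = TRUE)) {
  # Example points (made-up coordinates, longitude/latitude)
  pts <- data.frame(id = 1:3,
                    x = c(11.0, 11.5, 12.0),
                    y = c(45.0, 45.5, 46.0))
  pts_sf <- sf::st_as_sf(pts, coords = c("x", "y"), crs = 4326)

  # The .gpkg extension lets st_write() select the GeoPackage driver
  gpkg_path <- tempfile(fileext = ".gpkg")
  sf::st_write(pts_sf, gpkg_path, quiet = TRUE)

  # Read the file back to verify the result
  pts_back <- sf::st_read(gpkg_path, quiet = TRUE)
  print(nrow(pts_back))   # 3
}
```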
1.9 R Markdown
R Markdown is a versatile format that combines code, text, and visualizations in a single document.
- Files use the .Rmd extension and are composed of text blocks (written in Markdown) and code chunks (usually in R).
- You can render to multiple formats: HTML, PDF, Word, and slides.
Basic structure of an .Rmd document:
---
title: "My Report"
author: "Your Name"
output: html_document
---
## Introduction
This is a paragraph.
```{r}
# This is an R chunk
summary(cars)
```
Useful shortcuts in RStudio:
- Ctrl + Alt + I – insert a new code chunk
- Knit button – render the document to the selected output format
R Markdown is especially useful for generating reproducible reports, lecture notes, data summaries, and interactive documents (with shiny, plotly, or leaflet).
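A document can also be rendered from the R console instead of the Knit button. The sketch below writes a minimal .Rmd file to a temporary location and renders it with rmarkdown::render(). It assumes that the rmarkdown package and pandoc are available (pandoc is bundled with RStudio); otherwise the rendering step is skipped.

```r
# Three backticks, built programmatically to delimit the code chunk
fence <- strrep("`", 3)

# Write a minimal .Rmd file to a temporary location
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"My Report\"",
  "output: html_document",
  "---",
  "",
  "## Introduction",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(html_path)   # path of the generated HTML file
}
```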