Chapter 5 Data Exploration and Transformation

There are many different ways of arranging data, especially within the R environment. Two of the most frequently used are the wide and long formats. Here we show what the wide and long formats are and ways to move between the two.

5.1 Wide

Wide format is where each row represents a data point and each column an attribute of that data point. In the example below each data point is a site, and each column holds the percentage cover of one species.
Table 5.1: Example of wide format
Site Species1 Species2 Species3 Species4 Species5
Site_1 67 42 96 6 88
Site_2 38 13 84 72 100
Site_3 0 81 20 78 36
Site_4 33 58 53 84 33
Site_5 86 50 73 36 88
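
For readers who want to follow along, the table above could be entered as a data frame roughly as follows. This is a minimal base R sketch; the values are copied from the table, and the name df_wide is chosen to match the conversion code later in the chapter.

```r
# Build the wide-format example table as a data frame
# (values copied from the table above)
df_wide <- data.frame(
  Site     = paste0("Site_", 1:5),
  Species1 = c(67, 38, 0, 33, 86),
  Species2 = c(42, 13, 81, 58, 50),
  Species3 = c(96, 84, 20, 53, 73),
  Species4 = c(6, 72, 78, 84, 36),
  Species5 = c(88, 100, 36, 33, 88)
)

# One row per site, one column per attribute
dim(df_wide)
```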

5.2 Long

Long format is where each row holds a single value of one attribute of a data point. Each row in the example below records the cover of one species at one site.

Table 5.3: Example of long format
Site Species cover
Site_1 Species1 67
Site_2 Species1 38
Site_3 Species1 0
Site_4 Species1 33
Site_5 Species1 86
Site_1 Species2 42

There are a number of ways of moving between the two:

5.3 Wide to long


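# Using reshape2
library(reshape2)
df_long <- melt(data = df_wide,
                id.vars = "Site",           # column(s) identifying each row
                variable.name = "Species",  # new column holding the old column names
                value.name = "cover")       # new column holding the cell values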

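# Using tidyr; gather() is superseded by pivot_longer() but still works
library(tidyr)
df_long <- df_wide |>
  gather(key = Species,
         value = cover,
         Species1:Species5,   # range of columns to gather
         factor_key = FALSE)  # keep Species as character rather than factor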
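
In current versions of tidyr, gather() is superseded by pivot_longer(). The sketch below shows what the same conversion might look like with that function. It assumes tidyr 1.0.0 or later and uses a small stand-in for df_wide (the first two sites and species of the wide table) so that it runs on its own.

```r
library(tidyr)

# Small stand-in for df_wide: first two sites and species of the wide table
df_wide <- data.frame(
  Site     = c("Site_1", "Site_2"),
  Species1 = c(67, 38),
  Species2 = c(42, 13)
)

df_long <- df_wide |>
  pivot_longer(cols = -Site,          # every column except Site
               names_to = "Species",  # old column names go into this column
               values_to = "cover")   # cell values go into this column
```

Note that pivot_longer() orders the result site by site, whereas gather() and melt() order it species by species; the content is the same.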

5.4 Long to wide

# Using reshape2: one row per Site, one column per Species
library(reshape2)
df_wide <- dcast(data = df_long,
                 formula = Site ~ Species,
                 value.var = "cover")

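# Using tidyr; spread() is superseded by pivot_wider() but still works
library(tidyr)
df_wide <- df_long |>
  spread(key = Species,
         value = cover,
         fill = 0)  # missing Site/Species combinations become 0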
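
The tidyr replacement for spread() is pivot_wider(). A minimal sketch of the reverse conversion is given below, again assuming tidyr 1.0.0 or later. The small df_long here is a stand-in built from the first rows of the long table, with the Site_2/Species2 row deliberately left out to show how missing combinations are filled.

```r
library(tidyr)

# Small stand-in for df_long; Site_2 has no Species2 record
df_long <- data.frame(
  Site    = c("Site_1", "Site_2", "Site_1"),
  Species = c("Species1", "Species1", "Species2"),
  cover   = c(67, 38, 42)
)

df_wide <- df_long |>
  pivot_wider(names_from = Species,  # values of Species become column names
              values_from = cover,   # values of cover fill the cells
              values_fill = 0)       # missing Site/Species pairs become 0
```

Filling with 0 is a modelling choice: it treats an unrecorded species as absent. If an absent record should instead be treated as unknown, values_fill can be dropped so that the cell stays NA.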

5.5 Spatial data

Spatial data have coordinates. In the simplest form they are a set of points. They may also be polygons (regular or irregular) depicting any sort of feature, and they may have additional information attached, such as a data frame. They may take the form of a grid and be stored in a raster format. Very frequently they represent the land surface and so have coordinate reference information attached (e.g. SWEREF 99 TM). There are several resources for handling such data:

  • In R, the most frequently encountered packages are:

    • sp - Classes and methods for spatial data
    • raster - Geographic data analysis and modeling
    • sf - Simple features for R
    • rgdal - Bindings for the ‘Geospatial’ Data Abstraction Library (retired from CRAN in 2023; sf and terra cover its functionality)
    • terra - Spatial data analysis
  • A number of other programs are also used for GIS data. These include:

    • QGIS - Open source GIS
    • ArcGIS - Esri ArcGIS

The packages and functions below can be used to export spatial data from R.

5.5.1 Spatial vector data

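library(sf)
# my_sf is a placeholder for an existing sf object
write_sf(my_sf, "my_data.gpkg")  # capable of writing multiple file formats; driver guessed from the extension

library(rgdal)  # retired from CRAN in 2023; relevant for legacy code
# my_sp is a placeholder for an existing Spatial* object
writeOGR(my_sp, dsn = "my_data.gpkg", layer = "my_data", driver = "GPKG")  # capable of writing multiple file formats

library(raster)
shapefile(my_sp, filename = "my_data.shp")  # writes ESRI shapefiles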
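
As a fuller illustration, the sketch below builds a small point data set with sf, writes it to a GeoPackage and reads it back. The site coordinates are made up for the example and do not come from a real survey; EPSG:3006 is the code for SWEREF 99 TM. The file is written to a temporary location so nothing is left behind.

```r
library(sf)

# Hypothetical site coordinates (made-up values for illustration only)
sites <- data.frame(
  Site = c("Site_1", "Site_2"),
  x    = c(500000, 500100),
  y    = c(6500000, 6500100)
)

# Turn the data frame into point features; EPSG:3006 is SWEREF 99 TM
sites_sf <- st_as_sf(sites, coords = c("x", "y"), crs = 3006)

# Write to a temporary GeoPackage; the driver is guessed from ".gpkg"
out_file <- tempfile(fileext = ".gpkg")
write_sf(sites_sf, out_file)

# Read the file back to check the export
sites_back <- read_sf(out_file)
```

Changing the file extension (for example to ".shp") would be one way to pick a different output format, since write_sf() guesses the driver from it.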

5.5.2 Raster

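library(raster)
# my_raster is a placeholder for an existing RasterLayer
raster::writeRaster(my_raster, filename = "my_raster.tif")  # capable of writing multiple file formats

library(terra)
# my_rast is a placeholder for an existing SpatRaster; the prefix avoids confusion with raster::writeRaster()
terra::writeRaster(my_rast, filename = "my_rast.tif")  # capable of writing multiple file formats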
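
A self-contained raster export with terra might look like the sketch below. The grid extent and cell values are invented for the example; the coordinate reference system is set to EPSG:3006 (SWEREF 99 TM), and the output goes to a temporary GeoTIFF.

```r
library(terra)

# Hypothetical 10 x 10 grid; the extent and values are made up
r <- rast(nrows = 10, ncols = 10,
          xmin = 500000, xmax = 501000,
          ymin = 6500000, ymax = 6501000,
          crs = "EPSG:3006")
values(r) <- 1:ncell(r)

# Write to a temporary GeoTIFF; the format is guessed from ".tif"
out_file <- tempfile(fileext = ".tif")
writeRaster(r, out_file, overwrite = TRUE)

# Read the file back to check the export
r_back <- rast(out_file)
```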