3. Map samples

Explore the spatial and seasonal distribution of samples and inspect associated environmental data.

Inspect contextual data

Core sample-level metadata is available in merged_df$events:

colnames(merged_df$events)

Samples are ordered consistently across tables, which makes it straightforward to link sequence data with contextual data: rownames(merged_df$events), rownames(merged_df$emof) and colnames(merged_df$counts) are identical. Likewise, ASVs are ordered consistently between merged_df$counts and merged_df$asvs.
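
The ordering can be checked explicitly before indexing across tables. The sketch below uses a small toy list in place of merged_df. The sample and ASV names are made up, check_alignment is a hypothetical helper, and the layout (samples as rows of events and emof, and as columns of counts) is an assumption.

```r
# Toy stand-in for merged_df; sample and ASV identifiers are hypothetical.
samples <- c("sampleA", "sampleB", "sampleC")
asv_ids <- c("asv1", "asv2")
toy <- list(
  events = data.frame(eventDate = c("2013-06-01", "2013-07-01", "2019-01-15"),
                      row.names = samples),
  emof   = data.frame(salinity = c("6.1", "5.8", "7.0"), row.names = samples),
  counts = matrix(0L, nrow = 2, ncol = 3, dimnames = list(asv_ids, samples)),
  asvs   = data.frame(taxon = c("x", "y"), row.names = asv_ids)
)

# Assumed layout: samples are rows of events/emof and columns of counts;
# ASVs are rows of both counts and asvs.
check_alignment <- function(x) {
  identical(rownames(x$events), rownames(x$emof)) &&
    identical(rownames(x$events), colnames(x$counts)) &&
    identical(rownames(x$counts), rownames(x$asvs))
}

aligned <- check_alignment(toy)
```

Calling check_alignment(merged_df) in the same way should return TRUE if the real tables follow this layout.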

To simplify downstream analyses, we identify samples belonging to each dataset:

DS_2013 <- grep("KTH-2013-Baltic", rownames(merged_df$events))
DS_2019_2020 <- grep("PRJEB55296", rownames(merged_df$events))

Check primers

We check which primers were used in the datasets:

unique(merged_df$events$pcr_primer_name_forward[DS_2013])
unique(merged_df$events$pcr_primer_name_forward[DS_2019_2020])

unique(merged_df$events$pcr_primer_name_reverse[DS_2013])
unique(merged_df$events$pcr_primer_name_reverse[DS_2019_2020])

If each pair of calls returns the same primer name, the same primers were used in the two datasets.
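
The comparison can also be made programmatically rather than by eye. This is a minimal sketch on a toy events table: the primer names and sample identifiers are placeholders, and same_primers is a hypothetical helper.

```r
# Toy events table; primer names and sample identifiers are placeholders.
events <- data.frame(
  pcr_primer_name_forward = c("fwdA", "fwdA", "fwdA", "fwdA"),
  pcr_primer_name_reverse = c("revA", "revA", "revA", "revB"),
  row.names = c("KTH-2013-Baltic:s1", "KTH-2013-Baltic:s2",
                "PRJEB55296:s1", "PRJEB55296:s2")
)
ds_a <- grep("KTH-2013-Baltic", rownames(events))
ds_b <- grep("PRJEB55296", rownames(events))

# TRUE when both datasets contain exactly the same set of values in `column`.
same_primers <- function(column) {
  setequal(events[[column]][ds_a], events[[column]][ds_b])
}

forward_ok <- same_primers("pcr_primer_name_forward")  # identical sets
reverse_ok <- same_primers("pcr_primer_name_reverse")  # toy data differ on purpose
```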


Extract spatial and temporal variables

We extract latitude, longitude, month, and day of year for each sample:

lat <- merged_df$events$decimalLatitude
lon <- merged_df$events$decimalLongitude
month <- month(merged_df$events$eventDate)
yday <- yday(merged_df$events$eventDate)
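
The month() and yday() calls above look like the lubridate functions of the same name, which would need to be loaded beforehand. As a sketch of the same extraction in base R, assuming eventDate holds ISO 8601 dates (the dates below are invented), distinct variable names also avoid masking those functions:

```r
# Hypothetical ISO 8601 dates standing in for merged_df$events$eventDate.
event_date <- as.Date(c("2013-01-01", "2013-06-15", "2020-12-31"))

# Month as an integer from 1 to 12.
month_num <- as.integer(format(event_date, "%m"))

# Day of year as an integer from 1 to 366 (%j is the zero-padded day of year).
day_of_year <- as.integer(format(event_date, "%j"))
```

In a leap year the last day has day-of-year 366, so any lookup indexed by day of year needs 366 entries.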

Inspect EMoF data

Additional contextual variables are available in merged_df$emof. Unlike fields in other metadata tables, EMoF variables are dataset-specific, meaning that the same variable may be stored under different column names. These variables therefore need to be identified and harmonised before analysis. We start by inspecting the column names:

colnames(merged_df$emof)

Salinity

We now extract salinity for all samples. Because the two datasets store salinity under different column names, values are taken from a different column for each dataset.

salinity <- rep(NA, nrow(merged_df$emof))
salinity[DS_2013] <- as.numeric(merged_df$emof$`salinity (psu)`[DS_2013])
salinity[DS_2019_2020] <- as.numeric(merged_df$emof$`salinity_average (psu)`[DS_2019_2020])
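
An alternative that does not depend on the dataset indices is to take, for each sample, the first non-missing value across the candidate columns. The sketch below assumes each sample has a value in at most one of the columns. coalesce_columns is a hypothetical helper and the toy values are invented.

```r
# Toy EMoF table; column names follow the text, values are invented.
emof <- data.frame(
  `salinity (psu)`         = c("6.1", "5.8", NA, NA),
  `salinity_average (psu)` = c(NA, NA, "7.0", "3.2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing numeric value across `columns`.
coalesce_columns <- function(df, columns) {
  out <- rep(NA_real_, nrow(df))
  for (col in columns) {
    # Non-numeric entries become NA; the warning is suppressed deliberately.
    values <- suppressWarnings(as.numeric(df[[col]]))
    fill <- is.na(out) & !is.na(values)
    out[fill] <- values[fill]
  }
  out
}

salinity_toy <- coalesce_columns(
  emof, c("salinity (psu)", "salinity_average (psu)")
)
```

Samples with no value in any candidate column stay NA, which makes gaps easy to spot with is.na().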

Prepare plotting symbols

We define dataset-specific plotting symbols:

pch <- rep(NA, nrow(merged_df$events))
pch[DS_2013] <- 21
pch[DS_2019_2020] <- 22

We also define a colour scale for seasonality:

color_yday <- colorRampPalette(
  c("#2c7fb8", "#addd8e", "#edf8b1", "#fa9fb5", "#2c7fb8")
)(366)
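
The palette starts and ends on the same blue, so the colour scale is cyclic: late December and early January get nearly the same colour. The 366 steps allow for day 366 in leap years. A quick check, reusing the same colours as above:

```r
# Same palette as above: five anchor colours interpolated to 366 steps.
color_yday <- colorRampPalette(
  c("#2c7fb8", "#addd8e", "#edf8b1", "#fa9fb5", "#2c7fb8")
)(366)

# One colour per possible day of year, including day 366 in leap years.
n_colors <- length(color_yday)

# The first and last colours match, so the scale wraps around the year.
wraps_around <- identical(color_yday[1], color_yday[366])
```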

Map samples

We define a function that plots monthly maps of the study area, showing where and when samples were collected, with point colour indicating sampling date and point size indicating salinity.

plot_map <- function(dataset) {
  par(mfrow = c(4, 4), mar = c(3, 3, 3, 3), xpd = TRUE)
  for (i in 1:12) {
    ix <- intersect(dataset, which(month == i))
    newmap <- getMap(resolution = "low")
    plot(
      newmap,
      xlim = c(11, 22),
      ylim = c(62, 63),
      asp = 1,
      main = paste("Month", i)
    )
    points(
      lon[ix],
      lat[ix],
      col = "black",
      bg = color_yday[yday][ix],
      pch = pch[ix],
      cex = 1.5 + as.numeric(salinity[ix]) / 20
    )
  }

  plot(
    1:365, rep(1, 365),
    col = color_yday,
    pch = "|",
    cex = 3,
    axes = FALSE,
    ylim = c(0.9, 1.3)
  )
  axis(1, at = c(1, 182, 365), labels = c("1", "182", "365"), cex = 3)
  text(182, 1.1, "Day of year", cex = 1.4)

  plot(
    c(0, 10, 20), rep(1, 3),
    col = "black",
    pch = 1,
    cex = c(1.6, 2.4, 3.2),
    xlim = c(-5, 25),
    ylim = c(0.9, 1.3),
    axes = FALSE
  )
  axis(1, at = c(0, 10, 20), labels = c("2", "18", "34"), cex = 3)
  text(2, 1.1, "Salinity", cex = 1.4, adj = 0)
}
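
The salinity legend follows from the size mapping used in points(), where cex = 1.5 + salinity / 20. As a small sketch (salinity_to_cex is a hypothetical helper, not part of the function above), the three legend symbols correspond to salinities of 2, 18 and 34:

```r
# Same mapping as in plot_map(): point size grows linearly with salinity.
salinity_to_cex <- function(salinity) 1.5 + salinity / 20

# Salinity values shown as axis labels in the legend panel.
legend_salinity <- c(2, 18, 34)

# Symbol sizes for those values; they match cex = c(1.6, 2.4, 3.2) in the legend.
legend_cex <- salinity_to_cex(legend_salinity)
```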

We apply the function to plot the DS_2013 dataset:

plot_map(DS_2013)

And the DS_2019_2020 dataset:

plot_map(DS_2019_2020)
