3. Map samples
Explore the spatial and seasonal distribution of samples and inspect associated environmental data.
Inspect contextual data
Core sample-level metadata is available in merged_df$events:
colnames(merged_df$events)
Samples are ordered consistently across tables, which makes it straightforward to link sequence data with contextual data: rownames(merged_df$events), rownames(merged_df$emof) and colnames(merged_df$counts) are identical. Likewise, ASVs are ordered consistently between merged_df$counts and merged_df$asvs.
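The ordering guarantee can be checked explicitly before relying on positional indexing. The sketch below uses a small toy list in place of merged_df; the sample IDs, ASV IDs and values are made up, and it assumes that samples are the columns of the counts table and ASVs are its rows.

```r
# Toy stand-in for merged_df (hypothetical IDs and values, not the real data)
samples <- c("sampleA", "sampleB", "sampleC")
asv_ids <- c("asv1", "asv2")
toy <- list(
  events = data.frame(decimalLatitude = c(58.1, 59.3, 60.2),
                      row.names = samples),
  emof   = data.frame(`salinity (psu)` = c(6.1, 5.8, 5.2),
                      row.names = samples, check.names = FALSE),
  # Assumption: ASVs in rows, samples in columns
  counts = matrix(c(10, 0, 3, 7, 5, 1), nrow = 2,
                  dimnames = list(asv_ids, samples)),
  asvs   = data.frame(kingdom = c("Bacteria", "Bacteria"),
                      row.names = asv_ids)
)

# Samples: same IDs in the same order across events, emof and counts
stopifnot(
  identical(rownames(toy$events), rownames(toy$emof)),
  identical(rownames(toy$events), colnames(toy$counts))
)

# ASVs: same IDs in the same order across counts and asvs
stopifnot(identical(rownames(toy$counts), rownames(toy$asvs)))
```

Replacing toy with merged_df runs the same checks on the real tables; stopifnot() stops with an error if any table is out of order.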
To simplify downstream analyses, we identify samples belonging to each dataset:
DS_2013 <- grep("KTH-2013-Baltic", rownames(merged_df$events))
DS_2019_2020 <- grep("PRJEB55296", rownames(merged_df$events))
Check primers
We check which primers were used in the datasets:
unique(merged_df$events$pcr_primer_name_forward[DS_2013])
unique(merged_df$events$pcr_primer_name_forward[DS_2019_2020])
unique(merged_df$events$pcr_primer_name_reverse[DS_2013])
unique(merged_df$events$pcr_primer_name_reverse[DS_2019_2020])
This verifies that the same primers were used in the two datasets.
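Instead of comparing the printed values by eye, the comparison can be automated. This is a minimal sketch on a toy events table: the sample IDs and the primer names "primer_F" and "primer_R" are placeholders, and same_primers is a hypothetical helper, not part of the tutorial's code.

```r
# Toy events table (placeholder sample IDs and primer names, not the real data)
events <- data.frame(
  pcr_primer_name_forward = rep("primer_F", 4),
  pcr_primer_name_reverse = rep("primer_R", 4),
  row.names = c("KTH-2013-Baltic:s1", "KTH-2013-Baltic:s2",
                "PRJEB55296:s1", "PRJEB55296:s2")
)
ds_a <- grep("KTH-2013-Baltic", rownames(events))
ds_b <- grep("PRJEB55296", rownames(events))

# Hypothetical helper: TRUE if both sample sets contain exactly the same
# set of values in the given column
same_primers <- function(events, column, ix1, ix2) {
  setequal(unique(events[[column]][ix1]), unique(events[[column]][ix2]))
}

fwd_ok <- same_primers(events, "pcr_primer_name_forward", ds_a, ds_b)
rev_ok <- same_primers(events, "pcr_primer_name_reverse", ds_a, ds_b)

# Stop early if the datasets were amplified with different primers
stopifnot(fwd_ok, rev_ok)
```

setequal() ignores order and duplicates, so the check still passes if a dataset lists the same primer names in a different order.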
Extract spatial and temporal variables
We extract latitude, longitude, month, and day of year for each sample:
lat <- merged_df$events$decimalLatitude
lon <- merged_df$events$decimalLongitude
month <- month(merged_df$events$eventDate)
yday <- yday(merged_df$events$eventDate)
Inspect EMoF data
Additional contextual variables are available in merged_df$emof. Unlike fields in other metadata tables, EMoF variables are dataset-specific, meaning that the same variable may be stored under different column names. These variables therefore need to be identified and harmonised before analysis. We start by inspecting the column names:
colnames(merged_df$emof)
Salinity
We now extract salinity for all samples. Because salinity is stored under a different column name in each dataset, the values are taken from different columns.
salinity <- rep(NA, nrow(merged_df$emof))
salinity[DS_2013] <- as.numeric(merged_df$emof$`salinity (psu)`[DS_2013])
salinity[DS_2019_2020] <- as.numeric(merged_df$emof$`salinity_average (psu)`[DS_2019_2020])
Prepare plotting symbols
We define dataset-specific plotting symbols:
pch <- rep(NA, nrow(merged_df$events))
pch[DS_2013] <- 21
pch[DS_2019_2020] <- 22
We also define a colour scale for seasonality:
color_yday <- colorRampPalette(
c("#2c7fb8", "#addd8e", "#edf8b1", "#fa9fb5", "#2c7fb8")
)(366)
Map samples
We define a function that plots monthly maps of the study area, showing where and when samples were collected, with point colour indicating sampling date and point size indicating salinity.
plot_map <- function(dataset) {
  # 4 x 4 grid: twelve monthly maps plus two legend panels
  par(mfrow = c(4, 4), mar = c(3, 3, 3, 3), xpd = TRUE)
  newmap <- getMap(resolution = "low")
  for (i in 1:12) {
    # Samples of this dataset collected in month i
    ix <- intersect(dataset, which(month == i))
    plot(
      newmap,
      xlim = c(11, 22),
      ylim = c(62, 63),
      asp = 1,
      main = paste("Month", i)
    )
    points(
      lon[ix],
      lat[ix],
      col = "black",
      bg = color_yday[yday][ix],
      pch = pch[ix],
      cex = 1.5 + as.numeric(salinity[ix]) / 20
    )
  }
  # Legend: colour scale for day of year
  plot(
    1:365, rep(1, 365),
    col = color_yday,
    pch = "|",
    cex = 3,
    axes = FALSE,
    ylim = c(0.9, 1.3)
  )
  axis(1, at = c(1, 182, 365), labels = c("1", "182", "365"), cex = 3)
  text(182, 1.1, "Day of year", cex = 1.4)
  # Legend: symbol sizes for salinity 2, 18 and 34 (cex = 1.5 + salinity / 20)
  plot(
    c(0, 10, 20), rep(1, 3),
    col = "black",
    pch = 1,
    cex = c(1.6, 2.4, 3.2),
    xlim = c(-5, 25),
    ylim = c(0.9, 1.3),
    axes = FALSE
  )
  axis(1, at = c(0, 10, 20), labels = c("2", "18", "34"), cex = 3)
  text(2, 1.1, "Salinity", cex = 1.4, adj = 0)
}
We apply the function to plot the DS_2013 dataset:
plot_map(DS_2013)
And the DS_2019_2020 dataset:
plot_map(DS_2019_2020)
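As a quick sanity check on the visual encoding used in plot_map, the sketch below rebuilds the colour scale and the point-size rule on their own. The helper salinity_to_cex is a hypothetical name introduced here for illustration; it only restates the expression 1.5 + salinity / 20 from the function.

```r
# Same palette definition as in the tutorial: the first and last anchor
# colours are identical, so the scale wraps around the year
color_yday <- colorRampPalette(
  c("#2c7fb8", "#addd8e", "#edf8b1", "#fa9fb5", "#2c7fb8")
)(366)

# Hypothetical helper restating the size rule used in plot_map()
salinity_to_cex <- function(salinity) 1.5 + as.numeric(salinity) / 20

# Symbol sizes for the salinity values shown in the legend (2, 18, 34)
legend_cex <- salinity_to_cex(c(2, 18, 34))
print(legend_cex)

# Colours at the two ends of the year
print(color_yday[c(1, 366)])
```

Because the palette starts and ends on the same colour, samples from late December and early January are drawn in nearly the same shade, which suits a cyclic variable such as day of year.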