Diet composition sampling will need to include more than the species in the set of 11… for now lets get the diet comp for just those species. In the next step we will figure out how to deal with biomass pool groups in surveys and for diets.
We will want a new atlantisom
wrapper function to deal with diets separately. The current om_comps()
already takes a long time to run generating length compositions and adding the huge diet file here will be really cumbersome for users only looking for ages and lengths.
The om_diet()
function will either call load_detailed_diet_comp()
on a pre-processed zipped DetailedDietCheck.txt.gz
file as explained here or will read in an .rds file saved from its output.
It will then apply create_survey()
with the diet file as dat and the survey specifications from the config file:
om_diet<- function(config = configfile,
dietfile = file_diet,
usersurvey = usersurvey_file,
omlist_ss,
n_reps = n_reps,
save = TRUE){
source(config)
#Load functional groups
fgs <- atlantisom::load_fgs(dir=d.name,
file_fgs = functional.groups.file)
# load or read in saved detailed diet
if(!file.exists(file.path(d.name,
paste0(scenario.name, "detaileddiet.rds")))){
detaileddiet <- load_detailed_diet_comp(dir = d.name,
file_diet,
fgs = fgs)
if(save){
saveRDS(detaileddiet, file.path(d.name, paste0(scenario.name, "detaileddiet.rds")))
}
} else {
detaileddiet <- readRDS(file.path(d.name,
paste0(scenario.name, "detaileddiet.rds")))
}
#one script for dimension parameters to be used in multiple functions
source("config/omdimensions.R", local = TRUE)
survObsDiets <- list()
for (s in usersurvey)
{
source(s, local = TRUE)
# survtime doesn't match units of time.days in detaileddiet
survtime <- survey_sample_full*omlist_ss$runpar$outputstep
# apply survey design to detailed diet
survey_cons <- create_survey_diet(dat = detaileddiet,
time = survtime,
species = survspp,
boxes = survboxes,
effic = surveffic,
selex = survselex)
# add observation error to survey diet
survObsDiet <- list()
for(i in 1:n_reps){
survObsDiet[[i]] <- atlantisom::sample_diet(survey_cons, fgs, unidprey = unidprey, alphamult = alphamult)
}
#save survey diets, takes a long time to generate with lots of reps/species
if(save){
saveRDS(survObsDiet, file.path(d.name, paste0(scenario.name, "_",
survey.name, "surveydiet.rds")))
}
survObsDiets[[survey.name]] <- survObsDiet
}
return(survObsDiets)
}
Reading in the full diet file works. Saved as [...]detaileddiet.rds
, a huge file.
fgs <- load_fgs(here("atlantisoutput", "NOBA_March_2020"), "nordic_groups_v04.csv")
source(here("config", "NOBA2config.R"))
file_diet <- "NOBADetDiet.gz"
save = TRUE
if(!file.exists(file.path(d.name,
paste0(scenario.name, "detaileddiet.rds")))){
detaileddiet <- load_detailed_diet_comp(dir = d.name,
file_diet,
fgs = fgs)
if(save){
saveRDS(detaileddiet, file.path(d.name, paste0(scenario.name, "detaileddiet.rds")))
}
} else {
detaileddiet <- readRDS(file.path(d.name,
paste0(scenario.name, "detaileddiet.rds")))
}
Fixing mismatches between diet data objects and survey function inputs: align time units and column name, add prey column to aggregateData function call. It was hard-coded without the prey column in create_survey()
and also didn’t output prey so a new function, create_survey_diet()
fixes this.
If we modify the existing create_survey()
, the total consumption in tons aggregated over layers (output of aggregateData) will be multiplied by our survey efficiency (q) and selectivity at agecl. Is this appropriate?
In a global diet composition, applying selectivity may make sense because it excludes or downweights consumption by less selected age classes. However, if we are looking at agecl specific diet, the composition is the same, just scaled down. Similar thoughts for survey q, all it does is scale the total consumption in tons across all ages so the comp is the same but tonnage smaller. Would this be double counting the effects of q if put back into a model and scaled up somehow? Need to see how consumption in tons is used in our models. For now create_survey_diet()
applies q and selectivity exactly like create_survey()
.
omlist_ss <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
source(here("config/omdimensions.R"))
# first a census in spring
source(here("config/mssurvey_spring.R"))
# survtime doesn't match units of time.days in detaileddiet, add to om_diet
survtime <- survey_sample_full*omlist_ss$runpar$outputstep
# aggregateData called by create_survey expects a column "time", add to function
# if("time.days" %in% names(dat) & !("time" %in% names(dat))) {
# names(dat)[names(dat) == 'time.days'] <- 'time'
# }
# aggregateData doesn't keep the prey column
# create survey also won't output prey column and hardcodes keep columns for aggegateData
# use functions separately
# aggregate over layer, adding prey column (names resulting column "numAtAge", should generalize)
aggDat <- aggregateData(dat = detaileddiet,
time = survtime,
species = survspp,
boxes = survboxes,
keepColumns=c("species","agecl","polygon","time", "prey"))
# should we be multiplying true consumption by q and selectivity? create_survey does
# selectivity misses consumption from younger age classes
effic <- surveffic
selex <- survselex # should be survselex.agecl but identical in census survey config
surv <- aggDat #this can be removed and uncomment density stuff above. Make sure to think how this plays into sampling
#merge in efficiency
surv <- merge(surv,effic,by="species",all.x=T)
#merge in selex
surv <- merge(surv,selex,by=c("species","agecl"),all.x=T)
#should I change any missing selex to zero???
#should I scale or check selex maximum at 1?
#surv$numAtAgeSurv <- surv$density * surv$survArea * surv$efficiency * surv$selex
surv$numAtAgeSurv <- surv$numAtAge * surv$efficiency * surv$selex
#Should I be checking for NA's along the way to identify problems?
#Create final dataframe in same format as input
#put time (mean) and layers (NA) back in the dataframe for completeness
out <- data.frame(species = surv$species,
agecl = surv$agecl,
polygon = surv$polygon,
layer = NA,
time = surv$time,
prey = surv$prey,
atoutput = surv$numAtAgeSurv)
survey_cons_test <- out[order(out$species,out$time,out$polygon,out$agecl),]
# survey_diet <- create_survey(dat = detaileddiet,
# time = survtime,
# species = survspp,
# boxes = survboxes,
# effic = surveffic,
# selex = survselex)
create_survey_diet()
omlist_ss <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
source(here("config/omdimensions.R"))
# first a census in spring
source(here("config/mssurvey_spring.R"))
# survtime doesn't match units of time.days in detaileddiet, add to om_diet
survtime <- survey_sample_full*omlist_ss$runpar$outputstep
survey_cons <- create_survey_diet(dat = detaileddiet,
time = survtime,
species = survspp,
boxes = survboxes,
effic = surveffic,
selex = survselex)
sample_diet()
We want sample_diet()
to give us something like we would get out of the survey sampling process. The diet sampling function could return diet proportions by predator, including unknown/family level id, with observation error. This would then be input into the models. True diets will be the actual consumption in tons output from load_detailed_diet_comp()
.
The intial partial sample_diet()
function in atlantisom
needs to change because it expects a wide format file of diet proportions, rather than a long-format file of consumption in tons which is the output of create_survey_diet()
.
It would be nice if sample_diet()
could take either the output of load_diet_comp()
(global proportions not surveyed, long format) or create_survey_diet()
.
The initial function removed predators not expected on a fish trawl survey, removed prey types that are normally not counted, then adds uniform error to half the remaining diet proportions, removes a random subet of prey accounting for <0.25 diet proportion, then renormalizes. I would rather have a function that could be used on any “survey” so have already removed hardcoded predator selection.
I think also the distributional assumptions need revision; a Dirichlet is most appropriate to simulate continuous proportions. See Douma and Weedon 2019 for an excellent overview, as well as a description of the Dirichlet alpha parameter and Lin 2016.
Here I’ll use the random number generation function rdirichlet()
in the package DirichletReg
but others may work too.
# global proportion diet comp for comparison with survey_cons
diettest <- load_diet_comp(here("atlantisoutput", "NOBA_march_2020"),
"nordic_runresults_01DietCheck.txt", fgs = fgs) %>%
filter(atoutput>0) %>%
filter(species %in% survspp)
dat <- survey_cons
#dat <- diettest
unidprey <- 0.5 #0 makes none unid, 1 could be all, reflects max proportion to unid
alphamult <- 10000000 #default is almost identical to true, smaller more variable
sample_diet <- function(dat, fgs, unidprey = 0, alphamult = 10000000) {
# dat can be long format output of load_diet_comp, or create_survey_diet, both have layers aggregated
# load_diet_comp output has no polygon column and atoutput is proportion,
# create_survey_diet output has polygon column and atoutput is consumption in tons
# first aggregate to get global diet comp by species, agecl, time in proportion, if not already in that form
if("polygon" %in% names(dat)){
#sum over boxes and assume sampling occurs coastwide (the sampled boxes were already subset in create functions)
dat2 <- aggregate(dat$atoutput,list(dat$species,dat$agecl,dat$time, dat$prey),sum)
names(dat2) <- c("species","agecl","time","prey", "sumcons")
dat2 <- dat2 %>%
group_by(species, agecl, time) %>%
mutate(totcons = sum(sumcons)) %>%
mutate(dietprop = sumcons/totcons) %>%
arrange(species, agecl, time)
# rename to match output of load_diet_comp
names(dat2)[names(dat2) == 'dietprop'] <- 'atoutput'
names(dat2)[names(dat2) == 'time'] <- 'time.days'
} else dat2 <- dat
# aggregate over age? which ones? leave as age class specific?
# then apply id/aggregation bias--this adds prey categories that don't exist??
# user specified max proportion to reallocate
# apply taxonomically, allocating fish prey to fish unid, invert prey to invert unid, then some portion unid
# phytoplankton and zooplankton prey direct to unid?
# use grouptype column to allocate
colnames(fgs) <- tolower(colnames(fgs))
# check for GroupType or InvertType
if (!("grouptype" %in% colnames(fgs) | "inverttype"%in% colnames(fgs))) {
stop(paste("The columns GroupType or InvertType ars not in your functional\n",
"groups file."))
}
# change inverttype to grouptype, contents should be the same
if("inverttype" %in% names(fgs)) names(fgs)[names(fgs) == 'inverttype'] <- 'grouptype'
fgs$grouptype <- tolower(fgs$grouptype)
# associate prey categories (atlantis user guide I Table 8) with unid categories
fgs<- fgs%>%
mutate(unidtype = case_when((grouptype %in% c("mammal",
"bird",
"reptile")) ~ "Unid_Vert",
(grouptype %in% c("fish",
"shark")) ~ "Unid_Fish",
(grouptype %in% c("fish_invert",
"cep",
"pwn",
"lg_inf",
"sm_inf",
"sed_ep_ff",
"sed_ep_other",
"mobile_ep_other",
"coral",
"sponge")) ~ "Unid_Invert",
(grouptype %in% c("lg_zoo",
"med_zoo",
"sm_zoo",
"dinoflag",
"lg_phy",
"sm_phy")) ~ "Unid_Plankton",
TRUE ~ "Unid"))
# associate prey with prey category
# randomly assign up to unidprey % of prey in each prey category to associated unid category
# take remainder for identified prey
fgsprey <- fgs[,c("name", "unidtype")]
names(fgsprey)[names(fgsprey) == 'name'] <- 'prey'
dat2 <- left_join(dat2, fgsprey) %>%
rowwise() %>%
mutate(unidprop = runif(1, 0, unidprey*atoutput),
sampprop = atoutput - unidprop)
# add rows with unid categories and sum prop assigned to the category
unidrows <- dat2 %>%
#select(-atoutput, -prey, -sampprop) %>%
group_by(species, time.days, agecl, unidtype) %>%
summarise(unidtot = sum(unidprop)) %>%
ungroup() %>%
rename(prey = unidtype, sampprop = unidtot)
dat2 <- dat2 %>%
select(species, time.days, agecl, prey, sampprop) %>%
full_join(unidrows)
# then apply Dirichlet obs error; Dirichlet equivalent of rmultinom(1, effN, trueagecomp)?
# see e.g., https://rdrr.io/cran/DirichletReg/man/Dirichlet.html
# alpha vector is the diet comp scaled by a user-supplied multiplier
# default to a very high multiplier for close-to-true diet comp
# lower multipliers stray further from true diet comp
# testdiet <- c(.1, .32, .06, .02, .49, .01)
# library(DirichletReg)
# rdirichlet(10, 10000000*testdiet)
# rdirichlet(10, 100*testdiet)
#tidy
dat2 <- dat2 %>%
filter(sampprop>0) %>%
group_by(species, time.days, agecl) %>%
do(mutate(., dietSamp = DirichletReg::rdirichlet(1, alphamult * sampprop))) %>%
ungroup() %>%
select(-sampprop)
# returns diet proportions with observation error: Dirichlet, and bias from unknown id
return(dat2)
}
# old function below
# # first remove species not sampled and not quantified in gut analyses
# colnames(fgs) <- tolower(colnames(fgs))
#
# # check for GroupType or InvertType
# if (!("grouptype" | "inverttype") %in% colnames(fgs)) {
# stop(paste("The columns GroupType or InvertType ars not in your functional\n",
# "groups file and thus sample_diet does not know which groups to sample."))
# }
#
# # change inverttype to grouptype, contents should be the same
# if("inverttype" %in% names(fgs)) names(fgs)[names(fgs) == 'inverttype'] <- 'grouptype'
#
# fgs$grouptype <- tolower(fgs$grouptype)
# nonsampledtypes <- c("bird", "mammal", "cep", "sed_ep_ff", "sed_ep_other",
# "mob_ep_other", "pwn", "lg_zoo", "lg_inf", "phytoben")
# nonSampled <- subset(fgs, isfished == 0 | grouptype %in% nonsampledtypes |
# code == "REP")
# notenumeratedtypes <- c("bird", "mammal", "cep", "sed_ep_other", "lg_zoo",
# "lg_inf", "phytoben")
# notEnum <- subset(fgs, grouptype %in% notenumeratedtypes | code == "BFF")
#
# dat <- dat[!(dat$Predator %in% nonSampled), !(colnames(dat) %in% notEnum)]
#
# # add uniform error to half of the "observations"
# nPreyObs <- NROW(dat) * (NCOL(dat)-1)
# for(obs in 1:(nPreyObs/2)){
# # determine row and column indices
# rowR <- sample(1:NROW(dat), 1)
# colC <- sample(2:NCOL(dat), 1)
#
# dat[rowR, colC] <- dat[rowR, colC] + runif(1, -0.1, 0.1)
# }
#
# # add bias by removing little-observed prey at random
# for(i in 1:NROW(dat)){
# for(j in 2:NCOL(dat)){
# if(dat[i,j] < 0.25 & runif(1) < 0.15){
# dat[i,j] <- 0
# }
# }
# }
#
# # recalibrate so that rows add to 1
# # first need to adjust/account for negative values
# baseAdd <- min(dat[,2:NCOL(dat)])
# if(baseAdd < 0){
#
# for(i in 1:NROW(dat)){
# for(j in 2:NCOL(dat)){
# if(dat[i,j] != 0){
# dat[i,j] <- dat[i,j]-baseAdd
# }
# }
# }
#
# }
#
#
# for(r in 1:NROW(dat)){
#
# denom <- rowSums(dat[r,2:NCOL(dat)])
# dat[r,2:NCOL(dat)] <- (dat[r,2:NCOL(dat)])/denom
# }
sample_diet()
Testing the new sample_diet()
function with both surveyed diet (survey_cons, output of create_survey_diet) and global diet (diettest, output of load_diet_comp), with both unidentified prey bias and Dirichlet observation error.
samp1 <- sample_diet(survey_cons, fgs, unidprey = 0, alphamult = 10000000)
samp2 <- sample_diet(diettest, fgs, unidprey = 0.5, alphamult = 10000000)
samp3 <- sample_diet(survey_cons, fgs, unidprey = 0.3, alphamult = 10000000)
samp4 <- sample_diet(survey_cons, fgs, unidprey = 0.3, alphamult = 10)
samp5 <- sample_diet(diettest, fgs, unidprey = 0, alphamult = 10000000)
samp6 <- sample_diet(survey_cons, fgs, unidprey = 0, alphamult = 10)
Plotting functions and colors for everyone
# length(unique(samp2$prey)) #39 prey categories max
# 35 categories in samp5 without unid length(unique) #http://medialab.github.io/iwanthue/
preycol <- c("#7b7927",
"#746dd8",
"#a6bc3a",
"#4f3587",
"#86af42",
"#c666c7",
"#42c87f",
"#cb417f",
"#6fb95c",
"#882e7b",
"#67b271",
"#c83c63",
"#43c8ac",
"#da4953",
"#3ba7e5",
"#caa432",
"#5081db",
"#c8802f",
"#6c79c2",
"#c7b961",
"#bb86d6",
"#3b7125",
"#d672b9",
"#b57e43",
"#872957",
"#dd6741",
"#db75a2",
"#9e3d14",
"#db6f81",
"#7b3015",
"#e08262",
"#8d2a41",
"#d56467",
"#8c242e",
"#a73830")
names(preycol) <- as.factor(sort(unique(samp5$prey)))
# going for more greyscale for unident categories, same website
unidcol <- c("#b8b8b2",
"#302a1d",
"#6b7069",
"#1e3430")
names(unidcol) <- as.factor(c("Unid", "Unid_Fish", "Unid_Invert", "Unid_Plankton"))
col <- c(preycol, unidcol)
# plot diet comp over time at age by species
plotdiet <- function(dat, compdat=NULL, namedat, namecomp=NULL){
dat <- dat %>% add_column(run = namedat)
if(!is.null(compdat)) compdat <- compdat %>% add_column(run = namecomp)
ggplot() +
geom_bar(data=dat, aes(time.days/365, dietSamp, fill=prey), stat = "identity") +
{if(!is.null(compdat)) geom_bar(data=compdat, aes(time.days/365, dietSamp, fill=prey), stat = "identity")} +
theme_tufte() +
theme(legend.position = "bottom") +
xlab("year") +
ylab("diet proportion") +
facet_grid(agecl~run) +
scale_fill_manual(values=col) +
ggtitle(dat$species)
}
# method for a single species diet, no comparisons
# plist = lapply(split(ms_diet, ms_diet$species), function(d) {
# ggplot(d, aes(time.days/365, atoutput, fill=prey)) +
# geom_bar(stat = "identity") +
# facet_wrap(species~agecl) +
# xlab("year") +
# ylab("diet proportion") +
# theme_tufte() +
# theme(legend.position="bottom")
# })
Here we compare the surveyed diet comp with the global diet comp, both with no prey bias (unidprey = 0) and extremely low Diriclet observation error (alphamult = 10000000):
preds <- unique(samp1$species)
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(samp1, species %in% preds[i]), namedat = "Survey diet", compdat = filter(samp5, species %in% preds[i]), namecomp = "Global diet"))
cat(" \n")
}
Here we compare the global diet comp without and with unidentified prey bias (unidprey = 0 and unidprey = 0.5, or up to 50% of each prey proportion could be set to unidentified) using extremely low Diriclet observation error (alphamult = 10000000):
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(samp5, species %in% preds[i]), namedat = "Global diet", compdat = filter(samp2, species %in% preds[i]), namecomp = "Biased global diet"))
cat(" \n")
}
Here we compare survey diet comp with no bias and variance (unidprey = 0 and alphamult = 10000000) to survey diet with some bias (unidprey = 0.3 or up to 30% of each prey proportion could be set to unidentified, and alphamult = 10000000):
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(samp1, species %in% preds[i]), namedat = "Survey diet", compdat = filter(samp3, species %in% preds[i]), namecomp = "Biased survey diet"))
cat(" \n")
}
Here we compare survey diet comp with no bias or variance (unidprey = 0 and alphamult = 10000000), with no bias and high Dirichlet observation error (unidprey = 0 and alphamult = 10):
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(samp1, species %in% preds[i]), namedat = "No variance survey diet", compdat = filter(samp6, species %in% preds[i]), namecomp = "High variance survey diet"))
cat(" \n")
}
Last we compare survey diet comp with some bias (unidprey = 0.3 or up to 30% of each prey proportion could be set to unidentified) and low observation error (alphamult = 1000000) to the same bias and high Dirichlet observation error (alphamult = 10):
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(samp3, species %in% preds[i]), namedat = "Biased survey diet", compdat = filter(samp4, species %in% preds[i]), namecomp = "Bias and high variance survey diet"))
cat(" \n")
}
The new om_diet
function should load the detailed diet file, run the survey, then sample the diets and save outputs for each survey. This takes 3 minutes to run with two surveys and 10 species.
NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
NOBAom_ms_diet <- om_diet(config = here("config", "NOBA2config.R"),
dietfile = "NOBADetDiet.gz",
usersurvey = c(here("config/mssurvey_spring_01.R"),
here("config/mssurvey_fall_01.R")),
omlist_ss = NOBAom_ms,
n_reps = 1,
save = TRUE)
The diet was sampled with alphamult = 10 and unidprey = 0.3 using the survey configuration files.
NOBAom_ms_diet <- read_savedsurvs(d.name, 'survDiet')
preds <- unique(unique(NOBAom_ms_diet[[2]][[1]]$species))
fall <- as.data.frame(NOBAom_ms_diet$BTS_fall_nearbox_qmix_selmix[[1]])
spring <- as.data.frame(NOBAom_ms_diet$BTS_spring_nearbox_qmix_selmix[[1]])
for(i in 1:length(preds)) {
cat(" \n####", as.character(preds[i])," \n")
print(plotdiet(dat = filter(spring, species %in% preds[i]), namedat = "Spring survey diet", compdat = filter(fall, species %in% preds[i]), namecomp = "Fall survey diet"))
cat(" \n")
}