Which species?

Let’s use the wrapper functions to get initial multispecies assessment model inputs.

We use config files for the NOBA model and define multispecies surveys and fisheries. We will also test the diet functions, which are needed as multispecies and ecosystem model inputs.

Our initial species selection includes 11 single species groups from the NOBA model:

fgs <- load_fgs(here("atlantisoutput", "NOBA_march_2020"), "nordic_groups_v04.csv")

lname <- data.frame(Latin = c("*Hippoglossoides platessoides*",
                              "*Reinhardtius hippoglossoides*",
                              "*Scomber scombrus*",
                              "*Melongrammus aeglefinus*",
                              "*Pollachius virens*",
                              "*Sebastes mentella*",
                              "*Micromesistius poutassou*",
                              "*Clupea harengus*",
                              "*Gadus morhua*",
                              "*Boreogadus saida*",
                              "*Mallotus villosus*"),
                    Code = c("LRD", "GRH", "MAC", "HAD", "SAI", "RED", 
                             "BWH", "SSH", "NCO", "PCO", "CAP")
)

sppsubset <- merge(fgs, lname, all.y = TRUE)
spptable <- sppsubset %>% 
  arrange(Index) %>%
  select(Name, Long.Name, Latin)

knitr::kable(spptable, col.names = c("Model name", "Full name", "Latin name"))
| Model name | Full name | Latin name |
|---|---|---|
| Long_rough_dab | Long rough dab | Hippoglossoides platessoides |
| Green_halibut | Greenland halibut | Reinhardtius hippoglossoides |
| Mackerel | Mackerel | Scomber scombrus |
| Haddock | Haddock | Melanogrammus aeglefinus |
| Saithe | Saithe | Pollachius virens |
| Redfish | Redfish | Sebastes mentella |
| Blue_whiting | Blue whiting | Micromesistius poutassou |
| Norwegian_ssh | Norwegian spring spawning herring | Clupea harengus |
| North_atl_cod | Northeast Atlantic cod | Gadus morhua |
| Polar_cod | Polar cod | Boreogadus saida |
| Capelin | Capelin | Mallotus villosus |

These represent a range of life histories similar to those on the Northeast US shelf (some are the same species), so they form a reasonable set of species for multispecies model testing.

Diet interactions

How much do these species interact with each other? Now we test the diet composition functions in atlantisom.

Diet composition is loaded with load_diet_comp():

diettest <- load_diet_comp(here("atlantisoutput", "NOBA_march_2020"), 
                           "nordic_runresults_01DietCheck.txt", fgs = fgs)

ms_diet <- diettest %>%
  filter(species %in% sppsubset$Name) %>%
  filter(atoutput>0)

Plotting diets

plist = lapply(split(ms_diet, ms_diet$species), function(d) {
  ggplot(d, aes(time.days/365, atoutput, fill=prey)) + 
    geom_bar(stat = "identity") +
    facet_wrap(species~agecl) +
    xlab("year") +
    ylab("diet proportion") +
    theme_tufte() +
    theme(legend.position="bottom")
})

Capelin

plist$Capelin

Blue whiting

plist$Blue_whiting

Saithe

plist$Saithe

Cod

plist$North_atl_cod

Haddock

plist$Haddock

Greenland halibut

plist$Green_halibut

Herring

plist$Norwegian_ssh

Redfish

plist$Redfish

Mackerel

plist$Mackerel

Long rough dab

plist$Long_rough_dab

Polar cod

plist$Polar_cod

Interactions among these species

Do our selected species eat each other? What proportion of each species’ diet comes from species in our selected group?

This table shows the proportion of diet for each species and age class where only our selected species are considered as prey. The statistics are across all time steps in this model run, and do not include timesteps where the diet composition was 0.

prop <- ms_diet %>%
  filter(prey %in% unique(species)) %>%
  select(species, agecl, time.days, prey, atoutput) %>%
  group_by(species, agecl, time.days) %>%
  summarise(mspreyprop = sum(atoutput)) #sum over prey each timestep
  
range <- prop %>%
  group_by(species, agecl) %>%
  summarise(minprop = min(mspreyprop),
         medprop = median(mspreyprop),
         meanprop = mean(mspreyprop),
         maxprop = max(mspreyprop))

#knitr::kable(range)
library(DT)
datatable(range, rownames = FALSE, options = list(pageLength = 25))

Mature age classes of cod and redfish average over a third of their diet compositions from other species in the selected group, suggesting fairly strong predation interactions.

Many of the selected species share zooplankton prey, so there may be species interactions via prey as well.

Total consumption time series

The true data can be generated without updated config files (once I fix missing diets in wrappers!).

UPDATE: added an option to bring in biomass_eaten in the om_species wrapper. However, the way this is calculated is probably inappropriate for what we want. biomass_eaten is the box-specific total tons consumed by predators from PROD.nc, split among prey using the global diet composition from DietCheck.txt. This means predators have the same diet composition in every box, whether or not the prey were actually there.
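For intuition, here is a toy sketch of that split (not atlantisom code; the column names eaten, dietfrac, and the data values are made up for illustration), showing why every box inherits the same prey mix:

# toy illustration: two boxes, one predator age class, box-specific total
# consumption split by a single global diet composition
library(dplyr)

totconsumed <- data.frame(species = "North_atl_cod", agecl = 5, time = 365,
                          polygon = c(4, 12), eaten = c(100, 10)) # tons per box

globaldiet <- data.frame(species = "North_atl_cod", agecl = 5, time = 365,
                         prey = c("Capelin", "Norwegian_ssh"),
                         dietfrac = c(0.7, 0.3)) # global proportions

bioeaten_sketch <- totconsumed %>%
  left_join(globaldiet, by = c("species", "agecl", "time")) %>%
  mutate(bio_eaten = eaten * dietfrac)
# both polygons get the identical 70/30 prey split, whether or not
# those prey actually occur in each box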

To do diet sampling using our survey design, we need to retain the actual box-specific diet. This is found in the DetailedDietCheck.txt file, which outputs the tons consumed since the last timestep for each predator by box, layer, and prey. Because no unit conversion is needed for tons, we can read this in, drop all the zero diet comp values to get it down to a (possibly) manageable size, and apply the survey from there. A new function, load_detailed_diet_comp.R, is developed below.

Bottom line is don’t use biomass_eaten; maybe we should take it out of run_truth() output.

NOBAom <- om_init(here("config/NOBA2config.R")) #run and saved June 19,
# run again 3 August to update biomass_eaten 
# and output archived with _bio_eaten suffix
# run again 15 Sept to get rid of biomass_eaten

#NOBAom_ms <- om_species(sppsubset$Name, NOBAom) #run and saved June 19
# run again 3 August to test new wrapper that should save biomass_eaten
#NOBAom_ms <- om_species(sppsubset$Name, NOBAom, diet=TRUE)
# and output archived with _bio_eaten suffix
# run again 15 Sept to get rid of biomass eaten
NOBAom_ms <- om_species(sppsubset$Name, NOBAom)

source(here("config", "NOBA2config.R"))

# diet data are not in here, need to revise wrapper
#NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))

# get from full truth run--big file
#truth <- get(load(file.path(d.name,
#                                paste0(scenario.name, "run_truth.RData"))))

# questions: does biomass_eaten have some diet comp info as dietcheck.txt?
# load dietcheck with om_init using a switch in config file?
# add or pass along switch to keep biomass_eaten truth output in species file
# is biomass_eaten correct?
#   revised load_diet_comp to *not include* NAs that merge poorly
# polygon specific total mg N consumed per unit volume times volume of polygon
# then split to prey using global diet fraction (and converted to t)

#bioeaten_ss <- truth$biomass_eaten %>%
#  filter(species %in% sppsubset$Name) 

# compare ms_diet$atoutput to bioeaten_ss$dietfrac
# should match for species, agecl, time.days, prey

cod5.365 <- NOBAom_ms$truebioeaten_ss %>%
  filter(species=="North_atl_cod",
         agecl==5,
         time.days==365, 
         polygon==4)

cod5.365.check <- ms_diet %>%
  filter(species=="North_atl_cod",
         agecl==5,
         time.days==365)
  
#it does, but I think we need this by box
#it makes no sense to apply global diet comp to boxes, not all prey 
#may be in the box
#DetailedDietCheck.txt is the file to read for using survey sampling.

#this file is huge, try data.table::fread
detaileddiettest <- data.table::fread(file.path(here("atlantisoutput", "NOBAwithAnnAgeOutput"), "outputnordic_runresults_01DetailedDietCheck.txt"),
                                      data.table=FALSE
                                      #nrows = 200
                                      )

diet <- detaileddiettest

#remove rows that are all 0 prey
#diet <- diet[apply(diet[,-c(1:5)], 1, function(x) !all(x==0)),]
  
#should do the same thing faster
diet <- diet[as.logical(abs(as.matrix(diet[,-c(1:5)])) %*% rep(1L,ncol(diet[,-c(1:5)]))), ]

#test pieces of new load_detailed_diet function
  if(length(grep("Group",colnames(diet)))>0)
    colnames(diet) <- gsub("Group","Predator",colnames(diet))

  # Change column order
  diet <- diet[, c("Predator", "Cohort", "Time", "Box", "Layer",
    names(diet)[which(!names(diet) %in% c("Predator", "Cohort", "Time", "Box", "Layer"))])]

  # Convert to tidy dataframe to allow joining/merging with other dataframes.
  diet <- tidyr::gather_(data = diet, key_col = "prey", value_col = "dietcomp",
    #gather_cols = names(diet)[(which(names(diet) == "Updated") + 1):NCOL(diet)])
    gather_cols = names(diet)[(which(names(diet) %in% fgs$Code))]) 
  
  # Get rid of 0 dietcomp rows to make manageable size
  diet <- filter(diet, dietcomp>0)

  names(diet) <- tolower(names(diet))

  diet$prey <- as.character(diet$prey)
  diet$predator <- as.character(diet$predator)
  
  # Change species acronyms to actual names.
  species_names <- fgs[, c("Name", "Code")]
  diet$species <- species_names$Name[match(diet$predator, species_names$Code)]
  diet$prey <- species_names$Name[match(diet$prey, species_names$Code)]
  
  # diet outputs are not all divisible by toutinc, leave in days
  diet$time.days <- diet$time #/ toutinc

  #fix cohort to agecl
  
  diet <- diet %>%
    dplyr::mutate(agecl = cohort + 1)
  
  # drop redundant columns and reorder
  diet <- diet[,c("species", "agecl", "time.days", "box", "layer", "prey", "dietcomp")]

Try reading in the box-specific diet in weight and see if the proportions match dietcheck.txt outputs:

Oy, have to deal with filesize first. Try stripping out all the 0 lines:

Even running a “bash” chunk here still runs through R system2(); I don’t think this is worth it. Can’t even count the columns without the fans kicking up.

The awk code, for the record, is this, executed in terminal from the directory with the .gz file in it:

zcat < nordic_runresults_01DetailedDietCheck.txt.gz | awk 'NR > 1{s=0; for (i=6;i<=NF;i++) s+=$i; if (s!=0)print}' | gzip > DetDiet.gz

On Linux it would just be zcat filename; macOS needs zcat < filename to work. The awk statement says: after the header row (NR > 1), sum the values from column 6 to the last column and keep the row only if the sum is nonzero. The last pipe sends the result to a new gzipped file.

Links consulted: awk is the right thing

awk to delete lines summing to 0

awk a compressed file leaving it compressed

Started command at 5:35 pm. Finished at 6:58. Maybe I can fread the output smaller file?

I can, but it stripped off the header. So I need to grab the header and cat it back on. cat works on two gzipped files, so the first line below checks the header, the second makes a gzipped version of it, and the third concatenates it with the filtered data file.

zcat < nordic_runresults_01DetailedDietCheck.txt.gz | head -n1 > DetDietHead.txt
zcat < nordic_runresults_01DetailedDietCheck.txt.gz | head -n1 | gzip > DetDietHead.gz
cat DetDietHead.gz DetDiet.gz > NOBADetDiet.gz

Now we diagnose the load function again and try to optimize with data.table functions.

#keeps failing with vector memory exhausted (limit reached?)
#gets to the step of eliminating 0 rows
#try awk in a system command
# this test works but R overhead isnt worth it, faster run from terminal
#system("zcat < atlantisoutput/NOBA_march_2020/nordic_runresults_01DetailedDietCheck.txt.gz | awk '{print NF}' | uniq")
# see notes above for preprocessing step, here debug new load_detailed_diet function

dir <- here("atlantisoutput","NOBA_march_2020") 
file_diet <-  "NOBADetDiet.gz"

  if (is.null(dir)) {
    diet.file <- file_diet
  } else {
    diet.file <- file.path(dir, file_diet)
  }
  diet <- data.table::fread(diet.file)

  # remove all 0 prey rows--done with awk statement first, file is too big for R to do this
  #diet <- diet[as.logical(rowSums(diet[,-c(1:5)] != 0)), ]


  # SKG June 2020: changing to the other way around. more new models now,
  # only one diet function to change, and "Predator" seems clearer for users.
  if(length(grep("Group",colnames(diet)))>0)
    colnames(diet) <- gsub("Group","Predator",colnames(diet))

  # Change column order NOT DATA TABLE SAFE
  #diet <- diet[, c("Predator", "Cohort", "Time", "Box", "Layer",
  #  names(diet)[which(!names(diet) %in% c("Predator", "Cohort", "Time", "Box", "Layer"))])]

  # Change column order
  data.table::setcolorder(diet, c("Predator", "Cohort", "Time", "Box", "Layer"))

  diet <- data.table::melt(diet, id = 1:5,
               variable.name="prey",
               value.name = "dietcomp")

  # # Convert to tidy dataframe to allow joining/merging with other dataframes.
  # diet <- tidyr::gather_(data = diet, key_col = "prey", value_col = "dietcomp",
  #   #gather_cols = names(diet)[(which(names(diet) == "Updated") + 1):NCOL(diet)])
  #   gather_cols = names(diet)[(which(names(diet) %in% fgs$Code))])

  # Get rid of 0 dietcomp rows to make manageable size
  diet <- diet[dietcomp>0]
  #diet <- filter(diet, dietcomp>0)

  names(diet) <- tolower(names(diet))

  diet$prey <- as.character(diet$prey)
  diet$predator <- as.character(diet$predator)

  # Change species acronyms to actual names.
  species_names <- fgs[, c("Name", "Code")]
  diet$species <- species_names$Name[match(diet$predator, species_names$Code)]
  diet$prey <- species_names$Name[match(diet$prey, species_names$Code)]
  #diet <- diet[, -which(colnames(diet) == "predator")]

  # diet outputs are not all divisible by toutinc, leave in days
  diet$time.days <- diet$time #/ toutinc

  #fix cohort to agecl
  # diet <- diet %>%
  #   dplyr::mutate(agecl = cohort + 1)
  diet <- diet[, agecl := cohort + 1]

  # drop redundant columns and reorder NOT DATA TABLE SAFE
  #diet <- diet[,c("species", "agecl", "time.days", "box", "layer", "prey", "dietcomp")]

  #leave layer info for now, this is a load function

  #does the Update column matter? this assumes it doesn't
  #in SETAS_Example, it doesn't make extra comps

  dietcompwt <- data.frame(species = diet$species,
                         agecl = diet$agecl,
                         time.days = diet$time.days,
                         polygon = diet$box,
                         layer = diet$layer,
                         atoutput = diet$dietcomp,
                         prey = diet$prey)

Using data.table improves speed for the atlantisom::load_detailed_diet_comp() function. But it turns out that to use data.table within a package, you need to declare it in both the DESCRIPTION and NAMESPACE files; with that done, this beast of a function works.
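For the record, a minimal sketch of the two declarations (shown as a roxygen tag, which writes the NAMESPACE entry when the package is documented; the exact placement in atlantisom may differ):

# 1) DESCRIPTION file gains:
#    Imports: data.table

# 2) the roxygen header above load_detailed_diet_comp() gains the tag below,
#    which generates import(data.table) in NAMESPACE when devtools::document()
#    (or roxygen2::roxygenise()) is run:
#' @import data.table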

It takes minutes to load the detailed diet, even with the awk pre-processing above, so saving the output is a good idea. We save only our 11 selected species to streamline things a bit.

# still not working but works line by line--fixed
# N.B. data.table wants to be in both imports and namespace within a function!
detdiet <- load_detailed_diet_comp(here("atlantisoutput","NOBA_march_2020"), 
                                   "NOBADetDiet.gz", 
                                   fgs = fgs)

ms_detaileddiet <- detdiet %>%
  filter(species %in% sppsubset$Name)

saveRDS(ms_detaileddiet, file.path(d.name, paste0(scenario.name, "ms_detaileddiet.rds")))

All this was just to see what the regional diet data looks like compared with global diet comps from DietCheck.txt.

# read in the saved file
source(here("config", "NOBA2config.R"))

ms_detaileddiet <- readRDS(file.path(d.name, paste0(scenario.name, "ms_detaileddiet.rds")))
# look at a single species/age combo from above
#    agecl 5 (age 9-10) cod
cod5det <- ms_detaileddiet %>%
  filter(species=="North_atl_cod",
         agecl==5)

# how many polygons included? how many layers per polygon have data?
#cod5det %>% group_by(time.days, polygon, layer) %>% tally()
# why are layers -1 and 0 included (must have trace data if 0 diet was eliminated)

# are there differences by layer within a polygon?
#    polygons 12 and 19 are on the slope, have most layers with agecl 5 cod
cod5det1219 <- cod5det %>%
  filter(polygon %in% c(1,6,7,12,19,25,29,37))

plist2 = lapply(split(cod5det1219, cod5det1219$polygon), function(d) {
  ggplot(d, aes(time.days/365, atoutput, fill=prey)) + 
    geom_bar(stat = "identity") +
    facet_wrap(polygon~layer) +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")
})

As suspected, diet composition varies greatly by model area and layer.

Consumption by polygon and layer

Cod agecl 5 polygon 1

plist2$'1'

Cod agecl 5 polygon 6

plist2$'6'

Cod agecl 5 polygon 7

plist2$'7'

Cod agecl 5 polygon 12

plist2$'12'

Cod agecl 5 polygon 19

plist2$'19'

Cod agecl 5 polygon 25

plist2$'25'

Cod agecl 5 polygon 29

plist2$'29'

Cod agecl 5 polygon 37

plist2$'37'

# are there differences by polygon? (aggregate layers)
coddet_agg <- as.data.frame(
  ms_detaileddiet %>%
  filter(species=="North_atl_cod") %>%
  group_by(species, agecl, time.days, polygon, prey) %>%
  summarise(polytons = sum(atoutput))
)

coddet_agg2 <- atlantisom::aggregateData(dat = ms_detaileddiet, 
                                        time = unique(ms_detaileddiet$time.days), 
                                        species = "North_atl_cod", 
                                        boxes = unique(ms_detaileddiet$polygon),
  keepColumns = c("species", "agecl", "polygon", "time.days", "prey"))

names(coddet_agg2)[names(coddet_agg2)=="numAtAge"] <- "polytons"

plist3 = lapply(split(coddet_agg2, coddet_agg2$agecl), function(d) {
  ggplot(d, aes(time.days/365, polytons, fill=prey)) + 
    geom_bar(stat = "identity") +
    facet_wrap(species~polygon) +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")
})

Cod agecl 5 all polygons

plist3$'5'

Cod agecl 10 all polygons

plist3$'10'

Aggregate the detailed diet across layers and polygons and compare to the aggregate diet comp output (proportion). This looks identical to the DietCheck output:

# are there differences by polygon? (aggregate layers)
coddet_aggall <- as.data.frame(
  coddet_agg %>%
  filter(species=="North_atl_cod") %>%
  group_by(species, agecl, time.days, prey) %>%
  summarise(tottons = sum(polytons))
)

  ggplot(coddet_aggall, aes(time.days/365, tottons, fill=prey)) + 
    #geom_bar(stat = "identity") +
    geom_bar(position = "fill", stat = "identity") +
    facet_wrap(species~agecl) +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")

This is the detailed diet aggregated across layers and polygons, in absolute tons consumed by age class (y axes differ; most consumption is from age class 1):

  ggplot(coddet_aggall, aes(time.days/365, tottons, fill=prey)) + 
    geom_bar(stat = "identity") +
    #geom_bar(position = "fill", stat = "identity") +
    facet_wrap(species~agecl, scales = "free_y") +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")

This is the output of biomass_eaten in absolute tons, a shortcut that applies the global diet comp to polygon-specific tons consumed. First we compare diet by polygon for age class 5 as above. As expected, the diet comp doesn’t match DetailedDietCheck by polygon, but the amounts are also vastly different by polygon:

NOBAom_ms_be <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss_bio_eaten.rds")))

codbioeaten <- NOBAom_ms_be$truebioeaten_ss %>%
  filter(species=="North_atl_cod")

plist4 = lapply(split(codbioeaten, codbioeaten$agecl), function(d) {
  ggplot(d, aes(time.days/365, bio_eaten, fill=prey)) + 
    geom_bar(stat = "identity") +
    facet_wrap(species~polygon) +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")
})

plist4$'5'

Aggregate proportion from biomass_eaten should match, because diet proportions come from DietCheck:

ggplot(codbioeaten, aes(time.days/365, bio_eaten, fill=prey)) + 
    #geom_bar(stat = "identity") +
    geom_bar(position = "fill", stat = "identity") +
    facet_wrap(species~agecl) +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")

Total biomass consumed from bio_eaten calculation is about a factor of 10 higher than from DetailedDietCheck. Patterns look a bit different too:

ggplot(codbioeaten, aes(time.days/365, bio_eaten, fill=prey)) + 
    geom_bar(stat = "identity") +
    #geom_bar(position = "fill", stat = "identity") +
    facet_wrap(species~agecl, scales = "free_y") +
    xlab("year") +
    ylab("diet tons") +
    theme_tufte() +
    theme(legend.position="bottom")
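To put a rough number on that difference, a quick diagnostic sketch using the objects created above (tottons from the aggregated DetailedDietCheck output, bio_eaten from the wrapper output), assuming both cover the same model period:

# rough order-of-magnitude check of total cod consumption from the two sources
detailed_total <- sum(coddet_aggall$tottons)   # DetailedDietCheck, aggregated
bioeaten_total <- sum(codbioeaten$bio_eaten)   # biomass_eaten shortcut
bioeaten_total / detailed_total                # ~10x per the comparison above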

Bottom line: biomass_eaten won’t work for our purposes in generating truth for survey data that will select polygons for sampling. Also, I’m not sure why consumption estimates are an order of magnitude higher.

So I’ll make some functions for diet survey sampling that work on the beastly DetailedDietCheck. I won’t try to bring this into run_truth due to the size of the file, and I will remove biomass_eaten from the run_truth output until I can troubleshoot the estimates.
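A rough sketch of what such a diet sampling function might do, shown here as a placeholder (sample_diet_sketch is not an atlantisom function, and its interface is illustrative only):

# placeholder sketch: subset the box-specific detailed diet to a survey
# footprint, then convert to diet proportions by predator, age class, and time
library(dplyr)

sample_diet_sketch <- function(detaileddiet, survboxes, survtimes){
  detaileddiet %>%
    filter(polygon %in% survboxes,
           time.days %in% survtimes) %>%
    group_by(species, agecl, time.days, prey) %>%
    summarise(preytons = sum(atoutput), .groups = "drop") %>%
    group_by(species, agecl, time.days) %>%
    mutate(dietprop = preytons / sum(preytons)) %>%
    ungroup()
}

# e.g. surveydiet <- sample_diet_sketch(ms_detaileddiet, survboxes, survtime_days)
# where survtime_days would be the survey config survtime converted from output
# timesteps to days (e.g. multiplying by omlist_ss$runpar$toutinc) to match time.days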

Generating the dataset (September 2020)

Now to get a range of data:

Config files

NOBA2config.R looks like this (adjusted from Alfonso’s original):

d.name <- here("atlantisoutput","NOBA_march_2020")
functional.groups.file <- "nordic_groups_v04.csv"
biomass.pools.file <- "nordic_biol_v23.nc"
biol.prm.file <- "nordic_biol_incl_harv_v_007_3.prm"
box.file <- "Nordic02.bgm"
initial.conditions.file <- "nordic_biol_v23.nc"
run.prm.file <- "nordic_run_v01.xml"
scenario.name <- "nordic_runresults_01"
bioind.file <- "nordic_runresults_01BiomIndx.txt"
catch.file <- "nordic_runresults_01Catch.txt"
annage <- TRUE
fisheries.file <- "NoBAFisheries.csv"

omdimensions.R standardizes timesteps, etc. (this is part of atlantisom and should not need to be changed by the user):

#survey species inherited from omlist_ss
survspp <- omlist_ss$species_ss
# survey season and other time dimensioning parameters
# generalized timesteps all models
noutsteps <- omlist_ss$runpar$tstop/omlist_ss$runpar$outputstep
timeall <- c(0:noutsteps)
stepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutinc
midptyr <- round(median(seq(0,stepperyr)))

# model areas, subset in surveyconfig
allboxes <- c(0:(omlist_ss$boxpars$nbox - 1))

# fishery output: learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutfinc

# survey selectivity (agecl based)
sp_age <- omlist_ss$funct.group_ss[, c("Name", "NumCohorts", "NumAgeClassSize")]

# should return all age classes fully sampled (Atlantis output is 10 age groups per spp)
n_age_classes <- sp_age$NumCohorts
# changed below for multiple species NOTE survspp alphabetical; NOT in order of fgs!!
# this gives correct names
age_classes <- sapply(n_age_classes, seq)
names(age_classes)<-sp_age$Name

n_annages <- sp_age$NumCohorts * sp_age$NumAgeClassSize
# changed below for multiple species
annages <- sapply(n_annages, seq)
names(annages)<-sp_age$Name

mssurvey_spring.R and mssurvey_fall.R configure the fishery independent surveys (in this test, surveys sample all model polygons in all years and have efficiency of 1 for all species, with no size selectivity):

# Default survey configuration here has a range of efficiencies and selectivities
# To emulate a range of species in a single multispecies survey
# Also now happens in "spring" and "fall"
# Need to define survey season, area, efficiency, selectivity

# Survey name
survey.name="BTS_spring_allbox_effic1"

#Atlantis model timestep corresponding to the true output--now from census_spec.R
timestep <- stepperyr #5

#Which atlantis timestep does the survey run in?--now from census_spec.R
# with 5 output steps per year, 0 is Jan-Feb-midMar, 1 is midMar-Apr-May, 
# 2 is June-July-midAug, 3 is midAug-Sept-Oct, 4 is Nov-Dec (ish)

survey_sample_time <- 1 # spring survey

#The last timestep to sample
total_sample <- noutsteps-1 #495

#Vector of indices of survey times to pull
survey_sample_full <- seq(survey_sample_time,
                          total_sample, by=timestep)

survtime <- survey_sample_full

# survey area
# should return all model areas
survboxes <- allboxes

# survey efficiency (q)
# should return a perfectly efficient survey 
surveffic <- data.frame(species=survspp,
                     efficiency=rep(1.0,length(survspp)))

# survey selectivity (agecl based)
# this is by age class, need to change to use with ANNAGEBIO output
#survselex <- data.frame(species=rep(names(age_classes), each=n_age_classes),
#                     agecl=rep(c(1:n_age_classes),length(survspp)),
#                     selex=rep(1.0,length(survspp)*n_age_classes))

# for annage output uses names(annages) NOT alphabetical survspp
survselex <- data.frame(species=rep(names(annages), n_annages), #  
                        agecl=unlist(sapply(n_annages,seq)),
                        selex=rep(1.0,sum(n_annages)))

# effective sample size needed for sample_fish
# this effective N is high but not equal to total for numerous groups
surveffN <- data.frame(species=survspp, effN=rep(100000, length(survspp)))

# survey index cv needed for sample_survey_xxx
# cv = 0.1
surv_cv <- data.frame(species=survspp, cv=rep(0.1,length(survspp)))

# length at age cv for input into calc_age2length function
# function designed to take one cv for all species, need to change to pass it a vector
lenage_cv <- 0.1

# max size bin for length estimation, function defaults to 150 cm if not supplied
maxbin <- 200
# Default survey configuration here has a range of efficiencies and selectivities
# To emulate a range of species in a single multispecies survey
# Also now happens in "spring" and "fall"
# Need to define survey season, area, efficiency, selectivity

# Survey name
survey.name="BTS_fall_allbox_effic1"

#Atlantis model timestep corresponding to the true output--now from census_spec.R
timestep <- stepperyr #5

#Which atlantis timestep does the survey run in?--now from census_spec.R
# with 5 output steps per year, 0 is Jan-Feb-midMar, 1 is midMar-Apr-May, 
# 2 is June-July-midAug, 3 is midAug-Sept-Oct, 4 is Nov-Dec (ish)

survey_sample_time <- 3 # fall survey

#The last timestep to sample
total_sample <- noutsteps-1 #495

#Vector of indices of survey times to pull
survey_sample_full <- seq(survey_sample_time,
                          total_sample, by=timestep)

survtime <- survey_sample_full

# survey area
# should return all model areas
survboxes <- allboxes

# survey efficiency (q)
# should return a perfectly efficient survey 
surveffic <- data.frame(species=survspp,
                     efficiency=rep(1.0,length(survspp)))

# survey selectivity (agecl based)
# this is by age class, need to change to use with ANNAGEBIO output
#survselex <- data.frame(species=rep(survspp, each=n_age_classes),
#                     agecl=rep(c(1:n_age_classes),length(survspp)),
#                     selex=rep(1.0,length(survspp)*n_age_classes))

# for annage output
survselex <- data.frame(species=rep(names(annages), n_annages), #  
                        agecl=unlist(sapply(n_annages,seq)),
                        selex=rep(1.0,sum(n_annages)))

# effective sample size needed for sample_fish
# this effective N is high but not equal to total for numerous groups
surveffN <- data.frame(species=survspp, effN=rep(100000, length(survspp)))

# survey index cv needed for sample_survey_xxx
# cv = 0.1
surv_cv <- data.frame(species=survspp, cv=rep(0.1,length(survspp)))

# length at age cv for input into calc_age2length function
# function designed to take one cv for all species, need to change to pass it a vector
lenage_cv <- 0.1

# max size bin for length estimation, function defaults to 150 cm if not supplied
maxbin <- 200

msfishery.R configures the fishery dependent data:

# Default fishery configuration here is a census
# Inherits species from input omlist_ss
fishspp <- omlist_ss$species_ss 

#Number of years of data to pull
nyears <- 50

#Atlantis initialization period in years
burnin <- 30

# fishery output: learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutfinc


# same time dimensioning parameters as in surveycensus.R
#Vector of indices of catch in numbers to pull (by timestep to sum)
fish_sample_full <- c(0:total_sample)  #total_sample defined in sardinesurvey.R
fish_burnin <- burnin*fstepperyr+1
fish_nyears <- nyears*fstepperyr
fish_times <- fish_sample_full[fish_burnin:(fish_burnin+fish_nyears-1)]
fish_timesteps <- seq(fish_times[fstepperyr], max(fish_times), by=fstepperyr) #last timestep
#fish_years <- unique(floor(fish_times/fstepperyr)+1) # my original
fish_years <- unique(floor(fish_times/fstepperyr)) #from Christine's new sardine_config.R

fishtime <- fish_times


# fishery sampling area
# should return all model areas, this assumes you see everything that it caught
fishboxes <- c(0:(omlist_ss$boxpars$nbox - 1))

# effective sample size needed for sample_fish
# this effective N is high but not equal to total for numerous groups
fisheffN <- data.frame(species=survspp, effN=rep(1000, length(survspp)))

#this adjusts for subannual fishery output so original effN is for whole year
fisheffN$effN <- fisheffN$effN/fstepperyr 

# fishery catch cv can be used in sample_survey_biomass
# perfect observation
fish_cv <- data.frame(species=survspp, cv=rep(0.01,length(survspp)))

Using the wrappers (now updated for annual age, multiple surveys, and multispecies)

All the config files go in the config folder and we’ll try running this:

Run notes: om_comps output took 4 hours for this group of 11 species with 2 surveys and 1 fishery.

#NOBAom <- om_init(here("config/NOBA2config.R"))

#NOBAom_ms <- om_species(sppsubset$Name, NOBAom)

NOBAom_ms <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))

# new wrapper for multiple surveys saves survey output separately
# but returns a list with all surveys, not ideal; make consistent

NOBAom_ms_ind <- om_index(usersurvey = c(here("config/mssurvey_spring.R"), 
                                         here("config/mssurvey_fall.R")), 
                           userfishery = here("config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)

NOBAom_ms_comp <- om_comps(usersurvey = c(here("config/mssurvey_spring.R"), 
                                         here("config/mssurvey_fall.R")), 
                           userfishery = here("config/msfishery.R"),
                           omlist_ss = NOBAom_ms, 
                           n_reps = 1, 
                           save = TRUE)

Now need a function to read survey-specific output files back into a single object with named surveys.

readsurvs <- function(path, type){
  
  datlook <- data.frame(dattype = c('survB', 'survAge', 'survLen', 'survWtage', 'survAnnAge', 'survAnnWtage'),
                        pattern = c("*surveyB.rds", "*survObsAgeComp.rds", "*survObsLenComp.rds", "*survObsWtAtAge.rds",
                                    "*survObsFullAgeComp.rds", "*survObsFullWtAtAge.rds"),
                        datname = c('survObsBiom', 'age_comp_data', 'len_comp_data', 'wtage', 'annage_comp_data', 'wtannage'))
  
  survs <- list.files(path=path, pattern = as.character(datlook$pattern[datlook$dattype %in% type]), full.names = TRUE)
  
  survey.name <-  str_match(survs, paste0(scenario.name,"_\\s*(.*?)\\s",datlook$pattern[datlook$dattype==type]))[,2]
  
  datname <- lapply(survs, readRDS)
  
  names(datname) <- survey.name
  
  return(datname)
}

Testing: read in the data for plotting:

survObsBiom <- readsurvs(d.name, 'survB')
age_comp_data <- readsurvs(d.name, 'survAge')
len_comp_data <- readsurvs(d.name, 'survLen')
wtage <- readsurvs(d.name, 'survWtage')
annage_comp_data <- readsurvs(d.name, 'survAnnAge')
annage_wtage <- readsurvs(d.name, 'survAnnWtage')

Also need new plot functions and updated input file-writing functions that expect this structure. Preliminary plot functions below.

Wrapper test results

Biomass index

omlist_ss <- readRDS(file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
source(here("config/omdimensions.R"))

#read time series data--now needs survey name from config files

#need plotting/visualization wrappers
#survObsBiom <- readRDS(file.path(d.name, paste0(scenario.name, "surveyB.rds")))

# object in memory is no longer the same as saved files, need to read back into same configuration
# list of files in directory with surveyB.rds ending
# append to survObsBiom
# apply survey name from filename
# survs <- list.files(path=d.name, pattern = "*surveyB.rds", full.names = TRUE)
# survey.name <-  str_match(survs, paste0(scenario.name,"_\\s*(.*?)\\s*surveyB.rds"))[,2]
# survObsBiom <- lapply(survs, readRDS)
# names(survObsBiom) <- survey.name

#survObsBiom <- NOBAom_ms_ind$survObsBiom

# survey.name="BTS_fall_allbox_effic1"
# fallsurv <- readRDS(file.path(d.name, paste0(scenario.name, "_",survey.name,"surveyB.rds")))

plotB <- function(dat){
  
    ggplot() +
    geom_line(data=dat, aes(x=time/stepperyr,y=atoutput, color="survey Biomass"), 
              alpha = 10/10) +
    theme_tufte() +
    theme(legend.position = "top") +
    labs(colour=scenario.name) +
    facet_wrap(~species, scales="free") 
  
}

# why is herring missing from both?
# herring <- NOBAom_ms$truebio_ss %>%
#   filter(species %in% c("Norwegian_ssh")) %>%
#   group_by(species, time) %>%
#   summarise(totB = sum(atoutput))
  
# problem--omdimensions is doing species age accounting wrong
# species alphabetical pulled separately but ages not
# pull species name, n age classes and n ages all at once to avoid bad sorting
# also fix same problem in om_index with wtatage hack!
BTS_fall_allbox_effic1
map(survObsBiom$BTS_fall_allbox_effic1, plotB)[[1]]

BTS_spring_allbox_effic1
map(survObsBiom$BTS_spring_allbox_effic1, plotB)[[1]]

Catch time series

#read time series data
catchbio_ss <- readRDS(file.path(d.name, paste0(scenario.name, "fishCatch.rds")))

catchbio_ss <- catchbio_ss[[1]]

plotC <-ggplot() +
  geom_line(data=catchbio_ss, aes(x=time/365,y=atoutput, color="observed Catch"), 
            alpha = 10/10) +
  theme_tufte() +
  theme(legend.position = "top") +
  labs(colour=scenario.name)

plotC +
  facet_wrap(~species, scales="free") 

Survey length composition

#length comps
#len_comp_data <- readRDS(file.path(d.name, paste0(scenario.name, "survObsLenComp.rds")))

# now using readsurvs function above
# survs <- list.files(path=d.name, pattern = "*survObsLenComp.rds", full.names = TRUE)
# survey.name <-  str_match(survs, paste0(scenario.name,"_\\s*(.*?)\\s*survObsLenComp.rds"))[,2]
# len_comp_data <- lapply(survs, readRDS)
# names(len_comp_data) <- survey.name

fish_len_comp_data <- readRDS(file.path(d.name, paste0(scenario.name, "fishObsLenComp.rds")))

#len_comp_data <- len_comp_data[[1]]
fish_len_comp_data <- fish_len_comp_data[[1]]

#add this to om_indices function so that this has years when read in
#fish_len_comp_data$time <- as.integer(floor(fish_len_comp_data$time/fstepperyr))

plotlen <- function(dat){
  
    ggplot(dat, aes(upper.bins)) +
    geom_bar(aes(weight = atoutput)) +
    theme_tufte() +
    labs(subtitle = paste(scenario.name,
                          dat$species)) +
    facet_wrap(~time/stepperyr, ncol=6, scales="free_y")

}

# len <- filter(len_comp_data, time %in% c(55:175))
# 
#   lfplot <- ggplot(len, aes(upper.bins)) +
#     geom_bar(aes(weight = atoutput)) +
#     theme_tufte() +
#     labs(subtitle = paste(scenario.name,
#                           len$species))
#   
#   lfplot + facet_wrap(~time/stepperyr, ncol=6, scales="free_y")
BTS_fall_allbox_effic1
lcompsub <- as.data.frame(len_comp_data$BTS_fall_allbox_effic1[1]) %>% filter(time %in% c(55:175)) %>%
  group_by(species) %>%
  group_map(~ plotlen(.x), keep = TRUE)

for(i in 1:length(lcompsub)) {
  print(lcompsub[[i]])
}

#map(len_comp_data$BTS_fall_allbox_effic1 %>% keep(time %in% c(55:175)), plotlen)[[1]]

BTS_spring_allbox_effic1
lcompsub <- as.data.frame(len_comp_data$BTS_spring_allbox_effic1[1]) %>% filter(time %in% c(55:175)) %>%
  group_by(species) %>%
  group_map(~ plotlen(.x), keep = TRUE)

for(i in 1:length(lcompsub)) {
  print(lcompsub[[i]])
}

Survey age composition (age classes)

#read in comp data--now done above with function
# age_comp_data <- readRDS(file.path(d.name, paste0(scenario.name, "survObsAgeComp.rds")))
# age_comp_data <- age_comp_data[[1]]

#Natage <- filter(age_comp_data, time %in% c(150:270))

Natageplot <- function(dat){
  ggplot(dat, aes(x=agecl, y=atoutput)) +
    geom_point() +
    theme_tufte() +
    labs(subtitle = paste(scenario.name,
                          dat$species)) + 
    facet_wrap(~time/stepperyr, ncol=6, scales="free_y")
}
  
#Natageplot 
BTS_fall_allbox_effic1
acompsub <- as.data.frame(age_comp_data$BTS_fall_allbox_effic1[1]) %>% filter(time %in% c(150:270)) %>%
  group_by(species) %>%
  group_map(~ Natageplot(.x), keep = TRUE)

for(i in 1:length(acompsub)) {
  print(acompsub[[i]])
}

BTS_spring_allbox_effic1
acompsub <- as.data.frame(age_comp_data$BTS_spring_allbox_effic1[1]) %>% filter(time %in% c(150:270)) %>%
  group_by(species) %>%
  group_map(~ Natageplot(.x), keep = TRUE)

for(i in 1:length(acompsub)) {
  print(acompsub[[i]])
}

Survey weight at age (age classes)

# now done with function above
# wtage <- readRDS(file.path(d.name, paste0(scenario.name, "survObsWtAtAge.rds")))
# wtage <- wtage[[1]]

wageplot <- function(dat){
  ggplot(dat, aes(time, atoutput)) +
    geom_line(aes(colour = factor(agecl))) +
    theme_tufte() +
    theme(legend.position = "bottom") +
    xlab("model timestep (5 per year)") +
    ylab("average individual weight (g)") +
    labs(subtitle = paste0(scenario.name)) +
    facet_wrap(c("species"), scales="free_y")
}

#wageplot 
BTS_fall_allbox_effic1
map(wtage$BTS_fall_allbox_effic1, wageplot)[[1]]

BTS_spring_allbox_effic1
map(wtage$BTS_spring_allbox_effic1, wageplot)[[1]]

Fishery length composition

lcompsub <- as.data.frame(fish_len_comp_data) %>% filter(time %in% c(55:175)) %>%
  group_by(species) %>%
  group_map(~ plotlen(.x), keep = TRUE)

for(i in 1:length(lcompsub)) {
  print(lcompsub[[i]])
}

Fishery age composition (age classes)

#read in comp data
fish_age_comp <- readRDS(file.path(d.name, paste0(scenario.name, "fishObsAgeComp.rds")))
fish_age_comp <- fish_age_comp[[1]]

#add this to om_indices function so that this has years when read in
#fish_age_comp$time <- fish_age_comp$time/fstepperyr

acompsub <- as.data.frame(fish_age_comp) %>% filter(time %in% c(150:270)) %>%
  group_by(species) %>%
  group_map(~ Natageplot(.x), keep = TRUE)

for(i in 1:length(acompsub)) {
  print(acompsub[[i]])
}

Fishery weight at age (age classes)

fishwtage <- readRDS(file.path(d.name, paste0(scenario.name, "fishObsWtAtAge.rds")))
fishwtage <- fishwtage[[1]]

wageplot(fishwtage)

Wrappers still work for NOBA cod using standard age classes.

Visualize new annual age output for numbers at age, catch at age, and weight at age (interpolated):

Wrapper test results for annual ages

Survey age composition (annual ages)

# annage_comp_data read in above, apply plot function, quit hardcoding survey names

for(s in names(annage_comp_data)){
  cat("  \n##### ",  s,"  \n")
  acompsub <- as.data.frame(annage_comp_data[[s]][[1]]) %>% filter(time %in% c(150:270)) %>%
    group_by(species) %>%
    group_map(~ Natageplot(.x), keep = TRUE)
  
  for(i in 1:length(acompsub)) {
    print(acompsub[[i]])
  }
  cat("  \n")
}
BTS_fall_allbox_effic1

BTS_spring_allbox_effic1

Survey interpolated weight at age (annual ages)

# annage_wtage read in above 
# wtage <- wtage[[1]] #this still has second list component, diagnostic plot
# wtage <- wtage[[1]]

for(s in names(annage_wtage)){
  cat("  \n##### ",  s,"  \n")
  print(wageplot(annage_wtage[[s]][[1]][[1]]))
  cat("  \n")
}
BTS_fall_allbox_effic1

BTS_spring_allbox_effic1

Fishery catch at age (annual ages)

fish_annage_comp <- readRDS(file.path(d.name, paste0(scenario.name, "fishObsFullAgeComp.rds")))
fish_annage_comp <- fish_annage_comp[[1]]

#add this to om_indices function so that this has years when read in
fish_annage_comp$time <- fish_annage_comp$time/fstepperyr

acompsub <- as.data.frame(fish_annage_comp) %>% filter(time %in% c(30:54)) %>%
  group_by(species) %>%
  group_map(~ Natageplot(.x), keep = TRUE)

for(i in 1:length(acompsub)) {
  print(acompsub[[i]])
}

Fishery interpolated weight at age (annual ages)

fish_annage_wtage <- readRDS(file.path(d.name, paste0(scenario.name, "fishObsFullWtAtAge.rds")))

wageplot(fish_annage_wtage[[1]][[1]])