Getting mortality from atlantisom: testing calc

Introduction

This page documents initial testing of the atlantisom package in development at https://github.com/r4atlantis/atlantisom using three different Atlantis output datasets. Development of atlantisom began at the 2015 Atlantis Summit in Honolulu, Hawaii, USA. On this page we demonstrate use of atlantisom on Norwegian Barents Sea Atlantis (NOBA) output files to test function calc_Z calculating total mortality. This is used both to derive “natural mortality” for comparison with stock assessments and to split age classes into true ages using the function calc_stage2age.

NOBA has output files in true ages that we can compare results with. These files are in the folder NOBAwithAnnAgeOutput (not kept on github due to large filesize).

All model setup and configuration is described here.

library(tidyr)
require(dplyr)
library(ggplot2)
library(data.table)
library(here)
library(ggforce)
library(ggthemes)
library(atlantisom)

initCCA <- FALSE
initNEUS <- FALSE
initNOBA <- TRUE

#function to make a config file? need one for each atlantis run

if(initCCA) source(here("config/CC1Config.R"))

if(initNEUS) source(here("config/NEUSConfig.R"))

if(initNOBA) source(here("config/NOBAaaConfig.R"))

#Load functional groups
funct.groups <- load_fgs(dir=d.name,
                         file_fgs = functional.groups.file)
#Get just the names of active functional groups
funct.group.names <- funct.groups %>% 
  filter(IsTurnedOn == 1) %>%
  select(Name) %>%
  .$Name

# default run_truth setup will save the file, so check for that first

if(!file.exists(file.path(d.name, 
                          paste0("output", scenario.name, "run_truth.RData")))){
  #Store all loaded results into an R object
  truth <- run_truth(scenario = scenario.name,
                     dir = d.name,
                     file_fgs = functional.groups.file,
                     file_bgm = box.file,
                     select_groups = funct.group.names,
                     file_init = initial.conditions.file,
                     file_biolprm = biol.prm.file,
                     file_runprm = run.prm.file
  )
} else{
  truth <- get(load(file.path(d.name,
                              paste0("output", scenario.name, "run_truth.RData"))))
}

truecatchbio <- load_catch(d.name, file_catch = catch.file, fgs = funct.groups)

Test calc Z function

This section tests the atlantisom::calc_Z() function step by step. Ultimately this output goes to calc_stage2age to produce annual numbers at true age, which we can compare with the ANNAGEBIO.nc output produced by NOBA.

# make a function for this
source(here("config/census_spec.R"))

Output timestep toutinc for the population is 73,

so steps per year in run_truth output is 5

and the number of output steps in run_truth output is 560.

True numbers total (annual)

This gives us the true total numbers on an annual basis. We can compare this with age specific outputs.

# this uses result$nums and a new function to get survey index in numbers (abundance)

survey_testNall <- create_survey(dat = truth$nums,
                                 time = timeall,
                                 species = survspp,
                                 boxes = boxall,
                                 effic = effic1,
                                 selex = selex1)

# save this, needed as input to calc_Z?
saveRDS(survey_testNall, file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# as above, make up a constant 0 cv for testing
surv_cv <- data.frame(species=survspp, cv=rep(0.0,length(survspp)))

surveyN <- sample_survey_numbers(survey_testNall, surv_cv)

#save for later use, takes a long time to generate
saveRDS(surveyN, file.path(d.name, paste0(scenario.name, "surveyNcensus.rds")))

surveyN <- readRDS(file.path(d.name, paste0(scenario.name, "surveyNcensus.rds")))

surveyN_ss <- surveyN[surveyN$species == "North_atl_cod",]

plotN <-ggplot() +
  geom_line(data=surveyN_ss, aes(x=time/5,y=atoutput, color="survey census N"), 
            alpha = 10/10) +
  theme_tufte() +
  theme(legend.position = "top") +
  labs(colour=scenario.name)

plotN +
  facet_wrap(~species, scales="free")

Biological sampling: true numbers at age class

We get true numbers at age at each output step by appling sample_fish to the survey_testNall result if the model group has one true age per output age class:

# We need an effective N for the sample fish function, setting equal to actual N
# this one is high but not equal to total for numerous groups
effNhigh <- data.frame(species=survspp, effN=rep(1e+8, length(survspp)))

numsallhigh <- sample_fish(survey_testNall, effNhigh)

#save for later use, takes a long time to generate
saveRDS(numsallhigh, file.path(d.name, paste0(scenario.name, "Natageclcensus.rds")))

This is NOBA Cod true age class output. However, the 10 standard atlantis age classes aggregate true ages each for Cod.

Number of true age classes is stored in the […]groups[…].csv file, which is read in using load_fgs. We read this in as funct.groups above. The column NumAgeClassSize stores the number of true ages:

Number of age classes for Cod: 2

Natage <- readRDS(file.path(d.name, paste0(scenario.name, "Natageclcensus.rds")))

Natage_ss <- Natage[Natage$species == "North_atl_cod",]

Natageplot <- ggplot(Natage_ss, aes(x=agecl, y=atoutput)) +
  geom_point() +
  theme_tufte() +
  labs(subtitle = paste(scenario.name,
                        Natage_ss$species))

Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 1, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 2, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 3, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=4, nrow = 4, page = 4, scales="free")

What level of aggregation works for calc_Z?

]The atlantisom function calc_stage2age calls the calc_Z function, applying a total mortality rate to estimate the numbers at true age within an age class, but it is unclear whether to use this on the aggregated numbers or the full resolution numbers at age. Here we test outputs of Z for several levels of aggregation:

# add YOY file to the config files
YOY <- load_yoy(d.name, paste0("output", scenario.name, "YOY.txt"))

# load biolprm in some initialize file?
biol <- load_biolprm(d.name, biol.prm.file)

## Warning in `[<-.data.frame`(`*tmp*`, , 3:12, value = structure(list(`1` =
## structure(c(124L, : provided 20 variables to replace 10 variables

# cut to a single species in YOY file, hardcoded for NOBA cod
YOY_ss <- YOY %>%
  select(Time, NCO.0)

# numbers at agecl at full resolution (all polygons and layers)
truenums_ss <- truth$nums[truth$nums$species == "North_atl_cod",]

survey_testNall <- readRDS(file.path(d.name, paste0(scenario.name, "survey_testNall.rds")))

# numbers at ageclaggregated over layers, retain polygons (output of create_survey)
survey_testN_ss <- survey_testNall[survey_testNall$species == "North_atl_cod",]

# numbers at agecl aggregated over layers and polygons (output of sample_fish)
# done above
# Natage_ss <- Natage[Natage$species == "North_atl_cod",]

#calc_Z <- function(yoy, nums, fgs, biolprm, toutinc)

fullresZ <- calc_Z(yoy = YOY_ss,
                   nums = truenums_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

surveyresZ <- calc_Z(yoy = YOY_ss,
                   nums = survey_testN_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

sampleresZ <- calc_Z(yoy = YOY_ss,
                   nums = Natage_ss,
                   fgs = funct.groups,
                   biolprm = biol,
                   toutinc = runpar$toutinc)

Fixed a bug where grep was used if a # appeared, none of those in NOBA, will have to see if output is produced in CCA. This ran for the survey resolution numbers. I then went back and ran the two other levels above after fixing calc_Z.

Something is wrong with running after sample_fish because sampleresZ returned all 0s. Both true full resolution numbers and survey resolution numbers (aggregated over polygon) returned nonzero Z values. They are identical aside from seven values with no apparent pattern (rows 84, 85, 130, 131, 147, 172, 196) out of 201 that differ at e-16. I would be happer if the differences were all exactly 0, but I suppose we can attribute that to rounding error.

# test with one species at survey scale (layers aggregated, not time or polygon)
yoy <- YOY_ss
nums <- survey_testN_ss
fgs <- funct.groups
biolprm <- biol
toutinc <- runpar$toutinc

# everything below the function definition line
  # subset the yoy for species included in the fgs file
  # that are turned on
  # colnames of the recruit data are "Time" and a column for each
  # species where the name is the functional group with an additional .0
  turnedon <- fgs[fgs$IsTurnedOn > 0, ]
  recruits <- yoy[, colnames(yoy) %in% c("Time", paste0(turnedon$Code, ".0"))]

  # mg carbon converted to wet weight in tonnes
  k_wetdry <- biolprm$kgw2d / 1000000000
  # Sum of structural and reserve nitrogen (KWSR_RN + KWSR_SN)
  nitro <- merge(biolprm$kswr, biolprm$kwrr, by = "1")
  nitro$sum <- apply(nitro[, 2:3], 1, sum)

  # legacy: If output is from legacy code there will be an error in the
  # yoy data, where the first row is in a different unit.
  # yoy.txt is the biomass in tonnes per spawning event summed over the total
  # model domain.
  # The first row (< Nov/Dec 2015) is stored as biomass and the remaining rows
  # are stored in numbers, must convert the entire matrix to biomass
  # Check if legacy code and if so convert the numbers to biomass

  # G.Fay 2/21/16 : changed yoy to recruits in below loop.
  if (abs(recruits[1, 2] / recruits[2, 2]) > 10) {
    recruits[2:NROW(recruits), 2:NCOL(recruits)] <- recruits[2:NROW(recruits), 2:NCOL(recruits)] *
    nitro[match(gsub(".0", "", colnames(recruits)[-1]), nitro[, 1]), "sum"]
  }


  # Wide to long
  recruits <- reshape(data = recruits, direction = "long",
    varying = colnames(recruits)[-1],
    v.names = "recruits",
    times = colnames(recruits)[-1],
    timevar = "group")
  rownames(recruits) <- 1:NROW(recruits)
  recruits <- recruits[, -which(colnames(recruits) == "id")]
  # Switch from species code to species
  recruits$group <- gsub("\\.0", "", recruits$group)
  recruits <- merge(recruits, fgs[, c("Code", "Name")],
    by.x = "group", by.y = "Code")

  # merge recruits with strucn and resn of recruits from biol.prm file
  recruits <- merge(recruits, nitro[, c(1, 4)],
    by.x = "group", by.y = "1", all.x = TRUE, all.y = FALSE)
  colnames(recruits)[which(colnames(recruits) == "recruits")] <- "recruitsbio"
  # Get recruits in numbers rather than biomass
  recruits$recruits <- recruits$recruitsbio / recruits$sum

  # G.Fay 2/21/16
  # UGLY code below tries to align fraction of annual yoy with timing of recruitment
  # values in YOY.txt are total YOY that year waiting to recruit.
  # seems to get rid of most of 'issues' - some v.minor survival >1,
  #perhaps due to averaging of survival over toutinc days
  nyrs <- ceiling(max(yoy$Time)/365)
  times <- unique(yoy$Time)
  recstart_temp <- biolprm$recruit_time
  #recstart_temp[,2] <- biolprm$time_spawn[-(grep('#',
  #                     biolprm$time_spawn[,1])),2] + biolprm$recruit_time[,2]
  # was this grep to get rid of a species with #XXX? breaks when there are none
  recstart_temp[,2] <- biolprm$time_spawn[,2] + biolprm$recruit_time[,2]
  
  #recstart_temp <- recstart_temp[recstart_temp[,1]%in%turnedon$Code,] #bug added codes with no YOY output
  recstart_temp <- recstart_temp[recstart_temp[,1]%in%recruits$group,]
  recruits$frac_recruit <- 0
  for (irow in 1:nrow(recstart_temp)) {
    group <- recstart_temp[irow,1]
    pick <- which(recruits$group == group)

    recstart <- seq(recstart_temp[irow,2],by=365,length.out=nyrs)
    recstart <- recstart[recstart<max(recruits$Time[pick])]
    recend <- recstart + biolprm$recruit_period[irow,2]
    #rec_times <- rbind(rec_times,cbind(group,recstart,recend))

    for (i_rec in 1:length(recstart)) {
      i_tstart <- which(pick==min(pick[recruits$Time[pick]>=recstart[i_rec]]))
      i_tstop <- which(pick==min(pick[recruits$Time[pick]>=recend[i_rec]]))
      if (i_rec == length(recstart)) i_tstop <- length(pick)
      n_t <- 1+i_tstop-i_tstart
      for (i_t in 1:n_t) {
        t_temp <- recruits$Time[pick[i_tstart+i_t-1]]
        num_temp <- t_temp - recstart[i_rec]
        if (i_t>1) {
          if ((recend[i_rec]-t_temp)>toutinc) {
            num_temp <- toutinc
          }
          else {
            num_temp <- recend[i_rec]-(recruits$Time[pick[i_tstart+i_t-2]])
          }
        }
        frac_temp <- max(c(0,num_temp /(recend[i_rec]-recstart[i_rec])))
        recruits$frac_recruit[pick[i_tstart+i_t-1]] <- frac_temp
      }
    }
  }


  # match "Time" of the young of the year with the time-step periodicity
  # listed in the run.prm or run.xml file
  recruits$Time <- recruits$Time / toutinc

  #G.Fay 1/6/16, expand to num of recruits
  # Recruit / mg C converted to wet weight in tonnes / redfield ratio of C:N
  recruits$recruits <- recruits$recruits/k_wetdry/biolprm$redfieldcn

  # Sum over all boxes/depth/cohorts
  totnums <- aggregate(atoutput ~ species + time, data = nums, sum)

  # Combine recruits and numbers
  # Only pull recruits from the yearly time step, where the
  # yearly time step matches the time step
  totnums <- merge(totnums, recruits,
    by.x = c("time", "species"), by.y = c("Time", "Name"),
    all.x = TRUE, all.y = FALSE)
  totnums$group <- recruits$group[match(totnums$species, recruits$Name)]
  # For all time increments where there
  totnums$recruits[is.na(totnums$recruits)] <- 0

  # Calculate survivors for each species group
  totnums$survivors <- totnums$recruits
  # Make sure time is in order
  totnums <- totnums[order(totnums$species, totnums$time), ]

  totnums$recruits <- totnums$recruits*totnums$frac_recruit

  totnums$survival <- totnums$survivors
  # Calculate survival for each group
  for (group in unique(totnums$group)) {
    #if(group == "SHD") browser()
    pick <- which(totnums$group == group)
    survival_temp <- c(NA,
      totnums$survivors[pick[-1]]/totnums$atoutput[pick[-length(pick)]])

    # G.Fay 2/21/16  "think" this is what things should be, recruits don't show up
    # in numbers at age until time step after the recruitment event.
     survival_temp <- c(
       (totnums$atoutput[pick[-1]]-
          totnums$recruits[pick[-1]])/totnums$atoutput[pick[-length(pick)]],NA)

#     survival_temp <- c(
#       (totnums$atoutput[pick[-1]])/
#          (totnums$recruits[pick[-1]]+totnums$atoutput[pick[-length(pick)]]),NA)

    survival_temp[survival_temp < 0] <- NA
    # Use first positive value to replace the initial year and all negative vals
    firstgood <- which(!is.na(survival_temp))[1]

    survival_temp[1:firstgood] <- survival_temp[firstgood]
    for(ii in seq_along(survival_temp)) {
      if (is.na(survival_temp[ii])) {
        nonzero <- which(which(survival_temp > 0) > ii)
        if (length(nonzero) == 0) nonzero <- which(survival_temp > 0)
        survival_temp[ii] <- survival_temp[which.min(abs(nonzero - ii))]
      }
    }
    totnums$survival[pick] <- survival_temp
   }

  #Calculate Z
  totnums$Z <- -1 * log(totnums$survival)
  finaldata <- data.frame("species" = totnums$species,
    "agecl" = NA, "polygon" = NA, "layer" = NA,
    "time" = totnums$time, "atoutput" = totnums$Z)

surveyresZ <- finaldata

What is F at each timestep?

This probably depends on how the model is set up, but for models forced with F by species we ought to be able to read this out. If it is a daily rate then we mulitply by toutinc to get the rate at each of these output timestep Z values.

Quick test to see if Z is right?

Perhaps overly optimistic, but if we feed these into calc_stage2age and get a result that matches the NOBA output ANNAGEBIO.nc, then we can assume the Zs are correct.

These are estimated true age comps for Cod in annual ages:

# for single species, Z was calculated above as surveyresZ
trueages_ss <- calc_stage2age(nums_data = survey_testN_ss,
                                 biolprm = biol, 
                                 yoy = YOY_ss,
                                 fgs = funct.groups, 
                                 runprm = runpar)

# run sample fish on this for completeness? yes to aggregate
trueNagesamp_ss <- sample_fish(trueages_ss, effNhigh)

# plot it

Natageplot <- ggplot(trueNagesamp_ss, aes(x=agecl, y=atoutput)) +
  geom_point() +
  theme_tufte() +
  labs(subtitle = paste(scenario.name,
                        trueNagesamp_ss$species))

Natageplot + facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 1, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 2, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 3, scales="free")

Natageplot + facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 4, scales="free")

Not looking great, but at least not constant. This could be wrong Zs or wrong calculations in calc_stage2age. Load true ages from ANNAGEBIO.nc:

annage_nc <- paste0("output", scenario.name, "ANNAGEBIO.nc")
bboxes <- get_boundary(boxpars)

# need to change load_nc, at line 117 hardcoded for 10 cohorts!
trueage_atout <- load_nc(dir = d.name,
                         file_nc = annage_nc,
                         bps = boxpars,
                         fgs = funct.groups,
                         select_groups = survspp,
                         select_variable = "Nums",
                         check_acronyms = TRUE,
                         bboxes = bboxes)

## [1] "Read /Users/sgaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/NOBAwithAnnAgeOutput/outputnordic_runresults_01ANNAGEBIO.nc successfully"

trueage_atout_ss <- trueage_atout[trueage_atout$species == "North_atl_cod",]

#survey census to agg over layers
survey_trueage_atout_ss <- create_survey(dat = trueage_atout_ss,
                                 time = timeall,
                                 species = survspp,
                                 boxes = boxall,
                                 effic = effic1,
                                 selex = selex1)

# sample fish to agg over polygons
sample_trueage_atout_ss <- sample_fish(survey_trueage_atout_ss, effNhigh)

Compare true age output from atlantis with our estimate (and with aggregated age comps):

compareAge <- ggplot() +
  geom_point(data=trueNagesamp_ss, aes(x=agecl,y=atoutput, color="atlantisom"), alpha = 10/10) +
  geom_point(data=sample_trueage_atout_ss, aes(x=agecl,y=atoutput, color="atout true"), alpha = 10/10) + 
  geom_point(data=Natage_ss, aes(x=agecl*2, y=atoutput, color="cohort true"), alpha = 10/10) + 
  theme_tufte() +
  theme(legend.position = "top") +
  labs(subtitle = paste(scenario.name,
                        trueNagesamp_ss$species))

compareAge + 
  facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 1, scales="free")

compareAge + 
  facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 2, scales="free")

compareAge + 
  facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 3, scales="free")

compareAge + 
  facet_wrap_paginate(~time, ncol=3, nrow = 3, page = 4, scales="free")

We only have the first 10 cohorts of true age output due to hardcoding in load_nc. I may write another load function that changes that. Either way our early ages don’t match. Something about the calculation or application of Z isn’t correct.

Getting mortality from atlantisom: testing calc_Z function I

Sarah Gaichas and Christine Stawitz