library(here)
library(stringr)
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(atlantisom)

Intro

Total catch in tons has been pulled from the Atlantis Catch.txt output file from 2019-2023 due to difficulties scaling catch in numbers from the CATCH.nc file to total weight. See comparisons here.

For testing multispecies models, there is a need for subannual catch information, which is not provided in the annual Catch.txt output. In addition, there is no polygon information in the Catch.txt output, so spatial subsetting is not possible.

In order to get catch in tons subannually and by area, we can use the fleet-specific catch information in the CATCH.nc file. These outputs are reported in tons by species aggregated over all age classes for each defined fleet. Variable names in the CATCH.nc file for these outputs follow the format “XXX_Catch_FCN” and “XXX_Discard_FCN” where XXX is the 3 letter group Code defined in the Atlantis groups.csv file and N is the integer Index for the fishery defined in the Atlantis fisheries.csv file.

Methods

Workflow

  1. Modify atlantisom::load_nc to get catch in tons from different CATCH.nc outputs. New function because current function already gets catch in numbers, would be complicated to use the same “Catch” variable to get tons. Name new function atlantisom::load_nc_catchtons

  2. Test new atlantisom::load_nc_catchtons with NOBA, NEUS, CC by summing output over polygons and fleets to each year and comparing with annual output of current atlantisom::load_catch based on Catch.txt

  3. Incorporate atlantisom::load_nc_catchtons into atlantisom::run_truth. Add a new catchtons object keeping polygon, timestep, and fleet information to the output list. Consider removing the Catch.txt output in this function, currently called catch_all but not used? Or keep it, since catchtons can be summed to catch_all this could be used as an internal test as was started in the run_truth function.

  4. Adjust atlantisom::om_init for the new catchtons output. Should we keep reading in the Catch.txt here? It may be ok to keep it just to keep existing functions for aggregate annual catch working.

  5. Adjust atlantisom::om_species for the new catchtons output.

  6. Adjust atlantisom::om_index for the new catchtons output. Create a new index that will be consistent across the subannual and annual datasets: this means applying the cv at the level of reporting? So possibly by fleet and timestep, then summing to annual? This involves including and updating two functions: atlantisom::create_fishery_subset parallel to the atlantisom::create_survey function to subset areas, currently used only on fishery comps but can now be applied to the index as well, and updating another atlantisom::sample_fishery_totalcatch for the cv application. This will save perhaps one new output and update one existing output, or just one output? Will we have the same current censusfishCatch.rds output that is annually aggregated plus a disaggregated version? Or have a single output with the same name censusfishCatch.rds that is disaggregated, to be summed separately in mskeyrun?

  7. Adjust mskeyrun functions creating aggregated and subannual catch datasets. Do we need a fleet specific index?

1. New atlantisom::load_nc_catchtons function

Here is the new function, tested with snippets in mskeyrun sarah_wgsamsim branch and implemented in atlantisom dev branch

#' Load Atlantis catch tons by fleet and polygon outputfiles (netcdf)
#'
#' This function loads Atlantis CATCH outputfile (netcdf) and
#' converts them to a \code{data.frame}.
#' @template dir
#' @template file_nc
#' @template bps
#' @template fgs
#' @template biolprm
#' @template select_groups
#' @param select_variable A character value specifying which variable to return.
#'   loaded. Only one variable of the options available (i.e., \code{c(
#'   "Catch", "Discard")
#'   }) can be loaded at a time.
#' @param check_acronyms A logical, specifying if selected groups are active
#'   in the model run. This is used in automated runs.
#'   All groups are passed when plotting is called via batch-file,
#'   (which cannot be changed easily) this will result in errors if some
#'   groups are not active in the model run.
#'   By default this argument is \code{TRUE}.
#' @template bboxes
#' @template verbose
#' @family load functions
#' @return A \code{data.frame} in long format with the following coumn names:
#'   Species, timestep, polygon, agecl, and atoutput (i.e., variable).
#'
#' @keywords gen
#' @author Alexander Keth, modified by Sarah Gaichas
#'
#'
#' @importFrom magrittr %>%
#' @export

load_nc_catchtons <- function(dir = getwd(), file_nc, file_fish, bps, fgs, biolprm, select_groups,
  select_variable =
  c("Catch", "Discard"),
  check_acronyms = TRUE, bboxes = c(0), verbose = FALSE) {
  # NOTE: The extraction procedure may look a bit complex...
  # A different approach would be to
  # create a dataframe for each variable (e.g. GroupAge_Nums)
  # and combine all dataframes at the end.
  # This alternative approach requires a lot more storage
  # and the code wouldn't be vectorized.

  # Check input of the nc file
  if (tail(strsplit(file_nc, "\\.")[[1]], 1) != "nc") {
    stop("The argument for file_nc,", file_nc, "does not end in nc")
  }
  if (is.null(dir)) {
    file.nc <- file_nc
  } else {
    file.nc <- file.path(dir, file_nc)
  }

  # Check input of select_variable as only one value is allowed
  select_variable <- match.arg(select_variable, several.ok = FALSE)

  # select_variable gets renamed for fishery data, save input
  input_select_variable <- select_variable

  # select_groups get renamed to code, save input
  select_groups_name <- select_groups

  # Check input structure!
  if (check_acronyms) {
    active_groups <- as.vector(subset(fgs, IsTurnedOn == 1)$Name)
    inactive_groups <- select_groups[which(
      !is.element(select_groups, active_groups))]
    if (length(inactive_groups) >= 1) {
      select_groups <- select_groups[!is.element(select_groups, inactive_groups)]
      warning(paste(paste("Some selected groups are not active in the model run.",
        "Check 'IsTurnedOn' in fgs\n"),
        paste(inactive_groups, collapse = "\n")))
    }
    if (all(!is.element(select_groups, active_groups))) {
      stop(paste("None of the species selected are active in the model run.",
        "Check spelling and Check 'IsTurnedOn' in fgs"))
    }
  }

  # Deal with file structures

  # Load ATLANTIS output!
  at_out <- RNetCDF::open.nc(con = file.nc)
  on.exit(RNetCDF::close.nc(at_out))

  if (select_variable != "N" & all(is.element(select_groups, bps))) {
    stop("The only output for Biomasspools is N.")
  } else{
    print(paste("Read", file.nc, "successfully"))
  }

  # Get info from netcdf file! (Filestructure and all variable names)
  var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_out)$nvars - 1),
    function(x) RNetCDF::var.inq.nc(at_out, x)$name)
  n_timesteps <- RNetCDF::dim.inq.nc(at_out, 0)$length
  n_boxes     <- RNetCDF::dim.inq.nc(at_out, 1)$length
  n_layers    <- RNetCDF::dim.inq.nc(at_out, 2)$length

  # Extract data from the ncdf file
  # Create a vector of all potential variable names
  # Only use names which are available in the ncdf-file as an
  # extraction of missing variables is not possible
  # Create vector of available species at the end using search_clean
  # This is needed to create species-names later on

  # To get what I hope is catch output in tons,
  # keep variable names with CODES instead of species names

  # get code matching species name and use as select groups
  select_groups <- fgs$Code[which(fgs$Name %in% select_groups_name)]

  # and _FC with a number corresponding to the fishery in the Fisheries.csv

  if(select_variable %in% c("Catch", "Discard")){
    #input_select_variable <- select_variable
    #read in fleet names
    fleetnames <- atlantisom::load_fisheries(dir = dir, file_fish = file_fish)
    #select_variable <- concatenate select_variable "_" each fleet name
    svfish <- c()
    for(i in select_variable){
      sv <-paste0(i, "_FC", fleetnames$Index)
      svfish <- c(svfish, sv)
    }
    select_variable <- svfish
  }

  # To make the creation of variables as robust as possible
  # we introduce different combinations of groups, variable, and cohort
  # Only combinations present in the ncdf are used later on
  # Loop over select_groups to use the same ordering

  search <- list()
  for (i in seq_along(select_groups)) {
    search[[i]] <- c(
      unlist(paste0(select_groups[i], select_variable)),  # GroupVariable
      unlist(paste(select_groups[i], select_variable,
                   sep = "_"))                           # Group_Variable
    )
    search[[i]] <- search[[i]][is.element(search[[i]], var_names_ncdf)]
    search[[i]] <- unique(search[[i]])
  }
  search_clean <- do.call(c, search)
  # If the combination of select_groups and select_variable ends up not being found.
  if (length(search_clean) == 0) return(0)

  at_data <- lapply(search_clean, RNetCDF::var.get.nc, ncfile = at_out)

  # Get final species and number of ageclasses per species
  final_species <- select_groups[sapply(
    lapply(select_groups, grepl, x = search_clean), any)]

  # # Get final fleets for full age structure fishery output--wrong order
  # if(input_select_variable %in% c("Catch", "Discard")){
  #   final_fleet <- fleetnames$Code[sapply(
  #     lapply(select_variable, grepl, x = search_clean), any)]
  # }

  num_layers <- RNetCDF::var.get.nc(ncfile = at_out, variable = "numlayers")[, 1]
  # add sediment layer!
  num_layers <- num_layers + ifelse(num_layers == 0, 0, 1)

  # Create an array of layerids.
  # Every entry in the array indicates if a layer is present (= 1) or not (= 0).
  # Boxes without layers (= islands) have only 0s as id,
  # used later on to remove data from non-existent layers!
  # By default output should be 0 in those layers.
  # Layers in boundary boxes are set to 0 if bboxes is anything other than NULL!
  # Applying a boolean array to an array results in a vector!
  for (i in seq_along(num_layers)) {
    if (i == 1) layerid <- array(dim = c(n_layers, n_boxes))
    if (num_layers[i] == 0) {
      layerid[, i] <- 0
    } else {
      if (!is.null(bboxes) & is.element((i - 1), bboxes)) {
        layerid[, i] <- 0
      } else {
        layerid[, i] <- c(rep(0, times = n_layers - num_layers[i]),
                          rep(1, times = num_layers[i]))
      }
    }
  }

  # Create vectors for polygons and layers
  # Each vector has the length equal to one time-step
  # All data from islands and non-existent layers is removed
  # Therefore the length of these
  # vectors is equal for each extracted variable
  boxes <- 0:(n_boxes - 1)
  # Remove islands and boundary boxes
  island_ids <- num_layers == 0
  if (!is.null(bboxes)) {
    boundary_ids <- is.element(boxes, bboxes)
    island_ids <- island_ids | boundary_ids
  }
  boxes <- boxes[!island_ids]
  num_layers <- num_layers[!island_ids]

  polygons <- rep(boxes, times = num_layers)
  layers <- sapply(num_layers[num_layers != 0] - 2,
    function(x) c(seq(x, from = 0, by = 1), n_layers - 1))
  if (any(sapply(layers, length) != num_layers[num_layers != 0])) {
    stop("Number of layers incorrect.")
  }
  layers <- do.call(c, layers)
  if (length(polygons) != length(layers)) {
    stop("Number of polygons and layers do not match.")
  }

  # In the following section the data is transformed to a long dataframe
  # I haven't found any solution to vectorize the creation of the dataframe
  # columns (species, age, polygons,...)
  # when data from 2d and 3d arrays
  # (e.g. select_variable = "N" all biomasspools are only present in the
  # sediment layer.) are read in simultaneously.
  # Therefore the current "messy" solution splits the data
  # in 2 subpopulations: 2d-data and 3d-data
  at_data3d <- at_data[which(sapply(at_data, function(x) length(dim(x))) == 3)]
  at_data2d <- at_data[which(sapply(at_data, function(x) length(dim(x))) == 2)]

  int_fs <- final_species
  int_fa <- rep(1, length(final_species))

  if(input_select_variable %in% c("Catch", "Discard")){
    # how many unique fleets per select_group in search_clean?
    # lookup of species and fleet names
    splitfleets <- search_clean %>%
      stringr::str_replace(paste0("_",input_select_variable,"_FC"), "-")
    sp_fleet <- as.data.frame(unique(splitfleets)) %>%
      dplyr::rename(spfleet = "unique(splitfleets)") %>%
      tidyr::separate(spfleet, c("species", "fleet"), sep = "-")
    # lookup of number of fleets per species
    sp_nfleet <- as.data.frame(table(sp_fleet$species)) %>%
      dplyr::rename(species = Var1, nfleet = Freq)
    final_fleet <- sp_nfleet[match(final_species, sp_nfleet$species),]
    # for indexing fleets
    int_ff <- final_fleet$nfleet
  }

  # 3d should not apply here
  # if (length(at_data3d) >= 1) {
  #   # Remove biomasspools if selected variable is "N"!
  #   if (input_select_variable == "N") {
  #     int_fs <- final_species[!is.element(final_species, bps)]
  #     int_fa <- final_agecl[!is.element(final_species, bps)]
  #     # Note this only works if age-structured vertebrates have 10 ageclasses
  #     # there is no "N" output in annage files so has no impact
  #     int_fa[int_fa == 10] <- 1
  #   }
  #   for (i in seq_along(at_data3d)) {# for loop over all variables
  #     if (i == 1) result3d <- list()
  #     for (j in 1:n_timesteps) {# loop over timesteps
  #       if (j == 1) values <- array(dim = c(length(layers), n_timesteps))
  #       values[, j] <- at_data3d[[i]][,, j][layerid == 1]
  #     }
  #     result3d[[i]] <- as.vector(values)
  #   }
  #   result3d <- data.frame(
  #     species = unlist(sapply(
  #       X = mapply(FUN = rep, x = int_fs, each = int_fa, SIMPLIFY = FALSE,
  #       USE.NAMES = FALSE),
  #       FUN = rep, each = length(layers) * n_timesteps, simplify = FALSE)),
  #     agecl = unlist(sapply(
  #       X = sapply(X = int_fa, FUN = seq, from = 1, by = 1, simplify = FALSE,
  #       USE.NAMES = FALSE),
  #       FUN = rep, each = length(layers) * n_timesteps, simplify = FALSE)),
  #     polygon = unlist(sapply(
  #       X = n_timesteps * int_fa, FUN = rep, x = polygons, simplify = F,
  #       USE.NAMES = FALSE)),
  #     layer = unlist(sapply(
  #       X = n_timesteps * int_fa, FUN = rep, x = layers, simplify = FALSE,
  #       USE.NAMES = FALSE)),
  #     time = unlist(sapply(
  #       X = int_fa, FUN = rep, x = rep(0:(n_timesteps - 1), each = length(layers)),
  #       simplify = FALSE, USE.NAMES = FALSE)),
  #     atoutput = do.call(c, result3d),
  #     stringsAsFactors = FALSE)
  # }

  if (length(at_data2d) >= 1) {
    # # Only select biomasspools if selected variable is "N"!
    # if (input_select_variable == "N") {
    #   int_fs <- final_species[is.element(final_species, bps)]
    #   int_fa <- final_agecl[is.element(final_species, bps)]
    # }
    # # age-structured invert groups are combined in ncdf file!
    # if (input_select_variable == "Grazing") int_fa <- 1

    for (i in seq_along(at_data2d)) {# for loop over all variables!
      if (i == 1) result2d <- list()
      for (j in 1:n_timesteps) {# loop over timesteps
        if (j == 1) values <- array(dim = c(length(boxes), n_timesteps))
        values[, j] <- at_data2d[[i]][, j][boxes + 1]
      }
      result2d[[i]] <- as.vector(values)
    }

    # Order of the data in value column = "atoutput".
    # 1. species  --> rep each with the number of
    #                 ageclasses * fleets and n_timesteps * boxes
    # 2. age      --> rep each (1:maxage for each species) with n_timesteps * boxes
    # 3. timestep --> rep each timestep (1:n_timesteps)
    #                 with the number of boxes and final_agecl
    #                 (num ages per species)
    # 4. polygon  --> rep boxes times n_timesteps * final_agecl
    #                 (num ages per species)
    # 5. fleet    -->

    # if(input_select_variable %in% c("Nums", "Weight")){
    #
    # result2d <- data.frame(species = unlist(sapply(
    #   X = mapply(FUN = rep, x = int_fs, each = int_fa, SIMPLIFY = FALSE,
    #              USE.NAMES = FALSE),
    #   FUN = rep, each = length(boxes) * n_timesteps, simplify = FALSE)),
    #   agecl = unlist(sapply(X = sapply(X = int_fa, FUN = seq, from = 1,
    #                                    by = 1, simplify = FALSE, USE.NAMES = FALSE),
    #                         FUN = rep, each = length(boxes) * n_timesteps, simplify = FALSE)),
    #   polygon = unlist(sapply(X = n_timesteps * int_fa,
    #                           FUN = rep, x = boxes, simplify = FALSE, USE.NAMES = FALSE)),
    #   time = unlist(sapply(X = int_fa, FUN = rep, x = rep(0:(n_timesteps - 1),
    #                                                       each = length(boxes)), simplify = FALSE, USE.NAMES = FALSE)),
    #   atoutput = do.call(c, result2d),
    #   stringsAsFactors = F)
    #   if (select_variable == "N") result2d$layer <- n_layers - 1
    # }

    #should now work properly for multiple fleets
    if(input_select_variable %in% c("Catch", "Discard")){
      result2d <- data.frame(species = unlist(sapply(X = mapply(FUN = rep, x = int_fs,
                   each = (int_fa * int_ff),SIMPLIFY = FALSE,USE.NAMES = FALSE),
        FUN = rep, each = length(boxes) * n_timesteps, simplify = FALSE)),
        agecl = unlist(sapply(X = sapply(X = rep(int_fa, int_ff), FUN = seq, from = 1,
                                         by = 1, simplify = FALSE, USE.NAMES = FALSE),
                              FUN = rep, each = (length(boxes) * n_timesteps), simplify = FALSE)),
        polygon = unlist(sapply(X = n_timesteps * int_fa * int_ff,
                                FUN = rep, x = boxes, simplify = FALSE, USE.NAMES = FALSE)),
        fleet = unlist(sapply(X = mapply(FUN = rep, x = sp_fleet$fleet,
                                         each = (rep(int_fa, int_ff)),SIMPLIFY = FALSE,USE.NAMES = FALSE),
          FUN = rep, each = (length(boxes) * n_timesteps), simplify = FALSE)),
        time = unlist(sapply(X = int_fa * int_ff , FUN = rep, x = rep(0:(n_timesteps - 1),
                                                            each = length(boxes)), simplify = FALSE, USE.NAMES = FALSE)),
        atoutput = do.call(c, result2d),
        stringsAsFactors = F)
      #if (select_variable == "N") result2d$layer <- n_layers - 1
    }

  }

  # Combine dataframes if necessary!
  # if (all(sapply(lapply(at_data, dim), length) == 3) & input_select_variable != "N") {
  #   result <- result3d
  # }
  if (all(sapply(lapply(at_data, dim), length) == 2) & input_select_variable != "N") {
    result <- result2d
  }
  if (input_select_variable == "N") {
    if (length(at_data2d) >= 1 & length(at_data3d) == 0) result <- result2d
    if (length(at_data2d) == 0 & length(at_data3d) >= 1) result <- result3d
    if (length(at_data2d) >= 1 & length(at_data3d) >= 1) {
      result <- rbind(result2d, result3d)
    }
  }

  # # Remove min_pools if existent (well, there always are min pools... ;)).
  # min_pools <- is.element(result$atoutput, c(0, 1e-08, 1e-16))
  # if (length(min_pools) > 0) {
  #   # exclude 1st timestep and sediment layer from calculation
  #   print_min_pools <- sum(min_pools) -
  #     length(result[min_pools & result$time == 1, 1]) -
  #     length(result[min_pools & result$time > 1 & result$layer == 7, 1])
  #   if (print_min_pools > 0 & verbose) {
  #     warning(paste0(round(print_min_pools/dim(result)[1] * 100),
  #       "% of ", select_variable, " are true min-pools (0, 1e-08, 1e-16)"))
  #   }
  #   result <- result[!min_pools, ]
  # }

  # Remove non-existent layers.
  # WARNING: Biomass is build up (very few) in sediment layer for
  # NON sediment groups (e.g. baleen whales)
  # Therefore, I subset all data from that layer for non biomass groups and
  # groups which cannot penetrate into the sediment!
  # UPDATE: Doesn't work with layers as species are not distributed through
  # the whole water column and do not appear in
  # every polygon.

  # Sum up N for invert cohorts if invert cohorts are present!
  # NOTE: invert cohorts of size 10 are not considered!
  # if (input_select_variable == "N" & any(final_agecl != 10 & final_agecl > 1)) {
  #   result <- result %>%
  #     dplyr::group_by(polygon, layer, species, time) %>%
  #     dplyr::summarise(atoutput = sum(atoutput))
  # }

  return(result)
}

2. Test load_nc_catchtons with the NOBA, NEUS, CC model outputs

For all plots, the total catch derived from Catch.txt is considered correct, and is represented by the black line. The total catch summed across polygon, subannual timestep and fleet is in blue.

Only NOBA first, this one is model sacc30 with climate, different from sacc38 in mskeyrun without climate:

# define for each model

## NOBA (different from mskeyrun version)

atlmod <- here("config/NOBAsacc30Config.R")

select_groups <- c("Long_rough_dab",
                   "Green_halibut",
                   "Mackerel",
                   "Haddock",
                   "Saithe",
                   "Redfish",
                   "Blue_whiting",
                   "Norwegian_ssh",
                   "North_atl_cod",
                   "Polar_cod",
                   "Capelin")

stepperyr <- 5



## NEUS



##### aggregate over polygons to fleets #######
# make this a function

comparecatch <- function(atlmod, select_groups, stepperyr) {
  source(atlmod, local = TRUE)
  
  dir <- d.name
  nc_catch <- paste0(scenario.name, 'CATCH.nc')
  file_fgs <- functional.groups.file
  file_init <- initial.conditions.file
  file_fish <- fisheries.file
  
  # Get the boundary boxes
  allboxes <- atlantisom::load_box(dir = d.name, file_bgm = box.file)
  boxes <- atlantisom::get_boundary(allboxes)
  
  
  # Read in information
  # Read in the functional groups csv since that is used by many functions
  fgs <- load_fgs(dir = dir, file_fgs = file_fgs)
  # Read in the biomass pools
  bps <- load_bps(dir = dir,
                  fgs = file_fgs,
                  file_init = file_init)
  
  catchtons <- load_nc_catchtons(
    dir = dir,
    file_nc = nc_catch,
    file_fish = file_fish,
    bps = bps,
    fgs = fgs,
    select_groups = select_groups,
    select_variable = "Catch",
    check_acronyms = TRUE,
    bboxes = boxes
  )
  #if(verbose) message("Catch tons read in.")
  
  # result is output of load_nc_catchtons
  
  aggcatchtons <- catchtons %>%
    dplyr::select(-agecl) %>%
    dplyr::group_by(species, fleet, time) %>%
    dplyr::summarise(totcatch = sum(atoutput)) %>%
    dplyr::mutate(year = ceiling(time / stepperyr))
  
  # compare annual catch in tons to Catch.txt output
  yearcatchtons <- aggcatchtons %>%
    dplyr::group_by(species, year) %>%
    dplyr::summarise(yeartot = sum(totcatch))
  
  txtCatchtons <- atlantisom::load_catch(d.name, catch.file, fgs) %>%
    dplyr::filter(species %in% select_groups) %>%
    dplyr::mutate(year = time / 365) %>%
    dplyr::left_join(dplyr::select(fgs, Code, Name), by = c("species" = "Name"))
  
  comparetons <- txtCatchtons %>%
    dplyr::left_join(yearcatchtons, by = c("Code" = "species", "year" = "year")) %>%
    ggplot2::ggplot() +
    geom_line(aes(year, atoutput)) +
    geom_point(aes(year, yeartot), color = "blue", alpha = 0.2) +
    facet_wrap( ~ species, scales = "free_y")
  
  # looks like a match!
  comparetons
  
}

comparecatch(atlmod = atlmod, 
             select_groups = select_groups,
             stepperyr = stepperyr)
## [1] "Read /Users/sarah.gaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/NOBA_sacc_30/nordic_runresults_01CATCH.nc successfully"

Now try the Isaac’s recent CC model:

## CC

atlmod <- here("config/CCConfigSep22.R")

select_groups <- c("Arrowtooth_flounder",
                   "Petrale_sole",
                   "Pacific_sardine",
                   "Anchovy",
                   "Herring",
                   "Pacific_Ocean_Perch",
                   "Bocaccio_rockfish",
                   "Yelloweye_rockfish",
                   "Mesopel_M_Fish", #Pacific hake
                   "Mesopel_N_Fish", #Sablefish
                   "Demersal_P_Fish") #Dover sole

stepperyr <- 5

comparecatch(atlmod = atlmod, 
             select_groups = select_groups,
             stepperyr = stepperyr)
## [1] "Read /Users/sarah.gaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/CC2063_Sep2022/outputCCV3CATCH.nc successfully"

And finally the NEUS model, thanks to Joe for output files:

## NEUS
atlmod <- here("config/NEUStestConfigMay23.R")

select_groups <- c("Summerflounder",
                   "Winterflounder",
                   "Mackerel",
                   "Herring",
                   "Cod",
                   "Monkfish",
                   "Haddock",
                   "Spiny_Dogfish",
                   "Winter_Skate",
                   "Yellowtail_Flounder",
                   "Silver_Hake")

stepperyr <- 5

comparecatch(atlmod = atlmod, 
             select_groups = select_groups,
             stepperyr = stepperyr)
## [1] "Read /Users/sarah.gaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/NEUStest_May2023/neus_outputCATCH.nc successfully"

Looks like we have a match to Catch.txt output for all systems using the same function and inputs from the CATCH.nc file. However, it looks like there is a final value in Catch.txt that is not present, or not being summed correctly in the CATCH.nc output in the CC and NEUS models. Worth investigating…

3. Incorporate new load_nc_catchtons function into atlantisom::run_truth

Added catchtons and disctons (same function with select_variable = "Discards") to run_truth in the atlantisom dev branch. At present, catchtons and disctons are in the run_truth output and catch_all which was originally from the Catch.txt file have been removed from the output. Sections of the function that read in both the Catch.txt and the CatchPerFishery.txt files have been commented out. They could be reinstated if needed, but om_init reads in Catch.txt already and seems like a good place to read in text files.

Here is the new function, tested by re-running the code in SimData.Rmd in the mskeyrun repo using the NOBA sacc38 outputs. The new runtruth object includes the new catchtons and disctons (0) objects.

#' Load Atlantis scenario output
#'
#' Reads in data generated from an Atlantis scenario and returns a list
#' containing the desired information. The list contains the 'truth' as known
#' from the Atlantis scenario. The truth can later be sampled
#' from to create a data set with observation error.
#' Currently, the \code{run_truth} depends on the following files
#' being in your working directory (\code{dir}):
#' \itemize{
#'  \item{"functionalGroups.csv"}
#'  \item{"[...]TOTCATCH.nc"}
#'  \item{"[...]DietCheck.txt"}
#' },
#' where [...] specifies the entry used for the \code{scenario} argument.
#'
#' @family run functions
#' @author Sean Lucey, Kelli Faye Johnson
#'
#' @template scenario
#' @template dir
#' @template file_fgs
#' @template file_bgm
#' @template select_groups
#' @template file_init
#' @template file_biolprm
#' @template file_runprm
#' @template verbose
#' @template save
#'
#' @return Returns a list object.
#' @export
#' @examples
#' d <- system.file("extdata", "SETAS_Example", package = "atlantisom")
#' groups <- load_fgs(dir = d, "Functional_groups.csv")
#' truth <- run_truth(scenario = "outputs",
#'   dir = d,
#'   file_fgs = "Functional_groups.csv",
#'   file_bgm = "Geography.bgm",
#'   select_groups = groups[groups$IsTurnedOn > 0, "Name"],
#'   file_init = "Initial_condition.nc",
#'   file_biolprm = "Biology.prm",
#'   file_runprm = "Run_settings.xml",
#'   file_fish = "Fisheries.csv")
#' str(truth)
#' rm(truth)
#'
run_truth <- function(scenario, dir = getwd(),
  file_fgs, file_bgm, select_groups, file_init, file_biolprm, file_runprm,
  file_fish, verbose = FALSE, save = TRUE, annage = FALSE){

  # Read in information
  # Read in the functional groups csv since that is used by many functions
  fgs <- load_fgs(dir = dir, file_fgs = file_fgs)
  # Read in the biomass pools
  bps <- load_bps(dir = dir, fgs = file_fgs, file_init = file_init)
  # Read in the biological parameters
  biol <- load_biolprm(dir = dir, file_biolprm = file_biolprm)
  # Read in the run parameters
  runprm <- load_runprm(dir = dir, file_runprm = file_runprm)

  nc_catch <- paste0(scenario, 'CATCH.nc')
  dietcheck <- paste0(scenario, 'DietCheck.txt')
  nc_out <- paste0(scenario, ".nc")
  nc_prod <- paste0(scenario, "PROD.nc")
  file_catchfish <- file.path(dir,
    paste0(scenario, "CatchPerFishery.txt"))
  file_catch <- paste0(scenario, "Catch.txt")

  if(annage){
    if(!file.exists(paste0(file.path(dir,paste0(scenario, 'ANNAGEBIO.nc'))))){
      stop("ANNAGEBIO.nc file not found")
    }
    if(!file.exists(paste0(file.path(dir,paste0(scenario, 'ANNAGECATCH.nc'))))){
      stop("ANNAGECATCH.nc file not found")
    }
    nc_annagebio <- paste0(scenario, 'ANNAGEBIO.nc')
    nc_annagecatch <- paste0(scenario, 'ANNAGECATCH.nc')
  }

  # Get the boundary boxes
  allboxes <- load_box(dir = dir, file_bgm = file_bgm)
  boxes <- get_boundary(allboxes)

  #Extract from NetCDF files
  # Need: dir, file_nc, bps, fgs, select_groups, select_variable,
  # check_acronyms, bboxes

  nums <- load_nc(dir = dir,
                  file_nc = nc_out,
                  bps = bps,
                  fgs = fgs,
                  select_groups = select_groups,
                  select_variable = "Nums",
                  check_acronyms = TRUE,
                  bboxes = boxes)
  if(verbose) message("Numbers read in.")

  resn <- load_nc(dir = dir,
                  file_nc = nc_out,
                  bps = bps,
                  fgs = fgs,
                  select_groups = select_groups,
                  select_variable = "ResN",
                  check_acronyms = TRUE,
                  bboxes = boxes)
  if(verbose) message("Reserve nitrogen read in.")

  structn <- load_nc(dir = dir,
                  file_nc = nc_out,
                  bps = bps,
                  fgs = fgs,
                  select_groups = select_groups,
                  select_variable = "StructN",
                  check_acronyms = TRUE,
                  bboxes = boxes)
  if(verbose) message("Structural nitrogen read in.")

  eat <- load_nc(dir = dir,
                     file_nc = nc_prod,
                     bps = bps,
                     fgs = fgs,
                     select_groups = select_groups,
                     select_variable = "Eat",
                     check_acronyms = TRUE,
                     bboxes = boxes)
  if(verbose) message("Eaten read in.")

  grazing <- load_nc(dir = dir,
                 file_nc = nc_prod,
                 bps = bps,
                 fgs = fgs,
                 select_groups = select_groups,
                 select_variable = "Grazing",
                 check_acronyms = TRUE,
                 bboxes = boxes)
  if(verbose) message("Grazing read in.")

  vol <- load_nc_physics(dir = dir,
                         file_nc = nc_out,
                         physic_variables = "volume",
                         aggregate_layers = FALSE,
                         bboxes = boxes)
  if(verbose) message("Volume read in.")

  catch <- load_nc(dir = dir,
                 file_nc = nc_catch,
                 bps = bps,
                 fgs = fgs,
                 select_groups = select_groups,
                 select_variable = "Catch",
                 check_acronyms = TRUE,
                 bboxes = boxes)
  if(verbose) message("Catch in numbers at agecl read in.")

  catchtons <- load_nc_catchtons(dir = dir,
                                 file_nc = nc_catch,
                                 file_fish = file_fish,
                                 bps = bps,
                                 fgs = fgs,
                                 select_groups = select_groups,
                                 select_variable = "Catch",
                                 check_acronyms = TRUE,
                                 bboxes = boxes
  )
  if(verbose) message("Catch tons by fleet read in.")

  disctons <- load_nc_catchtons(dir = dir,
                                 file_nc = nc_catch,
                                 file_fish = file_fish,
                                 bps = bps,
                                 fgs = fgs,
                                 select_groups = select_groups,
                                 select_variable = "Discard",
                                 check_acronyms = TRUE,
                                 bboxes = boxes
  )
  if(verbose) message("Discard tons by fleet read in.")


  if(annage){
    numsage <- load_nc_annage(dir = dir,
                              file_nc = nc_annagebio,
                              bps = bps,
                              fgs = fgs,
                              biolprm = biol,
                              select_groups = select_groups,
                              select_variable = "Nums",
                              check_acronyms = TRUE,
                              bboxes = boxes,
                              verbose = TRUE)
    if(verbose) message("Numbers read in from ANNAGEBIO.")

    # Weight output seems wrong compared with standard nc weights
    # Don't include until we can sort this out
    # weightage <- load_nc_annage(dir = dir,
    #                             file_nc = nc_annagebio,
    #                             bps = bps,
    #                             fgs = fgs,
    #                             biolprm = biol,
    #                             select_groups = select_groups,
    #                             select_variable = "Weight",
    #                             check_acronyms = TRUE,
    #                             bboxes = boxes,
    #                             verbose = TRUE)
    # if(verbose) message("Weight read in from ANNAGEBIO.")

    catchage <- load_nc_annage(dir = dir,
                              file_nc = nc_annagecatch,
                              file_fish = file_fish,
                              bps = bps,
                              fgs = fgs,
                              biolprm = biol,
                              select_groups = select_groups,
                              select_variable = "Catch",
                              check_acronyms = TRUE,
                              bboxes = boxes,
                              verbose = TRUE)
    if(verbose) message("Catch in numbers at true age read in from ANNAGECATCH.")

    discage <- load_nc_annage(dir = dir,
                               file_nc = nc_annagecatch,
                               file_fish = file_fish,
                               bps = bps,
                               fgs = fgs,
                               biolprm = biol,
                               select_groups = select_groups,
                               select_variable = "Discard",
                               check_acronyms = TRUE,
                               bboxes = boxes,
                               verbose = TRUE)
    if(verbose) message("Discard read in from ANNAGECATCH.")

  }

  # May 2019 this is the catch in nums correction needed for legacy atlantis codebases

  # check for logfile, send warning if not found.
  if(file.exists(paste0(file.path(dir, "log.txt")))){
    #if found compare codedate and do catch numbers correction if necessary
    logfile <- paste0(file.path(dir, "log.txt"))
    codedate <- system(paste0("grep 'Atlantis SVN' ", logfile), intern = TRUE)
    codedate <- as.Date(stringr::str_extract(codedate, "\\d+[- \\/.]\\d+[- \\/.]\\d+"))
    if(codedate < "2015-12-15"){
      if(verbose) message("Catch numbers correction needed for this codebase, starting")
      # read in initial conditions NC file
      at_init <- RNetCDF::open.nc(con = file.path(dir, file_init))

      # Get info from netcdf file! (Filestructure and all variable names)
      var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_init)$nvars - 1),
                               function(x) RNetCDF::var.inq.nc(at_init, x)$name)
      numlayers <- RNetCDF::var.get.nc(ncfile = at_init, variable = "numlayers")

      RNetCDF::close.nc(at_init)

      # are these in box order??? if so make a box-numlayer lookup
      layerbox.lookup <- data.frame(polygon=c(0:(allboxes$nbox - 1)), numlayers)

      catch.tmp <- merge(catch, layerbox.lookup)

      # divide the numbers at age by (86400 * number_water_column_layers_in_the_box)
      # replace truth$catch atoutput with correction
      catch <- catch.tmp %>%
        mutate(atoutput = atoutput / (86400 * numlayers)) %>%
        select(species, agecl, polygon, time, atoutput)
      if(verbose) message("Catch numbers corrected")
    }else{
      message("Codebase later than December 2015, no correction needed")
    }
  }else{
    warning(strwrap(prefix = " ", initial = "",
                    "log.txt file not found; catch in numbers correction not done. For Atlantis SVN dates prior to December 2015, CATCH.nc output units were incorrect. Correction requires presence of log.txt file in the directory."))
  }

  # May 2023 not using catch per fishery and can get this from catchtons above
  # also not using catch.txt output here, read in at om_init stage so drop here

  # catchfish <- read.table(file_catchfish, header = TRUE)
  # over <- colnames(catchfish)[-(1:2)]
  # catchfish <- reshape(catchfish, direction = "long",
  #   varying = over, v.names = "catch",
  #   timevar = "species", times = over)
  # rownames(catchfish) <- 1:NROW(catchfish)
  # catchfish <- catchfish[catchfish$catch > 0,
  #   -which(colnames(catchfish) == "id")]
  # catchfish$species <- fgs$Name[match(catchfish$species, fgs$Code)]
  # colnames(catchfish) <- tolower(colnames(catchfish))
  # catchfish$time <- catchfish$time / runprm$toutfinc
  # if(verbose) message("Catch per fishery read in.")

  # # Get catch from txt. Sum per species and compare with values from nc-file!
  # catch_all <- load_catch(dir = dir, file = file_catch, fgs = fgs)
  # # over <- colnames(catch_all)[(colnames(catch_all) %in% fgs$Code)]
  # # catch_all <- reshape(catch_all[, c("Time", over)], direction = "long",
  # #   varying = over, v.names = "catch",
  # #   timevar = "species", times = over)
  # # rownames(catch_all) <- 1:NROW(catch_all)
  # # catch_all <- catch_all[catch_all$catch > 0,
  # #   -which(colnames(catch_all) == "id")]
  # # catch_all$species <- fgs$Name[match(catch_all$species, fgs$Code)]
  # # colnames(catch_all) <- tolower(colnames(catch_all))
  # catch_all$time <- catch_all$time / runprm$toutfinc
  # if(verbose) message("Catch for all fisheries in biomass read in.")

  diet <- load_diet_comp(dir = dir, file_diet = dietcheck, fgs = fgs)
  diet <- diet[diet$atoutput>0,]
  if(verbose) message("Global diet composition (proportion) read in.")

  # May 2019 let's not do the catch calcs until they are corrected
  # Sept 2020 nor the biomass_eaten

  if(verbose) message("Start calc_functions")
  # catchbio <- calc_biomass_age(nums = catch,
  #   resn = resn, structn = structn, biolprm = biol)
  biomass_eaten <- calc_pred_cons(eat = eat,
     grazing = grazing, vol = vol, biolprm = biol,
     runprm = runprm)
  biomass_ages <- calc_biomass_age(nums = nums,
    resn = resn, structn = structn, biolprm = biol)
  # bio_catch <- calc_biomass_age(nums = catch,
  #   resn = resn, structn = structn, biolprm = biol)
  #
  # bio_catch <- aggregate(atoutput ~ species + time,
  #   data = bio_catch, sum)

  # todo: check that the biomass of the catches are correct
  # also should catch in biomass be exported as well
  # as catch in numbers?
  # check <- merge(catch_all, bio_catch,
  #   by = c("species", "time"))
  # check$check <- with(check, atoutput / catch)

  # SKG May 2019, no export of catch in biomass for now
  # does not match catch.txt output file
  # read that in separately instead

  # SKG May 2023, catch in biomass now from CATCH.nc
  # catchtons shown to match Catch.txt file
  # this is catch by fleet, polygon, time, aggregated over all ages
  # if discards are included they will also be output

  # SKG Sept 2020, no export of biomass_eaten
  # does not match DetailedDietCheck.txt output
  # polygon-specific consumption will be from DetailedDietCheck
  # files are too big to combine; do separately from run_truth
  # export model-wide true diet comp though

  # SKG June 2022, biomass_eaten is now total consumption
  # based on the new calc_pred_cons() which gives results
  # on a more reasonable scale, adding back to output object

  if(!annage){
    result <- list("biomass_ages" = biomass_ages,
                   "biomass_eaten" = biomass_eaten,
                   "catch" = catch,
                   "catchtons" = catchtons,
                   "disctons" = disctons,
                   "nums" = nums,
                   "resn" = resn,
                   "structn" = structn,
                   "diet" = diet,
                   "biolprm" = biol,
                   "fgs" = fgs)
  }

  if(annage){
    result <- list("biomass_ages" = biomass_ages,
                   "biomass_eaten" = biomass_eaten,
                   "catch" = catch,
                   "catchtons" = catchtons,
                   "disctons" = disctons,
                   "nums" = nums,
                   "numsage" = numsage,
                   "catchage" = catchage,
                   "discage" = discage,
                   "resn" = resn,
                   "structn" = structn,
                   "diet" = diet,
                   "biolprm" = biol,
                   "fgs" = fgs)
  }

  if(verbose) message("Start writing to HDD.")
  if(save) {
    save(result,
      file = file.path(dir, paste0(scenario, "run_truth.RData")))
  }

  invisible(result)
}

4. Adjust atlantisom::om_init for the new catchtons output

Actually, no adjustment needed for om_init. This was run as is to produce the new truth output. I think we keep reading in Catch.txt here.

5. Adjust atlantisom::om_species for the new catchtons output

Added the catchtons and disctons truth outputs here, subset by code rather than species name, update documentation. Note that code will need to be changed to species prior to applying the fishery sampling functions.

New function, tested by re-running code in SimData.Rmd in mskeyrun as above. The most recent omlist_ss.rds file in the mskeyrun simulated data atlantisoutput NOBAsacc38 directory reflects the updated catch in tons at full resolution.

#'Species-specific atlantisom outputs
#'@description A wrapper function to reduce a set of \code{atlantisom} objects from atlantis
#'output to a single species or subset of species for creating assessment input datasets.
#'Optionally includes diet data (set diet=TRUE in function call, default is FALSE)
#'@param species The species to sample in the survey and fishery (a vector)
#'@param omlist output of \code{om_init}
#'@return Returns an omlist_ss list object containing dataframes and lists:
#' \itemize{
#'  \item{species_ss, species name(s)}
#'  \item{code_ss, species code from funct.groups}
#'  \item{truetotbio_ss, dataframe output of \code{load_bioind}, species_ss only}
#'  \item{truecatchbio_ss, dataframe output of \code{load_catch}, species_ss only}
#'  \item{YOY_ss, dataframe young of year output, code_ss only}
#'  \item{truenums_ss, numbers at age output of \code{run_truth}, species_ss only}
#'  \item{truebio_ss, biomass at age output of \code{run_truth}, species_ss only}
#'  \item{trueresn_ss, reserve nitrogen output of \code{run_truth}, species_ss only}
#'  \item{truestructn_ss, structural nitrogen output of \code{run_truth}, species_ss only}
#'  \item{truecatchnum_ss, fishery catch at age output of \code{run_truth}, species_ss only}
#'  \item{truecons_ss, total consumption output of \code{run_truth}, species_ss only}
#'  \item{truecatchtons_ss, fishery catch by fleet output of \code{run_truth}, code_ss only}
#'  \item{truedisctons_ss, fishery discard by fleet output of \code{run_truth}, code_ss only}
#'  \item{truenumsage_ss, true numbers at annual age output of \code{run_truth}, species_ss only}
#'  \item{truecatchage_ss, fishery catch at annual age output of \code{run_truth}, species_ss only}
#'  \item{truediscage_ss, fishery discard at annual age output of \code{run_truth}, species_ss only}
#'  \item{funct.groups_ss, dataframe of species characteristics, species_ss only}
#'  \item{biol, list of biological parameters passed from input omlist}
#'  \item{boxpars, dataframe of spatial parameters}
#'  \item{runpar, list of run parameters passed from input omlist}
#' },
#'
#'@export
#'
#'@family wrapper functions
#'@author Sarah Gaichas
#'
#'@examples
#'\dontrun{
#' # assuming CC3om is output of om_init(here("config/CC3config.r"))
#'CC3om_sardine <- om_species(c("Pacific_sardine"), CC3om)
#'
#'CC3om_2spp <- om_species(c("Pacific_sardine", "Mesopel_M_Fish"), CC3om)
#'}
#'
om_species <- function(species = spp, omlist, save = TRUE,
                       removefullom = TRUE){
  # spp format c("speciesname1", "speciesname2")
  if(!all(species %in% omlist$funct.group.names)) stop("species name not found")
  species_ss <- species

  #subset species true bio
  truetotbio_ss <- omlist$truetotbio[omlist$truetotbio$species %in% species_ss,]

  #subset species true catch
  truecatchbio_ss <- omlist$truecatchbio[omlist$truecatchbio$species %in% species_ss,]

  #subset species YOY
  # get code matching species name to split YOY file
  code_ss <- omlist$funct.groups$Code[which(omlist$funct.groups$Name %in% species_ss)]
  # cut to a single species in YOY file
  YOY_ss <- omlist$YOY %>%
    select(Time, paste0(code_ss, ".0"))
  # reformat to be like all the other objects

  # numbers at agecl at full resolution (all polygons and layers)
  truenums_ss <- omlist$truth$nums[omlist$truth$nums$species %in% species_ss,]

  # biomass at agecl at full resolution (all polygons and layers)
  truebio_ss <- omlist$truth$biomass_ages[omlist$truth$biomass_ages$species %in% species_ss,]

  # reserve nitrogen at agecl at full resolution
  trueresn_ss <- omlist$truth$resn[omlist$truth$resn$species %in% species_ss,]

  # structural nitrogen at agecl at full resolution
  truestructn_ss <- omlist$truth$structn[omlist$truth$structn$species %in% species_ss,]

  # catch in numbers at agecl at full resoluation (all polygons, no layer in output)
  truecatchnum_ss <- omlist$truth$catch[omlist$truth$catch$species %in% species_ss,]

  # catch in tons at full resolution by fleet (all polygons, no layer or agecl)
  truecatchtons_ss <- omlist$truth$catchtons[omlist$truth$catchtons$species %in% code_ss,]

  # discard in tons at full resolution by fleet (all polygons, no layer or agecl)
  truedisctons_ss <- omlist$truth$disctons[omlist$truth$disctons$species %in% code_ss,]

  # consumption (biomass_eaten) at agecl at full resolution (all polygons, no layer in output)
  # based on atlantisom::calc_pred_cons()
  truecons_ss <- omlist$truth$biomass_eaten[omlist$truth$biomass_eaten$species %in% species_ss,]

  # if annage output exists, add in, otherwise fill with NULL
  if("numsage" %in% names(omlist$truth)){
    truenumsage_ss <- omlist$truth$numsage[omlist$truth$numsage$species %in% species_ss,]
  } else {truenumsage_ss <- NULL}

  if("catchage" %in% names(omlist$truth)){
    truecatchage_ss <- omlist$truth$catchage[omlist$truth$catchage$species %in% species_ss,]
  } else {truecatchage_ss <- NULL}

  if("discage" %in% names(omlist$truth)){
    truediscage_ss <- omlist$truth$discage[omlist$truth$discage$species %in% species_ss,]
  } else {truediscage_ss <- NULL}

  # original biomass_eaten was not correct by polygon, and also appeared inflated
  # a separate set of functions will get true diet by polygon from the
  # DetailedDietCheck.txt file
  # now biomass_eaten is based on atlantisom::calc_pred_cons() and included above by default

  # if(diet){
  #   truebioeaten_ss <- omlist$truth$biomass_eaten[omlist$truth$biomass_eaten$species %in% species_ss,]
  # } else {truebioeaten_ss <- NULL}

  #subset species biol parameters? no, may miss some globals that aren't by species

  #subset species functional group info
  funct.groups_ss <- omlist$funct.groups[omlist$funct.groups$Code %in% code_ss,]

  #keep the runpar and also need boxes for survey selection

  omlist_ss <- list("species_ss" = species_ss,
                    "code_ss" = code_ss,
                    "truetotbio_ss" = truetotbio_ss,
                    "truecatchbio_ss" = truecatchbio_ss,
                    "YOY_ss" = YOY_ss,
                    "truenums_ss" = truenums_ss,
                    "truebio_ss" = truebio_ss,
                    "trueresn_ss" = trueresn_ss,
                    "truestructn_ss" = truestructn_ss,
                    "truecatchnum_ss" = truecatchnum_ss,
                    "truecons_ss" = truecons_ss,
                    "truecatchtons_ss" = truecatchtons_ss,
                    "truedisctons_ss" = truedisctons_ss,
                    "truenumsage_ss" = truenumsage_ss,
                    "truecatchage_ss" = truecatchage_ss,
                    "truediscage_ss" = truediscage_ss,
                    "funct.group_ss" = funct.groups_ss,
                    "biol" = omlist$biol,
                    "boxpars" = omlist$boxpars,
                    "runpar" = omlist$runpar)

  if(save){
    saveRDS(omlist_ss, file.path(d.name, paste0(scenario.name, "omlist_ss.rds")))
  }
  if(removefullom) rm(omlist) #not removing passed data object

  return(omlist_ss)
}

6. Adjust atlantisom::om_index for the new catchtons output

The complex part. This requires adjusting the atlantisom::create_fishery_subset to take catchtons as an input, then feeding this into the index using atlantisom::sample_fishery_totalcatch and applying cv at a more disaggregated point, then summing to the annual index so that subannual and annual indices are consistent. Then om_index will output the subannual index only, or both? It should probably just make one and users can decide how to aggregate to annual (ie. mskeyrun will provide both a subannual and annual index based on the same om_index output).

6a. Update om_index

The fishery portion of this wrapper can now have parallel structure to the survey portion. It allows multiple fishery.R config files so the user can get a different index for each fleet if that is desired. Specifications for which polygons are kept for which fleet number would be in the fishery.R config file, with an appropriate fleet.name.

I think we want to aggregate across all fleets for a species for the given set of polygons: for each fishery.R file the first step is to select fleets identified in fishery.R and aggregate across those fleets for the polygons specified in fishery.R, keeping species separated out–this is done in create_fishery_subset below.

Note that allowing for multiple fishery.R files for each fleet means we need to make a parallel change to om_comps.

Note that species in here are the codes not the names, so conversion back to names needs to happen somewhere. The fishery.R file now specifies the species codes in fishspp instead of species names (survspp still does that).

Time in this output is timestep not days.

Here is the new om_index

#'Generate index data from atlantisom
#'
#'#'@description A wrapper function to create survey and fishery index data for assessment input.
#'Takes the output of \code{om_species}. Wrapper can generate replicates. Saves output as .rds
#'Results for more than one survey are generated with multiple survey config files and
#'saved as separate .rds files.
#'@param usersurvey survey config file in format of /config/usersurvey.R
#'@param userfishery fishery config file in format of /config/fisherycensus.R
#'@param omlist_ss output of \code{om_species}
#'@param n_reps number of replicate indices to be generated
#'@template save
#'@return Returns list objects containing dataframes of survey biomass index and total catch:
#' \itemize{
#'  \item{survObsBiomB, list of replicate dataframes of observed survey biomass (tons)}
#'  \item{fishObsCatchB, list of replicate dataframes of observed fishery catch (tons)}
#' },
#'
#'@export
#'
#'@family wrapper functions
#'@author Sarah Gaichas
#'
#'@examples
#'\dontrun{
#' # assuming CC3om is output of om_init(here("config/CC3config.r"))
#' # and CC3om_sardine <- om_species(c("Pacific_sardine"), CC3om)
#'
#'CC3om_sard_ind <- om_index(usersurvey = here("config/usersurvey.R"),
#'     userfishery = here("config/fisherycensus.R"),
#'     omlist_ss = CC3om_sardine,
#'     n_reps = 5,
#'     save = TRUE)
#'}
#'
om_index <- function(usersurvey = usersurvey_file,
                     userfishery = userfishery_file,
                     omlist_ss,
                     n_reps = n_reps,
                     save = TRUE){

  #one script for dimension parameters to be used in multiple functions
  source("config/omdimensions.R", local = TRUE)

  # user options for survey--default is a census with mid-year sample
  # allows muliple surveys
  survObsBiomBs <- list()

  for (s in usersurvey)
  {
    source(s, local = TRUE)

    #biomass based fishery independent survey index
    # this uses result$biomass_ages to sample biomass directly, no need for wt@age est
    survey_B <- atlantisom::create_survey(dat = omlist_ss$truebio_ss,
                                          time = survtime,
                                          species = survspp,
                                          boxes = survboxes,
                                          effic = surveffic,
                                          selex = survselex)

    # call sample_survey_biomass with a bunch of 1000s for weight at age
    # in the code it multiplies atoutput by wtatage/1000 so this allows us to use
    # biomass directly
    wtage <- data.frame(species=rep(names(age_classes), n_age_classes),
                        agecl=unlist(sapply(n_age_classes,seq)),
                        wtAtAge=rep(1000.0,sum(n_age_classes)))

    # this is the step to repeat n_reps time if we want different realizations
    # of the same survey design specified above; only observation error differs
    # using the census cv of 0 will produce identical reps!
    survObsBiomB <- list()
    for(i in 1:n_reps){
      survObsBiomB[[i]] <- atlantisom::sample_survey_biomass(survey_B, surv_cv, wtage)
    }

    #save survey indices, takes a long time to generate with lots of reps/species
    if(save){
      saveRDS(survObsBiomB, file.path(d.name, paste0(scenario.name, "_",
                                                     survey.name, "surveyB.rds")))
    }

    survObsBiomBs[[survey.name]] <- survObsBiomB
  }

  #configure the fishery, a default is in config/fisherycensus.R
  #fishery configuration can specify only area and time of observation
  #fishery species inherited from omlist_ss
  #this is total catch not by fleet, so only one "fishery"

  # 2023 update, can now get fleet specific catch by polygon
  # user options for fishery--default is a census in all areas for all fleets
  # allows multiple fisheries
  fishObsCatchBs <- list()

  for(f in userfishery){

    source(f, local = TRUE)

    # 2023 update: we can now subset fishery catch from CATCH.nc using fleet output
    # create_fishery_subset as currently written aggregates across fleets and polygons
    # change so that fleets to be aggregated together are specified in the userfishery file

    fishery_C <- atlantisom::create_fishery_subset(dat = omlist_ss$truecatchtons_ss,
                                                   time = fishtime,
                                                   fleets = fishfleets,
                                                   species = fishspp,
                                                   boxes = fishboxes)

    fishObsCatchB <- list()
    for(i in 1:n_reps){
      fishObsCatchB[[i]] <- atlantisom::sample_fishery_totcatch(fishery_C, fish_cv)
    }

    if(save){
      saveRDS(fishObsCatchB, file.path(d.name, paste0(scenario.name, "_",
                                                      fishery.name, "fishCatch.rds")))
    }

    fishObsCatchBs[[fishery.name]] <- fishObsCatchB

  }

  indices <- list("survObsBiomB" = survObsBiomBs,
                  "fishObsCatchB" = fishObsCatchBs)

  return(indices)
}

And an example fishery.R file (modified msfishery.R file used in ms-keyrun project):

# Default fishery configuration here is a census
# June 2023 now aggregating over fleets in input
# change name to identify fleet specific config files
# output will be stored with this name

fishery.name="allfleet"

# select fleets by number from fisheries.csv Index column
# NULL is all fleets
fishfleets <- NULL

# Inherits species from input omlist_ss
fishspp <- omlist_ss$code_ss 

# fishery output: learned the hard way this can be different from ecosystem outputs
fstepperyr <- if(omlist_ss$runpar$outputstepunit=="days") 365/omlist_ss$runpar$toutfinc

#The last timestep to sample
total_sample <- noutsteps-1 #495

# leave out model burn in period? define how long
burnin <- 0

# take only some years? for mskeyrun we do this later, here take all
nyears <- NULL

# same time dimensioning parameters as in surveycensus.R
#Vector of indices of catch in numbers to pull (by timestep to sum)
fish_sample_full <- c(0:total_sample)  #total_sample
#fish_burnin <- burnin*fstepperyr+1
#fish_nyears <- nyears*fstepperyr
#fish_times <- fish_sample_full[fish_burnin:(fish_burnin+fish_nyears-1)]
#fish_timesteps <- seq(fish_times[fstepperyr], max(fish_times), by=fstepperyr) #last timestep
#fish_years <- unique(floor(fish_times/fstepperyr)+1) # my original
#fish_years <- unique(floor(fish_times/fstepperyr)) #from Christine's new sardine_config.R

#fishtime <- fish_times

fishtime <- fish_sample_full

# fishery sampling area
# should return all model areas, this assumes you see everything that it caught
fishboxes <- c(0:(omlist_ss$boxpars$nbox - 1))

# effective sample size needed for sample_fish
# this effective N is divided by the number of annual timesteps below, so 200 per time
# use as input to the length samples, ages can be a subset
fisheffN <- data.frame(species=survspp, effN=rep(1000, length(survspp)))

#this adjusts for subannual fishery output so original effN is for whole year
fisheffN$effN <- fisheffN$effN/fstepperyr 

# fishery catch cv can be used in sample_survey_biomass
# perfect observation
fish_cv <- data.frame(species=fishspp, cv=rep(0.01,length(fishspp)))

6b. Update create_fishery_subset

Left alone, this function will aggregate across all fleets and polygons given in the input data (since it is not expecting a fleet column). This needs modification to use the fleet column found in catchtons. That way the user can control which fleets are included in the aggregated output. Specifying fleets=NULL means all fleets are aggreagated.

#' @title Create fishery subset from Atlantis output

#' @description Create a subset of the fishery observations for an Atlantis scenario.

#' @details The function takes fishery catch data from an Atlantis scenario
#'   where the data was read in from Atlantis output using \code{\link{load_nc}}
#'   within \code{\link{run_truth}}. One does not need to use these functions
#'   to create \code{dat}, rather you must only ensure that the structure of
#'   \code{dat} is the same.
#'   Currently, the function subsets the data by fleet, polygon and time (there
#'   are not layers in fishery outputs).  The input is fishery observations (either
#'   catch nums or catch bio outputs of run_truth), so nothing else needs to be done.
#'   Atlantis already applies a fishery efficiency and selectivity internally.
#'   This function works for specific defined species, specific defined polygons,
#'   and specific defined time.

#' @author Poseidon
#' @export

#' @template      dat
#' @param time    The timing of the survey (a vector indicating specific time steps, which are typically associated with years)
#'                    i.e., seq(365,10*3650,365) would be an annual survey for 10 years
#' @param fleets  which fleet indices to aggregate in the output (NULL aggregates all fleets)
#' @param species The species to sample in the survey (a vector)
#' @param boxes   A vector of box numbers

#' @return The function returns a subsetted matrix aggregated over fleets with the columns
#'   as the input data, i.e.,:
#'   \itemize{
#'     \item{species}
#'     \item{agecl}
#'     \item{polygon}
#'     \item{layer}
#'     \item{time}
#'     \item{atoutput}
#'   }
#' This function is for a vector of defined species
#' Returns only boxes where fishery was sampled

create_fishery_subset <- function(dat, time, fleets, species, boxes) {

    #Do some vector length tests (species=effic, column names, )

    #first select the appropriate rows (fleets) if non-NULL fleets specified
  if(!is.null(fleets)){
    dat <- dat[dat$fleet %in% fleets,]
  }

    #first select the appropriate rows and
    aggData <- aggregateData(dat, time, species, boxes, keepColumns=c("species","agecl","polygon","time"))

    #Should I be checking for NA's along the way to identify problems?

    #Create final dataframe in same format as input
    #put time (mean) and layers (NA) back in the dataframe for completeness
    out <- data.frame(species = aggData$species,
                      agecl = aggData$agecl,
                      polygon = aggData$polygon,
                      layer = NA,
                      time = aggData$time,
                      atoutput = aggData$numAtAge)

    return(out)
}

6c. Update sample_fishery_totcatch

Now this function determines if there are polygons in the input, and aggregates over polygon if there are. It then applies cv at the subannual level for the specified fishery.

#' @title Sample a fishery catch index from an atlantis scenario
#'
#' @description The function takes total catch data from an Atlantis scenario
#'   where the data was read in from Atlantis output using \code{\link{load_catch}}
#'   function or from the \code{\link{load_nc_catchtons}} function.
#'   One does not need to use these functions
#'   to create \code{dat}, rather you must only ensure that the structure of
#'   \code{dat} is the same.
#' @details
#'   If a total catch time series is input from \code{\link{load_catch}},
#'   this function simply creates an observed total catch time series from
#'   total catch input using by applying a user-specified cv.
#'   If catch by polygon and timestep is input from \code{\link{load_nc_catchtons}}
#'   followed by \code{\link{create_fishery_subset}}, then catches are aggregated
#'   over polygons and the subannual timesteps are retained. Fleet specific catch
#'   can be created by selecting a subset of fleets as input to
#'   \code{\link{create_fishery_subset}}. Either way, the result is a coastwide
#'   catch (tons) estimate in tons from the fishery.
#' @author Poseidon
#' @export
#'
#' @template dat
#' @param cv      Coefficient of variation for the entire species specific catch
#'                    a matrix with columns: species, cv
#'
#' @return The standard dataframe as specified used in \code{dat}.
#'
#' @examples
#' d <- system.file("extdata", "SETAS_Example", package = "atlantisom")
#' groups <- c("Pisciv_T_Fish","Pisciv_S_Fish")
#' file <- "outputsCatch.txt"
#' fgs <- load_fgs(dir = d, "Functional_groups.csv")
#' truetotcatch <- load_catch(dir = d, fgs = fgs, file_catch = file)
#' groupcatch <- truetotcatch[truetotcatch$species %in% groups, ]
#' cv <- data.frame(species=groups, cv=c(0.2,0.3))
#' fishObsCatch <- sample_fishery_totcatch(dat=groupcatch,cv=cv)


sample_fishery_totcatch <- function(dat,cv) {

  #dat is the output of load_catch, polygons are all NA

  # if dat is output of load_nc_catchtons
  # it needs to be aggregated over fleet and polygon
  if(!all(is.na(dat$polygon))){
    #sum over boxes and ages (the sampled boxes and fleets were already subset
    #in create functions)
    dat <- aggregate(dat$atoutput,list(dat$species,dat$time),sum)
    names(dat) <- c("species","time","catchtons")
  }

    #add observation error
  totCobs <- merge(dat,cv,by="species",all.x=T)
  totCobs$var <- log(totCobs$cv^2+1)
  totCobs$obsCatch <- rlnorm(nrow(totCobs), log(totCobs$catchtons)-totCobs$var/2, sqrt(totCobs$var))

    out <- data.frame(species=totCobs$species,
                      agecl = NA,
                      polygon=NA,
                      layer=NA, time=totCobs$time,
                      atoutput=totCobs$obsCatch)

    return(out)
}


if(F) {
    #to check how multiple sampling can work
    a <- seq(10,100,10)
    b <- seq(0.1,1,0.1)

    x <- matrix(rlnorm(length(a)*1000,log(a)-(b^2)/2,b),ncol=10,byrow=T)
    apply(x,2,mean)


    directory <- system.file("extdata", "INIT_VMPA_Jan2015", package = "atlantisom")
    scenario <- "SETAS"
    groups <- load_fgs(dir = directory, "functionalGroups.csv")
    groups <- groups[groups$IsTurnedOn > 0, "Name"]
    results <- run_truth(scenario = scenario,
    dir = directory,
    file_fgs = "functionalGroups.csv",
    file_bgm = "VMPA_setas.bgm",
    select_groups = groups,
    file_init = "INIT_VMPA_Jan2015.nc",
    file_biolprm = "VMPA_setas_biol_fishing_Trunk.prm")

    species=c("Pisciv_T_Fish","Pisciv_S_Fish")
    boxes <- 1:3
    effic <- data.frame(species=c("Pisciv_T_Fish","Pisciv_S_Fish"), efficiency=c(0.3,0.1))
    selex <- data.frame(species=c(rep("Pisciv_T_Fish",10),rep("Pisciv_S_Fish",10)),
                    agecl=c(1:10,1:10),
                    selex=c(0,0,0.1,0.5,0.8,1,1,1,1,1,0,0,0.1,0.3,0.5,0.7,0.9,1,1,1))

    tmp <- create_survey(dat=results$nums, time=seq(70,82,3), species=species, boxes=boxes, effic=effic, selex=selex)

    wtAtAge <- data.frame(species=c(rep("Pisciv_T_Fish",10),rep("Pisciv_S_Fish",10)),
                    agecl=c(1:10,1:10),
                    wtAtAge=c(0.1,0.5,0.9,1.5,1.8,2.0,2.1,2.2,2.3,2.35,0.05,0.2,0.3,0.35,0.45,0.5,0.55,0.6,0.63,0.65))
    cv <- data.frame(species=species, cv=c(0.2,0.3))

    survObsBiom <- sample_survey_biomass(dat=tmp,cv=cv,wtAtAge)

}

Testing

Compare the previous total catch time series (annual) with the new one summed to annual and hope the mostly match (both have a cv applied but it is 0.01 for all species so shouldn’t matter.)

Read in saved files created previously in ms-keyrun and the one just created with the new functions. This is now comparing the “sampled” fishery data.

mskeyrundir <- "/Users/sarah.gaichas/Documents/0_Data/ms-keyrun/simulated-data/atlantisoutput/NOBA_sacc_38"

scenario.name <- "nordic_runresults_01"

stepperyr <- 5

fisherycomp <- atlantisom::read_savedfisheries(mskeyrundir, "Catch")

from_nc <- fisherycomp$allfleet[[1]]

from_txt <- fisherycomp$census[[1]]

fgs <- atlantisom::load_fgs(mskeyrundir, "nordic_groups_v04.csv")

  aggcatchtons <- from_nc %>%
    dplyr::select(species, time, atoutput) %>%
    dplyr::mutate(year = ceiling(time / stepperyr)) %>%
    dplyr::group_by(species, year) %>%
    dplyr::summarise(yeartot = sum(atoutput))
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.
  txtCatchtons <- from_txt %>%
    dplyr::select(species, time, atoutput) %>%
    dplyr::mutate(year = time / 365) %>%
    dplyr::left_join(dplyr::select(fgs, Code, Name), by = c("species" = "Name"))
  
  comparetons <- txtCatchtons %>%
    dplyr::left_join(aggcatchtons, by = c("Code" = "species", "year" = "year")) %>%
    ggplot2::ggplot() +
    geom_line(aes(year, atoutput)) +
    geom_point(aes(year, yeartot), color = "blue", alpha = 0.2) +
    facet_wrap( ~ species, scales = "free_y")
  
  # looks like a match!
  comparetons
## Warning: Removed 299 rows containing missing values (`geom_point()`).

We have a match so it seems the workflow gives the same results in aggregate. Lets call the fleet specific and subannual catch good then. It can go into mskeyrun.

7. Adjust mskeyrun functions creating aggregated and subannual catch datasets

See mskeyrun repo. I would like to keep life simple and not have fleet specific subannual catch indices, but since we just did fleet specific indices with the real Georges Bank data I suppose I will have to test this with simulated data.

But maybe not for the WGSAM skill assessment. See below for why I don’t quite have a full set of fleet specific outputs.

8. Adjust atlantisom::om_comps for the new catchtons output

Well this sounded easier than it is. We can have multiple fleets in om_comps using a similar structure to surveys. Sadly, the length comps from fisheries cannot be by fleet because they come from the CATCH.nc numbers output, which is by age class and not by fleet. Lengths are estimated using the age class information by numbers, structn, and resn. I don’t think we can apply the calcage2length function to the true age data because we don’t have the growth information from structn and resn at that level.

I can rewrite om_comps to use fleet information on all but the lengths. Placeholder code below.

#'Generate composition data from atlantisom
#'
#'#'@description A wrapper function to create survey and fishery compositional data for assessment input.
#'Takes the output of \code{om_species}. Wrapper can generate replicates. Saves output as .rds
#'Results for more than one survey are generated with multiple survey config files and
#'saved as separate .rds files.
#'@param usersurvey survey config file in format of /config/usersurvey.R
#'@param userfishery fishery config file in format of /config/fisherycensus.R
#'@param omlist_ss output of \code{om_species}
#'@param n_reps number of replicate age, length, and weight-at-age compositions to be generated
#'@template save
#'@return Returns list objects containing dataframes of survey catch, length, and weight at age class and age:
#' \itemize{
#'  \item{survObsAgeComp, list of replicate dataframes of observed survey age comp (n fish)}
#'  \item{survObsLenComp, list of replicate dataframes of observed survey length comp (n fish)}
#'  \item{survObsWtAtAge, list of replicate dataframes of observed survey weight at age (avg wt)}
#'  \item{fishObsAgeComp, list of replicate dataframes of observed fishery age comp (n fish)}
#'  \item{fishObsLenComp, list of replicate dataframes of observed fishery length comp (n fish)}
#'  \item{fishObsWtAtAge, list of replicate dataframes of observed fishery weight at age (avg wt)}
#'  \item{survObsFullAgeComp, list of replicate dataframes of observed survey annual age comp (n fish)}
#'  \item{fishObsFullAgeComp, list of replicate dataframes of observed fishery age comp (n fish)}
#'  \item{survObsFullWtAtAge, list of replicate dataframes of observed survey weight at annual age (avg wt)}
#'  \item{fishObsFullWtAtAge, list of replicate dataframes of observed fishery weight at annual age (avg wt)}
#' },
#'
#'@export
#'
#'@family wrapper functions
#'@author Sarah Gaichas
#'
#'@examples
#'\dontrun{
#' # assuming CC3om is output of om_init(here("config/CC3config.r"))
#' # and CC3om_sardine <- om_species(c("Pacific_sardine"), CC3om)
#'
#'CC3om_sard_comp <- om_comps(usersurvey = here("config/usersurvey.R"),
#'     userfishery = here("config/fisherycensus.R"),
#'     omlist_ss = CC3om_sardine,
#'     n_reps = 1,
#'     save = TRUE)
#'
#'}
om_comps <- function(usersurvey = usersurvey_file,
                     userfishery = userfishery_file,
                     omlist_ss,
                     n_reps = n_reps,
                     save = TRUE){

  #one script for dimension parameters to be used in multiple functions
  source("config/omdimensions.R", local = TRUE)

  # user options for survey--default is a census with mid-year sample
  # allows multiple surveys
  age_comp_datas <- list()
  survObsLenComps <- list()
  survObsWtAtAges <- list()

  for (s in usersurvey)
  {
    source(s, local = TRUE)

    #numbers based fishery independent survey for age and length comps
    # same user specifications as indices
    survey_N <- atlantisom::create_survey(dat = omlist_ss$truenums_ss,
                                          time = survtime,
                                          species = survspp,
                                          boxes = survboxes,
                                          effic = surveffic,
                                          selex = survselex.agecl)

    #Sample fish for age composition
    # if we want replicates for obs error this sample function will generate them
    age_comp_data <- list()
    for(i in 1:n_reps){
      age_comp_data[[i]] <- atlantisom::sample_fish(survey_N, surveffN)
    }

    # save age comps
    if(save){
      saveRDS(age_comp_data, file.path(d.name, paste0(scenario.name, "_",
                                                      survey.name, "survObsAgeComp.rds")))
    }

    #weights needed for weight at age and length comp calcs
    # aggregate true resn per survey design
    survey_aggresn <- atlantisom::aggregateDensityData(dat = omlist_ss$trueresn_ss,
                                                       time = survtime,
                                                       species = survspp,
                                                       boxes = survboxes)

    # aggregate true structn per survey design
    survey_aggstructn <- atlantisom::aggregateDensityData(dat = omlist_ss$truestructn_ss,
                                                          time = survtime,
                                                          species = survspp,
                                                          boxes = survboxes)

    #dont sample these, just aggregate them using median
    structnss <- atlantisom::sample_fish(survey_aggstructn, surveffN, sample = FALSE)

    resnss <- atlantisom::sample_fish(survey_aggresn, surveffN, sample = FALSE)

    #this is all input into the length function, replicates follow age comp reps
    #  separating the length comps from the weight at age here
    survey_lenwt <- list()
    survObsLenComp <- list()
    survObsWtAtAge <- list()

    for(i in 1:n_reps){
      survey_lenwt[[i]] <- atlantisom::calc_age2length(structn = structnss,
                                                       resn = resnss,
                                                       nums = age_comp_data[[i]],
                                                       biolprm = omlist_ss$biol,
                                                       fgs = omlist_ss$funct.group_ss,
                                                       maxbin = maxbin,
                                                       CVlenage = lenage_cv,
                                                       remove.zeroes=TRUE)

      survObsLenComp[[i]] <- survey_lenwt[[i]]$natlength
      survObsWtAtAge[[i]] <- survey_lenwt[[i]]$muweight
    }

    if(save){
      saveRDS(survObsLenComp, file.path(d.name, paste0(scenario.name, "_",
                                                       survey.name, "survObsLenComp.rds")))
      saveRDS(survObsWtAtAge, file.path(d.name, paste0(scenario.name, "_",
                                                       survey.name, "survObsWtAtAge.rds")))
    }
    # add each survey to master list objects for survey data
    age_comp_datas[[survey.name]] <- age_comp_data
    survObsLenComps[[survey.name]] <- survObsLenComp
    survObsWtAtAges[[survey.name]] <- survObsWtAtAge

  }

  #now do fishery comps
  # user options for fishery--default is a census with mid-year sample
  # 2023 update, can now get fleet specific catch by polygon
  # user options for fishery--default is a census in all areas for all fleets
  # allows multiple fisheries
  catch_age_comps <- list()
  fishObsLenComps <- list()
  fishObsWtAtAges <- list()

  for(f in userfishery){

    source(f, local = TRUE)

    # 2023 update: we can now subset fishery catch from CATCH.nc using fleet output
    # create_fishery_subset as currently written aggregates across fleets and polygons
    # changed so that fleets to be aggregated together are specified in the userfishery file

    # however truecatchnum does not have fleets so needs to be null here, no fleet info
    # SO, no length comps by fleet! ugh. can get true age comps by fleet though

    # With current setup, if user specifies multiple fishery files, they will get
    # catch index by fleet
    # true age by fleet
    # but the length comps will have all fleet lengths for the specified fleet times and boxes

    #fishery catch at age each observed timestep summed over observed polygons
    # catch at age by area and timestep
    catch_numbers <-  atlantisom::create_fishery_subset(dat = omlist_ss$truecatchnum_ss,
                                                        time = fishtime,
                                                        fleets = NULL,
                                                        species = survspp,
                                                        boxes = fishboxes)

    # if we want replicates for obs error this sample function will generate them
    catch_age_comp <- list()
    for(i in 1:n_reps){
      catch_age_comp[[i]] <- atlantisom::sample_fish(catch_numbers, fisheffN)
    }

    # save fishery age comps
    if(save){
      saveRDS(catch_age_comp, file.path(d.name, paste0(scenario.name, "_",
                                                       fishery.name, "fishObsAgeComp.rds")))
    }

    #Get catch weights for length comp calc
    # aggregate true resn per fishery subset design
    catch_aggresnss <- atlantisom::aggregateDensityData(dat = omlist_ss$trueresn_ss,
                                                        time = fishtime,
                                                        species = survspp,
                                                        boxes = fishboxes)

    # aggregate true structn fishery subsetdesign
    catch_aggstructnss <- atlantisom::aggregateDensityData(dat = omlist_ss$truestructn_ss,
                                                           time = fishtime,
                                                           species = survspp,
                                                           boxes = fishboxes)

    #dont sample these, just aggregate them using median
    catch_structnss <- atlantisom::sample_fish(catch_aggstructnss, fisheffN, sample = FALSE)

    catch_resnss <- atlantisom::sample_fish(catch_aggresnss, fisheffN, sample = FALSE)

    # these fishery lengths and weight at age are each output timestep
    #same structure as above for surveys, replicates follow age comp reps
    #  separating the length comps from the weight at age here
    fishery_lenwt <- list()
    fishObsLenComp <- list()
    fishObsWtAtAge <- list()

    for(i in 1:n_reps){
      fishery_lenwt[[i]] <- atlantisom::calc_age2length(structn = catch_structnss,
                                                        resn = catch_resnss,
                                                        nums = catch_age_comp[[i]],
                                                        biolprm = omlist_ss$biol,
                                                        fgs = omlist_ss$funct.group_ss,
                                                        maxbin = maxbin,
                                                        CVlenage = lenage_cv,
                                                        remove.zeroes=TRUE)

      fishObsLenComp[[i]] <- fishery_lenwt[[i]]$natlength
      fishObsWtAtAge[[i]] <- fishery_lenwt[[i]]$muweight
    }

    if(save){
      saveRDS(fishObsLenComp, file.path(d.name, paste0(scenario.name, "_",
                                                       fishery.name, "fishObsLenComp.rds")))
      saveRDS(fishObsWtAtAge, file.path(d.name, paste0(scenario.name, "_",
                                                       fishery.name, "fishObsWtAtAge.rds")))
    }

    # add each survey to master list objects for survey data
    catch_age_comps[[fishery.name]] <- catch_age_comp
    fishObsLenComps[[fishery.name]] <- fishObsLenComp
    fishObsWtAtAges[[fishery.name]] <- fishObsWtAtAge

  }

  if(!is.null(omlist_ss$truenumsage_ss)){
    #numbers based fishery independent survey for age and length comps
    #allows for mulitple surveys
    annage_comp_datas <- list()

    for (s in usersurvey)
    {
      source(s, local = TRUE)

      # same user specifications as indices
      survey_annageN <- atlantisom::create_survey(dat = omlist_ss$truenumsage_ss,
                                                  time = survtime,
                                                  species = survspp,
                                                  boxes = survboxes,
                                                  effic = surveffic,
                                                  selex = survselex)
      #Sample fish for age composition
      # if we want replicates for obs error this sample function will generate them
      annage_comp_data <- list()
      for(i in 1:n_reps){
        annage_comp_data[[i]] <- atlantisom::sample_fish(survey_annageN, surveffN)
      }

      # save survey annual age comps
      if(save){
        saveRDS(annage_comp_data, file.path(d.name, paste0(scenario.name, "_",
                                                           survey.name, "survObsFullAgeComp.rds")))
      }
      annage_comp_datas[[survey.name]] <- annage_comp_data
    }

  }else{annage_comp_datas <- NULL}

  if(!is.null(omlist_ss$truecatchage_ss)){
    #fishery catch at age each observed timestep summed over observed polygons
    # there is a fleet variable in this dataset so can get catch at age by fleet
    # catch at age by area and timestep
    #allows for mulitple fisheries
    catch_annage_comps <- list()

    for (f in userfishery)
    {
      source(f, local = TRUE)

      catch_annagenumbers <-  atlantisom::create_fishery_subset(dat = omlist_ss$truecatchage_ss,
                                                                time = fishtime,
                                                                fleets = fishfleets,
                                                                species = survspp,
                                                                boxes = fishboxes)

      # if we want replicates for obs error this sample function will generate them
      # WARNING THIS AGGREGATES ACROSS FLEETS
      # TODO: need to change sample_fish to fix
      # SKG June 2023, with fleets defined in the input, the output aggregates only
      # selected fleets so I think this is ok to get annual age comp by user defined fleet
      catch_annage_comp <- list()
      for(i in 1:n_reps){
        catch_annage_comp[[i]] <- atlantisom::sample_fish(catch_annagenumbers, fisheffN)
      }

      # save fishery annual age comps
      if(save){
        saveRDS(catch_annage_comp, file.path(d.name, paste0(scenario.name,"_",
                                                            fishery.name, "fishObsFullAgeComp.rds")))
      }
      catch_annage_comps[[fishery.name]] <- catch_annage_comp
    }

  }else{catch_annage_comps <- NULL}

  # call interpolate weight at age function to get survObsFullWtAtAge
  if(!is.null(omlist_ss$truenumsage_ss)){
    interp_survWtAtAges <- list()
    for (s in usersurvey)
    {
      source(s, local = TRUE)

      interp_survWtAtAge <- list()
      for(i in 1:n_reps){
        interp_survWtAtAge[[i]] <- calc_avgwtstage2age(wtagecl = survObsWtAtAges[[survey.name]][[i]],
                                                       annages = omlist_ss$truenumsage_ss,
                                                       fgs = omlist_ss$funct.group_ss)
      }
      if(save){
        saveRDS(interp_survWtAtAge, file.path(d.name, paste0(scenario.name, "_",
                                                             survey.name, "survObsFullWtAtAge.rds")))
      }
      interp_survWtAtAges[[survey.name]] <- interp_survWtAtAge
    }
  }else{interp_survWtAtAges <- NULL}

  # do we want fishery average weight at true age too? why not
  # call interpolate weight at age function to get fishObsFullWtAtAge
  # WARNING currently aggregates out fleet info, but no fleets in aggregate wtage
  # June 2023
  # This will produce separate output files with fleets defined in separate fishery
  # config files areas and times but with problem notes above for lengths
  if(!is.null(omlist_ss$truecatchage_ss)){
    interp_fishWtAtAges <- list()
    for (f in userfishery)
    {
      source(f, local = TRUE)

      interp_fishWtAtAge <- list()
      for(i in 1:n_reps){
        interp_fishWtAtAge[[i]] <- calc_avgwtstage2age(wtagecl = fishObsWtAtAge[[i]],
                                                       annages = omlist_ss$truecatchage_ss,
                                                       fgs = omlist_ss$funct.group_ss)
      }
      if(save){
        saveRDS(interp_fishWtAtAge, file.path(d.name, paste0(scenario.name,"_",
                                                             fishery.name, "fishObsFullWtAtAge.rds")))
      }
      interp_fishWtAtAges[[fishery.name]] <- interp_fishWtAtAge
    }
  }else{interp_fishWtAtAge <- NULL}



  comps <- list("survObsAgeComp" = age_comp_datas,
                "survObsLenComp" = survObsLenComps,
                "survObsWtAtAge" = survObsWtAtAges,
                "fishObsAgeComp" = catch_age_comps,
                "fishObsLenComp" = fishObsLenComps,
                "fishObsWtAtAge" = fishObsWtAtAges,
                "survObsFullAgeComp" = annage_comp_datas,
                "fishObsFullAgeComp" = catch_annage_comps,
                "survObsFullWtAtAge" = interp_survWtAtAges,
                "fishObsFullWtAtAge" = interp_fishWtAtAges
                )

  return(comps)
}