Update prey overlap analysis

For bluefish forage index modeling, we are selecting a set of predators that have high diet similarity to bluefish.

Garrison and Link (2000) evaluated similarity of predator diets on the Northeast US shelf to develop foraging guilds. They used NEFSC bottom trawl survey data 1973-1997. We are using diets from 1985-2020 to characterize the prey index. Therefore an additional 20+ years of diet information is available to assess whether predator diet similarity has changed. In addition, identifying the predators with the most similar diets (and foraging habits) to bluefish is useful for this analysis.

Methods

Garrison and Link (2000) used hierarchical agglomerative clustering to evaluate groups of species with similar diets. Specifically, the Schoener similarity index (Schoener, 1970) was applied

to assess the dietary overlap, Dij, between predator pairs:

\[ D_{i,j} = 1 – 0.5 (∑ |p_{i,k} – p_{j,k}|) \]

where \(p_{i,k}\) = mean proportional volume of prey type k in predator i and \(p_{i,k}\) = mean proportional volume of prey type k in predator j.

Garrison and Link (2000) used a set of 52 prey categories to characterize diet. This does not correspond to current standardized prey categories, which range from highly aggregated general categories (“gencat”) through analysis categories (“analcat”) and collection categories (“collcat”). Lowest possible taxonomic information is “pynam” and the number of prey groups within each category is:

# object is called `allfh`
load(url("https://github.com/Laurels1/Condition/raw/master/data/allfh.RData"))
gencomlist <- allfh %>%
  select(pynam, gencat, analcat, collcat, gencom2, analcom3) 

gencomlist %>%
  summarise_all(n_distinct)
##   pynam gencat analcat collcat gencom2 analcom3
## 1  1373     21     108     327      21      107

It seems we could either reconstruct the categories from Garrison and Link (2000) or use the “analcom” category. Over 100 prey categories may be a bit much. Alternatively, additional detail on prey may help identify which piscivores are closest to bluefish. Scott is using a list of 56 prey categories that may be close enough to the orignal list.

preycats <- read_csv(here("fhdat/prey_categories.csv"))

The mean proportions were presumably taken over the entire time period 1973-1997 for the entire Northeast US shelf (survey area) and did not distinguish between seasons.

We could try to reproduce the original time period, add all years up to recent, do only 1997-recent, or some combination.

Also, we may want to include a different list of predators based on sample size. How many of each predator were observed in the earlier time period vs the later?

predn <- allfh %>%
  mutate(yearrange = case_when(year < 1998 ~ "early, 1973-1997",
                               year > 1997 ~ "recent, 1998-2020",
                               TRUE ~ as.character(NA))) %>%
  select(yearrange, pdcomnam) %>%
  group_by(yearrange, pdcomnam) %>%
  summarise(count = n()) %>%
  pivot_wider(names_from = yearrange, values_from = count)

datatable(predn)

There is a table on they NEFSC shiny app that updates feeding guilds based on 50 predators, including striped bass, so perhaps we can use that instead of re-doing this whole analysis.

dietoverlap <- read_csv(here("datfromshiny/tgmat.2022-02-15.csv"))

This can be input into a cluster analysis:

# follows example here https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html

library(dendextend)

d_dietoverlap <- dist(dietoverlap)

guilds <- hclust(d_dietoverlap)

#plot(guilds)

dend <- as.dendrogram(guilds)

dend <- rotate(dend, 1:136)

dend <- color_branches(dend, k=6)

labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
                           "(",labels(dend),")", 
                           sep = "")

dend <- hang.dendrogram(dend,hang_height=0.1)

# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend, 
     main = "Clustered NEFSC diet data, (complete)
     (the labels give the predator species/size)", 
     horiz =  TRUE,  nodePar = list(cex = .007))

#legend("topleft", legend = iris_species, fill = rainbow_hcl(3))

Another visualization

par(mar = rep(0,4))
circlize_dendrogram(dend)

Find bluefish in clusters, and write list of “piscivores” (should be the same as shiny app):

#dend %>% get_nodes_attr("members")
# from https://talgalili.github.io/dendextend/reference/noded_with_condition.html
has_any_labels <- function(sub_dend, the_labels) any(labels(sub_dend) %in% the_labels)

# cols <- noded_with_condition(dend, has_any_labels,
#   the_labels = c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)")
# ) %>%
#   ifelse(2, 1)
# set(dend, "branches_col", cols) %>% plot(horiz =  TRUE,  nodePar = list(cex = .007))

# number of members by node 
#dend %>% get_nodes_attr("members", id = c(2,44)) #2 is first major node, 44 separates "piscivores"

# list of species in node with all three bluefish sizes
partition_leaves(dend)[[
which_node(dend, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))
]]
##  [1] "Striped bass..M(106)"           "Sea raven..L(89)"              
##  [3] "Striped bass..L(105)"           "Bluefish..S(37)"               
##  [5] "Weakfish..M(117)"               "Fourspot flounder..L(50)"      
##  [7] "Northern shortfin squid..M(71)" "Longfin squid..M(63)"          
##  [9] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [11] "Pollock..L(78)"                 "White hake..M(120)"            
## [13] "Red hake..L(82)"                "Silver hake..M(93)"            
## [15] "Atlantic halibut..M(17)"        "Pollock..XL(81)"               
## [17] "White hake..L(119)"             "Silver hake..L(92)"            
## [19] "Cusk..L(45)"                    "Goosefish..XL(56)"             
## [21] "Bluefish..L(35)"                "Goosefish..L(53)"              
## [23] "Goosefish..M(54)"               "Goosefish..S(55)"              
## [25] "Spiny dogfish..M(100)"          "Thorny skate..XL(116)"         
## [27] "Atlantic halibut..L(16)"        "Sea raven..M(90)"              
## [29] "Sea raven..S(91)"               "Atlantic cod..XL(13)"          
## [31] "Spiny dogfish..L(99)"           "Spotted hake..M(103)"          
## [33] "Summer flounder..M(111)"        "Summer flounder..L(110)"       
## [35] "Bluefish..M(36)"                "Buckler dory..M(38)"

How much difference does clustering method make?

# again directly from https://cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
hclust_methods <- c("ward.D", "single", "complete", "average", "mcquitty", 
        "median", "centroid", "ward.D2")
diet_dendlist <- dendlist()
for(i in seq_along(hclust_methods)) {
   hc_diet <- hclust(d_dietoverlap, method = hclust_methods[i])   
   diet_dendlist <- dendlist(diet_dendlist, as.dendrogram(hc_diet))
}
names(diet_dendlist) <- hclust_methods
#diet_dendlist

Correlations between clustering methods

diet_dendlist_cor <- cor.dendlist(diet_dendlist)
#diet_dendlist_cor
corrplot::corrplot(diet_dendlist_cor, "pie", "lower")

Comparison of clusters

par(mfrow = c(4,2))
for(i in 1:8) {
   diet_dendlist[[i]] %>% set("branches_k_color", k=6) %>% plot(axes = FALSE, horiz = TRUE)
   title(names(diet_dendlist)[i])
}

Compare complete (default) and average

diet_dendlist %>% dendlist(which = c(3,4)) %>% ladderize %>% 
   untangle(method = "step1side", k_seq = 2:6) %>%
   set("branches_k_color", k=6) %>% 
   tanglegram(faster = TRUE) # 

   #tanglegram(common_subtrees_color_branches = TRUE)

Compare complete and mcquitty

diet_dendlist %>% dendlist(which = c(3,5)) %>% ladderize %>% 
   untangle(method = "step1side", k_seq = 2:6) %>%
   set("branches_k_color", k=6) %>% 
   tanglegram(faster = TRUE) # 

Compare complete and wardD

diet_dendlist %>% dendlist(which = c(3,1)) %>% ladderize %>% 
   untangle(method = "step1side", k_seq = 2:6) %>%
   set("branches_k_color", k=6) %>% 
   tanglegram(faster = TRUE) # 

See about common nodes between methods, looks better

diet_dendlist_cor2 <- cor.dendlist(diet_dendlist, method = "common")
#iris_dendlist_cor2
corrplot::corrplot(diet_dendlist_cor2, "pie", "lower")

Maybe we go with ward.D which seems most consistent across methods?

d_dietoverlap <- dist(dietoverlap)

guilds <- hclust(d_dietoverlap, method = "ward.D")

#plot(guilds)

dend <- as.dendrogram(guilds)

dend <- rotate(dend, 1:136)

dend <- color_branches(dend, k=6)

labels(dend) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dend)],
                           "(",labels(dend),")", 
                           sep = "")

dend <- hang.dendrogram(dend,hang_height=0.1)

# reduce the size of the labels:
# dend <- assign_values_to_leaves_nodePar(dend, 0.5, "lab.cex")
dend <- set(dend, "labels_cex", 0.5)
# And plot:
par(mar = c(3,3,3,7))
plot(dend, 
     main = "Clustered NEFSC diet data (ward.D)
     (the labels give the predator species/size)", 
     horiz =  TRUE,  nodePar = list(cex = .007))

Find bluefish in clusters, and list of “piscivores” from ward.D (this adds pelagic feeders I would not consider piscivores):

#dend %>% get_nodes_attr("members")
# from https://talgalili.github.io/dendextend/reference/noded_with_condition.html
has_any_labels <- function(sub_dend, the_labels) any(labels(sub_dend) %in% the_labels)

# cols <- noded_with_condition(dend, has_any_labels,
#   the_labels = c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)")
# ) %>%
#   ifelse(2, 1)
# set(dend, "branches_col", cols) %>% plot(horiz =  TRUE,  nodePar = list(cex = .007))

# number of members by node 
#dend %>% get_nodes_attr("members", id = c(2,36)) #2 is first major node, 36 separates "piscivores"

partition_leaves(dend)[[36]]
##  [1] "Spiny dogfish..M(100)"          "Thorny skate..XL(116)"         
##  [3] "Atlantic cod..L(10)"            "Winter skate..XL(130)"         
##  [5] "Spotted hake..M(103)"           "Summer flounder..M(111)"       
##  [7] "Fourspot flounder..M(51)"       "Summer flounder..S(112)"       
##  [9] "Atlantic herring..L(18)"        "Atlantic mackerel..M(22)"      
## [11] "Pollock..L(78)"                 "White hake..M(120)"            
## [13] "Atlantic halibut..M(17)"        "Pollock..XL(81)"               
## [15] "Red hake..L(82)"                "Silver hake..M(93)"            
## [17] "Silver hake..L(92)"             "Bluefish..L(35)"               
## [19] "Goosefish..L(53)"               "Cusk..L(45)"                   
## [21] "Goosefish..XL(56)"              "White hake..L(119)"            
## [23] "Goosefish..M(54)"               "Goosefish..S(55)"              
## [25] "Summer flounder..L(110)"        "Bluefish..M(36)"               
## [27] "Buckler dory..M(38)"            "Atlantic halibut..L(16)"       
## [29] "Sea raven..M(90)"               "Sea raven..S(91)"              
## [31] "Atlantic cod..XL(13)"           "Spiny dogfish..L(99)"          
## [33] "Longfin squid..M(63)"           "Blackbelly rosefish..M(30)"    
## [35] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [37] "Fourspot flounder..L(50)"       "Northern shortfin squid..M(71)"
## [39] "Striped bass..M(106)"           "Sea raven..L(89)"              
## [41] "Striped bass..L(105)"           "Bluefish..S(37)"               
## [43] "Weakfish..M(117)"

Node 68 in this tree contains all bluefish sizes, perhaps closer to “pelagic piscivores”:

partition_leaves(dend)[[
which_node(dend, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))
]]
##  [1] "Silver hake..L(92)"             "Bluefish..L(35)"               
##  [3] "Goosefish..L(53)"               "Cusk..L(45)"                   
##  [5] "Goosefish..XL(56)"              "White hake..L(119)"            
##  [7] "Goosefish..M(54)"               "Goosefish..S(55)"              
##  [9] "Summer flounder..L(110)"        "Bluefish..M(36)"               
## [11] "Buckler dory..M(38)"            "Atlantic halibut..L(16)"       
## [13] "Sea raven..M(90)"               "Sea raven..S(91)"              
## [15] "Atlantic cod..XL(13)"           "Spiny dogfish..L(99)"          
## [17] "Longfin squid..M(63)"           "Blackbelly rosefish..M(30)"    
## [19] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [21] "Fourspot flounder..L(50)"       "Northern shortfin squid..M(71)"
## [23] "Striped bass..M(106)"           "Sea raven..L(89)"              
## [25] "Striped bass..L(105)"           "Bluefish..S(37)"               
## [27] "Weakfish..M(117)"

Alternatively, break out the nodes with all three bluefish sizes from all of the clustering options and see how different they really are.

preds <- list()

for(i in 1:8) {
  dendi <- diet_dendlist[[i]]
  namei <- names(diet_dendlist)[i]
  labels(dendi) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dendi)],
                           "(",labels(dendi),")", 
                           sep = "")
  #preds[[namei]] <- partition_leaves(dendi)[[which_node(dendi, c("35", "36", "37"))]]
  preds[[namei]] <- partition_leaves(dendi)[[which_node(dendi, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))]]

}

preds
## $ward.D
##  [1] "Silver hake..L(92)"             "Bluefish..L(35)"               
##  [3] "Goosefish..L(53)"               "Cusk..L(45)"                   
##  [5] "Goosefish..XL(56)"              "White hake..L(119)"            
##  [7] "Goosefish..M(54)"               "Goosefish..S(55)"              
##  [9] "Summer flounder..L(110)"        "Bluefish..M(36)"               
## [11] "Buckler dory..M(38)"            "Atlantic halibut..L(16)"       
## [13] "Sea raven..M(90)"               "Sea raven..S(91)"              
## [15] "Atlantic cod..XL(13)"           "Spiny dogfish..L(99)"          
## [17] "Longfin squid..M(63)"           "Blackbelly rosefish..M(30)"    
## [19] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [21] "Fourspot flounder..L(50)"       "Northern shortfin squid..M(71)"
## [23] "Striped bass..M(106)"           "Sea raven..L(89)"              
## [25] "Striped bass..L(105)"           "Bluefish..S(37)"               
## [27] "Weakfish..M(117)"              
## 
## $single
##  [1] "Summer flounder..L(110)"  "Sea raven..L(89)"        
##  [3] "Fourspot flounder..L(50)" "Buckler dory..M(38)"     
##  [5] "Bluefish..M(36)"          "Striped bass..L(105)"    
##  [7] "Bluefish..S(37)"          "Weakfish..M(117)"        
##  [9] "Winter skate..XL(130)"    "Spiny dogfish..L(99)"    
## [11] "Sea raven..S(91)"         "Atlantic cod..XL(13)"    
## [13] "Atlantic halibut..L(16)"  "Sea raven..M(90)"        
## [15] "Spiny dogfish..M(100)"    "Thorny skate..XL(116)"   
## [17] "Goosefish..M(54)"         "Silver hake..L(92)"      
## [19] "Goosefish..S(55)"         "Goosefish..XL(56)"       
## [21] "Cusk..L(45)"              "Bluefish..L(35)"         
## [23] "Goosefish..L(53)"        
## 
## $complete
##  [1] "Striped bass..M(106)"           "Sea raven..L(89)"              
##  [3] "Striped bass..L(105)"           "Bluefish..S(37)"               
##  [5] "Weakfish..M(117)"               "Fourspot flounder..L(50)"      
##  [7] "Northern shortfin squid..M(71)" "Longfin squid..M(63)"          
##  [9] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [11] "Pollock..L(78)"                 "White hake..M(120)"            
## [13] "Red hake..L(82)"                "Silver hake..M(93)"            
## [15] "Atlantic halibut..M(17)"        "Pollock..XL(81)"               
## [17] "White hake..L(119)"             "Silver hake..L(92)"            
## [19] "Cusk..L(45)"                    "Goosefish..XL(56)"             
## [21] "Bluefish..L(35)"                "Goosefish..L(53)"              
## [23] "Goosefish..M(54)"               "Goosefish..S(55)"              
## [25] "Spiny dogfish..M(100)"          "Thorny skate..XL(116)"         
## [27] "Atlantic halibut..L(16)"        "Sea raven..M(90)"              
## [29] "Sea raven..S(91)"               "Atlantic cod..XL(13)"          
## [31] "Spiny dogfish..L(99)"           "Spotted hake..M(103)"          
## [33] "Summer flounder..M(111)"        "Summer flounder..L(110)"       
## [35] "Bluefish..M(36)"                "Buckler dory..M(38)"           
## 
## $average
##  [1] "Spiny dogfish..M(100)"   "Thorny skate..XL(116)"  
##  [3] "Atlantic cod..XL(13)"    "Spiny dogfish..L(99)"   
##  [5] "Sea raven..S(91)"        "Atlantic halibut..L(16)"
##  [7] "Sea raven..M(90)"        "White hake..L(119)"     
##  [9] "Goosefish..M(54)"        "Silver hake..L(92)"     
## [11] "Goosefish..S(55)"        "Goosefish..XL(56)"      
## [13] "Cusk..L(45)"             "Bluefish..L(35)"        
## [15] "Goosefish..L(53)"        "Sea raven..L(89)"       
## [17] "Striped bass..L(105)"    "Bluefish..S(37)"        
## [19] "Weakfish..M(117)"        "Spotted hake..M(103)"   
## [21] "Summer flounder..M(111)" "Summer flounder..L(110)"
## [23] "Bluefish..M(36)"         "Buckler dory..M(38)"    
## 
## $mcquitty
##  [1] "Fourspot flounder..L(50)"       "Northern shortfin squid..M(71)"
##  [3] "Striped bass..M(106)"           "Sea raven..L(89)"              
##  [5] "Striped bass..L(105)"           "Bluefish..S(37)"               
##  [7] "Weakfish..M(117)"               "Longfin squid..M(63)"          
##  [9] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [11] "White hake..L(119)"             "Goosefish..S(55)"              
## [13] "Goosefish..M(54)"               "Silver hake..L(92)"            
## [15] "Goosefish..XL(56)"              "Cusk..L(45)"                   
## [17] "Bluefish..L(35)"                "Goosefish..L(53)"              
## [19] "Spiny dogfish..M(100)"          "Thorny skate..XL(116)"         
## [21] "Atlantic cod..XL(13)"           "Spiny dogfish..L(99)"          
## [23] "Sea raven..S(91)"               "Atlantic halibut..L(16)"       
## [25] "Sea raven..M(90)"               "Spotted hake..M(103)"          
## [27] "Summer flounder..M(111)"        "Summer flounder..L(110)"       
## [29] "Bluefish..M(36)"                "Buckler dory..M(38)"           
## 
## $median
##  [1] "Atlantic cod..L(10)"     "Winter skate..XL(130)"  
##  [3] "Thorny skate..XL(116)"   "Spiny dogfish..M(100)"  
##  [5] "Spiny dogfish..L(99)"    "Atlantic cod..XL(13)"   
##  [7] "Sea raven..S(91)"        "Sea raven..M(90)"       
##  [9] "Atlantic halibut..L(16)" "Goosefish..S(55)"       
## [11] "Goosefish..M(54)"        "Silver hake..L(92)"     
## [13] "Goosefish..XL(56)"       "Cusk..L(45)"            
## [15] "Bluefish..L(35)"         "Goosefish..L(53)"       
## [17] "Summer flounder..L(110)" "Buckler dory..M(38)"    
## [19] "Bluefish..M(36)"         "Sea raven..L(89)"       
## [21] "Striped bass..L(105)"    "Bluefish..S(37)"        
## [23] "Weakfish..M(117)"       
## 
## $centroid
##  [1] "Bluefish..S(37)"         "Atlantic halibut..M(17)"
##  [3] "Pollock..XL(81)"         "Sea raven..L(89)"       
##  [5] "Weakfish..M(117)"        "Spotted hake..M(103)"   
##  [7] "Summer flounder..M(111)" "Thorny skate..XL(116)"  
##  [9] "Buckler dory..M(38)"     "Spiny dogfish..M(100)"  
## [11] "Summer flounder..L(110)" "Bluefish..M(36)"        
## [13] "White hake..L(119)"      "Atlantic cod..XL(13)"   
## [15] "Spiny dogfish..L(99)"    "Sea raven..S(91)"       
## [17] "Sea raven..M(90)"        "Atlantic halibut..L(16)"
## [19] "Goosefish..M(54)"        "Silver hake..L(92)"     
## [21] "Goosefish..S(55)"        "Goosefish..XL(56)"      
## [23] "Cusk..L(45)"             "Bluefish..L(35)"        
## [25] "Goosefish..L(53)"       
## 
## $ward.D2
##  [1] "Summer flounder..L(110)"        "Bluefish..M(36)"               
##  [3] "Buckler dory..M(38)"            "Atlantic halibut..L(16)"       
##  [5] "Sea raven..M(90)"               "Sea raven..S(91)"              
##  [7] "Atlantic cod..XL(13)"           "Spiny dogfish..L(99)"          
##  [9] "White hake..L(119)"             "Goosefish..S(55)"              
## [11] "Bluefish..L(35)"                "Goosefish..L(53)"              
## [13] "Cusk..L(45)"                    "Goosefish..XL(56)"             
## [15] "Goosefish..M(54)"               "Silver hake..L(92)"            
## [17] "Ocean pout..L(73)"              "American plaice..L(4)"         
## [19] "Ocean pout..M(74)"              "Barndoor skate..S(26)"         
## [21] "Windowpane..S(123)"             "Butterfish..S(40)"             
## [23] "Butterfish..XS(41)"             "Fourspot flounder..L(50)"      
## [25] "Northern shortfin squid..M(71)" "Striped bass..M(106)"          
## [27] "Sea raven..L(89)"               "Striped bass..L(105)"          
## [29] "Bluefish..S(37)"                "Weakfish..M(117)"              
## [31] "Longfin squid..M(63)"           "Longfin squid..S(64)"          
## [33] "Northern shortfin squid..S(72)" "Atlantic herring..L(18)"       
## [35] "Atlantic mackerel..M(22)"       "Blackbelly rosefish..M(30)"    
## [37] "Offshore hake..M(76)"           "Buckler dory..S(39)"           
## [39] "Spiny dogfish..S(101)"

Results

Common to all bluefish-containing nodes across clustering methods with updated data:

shortlist <- Reduce(intersect, preds)
shortlist
##  [1] "Silver hake..L(92)"      "Bluefish..L(35)"        
##  [3] "Goosefish..L(53)"        "Cusk..L(45)"            
##  [5] "Goosefish..XL(56)"       "Goosefish..M(54)"       
##  [7] "Goosefish..S(55)"        "Summer flounder..L(110)"
##  [9] "Bluefish..M(36)"         "Buckler dory..M(38)"    
## [11] "Atlantic halibut..L(16)" "Sea raven..M(90)"       
## [13] "Sea raven..S(91)"        "Atlantic cod..XL(13)"   
## [15] "Spiny dogfish..L(99)"    "Sea raven..L(89)"       
## [17] "Bluefish..S(37)"         "Weakfish..M(117)"

Full piscivore list with updated data from “complete” method:

  dendi <- diet_dendlist$complete

  labels(dendi) <- paste(as.character(names(dietoverlap[-1]))[order.dendrogram(dendi)],
                           "(",labels(dendi),")", 
                           sep = "")
  
  pisccomplete <- partition_leaves(dendi)[[which_node(dendi, c("Bluefish..S(37)", "Bluefish..M(36)", "Bluefish..L(35)"))]]
  
  pisccomplete
##  [1] "Striped bass..M(106)"           "Sea raven..L(89)"              
##  [3] "Striped bass..L(105)"           "Bluefish..S(37)"               
##  [5] "Weakfish..M(117)"               "Fourspot flounder..L(50)"      
##  [7] "Northern shortfin squid..M(71)" "Longfin squid..M(63)"          
##  [9] "Longfin squid..S(64)"           "Northern shortfin squid..S(72)"
## [11] "Pollock..L(78)"                 "White hake..M(120)"            
## [13] "Red hake..L(82)"                "Silver hake..M(93)"            
## [15] "Atlantic halibut..M(17)"        "Pollock..XL(81)"               
## [17] "White hake..L(119)"             "Silver hake..L(92)"            
## [19] "Cusk..L(45)"                    "Goosefish..XL(56)"             
## [21] "Bluefish..L(35)"                "Goosefish..L(53)"              
## [23] "Goosefish..M(54)"               "Goosefish..S(55)"              
## [25] "Spiny dogfish..M(100)"          "Thorny skate..XL(116)"         
## [27] "Atlantic halibut..L(16)"        "Sea raven..M(90)"              
## [29] "Sea raven..S(91)"               "Atlantic cod..XL(13)"          
## [31] "Spiny dogfish..L(99)"           "Spotted hake..M(103)"          
## [33] "Summer flounder..M(111)"        "Summer flounder..L(110)"       
## [35] "Bluefish..M(36)"                "Buckler dory..M(38)"

Former piscivore list from Garrison and Link (2000), which I think we can agree is dated and used less refined prey categories:

garlink2000 <- ecodata::species_groupings %>%
  select(COMNAME, SizeCat, Garrison.Link) %>%
  filter(!is.na(Garrison.Link),
         Garrison.Link == "Piscivores") %>%
  mutate(PiscGuild = case_when(COMNAME == "WINTER SKATE" ~ "c",
                               COMNAME == "WEAKFISH" ~ "b", 
                               COMNAME == "BLUEFISH" & SizeCat == "S" ~ "b",
                               TRUE ~ "a")) %>%
  distinct()

garlink2000
##                     COMNAME SizeCat Garrison.Link PiscGuild
## 1              ATLANTIC COD       L    Piscivores         a
## 2              ATLANTIC COD      XL    Piscivores         a
## 3  ATLANTIC SHARPNOSE SHARK       L    Piscivores         a
## 4                  BLUEFISH       S    Piscivores         b
## 5                  BLUEFISH       M    Piscivores         a
## 6                  BLUEFISH       L    Piscivores         a
## 7         FOURSPOT FLOUNDER       M    Piscivores         a
## 8                 GOOSEFISH       L    Piscivores         a
## 9                 SEA RAVEN       S    Piscivores         a
## 10                SEA RAVEN       M    Piscivores         a
## 11              SILVER HAKE       L    Piscivores         a
## 12            SPINY DOGFISH       L    Piscivores         a
## 13             SPOTTED HAKE       M    Piscivores         a
## 14          SUMMER FLOUNDER       M    Piscivores         a
## 15          SUMMER FLOUNDER       L    Piscivores         a
## 16             THORNY SKATE       L    Piscivores         a
## 17             THORNY SKATE      XL    Piscivores         a
## 18                 WEAKFISH       S    Piscivores         b
## 19                 WEAKFISH       M    Piscivores         b
## 20               WHITE HAKE       L    Piscivores         a
## 21             WINTER SKATE       L    Piscivores         c
## 22             WINTER SKATE      XL    Piscivores         c

Discussion/recommendation

We are still filtering the prey list after this to match top prey for bluefish, so the choice of predators is largely intended to balance increasing sample size for modeling bluefish prey with using predators likely to be foraging similarly to bluefish.

One extreme assumption would be to include only bluefish as predators, but there are relatively few bluefish samples due to incomplete availability to bottom trawl surveys. This would miss prey available to bluefish because we have not sampled bluefish adequately.

The opposite extreme assumption would be to include all stomachs that contain any of the top bluefish prey, regardless of which species ate the prey. This would include predators that do not forage similarly to bluefish and might therefore “count” prey that are not actually available to bluefish due to habitat differences.

The intermediate approach which selects a group of piscivores that forage similarly to bluefish both increases sample size and screens out the most dissimilar predators to bluefish. The three possibilities above represent different piscivore groupings. We can compare differences in number of stomachs overall from these three groupings to get an idea of potential sample size tradeoffs among the choices:

fh.nefsc.pisc.garlink2000 <- allfh %>%
  #filter(pynam != "EMPTY") %>%
  left_join(garlink2000, by = c("pdcomnam" = "COMNAME",
                               "sizecat" = "SizeCat")) %>%
  filter(!is.na(Garrison.Link))

 preycount.garlink2000  <- fh.nefsc.pisc.garlink2000 %>%
   #group_by(year, season, pdcomnam, pynam) %>%
   group_by(pdcomnam, pynam) %>%
   summarise(count = n()) %>%
   #arrange(desc(count))
   pivot_wider(names_from = pdcomnam, values_from = count)
 
 shortlistdf <- data.frame("COMNAME" = toupper(str_remove(shortlist, "\\..*")),
                                "SizeCat" = str_remove(str_extract(shortlist, "\\..*[:upper:]+"), "\\.."),
                                "feedguild" = "shortlist")

 
 fh.nefsc.pisc.shortlist <- allfh %>%
  #filter(pynam != "EMPTY") %>%
  left_join(shortlistdf, by = c("pdcomnam" = "COMNAME",
                               "sizecat" = "SizeCat")) %>%
  filter(!is.na(feedguild))
 
 pisccompletedf <- data.frame("COMNAME" = toupper(str_remove(pisccomplete, "\\..*")),
                              "SizeCat" = str_remove(str_extract(pisccomplete, "\\..*[:upper:]+"), "\\.."),
                              "feedguild" = "pisccomplete")
 
 fh.nefsc.pisc.pisccomplete <- allfh %>%
  #filter(pynam != "EMPTY") %>%
  left_join(pisccompletedf, by = c("pdcomnam" = "COMNAME",
                               "sizecat" = "SizeCat")) %>%
  filter(!is.na(feedguild))

Which species are considered in each group, and how does this affect overall sample size 1985-2020?

Predator inclusion

garlink2000df <- garlink2000 %>% 
  mutate(feedguild = "garlink2000") %>%
  select(COMNAME, SizeCat, feedguild)

pisctable <- rbind(garlink2000df, shortlistdf, pisccompletedf) %>%
  mutate(included = 1) %>%
  pivot_wider(names_from = feedguild, values_from = included)

datatable(pisctable, options = list(pageLength = 50))

Sample sizes

Overall number of stomachs in each dataset (does not consider presence of bluefish prey):

  • Garrison Link 2000: 169013

  • Piscivores from full time series, complete algorithm: 260813

  • Shortlist across all clustering algorithms: 78838

Number of stomachs from years 1985-2020:

gl85on <- fh.nefsc.pisc.garlink2000 %>%
  filter(year>1984)

short85on <- fh.nefsc.pisc.shortlist %>%
  filter(year>1984)

pisccom85on <- fh.nefsc.pisc.pisccomplete %>%
  filter(year>1984)
  • Garrison Link 2000: 149887

  • Piscivores from full time series, complete algorithm: 224912

  • Shortlist across all clustering algorithms: 69233

Stations sampled and those with bluefish prey

gencomlist <- allfh %>%
  select(pynam, gencom2) %>%
  distinct()

blueprey <- preycount.garlink2000 %>% #this still contains all bluefish prey just like the other two
  filter(BLUEFISH > 9) %>%
  filter(!pynam %in% c("EMPTY", "BLOWN",
                       "FISH", "OSTEICHTHYES",
                       "ANIMAL REMAINS",
                       "FISH SCALES")) %>%
  #filter(!str_detect(pynam, "SHRIMP|CRAB")) %>%
  left_join(gencomlist) %>%
  filter(!gencom2 %in% c("ARTHROPODA", "ANNELIDA",
                         "CNIDARIA", "UROCHORDATA",
                         "ECHINODERMATA", "WORMS",
                         "BRACHIOPODA", "COMB JELLIES",
                         "BRYOZOA", "SPONGES",
                         "MISCELLANEOUS", "OTHER"))

#tally for garlink (already done)
fh.nefsc.pisc.garlink2000.blueprey <- fh.nefsc.pisc.garlink2000 %>%
  mutate(blueprey = case_when(pynam %in% blueprey$pynam ~ "blueprey",
                              TRUE ~ "othprey"))

preystn.garlink2000 <- fh.nefsc.pisc.garlink2000.blueprey %>%
  group_by(year, season, station) %>%
  count(blueprey) %>%
  pivot_wider(names_from = blueprey, values_from = n) %>%
  filter(year>1984)

#dim(preystn)[1]

bluepreystn.garlink2000 <- preystn.garlink2000 %>% 
  arrange(desc(blueprey)) %>%
  filter(!is.na(blueprey))

#dim(bluepreystn)[1]

# tally for shortlist
fh.nefsc.pisc.shortlist.blueprey <- fh.nefsc.pisc.shortlist %>%
  mutate(blueprey = case_when(pynam %in% blueprey$pynam ~ "blueprey",
                              TRUE ~ "othprey"))

preystn.shortlist <- fh.nefsc.pisc.shortlist.blueprey %>%
  group_by(year, season, station) %>%
  count(blueprey) %>%
  pivot_wider(names_from = blueprey, values_from = n)  %>%
  filter(year>1984)

#dim(preystn)[1]

bluepreystn.shortlist <- preystn.shortlist %>% 
  arrange(desc(blueprey)) %>%
  filter(!is.na(blueprey))

#dim(bluepreystn)[1]

# tally for pisccomplete
fh.nefsc.pisc.pisccomplete.blueprey <- fh.nefsc.pisc.pisccomplete %>%
  mutate(blueprey = case_when(pynam %in% blueprey$pynam ~ "blueprey",
                              TRUE ~ "othprey"))

preystn.pisccomplete <- fh.nefsc.pisc.pisccomplete.blueprey %>%
  group_by(year, season, station) %>%
  count(blueprey) %>%
  pivot_wider(names_from = blueprey, values_from = n) %>%
  filter(year>1984)

#dim(preystn)[1]

bluepreystn.pisccomplete <- preystn.pisccomplete %>% 
  arrange(desc(blueprey)) %>%
  filter(!is.na(blueprey))

#dim(bluepreystn)[1]

Our previous dataset using Garrison and Link (2000) piscivores, 1985 on had:

  • 21931 total stations, 7545 with bluefish prey.
  • Proportion of stations with bluefish prey: 0.3440336

The piscivores complete, 1985 on has:

  • 22186 total stations, 8825 with bluefish prey.
  • Proportion of stations with bluefish prey: 0.3977734

The shortlist dataset, 1985 on has:

  • 17485 total stations, 4925 with bluefish prey.
  • Proportion of stations with bluefish prey: 0.28167

Recommendation

The piscivores complete dataset using Brian’s updated diet similarity matrix seems to better capture predators that feed similarly to bluefish (e.g. striped bass as noted by the WG), and has a higher proportion of stations with bluefish prey than our original dataset. While the shortlist dataset keeps only the predators that always cluster with bluefish no matter what clustering algorithm is applied, it misses predators highlighted by the WG (striped bass) and results in fewer overall stations than the original dataset and a lower proportion of stations with bluefish prey.

My choice would be to use the piscivores complete dataset.

References

Garrison, L., and Link, J. 2000. Dietary guild structure of the fish community in the Northeast United States continental shelf ecosystem. Marine Ecology Progress Series, 202: 231–240. http://www.int-res.com/abstracts/meps/v202/p231-240/ (Accessed 22 October 2018).

Schoener, T. W. 1970. Nonsynchronous Spatial Overlap of Lizards in Patchy Habitats. Ecology, 51: 408–418. https://onlinelibrary.wiley.com/doi/abs/10.2307/1935376 (Accessed 14 February 2022).