--- title: "Cut Flower Farm Survey Analysis" author: "Sonja Glasser" date: '2025-01-25' output: html_document --- # Does cultivated and wild Asteraceae flower abundance correlate to gut pathogen prevelance in the wild common eastern bumble bee (Bombus impatiens)? ```{r setup, include=FALSE} require(here)#telling R where to look for a file require(lme4) # generalized mixed models library(performance) # check model performance library(tidyverse) # language & ggplot library(car) # Anova library(emmeans) #extract the estimated marginal means from the model library(effects)#allEffects() library(AICcmodavg) #compare models library(GGally) # pairwise plots library(lattice) # another graphing package library(MASS)#glm.nb library("glmmTMB") library("bbmle") library(influence.ME)#check for outlier in GLMM library(DHARMa) #model fit library(ggpubr)#ggarrange library(pscl)#zeroinfl() ``` ## Combining dataframes ans summarizing per flower category Data was collected across multiple excel tabs in a master workbook. From this single data frame file I will make totals for type of flower per site as well as totals in each category "cut, border, nscut, sf, tot" explained in more depth below. ```{r} #calling in data data <- read.csv(here("raw_data", "flr.resource.row_cover_submit.csv")) #looking at column names colnames(data) #summary of data summary(data) unique(data$genus) #I need to filter out the non-asteraceae flowers and the "" (empty columns) from the data frame data<-data%>% filter(genus!="cucumis", #remove non-Asteraceae genus genus!="clover", genus!="", genus!="citrillus", genus!="citrullus", genus!="galinsoga") data_v3 <- data %>% filter(!(cut_border == "border" & visit %in% c("1", "2"))) ##aggregated data, I added all the inflorescens/flower heads together and grouped ## by farm, visit and genus. tot_site_visit_flower<- data_v3 %>% group_by(farm, visit, genus, row_cover, sf_nsfcut_border, cut_border)%>% summarise(sum_inflr = sum(tot_inflr, na.rm = TRUE)) #max abundances by farm by cut_border a<- data_v3 %>% group_by(farm, visit, sf_nsfcut_border)%>% summarise(sum_inflr = sum(tot_inflr, na.rm = TRUE)) %>% filter(sf_nsfcut_border!="sf") b<-a %>% group_by(sf_nsfcut_border, farm) %>% summarise(max_cb = max(sum_inflr, na.rm = TRUE), .groups = "drop") c<-b%>% group_by(farm)%>% summarise(sum_tot=sum(max_cb)) # I will take the average per site and create a table for the supplementary text. visit_ave_fwr <- tot_site_visit_flower %>% group_by(farm, genus, cut_border)%>% summarise(mean_inflr = mean(sum_inflr, na.rm = TRUE)) write.csv(visit_ave_fwr, here("data", "visit_ave_fwr_tbl_supp.csv")) ``` ```{r} ##aggregated tot inflr based on site, visit, and sf vs. nsfcut vs. border (the sun flower data) tot_site_visit_sf<-tot_site_visit_flower%>% group_by(farm, visit, sf_nsfcut_border)%>% summarise(tot_sf_nsfcut_border_inflr = sum(sum_inflr)) ##aggregated tot inflr based on site, visit, and cut vs. border tot_site_visit_cut_border<-tot_site_visit_flower%>% group_by(farm, visit, cut_border)%>% summarise(tot_cut_or_border_inflr = sum(sum_inflr)) ## aggregated all asteraceae flowers at each sites tot_site_visit_ast<-tot_site_visit_flower%>% group_by(farm, visit)%>% summarise(tot_ast_inflr = sum(sum_inflr)) #joining dataframes tot_site_visit_flower, tot_site_visit_sf m1<-left_join(tot_site_visit_flower, tot_site_visit_sf)#, #joining dataframes m1,tot_site_visit_cut_border m2<-left_join(m1, tot_site_visit_cut_border) #joining dataframes m2, tot_site_visit_ast m3<-left_join(m2,tot_site_visit_ast) ``` ```{r} # need to spread the column values sf, nsfcut cut and border so that they are in different #columns. I will connect this with the totals columnn and create another csv. # for data analysis / exploration. m3_select <- m3 %>% ungroup() %>% dplyr::select( -genus, -row_cover, -sum_inflr) %>% unique() #pair down to just the unique totals. #first I start with the sunflower related columns. m3_sf<-m3_select%>% dplyr::select(-cut_border, -tot_cut_or_border_inflr, -tot_ast_inflr)%>% spread(key = "sf_nsfcut_border", value = "tot_sf_nsfcut_border_inflr") m3_cut_border<-m3_select%>% dplyr::select(-tot_ast_inflr, -tot_sf_nsfcut_border_inflr, -sf_nsfcut_border)%>% unique()%>% spread(key = "cut_border", value = "tot_cut_or_border_inflr") #pull apart df_tot_ast to only have the totals m3_tot <- m3_select %>% dplyr::select(farm,visit,tot_ast_inflr)%>% unique() #combine dataframes, this is the correct format, but it double the rows. #cut_border + tot_ast m3_cut_border_tot<-left_join(m3_cut_border, m3_tot) # I want to grab just the cut column from the cut_border_tot_u and add it to the sf_tot_u sf<-m3_sf%>% dplyr::select(farm,visit,nsfcut, sf) flwr_abun<-left_join(sf, m3_cut_border_tot)%>% dplyr::select(farm,visit,cut,border,sf,nsfcut,tot_ast_inflr)#this just reorders the columns. ``` We need to add 0 to the sf and nsfcut categories where there weren't flowers observed. ```{r} flwr_abun[is.na(flwr_abun)] <- 0 ``` ### Merge count data with floral data ```{r, echo=FALSE} crit <- read.csv(here("raw_data", "crith2022_submit.csv")) # merge with just Bombusimpatiens crit_imp<-left_join(crit,flwr_abun)%>% #w/o obs. where wings were not measured. filter(sp=="imp", Wing_MM!="NA", crit_cnt!="NA") ``` Now we have a data frame with all the values that we need. Now we need a z scale transformation of all variables in they are on very different numerical scales. ```{r} o <- crit_imp # o is original dataframe no transformations z <- crit_imp%>% mutate(across(c(j_date, cut, border, nsfcut, sf, tot_ast_inflr, Wing_MM), ~c(scale(.)))) z<- z%>% rename(z.date=j_date, z.cut=cut, z.border=border, z.nsfcut = nsfcut, z.sf =sf, z.tot = tot_ast_inflr, z.wing = Wing_MM) #renamed so the columns wouldn't be confused. ``` ### Questions: How does cultivated cut flower abundance affect Crithidia infection prevalence and intensity in wild Bombus impatiens? How do wild Asteraceae flowers (border flowers), sunflowers and total Asteraceae flower abundance (cut and border flowers) affect Crithidia infection prevalence and intensity in wild Bombus impatiens? How do the covariates of bumble bee size and time of season also contribute to Crithidia infection prevalence and intensity in wild Bombus impatiens? ### Hypothesis #### Principal hypothesis : Higher abundance of cut flowers from the Asteraceae family will result in both lowered probability of Crithidia infection presence and lower infection intensity in wild Bombus impatiens. Crithidia infection changes over the course of the season, therefore the structural zeros will be described by the date. #### Secondary hypothesis : Higher abundance of sunflowers, border and total floral abundance from the Asteraceae family will result in both lowered probability of Crithidia infection presence and infection intensity in wild Bombus impatiens. Crithidia infection changes over the course of the season, therefore the structural zeros will be described by the date. ### Statisitical analysis All data analysis was performed in Rstudio. #### Data Exploration ###### response variables * "crit_cnt" = raw crithidia count ###### explanatory variables * "cut" = sum all cut flr inflr * "sf" = sum all sunflower * "nsfcut" = sum all non-sunflower cut flower * "border" = sum all border flr inflr * "tot_ast_inflr" = sum all asteraceae flr inflr ###### transformed explanatory variables *scale()* * "z.cut" = scale(sum all cut flr inflr) * "z.sf" = scale(sum all sunflower) * "z.nsfcut" = scale(sum all non-sunflower cut flower) * "z.border" = scale(sum all border flr inflr) * "z.tot" = scale(sum all asteraceae flr inflr) ###### covariates * "Wing_MM" = wing size * "j_date" = julian date ###### transformed covariate *scale()* * "z.wing" = scale(Wing_MM) * "z.date" = scale(j_date) ###### random factors * "farm" = farm For this project I will use "crit_cnt" or the cell count (without transformation) for the Crithidia infection. All floral abundance values are either in their raw counts, or log10 transformed or scaled. For the data exploration portion of this report I will evaluate which type of data transformation I will apply to the variables before continuing to the modeling process. Because sunflower pollen has been proven to significantly decrease Crithidia infection in bumblebees we will consider sunflower abundance as a separate explanatory variable from cut flower abundance. Only three farms had sunflowers planted and at low abundances. If sunflower abundance has no significant effect, sunflowers and non-sunflower cut flowers will be grouped into the cut flower category. We are also interested in looking at the covariates Wing_mm (length of marginal cell on wing) which is used as a proxy for bee size and Julian date (for how infection levels changes over the season). ###### I. Response variable distribution ```{r, echo=FALSE} par(mfrow=c(1,3)) hist(z$crit_cnt, breaks = 100, main = "Histogram Crithidia cell count", xlab="Crithidia cell count") hist(z$log10_cnt, main = "Histogram \nlog10(Crithidia cell count)", xlab="log10(Crithidia cell count)") boxplot(z$crit_cnt, main="Spread of Crithidia cell count") ``` Looks like the raw Crithidia count numbers would be better modeled with a negative binomial model without using the log10 transformation on the response variable. Summary of the Crithidia cell count data below: ```{r, echo=FALSE} summary(z$crit_cnt) ``` The median and the mean are very different and the max is about 22 x that of the value of the 3rd quartile. These summary values suggest a zero inflated negative binomial model. But I will also check for outliers. ###### II. Outlier in the response variable The spread of this data looks like a clear example of a negative binomial distribution with many zeros and a long thin tail. I wonder is the value of 570 may be acting as an outlier. Although the higher Crithidia cell counts are fewer in number, the count that was 570 may be skewing the data too far. I will take out this cell count and try the models separately with this new data frame. ```{r, echo=FALSE} crit_imp2<-z%>% filter(crit_cnt!=570) ``` I have created another dataframe "crit_imp2" without this potential outlier value of 570. As we can see below there is not much of a difference between the spread of the data, just a shorter tail. ```{r, echo=FALSE} par(mfrow=c(1,3)) hist(crit_imp2$crit_cnt, breaks = 100, main = "Histogram Crithidia cell count") hist(crit_imp2$log10_cnt, main = "Histogram \nlog10(Crithidia cell count)") boxplot(crit_imp2$crit_cnt, main="Spread of Crithidia cell count") ``` Taking out one outlier point did not do so much. I will try to remove influential outlier points using a better criteria than just "it seems really different". Here I use the complete full "global" model that I will describe and fit in a later section to test for outliers in the model. ```{r} crit<-glmmTMB(crit_cnt ~ z.cut*z.date + z.wing + (1|farm), ziformula= ~z.date, family=nbinom2, data = z ) sim_res <- simulateResiduals(fittedModel = crit, n = 1000) # Diagnostic plots plot(sim_res) # uniformity, dispersion, outliers testOutliers(sim_res) # statistical outlier test testDispersion(sim_res) # overdispersion check ``` I will continue to use the full data frame because it looks like there are no outliers! ###### III. Distribution selection It is apparent that there is a very large spread of the data mostly driven by the amount of zeros reported for the Crithidia count data. ```{r, echo=FALSE} mean.crit<- mean(z$crit_cnt) var.crit<- var(z$crit_cnt) thet.crit <- mean.crit^2 / (var.crit - mean.crit) hist(crit_imp$crit_cnt,main="Crithidia cell counts distribution",xlab="crithidia cell counts", right=TRUE,freq=FALSE,col=adjustcolor("darkred",0.5)) lines(0:600, # range of the observed counts dpois(0:600,lambda=mean.crit),lwd=2) lines(0:600,dnbinom(0:600,mu=mean.crit, size = thet.crit),lty=2) mean.crit var.crit ``` The mean of the crithidia count was 24.9 where as the variance was calculated as 3179.7. In the figure above, the solid line and the dotted line represent the poisson distribution and the negative binomial distribution, respectively. These distributions were simulated using sample parameters calculated from the Crithidia cell counts. Visually, the poisson distribution is inappropriate for this data set while the negative binomial distribution looks like a better fit. But, the dotted line does not fit the spread of the data perfectly, which makes sense in light of the fact that 54.5% of the bumblebees presented zero crithidia cells, therefore, a zero inflated negative binomial model would be likely be the most appropriate model. Before confirming which distribution to use I will check the distribution of the explanatory variables and explore different transformations. ###### IV. Explanatory and covariate distribution *Flower abundance* Transforming explanatory variables helps with model fit and ensuring a more equal effect of each variable on the response variable. Below are histograms showing the spread of raw, log10, and scaled floral abundances for the flower types. ```{r, echo=FALSE} par(mfrow=c(3,3)) hist(o$cut, main = "Raw cut flower abundance") hist(z$z.cut, main = "scale(cut flower abundance)") hist(o$nsfcut, main = "Raw non sf cut flower abundance") hist(z$z.nsfcut, main = "scale(non sf cut flower abundance)") hist(o$sf, main = "Raw sunflower flower abundance") hist(z$z.sf, main = "scale(sunflower flower abundance)") hist(o$border, main = "Raw border flower abundance") hist(z$z.border, main = "scale(border flower abundance)") hist(o$tot_ast_inflr, main = "Raw total flower abundance") hist(z$z.tot, main = "scale(total flower abundance)") ``` Looking at the explanatory variables we can see that the scaled versions more closely reflect the distribution of the raw numbers The raw abundances for the explanatory variables span from 300 to 120,000, therefore having them on a similar scale would be the best option. The scaled variables will be used in the models below so that the variance is more even between explanatory variables creating an equal opportunity to predict the variability in the response. I will move forward with the z dataframe Presented below is a pair plot (ggplot function ‘ggpairs’, Wickham 2016). We have Crithidia count, Wing_mm (wing size, proxy for bee body size), cut flower abundance, border flower abundance, sunflower abundance, total flower abundance and Julian date. ```{r, echo=FALSE} ggpairs(z[,c(4,9,11,16:20)]) + theme_bw() ``` It seems like there are significant correlations between Crithidia cell count, Wing_MM and border flower abundance Let's take a closer look at the relationship between crithidia count and the explanatory variables and covariates (ggplot function 'ggplot', Wickham 2016). Plots arranged in single panel (ggpubr function 'ggarrange', Kassambara 2020). ```{r, echo=FALSE, warning=FALSE} c<-ggplot(z, aes(z.nsfcut, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \ncut flower abundance") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("scale(cut flower abundance)") + theme_bw() sf<-ggplot(z, aes(z.sf, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \nsunflower abundance") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("scale(sunflower abundance)") + theme_bw() b<-ggplot(z, aes(z.border, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \nborder flower abundance") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("scale(border flower abundance)") + theme_bw() t<-ggplot(z, aes(z.tot, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \ntotal flower abundance") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("scale(total flower abundance)") + theme_bw() w<-ggplot(z, aes(z.wing, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \nbumblebee size") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("scale(Wing_mm)") + theme_bw() j<-ggplot(z, aes(z.date, crit_cnt))+ geom_point()+ geom_smooth()+ ggtitle("Crithidia count and \nJulian date") + theme(plot.title = element_text(hjust = 0.5))+ ylab("Crithidia cell count") + xlab("Julian date") + theme_bw() ggarrange(c, sf, b, t, w, j + rremove("x.text"), ncol = 3, nrow = 3) ``` For the affect of the explanatory variables, the covariates and the Crithidia cell count does not seem obvious and many of the values are 0 for the crithidia count. Although the "geom_smooth" line was added to look for patterns or a tendency between each separate explanatory variables and the response variable, we see that nearly a flat line can be drawn across along the 0 value of the y axis value (crithidia count). For the cut flowers, border flowers and total flower abundance there seems to be a slightly negative tendency, but this is very speculative. Let's look at mean infection per farm. ```{r} farm_means <- z %>% group_by(farm) %>% summarise( mean_crit_cnt = mean(crit_cnt, na.rm = TRUE), sd_crit_cnt = sd(crit_cnt, na.rm = TRUE), n = n(), se_crit_cnt = sd_crit_cnt / sqrt(n) ) farm_means ``` ```{r} ggplot(z, aes(x = farm, y = crit_cnt)) + # Raw data (lighter so summaries pop) geom_jitter(width = 0.12, height = 0, size = 2, alpha = 0.4, color = "grey40") + # Error bars FIRST (thick & dark) geom_errorbar(data = farm_means, aes(y = mean_crit_cnt, ymin = mean_crit_cnt - se_crit_cnt, ymax = mean_crit_cnt + se_crit_cnt), width = 0.25, linewidth = 1.3, color = "black") + # Mean point LAST (so it sits on top) geom_point(data = farm_means, aes(y = mean_crit_cnt), size = 4, shape = 21, fill = "white", color = "black", stroke = 1.5) + theme_classic(base_size = 14) + labs( x = "Farm", y = expression("Obs. No. of " * italic("C. bombi") * " cells in 0.02µL") ) ``` ```{r} p <- ggplot(z, aes(x = farm, y = crit_cnt)) + # Boxplot (no outlier points — we'll show raw data instead) geom_boxplot(width = 0.6, outlier.shape = NA, linewidth = 1, fill = "white", color = "black") + # Raw data geom_jitter(width = 0.12, size = 1.8, alpha = 0.5, color = "grey40") + theme_classic(base_size = 14) + labs( x = "Farm", y = expression(italic("C. bombi") * " cells in 0.02µL") ) + theme( axis.text = element_text(color = "black"), axis.title = element_text(face = "bold"), axis.line = element_line(linewidth = 0.8) ) p ``` ```{r} ggsave("Farm_boxplot_crit_cnt.tiff", plot = p, width = 6, height = 4, units = "in", dpi = 600, compression = "lzw") ``` ###### V. Fitting model to the distribution In order to confirm that I should be using a negative binomial distributed zero inflated model I will fit the following full and null models; a generalized linear model with a negative binomial distribution (MASS function 'glm.nb', Venables & Ripley, 2002), a generalized linear model with a poisson distribution (stats function, 'glm', R Core Team, 2021) and zero inflated genelarized linear models with both negative and poisson distribution (pscl function 'zeroinfl', Zeileis et al.,2008). Additionally, I check for to see if the structural zeros was best described by julian date. ```{r} distributionList<-list( null_nb = glm.nb(crit_cnt ~ 1, z), null_nb_zi = zeroinfl(crit_cnt ~ 1, data = z, dist = "negbin"), full_nb = glm.nb(crit_cnt ~ z.nsfcut*z.date + z.wing, z ), zi_full_nb = zeroinfl(crit_cnt ~ z.nsfcut*z.date + z.wing|1, data = z , dist = "negbin" ), zi_full_nb_zi.zjul=zeroinfl(crit_cnt ~ z.nsfcut*z.date + z.wing|z.date, data = z , dist = "negbin"), null_poi = glm(crit_cnt ~ 1, data = z, family = "poisson"), full_poi = glm(crit_cnt ~ z.nsfcut*z.date + z.wing, data = z, family = "poisson"), zi_full_poi = zeroinfl(crit_cnt ~ z.nsfcut*z.date + z.wing, data = z, dist = "poisson" ), zi_full_poi_zi.zjul = zeroinfl(crit_cnt ~ z.nsfcut*z.date + z.wing|z.date, data = z, dist = "poisson" ) ) aic.table_dist <-data.frame(AIC= sapply(distributionList,AIC)) aic.table_dist$Delta.AIC<-aic.table_dist$AIC - min(aic.table_dist$AIC) aic.table_dist<-aic.table_dist[order(aic.table_dist$AIC),] aic.table_dist ``` As expected, zero inflated negative binomial models were the best models. Additionally the best model also includes Julian date to account for the structural zeros. ###### VI. VIF In order to test for collinearity between the explanatory variables below are three of the potential full models without interaction terms and quantify their variance inflation factors (car function 'vif', Fox & Weisberg 2019). ```{r} check_collinearity(glm.nb(crit_cnt ~ z.wing + z.date + z.nsfcut + z.border, data=z)) check_collinearity(glm.nb(crit_cnt ~ z.wing + z.date + z.nsfcut + z.sf + z.border, data=z)) check_collinearity(glm.nb(crit_cnt ~ z.wing + z.date + z.tot, data=z)) ``` Using a conservative "cut off" point of 2, it looks like for the three models I would like to try, there will not be an issue with collinearity between explanatory variables (VIF < 2). I will jut check to see if sunflower abundance actually matters, and if it doesn't I will use the "cut flower" category which groups them all together. ### Describe the Model I will be using a zero-inflated negative binomial mixed model. This model has a linear predictor and 2 parts for the error distribution, the binary process (binomial) and the count process (negative binomial). Because infection has been found to change over the season I will use Julian date as the predictor variable for the structural zero portion of the model. Here I will only describe the model from the principal hypothesis. The model presented below states that the: Crithida count = (𝜂𝑖) is affected by the abundance of cut flowers = (𝛽nsfcut𝑋nsfcut𝑖) and the additive covariates time of season (𝛽julian.date𝑋julian.date𝑖) and size of bumblebee = (Wing_mm𝑋Wing_mm𝑖) The variability between sites is also accounted for by using the random factor of farm ID = (1|farm). The random factor farm only assumes that the mean of the response variable are affected by site. The 𝛽0 = When all continuous explanatory variables (cut flower abundance, julian.date and wing_mm) are set to to 0. 𝛽nsfcut = change in 𝜂 with a unit increase in non sunflower cut flower abundance 𝛽julian.date = change in 𝜂 with a unit increase in julian.date. 𝛽Wing_mm = change in 𝜂 with a unit increase in wing marginal cell size (proxy for bee size). Yfarm = is the contrast in 𝜂 between groups (farm sites), estimated hierarchically. For the secondary hypotheses border flowers, sunflowers and total flowers can be substituted for the cut flower abundance in these models. #### Linear predictor: *Count:* 𝜂𝑖 = 𝛽0 + 𝛽nsfcut𝑋nsfcut𝑖 + 𝛽julian.date𝑋julian.date𝑖 + 𝛽Wing_mm𝑋Wing_mm𝑖+ YfarmZfarm𝑖 *Binomial:* ~ julian.date #### Error distribution: *Count process:* 𝑒𝑖~ NB(r, k) *Binary process:* 𝑒𝑖~ Binomial(𝑛, 𝑝𝑖), n = 1 #### Link function *Count process:* log link ln(u) = n *Binary process:* logit link ui= e^ni/1+e^ni ### Model(s) fit and hypothesis testing I will be fitting the full models and compare to the null model and alternative model, which just includes bee size, since we know that bee size is associated with infection. #### Principal model How does cut flower abundance affect Crithidia infection? I fit the global model using the generalized linear mixed zero inflated negative binomial distributed model (glmmTMB function 'glmmTMB', Brooks et al. 2017). ##### Hypothesis testing principal hypothesis: Below I present all of the model variations of the proposed model setting the structural zero portion of the model to the null, which means that I assume no prior knowledge how the structural zeros are explained in this system. All predictor variables are scaled. Cut flower model list: ```{r} cut_full<-glmmTMB(crit_cnt ~ z.nsfcut + z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z ) alt<-glmmTMB(crit_cnt ~ z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z ) cut_null<-glmmTMB(crit_cnt ~ 1 +(1|farm) , ziformula= ~z.date, family=nbinom2, data = z ) ``` ```{r} anova(cut_full, alt, cut_null) ``` We can see that the best model is the alternative model which only includes bee size and time of season. The null model and the full model are not significantly different. Cut abundance does not enhance the model, does not fit the data better than the null. #### Secondary hypotheses As demonstrated above, the best model included z.date to account for the structural zeros, therefore the formula of the principal hypothesis will be duplicated below for the secondary hypotheses, but, the floral abundance explanatory factor will substituted for the floral abundances of other types of Asteraceae flowers that are a subset of the "cut flowers", wild "border flowers" or the total of all Asteraceae flowers. ##### Model selection secondary hypotheses: Below are the models that I have made to account for sunflower presence on farms. Effect of sunflower presence is not my primary question, but because sunflower pollen does significantly reduce Crithidia infection, it is important to test if sunflower abundance influenced Crithidia infection. *Secondary hypothesis models:* Although I was creative in the way that I back estimated border flower abundances, we decided to just analyze farm data from site visit 3, which we did quantify all border flowers. ```{r} z3<-z%>% filter(visit == 3) ``` ```{r} sf_full<-glmmTMB(crit_cnt ~ z.sf+z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z ) #leave this in in case reviewers want to check sunflower abundance, but this dataset includes some estimates (back calculating) for sunflower abudance. border_full<-glmmTMB(crit_cnt ~ z.border+z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z3 ) border_alt<-glmmTMB(crit_cnt ~ z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z3 ) border_null<-glmmTMB(crit_cnt ~ 1 +(1|farm), ziformula= ~z.date, family=nbinom2, data = z3 ) tot_full<-glmmTMB(crit_cnt ~ z.tot+z.date + z.wing +(1|farm), ziformula= ~z.date, family=nbinom2, data = z3 ) #anova(sf_full, alt, cut_null) anova(border_full, border_alt, border_null) anova(tot_full,border_alt, border_null) ``` ```{r} Anova(border_full) ``` ### Model Validation To validate the model fit the following tests were ran: KS test, a dispersion test and an outlier test which will be plotted in a qq plot and a residual plots using the simulated residuals against the observed residuals (DHARMa functions 'simulateResiduals', 'plot', Hartig, 2022). Additional dispersion tests and zero inflation tests will also be ran to further validate that the correct distributions and models types was selected (DHARMa functions, 'testDispersion', 'testZeroInflation', Hartig, 2022). cut_full, cut_alt (the model with only covariates), border_full, tot_full, sf_full, nsfcut_full ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = cut_full, n = 100) plot(simulated.resids.best.model) ``` ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = alt, n = 100) plot(simulated.resids.best.model) ``` ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = border_full, n = 100) plot(simulated.resids.best.model) ``` ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = tot_full, n = 100) plot(simulated.resids.best.model) ``` ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = sf_full, n = 100) plot(simulated.resids.best.model) ``` a bit of deviation here... maybe I don't include the sunflower model... since there was a lot of zeros... maybe I should ask Lynn if I just use the nscut model? ```{r} simulated.resids.best.model <- simulateResiduals(fittedModel = nsfcut_full, n = 100) plot(simulated.resids.best.model) ``` The QQ plot of our best model has non-significant deviations and the observed vs. expected looks "normal". The simulated residuals from the model fits well with the observed data, additionally there are no significant problems detected with the DHARMa residuals and model predictions. ### Interpret the model effect sizes (beta) ```{r} Anova(cut_full, type = 2) summary(cut_full) ``` ```{r} Anova(border_full, type = 2) summary(border_full) ``` ```{r} Anova(tot_full, type = 2) summary(tot_full) ``` According the count process of the zero inflated negative binomial model, for every unit increase of Julian data there is an estimated 0.25 decrease in Crithidia cell count. On the other hand, for every unit increase of bee size there is an estimated 0.27 increase in Crithidia cell count. For the structural zero part of the model or the binary part of the model as Julian date increases it is less likely to that the zero is a structural zero. ## Results Across all sites we found 46% bees with Crithidia infection for Bombus impatiens. The variation of infection was effects by bee size, with larger bee having higher infection levels, and by Julian date, where infection decreased over time and the structural zeros were mostly associated with zero values earlier in the season. There was no significant effect of sites to account for the stochastic portion of the model. None of the floral abundance variables significantly explained the variation observed in Crithidia cell count.