Interrogation of the perturbed gut microbiota in gouty arthritis patients through in silico metabolic modeling

Abstract Recent studies have shown perturbed gut microbiota associated with gouty arthritis, a metabolic disease characterized by an imbalance between uric acid production and excretion. To mechanistically investigate altered microbiota metabolism associated with gout disease, 16S rRNA gene amplicon sequence data from stool samples of gout patients and healthy controls were computationally analyzed through bacterial community metabolic models. Patient‐specific community models constructed with the metagenomics modeling pipeline, mgPipe, were used to perform k‐means clustering of samples according to their metabolic capabilities. The clustering analysis generated statistically significant partitioning of samples into a Bacteroides‐dominated, high gout cluster and a Faecalibacterium‐elevated, low gout cluster. The high gout cluster was predicted to allow elevated synthesis of the amino acids D‐alanine and L‐alanine and byproducts of branched‐chain amino acid catabolism, while the low gout cluster allowed higher production of butyrate, the sulfur‐containing amino acids L‐cysteine and L‐methionine, and the L‐cysteine catabolic product H2S. By expanding the capabilities of mgPipe to provide taxa‐level resolution of metabolite exchange rates, acetate, D‐lactate and succinate exchanged from Bacteroides to Faecalibacterium were predicted to enhance butyrate production in the low gout cluster. Model predictions suggested that sulfur‐containing amino acid metabolism generally and H2S more specifically could be novel gout disease markers.


Introduction
The human gut microbiota play essential roles in digestion of plant polysaccharides (1,2), synthesis of essential and health-promoting metabolites (3,4), development of host immune response (5) and maintenance of colonization resistance to pathogens (6). The relative abundances of the diverse bacterial taxa that comprise the microbiota can be determined from stool samples through the application of 16S rRNA gene amplicon sequencing (7)(8)(9). As 16S sequencing has become increasingly routine, we have learned that numerous disease processes are correlated with disruptions of gut microbiota composition, also termed dysbiosis (10)(11)(12). Microbiota-associated diseases range from direct ailments of the gut such as inflammatory bowel disease (13) and Clostridioides difficile infection (14), to general metabolic diseases such as diabetes (15) and obesity (16), to systemic aliments such as cardiovascular disease (17), and even to neurological disorders such as depression (18) and Parkinson's disease (19). While studies that correlate changes in microbiota abundances to disease development have revolutionized our understanding of human disease, such compositional-based analyses often provide little information about the underlying mechanisms by which the microbiota may drive and/or respond to disease processes.
Gouty arthritis is a metabolic disease related to the inability of the human host to properly regulate uric acid, a primary metabolite of purine metabolism (20)(21)(22). As the uric acid concentration in blood serum exceeds ~400 mol/L (termed hyperuricemia; (23,24)), susceptible individuals may begin to suffer gout symptoms including painful inflammation due to the deposition of uric acid crystals in joints (25,26). Therapeutic treatments include drugs such as Allopurinol and Febuxostat that reduce host uric acid synthesis, Krystexxa that increases the breakdown of uric acid to urea, Probenecid and Lesinurad which increase uric acid excretion, and a broad array of anti-inflammatory compounds (27,28). Several recent studies in humans (29)(30)(31) All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint and murine models (32-35) correlated changes in gut microbiota composition to the presence of gout disease, suggesting that microbiota properties could be used to monitor disease development, progression and recovery. Several of these 16S-based studies have been combined with gene catalog (36) and metabolomic (29) analyses to better understand the metabolic changes that accompanied compositional dysbiosis. While they provided new insights into the association between gout disease and altered gut microbiota, these studies were inherently limited in their ability to quantify the functional differences between the gut communities of gout patients versus healthy controls.
This in silico computational study was based on the hypothesis that altered gut microbiota were the result rather than the cause of gout disease, as a causative role has not been demonstrated to date. Indeed, uric acid in mainly produced in the liver by nucleic acid catabolism and only about 20% of uric acid production occurs from digestion of purine-rich foods (32-35). Furthermore, only about 30% of host generated uric acid is secreted into and excreted out of the intestine, with the remainder is excreted through the kidneys (35, 37). Although altered uric acid metabolism in the gut microbiota of gout patients has been demonstrated (36), such perturbations are unlikely to be the major cause of elevated uric acid levels in the blood. Consequently, this study examined the possibility of using predicted gut microbiota properties as clinically-relevant signatures of gout disease rather than as treatable disease drivers.
Consistent with this hypothesis, 16S abundance data derived from patient stool samples (36) were used to build sample-specific computational models for identifying microbiotasynthesized metabolites that may be under-or overproduced in gout patients compared to healthy controls. The 16S dataset, which included bacterial taxa abundances for 41 gout patients and 42 healthy controls, was processed using a metagenomics modeling pipeline (mgPipe; (38)) to All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint 6 construct community metabolic models that spanned 50 taxa (48 genera and 2 families). The computational models were simulated using three in silico diets, and the resulting simulation data was subjected to machine learning and statistical analyses to correlate metabolic function and patient type, extract information about metabolite synthesis capabilities at the community and individual taxa levels, predict intra-taxa metabolite crossfeeding relationships and explore the impact of dietary nutrient levels on community metabolism.

Samples clustered by metabolic capability were associated with gouty and healthy patients
Prior to metabolic modeling, the 16S-derived abundance data were analyzed directly to identify community compositional features associated with gouty and healthy sample. Data analysis was limited to samples in which the modeled taxa accounted for at least 90% of the unnormalized abundances (39/41 gouty samples, 39/42 healthy samples; see Materials and Methods). Among the 25 most abundant taxa across the 78 sample, the abundances of six taxa were significantly different (Wilcoxon rank-sum test, FDR < 0.05) between the 39 gouty and 39 healthy samples.
Five taxa including three butyrate producers (Faecalibacterium, Coprococcus, Roseburia) were most negatively correlated with the blood uric acid concentration across the 78 samples (Fig. S1B).
Faecalibacterium was most positively correlated to three butyrate producers (Coprococcus, Roseburia, Subdoligranulum) and Akkermansia as measured by the proportionality coefficient ( Fig. S1C). A principal component plot of the taxa abundances showed no clear delineation of gouty versus healthy samples (Fig. S1D). Taken together, these results support the conclusion in All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Using an in silico European diet (Table S2), the net maximal production rates (NMPCs) predicted for 409 exchanged metabolites across the 78 samples were clustered to investigate if model-predicted metabolic capabilities were associated with sample type. Clustering was performed within MATLAB using the kmeans method with the optimal number of three clusters Interestingly, gout patients have been reported to have elevated abundances of Prevotella intermedia in the oral microbiota (39). Taken together, these results suggested that elevated Bacteroides abundance may result from the gout disease process.
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Metabolic modeling predicted differential synthesis of amino acids and fermentation products in gouty and healthy samples
Due to the small number of samples contained in the Prevotella-elevated cluster, further statistical analyses were focused on the Bacteroides-dominated cluster (also called the high gout cluster) and the Faecalibacterium-elevated cluster (also called the low gout cluster). A rank sum test (FDR < 0.05), which assesses differences in median values, was performed to identify metabolites with the potential to be differentially produced in the high and low gout clusters. To reduce the 106 metabolites identified to a more manageable number, each metabolite also was required to have an average production rate of >10 mmol/day in at least one cluster and to exhibit at least 10% difference between the average production rates in the two clusters. These thresholds ensured that the differential metabolites would have relatively high average production that differed between the two clusters. The resulting set of 42 differentially produced metabolites covered a wide range of metabolic pathways and included the amino acids D-alanine, L-alanine, L-cysteine, L-histidine, L-isoleucine, L-methionine and L-tyrosine as well as common products of gut microbiota fermentation such as butyrate, H2, H2S, isobutyrate, isocaproate, isovalerate and L-lactate ( Figure   S2, Table S6). Interestingly, hypoxanthine was the only the metabolite directly involved in purine metabolism that had the potential to be differentially produced between the two clusters, supporting the hypothesis that the gut microbiota were not the main drivers of gout disease. Similar predictions were obtained when the samples were partitioned directly according to their clinical status (Table S6), suggesting that sample clustering according to metabolite production capabilities captured the dominant metabolic features differentiating gouty and healthy samples.
Further computational analyses were performed on the seven differentially produced amino acids along with six additional amino acids that shared metabolic pathways with these seven amino All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint acids and the eight differentially expressed fermentation products along with seven additional byproducts commonly produced by the gut microbiota. The high gout cluster was predicted to have significantly elevated capabilities for production of alanine, H2 and three products of branchedchain amino acid catabolism (isobutyrate from valine, isocaproate and isovalerate from leucine; Figure 2). By contrast, the low gout cluster was characterized by the potential for significantly elevated production of butyrate, L-lactate, the sulfur-containing amino acids L-cysteine and Lmethionine, the L-cysteine catabolic product H2S, L-isoleucine and its catabolic product 3-methyl-2-oxovaleric acid, L-histidine and L-tyrosine. Model predictions of elevated alanine and reduced butyrate metabolism in gout patients compared to healthy controls were consistent with gene catalog and metabolomic studies (29,31,36).
To further investigate the metabolite production capabilities of the gout-and healthassociated gut communities, the contributions of individual taxa to the maximal synthesis of the differentially produced amino acids and fermentation byproducts were computed. Bacteroides was responsible for enhanced D-alanine, L-alanine and L-histidine production in the high gout cluster samples (Fig. 3), which were characterized by high Bacteroides abundances. The production of Lisoleucine and L-tyrosine, two amino acids not secreted by the Bacteroides metabolic model, were elevated in the low gout cluster due to increased synthesis by more abundant butyrate-producing taxa (Faecalibacterium, Lachnospiraceae, Roseburia, Coprococcus) as well as Megamonas.
Interestingly, Bacteroides was predicted to have similar L-cysteine and reduced L-methionine synthesis capabilities in the high gout cluster despite these samples having relatively high Bacteroides abundances. This metabolic behavior resulted in significantly reduced total production of L-methionine and L-cysteine, which were also synthesized by Faecalibacterium and other butyrate producers, in the high gout cluster. These predictions suggested a possible role for All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint sulfur-containing amino acids in the perturbed microbiota of gout patients. Synthesis of the six non-differentially produced amino acids was similar between the two clusters because elevated synthesis by Bacteroides in the high gout cluster was balanced with increased synthesis by butyrate producers and Megamonas in the low gout cluster (Fig. S3).
Similar analyses for differentially produced fermentation byproducts revealed that the potential for significantly elevated synthesis of H2, isobutyrate, isocaproate, and isovalerate in the high gout cluster was attributable to Bacteroides (Fig. 4). Byproducts not secreted by Bacteroides such as 3-methyl-2-oxovaleric acid, butyrate and L-lactate were predicted to have the potential for significantly elevated production in the low gout cluster due to increased synthesis by more abundant taxa. For example, butyrate was synthesized at higher rates by Faecalibacterium, Roseburia, Coprococcus and Subdoligranulum in the low gout cluster. In addition to the recognized importance of microbiota-derived butyrate for gut health (40, 41) and its previous implication as gout protective (36), these model predictions suggested that butyrate-producing taxa may contribute to the synthesis of other metabolites possibly involved in gut microbiota dysbiosis.
For example, L-cysteine was predicted to have the potential for significantly elevated production in the low gout cluster due to enhanced synthesis by Faecalibacterium and other butyrate producers (Fig. 3). Since H2S is a common byproduct of cysteine degradation (42, 43), the ability of butyrate producers to synthesize L-cysteine could be related to the potential for elevated H2S production in the low gout cluster. These predictions along with previous studies showing H2S as a possible inducer of inflammation (44, 45) suggested a possible role for H2S specifically and sulfur-containing amino acids more generally in the perturbed microbiota of gout patients. As observed with amino acids, maximal production of the seven non-differentially produced byproducts was similar between the two clusters with elevated synthesis by Bacteroides in the All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Metabolite crossfeeding supports differential amino acid and fermentation byproduct synthesis in gouty and healthy samples
The previous model-based analyses identified butyrate and hydrogen sulfide as putative markers of gout disease. To investigate interactions between taxa that supported maximal production of these two metabolites, crossfeeding relationships were identified by finding metabolites which were both secreted by at least one taxa and uptaken by at least one other taxa above a defined threshold (5 mmol/day to focus on the largest contributors). Consistent with being more abundant in the low gout cluster, the five taxa mainly responsible for butyrate production were predicted to synthesize more butyrate in these sample communities (Fig. 4). However, butyrate production was higher than would be expected based on abundance differences between the two clusters. For example, Faecalibacterium was 230% more abundant in the low gout cluster (Fig. S1) yet synthesized 550% more butyrate. Faecalibacterium was predicted to achieve such elevated butyrate production by exploiting the availability of metabolites secreted from other taxa, most notably acetate, CO2, D-lactate and succinate from Bacteroides ( Figure 5). Similarly, Roseburia utilized acetate and D-lactate from Bacteroides and Subdoligranulum utilized D-alanine from Lachnospiraceae. Very different crossfeeding relationships were predicted for maximal production of other fermentation byproducts. For example, Bacteroides exploited the availability of secreted L-alanine and format to achieve elevated D-lactate synthesis in the high gout cluster (Fig. S5). These results demonstrated the inherent metabolic flexibility of gut bacterial communities and suggested that taxa crossfeeding relationships could be highly context dependent. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint Unlike butyrate, H2S was predicted to be synthesized almost exclusively by a single taxa, Bacteroides. Although Bacteroides was 60% more abundant in the high gout cluster than the low gout cluster (Fig. 1), the maximal H2S production rate was 120% higher in the low gout cluster ( Fig. 2). Because H2S is a product of cysteine degradation (46, 47) and maximal production of Lcysteine was elevated in the low gout cluster (Fig. 2), we hypothesized that L-cysteine crossfeeding was mainly responsible for differential H2S production between the two clusters. When H2S production was maximized, L-cysteine synthesis by Faecalibacterium was predicted to be 525% higher in the low gout cluster (Fig. 6), which matched the higher Faecalibacterium abundance in this cluster (Fig. 1). Elevated L-cysteine synthesis resulted in a 50% increase in H2S production by Bacteroides, which also preferentially utilized available D-lactate and L-lactate in the low gout cluster. Interestingly, D-alanine crossfeeding supporting maximal H2S production was predicted to differ dramatically between the clustered samples with Bacteroides consuming the metabolite in the high gout cluster and secreting the metabolite in the low gout cluster. Faecalibacterium was predicted to achieve maximal L-cysteine synthesis through acetate and D-lactate crossfeeding (Fig.   S6). Collectively, these results demonstrated that complex relationships may exist between taxa and their metabolic products due to crossfeeding interactions that can be quantified with the type of metabolic modeling approach used in this study.

Different in silico diets generated subtle changes in community metabolism
Previous simulations were performed by constraining community nutrient uptake rates according to an average EU diet. Diet is known to be strongly associated with gout disease, with high consumption of purine-rich foods such as meat, poultry and seafoods more likely to result in hyperuricemia and eventual gout development (48-50). To investigate the possible effects of dietary nutrients on microbiota metabolism, two other in silico diets were simulated (Table S2). All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
When model-processed abundance data generated with the HPD were clustered, three clusters (silhouette value 0.30) contained the same samples as when clustering was performed with the EU diet ( Fig. S7A,B). However, when a rank sum test was performed to find metabolites with the potential to be differentially produced between the high gout clusters or the low gout clusters obtained with the two diets, 17 metabolites were identified including H2S and two metabolites associated with purine metabolism (guanosine, guanine; Fig. S7C). Otherwise, the two diets generated very similar clustered production of amino acids degradation and fermentation byproducts (Fig. S8). Collectively, these predictions suggested that in silico community metabolism was not strongly affected by changes in nutrient availability resulting from the simulated HPD.
Subtle changes in community metabolism were predicted when the EUD and HFD were compared. When partitioned with three clusters (silhouette value 0.29), the two diets generated different sample clustering with the high gout cluster increased from 26 to 31 samples and the low gout cluster decreased from 44 samples to 41 samples (Fig. 7A). The number of gouty samples in these two clusters also changed (Fig. 7B), but the Bacteroides-dominated cluster remained disproportionally gouty (23/31) compared to the Faecalibacterium-elevated cluster (12/41, p < 10 -sensitivity with TPR = 0.59. A rank sum test performed to find metabolites with the potential to be differentially produced between the high gout clusters or the low gout clusters of the two diets identified 10 metabolites, including three metabolites associated with plant polysaccharide degradation (D-galactose, D-glucose, D-maltose) and the glycolytic intermediate glycerol 3phosphate (Fig. 7C). Interestingly, H2S was no longer differentially produced even though the HFD contained 84% more L-cysteine than the EUD as nutrient constraints. Although not statistically significant, the HFD produced slightly elevated maximal production of L-cysteine, Lhistidine, L-isoleucine, 3-methyl-2-oxovaleric acid and L-lactate and slightly reduced maximal production of L-phenylalanine, H2, isobutyrate and isocaproate (Fig. S9). These predictions suggested subtle enhancements in sulfur-containing amino acid metabolism for the HFD and branched-chain amino acid catabolism for the EUD.
To further explore how dietary nutrients affected maximal H2S production, crossfeeding relationships were identified for the three high gout clusters and for the three low gout clusters generated from the different diets. When the low gout clusters were compared, the HPD was predicted to generate the most H2S due to elevated synthesis by Bacteroides (Fig. S10). Although the HPD contained the most dietary amino acids, the L-cysteine uptake rate by Bacteroides was not substantially different between the three diets. By contrast, the HPD generated only slightly higher H2S production than the HFD in the high gout clusters (Fig. 8). Interestingly, crossfeeding of L-cysteine from Faecalibacterium to Bacteroides was elevated for the HFD. These predictions suggested that H2S production was partially attributable to reactions associated with sulfur metabolism (51) other than L-cysteine degradation. The models predicted substantially reduced Lcysteine crossfeeding and H2S production in the high gout clusters compared to the low gout All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion
Gouty arthritis is a chronic inflammatory joint disease that results from imbalanced purine metabolism in the human host. Recent findings that the gut microbiota are perturbed by gout disease (30, 32-36) motivated this in silico modeling study aimed at identifying putative metabolic features associated with gout development. To this end, bacterial community metabolic models were constructed for 39 gout patients and 39 healthy controls using taxa abundance data generated To gain mechanistic insights into gut metabolic features associated with gout, secreted metabolites with significantly different maximal synthesis rates in the Bacteroides-dominated, high gout cluster and the Faecalibacterium-elevated, low gout cluster were identified for a simulated EU diet. Interestingly, hypoxanthine was the only identified metabolite associated directly with purine metabolism. Of the four purine bases (adenine, guanine, hypoxanthine, xanthine), foods rich in hypoxanthine such as animal and fish meats have been reported to be more strongly associated with gout development (50). This uric acid precursor was elevated in the low All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint gout cluster, suggesting higher rather than lower synthesis capabilities for uric acid, which was not secreted by the metabolic models. By contrast, the low gout cluster was predicted to have elevated synthesis of many metabolites, most notably gut health promoting butyrate and several metabolites associated with sulfur-containing amino acid metabolism including the L-cysteine degradation product H2S. Although altered uric acid metabolism in the gut microbiota of gout patients has been demonstrated (36), the in silico model predictions suggested that such perturbations are unlikely to be the major cause of elevated uric acid levels in the blood and that altered gut microbiota were the result rather than the cause of gout disease.
Detailed analyses of individual taxa contributions to the maximal synthesis of differentially produced amino acids predicted a tradeoff between Bacteroides and butyrate producers such as Enhanced alanine metabolism in gut communities of gout patients has been proposed previously based on gene catalog (36) and metabolomics (29) analyses. Model predictions associated with Lcysteine and L-methionine were novel and particularly interesting since Bacteroides was more abundant in the high gout cluster yet was predicted to have lower maximal synthesis rates of these two sulfur-containing amino acids. Similar analyses performed for maximal synthesis of common fermentation byproducts predicted elevated synthesis of isobutyrate, isocarpoate and isovalerate in the high gout cluster resulting from branched-chain amino acid catabolism. The low gout cluster was predicted to have elevated maximal production of the short-chain fatty acid butyrate, L-lactate and H2S, a common All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint end product of L-cysteine catabolism. The ability of bacterial communities contained in the low gout cluster to generate higher butyrate levels was related to higher abundances of butyrate producers in these samples as well as the crossfeeding of metabolites such as acetate, D-alanine, D-lactate and succinate to the butyrate producers. Butyrate has been widely identified as a gut health promoting metabolite (40, 41), and its reduced production by the gut microbiota has been associated with gout disease in several other studies (31, 36). Unlike some previous studies, the in silico models did not predict that propionate (31, 36) would be reduced or that acetate would be elevated (29) in the gut communities of gout patients. Therefore, the computational predictions support the hypothesis that butyrate is the key short-chain fatty acid associated with gout development and that loss of butyrate producers may an important feature of gout-altered gut microbiota.
Reduced maximal H2S production by Bacteroides in the high gout cluster was consistent with the prediction of lower total L-cysteine synthesis in this cluster. By contrast, the low gout cluster had elevated maximal H2S production due to substantially increased L-cysteine crossfeeding from Faecalibacterium to Bacteroides. While Bacteroides is not typically viewed as an important genus for H2S production (46), many Bacteroides strains process the necessary enzymes for cysteine-to-H2S conversion (52, 53). H2S has been proposed to have both antiinflammatory and proinflammatory effects on the human host depending on a number of increasingly understood factors (54-56), including whether H2S is synthesized endogenously by mucosal epithelial cells or derived from the gut microbiota (43, 57). More specifically, H2S synthesized by the gut microbiota through cysteine degradation has been proposed to promote gut health at relatively low concentrations (47). Since most cysteine-to-H2S conversion in the gut is performed by the microbiota (47), these results suggested that tight regulation of this metabolic All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
More generally, the in silico predictions revealed an interesting gut community behavior where synthesis of a potentially inflammatory metabolite (e.g. H2S) may be supported by a healthpromoting taxa (e.g. Faecalibacterium) crossfeeding a metabolite (e.g. L-cysteine) to a diseasepromoting taxa (e.g. Bacteroides). Collectively, these results suggest that cysteine-to-H2S conversion by the gut microbiota may represent a novel metabolic signature of gout disease.
Because consumption of high-purine containing foods such as meat, poultry and fish is a known correlative to gout disease (48-50), in silico high protein and high fiber diets were simulated and compared to the average EU diet. The high protein diet (HPD) allowed higher amino acid and lower carbohydrate uptake rates and therefore was expected to generate substantially different predictions than the EU diet (EUD). This hypothesis proved to be largely false, although H2S continued to be elevated in the low gut cluster with the HPD. Similar results were obtained when the high fiber diet (HFD) was compared to the EUD, with the exception that sample clustering was altered between the two diets. These inconclusive results may have been attributable to limitations of the in silico approach in which sample community compositions were fixed while dietary nutrients were varied, while in reality different diets would be expected to alter microbiota composition. A more consistent analysis could be performed by having 16S-derived abundance data for both gout patients and healthy controls over a range of known diets. Despite this limitation, one consistent prediction across all three diets was elevated H2S in the low gout clusters. Therefore, these results again supported enhanced cysteine-to-H2S conversion as a novel metabolic signature of gout disease.

Patient Data
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. were provided for each sample (Table S1).

Community Metabolic Modeling
Community metabolic models were restricted to 50 taxonomic groups to limit the computational effort associated with model building, simulation and analysis while ensuring adequate coverage of the 16S OTU read data. The models accounted for the 48 most abundant genera across the 83 samples subject to the requirement that each genus could be modeled using genome-scale metabolic reconstructions available in the Virtual Metabolic Human (VMH) resource ((58); www.vmh.life). Combined reads for Escherichia/Shigella were equally split between the two genera. Because unidentified Lachnospiraceae and Ruminococcaceae accounted for 4.9% and 1.1% of total reads, respectively, these two families were included in the models and combined with unmodelable genera (Lachnobacterium, Anaerosporobacter, Parasporobacterium, Hespellia and Robinsoniella for Lachnospiraceae; Oscillibacter, Anaerofilum, Acetivibrio, Acetanaerobacterium, Sporobacter and Hydrogenoanaerobacterium for Ruminococcaceae). This procedure resulted in 50 modeled taxa that accounted for an average of 97.0% of total OTU reads across the 83 samples (Table S1). All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint For each sample, the OTUs associated with each taxon were summed and then normalized to unity by dividing by the sum of OTUs. These normalized taxa abundances were used to construct sample-specific community metabolic models. The function createPanModels within the metagenomics pipeline (mgPipe; (38)) of the MATLAB Constraint-Based Reconstruction and Analysis (COBRA) Toolbox (59) was used to construct genus-and family-level models from the 818 strain models available in the VMH database. According to function documentation available on the COBRA website, the function createPanModels combined all reactions for strains belonging to the taxon of interest and attempted to remove futile cycles that may result from the combined reactions by making certain reactions irreversible. Because the 16S data did not provide resolution at the species and strain levels, the community metabolic models were unable to account for differences in community composition and function below the genus level. Despite this limitation, the results showed that the pan-genome metabolic models used for community modeling allowed substantial differentiation of samples according to their functional capabilities.
The function initMgPipe was used to construct a community metabolic model for each of the 83 patient samples. Model construction required specification of taxa abundances for each sample and maximum uptake rates of dietary nutrients, which were specified according to EU average, high protein and high fiber diets downloaded from the VMH resource (Table S2). The community models contained an average of 38,570 reactions (minimum 22,099; maximum 57,820). All models contained the same constraints for the maximum nutrient uptake rates specified for the chosen diet, while each model had different constraints imposed for the sample taxa abundances.
Following the model building process, mgPipe automatically performed flux variability analysis (FVA) for each model with respect to each of the 409 metabolites assumed to be All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint 21 exchanged between the microbiota and the lumen and fecal compartments. FVA calculations were performed with the COBRA code fastFVA using the CPLEX linear program solver to either maximize/minimize the production of the metabolite or to maximize/minimize the uptake of the metabolite subject to the additional constraint that the community biomass flux remained in the range 0.4-1.0 mmol/day (60). The FVA results were used to compute the net maximal production capability (NMPC; (60)) of each metabolite by each model. Each NMPC value was calculated as the difference between the objective functions of two computed FVA solutions, the first which maximized metabolite production/secretion into the fecal compartment and the second which minimized metabolite uptake from the lumen compartment. An NMPC value represents the sample-specific potential for community production of a single metabolite given the applied nutrient uptake and biomass flux constraints. The mgPipe framework (38) is based on analysis of metabolite-specific NMPCs across samples to assess the capabilities of modeled communities to differentially produce metabolites. Due to nature of the FVA calculations, the NMPCs do not imply that maximal production of multiple metabolites may be achieved simultaneously. In this study, each NMPC was calculated and analyzed independently for each modeled sample. Furthermore, NMPCs were calculated for the three different diet to assess the possible impact of nutrient levels on community metabolism (Tables S3-S5).
Unfortunately, mgPipe did not offer the capability to directly extract the metabolite synthesis capabilities of each modeled taxa. This information was important to understand which taxa were contributing to metabolite synthesis and the intra-taxa crossfeeding relationships which supported maximal production of a particular metabolite. Therefore, specialized MATLAB scripts were written to utilize available Microbiome Modeling Toolbox functions (e.g. adaptVMHDietToAGORA, guidedSim, useDiet) to perform FVA with respect to selected All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.

Data analysis
Patient data consisted of normalized taxa abundances and model-predicted data consisted of dietdependent NMPCs, both of which could be connected to associated metadata on a sample-bysample basis (Table S1). Data analysis was limited to samples in which the modeled taxa accounted for at least 90% of the unnormalized abundances (39/41 gouty samples, 39/42 healthy samples) to achieve adequate representation of the original 16S gene amplicon data. Both normalized taxa abundances and model-predicted NMPCs were subjected to unsupervised machine learning techniques including clustering and principal component analysis (PCA) to extract putative relationships between partitioned samples and patient gout status. Rather than apply supervised learning to samples partitioned on their known clinical status (i.e. gouty, healthy), unsupervised learning was performed to determine if samples clustered by taxa abundances or NMPCs could be associated with gout status. This approach was applied under the hypothesis that clustering could partially unravel the complex gout disease etiology and reveal at least one cluster with statistically high levels of gouty or healthy samples.
Clustering was performed using the MATLAB function kmeans with the squared Euclidean distance metric, the k-means++ algorithm for cluster center initialization (62) and 1,000 replicates. When clustering was applied to NMPC data generated from the average EU diet, an optimal number of three clusters was determined using the MATLAB function evalclusters with All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint the kmeans clustering method, the silhouette evaluation criterion (63), the sum of absolute difference as the distance measure and 100 replicates. To facilitate subsequent comparisons, three clusters also were used for the high protein and high fiber diets and for clustering of abundance data. For each in silico diet tested, the clustering method proved robust in that the same clustered samples were consistently returned despite the randomness of cluster initialization and the existence of local minima (64).
PCA was performed directly on normalized taxa abundances and model-predicted NMPC data rather than on data preprocessed with sample dissimilarity measures such as the Bray-Curtis (65) or UniFrac (66) metrics. This approach was deemed appropriate since PCA was used for preliminary data visualization and not quantitative data analysis. Statistical significance of associations between categorial variables (e.g. gouty/healthy) across sample groups were assessed using Fisher's exact test (67). Correlations between taxa based on their abundances across samples were calculated using the proportionality coefficient (68), which accounts for the effects of data normalization. Statistically significant differences between NMPCs across samples were assessed using the Wilcoxon rank-sum test (69). The resulting p-values were used to calculate the falsepositive discovery rate (FDR) for each metabolite using the MATLAB function mafdr with the Benjamini-Hochberg method (70). All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.

Individual taxa contributions to maximal synthesis of fermentation byproducts differentially produced between the high and low gout clusters from an average EU diet.
The byproducts shown from top left to bottom right are 3-methyl-2-oxovaleric acid, butyrate, All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint hydrogen, hydrogen sulfide, isobutyrate, isocaproate, isovalerate and L-lactate. For each byproduct, the top five taxa are shown in the order of their total production across the two clusters. diet. (C) Significant differences in maximal metabolite production rates were determined by applying the Wilcoxon rank sum test (FDR < 0.05) to each metabolite across all samples in the two high gout clusters and the two low gout clusters. In addition to being statistically different, each metabolite shown had an average production rate >10 mmol/day in at least one of the All rights reserved. No reuse allowed without permission.

Individual taxa synthesis
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.  All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.  All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.  All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.  All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint Figure S2. Differentially produced metabolites between the high and low gout clusters with an average EU diet. Significant differences in maximal metabolite production rates were determined by applying the Wilcoxon rank sum test (FDR < 0.05) to each metabolite across all samples in the two clusters. In addition to being statistically different, each metabolite shown had an average production rate > 10 mmol/day in at least one cluster and average production rates that differed between the clusters by at least 10%. Metabolite abbreviations are taken from the VMH database (www.vmh.life). Full metabolite names, their associated metabolic pathways and numeric values for their average production rates in each cluster are given in Table S6.
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020.  Each crossfed metabolite shown had at least one taxa which satisfied minimal bounds on the metabolite secretion and uptake rates. For each metabolite, the top five taxa were ordered by the sum of the absolute values of their uptake and secretion rates across the two clusters.
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. Each crossfed metabolite shown had at least one taxa which satisfied minimal bounds on the metabolite secretion and uptake rates. For each metabolite, the top five taxa were ordered by the sum of the absolute values of their uptake and secretion rates across the two clusters.
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint Figure S7. Sample clustering and differentially produced metabolites between average EU and high protein diets. (A) Total samples shared between the three clusters obtained with the average EU diet and the three clusters obtained the high protein diet. (B) Gouty samples shared between the three clusters obtained with the average EU diet and the three clusters obtained the high protein diet. (C) Significant differences in maximal metabolite production rates were determined by applying the Wilcoxon rank sum test (FDR < 0.05) to each metabolite across all samples in the two high gout clusters and the two low gout clusters. In addition to being statistically different, each metabolite shown had an average production rate >10 mmol/day in at least one of the compared clusters and average production rates that differed between the compared clusters by at least 10%. Metabolite abbreviations are taken from the VMH database (www.vmh.life). All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted September 3, 2020. . https://doi.org/10.1101/2020.09.02.20187013 doi: medRxiv preprint 47 Figure S10. Individual taxa synthesis and uptake of crossfed metabolites for maximal H2S production from low gout clusters generated from average European, high protein and high fiber diets. The metabolites shown from top left to bottom right are hydrogen sulfide, L-alanine, L-cysteine, D-lactate, L-lactate and succinate. Each crossfed metabolite shown had at least one taxa which satisfied minimal bounds on the metabolite secretion and uptake rates for at least one diet. For each metabolite, the top five taxa are ordered by the sum of the absolute values of their uptake and secretion rates across the three diets.
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.