Introduction
Maple sap contains several organic and inorganic compounds that contribute to the quality of the maple syrup produced from it. As maple trees approach bud break (when the tip of a plant is visible) during the growing season, metabolic changes produce sap, known as buddy sap, that yields off-flavor maple syrup [1]. Buddy syrup cannot be sold and, because there has previously been no feasible way for farmers to detect the buddy defect until the sap has already been processed into syrup, the money invested in processing it is lost to sugarmakers. When researchers compared the composition of early- and late-season sap, they found elevated levels of amino acids and amino acid derivatives in late-season sap collected close to bud break; these compounds are believed to be involved in the development of off-flavors [1,2]. The potentially significant increases in the concentrations of these molecules not only help to elucidate how buddy sap forms, but also make it possible to detect buddy defects in sap by measuring changes in those concentrations. Specifically, our team has worked to develop methods for farmers to detect three of these molecules: asparagine, sarcosine, and choline. Because previous studies focused primarily on finding causative agents of the buddy defect to better understand the biology, we chose instead to conduct a statistical evaluation of these molecules as potential biomarkers for buddy-ness [1,2]. Using literature data on the concentrations of asparagine, sarcosine, and choline at different stages of sap collection, we evaluated how changes in their concentrations correlate with each other and how sensitive they are to the buddy defect, and we generated threshold concentrations for identifying buddy sap. The code for the models is available on GitLab.
Background
The maple sap that produces buddy syrup (hereafter “buddy sap”) contains elevated levels of amino acids and amino acid derivatives. The concentration of total amino acids appears to increase exponentially close to bud break, as does the proportion of buddy sap (Figures 1 and 2 from Nguyen et al. [1]). The chemicals responsible for the distinct flavor of buddy syrup are not present in the sap itself; they result from numerous thermally driven chemical reactions (collectively known as Maillard reactions) during the evaporation of sap into syrup [2]. Our team decided to look at sarcosine, asparagine, and choline, which are suspected to cause the buddy defect. We chose these molecules because their levels in buddy sap are elevated compared to other amino acids and amino acid derivatives, and because of their chemical effect on syrup quality through the Maillard reactions.
Why did we choose the 3 molecules as our target of analysis?
Sarcosine, an amino acid with a secondary amine group, is involved in one-carbon metabolism, an important pathway that regulates several physiological processes such as nucleotide biosynthesis and amino acid homeostasis [10]. A systems metabolomics study found that, among all 23 amino acids tested, sarcosine had the strongest correlation with the proportion of class 5 syrups (Spearman ρ = 0.59; adjusted P-value = 7e−6; range: 0.01 to 0.12 µM) [1].
Choline is an ethanolamine derivative with three methyl substituents attached to the amino group. Its quaternary ammonium moiety acts as a methyl donor during the heating and evaporation of sap into syrup, which affects taste [2].
Asparagine is found in higher concentrations in late-season sap. This amino acid has been shown to be very efficiently converted to pyrazines in thermally driven chemical reactions [4]. Alkyl pyrazines have been reported to contribute to the buddy flavor of the maple syrup [5]. Pyrazines, such as those reported in late season or buddy sap, have an aftertaste characterized as ‘malty’ and ‘astringent’ [6].
Datasets Used
We utilized published data sets on maple syrup metabolomics in late season maple syrup samples from Garcia et al.[2] along with a data set from Nguyen et al. [1].
The dataset from Nguyen et al.[1] consists of maple sap concentrated by membrane processing and syrup samples (n=62) which were obtained from Québec producers located in a latitude range of 45.9 to 48.0 and longitude range of −72.5 to −68.5 during the spring of 2013, 2016 and 2017 with an emphasis on the end of the flow period. We used the data for concentration of sarcosine and asparagine in the maple sap collected at different time points until termination of maple syrup production. The data on proportion of buddy syrup barrels was used to classify the samples as buddy or not buddy.
The dataset from Garcia et al. [2] consists of maple sap samples (n=282) collected in 2019 over the entire production season by 12 members of the Ontario Maple Syrup Producers Association (OMSPA). These farms covered all the OMSPA regions of Ontario. The dataset reports the concentrations of different metabolites in maple sap collected at different time points until the end of maple syrup production. From it, we used the concentration of choline in maple sap, and we classified samples as buddy or not buddy using the average DTBB value below which buddy syrup was being produced.
Correlation Testing
Understanding the correlation between the levels of these biomarkers helped our hardware team decide whether it is worthwhile to combine several biosensors into one electrode test strip. If the values are uncorrelated, they could potentially be combined to create a more robust metric and would be worth integrating into a single strip. If they are dependent, however, their readouts could not be integrated to create a more sensitive detector. Using published data, we compared choline with asparagine and sarcosine with asparagine by conducting Spearman correlation tests in the statistical tool R. Unfortunately, no data exist in which choline and sarcosine were measured together, so we could not evaluate their correlation; however, because sarcosine is made directly from choline, we considered it likely that their levels are correlated [11].
Main Questions:
- Is there a correlation between levels of asparagine and choline in buddy syrup?
- Is there a correlation between levels of asparagine and sarcosine in buddy syrup?
Hypotheses
Choline vs. Asparagine: We hypothesize that there is no correlation between asparagine and choline in buddy syrup because there is no known mechanistic relationship between these two molecules.
Sarcosine vs. Asparagine: Asparagine and sarcosine are involved in one-carbon metabolism, an important pathway that regulates several physiological processes such as nucleotide biosynthesis and amino acid homeostasis [4]. Given their association in this pathway, we hypothesize that there is a correlation between sarcosine and asparagine levels in buddy syrup.
Methods
Using published data sets on maple syrup metabolomics in late season maple syrup samples from Garcia et al. along with a data set from Nguyen et al., we compared these molecules to examine if there is a correlation between them by using the statistical computing tool R [7][9]. These data sets included the concentrations of various amino acids and molecules from various samples of sap and maple syrup. The Garcia et al. data set, which contained concentrations of choline and asparagine, had these values along with DTBB information while the Nguyen data set contained information on amino acid quantification (sarcosine and asparagine) from different sap barrels. Based on these available data sets, we conducted a Spearman correlation test in R, which we visualized using the ggplot package (Figure I).
The Spearman correlation is a nonparametric assessment that determines the strength and direction of a monotonic relationship between two variables. In a monotonic relationship, as the value of one variable increases, the value of the other variable either consistently increases or consistently decreases. A Spearman correlation therefore helps elucidate the relationship between the two variables. The correlation coefficient (Spearman’s rho or r value) ranges from -1 to 1, with the sign indicating a positive or negative relationship [10]. The p (or probability) value is the probability of observing a correlation at least as strong as the measured one if the two variables were in fact unrelated. P-values range from 0 to 1; a value close to 0 indicates that the observed correlation is unlikely to be due to random chance, while a large value means the data are consistent with no correlation [6].
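To make the definition concrete, the following is a minimal Python sketch of how Spearman's rho can be computed: rank each variable (averaging the ranks of tied values), then take the Pearson correlation of the ranks. It is an illustration only, not the R code used for this analysis; the function names are hypothetical and the numbers in the usage note are made up.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])` is 1 because the relationship is perfectly monotonic even though it is not linear.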
Choline vs. Asparagine:
We merged the data sets from the two papers. We then filtered the data to include only choline and asparagine quantities (ng/mL) for samples measured at different Days to Bud Break (DTBB), a common time scale used by farmers and syrup producers. DTBB is an estimate of how many days remain until a plant's buds open [7][8]. In this data set, values below the minimal detectable level (mdl), which were recorded as "<mdl", were set to zero so that every sample had a numeric value and could be included in the analysis. We were thus able to conduct our correlation test on choline and asparagine quantities in samples that were collected at different DTBB.
Sarcosine vs. Asparagine:
For this analysis, we used data from Nguyen et al. [9]. Here, the concentrations (ng/µL) of the amino acids from various samples of syrup were compared against each other. This data set included various samples of sap in which both sarcosine and asparagine values were calculated. We were thus able to conduct our correlation test on sarcosine and asparagine quantities that were collected from different sap samples (Figure II).
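The "<mdl" adjustment described above could be sketched as follows in Python. This is an illustrative helper, not the original R preprocessing; the function name and the example values are hypothetical.

```python
def parse_concentration(raw):
    """Convert a raw table entry to a float, treating "<mdl" as zero."""
    text = str(raw).strip()
    if text.lower() == "<mdl":
        # Below the minimal detectable level: recorded as 0 for the analysis.
        return 0.0
    return float(text)


# Hypothetical raw entries, for illustration only.
raw_values = ["12.4", "<mdl", "3.05", " <MDL "]
cleaned = [parse_concentration(v) for v in raw_values]
```

Setting non-detects to zero is the simplest choice and it preserves their rank order below every measured value, which is what matters for a rank-based test such as Spearman's.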
Results
The Spearman coefficient (rho) for choline vs. asparagine is 0.18 (Figure 1). The p-value for this correlation is 4.97 × 10⁻⁵, which suggests that the observed correlation is unlikely to be due to chance. According to the statistical literature, a Spearman correlation coefficient between 0 and 0.19 indicates a very weak relationship; hence, there is a statistically significant but very weak correlation between these molecules [10].
On the other hand, the correlation coefficient between sarcosine and asparagine is 0.074 (Figure 2). The corresponding p-value is 0.5365, suggesting that there is no significant correlation in expression levels between these two molecules.
Conclusion
Our data do not support our hypothesis that there is no correlation between choline and asparagine, since we found a statistically significant, although weak, correlation between these two molecules. Our data on sarcosine and asparagine likewise do not support our hypothesis that these two molecules are correlated in buddy syrup. Our logistic regression models for detecting buddy defects can be used to further bolster these results. Since we found only a weak correlation between choline and asparagine and no correlation between sarcosine and asparagine, it is important for our hardware team to create separate diagnostic sensors for each small molecule of interest.
Usefulness to Hardware
Though we found a significant correlation between asparagine and choline, the correlation itself is weak so it is still hard to predict the concentration of one from the other. From this result, our hardware team was able to rule out the initial notion of integrating different sensors into one device.
Likewise, the lack of correlated expression levels between asparagine and sarcosine also helped guide the direction in which the hardware team chose to develop and combine sensors. Here too, the lack of statistical correlation between these two amino acids suggested that it is difficult to combine sensors together and that different detection electrode strips are necessary for each molecule of interest.
While this data is crucial for our team’s future steps, it is also important to understand the specificity and sensitivity of various small molecules and amino acids especially if we wish to predict buddy syrup and its causes. These analyses, which are explained in the Buddy Sap Detection, also have great relevance to our wetlab and hardware team.
Sensitivity Analysis and Threshold Selection
Main Question
What is the likelihood of having buddy syrup based on the quantity of 3 amino acids - sarcosine, choline, and asparagine - in a sap sample?
Hypothesis
We predict that the concentration of choline, asparagine, and/or sarcosine would be good predictor variables for determining buddy syrup outcome.
Methods
Logistic Regression as a Model
Since our response variable – buddy-ness – is a binary variable (it only has two potential values: buddy or not buddy), we decided to use logistic regressions to determine the predictive ability of each biomarker: asparagine, sarcosine, and choline. Our predictor variables are the concentrations of sarcosine, asparagine, and choline, which were scaled and centered. The response variable – buddy-ness – was classified in accordance with the dormancy release index by Nguyen et al. [1] and by Days To Bud Break (DTBB) by Garcia et al. [2].
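As an illustration of this setup, the sketch below centers and scales a single predictor and fits a one-predictor logistic regression by Newton-Raphson in plain Python. It is not the R code used for the models; the function names are hypothetical and the concentrations and labels are made-up values chosen only so that the two classes overlap.

```python
from math import exp
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))


def fit_logistic(z, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * z) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * zi) for zi in z]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z))
        # Entries of the negated Hessian (a symmetric 2x2 matrix).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * zi for wi, zi in zip(w, z))
        h11 = sum(wi * zi * zi for wi, zi in zip(w, z))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the coefficient update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical concentrations and buddy labels (1 = buddy), for illustration only.
concentrations = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
buddy = [0, 0, 1, 0, 1, 0, 1, 1]
z = standardize(concentrations)
b0, b1 = fit_logistic(z, buddy)
```

Because the predictor is standardized, the slope `b1` is the change in log-odds of buddy-ness per one standard deviation of concentration, which makes slopes for different biomarkers easier to compare.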
Indices
The data used for the models were collected from two different papers: Nguyen et al. and Garcia et al. These studies are based on two different indices: the dormancy release index, denoted Sbb in Nguyen et al. [1], and Days To Bud Break (DTBB) in Garcia et al. [2]. The indices are more accurate predictors of bud break than Julian date, since the bud break date varies by region and climate. Both indices were derived using the Raulier and Bernier model, which was developed to predict foliation in trees [11].
Dormancy release is the period when the tree metabolism changes prior to bud break. Dormancy release index or Sbb (range = 15.2 - 791.8) represents the remaining sum of cumulative temperatures necessary to reach bud break. Sbb = 0 corresponds to bud break, and high Sbb values are found early in the season, prior to bud break. DTBB, on the other hand, is the number of days until predicted bud break in Julian date, and normalized across an area. Zero DTBB represents the predicted date of leaf emergence, calculated using meteorological data with the help of the Raulier and Bernier model. The indices help normalize the sap collection season across regions since different regions have different end of the season dates for sap collection.
Assumptions
Before constructing the model, we need to check if our data violates the assumptions of a logistic regression model.
- The response variable is binary. Our outcome is whether the maple syrup is “buddy” or “not buddy”.
- The observations are independent. All sap measurements come from independent samples.
- There are no extreme outliers. There are no extreme outliers or influential observations in the dataset.
- There is a linear relationship between the predictor variables and the logit of the response variable.
Evaluating Sbb as a Measure of Buddy Sap
We examined whether the dormancy release index, or Sbb, is a good predictor of buddy-ness in maple sap, since all of our amino acid concentrations appear to rise close to bud break. We classified a sample as buddy if buddy syrup made up more than 50% of the barrels.
Results
We fit a logistic model of buddy-ness against Sbb using data from Nguyen et al. and plotted both the ranking of the predicted probabilities and Sbb vs. the predicted probability of buddy-ness (Figure 2). Our estimated model coefficients are as follows:
Coefficients | Estimate | Standard Error | Z value | Pr(>\|z\|) |
---|---|---|---|---|
(Intercept) | 4.8372 | 2.0595 | 2.349 | 0.0188 |
Sbb | -0.2238 | 0.0895 | -2.5 | 0.0124 |
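For reference, the coefficients in the table above correspond to the standard logistic form written out below. This is a worked sketch under one assumption: that Sbb enters the model on the same scale on which the coefficients were estimated (the text does not state whether Sbb was rescaled), so the midpoint value is illustrative only.

```latex
\[
P(\text{buddy} \mid S_{bb}) = \frac{1}{1 + \exp\!\left(-\left(4.8372 - 0.2238\, S_{bb}\right)\right)}
\]
\[
P = 0.5 \iff 4.8372 - 0.2238\, S_{bb} = 0 \iff S_{bb} = \frac{4.8372}{0.2238} \approx 21.6
\]
```

The negative slope means that the predicted probability of buddy-ness rises as Sbb falls toward 0, that is, as bud break approaches.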
Evaluating Sarcosine Concentration as a Measure of Buddy Sap
Results
We fit a logistic model of buddy-ness against sarcosine concentration using data from Nguyen et al. and plotted both the ranking of the predicted probabilities and sarcosine concentration vs. the predicted probability of buddy-ness (Figure 3). Our estimated model coefficients are as follows:
Coefficients | Estimate | Standard Error | Z value | Pr(>\|z\|) |
---|---|---|---|---|
(Intercept) | -2.0712 | 0.4928 | -4.203 | 2.64e-05 |
Sarcosine | 1.5617 | 0.4578 | 3.412 | 0.000646 |
Now that we have a binary classification model to work with, it is important to test its diagnostic ability. We used Receiver Operating Characteristic (ROC) curves for this purpose. First, we divided the dataset into two subsets: training and testing. We then fit our model using the training set and evaluated the fit using the testing set. The curve is created by plotting sensitivity against specificity at various threshold settings. The optimal threshold is the one that maximizes Youden’s J statistic (sensitivity + specificity − 1).
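The threshold search can be sketched as follows in plain Python: every distinct predicted probability is tried as a cutoff, sensitivity and specificity are computed on the test set, and the cutoff with the largest Youden's J is kept. The function name is hypothetical, the probabilities and labels are made up for illustration, and a sample is assumed to be called buddy when its predicted probability is at or above the cutoff.

```python
def youden_optimal_threshold(probabilities, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    labels are 1 for buddy and 0 for not buddy; a sample is predicted buddy
    when its probability is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(probabilities)):
        true_pos = sum(
            1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1
        )
        true_neg = sum(
            1 for p, y in zip(probabilities, labels) if p < threshold and y == 0
        )
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Keep the first threshold that achieves the highest J so far.
        if best is None or j > best[1]:
            best = (threshold, j, sensitivity, specificity)
    return best


# Hypothetical test-set predictions, for illustration only.
test_probabilities = [0.1, 0.2, 0.3, 0.5, 0.4, 0.7, 0.9]
test_labels = [0, 0, 0, 0, 1, 1, 1]
threshold, j, sensitivity, specificity = youden_optimal_threshold(
    test_probabilities, test_labels
)
```

Each (sensitivity, specificity) pair visited in the loop is one point on the ROC curve, so the same pass could also be used to draw the curve.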
Using Youden’s J statistic, we determined the optimal cut-off for predicted probability to be 0.6575, corresponding to a concentration of 1.75 ng/mL. At this threshold value, we have the lowest misclassification error (15.38%), with a true positive rate of 33.33% and a true negative rate of 100%. Our results indicate that sarcosine levels are a highly specific metric for changes in sap buddy-ness, though the high false negative rate could be improved. The results also suggest that a cutoff of 0.6575 would flag buddy sap samples without misidentifying normal sap as buddy.
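The rates quoted above follow from the confusion matrix of the test set. The Python sketch below shows the arithmetic; the counts are hypothetical, chosen only because they are one small set of counts that reproduces the reported percentages, and they should not be read as the actual composition of the test set.

```python
def classification_rates(true_pos, false_neg, true_neg, false_pos):
    """Compute misclassification error, true positive rate, and true negative rate."""
    total = true_pos + false_neg + true_neg + false_pos
    return {
        "misclassification": (false_neg + false_pos) / total,
        "true_positive_rate": true_pos / (true_pos + false_neg),
        "true_negative_rate": true_neg / (true_neg + false_pos),
    }


# Hypothetical counts (assumption): 3 buddy and 10 normal samples in the test set.
rates = classification_rates(true_pos=1, false_neg=2, true_neg=10, false_pos=0)
percentages = {name: round(100 * value, 2) for name, value in rates.items()}
```

With these counts the error is 2/13, the true positive rate is 1/3, and the true negative rate is 10/10. A true negative rate of 100% means no normal sap is flagged, while the low true positive rate means most buddy samples are still missed.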
Evaluating Choline Concentration as a Measure of Buddy Sap
Results
We fit a logistic model of buddy-ness against choline concentration using data from Garcia et al. and plotted both the ranking of the predicted probabilities and choline concentration vs. the predicted probability of buddy-ness (Figure 5). Our estimated model coefficients are as follows:
Coefficients | Estimate | Standard Error | Z value | Pr(>\|z\|) |
---|---|---|---|---|
(Intercept) | -0.6785 | 0.2341 | -2.899 | 0.00374 |
Choline | 1.0348 | 0.2475 | 4.181 | 2.9e-05 |
The optimal cut-off from the ROC curve using Youden’s J statistic is 0.679, corresponding to a concentration of 1.31 ng/mL. At this threshold value, we have a misclassification error rate of 20.00%, with a true positive rate of 44.44% and a true negative rate of 100%. The results indicate that choline levels are a highly specific metric for changes in sap buddy-ness, capable of detecting a higher proportion of buddy sap samples than sarcosine, and that the generated cutoff of 0.679 would be effective at discerning buddy sap from non-buddy sap.
Evaluating Asparagine Concentration as a Measure of Buddy Sap
Results
We fit a logistic model of buddy-ness against asparagine concentration using data from Nguyen et al. and plotted both the ranking of the predicted probabilities and asparagine concentration vs. the predicted probability of buddy-ness (Figure 7). Our estimated model coefficients are as follows:
Coefficients | Estimate | Standard Error | Z value | Pr(>\|z\|) |
---|---|---|---|---|
(Intercept) | -2.0253349 | 0.4315629 | -4.695 | 2.66e-06 |
Asparagine | 0.0008959 | 0.0004780 | 1.874 | 0.0609 |
The optimal cut-off from the ROC curve is 0.305, corresponding to a concentration of 2600 ng/mL. At this threshold value, we have a misclassification error rate of 15.38%, with a true positive rate of 33.33% and a true negative rate of 100%. The results indicate that, although the asparagine coefficient is not statistically significant at the 0.05 level (p = 0.0609), possibly due to the small sample size, asparagine is still functional as a specific biomarker.
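Translating a probability cutoff into a concentration amounts to inverting the logistic function and then undoing any centering and scaling of the predictor. The Python sketch below shows the general calculation; the function name is hypothetical, and the coefficients, mean, and standard deviation in the example are made-up values rather than those of the fitted models, whose training-set scaling parameters are not given here.

```python
from math import exp, log


def cutoff_to_concentration(p_cut, intercept, slope, mean=0.0, sd=1.0):
    """Concentration at which the fitted probability equals p_cut.

    The model is P = 1 / (1 + exp(-(intercept + slope * z))), where
    z = (concentration - mean) / sd is the centered and scaled predictor.
    """
    logit = log(p_cut / (1.0 - p_cut))
    z = (logit - intercept) / slope
    return z * sd + mean


# Hypothetical coefficients and scaling values, for illustration only.
concentration = cutoff_to_concentration(0.6, intercept=-2.0, slope=1.5, mean=1.0, sd=0.5)
```

With the defaults `mean=0.0` and `sd=1.0` the function returns the cutoff on whatever scale the predictor had when the model was fit.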
Discussion
Considering the strength of the statistical models for the biomarkers, choline concentration is the best predictor: it has the best-fitting model and the highest true positive rate, meaning it would identify the greatest proportion of buddy sap samples. All three molecules could serve as biomarkers, with sarcosine, choline, and asparagine having diagnostic cutoffs of 1.75, 1.31, and 2600 ng/mL, respectively. All three of these thresholds have 0% false positive rates, meaning farmers could avoid wrongly labeling normal sap as buddy, which would waste good-quality sap. Finally, while all three models have low true positive rates, they would successfully identify up to 44% of buddy sap samples without any misidentification of normal sap, and thus reduce the profit loss from processing buddy sap by up to 44%. These models could likely be refined further, with a more sensitive cutoff, by testing our devices on sap samples in the future to provide a larger dataset.
Usefulness to Hardware/Wetlab
Our wetlab and hardware teams have developed two types of biosensors: an agglutination assay for asparagine using engineered E. coli, and a sarcosine aptasensor, which uses a sarcosine aptamer to detect the level of sarcosine in maple sap. This model will be used to design the test for the buddy defect based on the threshold concentration determined for each amino acid. The concentration values obtained from those biosensors can be used to predict “buddy-ness” in a maple sap sample.
References
- N’guyen, G.Q., Martin, N., Jain, M. et al. A systems biology approach to explore the impact of maple tree dormancy release on sap variation and maple syrup quality. Sci Rep 8, 14658 (2018). https://doi.org/10.1038/s41598-018-32940-y
- Garcia EJ, McDowell T, Ketola C, Jennings M, Miller JD, Renaud JB (2020) Metabolomics reveals chemical changes in Acer saccharum sap over a maple syrup production season. PLoS ONE 15(8): e0235787. https://doi.org/10.1371/journal.pone.0235787
- Ball DW. The Chemical Composition of Maple Syrup. Journal of Chemical Education. 2007; 84(10). https://doi.org/10.1021/ed084p1647
- Amrani-Hemaimi M, Cerny C, Fay LB. Mechanisms of Formation of Alkylpyrazines in the Maillard Reaction. Journal of Agricultural and Food Chemistry. 1995; 43(11):2818–22. https://doi.org/10.1021/jf00059a009
- Hwang H-I, Hartman TG, Ho C-T. Relative Reactivities of Amino Acids in Pyrazine Formation. Journal of Agricultural and Food Chemistry. 1995; 43(1):179–84. https://doi.org/10.1021/jf00049a033
- Perkins TD, van den Berg AK. Maple syrup-production, composition, chemistry, and sensory characteristics. Adv Food Nutr Res. 2009; 56:101–43. Epub 2009/04/25. https://doi.org/10.1016/S1043-4526(08)00604-9 PMID: 19389608.
- Siegel A. Quantification of Alkyl Pyrazines in Maple Syrup. In: Costanza-Robinson M, editor. The Sixth Spring Student Symposium Mc Cardell Bicentennial Hall, Great Hall, Middlebury College; April 19 & 20, 2012; Vermont: Middlebury College; 2012.
- Camara M, Cournoyer M, Sadiki M, Martin N. Characterization and Removal of Buddy Off-Flavor in Maple Syrup. J Food Sci. 2019; 84(6):1538–46. Epub 2019/05/24. https://doi.org/10.1111/1750-3841.14618 PMID: 31120572.
- Calvani R, Picca A, Marini F, Biancolillo A, Gervasoni J, Persichilli S, Primiano A, Coelho-Junior HJ, Bossola M, Urbani A, Landi F, Bernabei R, Marzetti E (2018) A Distinct Pattern of Circulating Amino Acids Characterizes Older Persons with Physical Frailty and Sarcopenia: Results from the BIOSPHERE Study. Nutrients. 6;10(11):1691. https://doi.org/10.3390/nu10111691
- Pietzke M, Meiser J, Vazquez A (2020) Formate metabolism in health and disease Mol Metab. 33:23-37. https://doi.org/10.1016/j.molmet.2019.05.012
- Raulier, F. & Bernier, P. Y. Predicting the date of leaf emergence for sugar maple across its native range. Revue canadienne de recherche forestière 20, 1429–1435 (2000).
- Miller, David. “What Causes Buddy Syrup and What Can Be Done to Prevent It?” Maple Research, Maple Syrup Digest, 1 Mar. 2021, https://mapleresearch.org/pub/buddy0321/.