Bananas are one of the most popular fruits in the world. With a market value of 20-25 billion dollars, bananas are crucial to nations such as Ecuador, Colombia, and Costa Rica (Eurostat 2022). They are now threatened by Fusarium Wilt, a pandemic that has caused alarm in every nation that produces bananas, whether for domestic use or for export. Fusarium Wilt threatens 99% of all banana exports and 80% of all banana and plantain production (Pérez-Vicente et al., 2014). Without proper control of the pandemic, Fusarium Wilt will destroy most of the bananas and plantains that many rely on for food (FAO of UN 2022). To further understand the current and future environmental impact of Fusarium Wilt spread, we modeled the global distribution of bananas to identify which environmental factors are most important for growth, then applied the model to future climate projections to gain more insight into banana agriculture in the future.
There are many methods and software packages designed to predict species distributions, but we focused on simple models that anyone can use. We used MaxEnt (short for maximum entropy modeling), open-source software used by ecologists to model species distributions (Elith et al., 2011). The Maxent algorithm predicts a species' distribution from the range of environmental conditions found at known occurrence locations (BCCVL 2021). It uses only the points where the species (in our case, bananas) has been found and correlates them with environmental variables. The occurrence points are coordinate data supplied as a CSV file. The environmental data, on the other hand, are gridded values overlaid onto a map, with each layer corresponding to a certain variable, such as mean annual temperature or mean annual precipitation.
There are two parts to the Maxent model: the calibration/fitting of data and the entropy calculation. First, to calibrate the model and fit it to the species occurrence and environmental data, the algorithm takes sample points from the environmental layers (hereafter referred to as background points) and sets them aside for comparison. Then, it creates two probability density distributions. The first takes the values of the environmental data at the species occurrence locations (presence points) and plots them as a distribution. For example, in Figure 1, the presence points are used to determine which values of temperature and precipitation are most probable in the distribution. Similarly, the second takes the background points and creates a probability density distribution from them. The distribution of the presence points characterizes the environment of the species, while the distribution of the background points characterizes the environment available across the study area.
Using these two distributions, Maxent then calculates the ratio between them to find the relative environmental suitability for the species. The process is run many times to create as many candidate distributions as needed (typically 500), and each distribution is run through the entropy equation. The maximum entropy formula calculates the amount of bias using the probabilities obtained from the distributions. Maxent uses this formula to calculate the bias of each of the 500 candidate distributions, which vary in accuracy, and takes the one with the least bias as the final distribution of bananas. In addition to the distribution map, Maxent can also produce several analysis diagrams showing model performance, accuracy, trends, and logistics.
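The ratio step described above can be illustrated with a minimal sketch. This is not Maxent's actual fitting procedure: it simply bins one environmental variable and divides the presence density by the background density. The function name `relative_suitability`, the bin count, and the synthetic temperature samples are all assumptions made for illustration.

```python
import numpy as np

def relative_suitability(presence, background, bins=10):
    """Histogram-based ratio of presence density to background density."""
    presence = np.asarray(presence, dtype=float)
    background = np.asarray(background, dtype=float)
    # Use shared bin edges so the two densities are directly comparable.
    lo = min(presence.min(), background.min())
    hi = max(presence.max(), background.max())
    edges = np.linspace(lo, hi, bins + 1)
    presence_density, _ = np.histogram(presence, bins=edges, density=True)
    background_density, _ = np.histogram(background, bins=edges, density=True)
    # The ratio is undefined where the background has no samples; mark it NaN.
    ratio = np.full(bins, np.nan)
    has_background = background_density > 0
    ratio[has_background] = (
        presence_density[has_background] / background_density[has_background]
    )
    return edges, ratio

# Synthetic example (hypothetical values, not the project data):
# background temperatures span the study area, presences cluster near 25 °C.
rng = np.random.default_rng(0)
background_temp = rng.uniform(-10.0, 35.0, size=10_000)
presence_temp = rng.normal(25.0, 3.0, size=1_000)
edges, ratio = relative_suitability(presence_temp, background_temp, bins=9)
```

Bins where the ratio is well above 1 hold conditions that are over-represented at presence points relative to the landscape as a whole, which is the intuition behind relative suitability.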
To create a global view of the distribution, the extent of our model covers all landmasses on the globe, including smaller islands. This includes all the environments on Earth, and therefore everywhere farmers could plant bananas. With our model serving as a guide, its processes and framework should be applicable to individual countries or continents.
Species occurrence locations for Cavendish bananas were obtained primarily from the Global Biodiversity Information Facility (GBIF), an archive of species distribution data from around the world (GBIF, 2022). GBIF archives species locations derived from peer-reviewed scientific papers and datasets, which makes obtaining global data simpler. We combed through the downloaded data and removed records that could affect the model negatively (i.e. records with whole-number coordinates, repeated records, and records with no coordinates) (Papes, 2018). In total, 1098 occurrences were used in the model as species occurrence data.
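The cleaning rules above could be sketched as follows. This is an illustrative reconstruction, not the script used for the project. It assumes the download uses the Darwin Core column names `decimalLatitude` and `decimalLongitude`, and it treats a record as having "whole coordinates" only when both values are integers. The function name `clean_occurrences` and the sample rows are hypothetical.

```python
import csv
import io

def clean_occurrences(rows):
    """Drop records with missing, whole-number, or repeated coordinates."""
    seen = set()
    kept = []
    for row in rows:
        lat_text = (row.get("decimalLatitude") or "").strip()
        lon_text = (row.get("decimalLongitude") or "").strip()
        if not lat_text or not lon_text:
            continue  # record has no coordinates
        try:
            lat, lon = float(lat_text), float(lon_text)
        except ValueError:
            continue  # coordinates are not numeric
        if lat.is_integer() and lon.is_integer():
            continue  # whole-number coordinates (assumed to be imprecise)
        if (lat, lon) in seen:
            continue  # repeated location
        seen.add((lat, lon))
        kept.append(
            {"species": row.get("species", ""), "latitude": lat, "longitude": lon}
        )
    return kept

# Hypothetical sample rows standing in for a GBIF download.
sample = """species,decimalLatitude,decimalLongitude
Musa acuminata,10.4321,-84.0075
Musa acuminata,10.4321,-84.0075
Musa acuminata,12.0,-85.0
Musa acuminata,,
Musa acuminata,-1.8312,-79.9345
"""
records = clean_occurrences(csv.DictReader(io.StringIO(sample)))
```

Of the five sample rows, only the first and last survive: one is a duplicate, one has whole-number coordinates, and one has no coordinates.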
Many environmental variables (e.g. temperature, precipitation, elevation) can greatly influence species and habitat distribution (Purohit and Rawat, 2021). The most commonly used are the nineteen bioclimatic variables, biologically meaningful factors that represent annual trends in temperature and precipitation (e.g. Annual Mean Temperature, Mean Temperature of Coldest Quarter, Precipitation of Coldest Quarter) (WorldClim, 2020). The environmental data were downloaded at 2.5 arc-minute resolution from the WorldClim website and converted to ASCII raster format for use in Maxent (Fick and Hijmans, 2017). WorldClim is the most commonly used peer-reviewed database for climate data in species distribution models (SDMs) (Booth, 2018).
In Maxent, highly correlated (similar) environmental variables can have a significant effect on the predicted species distribution and hinder analysis (Wei et al., 2017). To minimize correlation, the Pearson correlation coefficient (r) was calculated for each pair of variables, and highly correlated variables (r > 0.85) were removed from the model. Within each group of highly correlated variables, one was chosen to represent the group based on its predictive power (calculated using jackknife training gain) and biological relevance (Wei et al., 2017; Onderwater, 2020). After eliminating highly correlated variables, seven variables were used in the final model: Annual Mean Temperature (Bio 1), Minimum Temperature of Coldest Month (Bio 6), Mean Temperature of Coldest Quarter (Bio 11), Annual Precipitation (Bio 12), Precipitation of Wettest Month (Bio 13), Precipitation of Driest Month (Bio 14), and Precipitation of Coldest Quarter (Bio 19). All nineteen bioclimatic variables are shown in Table 1.
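One way to sketch the correlation screening is a greedy filter: variables are visited in a priority order (standing in for predictive power and biological relevance), and a variable is kept only if it is not highly correlated with anything already kept. This is an assumed procedure for illustration, not necessarily the exact one used here. It also compares the absolute value of r against the threshold, which is an assumption, and the names `drop_correlated`, `var_a`, `var_b`, and `var_c` are hypothetical.

```python
import numpy as np

def drop_correlated(values, names, priority, threshold=0.85):
    """Keep variables in priority order, skipping any with |r| above the threshold."""
    # Columns of `values` are variables; rows are sampled locations.
    corr = np.corrcoef(values, rowvar=False)
    index = {name: i for i, name in enumerate(names)}
    kept = []
    for name in priority:
        i = index[name]
        if all(abs(corr[i, index[other]]) <= threshold for other in kept):
            kept.append(name)
    return kept

# Synthetic layers (hypothetical): var_b is nearly a copy of var_a,
# while var_c is independent of both.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.1, size=500)
c = rng.normal(size=500)
values = np.column_stack([a, b, c])
names = ["var_a", "var_b", "var_c"]
kept = drop_correlated(values, names, priority=names)
```

Here `var_b` is dropped because it is almost perfectly correlated with `var_a`, which was ranked higher in the priority list.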
Future environmental data were also downloaded from the WorldClim website at 2.5 arc-minute resolution (Fick and Hijmans, 2017). These data were derived from CMIP6 and are available for four Shared Socioeconomic Pathways (SSPs) (WorldClim, 2020). SSPs represent future emission scenarios with varying levels of climate change, the lowest being SSP1-2.6 and the highest being SSP5-8.5 (Hausfather, 2019). We used the SSP1-2.6 and SSP5-8.5 versions (2041-2060) to find the banana distribution at both the lower and upper limits of climate change. We chose 2041 to 2060 as the investigation timeframe for simplicity, as an intermediate step between current climate conditions and those of the far future. However, the timeframe can be set to any period for which projected climate data are available. We then ran the Maxent model to obtain the current and future banana distributions.
Bioclimatic Variable | Abbreviation | Units |
---|---|---|
Annual Mean Temperature | BIO1 | °C |
Mean Diurnal Range | BIO2 | °C |
Isothermality | BIO3 | - |
Temperature Seasonality | BIO4 | °C |
Max Temperature of Warmest Month | BIO5 | °C |
Min Temperature of Coldest Month | BIO6 | °C |
Temperature Annual Range | BIO7 | °C |
Mean Temperature of Wettest Quarter | BIO8 | °C |
Mean Temperature of Driest Quarter | BIO9 | °C |
Mean Temperature of Warmest Quarter | BIO10 | °C |
Mean Temperature of Coldest Quarter | BIO11 | °C |
Annual Precipitation | BIO12 | mm |
Precipitation of Wettest Month | BIO13 | mm |
Precipitation of Driest Month | BIO14 | mm |
Precipitation Seasonality | BIO15 | - |
Precipitation of Wettest Quarter | BIO16 | mm |
Precipitation of Driest Quarter | BIO17 | mm |
Precipitation of Warmest Quarter | BIO18 | mm |
Precipitation of Coldest Quarter | BIO19 | mm |
Here we present the map of the current potential distribution of bananas (Figure 5). Most of the areas with high suitability for bananas fall in tropical rainforest climates (Af) and tropical monsoon climates (Am) under the Köppen climate classification (Afield, 2020). These areas are situated near the equator. Consistent with Figure 4, these areas follow the general trend of high annual temperatures and high annual rainfall.
These climate zones will eventually shift, however, as climate change impacts the world in many different ways. This led us to model the future distribution of bananas under SSP1-2.6 and SSP5-8.5 for 2041 to 2060 (Figure 3). In general, as climate change intensifies, the area suitable for bananas also increases.
Model accuracy and relevance can be assessed using the ‘Area under the ROC Curve’ (AUC) value (Purohit and Rawat, 2021; MedCalc, 2022). The AUC value indicates whether the model’s prediction of the distribution of bananas is better or worse than a random prediction. On a scale from 0 to 1, a value of 0.5 indicates that the model performs no better than random, and a value of 1 indicates a perfect prediction (Purohit and Rawat, 2021). Performance is rated as failing (0.5-0.6), bad (0.6-0.7), reasonable (0.7-0.8), good (0.8-0.9) or great (0.9-1) (Swets, 1988).
The model of the current banana distribution had an AUC value of 0.920 (Figure 2), indicating that the model is ‘great’ and can be used for analysis.
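To make the AUC interpretation concrete, the sketch below computes AUC directly from its definition: the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen background point, with ties counted as half. The rating bands are taken from the text, with each lower bound assumed to be inclusive. The function names `auc` and `rate` and the example scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))

def rate(auc_value):
    """Map an AUC value onto the rating bands quoted in the text."""
    bands = [
        (0.9, "great"),
        (0.8, "good"),
        (0.7, "reasonable"),
        (0.6, "bad"),
        (0.5, "failing"),
    ]
    for lower_bound, label in bands:
        if auc_value >= lower_bound:
            return label
    return "worse than random"

# Hypothetical suitability scores at presence and background points.
presence_scores = [0.9, 0.8, 0.4]
background_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(presence_scores, background_scores)
example_rating = rate(example_auc)
```

In this toy case, 11 of the 12 presence-background pairs are ranked correctly, so the AUC is 11/12 and falls in the ‘great’ band, the same band as the reported value of 0.920.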
Each variable has a different importance to the model. Importance is determined by how much ‘gain’ over a random distribution the variable provides. This is visualized in the jackknife test for test gain (Figure 3). Here, Maxent identified the three most important variables under ‘only variable’: the mean temperature of the coldest quarter (Bio 11), annual precipitation (Bio 12), and the minimum temperature of the coldest month (Bio 6). The test gains under ‘without variable’ show how the model is affected when only that variable is removed. If a variable decreased accuracy, removing it would raise the teal bar past the red bar. Since the red bar shows the most gain, all the variables considered add to the accuracy and specificity of the model.
In addition, the importance of the variables is shown by their percent contribution and permutation importance (Table 2). The variable that contributed most was Bio 12 (42%), followed by Bio 11 (29.3%). By permutation importance, the most important variable by far is Bio 6 (78.8%). Since permutation importance is based on the final model, whereas percent contribution depends on the path the algorithm takes to reach it, permutation importance is a better representation of the importance of each variable.
Variable | Percent contribution (%) | Permutation importance (%) |
---|---|---|
Bio12 | 42 | 8.3 |
Bio11 | 29.3 | 4.6 |
Bio6 | 14.5 | 78.8 |
Bio1 | 5.9 | 1.4 |
Bio19 | 5.6 | 4 |
Bio14 | 2.1 | 1.2 |
Bio13 | 0.5 | 1.7 |
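The idea behind permutation importance in Table 2 can be sketched as follows: the values of one variable are shuffled across the presence and background points, the model is re-scored, and the resulting drop in AUC is recorded and normalized to percentages. The sketch below is an illustration under stated assumptions rather than Maxent's own code. The scoring function stands in for a fitted model, and the names `permutation_importance` and `toy_score`, along with the synthetic data, are hypothetical.

```python
import numpy as np

def auc(pos, neg):
    """Pairwise AUC: fraction of presence-background pairs ranked correctly."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def permutation_importance(score, presence, background, rng):
    """Drop in AUC after shuffling each variable, normalized to percentages."""
    base = auc(score(presence), score(background))
    n_presence = len(presence)
    combined = np.vstack([presence, background])
    drops = []
    for j in range(combined.shape[1]):
        shuffled = combined.copy()
        # Shuffle variable j across presence and background points together.
        shuffled[:, j] = rng.permutation(shuffled[:, j])
        permuted = auc(score(shuffled[:n_presence]), score(shuffled[n_presence:]))
        drops.append(max(base - permuted, 0.0))
    drops = np.array(drops)
    total = drops.sum()
    return 100.0 * drops / total if total > 0 else drops

def toy_score(points):
    """Hypothetical stand-in for a fitted model: only column 0 matters."""
    return points[:, 0]

rng = np.random.default_rng(2)
# Presences differ from the background in column 0 only.
presence = np.column_stack([rng.normal(2.0, 1.0, 200), rng.normal(0.0, 1.0, 200)])
background = np.column_stack([rng.normal(0.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)])
importance = permutation_importance(toy_score, presence, background, rng)
```

Because the toy model ignores the second column, shuffling it changes nothing and all of the importance is assigned to the first column. A variable such as Bio 6 with a very high permutation importance is one the final model leans on heavily in the same sense.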
The model also creates response curves that show how bananas respond to different values of each variable (Figure 4). Each curve comes from a model built solely on that variable, using the same methods described previously. The higher the cloglog output, the more suitable that value of the variable is. Taking Bio 1 as an example, the response curve predicts that bananas are best suited to areas with mean annual temperatures between 10°C and 30°C, compared with a traditional banana’s range of 10°C to 38°C (Turner, 2003). Similarly, Bio 11’s response curve predicts that the best range for the mean temperature of the coldest quarter is around 15°C to 25°C, with suitability dropping off above that. Places with higher mean temperatures in the coldest quarter cannot support bananas as well. On the other hand, Bio 6 shows that bananas grow better as the minimum temperature of the coldest month increases, peaking at around 22°C to 35°C.
Using the response curves, we gained insight into the environments most suitable for bananas. We first focused on the minimum temperature of the coldest month (Bio 6), which may heavily affect the viability of our bacteria. The results showed suitability increasing as the temperature rose, with the most favorable range between 20°C and 35°C, indicating that most bananas are grown within this range for maximum yields (Figure 4b). This shows that as long as the bacteria can survive temperatures of 20°C and above, they should work well as a chassis. The response curves also helped us choose our bacterial chassis (B. subtilis), whose optimal temperature range is around 25°C to 37°C (Sidorova et al., 2020). We also used the temperatures indicated in the response curves to calculate the affinities of malic acid and mleR for our Toxin-Antitoxin model, and we used the response curves for the precipitation of the driest month (Bio 14) and annual mean temperature (Bio 1) to design our hardware, adapting it to regions with these parameters.
Our model showed that in the next 20 to 40 years, climate change will significantly increase the area suitable for banana growth. Under a future where carbon emissions are highly controlled and limited (SSP1-2.6), an expected area of 3.8 million km² becomes suitable for growing bananas (not accounting for cities, protected areas, wilderness, etc.). On the other hand, in the future we are currently heading towards (SSP5-8.5), an additional expected area of 5 million km² becomes suitable. The expansion of suitable areas into food-scarce regions such as Nigeria and South Sudan provides a possible solution to the food crisis there today. Over 70 million Africans rely on the plant for food security and income, with the plant providing up to 35% of their daily calories (Stellenbosch, 2013). With climate change, more bananas can be grown in this region, providing essential nutrients and calories to millions of starving people. This situation is not limited to Africa: many malnourished people in India may also turn to bananas as a source of nutrients. Therefore, liquid inoculant can be applied in countries that have both large potential growing areas and large malnourished populations to improve food security. These areas need bananas the most and will suffer most if Fusarium Wilt spreads into the region. Avoiding and keeping up with Fusarium Wilt requires an upfront cost of US$50 per hectare plus US$5 per hectare per year for surveillance, money that many farmers lack (Staver et al., 2020). If Fusarium Wilt does spread, it can cause widespread damage and financial loss for many local farmers. In our economic model, we estimated the economic damage of TR4 in Taiwan using both a holistic approach and a local approach from the perspective of farmers. We used the predicted banana-growing area to estimate the growth of the banana economy.
Our bacteria offer an alternative for these farmers, especially in future suitable areas without infrastructure. This saves farmers money and buys time to develop more sustainable methods of growing bananas. Our bacterial solution acts as a stepping stone between the current banana monoculture and future genetically diverse bananas.
For future iGEM teams, our Maxent model provides a precedent for using species distribution modeling to learn more about a particular species. We encourage teams to use Maxent to map the distribution of other species, assess the effects of climate change, and gather habitat data. The model can be taken in many directions, with different parameters, different species, and different countries. We have included all the files we used so that the model is easier to use and build upon.
Using the Maxent software, we created suitability maps for the distribution of bananas and, in doing so, found the extent of the damage Fusarium Wilt could potentially do. We found that it is imperative to find a solution to save bananas for the future. Our solution will stall the current pandemic to allow time for sustainable, genetically diverse bananas to be developed.
Because it uses annual trends in temperature and precipitation, our model does not consider other factors such as invasions and outbreaks, extreme weather events (droughts, freezes, storms, floods), or fires. It is very difficult to predict such events years in advance, which highlights the need for further research into predicting suitability and species distribution.
Banana Production in Africa. (n.d.). Stellenbosch University.
Bioclimatic variables - WorldClim 1 documentation. (n.d.).
Booth, T. H. (2018). Why understanding the pioneering and continuing contributions of BIOCLIM to species distribution modelling is important. Austral Ecology, 43(8), 852–860.
Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17(1), 43–57.
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315.
Hausfather, Z. (2019, December 2). CMIP6: The next generation of climate models explained. Carbon Brief.
Köppen climate classification | Definition, System, & Map | Britannica. (n.d.).
Maxent. (n.d.). BCCVL.
Onderwater, N. (2020, April). A Global Banana Map: Disaggregating National Production Statistics Through Land Use Analysis and Land Suitability Evaluation. Wageningen University.
Papes, M. (2018, June 15). Maxent Introduction.
Pérez-Vicente, L. F., Dita, M., & Martinez de la Parte, E. (2014). Technical Manual: Prevention and diagnostic of Fusarium Wilt (Panama Disease) of banana caused by Fusarium oxysporum f. sp. cubense Tropical Race 4 (TR4).
Purohit, S., & Rawat, N. (2022). MaxEnt modeling to predict the current and future distribution of Clerodendrum infortunatum L. under climate change scenarios in Dehradun district, India. Modeling Earth Systems and Environment, 8(2), 2051–2063.
Schoonjans, F. (n.d.). ROC curve analysis. MedCalc.
Sidorova, T. M., Asaturova, A. M., Homyak, A. I., Zhevnova, N. A., Shternshis, M. V., & Tomashevich, N. S. (2020). Optimization of laboratory cultivation conditions for the synthesis of antifungal metabolites by Bacillus subtilis strains. Saudi Journal of Biological Sciences, 27(7), 1879–1885.
Staver, C., Pemsl, D. E., Scheerer, L., Perez Vicente, L., & Dita, M. (2020). Ex Ante Assessment of Returns on Research Investments to Address the Impact of Fusarium Wilt Tropical Race 4 on Global Banana Production. Frontiers in Plant Science, 11.
Swets, J. A. (1988). Measuring the Accuracy of Diagnostic Systems. Science, 240(4857), 1285–1293.
Turner, D. W. (1985). Bananas - response to temperature. NSW Agriculture.
Wei, J., Zhang, H., Zhao, W., & Zhao, Q. (2017). Niche shifts and the potential distribution of Phenacoccus solenopsis (Hemiptera: Pseudococcidae) under climate change. PLOS ONE, 12(7), e0180913.
What is GBIF? (n.d.). GBIF.