MaxEnt Model

Overview

Bananas are one of the most popular fruits in the world. With a market value of 20-25 billion dollars, bananas are crucial to nations such as Ecuador, Colombia, and Costa Rica (Eurostat 2022). They are now threatened by Fusarium Wilt, a pandemic that has caused alarm in every nation that produces bananas, whether for domestic use or for export. Fusarium Wilt threatens 99% of all banana exports and 80% of all banana and plantain production (Perez-Vicente et al, 2014). Without proper control of the pandemic, Fusarium Wilt will destroy most of the bananas and plantains that many rely on for food (FAO of UN 2022). To better understand the current and future environmental context of Fusarium Wilt spread, we modeled the global distribution of bananas to see which environmental factors matter most for growth, then applied the model to future climate projections to gain insight into the future of banana agriculture.

MaxEnt Model Description

There are many methods and software packages designed to predict species distributions, but we focused on simple models that anyone can use. We used MaxEnt (short for maximum entropy modeling), an open-source software used by ecologists to model species distribution (Elith et al. 2011). The Maxent algorithm predicts species distribution from the range of environmental conditions at known species occurrence locations (BCCVL 2021). It uses only the points where the species (in our case bananas) has been found and correlates them with environmental variables. The occurrence points are coordinates supplied as a CSV file. The environmental data, on the other hand, are gridded map layers, each holding the values of one variable, such as mean annual temperature or mean annual precipitation.
There are two parts to the Maxent model: the calibration/fitting of data and the entropy calculation. First, to calibrate the model and fit it to species occurrence and environmental data, the algorithm samples points from the environmental layers (hereafter referred to as background points) and sets them aside for comparison. Then, it creates two probability density distributions. The first takes the values of the environmental data at species occurrence locations (presence points) and builds a distribution of those values. For example, in Figure 1, the presence points are used to determine which values of temperature and precipitation are most probable. Similarly, the second distribution is built from the environmental values at the background points. The distribution at the occurrence points characterizes the environment of the species, while the distribution at the background points characterizes the environment available across the study area.

Figure 1. How Maxent calculates probability density distributions for species occurrence locations (top) and background points (bottom). Referenced from BCCVL, 2021.

Using these two distributions, Maxent then calculates the ratio between them to find the relative environmental suitability for the species. The process is then run many times to create as many candidate distributions as needed (typically 500), and each distribution is run through the entropy equation. The maximum entropy formula calculates the amount of bias using the probabilities obtained from the distributions. Maxent uses this formula to calculate the bias of the 500 distributions, which vary in accuracy, and takes the single distribution with the least bias as the final distribution of bananas. In addition to the distribution map, Maxent can also produce several analysis diagrams showing model performance, accuracy, trends, and logistics.
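The ratio and entropy steps described above can be illustrated with a minimal sketch. It assumes simple histogram bins stand in for the smoothed densities that Maxent actually fits, and the temperature values, bin edges, and function names are all hypothetical; it is not the Maxent implementation.

```python
import math

def histogram_density(values, edges):
    """Return the fraction of values falling in each bin defined by edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            # The last bin is closed on the right so the maximum edge is included.
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def suitability_ratio(presence, background, edges):
    """Presence density divided by background density, per bin.

    Bins with no background points get 0.0 (an assumption of this sketch).
    """
    presence_density = histogram_density(presence, edges)
    background_density = histogram_density(background, edges)
    return [p / b if b > 0 else 0.0
            for p, b in zip(presence_density, background_density)]

def shannon_entropy(probabilities):
    """Shannon entropy H = -sum(p * ln p), skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

# Hypothetical annual mean temperatures (°C) at presence and background points.
presence_temps = [24, 25, 26, 27, 26, 25, 23, 28]
background_temps = [2, 8, 12, 15, 18, 22, 25, 27, 5, 30]
bin_edges = [0, 10, 20, 30]

ratio = suitability_ratio(presence_temps, background_temps, bin_edges)
print(ratio)
```

In this toy data every presence point falls in the warmest bin, so only that bin receives a ratio above zero: the species is over-represented there relative to what the background offers.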

Materials and Methods

To create a global view of the distribution, the extent of our model covers all landmasses on the globe, including smaller islands. This includes every environment on Earth where farmers could plant bananas. With our model serving as a guide, the same process and framework should be applicable to individual countries or continents.

Species occurrence locations for Cavendish bananas were obtained primarily from the Global Biodiversity Information Facility (GBIF), an archive of species distribution data from around the world (GBIF, 2022). GBIF archives species locations derived from peer-reviewed scientific papers and datasets, which makes obtaining global data simpler. We combed through the downloaded data and removed records that could affect the model negatively (i.e. records with whole-number coordinates, repeated records, and records with no coordinates) (Papes, 2018). In total, 1098 occurrences were used as species occurrence data in the model.
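The cleaning rules above could be sketched as follows. This is an illustration, not the script we ran: the column names `decimalLatitude`, `decimalLongitude`, and `species` follow the Darwin Core terms used in GBIF downloads, the sample records are made up, and treating a record as "whole-number" only when both coordinates are integers is an assumption.

```python
import csv
import io

def clean_occurrences(rows):
    """Drop records with missing, whole-number, or repeated coordinates."""
    seen = set()
    kept = []
    for row in rows:
        lat_text = (row.get("decimalLatitude") or "").strip()
        lon_text = (row.get("decimalLongitude") or "").strip()
        if not lat_text or not lon_text:
            continue  # record has no coordinates
        try:
            lat, lon = float(lat_text), float(lon_text)
        except ValueError:
            continue  # coordinates are not numeric
        if lat.is_integer() and lon.is_integer():
            continue  # whole-number coordinates suggest low precision (assumed rule)
        if (lat, lon) in seen:
            continue  # repeated record
        seen.add((lat, lon))
        kept.append({"species": row.get("species", ""),
                     "longitude": lon, "latitude": lat})
    return kept

# Hypothetical sample standing in for a GBIF download.
sample_csv = """species,decimalLatitude,decimalLongitude
Musa acuminata,10.4512,-84.0071
Musa acuminata,10.4512,-84.0071
Musa acuminata,12,-85
Musa acuminata,,
Musa acuminata,-2.1937,-79.8865
"""

cleaned = clean_occurrences(csv.DictReader(io.StringIO(sample_csv)))
for record in cleaned:
    print(record)
```

Of the five sample records, one duplicate, one whole-number record, and one record without coordinates are dropped, leaving two.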

Many variables affect species distribution (e.g. temperature, precipitation, elevation) and can greatly influence habitat distribution (Purohit and Rawat, 2022). The most commonly used are the nineteen bioclimatic variables, biologically meaningful factors that represent annual trends in temperature and precipitation (e.g. Annual Mean Temperature, Mean Temperature of Coldest Quarter, Precipitation of Coldest Quarter) (WorldClim, 2020). The environmental data were downloaded at 2.5 arc minute resolution from the WorldClim website and converted to ASCII raster format for use in Maxent (Fick and Hijmans, 2017). WorldClim is the most commonly used peer-reviewed database for climate data in species distribution models (SDMs) (Booth, 2018).
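To make the raster step concrete, here is a minimal sketch of the ESRI ASCII grid layout (a six-line header followed by rows of cell values) and of the grid size implied by a 2.5 arc minute resolution. The function names and the tiny example grid are hypothetical; in practice a GIS tool performs this conversion.

```python
def grid_shape(resolution_arcmin, lon_span=360.0, lat_span=180.0):
    """Rows and columns of a global grid at the given resolution in arc minutes."""
    ncols = round(lon_span * 60 / resolution_arcmin)
    nrows = round(lat_span * 60 / resolution_arcmin)
    return nrows, ncols

def to_ascii_grid(rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Serialize a 2D list (top row first) as ESRI ASCII grid text.

    Cells given as None are written as the NODATA value.
    """
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(nodata if v is None else v) for v in row) for row in rows]
    return "\n".join(header + body) + "\n"

# A global layer at 2.5 arc minutes has 4320 rows and 8640 columns.
print(grid_shape(2.5))

# Tiny hypothetical layer: None marks ocean cells with no data.
example = to_ascii_grid([[1, 2, None], [4, 5, 6]], -180.0, -90.0, 2.5 / 60)
print(example)
```

All layers passed to Maxent need the same extent and cell size, which is why the header values matter as much as the cell values.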

In Maxent, highly correlated (similar) environmental variables can have a significant effect on the predicted species distribution and hinder analysis (Wei et al., 2017). To minimize correlation, the Pearson correlation coefficient (r) was calculated for each pair of variables, and highly correlated variables (r > 0.85) were removed from the model. From each group of highly correlated variables, one was chosen to represent the group based on its predictive power (calculated using jackknife training gain) and biological relevance (Wei et al., 2017; Onderwater, 2020). After eliminating highly correlated variables, seven variables were used for the final model: Annual Mean Temperature (Bio 1), Minimum Temperature of Coldest Month (Bio 6), Mean Temperature of Coldest Quarter (Bio 11), Annual Precipitation (Bio 12), Precipitation of Wettest Month (Bio 13), Precipitation of Driest Month (Bio 14), and Precipitation of Coldest Quarter (Bio 19). All nineteen variables are shown in Table 1.
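The correlation screen can be sketched as below. Several parts are assumptions made for illustration: the threshold is applied to the absolute value of r, variables are visited in a hypothetical priority order (standing in for the ranking by jackknife gain and biological relevance), and the sample values are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(variables, priority, threshold=0.85):
    """Greedily keep variables, in priority order, that are not highly
    correlated (|r| > threshold) with any variable already kept."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(variables[name], variables[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five locations.
samples = {
    "bio1": [10, 15, 20, 25, 27],
    "bio11": [2, 9, 15, 22, 25],
    "bio12": [1500, 600, 2400, 900, 2000],
}

selected = drop_correlated(samples, priority=["bio11", "bio12", "bio1"])
print(selected)
```

Here `bio1` tracks `bio11` almost perfectly, so it is dropped once `bio11` has been kept, while `bio12` is retained because it is only weakly correlated with `bio11`.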

Future environmental data were also downloaded from the WorldClim website at 2.5 arc minute resolution (Fick and Hijmans, 2017). These data were derived from CMIP6 and created for four Shared Socioeconomic Pathways (SSPs) (WorldClim, 2020). SSPs represent future emission scenarios with varying levels of climate change, the lowest being SSP1-2.6 and the highest being SSP5-8.5 (Hausfather, 2019). We used the SSP1-2.6 and SSP5-8.5 versions (2041-2060) so that our predictions cover both the lower and upper limits of climate change. We chose 2041 to 2060 as the investigation timeframe for simplicity, as an intermediate step between current climate conditions and those of the far future. However, the timeframe can be set to any period for which projected climate data are available. We then ran the Maxent model to obtain the current and future banana distributions.

Bioclimatic Variable | Abbreviation | Units | Used in final model
Annual Mean Temperature | BIO1 | °C | Yes
Mean Diurnal Range | BIO2 | °C | No
Isothermality | BIO3 | - | No
Temperature Seasonality | BIO4 | °C | No
Max Temperature of Warmest Month | BIO5 | °C | No
Min. Temperature of Coldest Month | BIO6 | °C | Yes
Temperature Annual Range | BIO7 | °C | No
Mean Temperature of Wettest Quarter | BIO8 | °C | No
Mean Temperature of Driest Quarter | BIO9 | °C | No
Mean Temperature of Warmest Quarter | BIO10 | °C | No
Mean Temperature of Coldest Quarter | BIO11 | °C | Yes
Annual Precipitation | BIO12 | mm | Yes
Precipitation of Wettest Month | BIO13 | mm | Yes
Precipitation of Driest Month | BIO14 | mm | Yes
Precipitation Seasonality | BIO15 | - | No
Precipitation of Wettest Quarter | BIO16 | mm | No
Precipitation of Driest Quarter | BIO17 | mm | No
Precipitation of Warmest Quarter | BIO18 | mm | No
Precipitation of Coldest Quarter | BIO19 | mm | Yes
Table 1. All bioclimatic variables considered for modeling banana distribution. The last column marks the seven variables used in the final model.

Results of Current Banana Distribution

Here we present the map of the current potential distribution of bananas (Figure 5). Most of the areas with high suitability for bananas fall in tropical rainforest climates (Af) and tropical monsoon climates (Am) under the Köppen climate classification (Afield, 2020). These areas are situated near the equator. They follow the general trend of the response curves in Figure 4: high annual temperatures and high annual rainfall.

Figure 5. Banana suitability map made using Maxent, edited using QGIS

Results of Future Banana Distribution

These climates will change, however, as climate change affects the world in many different ways. We therefore produced models of the future distribution of bananas based on SSP1-2.6 and SSP5-8.5 for 2041 to 2060 (Figure 6). In general, the suitable area for bananas increases as climate change intensifies.


Figure 6. Banana suitable area changes under SSP1-2.6 (first map) and SSP5-8.5 (second map) climate projections

Discussion

a. Model Validation

Model accuracy and relevance can be determined using the ‘Area under the ROC Curve’ (AUC) value (Purohit and Rawat, 2022; MedCalc, 2022). The AUC value indicates whether the model’s prediction of the distribution of bananas is better or worse than placing bananas at random. AUC ranges from 0 to 1: a value of 0.5 indicates that the model performs no better than random, and a value of 1 indicates a perfect prediction (Purohit and Rawat, 2022). Performance is rated as failing (0.5-0.6), bad (0.6-0.7), reasonable (0.7-0.8), good (0.8-0.9) or great (0.9-1) (Swets, 1988).
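As an illustration of what the AUC measures, the sketch below computes it as the probability that a randomly chosen presence point scores higher than a randomly chosen background point, then maps the value onto the rating bands above. The scores are hypothetical, and assigning each boundary value to the higher band is an assumption, since the listed ranges share their endpoints.

```python
def auc(presence_scores, background_scores):
    """Probability that a random presence point outscores a random
    background point; ties count as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def rate_auc(value):
    """Map an AUC value to the rating bands quoted in the text.

    Boundary values go to the higher band (an assumption of this sketch).
    """
    if value >= 0.9:
        return "great"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "reasonable"
    if value >= 0.6:
        return "bad"
    if value >= 0.5:
        return "failing"
    return "worse than random"

# Hypothetical suitability scores.
presence = [0.9, 0.8, 0.7]
background = [0.1, 0.4, 0.75, 0.2]

print(auc(presence, background))
print(rate_auc(0.920))
```

With these toy scores, 11 of the 12 presence-background pairs are ranked correctly, and an AUC of 0.920 falls in the 'great' band.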

The model of the current banana distribution had an AUC value of 0.920, indicating the simulated model is ‘great’ and can be used for analysis (Figure 2).

Figure 2. Receiver operating characteristic (ROC) curve for bananas. Stddev = 0.004

b. Variable Suitability

Each variable contributes differently to the model. Its importance is determined by how much ‘gain’ over a random distribution it provides, which is visualized in the jackknife test of test gain (Figure 3). Under ‘only variable’, Maxent identified the three most important variables: the mean temperature of the coldest quarter (Bio 11), annual precipitation (Bio 12), and the minimum temperature of the coldest month (Bio 6). The test gains under ‘without variable’ show how the model is affected when only that variable is removed. If a variable decreased accuracy, removing it would raise its teal bar past the red bar, which represents the model with all variables. Since the red bar shows the most gain, every variable considered adds to the accuracy and specificity of the model.

Figure 3. Jackknife test of test gain for Banana 1

In addition, the importance of the variables is shown by their percent contribution and permutation importance (Table 2). The variable that contributed most was Bio 12 (42%), followed by Bio 11 (29.3%). By permutation importance, the most important variable by far is Bio 6 (78.8%). Since permutation importance is based on the final model, it is a better representation of the importance of each variable.

Table 2. Variable Contribution to Model
Variable | Percent contribution | Permutation importance
Bio12 | 42 | 8.3
Bio11 | 29.3 | 4.6
Bio6 | 14.5 | 78.8
Bio1 | 5.9 | 1.4
Bio19 | 5.6 | 4
Bio14 | 2.1 | 1.2
Bio13 | 0.5 | 1.7
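Permutation importance can be sketched as follows: the values of one variable are shuffled across the presence and background points, the model is re-scored, and the resulting drop in AUC is normalized to a percentage. This is a simplified illustration and not Maxent's code; the toy `predict` function, the sample points, the fixed random seed, and the clipping of negative drops to zero are all assumptions.

```python
import random

def auc(presence_scores, background_scores):
    """Probability that a presence point outscores a background point."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(presence_scores) * len(background_scores))

def permutation_importance(presence, background, predict, variables, seed=0):
    """Percent share of the AUC drop caused by shuffling each variable."""
    rng = random.Random(seed)
    points = presence + background

    def model_auc(pts):
        scores = [predict(pt) for pt in pts]
        return auc(scores[:len(presence)], scores[len(presence):])

    baseline = model_auc(points)
    drops = {}
    for name in variables:
        pooled = [pt[name] for pt in points]
        rng.shuffle(pooled)  # break the link between this variable and location
        shuffled = [dict(pt, **{name: v}) for pt, v in zip(points, pooled)]
        # Negative drops are clipped to zero (an assumption of this sketch).
        drops[name] = max(0.0, baseline - model_auc(shuffled))
    total = sum(drops.values())
    return {n: (100.0 * d / total if total else 0.0) for n, d in drops.items()}

# Hypothetical points; the toy model below looks only at bio6.
presence_pts = [{"bio6": 22, "bio12": 1800}, {"bio6": 24, "bio12": 900},
                {"bio6": 21, "bio12": 2500}]
background_pts = [{"bio6": 3, "bio12": 1700}, {"bio6": -5, "bio12": 400},
                  {"bio6": 10, "bio12": 2600}, {"bio6": 15, "bio12": 1200}]

importance = permutation_importance(
    presence_pts, background_pts,
    predict=lambda pt: pt["bio6"],  # hypothetical stand-in for a fitted model
    variables=["bio6", "bio12"],
)
print(importance)
```

Because the toy model ignores `bio12`, shuffling it never changes the AUC and its importance is exactly zero, which mirrors how a variable can contribute during training yet matter little to the final model.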

What’s more, the model also creates response curves that show how bananas respond to different values of each variable, using a model built solely on that variable (Figure 4). Suitability is simulated from the single variable using the same methods described previously. The higher the cloglog output, the more suitable that value of the variable is. Taking Bio 1 as an example, the response curve predicts that bananas are best suited to areas with mean annual temperatures between 10°C and 30°C, compared with the 10°C to 38°C range traditionally reported for bananas (Turner, 2003). Similarly, Bio 11’s response curve shows that the best range for the mean temperature of the coldest quarter is around 15°C to 25°C, with suitability dropping off above that. Places where the coldest quarter is warmer than this range support bananas less well. On the other hand, Bio 6 shows that suitability increases as the minimum temperature of the coldest month increases, peaking at around 22°C to 35°C.
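For readers unfamiliar with the cloglog scale, the sketch below shows the transform as we understand it from the Maxent literature: raw outputs that sum to one are rescaled as 1 - exp(-exp(H) * raw), where H is the entropy of the raw distribution. Treat the formula and the function name as assumptions to verify against the Maxent documentation; the raw values are hypothetical.

```python
import math

def cloglog(raw):
    """Convert raw outputs (summing to 1) to cloglog values in [0, 1)."""
    entropy = -sum(p * math.log(p) for p in raw if p > 0)
    scale = math.exp(entropy)
    return [1.0 - math.exp(-scale * p) for p in raw]

# Hypothetical raw output over four grid cells.
raw_output = [0.1, 0.2, 0.3, 0.4]
print(cloglog(raw_output))

# For a uniform distribution every cell maps to 1 - 1/e, about 0.632.
print(cloglog([0.25] * 4))
```

The transform is monotonic, so cells keep the same ranking they had in the raw output; only the scale changes.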

Figure 4. Response curves of each variable used in the model. Blue areas are stddev.

Using the response curves, we gained insight into the environments bananas are most suited to. We first focused on the minimum temperature of the coldest month (Bio6), which may heavily affect the viability of our bacteria. The results showed suitability increasing as the temperature rose, with the most effective range between 20°C and 35°C, indicating that most bananas are grown in this range for maximum yields (Figure 4b). This shows that as long as the bacteria can survive temperatures of 20°C and above, they should work well as a chassis. The response curves therefore helped us choose our bacterial chassis (B. subtilis), whose optimal temperature range is around 25°C to 37°C (Sidorova et al, 2020). We also used the temperatures indicated by the response curves to calculate the affinities of malic acid and mleR for our Toxin-Antitoxin model, and we used the response curves of the precipitation of the driest month (Bio14) and annual mean temperature (Bio1) to design our hardware, adapting it to regions with these conditions.

c. Future Distribution Projections

Our model showed that in the next 20 to 40 years, climate change will significantly increase the area suitable for banana growth. Under a future where carbon emissions are highly controlled and limited (SSP1-2.6), an additional 3.8 million km² is expected to become suitable for growing bananas (not accounting for cities, protected areas, wilderness, etc.). Under the future we are currently heading towards (SSP5-8.5), an additional 5 million km² is expected to become suitable. The expansion of suitable areas into food-scarce regions such as Nigeria and South Sudan offers a possible response to the food crisis there today. Over 70 million Africans rely on the plant for food security and income, and it supplies up to 35% of their daily calories (Stellenbosch, 2013). As the climate changes, more bananas can be grown in this region, providing essential nutrients and calories to millions of starving people. This situation is not limited to Africa: many malnourished people in India may also turn to bananas as a source of nutrients. Liquid inoculant can therefore be applied in countries that have both large potential growing areas and large malnourished populations to improve food security. These areas need bananas the most and will suffer most if Fusarium Wilt spreads into the region. Keeping Fusarium Wilt out requires an upfront cost of $50 USD per hectare plus $5 USD per hectare per year for surveillance, money that many farmers lack (Staver et al, 2020). If Fusarium Wilt does spread, it can cause widespread damage and financial loss for many local farmers. In our economic model, we estimated the economic damage of TR4 in Taiwan both holistically and locally, from the perspective of farmers. We used the predicted banana area to estimate the growth of the banana economy.

SSP1-2.6
- Area Gained: 3.8 million km² (380 million hectares)
- Area Retained: 14.9 million km² (1.49 billion hectares)
- Area Lost: 800 thousand km² (80 million hectares)
SSP5-8.5
- Area Gained: 5 million km² (500 million hectares)
- Area Retained: 15.2 million km² (1.52 billion hectares)
- Area Lost: 500 thousand km² (50 million hectares)
Figure 7. Banana suitable area changes according to climate projections
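The gained, retained, and lost categories can be derived by comparing a current and a future binary suitability map cell by cell, as in the minimal sketch below. The small grids and function names are hypothetical, and counting cells only approximates area, because cells on a latitude-longitude grid shrink towards the poles. The second part simply checks the arithmetic of the figures reported above.

```python
def classify_change(current, future):
    """Count cells gained, retained, and lost between two binary grids."""
    counts = {"gained": 0, "retained": 0, "lost": 0}
    for cur_row, fut_row in zip(current, future):
        for cur, fut in zip(cur_row, fut_row):
            if cur and fut:
                counts["retained"] += 1
            elif fut:
                counts["gained"] += 1
            elif cur:
                counts["lost"] += 1
    return counts

def summarize(gained, retained, lost):
    """Net change plus current and future totals, in the input units."""
    return {"net": gained - lost,
            "current": retained + lost,
            "future": retained + gained}

# Hypothetical 2 x 3 suitability grids (1 = suitable, 0 = unsuitable).
current_map = [[0, 1, 1], [0, 1, 0]]
future_map = [[1, 1, 1], [0, 0, 0]]
print(classify_change(current_map, future_map))

# Reported figures, in thousand km² (1 km² = 100 hectares).
reported = {
    "SSP1-2.6": summarize(gained=3800, retained=14900, lost=800),
    "SSP5-8.5": summarize(gained=5000, retained=15200, lost=500),
}
for scenario, summary in reported.items():
    print(scenario, summary)
```

Both scenarios imply the same current suitable area (15.7 million km²), as expected since they share one present-day model, and a net gain of 3.0 million km² under SSP1-2.6 versus 4.5 million km² under SSP5-8.5.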

Our bacteria offer an alternative for these farmers, especially in future suitable areas without infrastructure. This saves farmers money and buys time to develop more sustainable methods of growing bananas. Our bacterial solution acts as a stepping stone between the current banana monoculture and the genetically diverse bananas of the future.

For future iGEM teams, our Maxent model sets a precedent for using species distribution modeling to learn more about a particular species. We encourage the use of Maxent to find the distribution of other species, to assess the effects of climate change, and to gather habitat data for a species. The model can be taken in many directions, with different parameters, different species, and different countries. We have included all the files we used so that the model is easier to use and build upon.

Using the Maxent software, we were able to create suitability maps for the distribution of bananas and, in doing so, find the extent of the damage Fusarium Wilt could potentially do. We found that it is imperative to find a solution in order to save bananas for the future. Our solution will stall the current pandemic to allow time for sustainable, genetically diverse bananas to be developed.

Limitations

Because it uses annual trends in temperature and precipitation, our model does not consider other factors such as invasions and outbreaks, extreme weather events (droughts, freezes, storms, floods), fires, etc. It is very difficult to predict these events years in advance, which highlights the need for further research into predicting suitability and species distribution.

References

Banana Production in Africa. (n.d.). Stellenbosch University.

Bioclimatic variables - WorldClim 1 documentation. (n.d.).

Booth, T. H. (2018). Why understanding the pioneering and continuing contributions of BIOCLIM to species distribution modelling is important. Austral Ecology, 43(8), 852–860.

Elith, J., Phillips, S. J., Hastie, T., Dudík, M., Chee, Y. E., & Yates, C. J. (2011). A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17(1), 43–57.

Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37(12), 4302–4315.

Hausfather, Z. (2019, December 2). CMIP6: The next generation of climate models explained. Carbon Brief.

Koppen climate classification | Definition, System, & Map | Britannica. (n.d.).

Maxent. (n.d.). BCCVL.

Onderwater, N. (2020, April). A Global Banana Map: Disaggregating National Production Statistics Through Land Use Analysis and Land Suitability Evaluation. Wageningen University.

Papes, M. (2018, June 15). Maxent Introduction.

Pérez-Vicente, L. F., Dita, M., & Martinez de la Parte, E. (2014). Technical Manual: Prevention and diagnostic of Fusarium Wilt (Panama Disease) of banana caused by Fusarium oxysporum f. sp. cubense Tropical Race 4 (TR4).

Purohit, S., & Rawat, N. (2022). MaxEnt modeling to predict the current and future distribution of Clerodendrum infortunatum L. under climate change scenarios in Dehradun district, India. Modeling Earth Systems and Environment, 8(2), 2051–2063.

Schoonjans, F. (n.d.). ROC curve analysis. MedCalc.

Sidorova, T. M., Asaturova, A. M., Homyak, A. I., Zhevnova, N. A., Shternshis, M. V., & Tomashevich, N. S. (2020). Optimization of laboratory cultivation conditions for the synthesis of antifungal metabolites by Bacillus subtilis strains. Saudi Journal of Biological Sciences, 27(7), 1879–1885.

Staver, C., Pemsl, D. E., Scheerer, L., Perez Vicente, L., & Dita, M. (2020). Ex Ante Assessment of Returns on Research Investments to Address the Impact of Fusarium Wilt Tropical Race 4 on Global Banana Production. Frontiers in Plant Science, 11.

Swets, J. A. (1988). Measuring the Accuracy of Diagnostic Systems. Science, 240(4857), 1285–1293.

Turner, D. W. (1985). Bananas - response to temperature. NSW Agriculture.

Wei, J., Zhang, H., Zhao, W., & Zhao, Q. (2017). Niche shifts and the potential distribution of Phenacoccus solenopsis (Hemiptera: Pseudococcidae) under climate change. PLOS ONE, 12(7), e0180913.

What is GBIF? (n.d.). GBIF.