Software | Thessaly

Overview

Our software page documents all the programming our project required. All the code is available on our GitLab Software Tools page. But why does project “Navanthus” need a software component?
Our project starts from the genetically modified plants that our wet lab department developed. In the presence of microcystins, these plants take up the excess phosphorus from the water body they operate in. The dry lab department's role is to support and optimise the bioremediation of the water body through these plants. Its first task was to build a Constructed Floating Wetland (CFW), which carries the plants on the water body with their roots submerged so that they can absorb the phosphorus. With the CFWs as our primary concern, the dry lab team then looked for ways to optimise their use. The first idea was to place them optimally in the water body, so that the plants achieve maximum absorption, which means faster bioremediation and better use of our resources. Placing them optimally requires enough data on the water body. After extensive research, we found that little usable data exists for our local lake Karla, the inspiration for our project. This is how the idea of monitoring came about. Monitoring is carried out by an RC boat called R.A.S.A. The boat carries sensors that take measurements in the water body, both to show how eutrophicated it is and to help us decide where best to place our CFWs. Software supports this monitoring part of the project. In more detail, our software covers the programming of the sensors on the boat and the way the measurements are sent to a server. It also presents a machine learning algorithm that would help detect areas with microcystins. Finally, we developed a web application that takes the data measured by the boat and visualises them in a graphical user interface.
Hence, the end user can easily place the CFWs in the optimal spots without having to read a data file, simply by viewing the data on a phone or computer.
The rest of the page is divided into three parts: electrical sensor programming, the machine learning algorithm, and the web application.

Electrical

Detecting Eutrophication before Solving it.

R.A.S.A. is our method for detecting eutrophication before tackling it. It is a portable RC boat carrying various sensors that indicate the level of eutrophication of a water body and pinpoint the locations in it best suited to deploying our platform.

Background

Our wet lab team developed genetically modified plants that, in the presence of microcystins, take up the excess phosphorus from the water body they operate in. However, microcystins do not gather in fixed spots in a eutrophicated water body, nor can they be identified by eye. To deal with this problem, our dry lab team came up with the idea of taking several measurements across the water body to find the exact locations where microcystins reside. This way, our platform carrying the wet lab’s plants will work at maximum efficiency.

Electronics Device R.A.S.A.: Remote Automation System Analysis

R.A.S.A. is equipped with various eutrophication-related sensors, which are powered by and connected to an Arduino Mega 2560 microcontroller board.

Arduino

Open-source hardware and software make up the foundation of the electronics platform known as Arduino. Arduino boards have the ability to receive inputs, such as light from a sensor, a user pressing a button, or a tweet, and convert them into outputs, such as starting a motor, turning on an LED, or posting anything online.

Our project uses the Arduino Mega 2560 [1], a microcontroller board based on the ATmega2560. It has 54 digital input/output pins (of which 15 can be used as PWM outputs), 16 analog inputs, 4 hardware serial ports (UARTs), a 16 MHz crystal oscillator, a USB connector, a power jack, an ICSP header, and a reset button.

Figure 1. Arduino Mega.

The sensors on board R.A.S.A. in the first build are an analog dissolved oxygen sensor and an analog pH sensor. Both sensors are needed to measure and characterise the water body. R.A.S.A. also carries a GPS module and a GSM module, so that each measurement can be mapped to a specific location on the water body through the GPS module and then uploaded to a server.

Sensors' Principle

Analog sensors, which were the ones used to build the prototype, measure environmental variables and output an analog voltage. They generate a continuous output signal that is proportional to the quantity being measured. The output voltage ranges from 0 to 5 V.

Figure 2. pH Analog Sensor

The pH scale specifies the acidity or basicity of an aqueous solution. Values range from 0 to 14, with 7 representing neutrality. Acidity is indicated by pH values below 7, whereas alkalinity is shown by pH values above 7. In reality, pH is a measurement of the proportion of free hydrogen and hydroxyl ions in water.

When inorganic nutrients, primarily nitrogen and phosphorus, enter a water body, eutrophication starts to occur. The simplest, fastest-growing organisms benefit the most from this nutrient supply. Photosynthetic algae blanket the surface of lakes and ponds. Their photosynthesis produces chemical byproducts that raise the pH of the water and make it more basic. The population of hardier species that consume algae will increase, whereas delicate organisms that cannot endure these chemical conditions will perish [2].

Through this process, the water body becomes basic, and a basic pH is a clear indicator of a eutrophicated water body.

Calibrating pH Sensor

The pH sensor we bought was not calibrated by the manufacturer out of the box. To calibrate it, we followed these steps.

First, we measured a liquid we had plenty of (tap water) and recorded the result.
Then, we measured a liquid of known pH (pH = 4) and stored the sensor reading as the value corresponding to pH = 4.
Before measuring the next known liquid, we rinsed the sensor in tap water, as in the first step, to make sure that the previous liquid would not skew the next measurement.
Then, we repeated the measurement with another liquid, whose pH is 7, and stored one more value.
Finally, from these measurements, we found the equation of the line that passes through these 2 points:
1. We found the slope using the slope formula.
2. We used the slope and one of the points to solve for the y-intercept.
3. Knowing the slope m and the intercept b, we plugged them into the slope-intercept form of a line (y = mx + b) to get the equation of the line.
Last but not least, we made one final measurement of another liquid of known pH to test our calibration.

pH Arduino Code


                  #include <Arduino.h>
                  #define DO_PIN A1
                  
                  #define VREF 5000    //VREF (mv)
                  #define ADC_RES 1024 //ADC Resolution
                  
                  #define SensorPin A0          // the pH meter analog output is connected to the Arduino's analog pin A0
                  float avgValue;  //Store the average value of the sensor feedback
                  float b;
                  float buf[10],temp;
                  
                  //Single-point calibration Mode=0
                  //Two-point calibration Mode=1
                  #define TWO_POINT_CALIBRATION 0
                  
                  #define READ_TEMP (25) //Current water temperature ℃, Or temperature sensor function
                  
                  //Single point calibration needs to be filled CAL1_V and CAL1_T
                  #define CAL1_V (2182) //mv
                  #define CAL1_T (25)   //℃
                  //Two-point calibration needs to be filled CAL2_V and CAL2_T
                  //CAL1 High temperature point, CAL2 Low temperature point
                  #define CAL2_V (1300) //mv
                  #define CAL2_T (15)   //℃
                  
                  const uint16_t DO_Table[41] = {
                      14460, 14220, 13820, 13440, 13090, 12740, 12420, 12110, 11810, 11530,
                      11260, 11010, 10770, 10530, 10300, 10080, 9860, 9660, 9460, 9270,
                      9080, 8900, 8730, 8570, 8410, 8250, 8110, 7960, 7820, 7690,
                      7560, 7430, 7300, 7180, 7070, 6950, 6840, 6730, 6630, 6530, 6410};
                  
                  uint16_t Temperaturet;
                  uint16_t ADC_Raw;
                  float ADC_Voltage;
                  float DO;
                  
                  float readDO(uint32_t voltage_mv, uint8_t temperature_c)
                  {
                  #if TWO_POINT_CALIBRATION == 0
                    uint16_t V_saturation = (uint32_t)CAL1_V + (uint32_t)35 * temperature_c - (uint32_t)CAL1_T * 35;
                    return (voltage_mv * DO_Table[temperature_c] / V_saturation);
                  #else
                    uint16_t V_saturation = (int16_t)((int8_t)temperature_c - CAL2_T) * ((uint16_t)CAL1_V - CAL2_V) / ((uint8_t)CAL1_T - CAL2_T) + CAL2_V;
                    return (voltage_mv * DO_Table[temperature_c] / V_saturation);
                  #endif
                  }
                  
                  void setup()
                  {
                    pinMode(13,OUTPUT);    
                    Serial.begin(115200);
                    Serial.println("Ready");    //Test the serial monitor
                  }
                  
                  void loop()
                  {
                  
                    // pH
                  
                    for(int i = 0; i < 10; i++)       //Get 10 sample value from the sensor for smooth the value
                    { 
                      buf[i] = analogRead(SensorPin);
                      delay(10);
                    }
                    
                    for(int i = 0; i < 9; i++)        //sort the analog from small to large
                    {
                      for(int j = i + 1; j < 10; j++)
                      {
                        if(buf[i] > buf[j])
                        {
                          temp = buf[i];
                          buf[i] = buf[j];
                          buf[j] = temp;
                        }
                      }
                    }
                    avgValue = 0;
                    for(int i = 2; i < 8; i++)                      //take the average value of 6 center sample
                      avgValue += buf[i];
                    avgValue = avgValue / 6;
                  
                    float phValue = 0;
                  
                    phValue = - 0.025 * avgValue + 14.97;          // calculate the ph based on the calibration
                    
                    Serial.print("    pH:");  
                    Serial.print(phValue, 4);
                    Serial.println(" ");
                    digitalWrite(13, HIGH);       
                    delay(800);
                    digitalWrite(13, LOW); 
                  
                    // DO
                    Temperaturet = (uint16_t)READ_TEMP;  // Temperaturet is an integer, so cast to its own type
                    ADC_Raw = analogRead(DO_PIN);
                    ADC_Voltage = uint32_t(VREF) * ADC_Raw / ADC_RES;
                  
                    Serial.print("Temperature(C):\t" + String(Temperaturet) + "\t");
                    Serial.print("ADC RAW:\t" + String(ADC_Raw) + "\t");
                    Serial.print("ADC Voltage:\t" + String(ADC_Voltage) + "\t");
                    Serial.println("DO:\t" + String(float(readDO(ADC_Voltage, Temperaturet) / 1000) ) + "mg/L" +"\t");
                  
                    delay(1000);
                  }

Dissolved Oxygen (DO)

Figure 3. DO Analog Sensor

The quantity of free, non-compound oxygen present in water or other liquids is referred to as “dissolved oxygen”. Because of its impact on aquatic life, it is a crucial factor in determining water quality. In limnology (the study of lakes), dissolved oxygen is an essential factor, second only to water itself. A shift in the dissolved oxygen balance can harm aquatic life and alter water quality.

Free oxygen (O2), or non-compound oxygen, is oxygen that is not bound to another element. These free O2 molecules in the water are what is referred to as dissolved oxygen. The oxygen atom bound in water (H2O) is part of a compound and is not counted when calculating the concentration of dissolved oxygen. One can picture free oxygen molecules dissolving in water much as salt or sugar does when it is stirred in.

In eutrophicated lakes, the increase in algal density leads to vigorous photosynthesis and respiration, which cause very large swings in the concentration of dissolved oxygen. This phenomenon has a significant impact on the survival of aquatic plant and animal species and makes assessing water quality considerably harder.

Many studies on eutrophication have shown that DO and chlorophyll correlate strongly with the various trophic indices.

As shown in the research by Viet Duc Nguyen et al., one index that quantifies eutrophication is the Trophic State Index (TSI), where a higher value indicates greater nutrient enrichment. The TSI is a numerical measure of lake trophic status on a scale from 1 to 100. \begin{equation} TSI = \frac{TSI(PO_4P) + TSI(Chl\text{-}a) + TSI(SD) + TSI(DIN)}{4}, \end{equation} where $TSI$ is the Trophic State Index, $PO_4P$ is phosphate phosphorus ($PO_4^{3-}$), $Chl\text{-}a$ is chlorophyll-a, $SD$ is the Secchi depth and $DIN$ is the dissolved inorganic nitrogen: \begin{equation} DIN = NH_4^+ + NO_3^- + NO_2^- \end{equation} In this study (Dissolved Oxygen as an Indicator for Eutrophication in Freshwater Lakes), DO and TSI have a correlation of 87%, which indicates that DO can very closely reflect the level of eutrophication in the water body [3].

Calibrating DO Sensor

Like the pH sensor, the DO sensor was not calibrated out of the box. To calibrate it, we followed these steps:

The membrane cap was filled with 0.5 mol/L NaOH as the filling solution. With the cap filled, oxygen diffuses through the membrane into the probe: when the oxygen concentration increases, its partial pressure and rate of diffusion increase as well, and the probe produces a higher analog output.
We connected the Arduino board to the sensor and took a measurement after first wetting the probe and letting it dry.
From this measurement we obtained the millivolts produced at that specific room temperature. At a fixed temperature, the dissolved oxygen concentration is linearly related to the voltage. Based on this rule, we built a mapping table of temperatures and DO values.
We wrote the algorithm so that, whenever a measurement is made, a function is called that calculates the DO from the measured millivolts.

Figure 4. Calibrating DO Sensor

This function first calculates the saturation voltage with this formula: \begin{equation} V_{sat} = V_{cal} + 35 \cdot T - 35 \cdot T_{cal}, \end{equation} where $V_{cal}$ is the voltage in millivolts recorded when calibrating the probe, $T_{cal}$ is the temperature at calibration and $T$ is the current temperature (in our case, 25 °C). The function then looks up the mapped DO value for the temperature $T$, multiplies it by the voltage read from the sensor and divides it by the calculated saturation voltage: \begin{equation} DO = \frac{V_{measured} \cdot DO\_Table[T]}{V_{sat}} \end{equation} The result is then divided by 1000 to get mg/L. For higher accuracy, a second voltage can be recorded at a different room temperature.
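As a worked illustration of the two formulas, the numbers below plug in the single-point calibration constants from the sketches on this page (a calibration voltage of 2182 mV recorded at 25 °C) together with a hypothetical sensor reading of 1000 mV at 25 °C. The reading is an assumption chosen for the example and not a measured value; 8250 is the table entry for 25 °C. \begin{equation} V_{sat} = 2182 + 35 \cdot 25 - 35 \cdot 25 = 2182 \; mV \end{equation} \begin{equation} DO = \frac{1000 \cdot 8250}{2182} \approx 3781 \;\Longrightarrow\; \frac{3781}{1000} \approx 3.78 \; mg/L \end{equation}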

DO Arduino Calibration Code


                          #include <Arduino.h>
                           
                          #define VREF    5000//VREF(mv)
                          #define ADC_RES 1024//ADC Resolution
                           
                          uint32_t raw;
                           
                          void setup()
                          {
                              Serial.begin(115200);
                          }
                           
                          void loop()
                          {
                              raw = analogRead(A1);
                              Serial.println("raw:\t" + String(raw) + "\tVoltage(mv)" + String(raw * VREF / ADC_RES));
                              delay(1000);
                          }

DO Arduino Code


                  #include <Arduino.h>
                  
                  #define DO_PIN A1
                  
                  #define VREF 5000    //VREF (mv)
                  #define ADC_RES 1024 //ADC Resolution
                  
                  //Single-point calibration Mode=0
                  //Two-point calibration Mode=1
                  #define TWO_POINT_CALIBRATION 0
                  
                  #define READ_TEMP (25) //Current water temperature ℃, Or temperature sensor function
                  
                  //Single point calibration needs to be filled CAL1_V and CAL1_T
                  #define CAL1_V (2182) //mv
                  #define CAL1_T (25)   //℃
                  //Two-point calibration needs to be filled CAL2_V and CAL2_T
                  //CAL1 High temperature point, CAL2 Low temperature point
                  #define CAL2_V (1300) //mv
                  #define CAL2_T (15)   //℃
                  
                  const uint16_t DO_Table[41] = {
                      14460, 14220, 13820, 13440, 13090, 12740, 12420, 12110, 11810, 11530,
                      11260, 11010, 10770, 10530, 10300, 10080, 9860, 9660, 9460, 9270,
                      9080, 8900, 8730, 8570, 8410, 8250, 8110, 7960, 7820, 7690,
                      7560, 7430, 7300, 7180, 7070, 6950, 6840, 6730, 6630, 6530, 6410};
                  
                  uint16_t Temperaturet;
                  uint16_t ADC_Raw;
                  float ADC_Voltage;
                  float DO;
                  
                  float readDO(uint32_t voltage_mv, uint8_t temperature_c)
                  {
                  #if TWO_POINT_CALIBRATION == 0
                    uint16_t V_saturation = (uint32_t)CAL1_V + (uint32_t)35 * temperature_c - (uint32_t)CAL1_T * 35;
                    return (voltage_mv * DO_Table[temperature_c] / V_saturation);
                  #else
                    uint16_t V_saturation = (int16_t)((int8_t)temperature_c - CAL2_T) * ((uint16_t)CAL1_V - CAL2_V) / ((uint8_t)CAL1_T - CAL2_T) + CAL2_V;
                    return (voltage_mv * DO_Table[temperature_c] / V_saturation);
                  #endif
                  }
                  
                  void setup()
                  {
                    Serial.begin(115200);
                  }
                  
                  void loop()
                  {
                    Temperaturet = (uint16_t)READ_TEMP;  // Temperaturet is an integer, so cast to its own type
                    ADC_Raw = analogRead(DO_PIN);
                    ADC_Voltage = uint32_t(VREF) * ADC_Raw / ADC_RES;
                  
                    Serial.print("Temperature(C):\t" + String(Temperaturet) + "\t");
                    Serial.print("ADC RAW:\t" + String(ADC_Raw) + "\t");
                    Serial.print("ADC Voltage:\t" + String(ADC_Voltage) + "\t");
                    Serial.println("DO:\t" + String(float(readDO(ADC_Voltage, Temperaturet) / 1000) ) + "mg/L" +"\t");
                  
                    delay(1000);
                  }

Results

Combining the code for pH and DO and measuring water safely collected from lake Karla, we got these results after a few seconds, once both sensors had converged.

Figure 5. Making Measurements on Lake Karla's Water

In the figure below, the small window where the sensor outputs are printed shows that the sensors have converged to values that both indicate a eutrophicated water body.

pH is basic at 8.57, which is low for a highly eutrophicated water body like lake Karla but still indicates eutrophication (normal pH values for a lake range from 6 to 8).
DO is very low at 4.35 mg/L, indicating more strongly than the pH value that lake Karla is highly eutrophicated (DO values in eutrophicated water bodies range from 4 to 6 mg/L).

Figure 6. Results

GPS Module

Figure 7. GPS Module

Besides taking the above measurements with these sensors, R.A.S.A. also needs to keep track of the exact location and the exact time of each measurement. To solve this problem, a Global Positioning System (GPS) module is used.

GPS Module Arduino Code

Unfortunately, the code below was not tested, as our GPS module is probably faulty or, as various online sources suggest, needs to be reset to its factory settings.
We arrived at the code below after extensive research; it is the code most people were using to program the same GPS module (same model, same manufacturer). The part of the setup we want to highlight is the wiring to the Arduino: the RX pin of the Arduino must be connected to the TX pin of the GPS module, and the TX pin of the Arduino to the RX pin of the module. This is needed because the RX pin receives information and the TX pin transmits it.

                  
                    
                    #include <TinyGPS++.h>
                    #include <SoftwareSerial.h>

                    // Choose two Arduino pins to use for software serial
                    int RXPin = 15;
                    int TXPin = 14;

                    int GPSBaud = 9600;

                    // Create a TinyGPS++ object
                    TinyGPSPlus gps;

                    // Create a software serial port called "gpsSerial"
                    SoftwareSerial gpsSerial(RXPin, TXPin);

                    void setup()
                    {
                      // Start the Arduino hardware serial port at 9600 baud
                      Serial.begin(9600);

                      // Start the software serial port at the GPS's default baud
                      gpsSerial.begin(GPSBaud);
                    }

                    void loop()
                    {
                      // This sketch displays information every time a new sentence is correctly encoded.
                      while (gpsSerial.available() > 0)
                        if (gps.encode(gpsSerial.read()))
                          displayInfo();

                      // If 5000 milliseconds pass and there are no characters coming in
                      // over the software serial port, show a "No GPS detected" error
                      if (millis() > 5000 && gps.charsProcessed() < 10)
                      {
                        Serial.println("No GPS detected");
                        while(true);
                      }
                    }

                    void displayInfo()
                    {
                      if (gps.location.isValid())
                      {
                        Serial.print("Latitude: ");
                        Serial.println(gps.location.lat(), 6);
                        Serial.print("Longitude: ");
                        Serial.println(gps.location.lng(), 6);
                        Serial.print("Altitude: ");
                        Serial.println(gps.altitude.meters());
                      }
                      else
                      {
                        Serial.println("Location: Not Available");
                      }
                      
                      Serial.print("Date: ");
                      if (gps.date.isValid())
                      {
                        Serial.print(gps.date.month());
                        Serial.print("/");
                        Serial.print(gps.date.day());
                        Serial.print("/");
                        Serial.println(gps.date.year());
                      }
                      else
                      {
                        Serial.println("Not Available");
                      }

                      Serial.print("Time: ");
                      if (gps.time.isValid())
                      {
                        if (gps.time.hour() < 10) Serial.print(F("0"));
                        Serial.print(gps.time.hour());
                        Serial.print(":");
                        if (gps.time.minute() < 10) Serial.print(F("0"));
                        Serial.print(gps.time.minute());
                        Serial.print(":");
                        if (gps.time.second() < 10) Serial.print(F("0"));
                        Serial.print(gps.time.second());
                        Serial.print(".");
                        if (gps.time.centisecond() < 10) Serial.print(F("0"));
                        Serial.println(gps.time.centisecond());
                      }
                      else
                      {
                        Serial.println("Not Available");
                      }

                      Serial.println();
                      Serial.println();
                      delay(1000);
                    }

Conclusion

The alpha version of R.A.S.A. can measure pH and DO and store the measurements, together with their exact location, in an online database. This database can later be used to estimate the level of eutrophication in the water body and to decide whether the platform with the genetically modified plants is needed.

Artificial Intelligence

AI and Machine Learning

What is it?

Machine Learning (ML) is a topic of study focused on comprehending and developing "learning" methods, or methods that use data to enhance performance on a certain set of tasks. It is considered to be a component of Artificial Intelligence (AI). Without being expressly taught to do so, machine learning algorithms create a model using sample data, also referred to as training data, in order to make predictions or judgments.

Navanthus and ML

As mentioned earlier, Navanthus is a project that not only tackles eutrophication but also detects it. The wet lab’s plants are activated when microcystins are present. The R.A.S.A. system takes measurements of the water body and uploads the information online. However, we want these two parts to be connected. Our system needs to tell the user exactly where to place the floating platform so that the uptake of phosphorus and the bioremediation of the water body are optimal. That location is the one with the most microcystins present. Unfortunately, microcystin sensors do not exist, so it is impossible to include the microcystin concentration in our system directly. To solve this problem, machine learning applied to the sensors we already have can predict with high accuracy at which locations in the water body microcystins are present, making our project’s goal more achievable.

Algorithm

Based on our literature research, we believe the ML algorithm best suited to our problem is the Support Vector Machine (SVM) [4].

SVM

What is it?

Support Vector Machine (SVM) is a supervised machine learning approach used for both classification and regression. Although it can handle regression problems, it is best suited to classification. The goal of the SVM method is to find a hyperplane in an N-dimensional space that clearly separates the classes of data points. The number of features determines the hyperplane's dimension. With just two input features the hyperplane is a line; with three input features it becomes a 2-D plane. With more than three features it becomes hard to picture. Being based on statistical learning theory, SVMs are among the most robust prediction methods [5].

How it works?

Figure 9. SVM Example

We are given a training dataset of n points, each of the form $(x_i, y_i)$, where $y_i$ is equal to 1 or -1 and indicates the class to which $x_i$ belongs, and $x_i$ is an m-dimensional vector. In the above figure there are 2 classes, the blue dots and the red dots, and the $x_i$ are 2-dimensional vectors.
E.g., $x_C = (2, 1)$ and $y_C = -1$ (blue)
$x_K = (6, 2)$ and $y_K = +1$ (red)

We are looking for the “maximum-margin hyperplane” $h_0$, which divides the points $x_i$ with $y_i = 1$ from those with $y_i = -1$ while maximizing the distance between the hyperplane and the nearest points of either group.

Any hyperplane h can be written as the set of points x satisfying: $$ w^T \cdot x + b = 0,$$ where w is the weight vector that determines the hyperplane's orientation and b is the bias ($w^T$ is the transpose of w).

The hyperplane then gives: $$ w^T \cdot x + b \geq 0 $$ for the points on the right (red points) and $$ w^T \cdot x + b \leq 0 $$ for the points on the left (blue points).

The distance between a point and the hyperplane is called the geometric margin.

In the above figure, the hyperplane that the SVM model arrives at in this example is the green line that separates the two classes, equidistant from the nearest red and blue points.
Here $w = (4, 0)$ and $b = -16$, i.e. the vertical line $4 \cdot x - 16 = 0$, where x is the horizontal coordinate.
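As a small illustration of how such a hyperplane classifies points, the sketch below evaluates the sign of $w^T \cdot x + b$ for the example values. It assumes that the weight from the example acts only on the horizontal coordinate, i.e. $w = (4, 0)$ with $b = -16$; the names score and classify are hypothetical and are not part of our software.

```cpp
#include <array>

// Hyperplane from the example (assumed): w = (4, 0), b = -16,
// i.e. the vertical line 4 * x1 - 16 = 0.
constexpr std::array<double, 2> kW = {4.0, 0.0};
constexpr double kB = -16.0;

// Signed score w^T x + b; its sign gives the predicted class.
double score(const std::array<double, 2>& x) {
    return kW[0] * x[0] + kW[1] * x[1] + kB;
}

// Returns +1 (red class) or -1 (blue class).
int classify(const std::array<double, 2>& x) {
    return score(x) >= 0.0 ? +1 : -1;
}
```

With these values, the point C = (2, 1) has score -8 and falls in the blue class, while K = (6, 2) has score 8 and falls in the red class.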

The SVM algorithm also yields two more hyperplanes, $h_1$ and $h_2$, which are parallel to $h_0$. $h_1$ passes through the point of one class that lies closest to $h_0$ (in this example the point K of the red class) and $h_2$ passes through the closest point of the other class (the point D of the blue class). These two points are called support vectors.
In the above example, the equations for $h_1$ and $h_2$ are: $$ h_1: \; 4 \cdot x - 16 - 6 = 0 $$ $$ h_2: \; 4 \cdot x - 16 + 6 = 0 $$
An example where x is a 3-dimensional vector looks like this:

Figure 10. 3D Plane SVM Classifier

The two classes are represented by the red and blue dots, and the hyperplane is the light blue plane that separates the classes.

SVM can also handle data that are not linearly separable.

Figure 11. Circle SVM Classifier

In this example, we observe that the classes are being separated by a circle.

X vectors with more than 3 dimensions cannot be visualized.

How it learns?

The SVM algorithm tries to find the optimal geometric margin (γ) for the training set. The geometric margin of the hyperplane (w, b) with respect to a specific point $(x_i , y_i)$ is: $$ γ_i = \frac{y_i \cdot (w^T \cdot x_i + b)}{||w||} $$

This equation can be understood from the hyperplanes in the SVM example figure.
It follows from the formula for the distance from a point $(x_0, y_0)$ to a line $A \cdot x + B \cdot y + C = 0$: $$ distance = \frac{|A \cdot x_0 + B \cdot y_0 + C |}{\sqrt{A^2 + B^2}}$$
Here the numerator corresponds to the parameters applied to the specific point $( w^T \cdot x_i + b )$,
and the denominator is the norm of the parameters $( ||w|| )$. Multiplying by $y_i$ replaces the absolute value, because it makes the result positive for every correctly classified point.

This gives the equation for the geometric margin, where $γ_i$ is the distance from the point $x_i$ to the classifier.
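The geometric margin can be computed directly from its definition. The sketch below is a minimal plain C++ illustration with a hypothetical function name, geometricMargin; it assumes w and x have the same length and that w is not the zero vector.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric margin of a labelled point (x, y) with respect to the
// hyperplane w^T x + b = 0:  gamma = y * (w^T x + b) / ||w||.
// (Hypothetical helper; assumes w.size() == x.size() and w != 0.)
double geometricMargin(const std::vector<double>& w, double b,
                       const std::vector<double>& x, int y) {
    double dot = 0.0;     // accumulates w^T x
    double normSq = 0.0;  // accumulates ||w||^2
    for (std::size_t i = 0; i < w.size(); ++i) {
        dot += w[i] * x[i];
        normSq += w[i] * w[i];
    }
    return y * (dot + b) / std::sqrt(normSq);
}
```

Taking the example hyperplane as $w = (4, 0)$ and $b = -16$ (an assumption about how the example weight is laid out), the points K = (6, 2) with y = +1 and C = (2, 1) with y = -1 both have a geometric margin of 2.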

In order to get the optimal margin classifier we need to maximize $γ_i$, and thus minimize the denominator, under the condition that there are no data points between h1 and h2. Since the scale of (w, b) can be chosen freely, in this derivation we assume that the offsets of h1 and h2 from h0 are ±1, whereas in the above example they were ±6: $$ w \cdot x_i + b \geq +1 \quad \text{when } y_i = +1 $$ $$ w \cdot x_i + b \leq -1 \quad \text{when } y_i = -1 $$
Combined, these can be expressed as: $$ y_i \cdot (w \cdot x_i + b) \geq 1 $$ since $y_i$ takes the value +1 or -1, and multiplying the second inequality by -1 flips its direction.

So, from the first equation: $$ γ_i = \frac{y_i \cdot (w^T \cdot x_i + b)}{||w||} $$ what we need to do is minimize the denominator, while respecting the constraint on the distance to the points.

So we end up with this minimization problem: $$ \min ||w|| $$ $$ \text{such that } y_i \cdot (w^T \cdot x_i + b) \geq 1 $$ Because $||w|| \geq 0$, this problem has the same solution as the following form, which is easier to differentiate: $$ \min \frac{1}{2} \cdot ||w||^2 $$ $$ \text{such that } y_i \cdot (w^T \cdot x_i + b) \geq 1 $$ This is a constrained optimization problem, in which we need to minimize a quadratic function subject to linear constraints.
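As a short worked step, and assuming the normalization used in this derivation, a support vector $(x_s, y_s)$ lies exactly on h1 or h2, so it satisfies $y_s \cdot (w^T \cdot x_s + b) = 1$. Its geometric margin is then: $$ γ_s = \frac{y_s \cdot (w^T \cdot x_s + b)}{||w||} = \frac{1}{||w||} $$ and the distance between h1 and h2 is: $$ \frac{2}{||w||} $$ Making this distance as large as possible is therefore the same as making $||w||$ as small as possible.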

Gradient Descent

Gradient descent is a first-order iterative optimization procedure for locating a local minimum of a differentiable function. It is also frequently referred to as steepest descent. The algorithm works by taking repeated steps in the opposite direction of the function’s gradient at the current point, since this is the direction of the steepest descent.

In our constrained optimization problem, we will use gradient descent to minimize the quadratic function [6].

How does Gradient Descent work?

Gradient Descent uses this update rule to find a local minimum of a function: $$ while \;(until \;convergence) \;do: $$ $$ Θ_{i + 1} = Θ_i - α \cdot \nabla{f}$$ $$ i = i + 1 $$ Where:

$Θ_{i + 1}$ is the updated value of $Θ_i$
$Θ_i$ is the value of the parameters at iteration i; in our example, Θ collects the components of w together with the b parameter
α is the learning rate of the algorithm (usually a number between $10^{-3}$ and $10^{-1}$, depending on how big or small we want the steps to be at every iteration of the algorithm)
∇f is the gradient of the function f (∇ is the del, or nabla, operator), that is, the vector of its partial derivatives with respect to the parameters
f is the loss function that we want to minimize, which in our case is the error of our classifier described by the hyperplane.

The error of the classifier can be written as the squared difference between the prediction of the model and the actual result. For a linear model $h_Θ(x) = Θ^T \cdot x$, the gradient of the error with respect to Θ is the prediction of the model minus the actual result, multiplied by the data vector and by a constant factor of 2: $$ error = (h_Θ(x) - y )^2 $$ $$ \nabla{error} = \nabla{(h_Θ(x) - y )^2} = 2 \cdot (h_Θ(x) - y ) \cdot x$$
Absorbing the constant factor into the learning rate α, we end up with the update rule: $$ Θ_{i + 1} = Θ_i - α \cdot (h_Θ(x^i) - y^i ) \cdot x^i $$

We know that the algorithm will converge towards the minimum, provided the learning rate is small enough, because the function we want to minimize is a quadratic one $((h_Θ(x) - y)^2)$, and such convex quadratic functions have only one local minimum, which is the global minimum as well.

Figure 12. Quadratic Function's Graph
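The update rule can be sketched in plain Python. This is a minimal illustration rather than our training code: it applies the squared-error update described in this section to a linear model, one sample at a time, on hypothetical toy data generated from $y = 1 + 2 \cdot x_1$. The function names, the learning rate of 0.05 and the number of epochs are assumptions chosen for the example.

```python
def predict(theta, x):
    """Linear model h_theta(x) = theta . x, where x[0] = 1 carries the bias term."""
    return sum(t * xi for t, xi in zip(theta, x))

def gradient_descent(samples, alpha=0.05, epochs=2000):
    """Apply theta <- theta - alpha * (h_theta(x) - y) * x for every sample, every epoch."""
    theta = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, y in samples:
            error = predict(theta, x) - y
            theta = [t - alpha * error * xi for t, xi in zip(theta, x)]
    return theta

# Hypothetical toy data generated from y = 1 + 2 * x1; each input is [1, x1].
samples = [([1.0, x1], 1.0 + 2.0 * x1) for x1 in (0.0, 1.0, 2.0, 3.0)]

theta = gradient_descent(samples)
print([round(t, 3) for t in theta])  # approaches the generating parameters [1.0, 2.0]
```

A learning rate that is too large makes the updates overshoot and diverge, while a very small one needs many more epochs to get close to the minimum.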

Kernels

So far, we have explained the SVM machine learning algorithm and how it learns to optimize its classification hyperplane. However, there are many cases (as in our project) where the data obtained have a sample size lower than the number of dimensions. Water quality data, in our case and in general, have this feature, and one cannot easily reach concrete results with them. Kernels help with these kinds of problems by adding more dimensions to the dataset, making it easier for the algorithm to come up with a better classifier.

To make this easier to understand, let’s take the non-linear dataset in the figure with the circle. The data are not linearly separable in a 2-dimensional space, and a polynomial function would be needed to fit them. By transforming the data into a higher-dimensional space (in this example a 3D space), the data can be separated by a linear classifier. This transformation is done with kernels.
So kernels basically take a vector x and map it to another vector $\phi(x)$ that has more dimensions.
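The circle example can be reproduced with a small sketch. The feature map below, which adds the squared distance from the origin as a third coordinate, is one hypothetical choice of $\phi$ and the points are made-up toy data; it is meant only to show why a circular boundary in 2D becomes a flat plane in 3D.

```python
def phi(point):
    """Hypothetical feature map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

# Toy data: the inner class lies at radius 0.5, the outer class at radius 2.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.2, 1.6)]

# In the lifted space the plane z = 1 separates the two classes.
# Back in the original plane, that plane corresponds to the circle x1^2 + x2^2 = 1.
threshold = 1.0
print(all(phi(p)[2] < threshold for p in inner))  # True
print(all(phi(p)[2] > threshold for p in outer))  # True
```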

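Having more dimensions, it would seem that the algorithm would be more computationally expensive. However, in the way kernels are applied in the SVM algorithm, the mapped vectors never have to be computed explicitly: the algorithm only needs the inner products between pairs of mapped vectors, and the kernel function evaluates these directly from the original vectors. This holds even when the mapped space has infinitely many dimensions [7].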

Kernels are a main reason why the SVM algorithm works so well in classification problems.

Famous Kernel examples

The most famous kernels used in machine learning are listed below, where $K(x_i, x_j)$ denotes the value the kernel computes for a pair of vectors:

Polynomial Kernel, which is used in image processing, with: $$ K(x_i, x_j) = (x_i^T \cdot x_j + 1)^d, $$ where d is the degree of the polynomial.
Gaussian Kernel, which is a general-purpose kernel mostly used when there is no prior knowledge about the data, with: $$ K(x_i, x_j) = \exp\left(-\frac{||x_i - x_j||^2}{2 \cdot σ^2}\right),$$ where σ is the standard deviation of the normal distribution.
Gaussian Radial Basis Function (RBF) Kernel, which is the Gaussian kernel written with $γ = \frac{1}{2 \cdot σ^2}$ and is used in the same way.
It is given by: $$ K(x_i, x_j) = \exp\left(- γ \cdot ||x_i - x_j||^2\right),$$ where γ > 0.
Sigmoid Kernel, with: $$ K(x_i, x_j) = \tanh(a \cdot x_i^T \cdot x_j + b)$$
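The four kernels listed above can be written as plain Python functions. This is a minimal sketch for illustration: the function names and default parameter values are our own choices, and the polynomial kernel uses the standard inner-product form $(x_i^T \cdot x_j + 1)^d$.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def squared_distance(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def polynomial_kernel(u, v, d=2):
    return (dot(u, v) + 1.0) ** d

def gaussian_kernel(u, v, sigma=1.0):
    return math.exp(-squared_distance(u, v) / (2.0 * sigma ** 2))

def rbf_kernel(u, v, gamma=0.5):
    return math.exp(-gamma * squared_distance(u, v))

def sigmoid_kernel(u, v, a=1.0, b=0.0):
    return math.tanh(a * dot(u, v) + b)

u, v = [1.0, 2.0], [3.0, 0.5]
print(polynomial_kernel(u, v))  # (1*3 + 2*0.5 + 1)^2 = 25.0

# The Gaussian and RBF forms give the same value when gamma = 1 / (2 * sigma^2).
same = math.isclose(gaussian_kernel(u, v, sigma=2.0), rbf_kernel(u, v, gamma=1.0 / 8.0))
print(same)  # True
```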

Implementation

Data

Throughout our project, we searched extensively for data to create, train and test our SVM model. Unfortunately, no matter how hard we tried to get local data, no survey has ever been conducted that combines all the data we needed. So, we looked for freely available data to train our model on. The dataset we found comes from research that has been conducted in some of the Great Lakes, Canada [17]. This dataset includes various values that we required, and many that we did not. The main problem we found, though, was that the dataset, even though it has many entries, is missing many of the values required for our training phase.
We discovered this when we manipulated the data so that each entry would include:

an ID and the lake’s name,
a timestamp with the exact date and time the measurement was made,
the longitude and latitude of the measurement,
and values of DO, pH, water temperature, depth, nitrogen, phosphorus, chlorophyll, microcystins and cyanobacteria with their corresponding units.

The manipulation of the data was done using the Pandas [10] Python library. After this manipulation, which brought the data into the format we use, the number of entries was reduced from 10087 to almost 40.
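The effect of missing values on the number of usable entries can be illustrated with a small Pandas sketch. The column names and all values below are hypothetical and do not reproduce the real dataset or our actual script; the example only shows how keeping complete rows shrinks a table.

```python
import pandas as pd

# Hypothetical column names; the real dataset uses its own naming.
REQUIRED = ["do", "ph", "temperature", "phosphorus", "cyanobacteria"]

# Made-up entries, where None marks a value that was not measured.
raw = pd.DataFrame({
    "lake": ["A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2021-06-01 10:00", "2021-06-02 10:00",
        "2021-06-01 09:30", "2021-06-03 11:15",
    ]),
    "do": [8.1, None, 7.4, 6.9],
    "ph": [7.9, 8.2, None, 8.4],
    "temperature": [18.5, 19.0, 20.1, 21.3],
    "phosphorus": [0.03, 0.05, 0.04, None],
    "cyanobacteria": [1200.0, 900.0, None, 15000.0],
})

# Keep only the entries in which every required value is present.
complete = raw.dropna(subset=REQUIRED).reset_index(drop=True)
print(len(raw), "->", len(complete))  # 4 -> 1
```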

Cyanobacteria were the main focus of the study that was released with the dataset, so we focused on predicting cyanobacteria, both to follow the research’s data and to keep as much data as we possibly could.

Model Built

In our training phase, the data we gave to the model were only the different measured values as features and the existence of cyanobacteria as the label. Both the training phase and the test phase were done with the help of the scikit-learn [8, 9] Python library.
Finally, the linear kernel was preferred, as it consistently produced better predictions in our tests.
The problem with our attempt is the lack of data, so our output cannot serve as proof that the approach works for our problem. However, Masaya Moria et al. [4], who ran the same kind of model, found that the SVM algorithm had a better prediction rate than the deep-learning architecture they used. For us, this is more than enough to have faith in our approach.
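A training and testing run of this kind can be sketched with scikit-learn as follows. This is not our actual script: the measurements and labels are made up, only three features are used, and the feature scaling step, the 25% test split and the `random_state` value are assumptions added for the example.

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up measurements, one row per sample: [pH, water temperature, phosphorus].
X = [
    [7.2, 14.0, 0.02], [7.4, 15.5, 0.03], [7.1, 13.0, 0.02],
    [7.5, 16.0, 0.04], [7.3, 14.5, 0.03], [7.6, 15.0, 0.03],
    [8.6, 23.0, 0.09], [8.8, 24.5, 0.11], [8.5, 22.0, 0.08],
    [9.0, 25.0, 0.12], [8.7, 23.5, 0.10], [8.9, 24.0, 0.10],
]
# Made-up labels: 1 if cyanobacteria were present in the sample, 0 otherwise.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# Hold out a quarter of the samples for testing, keeping both classes represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit an SVM classifier with a linear kernel.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("predictions:", list(predictions))
print("test accuracy:", accuracy)
```

Changing `kernel="linear"` to `"rbf"`, `"poly"` or `"sigmoid"` is enough to compare the kernels against each other on the same split.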

Final Thoughts

Even though an analog microcystin sensor does not exist, AI and machine learning are able to find patterns in our data and extract more information from them.

As mentioned, with the SVM machine learning algorithm, using kernels and gradient descent as the learning algorithm, we can get very reassuring results with significantly fewer data.

Software

Overview

So far, in our project, we have programmed the sensors to communicate with the Arduino, and the Arduino to send the sensor readings and the exact location online. Then, online, the server runs the machine learning model, which classifies the places that have microcystins, which our plants detect. The only missing part that would bring our problem and our approach to solving eutrophication full circle was the visualization of the captured data. Through our application, we provide an easy way to help people decide exactly where to place the CFWs, as well as a useful tool for raising awareness of the problem.

Application

Usage

The web application built by our team shows a heatmap, where the eutrophicated points taken from the data appear in gradient colours.

Figure 13. Heatmap specifying eutrophicated point in lake

Data

In order to test our application, we had to find some data as input. The data we found online were from DataStream [17] and cover various Canadian lakes. As we have not yet uploaded data of our own, our example imports the data from a JSON file that contains the manipulated DataStream data (we changed the way the data were saved in the file so that each row would include a lake ID, the timestamp when the measurement was taken, the longitude and latitude, and the various measurements).
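A record layout of this kind, and the way the location is read from it, can be sketched as follows. The field names and values are hypothetical and only illustrate the structure described above; they are not the actual schema of our JSON file.

```python
import json

# Hypothetical record layout; the field names are illustrative only.
raw_json = """
[
  {"lake_id": "lake-01", "timestamp": "2021-07-14T10:30:00",
   "latitude": 53.5, "longitude": -113.5,
   "measurements": {"ph": 8.4, "temperature": 21.3, "phosphorus": 0.06}},
  {"lake_id": "lake-02", "timestamp": "2021-07-15T09:00:00",
   "latitude": 52.9, "longitude": -114.1,
   "measurements": {"ph": 7.8, "temperature": 19.6, "phosphorus": 0.03}}
]
"""

records = json.loads(raw_json)

# Extract the location of every record as a (latitude, longitude) pair.
points = [(r["latitude"], r["longitude"]) for r in records]
print(points)  # [(53.5, -113.5), (52.9, -114.1)]
```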

Tools

As we wanted to build an application that would be easily accessible to everyone, without requiring any specialist knowledge, a web application was preferred. This makes the application easy for everyone to use, just by connecting to the website that hosts it. Being a university research team, we chose to use the open-source framework Vue.js [11].

Vue.js is an open-source front-end JavaScript framework that lets the user combine HTML, CSS and JavaScript to build user interfaces and single-page applications.

In the Vue.js framework, we installed various packages, which provide us with certain APIs (Application Programming Interfaces) and helped us build the application.
These packages and APIs are:

Leaflet [12]. Leaflet is an open-source JavaScript library for mobile-friendly interactive maps.
Axios [13]. Axios is an HTTP client library that lets the user make a request to a URL and handle the response.
Nominatim [14]. Nominatim is a geocoding API that, given a geographic location, returns various information about it, like country, county, address and more, depending on the accuracy.
Leaflet Heatmap [15]. Leaflet Heatmap is a Leaflet plugin (coming as a package) that provides the user with a heatmap layer and certain options for it.
Bootstrap. Bootstrap is a free and open-source CSS framework, designed for responsive, mobile-first front-end web development, which is also available as a package for the Vue framework.
Vue2-leaflet [16]. Vue2-leaflet is a Vue.js package that wraps Leaflet, making it easier to use Leaflet maps inside Vue components.

Architecture

Our application has four layers of Vue.js components.

Figure 14. App Architecture: Components Connection

The first layer consists of the App component, which is responsible for creating the app. The second layer consists of the divider component, which gets the data from the JSON file (or, in the future, from a cloud data server) and divides the app in two. The third layer consists of two components. The first is the WaterbodyList component, which gets the data from the divider component as props and prints only their longitude and latitude.

Figure 15. App WaterBodyList component output

The second is the LeafletMap component, which creates the map from a center point and a zoom level (to show the whole map, our point is [0, 0] with zoom 2), initializes some variables that are going to be used in the map, and shows the map on the screen. Moreover, it uses OpenStreetMap tiles for the map.

Figure 16. App LeafletMap component output

Finally, our last layer of components includes the HeatMap component, which gets the data provided by the divider and splits them into the location and the rest of the data. Then, around the location points, it draws a heatmap on the map, whose intensity can easily be adjusted from strong (reddish) to pale (bluish). The intensity is meant to follow the rest of the data that the divider provides, but we could not find a function that uses all of these data and maps them to a eutrophication index. Such a function could easily be added if one is found.
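To show where such a function would plug in, here is a minimal sketch that turns records into [latitude, longitude, intensity] triples, the point format that the Leaflet.heat plugin accepts. The min-max scaling of a single measurement is only a placeholder and not a eutrophication index; the function name, field names, coordinates and values are all made up for the example.

```python
def to_heat_points(records, key):
    """Return [lat, lon, intensity] triples, with intensity min-max scaled to [0, 1]."""
    values = [r[key] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero when all values are equal
    return [
        [r["latitude"], r["longitude"], (r[key] - low) / span]
        for r in records
    ]

# Made-up records with a single measurement each.
records = [
    {"latitude": 39.49, "longitude": 22.82, "phosphorus": 0.02},
    {"latitude": 39.50, "longitude": 22.80, "phosphorus": 0.06},
    {"latitude": 39.51, "longitude": 22.84, "phosphorus": 0.10},
]

for point in to_heat_points(records, "phosphorus"):
    print(point)
```

The lowest measurement maps to intensity 0 and the highest to 1, so swapping the placeholder for a real index only requires replacing the expression that computes the third element.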

Figure 17. App HeatMap component output.

Figure 18. App HeatMap component output zoomed.

What is great about this architecture is that it is very easily adjustable and therefore useful to future teams wanting to create an app that has map features. The first three layers of components would stay the same for all map projects, while the HeatMap component would be changed depending on the needs of the team, or the intensity of the HeatMap points could change according to one's data by adding a function. Possible scenarios would be clustering several points together, or mapping services, but the sky is the limit, since the Leaflet library [12] is very well documented and offers a wide range of map-related plugins.

Final Output

Our final app looks like this:

Future Vision

Electrical

Our prototype is functional and very useful to someone who wants to measure how eutrophicated a water body is. However, due to constraints of time, product availability and money, we could not fit the prototype with all the sensors we wanted. That is why we list here all the sensors we would have used if we could have found them at an affordable price and available for shipping within a reasonable time period. The sensors that would be required to get all the data we would need from a water body are:

Water Temperature Sensor
Water Conductivity Sensor
Total Phosphorus Sensor
Total Nitrogen Sensor
Chlorophyll Sensor
Secchi Depth Sensor

All these sensors can help to calculate the level of eutrophication in the water body more accurately and provide a better understanding of how polluted the water body is.

Artificial Intelligence

Given a better dataset, we could have a better understanding of how well the SVM algorithm works for our case. Finally, with the sensors mentioned above, we could test our model.

App

The future development of the app will aim at a better display of the heatmaps and of their intensity around the points. Also, one more component will be added that translates the geolocation (longitude, latitude) to a place name, so that the list next to the map shows places instead of geolocation data (using the Nominatim API). In addition, the axios library would be used in the divider component, as the data would be fetched online rather than from a local file. Finally, the ultimate goal, if everything is implemented, is to create a second heatmap that indicates the exact points having microcystins, and therefore the exact locations where the CFWs should be added to the water body for optimal usage.

References

[1] https://www.arduino.cc
[2] Verspagen JM, et al. Rising CO2 levels will intensify phytoplankton blooms in eutrophic and hypertrophic lakes. PLoS One. 2014 Aug 13;9(8):e104325. doi: 10.1371/journal.pone.0104325. PMID: 25119996; PMCID: PMC4132121.
[3] Nguyen Duc Viet, et al. Dissolved Oxygen as an Indicator for Eutrophication in Freshwater Lakes. Proceedings of International Conference on Environmental Engineering and Management for Sustainable Development. Sept 2016.
[4] Masaya Moria, Roberto Gonzalez, et al. Prediction of Microcystis Occurrences and Analysis Using Machine Learning in High-Dimension, Low-Sample-Size and Imbalanced Water Quality Data.
[5] Corinna Cortes and Vladimir Vapnik. Support-Vector Networks. AT&T Bell Labs.
[6] Saad, Yousef. Iterative methods for sparse linear systems (2nd ed.). Philadelphia, Pa.: Society for Industrial and Applied Mathematics. pp. 195. ISBN 978-0-89871-534-7. (2003)
[7] Thomas Hofmann, Bernhard Schölkopf and Alexander J. Smola. Kernel Methods in Machine Learning.
[8] https://scikit-learn.org/stable/modules/svm.html
[9] https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
[10] https://pandas.pydata.org/
[11] https://vuejs.org/
[12] https://leafletjs.com/
[13] https://axios-http.com/
[14] https://nominatim.org/
[15] https://github.com/Leaflet/Leaflet.heat
[16] https://vue2-leaflet.netlify.app/
[17] Alberta Lake Management Society. 2022-07-14. "LakeKeepers Water Quality Data" (dataset). 6.0.0. DataStream. https://doi.org/10.25976/2eh8-7s91.