Modeling

What it is and why we decided to do it that way

The capsule is designed to be small while still being able to determine its own position. Because the computing resources inside the capsule are limited, we run the model on a server outside the capsule and send the predicted location back to the capsule. This makes it possible to use heavy models.

Time series information will be added later, after the AlphaTest, using the data collected there. We could add it now, but it would be pointless because endoscopes and capsules move in very different ways.

The main idea is to receive an image X and find the parameters of a fixed architecture that produce a prediction \hat{y} minimising loss(y,\hat{y}), where y is the true label.

The architecture is the algorithm and the parameters are the data; together they form the program that is the model.
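Written as an optimisation problem, the idea above might look like the following. This is only a sketch in our own notation: Algo is the fixed architecture, W its parameters, and N the number of training images (N and the averaging over images are assumptions, not taken from the text).

```latex
\hat{y} = \mathrm{Algo}(X, W), \qquad
W^{*} = \arg\min_{W} \; \frac{1}{N} \sum_{i=1}^{N}
        \mathrm{loss}\bigl(y_i, \mathrm{Algo}(X_i, W)\bigr)
```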

Figure 1 is an example of X.

Some Figure is Missing!
Fig.1: an image of the cecum taken during a hospital endoscopy

The task description

Train a model that receives an image file from the capsule and predicts where in the digestive system the image was taken.

As human experiments are not possible at this stage, we use endoscopy images of the digestive system from the OSF dataset [1].

The architecture of the model

The model is a computer program made of an algorithm and data. The algorithm we designed is shown in the following block diagram.

Some Figure is Missing!

Here is the code that implements it.


import torch
import torch.nn as nn
import torch.nn.functional as F

# build the model
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        # six residual groups of 5 blocks each (make_layer is defined further down)
        self.layer1 = make_layer(3, 6, 5)
        self.layer2 = make_layer(6, 12, 5)
        self.layer3 = make_layer(12, 24, 5)
        self.layer4 = make_layer(24, 48, 5)
        self.layer5 = make_layer(48, 64, 5)
        self.layer6 = make_layer(64, 128, 5)
        # batch norm over 128 channels; registered as a module but not applied in forward()
        self.bnn2d = nn.BatchNorm2d(128)
        self.avg = nn.AvgPool2d(3)
        self.max = nn.MaxPool2d(5)
        # 7168 = 128 channels x 7 x 8 positions left after pooling a 531 x 633 input
        self.fc1 = nn.Linear(7168, 192)
        self.fc2 = nn.Linear(192, 92)
        self.fc3 = nn.Linear(92, 23)  # 23 output classes

    # forward method
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avg(x)
        x = self.max(x)
        x = self.layer5(x)
        x = self.layer6(x)
        x = self.max(x)
        x = torch.flatten(x, 1)  # (batch, 128, 7, 8) -> (batch, 7168)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return F.softmax(x, dim=1)  # one probability per class
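The input size of fc1 (7168) can be checked by hand from the pooling layers. The sketch below assumes a 531 x 633 input, the size given later in the text, and relies on PyTorch's default of using the kernel size as the pooling stride. The helper names pool_out and flattened_features are ours, not part of the project code.

```python
def pool_out(size, kernel):
    """Output length of AvgPool2d/MaxPool2d with stride == kernel and no padding."""
    return (size - kernel) // kernel + 1


def flattened_features(height, width, channels=128):
    # The residual groups use 3x3 convolutions with padding 1 and stride 1
    # (and 1x1 shortcuts with stride 1), so they leave height and width
    # unchanged; only the three pooling calls shrink the feature map.
    for kernel in (3, 5, 5):  # self.avg, self.max, self.max in forward()
        height = pool_out(height, kernel)
        width = pool_out(width, kernel)
    return channels * height * width


print(flattened_features(531, 633))  # 7168
```

An input of a different size would give a different number and fc1 would then have to be resized to match.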
    

The ResGroup in the diagram (make_layer in the code) is defined as follows.

Some Figure is Missing!

Here is the code that implements it.


def make_layer(in_channel, out_channel, block_num, stride=1):
    # 1x1 convolution so the skip path matches the new channel count
    shortcut = nn.Sequential(
        nn.Conv2d(in_channel, out_channel, 1, stride),
        nn.BatchNorm2d(out_channel)
    )
    # the first block changes the channel count; the remaining ones keep it
    layers = [ResBlock(in_channel, out_channel, stride, shortcut)]
    for _ in range(1, block_num):
        layers.append(ResBlock(out_channel, out_channel))
    return nn.Sequential(*layers)

class ResBlock(nn.Module):
    def __init__(self, in_channel, out_channel, stride=1, shortcut=None):
        super().__init__()
        # main path: conv -> batch norm -> ReLU -> conv -> batch norm
        self.left = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(True),
            nn.Conv2d(out_channel, out_channel, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channel),
        )
        # skip path: identity when no shortcut module is given
        self.right = shortcut

    def forward(self, x):
        out = self.left(x)
        residual = x if self.right is None else self.right(x)
        out += residual  # add the skip path before the final activation
        return F.relu(out)
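In equation form, the forward pass of ResBlock above can be read as follows. The symbols F (the main path, self.left) and S (the skip path, self.right) are our own shorthand for this sketch.

```latex
\mathrm{ResBlock}(x) = \mathrm{ReLU}\bigl(F(x) + S(x)\bigr), \qquad
F(x) = \mathrm{BN}\Bigl(\mathrm{Conv}_{3\times 3}\bigl(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}_{3\times 3}(x)))\bigr)\Bigr), \qquad
S(x) =
\begin{cases}
  x & \text{if no shortcut is given} \\
  \mathrm{BN}\bigl(\mathrm{Conv}_{1\times 1}(x)\bigr) & \text{otherwise}
\end{cases}
```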

          

So how does it work?

Mathematically, the whole process can be modelled as \hat{y} = Algo(X, W), where \hat{y} is the neural network's prediction, X is the input and W collects all of the model's parameters. Here X is a 531 x 633 image (with three colour channels, matching the 3 input channels of layer1) and W is a 3444292-dimensional vector.
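The size of W can be tallied by hand from the layer shapes in the code above. The sketch below is our own arithmetic, not project code. It assumes the counted model is exactly the Classifier shown, where only the last bnn2d assignment (128 channels) is registered. Under that assumption the figure of 3444292 appears to match the total number of stored values (trainable parameters plus the batch-norm running statistics), while the trainable parameters alone come to 3437765.

```python
def bn(ch, buffers=False):
    # BatchNorm2d: weight + bias, plus running_mean, running_var and
    # num_batches_tracked when buffers are counted as well
    return 2 * ch + ((2 * ch + 1) if buffers else 0)


def res_group(cin, cout, blocks=5, buffers=False):
    first = (9 * cin * cout + bn(cout, buffers)        # 3x3 conv (no bias) + BN
             + 9 * cout * cout + bn(cout, buffers)     # 3x3 conv (no bias) + BN
             + cin * cout + cout + bn(cout, buffers))  # 1x1 shortcut conv (bias) + BN
    rest = 2 * (9 * cout * cout + bn(cout, buffers))   # blocks without a shortcut
    return first + (blocks - 1) * rest


def linear(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


def count(buffers):
    widths = [3, 6, 12, 24, 48, 64, 128]
    total = sum(res_group(a, b, buffers=buffers)
                for a, b in zip(widths, widths[1:]))
    total += bn(128, buffers)  # the registered self.bnn2d
    total += linear(7168, 192) + linear(192, 92) + linear(92, 23)
    return total


print(count(buffers=False))  # 3437765
print(count(buffers=True))   # 3444292
```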

The loss function measures how far the model's prediction is from the true label. In this case we used binary cross entropy [2] as the loss function.

Some Figure is Missing!

Because we used PyTorch, the partial derivatives of loss(y,\hat{y}) with respect to W are computed automatically. The optimiser then uses these derivatives to produce a better value of W. In this case the optimiser is Adam [3].
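For reference, binary cross entropy in its usual form is loss(y, \hat{y}) = -(1/N) sum_i [ y_i log \hat{y}_i + (1 - y_i) log(1 - \hat{y}_i) ]. The following is a minimal pure-Python sketch of that formula, not the project's training code. The clamping constant eps and the element-wise use on a one-hot label are our assumptions for illustration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean BCE over paired targets (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite (eps is an assumption)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Toy example with 3 classes and a one-hot label (made-up numbers).
label = [1, 0, 0]
good = [0.90, 0.05, 0.05]   # confident and correct
bad = [0.10, 0.45, 0.45]    # mostly wrong
print(binary_cross_entropy(label, good) < binary_cross_entropy(label, bad))  # True
```

A lower value means the predicted probabilities sit closer to the label, which is what the optimiser tries to achieve.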

Some Figure is Missing!
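The Adam update rule from [3] can be sketched in a few lines of plain Python. This is an illustration of the published rule on a toy problem, not the project's training loop. The function name adam_step, the toy objective and the learning rate of 0.05 are our choices; in PyTorch the same job is done by torch.optim.Adam.

```python
import math


def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; returns the new (w, m, v)."""
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * gi        # running mean of gradients
        vi = beta2 * vi + (1 - beta2) * gi * gi   # running mean of squared gradients
        m_hat = mi / (1 - beta1 ** t)             # bias correction for early steps
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# Toy usage: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2 * (w[0] - 3.0)]
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w[0])  # close to 3
```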

Heatmap


Some Figure is Missing!
Fig.1: heat map of classification counts (rows: actual class, columns: predicted class)

In Figure 1, the lightness of the pixel at [Row = r, Column = c] represents the number of samples that were classified as class c but actually belong to class r.

According to the diagram, class 23 accounts for many of the misclassified samples, because class 23 is the class for unpredictable images.
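A count matrix of the kind described for Figure 1 could be built as in the sketch below. The labels here are made-up toy data with 3 classes numbered from 0, not the project's 23-class results, and confusion_counts is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """counts[r][c] = number of samples of actual class r predicted as class c."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts


# Toy labels (made-up data): one class-0 sample is mistaken for class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 2, 1, 1, 2, 2]
counts = confusion_counts(y_true, y_pred, 3)
print(counts)  # [[1, 0, 1], [0, 2, 0], [0, 0, 2]]
```

Correct predictions land on the diagonal, so a bright off-diagonal pixel marks a frequent confusion between two classes.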

Some Figure is Missing!
Fig.2: heat map showing how the predictions relate to the classes, as probabilities

Figure 2 shows the same information as probabilities rather than raw counts. In this image, the lightness of the pixel at [Row = r, Column = c] is P(X \in Set(c) | \hat{y} \in Set(r)),

where X is a sample from the sample space and \hat{y} is the prediction from the network.
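The conditional probability described for Figure 2 can be obtained from a count matrix by dividing each count by the total number of samples that received the same prediction. The sketch below takes the formula literally, so in the result rows index the predicted class and columns the actual class. That orientation, the helper name and the toy counts are our assumptions.

```python
def prob_actual_given_predicted(counts):
    """counts[actual][predicted] -> probs[predicted][actual] = P(actual | predicted)."""
    n = len(counts)
    probs = []
    for pred in range(n):
        # number of samples that were given this prediction
        column_total = sum(counts[actual][pred] for actual in range(n))
        probs.append([
            counts[actual][pred] / column_total if column_total else 0.0
            for actual in range(n)
        ])
    return probs


counts = [[1, 0, 1], [0, 2, 0], [0, 0, 2]]  # toy counts, rows = actual class
probs = prob_actual_given_predicted(counts)
print(probs[2])  # [0.3333333333333333, 0.0, 0.6666666666666666]
```

In the toy data, one third of the samples predicted as class 2 actually belong to class 0.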


The results are decent for a model intended for medical use. In this setting a higher recall means a safer model, so a recall of about 0.976 is a satisfying result.

            accuracy_score      accuracy_score      recall_score       f1_score
The model   0.7297880322641156  0.7297880322641156  0.976408583260133  0.8352745424292845
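The second column of the table carries the same label and value as the first. One possible reading, which is only our assumption, is that it was meant to be the precision: the reported f1_score equals the harmonic mean of that value and the recall, as the small check below shows.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.7297880322641156  # assumption: the repeated second column
recall = 0.976408583260133      # recall_score from the table
reported_f1 = 0.8352745424292845

print(abs(f1_from(precision, recall) - reported_f1) < 1e-8)  # True
```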

To deploy the model in production, we have to add the corresponding timing information using RNNs or transformers. We will not have that data until we do the animal test, because endoscopes do not move the way capsules do. We expect the results to improve once it is available, because the model will be given more informative input.

SZ-SHD

References

  • [1] The OSF dataset https://osf.io/mh9sj/files/osfstorage
  • [2] The binary cross entropy https://www.researchgate.net/profile/Vamsidhar-Yendapalli/publication/344854379_Binary_cross_entropy_with_deep_learning_technique_for_Image_classification/links/5f93eed692851c14bce1ac68/Binary-cross-entropy-with-deep-learning-technique-for-Image-classification.pdf
  • [3] The Adam Optimiser https://arxiv.org/abs/1412.6980