The capsule was designed to be small while still being able to determine its own position. However, because the computing resources inside the capsule are limited, we decided to run the model on a server outside the capsule and send the predicted location back to the capsule. This makes it possible to use computationally heavy models.
Time-series information will be added later, using the data collected after the AlphaTest. We could add it now, but it would be pointless because endoscopes and capsules move in very different ways.
The main idea is to receive an image X and find the best parameters W in a defined architecture so that the generated prediction y-hat minimises loss(y, y-hat).
The architecture together with its parameters forms the model: the architecture is the algorithm of the program, and the parameters are its data.
Figure 1 is an example of X.
The task is to train a model that receives an image file from the capsule and predicts where in the digestive system the image was taken.
As human experiments are not possible at this stage, we are using endoscope images of the digestive system provided by OSF.
The model is a computer program made up of an algorithm and data. The algorithm we designed is shown in the following blocks, and here is the code that implements it.
# build the model
import torch
import torch.nn as nn
import torch.nn.functional as F


class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        # six residual groups of five blocks each (see make_layer)
        self.layer1 = make_layer(3, 6, 5)
        self.layer2 = make_layer(6, 12, 5)
        self.layer3 = make_layer(12, 24, 5)
        self.layer4 = make_layer(24, 48, 5)
        self.layer5 = make_layer(48, 64, 5)
        self.layer6 = make_layer(64, 128, 5)
        # registered batch norm module; forward() does not call it
        self.bnn2d = nn.BatchNorm2d(128)
        self.avg = nn.AvgPool2d(3)
        self.max = nn.MaxPool2d(5)
        # fully connected head ending in 23 output classes
        self.fc1 = nn.Linear(7168, 192)
        self.fc2 = nn.Linear(192, 92)
        self.fc3 = nn.Linear(92, 23)

    # forward method
    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avg(x)
        x = self.max(x)
        x = self.layer5(x)
        x = self.layer6(x)
        x = self.max(x)
        # flatten everything except the batch dimension
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # class probabilities over the 23 classes
        return F.softmax(x, dim=1)
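The input size of `fc1` (7168) follows from the pooling layers in the classifier above. As a rough check, the sketch below redoes that arithmetic in plain Python for the 531 x 633 input size mentioned in the text. It assumes the pooling layers use their default stride (equal to the kernel size) with no padding, and that the residual groups keep height and width unchanged, which is what 3x3 convolutions with stride 1 and padding 1 do. The helper names `pool_out` and `flattened_features` are illustrative and not part of the original code.

```python
def pool_out(size: int, kernel: int) -> int:
    """Output length of a pooling layer with stride == kernel and no padding."""
    return (size - kernel) // kernel + 1


def flattened_features(height: int, width: int, channels: int = 128) -> int:
    """Number of values fed into fc1 for a given input height and width."""
    # The residual groups change the channel count but not height or width.
    h, w = height, width
    h, w = pool_out(h, 3), pool_out(w, 3)  # self.avg, AvgPool2d(3)
    h, w = pool_out(h, 5), pool_out(w, 5)  # self.max, MaxPool2d(5)
    h, w = pool_out(h, 5), pool_out(w, 5)  # self.max again, after layer6
    return channels * h * w


print(flattened_features(531, 633))
```

Under these assumptions the spatial size shrinks from 531 x 633 to 177 x 211, then 35 x 42, then 7 x 8, and 128 channels times 56 positions gives the 7168 features that `fc1` expects.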
The ResGroup in the image is defined by the following code.
def make_layer(in_channel, out_channel, block_num, stride=1):
    # 1x1 convolution so the skip connection matches the new channel count
    shortcut = nn.Sequential(
        nn.Conv2d(in_channel, out_channel, 1, stride),
        nn.BatchNorm2d(out_channel)
    )
    layers = list()
    # the first block changes the channel count, the remaining blocks keep it
    layers.append(ResBlock(in_channel, out_channel, stride, shortcut))
    for _ in range(1, block_num):
        layers.append(ResBlock(out_channel, out_channel))
    return nn.Sequential(*layers)


class ResBlock(nn.Module):
    def __init__(self, in_channel, out_channel, stride=1, shortcut=None):
        super().__init__()
        # main path: conv -> batch norm -> ReLU -> conv -> batch norm
        self.left = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU(True),
            nn.Conv2d(out_channel, out_channel, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channel),
        )
        # skip path: identity unless a projection shortcut is given
        self.right = shortcut

    def forward(self, x):
        out = self.left(x)
        residual = x if self.right is None else self.right(x)
        out += residual
        return F.relu(out)
So how does it work?
Mathematically, the whole process can be modelled as y-hat = Algo(X, W), where y-hat is the neural network's prediction, X is the input and W is the set of all the model's parameters. Here, X is a 531 x 633 image (with three colour channels, matching the 3 input channels of layer1) and W is a 3444292-dimensional vector.
The loss function measures how far the model's prediction is from the correct answer. In this case, we used Binary Cross Entropy as the loss function. Because we used PyTorch, the partial derivatives of loss(y, y-hat) with respect to W are computed automatically. The optimiser then produces a better value of W based on those derivatives. In this case, we selected Adam as the optimiser.
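To make the loop of prediction, loss, derivative and update concrete, here is a minimal sketch in plain Python. It is not the training code used for the classifier: a one-feature logistic model stands in for Algo(X, W), the gradients are written out by hand rather than computed by PyTorch, and the data, the learning rate and the function names (`bce`, `predict`, `gradients`, `train`) are all assumptions made for illustration. The update inside the loop is the standard Adam rule with bias-corrected moment estimates.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(y, y_hat, eps=1e-12):
    """Mean binary cross entropy between targets y and predictions y_hat."""
    total = 0.0
    for t, p in zip(y, y_hat):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y)


def predict(xs, w, b):
    """y-hat = Algo(X, W) for a toy model with W = (w, b)."""
    return [sigmoid(w * x + b) for x in xs]


def gradients(xs, ys, w, b):
    """Hand-derived d(loss)/dw and d(loss)/db for sigmoid + mean BCE."""
    n = len(xs)
    errs = [p - t for p, t in zip(predict(xs, w, b), ys)]
    gw = sum(e * x for e, x in zip(errs, xs)) / n
    gb = sum(errs) / n
    return gw, gb


def train(xs, ys, steps=200, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    params = [0.0, 0.0]  # W = (w, b)
    m = [0.0, 0.0]       # running mean of the gradients
    v = [0.0, 0.0]       # running mean of the squared gradients
    history = []
    for t in range(1, steps + 1):
        history.append(bce(ys, predict(xs, *params)))
        grads = gradients(xs, ys, *params)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    history.append(bce(ys, predict(xs, *params)))
    return params, history


# Hypothetical, linearly separable toy data
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
params, history = train(xs, ys)
print("first loss:", history[0], "last loss:", history[-1])
```

In the real model the same roles are played by the network's forward pass, PyTorch's automatic differentiation and its Adam optimiser, applied to all of W at once.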
Figure 1 shows a confusion matrix: the lightness of the pixel at index [Row = r, Column = c] represents the number of samples that were classified as the c-th class but actually belong to the r-th class.
According to the diagram, class 23 has many misclassified samples, because class 23 is the class for unpredictable samples.
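The layout described for Figure 1 can be sketched in a few lines of plain Python. The labels below are hypothetical and use only three classes for brevity; the function name `confusion_matrix` is illustrative and is not taken from the original code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[r][c] = number of samples of true class r predicted as class c."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for r, c in zip(y_true, y_pred):
        matrix[r][c] += 1
    return matrix


# Hypothetical labels for a 3-class toy problem (classes indexed from 0)
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, 3)
for row in cm:
    print(row)
```

Correct predictions land on the diagonal, so a class with many bright off-diagonal pixels in its row is one the model often confuses with other classes.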
Figure 2 shows the same thing in a simpler form. In this image, the lightness of the pixel at index [Row = r, Column = c] is therefore
where X is a sample from the sample space and y-hat is the prediction from the network.
The results are decent for a model intended for medical use. For such a model, the higher the recall, the safer it is, so the recall of 0.97 is particularly encouraging.
| | accuracy_score | accuracy_score | recall_score | f1_score |
|---|---|---|---|---|
| The model | 0.7297880322641156 | 0.7297880322641156 | 0.976408583260133 | 0.8352745424292845 |
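
As a reminder of how these scores relate to each other, the sketch below computes precision, recall and F1 from hypothetical true positive, false positive and false negative counts. The counts and helper names are made up for illustration and are not the project's evaluation code. The last lines rest on a further assumption: if the repeated 0.7298 value in the table is read as a precision (the table itself labels both columns accuracy_score), then the harmonic mean of that value and the reported recall comes out close to the reported f1_score.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that the model finds."""
    return tp / (tp + fn)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts for a single class
tp, fp, fn = 8, 3, 1
p = precision(tp, fp)
r = recall(tp, fn)
print("precision:", p, "recall:", r, "f1:", f1(p, r))

# Assumption: treat the repeated 0.7298 column as precision
f1_from_table = f1(0.7297880322641156, 0.976408583260133)
print("f1 from table values:", f1_from_table)
```

A high recall means few real positives are missed, which is why the text treats it as the safety-relevant score.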
To deploy the model in production, we have to add the corresponding timing information to it using RNNs or transformers. We will not have that data until we do the animal test, because endoscopes do not move the way capsules do. We expect the results to improve once this richer data is fed in.