Deep Learning Applications in Medical Image Recognition

Abstract: This essay examines deep learning systems and their major applications in various fields. The researcher, Song Yukun, uses the ReLU activation function and convolution operations to make a program automatically recognize different objects, or objects of the same type with different features. Before the recognition step itself, the researcher adds a preprocessing program that converts all kinds of images into one small, uniform format. Using this normalized image, the program can then determine the type of content in the image with fine precision. The program is automatic and forms an essential part of artificial intelligence. Its main task is to imitate the learning process of the human brain, which accumulates experience from thousands of events. It achieves this by combining several algorithms, including the ReLU activation function, which help “teach” the program particular types of images. After being trained on a large amount of input, the program could quickly relieve the current shortage of human labor for repetitive but intellectually demanding work, such as checking X-ray films for a particular kind of tumor. Moreover, because the programs learn by themselves, they could produce more specific results than humans do. The deep learning principle could be widely applied, since so much of human life consists of learning and accumulating experience. It could turn previously mechanical programs into “intelligent” programs whose precision of judgment improves at an accelerating pace.


Introduction
During the 1940s, our predecessors invented one of the first electronic computers in the world, which took up three rooms and contained over ten thousand vacuum tubes in order to calculate projectile trajectories for military use. It was composed of five main parts: the logic unit, where the computer processes all the data it holds, the control unit, the input device, the output device, and the data storage. Over the following decades, computers have evolved from huge machines into portable laptops, which bring us convenience and make daily life easier.
Software has also changed enormously, so that people can now do things that were unimaginable in the previous century. For example, individuals can talk face to face using Facebook, founded by Zuckerberg, even when they are thousands of miles apart. People can also use their phones to read QR codes. Human societies now face an increasing demand for artificial intelligence. As John McCarthy, a professor of computer science at Stanford University, defines it, artificial intelligence is the science of making intelligent machines, especially using computers to understand or act as humans do. This technology is needed in many professions today. Companies need quicker and more comprehensive ways to understand their customers. Face-scanning and fingerprint recognition systems are used in security, where engineers have built precise face-scanning locks, secure face-scanning payment channels, and face-scanning gates at airport departure points. Virtual fitting rooms are set up in high-tech clothing stores, where people can change their appearance in the mirror just by touching a few icons. Voice detection and classification are built into music applications so that people can identify a song they hear within a few seconds. Furthermore, there are even olfactory devices in a recently invented smartphone called the Ali-phone for identifying products such as tea. What is beyond our imagination is that these widespread and varied technologies all derive from the same kind of program, the deep learning network, and these uses are still far from its limit.
Deep learning could also be used for purposes that are more influential and helpful to society. There are millions of disabled people and severely ill patients whose diseases have taken away their ability to move. Deep learning programs could help them. Intelligent canes for blind people are a good example: a deep learning system could play an important role in detecting and avoiding possible dangers such as moving cars, stairs, and walls. Deep learning programs could also detect whether a patient's eyes are open or closed, so that nurses can let patients make choices by blinking. With the help of deep learning, people could be left with much simpler tasks in place of complicated and dangerous ones. But how?

Convolutional Neural Network Approach
It is easier to start the explanation from the simplest geometrical shapes, since actual recognition by the human eye is far more complicated, although the two share the same essence. Imagine the most common pictograms on public bathrooms: male and female. How do humans distinguish them throughout life and almost never make mistakes? A dress stands for women, so most places use the dress as the most distinguishable sign on the doors of female bathrooms. The dress, in the abstract, represents an important element and tool of deep learning: particular characteristics. People see the trapezoidal shape, and so do computers. Mathematics now plays an important role in recognizing the figures. As is widely known, figures are made up of pixels, and each pixel has red, green, and blue (RGB) components that together set its color. All colors are stored in computers as numbers from 000000 to FFFFFF, with two hexadecimal digits for each of the three primary colors. Therefore, the first step is to decode the image received from the camera. This can be done by adding decoding lines to the program that turn the initial image input into readable four-dimensional arrays, whose four axes correspond to batches, rows, columns, and layers. Then the second question arises: how can computers "learn" from those data? The researcher uses the ReLU activation function to solve this problem. The ReLU function keeps the useful (positive) values and sets the irrelevant (negative) values to zero.
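The decoding and ReLU steps described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the researcher's actual program: the helper names `hex_to_rgb`, `to_batch`, and `relu`, and the tiny 2x2 example image, are assumptions introduced here.

```python
import numpy as np

def hex_to_rgb(hex_color):
    """Split a six-digit hex color such as 'FF8000' into (R, G, B) integers."""
    value = int(hex_color, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

def to_batch(image):
    """Add a leading batch axis: (rows, columns, layers) -> (1, rows, columns, layers)."""
    return np.asarray(image, dtype=np.float32)[np.newaxis, ...]

def relu(x):
    """Keep positive values and replace negative values with zero."""
    return np.maximum(x, 0)

# Hypothetical 2x2 picture given as hex colors (red, green, blue, white).
pixels = [["FF0000", "00FF00"], ["0000FF", "FFFFFF"]]
image = np.array([[hex_to_rgb(p) for p in row] for row in pixels])
batch = to_batch(image)          # axes: batch, rows, columns, layers

# Center the values around zero so that ReLU has negative inputs to remove.
activated = relu(batch - 127.5)
print(batch.shape, activated.min())
```

Here the last axis holds the three color layers, so one picture of 2x2 pixels becomes an array of shape (1, 2, 2, 3).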
As the graph shows, the ReLU function excludes all the negative values, which represent irrelevant data in our programs. [1][2][3][4] This function makes it much easier for computers to summarize and infer the unique internal characteristics of an object. Then, based on the ReLU activation, the computer can regularize the weight matrix it generated before. This process mimics the function of our brain, which is composed of billions of neurons, by making only a certain part of the "brain" active. [5][6][7][8][9][10] Apart from that, convolution is still needed to achieve deep learning. First, the computer randomly generates a weight matrix for the convolution. In each round, the computer multiplies every pixel value under the weight matrix by the corresponding element of that matrix and sums the products to obtain one entry of a new matrix. Some pictures may be too large or too small, so the program contains normalizing functions and padding elements to avoid blind areas during convolution. In addition, a picture can be huge when measured in pixels, so the program should include a pooling step that combines multiple numbers into one. [11][12][13][14][15][16][17][18] The "pooling" function is: [equation omitted in the source]. Figure 3 describes the procedure of convolution. In order to cover every pixel, the program could either set the stride of the multiplication to one, which is very inefficient because the picture is huge compared with the weight matrix sliding over it, or add some pixels to the picture so that its length and width are divisible by the size of the weight matrix without remainder, which is called padding. Normally programmers add 0 or the marginal value of the input image along its margin so that it does not affect the accuracy of convolution. But images do not have only one characteristic.
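The convolution, padding, stride, and pooling steps described above might look roughly like the following NumPy sketch. The function names `conv2d` and `max_pool2d` are hypothetical, and max pooling is an assumption: the source does not show which pooling function the program uses, so another choice such as average pooling would fit the description equally well.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Slide a weight matrix over a 2-D image and sum the elementwise products.

    As in most deep learning libraries, the kernel is not flipped.
    """
    if pad > 0:
        # Zero padding on every side, so border pixels are fully covered.
        image = np.pad(image, pad, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(window * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Combine each size x size block into its maximum (assumed pooling rule)."""
    h = feature_map.shape[0] - feature_map.shape[0] % size
    w = feature_map.shape[1] - feature_map.shape[1] % size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.arange(16).reshape(4, 4)
kernel = np.ones((2, 2))                    # stand-in for a random weight matrix
print(conv2d(image, kernel))                # stride 1: 3x3 output
print(conv2d(image, kernel, stride=2))      # stride 2: 2x2 output
print(conv2d(image, kernel, pad=1).shape)   # zero padding enlarges the output
print(max_pool2d(image))
```

A larger stride reduces the amount of computation at the cost of a coarser output, while padding lets the weight matrix reach the pixels on the margin.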
The program needs to generate hundreds of weight matrices and apply convolution to each layer of the image, with the layers divided by color. Because the characteristics are distributed more or less randomly over the range of values from 00 to FF, two randomly generated matrices may give totally different convolution results only because they respond to different positions on the number axis. Therefore, a reasonable number of matrices must be generated in order to cover each characteristic. Besides, to obtain more possible characteristics, the program could rotate the picture to different angles to mimic any possible viewing angle in reality. For example, suppose the original two-dimensional array (or matrix) is [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]. After transposing the picture, that is, swapping its rows and columns, it becomes [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]], which enables the convolution to cover more characteristics from a different angle.
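The array example above can be reproduced with NumPy. In this short sketch the second array in the text turns out to be the transpose of the first (rows and columns swapped); a true 90-degree rotation, shown with `np.rot90` for comparison, gives the same rows in reverse order. The variable names are chosen here for illustration only.

```python
import numpy as np

original = np.arange(16).reshape(4, 4)

# Swapping rows and columns reproduces the second array given in the text.
transposed = original.T

# A 90-degree counterclockwise rotation, for comparison with the transpose.
rotated = np.rot90(original)

print(transposed)
print(rotated)
```

Feeding such transformed copies to the network alongside the original is one simple way to expose it to more viewing angles.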
After training with the ReLU activation function, the computer has obtained many valuable weight matrices, and now it needs to "recognize" a new picture. When the computer receives the new picture, it again transforms it into arrays and then multiplies the array by each weight matrix to get hundreds of sums. Using a decision function based on the variance of the whole sample of data, the credibility of each sum can be obtained, and from it the credible interval of each characteristic. As a result, if the program can reduce the image to certain characteristics of, for example, a mouse that the computer "learned" before, it can output a percentage expressing the confidence that the input image is a mouse, or a percentage expressing the confidence that it is not. The credibility of this confidence can also be learned through deep learning and by comparison with the standard deviation of the whole population's credible interval. Thus, the program can derive criteria by which it determines more specifically whether the image is one of the labeled target images that the computer "learned" before. After being "trained" on different types of pictures, for example planes, cars, and birds, the deep learning program can accurately classify new pictures into those previously learned types.
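One common way to turn the sums described above into confidence percentages is the softmax function. The sketch below uses softmax as a stand-in, which is an assumption: the essay's own decision function is based on variance and is not specified in enough detail to reproduce. The names `class_scores` and `softmax`, the toy 2x2 image, and the two weight matrices are all hypothetical.

```python
import numpy as np

def class_scores(image, weight_matrices):
    """Multiply the image by each weight matrix elementwise and sum the products."""
    return np.array([np.sum(image * w) for w in weight_matrices])

def softmax(scores):
    """Convert raw scores into confidences that are positive and sum to 1."""
    shifted = scores - np.max(scores)   # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Toy 2x2 "picture" and one hypothetical learned weight matrix per label.
image = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = {
    "mouse": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "not mouse": np.array([[0.0, 1.0], [1.0, 0.0]]),
}

scores = class_scores(image, list(weights.values()))
confidence = softmax(scores)
for label, value in zip(weights, confidence):
    print(f"{label}: {value:.1%}")
```

The two percentages always add up to 100%, which matches the idea of reporting either the confidence that the image is a mouse or the confidence that it is not.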

Applications in Image Recognition
This technology could be used in multiple fields [19][20][21][22][23][24]. For example, in a license plate detection system on the road, cameras deliver images to a central control computer, which runs programs previously trained on millions of license plate images. The computer can then tell whether a car has a number plate and whether the car owner has covered the plate. With the support of a large database, the program could match the car owner with the illegal treatment of the number plate and finally penalize him or her. As Fujiyoshi Hironobu et al. write, "Their features are a labeling of the smoothed image using Gaussian filter, template matching using averaged patterns and some artificial neural networks (NNs) based on the Back-Propagation algorithm. First, in the case of Gaussian filtering method, the smoothed image is used so that the separation of car body and number plate is not sufficient in the making labels. Therefore, a body color of the car influences to the detection score. The template matching is given the high score because of the edge emphasizing. It does not depend on a body color. Finally, we show that the NNs are able to detect with much higher score." Building on this real implementation, the program could evolve to directly identify pedestrians, riders, and drivers who violate traffic law and automatically warn and penalize them through the personal trust network that is under construction, which could relieve the police officers who have to stand in the middle of the road every day in metropolitan areas. In the future, self-driving cars that use this program to distinguish pedestrians and other cars may also enter people's lives. Besides, this technology could also be applied in the medical field.
The programs could pick out hidden patterns that humans cannot notice with their natural senses, which may give the programs an irreplaceable role in diagnosing diseases. For instance, the eye is one of the most delicate organs in the human body, and there are hundreds of different eye diseases, which are difficult for doctors to recognize because the structure of the eye is so complex. However, with the deep learning program, "It has a higher accuracy rate on finding indications of diseases on retinal images than slower conventional diagnoses by doctors. As such, it has the potential to make preventative healthcare available to millions of people, not just in China, but around the world." The extremely high speed of diagnosis could also bring great efficiency to hospitals, which helps save millions of people's eyesight or even lives. Using the Airdoc machine with intelligent learning systems, "The machine has just taken high-resolution medical-grade images of both your retinas. It instantly sends them to the cloud where it takes 20 to 30 milliseconds." Furthermore, millions of patients suffer from serious diseases and must stay in bed, and some of them can only move their eyes. This deep learning program could also be trained on images of people with their eyes open and closed, which may allow those patients to express their ideas by blinking a certain number of times to select options on a large screen in front of them.
Based on this program, nurses and doctors would not need to stay in the ICU watching every single movement of the patients, while the automatic programs carry out more precise detection. [25][26][27][28][29][30][31] As a result, this program could bring substantial benefits and convenience to multiple fields.
Code based on a convolutional neural network can be written to recognize pictures of human faces. As a first step, nurses could train the code with hundreds of pictures of patients' faces. These pictures can be classified into a few types, so each picture is labelled with one specific name. After the code is trained with these pictures and the corresponding labels, the convolutional neural network can capture the features of each kind of picture. The figure below presents the accuracy of the code during learning. After many training steps, the code has fitted its feature matrices and is ready for predictions. In the hospital, the code can monitor people's faces and "know" whether a face shows pain or not. In the figure below, the mean loss reflects the discrepancy between the code's predictions and the real data. It decreases with the number of training steps while the prediction accuracy increases, which means the code is learning from the training pictures.
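The behavior described above, with the mean loss falling and the accuracy rising as training proceeds, can be illustrated with a much smaller model than a convolutional neural network. The sketch below is an assumption-laden stand-in: it trains a single-layer logistic classifier by gradient descent on synthetic 16-value "pictures" with two made-up labels (0 for not painful, 1 for painful). The data, the `train` function, and all parameter values are hypothetical and are not the researcher's code or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_features = 100, 16

# Synthetic stand-ins for flattened face pictures of two labelled types.
x = np.vstack([
    rng.normal(-0.5, 1.0, size=(n_per_class, n_features)),   # label 0
    rng.normal(+0.5, 1.0, size=(n_per_class, n_features)),   # label 1
])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, steps=100, lr=0.1):
    """Gradient descent on the mean cross-entropy loss, recording loss and accuracy."""
    w, b = np.zeros(x.shape[1]), 0.0
    losses, accuracies = [], []
    eps = 1e-12                                   # avoids log(0)
    for _ in range(steps):
        p = sigmoid(x @ w + b)                    # predicted probability of label 1
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        accuracies.append(np.mean((p > 0.5) == (y == 1)))
        grad = p - y                              # gradient of the loss w.r.t. the scores
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * np.mean(grad)
    return w, b, losses, accuracies

w, b, losses, accuracies = train(x, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"accuracy: {accuracies[0]:.2f} -> {accuracies[-1]:.2f}")
```

Plotting `losses` and `accuracies` against the step number would give curves of the same general shape as those described in the text: a decreasing loss and an increasing accuracy.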

Conclusion
In conclusion, the lines of code written into these programs build increasingly intelligent systems that process information with a speed and accuracy beyond human reach and imagination. Based on these programs, human civilization could make a great leap and improve in every corner of society. Artificial intelligence with data regression and classification systems, which is becoming more and more popular and developed, is changing our lives, both visibly and beneath the surface.