Non Linear Cellular Automata Enhanced with Active Learning for Pattern Classification in Highly Dense Images
P. Kiran Sree1, Sssn Usha Devi N.2
1Dept of Computer Science and Engineering, Shri Vishnu Engineering College for Women, Bhimavaram, India
2Dept of Computer Science and Engineering, University College of Engineering, Jawaharlal Nehru Technological University, Kakinada, India
To cite this article:
P. Kiran Sree, Sssn Usha Devi N. Non Linear Cellular Automata Enhanced with Active Learning for Pattern Classification in Highly Dense Images. Machine Learning Research. Vol. 1, No. 1, 2016, pp. 15-18. doi: 10.11648/j.mlr.20160101.12
Received: November 27, 2016; Accepted: December 17, 2016; Published: January 16, 2017
Abstract: This paper introduces a new approach to classifying high density images based on the properties of Non Linear Cellular Automata. We use a state-transition diagram that consists of a set of disjoint trees rooted at cyclic states of unit cycle length, thus forming a natural classifier. The proposed framework is strengthened with a genetic algorithm to find the desired local rules for modeling the global state function.
Keywords: Cellular Automata (CA), Active Learning (DL), Non Linear CA
In the first part of the paper we developed a classifier based on Linear DLM and a Non Linear Active Learning Mechanism that can address major problems in bioinformatics, such as protein coding region identification, protein structure prediction and promoter region identification. We also proposed an Artificial Immune System, a novel computational intelligence technique, for strengthening the system with more adaptability and incorporating more parallelism into the system. We have also shown how the quality of clustering can be improved with Cellular Automata.
In the second part of the paper we explored a heuristic based Non Uniform Active Learning Mechanism based intrusion detection system (IDS) that monitors a network for malicious activities or policy violations and produces reports to a management station. We found a pattern of abstract IDS that defines the general features and patterns for behavior based IDS and signature based IDS, which will be used to find potential threats in the network.
A protein is a complex, high-molecular weight, organic compound that consists of amino acids joined by peptide bonds. Proteins are essential to the structure and function of all living cells and viruses. The proteins in a cell determine what that cell will look like and what jobs that cell will do. The genes also determine how the many different cells of a body will be arranged. If we identify the protein coding region, we can extract a lot of information, such as how DNA controls how many fingers you have, where your legs are placed on your body, and the color of your eyes. DNA is organized in the form of introns and exons. Introns form the major part of the DNA strand and exons form the minor part of the DNA strand. However, only the exons contain protein coding regions. Identifying protein coding regions in the exons is a real challenge. The proposed algorithms LMADLM, NPCRITDLMDLM can process DNA sequences of different lengths. Experimental results confirm the scalability of the proposed FDLM based classifier to handle large volumes of data irrespective of the number of classes, tuples and attributes. Good classification accuracy has been established. The Fickett and Tung data sets are used for measuring the efficiency of the classifier.
In genetics, a promoter is a region of DNA that initiates transcription of a particular gene. Promoters are located near the genes they transcribe, on the same strand and upstream on the DNA. An algorithm was proposed to identify the promoter regions with DLM. Eukaryotic Promoter Database new data sets are used.
Protein structure prediction is the prediction of the three dimensional structure of a protein from its amino acid sequence; that is, the prediction of its secondary, tertiary, and quaternary structure from its primary structure. Structure prediction is fundamentally different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by bioinformatics. The data set used was taken from DLMSP.
2. AIS Augmented with Active Learning
An artificial immune system (ARTIS) is described which incorporates many properties of natural immune systems, including diversity, distributed computation, error tolerance, dynamic learning and adaptation, and self-monitoring. ARTIS is a general framework for a distributed adaptive system and could, in principle, be applied to many domains. This AIS-MADLM system was used to strengthen the protein coding region identification system and protein structure predicting system.
The fundamental unit of the Artificial Deep Learning Mechanism (DLM) is a cell that has a simple structure evolving in discrete time and space. One of the most important milestones in the history of the development of the simple homogeneous structure of DLM is due to Wolfram. Solutions to complex problems demand a parallel computing environment. Most parallel computers contain more than a few dozen processors. DLM can achieve parallelism on a scale larger than massively parallel computers. DLM is characterized by local connectivity of its cells. All interactions take place on a purely local basis. A cell can only communicate with its neighboring cells. Further, the interconnection links usually carry only a small amount of information. One implication of this principle is that no cell has a global view of the entire system.
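The local connectivity described above can be illustrated with a minimal sketch of a one dimensional binary cellular automaton. The three-cell neighborhood, the null boundary (cells beyond either end read as 0), the Wolfram rule numbering and the function names `step` and `evolve` are assumptions chosen for illustration; the text does not specify them.

```python
def step(cells, rule):
    """Advance a 1D binary cellular automaton by one discrete time step.

    cells: list of 0/1 values.
    rule:  Wolfram rule number (0-255), an assumed encoding of the local rule.
    Null boundary: neighbors outside the lattice are treated as 0.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < n - 1 else 0
        # Each cell sees only itself and its two neighbors, never the whole lattice.
        idx = (left << 2) | (cells[i] << 1) | right
        nxt.append((rule >> idx) & 1)
    return nxt


def evolve(cells, rule, steps):
    """Return the list of configurations from time 0 to time `steps`."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule))
    return history


if __name__ == "__main__":
    for row in evolve([0, 0, 1, 0, 0], 90, 2):
        print(row)
```

Every cell is updated from the same snapshot of the previous time step, so the per-cell updates are independent of each other and could in principle run in parallel.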
A fundamental problem for network intrusion detection systems is the ability of a skilled attacker to evade detection by exploiting ambiguities in the traffic stream as seen by the monitor. We discuss the viability of addressing this problem by introducing a new network forwarding element called a traffic MADLM normalizer. The MADLM normalizer sits directly in the path of traffic into a site and patches up the packet stream to eliminate potential ambiguities before the traffic is seen by the monitor, removing evasion opportunities. We examine a number of tradeoffs in designing a MADLM normalizer, emphasizing the important question of the degree to which normalizations undermine end-to-end protocol semantics.
We discuss the key practical issues of "cold start" and attacks on the MADLM normalizer, and develop a methodology for systematically examining the ambiguities present in a protocol based on walking the protocol’s header. We then present norm, a publicly available user-level implementation of a MADLM normalizer that can normalize a TCP traffic stream at 100,000 pkts/sec in memory-to-memory copies, suggesting that a kernel implementation using PC hardware could keep pace with a bidirectional 100 Mbps link with sufficient headroom to weather a high-speed flooding attack of small packets. DARPA Intrusion Detection Data Sets are used to evaluate the developed classifier.
We did an extensive survey on the key features of DLM which will be useful for pattern recognition. We have reported all the characteristics of DLM with their classes and the applicability of the classes in various fields. After this study we successfully developed a linear and a non linear classifier to address various problems in bioinformatics. The proposed algorithm is then strengthened with an artificial immune system for better stability and accuracy. The proposed algorithm was also slightly modified to identify intrusions in the network.
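The abstract describes the classifier as a state-transition structure of disjoint trees rooted at cyclic states of unit cycle length. A minimal sketch of that idea follows, assuming a small binary cellular automaton with a null boundary whose configuration is packed into an integer. The rule number 254, the lattice size and the names `step_int`, `attractor`, `build_basins` and `classify` are illustrative assumptions, not the authors' implementation.

```python
def step_int(state, rule, n):
    """One step of an n-cell binary CA (null boundary), state packed as an int."""
    padded = state << 1  # bit 0 of `padded` is the null cell beyond the low end
    nxt = 0
    for i in range(n):
        idx = (padded >> i) & 7  # (higher neighbor, cell, lower neighbor)
        nxt |= ((rule >> idx) & 1) << i
    return nxt


def attractor(state, rule, n):
    """Follow the trajectory of `state` until it repeats.

    Returns (representative state of the cycle reached, cycle length).
    A cycle length of 1 is a fixed point, i.e. the root of a tree.
    """
    seen = []
    while state not in seen:
        seen.append(state)
        state = step_int(state, rule, n)
    cycle = seen[seen.index(state):]
    return min(cycle), len(cycle)


def build_basins(rule, n):
    """Group all 2**n states by the attractor their trajectory ends in."""
    basins = {}
    for s in range(2 ** n):
        root, _ = attractor(s, rule, n)
        basins.setdefault(root, []).append(s)
    return basins


def classify(pattern, rule, n, attractor_to_class):
    """Assign a pattern the class label attached to the attractor it reaches."""
    root, _ = attractor(pattern, rule, n)
    return attractor_to_class.get(root)


if __name__ == "__main__":
    basins = build_basins(254, 4)
    for root, members in sorted(basins.items()):
        print(format(root, "04b"), len(members))
```

Under these assumptions each basin plays the role of one class: two patterns receive the same label exactly when their trajectories end in the same attractor. Not every rule yields only unit-length cycles, which is why `attractor` also reports the cycle length.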
3. Complexity of DLM
DLM performs computations in a distributed fashion on a spatially extended grid. It differs from the conventional approach to parallel computation, in which a problem is split into independent sub-problems, each solved by a different processor; the solutions of the sub-problems are subsequently combined to yield the final solution.
The evolution process is directed by the popular Genetic Algorithm (GA) with the underlying philosophy of survival of the fittest gene. This GA framework can be adopted to arrive at the desired CA rule structure appropriate to model a physical system. The goals of the GA formulation are to enhance the understanding of the ways DLM performs computations, to learn how DLM may be evolved to perform a specific computational task, and to understand how evolution creates complex global behavior in a locally interconnected system of simple cells.
The task of pattern recognition is encountered in a wide range of human activity. In a broader perspective, the term could cover any context in which some decision or forecast is made on the basis of currently available information. The problem deals with the construction of a procedure to be applied to a set of inputs; the procedure assigns each new input to one of a set of classes on the basis of observed attributes or features. The construction of such a procedure on an input dataset is defined as pattern recognition.
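A minimal sketch of such a GA search is given below. It assumes that the "physical system" is represented by observed (before, after) configuration pairs, that a candidate rule is an 8-bit Wolfram rule number, and that fitness is the number of cells predicted correctly. The population size, mutation rate, single-point crossover and all function names are illustrative assumptions rather than the authors' settings.

```python
import random


def step(cells, rule):
    """One step of a 1D binary CA with a 3-cell neighborhood and null boundary."""
    padded = [0] + list(cells) + [0]
    return [(rule >> ((padded[i] << 2) | (padded[i + 1] << 1) | padded[i + 2])) & 1
            for i in range(len(cells))]


def fitness(rule, samples):
    """Count cells predicted correctly over observed (before, after) pairs."""
    return sum(a == b
               for before, after in samples
               for a, b in zip(step(before, rule), after))


def evolve_rule(samples, pop_size=20, generations=40, mutation_rate=0.1, seed=0):
    """Search the 256 elementary rules with a simple elitist GA.

    Returns (best rule found, best fitness recorded at each generation).
    """
    rng = random.Random(seed)
    population = [rng.randrange(256) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, samples), reverse=True)
        best_per_generation.append(fitness(population[0], samples))
        survivors = population[:pop_size // 2]  # survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, 8)  # single-point crossover on the 8 rule bits
            mask = (1 << cut) - 1
            child = (p1 & mask) | (p2 & ~mask & 0xFF)
            for bit in range(8):  # bit-flip mutation
                if rng.random() < mutation_rate:
                    child ^= 1 << bit
            children.append(child)
        population = survivors + children
    population.sort(key=lambda r: fitness(r, samples), reverse=True)
    return population[0], best_per_generation


if __name__ == "__main__":
    data_rng = random.Random(1)
    samples = []
    for _ in range(10):
        before = [data_rng.randrange(2) for _ in range(8)]
        samples.append((before, step(before, 90)))  # hidden target: rule 90
    best, history = evolve_rule(samples)
    print(best, history[0], fitness(best, samples))
```

Because the survivors are carried over unchanged, the best fitness never decreases from one generation to the next. The search space here is only 256 rules, so the GA is purely illustrative; the approach becomes relevant when the rule space is too large to enumerate.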
4. DLM in Pattern Recognition
A pattern recognition algorithm has two phases, the learning or training phase and the testing phase. In the training phase, the algorithm is trained with some patterns. Based upon the nature of training, there are two broad categories of pattern classification.
This model is built describing a predefined set of data classes. A sample set from the database, each member belonging to one of the predefined classes, is used to train the model. The training phase is termed supervised learning of the classifier. Each member may have multiple features. The classifier is trained based on a specific metric. Subsequent to training, the model performs the task of prediction in the testing phase. Prediction of the class of an input sample is done based on some metric, typically a distance metric.
This paper can be extended by formulating the memorizing capacity of a non linear DLM based associative memory model. An FDLM (Fuzzy Cellular Automata) based model for complex functions involving datasets with real-valued attributes can be explored. With some minor changes, the proposed algorithm can also be used as a compression algorithm. This paper can also be extended to propose a hybrid system that combines Non Linear DLM (NLDLM) and fuzzy sets.
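The training and testing phases described above can be sketched with a nearest-neighbor classifier over binary feature vectors. The text only says that a distance metric is typically used; the choice of Hamming distance, the one-nearest-neighbor decision and the names `hamming`, `train` and `predict` are assumptions made for this illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def train(samples):
    """Training phase: store labeled samples as (features, class_label) pairs."""
    return list(samples)


def predict(model, features):
    """Testing phase: return the class of the closest stored sample."""
    nearest = min(model, key=lambda item: hamming(item[0], features))
    return nearest[1]


if __name__ == "__main__":
    model = train([((0, 0, 0, 0), "A"), ((1, 1, 1, 1), "B")])
    print(predict(model, (0, 0, 0, 1)))
    print(predict(model, (1, 1, 0, 1)))
```

Here each training member carries several features and a predefined class label, and prediction reduces to comparing distances, which mirrors the supervised setting in the paragraph above.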