ANFIS-Based Visual Pose Estimation of Uncertain Robotic Arm Using Two Uncalibrated Cameras

This paper describes a new approach for the visual pose estimation of an uncertain robotic manipulator using ANFIS (Adaptive Neuro-Fuzzy Inference System) and two uncalibrated cameras. The main emphasis of this work is on the ability to estimate the positioning accuracy and repeatability of a low-cost robotic arm with unknown parameters under an uncalibrated vision system. The vision system is composed of two cameras, installed above and on the lateral side of the robot.


Introduction
The positioning of robot manipulators using visual information has been an area of research for the last 40 years, and attention to this subject has grown considerably in recent years. A feedback loop based on visual information can solve many problems that limit the applications of current robots in areas such as automatic driving, long-range exploration, medical robotics and aerial robots.
Neural networks are good candidates for approximating non-linear transformation functions because they possess the following desirable features. Firstly, neural networks have the capability to learn from experience; they do not require explicit programming to acquire the approximate model. Secondly, neural networks can approximate arbitrary nonlinear mappings, provided that enough processing units are available. Thirdly, because of their massively parallel architecture, data processing is fast. In the field of robotics, neural networks have been applied to the following problems: solving the inverse kinematic problem of robots, mapping the non-linear relationships in robot dynamics as an inverse dynamics controller, path or trajectory planning, mapping sensory information for robot control, and task planning and intelligent control.
This paper focuses on mapping visual sensory information for robot control. Recently, three-dimensional (3-D) vision systems for robot applications have been widely studied. Baek and Lee [9] used two cameras and one laser sensor to recognize an elevator door and to determine its depth distance. Okada et al. [10] used multiple sensors for 3-D position measurement. Winkelbach et al. [11] combined one camera with one range sensor to find the 3-D coordinate position of the target. Huang [4] addressed 3-D position control for a robot arm utilizing two-CCD vision geometry and inverse kinematics. Zhou et al. [5] used a position sensitive detector (PSD) for high-precision parallel kinematic mechanisms (PKMs) to allow them to achieve their desired poses accurately. Dallej et al. [12] developed 3-D pose visual servoing for cable-driven parallel robots.
An attractive approach is to have a system which learns the nonlinear relationship between the observed 2-D feature deviations and the robot movements. Skaar et al. [6] developed a method for learning the image Jacobian, by way of least-squares estimation, from several observations of cues along the approach trajectory. The method was successfully applied to a part-mating task. Neural networks have been applied in many areas of robot control, as described by Torras [13]. Hashimoto et al. [7] used a neural network to learn the direct mapping between the image deviations of four feature points and the joint angles of a 6-DOF manipulator. A disadvantage of including the inverse kinematics in the mapping is that the learned relationship is pose-dependent, i.e. it only applies to positioning with respect to the target object in a particular location. In Wells' work [8], a neural network is used to learn the pose-independent mapping between feature deviations and pose changes, based on images sampled from the workspace. Cid et al. [14] developed fixed-camera visual servoing for planar robot manipulators, composing control laws from the gradient of an artificial potential energy plus a nonlinear velocity feedback.
In this paper, the positioning problem of a 5-DOF articulated robot manipulator is addressed under a configuration with two fixed cameras. The main contribution is the development of a new pose-independent learning method for robotic end-effector positioning using two uncalibrated fixed cameras and robotic forward kinematics. The control objective is defined in terms of Cartesian coordinates which are deduced from visual information.
The paper is organized into six sections. In section 2, the R5150 robotic manipulator is analysed through forward kinematic modelling, including the link coordinate diagram and the kinematic parameters. The theory of the ANFIS technique is presented in section 3. In section 4, the implementation of the proposed system is described. Experimental tests and results are presented in section 5. Finally, the paper is concluded in section 6 with the observed results and future work.

Robotic Forward Kinematic Analysis
In this section, the forward kinematic analysis of the robot is described by determining the D-H parameters and calculating the robot forward kinematics. To obtain the physical robot model for simulation in MATLAB, the link lengths and joint types of the LabVolt R5150 manipulator are modelled in Figure 1, and the link frame assignments of the robot are shown in Figure 2.

Determining D-H Parameters
The D-H parameter table is a notation developed by Denavit and Hartenberg for assigning orthogonal coordinate frames to a pair of adjacent links in an open kinematic chain. It is used in robotics, where a robot can be modelled as a number of connected rigid bodies (segments), and the D-H parameters define the relationship between two adjacent segments. The first step in determining the D-H parameters is to locate the links; then the type of movement (rotation or translation) is determined for each joint. As can be seen in Figure 1, the LabVolt R5150 robot has five rotational joints. The links, axes and rotation angles are shown as a simplified diagram in Figure 2. Using the D-H parameters defined in the previous steps and listed in Table 1, the robot model was created in MATLAB using the Robotics Toolbox. In addition to the previously determined D-H parameters, the robot model contains physical parameters which are used in the calculation of the dynamics of the movement.

The forward kinematics of the five-joint arm is the product of the five link transformation matrices,
$$T = T_1 T_2 T_3 T_4 T_5,$$
where each $T_i$ is the homogeneous transformation of link $i$ built from its D-H parameters $(\theta_i, d_i, a_i, \alpha_i)$, with entries written using the shorthand $c_i = \cos\theta_i$ and $s_i = \sin\theta_i$.
The observed robot has three links with 3 DOF and a 2-DOF wrist mechanism. In this work, the location of the end of the first three links is tracked using one coloured feature point. Since the coloured feature point is attached to the end of the first three links of the robot, only the first three basic transformation matrices are needed to obtain its location. The final homogeneous transformation matrix for locating the feature point is obtained from the product of these three matrices:
$$T_{fp} = T_1 T_2 T_3 = \begin{bmatrix} R & P \\ 0 & 1 \end{bmatrix},$$
where $R$ is the rotation matrix and $P$ is the position matrix. From the position matrix $P$, the location of the feature point is calculated along the movement of the robotic arm.
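The product of the first three link transforms can be illustrated with a minimal Python sketch. It assumes the standard D-H convention; the numeric link values in `HYPOTHETICAL_DH` are placeholders chosen for illustration and are not the R5150 parameters, and all function names are illustrative.

```python
import math

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform of one link (standard D-H convention, assumed)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical (d, a, alpha) for joints 1..3 in metres and radians.
# These are NOT the R5150 values, which are not reproduced here.
HYPOTHETICAL_DH = [
    (0.25, 0.00, math.pi / 2),
    (0.00, 0.20, 0.0),
    (0.00, 0.20, 0.0),
]

def feature_point_position(thetas, dh=HYPOTHETICAL_DH):
    """Return P = (x, y, z), the translation part of T1 * T2 * T3."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, (d, a, alpha) in zip(thetas, dh):
        T = matmul4(T, dh_transform(theta, d, a, alpha))
    return (T[0][3], T[1][3], T[2][3])
```

With these placeholder values, all joint angles at zero stretch the arm horizontally, and setting the second joint to 90 degrees points it straight up.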
In this work, the forward kinematics of the robot is used to simulate and drive the robot, both for training the ANFIS networks and for driving the robot along a specified trajectory or to a specified location.

Adaptive Neural-Fuzzy Inference System
The Adaptive Neural-Fuzzy Inference System (ANFIS) was developed by Jang [1]. ANFIS is a hybridization of neural network and fuzzy logic methods. It is essentially a feed-forward neural network which implements a fuzzy inference system through the structure of the network and its neurons, giving the learning ability of a neural network to the fuzzy inference system. The method was mainly developed for modelling nonlinear functions, identifying nonlinear components on-line in control system design, and predicting chaotic time series.
The ANFIS structure consists of five layers: the fuzzy (input) layer, the product layer, the normalization layer, the defuzzification layer and the summation layer. The basic structure of the ANFIS is given in Figure 3, in which a fixed node is drawn as a circle and an adjustable node as a square. With two inputs $x$ and $y$ and one output $z$, ANFIS can be used as a first-order Sugeno FIS. There are several fuzzy systems, such as Sugeno and Mamdani, but the Sugeno model is the most popular and widely used because of its high interpretability and computational efficiency, and because it works well with optimization and adaptive techniques. For two rules, the first-order Sugeno fuzzy rules can be expressed as follows:
Rule 1: If $x$ is $A_1$ and $y$ is $B_1$, then $z_1 = p_1 x + q_1 y + r_1$
Rule 2: If $x$ is $A_2$ and $y$ is $B_2$, then $z_2 = p_2 x + q_2 y + r_2$
where $A_i$ and $B_i$ are fuzzy sets, and $p_i$, $q_i$ and $r_i$ are parameters which are assigned during the training process. As presented in Figure 3, the ANFIS structure consists of the following five layers.

i. Layer 1 (Input layer)
In this layer, each node corresponds to a fuzzy set, and the output of a node is the membership grade of the input variable in that fuzzy set. The parameters of each node determine the membership function of its fuzzy set.
With a Gaussian membership function, the output of node $i$ is defined by
$$O_{1,i} = \mu_{A_i}(x) = \exp\left(-\frac{(x - c_i)^2}{2\sigma_i^2}\right),$$
where $x$ is the input value of the node, and $c_i$ and $\sigma_i$ determine the center and the width of the Gaussian membership function, respectively. Parameters in this layer are referred to as premise parameters.

ii. Layer 2 (Product layer)
The output of each node is the product of all incoming signals,
$$O_{2,i} = \omega_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2,$$
where $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are the membership grades from layer 1. Each node output represents the firing strength (weighting factor) of a rule.

iii. Layer 3 (Normalization layer)
Every node (circle) in this layer is a fixed node labelled N. This layer is also called the normalized layer. It calculates the ratio of the firing strength of each rule to the sum of the firing strengths of all rules:
$$\bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2}, \quad i = 1, 2,$$
where $\omega_i$ is the firing strength of rule $i$ from layer 2 and $\bar{\omega}_i$ are the normalized firing strengths.

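iv. Layer 4 (Defuzzification layer)
The output of every node is calculated by multiplying the normalized firing strength by the first-order output of the corresponding rule:
$$O_{4,i} = \bar{\omega}_i z_i = \bar{\omega}_i (p_i x + q_i y + r_i),$$
where $\bar{\omega}_i$ is the normalized firing strength from layer 3 and $(p_i, q_i, r_i)$ are the consequent parameters.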

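v. Layer 5 (Summation layer)
The single node here is a fixed node, labelled Σ, which computes the overall output as the summation of all incoming signals. It can be expressed as follows:
$$O_5 = \sum_i \bar{\omega}_i z_i = \bar{\omega}_1 z_1 + \bar{\omega}_2 z_2,$$
where $\bar{\omega}_i z_i$ is the output of node $i$ in layer 4.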
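The five layers can be traced in a short Python sketch of a single forward pass through a first-order Sugeno ANFIS. It assumes Gaussian membership functions of the form exp(-(x - c)^2 / (2 sigma^2)); the function names and the parameter layout (`premise`, `consequent`) are illustrative assumptions, and no training step is shown.

```python
import math

def gaussmf(x, c, sigma):
    """Gaussian membership grade with center c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def anfis_forward(x, y, premise, consequent):
    """One forward pass of a first-order Sugeno ANFIS with inputs x and y.

    premise:    {"A": [(c, sigma), ...], "B": [(c, sigma), ...]}, one pair per rule
    consequent: [(p, q, r), ...], one triple per rule
    """
    # Layer 1: membership grades of each input in each rule's fuzzy set
    mu_a = [gaussmf(x, c, s) for (c, s) in premise["A"]]
    mu_b = [gaussmf(y, c, s) for (c, s) in premise["B"]]
    # Layer 2: firing strength of each rule (product of incoming signals)
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    # (assumes at least one rule fires, so the sum is non-zero)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized strength times the first-order rule output
    z = [p * x + q * y + r for (p, q, r) in consequent]
    o4 = [wb * zi for wb, zi in zip(w_bar, z)]
    # Layer 5: overall output as the sum of all incoming signals
    return sum(o4)
```

For example, with two rules whose membership functions are symmetric about the input, both rules fire equally and the output is the plain average of the two rule outputs.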

Proposed System for ANFIS-Based Visual Positioning Approach
In this work, the visual positioning is trained using ANFIS together with robotic forward kinematics and multi-view geometry. The flowchart of the training process is shown in Figure 4.
In this research, the first step is to test the working space of the robotic arm as seen by the vision system. To do so, an m-file has been created in MATLAB based on the direct kinematics of the robotic arm and the epipolar geometry of the two cameras. Two pin-hole cameras are used, installed above and on the lateral side of the robot, respectively. The two cameras have identical focal lengths of 0.002 m and identical view limits of [0, 1024, 0, 1024] pixels.
The data required for ANFIS learning is created in the robotic working space by varying the relative position between the robotic arm elements and acquiring image data from the two cameras. The displacement between two consecutive elements is kept within its minimum and maximum range. This is the so-called motor babbling phase; the MATLAB code is presented in Figure 5, and the simulation setup of the robot and the two cameras is shown in Figure 6. Figure 7 shows the maneuvering points of the robotic arm captured in the two camera views during the motor babbling phase. In this research, Corke's RVC v9.10 MATLAB toolbox [2] is used for the simulation of the robot and the two cameras. To obtain the required data, the manipulator was maneuvered within its workspace using forward kinematics, and the end-effector image coordinates were acquired from the two cameras as shown in Figure 8. The image coordinates from these cameras are [u 1 , v 1 ] and [u 2 , v 2 ], and they are used as training data. Therefore, there are four inputs for ANFIS training.
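The motor babbling phase can be sketched in Python as follows. This is not the authors' MATLAB code from Figure 5. Only the focal length (0.002 m) and the 1024 by 1024 pixel image size come from the text; the pixel size, principal point, camera placement, joint ranges, link dimensions and the 12 x 7 x 7 sampling grid are assumptions. The grid was picked only so that the sample count equals the 588 points mentioned in the text.

```python
import math

F = 0.002        # focal length in metres (from the text)
RHO = 10e-6      # pixel size in metres (assumption)
U0 = V0 = 512.0  # principal point at the image centre (assumption)

def fk_feature_point(t1, t2, t3, d1=0.25, a2=0.20, a3=0.20):
    """Feature-point position for a hypothetical 3-joint arm (not the R5150)."""
    r = a2 * math.cos(t2) + a3 * math.cos(t2 + t3)
    z = d1 + a2 * math.sin(t2) + a3 * math.sin(t2 + t3)
    return (r * math.cos(t1), r * math.sin(t1), z)

def project(lateral_1, lateral_2, depth):
    """Ideal pin-hole projection to pixel coordinates."""
    scale = F / (RHO * depth)
    return (U0 + scale * lateral_1, V0 + scale * lateral_2)

def top_camera(p, height=2.0):
    """Camera above the robot looking straight down (simplified, axis-aligned)."""
    x, y, z = p
    return project(x, y, height - z)

def side_camera(p, distance=2.0):
    """Camera beside the robot looking along the y axis (simplified)."""
    x, y, z = p
    return project(x, z, distance - y)

def linspace(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def motor_babbling(n1=12, n2=7, n3=7):
    """Sweep the joints over assumed ranges and record (u1, v1, u2, v2) -> (x, y, z)."""
    samples = []
    for t1 in linspace(-math.pi / 2, math.pi / 2, n1):
        for t2 in linspace(0.0, math.pi / 2, n2):
            for t3 in linspace(-math.pi / 2, 0.0, n3):
                p = fk_feature_point(t1, t2, t3)
                u1, v1 = top_camera(p)
                u2, v2 = side_camera(p)
                samples.append(((u1, v1, u2, v2), p))
    return samples
```

Each sample pairs the four image coordinates, which serve as ANFIS inputs, with the Cartesian position obtained from forward kinematics, which serves as the training target.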
The ANFIS network is trained with Gaussian membership functions and a hybrid learning algorithm. For the neuro-fuzzy model in this work, 588 data points are analytically obtained using forward kinematics, of which 294 are used for training and the remaining 294 for validation.
Since ANFIS is a judicious integration of FIS and ANN, it is capable of learning, high-level thinking and reasoning, and it combines the benefits of the two techniques in a single model. The key to a successful FIS is finding the rule base. There are no specific techniques for converting human knowledge into a rule base, and further fine tuning of the membership functions is required to maximize the performance of the model and to minimize the output error. Thus, when generating a FIS using ANFIS, it is important to select proper parameters, including the number of membership functions (MFs) for each individual antecedent variable. It is also vital to select appropriate parameters for the learning and refining process, including the initial step size (ss).
In the present work, the rule extraction method applied for FIS identification and refinement is the commonly used subtractive clustering. The MATLAB Fuzzy Logic Toolbox has been used for ANFIS model development. The flowchart of the ANFIS training in this work is shown in Figure 9. The initial parameters of the ANFIS are identified using the subtractive clustering method. It is vital to define the subtractive clustering parameters properly, of which the clustering radius is the most important; it is determined through a trial-and-error approach. By varying the clustering radius $r_a$ with a varying step size, the optimal parameters are obtained by minimizing the root mean squared error (RMSE) on the validation dataset. The clustering radius $r_b$ is selected as $1.5\,r_a$. Gaussian membership functions are used for each fuzzy set in the fuzzy system.
The number of membership functions and fuzzy rules required for a particular ANFIS is determined through the subtractive clustering algorithm. Parameters of the Gaussian membership function are optimally determined using the hybrid learning algorithm. Each ANFIS is trained for 400 epochs.
Gaussian membership functions are used for the inputs and a linear membership function for the output. Separate sets of input and output data have been used as input arguments. In MATLAB, "genfis2" generates a Sugeno-type FIS structure using subtractive clustering. genfis2 is generally used where there is only one output; hence it has been used here to generate the initial FIS for training the ANFIS. "genfis2" achieves this by extracting a set of rules that models the data. To determine the number of rules and antecedent membership functions, the rule extraction method uses the "subclust" function. It then uses linear least squares estimation to determine each rule's consequent equations.
However, ANFIS itself is only suitable for a single-output system. For a system with multiple outputs, several ANFIS are placed side by side to produce a Multiple-output ANFIS (MANFIS) [1]. The number of ANFIS required depends on the number of required outputs. In this research, the Cartesian coordinates are the ANFIS outputs. Figure 10 shows a MANFIS with three outputs: x, y and z. Since the input data are the same for each ANFIS, they also share the same initial parameters, such as the initial step size, the membership function (MF) type and the number of MFs. The parameters used in the model for training the ANFIS are given in Table 2, and the rule extraction method used is given in Table 3. Table 4 summarizes the types and values of the model parameters after training the MANFIS.
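The idea behind subtractive clustering, where each cluster center found in the data becomes one fuzzy rule, can be sketched in Python. This is a simplified illustration and not the MATLAB `subclust` implementation: it assumes the data are already normalized to the unit hypercube and replaces the accept/reject criteria with a single stopping threshold (`stop_ratio`, a hypothetical parameter). The squash factor of 1.5 mirrors the relation between the two clustering radii stated in the text.

```python
import math

def subtractive_clustering(points, ra=0.5, squash=1.5, stop_ratio=0.15):
    """Return cluster centers chosen from `points` (a non-empty list of tuples)."""
    rb = squash * ra           # radius used when reducing potentials
    alpha = 4.0 / ra ** 2
    beta = 4.0 / rb ** 2

    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: a measure of how many neighbours lie within ra
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    first_peak = max(potential)

    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        # Simplified stop rule: quit once the remaining peak is small
        if peak < stop_ratio * first_peak:
            break
        centers.append(points[k])
        # Subtract the influence of the new center from all potentials
        potential = [pot - peak * math.exp(-beta * sqdist(p, points[k]))
                     for p, pot in zip(points, potential)]
    return centers
```

A smaller radius generally yields more centers and therefore more rules, which is why the radius is typically tuned against a validation error.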

Simulation Tests and Results for Visual Pose Estimation
Three different ANFIS are designed for the visual pose estimation of the 5-DOF robot, one each for x, y and z. The proposed method gives a good estimate of the position of the 5-DOF robotic end-effector. A data set of 588 Cartesian points, analytically obtained using forward kinematics, together with the feature points captured by the two cameras in the motor babbling phase, is used for training and validation (294 points each).
After the training is complete, the model is validated using a different set of data from the one used to train the FIS. In Figures 11-13, the rule viewers for x, y and z are presented. The rule viewer displays a roadmap of the whole fuzzy inference process. This represents a very useful tool for modifying and changing the fuzzy rules.
The validation of each data set using ANFIS is done by calculating the difference between the Cartesian coordinates deduced using robotic forward kinematics and the ones predicted by ANFIS. A total of 294 observation points generated in the workspace for validation are considered to find the error of the Cartesian coordinates. The plot of the comparative results for the deduced and predicted Cartesian data is shown in Figure 14. The differences between the FK-based and ANFIS-based data for each Cartesian coordinate (X, Y, Z) are small, on the order of $10^{-3}$. Therefore, the specified ANFIS models give a good estimate on the validation data.
After testing the ANFIS networks, the MSE, RMSE, error mean and standard deviation (STD) of the errors are calculated to check the estimation performance; they are given in Table 5. RMSE is a useful tool for comparing forecasting errors. STD indicates how far, on average, the data are spread from their mean value. If the standard deviation of the data set is close to zero, the data are close to the mean and the dispersion is small, while a large standard deviation indicates widely spread data. Observing the errors, it can be concluded that the proposed approach is efficient in estimating the location of the uncertain robotic arm.
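The four error measures named above can be computed per coordinate as in the following Python sketch. The function name is illustrative, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not state which form was used.

```python
import math

def error_metrics(reference, predicted):
    """MSE, RMSE, error mean and error standard deviation for one coordinate.

    reference: values deduced from forward kinematics
    predicted: values estimated by the model
    Requires at least two samples because of the n - 1 divisor.
    """
    errors = [r - p for r, p in zip(reference, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mean = sum(errors) / n
    # Sample standard deviation of the errors (n - 1 divisor, assumed)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    return {"mse": mse, "rmse": math.sqrt(mse), "mean": mean, "std": std}
```

Calling the function once each for the x, y and z coordinates would give one row of metrics per ANFIS model.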

Conclusions
An ANFIS-based visual positioning approach using two cameras is proposed in this paper. The idea of using forward kinematic equations and two cameras to generate training data led to a nearly accurate training of the ANFIS network. Simulation experiments show that the location of the robotic arm can be learned by ANFIS using two uncalibrated cameras. Observing the errors, the estimated position of the robotic arm is suitable for visual feedback control. Further, the proposed ANFIS-based approach is very useful for obtaining the position of the robotic arm in the Cartesian coordinate system, as it can work as part of a control algorithm. The Cartesian-coordinate-based learning can be used in robotic calibration, visual servoing and Cartesian control. The authors plan to use pose tracking with MANFIS and uncalibrated cameras for the visual servoing of the robot in future work.