Feed Forward Neural Network Versus Kernel Regression: A Case of Body Mass Index and Body Dimensions

Body mass index is a measure of body fitness and is considered very important in screening for body weight categories that may lead to health problems. Understanding the risk factors of obesity provides more insight into the nature of policies that can be put in place to fight obesity. However, uncertainty remains regarding the most appropriate means by which to define excess body weight. It is important to develop models that best calculate Body Mass Index to help reduce the chances of obesity. The objective of this research is to model Body Mass Index using Feed Forward Neural Network and Kernel regression. Modeling was first done using height and weight alone; later, 21 body dimensions were added. The analysis was based on body dimensions data provided by San Jose State University and the U.S. Naval Postgraduate School in Monterey, California. To determine the best model, adjusted $R^2$ and Mean Square Error (MSE) were used. From the results of the study, Kernel regression was better in modeling Body Mass Index than Feed Forward Neural Network.


Background of the Study
Better health is central to human happiness and well-being. It also makes an important contribution to economic progress, as healthy populations live longer, are more productive, and save more. Body Mass Index (BMI) is used as a measure of a person's fitness and is therefore considered a very important measure by health professionals. Using BMI, one person may be categorized as underweight and another as overweight according to percentage of body fat. Understanding which factors influence individual body weight and how exactly excess body fat contributes to increased risk of disease may help to reduce the prevalence of several common disorders associated with obesity, thereby lessening the burden placed on health care systems. Men are said to move up the BMI rank as compared to women. BMI is termed an indirect measure of body fat: it indicates weight-for-height without considering differences in body composition and the contribution of body fat to overall body weight.

One of the most appealing features of nonparametric estimation techniques is that, by allowing the data to model the relationships among variables, they are robust to functional form specification and therefore have the ability to detect structure which sometimes remains undetected by traditional parametric estimation techniques.

A feed forward network is an artificial neural network where connections between the units do not form a directed cycle. The feed forward neural network was the first and simplest type of artificial neural network devised. Information moves in only one direction, forward from input nodes through hidden nodes (if any) to output nodes. There are no cycles or loops in the network.

Statement of the Problem
Body Mass Index is used for screening weight categories as underweight, normal or overweight and is measured in kg/m². The obesity epidemic in adults is an enormous societal problem with far reaching consequences. Overweight and obese adults also have higher rates of high blood pressure and abnormal insulin levels, among other health problems, and obesity is associated with decreased survival. Body Mass Index traits are influenced by both genetic and non-genetic factors and provide a simple measure of a person's thickness. Body Mass Index has always been calculated using height and weight alone, without considering the effect of other body dimensions; for example, it ignores waist size, which is a clear indicator of obesity level. Incorrect BMI categorization may prevent some people from receiving necessary weight loss help and may mislabel others as overweight when they have a healthy percentage of body fat. There is therefore a need to develop better models that give more accurate Body Mass Index values. The developed models can then be applied when calculating BMI to reduce the risk of obesity, and this will help in improving health in any population. Kvaavik et al. (2003) showed that women are more likely than men to move down in BMI rank, while men tend to move up in BMI rank.

Justification of the Study
The World Health Organization (WHO) states that, for adults, the healthy range for BMI is between 18.5 and 24.9; less than 18.5 is underweight, while greater than 24.9 is overweight. BMI provides a simple measure of a person's thickness, allowing health professionals to discuss weight problems more objectively with their patients. This study develops a more accurate statistical model that predicts Body Mass Index using body dimensions other than height and weight alone. The models are built using Feed Forward Neural Network and kernel regression techniques. These techniques are chosen because the predictor does not take a predetermined form but is constructed according to information derived from the data. The two models are then compared, and the better model can be used to reduce the chances of obesity.
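As a small illustration of the conventional calculation that the study seeks to improve on, the sketch below computes BMI from weight and height and applies the cut-offs quoted above. The function names `bmi` and `who_category` and the example values are hypothetical and not part of the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2: weight divided by squared height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Classify an adult BMI using the cut-offs quoted in the text."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value <= 24.9:
        return "healthy"
    return "overweight"


# Hypothetical adult: 70 kg and 1.75 m tall.
value = bmi(70.0, 1.75)
print(round(value, 2), who_category(value))  # prints: 22.86 healthy
```

Only height and weight enter this calculation, which is why dimensions such as waist girth have no influence on the resulting category.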

General Objective
The general objective of this study is to compare the performance of feed forward neural network and kernel regression in modeling Body Mass Index.

Review of the Previous Studies
Hoseini [5] applied an Artificial Neural Network (ANN) to estimate Body Mass Index based on the connection between environmental factors and physical activity. The statistical analysis showed that, despite the apparent association of Body Mass Index with physical activity level, it is influenced by several factors such as age, residence record, number of children, and distance to bus or sport exercise. An ANN was then applied to predict the level of personal BMI. The results of this analysis showed that the generalized estimating ANN model was satisfactory in estimating BMI based on the introduced pattern. Although BMI itself is easy to calculate, the system of underlying contributing factors and their inter-correlation is multifaceted. At the individual level, obesity is caused by a continuously positive energy balance, when more calories are consumed than expended. However, the influences driving individual choices which affect the energy balance are highly complex. Within the UK Government's Foresight Program, a system map was developed that describes the obesogenic environment of interacting influences on weight gain, without identifying any single dominating factor (Frayling [2]). In addition to food and physical activity choices, these influences include biological and medical traits, social and psychological components, as well as effects from the built environment and infrastructure. Measurements were initially taken by Heinz [3] at San Jose State University and at the U.S. Naval Postgraduate School in Monterey, California. They modeled the data using discriminant analysis and parametric approaches of multiple regression. Later, measurements were taken at dozens of California health and fitness clubs by technicians under the supervision of one of these authors. Usually, weight was thought to be linearly related to height. A better fit was achieved by modeling weight as a linear combination of all of the girth measurements.
The hypothesis that body build (skeletal) variables and height predict scale weight substantially better than height alone was affirmed by Heinz [3]. The initial objective of that study was to determine how well weight could be predicted from body build for a dataset of physically active young individuals within the normal weight range. With this in mind, weight was fitted from the nine skeletal variables and height. Other areas of study that saw early mention of body dimension data are biostatistics, forensics and ergonomics. Although Body Mass Index has traditionally been the chosen method by which to measure body size in epidemiological studies, alternative measures such as waist circumference (Wei [12]; Welborn and Dhaliwal [13]), waist:hip ratio (WHR) (Jansses [6]) and waist:height ratio (Ho [4]), which reflect central adiposity, have been suggested to be superior to BMI in predicting cardiovascular disease (CVD) risk. In part this stems from the observation that ectopic body fat is related to a range of metabolic abnormalities. The Kvaavik [7] study tracked 485 subjects from 15 to 33 years of age, examining the effect of health-related behaviors (leisure time physical activity, smoking, and physical fitness), parents' BMI, and adult education as predictors of adult overweight and obesity. Results showed that those with the highest BMI at baseline had the highest risk of having a BMI of 30 as an adult. Women were more likely than men to move down in BMI rank, while men tended to move up in BMI rank. The adolescents' BMI and their fathers' BMI were the strongest independent predictors of adult BMI. The development of layered feed-forward networks began in the late 1950s, represented by Rosenblatt's perceptron and Widrow's Adaptive Linear Element (ADALINE). Both the perceptron and ADALINE are single layer networks; they are referred to as single layer perceptrons and solve only linearly separable problems.
This limitation led to the development of multi-layer feed-forward networks with one or more hidden nodes, called multi-layer perceptron networks. The first published paper on kernel estimation appeared in Rosenblatt [10], and the idea was proposed in a USAF technical report as a means of liberating discriminant analysis from rigid parametric specifications. Since then, the field has undergone exponential growth and has even become a fixture in undergraduate textbooks, which attests to the popularity of the methods among students and researchers alike. Though kernel methods are popular, they are but one of many approaches toward the construction of flexible models. Approaches to flexible modeling include spline, nearest neighbor, neural network, and a variety of flexible series methods, to name but a few. Related work includes Stone [7], who considered resistant local polynomial fitting using weighted least squares. Cizek and Hardle [1] considered robust estimation of dimension-reduction regression models. In a recent paper, Li and Racine [8] propose a nonparametric kernel-based CDF estimation method. They consider a very general setting allowing for both continuous and discrete covariates, while the dependent variable(s) can also be discrete or continuous.

Introduction
In this section, we discuss the feed forward neural networks used. We then discuss kernel regression and its procedure, and lastly the model performance measures.

Feed Forward Neural Networks
A feed forward neural network is an artificial neural network that represents a function of explanatory variables composed of simple building blocks, and it is used here to provide an approximation of conditional expectations. Connections between units do not form a directed cycle. An artificial neural network is a parallel connection of a set of nodes, called neurodes, linked by weights.
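The input and hidden layer nodes are connected by weights $w_{hj}$ for $h \in (1, \dots, H)$ and $j \in (1, \dots, J)$, where $w_{h0}$ is the bias of the $h$th hidden node. The hidden and output layers are connected by weights $\beta_h$ for $h \in (1, \dots, H)$. Considering an input vector $x = (x_1, x_2, \dots, x_J) \in \mathbb{R}^J$, where $\mathbb{R}$ is the real line, the input $u_h(x)$ to the $h$th hidden node is the value

$$u_h(x) = w_{h0} + \sum_{j=1}^{J} w_{hj} x_j .$$

The output becomes

$$Z = \beta_0 + \sum_{h=1}^{H} \beta_h \, \psi\big(u_h(x)\big),$$

where $\psi$ is the activation function of the hidden nodes and $\beta_0$ is the bias of the output node.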
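The forward computation of a single-hidden-layer network can be sketched as follows: each hidden node forms its bias plus a weighted sum of the inputs and passes it through an activation function, and the output node forms its bias plus a weighted sum of the hidden outputs. This is a minimal sketch; the logistic activation, the linear output node and all names (`logistic`, `forward`, the argument names) are assumptions made for illustration, not details taken from the study.

```python
import math


def logistic(u: float) -> float:
    """Numerically stable logistic activation (assumed; the text does not name one)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> hidden layer -> single linear output."""
    hidden = []
    for w_row, bias in zip(hidden_weights, hidden_biases):
        # Input to this hidden node: bias plus weighted sum of the inputs.
        u = bias + sum(w * xj for w, xj in zip(w_row, x))
        hidden.append(logistic(u))
    # Output: bias plus weighted sum of the hidden node outputs.
    return output_bias + sum(b * h for b, h in zip(output_weights, hidden))


# Hypothetical network with 2 inputs and 2 hidden nodes. With all hidden
# weights at zero, each hidden node outputs logistic(0) = 0.5.
z = forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [1.0, 1.0], 0.5)
print(z)  # prints: 1.5
```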

Training a Neural Network
The sum of squared errors (SSE) is used to train feed forward networks. In this method the weights are adjusted in such a way that the SSE between the targets y and the network output Z is minimized.
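To make the idea of adjusting weights to reduce the SSE concrete, the sketch below trains a small single-hidden-layer network by stochastic gradient descent. Gradient descent is swapped in here for simplicity and is not claimed to be the optimizer used in the study; the logistic activation, learning rate, number of epochs, toy data and all function names are assumptions.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic activation (assumed)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def predict(x, W, b, beta, beta0):
    """Return the network output and the hidden node outputs for input x."""
    hidden = [sigmoid(bh + sum(w * xj for w, xj in zip(Wh, x)))
              for Wh, bh in zip(W, b)]
    return beta0 + sum(bt * h for bt, h in zip(beta, hidden)), hidden


def sse(X, y, W, b, beta, beta0):
    """Sum of squared errors between targets and network outputs."""
    return sum((yi - predict(xi, W, b, beta, beta0)[0]) ** 2
               for xi, yi in zip(X, y))


def train(X, y, n_hidden=2, lr=0.05, epochs=2000, seed=0):
    """Adjust the weights by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    beta = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    beta0 = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z, hidden = predict(xi, W, b, beta, beta0)
            err = z - yi  # derivative of 0.5 * (z - y)^2 with respect to z
            for h in range(n_hidden):
                # Chain rule through the logistic activation (uses old beta[h]).
                grad_u = err * beta[h] * hidden[h] * (1.0 - hidden[h])
                beta[h] -= lr * err * hidden[h]
                b[h] -= lr * grad_u
                for j in range(n_in):
                    W[h][j] -= lr * grad_u * xi[j]
            beta0 -= lr * err
    return W, b, beta, beta0


# Toy data (hypothetical): learn y = x^2 on a few points in [0, 1].
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [xi[0] ** 2 for xi in X]
before = sse(X, y, *train(X, y, epochs=0))  # SSE at the initial weights
after = sse(X, y, *train(X, y))             # SSE after training
print(before, after)
```

Calling `train` with `epochs=0` returns the initial weights for the same seed, so the two printed values show the SSE before and after the weights are adjusted.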

Kernel Regression
One of the most popular methods for nonparametric kernel regression was proposed by Nadaraya [9] and Watson [9] and is known as the "Nadaraya-Watson" estimator, though it is also known as the "local constant" estimator for reasons best described when we introduce the "local polynomial" estimator. A kernel is simply a weighting function, and the primary role of the kernel is to impart smoothness and differentiability on the resulting estimator. The appeal of nonparametric methods lies in their ability to reveal structure in data that might be missed by classical parametric methods. Kernel methods have the potential to recapture the efficiency losses associated with nonparametric frequency approaches, as they do not rely on sample splitting; rather, they smooth the categorical variables in an appropriate manner (Li and Racine [8]). The kernel density estimation approach overcomes the discreteness of the histogram approach by centering a smooth kernel function at each data point and then summing to get a density estimate. Common kernel functions include the uniform, triangle, Epanechnikov, biweight, tricube, Gaussian and cosine kernels. The kernel density estimation approach has a problem with varying data density: regions of high data density could use a small bandwidth h, while sparse data need a large h. To overcome this problem we allow the bandwidth to vary. Nadaraya [9] and Watson [9] proposed to estimate the regression function $m(x)$ as a locally weighted average, using a kernel as a weighting function. The Nadaraya-Watson estimator is given by

$$\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) y_i}{\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)},$$

where $K$ is the kernel and $h$ is the bandwidth.
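The locally weighted average behind the Nadaraya-Watson estimator can be sketched in a few lines for a single predictor. The Gaussian kernel is chosen here only because it is one of the common kernels listed above; the function names and the toy data are hypothetical.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, with weights centred at x0 and bandwidth h."""
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Toy data (hypothetical): the design is symmetric around x0 = 1, so the
# weights on the two outer points cancel and the estimate equals 2.
print(nadaraya_watson(1.0, [0.0, 1.0, 2.0], [1.0, 2.0, 3.0], h=0.5))
```

A small bandwidth makes the estimate follow nearby observations closely, while a large bandwidth pulls it toward the overall mean of the responses.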

Bandwidth Selection
The key to sound nonparametric estimation lies in selecting an appropriate bandwidth for the problem at hand. Least squares cross validation is a data-driven bandwidth selection method. Typically, the bandwidth is chosen by minimizing a risk measure, the mean integrated squared error (MISE).
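Least squares cross validation can be illustrated with a leave-one-out criterion: each observation is predicted from all the others, and the bandwidth with the smallest average squared prediction error is kept. The sketch below searches a fixed grid of candidate bandwidths, which is a simplification assumed here in place of numerical optimization; the Gaussian kernel, the grid, the toy data and all names are hypothetical.

```python
import math


def loo_fit(i, xs, ys, h):
    """Leave-one-out Nadaraya-Watson fit at xs[i] using a Gaussian kernel."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == i:
            continue  # observation i is excluded from its own fit
        w = math.exp(-0.5 * ((xs[i] - xj) / h) ** 2)
        num += w * yj
        den += w
    return num / den if den > 0.0 else float("nan")


def cv_score(xs, ys, h):
    """Average squared leave-one-out prediction error for bandwidth h."""
    errors = [(ys[i] - loo_fit(i, xs, ys, h)) ** 2 for i in range(len(xs))]
    return sum(errors) / len(errors)


def select_bandwidth(xs, ys, grid):
    """Return the candidate bandwidth with the smallest cross-validation score."""
    scored = [(cv_score(xs, ys, h), h) for h in grid]
    valid = [(s, h) for s, h in scored if not math.isnan(s)]
    if not valid:
        raise ValueError("no candidate bandwidth gave a finite score")
    return min(valid)[1]


# Toy data (hypothetical): a smooth curve observed on an even grid.
xs = [i / 10.0 for i in range(21)]
ys = [math.sin(2.0 * x) for x in xs]
grid = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
print(select_bandwidth(xs, ys, grid))
```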

Model Performance Measures
To determine the better of the two models, adjusted $R^2$ and mean squared error were used.

Non-parametric $R^2$
The model that has the highest value of $R^2$ is the best model. Let $y_i$ denote the observed value and $\hat{y}_i$ denote the fitted value for observation $i$.

Adjusted $R^2$
The use of an adjusted $R^2$ is an attempt to take account of the phenomenon of the $R^2$ automatically increasing when extra explanatory variables are added to the model. Adjusted $R^2$ is defined as

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},$$

where $p$ is the total number of explanatory variables in the model (not including the constant term), and $n$ is the sample size. The model with the highest value of adjusted $R^2$ is the best model.
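The penalty that the adjustment applies for extra explanatory variables can be seen in a short calculation using the standard formula, one minus (1 - R²)(n - 1)/(n - p - 1). In the sketch below the function name, the R² value and the sample size are hypothetical; only the variable counts 2 and 23 echo the two model sizes considered in the study.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p explanatory variables (no constant)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than explanatory variables plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Same raw R^2 (hypothetical) but different numbers of explanatory variables:
# the larger model is penalised more heavily.
print(adjusted_r_squared(0.9, n=101, p=2))
print(adjusted_r_squared(0.9, n=101, p=23))
```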

Mean Square Error (MSE)
A common and convenient measure of estimation precision is the mean squared error, which measures the average of the squared errors, that is, the squared difference between the estimator and what is estimated.
It is defined by the following equation:

$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] \qquad (10)$$

The model with the least MSE is the better fit.
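In practice the expectation is replaced by an average over the sample. The sketch below computes this sample version, the mean of the squared differences between observed and fitted values; treating the fitted values as the estimator is an assumption made here for illustration, and the function name and numbers are hypothetical.

```python
def mean_squared_error(observed, fitted):
    """Average squared difference between observed and fitted values."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - f) ** 2 for y, f in zip(observed, fitted)) / len(observed)


# Hypothetical BMI values and the fits of two competing models.
observed = [22.0, 24.5, 19.0]
model_a = [21.0, 25.0, 19.0]
model_b = [22.0, 24.0, 19.5]
print(mean_squared_error(observed, model_a))
print(mean_squared_error(observed, model_b))
```

The model whose fitted values give the smaller printed value would be preferred under this criterion.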

Wilcoxon Rank Sum Test
The Wilcoxon rank sum test compares the medians from two populations and works when the Y variable is continuous, discrete-ordinal or discrete-count, and the X variable is discrete with two attributes. In this test $Y_1, Y_2, \dots, Y_n$ are independent and identically distributed with distribution function $F_Y$. Let $M_1$ be the median of distribution $F_X$ and $M_2$ the median of distribution $F_Y$. The difference $M_1 - M_2$ will be denoted by $d_m$.
We test

$$H_0 : d_m = 0$$

against a suitable alternative hypothesis,

$$H_1 : d_m \neq 0.$$

Under the null hypothesis, there is no difference between the medians.
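The mechanics of the rank sum test can be sketched as follows: the two samples are pooled and ranked (tied values share their average rank), the ranks of the first sample are summed, and the result is compared with its distribution under the null hypothesis. This sketch reports the statistic W as the rank sum of the first sample minus its minimum possible value, and uses a plain normal approximation without tie or continuity correction; both choices, the function names and the toy data are assumptions.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p-value) using a normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    w = rank_sum_x - n1 * (n1 + 1) / 2.0          # shifted rank sum of sample x
    mean_w = n1 * n2 / 2.0                         # mean of W under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean_w) / sd_w
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided


# Toy samples (hypothetical): every value in x lies below every value in y.
w, z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(w, round(z, 3), round(p, 4))
```

For samples this small an exact null distribution would normally be preferred; the normal approximation is used here only to keep the sketch short.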

Description of the Data
In this study, nine skeletal measurements were considered. These included biacromial, biiliac, bitrochanteric and chest diameters, which were measured using an anthropometer. A smaller anthropometer was used for the other four skeletal measurements, which included elbow, wrist, knee and ankle. At this age it was noted that measurements like height had already attained maximum size. Twelve girth measurements, which included shoulder, chest, waist, hip, bicep, thigh, calf, forearm, navel, wrist, ankle and knee, were included in the study. These measurements are not fixed but vary over time, except for the wrist, knee and ankle, which are most likely to remain constant over time. The other measurements included in the study were height and weight. Measurements were taken for individuals in their twenties and a few individuals in their thirties. The total number of explanatory variables under consideration was therefore 23.

Figure 1 shows a box plot of BMI by gender. Males have more extreme values (shown as circles separated from the box), or large departures from symmetry, while females have fewer. Box plots are used to show overall patterns of response for a group. They provide a useful way to visualize the range and other characteristics of responses for a large group. The median is indicated by the horizontal line that runs across the center of the box. In Figure 1 the median for males is approximately 22 while that for females is approximately 24, and therefore the BMI of females is higher than the BMI of males.

Introduction
The feed forward neural network was fitted using the R package nnet, while kernel regression estimation was done using the add-on package "np" for nonparametric regression and nonparametric specification tests. This chapter describes how the feed forward neural network and kernel regression were used to model BMI. The chapter also describes how the modeling results from both models were compared.

Selecting Best Feed Forward Neural Network Model for BMI
Multiple values of MSE were calculated in order to determine the optimal number of hidden nodes for the model with 2 explanatory variables. The optimal number, given by the least MSE, was found to be 2 hidden nodes with an MSE of 3.2199. These hidden nodes gave an adjusted $R^2$ of 0.98518. The same procedure for the model with 23 explanatory variables also gave 2 hidden nodes, with an MSE of 2.85234 and an adjusted $R^2$ of 0.98523. With the lower MSE and the higher adjusted $R^2$, the model with 23 explanatory variables was the better model.

Testing the Effect of Gender on BMI
Using the Wilcoxon rank sum test, the statistic W is 16100 and the p-value is less than 2.26e-16. The p-value is less than the 0.5% significance level, and therefore we reject the null hypothesis and conclude that the medians of male and female BMI differ significantly. The study thus concludes that gender has an impact on BMI.

Conclusion
This chapter presents a summary of the key findings and the conclusion drawn from the study. The main objective of this study was to compare the performance of feed forward neural network and kernel regression models in calculating BMI. Both the kernel regression model and the feed forward neural network model can effectively calculate BMI. However, kernel regression is considered the better model in this study. This is because of the ability of kernel regression to use data-driven methods of bandwidth selection. Nonparametric kernel smoothing methods have experienced tremendous growth in recent years, and are being adopted by applied researchers across a range of disciplines. Kernel approaches offer a set of potentially useful methods to those who must confront the vexing issue of parametric model misspecification. The appeal of nonparametric methods lies in their ability to reveal structure in data that might be missed by classical parametric methods.

Recommendations
We recommend that, in the calculation of Body Mass Index, the effect of body dimensions other than weight and height alone be considered. We also recommend future research using locally weighted regression (LOWESS) and smoothing splines.