Food Components as Markers Linking Health and Environment: Statistical Invariance Analysis of in natura Diet
Sony Computer Science Laboratories, Inc., Tokyo, Japan
To cite this article:
Masatoshi Funabashi. Food Components as Markers Linking Health and Environment: Statistical Invariance Analysis of in natura Diet. American Journal of Bioscience and Bioengineering. Vol. 3, No. 6, 2015, pp. 183-196. doi: 10.11648/j.bio.20150306.17
Abstract: Diets are key factors that link environmental and human health. The global degradation of ecosystems and of health state is firmly related to diet transition and production systems. We propose a distinction between in cultura and in natura diets according to the culture condition and the consequent environmental load it imposes, which leads to the definition of the in natura diet as a possible alternative for a sustainable diet. By considering food components as markers linking health and environment, we investigate statistically invariant features that characterize the difference between in cultura/natura diets on 2 independent databases, the INFOODS food composition database and Synecoculture products. Several distinctive features between in cultura/natura diets were discovered in numerically sampled intake distributions. Taking the food diversity limit, the in natura diet tended to be more consistent in relation to a larger population for major components and minerals, and a significant difference from the in cultura diet was encoded in the variance component. A possible interpretation of the results may relate the recent health burden to the historical transition from in natura to in cultura diet.
Keywords: Food Components, Statistical Analysis, Sustainable Diet
1.1. Diet and Food Production: Health and Environmental Burden
Food, health, and environment form a trilemma in achieving a sustainable social-ecological system. The recent increase in chronic diseases is raising medical care expenditures and has become a heavy burden on national budgets in many countries. This problem presumably originates from the change of lifestyle that accompanies globalization, in which diet is an essential factor.
Besides, the modern production systems that support food supply are considered to exceed the acceptable limit of environmental burden. Agricultural land use is reported as the most devastating factor for biodiversity on the globe, and exacerbates the plant extinction rate more than climate change does. Among a wide range of stakeholders including governments, scientists, and civil society, the conventional agriculture and distribution system is not considered a sustainable option for the next generation.
For the purpose of sustainability, we need to seek ways to measure the impact of food systems on both health and environment, and to define the conditions of a sustainable diet that is both health promoting and ecologically sound.
1.2. Bottom-up Components Analysis
Modern nutriology has clarified the five major nutrients necessary to support the basic functioning of our metabolism: proteins, carbohydrates, lipids, vitamins, and minerals. While the major components are decisive for performance on a short time scale, a huge variety of complementary compounds such as phytochemicals and trace elements are considered to support long-term health-protective effects. These long-tail compounds are increasingly studied with the development of empirical measurement technologies such as metabolomics. Phytochemicals in particular are the source of most pharmaceutical products, and are candidates for long-tail drug discovery, which is expected to be more compatible with the whole functioning of cellular metabolism than target-specific drugs.
In studying the long-tail part of food components, complexity of interaction arises in determining the net effect of bioactive compounds working in combination. Phytochemicals are known to express their diverse functionality through additive and synergetic interactions with other elements. Theoretically, the complexity of such combinations can extend to the whole system size. This implies that reconstructing a diet from the element level, for example with synthetic supplements, is not effective considering the complexity of the actual interactions in whole food, nor sustainable given the cost of reconstructing the whole biological process artificially.
1.3. Top-down Cohort Analysis
Traditional food studies have also taken a macroscopic perspective, with cohort analysis based on whole food consumption. Functional effects of food containing bioactive chemicals have been increasingly identified. Nutritional and health states have been compared in multiple regions with changing dietary patterns. Recently, the rapid food system transition in China has been investigated as a particular example in relation to health risks.
Among empirical studies that connect food systems, human health, and environmental factors, beneficial food systems are reported from indigenous and traditional food communities that make sustainable use of local ecosystems. The common condition is the wide introduction of various food items available in the natural local environment, especially edible wild plants. If widely adopted, these alternative diets could both reduce health risk and improve ecological state.
Bioactive compounds with health-protective effects in wild edible plants are widely studied. In some cases, it is possible to substitute wild food for the conventional diet without losing the major nutrition profile. The use of wild edible plants has features in common with traditional preventive medicine such as traditional Chinese medicine and Ayurveda, which have evolved through heuristic discovery of the utility of wild plants.
Culture conditions, such as those distinguishing wild plants from cultivated crops, also involve important health factors other than nutrition, such as gut microbiota. It is known that soil microbiological flora affects the composition of gut microbiota through food intake, which can only be evaluated by analyzing the net effect, including the epiphenomena associated with food production systems.
1.4. Synthetic Perspective: Food Components as Markers Linking Health and Environment
As an integrated perspective to characterize a sustainable diet, I propose to make use of food component databases to derive statistical measures as markers linking health and environment. Since food components are both influenced by culture condition and related to human health, it is possible to consider diet as an interface between health and environment.
The physiological properties of bioactive compounds and the traditional practice of ecologically sound food systems both contain some necessary conditions for a sustainable diet in terms of health and environment. However, they have not yet been integrated into a sufficient condition to define a sustainable diet in a global context.
A sufficient condition for a sustainable diet is an integrated attribute of a food system that satisfies the desired health and environmental quality, for which I expect to find consistent statistical measures with respect to the distributions of food components.
More precisely, the health effect biologically refers to the missing heritability of disease, in which the epigenetic profile based on life-course activity, including dietary habit, accounts for the susceptibility to disease. This is an objective shared with current nutriology, which seeks a complete list of bioactive compounds and their effects through food intake (p.283). Environmental quality is typically the degree of biodiversity that is under direct threat from agricultural activity. It also influences the distribution of compounds, especially phytochemicals, as they mediate the interaction between plants and other species.
2. in natura Diet: Definition
2.1. in cultura and in natura Diet
In order to yield a possible definition of a sustainable diet, I propose to divide the current food systems into 2 categories based on the culture condition:
1) in cultura diet, which refers to food grown by humans with conventional agricultural practice, i.e. tillage, fertilizer, and chemicals. It principally consists of growing cultivars, aiming at the physiologically optimum growth of a single crop. It can be represented as the progress toward complete control of the culture condition regardless of environmental load, and can be symbolized by a factory culture system in a spaceship as a social-scientific ideal type.
2) in natura diet, which is based on the harvest from wild edible species or cultivars growing in natural conditions through the self-organization of the ecosystem. The production is based on the ecological optimum, which represents the range of conditions actually observed in nature, where a species grows in association with others. In the case of human intervention such as the introduction of edible species and harvest, only point disturbance that stimulates a totally positive biodiversity response is accepted. As an ideal type, it can be symbolized by a pre-anthropogenic ecosystem such as that of the era of the dinosaurs.
The terms in cultura and in natura are an ecological extension of the biological terms in vitro (cells in artificial culture medium) and in vivo (experiments on live organisms in a lab); in this case they divide culture conditions in a non-laboratory environment according to the degree of human intervention in the growing process of food.
The definition of in cultura and in natura diet is summarized in Table 1. In contrast to the conventional supply of in cultura diet, I propose in natura diet as a possible candidate for a sustainable diet, provided that a sufficiently productive practice is possible without environmental degradation.
| | in cultura | in natura |
| Crop genotype | Cultivars | Wild species, Cultivars |
| Culture condition | Tillage, fertilizer, chemicals | Self-organization, point disturbance |
| Typical fields | Farmland | Natural ecosystem, Synecoculture |
| Ideal type | Factory culture in a spaceship | Natural ecosystems before anthropogenesis |
2.2. Synecoculture Experiment: Sustainable Production of in natura Diet
The Synecoculture project has been testing high-density mixed polyculture of edible plants to establish an environmentally sound production system for the in natura diet. So far, evidence on biodiversity and partial food composition has been obtained: the Synecoculture system promoted field biodiversity, and metabolome and taste analyses showed that the products express health-beneficial secondary metabolites, particularly terpenoids and flavonoids. Mineral concentrations in the Synecoculture products also compensated for the soil dose deficiency with respect to the conventional standard, which is presumably attributable to the effect of enhanced biodiversity. Taken together, it is possible to consider Synecoculture as an environmentally sound system for the production of in natura diet. Including the Synecoculture products, we will further study the statistical features of in natura food components in comparison with their in cultura counterparts.
3. Statistical Invariance Analysis
3.1.1. Sampling of Food Systems and Intake Distributions Based on the Food Diversity Limit
We consider a general framework of nutrition intake from a food system in order to establish statistical measures invariant to the particularity irrelevant to in cultura/natura distinction. Statistical invariance of in cultura/natura diets signifies statistical features that are conditional only to in cultura/natura distinction and remain invariant to other conditions.
As depicted in Figure 1, consider the database of all diets of all food components. It represents the true distribution of all existing food components in the world. Next, consider a particular food system of food items with an in cultura/natura attribute. This is a possibly biased sampling that depends on local availability and people’s preferences. For those living on this food system, the actual nutrition intake is a sampling from the food items, which generally converges to a normal distribution by the central limit theorem regardless of the original distribution, provided that sufficient diversity of food and choice is assured. Let us define our diet as the meal-wise sum (linear combination) of food items out of , that iterates times to express the physiological effect. Then, usually is sufficient to observe convergence to normal distribution with large (results not shown). By finely choosing and , one can randomly simulate any history of component intake that converges to the normal distribution.
We denote this normal distribution representing the food component oral ingestion with in cultura and in natura diets as the intake distributions and , respectively, where is the dose of a component, and represent its mean value and variance. For simplicity, we consider the component-wise intake distribution with one-dimensional variable , but the generalization to multivariate normal distribution is possible. Since actual increases by a factor of , we renormalize by dividing with , to yield the intake distribution invariant to .
The random sampling premise that justifies the central limit theorem corresponds to the diversity of food. The intake distribution converges more closely to a normal distribution as the variety of recipes in the food system increases. We consider the highest limit of food diversity as random uniform sampling, and numerically generate the intake distribution for the analysis.
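The sampling scheme described in this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the analysis: the names `doses`, `items_per_meal`, and `meals` are hypothetical, the dose values are made up, and uniform sampling with replacement is assumed as one reading of the food diversity limit.

```python
import random
import statistics


def simulate_intake(doses, items_per_meal, meals, seed=0):
    """Return one renormalized dose per meal.

    Each meal is the sum of `items_per_meal` food items drawn uniformly
    (with replacement, an assumption) from `doses`, divided by the number
    of items so that the mean does not grow with the meal size.
    """
    rng = random.Random(seed)
    intake = []
    for _ in range(meals):
        meal = rng.choices(doses, k=items_per_meal)
        intake.append(sum(meal) / items_per_meal)
    return intake


# Made-up doses of one component in a small, strongly skewed food system.
doses = [0.2, 0.5, 1.1, 3.0, 7.5, 0.9, 12.0, 0.1]

intake = simulate_intake(doses, items_per_meal=30, meals=5000)
mu = statistics.fmean(intake)      # mean of the intake distribution
var = statistics.variance(intake)  # variance of the intake distribution
print(f"mean={mu:.3f} variance={var:.3f}")
```

Dividing each meal by the number of items keeps the mean of the simulated intake distribution independent of the meal size, so that distributions built with different meal sizes can be compared on the same scale.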
Based on the intake distributions, we consider statistically invariant properties that are sensitive to in cultura/natura difference and remain invariant to other bias that may arise in this model.
3.1.2. Actual Confidence Level Based on the Food Diversity Limit
Intake distributions are based on the random sampling from corresponding food systems, which are also based on unknown sampling from all possible foods. In order to discuss the general difference between in cultura/natura diets, one needs to infer their true distribution, or their population mean and variance. In reality, this can never be achieved as the complete listing of in cultura/natura diets is impossible. Still, it is possible to work on the convergence of the estimates with respect to the diversity of food systems: Which of in cultura/natura diets can better represent the larger population when the given food system is limited in elements number ?
For that purpose, we can consider the actual confidence level of the estimates of mean value and variance of intake distributions conditioned by . It can be formulated based on the standard theory of estimation of population mean and variance, which leads us to examine their t-based and χ²-based 95% confidence intervals with degrees of freedom and , respectively: How well can intake distributions from a food system with elements infer the mean/variance values of those with ?
To measure that, we can define the actual confidence level of mean and variance, and , with parameters and . The actual confidence level is defined as the percentage that the actual mean/variance value of the intake distribution from food system with species is situated within the 95% confidence range of the estimation from the food system with .
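As an illustration of this definition for the mean value, the following sketch estimates how often the mean of a larger food system falls inside the 95% interval estimated from a random smaller food system. It rests on several assumptions that are not taken from the analysis itself: the data are synthetic, the names `doses` and `subsystem_size` are hypothetical, and a normal-approximation interval from the standard library replaces the t-based interval for simplicity.

```python
import random
import statistics


def actual_confidence_level_mean(doses, subsystem_size, trials=500, seed=0):
    """Fraction of random subsystems whose 95% interval covers the full mean.

    The interval is mean +/- z * sd / sqrt(size), a normal approximation
    used here in place of the t-based interval (an assumption).
    """
    rng = random.Random(seed)
    target = statistics.fmean(doses)              # mean of the larger system
    z = statistics.NormalDist().inv_cdf(0.975)    # about 1.96
    hits = 0
    for _ in range(trials):
        sub = rng.sample(doses, subsystem_size)   # smaller food system
        centre = statistics.fmean(sub)
        half_width = z * statistics.stdev(sub) / subsystem_size ** 0.5
        if centre - half_width <= target <= centre + half_width:
            hits += 1
    return hits / trials


# Synthetic, skewed doses standing in for one component of 200 food items.
data_rng = random.Random(1)
doses = [data_rng.lognormvariate(0.0, 1.0) for _ in range(200)]

levels = {size: actual_confidence_level_mean(doses, size) for size in (10, 50, 200)}
for size, level in levels.items():
    print(size, level)
```

A level close to the nominal 95% for small subsystems would indicate that limited food systems still represent the larger population well; a markedly lower level would indicate the opposite.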
This analysis evaluates the consistency of the intake distribution, and therefore the stability of the component distributions among diverse food systems, when the culture condition differs between in cultura and in natura.
3.1.3. Asymmetric KL-divergence Between in Cultura and in natura Diet Intake Distributions
The derivation of the intake distribution may contain many biases that affect the mean and variance parameters: the dose unit and measurement method may differ by database. Sampling from a food system may be biased by preferential recipes, and may not be uniform in the long run. Additionally, the intake distribution does not necessarily represent actual intake in the metabolism, as food goes through a complex digestion process with diverse absorption rates, influenced by the genetic profile and gut microbiota of the eating person. In-body combinatory interactions of bioactive compounds may also alter the consequent physiological effect. To remove these variabilities and obtain invariant statistical measures, we consider the use of Kullback-Leibler divergence and between in cultura/natura intake distributions, which are defined as follows :
The KL-divergence between continuous distributions is known to remain invariant under any invertible continuous transformation of the variable, including non-linear deformation. Therefore, these measures are statistically invariant to dose unit change, measurement methods, and food ratio bias, and possibly also to intake efficiency, genetic background diversity, synergetic interactions of components, etc., as long as these are expressed as a continuous transformation commonly applied to in cultura/natura intake distributions. This is a quite universal assumption that is adopted to study the invariant structure of statistical manifolds in information geometry. Longer-term and combinatorial effects of biological variabilities that affect the eating person’s background, however, do not necessarily remain invariant, because these become specific to in cultura/natura conditions.
The KL-divergence is asymmetric by definition and represents the degree of discrepancy seen from one distribution to the other. In information theoretical interpretation, for example, represents the amount of information gain when the sample distribution changed from to through observation. More simply, it represents the amount of change in terms of distribution from to .
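For two univariate normal intake distributions, the KL-divergence has a standard closed form, which makes both the asymmetry and the invariance to a change of dose unit easy to check. The sketch below uses made-up means and variances, and the function name `kl_normal` is hypothetical; it illustrates the general property rather than reproducing the analysis.

```python
import math


def kl_normal(mu_p, var_p, mu_q, var_q):
    """KL-divergence D(p || q) in nats for p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


# Made-up intake distributions of one component (mean, variance).
mu_a, var_a = 10.0, 9.0
mu_b, var_b = 8.0, 4.0

forward = kl_normal(mu_a, var_a, mu_b, var_b)
backward = kl_normal(mu_b, var_b, mu_a, var_a)
print(forward, backward)  # the two directions differ: the measure is asymmetric

# Changing the dose unit (for example g to mg) multiplies means by 1000 and
# variances by 1000**2 in both distributions, and leaves the divergence unchanged.
scale = 1000.0
rescaled = kl_normal(mu_a * scale, var_a * scale ** 2,
                     mu_b * scale, var_b * scale ** 2)
print(math.isclose(forward, rescaled))
```

The unit change is the simplest case of a transformation applied commonly to both distributions; the invariance does not hold if only one of the two distributions is rescaled.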
3.1.4. Decomposition of KL-divergence Between Mean and Variance Components
Based on the information geometry, it is possible to decompose the KL-divergence into the following :
Where and . This means that the term and represent the discrepancy with the difference of the mean value components, while and correspond to those of the variance components. For simplicity, we call and as the mean components, and as the variance components of and , respectively.
This decomposition can orthogonally separate the effect of mean and variance parameters on the discrepancy of intake distributions. It allows us to study the high-order statistics such as variance that may be important to consider the intermittent effect on our metabolism, which has been mostly neglected.
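One common way to split the KL-divergence between two univariate normal distributions into a part that depends on the difference of means and a part that depends only on the variances is sketched below. It is offered as an assumption-laden illustration: the split shown here is a standard algebraic identity, the function names are hypothetical, and the parameterization may differ from the one used in the equations of this section.

```python
import math


def kl_normal(mu_p, var_p, mu_q, var_q):
    """Closed-form D(p || q) in nats for two univariate normal distributions."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def kl_components(mu_p, var_p, mu_q, var_q):
    """Return (mean_part, variance_part) with mean_part + variance_part = D(p || q).

    mean_part vanishes when the means coincide; variance_part vanishes when
    the variances coincide. Both parts are non-negative.
    """
    mean_part = (mu_p - mu_q) ** 2 / (2.0 * var_q)
    ratio = var_p / var_q
    variance_part = 0.5 * (ratio - 1.0 - math.log(ratio))
    return mean_part, variance_part


# Made-up intake distributions of one component (mean, variance).
mean_part, variance_part = kl_components(10.0, 9.0, 8.0, 4.0)
total = kl_normal(10.0, 9.0, 8.0, 4.0)
print(mean_part, variance_part, total)
```

Separating the two parts in this way allows the contribution of the variance to be examined even when the mean values of the two intake distributions are similar.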
3.2.1. INFOODS Database
As an empirical database of food components that contains both in cultura/natura diets, we analyzed the INFOODS database with the distinction of in natura foods with "Wild (W)" parameter in the Type code, and in cultura food without . The species may differ between sampled in cultura/natura food systems. Due to available number of data, the analysis performed was limited to the databases of starchy roots & tubers, nuts & seeds, vegetables, fruits, and meat, on the food components that could satisfy sufficiently large sampling number as food systems (, except nuts & seeds with ).
3.2.2. Synecoculture Database and Corresponding Japanese Food Composition Table
As another independent example of in natura diet, we analyzed the Synecoculture database that comprises dose data of 4 minerals (Na, K, Mg, Ca) in 140 samples of Synecoculture products from 37 vegetable species . The corresponding in cultura diet of the same species was obtained from the standard food composition table in Japan .
4.1. Numerical Simulations of Intake Distributions of in cultura/natura Diets
We simulated the intake distributions of in cultura/natura diets from INFOODS database with parameters . We took 100 random samplings, which means we obtained 100 different intake distributions for each of in cultura/natura diets. This results in the comparison of pairs of in cultura/natura intake distributions for each food category (starchy roots & tubers, nuts & seeds, vegetables, fruits, and meat), with varying species profiles.
As for Synecoculture database, we took the parameters , regarding the convergence to normal distribution and to investigate different orders of food system as the number of species is limited to 37. We estimated the population mean and variance of each mineral component with 95% intervals for both in cultura/natura diets with , from which we randomly obtained 1000 pairs of in cultura/natura intake distributions, which are expected to represent random sampling from a larger population of Synecoculture and conventional products for the 4 minerals. This assumption is partially supported by the analysis of the actual confidence level, which decreases only linearly in smaller samples (Figure 4).
4.2. Means and Variances of Food Components Intake Distributions
We first compare the magnitude relation between in cultura/natura mean values of the intake distributions. We calculated the following 2 indices, and , to judge the magnitude relation with respect to inequality and ratio between in cultura/natura mean values:
where describes a function that returns the number of pairs and satisfying the inequality condition. The function calculates the mean value for all given pair of and . The logarithm was taken to compare the ratio between and in an equivalent scale.
Table 2 summarizes the results. For both and , the score more than 0 signifies , while less than 0 corresponds to . Results of Synecoculture were generated with species-wise means of data. General tendency between and could not be strongly supported.
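Two indices of this kind, one counting inequalities between paired values and one averaging their log ratios, can be sketched as follows. The exact definitions and normalization are assumptions made for illustration: the function names are hypothetical, the pairs are made up, and the sign convention (positive when the first element of the pair tends to be larger) is chosen arbitrarily.

```python
import math


def inequality_index(pairs):
    """(number of pairs with a > b minus number with a < b) / number of pairs."""
    greater = sum(1 for a, b in pairs if a > b)
    smaller = sum(1 for a, b in pairs if a < b)
    return (greater - smaller) / len(pairs)


def log_ratio_index(pairs):
    """Mean of log(a / b) over all pairs; the logarithm treats ratios such
    as 2 and 1/2 as equally distant from 1."""
    return sum(math.log(a / b) for a, b in pairs) / len(pairs)


# Made-up pairs of positive mean values from two diets.
pairs = [(2.0, 1.0), (1.0, 2.0), (4.0, 1.0)]

count_score = inequality_index(pairs)
ratio_score = log_ratio_index(pairs)
print(count_score, ratio_score)
```

Both scores are zero when neither diet dominates, which matches the use of 0 as the threshold between the two magnitude relations.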
The results for the variance with the following definition of and are summarized in Table 3.
A more apparent tendency of than for the mean values can be observed, especially in . This implies that in cultura diet may contain a larger variance ratio in the intake distribution. Since this tendency is also found when Synecoculture products are compared with the same species in conventional data, this property might be attributable to the phenotypic plasticity of crops in response to artificial intervention in the culture condition.
4.3. Actual Confidence Level of Food Components Intake Distributions
The results of the actual confidence level with respect to the mean value and variance were classified into 4 qualitative relations: > and < in terms of the overall magnitude relation, 〜 as qualitatively difficult to distinguish, and ⊂ as inclusion when the right distribution covers the left distribution.
The qualitative relations were judged based on the plots of and with respect to the available range within , , for INFOODS database. The examples of results with starchy roots & tubers are depicted in Figures 2 and 3.
The result for the Synecoculture database was obtained by a quantitative analysis with linear regression of and with respect to . The difference in slope of the linear regressions was evaluated by the following formula, with the slope of linear regression and of in cultura/natura diets:
which translates to and when , inversely and when . The results are depicted in Figure 4. All linear regressions were significant (p<0.01), which implies that the mean complexity of the 4 mineral components in these species can reduce only linearly with respect to the food system size.
The summary is listed in Table 4. is the number of food components used for the analysis, with the number of phytochemicals given in parentheses. There is an overall tendency of , when compared in a larger ensemble with respect to the number of food components .
Among the INFOODS databases, only the starchy roots & tubers category contained data on phytochemicals, such as beta-carotene, lutein, carotenoids, caffeic acid, chlorogenic acid, flavonoids, and flavonols. Other food categories did not contain a sufficient dataset of phytochemicals for the analysis. The results may therefore be limited to the investigated range, mainly major components (proteins, carbohydrates, fats) and minerals.
Figure 2. Actual confidence level of in cultura/natura diets intake distribution mean value from INFOODS database (starchy roots & tubers). The color represents the component name, and the marker shape specifies the value. Top: in cultura diet. Bottom: in natura diet.
Figure 3. Actual confidence level of in cultura/natura diets intake distribution variance from INFOODS database (starchy roots & tubers). The color represents the component name, and the marker shape specifies the value. Top: in cultura diet. Bottom: in natura diet.
4.4. KL-divergence Ratio Between in cultura and in natura Diet Intake Distributions
The distribution of KL-divergence ratio was investigated. The results are partially shown in Figure 5. Since KL-divergence is an asymmetric measure between 2 distributions, the ratio represents the characteristic of in cultura/natura in reference to each other. The summary of the is listed in Table 5. #c+ and #c- represent the number of components with the value and , respectively. The number of phytochemicals is shown in parentheses. was taken as the mean value for all components. Although specific food components could not commonly be separated between the groups #c+ and #c-, the overall tendency inclines to .
This means that in cultura diet is more distant from in natura diet than the other way round, when the intake distribution is considered as information input to metabolism. This result could be applied in considering the coevolution of human metabolism and the development of agriculture, which can be formalized as a long-term transition from in natura to in cultura diet during 10,000-12,000 years of agricultural history.
Figure 5. Density distributions of KL-divergence ratio . Left legend lists the components with in average, while right legend with . Left: INFOODS starchy roots & tubers. Right: Synecoculture.
Figure 6. Mean vs. variance components of KL-divergence and . Linear regression was performed on double-logarithmic scale. Left: INFOODS starchy roots & tubers. Right: Synecoculture.
4.5. Mean and Variance Components of KL-divergence Between in cultura and in natura Diet Intake Distributions
We investigated the relation between the mean components , and variance components , in the equations (3) (4). Significant linear regressions were obtained in double logarithmic scale. The slope of linear regression between and , and the slope of linear regression between and were calculated with p<0.05 (except for Syneco Na with p=0.112 and Syneco Mg with p=0.312).
The results are partially depicted in Figure 6 and summarized in Table 6. The function was used to judge the value of the slope between positive (+) and negative (-) value. In total, the relation and were dominant in both INFOODS and Synecoculture database. This implies that the variance components of KL-divergence are positively related to the mean components, and the degree is higher in than . This fact raises a novel subject in food component analysis: to target the high-order statistics as a distinctive factor between in cultura/natura diets and to investigate the possible physiological effect it exerts on our metabolism. More concretely, the effect of the variance component may relate to foraging behavior, which is originally inseparable from diet for wild animals and hunter-gatherers.
We defined the in cultura/natura diets based on the culture condition in view of defining a category of sustainable diet in natura, taking both health and environmental factors into account.
Through statistical invariance analysis, common invariant features of the INFOODS and Synecoculture databases were investigated. Some results indicate that the in cultura/natura distinction differentiates invariant features of intake distributions; the component statistics could therefore serve as markers to support the definition of in natura diet in comparison with in cultura diet.
Among the invariant features that distinguish between in cultura/natura diets, analysis of indicated that culture condition affected more than genetic profile when component distributions were projected to intake distributions.
The result of () coincides with the result of actual confidence level , with the view that in cultura diet is more diversely differentiated by culture condition, so that its population distribution is difficult to estimate from limited samples. Conversely, in natura diet might be more consistent in terms of intake distribution even when the food system is limited.
Although the actual confidence level showed some complexity reduction in in natura diet, the convergence of lower bound within sometimes exceeds linear relation with in INFOODS database (Figures 2 and 3). This suggests that empirical study with metabolomics is still encouraged to augment the plausibility of the invariance analysis in a larger population.
In particular, further incorporation of phytochemicals will introduce more links to biodiversity and environmental quality as well as health-protective factors. This would improve the quality of statistical measures as integrated markers of health and environment.
Another recommendation that the results can provide is the recognition of high-order statistics as significant parameters. In most of food science, and even in cellular biology with highly controlled nutrition conditions in cell culture, little has been investigated on the effect of nutrient fluctuation on biological activity. While our metabolism has been continuously exposed to intermittent intake of foods through foraging behavior, current studies focus only on the stable input of nutrition with regular frequency. In this respect, the current state of knowledge on the health effect of food components seems to be biased by the stable input of in cultura diet. If we look back at the Paleolithic diet to recover lost aspects of our health, in natura diet in relation to the variance components might be a possible alternative for the investigation. As the dynamic aspect of our metabolic response is considered to be essential in adaptive functioning, the large fluctuation response of homeodynamics will be a key issue for further investigation of in natura diet.
Finally, how the asymmetry of between in cultura/natura diets affects our metabolism on physiological and evolutionary scales remains to be investigated. Theoretically, environmental drift would be imprinted in genetic profiles through selection within phenotypic plasticity. How has the adaptation to in cultura diet, which led to today’s civilization, been affecting our health? This is a question that could relate to the metabolic consequence of agricultural development associated with the self-domestication of humans by diet transition. Through this question, we may clarify the trade-off between in cultura and in natura diets from both health and environmental perspectives.
The author acknowledges Kousaku Ohta, Tatsuya Kawaoka, Kazuhiro Takimoto, and Shuntaro Aotake, who worked as research assistants.
Supplement figures for Figures 2, 3, 5, 6 are shown below.
Figure 7. Supplement Figures for Figure 2. Actual confidence level of in cultura/natura diets intake distribution mean value from INFOODS database. From top to bottom, nuts & seeds, vegetables, fruits, meat. The color represents the component name, and the marker shape specifies the value. Left figures: in cultura diet. Right figures: in natura diet.
Figure 8. Supplement Figures for Figure 3. Actual confidence level of in cultura/natura diets intake distribution variance from INFOODS database. From top to bottom, nuts & seeds, vegetables, fruits, meat. The color represents the component name, and the marker shape specifies the value. Left figures: in cultura diet. Right figures: in natura diet.
Figure 9. Supplement Figures for Figure 5. Density distributions of KL-divergence ratio . Left legend lists the components with in average, while right legend with . From top left to bottom right: INFOODS nuts & seeds, vegetables, fruits, meat.
Figure 10. Supplement Figures for Figure 6. Mean vs. variance components of KL-divergence and . Linear regression was performed on double-logarithmic scale. From top left to bottom right: INFOODS nuts & seeds, vegetables, fruits, meat.