About the Evaluation of the Effectiveness of Various Methods of Diagnosis, Prediction and Classification Presented in the Literature

The literature presents a large number of very different ways to diagnose, predict and classify. In support of their usefulness and effectiveness, various parameters are often used.


Introduction
There is an extensive literature on methods of diagnosis, prediction and classification in which their main property, effectiveness, is described and evaluated. How this literature evaluates the effectiveness of the various methods deserves analysis. The question is important because a proposed method can properly be called a method only when the publication describing it gives the parameters that characterize its effectiveness. These parameters are what make it possible to compare the method with similar ones and to reproduce the reported figures, provided that the necessary and sufficient conditions for doing so are met. In the literature, and especially in the patent literature, this requirement is often not observed, mainly because a precise theory of the problem is lacking.

Evaluation of the Effectiveness of Various Methods of Diagnosis and Prediction According to the Data of Patent Literature
The patent literature contains a proposal to predict the outcome of myocardial infarction from a special analysis of multi-sign data [1]. The effectiveness of the proposed forecasting method was evaluated in a control group of 43 patients. Prognostic signs were assessed one day after the patients were admitted to hospital. The outcome was predicted correctly in 38 patients and incorrectly in 5. In all cases of death, the diagnosis was confirmed by pathoanatomical examination. In the authors' terminology, the accuracy of their forecasting method is 88.3%. The authors of this work, declared as an invention, use two terms: the accuracy and the authenticity of the forecast. The description of the patent makes it clear that they treat the two terms as identical. Since the term forecast accuracy is in common use, the additional, undefined term only confuses readers. Moreover, the authors used only one informative parameter, accuracy, to assess the effectiveness of their method. One parameter is not enough to characterize the effectiveness of the proposed method fully. Indeed, the outcome was predicted wrongly for five patients. It may be that these five were predicted a favorable outcome that turned out to be lethal; or it may be that they were predicted a fatal outcome that, fortunately, turned out to be favorable. This is convincing evidence of the great uncertainty that arises when a single parameter, here accuracy, is used to assess the effectiveness of prediction methods. A positive feature of the work is that it describes numerically the group of patients used in the analysis, which is important for assessing the real effectiveness of any method of diagnosis, prediction and classification. This problem is considered in more detail below.
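The ambiguity can be made concrete with a minimal Python sketch. The patent reports only that 38 of 43 outcomes were predicted correctly; the split of the control group into 10 deaths and 33 survivors used below is purely an assumption made for illustration, as are the function and variable names.

```python
from fractions import Fraction


def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for a 2x2 outcome table."""
    se = Fraction(tp, tp + fn)   # share of fatal outcomes predicted as fatal
    sp = Fraction(tn, tn + fp)   # share of favorable outcomes predicted as favorable
    ac = Fraction(tp + tn, tp + fn + tn + fp)
    return se, sp, ac


# ASSUMPTION: 10 deaths and 33 survivors among the 43 patients (not given in [1]).
# Scenario A: all 5 errors are deaths that were predicted to be favorable.
missed_deaths = rates(tp=5, fn=5, tn=33, fp=0)
# Scenario B: all 5 errors are survivors who were predicted to die.
false_alarms = rates(tp=10, fn=0, tn=28, fp=5)

for name, (se, sp, ac) in [("A", missed_deaths), ("B", false_alarms)]:
    print(f"scenario {name}: Se={float(se):.3f} Sp={float(sp):.3f} Ac={float(ac):.3f}")
```

Under this assumed split both scenarios have the same accuracy, 38/43, while sensitivity is one half in the first and one in the second, so accuracy alone cannot distinguish them.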
In work [2], Malishevsky M. V. et al. proposed, as an invention, a method for predicting a fatal outcome in nephrological patients on the basis of a multifactorial discriminant analysis of the patients' vital signs. According to the authors, it is an objective method for predicting death in dialysis patients with end-stage chronic renal failure: it assigns a given patient to one of two groups, those expected to survive or those at risk of death within the next 15 months. The reported sensitivity of the method was 75%, the specificity 80% and the predictive value 79%. Unfortunately, predictive value exists in two variants, positive and negative, and the work does not say which of them was meant.
A method for predicting recurrent myocardial infarction was proposed by Gridasova R. A. et al. [3]. It is based on analyzing the behavior of two clinical signs, the concentrations of modified lipoproteins and of non-erythrocyte hemoglobin. According to the authors, the method determines the possibility of recurrent myocardial infarction with high accuracy and allows the appropriate treatment tactics to be chosen. It is not clear what high accuracy means here, since no data on it are given. Moreover, the effectiveness of a prediction method cannot be characterized objectively by a single parameter such as accuracy.
A method for predicting an unfavorable course of sepsis, based on an analysis of the concentration of serum iron in patients, is presented in the work of Barkova E. N. et al. [4]. According to the authors of this patent, the method predicts the different outcomes of sepsis with an accuracy of 95-100%. It is extremely strange to give a numerical range for accuracy, since this parameter is determined uniquely and not as a range of values. In addition, as already noted, accuracy alone cannot objectively characterize the effectiveness of forecasting methods.
As already noted, the literature shows great uncertainty about the parameters that characterize the effectiveness of prediction methods. For example, the patent of Shirokova N. M. et al. [5] states that "the method has the percentage of correct prediction of mortality -98% and recovery -95% (according to the sliding test)". It is not clear what these parameters are, since they are not among those commonly used in the literature. Presumably the quoted figures correspond to sensitivity and specificity.
The approach used in that work, the so-called "sliding test", also needed to be explained, or at least accompanied by a reference to the source in which it is described. In any case, determining these parameters, and accuracy as well, presupposes that each patient in the database is tested exactly once: either strictly in order of serial number, or at random without returning already tested patients to the database, so that no patient is tested twice.
The patent literature also contains a method for predicting the outcome of myocardial infarction in patients with diabetes mellitus, based on the selection of a set of clinical and morphological signs and their assessment on a four-point scale [6]. Depending on the number of points obtained, the probability of a fatal outcome of myocardial infarction is predicted as high or low. The effectiveness of this method was not evaluated at all. The situation is similar in the work of Dabizheva A. N. et al. [7].

Evaluation of the Effectiveness of Various Methods of Diagnosis and Prediction According to the Data of the Journal Literature
In the journal literature, much attention is also paid to evaluating the effectiveness of various ways of diagnosing and predicting outcomes in various pathologies. Gravning J. et al. made a qualitative analysis of the capabilities of new biomarkers, relative to troponin T, for the early diagnosis of acute myocardial infarction; in their analysis the authors used the terms sensitivity and specificity at a qualitative rather than a quantitative level [8]. Cabar F. R. et al. determined the relationship between the depth of trophoblastic infiltration, divided into stages, and the concentration of serum vascular endothelial growth factor in patients with ampullary ectopic pregnancy [9]. Boundary values of the growth factor concentration were determined from ROC curves, and sensitivity and specificity were then used to evaluate how effectively this concentration separates stage III from stages I and II. The best separation was reported at a sensitivity of 75% and a specificity of 76.9%. Luvizutto G. J. et al. used the face-hand test to study the phenomenon of sensory extinction in 150 individuals with different socio-demographic characteristics from the Brazilian population [10]. Binomial models were fitted to the data obtained by this method and used to construct ROC curves and to assess the sensitivity and specificity of sensory extinction. As a result, sensory extinction was found to increase with age and to decrease significantly with increasing level of education. The authors believe that the high values of these two parameters justify their data.
In the work of Tkachenko A. N. et al., a retrospective study of data on patients who had undergone lower limb amputation for obliterating atherosclerosis of the lower extremity vessels made it possible to create a prognosis program based on an analysis of 88 clinical signs [11]. According to the authors, this program makes it possible to predict a fatal outcome in the early postoperative period in older patients who have undergone amputation of the lower limbs and to take timely preventive measures in each case. The informative ability of the proposed program is given as 80%. The work does not explain what informative ability is, but the main point is that such a complex concept as the effectiveness of a method cannot be determined by a single parameter. A large group of Japanese researchers (15 authors) presented a work in which the neutrophil-to-lymphocyte ratio was studied as a prognostic factor for detecting loss of response to the immunosuppressive drug infliximab in ulcerative colitis [12]. The authors found that, under certain conditions, this ratio detects the phenomenon with a sensitivity of 78.6% and a specificity of 78.3%. They do not discuss whether these parameters are sufficient to characterize the effectiveness of the method fully. Note that the literature and many websites often contain unfounded claims that sensitivity and specificity are the main parameters for evaluating the effectiveness of different methods, while accuracy, predictive value and others are secondary.
This understanding of the issue is apparently quite common, judging by the large number of works in which only one or two parameters, mostly sensitivity and specificity, are used to evaluate the effectiveness of methods [13,14]. It will be shown below that such judgments are fundamentally incorrect, since operating only with the canonical concepts of sensitivity and specificity does not lead to an unambiguous assessment of the effectiveness of a method. Among the works in which the evaluation of the effectiveness of methods of diagnosis, prediction and classification is prominent and appears reliable, the work of Swedish researchers [15] should be cited. Its authors set out to evaluate the effectiveness of fine-needle aspiration cytology (FNAC) for diagnosing parotid gland masses. This method is considered valuable for the preoperative localization of head and neck tumors. However, its accuracy in detecting salivary gland growths, in comparison with other methods, is debatable. The authors analyzed retrospective data on 114 patients and compared the FNAC results with the final histological diagnoses for these patients. According to the histological diagnosis, 11 patients had malignant tumors and 103 had benign lesions. In evaluating the effectiveness of FNAC, the authors obtained numerical values of sensitivity, specificity, accuracy, and both the positive and the negative predictive value. This work and its results are of particular interest for verifying our data, which is done below in the conclusion.
It is obvious that the effectiveness of the proposed methods of diagnosis and prediction should be evaluated and presented correctly in the literature; otherwise there is no point in discussing their effectiveness and reliability. Analysis of the literature shows that this is often not the case. There are works that take a superficial attitude to the correctness of the estimates characterizing the effectiveness of diagnostic analyzers, estimates which their authors obtain from their own data sets, in particular data on patients. Authors often commit inaccuracies and mistakes and thus mislead others. Thus, in the reviewed literature, both patent and journal, the presentation of data on the effectiveness of the proposed methods shows a large uncertainty of a subjective nature. This is apparently due to the lack of a rigorous and clear theory for evaluating the diagnostic effectiveness of proposed methods that are developed on various private data in the form of databases or arrays, whose results cannot be reproduced or interpreted because most of these data are not available for use. In view of the above, the purpose of this work was to construct a theory that defines the characteristic parameters needed for an unambiguous assessment of the effectiveness of various diagnostic methods, and the role of the databases to which they are applied.

A Parametric Equation Linking a Triad of Independent Basic Parameters (Sensitivity, Specificity, Accuracy)
Sensitivity, specificity and accuracy each describe a different side of the multifaceted concept of the effectiveness of a particular method of diagnosis and prediction.
In accordance with the definition of accuracy for methods of diagnosis and prediction, we write this parameter as $Ac = N_c / N$, where $N$ is the total number of patients in the database (sick and healthy) and $N_c$ is the number of them diagnosed correctly by the proposed method, that is, those with either a true positive or a true negative result. Clearly $N = N_s + N_h$, where $N_s$ and $N_h$ are, respectively, the numbers of sick and healthy patients, which together add up to the total number of patients in the base. In turn, $N_c = TP + TN$, where $TP$ is the number of correctly established diagnoses among the sick (true positives) and $TN$ is the number of correctly established diagnoses among the healthy (true negatives). Then the following holds:

$$Ac = \frac{N_c}{N} = \frac{TP + TN}{N_s + N_h}.$$

On the right side of this equation we multiply the first term of the numerator by one in the form $N_s / N_s$ and the second term by one in the form $N_h / N_h$, note that $Se = TP / N_s$ and $Sp = TN / N_h$, and then divide the numerator and the denominator by $N_h$. As a result we get

$$Ac = \frac{Se \cdot K + Sp}{K + 1}, \qquad (1)$$

where $K = N_s / N_h$. This value characterizes the database on which the method is tested to determine its effectiveness, and can be termed the binary coefficient of asymmetry of the database used to evaluate a new diagnostic method. It is directly related to the prevalence parameter known in the literature, defined as $\mathrm{Prevalence} = N_s / N$. As is easily shown, the relation is $\mathrm{Prevalence} = K / (1 + K)$.
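As a numerical check of the relation between accuracy, sensitivity, specificity and the asymmetry coefficient, the following Python sketch compares accuracy counted directly from a 2x2 table with accuracy computed as (Se·K + Sp)/(K + 1). The counts are hypothetical and the function names are assumptions made for this illustration.

```python
import math


def accuracy_from_triad(se, sp, k):
    """Accuracy from sensitivity, specificity and K = N_sick / N_healthy."""
    return (se * k + sp) / (k + 1)


def prevalence_from_k(k):
    """Prevalence N_sick / N expressed through the asymmetry coefficient K."""
    return k / (1 + k)


# Hypothetical database: 40 sick (30 detected), 60 healthy (54 cleared).
tp, fn, tn, fp = 30, 10, 54, 6
n_sick, n_healthy = tp + fn, tn + fp

se = tp / n_sick
sp = tn / n_healthy
k = n_sick / n_healthy

ac_direct = (tp + tn) / (n_sick + n_healthy)   # counted from the table
ac_formula = accuracy_from_triad(se, sp, k)    # computed from Se, Sp and K
print(ac_direct, ac_formula, prevalence_from_k(k))
```

The two accuracy values coincide up to floating-point rounding, and the prevalence recovered from K equals the share of sick patients in the assumed table.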
Solving expression (1) for the parameter $K$, we get

$$K = \frac{Sp - Ac}{Ac - Se}. \qquad (2)$$

Equations 1 and 2 tie together the three most important parameters, sensitivity, specificity and accuracy, from which one can determine the parameter $K$, which characterizes an important feature of the base on which a particular diagnostic method was tested. Continuing the analysis, we consider the following two relations:

$$1 - Se = \frac{FN}{N_s}, \qquad 1 - Sp = \frac{FP}{N_h},$$

where $FN$ is the number of truly sick patients who are classified as healthy (false negatives), so that $FN = N_s - TP$, and $FP$ is the number of healthy patients who, when the base was tested with the new method, were classified as sick (false positives), so that $FP = N_h - TN$. The meaning of these relations is as follows. The left side of the first relation is the proportion of the truly sick who were recognized as healthy by the diagnostic method. The left side of the second relation is the proportion of the truly healthy who were recognized as sick by the diagnostic method. The meaning of the right sides is obvious. Dividing the first relation by the second, we get

$$\frac{1 - Se}{1 - Sp} = \frac{FN}{FP} \cdot \frac{N_h}{N_s} = \frac{FN}{FP} \cdot \frac{1}{K}.$$

As a result,

$$\frac{FN}{FP} = K \cdot \frac{1 - Se}{1 - Sp}, \qquad (3)$$

or, taking into account the expression obtained for $K$,

$$\frac{FN}{FP} = \frac{Sp - Ac}{Ac - Se} \cdot \frac{1 - Se}{1 - Sp}. \qquad (4)$$

Here $FN / FP$ is a parameter that should be termed the diagnostic invariant of the method of diagnosis or prediction. This diagnostic invariant is an important parameter expressing the inner essence of the particular method used. A check that follows from equations 1, 2 and 3: when specificity and accuracy are numerically equal, the data array evidently contains no sick patients, that is, $K = 0$. From such data, the numerical value of the binary asymmetry coefficient of the database used to evaluate a new diagnostic method follows from the corresponding expression.
The conditions described in the above example confirm the correctness of the numerical value obtained for the parameter in question. Thus, two equations have been obtained that tie together three fundamentally important parameters, sensitivity, specificity and accuracy, which assess the diagnostic or prognostic capabilities of the methods developed and proposed in the literature. Linking these three parameters by the obtained equations makes it possible to express through them an important property of the patient database on which the proposed method was developed. The effectiveness of forecasting methods is thus presented as a composite three-parameter indicator. In other words, the effectiveness of a method is determined by a system of parameters which, taken together, determine it unambiguously. There may be different systems of parameters with this property of uniqueness. One can move from one such system to another, and this issue is discussed below.
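The inverse step, recovering the database coefficient K from a reported triad as K = (Sp - Ac)/(Ac - Se) and then the ratio of false negatives to false positives as K(1 - Se)/(1 - Sp), can be sketched in Python as follows. The counts are hypothetical and serve only as a cross-check; the helper names are assumptions made for this illustration.

```python
import math


def asymmetry_from_triad(se, sp, ac):
    """K = N_sick / N_healthy recovered from sensitivity, specificity, accuracy."""
    if math.isclose(ac, se):
        # Ac equals Se when Se equals Sp (or when there are no healthy patients);
        # K cannot be recovered from the triad in that case.
        raise ValueError("K is undefined when accuracy equals sensitivity")
    return (sp - ac) / (ac - se)


def fn_fp_ratio(se, sp, k):
    """Ratio of false negatives to false positives, K * (1 - Se) / (1 - Sp)."""
    if sp == 1:
        raise ValueError("FN/FP is undefined when there are no false positives")
    return k * (1 - se) / (1 - sp)


# Hypothetical 2x2 table used as ground truth for the check.
tp, fn, tn, fp = 30, 10, 54, 6
se = tp / (tp + fn)
sp = tn / (tn + fp)
ac = (tp + tn) / (tp + fn + tn + fp)

k = asymmetry_from_triad(se, sp, ac)
ratio = fn_fp_ratio(se, sp, k)
print(k, ratio)
```

For the assumed table the recovered K matches 40/60 and the ratio matches 10/6, so the triad alone is enough to reconstruct the composition of the database.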

The Parametric Equation Obtained on the Basis of the Concept of Predictability of a Positive Diagnosis (Pr+)
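According to the definition, the predictive value of a positive diagnosis is determined by the following equation, in which $TP$ and $FP$ are the numbers of true positive and false positive results and $K$ is the asymmetry coefficient of the database:

$$Pr^{+} = \frac{TP}{TP + FP} = \frac{Se \cdot K}{Se \cdot K + 1 - Sp}. \qquad (5)$$

Replacing $K$ in the last equation with its expression through sensitivity, specificity and accuracy obtained above, we get

$$Pr^{+} = \frac{Se\,(Sp - Ac)}{Se\,(Sp - Ac) + (1 - Sp)(Ac - Se)}.$$

It can be seen that the parameter $Pr^{+}$ is also expressed through the three basic parameters sensitivity, specificity and accuracy.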
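A short Python sketch shows why the database enters the positive predictive value: with sensitivity and specificity held fixed, Pr+ = Se·K/(Se·K + 1 - Sp) changes with K. The sensitivity and specificity values and the list of K values are illustrative assumptions, not data from any cited work.

```python
def positive_predictive_value(se, sp, k):
    """Pr+ = Se*K / (Se*K + 1 - Sp), with K = N_sick / N_healthy."""
    return se * k / (se * k + 1 - sp)


se, sp = 0.80, 0.90                  # assumed values, held fixed
k_values = [0.05, 0.25, 1.0, 4.0]    # assumed database compositions
ppv_values = [positive_predictive_value(se, sp, k) for k in k_values]

for k, ppv in zip(k_values, ppv_values):
    print(f"K = {k:<5} Pr+ = {ppv:.3f}")
```

The same method, described by the same sensitivity and specificity, yields a low predictive value on a base dominated by healthy patients and a high one on a base dominated by sick patients.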

The Parametric Equation Obtained on the Basis of the Concept of Predictability of a Negative Diagnosis (Pr-)
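According to the definition, the predictive value of a negative diagnosis is determined by the following equation, in which $TN$ and $FN$ are the numbers of true negative and false negative results:

$$Pr^{-} = \frac{TN}{TN + FN} = \frac{Sp}{Sp + K\,(1 - Se)}. \qquad (6)$$

Replacing $K$ with its expression through sensitivity, specificity and accuracy gives

$$Pr^{-} = \frac{Sp\,(Ac - Se)}{Sp\,(Ac - Se) + (1 - Se)(Sp - Ac)}.$$

It can be seen that the parameter $Pr^{-}$, like the parameters $K$, $FN / FP$ and $Pr^{+}$, is also expressed in terms of the three main parameters sensitivity, specificity and accuracy, or of another basic triad: sensitivity, specificity and the binary coefficient of asymmetry of the base used to test the effectiveness of various methods of diagnosis, prediction and classification. All of the above suggests that our theory determines the necessary and sufficient conditions for an unambiguous assessment of effectiveness.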
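The two forms of the negative predictive value, one through the database coefficient K and one through accuracy, can be checked against each other. The Python sketch below assumes Pr- = Sp/(Sp + K(1 - Se)) and its accuracy form Sp(Ac - Se)/(Sp(Ac - Se) + (1 - Se)(Sp - Ac)); the counts are hypothetical and the function names are ours.

```python
import math


def npv_from_k(se, sp, k):
    """Pr- through sensitivity, specificity and K = N_sick / N_healthy."""
    return sp / (sp + k * (1 - se))


def npv_from_accuracy(se, sp, ac):
    """Pr- through sensitivity, specificity and accuracy (K eliminated)."""
    numerator = sp * (ac - se)
    return numerator / (numerator + (1 - se) * (sp - ac))


# Hypothetical 2x2 table: 40 sick, 60 healthy.
tp, fn, tn, fp = 30, 10, 54, 6
se = tp / (tp + fn)
sp = tn / (tn + fp)
ac = (tp + tn) / (tp + fn + tn + fp)
k = (tp + fn) / (tn + fp)

npv_direct = tn / (tn + fn)          # counted from the table
npv_k = npv_from_k(se, sp, k)
npv_ac = npv_from_accuracy(se, sp, ac)
print(npv_direct, npv_k, npv_ac)
```

All three values agree up to rounding, which illustrates that either triad, (Se, Sp, K) or (Se, Sp, Ac), carries the same information.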

Errors That Arise from the Non-ideality of the Method Adopted as Standard, by Means of Which the Database Used for Testing Other Methods Is Formed
Let us ask how a certain method, adopted as standard, distorts an ideal hypothetical database of sick and healthy patients for which it is known with certainty who is truly sick and who is truly healthy. It can be assumed that this truth is established by a special method, called the gold standard, that is unerring in all respects. The asymmetry parameter of this hypothetical base is known, $K = N_s / N_h$, while the chosen standard method is not perfect and is characterized by the parameters $Se^{s}$ and $Sp^{s}$ (the superscript "s" marks the parameters as belonging to the selected standard method). So, suppose that, according to the gold standard, a database is formed of patients with some particular disease and of those without it, conventionally called healthy. It contains $N_s$ sick patients and $N_h$ healthy ones, and its asymmetry parameter is $K = N_s / N_h$. After method S is applied to this base, $N_s'$ sick and $N_h'$ healthy will be recognized at the output. These numbers are determined by the following relations:

$$N_s' = Se^{s} N_s + (N_h - Sp^{s} N_h), \qquad N_h' = Sp^{s} N_h + (N_s - Se^{s} N_s).$$

The expression in parentheses on the right side of the first relation is the number of false positive results, that is, the number of healthy patients whom the test assigns to the sick, and the expression in parentheses in the second relation is the number of false negative results, that is, the number of sick patients whom the test classifies as healthy. The first terms on the right sides are, respectively, the numbers of truly sick and truly healthy patients. Obviously, the total number of sick and healthy remains the same, that is, $N_s' + N_h' = N_s + N_h$. The ratio between sick and healthy in the original database and the ratio obtained after the action of method S will differ. The nature of the differences is determined by the values of the expressions

$$N_s' - N_s = (1 - Sp^{s})\,N_h - (1 - Se^{s})\,N_s = -(N_h' - N_h),$$

$$K' = \frac{N_s'}{N_h'} = \frac{Se^{s} K + 1 - Sp^{s}}{Sp^{s} + (1 - Se^{s})\,K}.$$
The last expression shows what the asymmetry parameter of the new database will be after it has been formed by a non-ideal method taken as standard. It seems that, by developing this direction, it is possible to express the effectiveness parameters of the tested methods not only relative to the non-ideal standard method but also relative to the gold standard, if one exists. That, however, is the subject of separate theoretical and experimental research. It is already clear that testing new diagnostic methods on databases formed by non-ideal standard methods introduces new errors, whose analysis presents great difficulties.
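The effect of an imperfect standard on the composition of a database can be sketched numerically. The Python example below assumes that the standard labels as sick Se_s·N_s + (1 - Sp_s)·N_h patients and as healthy Sp_s·N_h + (1 - Se_s)·N_s patients; the base of 100 sick and 900 healthy patients, the standard's sensitivity 0.90 and specificity 0.95, and all names are hypothetical.

```python
import math


def relabel_with_standard(n_sick, n_healthy, se_s, sp_s):
    """Expected counts of 'sick' and 'healthy' labels after standard S is applied."""
    labeled_sick = se_s * n_sick + (1 - sp_s) * n_healthy      # true sick + false positives
    labeled_healthy = sp_s * n_healthy + (1 - se_s) * n_sick   # true healthy + false negatives
    return labeled_sick, labeled_healthy


def distorted_asymmetry(k, se_s, sp_s):
    """K' = (Se_s*K + 1 - Sp_s) / (Sp_s + (1 - Se_s)*K)."""
    return (se_s * k + 1 - sp_s) / (sp_s + (1 - se_s) * k)


n_sick, n_healthy = 100, 900     # assumed gold-standard composition
se_s, sp_s = 0.90, 0.95          # assumed parameters of the standard method

labeled_sick, labeled_healthy = relabel_with_standard(n_sick, n_healthy, se_s, sp_s)
k_true = n_sick / n_healthy
k_distorted = distorted_asymmetry(k_true, se_s, sp_s)
print(labeled_sick, labeled_healthy, k_true, k_distorted)
```

In this assumed case the total of 1000 patients is preserved, but the apparent number of sick rises from 100 to about 135, so the asymmetry coefficient seen by any method tested on this base is larger than the true one.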

Conclusion
The theory presented above arranges the effectiveness parameters used in the literature into strictly ordered systems with definite cause-and-effect relationships. In conclusion, therefore, it is worth considering from the standpoint of this theory the data of a work in which much attention is paid to evaluating the effectiveness of methods of diagnosis, prediction and classification and whose results appear reliable. One such work was devoted to assessing the effectiveness of the FNAC method [15]. The essence of this work, in which numerical values of many effectiveness parameters of the method were obtained, was described above. Here we only verify those values on the basis of the ideas developed in the present work. The effectiveness parameters of the method of diagnosing malignant tumors determined in that work were sensitivity, specificity, accuracy, and the positive and negative predictive values. In the notation proposed and used by us, their numerical values are: Se = 73%, Sp = 97%, Ac = 95%, Pr+ = 73%, Pr- = 95%. In addition, the work gives the number of patients in the array, $N = 114$, the number of patients with malignant tumors, $N_s = 11$, and the number with benign lesions, $N_h = 103$. As shown above, the minimum number of parameters sufficient for an unambiguous description of such a complex indicator as effectiveness is three. Here five parameters are given at once, and one more parameter, $K = N_s / N_h$, which characterizes the patient database used, is easily calculated from its description: $K = N_s / N_h = 11/103 \approx 0.107$. It should be noted that this is the only work we have encountered in which data describing the effectiveness of the diagnostic method are given in abundance.
In a large number of works, by contrast, too few parameters are given to characterize effectiveness unambiguously. The abundance of parameters in work [15] is, of course, a positive feature, showing the authors' serious attitude to the reliability of their data. Because many effectiveness parameters are given in this work, they can be rechecked on the basis of our theory. Let us take as the base triad the parameters sensitivity, specificity and the coefficient of asymmetry of the database, and determine the others from them. For accuracy we have (see equation 1): Ac = (0.73 × 0.107 + 0.97) / 1.107 ≈ 0.9468, that is, Ac ≈ 95%. For Pr+ we have (see equation 5): Pr+ = (0.73 × 0.107) / (0.73 × 0.107 + 1 - 0.97) ≈ 72.2%, and for Pr- (see equation 6): Pr- = 0.97 / (0.97 + 0.107 - 0.07811) ≈ 97%. For the diagnostic invariant FN/FP we have (see equation 3): FN/FP = 0.107 × (1 - 0.73) / (1 - 0.97) ≈ 0.963. These values correspond to the data given in the work, which confirms the correctness of the provisions of the theory and of the equations expressing it. The small discrepancies observed arise because the authors of the work reported approximate rather than exact values of the parameters. It is easy to see that if another triad is taken as the base triad of parameters, all the other parameters are just as easily determined from it. One more circumstance should be borne in mind.
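The recalculation above can be reproduced with a few lines of Python. The sketch takes Se = 0.73 and Sp = 0.97 from [15] and uses the unrounded K = 11/103, so the last digits differ slightly from the hand calculation with K ≈ 0.107; the function name and the dictionary keys are ours.

```python
def derive_from_triad(se, sp, k):
    """Derive the remaining effectiveness parameters from (Se, Sp, K)."""
    return {
        "Ac": (se * k + sp) / (k + 1),
        "Pr+": se * k / (se * k + 1 - sp),
        "Pr-": sp / (sp + k * (1 - se)),
        "FN/FP": k * (1 - se) / (1 - sp),
    }


# Se and Sp as reported in [15]; K from 11 malignant and 103 benign cases.
derived = derive_from_triad(se=0.73, sp=0.97, k=11 / 103)
for name, value in derived.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the derived values are roughly 0.947 for accuracy, 0.722 for Pr+, 0.971 for Pr- and 0.961 for FN/FP.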
For example, in the work [14] cited above, it was shown that the neutrophil-to-lymphocyte ratio identifies well, at certain values of sensitivity and specificity, the loss of response to the immunosuppressive drug infliximab in ulcerative colitis.
Our analysis shows that these two parameters are not sufficient for an unambiguous assessment of this method. At the same time, the work clearly describes the patient data set used, which makes it possible to find a third necessary parameter characterizing it. The result is a necessary and sufficient triad of parameters from which all the other parameters can be determined on the basis of the theory presented here. Thus, with this theory, knowing any three effectiveness parameters is enough to determine all the others that were not initially known. This suggests that a triad of known effectiveness parameters is necessary and sufficient for an unambiguous assessment of the effectiveness of any method of diagnosis, prediction and classification. Full unambiguity and certainty in describing the effectiveness of a method of diagnosis or prediction is achieved when all the parameters characterizing effectiveness from different sides are interconnected by one equation. An unambiguous evaluation of the effectiveness of a particular method is a necessary condition, since only then is it appropriate to speak of a comparative analysis at all. This is an important point, as it concerns the reliability of published data and their scientific significance. The problems raised in this paper are important for assessing the reliability of diagnostic methods, new treatment methods and other such tasks. Nevertheless, as the analysis of the literature conducted here shows, the attitude to this problem appears dismissive, for various reasons: first, because a strict theory of the subject is lacking; second, because of the position of Russian scientific journals, which are not concerned about the level of accuracy of the articles they publish. This can be confirmed by the following.
This work was previously sent to the journal "Biophysica", where, after review, it was rejected by the editorial board with the statement: "Good article, but it is interesting for mathematicians." It was then sent to the journal Modern Functional Diagnostics, which also rejected it, with the wording "The article is very interesting, but our readers will not understand it." The journal Clinical Laboratory Diagnostics rejected it with approximately the same wording. The university journal Vestnik of the Russian State Medical University rejected the article, explaining that it does not publish opinions, thereby showing a surprising disregard in calling opinions the regularities obtained in our work, which relate to the problem of the reliability of experimental data. It is clear what kind of science results when the data presented in the scientific literature are not reliable. The regularities obtained in this work, which relate to the problems of reliability, were tested on examples of data taken from the literature, with references to the sources cited.