Using Data Envelopment Analysis to Rank ICMS Taxpayers

This study presents a practical methodology, implemented in the R language, that uses Data Envelopment Analysis (DEA) under the Constant Returns to Scale model to measure the tax collection efficiency of ICMS taxpayers (ICMS is the Brazilian tax on commercial operations involving the movement of goods and on interstate and inter-municipal transportation and communication services). The inputs are the component variables of the tax calculation function, taken from the amounts recorded in Electronic Invoices (purchases and sales) and from the billing of sales paid by card (credit and debit). The data for a fiscal year are obtained from the databases of the Brazilian revenue agencies, tabulated, and submitted to the DEA calculation (multipliers and envelopment models). Thus, when taxpayers belonging to the same economic sector are monitored, companies with lower relative efficiency raise suspicion and are identified as candidates for a fiscal audit. Two applications of the methodology are demonstrated (the Department Stores sector and the Retailing of Footwear sector), in which it identifies taxpayers with low tax collection efficiency that are eligible for inspection. The methodology is currently in use at the Federal District Revenue (Brazil) as an instrument for selecting companies for auditing.


Introduction
Defined in Article 155, II, of the Federal Constitution of Brazil (1988) [1], the ICMS is a state tax levied on the value added through commercial purchase and sale transactions and through transport and telecommunications services.
As the most representative tax for the states, since it is their largest source of public revenue, the ICMS makes the fight against tax evasion an important effort to balance the public finances of these entities. Optimized use of the resources available for its surveillance is therefore desirable.
The current ICMS monitoring model in the Revenue Department analyzes the performance of taxpayers within their respective economic sectors. It is therefore important to have a comparison parameter that can distinguish and expose companies whose behavior is inconsistent with that of other companies subject to the same tax burden.
Thus, this research proposes a simple and practical application of a non-parametric optimization method, Data Envelopment Analysis (DEA), to measure and rank the relative tax collection efficiency of ICMS taxpayers in the same economic sector. This helps to make tax monitoring more effective and aids in the identification of possible tax evasion.
The goal of this paper is to present a DEA ranking tool that exposes discrepancies in the tax collection behavior of taxpayers.
The classical DEA model is used, specifically the Constant Returns to Scale (CRS) model in its output-oriented multipliers (primal) and envelopment (dual) versions, to distinguish firms according to their tax collection efficiencies.

The ICMS Taxpayers Monitoring Process
The process of monitoring ICMS taxpayers encompasses activities such as: choosing the economic sector to monitor, studying the relative behavior of taxpayers, selecting suspect taxpayers, analyzing the suspicious behavior in search of fraud hypotheses, and notifying and/or auditing and assessing any perceived irregularity.
This process can be summarized by the following diagram, Stages of the Taxpayers Monitoring Process. As can be seen in the diagram, the first activity of the monitoring process is completed with the selection of companies from the economic sector of interest and the extraction of the pertinent fiscal data. The collective fiscal information about these firms should be analyzed according to defined parameters and requirements, in order to select taxpayers whose outlier behavior is worthy of in-depth investigation.
The audit of the tax data of the suspected taxpayers tests the hypotheses of irregularities and tax fraud that may have been practiced. Some irregularities, of lesser offensive potential, can be remedied by notifying the taxpayer and encouraging spontaneous fulfillment of the tax obligation (without penalty).
In the case of noncompliance or in the case of fraud, a notice is filed that may lead to criminal prosecution.
Thus, the first activity requires classifying taxpayers in order to select those whose tax behavior makes them of interest for tax inspection.
One way to select taxpayers for a more detailed investigation is to compare their tax behavior with the tax collection performance of taxpayers in the same economic segment. It is reasonable to assume that these companies share the same tax requirements and the same taxes applicable to their segment of economic activity, a condition that ensures homogeneity among agents.
It should be noted, however, that this homogeneity assumption fails if one compares firms with different tax burdens. It is therefore not prudent to compare companies under distinct tax regimes (such as Normal vs. Simplified vs. Incentivized taxation schemes). For this reason, the monitoring activity must always segregate taxpayers by ICMS calculation method and by economic sector.
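The segregation step described above can be sketched as a simple grouping of taxpayers by sector and regime before any efficiency comparison is made. This is only an illustration: the record layout, the taxpayer identifiers, the regime labels, and the placeholder code `SECTOR-B` are assumptions made here, not the structure of the revenue databases.

```python
from collections import defaultdict

# Hypothetical records: (taxpayer id, economic sector code, ICMS calculation regime).
# Only the sector code G471300100 comes from the text; everything else is illustrative.
taxpayers = [
    ("T-01", "G471300100", "normal"),
    ("T-02", "G471300100", "simplified"),
    ("T-03", "SECTOR-B", "normal"),
    ("T-04", "G471300100", "normal"),
]


def comparison_groups(records):
    """Group taxpayer ids by (sector, regime) so that each group is homogeneous."""
    groups = defaultdict(list)
    for taxpayer_id, sector, regime in records:
        groups[(sector, regime)].append(taxpayer_id)
    return dict(groups)


groups = comparison_groups(taxpayers)
for key, members in sorted(groups.items()):
    print(key, members)
```

Each resulting group would then be evaluated separately, so that a firm is never compared with firms under a different tax burden.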
Following the idea of carrying out the evaluation among comparable taxpayers, the Data Envelopment Analysis technique is an appropriate tool for this purpose.

Data Envelopment Analysis (DEA) and the Methodology Used

Data Envelopment Analysis (DEA)
DEA is a multivariate technique used to analyze the productive efficiency of Decision Making Units (DMUs). It establishes an indicator of how efficiently these units manage their inputs and outputs and, when units are inefficient, provides quantitative data on possible directions to improve their performance.
Also known as Frontier Analysis, DEA is based on nonparametric linear mathematical programming models. It therefore does not make use of statistical inference, nor does it rely on measures of central tendency, coefficient tests, or the formalization of regression analysis. In this sense, DEA does not require functional relations between the inputs and the outputs to be specified, and it allows the use of discretionary (instrumental or decision) variables, non-discretionary (exogenous) variables, and categorical variables (including dummies).
The good reputation of this tool comes from its relative simplicity and its wide applicability to real-world problems. Virtually any setting that has multiple units (DMUs) operating in a similar way, and that is concerned with standardizing the performance of these units, can make use of the technique.
DEA defines the relative competitive positioning of a set of organizations or activities by comparing their technical, scale, and allocative efficiencies or inefficiencies. The methodology evaluates multiple resources and multiple products for each DMU. The ability of an entity to generate outputs from given inputs defines its efficiency. Less efficient DMUs can improve their efficiency up to the limit of the best DMUs, whose efficiency is 1.00. Among the attributes of the DEA model are: 1) the relative efficiency of each productive organization (DMU) is summarized in a single number that synthesizes the interactions between multiple inputs and outputs; 2) it identifies input savings or production increases that would allow inefficient DMUs to become efficient.
For tax purposes, DEA serves two operational ends: 1) it allows the classification of the relative contribution of taxpayers belonging to the same economic sector; 2) it supports the selection of taxpayers of interest for closer investigation of their fiscal regularity, based on their low relative efficiency.
DEA therefore supports the proposal of this paper, which seeks to establish a useful methodology for selecting taxpayers whose contributory behavior is incompatible with the economic sector to which they belong.

The Methodology Used
DEA models can be input or output oriented, and generally consider multiple inputs and outputs in multidimensional spaces.
In this setting, outputs and inputs grow in the same proportion, given by a theoretical average tax rate for each economic sector. In other words, they follow the average incidence rate of ICMS on the products traded by the segment under study. For this reason, the proposed methodology uses the DEA-CRS model of constant returns to scale, since sectoral taxation follows a constant pattern of tax burden.
The CCR/CRS model, named after its developers Charnes, Cooper and Rhodes, is the first and fundamental DEA model, built on the notion of efficiency as defined by the classical engineering ratio. The CCR ratio model calculates an overall efficiency for the unit, in which its pure technical efficiency and its scale efficiency are aggregated into a single value.
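As a rough illustration of the engineering-ratio idea, consider the special case of a single input and a single output. Under constant returns to scale, the relative efficiency of each unit then reduces to its output/input ratio divided by the best ratio observed in the group. The sketch below assumes this simplified case only; the function name and the figures are hypothetical, and the model in this paper uses three inputs, which requires linear programming.

```python
def crs_ratio_efficiency(inputs, outputs):
    """Relative CRS efficiency for units with one input and one output each.

    Each unit's output/input ratio is divided by the best ratio in the group,
    so the best performers score 1 and all others score below 1.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical figures: total sales (input) and ICMS collected (output).
sales = [100.0, 200.0, 400.0]
icms = [10.0, 10.0, 40.0]

efficiencies = crs_ratio_efficiency(sales, icms)
print(efficiencies)
```

In this toy data the second firm collects half as much tax per unit of sales as the other two, so it receives half their efficiency score.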
In the tax collection model, what matters is the movement of output towards the efficiency frontier, given the economic conditions experienced and practiced by taxpayers.
The proposed model is output-oriented: inputs are assumed to remain constant, since they are economic results already achieved by the firms, while output varies to reach the efficient production frontier.
Accordingly, the values of the economic movement (purchases and sales) are held fixed, and the ICMS collection of each taxpayer varies according to its relative efficiency.
In the DEA-CRS technique the relative efficiency of the DMUs can be calculated under two models: 1) Algebraic model of multipliers, dedicated to establishing the efficiency frontier by algebraic optimization of the weights of each input and output component, and 2) Envelopment model (dual).
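The output-oriented CCR/CRS multipliers model is given by the solution of the following optimization problem, written in its standard form for the DMU under evaluation (index 0), with n DMUs, m inputs and s outputs:

\[
\min_{\mu,\nu}\; h_0 = \frac{\sum_{i=1}^{m} \nu_i x_{i0}}{\sum_{r=1}^{s} \mu_r y_{r0}} \tag{1}
\]

Subject to:

\[
\frac{\sum_{i=1}^{m} \nu_i x_{ij}}{\sum_{r=1}^{s} \mu_r y_{rj}} \ge 1, \quad j = 1, \dots, n; \qquad \mu_r \ge 0, \quad \nu_i \ge 0
\]

where y is the product, x is the input, and the variables µ and ν are the weight coefficients. The productivity P of a DMU is its ratio of weighted outputs to weighted inputs, and its efficiency E is the inverse of the optimal value of h_0.
Transforming the fractional program into a linear program:

\[
\min_{\mu,\nu}\; h_0 = \sum_{i=1}^{m} \nu_i x_{i0} \tag{2}
\]

Subject to:

\[
\sum_{r=1}^{s} \mu_r y_{r0} = 1; \qquad \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \quad j = 1, \dots, n; \qquad \mu_r \ge 0, \quad \nu_i \ge 0
\]

The model allows each DMU to choose the weights for each variable, input (ν) or output (µ), in the most benevolent manner, provided that those weights applied to the other DMUs do not generate a ratio less than 1.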
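The output-oriented CCR/CRS envelopment model is given by the solution of the following problem, written in its standard form for the DMU under evaluation (index 0):

\[
\max_{\Phi,\lambda}\; \Phi
\]

Subject to:

\[
\sum_{j=1}^{n} \lambda_j x_{ij} \le x_{i0}, \quad i = 1, \dots, m; \qquad \sum_{j=1}^{n} \lambda_j y_{rj} \ge \Phi\, y_{r0}, \quad r = 1, \dots, s; \qquad \lambda_j \ge 0
\]

where Φ is the inverse of efficiency (such that 1 ≤ Φ ≤ ∞) and λ represents the weights.
Because the two models are dual, the objective function of the multipliers model has the same optimal value as that of the envelopment model. In these DEA models, a DMU's virtual products and virtual inputs are the products and inputs that result at the end of the minimization (multipliers) or maximization (envelopment) process by linear mathematical programming.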
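To make the envelopment problem concrete, the sketch below computes the expansion factor Φ for a small hypothetical data set with one output and two inputs. It is not the procedure used in this study, which relied on the Benchmarking package in R. As a stand-in for a linear programming solver it simply enumerates the basic feasible solutions of the input constraints, which is only practical for a handful of inputs. The function names and data are assumptions, and all inputs and outputs are assumed to be strictly positive.

```python
from itertools import combinations


def solve_square(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination (None if singular)."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def phi_output_crs(inputs, outputs, k):
    """Output-oriented CRS expansion factor Phi for DMU k (single output).

    Maximizes sum_j lambda_j * y_j subject to sum_j lambda_j * x_ij + s_i = x_ik,
    lambda >= 0, s >= 0, by checking every basic feasible solution.
    """
    n, m = len(inputs), len(inputs[0])
    # One column per DMU (its input vector), then one unit column per input slack.
    columns = [list(row) for row in inputs]
    columns += [[1.0 if i == s else 0.0 for i in range(m)] for s in range(m)]
    gain = list(outputs) + [0.0] * m  # slack variables add nothing to the output
    best = 0.0
    for basis in combinations(range(n + m), m):
        a = [[columns[c][i] for c in basis] for i in range(m)]
        z = solve_square(a, inputs[k])
        if z is None or min(z) < -1e-9:
            continue  # singular basis or infeasible (negative) solution
        best = max(best, sum(gain[c] * v for c, v in zip(basis, z)))
    return best / outputs[k]


# Hypothetical data: rows are DMUs, columns are two inputs; one output per DMU.
x = [[1.0, 1.0], [2.0, 2.0], [1.0, 2.0]]
y = [1.0, 1.0, 1.5]

phis = [phi_output_crs(x, y, k) for k in range(len(y))]
scores = [1.0 / p for p in phis]  # efficiency is the inverse of Phi
print(phis, scores)
```

In this toy data the second DMU uses twice the inputs of the first for the same output, so its Φ is 2 and its efficiency is 0.5, while the other two lie on the frontier.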
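To give greater reliability to the efficiency results and targets of each DMU, it is necessary to consider possible slacks in the projection onto the efficiency frontier. The model should therefore include the slacks, which in the standard formulation gives:

\[
\max_{\Phi,\lambda,S^{+},S^{-}}\; \Phi + \varepsilon \left( \sum_{i=1}^{m} S_i^{-} + \sum_{r=1}^{s} S_r^{+} \right)
\]

Subject to:

\[
\sum_{j=1}^{n} \lambda_j x_{ij} + S_i^{-} = x_{i0}, \quad i = 1, \dots, m; \qquad \sum_{j=1}^{n} \lambda_j y_{rj} - S_r^{+} = \Phi\, y_{r0}, \quad r = 1, \dots, s; \qquad \lambda_j,\ S_i^{-},\ S_r^{+} \ge 0
\]

where S+ are the output slacks, S− are the input slacks, and ε is a small positive (non-Archimedean) constant.
The model rests on the premise that, within the same economic segment, tax collection should vary only minimally across firms, given the behavior of their purchases and sales (to other companies or to the final consumer).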
This input information represents the economic and fiscal movement of the company, which is directly related to the expected level of tax collection.
Several factors may explain a taxpayer's discrepancy, and they deserve to be thoroughly examined with the pertinent tax auditing techniques.
The robustness of the CRS model means that a DMU found to be inefficient is indeed relatively inefficient. This condition is the criterion for classifying suspected taxpayers. This is not to say that there are no problems in companies considered efficient, nor that the observed inefficiency lacks a reasonable and fair explanation.
However, this does not detract from the use of DEA as a preliminary indicator of possible irregularity worthy of investigation or monitoring, which is the purpose of the model presented.
Introduced by Charnes, Cooper and Rhodes in 1978 [2], DEA is a particular application of Operational Research that offers an appropriate solution to the problem of calculating relative efficiency, based on a linear programming model. The methodology can now be systematized and easily solved using computer programs. Its utility is attested by the many publications reporting practical solutions built with the tool (e.g. Emrouznejad [17], Cook and Seiford [18] and Charnes et al. [19]).
The work of Charnes et al. [2], followed by Banker et al. [20] and Seiford and Thrall [21], stands out as the foundation of the DEA method. In this method, each composite unit is a convex combination of its reference units, constructing a hypothetical ideal reference of efficiency.
As shown in Tone [22] and Zhu [4], DEA has been used in several applied studies to provide insights into various activities and to identify benchmarks (Zhu [23]). The authors add that, since DEA was first introduced in 1978, researchers from different fields of knowledge have recognized it as an excellent and simple methodology for modeling operational processes for performance evaluation.
Following Charnes, Cooper and Rhodes [2], DEA can be explained as a non-parametric technique, built on linear programming, for evaluating organizational efficiency and measuring the performance of operational units (decision-making units, DMUs) that operate in the same branch of activity, when the presence of multiple inputs and outputs makes comparison difficult.
Zhu [23] defines DEA as a tool grounded in mathematical programming that estimates best-practice production frontiers and supports benchmarking of the efficiency of multiple entities.
In addition, Zhu [24] formally defines DEA as a frontier methodology rather than a central tendency one. Instead of trying to fit a plane through the center of the data, as in statistical regression, the model defines a piecewise linear surface over the observations. Due to this peculiar perspective, DEA is particularly suited to discovering relationships that remain hidden from other methodologies.
Kassai [25], building an accounting application for the tool, presents DEA from the perspective of an efficiency curve (or maximized productivity curve) that considers the optimal relation between inputs and outputs. This curve defines the efficiency frontier. The units considered efficient lie on this reference curve, while inefficient ones lie below it. The efficiency frontier serves as a benchmark for an inefficient unit that aims to become efficient.
According to Kassai [25], the solution resulting from the application of DEA can be summarized as: a) an envelopment surface formed by the best performing (efficient) DMUs, which represent the reference set for the other units; b) a performance index, which measures the distance of each unit to the frontier; and c) projections of inefficient units onto the frontier, which form targets for these units. In addition, the author says, DMUs can be business groups, individual companies, or administrative units, provided that: a) they are comparable; b) they act under the same conditions; and c) their input and output factors are the same, differing only in intensity and magnitude.
According to Ferreira [26], DEA is a mathematical programming approach, an alternative to the classical parametric statistical methods based on average or hypothetical maximum efficiencies. It provides an estimate of relative efficiency with respect to an efficiency frontier, which indicates the limits of productivity at which a hypothetical productive unit is technically efficient. The idea of the DEA technique is to construct a convex reference set against which DMUs can be classified as efficient or inefficient, using the boundary of this set as the reference.
According to Casu and Molyneux [27], DEA is a mathematical programming model for defining the (maximized) production frontier and for measuring individual relative efficiency against the constructed frontier. For Ibrahim [28], the DEA solution measures the relative efficiency of each DMU compared to the best results observed. The maximum performances achieved indicate the empirical production frontiers that set the limits of the results achievable with a given set of resources. The efficiency of a DMU is measured from its position relative to the established frontiers. Each result describes the abilities and the objective restrictions of the unit, on the assumption that results can be improved by overcoming the restrictions and increasing the abilities.
The initial DEA model, constructed by Charnes, Cooper and Rhodes [2] and known by their initials as CCR, is to this day the most widely used model. It rests on the definition of total unit efficiency, established as a ratio, and works with Constant Returns to Scale (CRS). In the CCR model, weights are associated with the input and output variables of the DMUs. Each dual weight establishes the importance of a DMU in the composition of the input-output variables of the composite unit. The composite unit is a combination of efficient units. Thus, a given DMU is inefficient if the dual CCR model can present a hypothetical composite unit that surpasses it.
As Ferreira [26] explains, the CCR model can be oriented to the inputs or to the outputs. Coelli [29] points out that input orientation addresses the question: given the observed output level of the unit, by how much can inputs be reduced while maintaining the current level of outputs? Output-oriented models address the question: given the level of inputs used, what is the highest level of outputs that can be achieved while keeping these inputs constant?
Over the years, the applicability of DEA has expanded, requiring new mathematical models to serve this new range of applications in different sectors. With this evolution, models began to modify the original formulation by incorporating new concepts. DEA currently offers a variety of models, ranging from the classical DEA models (described in this article) to approaches that combine DEA with Monte Carlo simulation methods, sophisticated statistical models, and fuzzy logic, as shown by Tone [22], Emrouznejad [30] and Ghasemi et al. [31].
By providing a solution for measuring efficiency among companies that share economic similarity, DEA is well suited to identifying and selecting the taxpayers most eligible for anomaly analysis, since it makes it possible to find the taxpayers with the lowest efficiency in terms of their tax contributions.

The Data
This research uses information extracted from the databases of the Department of Revenue of Brazil's Federal District.
Considering the data available to the state tax offices and the variables that explain the economic function that produces the ICMS (economic movement of goods and services), the following data are used to compose the DEA model presented here.
OUTPUT: Since the purpose of the methodology is to build a comparative ranking of tax collection efficiency for the companies in a common economic segment, the only product of interest in the input-output relation is the total annual ICMS payment by the participating companies. Specifically, it is the amount of ICMS collected in the year 2017 under revenue code 1517 (regular ICMS).
The choice of inputs is based on the parameters of the economic movement of commercial companies (their purchases and their sales). The model proposes the following variables.
INPUTS: The sum of the accounting value of Electronic Invoices (NFes), documents generated, certified, and approved electronically that record the movement of goods in a commercial purchase and sale transaction, as well as the value of the goods and the ICMS levied. Two kinds are considered:
1. Purchase NFes, which represent the formation of the companies' commercial stocks (purchases for assets, manufacture, or resale).
2. Sale NFes, which report sales between corporate taxpayers (for input or resale) or sales to the final consumer (individual or corporation).
NFes (purchase or sale) can represent:
1. Internal operations, when operations are carried out by local taxpayers within the same federated unit, or
2. Interstate operations, when operations are carried out between companies from different federated units.
This study considers only valid NFes, that is, those that have not been canceled for any reason.
Furthermore, many of the internal sales to final consumers (especially individuals) in the Federal District are made using Tax Coupons (which will remain in use until their complete replacement by Electronic Consumer Tax Receipts).
The values of these sales to the final consumer made with tax coupons are not normally included in NFes. Nevertheless, they represent an important share of the economic activity of taxpayers subject to ICMS.
To mitigate this problem, and considering that coupon information is not easily available to revenue agencies, the billing from credit or debit card payments is used as an input, since this type of operation represents approximately 60% to 70% of sales to the final consumer (especially individuals).
Finally, to maintain homogeneity among the DMUs, the application of DEA should use a cross section covering the full calendar year (2017 in the examples) and include only taxpayers that are in active status and have been subject to the same tax regime since the previous year (in this case, the regular regime).
The remote extraction of data from the Oracle database and the application of the DEA technique in this study were carried out in R under the RStudio platform (Version 1.0.136), using the RODBC and Benchmarking packages, respectively.

Results -Analysis Procedures with Real Examples
Once an economic sector has been chosen for analysis, the values for the period under observation (2017, as noted) are extracted from the fiscal databases of the Federal District Revenue through an ODBC connection between the Oracle database and R (RStudio platform), using the RODBC package, which allows SQL queries to be issued directly.
For reasons of mandatory tax secrecy, imposed by article 198 of the Brazilian Tax Code [32], the identity of the taxpayers under study is not disclosed.

Example 1 - Department Stores or Magazines
This first example analyzes the contributory efficiency of the taxpayer companies of the economic segment represented by CNAE G471300100 (Department stores or magazines).
Table 1 presents the values of the inputs and the product used in the proposed methodology. In the Federal District there are thirty-eight active taxpayers in the Department Stores or Magazines sector that meet the criteria established for the study in 2017.
The annual values of all ICMS collected in the period were extracted for these companies. These values are used as the OUTPUT variable of the DEA model.
Likewise, the values of total purchases, total sales, and total credit or debit card transactions are extracted to compose the INPUT variables of the DEA model.
Applying the output-oriented DEA-CRS model yields the following results for the efficiency of taxpayers in the economic sector.
DEA - Contributing efficiency in the economic sector of Department Stores or Magazines
The graph (Figure 2) shows the relative efficiency of each taxpayer in collecting the ICMS, as a weighted function of its commercial movement of purchases (inventory formation) and sales. Recall that the references of total efficiency are the companies with an efficiency index equal to 1.
The summary of the efficiency indicators found by the DEA-CRS method for Department Stores or Magazines is shown in Table 2. According to the results of the output-oriented DEA-CRS methodology for the thirty-eight companies in the sector under analysis, only four companies had an efficiency index of 1, and the average efficiency for the sector was 0.6121, which is some way from the ideal.
By analyzing the λ results (the largest participation of DMUs as references), it is possible to determine that company GM-34 serves as the benchmark of the model. No output slacks were reported in the model, i.e. the output slacks are all equal to zero. The individualized result per taxpayer is shown in Table 3, considering output slacks = 0 ∀ DMU.

Example 2 - Retailing of Footwear
In the economic sector of Retailing of Footwear, forty-four companies are candidates for the application of the DEA methodology for the year 2017.
From these taxpayers the following values are obtained: a) the ICMS collected in the period and b) their operating economic variables of purchases and sales (including card sales), which serve as the output and input variables of the DEA model, respectively.
Applying the output-oriented DEA model yields the following results for the efficiency of taxpayers in the economic sector.
DEA - Contributing efficiency in the economic sector of Retailing of Footwear
The graph (Figure 3) shows the distribution of the efficiency indices found by applying the DEA-CRS method to the companies of the economic sector, obtained from the combination of the input and output factors presented.
It is possible to see a better uniformity in the behavior of taxpayer efficiencies in this economic sector compared to the first example.
The results obtained by applying the DEA-CRS method to the entire Footwear Retail sector are summarized in Table 5, to give a better picture of the distribution of performance. The results of the output-oriented DEA-CRS method for the forty-four companies in the economic segment (Table 5) show that nine companies have the maximum relative efficiency of 1, namely: SP-4, SP-5, SP-11, SP-15, SP-21, SP-24, SP-26, SP-31 and SP-43.
By analyzing the λ values (the largest participation of DMUs as references), it is possible to determine that companies SP-15, SP-21, SP-26 and SP-31 serve as the benchmarks of the model.
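One simple way to read benchmarks from the λ values, sketched here under assumptions, is to count how many other DMUs use each efficient DMU as a reference (λ greater than zero). The λ values, DMU labels, and function name below are hypothetical and do not reproduce the results of this study.

```python
from collections import Counter


def reference_counts(lambdas, tol=1e-9):
    """Count how many other DMUs use each DMU as a reference (lambda above tol)."""
    counts = Counter()
    for evaluated, weights in lambdas.items():
        for peer, weight in weights.items():
            if weight > tol and peer != evaluated:
                counts[peer] += 1
    return counts


# Hypothetical lambda values: evaluated DMU -> {reference DMU: lambda}.
lambdas = {
    "D1": {"D1": 1.0},
    "D2": {"D1": 0.7, "D4": 0.2},
    "D3": {"D1": 1.3},
    "D4": {"D4": 1.0},
}

counts = reference_counts(lambdas)
print(counts.most_common())
```

The DMUs that appear most often as references for others are natural candidates for the benchmarks of the sector.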
The average of the efficiencies of the companies was 0.7516. The output slacks are all equal to zero.
The individualized result per taxpayer is shown in Table 6, considering output slacks = 0 ∀ DMU. Table 6 presents the relative efficiency obtained by each DMU under analysis, together with the projection onto the efficiency frontier and the desirable tax collection target for each taxpayer.
The rows highlighted in blue show the companies that obtained the highest relative efficiency index (equal to 1).
The rows in red show the taxpayers whose index is below 0.4, the rule adopted here for a suspicious efficiency result. This threshold is the analyst's choice. In the example, four companies are eligible to undergo fiscal audit procedures: SP-14, SP-16, SP-20 and SP-37.
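The selection rule and the collection targets can be sketched together. With zero output slack, the frontier target of a DMU is its observed collection multiplied by Φ, that is, divided by its efficiency. The snippet below assumes hypothetical efficiency scores and collection amounts, and it uses the 0.4 threshold only as the default, since that value is the analyst's choice.

```python
def audit_candidates(efficiency, collected, threshold=0.4):
    """Return {dmu: target collection} for DMUs with efficiency below the threshold.

    The target is collected / efficiency (radial projection onto the frontier,
    assuming zero output slack); it is None when the efficiency is zero.
    """
    return {
        dmu: (collected[dmu] / eff if eff > 0 else None)
        for dmu, eff in efficiency.items()
        if eff < threshold
    }


# Hypothetical efficiency scores and annual ICMS collected per DMU.
efficiency = {"X-1": 1.0, "X-2": 0.25, "X-3": 0.8, "X-4": 0.39}
collected = {"X-1": 500.0, "X-2": 100.0, "X-3": 320.0, "X-4": 78.0}

candidates = audit_candidates(efficiency, collected)
for dmu, target in sorted(candidates.items()):
    print(dmu, target)
```

The gap between the target and the amount actually collected gives the analyst a rough order of magnitude for the discrepancy to be investigated.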
With these two examples, it was possible to demonstrate the practicality and simplicity of the methodology proposed as an instrument for the monitoring of taxpayers and the selection of those who, due to their suspicious tax collection behavior, seem to be eligible for tax investigation.

Conclusion
This work meets the objective of constructing a useful analytical solution for fiscal monitoring, one that facilitates the selection of suspicious taxpayer companies and supports the fight against evasion of the tax most representative of state revenues.
The proposed tool applies DEA to build a relative ranking of collection efficiency among the companies of an economic sector, and thus to identify the anomalous behaviors that require auditing.
As seen in the examples, the DEA model identifies the companies with the worst ICMS collection efficiency and selects them for audit. It also provides knowledge of the economic segment through the average behavior of all its participants. The model therefore improves the modus operandi of fiscal programming, since it provides a quantitative and objective methodology for selecting taxpayers. The solution presented here thus contributes to more successful results in the fight against ICMS tax evasion, since it rationalizes the focus of audits.
The model should certainly evolve, especially to include electronic consumer tax documents once they become fully mandatory in 2019, and progress is also expected in incorporating the peculiarities of each economic segment (when necessary).
Likewise, there is an opportunity for greater use of mathematical programming methods in tax audit procedures, since optimization solutions are desirable in a context of resource constraints, especially in human terms.
As a result of its practical application in the Federal District (Brazil) Revenue Office, it was possible to identify more than a hundred taxpayers with anomalous tax practices, and the respective companies were duly selected for auditing in 2018.
Finally, the model presented in this study was implemented as a working procedure for the monitoring of ICMS taxpayers in the Federal District -Brazil tax jurisdiction.

Figure 2. Bar graph showing the result of the relative efficiencies of the 38 companies under study.

Figure 3. Bar graph showing the result of the relative efficiencies of the 44 companies under study.

Table 1. Values of output and input variables.

Table 2. Summary of Efficiencies.

Table 4. Values of output and input variables.

Table 5. Summary of Efficiencies.

Table 6. Projection of DMUs at the Border of Efficiencies.