Detecting FNE in Sound Free-choice Petri Net with Data

Abstract: Third-party services (such as express delivery) and third-party payment (such as Alipay) are developing rapidly in online shopping. Although many techniques exist for detecting control flow errors in business processes, verifying the soundness of data flow remains hard. Workflow design usually concentrates on a correct control flow structure, but the data flow must be correct as well. A running system may suffer external attacks that change the read and write operations of a task; this in turn changes the control flow structure and leads to abnormal system behavior. We therefore provide a new technique to analyze the correctness of sound free-choice Petri nets with data (SCDN). Because such attacks are well concealed, the system may suffer false-negative data flow errors (FNE), which bring losses to the participants. On the basis of behavioral profiles (BP), redundant data flow errors (RDE) and missing data flow errors (MDE), we develop the theory of FNE to demonstrate the stability, effectiveness and adaptability of our detection methods. Finally, a real E-commerce business system is used to illustrate the practicability of the proposed method.


Introduction
In recent years, Internet technology has been widely used to stimulate the growth of E-commerce. I-Research Consulting statistics show that the transaction volume of Taobao reached 213.5 billion yuan on Double Eleven of 2018. E-commerce transactions keep growing by 30 to 40 percent every year. Devices such as laptops, mobile phones and the iPad, together with low prices, more choices and the convenience of shopping without leaving home, contribute to the development of E-commerce.
When a business process model correctly reflects how the system works, without any control flow or data flow anomalies, it satisfies all participants, such as vendors, customers, express delivery companies and the trusted third party whose secure interaction makes an electronic trade possible [1]. Designing a workflow model is a big challenge and is error-prone even for experienced process designers. For cost-efficient and rapid development, detecting errors in the design phase of a system is more important than testing at run time. However, some methods are not easily understood by system designers because they lack a graphical description language. Petri nets are a suitable tool for modeling a real business process.
The detection of control flow errors is based on the analysis of reachability [2][3][4], liveness [5], deadlock [6], lack of synchronization, long cycles without a reference model, and so on [7][8][9]. These anomalies are caused by incorrect links between the transitions in the control flow. Verifying the correctness of the control flow has become an important research topic. In the last 23 years, many techniques have been developed to analyze process models expressed in languages such as Business Process Modeling Notation (BPMN) [10][11][12], UML activity diagrams [5], [13][14][15] and extended Event-driven Process Chains (eEPCs) [16].
Under the premise of sound free-choice Petri nets [17], existing approaches analyze the model with the notion of a behavioral profile (BP) [18]. Such a profile is made up of three relations based on the weak order between transitions. These relations can be used to detect control flow errors. The BP theory has a wide scope of application because it is less sensitive than trace equivalence and bisimulation [19,20].
In the process of workflow execution, the role of data flow is becoming more and more important. The data in the system determine which of several alternative paths is chosen.
Data flow anomalies sometimes change the control flow structure. Data flow specifies which data are needed as the input and output of a transition. The constraints of the data flow can be used to analyze the dependencies between different transitions. If the system lacks data flow information, it is considered nondeterministic and unfair. In scientific workflow systems such as Taverna, Triana and Kepler, the data flow is regarded as more important than the control flow [21]. Some data serve as the input of a workflow task, and the task can execute effectively and produce output only when its input conditions are satisfied. For transitions in exclusiveness relation, one of the tasks must meet its data input conditions to generate an execution path. Data flow can be represented with document-driven workflows and metagraphs [22,23]. To our knowledge, no technique has yet been used to detect control flow and data flow errors simultaneously. We first focus on data flow testing and provide two kinds of data flow errors: redundant data flow errors (RDE) and missing data flow errors (MDE). On the basis of these errors, we define false negative data flow errors (FNE). This kind of error affects the control flow structure of the system. Control flow errors and data flow errors are interdependent, so we analyze them iteratively until no further errors are found.
To remove errors from the system, Sherry X. Sun et al. introduced in 2006 three basic types of data flow errors, namely missing, conflicting and redundant data flow errors [13]. In 2007, M. Hema Sundari and others extended and generalized the study and defined missing, redundant and lost data flow errors [24]. In 2008, Sherry X. Sun and J. Leon Zhao analyzed the dependencies among different transitions on the basis of the input and output data of each task in a business process [25]. In 2009, Nikola Trcka et al. put forward six kinds of data flow errors, namely missing, strongly redundant, weakly redundant, strongly lost, weakly lost and inconsistent data flow errors [26,27]. In 2010, Hema S. Meda and Anup Kumar Sen described missing, inconsistent, lost and redundant data flow errors and presented a graph traversal algorithm called GTforDF for detecting data flow errors in unstructured and nested workflows, illustrating its operation on realistic examples [21]. This differs from the theory provided by Sherry X. Sun et al. in 2006. A task makes use of existing data to generate a new data item, and the sequencing of tasks is derived from the input/output data analysis of tasks. In 2014, Divya Sharma et al. pointed out that there are methods to classify data flow errors and to repair them automatically [28]. In 2017, Mariusz Dramski examined missing data in the transmission of event logs and put forward methods to restore the missing data [33]. However, none of these works considers that control flow and data flow can affect each other.
The complex links between control flow and data flow may cause many problems in electronic commerce, such as violations of fairness, so it is necessary to test them together. Although formal methods of data flow analysis are presented in the literature [21], [26][27][28][29][30], converting the formalized methods into design tools remains a challenge. The ADEPT flex tool supports a limited set of checks for the correctness of data flow and mainly focuses on dynamic changes in workflow models [31].
In this article, we use data flow analysis to guide correct workflow design. To describe the problem more clearly, we assume that data can only be read and written, and cannot be destroyed. The contribution of this paper is mainly embodied in the following aspects. Firstly, we define sound free-choice Petri nets, the weak order relation and the behavioral profile. Secondly, we give definitions concerning the data flow, such as read and write operations, redundant data flow errors and missing data flow errors, and use them to detect false negative data flow errors. We do not introduce lost data flow errors and inconsistent data flow errors; for these errors, please refer to [26][27][28][29][30].
In order to obtain a correct workflow [32,33], we put forward two algorithms for model checking: 1. Detect data flow anomalies on the basis of false negative data flow errors. 2. Detect control flow anomalies with the behavioral profile. Our method provides an effective guarantee for the correctness of the system design, and a real case supports the given algorithms. For example, in Figure 7 and Figure 8, in order to track a purchased product effectively, the transition C (sends the delivery information), which produces as output the information distributed by the express company, must precede the transition S (receives the express arrival notice), which uses this output as input. If S occurs before C, a data flow error such as a redundant or missing data flow error occurs. This kind of error can be detected by existing systems. Our method detects and eliminates such data flow errors and, in addition, puts forward a new kind of data flow error that exposes anomalies of the model more deeply.
In this article, we use real E-business process models to illustrate FNE and the control flow errors caused by FNE. The rest of this article is arranged as follows. Section 2 introduces some basic concepts. Section 3 presents the algorithm that detects data flow anomalies based on FNE. Section 4 gives a case study of an E-commerce system, and conclusions and future work are given in Section 5.

Preliminaries
Petri nets are a graphical language for modeling concurrent and distributed systems. This part presents some basic concepts of Petri nets. A Petri net comprises places, transitions and flow relations. Graphically, places are represented by circles, transitions by rectangles and flow relations by arrows. For more details, please refer to [5,18].
Definition 1 (Sound free-choice Petri net, SCN). Let $SCN = (P, T, F, M_0)$ be a sound free-choice Petri net, where $P$ is a finite non-empty set of places, $T$ is a finite non-empty set of transitions, $P \cap T = \emptyset$, $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation and $M_0$ is the initial marking of $P$.
When there is a marking in the sink place, there are no markings in the inner places.
Therefore, the criterion of soundness lies in liveness and boundedness: a SCN must always terminate and must not have dead transitions. In fact, if a SCN is sound, then its short-circuit net is live and bounded [30]. For example, Figure 2 shows a sound free-choice Petri net (SCN).
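The structure in Definition 1 can be sketched in a few lines of Python. The fragment below is only an illustration under stated assumptions: the class name `PetriNet`, the helpers `is_free_choice` and `fire`, and the toy net are hypothetical and do not come from the paper. The free-choice test uses the common formulation that two transitions sharing an input place must have identical presets; soundness itself is not checked here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetriNet:
    places: frozenset
    transitions: frozenset
    flow: frozenset  # arcs (source, target) between places and transitions

    def preset(self, node):
        return frozenset(src for (src, dst) in self.flow if dst == node)

    def postset(self, node):
        return frozenset(dst for (src, dst) in self.flow if src == node)


def is_free_choice(net):
    """Transitions that share an input place must have the same preset."""
    ts = sorted(net.transitions)
    for idx, t1 in enumerate(ts):
        for t2 in ts[idx + 1:]:
            pre1, pre2 = net.preset(t1), net.preset(t2)
            if pre1 & pre2 and pre1 != pre2:
                return False
    return True


def fire(net, marking, t):
    """Return the marking reached by firing t, or raise if t is not enabled."""
    if any(marking.get(p, 0) < 1 for p in net.preset(t)):
        raise ValueError(f"{t} is not enabled")
    new_marking = dict(marking)
    for p in net.preset(t):
        new_marking[p] -= 1
    for p in net.postset(t):
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking


# Hypothetical toy net: source place i, inner place p, sink place o.
toy = PetriNet(
    places=frozenset({"i", "p", "o"}),
    transitions=frozenset({"a", "b", "c"}),
    flow=frozenset({("i", "a"), ("a", "p"), ("p", "b"),
                    ("p", "c"), ("b", "o"), ("c", "o")}),
)
m1 = fire(toy, {"i": 1}, "a")
m2 = fire(toy, m1, "b")  # token in the sink place, inner place empty
```

In the toy net, `b` and `c` compete for the same single input place, so the choice between them is free; the final marking has a token only in the sink place, as the soundness condition requires.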
Definition 2 (Weak order relation, WOR) [16]. Let $SCN = (P, T, F, M_0)$ be a sound free-choice Petri net. The weak order relation $\succ \subseteq T \times T$ contains all pairs of transitions $(x, y)$ for which there exists a firing sequence $\sigma = t_1, \cdots, t_n$ with $M_0[\sigma\rangle$ and indices $j \in \{1, \cdots, n-1\}$, $j < k \le n$, such that $t_j = x$ and $t_k = y$.
Cyclic structures have a substantial impact on the behavioral relations. For example, two transitions that are in exclusiveness relation may be in interleaving order relation when they lie inside a cycle. Therefore, our definition of the behavioral profile is based on sound free-choice Petri nets (SCN) and the weak order relation.
Definition 3 (Behavioral profile, BP) [16]. Let $SCN = (P, T, F, M_0)$ be a sound free-choice Petri net and $T' \subseteq T$. A pair $(x, y) \in T' \times T'$ is in one of the following relations:
1) Strict order relation $\rightarrow$, iff $x \succ y$ and $y \nsucc x$.
2) Exclusiveness relation $+$, iff $x \nsucc y$ and $y \nsucc x$.
3) Interleaving order relation $\parallel$, iff $x \succ y$ and $y \succ x$.
If $x \nsucc y$ and $y \succ x$, then $x$ and $y$ are in inverse strict order relation, denoted $x \rightarrow^{-1} y$. The relations above comprise the behavioral profile of a SCN, denoted $BP = \{\rightarrow, +, \parallel\}$. In Figure 2, for example, the profile contains several pairs of transitions in strict order and one pair in exclusiveness relation. Any two transitions of a SCN are executed in strict order, exclusiveness, interleaving order or inverse strict order relation. These relations specify potential dependencies, and they are mutually exclusive. In an ordinary net, however, some pairs of transitions may belong to several of these relations (see Figure 3), and the relations may then reveal behavioral anomalies such as deadlocks and unsoundness. In Figure 3, the relation between the same two transitions depends on the transition that fires first: with one initial firing sequence they are in interleaving order relation, with another they are in exclusiveness relation, and with a third they are in strict order relation. Therefore, the relation between these two transitions is not unique and depends on the firing sequence.
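To make the weak order relation and the behavioral profile concrete, the following Python sketch derives both from a finite list of complete firing sequences. It is only an illustration: the function names `weak_order` and `behavioral_profile` and the example traces are assumptions, and for a net with cycles the set of firing sequences is infinite, so enumerating traces as done here is not sufficient in general.

```python
from itertools import product


def weak_order(traces):
    """All pairs (x, y) such that x occurs before y in at least one trace."""
    pairs = set()
    for trace in traces:
        for j in range(len(trace) - 1):
            for k in range(j + 1, len(trace)):
                pairs.add((trace[j], trace[k]))
    return pairs


def behavioral_profile(transitions, traces):
    """Map every ordered pair of transitions to one profile relation."""
    wo = weak_order(traces)
    profile = {}
    for x, y in product(sorted(transitions), repeat=2):
        xy, yx = (x, y) in wo, (y, x) in wo
        if xy and yx:
            profile[(x, y)] = "||"   # interleaving order
        elif xy:
            profile[(x, y)] = "->"   # strict order
        elif yx:
            profile[(x, y)] = "<-"   # inverse strict order
        else:
            profile[(x, y)] = "+"    # exclusiveness
    return profile


# Hypothetical process: a, then either (b and c concurrently) or d, then e.
traces = [("a", "b", "c", "e"), ("a", "c", "b", "e"), ("a", "d", "e")]
bp = behavioral_profile({"a", "b", "c", "d", "e"}, traces)
```

Here `b` and `c` appear in both orders and are therefore interleaving, `b` and `d` never occur in the same trace and are exclusive, and `a` strictly precedes every other transition.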
Definition 4 (SCDN). $SCDN = (P, T, F, M_0, D, R_T, W_T)$ is a sound free-choice Petri net with data operations if:
1) $(P, T, F, M_0)$ is a SCN.
2) $D$ is the set of data items.
3) $R_T$ denotes the "read" operations on $T$, and $W_T$ denotes the "write" operations on $T$.
Each transition carries "read" or "write" operations; if a transition has no element in its "read" or "write" operation, we use $\emptyset$ to represent it. The "read" operations and the preset places are the preconditions of a transition; correspondingly, the "write" operations and the post-set places are its post-conditions. We use "R" to denote the "read" operation and "W" to denote the "write" operation. For example, in Figure 4, the data item $d_1$ is read by the transition $t$, that is, $d_1 \in R_t$, and the data item $d_2$ is written by $t$, that is, $d_2 \in W_t$. Only a token in the input place together with $R_t = \{d_1\}$ enables $t$, which then marks the output place with $W_t = \{d_2\}$.
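The enabling rule of Definition 4 can be illustrated with a short Python sketch in which a transition needs both tokens in its preset places and the data items it reads. All names here (`can_fire`, `fire_with_data`, the places `p_in` and `p_out`, the data items `d1` and `d2`) are assumptions chosen to mirror the situation described for Figure 4, not identifiers from the paper.

```python
def can_fire(t, marking, available, preset, reads):
    """t is enabled if its input places are marked and its read data exist."""
    tokens_ok = all(marking.get(p, 0) > 0 for p in preset[t])
    data_ok = reads[t] <= available
    return tokens_ok and data_ok


def fire_with_data(t, marking, available, preset, postset, reads, writes):
    """Fire t: move tokens and add the written data items to the data store."""
    if not can_fire(t, marking, available, preset, reads):
        raise ValueError(f"{t} is not enabled")
    new_marking = dict(marking)
    for p in preset[t]:
        new_marking[p] -= 1
    for p in postset[t]:
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking, available | writes[t]


# Hypothetical single transition t that reads d1 and writes d2.
preset = {"t": {"p_in"}}
postset = {"t": {"p_out"}}
reads = {"t": {"d1"}}
writes = {"t": {"d2"}}

blocked = can_fire("t", {"p_in": 1}, set(), preset, reads)
marking_after, data_after = fire_with_data(
    "t", {"p_in": 1}, {"d1"}, preset, postset, reads, writes
)
```

With a token in `p_in` but without `d1`, the transition stays blocked; once `d1` is available it fires, marks `p_out` and makes `d2` available.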
In the following definitions, in order to reduce the complexity of data flow analysis, we only consider those transitions that have read or write operations in common and ignore the others. Definition 5 (Relations of different data items). Let $SCDN = (P, T, F, M_0, D, R_T, W_T)$ be a sound free-choice Petri net with data operations, $t \in T$ and $d_1 \in D$. In this paper, we ignore the delete operation and the guard function.
Definition 6 (Redundant data flow errors, RDE). Let $SCDN = (P, T, F, M_0, D, R_T, W_T)$ be a SCN with data operations. A data element $d_i$ is a redundant data flow error in SCDN if there is a "write" operation on $d_i$ but no corresponding "read" operation later. In Figure 5, the data element $d_i$ belongs to RDE: the transition $t_1$ creates $d_i$, but $d_i$ is never read, or is not necessarily read, by the transitions in the SCDN.
Definition 7 (Missing data flow errors, MDE). Let $SCDN = (P, T, F, M_0, D, R_T, W_T)$ be a SCN with data operations. A data element $d_i$ is a missing data flow error in SCDN if it needs to be read before it has been created. Note that on the left of Figure 5, $d_i$ needs to be read immediately by $t_2$, but it has not been created yet. On the right of Figure 6, $d_i$ is created by $t_3$ and not by $t_1$; if the firing sequence is $t_1, t_2$, then $d_i \in MDE$. In general, if $d_i$ needs to be read immediately by $t_2$ but $t_2$ fires before $t_3$, then $d_i$ has not been created and $d_i \in MDE$. Likewise, on the right of Figure 6, if $d_i$ is created by $t_2$ and not by $t_1$ but should be read by $t_3$, then the firing sequence $t_1, t_3$ gives $d_i \in MDE$.
Definition 8 (False negative data flow errors, FNE). Let $SCDN = (P, T, F, M_0, D, R_T, W_T)$ and $SCDN' = (P', T', F', M_0', D', R_T', W_T')$ be sound free-choice Petri nets with data operations.
1) If $\exists t \in T \cap T'$, $\exists d_1 \in R_{t \in T'}$ and $d_1 \notin W_{t \in T}$, then $d_1 \in MDE$ in $SCDN'$.
2) If $\exists d_2 \in W_{t \in T'}$ and $d_2 \notin R_{t \in T}$, then $d_2 \in RDE$ in $SCDN'$.
If $\exists d_1 \in MDE$ in $SCDN'$, then SCDN may not run to the end. If $\exists d_2 \in RDE$ in $SCDN'$, then $SCDN'$ could run to the end.
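The intuition behind RDE and MDE can be sketched for a single firing sequence. The Python fragment below is a simplified illustration, not the formal conditions of the definitions: it treats a data item as missing when it is read before any write, and as redundant when it is written but not read afterwards within the same sequence. The function name `missing_and_redundant` and the transitions and data items in the example are hypothetical.

```python
def missing_and_redundant(trace, reads, writes):
    """Return (missing, redundant) data items for one firing sequence."""
    written = set()   # data items created so far
    pending = set()   # data items written and not yet read
    missing = set()   # data items read before they were created
    for t in trace:
        missing |= reads[t] - written
        pending -= reads[t]
        written |= writes[t]
        pending |= writes[t]
    return missing, pending


# Hypothetical operations: t1 writes d1 and d3, t2 reads d1 and d2.
reads = {"t1": set(), "t2": {"d1", "d2"}}
writes = {"t1": {"d1", "d3"}, "t2": set()}

in_order = missing_and_redundant(["t1", "t2"], reads, writes)
reversed_order = missing_and_redundant(["t2", "t1"], reads, writes)
```

In the sequence `t1, t2`, only `d2` is missing and only `d3` is redundant. If `t2` fires first, `d1` becomes both missing (read too early) and redundant (written with no later reader), which shows how a change of execution order alone introduces data flow errors.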

Data Flow Anomaly Detection Technology
Here we present our algorithms for analyzing business process models. For ease of understanding, we describe the method in a conventional step-by-step form. The algorithms build on the detection of data flow errors based on FNE.
Algorithm 1: Detect the data flow anomaly in SCDN based on FNE.
Input: $SCDN = (P, T, F, M_0, D, R_T, W_T)$, $SCDN' = (P', T', F', M_0', D', R_T', W_T')$.
Output: The suspicious data flow elements in $SCDN'$.
1. If $\exists t \in T \cap T'$, $\exists d_1 \in R_{t \in T'}$, $d_1 \notin W_{t \in T}$ then
2. $d_1 \in MDE$ in $SCDN'$
3. End if
4. If $\exists d_2 \in W_{t \in T'}$ and $d_2 \notin R_{t \in T}$ then
5. $d_2 \in RDE$ in $SCDN'$
6. End if
7. Print the suspicious data flow elements $D = \{d_1\} \cup \{d_2\}$
If the system cannot detect MDE and RDE in $SCDN'$, it may read data from a transition outside the original model, that is, in $SCDN'/SCDN$, which changes the control flow structure and brings about control flow errors. In the following, we use Algorithm 2 to analyze the control flow errors caused by FNE.
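The steps of Algorithm 1 can be sketched with plain set operations. The Python fragment below is a minimal illustration under an explicit assumption about the two conditions: a data item read in the target model but written by no transition of the source model is reported as MDE, and a data item written in the target model but read by no transition of the source model is reported as RDE. The function names and the example models (including the `forge` transition standing in for a hacker) are hypothetical.

```python
def union_all(ops):
    """Union of the data items over all transitions of a read or write map."""
    items = set()
    for data in ops.values():
        items |= data
    return items


def detect_suspicious(reads_src, writes_src, reads_tgt, writes_tgt):
    """Return (mde, rde, suspicious) for the target model.

    Assumption: 'src' is the original model and 'tgt' the possibly attacked
    model; the conditions compare target operations with source operations.
    """
    mde = union_all(reads_tgt) - union_all(writes_src)
    rde = union_all(writes_tgt) - union_all(reads_src)
    return mde, rde, mde | rde


# Hypothetical source model: 'place' writes the order, 'confirm' reads it.
reads_src = {"place": set(), "confirm": {"order"}}
writes_src = {"place": {"order"}, "confirm": set()}

# Hypothetical target model: 'forge' injects a cancellation that 'confirm' reads.
reads_tgt = {"place": set(), "confirm": {"order", "cancel_order"}, "forge": set()}
writes_tgt = {"place": {"order"}, "confirm": set(), "forge": {"cancel_order"}}

mde, rde, suspicious = detect_suspicious(reads_src, writes_src, reads_tgt, writes_tgt)
```

In this example the injected item `cancel_order` is flagged under both conditions, while comparing the source model with itself yields no suspicious data items.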
Algorithm 2: Detect the control flow errors based on BP relations.
Table 1. Control flow elements of Figure 7 and Figure 8.
In this section, we analyze the control flow errors and data flow errors based on the above algorithms. In the following tables, the letter M denotes "Merchant", S denotes "Shopper", D denotes "Delivery" and H denotes "Hacker".

Case Study
In Figure 7, the firing sequence does not read the data item "Cancel the order" produced by Hacker, so the system does not suffer from the hacker's attack and the structure of the transitions concerned remains normal.
However, in Figure 8, the firing sequence of these transitions is affected by reading the data item "Cancel the order", which is produced by Hacker and not by Delivery. For SCDN, this data item belongs to MDE.
For $SCDN'$, the data item belongs to RDE. If the system reads the data item written by Hacker, the control flow structure changes and the data flow changes with it, so multiple iterations of control flow and data flow error checking are needed. Existing methods often cannot detect this kind of error, which brings about FNE.
According to Algorithm 1 and Algorithm 2, the data item "Cancel the order" belongs to RDE in $SCDN'$ and to MDE in SCDN. The anomalies of control flow and data flow in the source and target models caused by FNE are shown in Table 3.
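A comparison of the source and target models in the spirit of Table 3 can be sketched as a difference of two behavioral profiles. The Python fragment below is an illustration only and is not the paper's Algorithm 2: the function name `profile_diff`, the relation symbols and the two small profiles are assumptions.

```python
def profile_diff(source, target):
    """Pairs of shared transitions whose profile relation differs."""
    shared = source.keys() & target.keys()
    return {
        pair: (source[pair], target[pair])
        for pair in sorted(shared)
        if source[pair] != target[pair]
    }


# Hypothetical profiles: "->" strict order, "+" exclusiveness, "||" interleaving.
source_bp = {("t1", "t2"): "->", ("t2", "t3"): "+", ("t1", "t3"): "->"}
target_bp = {("t1", "t2"): "->", ("t2", "t3"): "||", ("t1", "t3"): "->"}

changed = profile_diff(source_bp, target_bp)
```

Every pair reported in `changed` is a candidate control flow anomaly that can then be traced back to the data items flagged by the data flow check.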
Table 3 shows that the relations of four pairs of transitions differ between the source model and the target model. At the same time, five read operations differ between them. The differences in the read operations are caused solely by the change in the execution order of the control flow, which in turn is caused by FNE. We can therefore conclude that control flow and data flow affect each other. Table 4 compares the proposed algorithm with previous algorithms. It shows that our algorithm detects drawbacks and logic errors that the other algorithms cannot detect. Most importantly, our method detects these errors before an E-commerce system is implemented, which avoids greater losses.
Table 4. Comparison of the proposed algorithm with previous algorithms.
Reference [13,25] | No problem | No problem
Reference [24] | No problem | No problem
Reference [26,27] | No problem | No problem
Reference [28] | No problem | No problem
Our algorithm | RDE | MDE

Conclusions and Future Works
The formal fraud detection techniques proposed in this paper are based on the correctness analysis of control flow and data flow. The proposed methods not only increase the detection rate but also shorten the testing time and reduce the losses caused by attacks. The contribution of this paper is mainly manifested in the following aspects. Firstly, by analyzing the FNE caused by Hacker, we give the system a method to identify the original data flow errors. Secondly, based on the theory of BP, we analyze the control flow errors and the data flow anomalies in the system. Through multiple iterations of control flow and data flow error checking, we can amend the model to avoid greater losses.
In the future, it is important to provide more methods for the detection of false negative data flow errors. Removing control flow and data flow anomalies requires a series of checking iterations before a reliable model is obtained. Finally, we plan to extend the model checking method to Petri nets beyond the sound free-choice class.