Automatic Vulnerability Detection in Tizen Applications with Dynamic Symbolic Execution

Security of Internet-of-Things (IoT) systems is important due to their widespread usage in everyday life. Much research has been performed on analyzing the security of IoT communication protocols and operating systems. However


Introduction
The Internet of Things (IoT) has affected many aspects of modern life. Devices used for daily activities such as shopping, communication, sport, and assisting disabled people are all examples of its ever-increasing usage. Therefore, the security of IoT devices is important for protecting human activities, privacy, and data against malicious intruders. Software is a considerable part of an IoT device, so securing the device requires careful attention to the security of the applications running on it.
IoT devices use various architectures and operating systems, such as Windows IoT Core [1], Amazon FreeRTOS [2], Tizen [3], and Contiki [4]. This diversity complicates the security analysis of IoT systems. Also, some IoT applications serve real-time purposes and, to increase performance, do not apply common security measures against well-known attacks. In addition, IoT devices are built with cheaper and lighter hardware than ordinary systems. Thus, they cannot incorporate the usual memory protection mechanisms, which have a huge overhead [5].
Application analysis for vulnerability detection follows two approaches: static and dynamic. The static approach reviews the application code to find vulnerabilities. This approach is sound because it covers all execution paths of the program. However, its results are conservative and contain many false positive alarms [6]. The dynamic approach, on the other hand, executes the application and analyzes its behavior. Tools following this approach report no false positive alarms because they run the program with actual data and analyze its exact behavior. The drawback of dynamic analysis is that it cannot guarantee coverage of all application paths and states, and thus its reports suffer from false negatives. Dynamic symbolic execution [7] is a hybrid analysis method that combines static and dynamic analysis to benefit from the high coverage of the former and the accuracy of the latter. In this paper, we use this method to analyze and detect vulnerabilities in Tizen IoT applications.
In recent years, the security of communication protocols in IoT systems has been studied by several researchers, such as [8], [9], and [10]. There are also a limited number of studies on the static analysis of IoT operating systems and applications [11], [12], and [5]. These works emphasize different vulnerability types in current IoT operating systems and applications, alongside the lack of proper protection measures against such vulnerabilities. However, to the best of our knowledge, no solution has yet been proposed for the dynamic analysis of these applications. Dynamic analysis of IoT applications is more challenging than that of other applications because they run on special-purpose hardware that might not provide enough facilities to debug and analyze the application's runtime behavior. This paper presents a method for the dynamic analysis of applications in the Tizen IoT operating system. We use dynamic symbolic execution in our analysis to achieve proper path coverage of the program. We present the details of how to overcome the challenges of performing dynamic symbolic analysis of applications on Tizen devices. We demonstrate the effectiveness of the proposed method by employing the implemented tool to analyze two sample Tizen applications. The implementation is publicly available in our GitHub repository [26], together with sample test scenarios.
We present a detailed method for dynamically analyzing two types of Tizen applications, namely native and web. Native applications are analyzed by connecting the analyzer to the gdbserver in Tizen OS and statically analyzing the binary code of the target application to locate potentially vulnerable function calls, such as strcpy(), and to determine the input arguments of those functions. Then, the Symbion symbolic execution engine [13] is employed to calculate the path constraints on the input data for executing a path that reaches the located function calls. After solving the constraints with a constraint solver, we generate consistent input values that are large enough to execute the intended paths and cause a buffer overflow in the considered function calls.
Tizen web applications are written in HTML, JavaScript, and CSS, and may call API functions of the OS. The analysis of such applications and the detection of XSS vulnerabilities do not directly depend on the results of executing Tizen API functions, and we could not employ a dynamic symbolic execution framework inside Tizen for analyzing web applications. We therefore analyze the runtime behavior of a web application by executing it in an environment outside Tizen. For this analysis we use ExpoSE [14], a tool developed to symbolically execute NodeJS applications. Our proposed method extracts the JavaScript code of a web application and prepares it for analysis by ExpoSE; that is, it creates a new version of the application code that ExpoSE can process to detect vulnerabilities. To do so, the method first statically analyzes the HTML and JavaScript code of the test application to determine its HTML input/output entries and its potentially vulnerable JavaScript statements, which we call hotspots. We add a conditional statement before each hotspot that checks whether the data entering the hotspot might contain attack strings. In this way, we lead ExpoSE to calculate vulnerability constraints on the input data, in addition to path constraints, for the paths containing hotspots. Afterward, the prepared code is given to ExpoSE to analyze whether the application can be driven along an execution path containing hotspots with input values that lead to an attack.
The proposed method is evaluated using a group of native and web test programs that are executable in the Tizen operating system. The results of our experiments demonstrate the effectiveness of our solution in detecting buffer overflow and XSS vulnerabilities in these programs.
This paper is structured as follows. Section 2 reviews related work. Section 3 introduces the Tizen operating system and its application types. Section 4 describes the proposed vulnerability detection method for Tizen native applications. Section 5 presents our proposed method for detecting vulnerabilities in Tizen web applications. Section 6 evaluates the implemented solution, and Section 7 presents the conclusion and possible directions for future work.

Related Works
Most studies in the field of IoT application security analysis have focused on operating system security and static application code analysis. For example, in [11], a tool named UnsafeFunsDetector is used to statically analyze the source code of some IoT operating systems, such as Contiki, TinyOS, and openWSN, to find a number of unsafe functions in their code. As another example, Firmalice is a static analyzer that searches through the binary statements of IoT firmware to find any hardcoded credentials that might reveal a backdoor in the firmware [15].
To the best of our knowledge, there has been no solution for the dynamic analysis of Tizen applications until now. However, there are some general frameworks and tools, such as Avatar [16] and Qiling [17], for dynamically analyzing native applications of IoT devices. Avatar is a simulator that physically connects to IoT devices and analyzes their system events [16]. It hooks the intended events in the physical system, transfers their data to the simulator, and analyzes them there. The limitation of this simulator is that it requires a JTAG port on the physical IoT system, which not all IoT devices provide.
Qiling is another example that simulates an entire system from the OS image and its file system [17]. This simulator reviews the system functions in the device file system to hook the intended ones and make their behavioral analysis possible for the analyzer. It requires the OS image of the intended device, which is not necessarily available for all devices. Tizen's producers have released its image, but when we ran this image in Qiling, it got stuck in the initial boot step and could not proceed further. Moreover, Qiling runs the application image on the analyzer device, so it does not analyze the real behavior of the application as it runs on an IoT device. Notably, none of these simulators includes a framework for dynamic symbolic execution; such a framework would have to be installed in them separately.
There are also some solutions for the dynamic symbolic execution of web applications, such as SymJS [18] and JSSeek [19], which are not available for use in our solution. SymJS uses symbolic analysis to create test data that covers the highest possible percentage of execution paths in a web application. This tool also automatically extracts the application events in order to simulate them and execute the code that responds to them. Since SymJS is not publicly available, we could not use it in our solution. JSSeek [19] is another tool that performs symbolic analysis of JavaScript applications. It uses symbolic analysis to find different errors in JavaScript applications, such as undefined variable errors.
Some solutions have been presented for analyzing JavaScript environments such as NodeJS, but none of them has been used to analyze Tizen web applications. For example, ExpoSE [14] performs dynamic symbolic execution on NodeJS applications and explores different paths of an application by generating appropriate input values. We cannot use this tool directly to analyze Tizen web applications, because JavaScript applications executed in browsers use different data structures and functions and because their JavaScript code depends on the HTML code of the web pages.

Tizen
Tizen is an open-source operating system used in a wide range of IoT devices, such as wearable devices, televisions, and smartphones [3]. Hence, it is known as "the OS of Everything" [12]. This OS is based on the Linux kernel, which makes it similar to other Linux versions in many aspects, such as process management, memory management, file system, and system calls. In this paper, we consider Tizen version 5.5.
Tizen OS supports three application types:
1. Native Applications: C and C++ applications.
2. Web Applications: JavaScript, HTML, and CSS-based applications that are executed in a web engine.
3. .NET Applications: Xamarin and Visual Studio-based applications.
Also, Tizen OS allows programmers to write applications that pack web and native applications into a single application, known as a hybrid application.
Native applications are executed directly on the Linux kernel platform. When a native application is compiled with the default settings of the Tizen IDE, the NX and PIE mechanisms are enabled, but the stack canary mechanism is disabled. NX and PIE are two security mechanisms used in well-known operating systems to prevent memory corruption attacks [25]. However, as shown in [20], it is possible to bypass the active security mechanisms of this OS.
Tizen web applications are executed by the OS's web engine. XSS and HTML injection are vulnerabilities that might occur in these applications. Tizen web applications are able to access Tizen system calls and APIs. For example, the tizen object is defined for the JavaScript code of these applications to give access to different system functions or sensor values. Therefore, exploiting the vulnerabilities in Tizen web applications might lead to the disclosure of confidential information and unauthorized access to the device.

Detecting Buffer Overflow Vulnerability in Tizen Native Applications
Unsafe usage of functions such as strcpy, gets, and scanf in Tizen native applications can lead to buffer overflow. Different static and dynamic analysis methods have been introduced to detect this vulnerability in various applications. Dynamic symbolic execution is one of the popular vulnerability detection methods; it generates proper input data for executing different application paths by calculating the constraints on the input data for each path. This method suffers from the path explosion problem: the number of possible paths in an application grows exponentially with the number of branches, so analyzing all execution paths becomes impossible in practice [21].
Therefore, in our proposed solution, we first perform static analysis to find the execution paths in the program that contain potentially vulnerable function calls, e.g., strcpy, and then we limit the scope of dynamic symbolic execution to those paths to avoid the path explosion problem. As an example, consider Figure 1, which is a simple Tizen native application with a buffer overflow vulnerability in line 8. This application receives two input values: one is the main function argument and the other is a console input value that is received via the fgets function in line 6 and saved in the input string. There is no buffer overflow at this stage because line 6 controls the length of the input string. Later, in line 8, the strcpy function copies the input string into the buffer string if the value of the input string matches "XGXXXM*", in which X means an arbitrary character. Here a buffer overflow occurs because strcpy does not check the length of the strings.
To detect the buffer overflow vulnerability in this code, we first use the angr [22] framework for our static analysis and extract the application's control-flow graph to detect any calls to potentially vulnerable functions. Thus, we find the execution path in which the strcpy function is called. Then, we perform dynamic symbolic execution using the Symbion plugin of the angr framework and calculate the path constraints on the input values that lead to the desired path. For this example, the main argument should be "-s", and the second and sixth characters of the console input value should be "G" and "M", respectively, to reach the strcpy function call.
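The idea of restricting exploration to paths that can reach a potentially vulnerable call can be sketched independently of angr. The following minimal Python sketch assumes that the control-flow graph has already been recovered and is given as a plain adjacency dictionary. The node names, the UNSAFE_CALLS set, and the helper nodes_reaching_hotspots are hypothetical and are not part of angr or of our tool. The sketch marks the nodes from which an unsafe call is reachable, so that a symbolic explorer could discard every other branch.

```python
from collections import deque

# Functions treated as potentially vulnerable (illustrative subset).
UNSAFE_CALLS = {"strcpy", "gets", "scanf", "memcpy", "strcat"}


def nodes_reaching_hotspots(cfg, calls):
    """Return the set of CFG nodes from which some unsafe call is reachable.

    cfg   : dict mapping a node name to the list of its successor nodes
    calls : dict mapping a node name to the functions called in that node
    """
    # Hotspots are the nodes that directly call an unsafe function.
    hotspots = {n for n, fs in calls.items() if UNSAFE_CALLS & set(fs)}

    # Build the reversed graph so we can walk backwards from the hotspots.
    reverse = {n: [] for n in cfg}
    for src, dsts in cfg.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)

    # Breadth-first search over predecessors, starting at the hotspots.
    keep = set(hotspots)
    queue = deque(hotspots)
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in keep:
                keep.add(pred)
                queue.append(pred)
    return keep


# Toy graph loosely modelled on the example: only one branch reaches strcpy.
cfg = {
    "entry": ["check_arg"],
    "check_arg": ["read_input", "exit"],
    "read_input": ["check_pattern"],
    "check_pattern": ["copy", "exit"],
    "copy": ["exit"],
    "exit": [],
}
calls = {"read_input": ["fgets"], "copy": ["strcpy"]}

relevant = nodes_reaching_hotspots(cfg, calls)
print(sorted(relevant))
```

In this toy graph the exit node is excluded, so any state that jumps directly to it can be dropped without losing a path to the strcpy call.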
These constraints are passed to the Z3 constraint solver in angr to generate consistent input values. Then, we move on to the fuzzing step: we increase the length of the generated input data while respecting the path constraints, so that the new input still executes the desired path and causes a buffer overflow in the strcpy function. We execute the application with the new data and report the vulnerability if a crash occurs.
Notably, neither the dynamic symbolic execution engine nor the static analysis tool can be installed in the Tizen emulator, which is one of the main challenges in the static analysis and dynamic symbolic execution of Tizen applications. Therefore, we install and launch the test application in Tizen and analyze it through a remote connection to the gdbserver of Tizen. Figure 2 illustrates the architecture of our solution for analyzing Tizen native applications. As shown in this figure, the compiled test application is installed and launched in Tizen to be analyzed dynamically, and the binary code of the application is given to the angr framework for static analysis. To perform dynamic symbolic execution, we run Symbion on our machine and connect remotely to the gdbserver in Tizen. All the commands used in our solution, including those for compiling and installing the application with sdb (the communication bridge between the developer and the Tizen system) and the Tizen configuration commands for debugging applications with gdbserver, are provided in a script file in our GitHub repository [26].
Figure 3 illustrates the output of our implemented solution for the sample program. In this figure, the SIGSEGV status code demonstrates that we could successfully generate long input data that is consistent with the path constraints, executes the program, and causes a buffer overflow in it.
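The fuzzing step, in which solver-generated input is lengthened without violating the path constraints, can be illustrated with a small sketch. It is a simplification under stated assumptions: the path constraints are reduced to fixed characters at fixed positions (as in the "G" and "M" example), the seed value "AGAAAM" stands in for a solver result, and the doubling schedule and the helper names extend_input and satisfies are hypothetical rather than taken from our implementation.

```python
def extend_input(seed, fixed, target_len, filler="A"):
    """Grow `seed` to `target_len` characters without touching constrained positions.

    seed       : input produced by the constraint solver
    fixed      : dict {index: character} of positions fixed by the path constraints
    target_len : desired length of the new input
    """
    chars = list(seed.ljust(target_len, filler))
    # Re-apply the constrained characters so the same path is still taken.
    for index, char in fixed.items():
        chars[index] = char
    return "".join(chars)


def satisfies(data, fixed):
    """Check that `data` still meets every positional constraint."""
    return all(len(data) > i and data[i] == c for i, c in fixed.items())


# Constraints from the example: 2nd character is 'G', 6th character is 'M'.
fixed = {1: "G", 5: "M"}
seed = "AGAAAM"

# Double the length on each round, as a simple growth schedule (an assumption).
candidates = []
length = len(seed)
for _ in range(4):
    length *= 2
    candidates.append(extend_input(seed, fixed, length))

for c in candidates:
    print(len(c), satisfies(c, fixed))
```

Each candidate would then be fed to the target application, and a crash such as SIGSEGV on one of them would be reported as a buffer overflow.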

Vulnerability Detection in Tizen Web Applications
Dynamic symbolic execution is more challenging for Tizen web applications because no framework or platform is available in Tizen that enables us to symbolically execute web applications inside that operating system. However, since Tizen web applications are written in HTML, JavaScript, and CSS and do not depend heavily on the Tizen web engine, we can execute them outside Tizen without losing much functionality. The only limitation concerns calls to Tizen APIs. Our proposed solution handles them by defining the Tizen APIs in the target NodeJS code so that they are treated as input entries that return a symbolic variable.
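As a rough illustration of treating a Tizen API as an input entry, the sketch below generates NodeJS source text that defines a stub under a global tizen object. It is not the code of our tool. The helper name tizen_stub, the symbol label, and the choice of tizen.systeminfo.getCapability as the stubbed call are illustrative, and the S$.symbol(name, default) call is an assumption about how the ExpoSE harness creates a symbolic value.

```python
def tizen_stub(api_path, symbol_name):
    """Return NodeJS source defining `api_path` as a function returning a symbolic value.

    api_path    : dotted path below the global `tizen` object
    symbol_name : label given to the symbolic variable
    """
    parts = api_path.split(".")
    lines = []
    # Create intermediate namespace objects if they do not exist yet.
    prefix = "tizen"
    for part in parts[:-1]:
        prefix = prefix + "." + part
        lines.append(f"{prefix} = {prefix} || {{}};")
    # ASSUMPTION: S$.symbol(name, default) yields a symbolic string in the harness.
    lines.append(
        f"tizen.{api_path} = function () {{ return S$.symbol('{symbol_name}', ''); }};"
    )
    return "\n".join(lines)


# Prelude placed at the top of the generated NodeJS file.
prelude = ["var S$ = require('S$');", "var tizen = {};"]
prelude.append(tizen_stub("systeminfo.getCapability", "TIZEN_CAP"))
print("\n".join(prelude))
```

With such a stub in place, any value the application reads from the API is explored symbolically instead of requiring a running Tizen device.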
Therefore, we extract the code of Tizen web applications and execute it symbolically on our ordinary machine in a framework named ExpoSE [14]. At the time of writing this article, ExpoSE is the best tool available to us; it has proper performance but only analyzes NodeJS code. This causes some difficulties in analyzing JavaScript code that is executed in browsers, due to the differences between browser JavaScript and NodeJS code. For example, web JavaScript code uses data structures such as the DOM (Document Object Model) for ease of access to web page elements. These structures are not defined in NodeJS and thus are not recognized by ExpoSE.
Also, web JavaScript is capable of handling the events of a web page. For example, a JavaScript procedure can be written to respond to the user clicking on an HTML element. These events do not exist in NodeJS and are not defined for ExpoSE either. In fact, JavaScript code running in browsers is closely related to HTML code: it might read values from HTML pages or write new values to them.
Figure 5 demonstrates how a web application is analyzed using our solution. Our Python application extracts the HTML and JavaScript code of the intended web application and prepares an equivalent NodeJS version of it that can be processed by ExpoSE. Afterward, ExpoSE conducts the dynamic symbolic analysis of this code to determine whether any potentially vulnerable points exist.
In the following, we present the details of how ExpoSE works.

ExpoSE
This tool, which is based on Jalangi [23], receives NodeJS code with predefined symbolic variables as its input and outputs values for these symbolic variables that lead to the execution of different paths. The symbolic variables of the application code must be determined manually before using the tool. For example, in Figure 4, lines 3 to 5 show a piece of the NodeJS code of an application. Lines 1 and 2 are added to this code to make it analyzable by ExpoSE. Line 1 adds the ExpoSE library, and line 2 uses the symbol library function of ExpoSE to define variable t as a symbolic variable named X. This code has two different paths depending on the value of t, and ExpoSE finds the value needed for each path by treating t as symbolic.
Figure 6 shows the output of ExpoSE after the dynamic symbolic execution of this application, which contains two different values of variable t, one for each path.

Sample Application
In order to explain the details of our solution, we first present a sample Tizen web application that is based on one of the Tizen Studio web application templates. Figure 7 and Figure 8 show the HTML and JavaScript code of this application, and Figure 9 shows the application running on a wearable Tizen OS device. In this application, line 3 of the HTML code in Figure 7 binds the change event of the input HTML element to a function named myFunction. This function is defined in line 9 of the JavaScript code in Figure 8, and it is called whenever the input element changes. The input value of this element is used directly by myFunction to generate an output string. Thus, it is possible to inject JavaScript code into the text value of the input tag and execute malicious commands. For example, Figure 10 shows the result of executing the sample web application after injecting the string <img src=x onerror=alert("XSS")> into the text value of the input tag. This shows that the application is vulnerable to XSS and HTML injection attacks.

Dynamic Symbolic Execution Process for Tizen Web Applications
As mentioned in the previous sections, to use ExpoSE for the dynamic symbolic execution of Tizen web applications, we first have to prepare the application code so that this tool can process it. Therefore, we analyze the JavaScript and HTML code of the web application through text processing and pattern matching with regular expressions, and we create an equivalent NodeJS version of the code. The proposed solution is implemented as a Python program and is publicly available in our GitHub repository [26].
The implemented solution first defines the necessary data structures, such as the DOM, at the beginning of the equivalent NodeJS code so that NodeJS is able to recognize these objects. Some of these objects are application input entries. For example, the right side of the following statement is an application input entry: var a=document.getElementById("theID").textContent; Therefore, the DOM data structure is defined in such a way that the data received from these entry points is treated as symbolic. In addition, the proposed solution determines the output points in the code, which are also DOM objects. For example, the left side of the following statement is an output point: document.getElementById("theID").innerHTML=a; When something is stored in an output point of the application, we insert a conditional statement that checks whether the stored data may contain attack strings. ExpoSE treats this condition as a path constraint when analyzing the final code. In other words, ExpoSE attempts to generate data that contains attack strings, drives execution along a specific path, and reaches the hotspots. If ExpoSE generates such input data, the program is vulnerable to the considered attacks.
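The insertion of a conditional statement before an output point can be sketched with a regular expression, in the spirit of the pattern matching described above. This is a minimal sketch and not the rewriting logic of our tool: the pattern only handles single-line assignments to innerHTML, the attack marker and the thrown error are hypothetical choices, and the guard evaluates the right-hand expression a second time, which would be unsafe for expressions with side effects.

```python
import re

# Matches simple one-line statements of the form  <target>.innerHTML = <expr>;
# The (?!=) lookahead skips comparisons such as  x.innerHTML == "Basic".
SINK = re.compile(
    r"""^(?P<indent>\s*)(?P<target>[\w.$\[\]()"']+)\.innerHTML\s*=(?!=)\s*(?P<expr>.+?);\s*$"""
)

# Hypothetical attack fragment the guard looks for.
ATTACK_MARKER = "<img src=x onerror="


def instrument(js_source):
    """Insert a guard before every innerHTML assignment found in `js_source`."""
    out = []
    for line in js_source.splitlines():
        match = SINK.match(line)
        if match:
            indent, expr = match.group("indent"), match.group("expr")
            # The guard becomes an extra branch for the symbolic executor to satisfy.
            out.append(f'{indent}if (String({expr}).indexOf("{ATTACK_MARKER}") !== -1) {{')
            out.append(f'{indent}    throw new Error("possible XSS at sink");')
            out.append(f"{indent}}}")
        out.append(line)
    return "\n".join(out)


source = (
    'var a = document.getElementById("in").textContent;\n'
    'document.getElementById("out").innerHTML = a;'
)
print(instrument(source))
```

If the symbolic executor finds an input that makes the guard condition true, that input is a candidate attack string for the corresponding output point.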

Employing the Proposed Solution for the Sample Application
In the case of the sample application in Figures 7 and 8, our solution analyzes the code in the only JavaScript file, js/main.js. It reads each line of this code and transfers it, with or without changes, into equivalent NodeJS code, which is partly shown in Figure 11. This process is explained in detail in the following.
For anonymous functions in the JavaScript code, our method generates a random name to simplify the analysis. For example, the first line of Figure 8 defines an anonymous function and assigns it to the window.onload method. Therefore, the first translated line of the equivalent NodeJS code uses the random function name in place of this anonymous function and assigns it to window.onload, as shown in line 103 of Figure 11. Then, it defines a new function with the same name in the NodeJS code according to its definition in the original JavaScript code, as shown in lines 105 to 115. Due to the hoisting feature of JavaScript [24], using a function before defining it is not a problem.
For the input/output entries in the JavaScript code, our Python script keeps a list of these entries so that it can add a conditional statement for detecting XSS vulnerability to the target NodeJS code whenever some data is stored in one of them. In line 2 of Figure 8, the textbox variable becomes an input/output entry, and thus it is added to this list.
In line 3 of this code, there is another anonymous function, and the same operation is performed for it, as shown in lines 107 to 113 of Figure 11. The addEventListener function used in this line is not defined in NodeJS. In web JavaScript code, this function listens for a specific event and executes a function in response, and that function might have vulnerabilities in its body. Therefore, addEventListener should also be defined in the final NodeJS code, in a manner similar to document.getElementById. Also, the events should be simulated in order to analyze the bodies of their call-back functions. Simulating application events requires calling the event-related functions in different orders. Therefore, at the end of the final NodeJS code, we shuffle the defined events and execute their related functions multiple times with symbolic values as their inputs.
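The event simulation can be pictured as generating a driver that calls every registered call-back several times in a shuffled order. The sketch below emits such driver lines as NodeJS source text. The handler names, the number of rounds, the fixed random seed, and the S$.symbol(name, default) call are all assumptions made for illustration, and the actual tool may perform the shuffling inside the generated NodeJS code instead.

```python
import random


def event_driver(handlers, rounds=2, seed=0):
    """Return NodeJS lines that invoke each handler once per round in shuffled order.

    handlers : names of the call-back functions collected during translation
    rounds   : how many times every handler is invoked
    seed     : seed for the shuffle, fixed here so that runs are repeatable
    """
    rng = random.Random(seed)
    lines = []
    counter = 0
    for _ in range(rounds):
        order = list(handlers)
        rng.shuffle(order)
        for name in order:
            # ASSUMPTION: each call receives a fresh symbolic value as its event input.
            lines.append(f"{name}(S$.symbol('EVENT_{counter}', ''));")
            counter += 1
    return lines


# Hypothetical handler names standing in for the generated function names.
for line in event_driver(["func_onload", "func_click", "func_onchange"]):
    print(line)
```

Calling the handlers in more than one order matters when one call-back sets state that another one later writes to an output point.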
In line 4 of Figure 8, the box variable becomes an input/output entry and is added to the list of input/output entries. Line 5 of this code stores some data into box.innerHTML, which belongs to one of the HTML elements of this page. In this line, box.innerHTML is used twice. The first use is on the right side of the assignment, in the condition box.innerHTML=="Basic", which reads the innerHTML value of the box tag and compares it with the string "Basic". Based on the result of this comparison, either the string "Basic" or the string "Sample" is stored into the other box.innerHTML, on the left side of the assignment. Therefore, the left box.innerHTML of this statement is an output entry. In this example, the constant strings "Sample" and "Basic" are stored in the output entry, and thus XSS and HTML injection attacks are not possible here. In real-world applications, these strings might depend on user input, and there would be a chance for these attacks. For this line, a conditional statement is inserted into the NodeJS code, as shown in lines 110 to 112 of Figure 11, to check whether the code is vulnerable to an XSS attack.
The same happens for the first line of the myFunction body, in lines 9 to 10 of Figure 8, and the result in the final NodeJS code is shown in lines 117 to 124 of Figure 11. Here, there is a chance of XSS and HTML injection because the variable val depends on the user input.
Note that not all events of a web page are defined in its JavaScript code; some of them might be defined in the attributes of its HTML elements, such as onChange. For example, the sample program's HTML code in Figure 7 calls myFunction in line 3 in response to the onChange event. Therefore, after analyzing the JavaScript code, we move on to the HTML page code and its related events. Our tool places all the code attached to a specific event into a function, renames the this variable to this_, and assigns a symbolic value to this_ at the beginning of the function, as shown in lines 125 to 128 of Figure 11. Finally, it adds this new function to the array of event functions and treats it like the JavaScript events. The reason for assigning this_ to the input of the function, instead of directly assigning it to a symbolic variable, is to stay consistent with the events that are defined in JavaScript code using addEventListener. Those functions have an input that corresponds to the related event. In this way, the newly created function can be considered a call-back function for an event.
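The handling of events declared in HTML attributes can be sketched with Python's standard html.parser module. This is an illustrative sketch rather than our implementation, which relies on text processing with regular expressions. The class InlineEventCollector, the helper wrap_handler, the generated function name, and the sample markup are hypothetical, and the sketch passes this_ as a function parameter without showing how a symbolic value is bound to it.

```python
import re
from html.parser import HTMLParser


class InlineEventCollector(HTMLParser):
    """Collect (tag, attribute, code) triples for inline event attributes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # HTMLParser lower-cases attribute names, so onChange arrives as onchange.
            if name.startswith("on") and value:
                self.events.append((tag, name, value))


def wrap_handler(index, code):
    """Wrap inline event code in a named function whose parameter replaces `this`."""
    # Rename the keyword `this` to the identifier `this_` (whole words only).
    body = re.sub(r"\bthis\b", "this_", code)
    return f"function html_event_{index}(this_) {{ {body}; }}"


# Hypothetical markup in the style of the sample application.
html = '<input type="text" onChange="myFunction(this.value)">'

collector = InlineEventCollector()
collector.feed(html)
wrapped = [wrap_handler(i, code) for i, (_, _, code) in enumerate(collector.events)]
print(collector.events)
print(wrapped[0])
```

The wrapped function can then be appended to the same list of event functions that holds the call-backs registered through addEventListener.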
The final NodeJS code is processed by ExpoSE to perform dynamic symbolic execution. If ExpoSE generates input data that is consistent with the constraints of paths containing hotspots, the program is vulnerable and the generated data can be used to attack it. Figure 12 illustrates the ExpoSE output after processing the sample NodeJS code.

Evaluation
We have evaluated our implemented solution using two groups of benchmark programs. The experiments are performed on a system with an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 16 GB of RAM, and the Ubuntu 20.04 operating system.
To study the efficiency of our solution in detecting buffer overflow in native applications, we have used the NIST SARD benchmark C programs [27] and compiled them to be executed in Tizen. Each program in this benchmark contains one vulnerable statement and one or two similar secure statements that copy some data into a heap or stack buffer using a function call such as strcpy or memcpy. Thus, a precise vulnerability detection solution would achieve one true positive and one or two true negative results for each test program. Since the path constraints in these programs are simple, we have made them more complicated by adding an if statement to the vulnerable paths. In addition, instead of copying constant data into a heap or stack buffer, we copy an input string, entered by the user as a command-line argument, into that buffer. A vulnerable function in one of these benchmark programs is presented in Figure 13 as an example, with our added if statement underlined in line 16. The same if statement has been added to all benchmark programs in a similar way.
Table 1 presents the results of analyzing the NIST SARD benchmark programs with our implemented solution. The columns in this table represent, from left to right, the number of test programs with stack or heap buffer overflow vulnerability and the total numbers of true positives, true negatives, false positives, and false negatives in the reports of our solution for each group. Figure 14 shows the average time our implemented solution takes to analyze a benchmark program with a specific vulnerable statement. As the results demonstrate, our solution could detect all vulnerabilities in these programs.
To evaluate our solution for analyzing Tizen web applications, we have designed a group of vulnerable HTML and JavaScript codes, as we could not find appropriate benchmark programs. These programs cover various scenarios of XSS vulnerability occurrence and increasing levels of path complexity. The programs, alongside a script to re-execute the experiment, are available in the testcase directory of our solution's source code.
Table 2 presents the results of this experiment. The columns in this table represent, from left to right, the name of each test program, the analysis time of our solution in seconds, the number of test cases generated to test the program, whether our solution detected the vulnerability with a true positive report, and the constraints and details of the vulnerable statement in the program. As an example, in the test program named "Web4", XSS arises inside an event callback function, and the path constraint on the input string to reach the vulnerable statement is "input[0] = 'a'".

Conclusion
We fully described the dynamic symbolic execution process for native and web applications and demonstrated the effectiveness of the proposed solution in detecting vulnerabilities in two sample applications and a group of benchmark programs.
In the future, we intend to expand our method to discover other types of vulnerabilities in Tizen applications, such as race conditions and use-after-free. Also, for web applications, the implemented solution is limited to ES5 and older versions of JavaScript. Therefore, it does not support newer JavaScript syntax or related libraries such as jQuery. We will define these libraries and syntaxes in the future so that our solution can analyze newer versions of JavaScript code.

Figure 1. Sample of a Native Tizen Application.

Figure 2. Architecture of the Proposed Method for Analyzing Native Tizen Applications.

Figure 3. Output of a Native Application's Analysis.

Figure 4. A Piece of Code for Testing the ExpoSE Tool.

Figure 5. Architecture of the Proposed Method for Analyzing Tizen Web Applications.

Figure 6. ExpoSE output for Sample code in Figure 4.

Figure 7. The HTML Code of a Sample Web Tizen Application.

Figure 8. The js/main.js Code of a Sample Web Tizen Application.

Figure 9. Image of the Sample Application in a Wearable Device.

Figure 11. Part of the generated equivalent NodeJS code.

Figure 12. ExpoSE output after analyzing the sample NodeJS code.

Figure 13. A vulnerable function in a benchmark program.
Table 1. The results of evaluating our solution using vulnerable C benchmark programs.

Figure 14. Average analysis time of a test program with a specific vulnerable data copy operation in a stack or heap buffer, i.e., memcpy, strcpy, memmove, and strcat.