Some improvements of string contraint solving in automated test cases generation for symbolic execution

For a symbolic execution model, in addition to a string variable, all variables include integer variables,

which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for

mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint

modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find asatisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution.

pdf27 trang | Chia sẻ: honganh20 | Ngày: 05/03/2022 | Lượt xem: 220 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Some improvements of string contraint solving in automated test cases generation for symbolic execution, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
n. 1.6. The conclusion of chapter 1 Chapter 1 of the thesis presents the fundamental concepts of software testing techniques, basic techniques of automatic test case generating from the models. Apart from that, the comparison of static white box testing with dynamic white box testing is also stated. Moreover, the challenges of software testing is given 6 in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are presented. CHAPTER 2. SYMBOLIC EXECUTION AND CONSTRAINT MODELING In this chapter, the difficulties of symbolic execution are analysed and assessed. Besides, the extended tools, the installation of symbolic execution techniques used in improving string constraint modeling technique are also presented. The modeling methods used in installation, evaluation of string constraint and mixed constraint solving methods are addressed in this chapter. The contents of this chapter are introduced in published results of PhD student, including [CT1], [CT2], [CT3]. 2.1. The difficulties and some solutions In this section, the difficulties in applying symbolic execution are discussed. Some solutions used to overcome the challenges are presented. 2.1.1. Path exploration One of the biggest challenges is path exploration. It means that a large number of paths is generated even though the tested program is small. The number of paths is often by a power function. In a certain perioid of time, the testing program has to discover the first path that is the most suitable. 2.1.2. Memory modeling The accuracy of instructions in program when transferring to symbolic constraints affects significantly to the coverage of symbolic execution and the constraint sovling capability as well. For example, in order to establish the range of integer parameters, a memory modeling technique is more effective than the normal way. However, in this case, the accuracy of code analysis is worse. It depends on the selection of value range. For example, math memory over flow error can cause the lack of paths in symbolic execution or mining the impossible paths, etc. 2 .2. Symbolic execution and extended tools 2.2.1. Symbolic execution and software testing Test case automatic generating techniques used symbolic execution are installed on Java to generate test data for programs. However, the performance of available tools has their own disadvantages. Constraint optimization algorithm is applied in order to reduce the execution time and guarantee the effectiveness of coverage. This solution is implemented on different Jave programs and gets the good results. This section presents the aspects of Java PathFinder (JPF) including the checking ability of JPF, high level architecture of JPF and extension capability of JPF. 2.2.2. Symbolic execution on Java programming language Introduction to Java PathFinder (JPF) Nowadays, symbolic execution techniques is performed by various tools on different programming languages such as DART[36], CUTE[35] on C/C++ source code or PEX[36, 37] on .NET source code. In this thesis, path finding is set up on Java because Java is a programming language that independs on environment. Java is also used in many important applications. Moreover, JPF is a powerful tool that support for symbolic execution. This leads to the widely extension ability and a huge developmental community of JPF. 7 JPF is a verification set of script state model for Java [68]. Not same as normal virtual machine, JPF is a virtual machine performing Java programs more than once. JPF executes on all branches, all possible paths of a program. JPF checks the range of attributes such as the exceptions that can not catch through potential execution paths. Figure 2.1 describe the model of JPF Figure 2.1. The model of JPF Checking ability of JPF JPF can check all Java programs. It can find dead key or exception. Apart from that, JPF can develop and extend by itself to test other attributes. High level architecture of JPF JPF is designed by two main components, including JVM and Search (as in Figure 2.3 below). In which, JVM is a Java typical state generation by executing Java bytecode statements. Search is used to choose the state that JVM should process, to direct JVM in generating the next state or to request JVM to go back a previous state. 8 Figure 2.3. High level architecture of JPF. The main installation of Search includes Deep first search (DFSearch) and HeuristicSearch. Extension ability of JPF JPF is considered as a framework that each developer can extend it in order to support for a typical purpose. JPF supplies an extension mechanism for developers in order to allow to add new functions without changing directly the installation of Search or JVM With this flexible extension structure, there are several extensions of JPF. 2.3. Constraint solving and symbolic execution Related to generating test cases, genetic algorithm approach has been studied in research work [CT[1] of PhD student. However, this approach containts many limitations. Although many improvements of constraint solving techniques have been proposed, the constraint solving is one of big obstacles of symbolic execution. Constraint solvers often take a high cost in run time. Thus, symbolic execution can not extend in some kind of program which the source code requires a high performance in constraint solving. To overcome this challenge, the optimization step in constraint solving is necessary in symbolic execution. This is very effective in real programs. There are two optimization methods used in modern tools, including: - Removing unrelated constraints - Constraint solving using incremental methods (Increamental Solving) [71] In constraint sets that are generated in symbolic execution, the representation for a set of static branches from source code is one of the most important properties. In KLEE, all query results is stored in buffer mapping the set of constraints and the typical values corresponding to those constraints. The special case happens when there are no solution or no satisfying a special notation. 9 2.4. Mixed constraints and improvement in string constraint solving In fact, string constraints appear in most of programs, especially in programs or procedures that clean strings in input data in order to test the validity, to discover and to remove harmful contents. String data in Jave is equipped by a huge library. String data conceals many errors. To accomplish testing capability and error discovering ability in Java programs with string constraints, the thesis focuses to the progress of defining errors in order to get the better programs. By applying software testing on string constraints, the coverage ability increases and then the quality of software is also improved. Why does symbolic execution on string data meet difficulty? The operators on string data are the mix of two value domains: string and integer. For example, we need to get the n th character in a string. In present, there are several solutions of symbolic execution on string data. But these solutions have just support for a subset of these operators. The iterative approach is chosen. First, we sovle the constraints on integer data. Then, the obtained results are used to solve the string constraints. If string constraints are satisfied, the task is complete. Otherwise, integer constraints are added into the constraint set. This procedure is repeated until there is no unsatisfied string constraint. The main idea for proposing the extension of symbolic execution is presented as in Figure 2.7 below. Figure 2.7. An example a string graph The iterative progress is performed by two following steps: - Step 1: Current solver of JPF is called to process integer constraints, real constraints, logic constraints, If all these constraints is solved, the step 2 is performed. - Step 2: + The typical values obtained from Step 1 are loaded in string graph. + The string solver is called to solve these constraints. In this case, string solver can be AutomataSolve or BitVectorSolve. The constraints of string graph may not satisfied with input values. Then, if there is enough time, Step 1 is run and string constraint set is added by a new integer constraint Solve(PathCondition pc){ 1: StringGraph sg; 2: sg=BuilStringGraph(pc); 3: boolean sat=false; 4: While(!sat and !timeout){ 5: sat=CurrentSolver(pc,sg); 6: if(sat) 7: sat=StringSolver(pc,sg); 8: return sat; } } 10 buildStringGraph(PathCondition pc) { 1: StringGraph sg ; 2: for string or mixed constraint c 2 pc: 3: sg= sg hyperedge(c) 4: return preprocess(pc; sg) } Figure 2.8. Mixed constraint solver algorithm The limitation of time “timeout” is added in order to guarantee the combination of string solver and integer solver can not give the final conclusion in a allowable time. If it reaches to “timeout”, the algorithm stops, the given constraint is concluded as unsatisfied. 2.4.1. String graph There are some types of constraints including string constraints, integer constraints, string and integer mixed constraints. In the installation progress, string constraints and mixed constraints are represented by special graph called as string graph. Definition: String graph is a direct graph denoted by H=(X, E, Ʊ). In which, the vertex set is defined by X=IS where I and S in separated sets (I presents for integer variables and S presents for string variables). Ʊ presents for the operators on string variables. E presents for labeled direct edges in graph (E = E1 E2 E3 E4) satisfying En= Ʊ X n . Using graph is a popular method in constraint satisfying programming. Figure 3.1 describe an example for string graph corresponding to constraint: s1.trim().equals(s2)^s1.equals(s3.concat(s2)) 2.4.2. Constraint re-construction Preparing for pre-processing progress, for each string constraint or mixed constraint of path constraint condition, considered as an edge of string graph, the graph re-constructs constraints by constraints and returned values of operators that are mapped directly into the edges of string graph. For example, the constraint s1.equals(s2) is added to edge (equals, s1,s2) with other operators. The returned value can be a char, an integer or a string. An extra variable is created and represent for result of operator. This extra variable is added as a vertex of string graph. The graph allows to add an edge as a returned value. In the example showed in Figure 2.7, each circle is a vertex of string graph. The dots and lines represent for edges in this graph. Each edge is labeled by the operator and the number on each edge shows the order of vertex in string graph corresponding to a constraint. The string variables s1, s2 and s3 appear as the vertex, elements of S. In this graph, Ʊ = {trim, equals, concat}. Operator trim creates an extra variable s‟ added into S. The operator is the edge (trim, s1, s‟) is labeled by element trim of Ʊ. This edge connects s1 and s‟. The operator concat is the same. It is the second constraint that creates extra variable s” and edge (concat, s3, s2, s”). The operator equals corresponds to two last edges. When a string graph is completed, it is processed by two following ways: 1. Define the constraints that have no satisfied values in a simple way. 2. Add the values generated by integer solver in path constraint expression. 11 2.4.3. Pre-processing progress When unsatisfied constraints are determined, execution progress will skip string constraint solving. Consequently, time consuming decreases. Eliminating equals edge from string graph is one of proposals. In above example, string graph is adjusted to new graph as shown in Figure 2.9. Figure 2.9. String graph obtained by removing equals operators. This also supports to define rapidly set of constraints represented in string graph that have no satisfied value. For example (shown in Figure 2.10), after removing edge in Figure 2.10.(a), vertex s1 connects to itself by notEquals edge. This connection is presented in Figure 2.10.(b). Then, there is not existing any string which satisfies this constraint. Most of proposals are introduced to deal with the constant operators. For example, the constraint s1.concat(“xyz”) = ”vwxyz” will lead to the elimination of the complex edges. Then, vertex s1 is replaced by string “vw”. Another example, the constraint s1.startsWith(“abc”).chatAt(0) ≠ „a‟ is not suitable and it is easy to discover. If this situation appears in pre-processing stage, the use of string constraint solver is unnecessary and false value is returned. Figure 2.10. Unsatisfied constraints after removing equals operators 2.4.4. String constraint generating and the implemental results Before the constraints are generated, new vertices are added in string graph. For each vertex that is not a string constant SiS, an integer variable liI is added in order to represent for the length of string Si. Figure 3.4 shows that the new vertex of string graph in Figure 3.2 is connected to corresponding string vertices by dashes. 12 Figure 2.11. New vertices represent for the length of added string Other constraints are based on the edges as shown in Table 2.1. Table 2.1. Constraint construction for operators on string. 2.4.5. Constraint solving using automata Constraint solvers is popular tools used to deal with many problems in reality such as theorem proving, time scheduling in real time. Symbolic execution in automatic generating test cases is used in research works CT[2], CT[3] by PhD student. Many researchers reduce the limitations of available constraint solvers in order to improve the applications in symbolic execution, especially in test case generating. Although there are various improvements, the constraint solvers are not effective with some certaint constraints. In computer programs, string data appears frequently, especially in the program components or procedure. The Java programming language provides a huge library of string data types but still has many errors. In software testing, identifying bugs is concerned. In symbol execution, string constraints are also considered in most of cases. Deterministic Finite Automaton (DFA) is used to solve string constraints based on specific algorithms (CT[5]). 2.5. The conclusion of chapter 2 This chapter has presented the issues related to symbol execution and extension tools with an effective role in test case automatic generation. The main content related to symbol execution includes various symbol Java operators Heper-edge New constraint For each constraint of vertex is not a string constant Si S li>0 Sa.charAt(n) charAt,Sa,n,c la≥n Sa.concat(Sb) concat, Sa , Sb, Sx lx=la+lb Sa.indexOf(Sb) indexOf, Sa, Sb,x (x=-1)˅(la≥lb+x) Sa.lastIndexOf(Sb) lastIndexOf, Sa, Sb,x (x=-1)˅(la≥lb+x) Sa.subString(n,k) subString,Sa,n,k,Sx la=n+lx Sa.contains(Sb) contains,Sa,Sb la≥lb 13 execution techniques; advantages and disadvantages of these techniques and the processings that need to apply symbolic execution in software testing progress. Chapter 2 also addresses the difficulties in implementing the symbolic execution and several solutions are pointed out in this chapter. Constraint modeling approaches using string graphs, constructing constraints on string graphs, and preprocessing to detect unsatisfied constraints are presented in this chapter. The contents presented in chapter 2 are the research results from the works CT[1], CT[2], CT[3], CT[5]. CHAPTER 3. STRING CONSTRAINT SOLVING In this chapter, the thesis presents a proposal to improve string constraint solving in symbol execution based on automata theory and apply it on string data type. Experimental results, evaluation of text data processing of the improved model with some existing models are also shown in this chapter. The contents of this chapter are published in [CT4], [CT5], [CT6] of PhD student. 3.1. String constraint solver Many applications depend on the handling of a text data type. Since the use of the Internet has become common with the proliferation of interactive applications, this problem becomes more serious. Interactive applications include social media, chat, sensitive and important information management, personal information management. The input data of the programs is used in SQL query statements and stored in database management systems. This allows users to access directly to the database. The reason is that the input data is open and public and it is also open to other objects. Some objects have bad intentions, so "cleaning" the text input is a requirement that appears mostly in web services. The current methods of applying symbol execution on string code are divided into two groups: based on automata [5, 53] and based on Bitvector [7]. 3.2. Bitvector and satisfiability modulo theories (SMT) 3.2.1. Satisfiability modulo theories (SMT) The study of this thesis uses the SMT Z3 constraint solver [70] constructed by Microsoft. This study also considers to the use of CVC [25] but realizes that it is slower than Z3 because of using the SMT-lib2 standard for the bitvector constraint. The thesis also applies alternative symbols for addressing by byte instead of by bit. For example, a [0: 7] = 01100010 will be replaced with a [0] = 'b'. Constraint resolving by bitvector is beyond the scope of this study. 3.2.2. String constraint solving based on BitVector Consider the example where the edge (contains, Sa, Sb) corresponds to the constraint Sa.contains(Sb), it is provided that the current estimate for the lengths of Sa and Sb is 5, 3 respectively, the constraint result for this edge is: (Sa[0]=Sb[0])˄(Sa[1]=Sb[1]) ˄ (Sa[2]=Sb[2]) ∨((Sa[1]=Sb[0]) ˄ (Sa[2]=Sb[1]) ˄ (Sa[3]=Sb[2]) ∨((Sa[2]=Sb[0])˄(Sa[3]=Sb[1])˄(Sa[4]=Sb[2]) If the platform of constraint solver supports the array type, it can be simplified to (Sa[0:2]=Sb)∨(Sa[1:3]=Sb) ∨(Sa[2:4]=Sb). The process of compiling the edges of the string graph into the BitVector constraints is given in the second column of Table 3.1 14 Table 3.1. Construction of constraint corresponding to string operators. Edge Automation formula Bitvector constraint (charAt,sa,n,x) Ma= Ma [n] {x} [*] Sa[n]=x (concat,sa,sb,sx) Ma= Ma substring(Mx,0,la) Mb=Mb subString(Mx,lb, ) Mx=Mx (Ma Mb) (sa=sx[la:0]) (sb=sx[lx:la]) (indexOf, sa,sb,x) Ma= Ma ([x] Mb) Mb [*] Mb=Mb substrings(Ma,x,x+lb) (sa[x+lb:x]=sb) sa[i+lb:i] sb, i,0 i<x) (lastIndexOf, sa,sb,x) Ma= Ma [*] Mb ([x+lb] ] ([*] Mb [*])) Mb= Mb substrings(Ma,x,x+lb) (sa[x+lb:x]=sb) (sa[i+lb:i] s b, , i x<i la-lb) substring(sa,n,sx) Ma= Ma [n] Mx Mx= Mx substring(Ma,n,∞) sa[n+lx:n]=sx substring(sa,n,k,sx) Ma= Ma [n] Mx [*] Mx= Mx substring(Ma,n,k) Sa[n+lx:n]=sx trim(sa,sx) Ma= Ma [„˽‟,*] Mx [„˽‟,*] Mx= Mx trim(Ma) (sx[0] ˽) sx[lx-1] „˽‟) (sa[j]= „˽‟ (sa[i+lx:i]=sx) (sa[k]= „˽‟), i,j,k |(0 i la- lx) (0 j i) (i+lx k<la)) (contains, sa,sb) Ma= Ma [*] Mb [*] Mb= Mb substring(Ma,0,∞) ∨ 0 i<la-lb sa[i+lb:i]=sb (endsWith, sa,sb) Ma= Ma [*] Mb Mb=suffises(Ma) sa[la:la-lb]=sb (startsWith, sa,sb) Ma:= Ma Mb [*] Mb:=prefixes(Ma) sa[lb:0]=sb 3.3. String constraint solving based on automata In this algorithm, when the edges in the processing of the automata are modified to reflect the effects of the constraints, the nature of the modification depends on the constraints. For example, edge (contains, sa,sb) corresponds to the constraint sa.contains(sb). 1: 2: 3: 4: 5: 6: 7: 8: 9: atomatonSolver(PathCondition pc;StringGraph StringGraph sg=(I S,E, Ʊ)) for( edges si S) Automaton Mi=[li]; W=E while W≠∅ W=W\e for(si connects to e) Update Mi based on constraint e; if(Mi=∅) pc=pc freshConstranints(pc:sg); 15 10: 11: 12: 13 return(false,pc,sg); else if(Mi is modified) W=W {all edges connected to si}; return checkNegativeEdges(pc;sg;(M1,M2..)); Figure 3.1. String constraint solving based on automata For negative constraints such as !equals, !constain, !startsWith and !endsWith, assumption that M1 and M2 are two automata corresponding to two string variables s1,s2 that have to solve the constraint s1.!equals(s2). Then, there are three following cases: Case 1: Both automata guess only one word and then this constraint is satisfied when these two words are not equal. Case 2: There is only one automation that can guess a word. Then, we need to remove that word from the other automation. In this case, the constraint will also be satisfied. Case 3: In all other cases, the constraints are satisfied and no automation must be changed. The main part of the algorithm in Figure 3.6 describes formally the steps used for finding the solutions to the negative constraints. CheckNegativeEdges(pathcondition pc, StringGraph sg, (M1,M2..)) 1: i=1; M‟1=M1; M‟2=M2; M‟k=Mk; 2: while(1≤i<k) 3: if(Mi=∅) 4: Mi=M‟I; 5: i=i-1; 6: else 7: vi=a Mi; 8: Mi=Mi\{a}; 9: for( each ei E and last(e)=si) 10: if(e is violated with vi) 11: Break;\\ go back line 2 12: i=i++; 13 return; Figure 3.2. Negative constraint solving procedure Making an update to the constraints: If the string constraints are concluded as not satisfied, the new integer constraints are added to the path constraint. 3.4. Proposing string constraint solving in symbolic constraint The main steps to perform execution include: Step 1: The path constraint condition will be converted to an intermediate format. Step 2: Solve or simplify the intermediate format. Step 3: Replace integer symbol variables with conjecture specific values. Step 4: Convert intermediate format to automata or Bitvector operators. Step 5: If constraint solving on the automata or bitvector fails, repeat Step 3 with other integer specific values in the set of best integer values. 16 Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new integer value can be found, and constraint solving progress on the automata and bitvector has no matching value. . 3.4.1 String constraint modeling using graph The process of string constraint modeling using the graph is shown in the diagram below: Figure 3.3. String constraint modeling using the graph diagram The process is as follows: In both cases above, the results are evaluated by using a set of suitable assessments for each of the mentioned cases. The evaluator is used to compare the constraint solvers. 3.4.2 Discovering more string constraints in string data For a symbolic execution model, in addition to a string variable, all variables include integer variables, which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find a satisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution. 3.5. Experiment and result evaluation All experiments are conducted on a Windows 10 system with 8 GB RAM, using the open source tool JavaPathFinder downloaded at và phần mềm Eclipse Java EE IDE 17 Implementation SOLVE(PathCondition pc) 1: StringGraph sg 2: (pc,sg)←BuildStringGraph(pc); 3: boolean sat←false; 4: while (!sat˄!timeout) 5: (sat,pc,sg) ←IntegerSolver(pc,sg) 6: if(sat) 7: (sat,pc,sg) ←StringSolver(pc,sg) 8: if( pc unchanged): break; 9: return sat BuildStringGraph(PathCondition pc) 11: StringGraph sg:= ∅ 12: for( String or mixed contraint c pc) 13: sg:=sg Hyperedge(c) 14: return Preprocess(pc,sg) Figure 3.4: String constraint solving algorithm The algorithm follows two these main steps Step 1: JPF constraint solver will be used for types of constraints on integer, real number, boolean, v..v.. Step 2: If Step 1 perform successfully, it will provide some specific values. These values will be loaded to the string graph (the associated values in the mixed constraint) and a string constraint solver is called to solve them. 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: public static String s = ""; private static boolean Is

Các file đính kèm theo tài liệu này:

  • pdfsome_improvements_of_string_contraint_solving_in_automated_t.pdf
Tài liệu liên quan