For a symbolic execution model, in addition to a string variable, all variables include integer variables,
which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for
mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint
modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find asatisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution.
27 trang |
Chia sẻ: honganh20 | Lượt xem: 329 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Some improvements of string contraint solving in automated test cases generation for symbolic execution, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
n.
1.6. The conclusion of chapter 1
Chapter 1 of the thesis presents the fundamental concepts of software testing techniques, basic
techniques of automatic test case generating from the models. Apart from that, the comparison of static white
box testing with dynamic white box testing is also stated. Moreover, the challenges of software testing is given
6
in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are
presented.
CHAPTER 2. SYMBOLIC EXECUTION
AND CONSTRAINT MODELING
In this chapter, the difficulties of symbolic execution are analysed and assessed. Besides, the extended
tools, the installation of symbolic execution techniques used in improving string constraint modeling technique
are also presented. The modeling methods used in installation, evaluation of string constraint and mixed
constraint solving methods are addressed in this chapter.
The contents of this chapter are introduced in published results of PhD student, including [CT1], [CT2],
[CT3].
2.1. The difficulties and some solutions
In this section, the difficulties in applying symbolic execution are discussed. Some solutions used to overcome
the challenges are presented.
2.1.1. Path exploration
One of the biggest challenges is path exploration. It means that a large number of paths is generated
even though the tested program is small. The number of paths is often by a power function. In a certain perioid
of time, the testing program has to discover the first path that is the most suitable.
2.1.2. Memory modeling
The accuracy of instructions in program when transferring to symbolic constraints affects significantly
to the coverage of symbolic execution and the constraint sovling capability as well. For example, in order to
establish the range of integer parameters, a memory modeling technique is more effective than the normal way.
However, in this case, the accuracy of code analysis is worse. It depends on the selection of value range. For
example, math memory over flow error can cause the lack of paths in symbolic execution or mining the
impossible paths, etc.
2 .2. Symbolic execution and extended tools
2.2.1. Symbolic execution and software testing
Test case automatic generating techniques used symbolic execution are installed on Java to generate test data for
programs. However, the performance of available tools has their own disadvantages. Constraint optimization
algorithm is applied in order to reduce the execution time and guarantee the effectiveness of coverage. This
solution is implemented on different Jave programs and gets the good results. This section presents the aspects of
Java PathFinder (JPF) including the checking ability of JPF, high level architecture of JPF and extension
capability of JPF.
2.2.2. Symbolic execution on Java programming language
Introduction to Java PathFinder (JPF)
Nowadays, symbolic execution techniques is performed by various tools on different programming
languages such as DART[36], CUTE[35] on C/C++ source code or PEX[36, 37] on .NET source code. In this
thesis, path finding is set up on Java because Java is a programming language that independs on environment.
Java is also used in many important applications. Moreover, JPF is a powerful tool that support for symbolic
execution. This leads to the widely extension ability and a huge developmental community of JPF.
7
JPF is a verification set of script state model for Java [68]. Not same as normal virtual machine, JPF is a
virtual machine performing Java programs more than once. JPF executes on all branches, all possible paths of a
program. JPF checks the range of attributes such as the exceptions that can not catch through potential execution
paths.
Figure 2.1 describe the model of JPF
Figure 2.1. The model of JPF
Checking ability of JPF
JPF can check all Java programs. It can find dead key or exception. Apart from that, JPF can develop
and extend by itself to test other attributes.
High level architecture of JPF
JPF is designed by two main components, including JVM and Search (as in Figure 2.3 below). In which,
JVM is a Java typical state generation by executing Java bytecode statements. Search is used to choose the state
that JVM should process, to direct JVM in generating the next state or to request JVM to go back a previous
state.
8
Figure 2.3. High level architecture of JPF.
The main installation of Search includes Deep first search (DFSearch) and HeuristicSearch.
Extension ability of JPF
JPF is considered as a framework that each developer can extend it in order to support for a typical
purpose. JPF supplies an extension mechanism for developers in order to allow to add new functions without
changing directly the installation of Search or JVM
With this flexible extension structure, there are several extensions of JPF.
2.3. Constraint solving and symbolic execution
Related to generating test cases, genetic algorithm approach has been studied in research work [CT[1] of
PhD student. However, this approach containts many limitations.
Although many improvements of constraint solving techniques have been proposed, the constraint
solving is one of big obstacles of symbolic execution. Constraint solvers often take a high cost in run time. Thus,
symbolic execution can not extend in some kind of program which the source code requires a high performance
in constraint solving.
To overcome this challenge, the optimization step in constraint solving is necessary in symbolic
execution. This is very effective in real programs. There are two optimization methods used in modern tools,
including:
- Removing unrelated constraints
- Constraint solving using incremental methods (Increamental Solving) [71]
In constraint sets that are generated in symbolic execution, the representation for a set of static branches
from source code is one of the most important properties. In KLEE, all query results is stored in buffer mapping
the set of constraints and the typical values corresponding to those constraints. The special case happens when
there are no solution or no satisfying a special notation.
9
2.4. Mixed constraints and improvement in string constraint solving
In fact, string constraints appear in most of programs, especially in programs or procedures that clean
strings in input data in order to test the validity, to discover and to remove harmful contents. String data in Jave
is equipped by a huge library. String data conceals many errors.
To accomplish testing capability and error discovering ability in Java programs with string constraints,
the thesis focuses to the progress of defining errors in order to get the better programs. By applying software
testing on string constraints, the coverage ability increases and then the quality of software is also improved.
Why does symbolic execution on string data meet difficulty?
The operators on string data are the mix of two value domains: string and integer. For example, we need
to get the n
th
character in a string. In present, there are several solutions of symbolic execution on string data. But
these solutions have just support for a subset of these operators. The iterative approach is chosen. First, we sovle
the constraints on integer data. Then, the obtained results are used to solve the string constraints. If string
constraints are satisfied, the task is complete. Otherwise, integer constraints are added into the constraint set.
This procedure is repeated until there is no unsatisfied string constraint.
The main idea for proposing the extension of symbolic execution is presented as in Figure 2.7 below.
Figure 2.7. An example a string graph
The iterative progress is performed by two following steps:
- Step 1: Current solver of JPF is called to process integer constraints, real constraints, logic constraints,
If all these constraints is solved, the step 2 is performed.
- Step 2:
+ The typical values obtained from Step 1 are loaded in string graph.
+ The string solver is called to solve these constraints.
In this case, string solver can be AutomataSolve or BitVectorSolve. The constraints of string graph may
not satisfied with input values. Then, if there is enough time, Step 1 is run and string constraint set is
added by a new integer constraint
Solve(PathCondition pc){
1: StringGraph sg;
2: sg=BuilStringGraph(pc);
3: boolean sat=false;
4: While(!sat and !timeout){
5: sat=CurrentSolver(pc,sg);
6:
if(sat)
7: sat=StringSolver(pc,sg);
8: return sat; }
}
10
buildStringGraph(PathCondition pc) {
1: StringGraph sg ;
2: for string or mixed constraint c 2 pc:
3: sg= sg hyperedge(c)
4: return preprocess(pc; sg)
}
Figure 2.8. Mixed constraint solver algorithm
The limitation of time “timeout” is added in order to guarantee the combination of string solver and
integer solver can not give the final conclusion in a allowable time. If it reaches to “timeout”, the algorithm
stops, the given constraint is concluded as unsatisfied.
2.4.1. String graph
There are some types of constraints including string constraints, integer constraints, string and integer
mixed constraints. In the installation progress, string constraints and mixed constraints are represented by special
graph called as string graph.
Definition: String graph is a direct graph denoted by H=(X, E, Ʊ). In which, the vertex set is defined by
X=IS where I and S in separated sets (I presents for integer variables and S presents for string variables). Ʊ
presents for the operators on string variables. E presents for labeled direct edges in graph (E = E1 E2 E3 E4)
satisfying En= Ʊ X
n
.
Using graph is a popular method in constraint satisfying programming. Figure 3.1 describe an example
for string graph corresponding to constraint:
s1.trim().equals(s2)^s1.equals(s3.concat(s2))
2.4.2. Constraint re-construction
Preparing for pre-processing progress, for each string constraint or mixed constraint of path constraint
condition, considered as an edge of string graph, the graph re-constructs constraints by constraints and returned
values of operators that are mapped directly into the edges of string graph.
For example, the constraint s1.equals(s2) is added to edge (equals, s1,s2) with other operators.
The returned value can be a char, an integer or a string. An extra variable is created and represent for
result of operator. This extra variable is added as a vertex of string graph. The graph allows to add an edge as a
returned value.
In the example showed in Figure 2.7, each circle is a vertex of string graph. The dots and lines represent
for edges in this graph. Each edge is labeled by the operator and the number on each edge shows the order of
vertex in string graph corresponding to a constraint.
The string variables s1, s2 and s3 appear as the vertex, elements of S. In this graph, Ʊ = {trim, equals,
concat}. Operator trim creates an extra variable s‟ added into S. The operator is the edge (trim, s1, s‟) is labeled
by element trim of Ʊ. This edge connects s1 and s‟. The operator concat is the same. It is the second constraint
that creates extra variable s” and edge (concat, s3, s2, s”). The operator equals corresponds to two last edges.
When a string graph is completed, it is processed by two following ways:
1. Define the constraints that have no satisfied values in a simple way.
2. Add the values generated by integer solver in path constraint expression.
11
2.4.3. Pre-processing progress
When unsatisfied constraints are determined, execution progress will skip string constraint solving.
Consequently, time consuming decreases. Eliminating equals edge from string graph is one of proposals. In
above example, string graph is adjusted to new graph as shown in Figure 2.9.
Figure 2.9. String graph obtained by removing equals operators.
This also supports to define rapidly set of constraints represented in string graph that have no satisfied
value. For example (shown in Figure 2.10), after removing edge in Figure 2.10.(a), vertex s1 connects to itself by
notEquals edge. This connection is presented in Figure 2.10.(b). Then, there is not existing any string which
satisfies this constraint.
Most of proposals are introduced to deal with the constant operators. For example, the constraint
s1.concat(“xyz”) = ”vwxyz” will lead to the elimination of the complex edges. Then, vertex s1 is replaced by
string “vw”. Another example, the constraint s1.startsWith(“abc”).chatAt(0) ≠ „a‟ is not suitable and it is easy to
discover. If this situation appears in pre-processing stage, the use of string constraint solver is unnecessary and
false value is returned.
Figure 2.10. Unsatisfied constraints after removing equals operators
2.4.4. String constraint generating and the implemental results
Before the constraints are generated, new vertices are added in string graph. For each vertex that is not a
string constant SiS, an integer variable liI is added in order to represent for the length of string Si. Figure 3.4
shows that the new vertex of string graph in Figure 3.2 is connected to corresponding string vertices by dashes.
12
Figure 2.11. New vertices represent for the length of added string
Other constraints are based on the edges as shown in Table 2.1.
Table 2.1. Constraint construction for operators on string.
2.4.5. Constraint solving using automata
Constraint solvers is popular tools used to deal with many problems in reality such as theorem proving,
time scheduling in real time. Symbolic execution in automatic generating test cases is used in research works
CT[2], CT[3] by PhD student. Many researchers reduce the limitations of available constraint solvers in order to
improve the applications in symbolic execution, especially in test case generating. Although there are various
improvements, the constraint solvers are not effective with some certaint constraints.
In computer programs, string data appears frequently, especially in the program components or
procedure. The Java programming language provides a huge library of string data types but still has many errors.
In software testing, identifying bugs is concerned. In symbol execution, string constraints are also considered in
most of cases. Deterministic Finite Automaton (DFA) is used to solve string constraints based on specific
algorithms (CT[5]).
2.5. The conclusion of chapter 2
This chapter has presented the issues related to symbol execution and extension tools with an effective
role in test case automatic generation. The main content related to symbol execution includes various symbol
Java operators Heper-edge New constraint
For each constraint of vertex is not a string constant Si S li>0
Sa.charAt(n) charAt,Sa,n,c la≥n
Sa.concat(Sb) concat, Sa , Sb, Sx lx=la+lb
Sa.indexOf(Sb) indexOf, Sa, Sb,x (x=-1)˅(la≥lb+x)
Sa.lastIndexOf(Sb) lastIndexOf, Sa, Sb,x (x=-1)˅(la≥lb+x)
Sa.subString(n,k) subString,Sa,n,k,Sx la=n+lx
Sa.contains(Sb) contains,Sa,Sb la≥lb
13
execution techniques; advantages and disadvantages of these techniques and the processings that need to apply
symbolic execution in software testing progress. Chapter 2 also addresses the difficulties in implementing the
symbolic execution and several solutions are pointed out in this chapter. Constraint modeling approaches using
string graphs, constructing constraints on string graphs, and preprocessing to detect unsatisfied constraints are
presented in this chapter. The contents presented in chapter 2 are the research results from the works CT[1],
CT[2], CT[3], CT[5].
CHAPTER 3. STRING CONSTRAINT SOLVING
In this chapter, the thesis presents a proposal to improve string constraint solving in symbol execution
based on automata theory and apply it on string data type. Experimental results, evaluation of text data
processing of the improved model with some existing models are also shown in this chapter. The contents of this
chapter are published in [CT4], [CT5], [CT6] of PhD student.
3.1. String constraint solver
Many applications depend on the handling of a text data type. Since the use of the Internet has become
common with the proliferation of interactive applications, this problem becomes more serious. Interactive
applications include social media, chat, sensitive and important information management, personal information
management.
The input data of the programs is used in SQL query statements and stored in database management
systems. This allows users to access directly to the database. The reason is that the input data is open and public
and it is also open to other objects. Some objects have bad intentions, so "cleaning" the text input is a
requirement that appears mostly in web services.
The current methods of applying symbol execution on string code are divided into two groups: based on
automata [5, 53] and based on Bitvector [7].
3.2. Bitvector and satisfiability modulo theories (SMT)
3.2.1. Satisfiability modulo theories (SMT)
The study of this thesis uses the SMT Z3 constraint solver [70] constructed by Microsoft. This study also
considers to the use of CVC [25] but realizes that it is slower than Z3 because of using the SMT-lib2 standard
for the bitvector constraint. The thesis also applies alternative symbols for addressing by byte instead of by bit.
For example, a [0: 7] = 01100010 will be replaced with a [0] = 'b'. Constraint resolving by bitvector is beyond
the scope of this study.
3.2.2. String constraint solving based on BitVector
Consider the example where the edge (contains, Sa, Sb) corresponds to the constraint Sa.contains(Sb), it
is provided that the current estimate for the lengths of Sa and Sb is 5, 3 respectively, the constraint result for this
edge is:
(Sa[0]=Sb[0])˄(Sa[1]=Sb[1]) ˄ (Sa[2]=Sb[2])
∨((Sa[1]=Sb[0]) ˄ (Sa[2]=Sb[1]) ˄ (Sa[3]=Sb[2])
∨((Sa[2]=Sb[0])˄(Sa[3]=Sb[1])˄(Sa[4]=Sb[2])
If the platform of constraint solver supports the array type, it can be simplified to
(Sa[0:2]=Sb)∨(Sa[1:3]=Sb) ∨(Sa[2:4]=Sb). The process of compiling the edges of the string graph into the
BitVector constraints is given in the second column of Table 3.1
14
Table 3.1. Construction of constraint corresponding to string operators.
Edge Automation formula Bitvector constraint
(charAt,sa,n,x) Ma= Ma [n] {x} [*] Sa[n]=x
(concat,sa,sb,sx) Ma= Ma substring(Mx,0,la)
Mb=Mb subString(Mx,lb, )
Mx=Mx (Ma Mb)
(sa=sx[la:0]) (sb=sx[lx:la])
(indexOf, sa,sb,x) Ma= Ma ([x] Mb) Mb [*]
Mb=Mb substrings(Ma,x,x+lb)
(sa[x+lb:x]=sb)
sa[i+lb:i] sb, i,0 i<x)
(lastIndexOf,
sa,sb,x)
Ma= Ma [*] Mb ([x+lb]
] ([*] Mb [*]))
Mb= Mb substrings(Ma,x,x+lb)
(sa[x+lb:x]=sb) (sa[i+lb:i] s b, , i
x<i la-lb)
substring(sa,n,sx) Ma= Ma [n] Mx
Mx= Mx substring(Ma,n,∞)
sa[n+lx:n]=sx
substring(sa,n,k,sx) Ma= Ma [n] Mx [*]
Mx= Mx substring(Ma,n,k)
Sa[n+lx:n]=sx
trim(sa,sx) Ma= Ma [„˽‟,*] Mx [„˽‟,*]
Mx= Mx trim(Ma)
(sx[0] ˽) sx[lx-1] „˽‟)
(sa[j]=
„˽‟ (sa[i+lx:i]=sx) (sa[k]=
„˽‟), i,j,k |(0 i la-
lx) (0 j i) (i+lx k<la))
(contains, sa,sb) Ma= Ma [*] Mb [*]
Mb= Mb substring(Ma,0,∞)
∨ 0 i<la-lb sa[i+lb:i]=sb
(endsWith, sa,sb) Ma= Ma [*] Mb
Mb=suffises(Ma)
sa[la:la-lb]=sb
(startsWith, sa,sb) Ma:= Ma Mb [*]
Mb:=prefixes(Ma)
sa[lb:0]=sb
3.3. String constraint solving based on automata
In this algorithm, when the edges in the processing of the automata are modified to reflect the effects of
the constraints, the nature of the modification depends on the constraints. For example, edge (contains, sa,sb)
corresponds to the constraint sa.contains(sb).
1:
2:
3:
4:
5:
6:
7:
8:
9:
atomatonSolver(PathCondition pc;StringGraph StringGraph sg=(I S,E, Ʊ))
for( edges si S)
Automaton Mi=[li];
W=E
while W≠∅
W=W\e
for(si connects to e)
Update Mi based on constraint e;
if(Mi=∅)
pc=pc freshConstranints(pc:sg);
15
10:
11:
12:
13
return(false,pc,sg);
else if(Mi is modified)
W=W {all edges connected to si};
return checkNegativeEdges(pc;sg;(M1,M2..));
Figure 3.1. String constraint solving based on automata
For negative constraints such as !equals, !constain, !startsWith and !endsWith, assumption that M1 and
M2 are two automata corresponding to two string variables s1,s2 that have to solve the constraint s1.!equals(s2).
Then, there are three following cases:
Case 1: Both automata guess only one word and then this constraint is satisfied when these two words
are not equal.
Case 2: There is only one automation that can guess a word. Then, we need to remove that word from the
other automation. In this case, the constraint will also be satisfied.
Case 3: In all other cases, the constraints are satisfied and no automation must be changed. The main
part of the algorithm in Figure 3.6 describes formally the steps used for finding the solutions to the negative
constraints.
CheckNegativeEdges(pathcondition pc, StringGraph sg, (M1,M2..))
1: i=1; M‟1=M1; M‟2=M2; M‟k=Mk;
2: while(1≤i<k)
3: if(Mi=∅)
4: Mi=M‟I;
5: i=i-1;
6: else
7: vi=a Mi;
8: Mi=Mi\{a};
9: for( each ei E and last(e)=si)
10: if(e is violated with vi)
11: Break;\\ go back line 2
12: i=i++;
13 return;
Figure 3.2. Negative constraint solving procedure
Making an update to the constraints: If the string constraints are concluded as not satisfied, the new integer
constraints are added to the path constraint.
3.4. Proposing string constraint solving in symbolic constraint
The main steps to perform execution include:
Step 1: The path constraint condition will be converted to an intermediate format. Step 2: Solve or simplify the
intermediate format.
Step 3: Replace integer symbol variables with conjecture specific values.
Step 4: Convert intermediate format to automata or Bitvector operators.
Step 5: If constraint solving on the automata or bitvector fails, repeat Step 3 with other integer specific values in
the set of best integer values.
16
Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new
integer value can be found, and constraint solving progress on the automata and bitvector has no matching value.
.
3.4.1 String constraint modeling using graph
The process of string constraint modeling using the graph is shown in the diagram below:
Figure 3.3. String constraint modeling using the graph diagram
The process is as follows:
In both cases above, the results are evaluated by using a set of suitable assessments for each of the
mentioned cases. The evaluator is used to compare the constraint solvers.
3.4.2 Discovering more string constraints in string data
For a symbolic execution model, in addition to a string variable, all variables include integer variables,
which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for
mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint
modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find a
satisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers
for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution.
3.5. Experiment and result evaluation
All experiments are conducted on a Windows 10 system with 8 GB RAM, using the open source tool
JavaPathFinder downloaded at
và phần mềm Eclipse Java EE IDE
17
Implementation
SOLVE(PathCondition pc)
1: StringGraph sg
2: (pc,sg)←BuildStringGraph(pc);
3: boolean sat←false;
4: while (!sat˄!timeout)
5: (sat,pc,sg) ←IntegerSolver(pc,sg)
6: if(sat)
7: (sat,pc,sg) ←StringSolver(pc,sg)
8: if( pc unchanged): break;
9: return sat
BuildStringGraph(PathCondition pc)
11: StringGraph sg:= ∅
12: for( String or mixed contraint c pc)
13: sg:=sg Hyperedge(c)
14: return Preprocess(pc,sg)
Figure 3.4: String constraint solving algorithm
The algorithm follows two these main steps
Step 1: JPF constraint solver will be used for types of constraints on integer, real number, boolean, v..v..
Step 2: If Step 1 perform successfully, it will provide some specific values. These values will be loaded to the
string graph (the associated values in the mixed constraint) and a string constraint solver is called to solve them.
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
public static String s = "";
private static boolean Is
Các file đính kèm theo tài liệu này:
- some_improvements_of_string_contraint_solving_in_automated_t.pdf