INTRODUCTION
1. The urgency of this thesis
2. Research aim
3. Research object and area
4. Research outlines and methodology
5. Thesis layout
CHAPTER 1. THEORETICAL BASIS
1.1. Definition and characteristics of IoT devices
1.2. Definition of IoT botnet
1.3. The evolution of IoT botnet
1.4. Comparison between traditional botnet and IoT botnet
CHAPTER 2. IOT BOTNET MALWARE DETECTION METHOD
2.1. Comparison of static and dynamic analysis
2.2. Evaluation of IoT botnet detection methods based on static analysis
2.2.1. Constructing the experimental dataset
2.2.2. Experimental results and discussions
CHAPTER 3. PSI GRAPH FEATURE FOR DETECTION OF IOT BOTNET
3.1. Statement of the problem
3.2. Explanation of the problem
3.3. Proposed method
3.4. Function call graph in IoT botnet malware detection
3.5. PSI Graph construction
3.6. Experimental evaluation
3.6.1. Experimental environment
3.6.2. Evaluation model
3.6.3. Experimental results and discussion
CHAPTER 4. PSI-ROOTED SUBGRAPH FEATURE IN DETECTING IOT BOTNET
4.1. Statement of the problem
4.2. Building the PSI-rooted subgraph feature
4.3. Experiment and evaluate the results
n algorithm is different. The purpose is to assess the reliability and accuracy
of studies with the same dataset described in section 2.2.1.
Table 2.3. Experimental results of static-feature approaches in IoT malware detection
(Feature time = feature extraction and pre-processing time, shared by all classifiers of one approach)
Approach            Classifier      Accuracy (%)  FPR (%)  FNR (%)  Feature time  Classification time
ELF-header [96]     RIPPER          99.8          0.2      0.2      1h50m         0.75s
ELF-header [96]     PART            99.8          0.2      0.2      1h50m         1.27s
ELF-header [96]     DT (J48)        99.6          0.5      0.3      1h50m         1s
String-based [70]   SVM             98            0.9      2.2      4m47s         12.4s
String-based [70]   kNN             99.8          0.4      0.2      4m47s         1s
String-based [70]   DT (J48)        99.4          0.4      0.6      4m47s         8.75s
String-based [70]   RF              99.7          0.3      0.4      4m47s         9.71s
Image-based [25]    Neural Network  89.1          12.7     1.4      14m19s        2m19s
CFG-based [32]      SVM             89            33.8     4.4      5 days        1.45s
CFG-based [32]      LR              85            15.1     19.0     5 days        0.5s
CFG-based [32]      RF              95            7.5      5.9      5 days        1.75s
Table 2.3 shows that approaches representing the executable with non-graph-based features depend heavily on the values of individual features (for example, the function call inet_ntoa) and cannot describe complex semantic relationships between features (for example, data dependencies across the life cycle of IoT malware capable of distributed denial-of-service attacks, referred to as an IoT botnet). In addition, approaches using non-graph-based features are often weak against obfuscation techniques such as encoding and data insertion. In contrast, the graph-based approach can represent the structured, complex information of botnet behavior.
Chapter 2 conclusion: The results of this chapter motivate the methods proposed in the thesis by showing that static analysis is feasible for the IoT botnet malware detection problem. Moreover, graph-based features are efficient and promising for detecting IoT botnet malware.
Contributions of Chapter 2: Evaluating and comparing botnet malware on traditional computers and on IoT devices, as a basis for proposing a suitable static analysis method for detecting IoT botnet malware; building a reliable dataset for experiments in IoT botnet malware detection; re-implementing and evaluating current static-analysis studies on the same dataset and in the same experimental environment. These results have been published in conference proceedings and prestigious journals ([B3], [B4], [B5] in the author's list of works).
CHAPTER 3. PSI GRAPH FEATURE FOR DETECTION OF IOT BOTNET
3.1. Statement of the problem
The research problem in this chapter is defined as follows:
- Let $L = \{l_1, l_2, \ldots, l_n\}$ be the set of $n$ executable files, where each $l_i \in \{0, 1\}$, $i = 1, \ldots, n$, is either a malicious executable file (value 1) or a benign executable file (value 0).
- Let $F = \{f_{Alhanahnah}, f_{Su}, f_{HaddadPajouh}, f_{Azmoodeh}, f_{Alasmary}\}$ be a set of features that have given good results in detecting botnets on IoT devices in recent years. The hypothesis is that $\exists f_{Trung} \notin F$ such that $f_{Trung}(L)$ is simpler than every $f_j \in F$ in terms of graph structure, where simplicity is quantified by the number of edges and the number of vertices of the graph. Despite its simpler structure, $f_{Trung}$ gives better results than $f_j \in F$ in terms of accuracy and execution time.
3.2. Explanation of the problem
The thesis takes a static-analysis approach to detecting botnet malware on IoT devices. Several studies already follow this approach, such as Alhanahnah et al. [4], Su et al. [25], HaddadPajouh et al. [14], Azmoodeh et al. [36], and Alasmary et al. [33]. Specifically, Alhanahnah et al. combine various static features, such as strings, control flow graphs (CFG), and file structure statistics, to generate signatures for classifying multi-architecture IoT malware. Su et al. proposed a lightweight method that distinguishes malicious from benign IoT samples based on grayscale images, feeding these images into a convolutional neural network model to detect IoT malware. HaddadPajouh et al. and Azmoodeh et al. proposed detecting IoT malware using opcode sequences. Alasmary et al. performed an in-depth study of the graphs of Android malware and IoT botnets. Building on this description of typical static-analysis studies for detecting botnet malware on IoT devices, the new feature in the problem statement of this chapter aims to take advantage of the strengths and address the limitations of existing features, thereby detecting IoT botnet malware efficiently with machine learning and deep learning algorithms.
3.3. Proposed method
The research in this chapter rests on the following assumptions. The basic difference between botnet malware and other types of malicious code is that a botnet always needs a connection to a C&C server to send and receive attack commands from the attacker. The infection and attack behavior of botnet malware on IoT devices has been studied extensively and found to follow a common process. Each step in the life cycle of IoT botnet malware usually involves information represented as strings, such as attacker commands and the IP addresses or domain names of C&C servers.
Before explaining the proposed method, the thesis specifies the following definitions.
Definition 3.1: A function-call graph is a directed graph 𝐺 = (𝑉, 𝐸), where 𝑉 = 𝑉(𝐺) is the set of vertices representing functions and 𝐸 = 𝐸(𝐺) ⊆ 𝑉(𝐺) × 𝑉(𝐺) is the set of edges corresponding to function calls. For each vertex 𝑣 ∈ 𝑉, the two functions 𝑉𝑛(𝑣) and 𝑉𝑓(𝑣) give the function name and the function type of the function represented by 𝑣. The function type 𝑡 ∈ {0,1} marks a local function (value 0) or an external function (value 1).
Definition 3.2: Printable String Information (PSI) is a printable string appearing in the executable, either in plain form (e.g. “10.1.1.2”) or encrypted (e.g. “eGAIM”).
Definition 3.3: The PSI graph is a directed graph 𝐺𝑃𝑆𝐼 = (𝑉, 𝐸), where:
- 𝑉 is the set of vertices consisting of the functions in the function-call graph that contain PSIs;
- 𝐸 is the set of directed edges {(𝑉𝑖, 𝑉𝑗), (𝑉𝑘, 𝑉ℎ), …}, reflecting the caller-callee relationship between two functions.
To test the above hypothesis and address the research problem, the method proposed in this thesis follows the workflow below:
Figure 3.1. The workflow of the proposed method to detect IoT botnet malware
The proposed method consists of two main processing phases, a training phase and a usage phase, illustrated in Figure 3.1. The two phases follow nearly the same processing steps and differ only in the classification step. With executable files from IoT devices as input, including malicious and benign files, the process consists of four steps: FCG generation, PSI graph generation, preprocessing and feature selection, and finally classification.
3.4. Function call graph in IoT botnet malware detection
A Function Call Graph (FCG) is a control flow graph that represents the call relationships between functions or subfunctions in a program. The formal definition is given in Definition 3.1.
Before constructing the function call graph, the executable files must be checked and pre-processed for protection techniques to ensure that the function call graph is correct. To check whether the files are packed and, if so, which packing technique is used, the thesis uses the DiE tool (Detect It Easy) [115]. Analysis of the thesis's dataset of 10010 samples found that only about 2% of the samples used obfuscation techniques, the vast majority of which were UPX packing. Once a packer has been identified, many tools support unpacking, such as the UPX tool [121]. Executables that cannot be unpacked with the UPX tool are removed from the dataset. After unpacking, the thesis uses IDA Pro to disassemble the binary files because it supports multiple platforms, and thereby obtains the assembly code of each file. The algorithm for building the function call graph (Algorithm 3.1) is adopted from the research of Ming Xu et al. [108].
Algorithm 3.1: Constructing the Function Call Graph (FCG)
Input: Functions of executable file 𝐹
Output: The function call graph 𝐺𝐹 of the executable file 𝐹
// Initialization
𝐺𝐹.𝑉 = ∅ and 𝐺𝐹.𝐸 = ∅
EntryFuncSet = ∅, FuncSet = ∅, FuncQ = ∅, VerSet = ∅
// Extract functions from the assembly code
FuncSet = SplitFuncs(𝐹)
EntryFuncSet = IdentifyEntryPointFuncs(𝐹)
FuncQ = InitQ(EntryFuncSet)
// Build the caller-callee relationships
while (FuncQ is not empty)
    baseVertex = Dequeue(FuncQ)
    Insert baseVertex in 𝐺𝐹
    baseVertex.enQFlag = true
    // Extract the set of callees of baseVertex
    VerSet = getCallee(baseVertex)
    for each vertex in VerSet
        if ((vertex ∩ FuncSet) ≡ ∅)    // the vertex is not in FuncSet
            continue
        endif
        headVertex = vertex
        // Build the connecting edge 𝑒 = (baseVertex, headVertex)
        if (𝑒 ∈ 𝐺𝐹.𝐸)
            baseVertex.outDeg++
            headVertex.inDeg++
        else
            Insert headVertex in 𝐺𝐹
            Insert edge 𝑒 in 𝐺𝐹
        endif
        if (headVertex.enQFlag == false)
            Enqueue headVertex in FuncQ
            headVertex.enQFlag = true
        endif
    next vertex
end while
return 𝐺𝐹
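The breadth-first traversal of Algorithm 3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the input format (a dictionary from function name to the names it calls) and the function name build_fcg are hypothetical, and repeated calls are recorded as a per-edge call count instead of the in/out degree counters used in the pseudocode.

```python
from collections import deque


def build_fcg(call_map, entry_points):
    """Breadth-first construction of a function call graph.

    call_map: dict mapping a function name to the list of names it calls
              (hypothetical input format; a disassembler would supply it).
    entry_points: iterable of entry-point function names.
    Returns (vertices, edges); edges maps (caller, callee) to the number
    of call sites seen for that pair.
    """
    vertices = set()
    edges = {}
    queued = set()      # plays the role of enQFlag in the pseudocode
    queue = deque()
    for name in entry_points:
        if name in call_map and name not in queued:
            queue.append(name)
            queued.add(name)

    while queue:
        caller = queue.popleft()
        vertices.add(caller)
        for callee in call_map.get(caller, []):
            if callee not in call_map:
                continue  # target is not one of the extracted functions
            vertices.add(callee)
            edges[(caller, callee)] = edges.get((caller, callee), 0) + 1
            if callee not in queued:
                queue.append(callee)
                queued.add(callee)
    return vertices, edges


# Toy call map: "dead_code" is never reached from the entry point.
calls = {
    "main": ["connect_cnc", "scan", "scan"],
    "connect_cnc": ["send_cmd"],
    "scan": ["connect_cnc", "unknown_import"],
    "send_cmd": [],
    "dead_code": ["scan"],
}
vertices, edges = build_fcg(calls, ["main"])
print(sorted(vertices))
print(sorted(edges.items()))
```

Only functions reachable from an entry point end up in the graph, which is why unreachable code and unresolved call targets do not appear in the result.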
Call graphs are still highly complex because of their large number of vertices and edges, and they are often expensive to compute and to store [97]. If the complexity of a graph is measured by its number of vertices and edges, it is 𝛰(|𝑉| ∗ |𝐸|), where |𝑉| is the number of vertices and |𝐸| is the number of edges. Therefore, starting from the function call graph, the thesis aims to build a new, more efficient graph feature for detecting IoT botnet malware with machine learning and deep learning techniques: one that lowers complexity by reducing the number of vertices and edges while still ensuring a high detection rate.
3.5. PSI Graph construction
Before building the PSI graph (Definition 3.3), the thesis extracts all PSIs (Definition 3.2) in the executable file with a plugin for the IDA Pro tool. To balance classification accuracy against computational complexity, the thesis keeps only PSIs with a length of at least 3 characters. These PSIs can be in either plain or encrypted form and often contain a lot of semantic information relevant to the attacker's intent.
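As a rough illustration of the length filter, the sketch below pulls runs of printable ASCII characters out of raw bytes. It is a stand-in under stated assumptions, not the IDA Pro plugin used in the thesis: it works on a whole byte string instead of per function, it handles ASCII only, and the name extract_psi is hypothetical.

```python
import re


def extract_psi(data, min_len=3):
    """Return runs of printable ASCII characters at least min_len long.

    data: raw bytes (for example the contents of an executable).
    min_len: minimum string length; 3 follows the threshold in the text.
    """
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}"
    )
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Toy byte string: "ab" is too short and is dropped.
sample = b"\x00\x01GET /bin.sh\x00ab\x00185.10.1.2\x00\xff\xfePING\x00"
strings = extract_psi(sample)
print(strings)
```

Encrypted strings are picked up in the same way as plain ones, as long as their encoded form is printable.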
After constructing the function call graph and identifying the vertices that contain PSIs, the thesis traverses the function call graph to construct the PSI graph, as described in Algorithm 3.2.
Algorithm 3.2: PSI-Graph Generation (FCG)
𝑉 = [ ], 𝐸 = [ ]
For each vertex 𝑣𝑖 in FCG do:
    If exists PSI in 𝑣𝑖 do:
        𝑉 = 𝑉 ∪ 𝑣𝑖
    End if
    For each edge 𝑒𝑗(𝑣𝑖, 𝑣𝑘) do:
        If exists PSI in 𝑣𝑘 and 𝑣𝑘 ∉ 𝑉 and 𝑒𝑗(𝑣𝑖, 𝑣𝑘) ∉ 𝐸 do:
            𝑉 = 𝑉 ∪ 𝑣𝑘
            𝐸 = 𝐸 ∪ 𝑒𝑗(𝑣𝑖, 𝑣𝑘)
        End if
    End for
End for
Return 𝑉, 𝐸
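A compact way to picture the outcome is the sketch below, which follows one reading of Definition 3.3: keep the functions that contain at least one PSI and keep the caller-callee edges whose two endpoints are both kept. This reading is an assumption; it does not reproduce the exact edge conditions of Algorithm 3.2, and the names build_psi_graph and psi_by_function are hypothetical.

```python
def build_psi_graph(fcg_vertices, fcg_edges, psi_by_function):
    """Trim a function call graph down to its PSI-bearing functions.

    fcg_vertices: iterable of function names in the FCG.
    fcg_edges: iterable of (caller, callee) pairs.
    psi_by_function: dict mapping a function name to the list of
                     printable strings found in it (may be empty).
    Returns (vertices, edges) of the trimmed graph.
    """
    psi_vertices = {v for v in fcg_vertices if psi_by_function.get(v)}
    psi_edges = {
        (u, v)
        for (u, v) in fcg_edges
        if u in psi_vertices and v in psi_vertices
    }
    return psi_vertices, psi_edges


# Toy FCG: "scan" and "checksum" contain no printable strings.
fcg_vertices = {"main", "connect_cnc", "scan", "send_cmd", "checksum"}
fcg_edges = {
    ("main", "connect_cnc"),
    ("main", "scan"),
    ("scan", "connect_cnc"),
    ("connect_cnc", "send_cmd"),
    ("send_cmd", "checksum"),
}
psi_by_function = {
    "main": ["/dev/watchdog"],
    "connect_cnc": ["185.10.1.2"],
    "scan": [],
    "send_cmd": ["PING"],
}
psi_vertices, psi_edges = build_psi_graph(fcg_vertices, fcg_edges, psi_by_function)
print(sorted(psi_vertices))
print(sorted(psi_edges))
```

In this toy case the graph shrinks from five vertices and five edges to three vertices and two edges.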
The PSI graph is obtained by trimming the FCG to reduce the number of vertices and edges, so the complexity associated with the PSI graph, also 𝑂(|𝑉| ∗ |𝐸|), decreases as well. Table 3.1 compares the sizes of the PSI graph and the function call graph. The PSI graph is much smaller than the function call graph in terms of the number of vertices and edges for both malicious and benign files. Therefore, using the PSI graph as a feature to detect malicious code can reduce complexity (faster processing, lower computation time) compared with using the function call graph.
Table 3.1. Comparison between the PSI graph and the function call graph (FCG)
Class      Avg. vertices (PSI graph)  Avg. edges (PSI graph)  Avg. vertices (FCG)  Avg. edges (FCG)
Malicious  147.1                      1110.5                  254.5                3075.5
Benign     167.8                      1693.9                  530.9                2962.2
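The size reduction implied by Table 3.1 can be worked out directly. The short sketch below only re-expresses the table values as percentage reductions; the variable and function names are illustrative.

```python
# Average sizes copied from Table 3.1: (vertices, edges).
table_3_1 = {
    "malicious": {"psi": (147.1, 1110.5), "fcg": (254.5, 3075.5)},
    "benign": {"psi": (167.8, 1693.9), "fcg": (530.9, 2962.2)},
}


def reduction_percent(psi_value, fcg_value):
    """Percentage by which the PSI graph is smaller than the FCG."""
    return 100.0 * (1.0 - psi_value / fcg_value)


reductions = {}
for cls, row in table_3_1.items():
    reductions[cls] = tuple(
        round(reduction_percent(p, f), 1)
        for p, f in zip(row["psi"], row["fcg"])
    )
    print(cls, "vertices/edges reduction (%):", reductions[cls])
```

For malicious files this gives a reduction of roughly 42% in vertices and 64% in edges; for benign files, roughly 68% in vertices and 43% in edges.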
As Figure 3.2 shows, the number of vertices in a PSI graph falls mainly in the range [1, 300] for both malicious and benign files. Although the distributions differ slightly, the difference is not clear enough to set a threshold value that separates benign samples from malicious IoT samples.
Figure 3.2. Number of edges and vertices between sample patterns
To visualize the output of the PSI graph generation algorithm, Figure 3.3 shows the function call graph and the PSI graph of a Linux.Bashlite sample; the PSI graph is clearly much simpler than the function call graph. On average, a PSI graph contains only about 16 vertices and 60 edges, compared with the 156 vertices and 360 edges of the function call graph.
Figure 3.3. Function call graph (left) and PSI graph (right) of Linux.Bashlite malware sample
In summary, the PSI graph feature obtained by the thesis has the following characteristics:
- It is built with a static method;
- It can reflect the "lifecycle behavior" of IoT botnet malware, that is, it simulates the infection process;
- It considers only the structure of printable string information (PSI), not the values of the strings;
- It is built on the function call graph.
3.6. Experimental evaluation
3.6.1. Experimental environment
Using the experimental dataset presented in section 2.2.1 of this thesis summary, the thesis divides the dataset into two subsets: a training set and a test set. The training set contains 2690 samples from each of the malicious and benign classes. The test set contains 4630 samples. The experiments are built with Python and the PyTorch framework on Ubuntu 16.04, using an Intel Core i5-8500 3.0 GHz CPU, an NVIDIA GeForce GTX 1080 Ti graphics card, and 32 GB of RAM.
3.6.2. Evaluation model
To evaluate the effectiveness of PSI graph features in the IoT botnet malware detection problem, the thesis feeds PSI graph features into the evaluation model shown in Figure 3.4. The approach analyzes and represents the entire structure of the PSI graph as a fixed-length numerical vector, so the thesis uses graph2vec [39] in the data preprocessing step.
Figure 3.4. Evaluation model of detecting IoT botnet malware using PSI Graph
Graph2vec is an unsupervised learning technique for converting a graph into a numerical vector. It builds on the doc2vec approach [82], which uses the skip-gram network. Graph2vec learns to represent graphs by treating an entire graph as a document and its rooted subgraphs as the words that make up that document.
Algorithm 3.3: Graph2vec ($\mathcal{G}, D, \delta, \mathfrak{e}, \alpha$)
Input: $\mathcal{G} = \{G_1, G_2, \ldots, G_n\}$: set of graphs such that each graph $G_i = (V_i, E_i, \lambda_i)$ for which embeddings have to be learnt
       $D$: maximum degree of rooted subgraphs to be considered for learning embeddings. This will produce a vocabulary of subgraphs, $SG_{vocab} = \{sg_1, sg_2, \ldots\}$, from all the graphs in $\mathcal{G}$
       $\delta$: number of dimensions (embedding size)
       $\mathfrak{e}$: number of epochs
       $\alpha$: learning rate
Output: Matrix of vector representations of graphs $\Phi \in \mathbb{R}^{|\mathcal{G}| \times \delta}$
Initialization: sample $\Phi$ from $\mathbb{R}^{|\mathcal{G}| \times \delta}$
for $e = 1$ to $\mathfrak{e}$ do
    $\omega = \mathrm{Shuffle}(\mathcal{G})$
    for each $G_i \in \omega$ do
        for each $v \in V_i$ do
            for $d = 0$ to $D$ do
                $sg_v^{(d)} := \mathrm{GetWLSubgraph}(v, G_i, d)$
                $\mathcal{J}(\Phi) = -\log \Pr(sg_v^{(d)} \mid \Phi(\mathcal{G}))$
                $\Phi = \Phi - \alpha \, \partial \mathcal{J} / \partial \Phi$
return $\Phi$
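The GetWLSubgraph step can be illustrated with a small Weisfeiler-Lehman style relabelling, sketched below. It covers only the extraction of rooted-subgraph tokens (the "words" of a graph), not the skip-gram training that produces the embedding matrix. Several choices are assumptions made for this sketch and are not taken from the thesis or from any graph2vec implementation: initial labels are supplied by the caller, neighbours are taken to be the callees of a function, and labels are compressed with a truncated SHA-256 hash.

```python
import hashlib


def wl_subgraph_labels(adjacency, labels, max_degree):
    """Return {vertex: [label at degree 0, ..., label at degree max_degree]}.

    adjacency: dict mapping every vertex to the list of its neighbours.
    labels: dict mapping every vertex to its initial label (a string).
    Each round replaces a vertex label by a hash of its own label and the
    sorted labels of its neighbours, so the label at degree d identifies
    the subgraph of depth d rooted at that vertex.
    """
    current = dict(labels)
    history = {v: [current[v]] for v in adjacency}
    for _ in range(max_degree):
        updated = {}
        for v, neighbours in adjacency.items():
            neighbour_labels = sorted(current[u] for u in neighbours)
            signature = current[v] + "|" + ",".join(neighbour_labels)
            digest = hashlib.sha256(signature.encode("utf-8")).hexdigest()
            updated[v] = digest[:8]
        current = updated
        for v in adjacency:
            history[v].append(current[v])
    return history


def graph_document(adjacency, labels, max_degree):
    """Flatten the rooted-subgraph labels of one graph into a token list."""
    history = wl_subgraph_labels(adjacency, labels, max_degree)
    return [token for v in sorted(history) for token in history[v]]


# Two graphs with the same shape but different function names, and one
# graph with a different shape. Initial labels are out-degrees.
g1 = {"a": ["b", "c"], "b": ["c"], "c": []}
g2 = {"x": ["y", "z"], "y": ["z"], "z": []}
g3 = {"a": ["b"], "b": ["c"], "c": []}
doc1 = graph_document(g1, {"a": "2", "b": "1", "c": "0"}, 2)
doc2 = graph_document(g2, {"x": "2", "y": "1", "z": "0"}, 2)
doc3 = graph_document(g3, {"a": "1", "b": "1", "c": "0"}, 2)
print(doc1 == doc2, doc1 == doc3)
```

Graphs that share structure produce the same tokens regardless of function names, which is what lets the later embedding step place structurally similar graphs close together.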
Graph2vec works as follows: the entire graph is treated as a document, and the rooted subgraphs around each vertex are treated as the words of that document. The document is built by traversing the graph. Once the document has been built, the skip-gram technique is used to learn a representation of the graph. Because the model has to predict the subgraphs of a graph, graphs with similar subgraphs and similar structures receive similar embeddings. The result of this step is a set of fixed-length numerical vectors representing the set of graphs. In the proposed study, the thesis represents PSI graphs as numerical vectors of length 1024, which are used for the subsequent classification. The data obtained after the PSI graph preprocessing step is used to decide whether a file is malicious with a deep neural network classifier. To build the convolutional neural network, the thesis adopts the network model proposed by Kim [75]. The first layer of the network is the input layer, and the next layer performs convolution operations using multiple filter sizes. The output of this layer is passed to a nonlinear function, the ReLU activation, defined as 𝑓(𝑥) = max(0, 𝑥), because ReLU is simpler to compute than the sigmoid activation function (which requires evaluating an exponential) [100]. Next, a max-pooling layer reduces the dimension of the data from the convolutional layer, which lowers the complexity and the computational resources of processing and makes the data scalable. Finally, the fully connected layer classifies the outputs generated by the convolution and pooling layers.
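The order of operations described above (convolution with several filter sizes, ReLU, max pooling, fully connected layer) can be traced with a toy forward pass in plain Python. This is only a sketch of the data flow: the filters, weights, and input values are made up, pooling is taken over the whole feature map as an assumption, and none of it reflects the actual network configuration or the PyTorch code used in the thesis.

```python
def conv1d(vector, kernel):
    """Valid 1-D convolution (cross-correlation, as in deep learning)."""
    width = len(kernel)
    return [
        sum(vector[i + j] * kernel[j] for j in range(width))
        for i in range(len(vector) - width + 1)
    ]


def relu(values):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return [max(0.0, x) for x in values]


def max_pool(values):
    """Max pooling over the whole feature map (one value per filter)."""
    return max(values)


def forward(vector, kernels, weights, bias):
    """Convolution -> ReLU -> max pooling -> one fully connected unit."""
    pooled = [max_pool(relu(conv1d(vector, k))) for k in kernels]
    score = sum(w * p for w, p in zip(weights, pooled)) + bias
    return pooled, score


# Hypothetical 5-element input standing in for a 1024-element embedding.
embedding = [0.5, -1.0, 2.0, 0.0, 1.0]
kernels = [[1.0, -1.0], [0.5, 0.5, 0.5]]   # two filter sizes: 2 and 3
weights = [1.0, -2.0]                       # made-up dense-layer weights
pooled, score = forward(embedding, kernels, weights, bias=0.1)
print(pooled, score)
```

Each filter contributes a single pooled value, so the fully connected layer sees one input per filter regardless of the input length.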
3.6.3. Experimental results and discussion
To evaluate the effectiveness of PSI graph features in detecting IoT botnet malware, the thesis ran experiments comparing two features: the PSI graph and the FCG. The metrics are accuracy, FNR, FPR, and processing time.
Table 3.2. The results of detecting IoT botnet malware by PSI graph and function call graph
Features    Accuracy (%)  FNR (%)  FPR (%)  Time (m)
PSI-graphs  98.7          1.83     0.78     88
FCGs        95.3          5.81     4.13     545
Table 3.2 shows that the proposed method using PSI graph features performs better than the function call graph. The proposed method achieved 3.4 percentage points higher accuracy than the call graph (98.7% versus 95.3%), and its execution time was 457 minutes shorter. In addition, the false negative rate of the proposed method is 1.83%, compared with 5.81% for the FCG method. In malware detection, a lower false negative rate means that the classifier less often mistakes malicious code for benign files. The proposed method still mislabels a very small share of benign files as malicious code. This happens with benign files whose PSI graph structure is similar to that of some Linux.Bashlite malware samples. Manual analysis of those samples found that the executables, their FCGs, and their assembly code were different, yet they still had the same PSI graph structure. However, this false positive rate is only 0.78%, a very small percentage.
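For reference, the three rates discussed here follow from the counts of a confusion matrix, as in the sketch below. The counts in the example are hypothetical and are not taken from the thesis; malicious files are treated as the positive class.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate in percent.

    tp: malicious files detected as malicious
    fp: benign files wrongly flagged as malicious
    tn: benign files correctly passed as benign
    fn: malicious files missed (labelled benign)
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "fpr": 100.0 * fp / (fp + tn),
        "fnr": 100.0 * fn / (fn + tp),
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=980, fp=10, tn=990, fn=20)
print(metrics)
```

The false negative rate is computed over malicious files only and the false positive rate over benign files only, which is why the two can differ widely for the same accuracy.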
Table 3.3. Comparison between the IoT botnet detection methods
(Dataset for all methods: the dataset described in section 2.2.1, which includes 6943 samples, of which 3098 are botnet samples from IoTPOT)
Methods                   Algorithms                      Accuracy (%)
Su et al. [25]            Deep neural network (CNN)       95.13
HaddadPajouh et al. [14]  Recurrent neural network (RNN)  97.88
PSI-Graph                 Deep neural network (CNN)       98.7
Table 3.3 shows that the methods of Su et al. [25] and HaddadPajouh et al. [14] both give promising results. However, the lack of published test datasets and source code for these models makes retesting and evaluating them quite difficult. This thesis therefore re-implements those methods from their published articles and materials. The results show that the proposed method achieves better accuracy than those of Su et al. and HaddadPajouh et al., by 3.57 and 0.82 percentage points, respectively.
Table 3.4. Over-fitting evaluation
(Dataset: the dataset described in section 2.2.1, which includes 10,010 samples, of which 6165 are botnet samples from IoTPOT and VirusShare)
Methods    Algorithms                 Accuracy (%)
PSI-Graph  Deep neural network (CNN)  97.8
Finally, over-fitting often occurs with deep learning algorithms: the model fits the training dataset too closely but does not perform well on additional data. To evaluate over-fitting in the proposed model, the thesis added 3067 malicious samples collected from VirusShare to the test set and recalculated the accuracy. As shown in Table 3.4, adding the malicious samples from VirusShare to the dataset decreased the detection accuracy only slightly (by 0.9 percentage points). Thus, the experimental results indicate that the proposed method achieves good results in detecting IoT malware while keeping over-fitting within an acceptable range.
Conclusion of Chapter 3
Based on the analysis and evaluation of the characteristics of IoT botnet malware, and to address the limitations of previous studies that detect IoT botnet malware with graph-structure features, the thesis proposed a lightweight approach based on a high-level feature, called the PSI graph, to detect IoT botnet malware. The proposed method mines the life cycle of IoT botnet malware to generate PSI graph features and applies the advantages of deep learning to achieve accuracy of up to 98.7%, with over-fitting within an acceptable range for the IoT botnet malware detection problem. However, the proposed method focuses only on the overall structure of the PSI graph and still has a rather large time cost.
Contributions of Chapter 3
Proposing a new graph-structured feature, called the PSI graph, that is effective in detecting multi-architecture botnet malware on IoT devices. The research results have been published in conference proceedings and prestigious domestic and international journals ([B1], [B6], [B7] in the author's list of works).
CHAPTER 4. PSI-ROOTED SUBGRAPH FEATURE IN DETECTING IOT BOTNET
4.1. Statement of the problem
The method of detecting IoT botnet malware based on PSI graph features has proved feasible and efficient. However, it focuses on the overall structure of the PSI graph and does not exploit the paths within it; in other words, it treats the PSI graph as a whole. As botnet malware executables on IoT devices grow more complex, the structure of the PSI graph grows more complex as well. Meanwhile, the malicious behaviors that typically appear in the life cycle of IoT botnet malware may correspond to particular paths in the PSI graph, such as the green or red paths in Figure 4.1, while the other paths are redundant data. On that basis, the research problem of this chapter is stated as follows: build a new feature based on the PSI graph that focuses on exploring paths in the PSI graph, thereby constructing a new graph feature, called the PSI-rooted subgraph, that represents the malicious behavior of IoT botnet malware and improves the efficiency of detecting IoT botnet malware with simple machine learning algorithms.
Figure 4.1. Illustration of the problem idea using a PSI-rooted subgraph
4.2. Building the PSI-rooted subgraph feature
Definition 4.1 (PSI-rooted subgraph): Let 𝐺𝑠𝑔 = (𝑉, 𝐸, 𝜃, 𝑑) denote an acyclic directed PSI-rooted subgraph generated from 𝐺𝑃𝑆𝐼 and rooted at vertex 𝜃, where 𝑉 ⊆ 𝑉(𝐺𝑃𝑆𝐼) is the set of vertices 𝑉𝑖 whose path length from 𝜃 satisfies 0 ≤ length(𝜃, 𝑉𝑖) ≤ 𝑑, and 𝐸 is the set of directed edges between vertices in 𝑉.
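One way to read Definition 4.1 is as a depth-limited breadth-first search from the root, sketched below. Two points are assumptions made for this sketch and are not stated in the source: path length is taken as the shortest distance from the root, and acyclicity is obtained by keeping only edges that lead exactly one level deeper. The function name psi_rooted_subgraph is hypothetical.

```python
from collections import deque


def psi_rooted_subgraph(adjacency, root, depth):
    """Extract the subgraph rooted at `root` up to `depth` hops.

    adjacency: dict mapping a vertex to the list of its successors.
    Returns (vertices, edges). Only edges going from one BFS level to
    the next are kept, so the result cannot contain a cycle.
    """
    distance = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if distance[u] == depth:
            continue  # do not expand beyond the depth limit
        for v in adjacency.get(u, []):
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    vertices = set(distance)
    edges = {
        (u, v)
        for u in vertices
        for v in adjacency.get(u, [])
        if v in vertices and distance[v] == distance[u] + 1
    }
    return vertices, edges


# Toy PSI graph with a back edge (scan -> main) and a vertex beyond depth 2.
psi_graph = {
    "main": ["connect", "scan"],
    "connect": ["send"],
    "scan": ["connect", "main"],
    "send": ["report"],
    "report": [],
}
sub_vertices, sub_edges = psi_rooted_subgraph(psi_graph, "main", 2)
print(sorted(sub_vertices))
print(sorted(sub_edges))
```

With depth 2 the vertex "report" is left out, and the back edge to "main" is dropped along with the same-level edge from "scan" to "connect".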
After building