e desire to contribute of the thesis.
2. The objective of the thesis
The thesis focuses on solving three problems:
- Finding decision laws on the data block and on the block's slice.
- Finding decision laws between object groups on a block whose index attribute values change, particularly when attribute values are smoothed or roughened.
- Finding decision laws between object groups on the block when elements are added to or removed from the block.
3. Layout of the thesis
The thesis consists of an introduction, three content chapters, the conclusions, and the references.
Chapter 1 presents the basic concepts of data blocks, data mining, mining decision laws, and equivalence relations.
Chapter 2 presents two research results. The first is the MDLB algorithm for finding decision laws on the block and on the block's slice. The second is the MDLB_VAC algorithm for finding decision laws on the block when attribute values change. The chapter also gives theoretical results on block mining, analyzes the complexity of the proposed algorithms, and tests them experimentally.
Chapter 3 builds a model for enlarging or shrinking the object set of a decision block, proposes two incremental algorithms, MDLB_OSC1 and MDLB_OSC2, for finding decision laws when the block's object set changes, and tests them experimentally.
CHAPTER 1. SOME BASIC KNOWLEDGE
1.1. Data mining
1.1.1. Definition of data mining
Data mining is the main stage in the process of discovering knowledge in databases. The output of this process is latent knowledge extracted from the data, which supports forecasting and decision making in business, management, production activities, ...
1.1.2. Some data mining techniques
- Classification.
- Prediction.
- Association Rule.
- Clustering.
1.2. Mining decision laws
1.2.1. Information System
Definition 1.1 (Information system)
An information system is a quadruple $S = (U, A, V, f)$, where $U$ is a finite non-empty set of objects (also called the universe), $A$ is a finite non-empty set of attributes, $V = \bigcup_{a \in A} V_a$ with $V_a$ the value set of attribute $a \in A$, and $f: U \times A \to V$ is the information function such that $\forall a \in A, \forall u \in U: f(u, a) \in V_a$.
1.2.2. Indiscernibility relation
Given the information system $S = (U, A, V, f)$, for each attribute subset $P \subseteq A$ there exists a binary relation on $U$, denoted $IND(P)$, defined as follows:
$$IND(P) = \{(u, v) \in U \times U \mid \forall a \in P: u(a) = v(a)\},$$
where $u(a)$ abbreviates $f(u, a)$. $IND(P)$ is called the indiscernibility relation.
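As an illustration, the partition $U/P$ induced by $IND(P)$ can be computed by grouping objects on their values over $P$. The following is a minimal Python sketch; the function name `partition`, the dictionary layout, and the sample objects are assumptions made for this example and do not come from the thesis.

```python
from collections import defaultdict

def partition(objects, attrs):
    """Group object ids into the equivalence classes of IND(attrs).

    objects: dict mapping object id -> dict of attribute -> value (assumed layout).
    """
    classes = defaultdict(list)
    for uid, row in objects.items():
        # Two objects are indiscernible iff they agree on every attribute in attrs.
        key = tuple(row[a] for a in attrs)
        classes[key].append(uid)
    return list(classes.values())

# Hypothetical sample data
U = {
    "u1": {"fever": "high", "cough": "yes"},
    "u2": {"fever": "high", "cough": "yes"},
    "u3": {"fever": "low", "cough": "yes"},
}
print(partition(U, ["fever", "cough"]))  # [['u1', 'u2'], ['u3']]
print(partition(U, ["cough"]))           # [['u1', 'u2', 'u3']]
```

Using fewer attributes can only merge classes, which is why the second partition is coarser than the first.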
1.2.3. Decision table
A decision table is a special information system in which the attribute set $A$ is divided into two disjoint non-empty sets $C$ and $D$ ($A = C \cup D$, $C \cap D = \emptyset$), called the conditional attribute set and the decision attribute set, respectively.
The decision table is denoted by $DS = (U, C \cup D, V, f)$, or simply $DS = (U, C \cup D)$.
1.2.4. Decision law
Definition 1.4 (Decision law)
Given the decision table $DS = (U, C \cup D)$, suppose $U/C = \{C_1, C_2, ..., C_m\}$ and $U/D = \{D_1, D_2, ..., D_n\}$ are the partitions generated by $C$ and $D$. For $C_i \in U/C$, $D_j \in U/D$, a decision law is written as $C_i \to D_j$, $i = 1..m$, $j = 1..n$.
1.3. Model of data block
1.3.1. The block.
Definition 1.8
Let $R = (id; A_1, A_2, ..., A_n)$ be a finite set of elements, where $id$ is a non-empty finite index set and $A_i$ ($i = 1..n$) are the attributes. Each attribute $A_i$ has a corresponding value domain $dom(A_i)$. A block $r$ on $R$, denoted $r(R)$, consists of a finite number of elements, each of which is a family of mappings from the index set $id$ to the value domains of the attributes $A_i$ ($i = 1..n$):
$$t \in r(R) \Leftrightarrow t = \{t^i : id \to dom(A_i)\}_{i=1..n}.$$
The block is denoted by $r(R)$ or $r(id; A_1, A_2, ..., A_n)$; when there is no risk of confusion we simply write $r$.
1.3.2. The block’s slice
Let $R = (id; A_1, A_2, ..., A_n)$ and let $r(R)$ be a block over $R$. For each $x \in id$ we denote by $r(R_x)$ the block with $R_x = (\{x\}; A_1, A_2, ..., A_n)$ such that:
$$t_x \in r(R_x) \Leftrightarrow t_x = \{t^i_x = t^i|_x\}_{i=1..n}, \quad t \in r(R), \quad t = \{t^i : id \to dom(A_i)\}_{i=1..n},$$
where $t^i_x(x) = t^i(x)$, $i = 1..n$.
Then $r(R_x)$ is called the slice of the block $r(R)$ at point $x$, sometimes denoted $r_x$.
Here, for simplicity, we use the symbols:
$$x^{(i)} = (x; A_i); \quad id^{(i)} = \{x^{(i)} \mid x \in id\}.$$
We call $x^{(i)}$ ($x \in id$, $i = 1..n$) the index attributes of the block scheme $R = (id; A_1, A_2, ..., A_n)$.
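To make the definitions concrete, a block element can be stored as a mapping from index attributes $x^{(i)} = (x; A_i)$ to values, and a slice is obtained by keeping only the entries whose index equals $x$. This is a minimal sketch; the names `index_attributes` and `slice_block`, the tuple encoding of $x^{(i)}$, and the sample data are assumptions made for illustration only.

```python
def index_attributes(ids, attrs):
    """All index attributes x^(i) = (x; A_i) of the scheme R = (id; A_1..A_n)."""
    return [(x, a) for x in ids for a in attrs]

def slice_block(block, x):
    """Return the slice r_x: every element restricted to the single index point x."""
    return [{(p, a): v for (p, a), v in t.items() if p == x} for t in block]

# Hypothetical block with id = {day1, day2} and attributes fever, cough
block = [
    {("day1", "fever"): "high", ("day1", "cough"): "yes",
     ("day2", "fever"): "low", ("day2", "cough"): "yes"},
    {("day1", "fever"): "low", ("day1", "cough"): "no",
     ("day2", "fever"): "low", ("day2", "cough"): "no"},
]

print(index_attributes(["day1", "day2"], ["fever", "cough"]))
print(slice_block(block, "day2"))
```

Each element of the slice keeps exactly one value per attribute, so a slice behaves like an ordinary relation.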
1.3.3. Relational algebra on the block
The relational algebra on the block comprises the following operations:
- Union
- Intersection
- Difference
- Cartesian product
- Cartesian product with index set
- Projection
- Selection
- Join
- Division
1.4. Conclusion of chapter 1
Chapter 1 presents an overview of data mining, data mining techniques, and the background on mining decision laws, equivalence classes, ... The last part of the chapter presents the basic concepts of the data block model: blocks, block slices, and relational algebra on blocks. This knowledge is the basis for the issues presented in the following chapters.
CHAPTER 2. MINING DECISION LAWS ON DATA BLOCKS WITH
VARIABLE ATTRIBUTE VALUES
2.1 Some concepts built on the block
2.1.1 Information block
Definition 2.1
Let $R = (id; A_1, A_2, ..., A_n)$ be a block scheme and $r$ a block over $R$. Then the information block is a quadruple $IB = (U, A, V, f)$, where $U$ is the set of objects of $r$, called the space of objects, $A = \bigcup_{i=1}^{n} id^{(i)}$ is the set of index attributes of the objects, $V = \bigcup_{x^{(i)} \in A} V_{x^{(i)}}$ with $V_{x^{(i)}}$ the set of values of the objects on the index attribute $x^{(i)}$, and $f: U \times A \to V$ is the information function satisfying $\forall u \in U, \forall x^{(i)} \in A: f(u, x^{(i)}) \in V_{x^{(i)}}$.
2.1.2 Indiscernibility Relation on Block
Definition 2.3
Let $IB = (U, A, V, f)$ be an information block. Then for each index attribute set $P \subseteq A$ we define an equivalence relation, denoted $IND(P)$, as follows:
$$IND(P) = \{(u, v) \in U \times U \mid \forall x^{(i)} \in P: f(u, x^{(i)}) = f(v, x^{(i)})\},$$
called the indiscernibility relation.
2.1.3 Decision block
Definition 2.5
Let $IB = (U, A, V, f)$ be an information block, where $U$ is the space of objects and $A = \bigcup_{i=1}^{n} id^{(i)}$. Suppose $A$ is divided into two sets $C$ and $D$ such that:
$$C = \bigcup_{i=1}^{k} \bigcup_{x \in id} x^{(i)}, \quad D = \bigcup_{i=k+1}^{n} \bigcup_{x \in id} x^{(i)},$$
then the information block $IB$ is called a decision block, denoted $DB = (U, C \cup D, V, f)$, where $C$ is the set of conditional index attributes and $D$ is the set of decision index attributes.
2.1.4 Decision laws on the block and slice.
Definition 2.7
Let $DB = (U, C \cup D)$ be a decision block, where $U$ is the space of objects and
$$C = \bigcup_{i=1}^{k} \bigcup_{x \in id} x^{(i)}, \quad D = \bigcup_{i=k+1}^{n} \bigcup_{x \in id} x^{(i)}, \quad C_x = \bigcup_{i=1}^{k} x^{(i)}, \quad D_x = \bigcup_{i=k+1}^{n} x^{(i)}, \quad x \in id.$$
Then
$$U/C = \{C_1, C_2, ..., C_m\}, \quad U/C_x = \{C_{x1}, C_{x2}, ..., C_{xt_x}\},$$
$$U/D = \{D_1, D_2, ..., D_k\}, \quad U/D_x = \{D_{x1}, D_{x2}, ..., D_{xh_x}\}$$
are the partitions generated by $C$, $C_x$, $D$, $D_x$, respectively. A decision law on the block is denoted by:
$$C_i \to D_j, \quad i = 1..m, \ j = 1..k,$$
and a decision law on the slice at point $x$ is denoted by:
$$C_{xi} \to D_{xj}, \quad i = 1..t_x, \ j = 1..h_x.$$
Definition 2.8
Let $DB = (U, C \cup D)$ be a decision block, $C_i \in U/C$, $D_j \in U/D$, $C_{xp} \in U/C_x$, $D_{xq} \in U/D_x$, $i = 1..m$, $j = 1..k$, $p \in \{1, 2, ..., t_x\}$, $q \in \{1, 2, ..., h_x\}$, $x \in id$. Then the support, accuracy and coverage of the decision law $C_i \to D_j$ on the block are:
- Support: $Sup(C_i, D_j) = |C_i \cap D_j|$,
- Accuracy: $Acc(C_i, D_j) = \frac{|C_i \cap D_j|}{|C_i|}$,
- Coverage: $Cov(C_i, D_j) = \frac{|C_i \cap D_j|}{|D_j|}$.
Definition 2.9
Let $DB = (U, C \cup D)$ be a decision block, let $C_i \in U/C$ and $D_j \in U/D$ be the conditional and decision equivalence classes generated by $C$ and $D$, respectively, and let $C_i \to D_j$ be a decision law on the block $DB$, $i = 1..m$, $j = 1..k$.
- If $Acc(C_i \to D_j) = 1$ then $C_i \to D_j$ is called a certain decision law.
- If $0 < Acc(C_i \to D_j) < 1$ then $C_i \to D_j$ is called an uncertain decision law.
Definition 2.10
Let $DB = (U, C \cup D)$ be a decision block, let $C_i \in U/C$ and $D_j \in U/D$, $i = 1..m$, $j = 1..k$, be the conditional and decision equivalence classes generated by $C$ and $D$, respectively, and let $\alpha, \beta$ be two given thresholds ($\alpha, \beta \in (0, 1)$). If $Acc(C_i, D_j) \geq \alpha$ and $Cov(C_i, D_j) \geq \beta$ then $C_i \to D_j$ is called a meaningful decision law.
2.2 Algorithm for mining decision laws on the data block and
the block's slice (MDLB).
The MDLB algorithm consists of the following steps:
- Step 1: Determine the conditional and decision equivalence classes on the block (on the slice).
- Step 2: Calculate the support matrix on the block (on the slice).
- Step 3: Calculate the accuracy matrix and the coverage matrix.
- Step 4: Find the decision laws on the block.
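The four steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation (which is written in Java): the names `partition` and `mdlb`, the encoding of index attributes as `(x, attribute)` tuples, the thresholds `min_acc` and `min_cov` applied with `>=`, and the sample data are all hypothetical. To run it on a slice, pass only the index attributes whose index is the chosen point x.

```python
from collections import defaultdict

def partition(objects, attrs):
    """Step 1 helper: equivalence classes of IND(attrs) as lists of object ids."""
    classes = defaultdict(list)
    for uid, row in objects.items():
        classes[tuple(row[a] for a in attrs)].append(uid)
    return list(classes.values())

def mdlb(objects, cond, dec, min_acc, min_cov):
    # Step 1: conditional and decision equivalence classes
    C = partition(objects, cond)
    D = partition(objects, dec)
    # Step 2: support matrix Sup[i][j] = |C_i ∩ D_j|
    sup = [[len(set(c) & set(d)) for d in D] for c in C]
    # Steps 3 and 4: accuracy, coverage, and threshold filtering
    laws = []
    for i, c in enumerate(C):
        for j, d in enumerate(D):
            if sup[i][j] == 0:
                continue
            acc = sup[i][j] / len(c)
            cov = sup[i][j] / len(d)
            if acc >= min_acc and cov >= min_cov:
                laws.append((i, j, acc, cov))
    return C, D, sup, laws

# Hypothetical data: one conditional and one decision index attribute at point d1
U = {
    "u1": {("d1", "fever"): "high", ("d1", "regimen"): "A"},
    "u2": {("d1", "fever"): "high", ("d1", "regimen"): "A"},
    "u3": {("d1", "fever"): "high", ("d1", "regimen"): "B"},
    "u4": {("d1", "fever"): "low", ("d1", "regimen"): "B"},
}
C, D, sup, laws = mdlb(U, [("d1", "fever")], [("d1", "regimen")], 0.6, 0.5)
print(sup)                          # [[2, 1], [0, 1]]
print([(i, j) for i, j, _, _ in laws])  # [(0, 0), (1, 1)]
```

Computing the support matrix by intersecting every pair of classes keeps the sketch short; a single pass over the objects would give the same matrix.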
2.3. Mining decision laws on the block when index attribute
values change.
Definition 2.11 (Smoothing an index attribute value on the block)
Let $DB = (U, C \cup D, V, f)$ be a decision block, where $U$ is the space of objects, $a \in C \cup D$, and $V_a$ is the set of existing values of the index attribute $a$. Suppose $Z = \{x_s \in U \mid f(x_s, a) = z\}$ is the set of objects whose value on the index attribute $a$ is $z$. If $Z$ is partitioned into two sets $W$ and $Y$ such that $Z = W \cup Y$, $W \cap Y = \emptyset$, with $W = \{x_p \in U \mid f(x_p, a) = w, w \notin V_a\}$ and $Y = \{x_q \in U \mid f(x_q, a) = y, y \notin V_a\}$, then we say that the value $z$ of the index attribute $a$ is smoothed into two new values $w$ and $y$.
Definition 2.12 (Roughening index attribute values on the block)
Let $DB = (U, C \cup D, V, f)$ be a decision block, where $U$ is the space of objects, $a \in C \cup D$, and $V_a$ is the set of existing values of the index attribute $a$. Suppose $f(x_p, a) = w$ and $f(x_q, a) = y$ are the values of $x_p$ and $x_q$ on the index attribute $a$ ($p \neq q$). If at some point we have $f(x_p, a) = f(x_q, a) = z$ ($z \notin V_a$), then we say that the two values $w$, $y$ of $a$ are roughened into the new value $z$.
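The two definitions can be illustrated by rewriting the values of one index attribute. The sketch below is hypothetical: the function names `roughen` and `smooth`, the dictionary layout, and the sample values are assumptions for illustration only. Roughening maps two values to one new value, and smoothing replaces one value by one of two new values, chosen per object.

```python
def roughen(objects, attr, w, y, z):
    """Replace the values w and y of attr by the single new value z."""
    return {u: {**row, attr: z if row[attr] in (w, y) else row[attr]}
            for u, row in objects.items()}

def smooth(objects, attr, z, new_value_of):
    """Replace value z of attr by new_value_of[u] for each object u carrying z."""
    return {u: {**row, attr: new_value_of[u] if row[attr] == z else row[attr]}
            for u, row in objects.items()}

# Hypothetical objects with a single index attribute
U = {"u1": {"fever": "38"}, "u2": {"fever": "39"}, "u3": {"fever": "normal"}}

rough = roughen(U, "fever", "38", "39", "high")
print(rough)   # u1 and u2 now share the value "high"

fine = smooth(rough, "fever", "high", {"u1": "moderate", "u2": "severe"})
print(fine)    # "high" is split into "moderate" and "severe"
```

After roughening, u1 and u2 fall into the same equivalence class on this attribute; after smoothing they are separated again.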
Theorem 2.1
Let $DB = (U, C \cup D, V, f)$ be a decision block, where $U$ is the space of objects, $a \in C \cup D$, and $V_a$ is the set of existing values of the index attribute $a$. Then two equivalence classes $E_p, E_q$ ($E_p, E_q \in U/E$, $E \in \{C, D\}$) are roughened into a new equivalence class $E_s$ if and only if $\forall a_j \neq a: f(E_p, a_j) = f(E_q, a_j)$.
Theorem 2.2
Let $DB = (U, C \cup D, V, f)$ be a decision block, where $U$ is the space of objects, $a \in C \cup D$, and $V_a$ is the set of existing values of the index attribute $a$. Then an equivalence class $E_s$ ($E_s \in U/E$, $E \in \{C, D\}$) is smoothed into two new equivalence classes $E_p, E_q$ if and only if we can set $f(E_p, a) = w$, $f(E_q, a) = y$ and $E_p \cup E_q = E_s$, with $w, y \notin V_a$, $w \neq y$.
Theorem 2.3
Let $DB = (U, C \cup D, V, f)$ be a decision block and let $\alpha, \beta$ be two given thresholds ($\alpha, \beta \in (0, 1)$). If $C_i \to D_j$ is a meaningful decision law on the decision block, then it is also a meaningful decision law on any slice of the decision block at $x \in id$.
2.3.1 Smoothing and roughening the conditional equivalence
classes on the decision block and on the slice.
Proposition 2.3
Let $DB = (U, C \cup D, V, f)$ be a decision block, $a = x^{(i)} \in C$, and $V_a$ the set of existing values of the conditional index attribute $a$. Suppose the value $z$ of $a$ is smoothed into two new values $w$ and $y$.
If the conditional equivalence class $C_s \in U/C$ (with $f(C_s, a) = z$) is smoothed into two new conditional equivalence classes $C_p, C_q$ (with $f(C_p, a) = w$, $f(C_q, a) = y$, $w, y \notin V_a$), then on the slice $r_x$ there exists an equivalence class $C_{xi}$ satisfying $C_s \subseteq C_{xi}$ that is also smoothed into two new conditional equivalence classes $C_{xi'}$ and $C_{xi''}$ satisfying $C_p \subseteq C_{xi'}$, $C_q \subseteq C_{xi''}$ (with $f(C_{xi'}, a) = w$, $f(C_{xi''}, a) = y$).
Proposition 2.5
Let $DB = (U, C \cup D, V, f)$ be a decision block, $a = x^{(i)} \in C$, and $V_a$ the set of existing values of the conditional index attribute $a$. Suppose the values $w$ and $y$ of $a$ are roughened into the new value $z$.
If two conditional equivalence classes $C_p, C_q \in U/C$ (with $f(C_p, a) = w$, $f(C_q, a) = y$) are roughened into a new conditional equivalence class $C_s \in U/C$ (with $f(C_s, a) = z$), then on the slice $r_x$ there exist two conditional equivalence classes $C_{xi}, C_{xj}$ satisfying $C_p \subseteq C_{xi}$, $C_q \subseteq C_{xj}$ that are also roughened into a new conditional equivalence class $C_{xk}$ satisfying $C_s \subseteq C_{xk}$.
2.3.2 Smoothing and roughening the decision equivalence classes
on the decision block and on the slice.
Proposition 2.7
Let $DB = (U, C \cup D, V, f)$ be a decision block, $a = x^{(i)} \in D$, and $V_a$ the set of existing values of the decision index attribute $a$. Suppose the value $z$ of $a$ is smoothed into two new values $w$ and $y$.
If the decision equivalence class $D_s \in U/D$ (with $f(D_s, a) = z$) is smoothed into two new decision equivalence classes $D_p, D_q$ (with $f(D_p, a) = w$, $f(D_q, a) = y$, $w, y \notin V_a$), then on the slice $r_x$ there exists a decision equivalence class $D_{xi}$ satisfying $D_s \subseteq D_{xi}$ that is also smoothed into two new decision equivalence classes $D_{xi'}$ and $D_{xi''}$ satisfying $D_p \subseteq D_{xi'}$, $D_q \subseteq D_{xi''}$ (with $f(D_{xi'}, a) = w$, $f(D_{xi''}, a) = y$).
Proposition 2.9
Let $DB = (U, C \cup D, V, f)$ be a decision block, $a = x^{(i)} \in D$, and $V_a$ the set of existing values of the decision index attribute $a$. Suppose the values $w$ and $y$ of $a$ are roughened into the new value $z$.
If two decision equivalence classes $D_p, D_q$ (with $f(D_p, a) = w$, $f(D_q, a) = y$) are roughened into a new decision equivalence class $D_s \in U/D$ (with $f(D_s, a) = z$), then on the slice $r_x$ there exist two decision equivalence classes $D_{xi}, D_{xj}$ satisfying $D_p \subseteq D_{xi}$, $D_q \subseteq D_{xj}$ that are also roughened into a new decision equivalence class $D_{xk}$ satisfying $D_s \subseteq D_{xk}$.
2.3.4 The algorithm for mining decision laws when smoothing or
roughening index attribute values on the block and the slice
(MDLB_VAC).
The MDLB_VAC algorithm consists of the following steps:
Step 1: Calculate the support matrix Sup(C,D) of the original block.
Step 2: Incrementally calculate the support matrix Sup(C',D') on the block after roughening/smoothing the value of the index attribute.
Step 3: Calculate the accuracy matrix Acc(C',D') and the coverage matrix Cov(C',D') after roughening/smoothing the value of the index attribute from the matrix Sup(C',D').
Step 4: Find the decision laws on the block.
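One way to picture Steps 2 and 3 for the roughening of a conditional index attribute: when two conditional classes merge into one, they are disjoint, so the support row of the merged class is the element-wise sum of the two old rows, and Acc and Cov follow from the row and column totals. The sketch below rests on that observation only; the thesis does not spell out its update procedure in this summary, and the names `roughen_rows` and `acc_cov` as well as the sample matrix are assumptions.

```python
def roughen_rows(sup, p, q):
    """Merge conditional classes p and q: the new row is the element-wise sum.

    The merged class is appended as the last row (an arbitrary choice).
    """
    merged = [a + b for a, b in zip(sup[p], sup[q])]
    return [row for k, row in enumerate(sup) if k not in (p, q)] + [merged]

def acc_cov(sup):
    """Acc and Cov from Sup; row totals are |C_i| and column totals are |D_j|."""
    row_tot = [sum(r) for r in sup]
    col_tot = [sum(c) for c in zip(*sup)]
    acc = [[v / row_tot[i] for v in r] for i, r in enumerate(sup)]
    cov = [[v / col_tot[j] for j, v in enumerate(r)] for r in sup]
    return acc, cov

# Hypothetical support matrix with three conditional and two decision classes
sup = [[2, 1], [0, 1], [3, 0]]
sup2 = roughen_rows(sup, 0, 1)
print(sup2)            # [[3, 0], [2, 2]]
acc, cov = acc_cov(sup2)
print(acc)             # [[1.0, 0.0], [0.5, 0.5]]
print(cov)             # [[0.6, 0.0], [0.4, 1.0]]
```

Smoothing is the reverse situation: one row is split into two rows whose element-wise sum equals the old row, which requires looking at the affected objects again.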
2.4 Complexity of the Sup matrix algorithms on the block and
on the slice.
Proposition 2.13: The support matrix algorithms for the decision block and for the slice at $x \in id$ have the same complexity, $O(|U|^2)$.
Proposition 2.14: The support matrix algorithms for the decision block and for the slice at $x \in id$ after roughening the values of a conditional index attribute have the same complexity, $O(|U|^2)$.
Proposition 2.15: The support matrix algorithms for the decision block and for the slice at $x \in id$ after smoothing the values of a conditional index attribute have the same complexity, $O(|U|^2)$.
2.6 Conclusion
This chapter presents the first results of the thesis:
- Building some basic concepts of mining decision laws on the block. On that basis, a number of related properties, propositions and theorems were stated and proved.
- Building the MDLB algorithm to find decision laws on the block and slice.
- Proposing and proving results on the relationship between roughening and smoothing the values of conditional or decision attributes on the block and on the slice, and proposing the MDLB_VAC algorithm to calculate the support matrices on the block and slice and to find decision laws when the value of an index attribute changes.
CHAPTER 3. MINING DECISION LAWS ON BLOCKS WITH A
CHANGING OBJECT SET.
3.1 Model of adding and removing objects on block and slice.
Proposition 3.1
Let $DB = (U, C \cup D, V, f)$ be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then we have:
$Acc(C', D') = (Acc(C'_i, D'_j))_{ij}$ with $i = 1..m+p$, $j = 1..h+q$ and
$$Acc(C'_i, D'_j) = \begin{cases}
\dfrac{|C_i \cap D_j| + N_{ij} - M_{ij}}{|C_i| + \sum_{j'=1}^{h+q} N_{ij'} - \sum_{j'=1}^{h} M_{ij'}}, & i = 1..m,\ j = 1..h, \\
\dfrac{N_{ij}}{|C_i| + \sum_{j'=1}^{h+q} N_{ij'} - \sum_{j'=1}^{h} M_{ij'}}, & i = 1..m,\ j = h+1..h+q, \\
\dfrac{N_{ij}}{\sum_{j'=1}^{h+q} N_{ij'}}, & i = m+1..m+p,\ j = 1..h+q.
\end{cases}$$
Proposition 3.3
Let $DB = (U, C \cup D, V, f)$ be a decision block, and let AN and DM be the sets of objects added to and removed from the decision block DB. Then we have:
$Cov(C', D') = (Cov(C'_i, D'_j))_{ij}$ of size $(m+p) \times (h+q)$, with $i = 1..m+p$, $j = 1..h+q$ and
$$Cov(C'_i, D'_j) = \begin{cases}
\dfrac{|C_i \cap D_j| + N_{ij} - M_{ij}}{|D_j| + \sum_{i'=1}^{m+p} N_{i'j} - \sum_{i'=1}^{m} M_{i'j}}, & i = 1..m,\ j = 1..h, \\
\dfrac{N_{ij}}{|D_j| + \sum_{i'=1}^{m+p} N_{i'j} - \sum_{i'=1}^{m} M_{i'j}}, & i = m+1..m+p,\ j = 1..h, \\
\dfrac{N_{ij}}{\sum_{i'=1}^{m+p} N_{i'j}}, & i = 1..m+p,\ j = h+1..h+q.
\end{cases}$$
3.2 Incremental calculation of Acc and Cov when adding and
removing objects on the decision block.
3.2.1 Adding object x to the decision block
Case 1: x creates a new conditional class and a new decision class.
$Acc(C'_{m+1}, D'_{h+1}) = 1$ and $Cov(C'_{m+1}, D'_{h+1}) = 1$,
$\forall j = 1..h$: $Acc(C'_{m+1}, D'_j) = Cov(C'_{m+1}, D'_j) = 0$,
$\forall i = 1..m$: $Acc(C'_i, D'_{h+1}) = Cov(C'_i, D'_{h+1}) = 0$.
Otherwise, $\forall i = 1..m$, $j = 1..h$:
$Acc(C'_i, D'_j) = Acc(C_i, D_j)$
and $Cov(C'_i, D'_j) = Cov(C_i, D_j)$.
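Case 2: x creates only a new conditional class (x joins the existing decision class $D_{j^*}$).
$Acc(C'_{m+1}, D'_{j^*}) = 1$ and $Cov(C'_{m+1}, D'_{j^*}) = \frac{1}{|D_{j^*}| + 1}$.
If $k \neq j^*$ then: $Acc(C'_{m+1}, D'_k) = Cov(C'_{m+1}, D'_k) = 0$.
If $i \neq m+1$ then: $Acc(C'_i, D'_{j^*}) = Acc(C_i, D_{j^*})$, $Cov(C'_i, D'_{j^*}) = \frac{|C_i \cap D_{j^*}|}{|D_{j^*}| + 1}$.
Otherwise, $\forall i \neq m+1$, $j \neq j^*$: $Acc(C'_i, D'_j) = Acc(C_i, D_j)$ and $Cov(C'_i, D'_j) = Cov(C_i, D_j)$.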
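Case 3: x creates only a new decision class (x joins the existing conditional class $C_{i^*}$).
$Acc(C'_{i^*}, D'_{h+1}) = \frac{1}{|C_{i^*}| + 1}$ and $Cov(C'_{i^*}, D'_{h+1}) = 1$.
If $i \neq i^*$ then: $Acc(C'_i, D'_{h+1}) = Cov(C'_i, D'_{h+1}) = 0$.
If $k \neq h+1$ then: $Acc(C'_{i^*}, D'_k) = \frac{|C_{i^*} \cap D_k|}{|C_{i^*}| + 1}$, $Cov(C'_{i^*}, D'_k) = Cov(C_{i^*}, D_k)$.
Otherwise, $\forall i \neq i^*$, $j \neq h+1$: $Acc(C'_i, D'_j) = Acc(C_i, D_j)$ and $Cov(C'_i, D'_j) = Cov(C_i, D_j)$.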
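Case 4: No new conditional class or new decision class is created (x joins the existing classes $C_{i^*}$ and $D_{j^*}$).
$Acc(C'_{i^*}, D'_{j^*}) = \frac{|C_{i^*} \cap D_{j^*}| + 1}{|C_{i^*}| + 1}$ and $Cov(C'_{i^*}, D'_{j^*}) = \frac{|C_{i^*} \cap D_{j^*}| + 1}{|D_{j^*}| + 1}$.
- If $k \neq j^*$ then: $Acc(C'_{i^*}, D'_k) = \frac{|C_{i^*} \cap D_k|}{|C_{i^*}| + 1}$ and $Cov(C'_{i^*}, D'_k) = Cov(C_{i^*}, D_k)$. The numerator is unchanged because x does not belong to $D_k$.
- If $u \neq i^*$ then: $Acc(C'_u, D'_{j^*}) = Acc(C_u, D_{j^*})$ and $Cov(C'_u, D'_{j^*}) = \frac{|C_u \cap D_{j^*}|}{|D_{j^*}| + 1}$.
- If $i \neq i^*$ and $j \neq j^*$ then: $Acc(C'_i, D'_j) = Acc(C_i, D_j)$ and $Cov(C'_i, D'_j) = Cov(C_i, D_j)$.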
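3.2.2 Removing object x from the decision block.
Let x be removed from the classes $C_{i^*}$ and $D_{j^*}$. Since accuracy is normalized by the conditional class and coverage by the decision class:
$Acc(C'_{i^*}, D'_{j^*}) = \frac{|C_{i^*} \cap D_{j^*}| - 1}{|C_{i^*}| - 1}$ and $Cov(C'_{i^*}, D'_{j^*}) = \frac{|C_{i^*} \cap D_{j^*}| - 1}{|D_{j^*}| - 1}$.
- If $k \neq j^*$ then: $Acc(C'_{i^*}, D'_k) = \frac{|C_{i^*} \cap D_k|}{|C_{i^*}| - 1}$ and $Cov(C'_{i^*}, D'_k) = Cov(C_{i^*}, D_k)$.
- If $u \neq i^*$ then: $Acc(C'_u, D'_{j^*}) = Acc(C_u, D_{j^*})$ and $Cov(C'_u, D'_{j^*}) = \frac{|C_u \cap D_{j^*}|}{|D_{j^*}| - 1}$.
- If $i \neq i^*$ and $j \neq j^*$ then: $Acc(C'_i, D'_j) = Acc(C_i, D_j)$ and $Cov(C'_i, D'_j) = Cov(C_i, D_j)$.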
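Under the definitions $Acc = |C_i \cap D_j| / |C_i|$ and $Cov = |C_i \cap D_j| / |D_j|$, adding an object to two existing classes and removing an object from them differ only in the sign of the change: only row $i^*$ of Acc and column $j^*$ of Cov are touched. The sketch below checks this against a full recomputation. It is illustrative only: the names `acc_cov` and `update_existing`, the `delta` parameter, and the sample matrix are assumptions, exact fractions are used to avoid rounding, and the affected classes are assumed to stay non-empty after a removal.

```python
from fractions import Fraction

def acc_cov(sup):
    """Exact Acc and Cov matrices, plus class sizes, from a support matrix."""
    c_size = [sum(r) for r in sup]           # |C_i|
    d_size = [sum(c) for c in zip(*sup)]     # |D_j|
    acc = [[Fraction(v, c_size[i]) for v in r] for i, r in enumerate(sup)]
    cov = [[Fraction(v, d_size[j]) for j, v in enumerate(r)] for r in sup]
    return acc, cov, c_size, d_size

def update_existing(acc, cov, c_size, d_size, i_star, j_star, delta):
    """delta=+1: x joins C_i* and D_j*; delta=-1: x leaves C_i* and D_j*."""
    new_acc = [row[:] for row in acc]
    new_cov = [row[:] for row in cov]
    for k in range(len(acc[0])):             # only row i* of Acc changes
        inter = acc[i_star][k] * c_size[i_star]          # |C_i* ∩ D_k|
        change = delta if k == j_star else 0
        new_acc[i_star][k] = (inter + change) / (c_size[i_star] + delta)
    for u in range(len(acc)):                # only column j* of Cov changes
        inter = cov[u][j_star] * d_size[j_star]          # |C_u ∩ D_j*|
        change = delta if u == i_star else 0
        new_cov[u][j_star] = (inter + change) / (d_size[j_star] + delta)
    return new_acc, new_cov

# Hypothetical block with two conditional and two decision classes
acc, cov, cs, ds = acc_cov([[2, 1], [0, 1]])

# Add x to C_1 and D_2 (0-based indices 0 and 1), compare with recomputation
added = update_existing(acc, cov, cs, ds, 0, 1, +1)
print(added == acc_cov([[2, 2], [0, 1]])[:2])    # True

# Remove x from C_1 and D_1 (0-based indices 0 and 0)
removed = update_existing(acc, cov, cs, ds, 0, 0, -1)
print(removed == acc_cov([[1, 1], [0, 1]])[:2])  # True
```

The update touches one row and one column, so its cost is proportional to the number of classes rather than to the size of the whole matrix.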
3.3 Algorithm for mining decision laws using incremental
calculation of the Acc and Cov matrices after adding and removing
objects (MDLB_OSC1)
Step 1: Calculate the accuracy matrix Acc(C,D) and the coverage matrix Cov(C,D) of the block before adding and removing objects.
Step 2: Incrementally calculate the accuracy matrix Acc(C',D') and the coverage matrix Cov(C',D') after adding and removing objects.
Step 3: Remove the rows/columns of the matrices Acc(C',D') and Cov(C',D') whose values are all 0.
Step 4: Generate the decision laws on the block.
3.4 Complexity of the algorithm for mining decision laws using
incremental calculation of the Acc and Cov matrices after adding
and removing objects on the decision block.
Proposition 3.5: The complexity of the algorithm determining Acc and Cov is $O(|U|^2)$.
Proposition 3.6: The complexity of the algorithm incrementally calculating Acc and Cov when adding N objects is $O(N|U|^2)$.
Proposition 3.7: The complexity of the algorithm incrementally calculating Acc and Cov when removing M objects is $O(M|U|^2)$.
Proposition 3.8: The complexity of the algorithm deleting the rows/columns of the Acc and Cov matrices that have value 0 is $O(|U|^2)$.
3.5 Incremental calculation of Sup when adding and removing
objects on the decision block
When adding N objects and removing M objects, we have:
$$Sup(C'_i, D'_j) = Sup(C_i, D_j) + N_{ij} - M_{ij}, \quad i = 1..m+p, \ j = 1..h+q,$$
where $M_{ij} = 0$ and $Sup(C_i, D_j) = 0$ for $i = m+1..m+p$, $j = h+1..h+q$.
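The update rule can be sketched as a padded matrix operation. In this sketch, which is an illustration and not the thesis code, `n_added[i][j]` and `m_removed[i][j]` are assumed to hold the numbers of added and removed objects that fall into both class i and class j; entries of the old matrices that lie outside their original size are treated as 0. The function name `incremental_sup` and the sample matrices are hypothetical.

```python
def incremental_sup(sup, n_added, m_removed):
    """Sup'[i][j] = Sup[i][j] + N[i][j] - M[i][j], padding old matrices with 0.

    n_added has the new size (m+p) x (h+q); sup and m_removed have size m x h.
    """
    rows, cols = len(n_added), len(n_added[0])

    def old(matrix, i, j):
        # New classes have no previous support and nothing to remove.
        return matrix[i][j] if i < len(matrix) and j < len(matrix[0]) else 0

    return [[old(sup, i, j) + n_added[i][j] - old(m_removed, i, j)
             for j in range(cols)] for i in range(rows)]

# Hypothetical example: m = h = 2, one new conditional class and one new decision class
sup = [[2, 1], [0, 1]]
n_added = [[0, 1, 0], [0, 0, 0], [0, 0, 2]]
m_removed = [[1, 0], [0, 1]]
print(incremental_sup(sup, n_added, m_removed))
# [[1, 2, 0], [0, 0, 0], [0, 0, 2]]
```

In this example the second conditional class loses its only object, so its row becomes all zeros and is a candidate for deletion.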
3.6 Algorithm for mining decision laws using incremental
calculation of the Sup matrix after adding and removing objects
(MDLB_OSC2).
Step 1: Calculate Sup(C,D) of the block before adding and removing objects.
Step 2: Incrementally calculate the support matrix Sup(C',D') after adding and removing objects.
Step 3: Delete the rows/columns of Sup(C',D') whose values are all 0.
Step 4: Calculate Acc(C',D') and Cov(C',D') from the values of Sup(C',D').
Step 5: Generate the decision laws on the block.
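Steps 3 to 5 can be sketched starting from an already updated support matrix. The sketch assumes that "rows/columns that have value 0" means rows or columns consisting only of zeros, and that laws are kept when accuracy and coverage reach the thresholds; the names `drop_zero`, `laws_from_sup`, `min_acc`, `min_cov` and the sample matrix are hypothetical.

```python
def drop_zero(sup):
    """Step 3: delete all-zero rows, then all-zero columns."""
    rows = [r for r in sup if any(r)]
    if not rows:
        return []
    keep = [j for j in range(len(rows[0])) if any(r[j] for r in rows)]
    return [[r[j] for j in keep] for r in rows]

def laws_from_sup(sup, min_acc, min_cov):
    """Steps 4 and 5: derive Acc and Cov from Sup and keep the laws above the thresholds."""
    row_tot = [sum(r) for r in sup]          # class sizes |C'_i|
    col_tot = [sum(c) for c in zip(*sup)]    # class sizes |D'_j|
    laws = []
    for i, r in enumerate(sup):
        for j, s in enumerate(r):
            if s and s / row_tot[i] >= min_acc and s / col_tot[j] >= min_cov:
                laws.append((i, j))
    return laws

# Hypothetical updated support matrix Sup(C', D')
sup_new = [[1, 2, 0], [0, 0, 0], [0, 0, 2]]
cleaned = drop_zero(sup_new)
print(cleaned)                            # [[1, 2, 0], [0, 0, 2]]
print(laws_from_sup(cleaned, 0.6, 0.5))   # [(0, 1), (1, 2)]
```

Because Acc and Cov are derived once from the final Sup matrix, only integer counters are maintained during the incremental phase.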
3.7 Complexity of the algorithm for mining decision laws using
incremental calculation of the Sup matrix after adding and
removing objects on the decision block.
Proposition 3.9: The complexity of the algorithm incrementally calculating the Sup matrix when adding N objects is $O(N|U|)$.
Proposition 3.10: The complexity of the algorithm incrementally calculating the Sup matrix when removing M objects is $O(M|U|)$.
Proposition 3.11: The complexity of the algorithm incrementally calculating the Sup matrix to find decision laws when adding N objects is $O(|U|^2)$.
Proposition 3.12: The complexity of the algorithm incrementally calculating the Sup matrix when adding N objects in the block's slice at $x \in id$ is $O(N|U|)$.
Proposition 3.13: The complexity of the algorithm incrementally calculating the Sup matrix when removing M objects in the block's slice at $x \in id$ is $O(M|U|)$.
3.10 Experimental evaluation of the algorithms
3.10.1 Experimental objectives
(1) Evaluate the execution of the MDLB and MDLB_VAC algorithms.
(2) Evaluate the execution of the MDLB_OSC1 and MDLB_OSC2 algorithms. In addition, compare the execution time of the MDLB_OSC1 algorithm with that of the MDLB_OSC2 algorithm.
3.10.2 Experimental data
The experiment was performed on 3 data sets taken from Pediatrics Departments A and B of Bach Mai Hospital 2 from March 10, 2020 to March 14, 2020. The data were collected and preprocessed. Each data set includes 3 conditional index attributes, namely the disease symptoms fever, cough, and runny nose, and 2 decision index attributes, namely the treatment regimen and the fever virus level, monitored over 4 days.
The number of elements in each data set is:
Database name     | BVBM2KN A | BVBM2KN B | KID PATIENT FEVER VIRUS
Number of objects | 160       | 1360      | 939
Table 3.2: Basic information of the experimental data
3.10.3 Experimental tools and environment.
The algorithms are implemented in Java. The experimental environment is a PC with an Intel(R) Core(TM) i5 2.5 GHz CPU, 4 GB RAM, and Windows 7.
3.10.4. Experimental results
After running the 3 algorithms on the data sets, we obtained the following results:
- Problem 1: find the decision laws on the block and slice:
Figure 3.4: Decision laws found on the block
When min_acc and min_cov are changed, the number of laws obtained also changes:
Figure 3.5: Relationship between the number of decision laws and the thresholds min_acc, min_cov
- Problem 2: find decision laws on the block and slice when smoothing or roughening index attribute values:
Figure 3.8: Matrices Sup, Acc, Cov calculated before and after smoothing
Figure 3.10: Matrices Sup, Acc, Cov calculated before and after roughening
Figure 3.11: Decision laws found after smoothing or roughening attribute values
- Problem 3: find the decision laws on the block and slice when adding or removing objects:
+ Results of the MDLB_OSC1 algorithm:
+ Results of the MDLB_OSC2 algorithm:
Comment: With the same source set, the two methods give the same rule set; the only difference is the execution time.
3.11 Conclusion
From the model of adding and removing objects on the decision block and slice, some properties of the Acc and Cov matrices have been proved. Based on that, two algorithms for finding decision laws on the block and slice were proposed:
- Algorithm MDLB_OSC1 incrementally calculates the matrices Acc and Cov to find decision laws on the block and slice.
- Algorithm MDLB_OSC2 incrementally calculates the matrix Sup to find decision laws.
The chapter ends with a comparison of the two proposed algorithms and the experimental setup.
CONCLUSION
1) Main results of the thesis
The thesis focuses on the problem of mining decision laws on the block in several cases, with the following main results:
- Built a model of mining decision laws on the data block with proven definitions, theorems, and propositions.
- Proposed three algorithms to find decision laws on the data block in the following cases: fixed block data; changing index attribute values; and a changing object set of the data block.
2) Future research of the thesis
- Continue to study mining decision laws in other cases: the block's attributes change, the data is incomplete, ...
- Mining decision laws on a chain of linked decision blocks (similar to blockchain technology).
NEW FINDINGS OF THE DOCTORAL DISSERTATION
This dissertation has two key contributions:
- Built a model of mining decision laws on the data block with proven definitions, theorems, and propositions.
- Proposed three algorithms to find decision laws on the data block in the following cases: fixed block data; changing index attribute values; and a changing object set of the data block.
LIST OF WORKS OF AUTHOR
1. Thang Trinh Dinh, Tuyen Tran Minh, Lan Anh Do
Thi, “Mining decision laws on data block has variable attribute
values”, Proceedings of 19th Nati