To test the sensitivity of our proposed method, we randomly selected 1000 images from the Corel database as query images. We also asked 50 students to respond to the 1000 queries (representing the subjective perception of the users). Figure 3.9 shows the average accuracy of our proposed method in two scenarios: the first uses the ground truth of the images from the Corel image database, called Aweight GT (Aweight with Ground Truth); the second uses the subjective perception of the users, called Aweight UP (Aweight with User Perception). From Fig. 3.9 it can be seen that the average accuracy of our proposed method decreases only slightly when the feedback from the students is used.
A user often wants to retrieve images whose high-level semantics are the same, for example, finding all roses (including red, yellow and white roses) in the image database. This chapter and the next chapter of the thesis propose methods to solve the problem of finding images that may have different low-level features but share the same high-level semantic concept (i.e., the same topic).
The similarity between images as perceived by people (images that are semantically relevant) differs from their similarity in the feature space. That is, semantically related images can be scattered throughout the feature space, over several clusters rather than a single cluster. In this case, the traditional methods [2,29,61,74] that use the feedback approach do not work well, because these methods rely on the single-point query approach.
Performing feedback involves calculating one or more new query points in the feature space and changing the distance function. The methods in the feedback approach that use separate (multi-point) queries have the advantage of obtaining semantically related images scattered throughout the feature space. However, these methods have the following limitations:
(1) They require users to provide multiple query images simultaneously; for example, to query for the rose theme, users must provide images of red roses, yellow roses, white roses, etc. If this condition is not met, the initial retrieval result will contain images from a certain region only, without relevant images from other regions.
If the user provides the system with only yellow rose images, the initial retrieval results may include only yellow rose images and ignore the white and red roses. The reason is that, in traditional image retrieval systems, images with similar low-level feature vectors lie close together (in the same low-level feature cluster). On an initial result list consisting of yellow roses, the user can only select yellow roses. The system will rely on this feedback to continue the retrieval, so the next retrieval phase will move toward the yellow regions and the system can only return yellow roses. The red and white rose regions will therefore be ignored, and the accuracy of the system will remain limited no matter how good the subsequent retrieval phases are.
(2) The number of queries for the next iteration depends on the number of related images provided by the user; therefore, there are two unfavorable possibilities. First, the user chooses too few feedback images (fewer than the number of clusters in the feature space). In this case, the accuracy of the system is not guaranteed because, according to clustering theory, more queries cover more clusters. Second, the user chooses too many feedback images. This increases the burden of aggregating the result lists (each query produces its own result list). In addition, too many queries do not improve the accuracy of the system (experiments in [49] have shown that accuracy increases rapidly from 1 to 8 queries and increases slowly when the number of queries is between 8 and 20). For example, in the Corel database, the images of the rose theme are scattered over only 4 clusters (each cluster corresponds to one color of rose).
(3) The weights of different queries are treated as equal, that is, all queries are considered equally important even though each query has different neighbors.
(4) The features are equally weighted even though each feature component has a different importance.
These limitations are the main reasons for the low accuracy of the retrieval system.
Based on the analysis of the limitations of the available methods, the thesis
proposes a semantic image retrieval method.
The proposed method has the following advantages:
(1) It uses only one query to create a diverse initial retrieval result that includes images located in different regions (reducing the burden on users).
(2) It clusters the related images at a low time cost.
(3) It identifies the semantic importance of each query.
(4) It determines the importance of each feature.
These four advantages are realized in the method that the author has published in [CT5, CT6].
2.2. The proposed method diagram
Based on the analysis in Section 2.1, the thesis proposes the diagram of the
method as shown in Figure 2.5.
Figure 2.5. The structure of the proposed method.
The next sections of the thesis describe the proposed method in detail; they rely on the definitions given below.
Definition 2.1 (Feature set). A feature set F consists of N feature vectors, each consisting of m real-valued components:
   F = { f_t = (f_t1, f_t2, ..., f_tm) ∈ R^m | t = 1..N }   (2.1)
Definition 2.2 (Feature space). A feature space FS consists of m dimensions, each corresponding to one real-valued component of the feature vectors f_t (t = 1..N) of the feature set F. Each point p_t (t = 1..N) in the space FS corresponds to a feature vector in F:
   FS = { p_t = (f_t1, f_t2, ..., f_tm) | t = 1..N }   (2.2)
Definition 2.3 (i-th feature space). The i-th feature space, denoted FS_i, is a feature space of n dimensions. Each point in this space is denoted p_t^i (t = 1..N) with n coordinates:
   FS_i = { p_t^i = (p_t1^i, p_t2^i, ..., p_tn^i) | t = 1..N }   (2.3)
Definition 2.4 (Distance measure between two points in the feature space FS_i). The distance between two points p_k^i and p_l^i (k, l = 1..N, k ≠ l), denoted D_i(p_k^i, p_l^i), is some distance measure (for example, the Euclidean distance).
The main idea of the proposed method is not to place the images (both the database images and the query image) in a single feature space but in different feature spaces (in the context of this chapter, the thesis maps each representation of an image into a corresponding feature space), and then to perform retrieval by querying each of these feature spaces and merging the results of the individual feature spaces into one final result.
(The components shown in Figure 2.5 are: Query, Multiple representations, Feature vectors, Feature Database, Similarity Comparison, Result set, Feedback set, Incremental Cluster, Multi-point query, Importance of each feature, Importance of each query, and Result.)
The reason that the method in the thesis can obtain images scattered throughout the original color space is that the images are also converted to a grayscale representation. In this representation, the shape and texture characteristics are not overwhelmed by color. An image of a rose (grayscale representation) is mapped to a point in the corresponding feature space. In this space, because color is not included, images of the same subject (for example, yellow, white and red roses) lie close to each other. Therefore, the proposed method can obtain red, pink and yellow rose images in response to a red rose query image.
The retrieval process then matches the query image against the database images in each individual feature space to obtain a list of results. Thus, we obtain four result lists, which are then combined into a final list of results.
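To make this step concrete, the following minimal Python sketch (an illustration, not the thesis implementation) retrieves the k nearest database images separately in each feature space; the dictionary layout, the concrete feature spaces and the Euclidean distance are assumptions of the sketch.

import numpy as np

def retrieve_per_space(query_features, db_features, k):
    # query_features[s]: query vector in feature space s
    # db_features[s]: n x m matrix of database feature vectors in space s
    result_lists = {}
    for space, q in query_features.items():
        X = db_features[space]
        dists = np.linalg.norm(X - q, axis=1)      # Euclidean distance in this space
        ranked = np.argsort(dists)[:k]             # indices of the k closest images
        result_lists[space] = list(zip(ranked.tolist(), dists[ranked].tolist()))
    return result_lists                            # one ranked list per feature space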
2.3. Relevance feedback with multi-point query
The original approach to content-based image retrieval is incompatible with users' perception of visual similarity. To address this problem, several image retrieval methods using relevance feedback have been proposed. Two components are learned in the relevance feedback approach: the distance function and the new query points. The distance function is changed by learning the weights of the feature components, and new query points are obtained by learning the desired points of the users.
2.4. The proposed image retrieval algorithm
Definition 2.5 (Multi-point query). A multi-point query is a tuple MQ = <n_MQ, P_MQ, W_MQ, D_MQ, k>, where n_MQ denotes the number of query points in MQ; P_MQ = {p_MQ1, ..., p_MQn} is the set of n_MQ query points in the search space of the database DB; W_MQ = {w_MQ1, ..., w_MQn} is the set of weights associated with P_MQ (the thesis assumes that the weights are normalized, i.e., ∑_(i=1)^(n_MQ) w_MQi = 1); D_MQ is the distance function that, given any two points p_i and p_j in the feature space, returns the distance between them; and k is the number of points to be retrieved in each iteration.
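For illustration only, a multi-point query in the sense of Definition 2.5 could be stored in a small container such as the Python sketch below; the field names are hypothetical and not taken from the thesis.

from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class MultiPointQuery:
    points: List[np.ndarray]                                  # P_MQ: query points in the search space
    weights: List[float]                                      # W_MQ: normalized weights (sum to 1)
    distance: Callable[[np.ndarray, np.ndarray], float]       # D_MQ: distance between two points
    k: int                                                    # images retrieved per iteration

    def __post_init__(self):
        assert abs(sum(self.weights) - 1.0) < 1e-6, "weights must be normalized"

    @property
    def n_mq(self) -> int:
        return len(self.points)                               # n_MQ: number of query points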
2.4.1. Clustering algorithm for the feedback image set
The algorithm below describes the initial clustering algorithm that uses k
eigenvectors, named CISE (Clustering Images Set using Eigenvectors).
CISE (Clustering Images Set using k Eigenvectors)
Input:  Image set S = {s_1, s_2, ..., s_n} with s_i ∈ R^m
        Number of clusters k
Output: k clusters C_1, C_2, ..., C_k
1. Form the affinity matrix A:
   for i ← 1 to n do
      for j ← 1 to n do
         if (i ≠ j) then A_ij ← exp(−‖s_i − s_j‖^2 / (2σ^2)) else A_ij ← 0
2. Construct the diagonal degree matrix D and the normalized matrix L:
   for i ← 1 to n do
      D_ii ← ∑_j A_ij
   L ← D^(−1/2) A D^(−1/2)
3. Compute the k largest eigenvectors x_1, x_2, ..., x_k of the matrix L and form the matrix
   X ← [x_1, x_2, ..., x_k] ∈ R^(n×k)
4. Construct the matrix Y from X by normalizing each row of X to unit length:
   for i ← 1 to n do
      for j ← 1 to k do
         Y_ij ← X_ij / ( ∑_t X_it^2 )^(1/2)
5. Form k clusters by applying K-Means to the rows of Y (each row i corresponds to the point p_i).
6. Assign the original point s_i to cluster C_j if and only if row i of Y was assigned to cluster j by K-Means.
Return C_1, C_2, ..., C_k
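A minimal Python sketch of the CISE steps is given below; the Gaussian affinity with scale parameter sigma and the use of scikit-learn's KMeans for the final step are assumptions of the sketch, not details taken from the thesis.

import numpy as np
from sklearn.cluster import KMeans

def cise(S, k, sigma=1.0):
    n = S.shape[0]
    # Step 1: affinity matrix A with A[i, j] = exp(-||s_i - s_j||^2 / (2 sigma^2)), A[i, i] = 0
    sq_dists = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: degree matrix D and normalized matrix L = D^(-1/2) A D^(-1/2)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ A @ D_inv_sqrt
    # Step 3: k largest eigenvectors of L (np.linalg.eigh returns eigenvalues in ascending order)
    _, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, -k:]
    # Step 4: row-normalize X to obtain Y
    Y = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # Steps 5-6: cluster the rows of Y with K-Means; image s_i goes to cluster labels[i]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(Y)
    return labels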
2.4.2. The proposed incremental clustering algorithm
There are many clustering algorithms, such as K-means and K-medoid, that are used in image retrieval methods. However, when a new image is added, these methods must recluster all the images. In incremental clustering algorithms, determining the cluster of a new object is the most important task. Below, we describe our proposed incremental clustering algorithm.
Assume that the data has a Gaussian distribution. In this algorithm, we treat each cluster as a group. During training, we estimate the center of each group and the covariance matrix. The task of determining the cluster of an object is then the problem of finding, for an input x_0, the cluster label
   ŷ_0 = argmax_y P(y | x_0)   (2.8)
However, P(y | x) is difficult to calculate directly, so instead we estimate it through P(x | y) and P(y). According to Bayes' rule, where G_i is the label of group i, we have:
   P(G_i | x) = P(x | G_i) P(G_i) / P(x)   (2.9)
   P(x) = ∑_j P(x | G_j) P(G_j)   (2.10)
Assume that P(x | G_i) is a multivariate normal distribution with density:
   P(x | G_i) = (2π)^(−m/2) |Σ|^(−1/2) exp( −(1/2) (x − μ_i)^T Σ^(−1) (x − μ_i) )   (2.11)
where:
   μ_i: mean of the inputs of group G_i;
   Σ: covariance matrix (common to all groups).
Suppose that we know:
   μ_i = (1/N_i) ∑_(x ∈ G_i) x   (2.12)
   P(G_i) = N_i / N   (2.13)
Note: formula (2.13) is the ratio of the number of training samples of group i to the total number of training samples.
At this point, we obtain the formula:
   P(G_i | x) = P(x | G_i) P(G_i) / P(x)   (2.14)
Since the denominator in (2.14) is independent of G_i, we can consider it a constant C and obtain:
   P(G_i | x) = C · P(x | G_i) P(G_i)   (2.15)
Replacing P(x | G_i) from (2.11) into (2.15), we get:
   P(G_i | x) = C · P(G_i) (2π)^(−m/2) |Σ|^(−1/2) exp( −(1/2) (x − μ_i)^T Σ^(−1) (x − μ_i) )   (2.16)
Because the factor (2π)^(−m/2) |Σ|^(−1/2) in (2.16) does not depend on G_i, we set C' = C (2π)^(−m/2) |Σ|^(−1/2) and we have:
   P(G_i | x) = C' P(G_i) exp( −(1/2) (x − μ_i)^T Σ^(−1) (x − μ_i) )   (2.17)
Taking the logarithm of both sides of (2.17), we get:
   log P(G_i | x) = log C' + log P(G_i) − (1/2) (x − μ_i)^T Σ^(−1) (x − μ_i)   (2.18)
In (2.18), the term log C' is the same for all groups G_i, so we are only interested in:
   l_i(x) = log P(G_i) − (1/2) (x − μ_i)^T Σ^(−1) (x − μ_i)   (2.19)
        = log P(G_i) − (1/2) [ x^T Σ^(−1) x − 2 μ_i^T Σ^(−1) x + μ_i^T Σ^(−1) μ_i ]   (2.20)
Thus, our goal is to maximize formula (2.20) over i.
In (2.20), since x^T Σ^(−1) x is independent of G_i, we consider it a constant and (2.20) is transformed into:
   l_i(x) = μ_i^T Σ^(−1) x − (1/2) μ_i^T Σ^(−1) μ_i + log P(G_i) + const   (2.21)
Ignoring the constant, we have the objective function:
   δ_i(x) = μ_i^T Σ^(−1) x − (1/2) μ_i^T Σ^(−1) μ_i + log P(G_i)   (2.22)
With an input x, we predict its label as G_i when δ_i(x) is the largest.
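The assignment rule (2.22) can be sketched in Python as follows; the estimation of the group means, the pooled covariance and the priors follows (2.12)-(2.13), while the use of a pseudo-inverse is an assumption of the sketch.

import numpy as np

def fit_groups(X, y):
    # Estimate per-group means, priors P(G_i) = N_i / N and the shared covariance matrix
    labels = np.unique(y)
    means = {i: X[y == i].mean(axis=0) for i in labels}
    priors = {i: float((y == i).mean()) for i in labels}
    centered = np.vstack([X[y == i] - means[i] for i in labels])
    cov_inv = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, priors, cov_inv

def assign_cluster(x, means, priors, cov_inv):
    # Return the group G_i maximizing delta_i(x) = mu_i' S^-1 x - 0.5 mu_i' S^-1 mu_i + log P(G_i)
    scores = {i: mu @ cov_inv @ x - 0.5 * mu @ cov_inv @ mu + np.log(priors[i])
              for i, mu in means.items()}
    return max(scores, key=scores.get)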
2.4.3. Improved distance function
The distance from an image to a multi-point query MQ = (Q_1, Q_2, ..., Q_n) is defined in formula (2.23) as the minimum of the weighted distances from the image to each query Q_i:
   D(x_j, MQ) = min_(i=1..n) ( w_ij · d_ij ),   with d_ij = Dist(x_j, Q_i)   (2.23)
In formula (2.23), Dist(x_j, Q_i), with i = 1..n and j = 1..k, is the distance from an image x_j to a query Q_i using the feature weights determined by the IF algorithm (Section 2.4.5), and w_ij is the semantic weight combined with the distance d_ij (the calculation of the semantic weight is given in formula (2.24)).
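A minimal sketch of formula (2.23), assuming a weighted Euclidean distance per query and taking the feature weights and semantic weights as given inputs, could look as follows.

import numpy as np

def multipoint_distance(x, queries, feature_weights, semantic_weights):
    # D(x, MQ) = min over i of ( w_i * Dist(x, Q_i) ), formula (2.23)
    weighted = []
    for q, fw, sw in zip(queries, feature_weights, semantic_weights):
        d = np.sqrt(np.sum(fw * (x - q) ** 2))   # feature-weighted Euclidean distance to query Q_i
        weighted.append(sw * d)                  # combine with the semantic weight of Q_i
    return min(weighted)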
2.4.4. Calculating the semantic weight of the query
The proposal relies on the observation that a cluster containing many semantically related images is more important than the remaining clusters; therefore, the query generated from that cluster has a higher semantic weight than the queries generated from the remaining clusters. The semantic weight w_ij combined with the distance d_ij from an image x_j to the query Q_i (which belongs to semantic cluster i) is thus the ratio of the number of semantically related images in cluster i to the total number of related images of the n semantic clusters:
   w_ij = N_i / ∑_(t=1)^(n) N_t   (2.24)
where N_i is the number of semantically related images in cluster i. The weights satisfy the condition ∑_(i=1)^(n) w_ij = 1.
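For illustration, the semantic weight of formula (2.24) reduces to a ratio of cluster sizes, as in the following sketch.

def semantic_weights(cluster_sizes):
    # w_i = N_i / sum_t N_t, formula (2.24); the weights sum to 1
    total = sum(cluster_sizes)
    return [n / total for n in cluster_sizes]

For example, three clusters with 6, 3 and 1 related images receive the weights 0.6, 0.3 and 0.1.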
2.4.5. The algorithm for calculating the feature importance
The main idea of determining the feature importance is based on user feedback. When a user marks some images as semantically related to the query image, we cluster these semantically related images and consider each cluster as follows: each image in the cluster is a point in the multi-feature space, and these points lie near each other in that space. The shape covering these points is projected onto the axes corresponding to the features, and we then calculate the variance of the points along each axis (a large dispersion of the data along an axis of the feature space means that the importance of that axis is low). Thus, the importance of each feature in the multi-feature space is the inverse of the variance of the points along that axis.
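A minimal sketch of this inverse-variance rule is shown below; the small constant that avoids division by zero and the final normalization of the weights are assumptions of the sketch.

import numpy as np

def feature_importance(cluster_points, eps=1e-9):
    # cluster_points: N x m matrix of the feedback images of one cluster
    variances = np.var(cluster_points, axis=0)   # dispersion of the points along each feature axis
    weights = 1.0 / (variances + eps)            # large dispersion -> low importance
    return weights / weights.sum()               # normalized feature weights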
2.4.6. The Combination algorithm for combining the result lists
With the query lists Q_1, Q_2, ..., Q_n and their respective result lists R_1, R_2, ..., R_n as input, each R_i contributes its first k ranked images I_Ri1, I_Ri2, ..., I_Rik. The result lists are then combined to obtain the combined result R.
Proposition 1. [Complexity of the algorithm Combination]:
The complexity of the algorithm Combination is O(nk), where n is the number of
combined lists and k is the number of returned images of each list.
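One possible merge in the spirit of the Combination algorithm is sketched below; it assumes each result list holds (image id, distance) pairs sorted by distance and keeps the smallest distance for an image that appears in several lists. The single pass over the n*k entries keeps the cost linear in nk, apart from the final sort of the kept entries.

def combine_result_lists(result_lists, k):
    best = {}
    for lst in result_lists:                     # n lists, each truncated to its first k entries
        for image_id, dist in lst[:k]:
            if image_id not in best or dist < best[image_id]:
                best[image_id] = dist            # keep the smallest distance per image
    merged = sorted(best.items(), key=lambda item: item[1])
    return [image_id for image_id, _ in merged[:k]]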
2.4.7. Semantic-related image retrieval (SRIR) algorithm
In this section, the thesis proposes an algorithm called SRIR (Semantic-Related Image Retrieval), which does not require users to provide multiple queries.
SRIR algorithm
Input:
   Image database: DB
   Query image: Q
   Number of retrieved images after each iteration: k
   Feature space: F
   Number of features: m
Output:
   Set of result images: R
// Initialization: build a query in each of the four feature spaces FC+, FC-, FG+ and FG-
// (the four representations of Q) and retrieve k images in each of them
for each feature space FS in {FC+, FC-, FG+, FG-} do
   P_MQFS ← {representation of Q in FS}; W_MQFS ← {1}; D_MQFS ← distance of FS;
   s_FS ← Retrieve(MQ_FS, DB, k);
US ← Combination(s_FC+, s_FC-, s_FG+, s_FG-);            // initial result set
repeat
   US ← US ∪ Feedback(US);                                // add the images the user marks as relevant
   CL ← Cluster(US);                                      // CISE in the first iteration, incremental clustering afterwards
   for i ← 1 to n do                                      // n: number of clusters in CL
      c_i ← centroid of cluster C_i ∈ CL;
      P_MQi ← c_i;
      W_MQi ← feature weights of C_i (Section 2.4.5) and semantic weight w_i (formula (2.24));
      D_MQi ← improved distance function (2.23);
      R_i ← Retrieve(MQ_i, DB, k);
   R ← Combination(R_1, R_2, ..., R_n);
until (user stops responding);
return R;
Proposition 2. [Complexity of the algorithm SRIR]:
The complexity of the algorithm SRIR is O(N), where N is the size of the image database.
2.5. Experimental evaluation
2.5.1 Test environment
The database used for the test is a subset of Corel. This set includes 3400 images.
2.5.3. Query implementation and evaluation
To test the accuracy of the proposed method, all 3400 images were used as queries. The average precision at 150 returned images is used for evaluation. In Table 2.2, four different methods are compared, namely Basic C+, JF, MMRF and SRIR, at 1, 4, 8, 12, 16 and 20 feedback queries.
Table 2.2. Precision of the four methods according to the number of queries in one feedback round.

Method      1 query   4 queries   8 queries   12 queries   16 queries   20 queries
Basic C+    0.20      0.22        0.23        0.24         0.245        0.25
JF          0.24      0.29        0.31        0.33         0.34         0.35
MMRF        0.243     0.31        0.315       0.323        0.334        0.365
SRIR        0.36490   0.39789     0.40035     0.40241      0.40360      0.40385
The experimental results are shown in Fig. 2.11. The horizontal axis indicates the number of clusters (1, 4, 8, 12, 16 or 20) and the vertical axis indicates the precision. The four methods Basic C+, JF, MMRF and SRIR are shown as four curves.
We can draw conclusions from Fig. 2.11. The system precision (the vertical axis) increases with the number of clusters (the horizontal axis): the more clusters we use to retrieve, the higher the system performance. We also found that the precision of the SRIR method is better when the number of clusters is between 1 and 8, specifically 54.73% at 1, 59.68% at 4 and 60.05% at 8 clusters.
Figure 2.11. Comparison of the accuracy of the four methods.
In the SRIR method, the performance curve increases rapidly from 1 to 8 clusters and increases slowly in the range of 12 to 20 clusters, because 8 clusters already cover most of the clusters in the feature space. Although the Jin & French method [49] also increases rapidly in the range of 1 to 8 queries, our approach has a much higher precision without increasing the retrieval time. The main reason is that, even with only 1 to 8 clusters, our proposed method exploits the semantic information from the user feedback better.
2.6. Conclusion of chapter 2
The thesis focused on proposing a method, called SRIR, that solves four main problems: (1) using only one query to retrieve a diverse initial result set that covers images across the entire feature space (reducing the burden on users, who do not have to select multiple query images); (2) clustering the related images at a low time cost; (3) determining the semantic importance of each query; and (4) determining the importance of each feature.
Our experimental results on the feature database of 3400 images have shown that the proposed SRIR method offers a significantly higher precision than the Basic C+, JF and MMRF methods.
Chapter 3. AN EFFICIENT IMAGE RETRIEVAL METHOD USING
ADAPTIVE WEIGHTS
3.1. Introduction
Chapter 2 of the thesis presented the image retrieval method [CT5], which can retrieve semantically related images spread over the entire feature space with high precision. However, that method still has two limitations.
First, it does not fully exploit the feedback information (the relevance level of each image) to identify good query points. For example, Figure 3.1 shows the typical interface of existing systems. In this interface, users can only check the box at the top of an image (if the image is relevant) or leave it unchecked (if the image is irrelevant), even though a user may rate the image with ID pl_flower\84059 as more relevant than the image with ID pl_flower\476083.
Figure 3.1. Typical interface of a CBIR system with relevance feedback.
Second, the methods above consider the regions containing the different good query points to be equal and assign the same weight to all points adjacent to each good query point. This is not appropriate because different regions often have different attributes.
Figure 3.2. Illustration of two good-query regions treated as equal. (a) Left: the region of the first query point. (b) Right: the region of the second query point.
Based on this observation, the thesis proposes an image retrieval method using adaptive weights, named AWEIGHT (An efficient image retrieval method using adaptive weights) [CT6]. In this method, instead of using the same weight vector for the regions that contain different good query points, the method automatically calculates the good query points and the good weight vectors corresponding to the regions containing them, based on the user's feedback.
In addition, previous methods cluster all feedback images, so their computational complexity is high. To address this limitation, the proposed method clusters the feedback images only in the first iteration; from the second iteration onwards, it simply adds the new feedback images to the existing clusters (see Section 2.3 of Chapter 2).
As shown in Fig. 3.3, the main difference between our proposed method and the existing relevance feedback image retrieval methods lies in three main components: (a) determining the optimal query points; (b) computing the weight vectors; (c) computing the improved distance functions. These components can be embedded in any relevance feedback image retrieval system, so we describe each of them separately.
Figure 3.3. Diagram of the image retrieval system using adaptive weights.
(The components shown in Figure 3.3 are: Initial Query, Search Engine, Result set, Relevant set, Training set, Initial Cluster, Incremental Cluster, and the AWEIGHT block, which determines the good query points, computes the weight vectors and computes the improved distance functions.)
3.2. The algorithm for determining the good query point and the adaptive weights of the improved distance function
In this section, the thesis presents the proposed technique for determining the optimal query point and the adaptive weights of the distance function. The technique works on a given cluster of images; in the case of multiple clusters, it is performed for each cluster.
Given a cluster i (i = 1, ..., g), each image in cluster i is represented by a vector x_j^i with j = 1..N_i, where N_i denotes the number of images of the i-th cluster, and the images form the matrix X^i = [x_1^i, ..., x_(N_i)^i]. Suppose the optimal query vector for cluster i is q_i = (q_i1, ..., q_im). Assume that the user's evaluation information in terms of relevancy for each x_j^i (j = 1, ..., N_i) is π_j^i ≥ 0, so that the vector π^i = (π_1^i, ..., π_(N_i)^i) represents the user's feedback on the relevance level of each image in cluster i. The problem of finding the optimal query point q_i and the weight matrix W_i is stated as the following penalty-minimization problem:
   min over q_i, W_i of  ∑_(j=1)^(N_i) π_j^i (x_j^i − q_i)^T W_i (x_j^i − q_i)   (3.1)
   subject to: det(W_i) = 1
where det(W_i) is the determinant of the matrix W_i (the constraint avoids the case where W_i is a zero matrix).
To solve the minimization problem, we use the method of Lagrange multipliers.
- As a result, the optimal query point is the relevance-weighted mean of the images in the cluster:
   q_i = ( ∑_(j=1)^(N_i) π_j^i x_j^i ) / ( ∑_(j=1)^(N_i) π_j^i )   (3.2)
- If (C^(i))^(−1) exists, the weight matrix is:
   W_i = [det(C^(i))]^(1/m) (C^(i))^(−1)   (3.3)
with C^(i) the weighted covariance matrix of the images in cluster i, whose entries are:
   C_rs^(i) = ( ∑_(j=1)^(N_i) π_j^i (x_jr^i − x̄_r^i)(x_js^i − x̄_s^i) ) / ( ∑_(j=1)^(N_i) π_j^i )   (3.4)
where x̄^i is the weighted mean of the images in cluster i, which by (3.2) equals q_i.
From the optimal query vector q_i and the weight matrix W_i, we obtain the distance function:
   D(p, q_i) = (p − q_i)^T W_i (p − q_i)   (3.5)
Let Cpf(q_i) be the list of points in the cluster of positive feedback samples corresponding to the optimal query point q_i, i.e., the list of points in the corresponding ellipse, and let N_k(p) be the list of the k points nearest to a point p. Then C = Cpf(q_i) ∩ N_k(p) contains the positive feedback points in the k-neighborhood of p. Our proposed distance formula (3.6) combines the distance D(p, q_i) from a point p to the optimal query point q_i given by (3.5) with the distances d(p, p') from p to the positive feedback points p' in C.
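A minimal Python sketch of formulas (3.2)-(3.5), assuming the standard Lagrange-multiplier solution of (3.1) and an invertible weighted covariance matrix, is given below; it is an illustration rather than the thesis implementation.

import numpy as np

def optimal_query_and_weights(X, pi):
    # X: N_i x m matrix of the feedback images of one cluster, pi: their relevance scores
    pi = np.asarray(pi, dtype=float)
    q = (pi[:, None] * X).sum(axis=0) / pi.sum()                 # (3.2) weighted mean of the images
    centered = X - q
    C = (pi[:, None, None] * (centered[:, :, None] * centered[:, None, :])).sum(axis=0) / pi.sum()  # (3.4)
    m = X.shape[1]
    W = (np.linalg.det(C) ** (1.0 / m)) * np.linalg.inv(C)       # (3.3) det(W) = 1 by construction
    return q, W

def cluster_distance(p, q, W):
    # (3.5) generalized distance from a point p to the optimal query point q
    diff = p - q
    return float(diff @ W @ diff)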
3.3. An efficient image retrieval method using adaptive weights
The thesis proposes an image retrieval algorithm, named AWEIGHT, that uses the optimal query points, the adaptive weight matrices and the improved distance functions.
AWEIGHT algorithm
Input:
   Image set: S
   Initial query: Q_initial
   Number of retrieved images after each iteration: k
Output:
   The result set: Result(Q_opt)
1. Result(Q_initial) ← Retrieve(Q_initial, S, k);
2. Relevant(Q_initial, N) ← Feedback(Result(Q_initial));        // the user marks N relevant images
3. CISE(Relevant(Q_initial, N), g, X);                          // cluster the relevant images into g clusters X_1, ..., X_g
4. D ← {parameters of the g clusters};
5. repeat
   5.1 for i ← 1 to g do
         (q_i, W_i) ← FQM(X_i, π_i);                            // optimal query point and weight matrix of cluster i (Section 3.2)
   5.2 Result(Q_opt) ← Retrieve(<g, {q_1, ..., q_g}, {W_1, ..., W_g}, D_imp, S, k>);
   5.3 Relevant(Q_opt') ← Feedback(Result(Q_opt));
   5.4 for each x_j ∈ Relevant(Q_opt') do
         i ← INC(D, x_j);                                       // incremental clustering: determine the cluster of x_j
         Add(x_j, X_i);
   until (user stops responding);
6. Return Result(Q_opt);
3.4. Experimental results
3.4.1. Test environment
The set of images: The database used for the test is a subset of Corel. This set includes 10,800 images.
3.4.2. Evaluation
In our experiment, the parameters were chosen as follows. The performance of the system is evaluated on the COREL database of 10,800 images, and all images in the database were selected as query images. We evaluate our method based on the average accuracy over the 10,800 query images. Each query returns 100 images; we chose 100 returned images per query because users usually view only about two screens and each screen contains 50 images.
Table 3.2. The average accuracy in three feedbacks of the compared methods.

Method      2 query points   4 query points   8 query points
CRF         0.2387           0.3065           0.3199
DSSA        0.3135           0.42658          0.4846
WATH        0.2856           0.3763           0.4218
AWEIGHT     0.3324           0.48658          0.5125
Table 3.2 shows the average accuracy of the four methods CRF, DSSA, WATH and our AWEIGHT method at the second feedback loop with 2, 4 and 8 query points. With two query points, our method achieves higher accuracy than the three methods CRF, DSSA and WATH.