Users often want to retrieve images whose low-level features are different but whose high-level semantics are the same, for example, finding all roses (including red, yellow and white roses) in the image database. This chapter and the next chapter of the thesis propose methods to solve this problem: finding images that may have different low-level features but share the same high-level semantic concept (i.e. the same topic).
The similarity between images as perceived by people (i.e. semantic relevance) differs from their similarity in the feature space. That is, semantically related images can be scattered throughout the feature space, over several clusters rather than a single cluster. In this case, the traditional methods [2,29,61,74] that use the feedback approach do not work well, because these methods rely on the single-point query approach.
Performing feedback involves calculating one or more new query points in the feature space and changing the distance function. The methods that follow the feedback approach with separate (multi-point) queries have the advantage of obtaining semantically related images scattered throughout the feature space. However, these methods have the following limitations:
(1) They require users to provide multiple query images simultaneously; for example, to query the rose theme, users must provide images of red roses, yellow roses, white roses, ... as the query. If this condition is not met, the initial retrieval result will contain only images in a certain region and will miss relevant images in other regions.
If the user only provides the system with yellow rose images, the initial retrieval results may include only yellow rose images and ignore white and red roses. The reason is that, in traditional image retrieval systems, images with similar low-level feature vectors lie close together (in the same low-level feature cluster). On the initial result list, which consists of yellow roses, the user can only select yellow roses, and the system will rely on these responses to continue the retrieval. The next retrieval phase will therefore move toward the yellow regions and can only return yellow roses. The red and white rose regions are ignored, so the accuracy of the system remains limited no matter how good the subsequent retrieval phases are.
(2) The number of queries for the next iteration depends on the number of related images provided by the user; therefore, there are two unfavorable possibilities. First, the user chooses too few feedback images (fewer than the number of clusters in the feature space). In this case, the accuracy of the system is not guaranteed because, according to clustering theory, more queries cover more clusters. Second, the user chooses too many feedback images. This increases the burden of aggregating the result lists (each query has its own result list). In addition, too many queries do not improve the accuracy of the system (experiments in [49] have shown that accuracy increases rapidly from 1 to 8 queries and increases slowly when the number of queries grows from 8 to 20). For example, in the Corel database the rose-theme images are scattered over only 4 clusters (each cluster corresponds to one color of rose).
(3) The queries are weighted equally, that is, all queries are treated as equally important even though each query has different neighbors.
(4) The features are weighted equally even though each feature component has a different importance.
These limitations are the main reason leading to the low accuracy of the
retrieval system.
Based on the analysis of the limitations of the available methods, the thesis
proposes a semantic image retrieval method.
The proposed method has advantages:
(1) It uses only one query to create a diverse initial retrieval result that includes images located in different regions (reducing the burden on users).
(2) It clusters the related images at a low time cost.
(3) It identifies the semantic importance of each query.
(4) It determines the importance of each feature.
These four advantages have been expressed in the method that the author has
published in [CT5, CT6].
2.2. The proposed method diagram
Based on the analysis in Section 2.1, the thesis proposes the diagram of the
method as shown in Figure 2.5.
8
Figure 2.5. The structure of the proposed method.
The following sections of the thesis describe the proposed method in detail. Because they rely on a few definitions, the thesis introduces those definitions here.
Definition 2.1 (Feature set). A feature set F consists of N feature vectors, each consisting of m components, each of which is a real value:
F = { f_t = (f_t1, f_t2, ..., f_tm) | f_tj ∈ R, t = 1..N }.   (2.1)
Definition 2.2 (Feature space). A feature space FS consists of m dimensions, each corresponding to a real component of the feature vector f_t (t = 1..N) of the feature set F. Each point p_t (t = 1..N) in the space FS corresponds to a feature vector in F:
FS = { p_t = (f_t1, f_t2, ..., f_tm) | t = 1..N }.   (2.2)
Definition 2.3 (i-th feature space). The i-th feature space, denoted FS_i, is a feature space of n_i dimensions. Each point in this space is denoted p_t^i (t = 1..N) with n_i coordinates:
FS_i = { p_t^i = (f_t1^i, f_t2^i, ..., f_t,n_i^i) | t = 1..N }.   (2.3)
Definition 2.4 (Distance between two points in the feature space FS_i). The distance between two points p_k^i and p_l^i (k, l = 1..N, k ≠ l), denoted D_i(p_k^i, p_l^i), is a measure of some distance.
The main idea of the proposed method is not to place the images (both the database images and the query image) in a single feature space but in different feature spaces (in the context of this chapter, the thesis maps each representation of an image into a corresponding feature space), then to perform retrieval by querying each of these feature spaces and to merge the results corresponding to the feature spaces into one final result.
(Figure 2.5 consists of the blocks: Query, Multiple representations, Feature vectors, Feature Database, Similarity Comparison, Result set, Feedback set, Sort, Incremental Cluster, Multi-point query, Importance of each query, Importance of each feature, Retrieve, and Result.)
The reason the proposed method can obtain images that are scattered in the original color space is that the images are also converted to a grayscale representation. In this representation, the shape and texture characteristics are not overwhelmed by color. An image of a rose (in its grayscale representation) is mapped to a point in the corresponding feature space; because color is not included, images of the same subject (for example yellow, white and red roses) lie close to each other in this space. Therefore, the proposed method can obtain red, white and yellow rose images for a red rose query image.
At this point, the retrieval process matches the query image and the database images in each individual feature space to get a list of results. Thus, we have four result lists (one per representation). Next, the four result lists are combined to obtain a final list of results.
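To make the multiple-representation idea concrete, the following is a minimal sketch (not the thesis's actual feature extractor, whose four representations are not spelled out in this summary) that maps one image into two hypothetical feature spaces: a quantized color histogram and a histogram of the grayscale representation.

import numpy as np

def color_histogram(image, bins=8):
    """Quantized RGB histogram: one coarse feature vector per image.
    `image` is an H x W x 3 uint8 array; the result has bins**3 components."""
    idx = (image // (256 // bins)).reshape(-1, 3).astype(int)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def gray_histogram(image, bins=64):
    """Histogram of the grayscale representation, so color no longer dominates."""
    gray = image.mean(axis=2)                     # simple luminance approximation
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return hist / hist.sum()

# Each representation defines its own feature space; the query and every
# database image are mapped into each space and retrieved separately.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
spaces = {"color": color_histogram(img), "gray": gray_histogram(img)}
print({name: vec.shape for name, vec in spaces.items()})

Under this reading, retrieval runs once per space and the per-space result lists are merged afterwards, as described above.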
2.3. Relevance feedback with multi-point query
The original approach to content-based image retrieval does not match users' perception of visual similarity. To address this problem, several image retrieval methods using relevance feedback have been proposed. There are two components to learn in the relevance feedback approach: the distance function and the new query points. The distance function is changed by learning the weights of the feature components, and new query points are obtained by learning the desired points that the users need.
2.4. The proposed image retrieval algorithm
Definition 2.5 (Multi-point query):
A multipoint query MQ=, with nMQ denoting
the number of query points in MQ, PMQ={PMQ1,,PMQn} is the set of nMQ query
points in the DB search space, WMQ={wMQ1,,wMQn} is the set of weights
associated with PMQ (the thesis assumes that the weights are normalized, ie,
∑
), DMQ is the distance that when we give any two points pi and pj in
the feature space, it will return the distance between them and k is the number of
points to be retrieved in each iteration.
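A minimal data structure mirroring Definition 2.5; the field names are illustrative and not the thesis's notation.

from dataclasses import dataclass
from typing import Callable, List
import numpy as np

@dataclass
class MultiPointQuery:
    """Container for MQ = <n_MQ, P_MQ, W_MQ, D_MQ, k> (Definition 2.5)."""
    points: List[np.ndarray]                              # P_MQ: query points
    weights: List[float]                                  # W_MQ: assumed to sum to 1
    distance: Callable[[np.ndarray, np.ndarray], float]   # D_MQ
    k: int                                                # images retrieved per iteration

    @property
    def n_mq(self) -> int:
        return len(self.points)

    def __post_init__(self):
        assert abs(sum(self.weights) - 1.0) < 1e-9, "weights must be normalized"

# Example: a two-point query with Euclidean distance, 20 results per iteration.
mq = MultiPointQuery(
    points=[np.zeros(8), np.ones(8)],
    weights=[0.6, 0.4],
    distance=lambda a, b: float(np.linalg.norm(a - b)),
    k=20,
)
print(mq.n_mq)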
2.4.1. Cluster algorithm for feedback images set
The algorithm below describes the initial clustering algorithm that uses k
eigenvectors, named CISE (Clustering Images Set using Eigenvectors).
Clustering Images Set using k Eigenvectors (CISE)
Input:  - Image set S = {s1, s2, ..., sn} with si ∈ R^m
        - Number of clusters k
Output: k clusters: C1, C2, ..., Ck
1. Form the affinity matrix A
   for i ← 1 to n do
      for j ← 1 to n do
         if (i ≠ j) then A_ij ← exp(-||si - sj||^2 / (2σ^2))
         else A_ij ← 0
2. Construct the diagonal matrix D and the Laplacian matrix L
   for i ← 1 to n do
      D_ii ← ∑_{j=1..n} A_ij
   L ← D^{-1/2} A D^{-1/2}
3. Compute the k largest eigenvectors x1, x2, ..., xk of the Laplacian matrix L
   X ← [x1, x2, ..., xk]   (the eigenvectors as columns, X ∈ R^{n×k})
4. Construct the matrix Y from X by normalizing each row of X to unit length
   for i ← 1 to n do
      for j ← 1 to k do
         y_ij ← x_ij / (∑_{t=1..k} x_it^2)^{1/2}
   Y ← [y1, y2, ..., yk]
5. Form k clusters by applying K-Means to the rows of Y (each row is a point in R^k)
6. Assign the images to clusters
   for i ← 1 to n do
      if row i of Y was assigned to cluster j by K-Means then assign si to Cj
Return C1, C2, ..., Ck
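A hedged NumPy sketch of the CISE steps above. K-Means is inlined to keep the example self-contained, and the scale parameter sigma, the iteration count and the random seed are assumptions not fixed by the thesis.

import numpy as np

def cise(S, k, sigma=1.0, iters=100, seed=0):
    """Affinity matrix, normalized Laplacian, k largest eigenvectors,
    row normalization, then K-Means on the embedded points."""
    n = S.shape[0]
    # 1. Affinity matrix A (zero diagonal).
    sq = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Degree matrix D and L = D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ A @ D_inv_sqrt
    # 3. k largest eigenvectors of L (eigh returns ascending eigenvalues).
    vals, vecs = np.linalg.eigh(L)
    X = vecs[:, np.argsort(vals)[::-1][:k]]
    # 4. Normalize each row of X to unit length.
    Y = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # 5-6. Plain K-Means on the rows of Y; each row's cluster is its image's cluster.
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

# Usage: cluster 30 random 16-dimensional feature vectors into 3 groups.
feats = np.random.default_rng(1).normal(size=(30, 16))
print(cise(feats, k=3)[:10])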
2.4.2. The proposed incremental clustering algorithm
There are many clustering algorithms, such as K-means and K-medoid, that are used in image retrieval methods. However, when a new image is added, these methods must re-cluster all the images. In incremental clustering algorithms, determining the cluster of a new object is the most important task. Below, we describe our proposed incremental clustering algorithm.
Assume that the data has a Gaussian distribution. In this algorithm, we treat each cluster as a group. During training, we estimate the center of each group and the covariance matrix. The task of determining the cluster of an object is considered as the problem of finding an estimate such that, for an input x, its cluster label ŷ_0 is identified by:
ŷ_0 = argmax_y P(y | x)   (2.8)
However, P(y | x) is difficult to calculate, so instead of calculating it directly we estimate it through P(x | y) and P(y). According to Bayes' rule, where y = i is the label of a group, we have:
P(y = i | x) = P(x | y = i) P(y = i) / P(x)   (2.9)
P(x) = ∑_j P(x | y = j) P(y = j)   (2.10)
Assume that P(x | y = i) is a multivariate normal distribution with density:
P(x | y = i) = 1 / ((2π)^{m/2} |Σ|^{1/2}) · exp( -1/2 (x - μ_i)^T Σ^{-1} (x - μ_i) )   (2.11)
where:
μ_i: mean of the inputs for group i
Σ: covariance matrix (common to all groups)
Suppose that we know:
μ_i and Σ, estimated from the training samples   (2.12)
P(y = i) = n_i / n   (2.13)
Note: formula (2.13) is the ratio of the number of training samples of group i to the total number of training samples.
At this point, we obtain the formula:
P(y = i | x) = P(x | y = i) P(y = i) / ∑_j P(x | y = j) P(y = j)   (2.14)
Since the denominator in (2.14) is independent of i, we can consider it a constant C and obtain:
P(y = i | x) = C · P(x | y = i) P(y = i)   (2.15)
Replacing P(x | y = i) from (2.11) into (2.15), we get:
P(y = i | x) = C · 1 / ((2π)^{m/2} |Σ|^{1/2}) · exp( -1/2 (x - μ_i)^T Σ^{-1} (x - μ_i) ) · P(y = i)   (2.16)
Because (2π)^{m/2} |Σ|^{1/2} in (2.16) does not depend on i, we set C' = C / ((2π)^{m/2} |Σ|^{1/2}) and have:
P(y = i | x) = C' · exp( -1/2 (x - μ_i)^T Σ^{-1} (x - μ_i) ) · P(y = i)   (2.17)
and, taking the logarithm of both sides of (2.17), we get:
log P(y = i | x) = log C' - 1/2 (x - μ_i)^T Σ^{-1} (x - μ_i) + log P(y = i)   (2.18)
In (2.18), the term log C' is the same for all groups i, so we are only interested in:
δ_i(x) = -1/2 (x - μ_i)^T Σ^{-1} (x - μ_i) + log P(y = i)   (2.19)
       = -1/2 x^T Σ^{-1} x + x^T Σ^{-1} μ_i - 1/2 μ_i^T Σ^{-1} μ_i + log P(y = i)   (2.20)
Thus, our goal is to maximize the formula (2.20) in i.
In (2.20), since -1/2 x^T Σ^{-1} x is independent of i, we consider it a constant C'' and (2.20) is transformed into:
δ_i(x) = C'' + x^T Σ^{-1} μ_i - 1/2 μ_i^T Σ^{-1} μ_i + log P(y = i)   (2.21)
Ignoring the constant C'', we have the objective function:
δ_i(x) = x^T Σ^{-1} μ_i - 1/2 μ_i^T Σ^{-1} μ_i + log P(y = i)   (2.22)
With an input x, we predict its label as i when δ_i(x) is the largest.
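The objective (2.22) is the standard linear discriminant with a covariance matrix shared by all groups. Below is a minimal NumPy sketch, assuming the group means, shared covariance and priors are estimated from the already-clustered feedback images; the small regularization term is an added assumption to keep the covariance invertible.

import numpy as np

def fit_groups(X, y):
    """Per-group mean mu_i and prior P(y=i) = n_i / n, plus a covariance
    matrix common to all groups (formulas 2.11-2.13)."""
    labels = np.unique(y)
    n, m = X.shape
    mus = {i: X[y == i].mean(axis=0) for i in labels}
    priors = {i: float((y == i).sum()) / n for i in labels}
    centered = np.vstack([X[y == i] - mus[i] for i in labels])
    cov = centered.T @ centered / n + 1e-6 * np.eye(m)   # shared, regularized
    return mus, priors, np.linalg.inv(cov)

def assign_cluster(x, mus, priors, cov_inv):
    """Pick the group maximizing the discriminant (2.22):
    x^T S^-1 mu_i - 0.5 mu_i^T S^-1 mu_i + log P(y=i)."""
    scores = {i: x @ cov_inv @ mu - 0.5 * mu @ cov_inv @ mu + np.log(priors[i])
              for i, mu in mus.items()}
    return max(scores, key=scores.get)

# Usage: two Gaussian groups, then assign a new feedback image's feature vector.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])
y = np.array([0] * 40 + [1] * 40)
mus, priors, cov_inv = fit_groups(X, y)
print(assign_cluster(rng.normal(3, 1, 5), mus, priors, cov_inv))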
2.4.3. Improved distance function
The distance from an image I_j to the multi-point query MQ = (Q_1, Q_2, ..., Q_n) in formula (2.23) is the minimum of the weighted distances from the image to each query Q_i:
D(I_j, MQ) = min_{i=1..n} ( w_ij · d_ij )   (2.23)
In formula (2.23), d_ij = Dist(I_j, Q_i), with i = 1..n and j = 1..k, is the distance from an image I_j to a query Q_i computed with the feature weights (determined by the algorithm IF of Section 2.4.5), and w_ij is the semantic weight combined with the distance d_ij (see formula (2.24) for the way to calculate the semantic weight).
2.4.4. Calculate the semantic weight of the query
The proposal relies on the observation that a cluster containing many semantically related images is more important than the remaining clusters. Therefore, the queries generated from that cluster have a higher semantic weight than those of the remaining clusters. The semantic weight w_ij combined with the distance d_ij from image I_j to query Q_i (which belongs to semantic cluster i) is the ratio of the number of semantically related images in cluster i to the total number of related images over the n semantic clusters:
w_ij = r_i / ∑_{t=1..n} r_t   (2.24)
where r_i is the number of semantically related images in cluster i. The weights need to satisfy the condition ∑_{i=1..n} w_ij = 1.
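A small sketch combining (2.23) and (2.24). The weighted Euclidean form of d_ij is an assumption, since the summary does not spell out the base distance, and the numbers are purely illustrative.

import numpy as np

def semantic_weights(related_counts):
    """Formula (2.24): the weight of cluster i is its share of the total
    number of semantically related (positive feedback) images; weights sum to 1."""
    counts = np.asarray(related_counts, dtype=float)
    return counts / counts.sum()

def multipoint_distance(img, queries, feature_weights, sem_weights):
    """Formula (2.23): minimum over queries of the semantically weighted,
    feature-weighted distance from `img` to each query point."""
    dists = [np.sqrt(np.sum(fw * (img - q) ** 2))          # weighted Euclidean d_ij
             for q, fw in zip(queries, feature_weights)]
    return min(w * d for w, d in zip(sem_weights, dists))

# Usage: 3 clusters with 5, 3 and 2 related images.
w = semantic_weights([5, 3, 2])                 # -> [0.5, 0.3, 0.2]
queries = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
fw = [np.ones(4)] * 3                           # per-feature importance (Section 2.4.5)
print(multipoint_distance(np.array([0.5, 0.5, 0.5, 0.5]), queries, fw, w))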
2.4.5. Calculation algorithm for the feature importance
The main idea of determining the feature importance is based on user feedback. When a user marks some images as semantically related to the query image, we cluster these semantically related images and consider each cluster as follows: each image in the cluster is a point in the multi-feature space, and these points are located near each other in that space. A shape covering these points is projected onto the axes corresponding to the features, and we then calculate the variance of these points along each axis (a large dispersion of the data along an axis means that the importance of that axis is low). Thus, the importance of each feature in the multi-feature space is the inverse of the variance of the points along that axis.
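A minimal sketch of the inverse-variance rule described above; the normalization to a unit sum and the epsilon guard are assumptions.

import numpy as np

def feature_importance(cluster_points, eps=1e-12):
    """Importance of each feature axis = inverse of the variance of the
    cluster's points along that axis (large spread -> low importance),
    normalized so the importances sum to 1."""
    var = np.var(np.asarray(cluster_points, dtype=float), axis=0)
    inv = 1.0 / (var + eps)          # eps guards against zero variance
    return inv / inv.sum()

# Usage: a cluster that is tight on feature 0 and spread out on feature 1
# gets most of its importance on feature 0.
pts = np.array([[1.00, 0.1], [1.01, 0.9], [0.99, 0.5], [1.02, 0.2]])
print(feature_importance(pts))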
2.4.6. The Combination algorithm performing the combination of result lists
With query lists Q_1, Q_2, ..., Q_n and respective result lists R_1, R_2, ..., R_n as input, each R_i keeps its first k ranked images I_Ri1, I_Ri2, ..., I_Rik. The result lists then need to be combined to obtain the final result list R.
Proposition 1. [Complexity of the algorithm Combination]:
The complexity of the algorithm Combination is O(nk), where n is the number of
combined lists and k is the number of returned images of each list.
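The summary does not fix the exact merging rule, so the following sketch is one reasonable choice: keep, for every image, its best (smallest) distance across the lists and return the k best overall. Scanning the lists costs O(nk); the final ordering of the at most nk kept entries adds a sort on top of that.

def combine_result_lists(result_lists, k):
    """Merge n ranked result lists (each holding (image_id, distance) pairs)
    into one list of k image ids, ranked by the best distance seen."""
    best = {}
    for lst in result_lists:
        for image_id, dist in lst[:k]:
            if image_id not in best or dist < best[image_id]:
                best[image_id] = dist
    ranked = sorted(best.items(), key=lambda item: item[1])
    return [image_id for image_id, _ in ranked[:k]]

# Usage: two lists that partially overlap.
r1 = [("a", 0.10), ("b", 0.20), ("c", 0.30)]
r2 = [("b", 0.05), ("d", 0.15), ("a", 0.40)]
print(combine_result_lists([r1, r2], k=3))      # -> ['b', 'a', 'd']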
2.4.7. Semantic-related image retrieval (SRIR) algorithm
In this section, the thesis proposes an algorithm called SRIR (Semantic-Related Image Retrieval), which does not require users to provide multiple queries.
SRIR algorithm
Input:
   Set of image database: DB
   Query image: Q
   Number of retrieved images after each iteration: k
   Feature space: F
   Number of features: m
Output:
   Set of result images: R
// Initialization: build four initial multi-point queries (denoted C+, C-, G+
// and G-, one per representation of the query Q) and retrieve an initial
// result list from each of them.
for each initial query MQ_t ∈ {C+, C-, G+, G-} do
   P_MQt ← the point of Q in the corresponding feature space
   W_MQt ← {1};  D_MQt ← the distance of that feature space
   s_t ← Retrieve(MQ_t, DB, k)
S ← Combination(s_1, s_2, s_3, s_4)
US ← Feedback(S)                  // images the user marks as relevant
repeat
   US ← US ∪ Feedback(S)
   CL ← cluster the feedback set US (CISE in the first iteration, the
        incremental algorithm of Section 2.4.2 in later iterations)
   for i ← 1 to n do              // n is the number of clusters in CL
      c_i ← centroid of cluster C_i ∈ CL
      P_MQi ← c_i
      for j ← 1 to k do
         W_MQi ← the semantic weights w_ij of formula (2.24)
      D_MQi ← the improved distance of formula (2.23) with the feature
              importance of Section 2.4.5
      R_i ← Retrieve(MQ_i, DB, k)
   R ← Combination(R_1, R_2, ..., R_n)
   S ← R
until (user stops responding);
return R;
Proposition 2. [Complexity of the algorithm SRIR]:
The complexity of the algorithm SRIR is O(N), where N is the size of the database image set.
2.5. Experimental evaluation
2.5.1. Test environment
The database used for the test is a subset of Corel. This set includes 3400 images.
2.5.3. Query implementation and evaluation
To test the accuracy of the proposed method, all 3400 images were used as queries. The average precision at 150 returned images is used for evaluation. In Table 2.2, four different methods are compared, namely Basic C+, JF, MMRF and SRIR, at 1, 4, 8, 12, 16 and 20 feedback queries.
Table 2.2. Result table of 4 methods according to the number of queries in one feedback (values are precision).
Method      1 query   4 queries   8 queries   12 queries   16 queries   20 queries
Basic C+    0.20      0.22        0.23        0.24         0.245        0.25
JF          0.24      0.29        0.31        0.33         0.34         0.35
MMRF        0.243     0.31        0.315       0.323        0.334        0.365
SRIR        0.36490   0.39789     0.40035     0.40241      0.40360      0.40385
The experimental results are shown in Fig. 2.11. The horizontal axis indicates the number of clusters (1, 4, 8, 12, 16 or 20). The vertical axis indicates the precision. The four methods Basic C+, JF, MMRF and SRIR are shown as four curves.
We can draw conclusions from Fig. 2.11. The system precision (the vertical axis) increases along with the horizontal axis (the number of clusters): the more clusters we use to retrieve, the higher the system performance. We also found that the precision of the SRIR method is better when the number of clusters is between 1 and 8, specifically 54.73% at 1, 59.68% at 4 and 60.05% at 8.
Figure 2.11. Comparison of the accuracy of the four methods.
In the SRIR method, the performance curve increases rapidly from 1 to 8 clusters and increases slowly in the range of 12 to 20 clusters because 8 clusters already cover most of the clusters in the feature space. Although the Jin & French method also increases rapidly in the range of 1 to 8 queries [49], our approach has much higher precision without increasing the retrieval time. The main reason is that, even when the number of clusters is between 1 and 8, our proposed method exploits the semantic information from the user feedback better.
2.6. Conclusion of chapter 2
The thesis focused on proposing a method, called SRIR, that solves four main problems: (1) using only one query to retrieve a diverse initial result set that includes images from the entire feature space (reducing the burden on users, who do not have to select multiple query images); (2) clustering the related images at a low time cost; (3) determining the semantic importance of each query; and (4) determining the importance of each feature.
Our experimental results on the feature database of 3400 images have shown that the proposed SRIR method offers significantly higher precision than the Basic C+, JF and MMRF methods.
Chapter 3. AN EFFICIENT IMAGE RETRIEVAL METHOD USING
ADAPTIVE WEIGHTS
3.1. Introduction
Chapter 2 of the thesis presented the image retrieval method [CT5], which can retrieve semantically related images spread over the entire feature space with high precision. However, this method has not solved two limitations:
Firstly, it does not fully exploit the feedback information (the relevance level of each image) to identify good query points. For example, Figure 3.1 shows the typical interface of existing systems. In this interface, users can only check the box at the top of an image (if the image is relevant) or leave it unchecked (if the image is irrelevant), even though a user may rate the image with ID pl_flower\84059 as more relevant than the image with ID pl_flower\476083.
Figure 3.1: Typical interface of CBIR system with relevant feedback.
Second, the methods above treat the regions containing the different good query points as equal and assign the same weight to all neighboring points of each good query point. This is not appropriate because different regions often have their own separate attributes.
Figure 3.2. Illustration of two good-query regions weighted equally. (a) Left: the first query point. (b) Right: the second query point.
Based on this observation, the thesis proposes an image retrieval method using adaptive weights, named AWEIGHT (An efficient image retrieval method using adaptive weights) [CT6]. In this method, instead of using the same weight vector for the regions that contain different good query points, the method automatically calculates the good query points and the good weight vectors corresponding to the regions that contain them, based on the user's feedback.
In addition, previous methods cluster all feedback images in every iteration, so their computational complexity is high. To address this limitation, the proposed method only clusters the feedback images in the first iteration; from the second iteration onwards, it only adds new feedback images to the existing clusters (see Section 2.3 of Chapter 2).
As shown in Fig. 3.3, the main difference between our proposed method and the existing relevance feedback image retrieval methods lies in three main components: (a) determining the optimal query points; (b) computing the weight vectors; (c) computing the improved distance functions. These components can be embedded in any relevance feedback image retrieval system, so we describe each of them separately.
Figure 3.3. Diagram of the image retrieval system using adaptive weights.
3.2. The algorithm for determining the good query point and the adaptive weights of the improved distance function
In this section, the thesis presents the proposed technique for determining the
optimal query point and the adaptive weights of the distance function. The
technique determines the optimal query point and the adaptive weights according
to a given cluster of images. In the case of multiple clusters, this technique is
performed for each cluster.
Given a cluster i (i = 1, ..., g), each image in cluster i is represented by a vector x_j^i (j = 1..N_i), and together they form the matrix X^i = [x_1^i, ..., x_{N_i}^i]^T, where N_i denotes the number of images of the i-th cluster. Suppose the optimal query vector for cluster i is q_i = (q_i1, ..., q_im). Assume the user's evaluation information in terms of relevancy for each x_j^i (j = 1, ..., N_i) is π_j^i ≥ 0; the vector π^i = (π_1^i, ..., π_{N_i}^i) represents the user's feedback on the relevance level of each image in cluster i. The problem of finding the optimal query point q_i and the weight matrix W_i is referred to the following penalty minimization problem:
min_{q_i, W_i} ∑_{j=1..N_i} π_j^i (x_j^i - q_i)^T W_i (x_j^i - q_i)   (3.1)
subject to: det(W_i) = 1
where det(W_i) is the determinant of the matrix W_i (the constraint avoids the case where W_i is a zero matrix).
To solve the minimization problem, we use the method of Lagrange multipliers.
- As a result, the optimal query point is the relevance-weighted mean of the cluster:
q_i = ∑_{j=1..N_i} π_j^i x_j^i / ∑_{j=1..N_i} π_j^i   (3.2)
- If (C^i)^{-1} exists, the weight matrix is:
W_i = det(C^i)^{1/m} (C^i)^{-1}   (3.3)
with C^i the weighted covariance matrix of the images in cluster i:
C^i_{rs} = ∑_{j=1..N_i} π_j^i (x_jr^i - q_ir)(x_js^i - q_is) / ∑_{j=1..N_i} π_j^i   (3.4)
Given the optimal query vector q_i and the weight matrix W_i, we have the distance function:
D(x, q_i) = (x - q_i)^T W_i (x - q_i)   (3.5)
Let Cpf(q_i) be the list of points in the cluster of positive feedback samples corresponding to the optimal query point q_i, i.e., the list of points in the corresponding ellipse, let kNN(p) be the list of the k points nearest to a point p, and let Cpf_k(p) = Cpf(q_i) ∩ kNN(p) be the positive feedback points in the k-neighborhood of p. Our proposed distance formula (3.6) combines D(p, q_i), the improved distance (3.5) from a point p to the optimal query point q_i, with the distances d(p, p_l) from p to the positive feedback points p_l in Cpf_k(p).
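A hedged NumPy sketch of formulas (3.2)-(3.5), under the reading that C^i is the relevance-weighted covariance of the cluster; the small regularization term is an added assumption that keeps the matrix invertible when the cluster is small.

import numpy as np

def optimal_query_and_weights(X, pi):
    """(3.2)-(3.4): the optimal query point is the relevance-weighted mean of
    the cluster's images; the weight matrix is the inverse of the weighted
    covariance matrix, scaled so that det(W) = 1."""
    X = np.asarray(X, dtype=float)               # N_i x m images of one cluster
    pi = np.asarray(pi, dtype=float)             # relevance scores from the user
    q = pi @ X / pi.sum()                        # (3.2) weighted mean
    diff = X - q
    C = (diff * pi[:, None]).T @ diff / pi.sum()     # (3.4) weighted covariance
    C += 1e-6 * np.eye(X.shape[1])               # regularization (an assumption)
    C_inv = np.linalg.inv(C)
    m = X.shape[1]
    W = np.linalg.det(C) ** (1.0 / m) * C_inv    # (3.3), so that det(W) = 1
    return q, W

def ellipse_distance(p, q, W):
    """(3.5): generalized (Mahalanobis-like) distance (p - q)^T W (p - q)."""
    d = np.asarray(p, dtype=float) - q
    return float(d @ W @ d)

# Usage: five feedback images in one cluster, graded 1-3 by the user.
rng = np.random.default_rng(3)
X = rng.normal(size=(5, 4))
q, W = optimal_query_and_weights(X, pi=[3, 2, 1, 2, 3])
print(round(np.linalg.det(W), 6))                # ~1.0 by construction
print(ellipse_distance(rng.normal(size=4), q, W))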
3.3. An efficient image retrieval method using adaptive weights
The thesis proposes an image retrieval algorithm that uses optimal query points,
optimal distance functions, and improved distance functions, named AWEIGHT.
AWEIGHT algorithm
Input:
   Image set: S
   Query: Q_initial
   Number of retrieved images after each iteration: k
Output:
   The result set: Result(Q_opt)
1. Result(Q_initial) ← Retrieve(Q_initial, S, k);
2. Relevant(Q_initial, N) ← Feedback(Result(Q_initial));   // N images marked relevant, with relevance levels
3. {X_1, ..., X_g} ← CISE(Relevant(Q_initial, N), g);      // initial clustering into g clusters
4. D ← {X_1, ..., X_g};
5. repeat
   5.1 for i ← 1 to g do
          (q_i^opt, W_i) ← FQM(X_i, π^i);      // optimal query point and weight matrix of Section 3.2
   5.2 Result(Q_opt) ← Retrieve(<g, {q_1^opt, ..., q_g^opt}, {W_1, ..., W_g}, D*, S, k>);   // D* is the improved distance (3.6)
   5.3 Relevant(Q_opt') ← Feedback(Result(Q_opt));
   5.4 for each x_j ∈ Relevant(Q_opt') do
          i ← INC(D, x_j);                      // incremental clustering (Section 2.4.2) assigns x_j to cluster i
          Add(x_j, X_i);
   until (user stops responding);
6. return Result(Q_opt);
3.4. Experimental results
3.4.1. Test environment
The set of images
The database used for the test is a subset of Corel. This set includes 10,800 images.
3.4.2. Evaluation
In our experiment, the parameters were chosen as follows. The performance of the system is evaluated on the COREL database of 10,800 images; all images in the database were selected as query images. We evaluate our method based on the average accuracy over the 10,800 query images. Each query returns 100 images. We chose 100 returned images per query because users usually only view about two screens of results, and each screen contains 50 images.
Table 3.2. The average accuracy in three feedbacks of the compared methods.
Name of method   2 query points   4 query points   8 query points
CRF              0.2387           0.3065           0.3199
DSSA             0.3135           0.42658          0.4846
WATH             0.2856           0.3763           0.4218
AWEIGHT          0.3324           0.48658          0.5125
Table 3.2 shows the average accuracy of the four methods CRF, DSSA, WATH and our AWEIGHT method at the second feedback loop with 2, 4 and 8 query points. With two query points, our method already has higher accuracy than CRF, DSSA and WATH.