A Novel Approach for Filtering Junk Images from Google Search Results
Yuli Gao1, Jianping Fan1,*, Hangzai Luo1, and Shin’ichi Satoh2

1 Department of Computer Science, UNC-Charlotte, USA
{jfan,ygao,hluo}@uncc.edu
2 National Institute of Informatics, Tokyo, Japan
satoh@nii.ac.jp

Abstract. Keyword-based image search engines such as Google Images are now very popular for accessing the large number of images on the Internet. Because only the text information that is directly or indirectly linked to the images is used for image indexing and retrieval, most existing image search engines such as Google Images may return a large number of junk images that are irrelevant to the given queries. To filter out the junk images from Google Images, we have developed a kernel-based image clustering technique that partitions the images returned by Google Images into multiple visually similar clusters. In addition, users are allowed to provide feedback for updating the underlying kernels, which yields a more accurate characterization of the diverse visual similarities between the images. To help users assess the goodness of the image kernels and the relevance of the returned images, a novel framework is developed to visualize a large number of returned images more intuitively according to their visual similarity. Experiments on diverse queries on Google Images have shown that our proposed algorithm can filter out the junk images effectively. An online demo is also released for public evaluation at: http://www.cs.uncc.edu/~jfan/google-demo/.

Keywords: Junk image filtering, similarity-preserving image projection.

1 Introduction

As online image sharing and personal journalism become more and more popular, there is an urgent need to develop more effective image search engines, so that users can successfully retrieve the large number of images on the Internet.
Text-based image search engines such as Google Images have achieved great success in exploiting text information to index and retrieve large-scale online image collections. Even though Google has the most powerful text search engine in the world, Google Images is still unsatisfactory because of the relatively low precision rate of the top-ranked images [6-10]. One of the major reasons for this phenomenon is that Google simplifies the image search problem into a purely text-based search problem, under the assumption that the image semantics are directly related to the text terms extracted from the associated documents. Unfortunately, such an oversimplified online image indexing approach completely ignores that the linkages between the image semantics and the text terms (which can be extracted from the associated text documents) may not be a one-to-one correspondence; they could be one-to-many, many-to-one, or many-to-many relationships, or there may be no exact correspondence at all between the image semantics and the associated text terms. This is the major reason why Google Images may return a large number of junk images that are irrelevant to the given keyword-based queries. In addition, many real-world settings, such as photo-sharing websites, may only be able to provide biased and noisy text tags, which may further mislead text-based image search engines such as Google Images. Therefore, there is an urgent need to develop new algorithms to support junk image filtering from Google Images [6-10].

* Corresponding author.
S. Satoh, F. Nack, and M. Etoh (Eds.): MMM 2008, LNCS 4903, pp. 1–12, 2007. © Springer-Verlag Berlin Heidelberg 2007

With the increasing computational power of modern computers, it is possible to incorporate image analysis algorithms into text-based image search engines such as Google Images without degrading their response speed significantly.
Recent advances in computer vision and multimedia computing also allow us to take advantage of the rich visual information (embedded in the images) for image semantics interpretation. Some pioneering works have been proposed to improve Google Images [6-10].

By integrating multi-modal information (visual similarity, associated text, and users’ feedback) for image semantics interpretation, we have developed a novel framework to filter out the junk images from Google Images. Our scheme takes the following major steps for junk image filtering:
(a) Google Images is first queried to obtain a large number of images returned for a given text-based query;
(b) Our feature extraction algorithm is then performed to extract both the global and local visual features for image similarity characterization;
(c) The diverse visual similarities between the images are characterized jointly by using multiple kernels, and the returned images are partitioned into multiple clusters according to their kernel-based visual similarities;
(d) A hyperbolic visualization algorithm is developed to make it easier to assess the relevance between the returned images and the users’ real query intentions;
(e) If necessary, users can step in to select a few relevant images or junk images, and such feedback is automatically transformed to update the kernels for image similarity characterization;
(f) The updated kernel is further used to create a new presentation of the returned images adaptively according to the users’ personal preferences.

2 Image Content Representation

The visual properties of the images are very important for users to assess the relevance between the images returned by keyword-based queries and their real query intentions [1-2]. Unfortunately, Google Images has fully ignored such important characteristics of the images.
In this paper, we have developed a new framework to seamlessly integrate keyword-based image search with traditional content-based image search. To avoid the pitfalls of image segmentation tools, image segmentation is not performed for feature extraction. To characterize the diverse visual properties of images efficiently and effectively, both global and local visual features are extracted for image similarity characterization. Global visual features such as the color histogram provide the global image statistics and the general perceptual properties of entire images [11]. On the other hand, local visual features obtained via wavelet image transformation can characterize the most significant information of the underlying image structures effectively [12].

To filter out the junk images from Google Images, the basic question is how to define more suitable similarity functions that accurately characterize the diverse visual similarities between the images returned by keyword-based queries. Recently, kernel functions have come to play an important role in data similarity characterization within the statistical learning framework, where the kernel functions satisfy certain mathematical requirements and may capture some domain knowledge. In this paper, we have proposed two basic image descriptors to characterize various visual and geometrical properties of images [11-12]: (a) global color histogram; (b) texture histogram via wavelet filter bank. The diverse visual similarities between the images can be characterized more effectively and efficiently by using a linear combination of their basic image kernels (i.e., mixture-of-kernels):

$$\hat{K}(x, y) = \sum_{i=1}^{\kappa} \alpha_i K_i(x, y), \qquad \sum_{i=1}^{\kappa} \alpha_i = 1 \tag{1}$$

where $\alpha_i \ge 0$ is the importance factor for the $i$th basic image kernel $K_i(x, y)$ for image similarity characterization.
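The mixture-of-kernels in Eq. (1) can be illustrated with a minimal sketch. The paper does not fix the form of the basic kernels at this point, so the Gaussian (RBF) kernel, the `gamma` value, the random histograms, and the function names below are assumptions chosen only for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix over the rows of X (kernel type is an assumption)."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mixture_kernel(basic_kernels, alphas):
    """Eq. (1): K_hat = sum_i alpha_i * K_i with alpha_i >= 0 and sum_i alpha_i = 1."""
    alphas = np.asarray(alphas, dtype=float)
    if len(alphas) != len(basic_kernels):
        raise ValueError("one importance factor is needed per basic kernel")
    if np.any(alphas < 0.0) or not np.isclose(alphas.sum(), 1.0):
        raise ValueError("importance factors must be non-negative and sum to 1")
    K_hat = np.zeros_like(basic_kernels[0], dtype=float)
    for alpha, K in zip(alphas, basic_kernels):
        K_hat += alpha * K
    return K_hat

# Hypothetical features for 5 images: an 8-bin color histogram and a
# 6-bin texture histogram per image (random stand-ins, not real data).
rng = np.random.default_rng(0)
color_hist = rng.dirichlet(np.ones(8), size=5)
texture_hist = rng.dirichlet(np.ones(6), size=5)

K_color = rbf_kernel(color_hist, gamma=5.0)
K_texture = rbf_kernel(texture_hist, gamma=5.0)
K_hat = mixture_kernel([K_color, K_texture], alphas=[0.7, 0.3])
```

Because the importance factors form a convex combination, the mixture of positive semi-definite basic kernels is itself positive semi-definite, which is why the constraint on the $\alpha_i$ matters.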
The rules for multiple kernel combination (i.e., the selection of the values for the importance factors $\alpha$) depend on two key issues: (a) the relative importance of the various visual features for image similarity characterization; (b) the users’ preference. In this paper, we have developed an iterative algorithm to determine the values of the importance factors by seamlessly integrating both the importance of the visual features and the users’ preferences.

3 Kernel-Based Image Clustering

The images returned by the same keyword-based search are automatically partitioned into multiple clusters according to their kernel-based visual similarities. Our kernel-based image clustering algorithm is able to obtain the most significant global distribution structures of the returned images. By using multiple kernels for diverse image similarity characterization, our kernel-based image clustering algorithm is able to handle high-dimensional visual features effectively. The optimal partition of the returned images is obtained by minimizing the trace of the within-cluster scatter matrix, $S_w^{\phi}$. The trace of the scatter matrix is given by:

$$S_w^{\phi} = \frac{1}{N} \sum_{l=1}^{\tau} \sum_{i=1}^{N} \beta_{li} \left( \hat{K}(x_i, x_i) - \frac{1}{N_l} \sum_{j=1}^{N} \beta_{lj} \hat{K}(x_i, x_j) \right) \tag{2}$$

where $\hat{K}(\cdot, \cdot)$ is the mixture kernel function, $N$ is the number of returned images, $\tau$ is the number of clusters, $N_l$ is the number of images in the $l$th cluster, and $\beta_{li}$ is the cluster-membership indicator of the $i$th image for the $l$th cluster. Searching for the optimal values of the elements $\beta_{li}$ that minimize the trace can be achieved effectively by an iterative procedure.

One major problem for kernel-based image clustering is that it may require huge memory space to store the kernel matrix when a large number of images come into view. Some pioneering works have been done on reducing the memory cost, such as Chunking, Sequential Minimal Optimization (SMO), SVMlight, and Mixture of Experts. One common shortcoming of these decomposition-based approaches is that global optimization is not performed.
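The clustering objective in Eq. (2) can be evaluated directly from a kernel matrix. The sketch below computes that trace for a hard assignment and lowers it with a plain kernel k-means style iteration. The paper does not spell out its iterative procedure, so `kernel_kmeans`, the random initialization, and the stopping rule are assumptions rather than the authors' algorithm.

```python
import numpy as np

def within_cluster_trace(K, labels, n_clusters):
    """Trace of the within-cluster scatter (Eq. 2) for hard assignments."""
    N = K.shape[0]
    total = 0.0
    for l in range(n_clusters):
        idx = np.flatnonzero(labels == l)
        if idx.size == 0:
            continue  # an empty cluster contributes nothing
        block = K[np.ix_(idx, idx)]
        total += np.trace(block) - block.sum() / idx.size
    return total / N

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Reassign each image to the nearest cluster mean in feature space."""
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    labels = rng.integers(n_clusters, size=N)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((N, n_clusters), np.inf)
        for l in range(n_clusters):
            idx = np.flatnonzero(labels == l)
            if idx.size == 0:
                continue
            # ||phi(x_i) - m_l||^2 expanded with kernel evaluations only
            dist[:, l] = (diag
                          - 2.0 * K[:, idx].sum(axis=1) / idx.size
                          + K[np.ix_(idx, idx)].sum() / idx.size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A possible usage is `labels = kernel_kmeans(K_hat, n_clusters=2)` followed by `within_cluster_trace(K_hat, labels, 2)` to compare partitions; note that this simple iteration only finds a local minimum and holds the full kernel matrix in memory.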
Rather than following these decomposition-based approaches, we have developed a new algorithm for reducing the memory cost by seamlessly integrating parallel computing with global decision optimization. Our new algorithm takes the following key steps:
(a) Users are allowed to define the maximum number of returned images that they want to see and assess, which reduces the memory cost significantly. In addition, the returned images are partitioned into multiple smaller subsets.
(b) Our kernel-based image clustering algorithm is then performed on all these smaller image subsets to obtain a within-subset partition of the images according to their diverse visual similarities.
(c) The support vectors for each image subset are validated by other image subsets through testing the Karush-Kuhn-Tucker (KKT) conditions. The support vectors that violate the KKT conditions are integrated to update the decision boundaries for the corresponding image subset incrementally. This process is repeated until the global optimum is reached and an optimal partition of the large number of images under the same image topic is obtained.

Our kernel-based image clustering algorithm has the following advantages:
(1) It can seamlessly integrate multiple kernels to characterize the diverse visual similarities between the images more accurately. Thus it can provide good insight into a large number of images by determining their global distribution structures (i.e., image clusters and their distributions) accurately, and such global image distribution structures can further be integrated to achieve more effective image visualization for query result assessment.
(2) Only the most representative images (which are the support vectors) are stored and validated by other image subsets, so it may require far less memory space. The redundant images (which are non-support vectors) are eliminated early, which can significantly accelerate kernel-based image clustering.
(3) Because the support vectors for each subset are validated by other subsets, our algorithm can handle outliers and noise effectively and can generate more robust clustering results.

4 Kernel Selection for Similarity-Preserving Image Projection

When the majority of the images returned by Google Images are relevant to the given keyword-based query, there should be an intrinsic clustering structure within the corresponding kernel matrix, i.e., the kernel matrix would be in the form of a perturbed block-diagonal matrix, where each block corresponds to one visual category, and the other entries of the kernel matrix (which correspond to outliers or wrong returns) are close to zero. Based on this understanding, it seems reasonable to apply some meaningful “clusterness” measurement, such as the sum of squared within-cluster distances, to estimate the relative importance of the various basic image kernels, and such a clusterness measurement can further be used as the criterion for kernel selection. However, this naive approach may actually yield a faulty decision for the following reasons: (a) The majority assumption may not hold true. In the study conducted by Fergus et al. [6-7], it is reported that the images returned by Google Images contain more “junk images” than “good images” for more than half of the queries they studied. (b) A high “clusterness” measurement may not directly imply a good kernel matrix, i.e., the converse statement about the clusterness of the kernel matrix is not true. A trivial kernel matrix, with one in all its entries, would always yield the best clusterness score for all the queries. However, such a trivial kernel matrix is certainly meaningless for revealing the true clustering structure. (c) The text-based search engines unavoidably suffer from the problem of semantic ambiguity.
When users submit a query via keyword, text-based image search engines such as Google Images may not know a priori which word sense corresponds to the user’s request. Therefore, even if the ideal kernel matrix were available, the text-based search engines could not possibly know which image clusters are most relevant to the users’ real needs.

Because the systems may not know the real needs of users (i.e., which image cluster is relevant and which image cluster is irrelevant to a given keyword-based query), it is very hard to define suitable criteria to evaluate the goodness of the kernel matrix and achieve automatic kernel selection for junk image filtering, i.e., without users’ inputs, it is very hard if not impossible to identify which image clusters correspond to the junk images. One potential solution for these difficulties is to allow users to interactively provide additional information for junk image filtering. It is worth noting that such interaction should not place a heavy burden on the users.

In order to capture users’ feedback for junk image filtering, it is very important to enable similarity-based visualization of the large number of images returned by Google Images, so that users can quickly judge the relevance of an image to their real query intentions. It is well known that the diverse visual similarities between the images can be characterized more effectively and efficiently by using different types of visual features and different types of kernels. Therefore, different types of these basic image kernels may play different roles in characterizing the similarity of the images returned by Google Images, and the optimal kernel for image similarity characterization can be approximated more effectively by using a linear combination of these basic image kernels with different importances. Such an optimal combination of the basic image kernels for image similarity characterization also depends on the users’ preference.
To allow users to assess the relevance between the returned images and their real query intentions, it is very important to achieve similarity-based visualization of the large number of returned images by selecting an optimal combination of the basic image kernels. Instead of trying to find an optimal combination of these basic image kernels at the beginning, we have developed an iterative approach that starts from the single most suitable basic image kernel for generating the image clusters and creating the hyperbolic visualization of the returned images; the users’ feedback is then integrated to obtain the most accurate combination of the basic image kernels iteratively.

We adopt a semi-supervised paradigm for kernel combination and selection, where the most suitable basic image kernel is first used to generate the visually similar image clusters and create the similarity-based visualization of the returned images. The users are then allowed to choose a couple of relevant/junk images. Such feedback is then transformed and integrated for updating the underlying image kernels incrementally, re-clustering the returned images, and creating a new presentation and visualization of the returned images. Through such an iterative procedure, the most suitable image kernels can be selected and combined to effectively characterize the diverse image similarities and filter out the junk images from Google Images.

To select the most suitable image kernel to start this iterative procedure, the S measure is used. A given basic image kernel K can be turned into a distance matrix D, where the distance D(x, y) between two images with the visual features x and y is given by:

$$D(x, y) = \lVert \phi(x) - \phi(y) \rVert = \sqrt{\hat{K}(x, x) + \hat{K}(y, y) - 2\hat{K}(x, y)} \tag{3}$$

where we use $\phi(x)$ to denote the implicit feature-space mapping of the image with the visual features x.
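As a small worked instance of Eq. (3), the sketch below converts a kernel matrix into its distance matrix. The function name and the toy linear kernel are assumptions used only to make the result easy to check by hand: with a linear kernel, the kernel-induced distance reduces to the ordinary Euclidean distance.

```python
import numpy as np

def kernel_distance_matrix(K):
    """Eq. (3): D(x, y) = sqrt(K(x, x) + K(y, y) - 2 K(x, y)) for all pairs."""
    diag = np.diag(K)
    sq_dists = diag[:, None] + diag[None, :] - 2.0 * K
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(sq_dists, 0.0))

# Three toy feature vectors on a line; a linear kernel K = X X^T is assumed here.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
K_linear = X @ X.T
D = kernel_distance_matrix(K_linear)
# D is [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
```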
We then rank all of these basic image kernels by their S scores, where the S score is defined as:

$$S = \frac{\sum_{i=1}^{m} \sum_{j=i+1}^{m} D(x_i, x_j) - \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_i, y_j)}{\mathrm{median}(D)} \tag{4}$$

where median(D) gives the median of the pairwise distances among all the image samples, and $\{x_i \mid i = 1, \cdots, m\}$ and $\{y_j \mid j = 1, \cdots, n\}$ ($m + n \ge 2$) are the image samples that form the image pairs. Intuitively, the S measure favors the basic image kernels that yield higher similarity between the relevant image pairs and lower similarity between the irrelevant image pairs. The smaller the S score, the better the characterization of the image similarity. Therefore, the basic image kernel with the lowest S score is first selected as the ideal kernel to achieve an initial partition (clustering) of the large number of images returned by Google Images, and to create an initial hyperbolic visualization of the returned images according to their kernel-based visual similarity, so that the users can easily assess the relevance between the returned images and their query intentions. In addition, the users can input their feedback interactively according to their personal preferences.

To preserve the similarity relationships between the returned images, the images returned by Google Images are projected onto 2D hyperbolic coordinates by using Kernel Principal Component Analysis (KPCA) according to the selected basic image kernel [13]. The kernel PCA is obtained by solving the eigenvalue equation:

$$K v = \lambda M v \tag{5}$$

where $\lambda = [\lambda_1, \cdots, \lambda_M]$ denotes the eigenvalues and $v = [\vec{v}_1, \cdots, \vec{v}_M]$ denotes the corresponding complete set of eigenvectors, M is the number of the returned images, and K is a kernel matrix.
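The two computations described above, ranking a kernel by its S score (Eq. 4) and projecting the images with kernel PCA (Eq. 5), can be sketched as follows. Several points are assumptions rather than statements from the paper: the $x_i$ are read as relevant images and the $y_j$ as junk images, the kernel matrix is centered before the eigendecomposition (the usual KPCA step), and all function names are hypothetical.

```python
import numpy as np

def s_score(D, relevant_idx, junk_idx):
    """Eq. (4): (relevant-relevant distances - relevant-junk distances) / median(D)."""
    rel = np.asarray(relevant_idx)
    junk = np.asarray(junk_idx)
    within = np.triu(D[np.ix_(rel, rel)], k=1).sum()   # pairs with i < j
    between = D[np.ix_(rel, junk)].sum()
    all_pairs = D[np.triu_indices(D.shape[0], k=1)]    # every distinct pair once
    return (within - between) / np.median(all_pairs)

def kpca_project(K, n_components=2):
    """Project images onto the leading kernel principal components."""
    M = K.shape[0]
    ones = np.full((M, M), 1.0 / M)
    # Center the kernel matrix in feature space (assumed preprocessing step).
    K_centered = K - ones @ K - K @ ones + ones @ K @ ones
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.maximum(eigvals[order], 0.0)
    # Coordinates of the training images along each component.
    return eigvecs[:, order] * np.sqrt(top_vals)

# Toy example: images 0 and 1 are close to each other, image 2 is far away.
D_toy = np.array([[0.0, 1.0, 10.0],
                  [1.0, 0.0, 9.0],
                  [10.0, 9.0, 0.0]])
S = s_score(D_toy, relevant_idx=[0, 1], junk_idx=[2])  # (1 - 19) / 9 = -2
```

In a kernel selection loop one would compute `s_score` once per basic kernel and keep the kernel with the lowest value, then pass that kernel matrix to `kpca_project` to obtain 2D coordinates for display.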
The optimal KPCA-based image projection can be obtained by:

$$\min \left\{ \sum_{i=1}^{M} \sum_{j=1}^{M} \left| \hat{K}(x_i, x_j) - d(x_i, x_j) \right|^2 \right\} \tag{6}$$

$$x_i = \sum_{l=1}^{M} \alpha_l \hat{K}(x_i, x_l), \qquad x_j = \sum_{l=1}^{M} \alpha_l \hat{K}(x_l, x_j)$$

where $\hat{K}(x_i, x_j)$ is the original kernel-based similarity distance between the images with the visual features $x_i$ and $x_j$, and $d(x_i, x_j)$ is their location distance on the display unit disk obtained by using kernel PCA to achieve similarity-preserving image projection. Thus the visually similar images (i.e., images with smaller kernel-based similarity distances) can be visualized close to each other on the display unit disk. The suitable kernels for similarity-preserving image projection can be chosen automatically so that the most representative images from different clusters are spatially distinct.

Our mixture-kernel function can characterize the diverse visual similarities between the images more accurately than the weighted distance functions used in multidimensional scaling (MDS), so our KPCA-based projection framework can achieve better similarity-based image visualization than the MDS-based projection approaches. Therefore, the KPCA-based image projection algorithm can preserve the similarity relationships between the images effectively.

5 Hyperbolic Image Visualization for Hypothesis Assessment

After such similarity-based image projection is obtained by using KPCA, the Poincaré disk model [15] is used to map the returned images from their feature space (i.e., images which are represented by their visual features) onto 2D display coordinates. The Poincaré disk model maps the entire Euclidean space into an open unit circle, and produces a non-uniform mapping of the Euclidean distance to the hyperbolic space.
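The non-uniform mapping can be made concrete with a short sketch. In the standard Poincaré disk, a point at hyperbolic distance $\rho$ from the disk center sits at Euclidean radius $r = \tanh(\rho/2)$, equivalently $\rho = 2\,\mathrm{artanh}(r)$. The code below applies that relation to 2D coordinates; treating the distance of a projected image from a chosen focus point as its hyperbolic distance is an assumption made for illustration, as are the function names.

```python
import numpy as np

def hyperbolic_to_disk_radius(rho):
    """Euclidean radius on the unit disk for hyperbolic distance rho."""
    return np.tanh(np.asarray(rho, dtype=float) / 2.0)

def disk_to_hyperbolic_radius(r):
    """Inverse relation: hyperbolic distance for Euclidean radius r < 1."""
    return 2.0 * np.arctanh(np.asarray(r, dtype=float))

def place_on_disk(points, focus):
    """Map 2D points into the open unit disk, keeping their direction from the focus."""
    offset = np.asarray(points, dtype=float) - np.asarray(focus, dtype=float)
    rho = np.linalg.norm(offset, axis=1)
    r = hyperbolic_to_disk_radius(rho)
    # Avoid dividing by zero for a point that coincides with the focus.
    scale = np.divide(r, rho, out=np.zeros_like(rho), where=rho > 0)
    return offset * scale[:, None]

layout = place_on_disk([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], focus=[0.0, 0.0])
radii = np.linalg.norm(layout, axis=1)
```

Points far from the focus are squeezed toward the rim (a distance of 3 lands at radius of about 0.905), while points near the focus keep most of their spacing, which is the focus-plus-context effect the visualization relies on.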
Formally, let $\rho$ be the hyperbolic distance and $r$ be the Euclidean distance of a certain image A to the center of the unit circle. The relationship between their derivatives is described by:

$$d\rho = \frac{2 \cdot dr}{1 - r^2} \tag{7}$$

Intuitively, this projection makes a unit Euclidean distance correspond to a longer hyperbolic distance as it approaches the rim of the unit circle. In other words, if the images are of fixed size, they appear larger when they are closer to the origin of the unit circle and smaller when they are farther away. This property makes the model very suitable for visualizing a large number of images, because the non-uniform distance mapping emphasizes the images that are in the current focus while de-emphasizing those images that are farther from the focus point.

In practice, it is often difficult to achieve an optimal kernel at the first guess. Therefore, it is desirable to allow users to provide feedback to the system, e.g., how closely the current image layouts correspond to their real needs. On the other hand, it is also very important to guarantee that the system can capture such feedback effectively and transform it for updating the underlying kernel matrix and creating a new presentation and visualization of the large number of returned images. In this paper, we have explored the usage of pair-wise constraints, which can be obtained from users’ feedback automatically. In order to incorporate the users’ feedback for improving kernel-based image clustering and projection, we have proposed an iterative algorithm that can directly translate the constraints (derived from the relevant and junk images given by the users) into the kernel transformation of the input space (feature space) to generate more accurate kernels for image clustering and projection.

Fig. 1. Our online system for filtering junk images from Google Images, where the keyword “red flower” is used for Google image search and most junk images are projected on the left side

One naive method is to generalize the vector-based kernel by introducing a weight $w = (w_1, w_2, \ldots, w_N)$ on each feature dimension of the input vector space, i.e., $\phi(x) = (w_1 x_1, w_2 x_2, \ldots, w_N x_N)$. Suppose we encode the pair-wise constraints between two sets of feature vectors x, y in a constraint matrix C, where C(x, y) = 1 for must-link image pairs (relevant image pairs), -1 for cannot-link image pairs (junk image pairs), and 0 for non-constrained image pairs (image pairs which are not selected by the users). The weight w can then be updated as:

$$w_i' = w_i \cdot e^{-\gamma \, |x_i - y_i| \cdot C(x, y)} \tag{8}$$

where $\gamma$ is a learning rate specifiable by the users. This reweighting process corresponds to a dimension-wise rescaling of the input space such that the must-link image pairs become close to each other (in the sense of norm distance), and the cannot-link image pairs are pushed far apart. The resulting weight w also has an intuitive interpretation: dimensions associated with large weights are more discriminant. For example, when the feature vectors are represented as color histograms, a large weight for a certain dimension (color bin) means that the proportion of the image area associated with this quantized color plays a more important role in characterizing the image similarity. If we have m constraints to satisfy, the original input space can be transformed by a sequence of localized functions $f^1, f^2, \cdots, f^m$, and the final transformation of the input space is given by $\phi(x) = f^m(f^{m-1}(\ldots f^2(f^1(x))))$. However, the major limitation of this simple dimension-wise rescaling algorithm is that the scale factor along the full range of the respective dimension is uniform.
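The dimension-wise reweighting of Eq. (8) can be sketched in a few lines. The exponent in the extracted text is damaged, so the absolute difference $|x_i - y_i|$ used below is an assumed reading, and the function names and toy vectors are hypothetical.

```python
import numpy as np

def update_weights(w, x, y, c, gamma):
    """Eq. (8): w_i <- w_i * exp(-gamma * |x_i - y_i| * c), c in {+1, -1, 0}."""
    w = np.asarray(w, dtype=float)
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return w * np.exp(-gamma * diff * c)

def weighted_distance(w, x, y):
    """Norm distance after the rescaling phi(x) = (w_1 x_1, ..., w_N x_N)."""
    delta = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(np.asarray(w, dtype=float) * delta))

# Two toy 2-bin histograms that differ only in the first bin.
x = np.array([0.2, 0.8])
y = np.array([0.6, 0.8])
w0 = np.ones(2)

w_must = update_weights(w0, x, y, c=+1, gamma=0.5)    # must-link: shrink bin 0
w_cannot = update_weights(w0, x, y, c=-1, gamma=0.5)  # cannot-link: stretch bin 0
w_both = update_weights(w_must, x, y, c=-1, gamma=0.5)
```

Here `weighted_distance(w_must, x, y)` is smaller than the unweighted distance and `weighted_distance(w_cannot, x, y)` is larger, while `w_both` returns to the initial weights: a must-link and a cannot-link constraint with the same difference on a dimension undo each other, because the scale factor applies to the whole dimension.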
If there exists the same rescaling demand on this dimension from both the must-link constraints and the cannot-link constraints, the rescaling would be cancelled.

Fig. 2. Our online system for filtering junk images from Google Images, where the keyword “sunset” is used for Google image search and most junk images are projected on the right-bottom corner

To address this conflict, we have introduced two operators, shrinkage and expansion, whose rescaling effects are limited to a local neighborhood. In this work, we use a piecewise linear function to achieve localized expansion and shrinkage. Other localized functions may also be applicable.

As indicated above, the transformation is now in the form of $f^k(\vec{x}) = (f_1(x_1), f_2(x_2), \cdots, f_N(x_N))$, where $f_i(x_i)$, $i = 1, \cdots, N$, are non-linear functions with localized transformations, and $\vec{x} = \{x_1, \cdots, x_N\}$ is the N-dimensional feature vector. Given a pair of vectors u, v, the $i$th component of the transformation is to be updated as:

$$f_i(x_i) = \begin{cases} x_i & x_i < u_i \\ a \cdot (x_i - u_i) + u_i & x_i \in [u_i, v_i] \\ x_i + (a - 1) \cdot (v_i - u_i) & x_i > v_i \end{cases} \tag{9}$$

where $v_i > u_i$ and $a$ is a constant term that satisfies $a > 1$ for the expansion operation and $0 < a < 1$ for the shrinkage operation. We set $a = 1/\gamma$ for the must-link constraints and $a = \gamma$ for the cannot-link constraints, where $\gamma > 1$ reflects the learning rate. This constrained rescaling is used in the hyperbolic visualization, which is iteratively rescaled until the best kernel for junk image filtering according to the users’ personal preferences is obtained. Although this rescaling is done piecewise linearly in the input space, it can be a non-linear mapping in the feature space if non-linear kernels such as RBF are used. It can be proved that the new kernel satisfies Mercer’s conditions because it has the form $K(\phi(x), \phi(y))$, where $\phi(x): \mathbb{R}^N \to \mathbb{R}^N$.
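The localized shrinkage and expansion of Eq. (9) can be sketched as a vectorized piecewise-linear function. Eq. (9) assumes $v_i > u_i$; sorting the two endpoints per dimension, as done below, is an added assumption so that the sketch also handles dimensions where the pair is ordered the other way. The function name and sample values are hypothetical.

```python
import numpy as np

def local_rescale(x, u, v, a):
    """Eq. (9) applied to every dimension: stretch (a > 1) or shrink (0 < a < 1)
    the interval between the constrained pair, shifting values above it."""
    x = np.asarray(x, dtype=float)
    lo = np.minimum(u, v).astype(float)   # plays the role of u_i
    hi = np.maximum(u, v).astype(float)   # plays the role of v_i
    inside = a * (x - lo) + lo
    above = x + (a - 1.0) * (hi - lo)
    return np.where(x < lo, x, np.where(x <= hi, inside, above))

gamma = 2.0
u = np.array([0.2])
v = np.array([0.6])

# Must-link pair: shrink the interval between u and v with a = 1 / gamma.
shrunk_u = local_rescale(u, u, v, a=1.0 / gamma)
shrunk_v = local_rescale(v, u, v, a=1.0 / gamma)

# Cannot-link pair: expand the same interval with a = gamma.
expanded_v = local_rescale(v, u, v, a=gamma)
```

With these values the gap between the pair drops from 0.4 to 0.2 under shrinkage and grows to 0.8 under expansion, while any value below `u` is left untouched. Unlike the uniform reweighting, the effect is confined to the neighborhood of the constrained pair.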
Through such an iterative kernel updating algorithm, an optimal kernel matrix is obtained by seamlessly integrating both the visual consistency between the relevant images and the constraints derived from the user feedback, and our kernel-based image clustering algorithm is then performed to partition the returned images into multiple visual categories. The image cluster that is selected as the relevant images is returned to the user as the final result. Images in this cluster are then ranked in ascending order according to their kernel-based similarity distances to the images that are selected by the users.

6 System Evaluation

For a given text-based image query, our system can automatically generate a 2D hyperbolic visualization of the returned images according to their diverse kernel-based visual similarities. In Figs. 1, 2 and 3, the junk image filtering results for several keyword-based Google searches are given. From these experimental results, one can observe that our proposed system can filter out the junk images effectively. In addition, users are allowed to provide the must-link and the cannot-link constraints by clicking the relevant images and the junk images. Such constraints given by the users are automatically incorporated to update the underlying image kernels, generate a new clustering, and create a new presentation and visualization of the returned images, as shown in Fig. 4. One can observe that most junk images are filtered out after the first run of feedback. In order to invite more people to participate in evaluating our junk image filtering system, we have released our system at: http://www.cs.uncc.edu/~jfan/google-demo/.

To evaluate the effectiveness of our proposed algorithms for kernel selection and updating, the accuracy of the underlying image clustering kernel is calculated for each user-system interaction. Given the confusion matrix C for image clustering, the accuracy is defined as:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{c} C(i, i)}{\sum_{i=1}^{c} \sum_{j=1}^{c} C(i, j)} \tag{10}$$

where c = 2 is the number of clusters (i.e., relevant versus junk clusters). As shown in Fig. 5, the performance of our kernel-based image clustering algorithm generally increases with the number of constraints provided by the users, but it becomes stable after 4 iterations. On average, our kernel-based image clustering algorithm can achieve over 75% accuracy after filtering the junk images from Google Images. Compared to the original 58% average accuracy of Google Images, our proposed junk image filtering algorithm achieves a significant improvement.

Fig. 3. Our online system for filtering junk images from Google Images, where the keyword “blue sky” is used for Google image search and most junk images are projected on the left side

Fig. 4. The filtering results for the keyword-based search “red flower”; the images whose boundaries are shown in red are selected as the relevant images by users.

Fig. 5. Clustering accuracy as a function of the number of feedbacks provided by users (x-axis: number of relevance feedback, 1 to 5; y-axis: clustering accuracy, 0.5 to 0.95). The solid line represents the average clustering accuracy while the error bar shows the standard deviation over all 500 queries.

7 Conclusions

In this paper, we have presented an interactive kernel learning algorithm to filter out the junk images from Google Images or a similar image search engine. The interaction between users and the system can be done quickly and effectively through a hyperbolic visualization tool based on the Poincaré disk model.
Supplied with user-given constraints, our kernel learning algorithm can incrementally update the underlying hypotheses (the margin between the relevant images and the junk images) to approximate the underlying image relevance more effectively and efficiently, and the returned images are then automatically partitioned into multiple visual categories according to the learned kernel matrix. We have tested our kernel learning algorithm and the relevance feedback mechanism on a variety of queries submitted to Google Images. Experiments have shown good results as to the effectiveness of this system. This work shows how a straightforward interactive visualization tool, coupled tightly with image clustering methods and designed carefully so that the complex image clustering results are presented to the user in an understandable manner, can greatly improve and generalize the quality of image filtering.

References

1. Fan, J., Gao, Y., Luo, H.: Multi-level annotation of natural scenes using dominant image compounds and semantic concepts. ACM Multimedia (2004)
2. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Trans. on PAMI (2000)
3. He, X., Ma, W.-Y., King, O., Li, M., Zhang, H.J.: Learning and inferring a semantic space from user’s relevance feedback. ACM Multimedia (2002)
4. Tong, S., Chang, E.Y.: Support vector machine active learning for image retrieval. ACM Multimedia, 107–118 (2001)
5. Rui, Y., Huang, T.S., Ortega, M., Mehrotra, S.: Relevance feedback: A power tool in interactive content-based image retrieval. IEEE Trans. on CSVT 8(5), 644–655 (1998)
6. Fergus, R., Perona, P., Zisserman, A.: A Visual Category Filter for Google Images. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3024, Springer, Heidelberg (2004)
7. Fergus, R., Fei-Fei, L., Perona, P., Zisserman, A.: Learning object categories from Google’s image search. In: IEEE CVPR (2006)
8. Cai, D., He, X., Li, Z., Ma, W.-Y., Wen, J.-R.: Hierarchical clustering of WWW image search results using visual, textual, and link information. ACM Multimedia (2004)
9. Wang, X.-J., Ma, W.-Y., Xue, G.-R., Li, X.: Multi-modal similarity propagation and its application for web image retrieval. ACM Multimedia (2004)
10. Gao, B., Liu, T.-Y., Qin, T., Zhang, X., Cheng, Q.-S., Ma, W.-Y.: Web image clustering by consistent utilization of visual features and surrounding texts. ACM Multimedia (2005)
11. Ma, W.-Y., Manjunath, B.S.: Texture features and learning similarity. IEEE CVPR, 425–430 (1996)
12. Fan, J., Gao, Y., Luo, H., Satoh, S.: New approach for hierarchical classifier training and multi-level image annotation, MMM, Kyoto (2008)
13. Schölkopf, B., Smola, A.J., Müller, K.-R.: Kernel principal component analysis. Neural Computation 10(5), 1299–1319 (1998)
14. Vendrig, J., Worring, M., Smeulders, A.W.M.: Filter image browsing: Interactive image retrieval by using database overviews. Multimedia Tools and Applications 15, 83–103 (2001)
15. Fan, J., Gao, Y., Luo, H.: Hierarchical classification for automatic image annotation. ACM SIGIR (2007)