Computational Intelligence, Volume 0, Number 0, 2016

SWALLOW: RESOURCE AND TAG RECOMMENDER SYSTEM BASED ON HEAT DIFFUSION ALGORITHM IN SOCIAL ANNOTATION SYSTEMS

VAHIDEH AMEL MAHBOOB, MEHRDAD JALALI, MAJID VAFAEI JAHAN, AND PEGAH BAREKATI
Mashhad Branch, Islamic Azad University, Mashhad, Iran

Social annotation systems (SAS) allow users to annotate online resources with keywords (tags). These systems help users find, organize, and retrieve online resources, and in doing so they produce collaborative semantic data that recommender systems can exploit. Earlier studies on SAS focused on tag recommendation; more recently, SAS-based resource recommendation has received growing attention. Most such systems recommend resources on the basis of the annotated tags that the user searched for, and they do not take the user's recent behavior and click-through into account. To design and implement a more precise recommender system, this study uses both previous users' tagging data and the current user's click-through to address the resource (web pages, research papers, etc.) and tag recommendation problems together. Moreover, applying the heat diffusion algorithm during the recommendation process presents more diverse options to the user. After extracting the users, tags, resources, and the relations between them, the recommender system, called "Swallow," creates a graph-based pattern from the system log files. Then, by following the active user's path and observing heat conduction on this pattern, the user's further goals are anticipated and recommended to him. Test results on SAS data sets demonstrate that the proposed algorithm improves on the accuracy of earlier recommendation algorithms.

Received 5 August 2014; Revised 9 October 2015; Accepted 28 October 2015

Key words: social annotation systems, web recommender systems, heat diffusion, graph.

Address correspondence to Mehrdad Jalali, Mashhad Branch, Islamic Azad University, Mashhad, Islamic Republic of Iran; e-mail: dr_mehrdadjalali@yahoo.com
© 2016 Wiley Periodicals, Inc.

1. INTRODUCTION

One of the most successful Web 2.0 applications is the social annotation service (SAS), such as Delicious, CiteULike, and Flickr, which has developed significantly in recent years. In SAS, users can easily organize, share, and retrieve online resources (web pages in Delicious, research papers in CiteULike, and photos in Flickr) by means of annotation. As these systems have grown, SAS users have generated a great amount of annotation data, which has attracted the attention of many research communities.

Given the volume of online resources, finding the resources that each user is interested in is highly important. Most annotation services give access to the resources whose tags match the keywords searched by the user. Searching in this way does not solve the user's problem of detecting interesting resources, because the number of returned resources is so large that finding a suitable one among thousands is confusing and time-consuming. In other words, such methods retrieve only the resources that match the given tags and do not consider semantically related resources. For instance, when a user searches web pages for "clothing," pages labeled "garment" may not be retrieved. Further, formulating appropriate queries and modifying them is difficult and tedious. Therefore, a recommender module is required that recommends to users the most relevant resources among thousands or even millions of resources.

FIGURE 1. Two saved bookmarks in Delicious by a user.

FIGURE 2. Tag recommender (the user would like to annotate a resource with a tag, so a list of tags is recommended to him).
Since the first emergence of collaborative filtering (CF) systems, recommender systems have received considerable attention from industry and academia (Resnick et al. 1994; Maes and Shardanand 1995). Older recommender systems focused on users' explicit ratings (e.g., movie ratings), whereas SAS data have distinctive properties. Figure 1 shows two bookmarks saved by a user in Delicious. In each bookmark, the web page title is shown at the upper left and its allocated tags at the lower right. The main differences between tagging data and rating data are as follows:

(1) Unlike rating data, tagging data do not contain users' explicit preference information on resources.
(2) Tagging data are composed of three parameters (user, tag, and resource), whereas rating data are composed of only two (user and resource).

These differences create some opportunities for the recommendation problem on tagging data. There are two kinds of recommender in SAS. The first is the tag recommender, used when a user wants to annotate a resource, as shown in Figure 2. Here the user has one resource and wants to allocate a proper tag to it, and the recommender system suggests tags that are semantically related to the resource. The second suggests, according to the searched tags, bookmarked resources that the user has not yet visited, as illustrated in Figure 3. In this figure, the user searches for a tag, and the resources corresponding to that tag are recommended to the user.

Previous systems did not consider the active user's progress; they were evaluated and analyzed only in terms of the accuracy and diversity of the resources recommended to the user. By applying the heat diffusion algorithm, the current research focuses on a resource and tag recommendation system based on tagging data together with the user's current activity and click-through.
Further, in contrast to previous methods, this recommender system examines the user's current activity in SAS (previous systems started to recommend resources only after a tag had been searched). That is, the system follows the user's click-through, anticipates the user's interests, and then recommends resources and tags that match them. The learning graph used in this method combines two similarity criteria, which makes the semantic relations between the graph vertices (resources and tags) more precise. In fact, this graph considers both the relations among the available resources and the relations between resources and tags.

FIGURE 3. Resource recommender (the user searches for a tag, then many resources related to the searched keywords are recommended to him).

Another distinctive notion in this study is the use of the heat diffusion algorithm to increase the accuracy and diversity of the recommended resources. The system can recommend both resources and tags. The proposed algorithm is not designed only for SAS; it is also applicable to common recommender systems and could therefore be used in query recommender systems, photo recommender systems, etc.

The proposed approach is executed on the Delicious and Movielens "SAS" data sets. Results are evaluated by comparing the resources and tags recommended by the system with the resources the user actually visited, and the success of the proposed system is measured with evaluation criteria. The data set is divided into two parts, train and test, and the recommendations generated from the train set are checked against the test set. Results reveal that the proposed system markedly improves recommendation precision compared with former approaches.
As mentioned before, we use the heat diffusion algorithm, which is inspired by nature; we therefore wanted to name our system after a natural phenomenon. There is an appealing similarity between this study and the swallow: the system's sagacity in selecting and following a path toward a target (conveying a user to the resources nearest to that target in annotation systems) resembles the migration of the swallow, which uses heat to find its final destination. For this reason, to animate this study and to introduce the proposed system, the name "Swallow" was used, and its symbol appears in the corresponding figures.

The article is organized as follows. Section 2 presents definitions and the literature review. The proposed system (Swallow) and its design are introduced in Section 3. The implementation and evaluation results of the proposed system are presented in Section 4. Section 5 discusses the conclusion and future improvements.

2. DEFINITIONS AND LITERATURE REVIEW

2.1. Definitions

2.1.1. Tagging Data Structure.

As depicted in Figure 4, tagging data contain three connected parameters: users, tags, and documents. The tagging data can be seen as a set of triples (Heymann et al. 2008; Guan et al. 2009; Markines et al. 2009). Each triple (u, r, t) states that user u attributes tag t to resource r.

FIGURE 4. Presentation of upper level of tagging data structure (Guan et al. 2010).

To find the weights of the relations between resources and tags, one can sum over any one of these three dimensions (user, tag, or resource):

(1) Summing over all users who have allocated a tag to a document gives the number of times that tag has been assigned to that document.
(2) Summing over all documents to which a particular tag has been assigned gives the number of times users have used that tag.
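The two counts described in points (1) and (2) can be illustrated with a minimal sketch. The triples below are hypothetical and the function names `resource_tag_weights` and `tag_usage` are introduced here only for illustration; they are not part of the original system.

```python
from collections import Counter

# Hypothetical annotation triples (user, resource, tag); not taken from the paper's data.
triples = [
    ("u1", "r1", "python"),
    ("u2", "r1", "python"),
    ("u2", "r1", "tutorial"),
    ("u1", "r2", "python"),
    ("u3", "r2", "graphs"),
]


def resource_tag_weights(triples):
    """Sum over users: how many users assigned each tag to each resource."""
    # set() drops exact duplicate triples so each user counts once per pair.
    return Counter((r, t) for (_, r, t) in set(triples))


def tag_usage(triples):
    """Sum over users and resources: how many times each tag was used."""
    return Counter(t for (_, _, t) in set(triples))


weights = resource_tag_weights(triples)
usage = tag_usage(triples)
print(weights[("r1", "python")], usage["python"])
```

The per-pair counts in `weights` are the kind of values that would sit on the edges of a weighted resource-tag bipartite graph.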
In the proposed method (Swallow), the two counts above are used to create a weighted bipartite graph between resources and tags.

2.1.2. Diffusion on the Graph.

This section introduces a diffusion model on graphs based on heat diffusion. The model can be applied to both directed and undirected graphs. First, we discuss how its parameters are derived from the graph structure. Then, an example is presented.

2.1.2.1. Heat Diffusion.

Heat diffusion is a physical phenomenon: heat generally moves from a position of higher temperature to a position of lower temperature. Recently, heat diffusion-based methods have been successfully applied in domains such as classification and dimensionality reduction (Lafferty and Kondor 2002; Niyogi and Belkin 2003; Lebanon and Lafferty 2005). Lebanon and Lafferty (2005) approximated the heat kernel in closed form for the multinomial family, which improved on Gaussian and linear kernels. Kondor and Lafferty proposed a discrete heat diffusion kernel for classification and showed that a simple diffusion kernel on a hypercube performs acceptably on this kind of data (Lafferty and Kondor 2002). Belkin and Niyogi used the heat kernel to weight a neighborhood graph and applied it in a dimensionality reduction algorithm (Niyogi and Belkin 2003). Yang et al. (2007) used heat diffusion to propose a ranking algorithm called DiffusionRank; their simulations indicate that the method is highly effective in recognizing spam. In this article, we use heat diffusion to find the similarity between the user's click-through and other previously annotated resources. In nature and physics, heat diffuses on manifolds, but we propagate it on a graph.

2.1.2.2. Heat Diffusion on Undirected Graph.

Consider the undirected graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of graph vertices and $E = \{(v_i, v_j) \mid \text{there is an edge between } v_i \text{ and } v_j\}$ is the set of edges.
We regard each edge $(v_i, v_j)$ as a pipe connecting vertices $v_i$ and $v_j$. The value $f_i(t)$ is the amount of heat at vertex $v_i$ at time $t$, starting from an initial heat distribution given by $f_i(0)$ at time 0, and $f(t)$ denotes the vector whose components are $f_i(t)$. The mathematical model follows the approach of Ma et al. (2012). Suppose that at time $t$, vertex $i$ receives an amount $M(i, j, t, \Delta t)$ of heat from its neighbor vertex $j$ during a time period $\Delta t$. The amount of exchanged heat should be related to $\Delta t$ and to the heat difference $|f_i(t) - f_j(t)|$. Moreover, the heat flows through the pipe connecting these vertices. So we can write

$$M(i, j, t, \Delta t) = \alpha \left( f_j(t) - f_i(t) \right) \Delta t \tag{1}$$

where $\alpha$ is the thermal conductivity, that is, the heat diffusion coefficient. As a result, the change in heat at vertex $i$ between times $t$ and $t + \Delta t$ equals the sum of the heat that this vertex receives from all of its neighbors. Mathematically,

$$\frac{f_i(t + \Delta t) - f_i(t)}{\Delta t} = \alpha \sum_{j : (v_j, v_i) \in E} \left( f_j(t) - f_i(t) \right) \tag{2}$$

where $E$ is the set of graph edges. To find a closed-form solution, we rewrite this equation in matrix form:

$$\frac{f(t + \Delta t) - f(t)}{\Delta t} = \alpha (H - D) f(t) \tag{3}$$

where

$$H_{ij} = \begin{cases} 1 & (v_i, v_j) \in E \text{ or } (v_j, v_i) \in E \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

and

$$D_{ij} = \begin{cases} d(v_i) & i = j \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Here $d(v_i)$ is the degree of vertex $v_i$. From the definition, it is clear that $D$ is a diagonal matrix. To obtain a more general representation, we normalize all elements of the matrices $H$ and $D$ by the degrees of the vertices.
The matrices $H$ and $D$ then become

$$H_{ij} = \begin{cases} \dfrac{1}{d(v_j)} & (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

and

$$D_{ij} = \begin{cases} 1 & i = j \\ 0 & \text{otherwise} \end{cases} \tag{7}$$

where $d(v)$ is the degree of vertex $v$; that is, each vertex $v_j$ spreads its heat equally over its $d(v_j)$ neighbors, which is the normalization that the example matrix below follows. In the limit $\Delta t \to 0$, Equation (3) becomes

$$\frac{d}{dt} f(t) = \alpha (H - D) f(t) \tag{8}$$

Solving this differential equation gives

$$f(1) = e^{\alpha (H - D)} f(0) \tag{9}$$

The value of $e^{\alpha (H - D)}$ can be computed with the expansion

$$e^{\alpha (H - D)} = I + \alpha (H - D) + \frac{\alpha^2}{2!} (H - D)^2 + \frac{\alpha^3}{3!} (H - D)^3 + \frac{\alpha^4}{4!} (H - D)^4 + \cdots \tag{10}$$

The matrix $e^{\alpha (H - D)}$ is called the diffusion kernel, in the sense that the heat diffusion process continues infinitely many times from the initial heat distribution. This quantity is important for assigning heat values to the vertices of a graph. Ma et al. (2012) discuss heat diffusion on an undirected graph and the time complexity of the related algorithm. An example of heat diffusion follows.

FIGURE 5. Two simple heat diffusion examples on an undirected graph (Ma et al. 2012).

To interpret Equation (8) and the heat-diffusion process more intuitively, we construct a small undirected graph with only five nodes, as shown in Figure 5(a). Initially, at time zero, suppose node 1 is given three units of heat and node 2 is given two units of heat; then the vector $f(0)$ equals $[3, 2, 0, 0, 0]^T$. The entries of the matrix $H - D$ are as follows:

$$H - D = \begin{bmatrix} -1 & 1 & 1 & 1 & 1 \\ 1/4 & -1 & 0 & 0 & 0 \\ 1/4 & 0 & -1 & 0 & 0 \\ 1/4 & 0 & 0 & -1 & 0 \\ 1/4 & 0 & 0 & 0 & -1 \end{bmatrix}$$

Without loss of generality, we set the thermal conductivity $\alpha = 1$ and vary time $t$ from 0 to 1 with a step of 0.05. The curve for the amount of heat at each node over time is shown in Figure 5(b). As time passes, the heat sources, node 1 and node 2, diffuse their heat to nodes 3, 4, and 5.
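The five-node example can be reproduced numerically with a short sketch. It assumes that the heat at time $t$ is $f(t) = e^{\alpha t (H - D)} f(0)$, which reduces to Equation (9) at $t = 1$, and it approximates the kernel with a truncated form of the series in Equation (10). The helper names (`mat_mul`, `mat_exp`, `diffuse`) and the truncation at 40 terms are choices made for this illustration only.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_exp(m, terms=40):
    """Approximate e^M by the truncated series I + M + M^2/2! + ... (Equation 10)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]           # current term, starts at I
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]  # M^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def diffuse(h_minus_d, f0, alpha=1.0, t=1.0):
    """Return f(t) = exp(alpha * t * (H - D)) f(0)."""
    kernel = mat_exp([[alpha * t * x for x in row] for row in h_minus_d])
    return [sum(kernel[i][j] * f0[j] for j in range(len(f0))) for i in range(len(f0))]


# H - D for the five-node example: node 1 is connected to nodes 2, 3, 4, and 5.
H_MINUS_D = [
    [-1.0, 1.0, 1.0, 1.0, 1.0],
    [0.25, -1.0, 0.0, 0.0, 0.0],
    [0.25, 0.0, -1.0, 0.0, 0.0],
    [0.25, 0.0, 0.0, -1.0, 0.0],
    [0.25, 0.0, 0.0, 0.0, -1.0],
]
f0 = [3.0, 2.0, 0.0, 0.0, 0.0]
f1 = diffuse(H_MINUS_D, f0)   # heat at t = 1 with alpha = 1
print([round(x, 4) for x in f1])
```

Because every column of this $H - D$ sums to zero, the total amount of heat stays at five units while it spreads, and nodes 3, 4, and 5 always hold equal heat. For larger graphs a library routine for the matrix exponential would normally replace the hand-written series.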
The heat of nodes 3, 4, and 5 increases, and their heat curves follow the same trend because these three nodes are symmetric in this graph. Another example is shown in Figure 5(c). Initially, at time zero, suppose node 1 is given four units of heat; then the vector $f(0)$ equals $[4, 0, 0, 0]^T$. The related heat curves are shown in Figure 5(d). Node 2, the node nearest to the heat source, gains more heat than the other nodes. This also indicates that if a node has more paths connected to the heat source, it will potentially obtain more heat. This is a perfect property for recommending relevant nodes on a graph.

2.2. Literature Review

Because developing new elements in scientific and research contexts depends on knowing their background thoroughly, the Swallow hybrid system builds on ideas and studies from three research domains over the past 10 years:

(1) Recommendation using social tagging data
(2) Graph-based recommendation
(3) Heat diffusion-based recommendation

A brief look at these three helps in understanding the Swallow system.

2.2.1. A Recommender by Using Tagging Data.

Recently, research interest has turned to resource recommendation in SASs. Bogers and van den Bosch applied CiteULike data to recommending scientific resources (such as papers). They studied three different CF algorithms and found that the user-based collaborative filtering algorithm performs best (CiteULike 2004). Moreover, many studies enhance older methods by means of tagging data (Flickr 2005; Resnick et al. 1994; Maes and Shardanand 1995; Guan et al. 2010; Gemmell et al. 2012). De Gemmis et al. (2008) used tagging data to enrich the content available to user-based recommender systems, treating tags as additional content of the documents (Flickr 2005). Tso-Sutter et al. (2008) reduced the three dimensions of tagging data (user, item, and tag) to two by regarding tags either as users or as items, and then applied item-based or user-based CF algorithms, respectively. They combined the scores computed by the CF algorithms through a linear combination to generate the final recommendation (Heymann et al. 2008). Nakamoto et al. (2008) modified the user-based CF algorithm to incorporate tagging data in each step (Markines et al. 2009). Iofciu and Diederich (2006) created TF-IDF tag features and used this feature vector to measure user-user similarities in a user-based algorithm (Sigurbjörnsson and van Zwol 2008). Wetzker modified probabilistic latent semantic analysis to model the co-occurrence relations between users–resources and resources–users (Wetzker et al. 2009). He assumed that these two types of co-occurrence relations share the same set of latent topics, and resources are recommended to the user according to their estimated probability.

The five aforementioned studies incorporate tagging data into older methods heuristically; they do not explore the structural information in tagging data. More recent studies use tagging data in a more structured way. Shepitsen proposed a personalized recommendation algorithm for SASs based on hierarchical tag clustering (Shepitsen et al. 2008). This algorithm requires users to enter query tags to obtain recommendations. The linear hybrid method is one of the most innovative methods for recommending resources in SAS. In this method, various pairwise combinations of (tag, resource, user) are used, and an optimal linear hybrid of them is reached by a hill climbing approach (Gemmell et al. 2012); this is the method with which the proposed method of this study is compared.
On the other hand, another approach first inferred the user's tag preferences and then computed item preferences from the tag preferences to produce item recommendations. In addition to annotation data, this method needs further information (e.g., click and rating information), although it does not require a query to be entered. Click and rating information can be hard to obtain because it usually does not exist in SAS data. The present study instead uses a structured combination of (tag-resource) and (resource-resource) relations together with the user's click-through in place of the user's query.

Another problem related to SAS is tag recommendation. A tag recommender differs from a resource recommender in that it suggests suitable tags for a resource (possibly with respect to the user's own tags, i.e., personalized), whereas a resource recommender proposes resources to the user (Lafferty and Kondor 2002; Niyogi and Belkin 2003; Lebanon and Lafferty 2005; Iofciu and Diederich 2006; Yang et al. 2007; De Gemmis et al. 2008; Sigurbjörnsson and van Zwol 2008; van den Bosch and Bogers 2008; Guan et al. 2009). Guan proposed a graph-based ranking approach for personalized tag recommendation (Guan et al. 2009). The algorithm takes the user's tagging history and the target document as the query, ranks the tags with respect to them, and recommends the highest-ranked tags. However, this algorithm does not explicitly include the users, who are vital for resource recommendation, so it cannot be applied directly to recommending resources.

2.2.2. Graph-Based Recommender.

These studies are related to graph-based subspace learning. Common algorithms include Laplacian eigenmaps and locally linear embedding (Niyogi and Belkin 2003; Guan et al. 2009).
These algorithms build an affinity graph as an approximation of the data manifold and learn a low-dimensional representation of the data that preserves the structure of that graph. In the learned space, the similarity between two arbitrary samples can be estimated. Earlier graph-based subspace learning algorithms consider just one type of data object, whereas the proposed algorithm deals with two kinds of related data. Recently, manifold alignment has become an interesting research topic (Lafferty and Kondor 2002). The problem is to "align" two data manifolds corresponding to two types of data objects into a common space by using pairwise correspondence examples between the two types of objects.

In another study, Guan addressed resource recommendation with a graph learning algorithm. He built two weighted bipartite graphs (user-tag and resource-tag), an unweighted user-resource graph, and finally a content-based graph for the documents. The result is a semantic space for users, tags, and documents that best preserves the connection structure of these graphs and reveals the documents closest to a user that the user has not yet tagged (Guan et al. 2010). In this article, the graph is constructed in a way similar to Guan et al. (2010); to achieve better accuracy, a resource-resource similarity graph is added to the resource-tag graph.

2.2.3. Recommender by Heat-Diffusion Algorithm.

Recently, researchers have started to apply the heat-diffusion algorithm in web recommender systems. Shang et al. (2010) used diffusion to compute the similarity between users and concluded that this similarity is much more effective than others (such as cosine similarity). The heat-diffusion algorithm has also been used for suggesting friends in social networks. Zhang et al. (2010) first created a tripartite graph (user-item-tag) from the web log files of an SAS.
Then, taking the current user's query as the heat source, they ran the heat-diffusion algorithm on the graph and finally recommended the items with the highest heat (Zhang et al. 2007; Shang et al. 2010; Guan et al. 2012; Lüa et al. 2012). Aarthi and Sampath (2013), providing a common framework for mining web graphs for recommendations with the heat-diffusion method, proposed a recommendation algorithm that aggregates items from similar customers, eliminates items the user has already rated, and recommends the remaining items to the user. In this article, the user's query is replaced by the user's click-through.

Heat diffusion has also been used for query recommendation (Ma et al. 2012). In this model, a bipartite graph is first built that represents the relation between queries and links. Second, given the query searched by the user, a subgraph is extracted from the main graph, and the heat-diffusion algorithm is executed on it (with the heat of the searched node set to one and that of all other nodes set to zero). Finally, the nodes with high heat are proposed to the user. The model used in that study is applied in the Swallow system.

3. PROPOSED SYSTEM (SWALLOW)

This section defines the new system and its modules in detail and discusses their implementation in practice. It also introduces the parameters required for the experiments that assess the system.

3.1. Introduction of "Swallow"

Swallow addresses the resource and tag recommendation problem in SASs. The system contains two separate parts, offline and online. In the first, offline phase, a hybrid graph is created by the learning framework. This graph is a combination of the weighted links between resources and tags.
In Swallow, the link graph of resources and tags is extracted implicitly and combined with the resource similarity graph built from the users' usage data. In the online phase, the graph obtained in the previous step is used as the pattern: by following the current user's click-through and performing heat diffusion on the graph, the resources and tags the user is likely to bookmark are forecast and proposed to him. The details of the method and the proposed algorithm are described below (the proposed system framework is shown in Figure 6).

FIGURE 6. The proposed system (Swallow) structure.

3.1.1. Introduction: Preprocessing.

This section explains the starting point of Swallow's migration. This step only prepares the data for entering the system.

Swallow was run on users' tagging data taken from the Delicious and Movielens SAS (from October 20 to December 15, 2008) (Gemmell et al. 2012). Every record of registered data (a user's annotation) consists of the following items:

(1) Data registration date
(2) User identification (ID(u))
(3) The resource labeled by the user: a URL in Delicious or a movie in Movielens, both referred to as a resource hereafter
(4) Tags

The information available in the data set can be represented by a quadruple (D, U, R, T), where D is the registration date, U is the user, R is the resource (URL), and T stands for the tag. In this research, the pair (R, T) was used to build the graph of relations between resources and tags, and the user-related information and the registration time were ignored. At the beginning, the 300 users with the highest number of annotations were selected. Then, resources that had been used fewer than five times were omitted.

3.1.1.1. Data Cleaning.

In this part, unrelated entries are deleted from the data set in the following four steps:

(1) Deleting the records without tags.
(2) Deleting the records with non-English tags.
(3) Deleting the records whose tags are in the blacklist (a list of stop words such as and, or, the, etc.).
(4) Performing the p-core algorithm: the p-core parameter is applied to the data set in the way used by Gemmell et al. (2012).

The p-core parameter guarantees that each user, page, and tag exists in at least p annotations of the training database. Applying this parameter omits some data from the database. Its behavior can be expressed as the intersection of the following two sets:

(1) Users who have labeled pages more than P times.
(2) Users who have labeled pages more than P times and, among them, those who have labeled more than P pages.

Because of this intersection, users who have labeled a single page more than P times are not placed in the final list. Likewise, pages that were labeled more than P times but appear only once in the database, with that count shared by more than one user, do not appear.

3.1.2. The First Step: Making Resources Similarity Graph.

In this step, the resource similarity graph is built from the previous users' data registered in the log file. After preprocessing, the similarity matrix is produced; it shows the correlation among the resources. The starting point is an M × K user-resource matrix, where M and K are the numbers of users and resources, respectively. Each column of this matrix is treated as a column vector $v_i = (R_{1i}, \ldots, R_{mi})$, which states how resource $i$ is used by each user. The similarity between two resources can be calculated by set similarity and by Euclidean distance. A total similarity can also be obtained by a proper weighting of these two formulas.
These concepts are given as follows.

Set similarity:

$$\mathrm{SetSim}(v_i, v_j) = \frac{|V_i \cap V_j|}{|V_i \cup V_j|} \tag{11}$$

Euclidean distance:

$$ED(v_i, v_j) = \sqrt{\sum_{k=1}^{m} \left( v_{ki} - v_{kj} \right)^2} \tag{12}$$

Normalization:

$$ND(v_i, v_j) = 1 - \sqrt{\frac{\sum_{k=1}^{m} \left( v_{ki} - v_{kj} \right)^2}{m}} \tag{13}$$

Total similarity:

$$S_{ij} = W_{SS} \cdot \mathrm{SetSim}(v_i, v_j) + W_{ND} \cdot ND(v_i, v_j) \tag{14}$$

and

$$W_{SS} + W_{ND} = 1 \tag{15}$$

$S_{ij}$ is the weighted sum of the two formulas above. In the user-resource matrix, $R_{hi} = 1$ if resource $i$ has been used by user $h$, and 0 otherwise. The final matrix $S$ is a K × K similarity matrix whose component $S_{ij}$ represents the degree of similarity between resources $i$ and $j$. With rows and columns indexed by $R_1, \ldots, R_k$, the matrix $S$ is

$$S = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1k} \\ S_{21} & S_{22} & \cdots & S_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ S_{k1} & S_{k2} & \cdots & S_{kk} \end{bmatrix} \tag{16}$$

Other similarity criteria, such as cosine similarity and the Pearson correlation coefficient (defined below), were also tried in place of Euclidean distance. The comparison of the results achieved by these algorithms is presented in the evaluation section.

Cosine similarity:

$$CS(v_i, v_j) = \frac{\sum_{k=1}^{m} v_{ki} v_{kj}}{\sqrt{\sum_{k=1}^{m} v_{ki}^2} \, \sqrt{\sum_{k=1}^{m} v_{kj}^2}} \tag{17}$$

Pearson correlation coefficient similarity:

$$PCC(v_i, v_j) = \frac{n \sum V_i V_j - \sum V_i \sum V_j}{\sqrt{\left[ n \sum V_i^2 - \left( \sum V_i \right)^2 \right] \left[ n \sum V_j^2 - \left( \sum V_j \right)^2 \right]}} \tag{18}$$

3.1.3. The Second Step: Making the Bipartite Graph "Resource-tag."

To create a resource-tag bipartite graph, consider an undirected bipartite graph $B_{rt} = (V_{rt}, E_{rt})$ such that $V_{rt} = R \cup T$, with $R = \{r_1, r_2, \ldots, r_M\}$ and $T = \{t_1, t_2, \ldots, t_K\}$, and define the set of edges $E_{rt} = \{(r_i, t_j) \mid \text{there is an edge between } r_i \text{ and } t_j\}$. The edge $(r_i, t_j)$ exists if a user has labeled resource $r_i$ with tag $t_j$. For instance, in Figure 7(a), the value on each edge states how many times a tag has been attributed to resource R.
This graph, which is extracted from users' clicks, cannot be used directly in the heat-diffusion algorithm because it is undirected and does not precisely express the relations between resources and tags. Hence, it is converted into the graph in Figure 7(b), in which each undirected edge of the first graph is replaced by two directed edges. The weight of each edge of the directed resource-tag graph is normalized by the number of times that resource has been labeled. Likewise, the weight of each edge of the tag-resource graph is normalized by the number of times that tag has been used. The information of this graph is stored in an M × K matrix, where M is the number of resources and K is the number of tags. The final matrices RT and TR, which show the relations between resources and tags, are as follows, with the rows of RT indexed by $R_1, \ldots, R_M$ and its columns by $T_1, \ldots, T_K$, and the reverse for TR:

$$RT = \begin{bmatrix} rt_{11} & rt_{12} & \cdots & rt_{1K} \\ rt_{21} & rt_{22} & \cdots & rt_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ rt_{M1} & rt_{M2} & \cdots & rt_{MK} \end{bmatrix} \tag{19}$$

$$TR = \begin{bmatrix} tr_{11} & tr_{12} & \cdots & tr_{1M} \\ tr_{21} & tr_{22} & \cdots & tr_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ tr_{K1} & tr_{K2} & \cdots & tr_{KM} \end{bmatrix} \tag{20}$$

FIGURE 7. Making the graph in the second step: (a) resource-tag graph; (b) resource-tag modified bipartite graph.

3.1.4. The Third Step: Combining Two Graphs.

In this step, the two graphs described so far (resource-resource and resource-tag) are combined into one graph, as shown in Figure 8. Thus, when the heat-diffusion algorithm is run with any given resource taken as the heat source, both similar resources and similar tags can be predicted. As a result, two similarity criteria are used in the recommendation problem, which increases the accuracy of the proposed system. The final matrix is a combination of the three matrices S, RT, and TR.
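The normalization and combination described in the second and third steps can be sketched as follows. The counts and similarity values are made up for illustration, the function names are hypothetical, and placing the resources first and the tags second in the combined matrix is an assumption about the layout.

```python
def normalize_rows(counts):
    """Divide each row by its sum so the outgoing weights of a node add up to 1."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def combine(S, RT, TR):
    """Assemble the block matrix [[S, RT], [TR, 0]] (resources first, then tags)."""
    k = len(TR)                                   # number of tags
    top = [S[i] + RT[i] for i in range(len(S))]   # resource rows
    bottom = [TR[j] + [0.0] * k for j in range(k)]  # tag rows, no tag-tag links
    return top + bottom


# Hypothetical annotation counts: 3 resources (rows) by 2 tags (columns).
counts = [[2, 1], [0, 3], [1, 0]]
RT = normalize_rows(counts)              # resource -> tag, normalized per resource
TR = normalize_rows(transpose(counts))   # tag -> resource, normalized per tag
# Made-up resource similarity matrix standing in for S.
S = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
final = combine(S, RT, TR)
print(len(final), len(final[0]))
```

With three resources and two tags, `final` is a 5 × 5 matrix whose lower-right 2 × 2 block is zero, since no tag-tag relation is modeled.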
$$
\text{Final matrix} =
\begin{bmatrix}
S & RT \\
TR & 0
\end{bmatrix}
$$

Here the upper-left block is the similarity matrix S of Equation (16), the upper-right and lower-left blocks are the matrices RT and TR of Equations (19) and (20), and the lower-right block is all zeros.

3.1.5. The Fourth Step: Determining the Initial Heat Sources. Here, according to the user’s current session and click-through, the resources that have been observed and labeled are taken as the initial heat sources; their heat is set to 1, whereas all other nodes are set to zero. To clarify, if the user annotates the resources r1, r2, r5 with t3, t7, t2, respectively, then the initial temperature of these nodes is 1 in the combined graph. This gives a vector f(0) with as many elements as there are resources plus tags, in which all values are zero except those of the mentioned nodes, which are 1.

3.1.6. The Fifth Step: Determining a Proper Subgraph. To reduce the running time of the heat-diffusion algorithm, the size of the final graph was reduced. An algorithm was implemented that extracts from the main graph only the initial heat sources and the nodes related to them; all other nodes are omitted. The algorithm is shown in Table 1.

FIGURE 8. Combining resource-tag graph and resource-similarity graph.

TABLE 1. Making a subgraph as the input of the heat-diffusion algorithm.

Algorithm 1. Making subgraph G (G, R, T)
1. Inputs: the graph G = (V, E), where V is the set of vertices composed of resources and tags; $R = \{r_1, \ldots, r_k\}$, the visited resources; and $T = \{t_1, \ldots, t_j\}$, the visited tags.
2. For each resource and tag in R and T, a subgraph is constructed by depth-first search in G. The search stops when the number of nodes is larger than a predefined number.
3. All the above subgraphs are combined so that all visited vertices are placed in $\hat{V}$ and the selected edges in $\hat{E}$.
4. $K = (\hat{V}, \hat{E})$ is an appropriate subgraph.
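The fourth and fifth steps (the initial heat vector and Algorithm 1) can be sketched as below. This is a hedged sketch rather than the authors’ code: the adjacency-dictionary representation, the function names, the toy graph, and the exact stopping rule (stop once `max_nodes` nodes have been collected) are assumptions made for illustration.

```python
def initial_heat(nodes, seeds):
    """f(0): temperature 1 for visited or annotated nodes, 0 elsewhere."""
    return [1.0 if node in seeds else 0.0 for node in nodes]

def bounded_dfs(adj, start, max_nodes):
    """Depth-first search from `start` that stops once `max_nodes` nodes are collected."""
    visited, stack = set(), [start]
    while stack and len(visited) < max_nodes:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        # Reverse-sorted push keeps the traversal order deterministic.
        stack.extend(sorted((n for n in adj.get(node, {}) if n not in visited),
                            reverse=True))
    return visited

def extract_subgraph(adj, seeds, max_nodes):
    """Union of the bounded searches from every seed, restricted to kept vertices."""
    keep = set()
    for seed in sorted(seeds):
        keep |= bounded_dfs(adj, seed, max_nodes)
    return {u: {v: w for v, w in adj.get(u, {}).items() if v in keep}
            for u in sorted(keep)}

# Hypothetical combined graph: node -> {neighbour: edge weight}.
graph = {
    "r1": {"r2": 0.5, "t3": 1.0},
    "r2": {"r1": 0.5},
    "t3": {"r1": 1.0, "r4": 1.0},
    "r4": {"t3": 1.0},
}
seeds = {"r1"}
sub = extract_subgraph(graph, seeds, max_nodes=3)
f0 = initial_heat(sorted(sub), seeds)
print(sub, f0)
```

With the cap set to 3, the search from r1 keeps r1, r2, and t3 and drops r4, so the diffusion step later runs on a 3-node graph instead of the full one.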
TABLE 2. Swallow recommendation algorithm.

Algorithm 2. Making the recommendation set (G, R, T)
1. Graph G: a hybrid graph $G = (V^{+} \cup V^{-}, E^{+} \cup E^{-})$, where the vertices are composed of two sets, $V^{+} = \{r_1, \ldots, r_n\}$ (resources) and $V^{-} = \{t_1, \ldots, t_n\}$ (tags), and the edges include $E^{-}$, the bidirectional weighted edges showing the resource-tag relations, and $E^{+}$, the edges showing the relations between resources.
2. Taking as input the sets $R = \{r_1, \ldots, r_k\}$ and $T = \{t_1, \ldots, t_j\}$ of visited resources and annotated tags, a subgraph of G is built by means of Algorithm 1.
3. Let $\alpha = 1$ and let the initial value of R and T be 1, i.e., the temperatures $f_0(r_1, \ldots, r_k) = 1$ and $f_0(t_1, \ldots, t_j) = 1$.
4. The heat-diffusion algorithm is run by taking $f(1) = e^{\alpha (H - D)} f(0)$.
5. From the final f(1), the 10 resources and the 10 tags with the highest temperatures are suggested to the user.

3.1.7. The Sixth Step: Resources Recommendation. In the final step, the heat-diffusion algorithm was performed on the final subgraph, and after some time the nodes with the highest temperature were suggested to the user as favorite resources and tags. This was performed with respect to Equations (9) and (10), by which the vector f(1) was computed. From f(1), the 10 tags and 10 resources with the largest values that had not yet been observed were proposed to the user. The algorithm is shown in Table 2. Swallow thus suggests both resources and tags that are likely to fit the user’s click-through.

4. ESTIMATION AND EVALUATION

4.1. Evaluation Criteria

In this study, a common method, the linear hybrid method, was used to evaluate Swallow. In most methods, the efficacy of recommendations is assessed with respect to precision and coverage.
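As an illustration of steps 4 and 5 of Algorithm 2 (Section 3.1.7), the sketch below approximates $f(1) = e^{\alpha (H - D)} f(0)$ with the product $(I + \frac{\alpha}{P}(H - D))^{P} f(0)$, which converges to the matrix exponential as P grows. Because Equations (9) and (10) lie outside this section, the definitions of H and D used here are assumptions: H[i][j] is taken to be the weight of the edge from j to i divided by the total outgoing weight of j, and D[i][i] is 1 for every node that has outgoing edges. The function names and the toy graph are likewise hypothetical.

```python
def diffuse(W, f0, alpha=1.0, steps=200):
    """Approximate f(1) = exp(alpha * (H - D)) f(0) by `steps` small explicit updates.

    W[i][j] is the weight of the directed edge i -> j; self-loops are ignored.
    """
    n = len(f0)
    out_sum = [sum(W[j][k] for k in range(n) if k != j) for j in range(n)]
    f = list(f0)
    dt = alpha / steps
    for _ in range(steps):
        nxt = []
        for i in range(n):
            # Heat received from every node j that has an edge j -> i.
            gained = sum(W[j][i] / out_sum[j] * f[j]
                         for j in range(n) if j != i and out_sum[j] > 0)
            # Heat given away by node i, if it has any outgoing edge.
            lost = f[i] if out_sum[i] > 0 else 0.0
            nxt.append(f[i] + dt * (gained - lost))
        f = nxt
    return f

def top_n(nodes, temps, candidates, n=10):
    """Hottest `n` nodes among `candidates` (e.g. unseen resources or unseen tags)."""
    ranked = sorted(((t, name) for name, t in zip(nodes, temps) if name in candidates),
                    reverse=True)
    return [name for _, name in ranked[:n]]

# Hypothetical subgraph: one visited resource r1, one other resource r2, one tag t1.
nodes = ["r1", "r2", "t1"]
W = [[0.0, 0.3, 0.7],
     [0.3, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
temps = diffuse(W, [1.0, 0.0, 0.0])
print(temps, top_n(nodes, temps, {"r2", "t1"}, n=1))
```

In practice the candidate set would be passed twice, once with the unseen resources and once with the unseen tags, to obtain the two top-10 lists. Where SciPy is available, `scipy.linalg.expm` could replace the stepwise approximation.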
Precision is a criterion for assessing the correctness of the recommended pages, and coverage is a criterion for evaluating the system’s capability of including the items that the user tends to visit. Precision is defined as the ratio of the number of items common to the system’s recommended data, A, and the user’s observed data, B, to the number of pages recommended by the system, |A|:

$$\mathrm{Precision} = \frac{|A \cap B|}{|A|} \tag{21}$$

Coverage is defined as the ratio of the number of items common to the system’s recommended data, A, and the user’s observed data, B, to the number of pages observed by the user, |B|:

$$\mathrm{Coverage} = \frac{|A \cap B|}{|B|} \tag{22}$$

The precision and coverage calculated for the interactions of the test data set were averaged, and system efficiency was assessed. After preparation, the data sets were divided into a train set and a test set such that 70% of the users were in the train set and 30% in the test set. Then, the initial 50% of the test set was used for generating recommendations, and the resources available in the remaining 50% were compared with the recommended resources to compute precision and coverage.

4.2. The Effect of Resources Similarity Criterion on Swallow

To calculate the degree of resource similarity, the three similarity criteria mentioned in Section 3.1.3 were used: the weighted combination of Euclidean distance and set similarity, cosine similarity, and Pearson similarity. The results of running the algorithm with each criterion are compared in Figure 9. As can be seen, the Pearson correlation coefficient gave better results than the other criteria. Therefore, Pearson correlation, having the highest precision, was applied in the final computation of web page similarity. As Figure 9 shows, in most cases cosine similarity and Pearson similarity resulted in nearly the same recommendation precision, but as the number of recommended pages increased, the Pearson correlation coefficient achieved higher precision.
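Equations (21) and (22) reduce to simple set operations, as the following sketch shows. The function names and the sample identifiers are hypothetical; A is the list recommended by the system and B the list actually observed by the user.

```python
def precision(recommended, observed):
    """Eq. (21): |A ∩ B| / |A|."""
    a, b = set(recommended), set(observed)
    return len(a & b) / len(a) if a else 0.0

def coverage(recommended, observed):
    """Eq. (22): |A ∩ B| / |B|."""
    a, b = set(recommended), set(observed)
    return len(a & b) / len(b) if b else 0.0

# Hypothetical session: four recommendations, three resources actually visited.
recommended = ["r1", "r2", "r3", "r4"]
observed = ["r2", "r4", "r7"]
print(precision(recommended, observed), coverage(recommended, observed))
```

Two of the four recommendations are hits, so precision is 2/4, while two of the three visited resources were recommended, so coverage is 2/3. Averaging these values over all test interactions gives the figures reported in the evaluation.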
The results show that the precision of the resource similarity measure directly affects the precision of Swallow. For this reason, it seems reasonable for researchers to use this system.

4.3. Resources Recommendation Evaluation

4.3.1. Comparing the Proposed Method with the Linear Hybrid Methods (Resource Recommendation). In this section, the results obtained by Swallow for resource recommendation are compared with those of the linear combination method (Gemmell et al. 2012), which addresses resource recommendation in SAS. In that method, a three-dimensional matrix (user, resource, and tag) is first produced, and to score the resources, an optimal linear hybrid is calculated from multiple two-dimensional projections of the three-dimensional matrix. To determine the effect of the number of recommended pages on the assessment criteria, precision and coverage were computed for different numbers of recommended pages. The number of recommended pages was varied from 1 to 10, and the precision of the system was calculated for each value. As Figure 10 shows, Swallow performs best at RP = 1, and in all cases Swallow gives more valid recommendations than the linear hybrid system. As stated earlier, the proposed system was evaluated on two data sets. In both data sets (Movielens and Delicious), the accuracy of Swallow is higher than that of the linear hybrid method.

FIGURE 9. The effect of resources similarity on Swallow.

FIGURE 10. Comparing Swallow precision and linear hybrid precision in resource recommendation with respect to the number of recommended pages (RP) in Delicious and Movielens data set.

FIGURE 11. Comparing Swallow coverage and the linear hybrid coverage in resource recommendation with respect to the number of recommended pages (RP) in Movielens and Delicious.
Comparing the coverage of Swallow with that of the linear hybrid system, as depicted in Figure 11, shows that Swallow achieves higher coverage than its counterpart.

FIGURE 12. Comparing precision of Swallow and linear hybrid systems in tag recommendation considering the number of recommended pages (RP). [Plot: precision (%) versus recommended page count (RP), 1 to 10; series: Swallow, Linear Hybrid.]

FIGURE 13. Comparing coverage of Swallow and linear hybrid systems in tag recommendation considering the number of recommended pages (RP). [Plot: coverage (%) versus recommended page count (RP), 1 to 10; series: Swallow, Linear Hybrid.]

4.4. Evaluating Tag Recommendation

4.4.1. Comparing Swallow System with Linear Hybrid Method (Tag Recommendation). As mentioned earlier, when a user wants to annotate a resource with a tag in the SAS, Swallow can also predict tags. Here, the proposed system was compared with the linear hybrid method for tag recommendation by Gemmell et al. (2011). Precision and coverage (recall) were again calculated, this time for tag recommendation. Swallow proved more valid, with higher coverage and precision (as shown in Figures 12 and 13).

5. CONCLUSION AND FUTURE DEVELOPMENTS

5.1. Conclusion

In this article, a new recommendation system based on the heat-diffusion algorithm has been developed to propose resources and tags. Because it uses a natural phenomenon (heat diffusion) and because of its instinctive similarity to route-finding birds, the proposed system was named Swallow. By observing the online user’s click-through and referring to previous users’ tagging data, Swallow helps SAS to identify the properties and interests of current users and shows these to the users. In this study, first, using the information of log files, the similarity of resources in a graph was obtained.
Then, this graph was enhanced with the available relations between resources and the tags with which users annotated them on the website. The resulting pattern is Swallow’s migration map. Swallow observes the online user’s behavior, heats the visited points on its pattern, and, by following the heat-diffusion process in the graph, reaches the resources the user is aiming for. These are then presented according to the user’s priority. Assessing Swallow on the precision and coverage criteria and comparing it with the linear hybrid method shows an increase in both criteria.

To develop Swallow further and to move toward a thorough, useful, and modern system, the following perspectives are presented:

(1) According to the assessment results in Figure 9, one of the factors affecting Swallow is the criterion used for computing resource similarity. It is therefore suggested that context-based methods be used to increase similarity precision.

(2) Because adding the resource similarity graph to the bipartite resource-tag graph helps to improve the precision of the proposed system, utilizing other relations, such as tag similarity, can also be helpful.

(3) Because speed is an important factor in these systems, one proposed improvement is to apply clustering approaches that place resources and tags in clusters. Instead of heating a resource or tag in the hybrid graph, the system could heat the cluster most similar to the considered options. This not only improves speed but also addresses the cold-start problem. That is, if a selected resource or tag is not available as an initial heat source, the available resources in its most similar cluster can be heated instead.

REFERENCES

AARTHI, S., and S. SAMPATH. 2013. A heat diffusion method for mining web graphs for recommendations using recommendation algorithm. International Journal of Engineering Research and Technology (IJERT), 2(8): 961–966.
CITEULIKE. 2004.
Available at: http://www.citeulike.org.
DE GEMMIS, M., P. LOPS, G. SEMERARO, and P. BASILE. 2008. Integrating tags in a semantic content-based recommender. In Proceedings of the 2008 ACM Conference on Recommender Systems. ACM: New York, pp. 163–170.
DELICIOUS. 2003. Available at: http://delicious.com.
FLICKR. 2005. Available at: http://www.flickr.com.
GEMMELL, J., T. SCHIMOLER, B. MOBASHER, and R. BURKE. 2011. Tag-based resource recommendation in social annotation applications. In Proceedings of the 19th International Conference on User Modeling, Adaption, and Personalization. Springer-Verlag: Berlin, Heidelberg, pp. 111–122.
GEMMELL, J., T. SCHIMOLER, B. MOBASHER, and R. BURKE. 2012. Resource recommendation in social annotation systems: a linear-weighted hybrid approach. Journal of Computer and System Sciences, 78(4): 1160–1174.
GUAN, Z., J. BU, Q. MEI, C. CHEN, and C. WANG. 2009. Personalized tag recommendation using graph-based ranking on multi-type interrelated objects. ACM: New York, pp. 540–547.
GUAN, Z., C. WANG, J. BU, C. CHEN, K. YANG, D. CAI, and X. HE. 2010. Document recommendation in social tagging services. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, pp. 391–400.
GUAN, Z., G. MIAO, R. MCLOUGHLIN, X. YAN, and D. CAI. 2012. Co-occurrence-based diffusion for expert search on the web. IEEE Transactions on Knowledge and Data Engineering, 25(99): 1001–1014.
HAM, J., D. D. LEE, and L. SAUL. 2005. Semisupervised alignment of manifolds. In Proceedings of the Annual Conference on Uncertainty in Artificial Intelligence, Edinburgh, UK, pp. 120–127.
HEYMANN, P., D. RAMAGE, and H. GARCIA-MOLINA. 2008. Social tag prediction. In Proceedings of the 31st Annual International ACM SIGIR Conference. ACM: New York, pp. 531–538.
IOFCIU, T., and J. DIEDERICH. 2006. Finding communities of practice from user profiles based on folksonomies.
In Proceedings of the 1st International Workshop on Building Technology Enhanced Learning Solutions for Communities of Practice, EC-TEL, Crete, Greece, pp. 308–410.
KARYPIS, G. 2001. Evaluation of item-based top-n recommendation algorithms. In Proceedings of the Tenth International Conference on Information and Knowledge Management. ACM: New York, pp. 247–254.
LAFFERTY, J. D., and R. I. KONDOR. 2002. Diffusion kernels on graphs and other discrete input spaces. In ICML ’02 Proceedings of the Nineteenth International Conference on Machine Learning. Morgan Kaufmann: San Francisco, pp. 315–322.
LEBANON, G., and J. D. LAFFERTY. 2005. Diffusion kernels on statistical manifolds. Journal of Machine Learning Research, 6: 129–163.
LI, X., S. LIN, S. YAN, and D. XU. 2008. Discriminant locally linear embedding with high-order tensor data. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 38: 342–352.
LÜ, L., M. MEDO, C. H. YEUNG, Y.-C. ZHANG, Z.-K. ZHANG, and T. ZHOU. 2012. Recommender systems. Physics Reports, 519: 1–49.
MA, H., I. KING, and M. R. LYU. 2012. Mining web graphs for recommendations. IEEE Transactions on Knowledge and Data Engineering, 24: 1051–1064.
MAES, P., and U. SHARDANAND. 1995. Social information filtering: algorithms for automating “word of mouth.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Press/Addison-Wesley: New York, pp. 210–217.
MARKINES, B., C. CATTUTO, F. MENCZER, D. BENZ, A. HOTHO, and G. STUMME. 2009. Evaluating similarity measures for emergent semantics of social tagging. In Proceedings of the 18th International Conference on World Wide Web, Madrid, pp. 641–650.
MIN, W., K. LU, and X. HE. 2004. Locality pursuit embedding. Pattern Recognition, 34: 781–788.
NAKAMOTO, R. Y., S. NAKAJIMA, J. MIYAZAKI, S. UEMURA, H. KATO, and Y. INAGAKI. 2008. Reasonable tag-based collaborative filtering for social tagging systems.
In Proceedings of the 2nd ACM Workshop on Information Credibility on the Web. ACM: New York, pp. 11–18.
NIYOGI, P., and M. BELKIN. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15: 1373–1396.
RESNICK, P., N. IACOVOU, M. SUCHAK, P. BERGSTROM, and J. RIEDL. 1994. Grouplens: an open architecture for collaborative filtering of netnews. In Proceedings of ACM 1994 Conference on Computer Supported Cooperative Work. ACM: New York, pp. 175–186.
SAUL, L., and S. ROWEIS. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science, 290: 2323–2326.
SHANG, M. S., Z.-K. ZHANG, T. ZHOU, and Y.-C. ZHANG. 2010. Collaborative filtering with diffusion-based similarity on tripartite graphs. Physica A, 389: 1259–1264.
SHEPITSEN, A., J. GEMMELL, B. MOBASHER, and R. BURKE. 2008. Personalized recommendation in social tagging systems using hierarchical clustering. In Proceedings of the 2008 ACM Conference on Recommender Systems. ACM: New York, pp. 259–266.
SIGURBJÖRNSSON, B., and R. VAN ZWOL. 2008. Flickr tag recommendation based on collective knowledge. In Proceedings of the 17th International Conference on World Wide Web. ACM: New York, pp. 327–336.
TSO-SUTTER, K., L. B. MARINHO, and L. SCHMIDT-THIEME. 2008. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 2008 ACM Symposium on Applied Computing. ACM: New York, pp. 1995–1999.
VAN DEN BOSCH, A., and T. BOGERS. 2008. Recommending scientific articles using CiteULike. In Proceedings of the 2008 ACM Conference on Recommender Systems. ACM: New York, pp. 287–290.
WETZKER, R., W. UMBRATH, and A. SAID. 2009. A hybrid approach to item recommendation in folksonomies. In Proceedings of the WSDM’09 Workshop on Exploiting Semantic Annotations in Information Retrieval. ACM: New York, pp. 25–29.
YANG, H., I. KING, and M. R. LYU. 2007. DiffusionRank: a possible penicillin for web spamming.
In SIGIR ’07 Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM: New York, pp. 431–438.
ZHANG, Y.-C., M. BLATTNER, and Y.-K. YU. 2007. Heat conduction process on community networks as a recommendation model. Physical Review Letters, 99(14): 1–5.
ZHANG, Z.-K., T. ZHOU, and Y.-C. ZHANG. 2010. Personalized recommendation via integrated diffusion on user–item–tag tripartite graphs. Physica A: Statistical Mechanics and Its Applications, 389: 179–186.