K.Padmapriya * et al. / (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH Volume No.2, Issue No. 2, February – March 2014, 819 - 826.

K.PADMAPRIYA, Research Scholar, Department of Computer Science & Engineering, Sathyabama University, Chennai, India
Dr.S.SRIDHAR, Professor & Dean – CCCF, R.V.College of Engineering, Bangalore, Karnataka, India

Abstract - Group join processing over high-volume data streams has many practical applications, such as tracking, monitoring, and arranging objects. Prior research in this area usually assumes that stream processing is conducted over precise data and benchmark data. In this paper, similarity join processing is studied on data streams that inherently contain uncertainty and inaccuracy. Because the input data arrive from a variety of sources at each time interval, the data are uncertain. The main problem, grouping of uncertain data streams (USG), is addressed with a Modified Pruning (MP) method that is intended to guarantee the accuracy of the grouping of uncertain data. To meet the efficiency and effectiveness challenges of limited time, limited memory, and cost, the MP method, which combines object-level and sample-level pruning, filters out false alarms. Combined with query procedures, the MP method answers the USG problem incrementally. Because the modeled real-time data, such as image databases, time series, and sensor data, are uncertain and very large, the uncertain data are grouped using the group nearest neighbor (GNN) method. Because the data are uncertain, a novel procedure, the Modified Probabilistic Subspace Range Query (MPSRQ), is used to process subspace queries efficiently and effectively. MPSRQ finds, with high probability, the objects that lie within a given distance of a query object in any subspace.
Keywords: Similarity Group; Subspace Query; Uncertain Data Streams; Group Nearest Neighbor; Pruning Method; Query Processing; Data Level Pruning; Object Level Pruning.

I. INTRODUCTION

Recently, uncertain data analysis has become increasingly important because data uncertainty is ubiquitous in many real-world applications such as sensor data monitoring [1], [2], [3], location-based services (LBS) [4], RFID networks [5], object identification [6], and moving object search [7], [8]. For example, in sensor networks, sensory data contain a lot of noise resulting from environmental factors, packet losses, and low battery power. Similarly, LBS applications find the position of a mobile user by means of the Global Positioning System (GPS). However, GPS data are often imprecise for reasons such as clock errors, ephemeris errors, atmospheric delays, multipath effects, and satellite geometry. In addition, the trajectory of a mobile user is sometimes intentionally distorted by a trusted third party in order to preserve privacy [9], [4]. Therefore, in real applications it is not unusual to encounter uncertain and imprecise data, and queries on such data have to be answered effectively and efficiently.

Similarity search has been extensively studied for traditional categorical and numerical data types in relational data. There are also a few studies that leverage link information in networks. Most of these studies focus on homogeneous or bipartite networks, such as personalized PageRank (P-PageRank) [10], SimRank [11], and SCAN [12]. However, these similarity measures disregard the subtle differences among types of objects and links. Adopting such measures for networks with several object and link types has a significant drawback: objects and links of different types carry different semantic meanings, and it does not make sense to combine them into one similarity measure without distinguishing their semantics.

ISSN 2320 –5547
The volume of data managed by Database Management Systems (DBMS) is increasing continuously. Moreover, new complex data types, such as multimedia data (image, audio, video, and long text), time series, fingerprints, geo-referenced information, genomic data, and protein sequences, among others, have been added to DBMS [13].

Formally, a metric space is a pair < S, d() >, where S is the data domain and d() is a distance function that satisfies the following three properties for all $s_1, s_2, s_3 \in S$:
1. symmetry: $d(s_1, s_2) = d(s_2, s_1)$;
2. non-negativity: $0 < d(s_1, s_2) < \infty$ if $s_1 \neq s_2$, and $d(s_1, s_1) = 0$;
3. triangular inequality: $d(s_1, s_2) \le d(s_1, s_3) + d(s_3, s_2)$.

A metric dataset is a set of objects $s_i \in S$ currently stored in a database. Vector-based data with an Lp distance function, such as the Euclidean distance (L2), are special cases of metric spaces. The range query and the k-NN query are closely related and are defined as follows.

Range query - Rq: given a query center object $s_q \in S$ and a maximum query distance $r_q$, the query Rq(sq, rq) retrieves every object $s_i \in S$ such that $d(s_i, s_q) \le r_q$. An example is: "Select the proteins that are similar to the protein P by up to 5 purine bases", which is represented as Rq(P,5).

k-Nearest Neighbor query - kNNq: given a query center object $s_q \in S$ and an integer value $k \ge 1$, the query kNNq(sq, k) retrieves the k objects in S that have the minimum distance from the query object sq, according to the distance function d(). An example is: "Select the 3 proteins most similar to the protein P", where k=3, which is represented as kNNq(P,3).

@ 2013 http://www.ijitr.com All rights Reserved.

In this paper we make the following contributions:
1. We formalize the problem of grouping on uncertain data streams (USG) over subspaces using GNN.
2. We provide a general framework for performing USG and incrementally maintaining the USG answer set.
3. We propose the Modified Pruning method to prune false alarms among USG candidate pairs (data pairs) and integrate it into an efficient USG procedure.

II. RELATED WORK

Synopses of a few related papers are given below.

In [14] the authors propose the use of Locality-Sensitive Hashing (LSH) to transform a D-dimensional vector x into a sequence of C bits (a binary vector) v(x). Since the L1 distance between vectors can be approximated by the Hamming distance between the corresponding binary vectors, they propose a hashing technique that indexes only the binary vectors v(x). Both the accuracy and the efficiency of the technique depend heavily on the number C of bits used to approximate the vectors. The technique (VSL1) applies the Hamming distance to the L1 metric.

In [15] approximate nearest neighbor search techniques based on the VA-file [WSB98] are presented. In essence, the VA-file is a sequential structure containing approximations of vectors that use a fixed number b of bits. Exact k-NN search is performed by first executing a sequential scan of the structure, using the query distance on the vector approximations, which yields a number M > k of candidate vectors, and then applying a refinement step in which the distance is evaluated on the real vectors and only the k best vectors are retained. The proposed techniques reduce the number of considered approximations by shrinking the query radius (VA-BND), or avoid the refinement step by returning only the best k candidate vectors according to the approximations (VA-LOW).

In [16] the authors propose the P-Sphere tree, a 2-level index structure for 1-NN search. To find the nearest neighbor of a query point, the leaf node closest to the query point according to the distance function is accessed. The distance is computed on the objects themselves rather than on a coordinate system, and no further assumption is made.
Here the distance sometimes serves as a measure of similarity between attributes.

Query processing on data streams has many important applications, such as real-time processing of data gathered from sensor networks [17]. Join processing in static databases has been extensively researched, for example, the spatial join [18], [19], [20]. Typically, two spatial indexes (e.g., R-trees [21]) are constructed offline for the two static data sets, and the spatial join is performed by traversing the two indexes in parallel. Different from the join in static databases, prior works on data stream processing usually assume that the underlying data are precise data points, which is not the assumption in our USG problem. Moreover, because stream data arrive continuously, static indexes cannot be built offline. In other words, in a streaming environment, the cost of building, updating, and querying indexes online may not keep up with the speed of continuously incoming data (with a possibly high input rate).

Many methods have been designed for efficiently processing different query types on data streams, including the top-k query [22], skyline query [23], aggregate query [24], join [25], [26], [27], [28], and so on. To list a few, Das et al. [22] studied top-k queries on data streams; they first transform the data points to a dual space and then obtain fast top-k answers in this dual space. Tao and Papadias [23] designed lazy and eager strategies to incrementally maintain skylines over a sliding window on data streams. The existing works on the join over data streams include XJoin [25], hash-merge join [27], and rate-based progressive join [28]. These works primarily focus on the equality join over precise data points; they allow disk accesses and can move unprocessed data to disk for a later join operation. Thus, different policies for selecting the memory partitions to flush were proposed in order to minimize the total cost of the join operator on data streams.
In contrast, our USG problem considers the similarity join with a range predicate (rather than an equality join) on streams where the data are uncertain and imprecise.

III. EXISTING APPROACH

We first briefly introduce the problem of subspace similarity search (SSS) over uncertain data, and then give the problem definition of SSS over precise data. Given a database D that contains N precise data objects in an n-dimensional full space DM and a query q in a k-dimensional subspace DM' of DM (i.e., $DM' \subseteq DM$, $k \in [k_{min}, k_{max}]$, and $k \ll n$), a subspace similarity query retrieves all objects $obj \in D$ such that $\mathrm{dist}(q, obj') \le \varepsilon$, where obj' is the k-dimensional point obtained by projecting object obj onto subspace DM' and dist(., .) is an Lp distance function ($p \in [1, \infty]$). In particular, given any two k-dimensional data points x and y in the subspace DM', the function dist(x, y) is defined as

$\mathrm{dist}(x, y) = \left( \sum_{d \in DM'} |x_d - y_d|^p \right)^{1/p}$ (1)

where $1 \le p < \infty$ and $x_d$, $y_d$ are the coordinates of x and y in dimension d. When $p = \infty$, the distance function dist(x, y) is given by

$\mathrm{dist}(x, y) = \max_{d \in DM'} |x_d - y_d|$ (2)

In the existing approach, the query result is retrieved from the database using only equations (1) and (2); that is, the problem of similarity search over precise data is formalized in arbitrary subspaces. One of the key issues in the similarity search problem is the development of efficient retrieval methods. To speed up the similarity search, previous works usually construct multidimensional indexes, such as the R-tree [21], over the data set, on which either a range query or a nearest neighbor query is issued. In essence, the distance between data objects is computed in a subspace, and it must satisfy the query constraint in whichever subspace is chosen.

IV. PROBLEM DEFINITION

The proposed Modified Pruning approach combines a USG framework with a procedure for retrieving similar objects. The framework processes the data streams through a compartment window at each time interval. It models the stream data as hypersphere objects and obtains the similarity distance from sample objects drawn from the whole set of data objects. Each time an uncertain data object satisfies the inequalities (equations 5, 6, 7, and 8), the matching objects are obtained through the object retrieval procedure and the old objects are removed from the space. To obtain the data pairs, we invoke the procedure getdata_pair() on the data objects in the hyperspace and remove the expired objects in order to reduce the complexity.

A. USG_Framework – [Data Level Pruning]

The main problem defined in this paper is grouping on uncertain data streams. A data pool contains n uncertain data streams; without loss of generality, we consider two of them in our experiment.

At the current time interval t, the compartment windows of the two streams hold the cw most recent uncertain objects:

$CW(DS1) = \{X[t-cw+1], \ldots, X[t]\}$ (3)
$CW(DS2) = \{Y[t-cw+1], \ldots, Y[t]\}$ (4)

When a new uncertain object X[t+1] (Y[t+1]) arrives at the next time interval (t+1), it is appended to DS1 (DS2). At the same time, the old object X[t-cw+1] (Y[t-cw+1]) expires and is evicted from memory. Thus, USG at time interval (t+1) is conducted on a new compartment window {X[t-cw+2], ..., X[t+1]} ({Y[t-cw+2], ..., Y[t+1]}) of size cw.
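The compartment window update described above (append the object of time interval t+1, evict the object of time interval t-cw+1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class name `CompartmentWindow` and the string placeholders used as objects are assumptions made for the example.

```python
from collections import deque


class CompartmentWindow:
    """Holds the cw most recent objects of one stream (illustrative sketch)."""

    def __init__(self, cw):
        # A deque with maxlen drops its oldest element automatically on append.
        self.items = deque(maxlen=cw)

    def append(self, obj):
        """Add the object of time interval t+1 and return the expired object, if any."""
        is_full = len(self.items) == self.items.maxlen
        expired = self.items[0] if is_full else None
        self.items.append(obj)
        return expired


# Hypothetical stream of five objects pushed through a window of size cw = 3.
window = CompartmentWindow(cw=3)
evicted = [window.append(f"X[{t}]") for t in range(1, 6)]
```

After the five appends the window holds X[3], X[4], and X[5], and the objects X[1] and X[2] have been reported as expired, which mirrors the eviction rule stated for equations (3) and (4).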
Two uncertain data streams DS1 and DS2 are taken as inputs for the USG problem. Each stream consists of a sequence of uncertain objects that arrive continuously at different time intervals, denoted as:

$DS1 = (X[1], X[2], \ldots, X[t], \ldots)$
$DS2 = (Y[1], Y[2], \ldots, Y[t], \ldots)$

where X[i] (Y[i]) is the k-dimensional uncertain object at time interval i, and t is the current time interval. According to the group nearest neighbor, close pairs of objects should be retrieved within a period. Thus a compartment window concept is adopted for the uncertain stream group operators. From Figure-1, it is clear that the USG operator always considers the most recent cw uncertain objects of each stream, that is, the compartment windows CW(DS1) and CW(DS2) of equations (3) and (4).

Fig.1: Grouping Uncertain Data Streams

Given two data streams DS1 and DS2, a distance threshold ε, and a probabilistic threshold $\alpha \in [0, 1]$, a group on uncertain data streams (USG) continuously monitors pairs of uncertain objects X[i] and Y[j] within the compartment windows CW(DS1) and CW(DS2), respectively, of size cw at the current time interval t, such that:

$\Pr\{\mathrm{dist}(X[i], Y[j]) \le \varepsilon\} \ge \alpha$ (5)

To perform a USG query (equation 5), users need to register two parameters: the distance threshold ε and the probabilistic threshold α. Since each uncertain object at a given time consists of l samples, the grouping probability $\Pr\{\mathrm{dist}(X[i], Y[j]) \le \varepsilon\}$ in inequality (5) can be rewritten as:

$\Pr\{\mathrm{dist}(X[i], Y[j]) \le \varepsilon\} = \sum_{k_1=1}^{l} \sum_{k_2=1}^{l} x_{k_1}[i].p \cdot y_{k_2}[j].p \cdot \chi\big(\mathrm{dist}(x_{k_1}[i], y_{k_2}[j]) \le \varepsilon\big)$ (6)

where $\chi(z)$ is 1 if z is true and 0 otherwise.

One straightforward method to perform USG over compartment windows is to follow the USG definition directly. That is, for every object pair X[i], Y[j] from the compartment windows CW(DS1) and CW(DS2), respectively, we compute the grouping probability that X[i] is within distance ε of Y[j] (via samples) based on equation (6). If the resulting probability is greater than or equal to the probabilistic threshold α, then the pair X[i], Y[j] is reported as a USG answer; otherwise, it is a false alarm and can be safely eliminated.

Table-1: Symbols and Descriptions

Symbol                  Description
DS1 (DS2)               Uncertain data streams
CW(DS1) (CW(DS2))       Compartment window in DS1 (DS2) with the most recent cw data
X[i] (Y[j])             Uncertain object at the timestamp i (j) in DS1 (DS2)
x_k[i] (y_k[j])         Sample of object X[i] (Y[j]) (1 ≤ k ≤ l)
x_k[i].p (y_k[j].p)     Appearance probability of sample x_k[i] (y_k[j])
HS(X[i])                Hypersphere bounding all samples of X[i], centered at C_X[i] and with radius r_X[i]
ε                       Distance threshold of USG processing
α                       Probabilistic threshold of USG processing

Table 1 lists the symbols commonly used in the formulations and notations of this paper. The USG framework can be implemented in any programming language and then verified; it is therefore given in pseudocode form in Figure-2.

Pseudo Code: USG_Framework ()
{
Input: Two uncertain data streams DS1 and DS2, separated from a group of uncertain data streams DS; a distance threshold ε; and a probabilistic threshold α.
Output: The USG results between CW(DS1) and CW(DS2), obtained by grouping on similarity.
Store DS1 and DS2 in an array or in any other flexible data structure
For every time interval (t+1)
    Obtain the uncertain objects X[t+1] and Y[t+1] from the uncertain data streams DS1 and DS2, respectively
    Add the new object X[t+1] (Y[t+1]) to CW(DS1) (CW(DS2)) and remove the expired object X[t-cw+1] (Y[t-cw+1]) from CW(DS1) (CW(DS2))
    Invoke the procedure getdata_pair() to find the data objects Y[j] (X[i]) in CW(DS2) (CW(DS1)) such that inequality (5) holds for the pair (X[t+1], Y[j]) ((X[i], Y[t+1]))
    Insert the data pairs (X[t+1], Y[j]) ((X[i], Y[t+1])) into the result set RS and remove the expired pairs from RS
    Report the actual USG answers in RS and set t = t+1
}

Fig 2: Pseudo code for USG_Framework

B. Object Level Pruning

Next, the data streams are converted into data objects by random sampling; each object in a compartment window consists of l random samples.

Fig.3: Object-level pruning method

Clearly, if all the pairwise distances between the samples of two uncertain objects X[t+1] and Y[j] are above the threshold ε, then the distance between these two uncertain objects is definitely above ε as well, and in turn $\Pr\{\mathrm{dist}(X[t+1], Y[j]) \le \varepsilon\} = 0$. Figure-3 shows the uncertain stream objects represented as hyperspheres.

C. Modified Pruning Method

This section presents the rationale behind the pruning method used in USG processing. Many pruning methods are available, such as sample-level pruning, data-level pruning, index pruning, and the CarmelTopKTerm pruning policy. In this paper, the modified object-level pruning is tested for better performance. Given an uncertain object X[t+1] from the compartment window CW(DS1) and the uncertain objects Y[j] (t-cw+2 ≤ j ≤ t+1) from the compartment window CW(DS2), the object pruning method discards those data pairs whose grouping probability is zero:

$\Pr\{\mathrm{dist}(X[t+1], Y[j]) \le \varepsilon\} = 0$ (7)

In other words, we want to prune those pairs in which the objects X[t+1] and Y[j] always have a distance to each other greater than the distance threshold ε. Figure-3 illustrates with an example how this reduces the cost.
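The sample-based grouping probability used in inequality (5) and in the zero-probability condition (7) can be sketched as follows. This is only an illustration under stated assumptions: each uncertain object is represented as a list of (point, appearance probability) tuples, the Euclidean distance (p = 2) is the default, and the function names `lp_dist`, `grouping_probability`, and `is_usg_answer` are hypothetical.

```python
import math


def lp_dist(a, b, p=2.0):
    """Lp distance between two points given as equal-length tuples."""
    if math.isinf(p):
        return max(abs(u - v) for u, v in zip(a, b))
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def grouping_probability(x_samples, y_samples, eps, p=2.0):
    """Sum the joint appearance probability of all sample pairs within eps."""
    total = 0.0
    for x_point, x_prob in x_samples:
        for y_point, y_prob in y_samples:
            if lp_dist(x_point, y_point, p) <= eps:
                total += x_prob * y_prob
    return total


def is_usg_answer(x_samples, y_samples, eps, alpha, p=2.0):
    """A pair is an answer when its grouping probability reaches alpha."""
    return grouping_probability(x_samples, y_samples, eps, p) >= alpha


# Hypothetical objects with two equally likely samples each.
X = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
Y = [((0.0, 1.0), 0.5), ((3.0, 0.0), 0.5)]
Z = [((10.0, 10.0), 1.0)]  # every sample is farther than eps from X

prob_xy = grouping_probability(X, Y, eps=1.5)
prob_xz = grouping_probability(X, Z, eps=1.5)
```

Here two of the four sample pairs of X and Y lie within ε = 1.5, so the pair (X, Y) is an answer for α = 0.5 but a false alarm for α = 0.6, while the pair (X, Z) has grouping probability zero and is exactly the kind of pair that object-level pruning aims to discard without enumerating samples.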
For a new uncertain object, say X[t+1], we bound all l samples of X[t+1] with a hypersphere HS(X[t+1]) that is centered at the centroid $C_{X[t+1]}$ of X[t+1] and has radius $r_{X[t+1]}$, so that every sample $x_k[t+1]$ satisfies $\mathrm{dist}(x_k[t+1], C_{X[t+1]}) \le r_{X[t+1]}$. The case of an uncertain object Y[j] is similar to that of X[t+1]; that is, we use a hypersphere HS(Y[j]) to bound all samples of Y[j]. The object-level pruning method is described in the following lemma.

Lemma 1: [Object level pruning]. Given a pair of uncertain objects X[t+1] and Y[j], and a distance threshold ε, the candidate pair (X[t+1], Y[j]) can be safely pruned if it holds that

$\mathrm{dist}(C_{X[t+1]}, C_{Y[j]}) - r_{X[t+1]} - r_{Y[j]} > \varepsilon$ (8)

Proof: From Figure-3, it can be seen that the left-hand side of inequality (8) corresponds to the minimum possible sample distance between the objects X[t+1] and Y[j]. If this minimum distance is greater than the distance threshold ε, then inequality (5) in the USG definition can never hold because α > 0, and thus the object pair is discarded.

To improve the efficiency of object-level pruning, a grid is constructed over the centers $C_{Y[j]}$ of the uncertain objects Y[j] in the compartment window CW(DS2), and the object-level pruning method (Lemma 1) is applied to this grid index with one more inequality constraint:

$\mathrm{dist}(C_{X[t+1]}, C_{Y[j]}) \le \varepsilon + r_{X[t+1]} + r_{max}(DS2)$ for all Y[j] in CW(DS2) (9)

where $r_{X[t+1]}$ is the radius of object X[t+1], and $r_{max}(DS2)$ is the maximum radius among all objects in the compartment window CW(DS2). Since $r_{Y[j]} \le r_{max}(DS2)$, every pair that survives inequality (8) also satisfies inequality (9). Then, instead of an exhaustive computation, only the uncertain objects in grid cells satisfying inequality (9) need to be accessed. After that, we obtain a number of data pairs that cannot be pruned on the object level and that satisfy inequality (9); this will be explained and evaluated in future work.
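The test of Lemma 1 can be illustrated with a short Python sketch. It assumes that each uncertain object is given as a list of sample points, that the hypersphere is centered at the arithmetic mean of the samples, and that the Euclidean distance is used; the function names are hypothetical and the code is not taken from the paper.

```python
import math


def bounding_hypersphere(samples):
    """Return (centroid, radius) of a centroid-centered ball that holds all samples."""
    dims = len(samples[0])
    centroid = tuple(sum(s[d] for s in samples) / len(samples) for d in range(dims))
    radius = max(math.dist(centroid, s) for s in samples)
    return centroid, radius


def can_prune(hs_x, hs_y, eps):
    """Lemma 1: prune when the minimum possible sample distance exceeds eps."""
    (cx, rx), (cy, ry) = hs_x, hs_y
    return math.dist(cx, cy) - rx - ry > eps


# Hypothetical objects, two samples each.
hs_x = bounding_hypersphere([(0.0, 0.0), (2.0, 0.0)])    # centroid (1, 0), radius 1
hs_y = bounding_hypersphere([(10.0, 0.0), (12.0, 0.0)])  # centroid (11, 0), radius 1
```

With these values the smallest possible distance between the two hyperspheres is 10 - 1 - 1 = 8, so the pair is pruned for ε = 5 but has to be kept for verification when ε = 8 or larger.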
D. Query processing

One important step is to invoke the procedure getdata_pair() to retrieve data pairs from the data streams. For developing the USG framework in any programming language, the pseudo code of getdata_pair() is presented in Figure-4. Specifically, for a new uncertain object X[t+1] from the uncertain stream DS1, the procedure getdata_pair() retrieves the candidate pairs that satisfy inequality (5).

Procedure getdata_pair( )
{
Input: data streams DS1, DS2, ..., DSn, processed as uncertain objects X[t+1] and Y[j] in CW(DS2); a distance threshold ε; and a probabilistic threshold α.
Output: data pairs satisfying inequality (5)
Convert the data into objects // by applying the random sampling method
Decide the compartment window size CW
Find the centroid and the radius of each compartment
Find the average similarity distance between X and Y
Sort the data in the particular compartment
Get the probability of distance-similar data
Retrieve the uncertain objects Y[j] in grid cells satisfying inequality (9) // using object level pruning, Lemma 1
For each remaining data pair, check inequality (5) by computing the group probability via samples
Return all the data pairs that pass the check.
}

Fig 4: Pseudo Code for getdata_pair ( )

The input data are preprocessed and normalized, the centroid is created, and the query is applied using the radius of the hypersphere. This sequence of steps can be implemented in any programming language or platform, such as .NET or Java, to verify the efficiency of the proposed approach. In this paper the pseudo code given in Figure-4 is implemented in MATLAB, and the results are explained in detail in the Results and Discussion section.

V. RESULTS AND DISCUSSION

The USG framework and the modified pruning method are simulated using MATLAB 2012a, with input data taken from benchmark US government share market data.

Fig.5a: Original uncertain data from Benchmark Database of DS1
Fig.5b: Original uncertain data from Benchmark Database of DS2

The data consist of five fields in Excel format: the first field gives the region, the second the date and time, the third the total demand, the fourth the RRP, and the fifth the period type (Trade or Non-Trade). For this research only the numerical data are taken from the Excel data and treated as data stream1 and data stream2. The original data of data stream1 and data stream2 are shown in Figure-5a and Figure-5b. The two original data sets differ considerably. A limited portion of the data (1000 columns) is taken and represented as 1x1000 for both DS1 and DS2.

Fig.6a: The Compartment Window for DS1
Fig.6b: The Compartment Window for DS2

From the 1x1000 data, the compartment window size is set to 100. CW(DS1) and CW(DS2) are the compartment window data for data stream1 and data stream2, shown in Figure-6a and Figure-6b; the windows are used to reduce the complexity and to speed up the preprocessing of the data. The compartment window concept is used for sampling, range estimation, distance distribution, and reference point estimation, so that the stream joining process can be applied on the compartment window, and the USG takes the most recent cw uncertain objects as CW(DS1) and CW(DS2). Once the CW-sized data are taken from DS1 and DS2, the data are converted into hyperspace data and sorted, and the centroid value and the radius are found for the CW. The data are converted into hyperspace by random sampling with a compartment size of 100.

Fig.7a: The Hypersphere data for compartment window of DS1
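The preprocessing steps described above (normalize a 1x1000 numeric stream, cut it into compartments of size 100, and find a centroid and radius per compartment) can be sketched as follows. Several details are assumptions, because the paper does not specify them: min-max scaling is used as the normalization, the compartments are taken as consecutive non-overlapping slices, and a synthetic ramp stands in for the benchmark data, which are not reproduced here.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def compartments(values, cw):
    """Cut the stream into consecutive windows of size cw (a partial tail is dropped)."""
    return [values[i:i + cw] for i in range(0, len(values) - cw + 1, cw)]


def summarize(window):
    """One-dimensional centroid and radius of a compartment."""
    centroid = sum(window) / len(window)
    radius = max(abs(v - centroid) for v in window)
    return centroid, radius


# Synthetic stand-in for one 1x1000 numeric column of the benchmark data.
stream = [float(v) for v in range(1000)]
windows = compartments(min_max_normalize(stream), cw=100)
summaries = [summarize(w) for w in windows]
```

For the synthetic ramp this yields ten compartments of 100 values each, every one with the same radius and with centroids that increase from window to window; on real data the summaries would differ per compartment and feed the hypersphere comparison.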
Figure-7a and Figure-7b show the randomly sampled hypersphere data HS(DS1) and HS(DS2) for CW(DS1) and CW(DS2). HS(Y[j]) is then compared with HS(X[t+1]) using the inequality conditions of equations (7) and (8) to produce the matching pairs.

Fig.7b: The Hypersphere data for compartment window of DS2

Figure-8a and Figure-8b give the data objects from HS(CW(DS1)) and HS(CW(DS2)) that pass the checks of inequalities (5) and (8). For the data objects that satisfy the inequality condition, we retrieve the original data pairs from DS1 and DS2 at time interval t; the pair is denoted as (X[i], Y[j]). The similar data retrieved by the Modified Pruning method for the uncertain data streams processed in the USG framework represent the USG answer, which is shown in Figure-9.

Fig.8a: Data satisfying the inequality constraints from DS1
Fig.8b: Data satisfying the inequality constraints from DS2
Fig.9: Query Results for getdata_pair()
Fig.10: Performance Evaluation of the proposed approach.

The total data size is 1401 rows, representing 12 months of trading information, where each row consists of 5 columns. The USG experiment uses 2 of these columns, the time and the trading amount. Out of the 1401 rows, 1200 are found to be similar. This experiment can be extended to the entire data set; the performance of the proposed approach is given in Figure-10 and in Table-1.

Year    Total DS size    Similarity Found
2005    1800             1500
2006    2500             2234
2007    2200             2087
2008    1455             1376
2009    1401             1200
2010    1401             1108
2011    1401             1203
2012    1401             1200

Table-1: Additional data streams used to compare the performance of the proposed approach.
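As a reading aid for the table above, the fraction of each year's data stream that was reported as similar can be computed directly from the listed values. The ratio is a derived quantity introduced here for illustration; the paper itself reports only the raw counts.

```python
# (total data stream size, similar objects found) per year, copied from the table.
results = {
    2005: (1800, 1500),
    2006: (2500, 2234),
    2007: (2200, 2087),
    2008: (1455, 1376),
    2009: (1401, 1200),
    2010: (1401, 1108),
    2011: (1401, 1203),
    2012: (1401, 1200),
}

# Share of each stream that the query process reported as similar.
ratios = {year: found / size for year, (size, found) in results.items()}

highest_year = max(ratios, key=ratios.get)
lowest_year = min(ratios, key=ratios.get)

for year in sorted(ratios):
    print(f"{year}: {ratios[year]:.1%}")
```

By this measure the share of similar objects ranges from about 79% (2010) to about 95% (2007), with 2005 at roughly 83%.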
The table above lists, for each year, the size of the original data stream and the number of similar data objects returned by the query process of the proposed approach. Figure-10 represents the same data graphically: the first column gives the year 2005, for which the data size is 1800 and the similarity search with the USG framework and getdata_pair() returns 1500; the remaining columns represent the other years with their data sizes and results in the same way.

VI. CONCLUSION

A simulation-based study of grouping uncertain data streams is presented, in which similar sets of uncertain objects are retrieved with high confidence from multiple input data streams. Because the data streams share the same features and level of uncertainty, we proposed the USG framework with object-level pruning, which preprocesses and normalizes the data in the search space and makes searching easier. Through experiments we demonstrate the efficiency and effectiveness of the proposed USG processing techniques under different parameter settings. In future work the cost and the time interval should be analyzed, and the USG framework will be improved into a cost-effective USG for similarity GNN over uncertain data.

REFERENCES

[1] A. Faradjian, J. Gehrke, and P. Bonnet, “Gadt: A Probability Space ADT for Representing and Querying the Physical World,” Proc. 18th Int’l Conf. Data Eng. (ICDE), 2002.
[2] M. Li and Y. Liu, “Underground Coal Mine Monitoring with Wireless Sensor Networks,” ACM Trans. Sensor Networks, vol. 5, pp. 1-29, 2009.
[3] Z. Yang and Y. Liu, “Quality of Trilateration: Confidence-Based Iterative Localization,” IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 5, pp. 631-640, May 2010.
[4] M.F. Mokbel, C.-Y. Chow, and W.G. Aref, “The New Casper: Query Processing for Location Services without Compromising Privacy,” Proc. 32nd Int’l Conf. Very Large Data Bases (VLDB), 2006.
[5] S.R. Jeffery, M.J. Franklin, and M. Garofalakis, “An Adaptive RFID Middleware for Supporting Metaphysical Data Independence,” The VLDB J., vol. 17, no. 2, pp. 265-289, 2008.
[6] C. Böhm, A. Pryakhin, and M. Schubert, “The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors,” Proc. 22nd Int’l Conf. Data Eng. (ICDE), 2006.
[7] R. Cheng, D. Kalashnikov, and S. Prabhakar, “Querying Imprecise Data in Moving Object Environments,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 9, pp. 1112-1127, Sept. 2004.
[8] L. Chen, M.T. Özsu, and V. Oria, “Robust and Fast Similarity Search for Moving Object Trajectories,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2005.
[9] B. Gedik and L. Liu, “Location Privacy in Mobile Systems: A Personalized Anonymization Model,” Proc. 25th Int’l Conf. Distributed Computing Systems, 2005.
[10] G. Jeh and J. Widom, “Scaling Personalized Web Search,” Proc. WWW’03, pp. 271-279, 2003.
[11] G. Jeh and J. Widom, “SimRank: A Measure of Structural-Context Similarity,” Proc. KDD’02, pp. 538-543, 2002.
[12] X. Xu, N. Yuruk, Z. Feng, and T.A.J. Schweiger, “SCAN: A Structural Clustering Algorithm for Networks,” Proc. KDD’07, pp. 824-833, 2007.
[13] M.R. Vieira, C. Traina Jr., F.J.T. Chino, and A.J.M. Traina, “DBM-Tree: Trading Height-Balancing for Performance in Metric Access Methods,” CEP 13560-970, Sao Carlos, SP, Brazil.
[14] A. Gionis, P. Indyk, and R. Motwani, “Similarity Search in High Dimensions via Hashing,” Proc. 25th Int’l Conf. Very Large Data Bases (VLDB’99), pp. 518-529, Edinburgh, Scotland, UK, Sept. 1999.
[15] R. Weber and K. Böhm, “Trading Quality for Time with Nearest-Neighbor Search,” Proc. 7th Int’l Conf. Extending Database Technology (EDBT 2000), pp. 21-35, Konstanz, Germany, Mar. 2000.
[16] J. Goldstein and R. Ramakrishnan, “Contrast Plots and P-Sphere Trees: Space vs. Time in Nearest Neighbor Searches,” Proc. 26th Int’l Conf. Very Large Data Bases (VLDB 2000), pp. 429-440, Cairo, Egypt, Sept. 2000.
[17] D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S.B. Zdonik, “Monitoring Streams - A New Class of Data Management Applications,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB), 2002.
[18] T. Brinkhoff, H-P. Kriegel, and B. Seeger, “Efficient Processing of Spatial Joins Using R-Trees,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 1993.
[19] Y.-W. Huang, N. Jing, and E.A. Rundensteiner, “Spatial Joins Using R-Trees: Breadth-First Traversal with Global Optimizations,” Proc. 23rd Int’l Conf. Very Large Data Bases (VLDB), 1997.
[20] M.L. Lo and C.V. Ravishankar, “Spatial Hash-Joins,” ACM SIGMOD Record, vol. 25, pp. 247-258, 1996.
[21] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 1984.
[22] G. Das, D. Gunopulos, N. Koudas, and N. Sarkas, “Ad-Hoc Top-k Query Answering for Data Streams,” Proc. 33rd Int’l Conf. Very Large Data Bases (VLDB), 2007.
[23] Y. Tao and D. Papadias, “Maintaining Sliding Window Skylines on Data Streams,” IEEE Trans. Knowledge and Data Eng., vol. 18, no. 3, pp. 377-391, Mar. 2006.
[24] A.C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M.J. Strauss, “Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries,” Proc. 27th Int’l Conf. Very Large Data Bases (VLDB), 2001.
[25] T. Urhan and M.J. Franklin, “XJoin: A Reactively-Scheduled Pipelined Join Operator,” IEEE Data Eng. Bull., vol. 23, no. 2, pp. 27-33, June 2000.
[26] J. Kang, J.F. Naughton, and S.D. Viglas, “Evaluating Window Joins over Unbounded Streams,” Proc. 19th Int’l Conf. Data Eng. (ICDE), 2003.
[27] M.F. Mokbel, M. Lu, and W.G. Aref, “Hash-Merge Join: A Non-Blocking Join Algorithm for Producing Fast and Early Join Results,” Proc. 20th Int’l Conf. Data Eng. (ICDE), 2004.
[28] Y.F. Tao, M.L. Yiu, D. Papadias, M. Hadjieleftheriou, and N. Mamoulis, “RPJ: Producing Fast Join Results on Streams through Rate-Based Optimization,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2005.