K.Padmapriya * et al. / (IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH
Volume No.2, Issue No. 2, February – March 2014, 819 - 826.
K.PADMAPRIYA
Research Scholar
Department of Computer Science & Engineering
Sathyabama University, Chennai, India
Dr.S.SRIDHAR
Professor & Dean – CCCF,
R.V.College of Engineering,
Bangalore, Karnataka, India
Abstract - Group join processing over high-volume data streams has many practical applications, such as tracking, monitoring, and arranging. Research in this area usually assumes stream processing over precise data and benchmark data. In this paper, we study similarity join processing on data streams whose data inherently contain uncertainty and inaccuracy, because the input arrives from a variety of sources at each time interval. The main problem, grouping of uncertain data streams [USG], is addressed with a Modified Pruning [MP] method that guarantees the accuracy of the grouping of uncertain data. To meet the efficiency and effectiveness challenges of limited time, limited memory, and cost, the MP method [a combination of object-level and sample-level pruning] filters out false alarms. Combined with query procedures, the MP method answers the USG problem incrementally. Since the modeled real-time data, such as image databases, time series data, and sensor data, are uncertain and immense, the uncertain data are grouped by the group nearest neighbor method. Because the data are uncertain, a novel procedure, MPSRQ [Modified Probabilistic Subspace Range Query], is used to make the subspace query efficient and effective. MPSRQ finds, with high probability, the objects within a given distance of a query object in any subspace.
Keywords: Similarity Group; Subspace Query; Uncertain Data Streams, Group nearest Neighbor;
Pruning Method; Query Processing, Data Level Pruning; Object Level Pruning.
I. INTRODUCTION
Recently, uncertain data analysis has become increasingly important because data uncertainty is ubiquitous in many real-world applications, such as sensor data monitoring [1], [2], [3], location-based services (LBS) [4], RFID networks [5], object identification [6], and moving object search [7], [8]. For example, in sensor networks, sensory data contain considerable noise resulting from environmental factors, packet losses, and low battery power. Likewise, in LBS, the position of a mobile user is obtained with the Global Positioning System (GPS). However, GPS data are often imprecise for reasons such as clock errors, ephemeris errors, atmospheric delays, multipath effects, and satellite geometry. In addition, the trajectory of a mobile user is sometimes intentionally distorted by a trusted third party to preserve privacy [9], [4]. Therefore, uncertain and imprecise data are common in real applications, and queries on such data must be answered effectively and efficiently.
Similarity search has been extensively studied for traditional categorical and numerical data types in relational data. There are also a few studies that leverage link information in networks. Most of these studies focus on homogeneous or bipartite networks, such as personalized PageRank (P-PageRank) [10], SimRank [11], and SCAN [12]. However, these similarity measures disregard the subtle differences among types of objects and links. Adopting such measures in heterogeneous networks has a significant drawback: objects and links of different types carry different semantic meanings, and it does not make sense to combine them when measuring similarity without distinguishing their semantics.
ISSN 2320 –5547
The volume of data managed by Database Management Systems (DBMS) is increasing continuously. Moreover, new complex data types, such as multimedia data (image, audio, video, and long text), time series, fingerprints, geo-referenced information, genomic data, and protein sequences, among others, have been added to DBMS [13].
Formally, a metric space is a pair $\langle \mathbb{S}, d() \rangle$, where $\mathbb{S}$ is the data domain and $d()$ is a distance function that complies with the following three properties for all $s_1, s_2, s_3 \in \mathbb{S}$:
1. symmetry: $d(s_1, s_2) = d(s_2, s_1)$;
2. non-negativity: $0 < d(s_1, s_2) < \infty$ if $s_1 \neq s_2$, and $d(s_1, s_1) = 0$;
3. triangular inequality: $d(s_1, s_2) \le d(s_1, s_3) + d(s_3, s_2)$.
A metric dataset $S \subseteq \mathbb{S}$ is a set of objects $s_i \in \mathbb{S}$ currently stored in a database. Vector-based data with an $L_p$ distance function, such as the Euclidean distance ($L_2$), are special cases of metric spaces. The two basic similarity queries, the range query and the k-NN query, are defined as follows:

Range query - Rq: given a query center object $s_q \in \mathbb{S}$ and a maximum query distance $r_q$, the query Rq($s_q$, $r_q$) retrieves every object $s_i \in S$ such that $d(s_i, s_q) < r_q$. An example is: "Select the proteins that are similar to the protein P by up to 5 purine bases", which is represented as Rq(P,5);

k-Nearest Neighbor query - kNNq: given a query center object $s_q \in \mathbb{S}$ and an integer value $k > 1$, the query kNNq($s_q$, k) retrieves the k objects in S that have the minimum distance from the query object $s_q$, according to the distance function d(). An example is: "Select the 3 proteins most similar to the protein P", where k=3, which is represented as kNNq(P,3).
@ 2013 http://www.ijitr.com All rights Reserved.
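The two query types defined above can be illustrated with a minimal linear-scan sketch. It assumes vector data with the Euclidean distance as d(); the function names `range_query` and `knn_query` are illustrative and do not come from the paper, and a real system would use a metric index instead of a full scan.

```python
import heapq
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance: one distance function d() that satisfies the metric properties."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(data: List[Point], sq: Point, rq: float, d: Metric = euclidean) -> List[Point]:
    """Rq(sq, rq): every stored object whose distance to sq is below rq."""
    return [si for si in data if d(si, sq) < rq]


def knn_query(data: List[Point], sq: Point, k: int, d: Metric = euclidean) -> List[Point]:
    """kNNq(sq, k): the k stored objects with the smallest distance to sq."""
    return heapq.nsmallest(k, data, key=lambda si: d(si, sq))


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(range_query(data, (0.0, 0.0), 5.5))
    print(knn_query(data, (0.0, 0.0), 2))
```

Any other function that satisfies the three metric properties can be passed as `d` without changing the queries.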
In our model we make the following contributions:
1. We formalize the problem of grouping on uncertain data streams from subspaces using GNN.
2. We provide a general framework for performing USG and incrementally maintaining the USG answer set.
3. We propose the Modified Pruning method to prune false alarms among USG candidate pairs [data pairs] and integrate it into an efficient USG procedure.
II. RELATED WORK
The following is a synopsis of selected prior work.
In [14] the authors propose the use of Locality-Sensitive Hashing (LSH) to transform a D-dimensional vector x into a sequence of C bits (binary vector) v(x). Since the L1 distance between vectors can be approximated by the Hamming (edit) distance between the corresponding binary vectors, they propose a hashing technique that indexes only the binary vectors v(x). Both the accuracy and the efficiency of this technique depend heavily on the number of bits C used to approximate the vectors. The technique VSL1 applies the Hamming distance to the metric L1.
In [15] approximate nearest neighbor search techniques based on the VA-file [WSB98] are presented. The VA-file is, in essence, a sequential structure containing approximations of vectors that use a fixed number b of bits. Exact k-NN search first performs a sequential scan of the structure, evaluating the query distance on the vector approximations, which yields a number M > k of candidate vectors; it then applies a refinement step, in which the distance is evaluated on the real vectors and only the k “best” vectors are retained. The proposed techniques either reduce the number of approximations considered by shrinking the query radius (VA-BND) or avoid the refinement step by returning only the “best” k candidate vectors according to their approximations (VA-LOW).
In [16] the authors propose the P-Sphere tree, a 2-level index structure for 1-NN search. To find the nearest neighbor of a query point, the leaf node closest to the query point is accessed, using the distance function to the query point. The distance is calculated on the classified object, not on a coordinate system, and no further assumption is made. The distance sometimes represents the similarity between the attributes.
Query processing on data streams has many essential applications, such as real-time processing of data gathered from sensor networks [17]. Join processing in static databases has been extensively researched, for example the spatial join [18], [19], [20]. Typically, two spatial indexes (e.g., R-trees [21]) are constructed offline for two static data sets, and the spatial join is performed by traversing the two indexes in parallel. In a data stream environment, by contrast, static indexes cannot be built offline, and the cost of building, updating, and querying indexes online may not keep up with the speed of continuously incoming data (with possibly high input rates). Moreover, prior work on data stream processing usually assumes that the underlying data are precise data points, whereas our USG problem deals with uncertain objects. Many methods have been designed for efficiently processing different query types on data streams, including the top-k query [22], skyline query [23], aggregate query [24], join [25], [26], [27], [28], and so on. To list a few, Das et al. [22] studied top-k queries on data streams by first transforming data points to a dual space and then obtaining fast top-k answers in this dual space. Tao and Papadias [23] designed lazy and eager strategies to incrementally maintain skylines over a sliding window on data streams. Existing work on joins over data streams includes XJoin [25], hash-merge join [27], and rate-based progressive join [28]. These works primarily focus on the equality join over precise data points; they allow disk accesses and can move unprocessed data onto disk for later join operations. Thus, different policies for selecting the memory partitions to flush were proposed in order to minimize the total cost of the join operator on data streams. In contrast, our USG problem considers the similarity join with a range predicate (rather than an equality join) on streams whose data are uncertain and imprecise.
III. EXISTING APPROACH
We first briefly define the problem of subspace similarity search (SSS) over precise data, on which the uncertain-data problem is based. Given a database D that contains N precise data objects in an n-dimensional full space DM and a query q in a k-dimensional subspace DM’ of DM [i.e., $DM' \subseteq DM$, $k \in [k_{min}, k_{max}]$, and $k \ll n$], a subspace similarity query retrieves all the objects $obj \in D$ such that $dist(q, obj') \le \varepsilon$, where obj’ is the k-dimensional point obtained by projecting object obj onto subspace DM’ and dist(., .) is an $L_p$-norm distance function ($p \in [1, \infty]$). In particular, given any two k-dimensional data points x and y in the subspace DM’, the function dist(x, y) is defined as
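$$dist(x, y) = \left( \sum_{i \in DM'} |x_i - y_i|^p \right)^{1/p} \quad (1)$$
where $1 \le p \le \infty$. When $p = \infty$, the distance function dist(x, y) is given by
$$dist(x, y) = \max_{i \in DM'} |x_i - y_i| \quad (2)$$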
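As a small illustration of equations (1) and (2), the sketch below evaluates the $L_p$ distance restricted to the dimensions of a subspace. The function name `subspace_dist` and the representation of the subspace DM’ as a list of dimension indices are assumptions made for this example.

```python
import math
from typing import Sequence


def subspace_dist(x: Sequence[float], y: Sequence[float],
                  dims: Sequence[int], p: float = 2.0) -> float:
    """L_p distance between x and y, using only the dimensions listed in dims."""
    diffs = [abs(x[i] - y[i]) for i in dims]
    if math.isinf(p):
        # Equation (2): the maximum coordinate difference.
        return max(diffs)
    # Equation (1): p-th root of the sum of p-th powers.
    return sum(d ** p for d in diffs) ** (1.0 / p)


if __name__ == "__main__":
    x = (1.0, 2.0, 3.0, 4.0)
    y = (4.0, 2.0, 7.0, 0.0)
    # Project onto the 2-dimensional subspace spanned by dimensions 0 and 2.
    print(subspace_dist(x, y, dims=[0, 2], p=2))
    print(subspace_dist(x, y, dims=[0, 2], p=math.inf))
```

A subspace similarity query then keeps the objects whose subspace distance to the query does not exceed the threshold.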
In the existing approach, the output data are retrieved from the database using equations (1) and (2); that is, the similarity search problem is formalized over precise data in an arbitrary subspace. One of the key issues in similarity search is the development of efficient retrieval methods. To facilitate fast similarity search, previous work usually constructs multidimensional indexes, such as the R-tree [21], over the data set, on which either a range query or a nearest neighbor query is issued. Fundamentally, we compute the distance between data in a subspace, and it must satisfy given constraints in arbitrary subspaces.
IV. PROBLEM DEFINITION
The proposed Modified Pruning approach combines the USG framework with a procedure for retrieving similar objects. The framework considers data streams through a compartment window at a particular time interval. It models the objects of the data stream as hyperspheres and obtains the similarity distance from the samples that make up each data object. Each time an uncertain data object satisfies the inequalities [equations 5, 6, 7, 8], the framework obtains the matching objects through an object retrieval procedure and removes the old objects from the space. To obtain the data we invoke a procedure getdata_pair() on the data objects in the hyperspace and remove the expired objects in order to reduce the complexity.
A. USG_Framework – [Data Level Pruning]
The main problem defined in this paper is grouping on uncertain data streams. A data pool contains n uncertain data streams; without loss of generality, we consider two of them in our experiment. Two uncertain data streams DS1 and DS2 are taken as inputs for the USG problem, where each data stream consists of a sequence of uncertain objects arriving continuously at different time intervals, as denoted below:
$$DS_1 = (X[1], X[2], \ldots, X[t], \ldots)$$
$$DS_2 = (Y[1], Y[2], \ldots, Y[t], \ldots)$$
where X[i] or Y[i] is the k-dimensional uncertain object at time interval i, and t is the current time interval.
According to the group nearest neighbor method, the operator should retrieve close pairs of objects within a recent period. Thus a compartment window concept is adopted for uncertain stream group operators. As Figure-1 shows, the USG operator always considers the most recent cw uncertain objects of each stream, that is,
$$CW(DS_1) = \{X[t-cw+1], \ldots, X[t]\} \quad (3)$$
$$CW(DS_2) = \{Y[t-cw+1], \ldots, Y[t]\} \quad (4)$$
at the current time interval t. When a new uncertain object X[t+1] (Y[t+1]) arrives at the next time interval (t+1), it is appended to DS1 (DS2). At the same time, the old object X[t-cw+1] (Y[t-cw+1]) expires and is evicted from memory. Thus, USG at time interval (t+1) is conducted on a new compartment window {X[t-cw+2], ..., X[t+1]} ({Y[t-cw+2], ..., Y[t+1]}) of size cw.
Fig.1: Grouping Uncertain Data Streams
Given two data streams DS1 and DS2, a distance threshold ε, and a probabilistic threshold α ∈ [0, 1], a group on uncertain data streams continuously monitors pairs of uncertain objects X[i] and Y[j] within the compartment windows CW(DS1) and CW(DS2), respectively, of size cw at the current time interval t, such that
$$\Pr\{dist(X[i], Y[j]) \le \varepsilon\} \ge \alpha \quad (5)$$
To perform a USG query according to inequality (5), users need to register two parameters: the distance threshold ε and the probabilistic threshold α. Since each uncertain object at a given time consists of l samples, the grouping probability $\Pr\{dist(X[i], Y[j]) \le \varepsilon\}$ in inequality (5) can be rewritten as:
$$\Pr\{dist(X[i], Y[j]) \le \varepsilon\} = \sum_{k_1=1}^{l} \sum_{k_2=1}^{l} x_{k_1}[i].p \cdot y_{k_2}[j].p \cdot \chi\big(dist(x_{k_1}[i], y_{k_2}[j]) \le \varepsilon\big) \quad (6)$$
where χ(·) equals 1 if its argument holds and 0 otherwise.
One straightforward method to perform USG over compartment windows is to follow the USG definition directly. That is, for every object pair (X[i], Y[j]) from compartment windows CW(DS1) and CW(DS2), respectively, we compute the grouping probability that X[i] is within distance ε from Y[j] (via samples) based on equation (6). If the resulting probability is greater than or equal to the probabilistic threshold α, then the pair (X[i], Y[j]) is reported as a USG answer; otherwise, it is a false alarm and can be safely eliminated.
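The straightforward method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: each uncertain object is represented as a list of (sample point, appearance probability) tuples, the distance is Euclidean, and the names `grouping_probability` and `brute_force_usg` are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]   # (sample point, appearance probability)
UncertainObject = List[Sample]


def grouping_probability(X: UncertainObject, Y: UncertainObject, eps: float) -> float:
    """Sum the joint probability of every sample pair that lies within eps."""
    return sum(px * py for x, px in X for y, py in Y if math.dist(x, y) <= eps)


def brute_force_usg(ds1: Iterable[UncertainObject], ds2: Iterable[UncertainObject],
                    cw: int, eps: float, alpha: float
                    ) -> Iterator[Tuple[int, List[Tuple[int, int]]]]:
    """Yield (t, answers); answers are timestamp pairs (i, j) whose objects
    in the two compartment windows reach grouping probability >= alpha."""
    # deque(maxlen=cw) evicts the expired (oldest) object automatically.
    win1: Deque[Tuple[int, UncertainObject]] = deque(maxlen=cw)
    win2: Deque[Tuple[int, UncertainObject]] = deque(maxlen=cw)
    for t, (x_new, y_new) in enumerate(zip(ds1, ds2)):
        win1.append((t, x_new))
        win2.append((t, y_new))
        answers = [(i, j) for i, X in win1 for j, Y in win2
                   if grouping_probability(X, Y, eps) >= alpha]
        yield t, answers
```

Every window pair is re-evaluated at each time step, which costs on the order of cw² · l² distance computations per step; this is the cost that pruning is meant to reduce.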
Table-1: Symbols and Descriptions
Symbol | Description
DS1 (DS2) | Uncertain data streams
CW(DS1) (CW(DS2)) | Compartment window in DS1 (DS2) with the most recent cw data
X[i] (Y[j]) | Uncertain object at the timestamp i (j) in DS1 (DS2)
x_k[i] (y_k[j]) | Sample of object X[i] (Y[j]) (1 ≤ k ≤ l)
x_k[i].p (y_k[j].p) | Appearance probability of sample x_k[i] (y_k[j])
HS(X[i]) | Hypersphere bounding all samples of X[i], centered at C_X[i] and with radius r_X[i]
ε | Distance threshold of USG processing
α | Probabilistic threshold of USG processing
Table 1 lists the commonly used symbols, to help in understanding the formulations and notation presented in this paper.
The USG framework can be implemented in any programming language and verified. It is given in pseudo code form in Figure-2.
Pseudo Code: USG_Framework () {
Input: Two uncertain data streams DS1 and DS2, selected from a pool of uncertain data streams DS, a distance threshold ε, and a probabilistic threshold α.
Output: The USG results between CW(DS1) and CW(DS2), obtained by grouping on similarity.
• Store DS1 and DS2 in an array or in any other flexible data structure
• For every time interval (t+1)
  • Obtain uncertain objects X[t+1] and Y[t+1] from uncertain data streams DS1 and DS2, respectively
  • Add the new object X[t+1] (Y[t+1]) to CW(DS1) (CW(DS2)) and remove the expired object X[t-cw+1] (Y[t-cw+1]) from CW(DS1) (CW(DS2))
  • Invoke the procedure getdata_pair( ) to find the data objects Y[j] (X[i]) in CW(DS2) (CW(DS1)) such that inequality (5) holds for the pair (X[t+1], Y[j]) ((X[i], Y[t+1]))
  • Insert the data pair (X[t+1], Y[j]) ((X[i], Y[t+1])) into the result set RS and remove the expired pairs from RS
  • Report the actual USG answers in RS and set t = t+1.
}
Fig 2: Pseudo code for USG_Framework
B. Object Level Pruning
Further, the data streams are converted into data objects by random sampling, so that each object in a compartment window consists of l random samples.
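The text does not spell out how the random sampling is carried out, so the following is only one possible reading, shown as a hedged sketch: l points are drawn uniformly with replacement from the raw observations that belong to one object, and every sample receives the same appearance probability 1/l. The function name `make_uncertain_object` and both of these choices are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]   # (sample point, appearance probability)


def make_uncertain_object(observations: Sequence[Sequence[float]], l: int,
                          rng: random.Random) -> List[Sample]:
    """Build an uncertain object of l samples from raw observations.

    Assumption: uniform sampling with replacement and equal appearance
    probabilities; the paper does not specify either choice.
    """
    if l <= 0 or not observations:
        raise ValueError("need at least one observation and l >= 1")
    points = rng.choices(list(observations), k=l)
    return [(tuple(p), 1.0 / l) for p in points]


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the sketch is repeatable
    obj = make_uncertain_object([(1.0, 2.0), (1.5, 2.5), (0.5, 1.5)], l=4, rng=rng)
    print(obj)
```

The appearance probabilities of one object sum to 1, which is what the grouping probability over sample pairs requires.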
Fig.3: Object-level pruning method
Clearly, if all the pairwise distances between samples from two uncertain objects X[t+1] and Y[j] are above the threshold ε, then the distance between these two uncertain objects is definitely above ε, and in turn $\Pr\{dist(X[t+1], Y[j]) \le \varepsilon\} = 0$.
The conversion of the uncertain data streams into hypersphere data objects is depicted in Figure-3 above.
C. Modified Pruning Method
This section presents the rationale behind the pruning method in USG processing. Many pruning methods are available, such as sample-level pruning, data-level pruning, index pruning, and the CarmelTopKTermPruningPolicy. In this paper, the modified object-level pruning is tested for better performance. Given an uncertain object X[t+1] from the compartment window CW(DS1) and a number of uncertain objects Y[j] (t-cw+2 ≤ j ≤ t+1) from the compartment window CW(DS2), the object-level pruning method discards the data pairs whose grouping probability satisfies:
$$\Pr\{dist(X[t+1], Y[j]) \le \varepsilon\} = 0 \quad (7)$$
In other words, we want to prune those pairs (X[t+1], Y[j]) such that objects X[t+1] and Y[j] always have a distance to each other greater than the distance threshold ε. Figure-3 illustrates with an example how this reduces the cost. When a new uncertain object X[t+1] arrives, we bound all l samples of X[t+1] with a hypersphere HS(X[t+1]) that is centered at the centroid $C_{X[t+1]}$ of X[t+1] and has radius $r_{X[t+1]}$, so that every sample $x_k[t+1]$ satisfies the following inequality:
$$dist(x_k[t+1], C_{X[t+1]}) \le r_{X[t+1]}$$
The case of uncertain object Y[j] is similar to that of X[t+1]; that is, we use a hypersphere HS(Y[j]) to bound all samples of Y[j]. The object-level pruning method is described in the following lemma.
Lemma 1: [Object level pruning]. Given a pair of uncertain objects X[t+1] and Y[j], and a distance threshold ε, the candidate pair (X[t+1], Y[j]) can be safely pruned if it holds that
$$dist(C_{X[t+1]}, C_{Y[j]}) - r_{X[t+1]} - r_{Y[j]} > \varepsilon \quad (8)$$
Proof: From Figure-3, the LHS of inequality (8) is a lower bound on the minimum possible sample distance between objects X[t+1] and Y[j]. If this lower bound is greater than the distance threshold ε, then no pair of samples is within ε, so the grouping probability is 0 and inequality (5) in the USG definition can never hold because α > 0. The object pair can therefore be discarded.
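The pruning test of Lemma 1 can be sketched in a few lines. The sketch assumes Euclidean distance and takes the hypersphere center to be the arithmetic mean of the samples with the radius set to the largest sample-to-center distance, which bounds all samples but is not necessarily the smallest enclosing sphere. The names `bounding_hypersphere` and `can_prune` are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Hypersphere = Tuple[Tuple[float, ...], float]   # (center, radius)


def bounding_hypersphere(samples: List[Sequence[float]]) -> Hypersphere:
    """Centroid of the samples plus the radius that encloses all of them."""
    dims = len(samples[0])
    center = tuple(sum(s[d] for s in samples) / len(samples) for d in range(dims))
    radius = max(math.dist(center, s) for s in samples)
    return center, radius


def can_prune(hs_x: Hypersphere, hs_y: Hypersphere, eps: float) -> bool:
    """Lemma 1: prune when even the closest possible samples are farther than eps."""
    (cx, rx), (cy, ry) = hs_x, hs_y
    return math.dist(cx, cy) - rx - ry > eps


if __name__ == "__main__":
    hs_x = bounding_hypersphere([(0.0, 0.0), (2.0, 0.0)])      # center (1, 0), radius 1
    hs_y = bounding_hypersphere([(10.0, 0.0), (12.0, 0.0)])    # center (11, 0), radius 1
    print(can_prune(hs_x, hs_y, eps=5.0))
```

The check costs one distance computation per object pair instead of l² sample comparisons, and it never discards a true answer because the left-hand side only underestimates the real minimum sample distance.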
To improve the efficiency of the object-level pruning, one more inequality constraint is applied to the data:
$$dist(C_{X[t+1]}, C_{Y[j]}) \le \varepsilon + r_{X[t+1]} + r_{max}(DS_2) \quad (9)$$
Then, instead of an exhaustive computation, only the uncertain objects in grid cells satisfying inequality (9) need to be accessed, where $r_{X[t+1]}$ is the radius of object X[t+1], and $r_{max}(DS_2)$ is the maximum radius among all objects in the compartment window CW(DS2).
Since we construct a grid over the centers $C_{Y[j]}$ of all uncertain objects Y[j] in the compartment window CW(DS2), we apply the object-level pruning method (Lemma 1) to the grid index. After that, we obtain a number of data pairs that cannot be pruned on the object level and that satisfy inequality (9); this step will be explained and evaluated in future work. The input data are preprocessed and normalized, the centroid is computed, and the query is applied using the radius of the hypersphere. This sequence of steps can be implemented in any programming language, such as .NET or Java, to verify the efficiency of the proposed approach. In this paper, the pseudo code given in Figure-4 is implemented in MATLAB, and the results are presented with a detailed explanation in the Results and Discussion section.
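A grid index over hypersphere centers can be sketched as below. This is an illustrative sketch, not the paper's implementation: the class name `GridIndex`, the uniform `cell_size`, and the search reach ε + r_X + r_max (derived here from Lemma 1 by replacing each object's radius with the largest radius stored in the grid) are assumptions. Eviction of expired objects is omitted for brevity.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Hashable, List, Sequence, Tuple

Entry = Tuple[Hashable, Tuple[float, ...], float]   # (object id, center, radius)


class GridIndex:
    """Uniform grid over the centers of the objects in one compartment window."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, ...], List[Entry]] = defaultdict(list)
        self.r_max = 0.0   # largest radius inserted so far

    def _cell(self, point: Sequence[float]) -> Tuple[int, ...]:
        return tuple(math.floor(c / self.cell_size) for c in point)

    def insert(self, obj_id: Hashable, center: Sequence[float], radius: float) -> None:
        self.cells[self._cell(center)].append((obj_id, tuple(center), radius))
        self.r_max = max(self.r_max, radius)

    def candidates(self, cx: Sequence[float], rx: float, eps: float) -> List[Hashable]:
        """Ids of objects that survive object-level pruning against a query sphere."""
        reach = eps + rx + self.r_max
        lo = self._cell([c - reach for c in cx])
        hi = self._cell([c + reach for c in cx])
        survivors = []
        # Visit only the cells that overlap the box of half-width `reach`.
        for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            for obj_id, cy, ry in self.cells.get(cell, ()):
                if math.dist(cx, cy) - rx - ry <= eps:   # not pruned by Lemma 1
                    survivors.append(obj_id)
        return survivors
```

Only the surviving candidates need the sample-level grouping probability check against the threshold α, so the expensive computation is limited to objects near the query.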
D. Query processing
One essential step is to invoke the procedure getdata_pair() to retrieve data pairs from the data streams. The pseudo code of getdata_pair() is presented in Figure-4 so that the USG framework can be developed in any programming language. Specifically, for a new uncertain object X[t+1] from uncertain stream DS1, procedure getdata_pair( ) retrieves the candidate pairs (X[t+1], Y[j]) that satisfy inequality (5).
Procedure getdata_pair( ) {
Input: data streams DS1, DS2, ..., DSn, processed as uncertain objects X[t+1] and Y[j] in CW(DS2), a distance threshold ε, and a probabilistic threshold α.
Output: data pairs (X[t+1], Y[j]) satisfying inequality (5)
• Convert the data into objects // by applying the random sampling method
• Decide the compartment window size CW
• Find the centroid and the radius of each compartment
• Find the average similarity distance between X and Y
• Sort the data in the particular compartment
• Get the probability of distance-similar data
• Retrieve uncertain objects Y[j] in grid cells satisfying inequality (9) // using object level pruning, Lemma 1
• For each remaining data pair (X[t+1], Y[j]), check inequality (5) by computing the group probability via samples
• Return all the data pairs that pass the check.
}
Fig 4: Pseudo Code for getdata_pair ( )
V. RESULTS AND DISCUSSION
The USG framework and the modified pruning method are simulated in MATLAB 2012a, where the input data are taken from benchmark US government share market data.
Fig. 5a: Original uncertain data from Benchmark Database of DS1
Fig.5b: Original uncertain data from Benchmark Database of DS2
The data consist of five fields in Excel format: the first field gives the region, the second the date and time, the third the total demand, the fourth the RRP, and the fifth the period type (Trade or Non-Trade). For this research only the numerical data are taken from the Excel data and treated as data stream 1 and data stream 2.
The original data of data stream 1 and data stream 2 are shown in Figure-5a and Figure-5b. The two original data sets differ considerably. A limited portion of the data [1000 columns] is taken and represented as a 1x1000 vector for both DS1 and DS2.
Fig.6a: The Compartment Window for DS1
Fig.6b: The Compartment Window for DS2
From the 1x1000 data, a compartment window of size 100 is taken; CW(DS1) and CW(DS2), the compartment window data for data stream 1 and data stream 2, are shown in Figure-6a and Figure-6b. The window reduces the complexity and speeds up the preprocessing of the data. The compartment window concept is used for sampling, range estimation, distance distribution, and reference point estimation, so that the stream joining process can be applied to the compartment window and USG takes the most recent cw uncertain objects of the data streams as CW(DS1) and CW(DS2).
Once the CW-sized data are taken from DS1 and DS2, they are converted into hypersphere data and sorted, and the centroid and radius are found for the CW. The data are converted into hypersphere data by random sampling with a compartment size of 100. Figure-7a and Figure-7b show the random sampling results HS(DS1) and HS(DS2) for CW(DS1) and CW(DS2). The comparison of HS(Y[j]) with HS(X[t+1]) must then satisfy the inequality conditions of equations (7) and (8) to produce the matching pairs.
Fig.7a: The Hypersphere data for compartment window of DS1
Fig.7b: The Hypersphere data for compartment window of DS2
Figure-8a and Figure-8b show the data objects from HS(CW(DS1)) and HS(CW(DS2)) that satisfy the inequality constraints (5) and (8). For the data objects that satisfy the inequality conditions, we retrieve the original data pairs from DS1 and DS2 at time interval t; each pair is denoted as (X[i], Y[j]).
Fig.8a: Data satisfying the inequality constraints from DS1
Fig.8b: Data satisfying the inequality constraints from DS2
The similar data retrieved by the Modified Pruning method within the USG framework for the uncertain data streams constitute the USG answer, which is shown in Figure-9.
Fig.9: Query Results for getdata_pair()
The total data size is 1401 rows, representing 12 months of trading information, where each row consists of 5 columns. The USG experiment uses 2 of the columns: the time and the trading amount. Out of the 1401 rows, 1200 are found to be similar. The experiment can be extended to the entire data set; the performance of the proposed approach is given in Figure-10 and in Table-2.
Year | Total DS size | Similarity Found
2005 | 1800 | 1500
2006 | 2500 | 2234
2007 | 2200 | 2087
2008 | 1455 | 1376
2009 | 1401 | 1200
2010 | 1401 | 1108
2011 | 1401 | 1203
2012 | 1401 | 1200
Table-2: More Data Streams Experimented to Compare the Performance of the Proposed Approach.
Table-2 lists, for each year, the size of the original data stream and the number of results returned by the query process of the proposed approach. The graphical representation of Table-2 is given in Figure-10, where the first column shows the year 2005 with a data size of 1800 and 1500 results from the similarity search of the USG framework and getdata_pair(); the remaining columns show the other years with their data sizes and results in the same way.
Fig. 10: Performance Evaluation of the proposed approach.
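To make the yearly figures easier to compare, the fraction of each data stream that was reported as similar can be computed directly from the table above. The sketch below uses only the numbers given in the table; the name `similarity_ratios` and the ratio itself are added here for illustration and are not a metric defined in the paper.

```python
from typing import Dict, Tuple

# (total data stream size, similar objects found) per year, copied from the table above.
RESULTS: Dict[int, Tuple[int, int]] = {
    2005: (1800, 1500), 2006: (2500, 2234), 2007: (2200, 2087), 2008: (1455, 1376),
    2009: (1401, 1200), 2010: (1401, 1108), 2011: (1401, 1203), 2012: (1401, 1200),
}


def similarity_ratios(results: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Fraction of each year's data stream that the query reported as similar."""
    return {year: found / total for year, (total, found) in results.items()}


if __name__ == "__main__":
    for year, ratio in similarity_ratios(RESULTS).items():
        print(f"{year}: {ratio:.1%}")
```

For 2005, for instance, the ratio is 1500 / 1800, about 83%.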
VI. CONCLUSION
This paper presented a simulation-based study of grouping uncertain data streams, in which sets of similar uncertain objects are retrieved with high confidence from multiple input data streams. Since the data streams share the same features and a comparable level of uncertainty, we proposed the USG framework with object-level pruning, which preprocesses and normalizes the data in the search space and thereby simplifies the search. Through experiments we demonstrated the efficiency and effectiveness of the proposed USG processing techniques under different parameter settings.
In future work, the cost and the time interval should be analyzed, and the USG framework will be improved into a cost-effective USG for similarity GNN on uncertain data.
REFERENCES
[1]
A. Faradjian, J. Gehrke, and P. Bonnet, “GADT: A Probability Space ADT for Representing and Querying the Physical World,” Proc. 18th Int’l Conf. Data Eng. (ICDE), 2002.
[2]
M. Li and Y. Liu, “Underground Coal Mine
Monitoring with Wireless Sensor Networks,”
ACM Trans. Sensor Networks, vol. 5, pp. 129, 2009.
[3]
Z. Yang and Y. Liu, “Quality of Trilateration:
Confidence-Based Iterative Localization,”
IEEE Trans. Parallel and Distributed Systems,
vol. 21, no. 5, pp. 631-640, May 2010.
[4]
M.F. Mokbel, C.-Y. Chow, and W.G. Aref,
“The New Casper: Query Processing for
Location Services without Compromising
Privacy,” Proc. 32nd Int’l Conf. Very Large
Data Bases (VLDB), 2006.
[5]
S.R. Jeffery, M.J. Franklin, and M. Garofalakis, “An Adaptive RFID Middleware for Supporting Metaphysical Data Independence,” The VLDB J., vol. 17, no. 2, pp. 265-289, 2008.
[6]
C. Böhm, A. Pryakhin, and M. Schubert, “The Gauss-Tree: Efficient Object Identification in Databases of Probabilistic Feature Vectors,” Proc. 22nd Int’l Conf. Data Eng. (ICDE), 2006.
[7]
R. Cheng, D. Kalashnikov, and S. Prabhakar, “Querying Imprecise Data in Moving Object Environments,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 9, pp. 1112-1127, Sept. 2004.
[8]
L. Chen, M.T. Özsu, and V. Oria, “Robust and Fast Similarity Search for Moving Object Trajectories,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 2005.
[9]
B. Gedik and L. Liu, “Location Privacy in Mobile Systems: A Personalized Anonymization Model,” Proc. 25th Int’l Conf. Distributed Computing Systems, 2005.
[10]
G. Jeh and J. Widom. Scaling personalized
web search. In WWW’03, 271–279, 2003.
[11]
G. Jeh and J. Widom. Simrank: a measure of
structural-context similarity. In KDD’02, 538–
543, 2002.
[12]
X. Xu, N. Yuruk, Z. Feng, and T. A. J.
Schweiger. Scan: a structural clustering
algorithm for networks. In KDD’07, 824–833,
2007.
[13]
Marcos R. Vieira, Caetano Traina Jr., Fabio J. T. Chino, Agma J. M. Traina, “DBM-Tree: Trading Height-Balancing for Performance in Metric Access Methods”, CEP 13560-970 – Sao Carlos – SP – Brazil.
[14]
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of 25th International Conference on Very Large Data Bases (VLDB’99), pages 518–529, Edinburgh, Scotland, UK, September 1999.
[15]
Roger Weber and Klemens Böhm. Trading quality for time with nearest-neighbor search. In Proceedings of the 7th International Conference on Extending Database Technology (EDBT2000), pages 21–35, Konstanz, Germany, March 2000.
[16]
Jonathan Goldstein and Raghu Ramakrishnan. Contrast plots and P-Sphere trees: Space vs. time in nearest neighbor searches. In Proceedings of 26th International Conference on Very Large Data Bases (VLDB 2000), pages 429–440, Cairo, Egypt, September 2000.
[17]
D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S.B. Zdonik, “Monitoring Streams - A New Class of Data Management Applications,” Proc. 28th Int’l Conf. Very Large Data Bases (VLDB), 2002.
[18]
T. Brinkhoff, H-P. Kriegel, and B. Seeger, “Efficient Processing of Spatial Joins Using R-Trees,” Proc. ACM SIGMOD Int’l Conf. Management of Data, 1993.
[19]
Y.-W. Huang, N. Jing, and E.A.
Rundensteiner, “Spatial Joins Using R-Trees:
Breadth-First
Traversal
with
Global
Optimizations,” Proc. 23rd Int’l Conf. Very
[20]
M.L. Lo and C.V. Ravishankar, “Spatial Hash-Joins,” ACM SIGMOD Record, vol. 25, pp. 247-258, 1996.
[21]
A. Guttman, “R-Trees: A Dynamic Index
Structure for Spatial Searching,” Proc. ACM
SIGMOD Int’l Conf. Management of Data,
1984.
[22]
G. Das, D. Gunopulos, N. Koudas, and N.
Sarkas, “Ad-Hoc Top-k Query Answering for
Data Streams,” Proc. 33rd Int’l Conf. Very
[23]
Y. Tao and D. Papadias, “Maintaining Sliding
Window Skylines on Data Streams,” IEEE
Trans. Knowledge and Data Eng., vol. 18, no.
3, pp. 377-391, Mar. 2006.
[24]
A.C. Gilbert, Y. Kotidis, S. Muthukrishnan,
and M.J. Strauss, “Surfing Wavelets on
Streams:
One-Pass
Summaries
for
Approximate Aggregate Queries,” Proc. 27th
Int’l Conf. Very Large Data Bases (VLDB),
2001.
[25]
T. Urhan and M.J. Franklin, “Xjoin: A
Reactively-Scheduled
Pipelined
Join
Operator,” IEEE Data Eng. Bull., vol. 23, no.
2, pp. 27-33, June 2000.
[26]
J. Kang, J.F. Naughton, and S.D. Viglas, “Evaluating Window Joins over Unbounded Streams,” Proc. 19th Int’l Conf. Data Eng. (ICDE), 2003.
[27]
M.F. Mokbel, M. Lu, and W.G. Aref, “Hash-Merge Join: A Non-Blocking Join Algorithm for Producing Fast and Early Join Results,” Proc. 20th Int’l Conf. Data Eng. (ICDE), 2004.
[28]
Y.F. Tao, M.L. Yiu, D. Papadias, M.
Hadjieleftheriou, and N. Mamoulis, “RPJ:
Producing Fast Join Results on Streams
through Rate-Based Optimization,” Proc.
ACM SIGMOD Int’l Conf. Management of
Data, 2005.