1.3 Contributions
Existing approaches in the field of uncertain data man-
agement and mining can be categorized into a number of
We make the following contributions:
research directions. Most related to our work are the two
• We propose a probabilistic framework for frequent item-
categories “probabilistic databases” [4, 12, 13, 3] and “proba-
set mining in databases containing uncertain transac-
bilistic query processing” [7, 10, 17, 14].
tions, based on the possible worlds model.
The uncertainty model used in our approach is very close
to the model used for probabilistic databases. A probabilis-
• We present a dynamic computation method for com-
tic database denotes a database composed of relations with
puting the probability that an itemset is frequent, as
uncertain tuples [7], where each tuple is associated with a
well as the entire probability distribution function of
probability denoting the likelihood that it exists in the rela-
the support of an itemset, in O(|T |) time 3 . Without
tion. This model, called “tuple uncertainty”, adopts the pos-
this technique, it would run in exponential time in the
sible worlds semantics [3]. A probabilistic database repre-
number of transactions. Using our approach, our algo-
sents a set of possible “certain” database instances (worlds),
rithm has the same time complexity as methods based
where a database instance corresponds to a subset of un-
on the expected support [5, 6, 11]. However, our ap-
certain tuples. Each instance (world) is associated with the
proach yields much better effectiveness since it pro-
probability that the world is “true”. The probabilities re-
vides confidences for frequent itemsets.
flect the probability distribution of all possible database in-
stances. In the general model description [13], the possible
• We propose an algorithm to mine all itemsets that are
worlds are constrained by rules that are defined on the tu-
frequent with a probability of at least τ. Furthermore,
ples in order to incorporate object (tuple) correlations. The
we propose an additional algorithm that incrementally
ULDB model proposed in [4], which is used in Trio[1], sup-
outputs the uncertain itemsets in the order of their
ports uncertain tuples with alternative instances which are
frequentness probability. This ensures that itemsets
called x-tuples. Relations in ULDB are called x-relations
with the highest probability of being frequent are out-
containing a set of x-tuples. Each x-tuple corresponds to
put first. This has two additional advantages; First,
a set of tuple instances which are assumed to be mutually
it makes the approach free of the parameter τ. Sec-
exclusive, i.e. no more than one instance of an x-tuple can
ondly, it solves the top k itemsets problem in uncertain
appear in a possible world instance at the same time. Prob-
databases.
abilistic top-k query approaches [14, 17, 12] are usually asso-
ciated with uncertain databases using the tuple uncertainty
The remainder of this paper is organised as follows; Section
model. The approach proposed in [17] was the first approach
2 surveys related work. Section 3 presents our probabilistic
able to solve probabilistic queries efficiently under tuple in-
support framework. Section 4 shows how to compute the
dependency by means of dynamic programming techniques.
frequentness probability in O(|T |) time. Section 5 presents
In our paper, we adopt the dynamic programming technique
a probabilistic frequent itemset mining algorithm. Section 6
for the efficient computation of frequent itemsets in a prob-
presents our incremental algorithm. We present our experi-
abilistic way.
ments in Section 7 and conclude in Section 8.
Bạn đang xem 1. - BAI 3 URBAN PLANNING