3 CONTRIBUTIONSEXISTING APPROACHES IN THE FIELD OF UNCERTAIN DATA MA...

1.3 Contributions

Existing approaches in the field of uncertain data man-

agement and mining can be categorized into a number of

We make the following contributions:

research directions. Most related to our work are the two

• We propose a probabilistic framework for frequent item-

categories “probabilistic databases” [4, 12, 13, 3] and “proba-

set mining in databases containing uncertain transac-

bilistic query processing” [7, 10, 17, 14].

tions, based on the possible worlds model.

The uncertainty model used in our approach is very close

to the model used for probabilistic databases. A probabilis-

• We present a dynamic computation method for com-

tic database denotes a database composed of relations with

puting the probability that an itemset is frequent, as

uncertain tuples [7], where each tuple is associated with a

well as the entire probability distribution function of

probability denoting the likelihood that it exists in the rela-

the support of an itemset, in O(|T |) time 3 . Without

tion. This model, called “tuple uncertainty”, adopts the pos-

this technique, it would run in exponential time in the

sible worlds semantics [3]. A probabilistic database repre-

number of transactions. Using our approach, our algo-

sents a set of possible “certain” database instances (worlds),

rithm has the same time complexity as methods based

where a database instance corresponds to a subset of un-

on the expected support [5, 6, 11]. However, our ap-

certain tuples. Each instance (world) is associated with the

proach yields much better effectiveness since it pro-

probability that the world is “true”. The probabilities re-

vides confidences for frequent itemsets.

flect the probability distribution of all possible database in-

stances. In the general model description [13], the possible

• We propose an algorithm to mine all itemsets that are

worlds are constrained by rules that are defined on the tu-

frequent with a probability of at least τ. Furthermore,

ples in order to incorporate object (tuple) correlations. The

we propose an additional algorithm that incrementally

ULDB model proposed in [4], which is used in Trio[1], sup-

outputs the uncertain itemsets in the order of their

ports uncertain tuples with alternative instances which are

frequentness probability. This ensures that itemsets

called x-tuples. Relations in ULDB are called x-relations

with the highest probability of being frequent are out-

containing a set of x-tuples. Each x-tuple corresponds to

put first. This has two additional advantages; First,

a set of tuple instances which are assumed to be mutually

it makes the approach free of the parameter τ. Sec-

exclusive, i.e. no more than one instance of an x-tuple can

ondly, it solves the top k itemsets problem in uncertain

appear in a possible world instance at the same time. Prob-

databases.

abilistic top-k query approaches [14, 17, 12] are usually asso-

ciated with uncertain databases using the tuple uncertainty

The remainder of this paper is organised as follows; Section

model. The approach proposed in [17] was the first approach

2 surveys related work. Section 3 presents our probabilistic

able to solve probabilistic queries efficiently under tuple in-

support framework. Section 4 shows how to compute the

dependency by means of dynamic programming techniques.

frequentness probability in O(|T |) time. Section 5 presents

In our paper, we adopt the dynamic programming technique

a probabilistic frequent itemset mining algorithm. Section 6

for the efficient computation of frequent itemsets in a prob-

presents our incremental algorithm. We present our experi-

abilistic way.

ments in Section 7 and conclude in Section 8.