2 PROBLEM DEFINITIONTION THAT CONTAINS UNCERTAIN ITEMS. A TRANSACTIO...

1.2 Problem Definition

tion that contains uncertain items. A transaction database

T containing uncertain transactions is called an uncertain

An itemset is a frequent itemset if it occurs in at least

transaction database.

minSup transactions, where minSup is a user specified pa-

rameter. In uncertain transaction databases however, the

An uncertain transaction t is represented in an uncertain

support of an itemset is uncertain; it is defined by a dis-

transaction database by the items x ∈ I associated with an

existential probability value 1 P (x ∈ t) ∈ (0, 1]. Example

crete probability distribution function (p.d.f). Therefore,

each itemset has a frequentness probability 2 – the probabil-

uncertain transaction databases are depicted in Figures 1

and 2. To interpret an uncertain transaction database we

ity that it is frequent. In this paper, we focus on the problem

of efficiently calculating this p.d.f. and extracting all proba-

apply the possible world model. An uncertain transaction

bilistic frequent itemsets;

database generates possible worlds, where each world is de-

fined by a fixed set of (certain) transactions. A possible

Definition 4. A Probabilistic Frequent Itemset (PFI) is

world is instantiated by generating each transaction t i ∈ T

an itemset with a frequentness probability of at least τ.

according to the occurrence probabilities P (x ∈ t i ). Con-

sequently, each probability 0 < P (x ∈ t i ) < 1 derives two

The parameter τ is the user specified minimum confidence

possible worlds per transaction: One possible world in which

in the frequentness of an itemset.

x exists in t i , and one possible world where x does not exist

We are now able to specify the Probabilistic Frequent Item-

in t i . Thus, the number of possible worlds of a database

set Mining (PFIM) problem as follows; Given an uncertain

increases exponentially in both the number of transactions

transaction database T , a minimum support scalar minSup

and the number of uncertain items contained in it.

and a frequentness probability threshold τ, find all proba-

Each possible world w is associated with a probability

bilistic frequent itemsets.

that that world exists, P (w). Figure 1(b) shows all possible

1 If an item x has an existential probability of zero, it does

2 Frequentness is the rarely used word describing the prop-

not appear in the transaction.

erty of being frequent.