2 FREQUENTNESS PROBABILITYRATHER, MUST BE REPRESENTED BY A DISCRETE...

3.2 Frequentness Probability

rather, must be represented by a discrete probability distri-

bution.

Recall that we are interested in the probability that an

itemset is frequent, i.e. the probability that an itemset oc-

Definition 6. Given an uncertain (transaction) database

curs in at least minSup transactions.

T and the set W of possible worlds (instantiations) of T , the

support probability P i (X ) of an itemset X is the probability

Definition 9. Let T be an uncertain transaction database

that X has the support i. Formally,

and X be an itemset. P ≥i (X) denotes the probability that the

P i (X) = X

support of X is at least i, i.e. P ≥i (X) = P |T|

P (w j )

k=i P k (X ). For a

given minimal support minSup ∈ {0, . . . , |T |}, the probabil-

w

j

∈W,(S(X,w

j

)=i)

ity P ≥minSup (X ), which we call the frequentness probability

where S(X, w j ) is the support of X in world w j .

of X, denotes the probability that the support of X is at least

minSup.

Intuitively, P i (X) denotes the probability that the support

of X is exactly i . The support probabilities associated with

Figure 4(b) shows the frequentness probabilities of {D}

an itemset X for different support values form the support

for all possible minSup values in the database of Figure 2.

probability distribution of the support of X.

For example, the probability that {D} is frequent when

minSup = 3 is approximately 0.7, while its frequentness

Definition 7. The probabilistic support of an itemset X

probability when minSup = 4 is approximately 0.3.

in an uncertain transaction database T is defined by the sup-

The intuition behind P ≥minSup (X) is to show how confi-

port probabilities of X (P i (X)) for all possible support val-

dent we are that an itemset is frequent. With this policy,

ues i ∈ {0, ..., |T |}. This probability distribution is called

the frequentness of an itemset becomes subjective and the

support probability distribution. The following statement

decision about which candidates should be reported to the

holds: P

0≤i≤|T| P i (X) = 1.0.

user depends on the application. Hence, we use the mini-

Returning to our example of Figure 2, Figure 4(a) shows the

mum frequentness probability τ as a user defined parameter.

support probability distribution of itemset {D}.

Some applications may need a low τ , while in other applica-

The number of possible worlds |W | that need to be con-

tions only highly confident results should be reported (high

sidered for the computation of P i (X) is extremely large. In

τ ).

fact, we have O(2 |T|·|I| ) possible worlds, where |I| denotes

In the possible worlds model we know that P ≥i (X) =

the total number of items. In the following, we show how to

P

w

j

∈W :(S(X,w

j

)≥i) P (w j ). This can be computed according

compute P i (X) without materializing all possible worlds.

to Equation 1 by

Lemma 8. For an uncertain transaction database T with

P ≥i (X ) = X

(1 − P(X ⊆ t))).

P (X ⊆ t) · Y

( Y

mutually independent transactions and any 0 ≤ i ≤ |T |, the

t∈S

t∈T−S

S⊆T ,|S|≥i

support probability P i (X ) can be computed as follows:

(2)

Hence, the frequentness probability can be calculated by

(1−P(X ⊆ t))) (1)

P(X ⊆ t)· Y

enumerating all possible worlds satisfying the minSup condi-

S⊆T ,|S|=i

tion through the direct application of Equation 2. This naive

approach is very inefficient however. We can speed this up

P i,j (X) = 0 (i>j) P minSup,|T| ! 1 (X)

support i P minSup,|T| (X)

significantly. First, note that typically minSup << |T | and

|T |

minSup

0

the number of worlds with support i is at most

.

i

P minSup ! 1,|T| ! 1 (X)

Hence, enumeration of all worlds w in which the support of

X is greater than minSup is much more expensive than enu-

1

merating those where the support is less than minSup. Using

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# transactions j

the following easily verified Lemma, we can compute the fre-

1 2 | | j

|T|

quentness probability exponentially in minSup << |T |.

start computation with P 1,1 (X) P 0,j (X)= 1

Lemma 10. P ≥i (X ) = 1 − P

t∈S P(X ⊆ t) ·

S⊆T:|S|<i ( Q

Figure 5: Dynamic Computation Scheme

Q

t∈T −S (1 − P (X ⊆ t))).

Despite this improvement, the complexity of the above ap-

The above dynamic programming scheme is an adaption of

proach, called Basic in our experiments, is still exponential

a technique previously used in the context of probabilistic

w.r.t. the number of transactions. In Section 4 we describe

top-k queries by Kollios et. al [17].

how we can reduce this to linear time.

=

Proof. P ≥i,j (X ) = P j