1 EFFICIENT COMPUTATION OF PROBABILISTIC SUP-ITEMSET X BY CALCULATIN...

4.1 Efficient Computation of Probabilistic Sup-

itemset X by calculating the cells depicted in Figure 5. In

port

the matrix, each cell relates to a probability P ≥i,j , with i

marked on the x-axis, and j marked on the y-axis. Note

The key to our approach is to consider it in terms of sub-

that according to Lemma 12, in order to compute a P ≥i,j ,

problems. First, we need appropriate definitions;

we require the probabilities P ≥i−1,j−1 and P ≥i,j−1 , that is,

the cell to the left and the cell to the lower left of P ≥i,j .

Definition 11. The probability that i of j transactions

Knowing that P ≥0,0 = 1 and P ≥1,0 = 0 by definition, we can

contain itemset X is

start by computing P ≥1,1 . The probability P ≥1,j can then

P (X ⊆ t) · Y

( Y

P i,j (X ) = X

(1 − P (X ⊆ t)))

be computed by using the previously computed P ≥1,j−1 for

all j. P ≥1,j can, in turn, be used to compute P ≥2,j . This

S⊆T

j

:|S|=i

t∈S

t∈T

j

−S

iteration continues until i reaches minSup, so that finally

where T j = {t 1 , ..., t j } ⊆ T is the set of the first j transac-

we obtain P ≥minSup,|T| – the frequentness probability (Def-

tions. Similarly, the probability that at least i of j transac-

inition 9).

tions contain itemset X is

Note that in each line (i.e. for each i) of the matrix in

Figure 5, j only runs up to |T | − minSup + i. Larger values

P(X ⊆ t) · Y

P ≥i,j (X) = X

of j are not required for the computation of P minSup,|T| .

S⊆T

j

:|S|≥i

Lemma 13. The computation of the frequentness proba-

Note that P ≥i,|T| (X ) = P ≥i (X ), the probability that at

bility P ≥minSup requires at most O(|T | ∗ minSup) = O(|T |)

least i transactions in the entire database contain X. The

time and at most O(|T |) space.

key idea in our approach is to split the problem of computing

P ≥i,|T| (X ) into smaller problems P ≥i,j (X), j < |T |. This

Proof. Using the dynamic computation scheme as shown

can be achieved as follows. Given a set of j transactions

in Figure 5, the number of computations is bounded by the

T j = {t 1 , ..., t j } ⊆ T : If we assume that transaction t j con-

size of the depicted matrix. The matrix contains |T |∗minSup

tains itemset X, then P ≥i,j (X ) is equal to the probability

cells. Each cell requires an iteration of the dynamic com-

that at least i − 1 transactions of T j \{t j } contain X. Oth-

putation (c.f. Corollary 12) which is performed in O(1)

erwise, P ≥i,j (X ) is equal to the probability that at least i

time. Note that a matrix is used here for illustration purpose

transactions of T j \{t j } contain X . By splitting the problem

only. The computation of each probability P i,j (X) only re-

quires information stored in the current line and the previous

in this way we can use the recursion in Lemma 12, which

tells us what these probabilities are, to compute P ≥i,j (X)

line to access the probabilities P i−1,j−1 (X) and P i,j−i (X) .

by means of the paradigm of dynamic programming.

Therefore, only these two lines (of length |T |) need to be

preserved requiring O(|T |) space. Additionally, the proba-

bilities P (X ⊆ t j ) have to be stored, resulting in a total of

Lemma 12. P ≥i,j (X) =

O(|T |) space.

P ≥i−1,j−1 (X ) · P (X ⊆ t j ) + P ≥i,j−1 (X ) · (1 − P j (X ⊆ t j ))

Note that we can save computation time if an itemset is

where

certain in some transactions. If a transaction t j ∈ T contains

P ≥0,j = 1 ∀.0 ≤ j ≤ |T |, P ≥i,j = 0 ∀.i > j

itemset X with a probability of zero, i.e. P(X ⊆ t j ) = 0,

support i

transaction t j can be ignored for the dynamic computation

P

minSup,|T|

(X)

because P ≥i,j (X ) = P ≥i,j−1 (X ) holds (Lemma 12). If |T 0 |

0 P

minSup!d,T!d

(X) " P

minSup,T

(X)

minSup

is less than minSup, then X can be pruned since, by defini-

0

pruning criterion:

tion, P ≥minSup,T

0

= 0 if minSup > T 0 . The dynamic compu-

if P

minSup

if P (X)<

tation scheme can also omit transactions T j where the item

!d,T!d

(X)<

then stop computation

1

has a probability of 1, because P ≥i,j (X) = P ≥i−1,j−1 (X)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

due to P(X ⊆ t j ) = 1. Thus, if a transaction t j contains

# transactions j

j

|T|

1 2

| |

X with a probability of 1, then t j (i.e. the corresponding

column) can be omitted if minSup is reduced by one, to com-

Figure 6: Visualization of the Pruning Criterion

pensate the missing transaction. The dynamic programming

scheme therefore only has to consider uncertain items. We