
4.2.1 Monotonicity Criteria

First, if we increase the minimal support $i$, then the frequentness probability of an itemset decreases.

Lemma 14. $P_{\geq i,j}(X) \geq P_{\geq i+1,j}(X)$.

Proof. $P_{\geq i+1,j}(X) \overset{\text{Definition 9}}{=} P_{\geq i,j}(X) - P_{i,j}(X) \leq P_{\geq i,j}(X)$.

Intuitively, this result is obvious since the predicate “the support is at least $i+1$” implies “the support is at least $i$”.
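As a quick numeric illustration (the distribution below is hypothetical, chosen only to make the subtraction visible): if the support of $X$ in the first $j$ transactions is distributed as $P_{0,j}(X) = 0.2$, $P_{1,j}(X) = 0.5$, $P_{2,j}(X) = 0.3$, then

$$P_{\geq 2,j}(X) = P_{\geq 1,j}(X) - P_{1,j}(X) = 0.8 - 0.5 = 0.3 \leq 0.8 = P_{\geq 1,j}(X).$$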

The next criterion says that an extension of the uncertain transaction database leads to an increase of the frequentness probability of an itemset.

Lemma 15. $P_{\geq i,j}(X) \leq P_{\geq i,j+1}(X)$.

Proof. $P_{\geq i,j+1}(X) \overset{\text{Lemma 12}}{=} P_{\geq i-1,j}(X) \cdot P(X \subseteq t_{j+1}) + P_{\geq i,j}(X) \cdot (1 - P(X \subseteq t_{j+1})) \overset{\text{Lemma 14}}{\geq} P_{\geq i,j}(X) \cdot P(X \subseteq t_{j+1}) + P_{\geq i,j}(X) \cdot (1 - P(X \subseteq t_{j+1})) = P_{\geq i,j}(X)$.

The intuition behind this lemma is that one more transaction can increase the support of an itemset.
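Continuing the hypothetical numbers from above: with $P_{\geq 0,j}(X) = 1$, $P_{\geq 1,j}(X) = 0.8$ and a new transaction with $P(X \subseteq t_{j+1}) = 0.5$, the recursion of Lemma 12 yields

$$P_{\geq 1,j+1}(X) = 1 \cdot 0.5 + 0.8 \cdot 0.5 = 0.9 \geq 0.8 = P_{\geq 1,j}(X).$$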

Putting these results together:

Lemma 16. $P_{\geq i,j}(X) \geq P_{\geq i+1,j+1}(X)$.

Proof. $P_{\geq i+1,j+1}(X) \overset{\text{Corollary 12}}{=} P_{\geq i,j}(X) \cdot P(X \subseteq t_{j+1}) + P_{\geq i+1,j}(X) \cdot (1 - P(X \subseteq t_{j+1})) \overset{\text{Lemma 14}}{\leq} P_{\geq i,j}(X) \cdot P(X \subseteq t_{j+1}) + P_{\geq i,j}(X) \cdot (1 - P(X \subseteq t_{j+1})) = P_{\geq i,j}(X)$.
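Checking Lemma 16 on the same hypothetical numbers, with $P_{\geq 1,j}(X) = 0.8$, $P_{\geq 2,j}(X) = 0.3$ and $P(X \subseteq t_{j+1}) = 0.5$:

$$P_{\geq 2,j+1}(X) = 0.8 \cdot 0.5 + 0.3 \cdot 0.5 = 0.55 \leq 0.8 = P_{\geq 1,j}(X).$$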

Next, we describe how these monotonicity criteria can be exploited to prune the dynamic computation.
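To make this concrete, the following is a minimal sketch (our illustration, not the paper's implementation) of the dynamic computation of Lemma 12, assuming each uncertain transaction $t_j$ is reduced to a single number $p_j = P(X \subseteq t_j)$; the function name and representation are ours. The early abort is justified by chaining Lemma 16.

def frequentness_probability(p, minsup, tau=None):
    """p: list of probabilities P(X subset t_j); returns P_{>=minsup}(X).
    If tau is given, aborts early once X can no longer reach tau."""
    n = len(p)
    # row[i] holds P_{>=i,j}(X) after processing j transactions;
    # initially j = 0, so P_{>=0,0} = 1 and P_{>=i,0} = 0 for i >= 1.
    row = [1.0] + [0.0] * minsup
    for j, pj in enumerate(p, start=1):
        # Lemma 12: P_{>=i,j} = P_{>=i-1,j-1}*pj + P_{>=i,j-1}*(1-pj).
        # Iterating i downwards lets us update the row in place.
        for i in range(minsup, 0, -1):
            row[i] = row[i - 1] * pj + row[i] * (1.0 - pj)
        # Pruning via Lemma 16: chaining P_{>=i,j} >= P_{>=i+1,j+1} gives
        # P_{>=minsup,n} <= P_{>=minsup-(n-j),j}, an upper bound on the
        # final frequentness probability. Abort if it already misses tau.
        if tau is not None and j < n:
            bound = minsup - (n - j)
            if bound >= 0 and row[bound] < tau:
                return 0.0  # X cannot be probabilistic frequent
    return row[minsup]

Under these assumptions the computation is $O(minSup \cdot |T|)$ per itemset, and the early abort only ever discards itemsets whose frequentness probability is guaranteed by Lemma 16 to stay below $\tau$.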

In uncertain transaction databases however, recall that support is defined by a probability distribution and that we mine itemsets according to their frequentness probability. It turns out that the frequentness probability is anti-monotonic:

Lemma 17. $\forall Y \subseteq X : P_{\geq minSup}(X) \leq P_{\geq minSup}(Y)$. In other words, all subsets of a probabilistic frequent itemset are also probabilistic frequent itemsets.

Proof. $P_{\geq minSup}(X) = \sum_{i=1}^{|W|} P(w_i) \cdot I_{S(X,w_i) \geq minSup}$, since the probability is defined over all possible worlds. Here, $I_z$ is an indicator variable that is 1 when $z = true$ and 0 otherwise. In other words, $P_{\geq minSup}(X)$ is the relative number of worlds in which $S(X) \geq minSup$ holds, where each occurrence is weighted by the probability of the world occurring. Since world $w_i$ corresponds to a normal transaction database with no uncertainty, $S(X, w_i) \leq S(Y, w_i)\ \forall Y \subseteq X$ due to the anti-monotonicity of support. Therefore, $I_{S(X,w_i) \geq minSup} \leq I_{S(Y,w_i) \geq minSup}\ \forall i \in \{1,\dots,|W|\}, \forall Y \subseteq X$ and, thus, $P_{\geq minSup}(X) \leq P_{\geq minSup}(Y)\ \forall Y \subseteq X$.

We can use the contra-positive of Lemma 17 to prune the search space for probabilistic frequent itemsets. That is, if an itemset $Y$ is not a probabilistic frequent itemset, i.e. $P_{\geq minSup}(Y) < \tau$, then all itemsets $X \supseteq Y$ cannot be probabilistic frequent itemsets either.

Our first algorithm is based on a “marriage” of traditional frequent itemset mining methods and our uncertain itemset identification algorithms. In particular, we propose a probabilistic frequent itemset mining approach based on the Apriori algorithm ([2]). Like Apriori, our method iteratively generates the probabilistic frequent itemsets using a bottom-up strategy. Each iteration is performed in two steps, a join step and a pruning step.
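The sketch below shows how the pieces fit together; it is our illustration, not code from the paper. It assumes a callable prob_of_itemset that evaluates $P_{\geq minSup}(X)$ (for instance, via frequentness_probability above), and the join/prune candidate generation is standard Apriori.

from itertools import combinations

def probabilistic_apriori(items, prob_of_itemset, tau):
    result = []
    # Level 1: probabilistic frequent singletons.
    frequent = [frozenset([a]) for a in items
                if prob_of_itemset(frozenset([a])) >= tau]
    while frequent:
        result.extend(frequent)
        k = len(next(iter(frequent))) + 1
        # Join step: unite pairs of (k-1)-itemsets that agree in k-2 items.
        candidates = {X | Y for X, Y in combinations(frequent, 2)
                      if len(X | Y) == k}
        # Prune step (contrapositive of Lemma 17): a candidate with a
        # non-frequent (k-1)-subset cannot be probabilistic frequent.
        freq_set = set(frequent)
        candidates = [C for C in candidates
                      if all(frozenset(S) in freq_set
                             for S in combinations(C, k - 1))]
        frequent = [C for C in candidates if prob_of_itemset(C) >= tau]
    return result

Both pruning rules compose: Lemma 17 shrinks the candidate sets between levels, while Lemma 16 lets each individual frequentness probability computation abort early.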