4.2.1 Monotonicity Criteria
First, if we increase the minimal support i, then the frequentness probability of an itemset decreases.

Lemma 14. P≥i,j(X) ≥ P≥i+1,j(X).

Proof. P≥i+1,j(X) = P≥i,j(X) − Pi,j(X) ≤ P≥i,j(X), where the equality holds by Definition 9 (Pi,j(X) denoting the probability that exactly i of the first j transactions contain X) and the inequality holds because Pi,j(X) ≥ 0.

Intuitively, this result is obvious, since the predicate "the support is at least i" is implied by "the support is at least i + 1".

The next criterion says that an extension of the uncertain transaction database leads to an increase of the frequentness probability of an itemset.

Lemma 15. P≥i,j(X) ≤ P≥i,j+1(X).

Proof. By Lemma 12, P≥i,j+1(X) = P≥i−1,j(X) · P(X ⊆ tj+1) + P≥i,j(X) · (1 − P(X ⊆ tj+1)). By Lemma 14, P≥i−1,j(X) ≥ P≥i,j(X), so this is at least P≥i,j(X) · P(X ⊆ tj+1) + P≥i,j(X) · (1 − P(X ⊆ tj+1)) = P≥i,j(X).

The intuition behind this lemma is that one more transaction can increase the support of an itemset. Putting these results together:

Lemma 16. P≥i,j(X) ≥ P≥i+1,j+1(X).

Proof. By Corollary 12, P≥i+1,j+1(X) = P≥i,j(X) · P(X ⊆ tj+1) + P≥i+1,j(X) · (1 − P(X ⊆ tj+1)). By Lemma 14, P≥i+1,j(X) ≤ P≥i,j(X), so this is at most P≥i,j(X) · P(X ⊆ tj+1) + P≥i,j(X) · (1 − P(X ⊆ tj+1)) = P≥i,j(X).

Next, we describe how these monotonicity criteria can be exploited to prune the dynamic computation.

X. In uncertain transaction databases, however, recall that support is defined by a probability distribution and that we mine itemsets according to their frequentness probability. It turns out that the frequentness probability is anti-monotonic:

Lemma 17. ∀Y ⊆ X : P≥minSup(X) ≤ P≥minSup(Y). In other words, all subsets of a probabilistic frequent itemset are also probabilistic frequent itemsets.

Proof. P≥minSup(X) = Σ_{i=1..|W|} P(wi) · I_{S(X,wi) ≥ minSup}, since the probability is defined over all possible worlds. Here, I_Z is an indicator variable that is 1 when Z = true and 0 otherwise. In other words, P≥minSup(X) is the relative number of worlds in which S(X) ≥ minSup holds, where each occurrence is weighted by the probability of the world occurring. Since each world wi corresponds to a normal transaction database with no uncertainty, S(X, wi) ≤ S(Y, wi) ∀Y ⊆ X due to the anti-monotonicity of support. Therefore, I_{S(X,wi) ≥ minSup} ≤ I_{S(Y,wi) ≥ minSup} ∀i ∈ {1, …, |W|}, ∀Y ⊆ X and, thus, P≥minSup(X) ≤ P≥minSup(Y) ∀Y ⊆ X.

We can use the contrapositive of Lemma 17 to prune the search space for probabilistic frequent itemsets. That is, if an itemset Y is not a probabilistic frequent itemset, i.e. P≥minSup(Y) < τ, then all itemsets X ⊇ Y cannot be probabilistic frequent itemsets either.

Our first algorithm is based on a "marriage" of traditional frequent itemset mining methods and our uncertain itemset identification algorithms. In particular, we propose a probabilistic frequent itemset mining approach based on the Apriori algorithm [2]. Like Apriori, our method iteratively generates the probabilistic frequent itemsets using a bottom-up strategy. Each iteration is performed in two steps, a join
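The monotonicity lemmas can be checked numerically against the recursion P≥i,j(X) = P≥i−1,j−1(X)·P(X ⊆ tj) + P≥i,j−1(X)·(1 − P(X ⊆ tj)) that the proofs cite (Lemma 12 / Corollary 12). A minimal sketch in Python, assuming the containment probabilities P(X ⊆ tj) are given as a plain list (the helper name is hypothetical, not from the paper):

```python
def frequentness_probabilities(containment):
    """Table P[i][j] = P_{>=i,j}: probability that at least i of the
    first j transactions contain the itemset X, via the recursion
    P_{>=i,j} = P_{>=i-1,j-1}*p_j + P_{>=i,j-1}*(1 - p_j),
    where p_j = P(X subset of t_j)."""
    n = len(containment)
    # Base cases: P_{>=0,j} = 1 for all j, and P_{>=i,0} = 0 for i >= 1.
    P = [[1.0] * (n + 1)] + [[0.0] * (n + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            p = containment[j - 1]
            P[i][j] = P[i - 1][j - 1] * p + P[i][j - 1] * (1 - p)
    return P

# Spot-check Lemmas 14-16 on arbitrary containment probabilities:
P = frequentness_probabilities([0.5, 0.9, 0.2, 0.7])
n = 4
for i in range(n):
    for j in range(n + 1):
        assert P[i][j] >= P[i + 1][j]          # Lemma 14
        if j < n:
            assert P[i][j] <= P[i][j + 1]      # Lemma 15
            assert P[i][j] >= P[i + 1][j + 1]  # Lemma 16
```

The downward-growing table also makes the pruning opportunity visible: by Lemma 16, every entry on a diagonal is bounded by the one above-left of it.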
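The bottom-up generation with pruning by the contrapositive of Lemma 17 can be sketched as follows, under the simplifying assumption that items occur independently within a transaction (so P(X ⊆ t) is a product of per-item existence probabilities); the function names are illustrative, not the paper's:

```python
from itertools import combinations
from math import prod

def freq_prob(containment, min_sup):
    # P(support >= min_sup), updating one row of the dynamic
    # computation per transaction (p = P(X subset of t_j)).
    P = [1.0] + [0.0] * min_sup          # P[i] = P_{>=i,j}
    for p in containment:
        for i in range(min_sup, 0, -1):  # downward: reuse P_{>=i-1,j-1}
            P[i] = P[i - 1] * p + P[i] * (1 - p)
    return P[min_sup]

def apriori_probabilistic(db, min_sup, tau):
    """db: list of {item: existence probability} transactions
    (independence assumed). Returns the probabilistic frequent
    itemsets with their frequentness probabilities."""
    items = sorted({x for t in db for x in t})
    frequent = {}
    level = [frozenset([x]) for x in items]
    k = 1
    while level:
        survivors = []
        for X in level:
            containment = [prod(t.get(x, 0.0) for x in X) for t in db]
            fp = freq_prob(containment, min_sup)
            if fp >= tau:                # keep iff P_{>=minSup}(X) >= tau
                frequent[X] = fp
                survivors.append(X)
        # Join step, pruned by the contrapositive of Lemma 17: a
        # (k+1)-candidate survives only if all of its k-subsets are
        # probabilistic frequent itemsets.
        level = [c for c in {a | b for a, b in combinations(survivors, 2)}
                 if len(c) == k + 1
                 and all(frozenset(s) in frequent for s in combinations(c, k))]
        k += 1
    return frequent
```

On a toy database such as db = [{'a': 0.9, 'b': 0.8}, {'a': 0.7}, {'a': 1.0, 'b': 0.5}] with minSup = 2 and τ = 0.5, the itemset {b} has frequentness probability 0.4 < τ, so it is pruned and {a, b} is never generated.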