4.1 Efficient Computation of Probabilistic Sup-
itemset X by calculating the cells depicted in Figure 5. In
port
the matrix, each cell relates to a probability P ≥i,j , with i
marked on the x-axis, and j marked on the y-axis. Note
The key to our approach is to consider it in terms of sub-
that according to Lemma 12, in order to compute a P ≥i,j ,
problems. First, we need appropriate definitions;
we require the probabilities P ≥i−1,j−1 and P ≥i,j−1 , that is,
the cell to the left and the cell to the lower left of P ≥i,j .
Definition 11. The probability that i of j transactions
Knowing that P ≥0,0 = 1 and P ≥1,0 = 0 by definition, we can
contain itemset X is
start by computing P ≥1,1 . The probability P ≥1,j can then
P (X ⊆ t) · Y
( Y
P i,j (X ) = X
(1 − P (X ⊆ t)))
be computed by using the previously computed P ≥1,j−1 for
all j. P ≥1,j can, in turn, be used to compute P ≥2,j . This
S⊆T
j:|S|=i
t∈S
t∈T
j−S
iteration continues until i reaches minSup, so that finally
where T j = {t 1 , ..., t j } ⊆ T is the set of the first j transac-
we obtain P ≥minSup,|T| – the frequentness probability (Def-
tions. Similarly, the probability that at least i of j transac-
inition 9).
tions contain itemset X is
Note that in each line (i.e. for each i) of the matrix in
Figure 5, j only runs up to |T | − minSup + i. Larger values
P(X ⊆ t) · Y
P ≥i,j (X) = X
of j are not required for the computation of P minSup,|T| .
S⊆T
j:|S|≥i
Lemma 13. The computation of the frequentness proba-
Note that P ≥i,|T| (X ) = P ≥i (X ), the probability that at
bility P ≥minSup requires at most O(|T | ∗ minSup) = O(|T |)
least i transactions in the entire database contain X. The
time and at most O(|T |) space.
key idea in our approach is to split the problem of computing
P ≥i,|T| (X ) into smaller problems P ≥i,j (X), j < |T |. This
Proof. Using the dynamic computation scheme as shown
can be achieved as follows. Given a set of j transactions
in Figure 5, the number of computations is bounded by the
T j = {t 1 , ..., t j } ⊆ T : If we assume that transaction t j con-
size of the depicted matrix. The matrix contains |T |∗minSup
tains itemset X, then P ≥i,j (X ) is equal to the probability
cells. Each cell requires an iteration of the dynamic com-
that at least i − 1 transactions of T j \{t j } contain X. Oth-
putation (c.f. Corollary 12) which is performed in O(1)
erwise, P ≥i,j (X ) is equal to the probability that at least i
time. Note that a matrix is used here for illustration purpose
transactions of T j \{t j } contain X . By splitting the problem
only. The computation of each probability P i,j (X) only re-
quires information stored in the current line and the previous
in this way we can use the recursion in Lemma 12, which
tells us what these probabilities are, to compute P ≥i,j (X)
line to access the probabilities P i−1,j−1 (X) and P i,j−i (X) .
by means of the paradigm of dynamic programming.
Therefore, only these two lines (of length |T |) need to be
preserved requiring O(|T |) space. Additionally, the proba-
bilities P (X ⊆ t j ) have to be stored, resulting in a total of
Lemma 12. P ≥i,j (X) =
O(|T |) space.
P ≥i−1,j−1 (X ) · P (X ⊆ t j ) + P ≥i,j−1 (X ) · (1 − P j (X ⊆ t j ))
Note that we can save computation time if an itemset is
where
certain in some transactions. If a transaction t j ∈ T contains
P ≥0,j = 1 ∀.0 ≤ j ≤ |T |, P ≥i,j = 0 ∀.i > j
itemset X with a probability of zero, i.e. P(X ⊆ t j ) = 0,
support i
transaction t j can be ignored for the dynamic computation
P
minSup,|T|(X)
because P ≥i,j (X ) = P ≥i,j−1 (X ) holds (Lemma 12). If |T 0 |
0 P
minSup!d,T!d(X) " P
minSup,T(X)
minSup
is less than minSup, then X can be pruned since, by defini-
0
pruning criterion:
tion, P ≥minSup,T
0 = 0 if minSup > T 0 . The dynamic compu-
if P
minSupif P (X)<
tation scheme can also omit transactions T j where the item
!d,T!d(X)<
then stop computation
1
has a probability of 1, because P ≥i,j (X) = P ≥i−1,j−1 (X)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
due to P(X ⊆ t j ) = 1. Thus, if a transaction t j contains
# transactions j
j
|T|
1 2
| |
X with a probability of 1, then t j (i.e. the corresponding
column) can be omitted if minSup is reduced by one, to com-
Figure 6: Visualization of the Pruning Criterion
pensate the missing transaction. The dynamic programming
scheme therefore only has to consider uncertain items. We
Bạn đang xem 4. - BAI 3 URBAN PLANNING