3.2 Frequentness Probability
rather, must be represented by a discrete probability distri-
bution.
Recall that we are interested in the probability that an
itemset is frequent, i.e. the probability that an itemset oc-
Definition 6. Given an uncertain (transaction) database
curs in at least minSup transactions.
T and the set W of possible worlds (instantiations) of T , the
support probability P i (X ) of an itemset X is the probability
Definition 9. Let T be an uncertain transaction database
that X has the support i. Formally,
and X be an itemset. P ≥i (X) denotes the probability that the
P i (X) = X
support of X is at least i, i.e. P ≥i (X) = P |T|
P (w j )
k=i P k (X ). For a
given minimal support minSup ∈ {0, . . . , |T |}, the probabil-
w
j∈W,(S(X,w
j)=i)
ity P ≥minSup (X ), which we call the frequentness probability
where S(X, w j ) is the support of X in world w j .
of X, denotes the probability that the support of X is at least
minSup.
Intuitively, P i (X) denotes the probability that the support
of X is exactly i . The support probabilities associated with
Figure 4(b) shows the frequentness probabilities of {D}
an itemset X for different support values form the support
for all possible minSup values in the database of Figure 2.
probability distribution of the support of X.
For example, the probability that {D} is frequent when
minSup = 3 is approximately 0.7, while its frequentness
Definition 7. The probabilistic support of an itemset X
probability when minSup = 4 is approximately 0.3.
in an uncertain transaction database T is defined by the sup-
The intuition behind P ≥minSup (X) is to show how confi-
port probabilities of X (P i (X)) for all possible support val-
dent we are that an itemset is frequent. With this policy,
ues i ∈ {0, ..., |T |}. This probability distribution is called
the frequentness of an itemset becomes subjective and the
support probability distribution. The following statement
decision about which candidates should be reported to the
holds: P
0≤i≤|T| P i (X) = 1.0.
user depends on the application. Hence, we use the mini-
Returning to our example of Figure 2, Figure 4(a) shows the
mum frequentness probability τ as a user defined parameter.
support probability distribution of itemset {D}.
Some applications may need a low τ , while in other applica-
The number of possible worlds |W | that need to be con-
tions only highly confident results should be reported (high
sidered for the computation of P i (X) is extremely large. In
τ ).
fact, we have O(2 |T|·|I| ) possible worlds, where |I| denotes
In the possible worlds model we know that P ≥i (X) =
the total number of items. In the following, we show how to
P
w
j∈W :(S(X,w
j)≥i) P (w j ). This can be computed according
compute P i (X) without materializing all possible worlds.
to Equation 1 by
Lemma 8. For an uncertain transaction database T with
P ≥i (X ) = X
(1 − P(X ⊆ t))).
P (X ⊆ t) · Y
( Y
mutually independent transactions and any 0 ≤ i ≤ |T |, the
t∈S
t∈T−S
S⊆T ,|S|≥i
support probability P i (X ) can be computed as follows:
(2)
Hence, the frequentness probability can be calculated by
(1−P(X ⊆ t))) (1)
P(X ⊆ t)· Y
enumerating all possible worlds satisfying the minSup condi-
S⊆T ,|S|=i
tion through the direct application of Equation 2. This naive
approach is very inefficient however. We can speed this up
P i,j (X) = 0 (i>j) P minSup,|T| ! 1 (X)
support i P minSup,|T| (X)
significantly. First, note that typically minSup << |T | and
|T |
minSup
0
the number of worlds with support i is at most
.
i
P minSup ! 1,|T| ! 1 (X)
Hence, enumeration of all worlds w in which the support of
X is greater than minSup is much more expensive than enu-
1
merating those where the support is less than minSup. Using
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# transactions j
the following easily verified Lemma, we can compute the fre-
1 2 | | j
|T|
quentness probability exponentially in minSup << |T |.
start computation with P 1,1 (X) P 0,j (X)= 1
Lemma 10. P ≥i (X ) = 1 − P
t∈S P(X ⊆ t) ·
S⊆T:|S|<i ( Q
Figure 5: Dynamic Computation Scheme
Q
t∈T −S (1 − P (X ⊆ t))).
Despite this improvement, the complexity of the above ap-
The above dynamic programming scheme is an adaption of
proach, called Basic in our experiments, is still exponential
a technique previously used in the context of probabilistic
w.r.t. the number of transactions. In Section 4 we describe
top-k queries by Kollios et. al [17].
how we can reduce this to linear time.
=
Proof. P ≥i,j (X ) = P j
Bạn đang xem 3. - BAI 3 URBAN PLANNING