2 FREQUENTNESS PROBABILITYRATHER, MUST BE REPRESENTED BY A DISCRETE...

Question

3.2 Frequentness Probabilityrather, must be represented by a discrete probability distri-bution.Recall that we are interested in the probability that anitemset is frequent, i.e. the probability that an itemset oc-Definition 6. Given an uncertain (transaction) databasecurs in at least minSup transactions.T and the set W of possible worlds (instantiations) of T , thesupport probability P i (X ) of an itemset X is the probabilityDefinition 9. Let T be an uncertain transaction databasethat X has the support i. Formally,and X be an itemset. P ≥i (X) denotes the probability that theP i (X) = Xsupport of X is at least i, i.e. P ≥i (X) = P |T|P (w j )k=i P k (X ). For agiven minimal support minSup ∈ {0, . . . , |T |}, the probabil-wj∈W,(S(X,wj)=i)ity P ≥minSup (X ), which we call the frequentness probabilitywhere S(X, w j ) is the support of X in world w j .of X, denotes the probability that the support of X is at leastminSup.Intuitively, P i (X) denotes the probability that the supportof X is exactly i . The support probabilities associated withFigure 4(b) shows the frequentness probabilities of {D}an itemset X for different support values form the supportfor all possible minSup values in the database of Figure 2.probability distribution of the support of X.For example, the probability that {D} is frequent whenminSup = 3 is approximately 0.7, while its frequentnessDefinition 7. The probabilistic support of an itemset Xprobability when minSup = 4 is approximately 0.3.in an uncertain transaction database T is defined by the sup-The intuition behind P ≥minSup (X) is to show how confi-port probabilities of X (P i (X)) for all possible support val-dent we are that an itemset is frequent. With this policy,ues i ∈ {0, ..., |T |}. This probability distribution is calledthe frequentness of an itemset becomes subjective and thesupport probability distribution. The following statementdecision about which candidates should be reported to theholds: P0≤i≤|T| P i (X) = 1.0.user depends on the application. Hence, we use the mini-Returning to our example of Figure 2, Figure 4(a) shows themum frequentness probability τ as a user defined parameter.support probability distribution of itemset {D}.Some applications may need a low τ , while in other applica-The number of possible worlds |W | that need to be con-tions only highly confident results should be reported (highsidered for the computation of P i (X) is extremely large. Inτ ).fact, we have O(2 |T|·|I| ) possible worlds, where |I| denotesIn the possible worlds model we know that P ≥i (X) =the total number of items. In the following, we show how toPwj∈W :(S(X,wj)≥i) P (w j ). This can be computed accordingcompute P i (X) without materializing all possible worlds.to Equation 1 byLemma 8. For an uncertain transaction database T withP ≥i (X ) = X(1 − P(X ⊆ t))).P (X ⊆ t) · Y( Ymutually independent transactions and any 0 ≤ i ≤ |T |, thet∈St∈T−SS⊆T ,|S|≥isupport probability P i (X ) can be computed as follows:(2)Hence, the frequentness probability can be calculated by(1−P(X ⊆ t))) (1)P(X ⊆ t)· Yenumerating all possible worlds satisfying the minSup condi-S⊆T ,|S|=ition through the direct application of Equation 2. This naiveapproach is very inefficient however. We can speed this upP i,j (X) = 0 (i>j) P minSup,|T| ! 1 (X)support i P minSup,|T| (X)significantly. First, note that typically minSup << |T | and |T |minSup0the number of worlds with support i is at most.iP minSup ! 1,|T| ! 1 (X)Hence, enumeration of all worlds w in which the support ofX is greater than minSup is much more expensive than enu-1merating those where the support is less than minSup. Using1 1 1 1 1 1 1 1 1 1 1 1 1 1 1# transactions jthe following easily verified Lemma, we can compute the fre-1 2 | | j|T|quentness probability exponentially in minSup << |T |.start computation with P 1,1 (X) P 0,j  (X)= 1Lemma 10. P ≥i (X ) = 1 − Pt∈S P(X ⊆ t) ·S⊆T:|S|<i ( QFigure 5: Dynamic Computation SchemeQt∈T −S (1 − P (X ⊆ t))).Despite this improvement, the complexity of the above ap-The above dynamic programming scheme is an adaption ofproach, called Basic in our experiments, is still exponentiala technique previously used in the context of probabilisticw.r.t. the number of transactions. In Section 4 we describetop-k queries by Kollios et. al [17].how we can reduce this to linear time.=Proof. P ≥i,j (X ) = P j