

4.1 Efficient Computation of Probabilistic Support

The key to our approach is to consider the problem in terms of sub-problems. First, we need appropriate definitions.

Definition 11. The probability that exactly $i$ of $j$ transactions contain itemset $X$ is

$$P_{i,j}(X) = \sum_{S \subseteq T_j : |S| = i} \Bigl( \prod_{t \in S} P(X \subseteq t) \cdot \prod_{t \in T_j - S} \bigl(1 - P(X \subseteq t)\bigr) \Bigr)$$

where $T_j = \{t_1, \ldots, t_j\} \subseteq T$ is the set of the first $j$ transactions. Similarly, the probability that at least $i$ of $j$ transactions contain itemset $X$ is

$$P_{\geq i,j}(X) = \sum_{S \subseteq T_j : |S| \geq i} \Bigl( \prod_{t \in S} P(X \subseteq t) \cdot \prod_{t \in T_j - S} \bigl(1 - P(X \subseteq t)\bigr) \Bigr)$$

Note that $P_{\geq i,|T|}(X) = P_{\geq i}(X)$, the probability that at least $i$ transactions in the entire database contain $X$. The key idea in our approach is to split the problem of computing $P_{\geq i,|T|}(X)$ into smaller problems $P_{\geq i,j}(X)$, $j < |T|$. This can be achieved as follows. Given a set of $j$ transactions $T_j = \{t_1, \ldots, t_j\} \subseteq T$: if we assume that transaction $t_j$ contains itemset $X$, then $P_{\geq i,j}(X)$ is equal to the probability that at least $i-1$ transactions of $T_j \setminus \{t_j\}$ contain $X$. Otherwise, $P_{\geq i,j}(X)$ is equal to the probability that at least $i$ transactions of $T_j \setminus \{t_j\}$ contain $X$. By splitting the problem in this way we can use the recursion in Lemma 12, which tells us what these probabilities are, to compute $P_{\geq i,j}(X)$ by dynamic programming.

Lemma 12.

$$P_{\geq i,j}(X) = P_{\geq i-1,j-1}(X) \cdot P(X \subseteq t_j) + P_{\geq i,j-1}(X) \cdot \bigl(1 - P(X \subseteq t_j)\bigr)$$

where

$$P_{\geq 0,j} = 1 \;\; \forall\, 0 \leq j \leq |T|, \qquad P_{\geq i,j} = 0 \;\; \forall\, i > j.$$

[...] itemset $X$ by calculating the cells depicted in Figure 5. In the matrix, each cell relates to a probability $P_{\geq i,j}$, with $j$ marked on the x-axis and $i$ marked on the y-axis. Note that according to Lemma 12, in order to compute a $P_{\geq i,j}$, we require the probabilities $P_{\geq i-1,j-1}$ and $P_{\geq i,j-1}$, that is, the cell to the lower left and the cell to the left of $P_{\geq i,j}$, respectively. Knowing that $P_{\geq 0,0} = 1$ and $P_{\geq 1,0} = 0$ by definition, we can start by computing $P_{\geq 1,1}$. The probability $P_{\geq 1,j}$ can then be computed for all $j$ by using the previously computed $P_{\geq 1,j-1}$. $P_{\geq 1,j}$ can, in turn, be used to compute $P_{\geq 2,j}$. This iteration continues until $i$ reaches $minSup$, so that finally we obtain $P_{\geq minSup,|T|}$, the frequentness probability (Definition 9).

Note that in each line (i.e. for each $i$) of the matrix in Figure 5, $j$ only runs up to $|T| - minSup + i$. Larger values of $j$ are not required for the computation of $P_{\geq minSup,|T|}$.

Lemma 13. The computation of the frequentness probability $P_{\geq minSup}$ requires at most $O(|T| \cdot minSup)$ time, which is $O(|T|)$ when $minSup$ is treated as a constant, and at most $O(|T|)$ space.

Proof. Using the dynamic computation scheme as shown in Figure 5, the number of computations is bounded by the size of the depicted matrix. The matrix contains $|T| \cdot minSup$ cells. Each cell requires one iteration of the dynamic computation (cf. Lemma 12), which is performed in $O(1)$ time. Note that a matrix is used here for illustration purposes only. The computation of each probability $P_{\geq i,j}(X)$ only requires information stored in the current line and the previous line, namely the probabilities $P_{\geq i-1,j-1}(X)$ and $P_{\geq i,j-1}(X)$. Therefore, only these two lines (of length $|T|$) need to be preserved, requiring $O(|T|)$ space. Additionally, the probabilities $P(X \subseteq t_j)$ have to be stored, resulting in a total of $O(|T|)$ space.

Note that we can save computation time if an itemset is certain in some transactions. If a transaction $t_j \in T$ contains itemset $X$ with a probability of zero, i.e. $P(X \subseteq t_j) = 0$, transaction $t_j$ can be ignored for the dynamic computation because $P_{\geq i,j}(X) = P_{\geq i,j-1}(X)$ holds (Lemma 12). If the number of remaining transactions, $|T'|$, is less than $minSup$, then $X$ can be pruned since, by definition, $P_{\geq minSup,T'} = 0$ if $minSup > |T'|$. The dynamic compu-

[Figure: dynamic computation scheme. Recoverable labels: axis "support i" with ticks 0 and minSup; target cell $P_{minSup,|T|}(X)$; an annotation relating $P_{minSup-d,T-d}(X)$ and $P_{minSup,T}(X)$ (operator illegible); "pruning criterion" (remainder truncated).]
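As an illustration of the recursion in Lemma 12 and the two-line storage argument in the proof of Lemma 13, the following is a minimal Python sketch, not the authors' implementation. The function name `frequentness_probability` and its parameters `probs` (the list of probabilities P(X ⊆ t_j)) and `min_sup` are hypothetical names introduced here. The sketch also assumes the two shortcuts described in the text: transactions with probability zero are skipped, and row i is only filled up to j = |T| − minSup + i.

```python
from typing import Sequence


def frequentness_probability(probs: Sequence[float], min_sup: int) -> float:
    """Return P(at least min_sup transactions contain X).

    probs[j-1] plays the role of P(X subseteq t_j); all names are
    illustrative assumptions, not taken from the paper.
    """
    if min_sup <= 0:
        return 1.0  # P_{>=0,j} = 1 for all j

    # Transactions with probability zero leave P_{>=i,j} unchanged, so skip them.
    kept = [p for p in probs if p > 0.0]
    n = len(kept)
    if n < min_sup:
        return 0.0  # prune: P_{>=i,j} = 0 whenever i > j

    # prev holds row i-1 of the matrix; row 0 is P_{>=0,j} = 1 for all j.
    prev = [1.0] * (n + 1)
    for i in range(1, min_sup + 1):
        # Entries with j < i stay 0 because P_{>=i,j} = 0 for i > j.
        cur = [0.0] * (n + 1)
        # In row i, j only needs to run up to n - min_sup + i.
        for j in range(i, n - min_sup + i + 1):
            p = kept[j - 1]
            # Lemma 12: lower-left cell times p, plus left cell times (1 - p).
            cur[j] = prev[j - 1] * p + cur[j - 1] * (1.0 - p)
        prev = cur  # only two rows are ever alive: O(|T|) space
    return prev[n]
```

For example, with two transactions that each contain X with probability 0.5, `frequentness_probability([0.5, 0.5], 1)` evaluates to 0.75 and `frequentness_probability([0.5, 0.5], 2)` to 0.25, matching 1 − 0.5² and 0.5² respectively.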