2 TOP- K PROBABILISTIC FREQUENT ITEMSETS QUERYTION APPROACHES WHEN W...

6.2 Top- k Probabilistic Frequent Itemsets Query

tion approaches when we vary the number of transactions,

In many applications however, relatively few top prob-

|T |. The runtime of the Basic approach increases exponen-

abilistic frequent itemsets are required. For instance, the

tially in minSup as explained in Section 3.2, and is therefore

store in Example 1 may want to know the top k = 100. Top-

not applicable for a |T | > 50 as can be seen in Figure 7(a).

k highest frequentness probability queries can be efficiently

Our approaches Dynamic+P and DynamicOpt+P scale

computed by using Algorithm 1 and constraining the length

linearly as expected when using a constant minSup value.

The 0-1-optimization has an impact on the runtime when-

ever there is some certainty in the database. The perfor-

Algorithm 1 Incremental Algorithm

mance gain of our pruning strategies depends on the used

//initialise

minSup value. In Figures 7(b), 7(c) and 7(d) the scala-

AIQ = new PriorityQueue

bility of Dynamic and Dynamic+P is shown for differ-

FOR EACH x ∈ I

ent minSup values expressed as percentages of |T |. It is

AIQ.add([x,P ≥minSup (x)])

notable that the time complexity of O(|T | ∗ minSup) be-

//return the next probabilistic frequent itemset

comes O(|T | 2 ) if minSup is chosen relative to the database

getNext() RETURNS X

size. Also, it can be observed that the higher minSup, the

X = AIQ.removeFirst()

higher the difference between Dynamic and Dynamic+P;

FOR EACH (x ∈ I \ X : x = lastInLexOrder(X ∪ x))

a higher minSup causes the frequentness probability to fall

AIQ.add([X ∪ x,P ≥minSup (X ∪ x)])

overall, thus allowing earlier pruning.

90025002000

Dynamic

18008001600700140060012001500

Dynamic+P

5001000400Runtime [ms]300

DynamicOpt

200100

DynamicOpt+P

00 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500 8000 8500 9000 9500 100000 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1DensityDatabase size

(a) minSup = 0.1

(b) minSup = 0.25

(a) minSup = 10

(b) minSup = 25%

3000

(c) minSup = 0.5

(c) minSup = 50%

(d) minSup = 0.75

(d) minSup = 75%

Figure 8: Runtime evaluation w.r.t. the density.

Figure 7: Runtime evaluation w.r.t. |T |.