7.1.2 Effect of the Density

We now evaluate the effectiveness of our pruning strategy w.r.t. the density. minSup is important here too, so we report results for different values in Figure 8. The pruning works well for datasets with low density and has no effect on the runtime for higher densities. The reason is straightforward: the higher the density, the higher the probability that a given itemset is frequent and, thus, cannot be pruned. Regarding the effect of minSup, a larger minSup value decreases the probability that itemsets are frequent and therefore increases the number of computations that can be pruned. The break-even point between pruning and non-pruning in our experiments is when the density is approximately twice the minSup value, since, due to the method of creating our datasets, this corresponds to the expected support. At this value, all itemsets are expected to be frequent.
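
To see why the break-even point sits at roughly twice minSup, here is a short back-of-the-envelope derivation. It assumes the existence probabilities in the synthetic datasets are drawn uniformly from (0, 1] (as they are for ACC in the next subsection), with mean 1/2, which matches the "method of creating our datasets" referenced above. For an item x occurring in d transactions (its density),

    E[\mathrm{sup}(x)] = \sum_{t : x \in t} p_t(x) = d \cdot E[p] = d/2,

so the expected support equals minSup exactly when d = 2 · minSup, the point at which itemsets are expected to be frequent and pruning can no longer discard candidates.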

Overall, with reasonable parameter settings our pruning strategies achieve a significant speed-up for the identification of probabilistic frequent itemsets.

7.2 Probabilistic Frequent Itemset Mining Algorithms

Experiments for the probabilistic frequent itemset mining algorithms were run on a subset of the real-world dataset accidents⁴, denoted by ACC. It consists of 340,184 transactions and 572 items whose occurrences in transactions were randomized: with a probability of 0.5, each item appearing for certain in a transaction was assigned a value drawn from a uniform distribution in (0, 1]. Here we use AP to denote the Apriori-based algorithm and IP the incremental probabilistic itemset mining algorithm.

⁴ The accidents dataset [8] was derived from the Frequent Itemset Mining Dataset Repository (https://traloihay.net).
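
To make the randomization step concrete, the following Python sketch reproduces the procedure just described. It is our own illustration, not code from the paper; the function name make_uncertain and the per-transaction dictionary representation are assumptions made for the example.

    import random

    def make_uncertain(transactions, p_randomize=0.5, seed=0):
        """Inject existence probabilities into a certain transaction database.

        With probability p_randomize, each item's occurrence is replaced by
        an existence probability drawn uniformly from (0, 1]; otherwise the
        item stays certain (probability 1.0).
        """
        rng = random.Random(seed)
        uncertain = []
        for t in transactions:
            ut = {}
            for item in t:
                if rng.random() < p_randomize:
                    # rng.random() is uniform in [0, 1), so 1 - rng.random() is in (0, 1]
                    ut[item] = 1.0 - rng.random()
                else:
                    ut[item] = 1.0
            uncertain.append(ut)
        return uncertain

    # Example: three certain transactions over four items
    print(make_uncertain([[1, 2, 3], [2, 3], [1, 4]]))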

We performed Top-k queries on the first 10,000 transactions of ACC using minSup = 500 and τ = 0.1. Figure 10(a) shows the result of IP. Note that the frequentness probability of the resulting itemsets is monotonically decreasing.
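
For context, the frequentness probability of an itemset X in this setting is Pr(sup(X) ≥ minSup), and a Top-k query reports the k itemsets with the highest such probability, which is why the reported values decrease monotonically. The sketch below is a minimal illustration of how this probability can be computed via a Poisson-binomial dynamic program; it is our own example, assuming independent item occurrences, and is not the paper's AP or IP implementation.

    def frequentness_probability(probs, min_sup):
        """Pr[support >= min_sup] for one itemset.

        probs[i] is the probability that transaction i contains the itemset
        (for an itemset, the product of its items' existence probabilities,
        assuming independence). The support then follows a Poisson-binomial
        distribution, built here with an O(n * min_sup) dynamic program that
        caps counts at min_sup, since all higher counts satisfy the query.
        """
        # dist[k] = Pr[support == k] so far; index min_sup pools ">= min_sup"
        dist = [1.0] + [0.0] * min_sup
        for p in probs:
            nxt = [0.0] * (min_sup + 1)
            for k, mass in enumerate(dist):
                if mass:
                    nxt[min(k + 1, min_sup)] += mass * p   # transaction contains X
                    nxt[k] += mass * (1.0 - p)             # transaction does not
            dist = nxt
        return dist[min_sup]

    # Toy check: 4 transactions, each containing X with probability 0.5;
    # Pr[sup >= 2] = 11/16 = 0.6875
    print(frequentness_probability([0.5] * 4, 2))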