7.1.2 Effect of the Density

We now evaluate the effectiveness of our pruning strategy w.r.t. the density. Since minSup also matters here, we report results for different values in Figure 8. The pruning works well for datasets with low density and has no effect on the runtime for higher densities. The reason is straightforward: the higher the density, the higher the probability that a given itemset is frequent and, thus, cannot be pruned. Regarding the effect of minSup: a larger minSup value decreases the probability that itemsets are frequent and therefore increases the number of computations that can be pruned. The break-even point between pruning and non-pruning in our experiments occurs when the density is approximately twice the minSup value since, due to the method used to create our datasets, this corresponds to the expected support. At this value, all itemsets are expected to be frequent.

Overall, with reasonable parameter settings, our pruning strategies achieve a significant speed-up in the identification of probabilistic frequent itemsets.

Itemset Mining Algorithms

Experiments for the probabilistic frequent itemset mining algorithms were run on a subset of the real-world dataset accidents^4, denoted by ACC. It consists of 340,184 transactions and 572 items whose occurrences in transactions were randomized: with a probability of 0.5, each item appearing for certain in a transaction was assigned an existence probability drawn from a uniform distribution in (0, 1]. Here we use AP to denote the Apriori-based algorithm and IP the incremental probabilistic itemset mining algorithm.

We performed Top-k queries on the first 10,000 transactions of ACC using minSup = 500 and τ = 0.1. Figure 10(a) shows the result of IP. Note that the frequentness probability of the resulting itemsets is monotonically decreasing.

^4 The accidents dataset [8] was derived from the Frequent Itemset Mining Dataset Repository.
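The interplay between density, expected support, and pruning described above can be made concrete with a minimal Python sketch. The function names and the Markov-style bound are our own illustrative assumptions, not the paper's exact pruning criterion: `frequentness_probability` computes P(support ≥ minSup) by dynamic programming over the per-transaction occurrence probabilities of an itemset, and `can_prune` uses the expected support as a cheap upper bound to skip that computation when the bound already falls below τ.

```python
def frequentness_probability(probs, min_sup):
    """Probability that an itemset's support reaches min_sup, where
    probs[i] is the probability that the itemset occurs in transaction i.
    Dynamic programming over the support distribution, with all states
    >= min_sup collapsed into a single absorbing state dp[min_sup]."""
    dp = [1.0] + [0.0] * min_sup  # dp[k] = P(support so far == k)
    for p in probs:
        dp[min_sup] += dp[min_sup - 1] * p  # absorbing: already frequent
        for k in range(min_sup - 1, 0, -1):
            dp[k] = dp[k] * (1.0 - p) + dp[k - 1] * p
        dp[0] *= (1.0 - p)
    return dp[min_sup]

def can_prune(probs, min_sup, tau):
    """Illustrative pruning test via Markov's inequality:
    P(support >= min_sup) <= E[support] / min_sup.
    If even this bound is below tau, the exact DP can be skipped."""
    expected_support = sum(probs)  # expected support grows with density
    return expected_support / min_sup < tau
```

Because the expected support grows linearly with the density, the bound fires for sparse data and stops firing once the expected support approaches minSup, matching the break-even behaviour reported above.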
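The randomization used to construct ACC can be sketched as follows. This is a hypothetical reconstruction: the function name is ours, and we assume that items not selected by the 0.5 coin flip keep an existence probability of 1.0, which the text implies for items "appearing for certain".

```python
import random

def randomize(database, p_uncertain=0.5, rng=None):
    """Turn a certain transaction database (lists of items) into an
    uncertain one: with probability p_uncertain, each item receives an
    existence probability drawn uniformly from (0, 1]; otherwise it
    remains certain (probability 1.0)."""
    rng = rng or random.Random()
    uncertain = []
    for transaction in database:
        uncertain.append({
            # 1.0 - rng.random() lies in (0, 1], matching the uniform draw
            item: (1.0 - rng.random() if rng.random() < p_uncertain else 1.0)
            for item in transaction
        })
    return uncertain
```

Each transaction becomes a map from item to existence probability, which is exactly the input shape the frequentness-probability computation expects for a given itemset.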