RELATED WORKRECALL THAT PREVIOUS WORK WAS BASED ON THE EXPECTED SUP...

2. RELATED WORK

Recall that previous work was based on the expected sup-

There is a large body of research on Frequent Itemset Min-

port [5, 6, 11].

ing (FIM) but very little work addresses FIM in uncertain

databases [5, 6, 11]. The approach proposed by Chui et.

Definition 5. Given an uncertain transaction database

T , the expected support E(X) of an itemset X is defined

al [6] computes the expected support of itemsets by sum-

ming all itemset probabilities in their U-Apriori algorithm.

as E(X )= P

t∈T P (X ⊆ t).

Later, in [5], they additionally proposed a probabilistic fil-

Considering an itemset frequent if its expected support is

ter in order to prune candidates early. In [11], the UF-

above minSup has a major drawback. Uncertain transaction

growth algorithm is proposed. Like U-Apriori, UF-growth

databases naturally involve uncertainty concerning the sup-

computes frequent itemsets by means of the expected sup-

port of an itemset. Considering this is important when eval-

port, but it uses the FP-tree [9] approach in order to avoid

uating whether an itemset is frequent or not. However, this

expensive candidate generation. In contrast to our proba-

information is forfeited when using the expected support ap-

bilistic approach, itemsets are considered frequent if the ex-

proach. Let us return to the example shown in Figure 2. The

pected support exceeds minSup. The main drawback of this

expected support of the itemset {D} is E({D}) = 3.0. The

estimator is that information about the uncertainty of the

fact that {D} occurs for certain in one transaction, namely

expected support is lost; [5, 6, 11] ignore the number of pos-

in t 2 , and that there is at least one possible world where

sible worlds in which an itemsets is frequent. [18] proposes

X occurs in five transactions are totally ignored when using

exact and sampling-based algorithms to find likely frequent

the expected support in order to evaluate the frequency of

items in streaming probabilistic data. However, they do not

an itemset. Indeed, suppose minSup = 3; do we call {D}

consider itemsets with more than one item. Finally, except

frequent? And if so, how certain can we even be that {D} is

for [15], existing FIM algorithms assume binary valued items

frequent? By comparison, consider itemset {G}. This also

which precludes simple adaptation to uncertain databases.

has an expected support of 3, but its presence or absence

To the best of our knowledge, our approach is the first that

in transactions is more certain. It turns out that the prob-

is able to find frequent itemsets in an uncertain transaction

ability that {D} is frequent is 0.7 and the probability that

database in a probabilistic way.

G is frequent is 0.91. While both have the same expected

3 Assuming minSup is a constant.

support, we can be quite confident that {G} is frequent, in

Notation Description

P

i

({D})

P

minSup

({D})

W, w Set of all possible worlds, Possible world

0,45