2. RELATED WORK
Recall that previous work was based on the expected sup-
There is a large body of research on Frequent Itemset Min-
port [5, 6, 11].
ing (FIM) but very little work addresses FIM in uncertain
databases [5, 6, 11]. The approach proposed by Chui et.
Definition 5. Given an uncertain transaction database
T , the expected support E(X) of an itemset X is defined
al [6] computes the expected support of itemsets by sum-
ming all itemset probabilities in their U-Apriori algorithm.
as E(X )= P
t∈T P (X ⊆ t).
Later, in [5], they additionally proposed a probabilistic fil-
Considering an itemset frequent if its expected support is
ter in order to prune candidates early. In [11], the UF-
above minSup has a major drawback. Uncertain transaction
growth algorithm is proposed. Like U-Apriori, UF-growth
databases naturally involve uncertainty concerning the sup-
computes frequent itemsets by means of the expected sup-
port of an itemset. Considering this is important when eval-
port, but it uses the FP-tree [9] approach in order to avoid
uating whether an itemset is frequent or not. However, this
expensive candidate generation. In contrast to our proba-
information is forfeited when using the expected support ap-
bilistic approach, itemsets are considered frequent if the ex-
proach. Let us return to the example shown in Figure 2. The
pected support exceeds minSup. The main drawback of this
expected support of the itemset {D} is E({D}) = 3.0. The
estimator is that information about the uncertainty of the
fact that {D} occurs for certain in one transaction, namely
expected support is lost; [5, 6, 11] ignore the number of pos-
in t 2 , and that there is at least one possible world where
sible worlds in which an itemsets is frequent. [18] proposes
X occurs in five transactions are totally ignored when using
exact and sampling-based algorithms to find likely frequent
the expected support in order to evaluate the frequency of
items in streaming probabilistic data. However, they do not
an itemset. Indeed, suppose minSup = 3; do we call {D}
consider itemsets with more than one item. Finally, except
frequent? And if so, how certain can we even be that {D} is
for [15], existing FIM algorithms assume binary valued items
frequent? By comparison, consider itemset {G}. This also
which precludes simple adaptation to uncertain databases.
has an expected support of 3, but its presence or absence
To the best of our knowledge, our approach is the first that
in transactions is more certain. It turns out that the prob-
is able to find frequent itemsets in an uncertain transaction
ability that {D} is frequent is 0.7 and the probability that
database in a probabilistic way.
G is frequent is 0.91. While both have the same expected
3 Assuming minSup is a constant.
support, we can be quite confident that {G} is frequent, in
Notation Description
P
i({D})
P
minSup({D})
W, w Set of all possible worlds, Possible world
0,45
Bạn đang xem 2. - BAI 3 URBAN PLANNING