2006.
1 2 32 3 4Number of results
Itemset size[5] Chun Kit Chui and Ben Kao. A decremental approach for
mining frequent itemsets from uncertain data. In The 12th
Pacific-Asia Conference on Knowledge Discovery and Data
(b) Effectiveness of rank-
(a) Output: AP vs. IP
Mining (PAKDD), pages 64–75, 2008.
ing queries
[6] Chun Kit Chui, Ben Kao, and Edward Hung. Mining
frequent itemsets from uncertain data. In 11th Pacific-Asia
Conference on Advances in Knowledge Discovery and Data
Figure 10: Effectiveness of AP vs IP.
Mining, PAKDD 2007, Nanjing, China, pages 47–58, 2007.
[7] N. Dalvi and D. Suciu. ”Efficient query evaluation on
probabilistic databases”. The VLDB Journal,
creasing. In contrast, AP returns probabilistic frequent
16(4):523–544, 2007.
itemsets in the classic way; in descending order of their size,
[8] Karolien Geurts, Geert Wets, Tom Brijs, and Koen
i.e. all itemsets of size one are returned first, etc. While
Vanhoof. Profiling high frequency accident locations using
both approaches return probabilistic frequent itemsets, AP
association rules. In Proceedings of the 82nd Annual
Transportation Research Board, Washington DC. (USA),
returns an arbitrary frequentness probability order, while
January 12-16, page 18pp, 2003.
IP returns the most relevant itemsets first.
[9] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent
Next we performed ranking queries on the first 100, 000
patterns without candidate generation. SIGMOD Rec.,
itemsets (Figure 10(b)). In this experiment, our aim was to
29(2):1–12, 2000.
find the m-itemset X with the highest frequency probabil-
[10] H.-P. Kriegel, P. Kunath, M. Pfeifle, and M. Renz.
ity of all m-itemsets, where m ∈ {2, 3, 4}. We measured the
”Probabilistic Similarity Join on Uncertain Data”. In Proc.
11th Int. Conf. on Database Systems for Advanced
number of itemsets returned before X. It can be seen that
Applications, Singapore, pp. 295-309, 2006.
the speed up factor for ranking (and thus top-k queries) is
[11] Carson Kai-Sang Leung, Christopher L. Carmichael, and
several orders of magnitude and increases exponentially in
Boyu Hao. Efficient mining of frequent patterns from
the length of requested itemset length. The reason is that
uncertain data. In ICDMW ’07: Proceedings of the Seventh
AP must return all frequent itemsets of length m − 1 be-
IEEE International Conference on Data Mining
fore processing itemsets of length m, while IP is able to
Workshops, pages 489–494, 2007.
quickly rank itemsets in order of their frequentness proba-
[12] C. Re, N. Dalvi, and D. Suciu. ”Efficient top-k query
bility, therefore leading to better quality results delivered to
evaluation on probalistic databases”. In Proc. 23rd Int.
Conf. on Data Engineering, Istanbul, Turkey, 2007.
the user much earlier.
[13] P. Sen and A. Deshpande. ”Representing and querying
correlated tuples in probabilistic databases”. In Proc. 23rd
Bạn đang xem 2006. - BAI 3 URBAN PLANNING