1 2 32 3 4NUMBER OF RESULTSITEMSET SIZE[5] CHUN KIT CHUI AND BEN...

2006.

1 2 32 3 4

Number of results

Itemset size

[5] Chun Kit Chui and Ben Kao. A decremental approach for

mining frequent itemsets from uncertain data. In The 12th

Pacific-Asia Conference on Knowledge Discovery and Data

(b) Effectiveness of rank-

(a) Output: AP vs. IP

Mining (PAKDD), pages 64–75, 2008.

ing queries

[6] Chun Kit Chui, Ben Kao, and Edward Hung. Mining

frequent itemsets from uncertain data. In 11th Pacific-Asia

Conference on Advances in Knowledge Discovery and Data

Figure 10: Effectiveness of AP vs IP.

Mining, PAKDD 2007, Nanjing, China, pages 47–58, 2007.

[7] N. Dalvi and D. Suciu. ”Efficient query evaluation on

probabilistic databases”. The VLDB Journal,

creasing. In contrast, AP returns probabilistic frequent

16(4):523–544, 2007.

itemsets in the classic way; in descending order of their size,

[8] Karolien Geurts, Geert Wets, Tom Brijs, and Koen

i.e. all itemsets of size one are returned first, etc. While

Vanhoof. Profiling high frequency accident locations using

both approaches return probabilistic frequent itemsets, AP

association rules. In Proceedings of the 82nd Annual

Transportation Research Board, Washington DC. (USA),

returns an arbitrary frequentness probability order, while

January 12-16, page 18pp, 2003.

IP returns the most relevant itemsets first.

[9] Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent

Next we performed ranking queries on the first 100, 000

patterns without candidate generation. SIGMOD Rec.,

itemsets (Figure 10(b)). In this experiment, our aim was to

29(2):1–12, 2000.

find the m-itemset X with the highest frequency probabil-

[10] H.-P. Kriegel, P. Kunath, M. Pfeifle, and M. Renz.

ity of all m-itemsets, where m ∈ {2, 3, 4}. We measured the

”Probabilistic Similarity Join on Uncertain Data”. In Proc.

11th Int. Conf. on Database Systems for Advanced

number of itemsets returned before X. It can be seen that

Applications, Singapore, pp. 295-309, 2006.

the speed up factor for ranking (and thus top-k queries) is

[11] Carson Kai-Sang Leung, Christopher L. Carmichael, and

several orders of magnitude and increases exponentially in

Boyu Hao. Efficient mining of frequent patterns from

the length of requested itemset length. The reason is that

uncertain data. In ICDMW ’07: Proceedings of the Seventh

AP must return all frequent itemsets of length m − 1 be-

IEEE International Conference on Data Mining

fore processing itemsets of length m, while IP is able to

Workshops, pages 489–494, 2007.

quickly rank itemsets in order of their frequentness proba-

[12] C. Re, N. Dalvi, and D. Suciu. ”Efficient top-k query

bility, therefore leading to better quality results delivered to

evaluation on probalistic databases”. In Proc. 23rd Int.

Conf. on Data Engineering, Istanbul, Turkey, 2007.

the user much earlier.

[13] P. Sen and A. Deshpande. ”Representing and querying

correlated tuples in probabilistic databases”. In Proc. 23rd