
Figure 10: Effectiveness of AP vs IP. (a) Output: AP vs. IP. (b) Effectiveness of ranking queries. Axis labels: itemset size, number of results.

…creasing. In contrast, AP returns probabilistic frequent itemsets in the classic way: in ascending order of their size, i.e. all itemsets of size one are returned first, etc. While both approaches return probabilistic frequent itemsets, AP returns them in an arbitrary order of frequentness probability, while IP returns the most relevant itemsets first.

Next we performed ranking queries on the first 100,000 itemsets (Figure 10(b)). In this experiment, our aim was to find the m-itemset X with the highest frequentness probability of all m-itemsets, where m ∈ {2, 3, 4}. We measured the number of itemsets returned before X. It can be seen that the speed-up factor for ranking (and thus top-k queries) is several orders of magnitude and increases exponentially with the length of the requested itemset. The reason is that AP must return all frequent itemsets of length m − 1 before processing itemsets of length m, while IP is able to quickly rank itemsets in order of their frequentness probability, therefore delivering better-quality results to the user much earlier.
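The measurement used in the ranking experiment (the number of itemsets returned before the best m-itemset X) can be illustrated with a small sketch. This is not the AP or IP algorithm itself: it only compares the two output orders on a fixed result set. The probabilities in `PROBS` and the helper names `size_order`, `probability_order`, and `returned_before_best` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

# Hypothetical frequentness probabilities, invented for illustration.
# They respect anti-monotonicity: a superset is never more probable
# than any of its subsets.
PROBS: Dict[Itemset, float] = {
    frozenset("a"): 0.99, frozenset("b"): 0.98, frozenset("c"): 0.97,
    frozenset("d"): 0.50, frozenset("e"): 0.45, frozenset("f"): 0.40,
    frozenset("ab"): 0.95, frozenset("ac"): 0.93, frozenset("bc"): 0.92,
    frozenset("ad"): 0.30, frozenset("de"): 0.20,
    frozenset("abc"): 0.90,
}


def size_order(probs: Dict[Itemset, float]) -> List[Itemset]:
    """AP-style output: all 1-itemsets first, then 2-itemsets, and so on.

    The order inside one size level is arbitrary; lexicographic here.
    """
    return sorted(probs, key=lambda s: (len(s), sorted(s)))


def probability_order(probs: Dict[Itemset, float]) -> List[Itemset]:
    """IP-style output: highest frequentness probability first."""
    return sorted(probs, key=lambda s: -probs[s])


def returned_before_best(order: List[Itemset],
                         probs: Dict[Itemset, float], m: int) -> int:
    """Count the itemsets emitted before the most probable m-itemset X."""
    best = max((s for s in probs if len(s) == m), key=lambda s: probs[s])
    return order.index(best)


if __name__ == "__main__":
    for m in (2, 3):
        by_size = returned_before_best(size_order(PROBS), PROBS, m)
        by_prob = returned_before_best(probability_order(PROBS), PROBS, m)
        print(f"m={m}: size order {by_size}, probability order {by_prob}")
```

On this toy data the size-ordered output emits every smaller itemset, including the improbable ones, before reaching X, whereas the probability-ordered output only emits itemsets that are at least as probable as X. The gap widens as the number of low-probability itemsets at the lower levels grows.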
