SECTION 5 GIVES THE RESULTS OF A NUMBER OF EXPERI-KNOWLEDGE ABOUT THE...

$$\frac{\mathit{hits}(\mathit{Qsp}\ \mathrm{NEAR}\ \mathit{Asp})}{\mathit{hits}(\mathit{Qsp}) \cdot \mathit{hits}(\mathit{Asp})} \cdot \mathit{MaxPages}$$

Maximal Likelihood Ratio (MLHR) is also used for word co-occurrence mining (Dunning, 1993). We decided to check MLHR for answer validation because it is supposed to outperform PMI in the case of sparse data, a situation that may arise for questions with complex patterns that return a small number of hits.

$$\mathit{MLHR}(\mathit{Qsp}, \mathit{Asp}) = -2 \log \lambda$$

$$\lambda = \frac{L(p, k_1, n_1)\, L(p, k_2, n_2)}{L(p_1, k_1, n_1)\, L(p_2, k_2, n_2)}$$

where

$$L(p, k, n) = p^{k} (1 - p)^{n - k}$$

$$p_1 = \frac{k_1}{n_1}, \quad p_2 = \frac{k_2}{n_2}, \quad p = \frac{k_1 + k_2}{n_1 + n_2}$$

$$k_1 = \mathit{hits}(\mathit{Qsp}, \mathit{Asp}), \quad k_2 = \mathit{hits}(\mathit{Qsp}, \neg \mathit{Asp}), \quad n_1 = \mathit{hits}(\mathit{Asp}), \quad n_2 = \mathit{hits}(\neg \mathit{Asp})$$

4.3 An example

Consider an example taken from the question-answer corpus of the main task of TREC-2001: “Which river in US is known as Big Muddy?”. The question keywords are: “river”, “US”, “known”, “Big”, “Muddy”. The search of the pattern [river NEAR US NEAR (known OR know OR ...) NEAR Big NEAR Muddy] returns 0 pages, so the algorithm relaxes the pattern by cutting the initial noun “river”, according to the heuristic for discarding a noun if it is the first keyword of the question. The second pattern [US NEAR (known OR know OR ...) NEAR Big NEAR Muddy] also returns 0 pages, so we apply the heuristic for ignoring verbs like “know”, “call” and abstract nouns like “name”. The third pattern [US NEAR Big NEAR Muddy] returns 28 pages, which is over the experimentally set threshold of seven pages.
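The relaxation steps in the “Big Muddy” example can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the part-of-speech tags, the word lists, the function names and the `page_counts` lookup (which stands in for the search engine) are all hypothetical. Verb variants such as know/known are not expanded into an OR group, and "over the threshold" is read as strictly greater than seven. Only the keywords, the three patterns and the page counts 0, 0 and 28 come from the text.

```python
from typing import Callable, List, Optional, Tuple

Keyword = Tuple[str, str]  # (word, coarse part-of-speech tag); tags are illustrative

# Illustrative word lists: the text names only "know", "call" and "name".
IGNORED_VERBS = {"know", "known", "call"}
ABSTRACT_NOUNS = {"name"}


def to_pattern(keywords: List[Keyword]) -> str:
    """Join keywords with NEAR (verb variants are not expanded in this sketch)."""
    return " NEAR ".join(word for word, _ in keywords)


def relaxation_steps(keywords: List[Keyword]) -> List[List[Keyword]]:
    """Return the successive keyword lists to try, strictest first."""
    steps = [keywords]
    current = keywords
    # Heuristic 1: discard a noun if it is the first keyword of the question.
    if current and current[0][1] == "NOUN":
        current = current[1:]
        steps.append(current)
    # Heuristic 2: ignore verbs like "know", "call" and abstract nouns like "name".
    ignored = IGNORED_VERBS | ABSTRACT_NOUNS
    filtered = [kw for kw in current if kw[0].lower() not in ignored]
    if filtered != current:
        steps.append(filtered)
    return steps


def first_pattern_over_threshold(
    keywords: List[Keyword],
    hits: Callable[[str], int],
    threshold: int = 7,
) -> Optional[Tuple[str, int]]:
    """Return the first pattern whose page count exceeds the threshold, if any."""
    for step in relaxation_steps(keywords):
        pattern = to_pattern(step)
        count = hits(pattern)
        if count > threshold:
            return pattern, count
    return None


# Stand-in for the search engine, using the page counts reported in the example.
page_counts = {
    "river NEAR US NEAR known NEAR Big NEAR Muddy": 0,
    "US NEAR known NEAR Big NEAR Muddy": 0,
    "US NEAR Big NEAR Muddy": 28,
}
keywords = [
    ("river", "NOUN"),
    ("US", "PROPN"),
    ("known", "VERB"),
    ("Big", "PROPN"),
    ("Muddy", "PROPN"),
]
result = first_pattern_over_threshold(keywords, lambda p: page_counts.get(p, 0))
print(result)  # ('US NEAR Big NEAR Muddy', 28)
```

Passing the hit counter in as a function keeps the relaxation logic independent of any particular search engine, so the same sketch can be exercised with fixed counts as above.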

One of the 50-byte candidate answers from the TREC-2001 answer collection is “recover Mississippi River”. Taking into account the answer type LOCATION, the algorithm considers only the named entity: “Mississippi River”. To calculate the answer validity score (in this example PMI) for [Mississippi River], the procedure constructs the validation pattern: [US NEAR Big NEAR Muddy NEAR Mississippi River] with the answer sub-pattern [Mississippi River]. These two patterns are passed to the search engine, and the returned numbers of pages are substituted in the mutual information expression at the places of $\mathit{hits}(\mathit{Qsp}\ \mathrm{NEAR}\ \mathit{Asp})$ and $\mathit{hits}(\mathit{Asp})$ respectively; the previously obtained number (i.e.

In the MLHR formula, $\mathit{hits}(\mathit{Qsp}, \neg \mathit{Asp})$ is the number of appearances of Qsp when Asp is not present, and it is calculated as $\mathit{hits}(\mathit{Qsp}) - \mathit{hits}(\mathit{Qsp}\ \mathrm{NEAR}\ \mathit{Asp})$. Similarly, $\mathit{hits}(\neg \mathit{Asp})$ is the number of Web pages where Asp does not appear, and it is calculated as $\mathit{MaxPages} - \mathit{hits}(\mathit{Asp})$.

Corrected Conditional Probability (CCP). In contrast with PMI and MLHR, CCP is not symmetric (e.g. generally $\mathit{CCP}(\mathit{Qsp}, \mathit{Asp}) \neq \mathit{CCP}(\mathit{Asp}, \mathit{Qsp})$). This is based on the fact that we search for the occurrence of the answer pattern