SECTION 5 GIVES THE RESULTS OF A NUMBER OF EXPERI-KNOWLEDGE ABOUT THE...
$$PMI(Qsp, Asp) = \frac{hits(Qsp\ NEAR\ Asp) \cdot MaxPages}{hits(Qsp) \cdot hits(Asp)}$$

4.3 An example
Consider an example taken from the question-answer corpus of the main task of TREC-2001: “Which river in US is known as Big Muddy?”. The question keywords are: “river”, “US”, “known”, “Big”, “Muddy”. The search for the pattern [river NEAR US NEAR (known OR know OR ...) NEAR Big NEAR Muddy] returns 0 pages, so the algorithm relaxes the pattern by cutting the initial noun “river”, according to the heuristic for discarding a noun if it is the first keyword of the question. The second pattern [US NEAR (known OR know OR ...) NEAR Big NEAR Muddy] also returns 0 pages, so we apply the heuristic for ignoring verbs like “know”, “call” and abstract nouns like “name”. The third pattern [US NEAR Big NEAR Muddy] returns 28 pages, which is over the experimentally set threshold of seven pages.

Maximal Likelihood Ratio (MLHR) is also used for word co-occurrence mining (Dunning, 1993). We decided to check MLHR for answer validation because it is supposed to outperform PMI in the case of sparse data, a situation that may arise for questions with complex patterns that return a small number of hits.

$$MLHR(Qsp, Asp) = -2 \log \lambda$$

$$\lambda = \frac{L(p, k_1, n_1)\, L(p, k_2, n_2)}{L(p_1, k_1, n_1)\, L(p_2, k_2, n_2)}$$

where:

$$L(p, k, n) = p^{k} (1 - p)^{n - k}$$

$$p_1 = \frac{k_1}{n_1}, \quad p_2 = \frac{k_2}{n_2}, \quad p = \frac{k_1 + k_2}{n_1 + n_2}$$

$$k_1 = hits(Qsp, Asp), \quad k_2 = hits(Qsp, \neg Asp), \quad n_1 = hits(Asp), \quad n_2 = hits(\neg Asp)$$
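As a rough illustration, the MLHR score could be computed from four page counts along the following lines. This is only a sketch under stated assumptions, not the implementation used in the experiments: the function names `log_likelihood` and `mlhr` are hypothetical, hits(Qsp, Asp) is assumed to be the count returned for [Qsp NEAR Asp], and the computation is carried out in log space (a choice made here to avoid numerical underflow for large page counts). All counts in the usage line except the 28 pages for Qsp are invented for illustration.

```python
import math


def log_likelihood(p, k, n):
    """Return log L(p, k, n) = k*log(p) + (n - k)*log(1 - p).

    The convention 0 * log(0) = 0 is used so that p = 0 or p = 1 is
    handled when the corresponding count is zero.
    """
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)

    return xlogy(k, p) + xlogy(n - k, 1.0 - p)


def mlhr(hits_q_near_a, hits_q, hits_a, max_pages):
    """Sketch of MLHR(Qsp, Asp) = -2 log(lambda) from raw hit counts.

    Assumes consistent counts: hits_q_near_a <= hits_q,
    hits_q_near_a <= hits_a and 0 < hits_a < max_pages.
    """
    k1 = hits_q_near_a            # assumed: hits(Qsp, Asp) = hits(Qsp NEAR Asp)
    k2 = hits_q - hits_q_near_a   # hits(Qsp, not Asp)
    n1 = hits_a                   # hits(Asp)
    n2 = max_pages - hits_a       # hits(not Asp)
    if n1 <= 0 or n2 <= 0:
        raise ValueError("hits(Asp) must be strictly between 0 and MaxPages")

    p1 = k1 / n1
    p2 = k2 / n2
    p = (k1 + k2) / (n1 + n2)

    log_lambda = (
        log_likelihood(p, k1, n1) + log_likelihood(p, k2, n2)
        - log_likelihood(p1, k1, n1) - log_likelihood(p2, k2, n2)
    )
    return -2.0 * log_lambda


# Hypothetical counts: only hits(Qsp) = 28 comes from the example in the text.
print(mlhr(hits_q_near_a=20, hits_q=28, hits_a=1000, max_pages=1_000_000))
```

When the question pattern is equally frequent with and without the answer (p1 = p2), the ratio λ equals 1 and the score is zero; larger scores indicate a stronger departure from independence.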
Here hits(Qsp, ¬Asp) is the number of appearances of Qsp when Asp is not present, and it is calculated as hits(Qsp) − hits(Qsp NEAR Asp). Similarly, hits(¬Asp) is the number of Web pages where Asp does not appear, and it is calculated as MaxPages − hits(Asp).

One of the 50-byte candidate answers from the TREC-2001 answer collection is “recover Mississippi River”. Taking into account the answer type LOCATION, the algorithm considers only the named entity: “Mississippi River”. To calculate the answer validity score (in this example PMI) for [Mississippi River], the procedure constructs the validation pattern [US NEAR Big NEAR Muddy NEAR Mississippi River] with the answer sub-pattern [Mississippi River]. These two patterns are passed to the search engine, and the returned numbers of pages are substituted in the mutual information expression at the places of hits(Qsp NEAR Asp) and hits(Asp), respectively; the previously obtained number (i.e.

Corrected Conditional Probability (CCP). In contrast with PMI and MLHR, CCP is not symmetric (in general, CCP(Qsp, Asp) ≠ CCP(Asp, Qsp)). This is based on the fact that we search for the occurrence of the answer pattern