4.2 QDC Operation

The system first asked three questions for each subject X:

In what year was X born?
In what year did X die?
What compositions did X have?

The third of these triggers our named-entity type COMPOSITION that is used for all kinds of titled works – books, films, poems, music, plays and so on, and also quotations. Our named-entity recognizer has rules to detect works of art by phrases that are in apposition to “the film … ” or “the book … ” etc., and also captures any short phrase in quotes beginning with a capital letter. The particular question phrasing we used does not commit us to any specific creative verb. This is of particular importance since it very frequently happens in text that titled works are associated with their creators by means of a possessive or parenthetical construction, rather than subject-verb-object.

The top five answers, with confidences, are returned for the born and died questions (subject to also passing a confidence threshold test). The compositions question is treated as a list question, meaning that all answers that pass a certain threshold are returned. For each such returned work W_i, two additional questions are asked:

The apparent redundancy here is because of the potential NIL answers for some of the date slots. We also rejected combinations of works whose years spanned more than 100 years (in case there were no BORN or DIED dates). In performing these constraint calculations, NIL satisfied every test by fiat. The constraint network we used is depicted in Figure 2.

[Figure 2 node labels: Birthdate of X, Deathdate of X, Work W_i, Date of W_i, Author X_i (X_i = Author of W_i)]

Figure 2. Constraint Network for evaluation example. Dashed lines represent question-answer pairs, solid lines constraints between the answers.

We used as a test corpus the AQUAINT corpus used in TREC-QA since 2002. Since this was not the same corpus from which the test questions were generated (the Web), we acknowledged that there might be some difference in the most common spelling of certain names, but we made no attempt to correct for this. Neither did we attempt to normalize, translate or aggregate names of the titled works that were returned, so that, for example, “Well-Tempered Klavier” and “Well-Tempered Clavier”
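As a rough illustration of the constraint tests just described, the following Python sketch treats NIL as satisfying every comparison and rejects sets of works whose known years span more than 100 years. The names `NIL`, `leq`, `span_ok` and `consistent` are hypothetical, and the ordering tests (born <= work date <= died) are an assumed stand-in for the system's actual constraint set.

```python
from typing import Optional, Sequence

NIL = None  # stands in for a missing (NIL) date answer


def leq(a: Optional[int], b: Optional[int]) -> bool:
    """Return True if a <= b; any test involving NIL passes by fiat."""
    if a is NIL or b is NIL:
        return True
    return a <= b


def span_ok(work_years: Sequence[Optional[int]], max_span: int = 100) -> bool:
    """Reject sets of works whose known years span more than max_span years."""
    known = [y for y in work_years if y is not NIL]
    if len(known) < 2:
        return True
    return max(known) - min(known) <= max_span


def consistent(born: Optional[int], died: Optional[int],
               work_years: Sequence[Optional[int]]) -> bool:
    """Check one candidate combination of dates.

    Assumed illustrative constraints: born <= each work year <= died,
    plus the 100-year span test, which matters when both dates are NIL.
    """
    return (leq(born, died)
            and all(leq(born, y) and leq(y, died) for y in work_years)
            and span_ok(work_years))
```

With NIL birth and death dates, only the span test can reject a combination, which is why that test is useful when no BORN or DIED answer is available.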
were treated as different. Since only individuals were used in the question set, we did not have instances of problems we saw in training, such as where an ensemble (such as The Beatles) created a certain piece, which in turn via the reciprocal question was found to have been written by a single person (Paul McCartney). The reverse situation was still possible, but we did not handle it. We foresee a future version of our system having knowledge of ensembles and their composition, thus removing this restriction. In general, a variety of ontological relationships could occur between the original individual and the discovered performer(s) of the work.

We generated answer keys by reading the passages that the system had retrieved and from which the answers were generated, to determine “truth”. In cases of absent information in these passages, we did our own corpus searches. This of course made the issue of evaluation of recall only relative, since we were not able to guarantee we had found all existing instances.

We encountered some grey areas, e.g., if a painting appeared in an exhibition or if a celebrity endorsed a product, then should the exhibition’s or product’s name be considered an appropriate “work” of the artist? The general perspective adopted was that we were not establishing or validating the nature of the relationship between an individual and a creative work, but rather its existence. We answered “yes” if we subjectively felt the association to be both very strong and with the individual’s participa-

ally associated with the correct artist, so our decision to remove them from consideration resulted in a decrease in both the numerator and denominator of the precision and recall calculations, resulting in a minimal effect.

The results of applying QDC to the 57 test individuals are summarized in Table 3. The baseline assertions for individual X were:
o Top-ranking birthdate/NIL
o Top-ranking deathdate/NIL
o Set of works W_i that passed threshold
o Top-ranking date for W_i/NIL

The sets of baseline assertions (by individual) are in effect the results of QA-by-Dossier WITHOUT Constraints (QbD).

            Assertions                Micro-Average          Macro-Average
            Total  Correct  Truth     Prec   Rec    F        Prec   Rec    F
Baseline    1671   517      933       .309   .554   .396     .331   .520   .386
QDC         1417   813      933       .573   .871   .691     .603   .865   .690

Table 3. Results of Performance Evaluation. Two calculations of P/R/F are made, depending on whether the averaging is done over the whole set, or first by individual; the results are very similar.

The QDC assertions were the same as those for QbD, but reflecting the following effects:
o Some {W_i
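The two averaging schemes named in the Table 3 caption can be sketched as follows. This is a minimal Python illustration, not the evaluation code itself: the function names and the per-individual `(total, correct, truth)` tuple layout are assumptions, and the macro F is taken as the mean of per-individual F values, which is one common convention.

```python
from typing import List, Tuple

# (assertions made, assertions correct, assertions in the answer key)
Counts = Tuple[int, int, int]


def prf(total: int, correct: int, truth: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure from raw counts (0.0 when undefined)."""
    p = correct / total if total else 0.0
    r = correct / truth if truth else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


def micro_average(per_individual: List[Counts]) -> Tuple[float, float, float]:
    """Pool the counts over the whole set, then compute P/R/F once."""
    total = sum(c[0] for c in per_individual)
    correct = sum(c[1] for c in per_individual)
    truth = sum(c[2] for c in per_individual)
    return prf(total, correct, truth)


def macro_average(per_individual: List[Counts]) -> Tuple[float, float, float]:
    """Compute P/R/F for each individual first, then average each score."""
    scores = [prf(*c) for c in per_individual]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n,
            sum(s[2] for s in scores) / n)
```

For example, pooling the Baseline totals from Table 3 as a single entry, `micro_average([(1671, 517, 933)])`, gives a precision of about .309 and a recall of about .554, in line with the reported micro-averaged values.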