
4.1 Evaluation Metrics

The evaluation of QA systems is determined according to the criteria for judging an answer. The following list captures some possible criteria for answer evaluation [1]:

(1) Relevance: the answer should be a response to the question.

(2) Correctness: the answer should be factually correct.

(3) Conciseness: the answer should not contain extraneous or irrelevant information.

(4) Completeness: the answer should be complete (not a part of the answer).

(5) Justification: the answer should be supplied with sufficient context to allow a user to determine why this was chosen as an answer to the question.

- Mean Reciprocal Rank (MRR): The Mean Reciprocal Rank (MRR), which was first used for TREC8, is used to calculate the answer rank (relevance):

$$\mathrm{MRR} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{r_i}$$

where n is the number of test questions and r_i is the rank of the first correct answer for the i-th test question.

- Confidence Weighted Score (CWS): The confidence about the correctness of an answer is evaluated using another metric called Confidence Weighted Score (CWS), which was defined for TREC11:

$$\mathrm{CWS} = \frac{1}{n} \sum_{i=1}^{n} p_i$$

where n is the number of test questions and p_i is the precision of the answers at positions from 1 to i in the ordered list of answers.

Based on the aforementioned criteria (1) to (5), there are three different judgments for an answer extracted from a document:

- “Correct”: if the answer is responsive to a question in a correct way (criteria 1 & 2).
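To make the MRR and CWS definitions given earlier in this section concrete, the following is a minimal Python sketch under stated assumptions. The function names (`mean_reciprocal_rank`, `confidence_weighted_score`) and the input encodings are illustrative and do not come from the source. In particular, treating a question with no correct answer as contributing 0 to MRR is an assumption, and the CWS input is assumed to be already sorted from most to least confident.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """MRR = (1/n) * sum over questions of 1/r_i.

    first_correct_ranks[i] is the 1-based rank r_i of the first correct
    answer for question i. None means no correct answer was returned;
    such a question contributes 0 (an assumption, not stated in the text).
    """
    n = len(first_correct_ranks)
    if n == 0:
        raise ValueError("at least one test question is required")
    return sum(1.0 / r for r in first_correct_ranks if r is not None) / n


def confidence_weighted_score(is_correct: Sequence[bool]) -> float:
    """CWS = (1/n) * sum over i of p_i.

    is_correct[i-1] says whether the answer at position i is correct, with
    answers assumed to be ordered from most to least confident. p_i is the
    precision over positions 1..i (correct answers so far divided by i).
    """
    n = len(is_correct)
    if n == 0:
        raise ValueError("at least one test question is required")
    correct_so_far = 0
    total = 0.0
    for i, ok in enumerate(is_correct, start=1):
        correct_so_far += int(ok)
        total += correct_so_far / i  # p_i
    return total / n


if __name__ == "__main__":
    # Four questions: first correct answer at ranks 1, 2, none found, 4.
    print(mean_reciprocal_rank([1, 2, None, 4]))
    # Four answers ordered by confidence: correct, wrong, correct, correct.
    print(confidence_weighted_score([True, False, True, True]))
```

In the first call the reciprocal ranks are 1, 1/2, 0 and 1/4, so the mean is 0.4375. In the second call the running precisions are 1, 1/2, 2/3 and 3/4, which shows how CWS rewards systems that place their correct answers at the confident end of the ordering.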