a noun phrase. All the probabilities of rules are stochastically estimated based on data. Probabilities for frequently used rules become greater, and those for rarely used rules become smaller. Even though transcription results given by a speech recognizer are ill-formed, the dependency structure can be robustly estimated by our SDCFG.

The generality score is defined as

$$A_G(P_n) = \sum_{w \in P_n :\, w = \mathrm{cont}} \log P(w),$$

where $P(w)$ is the unigram probability of $w$ based on the corpus to be retrieved. Thus, “$w = \mathrm{cont}$” means that $w$ is a content word such as a noun, verb or adjective.

We generate the DQs using templates of interrogative sentences. These templates contain an interrogative and a phrase taken from the user’s question,

3.1 Screening filter

Screening was performed by removing recognition errors using a confidence measure as a threshold and then summarizing it within an 80% to 100% compaction ratio. In this summarization technique, the word significance and linguistic score for summarization were calculated using text from Mainichi newspapers published from 1994 to 2001, comprising 13.6M sentences with 232M words. The SDCFG for the word concatenation score was calculated using the manually parsed corpus of Mainichi newspapers published from 1996 to 1998, consisting of approximately 4M sentences with 68M words. The number of non-terminal symbols was 100. The posterior probability of each transcribed word in a word graph obtained by ASR was used as the confidence score.
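The confidence-based part of the screening step can be illustrated with a minimal Python sketch. It assumes each transcribed word already carries its posterior probability from the word graph. The names `ScoredWord`, `screen_by_confidence`, and `retained_ratio`, the example words and posteriors, and the threshold of 0.5 are all hypothetical, since the text does not state the threshold actually used. The subsequent summarization step is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredWord:
    surface: str      # transcribed word
    posterior: float  # word posterior probability from the word graph, in [0, 1]


def screen_by_confidence(words, threshold):
    """Keep only the words whose posterior probability is at least `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [w for w in words if w.posterior >= threshold]


def retained_ratio(original, kept):
    """Fraction of the original words that survived screening (0.0 if empty)."""
    return len(kept) / len(original) if original else 0.0


# Illustrative recognition hypothesis; words and posteriors are made up.
hypothesis = [
    ScoredWord("who", 0.95),
    ScoredWord("is", 0.90),
    ScoredWord("the", 0.30),
    ScoredWord("president", 0.85),
]

# Assumed threshold of 0.5; the value is not given in the text.
kept = screen_by_confidence(hypothesis, threshold=0.5)
print([w.surface for w in kept])
print(retained_ratio(hypothesis, kept))
```

Raising the threshold removes more likely misrecognitions at the cost of also discarding correctly recognized words, so in practice the value would probably be tuned on held-out transcriptions.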