Scientific report: Spoken Interactive ODQA System SPIQA


... a noun phrase. All the probabilities of rules are stochastically estimated based on data. Probabilities for frequently used rules become greater, and those for rarely used rules become smaller. Even though transcription results given by a speech recognizer are ill-formed, the dependency structure can be robustly estimated by our SDCFG.

The generality score is defined as

$$G(P) = \sum_{w \in P:\, w = \mathrm{cont}} \log P(w),$$

where $P(w)$ is the unigram probability of $w$ based on the corpus to be retrieved. Thus, “$w = \mathrm{cont}$” means that $w$ is a content word such as a noun, verb or adjective.

We generate the DQs using templates of interrogative sentences. These templates contain an interrogative and a phrase taken from the user’s question,

3.1 Screening filter

Screening was performed by removing recognition errors using a confidence measure as a threshold and then summarizing it within an 80% to 100% compaction ratio. In this summarization technique, the word significance and linguistic score for summarization were calculated using text from Mainichi newspapers published from 1994 to 2001, comprising 13.6M sentences with 232M words. The SDCFG for the word concatenation score was calculated using the manually parsed corpus of Mainichi newspapers published from 1996 to 1998, consisting of approximately 4M sentences with 68M words. The number of non-terminal symbols was 100. The posterior probability of each transcribed word in a word graph obtained by ASR was used as the confidence score.
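The confidence-based part of the screening filter in Section 3.1 can be illustrated with a minimal sketch. It assumes the recognizer output is available as (word, posterior probability) pairs and that the compaction ratio is the number of retained words divided by the number of original words. The function names, the example hypothesis, and the threshold value of 0.5 are hypothetical and are not taken from the paper. The summarization step itself (word significance, linguistic score, and SDCFG-based word concatenation score) is not reproduced here.

```python
from typing import List, Tuple


def screen_words(hypothesis: List[Tuple[str, float]], threshold: float) -> List[str]:
    """Keep only the words whose confidence score reaches the threshold.

    `hypothesis` is a list of (word, confidence) pairs, where the confidence
    is assumed to be the word posterior probability from the ASR word graph.
    """
    return [word for word, confidence in hypothesis if confidence >= threshold]


def compaction_ratio(original: List[str], retained: List[str]) -> float:
    """Ratio of retained words to original words (assumed definition)."""
    return len(retained) / len(original)


# Hypothetical recognizer output with word posterior probabilities.
hypothesis = [("who", 0.95), ("won", 0.90), ("uh", 0.20), ("the", 0.85), ("match", 0.75)]

# The threshold 0.5 is an illustrative value only.
kept = screen_words(hypothesis, threshold=0.5)
ratio = compaction_ratio([word for word, _ in hypothesis], kept)
```

In this toy example the low-confidence filler word is dropped and the ratio falls inside the 80% to 100% range mentioned in the text.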
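The generality score (the sum of log unigram probabilities over the content words of a phrase) can be sketched in the same way. The sketch assumes that the phrase is given as (word, part-of-speech) pairs and that unigram probabilities estimated on the retrieval corpus are stored in a dictionary. The tag names, the function name, and the probability values are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple

# Content-word categories named in the text: noun, verb, adjective.
CONTENT_TAGS = {"noun", "verb", "adjective"}


def generality_score(phrase: List[Tuple[str, str]],
                     unigram_prob: Dict[str, float]) -> float:
    """Sum log P(w) over the content words w of the phrase.

    Words whose tag is not a content-word category are skipped.
    A content word missing from `unigram_prob` raises KeyError; smoothing
    for unseen words is outside the scope of this sketch.
    """
    return sum(math.log(unigram_prob[word])
               for word, tag in phrase
               if tag in CONTENT_TAGS)


# Hypothetical unigram probabilities and a tagged phrase.
unigram_prob = {"famous": 0.01, "temple": 0.001}
phrase = [("the", "determiner"), ("famous", "adjective"), ("temple", "noun")]

score = generality_score(phrase, unigram_prob)
```

Because each term is the logarithm of a probability, the score is never positive, and phrases made of frequent content words score closer to zero than phrases made of rare ones.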
