
2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan

Abstract

We have been investigating an interactive approach to open-domain QA (ODQA) and have constructed a spoken interactive ODQA system, SPIQA. The system derives disambiguating queries (DQs) that draw out additional information. To test the effectiveness of the additional information requested by the DQs, the system reconstructs the user's initial question by combining the additional information with that question. The combination is then used for answer extraction. Experimental results revealed the potential of the generated DQs.

1 Introduction

Open-domain QA (ODQA), which extracts answers from large text corpora, such as newspaper texts, has been intensively investigated in the Text REtrieval Conference (TREC). ODQA systems return an actual answer in response to a question written in natural language. However, the information in the first question input by a user is usually not sufficient to yield the desired answer. Interactions for collecting additional information to accomplish QA are therefore needed. To construct more precise and user-friendly ODQA systems, a speech interface is used for the interaction between human beings and machines.

Our goal is to construct a spoken interactive ODQA system that includes an automatic speech recognition (ASR) system and an ODQA system. To clarify the problems presented in building such a system, the QA systems constructed so far can be classified into a number of groups, depending on their target domains, interfaces, and interactions to draw out additional information from users to accomplish set tasks, as shown in Table 1. In this table, "text" and "speech" denote text input and speech input, respectively. The term "addition" represents additional information queried by the QA system; this additional information is separate from that derived from the user's initial question.

Table 1: Domain and data structure for QA systems

  target domain              specific       open
  data structure             knowledge DB   unstructured text
  text, without addition     CHAT-80        SAIQA
  text, with addition        MYCIN          (SPIQA)
  speech, without addition   Harpy          VAQA
  speech, with addition      JUPITER        (SPIQA)

(SPIQA is our system.)

To construct spoken interactive ODQA systems, the following problems must be overcome: (1) system queries for additional information to extract answers, and effective interaction strategies using such queries, cannot be prepared before the user inputs the question; (2) recognition errors degrade the performance of QA systems, since some information indispensable for extracting answers is deleted or substituted with other words.

Our spoken interactive ODQA system, SPIQA, copes with the first problem by disambiguating users' questions using system queries. In addition, a speech summarization technique is applied to handle recognition errors.
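The overall interaction strategy described here (answer when possible; otherwise query the user for additional information and reconstruct the question) can be sketched as a simple control loop. This is a minimal sketch, not SPIQA's actual implementation; all function and parameter names below are illustrative placeholders:

```python
def interactive_odqa(question, ask_user, extract_answer, derive_dq, reconstruct,
                     max_turns=3):
    """Sketch of the interaction loop described in the text.

    extract_answer(q) -> an answer string, or None when q is "ambiguous"
    derive_dq(q)      -> a disambiguating query (DQ) to present to the user
    reconstruct(q, a) -> the initial question combined with the user's
                         additional information a

    All of these are hypothetical placeholders, not SPIQA's real API.
    """
    for _ in range(max_turns):
        answer = extract_answer(question)
        if answer is not None:
            return answer                  # unambiguous: answer directly
        dq = derive_dq(question)           # e.g. "What kind of World Cup?"
        additional = ask_user(dq)          # user supplies missing information
        question = reconstruct(question, additional)
    return None                            # give up after max_turns
```

The loop terminates either when the ODQA engine extracts an answer or after a fixed number of clarification turns.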

2 Spoken Interactive QA system: SPIQA

Figure 1 shows the components of our system and the data that flow through it. The system comprises an ASR system (SOLON), a screening filter that uses a summarization method, an ODQA engine (SAIQA) for a Japanese newspaper text corpus, a Deriving Disambiguating Queries (DDQ) module, and a Text-to-Speech Synthesis (TTS) engine (FinalFluet).

[Figure 1: Components and data flow in SPIQA. The user's first question is transcribed by the ASR system, passed through the screening filter, and input to the ODQA engine (SAIQA). If an answer sentence is derived, it is returned to the user via the TTS engine; if not, the DDQ module and DQ sentence generator produce a disambiguating query, and the question reconstructor combines the user's additional information with the question into a new question.]

Figure 1: Components and data flow in SPIQA.

ASR system

Our ASR system is based on the Weighted Finite-State Transducer (WFST) approach, which is becoming a promising alternative to the traditional decoding approach. The WFST approach offers a unified framework for representing various knowledge sources, in addition to producing an optimized search network of HMM states. We combined cross-word triphones and trigrams into a single WFST and applied a one-pass search algorithm to it.

Screening filter

To alleviate the degradation of QA performance caused by recognition errors, fillers, word fragments, and other distractors in the transcribed question, a screening filter is required that removes this redundant and irrelevant information and extracts the meaningful information. The speech summarization approach (C. Hori et al., 2003) is applied to the screening process: a set of words maximizing a summarization score, which indicates the appropriateness of the summarization, is extracted automatically from the transcribed question, and these words are then concatenated. The extraction process is performed using a Dynamic Programming (DP) technique.

ODQA engine

The ODQA engine, SAIQA, has four components: question analysis, text retrieval, answer hypothesis extraction, and answer selection.

DDQ module

When the ODQA engine cannot extract an appropriate answer to a user's question, the question is considered to be "ambiguous." To disambiguate the initial question, the DDQ module automatically derives disambiguating queries (DQs) that request information indispensable for answer extraction. A question is considered ambiguous when it excludes indispensable information or when indispensable information is lost through ASR errors. These instances of missing information should be compensated for by the user.

To disambiguate a question, the ambiguous phrases within it should be identified. The ambiguity of each phrase can be measured using the structural ambiguity and the generality score of the phrase. The structural ambiguity is based on the dependency structure of the sentence; a phrase that is not modified by other phrases is considered to be highly ambiguous. Figure 2 shows an example of a dependency structure, where the question is separated into phrases and each arrow represents the dependency between two phrases.

Which country in Southeast Asia won the World Cup?

Figure 2: Example of dependency structure.

In this example, "the World Cup" has no modifiers and needs more information to be identified. "Southeast Asia" also has no modifiers; however, since "the World Cup" appears more frequently than "Southeast Asia" in the retrieved corpus, "the World Cup" is more difficult to identify. In other words, words that frequently occur in a corpus rarely help to extract answers in ODQA systems. Therefore, it is adequate for the DDQ module to generate questions relating to "the World Cup" in this example, such as "What kind of World Cup?" or "What year was the World Cup held?".

The structural ambiguity of the n-th phrase is defined as

  A_D(P_n) = log [ (1/N) Σ_{i=1, i≠n}^{N} D(P_i, P_n) ],

where the complete question is separated into N phrases, and D(P_i, P_n) is the probability that phrase P_n will be modified by phrase P_i, which can be calculated using a Stochastic Dependency Context-Free Grammar (SDCFG) (C. Hori et al., 2003). Using this SDCFG, only the number of non-terminal symbols is determined, and all combinations of rules are applied recursively. The non-terminal symbol has no specific function, such as

... system. The question transcriptions were processed with a screening filter and input into the ODQA engine. Each question consisted of about 19 morphemes on average. The sentences were grammatically correct, formally structured, and had enough information for the ODQA engine to extract the correct answers. The mean word recognition accuracy obtained by the ASR system was 76%.
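Once the dependency probabilities D(P_i, P_n) are available, the structural-ambiguity score A_D(P_n) can be computed directly. The following is a minimal sketch, assuming the SDCFG probabilities have already been collected into an N x N matrix; the function name and matrix layout are illustrative, not part of SPIQA:

```python
import math

def structural_ambiguity(D, n):
    """Compute A_D(P_n) = log[(1/N) * sum_{i != n} D(P_i, P_n)].

    D is an N x N matrix (list of lists) where D[i][j] is the probability
    that phrase P_j is modified by phrase P_i. In SPIQA these values come
    from an SDCFG; here they are simply given as input.
    """
    N = len(D)
    # Sum the dependency mass flowing into phrase P_n from all other phrases.
    total = sum(D[i][n] for i in range(N) if i != n)
    return math.log(total / N)
```

A phrase that receives little dependency mass from the other phrases yields a small sum and hence a strongly negative score.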