for items (4) or (12)).

(12) A: Do you happen to be working for a large firm?
     B: It’s about three hundred and fifty people.

Looking at the negative hits for item (12), one sees that few give an indication about the number of people in the firm, but rather qualifications about colleagues or employees (great people, people’s productivity), or the hits are less relevant: “Most of the people I talked to were actually pretty optimistic. They were rosy on the job market and many had jobs, although most were not large firm jobs”. The lack of data comes from the fact that the queries are very specific, since the adjective depends on the product (e.g., “expensive exercise bike”, “deep pond”). However when we do get a predictive model, the probabilities correlate almost perfectly with the Turkers’ responses. This happens for 8 items: “expensive to call (50 cents a minute)”, “little kids (7 and 10 year-old)”, “long growing season (3 months)”, “lot of land (80 acres)”, “warm weather (80 degrees)”, “young kids (5 and 2 year-old)”, “young person (31 year-old)” and “large house (2450 square feet)”. In the latter case only, the system output (uncertain) doesn’t correlate with the Turkers’ judgment (where the dominant answer is ‘probable yes’ with 15 responses, and 11 answers are ‘uncertain’). The logistic curves in figure 4 capture nicely the [...] levels drop.

Figure 5: Correlation between agreement among Turkers and whether the system gets the correct answer. For each dialogue, we plot a circle at Turker response entropy and either 1 = correct inference or 0 = incorrect inference, except the points are jittered a little vertically to show where the mass of data lies. As the entropy rises (i.e., as agreement levels fall), the system’s inferences become less accurate. The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001). [Plot omitted; x-axis: entropy of response distribution.]

7 Conclusion

We set out to find techniques for grounding basic meanings from text and enriching those meanings based on information from the immediate linguistic context. We focus on gradable modifiers, seeking to learn scalar relationships between their meanings and to obtain an empirically grounded, probabilistic understanding of the clear and fuzzy cases that they often give rise to (Kamp and Partee,
