for items (4) or (12)).
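Figure 5 relates two quantities per dialogue: the entropy of the Turkers' response distribution and whether the system's inference was correct. The sketch below shows one way such an analysis could be set up. It is an illustration under stated assumptions, not the authors' code: the function names `response_entropy` and `fit_logistic`, the use of natural logarithms, the plain gradient-ascent fit, and the toy data are all hypothetical, and no significance test is computed.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (in nats, an assumption) of the empirical
    distribution over response labels for one dialogue."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent
    on the mean log-likelihood. Returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient w.r.t. the intercept
            g1 += (y - p) * x    # gradient w.r.t. the entropy coefficient
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical toy data: one entropy value and one correctness flag
# (1 = correct inference, 0 = incorrect inference) per dialogue.
entropies = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
correct = [1, 1, 1, 0, 1, 0, 0, 0]

intercept, slope = fit_logistic(entropies, correct)
print(f"intercept = {intercept:.3f}, entropy coefficient = {slope:.3f}")
```

On data of this shape, a negative entropy coefficient corresponds to the pattern the caption describes: accuracy falls as agreement among annotators falls.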

[Plot; x-axis: Entropy of response distribution]

Figure 5: Correlation between agreement among Turkers and whether the system gets the correct answer. For each dialogue, we plot a circle at Turker response entropy and either 1 = correct inference or 0 = incorrect inference, except the points are jittered a little vertically to show where the mass of data lies. As the entropy rises (i.e., as agreement levels fall), the system’s inferences become less accurate. The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001).

(12) A: Do you happen to be working for a large firm?
     B: It’s about three hundred and fifty people.

Looking at the negative hits for item (12), one sees that few give an indication about the number of people in the firm, but rather qualifications about colleagues or employees (great people, people’s productivity), or the hits are less relevant: “Most of the people I talked to were actually pretty optimistic. They were rosy on the job market and many had jobs, although most were not large firm jobs”. The lack of data comes from the fact that the queries are very specific, since the adjective depends on the product (e.g., “expensive exercise bike”, “deep pond”). However, when we do get a predictive model, the probabilities correlate almost perfectly with the Turkers’ responses. This happens for 8 items: “expensive to call (50 cents a minute)”, “little kids (7 and 10 year-old)”, “long growing season (3 months)”, “lot of land (80 acres)”, “warm weather (80 degrees)”, “young kids (5 and 2 year-old)”, “young person (31 year-old)” and “large house (2450 square feet)”. In the latter case only, the system output (uncertain) doesn’t correlate with the Turkers’ judgment (where the dominant answer is ‘probable yes’ with 15 responses, and 11 answers are ‘uncertain’).

levels drop.

7 Conclusion

We set out to find techniques for grounding basic meanings from text and enriching those meanings based on information from the immediate linguistic context. We focus on gradable modifiers, seeking to learn scalar relationships between their meanings and to obtain an empirically grounded, probabilistic understanding of the clear and fuzzy cases that they often give rise to (Kamp and Partee,

The logistic curves in figure 4 capture nicely the