for items (4) or (12)).

(12) A: Do you happen to be working for a large firm?
     B: It’s about three hundred and fifty people.

Looking at the negative hits for item (12), one sees that few give an indication about the number of people in the firm, but rather qualifications about colleagues or employees (great people, people’s productivity), or the hits are less relevant: “Most of the people I talked to were actually pretty optimistic. They were rosy on the job market and many had jobs, although most were not large firm jobs”. The lack of data comes from the fact that the queries are very specific, since the adjective depends on the product (e.g., “expensive exercise bike”, “deep pond”). However when we do get a predictive model, the probabilities correlate almost perfectly with the Turkers’ responses.

[Figure 5 plot. x-axis: Entropy of response distribution (0.0 to 1.5); y-axis: 0 to 1.]

Figure 5: Correlation between agreement among Turkers and whether the system gets the correct answer. For each dialogue, we plot a circle at Turker response entropy and either 1 = correct inference or 0 = incorrect inference, except the points are jittered a little vertically to show where the mass of data lies. As the entropy rises (i.e., as agreement levels fall), the system’s inferences become less accurate. The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001).

levels drop.
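
The analysis summarized in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the response counts are invented toy data, the four response categories and the natural-log base for entropy are assumptions, and the plain gradient-ascent fit stands in for whatever statistics package was actually used. The sketch only recovers the sign of the entropy coefficient; it does not compute a p-value.

```python
import math

def response_entropy(counts):
    """Shannon entropy (natural log, an assumption) of a response-count vector."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Hypothetical toy data: (response counts per category, 1 = correct inference).
toy = [
    ([30, 0, 0, 0], 1),
    ([28, 2, 0, 0], 1),
    ([25, 5, 0, 0], 1),
    ([20, 8, 2, 0], 1),
    ([14, 12, 4, 0], 0),
    ([12, 10, 5, 3], 1),
    ([10, 10, 6, 4], 0),
    ([8, 8, 7, 7], 0),
]

entropies = [response_entropy(counts) for counts, _ in toy]
correct = [label for _, label in toy]
intercept, slope = fit_logistic(entropies, correct)
```

On toy data of this shape, where high-entropy (low-agreement) items are more often wrong, the fitted `slope` comes out negative, which is the direction of the relationship the caption describes.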
This happens for 8 items: “expensive to call (50 cents a minute)”, “little kids (7 and 10 year-old)”, “long growing season (3 months)”, “lot of land (80 acres)”, “warm weather (80 degrees)”, “young kids (5 and 2 year-old)”, “young person (31 year-old)” and “large house (2450 square feet)”. In the latter case only, the system output (uncertain) doesn’t correlate with the Turkers’ judgment (where the dominant answer is ‘probable yes’ with 15 responses, and 11 answers are ‘uncertain’). The logistic curves in figure 4 capture nicely the

7 Conclusion

We set out to find techniques for grounding basic meanings from text and enriching those meanings based on information from the immediate linguistic context. We focus on gradable modifiers, seeking to learn scalar relationships between their meanings and to obtain an empirically grounded, probabilistic understanding of the clear and fuzzy cases that they often give rise to (Kamp and Partee,