Recipe recommendation using ingredient networks

Chun-Yuen Teng
School of Information, University of Michigan, Ann Arbor, MI, USA
chunyuen@umich.edu

Yu-Ru Lin
IQSS, Harvard University; CCS, Northeastern University, Boston, MA
yuruliny@gmail.com

Lada A. Adamic
School of Information, University of Michigan, Ann Arbor, MI, USA
ladamic@umich.edu
ABSTRACT
The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. At the level of an individual ingredient, we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.
Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications—Data mining
General Terms
Measurement; Experimentation
Keywords
ingredient networks, recipe recommendation
1. INTRODUCTION
The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WebSci 2012, June 22–24, 2012, Evanston, Illinois, USA.
Copyright 2012 ACM 978-1-4503-1228-8.…$10.00
Recipe sites thrive because individuals are eager to share their recipes, from family recipes that have been passed down for generations to new concoctions created that afternoon. They are motivated in part by the ability to share the result online. Once shared, the recipes are implemented and evaluated by other users, who supply ratings and comments.
The desire to look up recipes online may at first appear odd, given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial value in online recipes beyond their accessibility. While the Joy of Cooking contains a single recipe for Swedish meatballs, Allrecipes.com hosts “Swedish Meatballs I”, “II”, and “III”, submitted by different users, along with 4 other variants, including “The Amazing Swedish Meatball”. Each variant has been reviewed, with review counts ranging from 329 for “Swedish Meatballs I” to 5 for “Swedish Meatballs III”. The reviews provide a crowd-sourced ranking of the different recipes. They also offer many suggestions on how to modify them, e.g. using ground turkey instead of beef, or skipping the “cream of wheat” because it is rarely on hand.
The wealth of information captured by online collaborative recipe-sharing sites reveals not only the fundamentals of cooking, but also user preferences. The co-occurrence of ingredients in tens of thousands of recipes shows which ingredients go well together and when a pairing is unusual. Users’ reviews provide clues about the flexibility of a recipe and the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one lacks a certain ingredient, can a substitute be found among the supplies at hand, without a trip to the grocery store? Cookbooks contain vetted variants, but these may not be the best ones for some individuals’ tastes. In contrast, the ratings assigned to user-submitted recipes make it possible to evaluate what works and what does not.
In this paper, we seek to distill the collective knowledge and preferences about cooking by mining a popular recipe-sharing website. To extract such information, we first parse the unstructured text of the recipes and the accompanying user reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users’ knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users’ preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy .792). Most of the predictive power comes from the ingredient networks (84%).
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset. Section 4 discusses the extraction of the ingredient and complement networks and their characteristics. Section 5 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 6 presents our experiments on recipe recommendation, and Section 7 concludes.
2. RELATED WORK
Recipe recommendation has been the subject of much prior work. Typically, the goal has been to suggest recipes to users based on their past recipe ratings [15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either treating each ingredient equally [4] or identifying key ingredients [19]. Instead of modeling recipes using ingredients, Wang et al. [17] represent recipes as graphs built on ingredients and cooking directions. They demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and doesn’t take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.
Another branch of research has focused on recommending recipes based on desired nutritional intake or on promoting healthy food choices. Geleijnse et al. [7] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [9] built a personalized recipe recommendation system based on ingredient availability and personal nutritional needs. Shidochi et al. [14] proposed an algorithm to extract replaceable ingredients from recipes in order to satisfy users’ various demands, such as calorie constraints and food availability. Their method identifies substitutable ingredients by matching the cooking actions that correspond to ingredient names. However, their method assumes that substitutable ingredients are subject to the same processing methods. This is less direct and specific than extracting substitutions directly from user-contributed suggestions.
Ahn et al. [1] and Kinouchi et al. [10] examined networks of ingredients derived from recipes. The former model ingredients by their flavor bonds, and the latter examine the relationship between ingredients and recipes. In contrast, we derive direct ingredient-ingredient networks of both complements and substitutes. We also go beyond characterizing these networks and demonstrate that they can be used to predict which recipes will be successful.
3. DATASET
Allrecipes.com is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. It hosts 16 customized international sites for users to share their recipes in their native languages, of which we study only the main, English, version. Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe and suggest changes and improvements. In addition to rating and commenting on recipes, users can save recipes as favorites or recommend them to others through a forum.
We downloaded 46,337 recipes from allrecipes.com, together with all of their listed information. This includes several classifications, such as a region (e.g. the Midwest region of the US, or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users’ recipe preferences, we crawled 1,976,920 reviews. These include reviewers’ ratings, the review text, and the number of users who voted the review as useful.
3.1 Data preprocessing
The first step in processing the recipes is identifying the ingredients and cooking methods from the freeform text of the recipe. Usually, although not always, each ingredient is listed on a separate line. To extract the ingredients, we tried two approaches. In the first, we found the maximal match between a pre-curated list of ingredients and the text of the line. However, this missed too many ingredients while misidentifying others. In the second approach, we used regular expression matching to remove non-ingredient terms from the line and identified the remainder as the ingredient. We removed quantifiers such as “1 lb” or “2 cups”, and words referring to consistency or temperature, e.g. chopped or cold. We also applied a few other heuristics, such as removing content in parentheses. For example, “1 (28 ounce) can baked beans (such as Bush’s Original®)” is identified as “baked beans”. We limited the list of potential terms to remove from an ingredient entry, and so erred on the side of not conflating potentially identical or highly similar ingredients. For example, “cheddar cheese”, used in 2450 recipes, was considered different from “sharp cheddar cheese”, occurring in 394 recipes.
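The second, regular-expression approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors’ code. The unit pattern `UNITS` and the descriptor set `DESCRIPTORS` are hypothetical stand-ins for the paper’s unspecified term lists, and `clean_ingredient` is a name introduced here only for illustration.

```python
import re

# Hypothetical term lists: the paper does not publish the exact terms it removed.
UNITS = r"(?:cups?|tablespoons?|teaspoons?|tbsp|tsp|lbs?|pounds?|ounces?|oz|cans?|packages?)"
DESCRIPTORS = {"chopped", "diced", "minced", "sliced", "cold", "warm", "softened", "melted"}


def clean_ingredient(line: str) -> str:
    """Strip quantities, units, descriptors and parenthesized content from an ingredient line."""
    text = line.lower()
    # Drop parenthesized content, e.g. "(28 ounce)" or "(such as ...)".
    text = re.sub(r"\([^)]*\)", " ", text)
    # Drop trademark symbols that may survive outside parentheses.
    text = re.sub(r"[®™]", " ", text)
    # Drop a leading quantity ("1", "1/2", "1 1/2") and an optional unit right after it.
    text = re.sub(rf"^\s*[\d/.\s-]+\s*(?:{UNITS}\b)?", " ", text)
    # Drop consistency/temperature descriptors and stray punctuation; keep the remainder.
    words = [w.strip(",.;:") for w in text.split()]
    return " ".join(w for w in words if w and w not in DESCRIPTORS)


print(clean_ingredient("1 (28 ounce) can baked beans (such as Bush’s Original®)"))  # baked beans
```

The word “sharp” is not in the descriptor set, so “sharp cheddar cheese” stays distinct from “cheddar cheese”. This mirrors the conservative choice described above.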
We then generated an ingredient list sorted by frequency of occurrence and selected the 1000 most common ingredient names as our final ingredient list. Each of the top 1000 ingredients occurred in 23 or more recipes, and plain salt appeared in 47.3% of recipes. These ingredients also accounted for 94.9% of ingredient entries in the recipe dataset. The remaining ingredients were excluded for several reasons: high specificity (e.g. yolk-free egg noodle), brand names (e.g. Planters almonds), rarity (e.g. serviceberry), misspellings, or not being a food (e.g. “nylon netting”).
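Selecting the most frequent ingredients and measuring how many ingredient entries they cover can be sketched as below. The function name `top_ingredients` and the input format are assumptions for illustration. Each recipe is a list of already-cleaned ingredient names, one per ingredient line. Frequency is counted here as the number of recipes that contain an ingredient, which matches the “occurred in 23 or more recipes” criterion. Coverage is counted over all ingredient entries.

```python
from collections import Counter


def top_ingredients(recipes: list[list[str]], k: int) -> tuple[list[str], float]:
    """Return the k ingredients found in the most recipes and the share of entries they cover."""
    # Count each ingredient at most once per recipe (recipe frequency).
    recipe_freq = Counter(name for recipe in recipes for name in set(recipe))
    top = [name for name, _ in recipe_freq.most_common(k)]
    keep = set(top)
    # Coverage is measured over all ingredient entries (lines), not over distinct names.
    entries = [name for recipe in recipes for name in recipe]
    coverage = sum(name in keep for name in entries) / len(entries)
    return top, coverage


recipes = [
    ["salt", "flour", "sugar"],
    ["salt", "butter"],
    ["salt", "flour", "egg"],
    ["salt", "serviceberry"],
]
top, coverage = top_ingredients(recipes, k=2)
print(top, coverage)  # ['salt', 'flour'] 0.6
```

In the paper’s setting, k would be 1000 and the input would be the full recipe dataset.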
The remaining processing task was to identify cooking processes from the directions. We first identified all heating methods using a listing in the Wikipedia entry on cooking [18]. For example, baking, boiling, and steaming are all ways of heating the food. We then identified mechanical ways of processing the food, such as chopping and grinding, and chemical techniques such as marinating and brining.

[Figure 1: The percentage of recipes by region that apply a specific heating method. Horizontal axis: method (bake, boil, fry, grill, roast, simmer, marinate); vertical axis: % in recipes; regions: midwest, mountain, northeast, west coast, south.]
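One simple way to flag heating and other cooking methods in the directions is keyword matching over inflected forms, as sketched below. The patterns in `METHODS` are hypothetical. The paper took its list of heating methods from the Wikipedia entry on cooking and does not describe its matching rules. The negative lookahead prevents phrases such as “baking soda” from counting as baking.

```python
import re

# Hypothetical patterns for a few methods; each covers common inflections.
METHODS = {
    "bake": r"\bbak(?:e|es|ed|ing)\b(?!\s+(?:soda|powder))",
    "boil": r"\bboil(?:s|ed|ing)?\b",
    "fry": r"\b(?:fry|fries|fried|frying)\b",
    "grill": r"\bgrill(?:s|ed|ing)?\b",
    "roast": r"\broast(?:s|ed|ing)?\b",
    "simmer": r"\bsimmer(?:s|ed|ing)?\b",
    "steam": r"\bsteam(?:s|ed|ing)?\b",
    "marinate": r"\bmarinat(?:e|es|ed|ing)\b",
    "brine": r"\bbrin(?:e|es|ed|ing)\b",
}


def methods_in(directions: str) -> set[str]:
    """Return the names of all methods whose pattern occurs in the directions text."""
    text = directions.lower()
    return {name for name, pattern in METHODS.items() if re.search(pattern, text)}


print(sorted(methods_in("Preheat oven. Bake for 20 minutes, then let simmer.")))
# ['bake', 'simmer']
```

Per-region shares like those in Figure 1 can then be computed by counting, for each region, the fraction of its recipes whose method set contains a given method.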
3.2 Regional preferences
Choosing one cooking method over another appears to be a question of regional taste. Of all recipes, 5.8% were classified into one of five US regions: Mountain, Midwest, Northeast, South, and West Coast (including Alaska and Hawaii). Figure 1 shows that preferences among 6 of the most popular cooking methods vary significantly across the different US regions (χ² test, p-value < 0.001). Boiling and simmering, both of which involve heating food in hot liquids, are more common in the South and Midwest. Marinating and grilling are relatively more popular in the West and Mountain regions. However, in the West more grilling recipes involve seafood (18/42 = 42%) than in the other regions combined (7/106 = 6%). Frying is popular in the South and Northeast. Baking is a universally popular and versatile technique, often used for both sweet and savory dishes, and is slightly more popular in the Northeast and Midwest. Examining individual recipes that reflect these frequencies shows that these differences in preference can be tied to differences in demographics, immigrant culture, and the availability of local ingredients, e.g. seafood.
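The significance test mentioned above compares the observed method counts per region against what independence between region and method would predict. A minimal pure-Python version of Pearson’s χ² statistic for an r × c contingency table is sketched here. The table layout is an assumption for illustration: rows are regions, and columns are “uses the method” and “does not use it”. Converting the statistic into a p-value requires the χ² distribution, e.g. `scipy.stats.chi2.sf(stat, dof)`. Alternatively, `scipy.stats.chi2_contingency` does both steps, but by default it applies Yates’ continuity correction to 2 × 2 tables.

```python
def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Seafood grilling recipes: West (18 of 42) vs. all other regions (7 of 106).
stat, dof = chi_square_independence([[18, 42 - 18], [7, 106 - 7]])
print(round(stat, 2), dof)
```

With one degree of freedom, a statistic above about 10.83 corresponds to p < 0.001. The seafood table is only an illustration built from the counts quoted above. The paper does not report a test on it.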
4. INGREDIENT COMPLEMENT NETWORK
Can we learn how to combine ingredients from the data? Here we use the occurrences of ingredients across recipes to distill users’ knowledge about combining ingredients.
We constructed an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients (a, b):

$$\mathrm{PMI}(a, b) = \log \frac{p(a, b)}{p(a)\,p(b)},$$

where

$$p(a, b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}},$$

$$p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \qquad p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}}.$$
The PMI compares the probability that two ingredients occur together with the probability expected if they occurred independently. Complementary ingredients tend to occur together far more often than would be expected by chance, and therefore have a high positive PMI.
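The PMI definition above translates directly into code. This sketch uses the hypothetical function name `pmi_table` and treats each recipe as a set of ingredient names. It uses the natural logarithm, since the base only rescales PMI and does not change its sign. PMI is computed only for pairs that co-occur at least once. A pair that never co-occurs has p(a, b) = 0, so its PMI is minus infinity. The `threshold` for keeping an edge is an assumed parameter, mirroring the thresholding described in the caption of Figure 2.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(recipes, threshold=0.0):
    """Compute PMI for co-occurring ingredient pairs and the edges whose PMI exceeds threshold."""
    n = len(recipes)
    sets = [set(r) for r in recipes]
    single = Counter(name for s in sets for name in s)
    # Pairs are stored as alphabetically sorted tuples (a, b) with a < b.
    pair = Counter(p for s in sets for p in combinations(sorted(s), 2))
    pmi = {}
    for (a, b), count_ab in pair.items():
        p_ab = count_ab / n
        p_a, p_b = single[a] / n, single[b] / n
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    edges = {p: v for p, v in pmi.items() if v > threshold}
    return pmi, edges


recipes = [
    ["flour", "sugar", "butter", "salt"],
    ["flour", "sugar", "egg", "salt"],
    ["garlic", "onion", "pepper", "salt"],
    ["garlic", "pepper", "onion", "salt"],
]
pmi, edges = pmi_table(recipes)
print(round(pmi[("flour", "sugar")], 3), pmi[("flour", "salt")])  # 0.693 0.0
```

Salt appears in every toy recipe, so its PMI with every other ingredient is exactly 0. With a zero threshold it gets no edges at all. This is an extreme case of the effect described for ubiquitous ingredients such as salt and egg, whose edges are numerous but weak.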
Figure 2 shows a visualization of ingredient complementarity. Two distinct communities of ingredients are immediately apparent: one corresponding to savory dishes, the other to sweet ones. Some central ingredients, e.g. egg and salt, are actually pushed to the periphery of the network. They are so ubiquitous that, although they have many edges, those edges are all weak. These ingredients show no particular complementarity with any single group of ingredients.
We further probed the structure of the complementarity network by applying a network clustering algorithm [13]. The algorithm confirmed the existence of two main clusters containing the vast majority of the ingredients. An interesting satellite cluster is that of mixed drink ingredients. It is evident as a constellation of small nodes located near the top of the sweet cluster in Figure 2. The cluster includes the following ingredients: lime, rum, ice, orange, pineapple juice, vodka, cranberry juice, lemonade, tequila, etc.
For each recipe, we recorded the minimum, average, and maximum pairwise pointwise mutual information between its ingredients. The intuition is that complementary ingredients would yield higher ratings, while ingredients that don’t go together would lower the average rating. We found that the average and minimum pointwise mutual information between ingredients are uncorrelated with ratings. The maximum, however, is very slightly positively correlated with the average rating for the recipe (ρ = 0.09, p-value < 10⁻¹⁰). This suggests that having at least two complementary ingredients very slightly boosts a recipe’s prospects, but having clashing or unrelated ingredients does not seem to do harm.
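The per-recipe features and the correlation can be sketched as follows. The sketch assumes a PMI lookup keyed by alphabetically sorted ingredient pairs, as a table built from the definition above would be. Pairs that never co-occur have no finite PMI, and skipping them is an assumption, since the paper does not say how such pairs were handled. `pmi_features` and `pearson` are illustrative names. On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
from itertools import combinations
from statistics import mean


def pmi_features(ingredients, pmi):
    """Minimum, mean and maximum PMI over the recipe's ingredient pairs found in `pmi`."""
    values = [pmi[p] for p in combinations(sorted(set(ingredients)), 2) if p in pmi]
    if not values:
        return None  # no scored pairs, e.g. a single-ingredient recipe
    return min(values), mean(values), max(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


pmi = {("a", "b"): 1.0, ("a", "c"): -0.5, ("b", "c"): 2.0}
print(pmi_features(["c", "a", "b"], pmi))
```

In the analysis described above, the third feature (maximum PMI) would be collected for every recipe and passed to `pearson` together with the recipes’ average ratings.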
5. RECIPE MODIFICATIONS
Co-occurrence of ingredients aggregated over individual recipes reveals the structure of cooking. However, it tells us little about how flexible the ingredient proportions are, or whether some ingredients could easily be left out or substituted. An experienced cook may know that apple sauce is a low-fat alternative to oil, or that nutmeg is often optional. A novice cook, however, may implement recipes literally, afraid that deviating from the instructions may produce poor results. A traditional hardcopy cookbook would provide few such hints, but they are plentiful in the reviews submitted by users who implemented the recipes. One example is “This is a great recipe, but using fresh tomatoes only adds a few minutes to the prep time and makes it taste so much better”. Another comment about the same salsa recipe reads: “This is by far the best recipe we have ever come across. We did however change it just a little bit by adding extra onion.”
As the examples illustrate, users report modifications even when they like the recipe. In fact, we found that 60.1% of recipe reviews contain words signaling modification, such as “add”, “omit”, “instead”, “extra” and 14 others. Furthermore, reviews that include changes have a statistically significantly higher average rating (4.49 vs. 4.39, t-test p-value < 10⁻¹⁰) and a lower rating variance (0.82 vs. 1.05, Bartlett test p-value < 10⁻¹⁰). This is evident in the distribution of ratings shown in Fig. 3. It suggests that flexibility in recipes is not necessarily a bad thing. It also suggests that reviewers who don’t mention modifications are more likely to think of the recipe as perfect, or to dislike it entirely.
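Flagging modification reviews and comparing their ratings can be sketched as below. Only four of the 18 signal words are named above, so `MODIFICATION_PATTERN` covers just those four plus a few inflections. The full word list is not given, and this sketch does not handle negation (“I didn’t add anything”). If SciPy is installed, the significance tests themselves are available as `scipy.stats.ttest_ind` and `scipy.stats.bartlett`.

```python
import re
from statistics import mean, variance

# Only the four words named in the text, with common inflections; the other 14 are unknown.
MODIFICATION_PATTERN = re.compile(
    r"\b(?:add(?:s|ed|ing)?|omit(?:s|ted|ting)?|instead|extra)\b", re.IGNORECASE
)


def split_by_modification(reviews):
    """reviews: (text, rating) pairs. Returns share flagged, and (mean, variance) per group."""
    flagged = [rating for text, rating in reviews if MODIFICATION_PATTERN.search(text)]
    unflagged = [rating for text, rating in reviews if not MODIFICATION_PATTERN.search(text)]
    share = len(flagged) / len(reviews)
    # Sample variance needs at least two ratings in each group.
    return share, (mean(flagged), variance(flagged)), (mean(unflagged), variance(unflagged))


reviews = [
    ("We did however change it just a little bit by adding extra onion.", 5),
    ("I omitted the nutmeg and it was fine.", 4),
    ("Perfect as written!", 5),
    ("Did not like it at all.", 1),
]
print(split_by_modification(reviews))
```

The toy data shows the pattern described above: the unflagged group mixes a perfect score with a very low one, so its variance is larger.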
[Figure 2: Ingredient complement network, with two main clusters labeled “sweet” and “savory”. Two ingredients share an edge if they occur together more than would be expected by chance and if their pointwise mutual information exceeds a threshold.]
[Figure 3 plot: proportion of reviews with a given rating (y-axis, 0.0 to 0.6) against star rating (x-axis, 1 to 5), shown separately for reviews with and without a suggested modification.]
Figure 3: The likelihood that a review suggests a modification to the recipe depends on the star rating the review is assigning to the recipe.
In the following, we describe the recipe modifications extracted from user reviews, including adjustment, deletion and addition. We then present how we constructed an ingredient substitute network based on the extracted information.
5.1 Adjustments
Some modifications involve increasing or decreasing the amount of an ingredient in the recipe. In this and the following analyses, we split each review on punctuation such as commas and periods. We used simple heuristics to detect when a review suggested a modification: adding/using more/less of an ingredient counted as an increase/decrease. Doubling or increasing counted as an increase, while reducing, cutting, or decreasing counted as a decrease. While other expressions likely also signal the adjustment of ingredient quantities, this set of terms allowed us to compare the relative rate of modification, as well as the frequency of increases vs. decreases, across ingredients. The ingredients themselves were extracted by performing a maximal character match within a window following an adjustment term.
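The adjustment heuristic can be sketched as follows. This is a minimal illustration, not the authors' code: the cue-word sets are a simplified reading of the terms listed above, and the four-word window (`WINDOW`) and the function names are hypothetical choices.

```python
import re

# Simplified cue words based on the terms described in the text (assumed, not exhaustive).
INCREASE = {"more", "double", "doubled", "doubling", "increase", "increased"}
DECREASE = {"less", "reduce", "reduced", "cut", "decrease", "decreased"}
WINDOW = 4  # hypothetical: number of words examined after a cue word


def longest_match(window, ingredients):
    """Return the longest ingredient name found as whole words in `window`, or None."""
    best = None
    for name in ingredients:
        if re.search(r"\b" + re.escape(name) + r"\b", window):
            if best is None or len(name) > len(best):
                best = name
    return best


def detect_adjustments(review, ingredients):
    """Return (ingredient, "increase" | "decrease") pairs suggested by a review."""
    results = []
    # Split the review into clauses on punctuation such as commas and periods.
    for clause in re.split(r"[.,;!?]", review.lower()):
        words = re.findall(r"[a-z']+", clause)
        for i, word in enumerate(words):
            if word in INCREASE:
                direction = "increase"
            elif word in DECREASE:
                direction = "decrease"
            else:
                continue
            window = " ".join(words[i + 1:i + 1 + WINDOW])
            match = longest_match(window, ingredients)
            if match:
                results.append((match, direction))
    return results


print(detect_adjustments("I used less sugar, and doubled the garlic.",
                         ["sugar", "brown sugar", "garlic"]))
# [('sugar', 'decrease'), ('garlic', 'increase')]
```

Preferring the longest match is what keeps “more brown sugar” from being counted as an adjustment to plain sugar.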
Figure 4 shows the ratios of the number of reviews suggesting modifications, either increases or decreases, to the number of recipes that contain the ingredient. Two patterns are immediately apparent. Ingredients that may be perceived as unhealthy, such as fats and sugars, are, with the exception of vegetable oil and margarine, more likely to be modified, and to be decreased. On the other hand, flavor enhancers such as soy sauce, lemon juice, cinnamon, and Worcestershire sauce, and toppings such as cheeses, bacon and mushrooms, are also likely to be modified; however, they tend to be added in greater, rather than lesser, quantities. Combined, the patterns suggest that good-tasting but “unhealthy” ingredients can be reduced, if desired, while spices, extracts, and toppings can be increased to taste.
5.2 Deletions and additions
Recipes are also frequently modified such that ingredients are omitted entirely. We looked for words indicating that the reviewer did not have an ingredient (and hence did not use it), e.g. “had no” and “didn’t have”. We further used “omit / left out / left off / bother with” as an indication that the reviewer had omitted the ingredients, potentially for other reasons. Because reviewers often used simplified terms, e.g. “vanilla” instead of “vanilla extract”, we compared words in proximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and those of each ingredient in the recipe’s ingredient list.
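The 4-character-gram matching can be illustrated with a short sketch. It assumes raw n-gram counts as the vector entries and a hypothetical acceptance threshold of 0.3; the weighting and threshold actually used are not reported.

```python
import math
from collections import Counter


def char_ngrams(text, n=4):
    """Counts of overlapping character n-grams in lowercased text (spaces included)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_to_recipe(phrase, recipe_ingredients, threshold=0.3):
    """Map a reviewer's phrase to the most similar recipe ingredient, or None if none is close."""
    grams = char_ngrams(phrase)
    best_score, best = 0.0, None
    for ingredient in recipe_ingredients:
        score = cosine(grams, char_ngrams(ingredient))
        if score > best_score:
            best_score, best = score, ingredient
    return best if best_score >= threshold else None


print(match_to_recipe("vanilla", ["white sugar", "vanilla extract", "egg"]))  # vanilla extract
```

Here “vanilla” shares all 4 of its 4-grams with the 12 of “vanilla extract”, giving a similarity of about 0.58, while it shares none with the other two ingredients.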
To identify additions, we simply looked for the word “add”, but excluded possible substitutions. For example, we would use “added cucumber”, but not “added cucumber instead of green pepper”, the latter of which we analyze in the following section. We then compared the addition to the list of ingredients in the recipe, and considered the addition valid only if the ingredient was not already in the recipe.
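A similarly minimal sketch of the addition filter follows. Treating any word beginning with “add” as a cue and “instead of” as the marker of a substitution are simplifying assumptions, and `known_ingredients` is a hypothetical stand-in for the site’s ingredient list.

```python
import re


def detect_additions(review, recipe_ingredients, known_ingredients):
    """Return ingredients that a review reports adding, skipping substitutions
    and ingredients that the recipe already contains."""
    added = []
    for clause in re.split(r"[.,;!?]", review.lower()):
        # Require an "add..." cue; clauses containing "instead of" are substitutions.
        if not re.search(r"\badd", clause) or "instead of" in clause:
            continue
        for name in known_ingredients:
            mentioned = re.search(r"\b" + re.escape(name) + r"\b", clause)
            if mentioned and name not in recipe_ingredients and name not in added:
                added.append(name)
    return added


print(detect_additions("Added cucumber. Also added cucumber instead of green pepper.",
                       {"tomato", "green pepper"},
                       ["cucumber", "green pepper", "tomato"]))  # ['cucumber']
```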
[Figure 4 scatter plot: (# reviews adjusting down)/(# recipes) on the x-axis against (# reviews adjusting up)/(# recipes) on the y-axis, both on log scales from 0.01 to 1.00, with one labeled point per ingredient.]
Figure 4: Suggested modifications of quantity for the 50 most common ingredients, derived from recipe reviews. The line denotes equal numbers of suggested quantity increases and decreases.
Table 1 shows the correlations between ingredient modifications. As might be expected, the more frequently an ingredient occurs in recipes, the more opportunities its quantity has to be modified, as is evident in the strong correlation between the number of recipes the ingredient occurs in and both the increases and decreases recommended in reviews. However, the more common an ingredient, the more stable it appears to be. Recipe frequency is negatively correlated with deletions/recipe (ρ = −0.22), additions/recipe (ρ = −0.25), and increases/recipe (ρ = −0.26). For example, salt is so essential, appearing in over 21,000 recipes, that we detected only 18 reviews where it was explicitly dropped. In contrast, Worcestershire sauce, appearing in 1,542 recipes, is dropped explicitly in 148 reviews.
As might also be expected, additions are positively correlated with increases, and deletions with decreases. However, additions and deletions are very weakly negatively correlated, indicating that an ingredient that is added frequently is not necessarily omitted more frequently as well.
Table 1: Correlations between ingredient modifications

              # recipes   addition   deletion   increase
  addition      0.41
  deletion      0.22       -0.15
  increase      0.61        0.79       0.09
  decrease      0.68        0.11       0.58       0.39
5.3 Ingredient substitute network
Replacement relationships show whether one ingredient is preferable to another. The preference could be based on taste, availability, or price. Some ingredient substitution tables can be found online¹, but they are neither extensive nor do they contain information about the relative frequency of each substitution. Thus, we found an alternative source for extracting replacement relationships: users’ comments, e.g. “I replaced the butter in the frosting by sour cream, just to soothe my conscience about all the fatty calories”.

¹ e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx

Figure 5: Ingredient substitute network. Nodes are sized according to the number of times they have been recommended as a substitute for another ingredient, and colored according to their indegree.
To extract such knowledge, we first parsed the reviews as follows: we considered several phrases to signal replacement relationships, such as “replace a with b”, “substitute b for a”, and “b instead of a”, and matched a and b to our list of ingredients.
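The phrase matching for substitutions might look like the following sketch. The three regular expressions cover only the quoted patterns (plus their “replaced”/“substituted” forms), and resolving each captured span to the longest ingredient name it contains is an assumption about how a and b were matched to the ingredient list.

```python
import re

# Each hypothetical pattern captures the original ingredient a and its replacement b.
PATTERNS = [
    re.compile(r"replaced? (?:the )?(?P<a>[a-z ]+?) with (?P<b>[a-z ]+)"),
    re.compile(r"substituted? (?:the )?(?P<b>[a-z ]+?) for (?P<a>[a-z ]+)"),
    re.compile(r"(?P<b>[a-z ]+?) instead of (?P<a>[a-z ]+)"),
]


def match_ingredient(text, ingredients):
    """Longest ingredient name occurring as whole words in text, or None."""
    hits = [n for n in ingredients if re.search(r"\b" + re.escape(n) + r"\b", text)]
    return max(hits, key=len, default=None)


def extract_substitutions(review, ingredients):
    """Return (a, b) pairs meaning "ingredient a was replaced by ingredient b"."""
    pairs = []
    for clause in re.split(r"[.,;!?]", review.lower()):
        for pattern in PATTERNS:
            m = pattern.search(clause)
            if m:
                a = match_ingredient(m.group("a"), ingredients)
                b = match_ingredient(m.group("b"), ingredients)
                if a and b and a != b:
                    pairs.append((a, b))
                break  # use the first pattern that matches a clause
    return pairs


print(extract_substitutions(
    "I replaced the butter with sour cream, and used applesauce instead of oil.",
    ["butter", "sour cream", "cream", "applesauce", "oil"]))
# [('butter', 'sour cream'), ('oil', 'applesauce')]
```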
We constructed an ingredient substitute network to capture users’ knowledge about ingredient replacement. This weighted, directed network has ingredients as nodes. We applied a threshold, eliminating any suggested substitution that occurred fewer than 5 times. We then determined the weight of each edge by p(b|a), the proportion of substitutions of ingredient a that suggest ingredient b. For example, 68% of substitutions for white sugar were to splenda, an artificial sweetener, and hence the assigned weight for the sugar → splenda edge is 0.68.
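The edge weighting can be sketched as below. Whether p(b|a) is normalized before or after rare substitutions are removed is not stated; this sketch assumes after, and the counts in the usage lines are made-up values chosen to reproduce the 0.68 example.

```python
from collections import Counter, defaultdict


def substitution_network(pairs, min_count=5):
    """Build weighted directed edges a -> b with weight p(b|a).

    pairs: iterable of (a, b) tuples, one per suggested substitution of a by b.
    Substitutions suggested fewer than `min_count` times are dropped first.
    """
    counts = Counter(pairs)
    kept = {edge: c for edge, c in counts.items() if c >= min_count}
    totals = defaultdict(int)  # surviving substitutions per original ingredient a
    for (a, _), c in kept.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in kept.items()}


# Hypothetical counts: 68 suggestions of splenda and 32 of honey for white sugar.
suggestions = ([("white sugar", "splenda")] * 68
               + [("white sugar", "honey")] * 32
               + [("butter", "applesauce")] * 3)
network = substitution_network(suggestions)
print(network[("white sugar", "splenda")])  # 0.68
```

The butter → applesauce edge, suggested only 3 times, falls below the threshold and does not appear in the network.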
The resulting substitution network, shown in Figure 5, exhibits strong clustering. We examined this structure by applying the map generator tool by Rosvall et al. [13], which uses a random walk approach to identify clusters in weighted, directed networks. The resulting clusters, and their relationships to one another, are shown in Fig. 6. The derived clusters could be used when following a relatively new recipe, which may not yet have received many reviews and therefore few suggestions for ingredient substitutions. If one does not have all ingredients at hand, one could examine the contents of one’s fridge and pantry and match them with other ingredients found in the same cluster as the ingredient called for by the recipe. Table 2 lists the contents of a few such sample ingredient clusters, and Fig. 7 shows two example clusters extracted from the substitute network.
Table 2: Clusters of ingredients that can be substituted for one another. A maximum of 5 additional ingredients for each cluster are listed, ordered by PageRank.

  main                 other ingredients
  chicken              turkey, beef, sausage, chicken breast, bacon
  olive oil            butter, apple sauce, oil, banana, margarine
  sweet potato         yam, potato, pumpkin, butternut squash, parsnip
  baking powder        baking soda, cream of tartar
  almond               pecan, walnut, cashew, peanut, sunflower s.
  apple                peach, pineapple, pear, mango, pie filling
  egg                  egg white, egg substitute, egg yolk
  tilapia              cod, catfish, flounder, halibut, orange roughy
  spinach              mushroom, broccoli, kale, carrot, zucchini
  italian seasoning    basil, cilantro, oregano, parsley, dill
  cabbage              coleslaw mix, sauerkraut, bok choy, napa cabbage
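The fridge-and-pantry lookup described earlier in this section amounts to a cluster membership query. The sketch below hard-codes four clusters from Table 2 rather than computing them with the map generator, and the function name is hypothetical.

```python
# Four clusters from Table 2: main ingredient first, then the others in PageRank order.
CLUSTERS = [
    ["chicken", "turkey", "beef", "sausage", "chicken breast", "bacon"],
    ["olive oil", "butter", "apple sauce", "oil", "banana", "margarine"],
    ["egg", "egg white", "egg substitute", "egg yolk"],
    ["tilapia", "cod", "catfish", "flounder", "halibut", "orange roughy"],
]


def pantry_substitutes(missing, pantry, clusters=CLUSTERS):
    """Ingredients on hand that share a substitution cluster with the missing one."""
    options = []
    for cluster in clusters:
        if missing in cluster:
            options.extend(i for i in cluster if i != missing and i in pantry)
    return options


print(pantry_substitutes("butter", {"banana", "olive oil", "milk"}))  # ['olive oil', 'banana']
```

Returning candidates in cluster order means that, under this layout, more central (higher-PageRank) substitutes are listed first.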
Finally, we examine whether the substitution network encodes preferences for one ingredient over another, as evidenced by the relative ratings of similar recipes, one of which contains an original ingredient and the other of which implements a substitution. To test this hypothesis, we construct a “preference network”, in which one ingredient is preferred to another in terms of received ratings. It is constructed by creating, for each pair of recipes X and Y with ratings $R_X > R_Y$, a directed edge (a, b) from each ingredient a listed in Y but not in X to each ingredient b listed in X but not in Y. For example, if recipe X includes beef, ketchup and cheese, and recipe Y contains beef and pickles, then this recipe pair contributes two edges: one from pickles to ketchup, and the other from pickles to cheese. The aggregate edge weights are defined based on PMI. Because PMI is a symmetric quantity ($\mathrm{PMI}(a; b) = \mathrm{PMI}(b; a)$), we introduce a directed PMI measure to cope with the directionality of the preference network:
$$\mathrm{PMI}(a \to b) = \log \frac{p(a \to b)}{p(a)\,p(b)},$$
where
$$p(a \to b) = \frac{\#\text{ of recipe pairs from } a \text{ to } b}{\#\text{ of recipe pairs}},$$
and p(a), p(b) are defined as in the previous section.
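The directed PMI can be computed from rated recipes as in the following sketch. It assumes that p(a) is the fraction of recipes containing ingredient a, that only recipe pairs with unequal ratings count toward the number of recipe pairs, and that edges run from ingredients found only in the lower-rated recipe to those found only in the higher-rated one, as in the ketchup/pickles example; the original computation may differ on each point.

```python
import math
from collections import Counter
from itertools import combinations


def preference_pmi(recipes):
    """recipes: list of (set of ingredients, rating). Returns {(a, b): PMI(a -> b)}."""
    n = len(recipes)
    recipe_count = Counter(ing for ings, _ in recipes for ing in ings)
    pair_count = Counter()  # recipe pairs contributing each edge a -> b
    num_pairs = 0
    for (ings_x, r_x), (ings_y, r_y) in combinations(recipes, 2):
        if r_x == r_y:
            continue  # no preference expressed
        if r_x < r_y:  # make X the higher-rated recipe
            ings_x, ings_y = ings_y, ings_x
        num_pairs += 1
        for a in ings_y - ings_x:      # only in the lower-rated recipe
            for b in ings_x - ings_y:  # only in the higher-rated recipe
                pair_count[(a, b)] += 1
    return {
        (a, b): math.log((c / num_pairs) / ((recipe_count[a] / n) * (recipe_count[b] / n)))
        for (a, b), c in pair_count.items()
    }


pmi = preference_pmi([({"beef", "ketchup", "cheese"}, 4.5), ({"beef", "pickles"}, 3.0)])
print(sorted(pmi))  # [('pickles', 'cheese'), ('pickles', 'ketchup')]
```

Beef, present in both recipes, contributes no edge; each of the two edges gets PMI log(1 / (0.5 · 0.5)) = log 4.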
We find a high correlation between this preference network and the substitution network (ρ = 0.72, p < 0.001). This observation suggests that the substitute network encodes users’ ingredient preferences, which we use in the recipe prediction task described in the next section.
6. RECIPE RECOMMENDATION
We use the above insights to develop novel algorithms suitable for recipe recommendation. We use ingredients and the relationships encoded between them in ingredient networks as our main feature sets to predict recipe ratings, and compare them against features encoding nutrition information, as well as other baseline features such as cooking methods, and preparation and cook time.
Figure 6: Ingredient substitution clusters. Nodes represent clusters and edges indicate the presence of recommended substitutions that span clusters. Each cluster represents a set of related ingredients which are frequently substituted for one another.
[Figure 7 network diagrams: (a) milk substitutes; (b) cinnamon substitutes.]
Figure 7: Relationships between ingredients located within two of the clusters from Fig. 6.
Then we apply a discriminative machine learning method, stochastic gradient boosting trees [6], to predict recipe ratings. In the experiments, we seek to answer the following three questions. (1) Can we predict users’ preference for a new recipe given the information present in the recipe? (2) What are the key aspects that determine users’ preference? (3) Does the structure of ingredient networks help in recipe recommendation, and how?
6.1 Recipe Pair Prediction
The goal of our prediction task is: given a pair of similar recipes, determine which one has the higher average rating. This task is designed particularly to help users who have a specific dish or meal in mind and are trying to decide between several recipe options for that dish.
Recipe pair data. The data for this prediction task consist of pairs of similar recipes. The reason for selecting similar recipes, with high ingredient overlap, is that while apples may be quite comparable to oranges in the context of recipes, especially if one is evaluating salads or desserts, lasagna may not be comparable to a mixed drink. To derive pairs of related recipes, we computed the cosine similarity between the ingredient lists of the two recipes, with each ingredient weighted by its inverse document frequency, $\log(\#\text{ of recipes} / \#\text{ of recipes containing the ingredient})$. We considered only those pairs of recipes whose cosine similarity exceeded 0.2. The weighting is intended to assign higher similarity to recipes sharing more distinguishing ingredients, such as Brussels sprouts, as opposed to recipes sharing very common ones, such as butter.
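The IDF-weighted cosine similarity can be sketched as follows, treating each recipe as a set of ingredients whose binary entries are scaled by their IDF weights. The function names and the toy recipes are illustrative only.

```python
import math


def idf_weights(recipes):
    """idf(i) = log(#recipes / #recipes containing ingredient i)."""
    df = {}
    for ingredients in recipes:
        for ing in ingredients:
            df[ing] = df.get(ing, 0) + 1
    return {ing: math.log(len(recipes) / c) for ing, c in df.items()}


def weighted_cosine(a, b, idf):
    """Cosine similarity of two ingredient sets, each entry weighted by its IDF."""
    dot = sum(idf[i] ** 2 for i in a & b)
    norm_a = math.sqrt(sum(idf[i] ** 2 for i in a))
    norm_b = math.sqrt(sum(idf[i] ** 2 for i in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_pairs(recipes, threshold=0.2):
    """Index pairs of recipes whose weighted cosine similarity exceeds the threshold."""
    idf = idf_weights(recipes)
    return [(i, j)
            for i in range(len(recipes)) for j in range(i + 1, len(recipes))
            if weighted_cosine(recipes[i], recipes[j], idf) > threshold]


recipes = [
    {"butter", "flour", "brussels sprout"},
    {"butter", "brussels sprout", "bacon"},
    {"butter", "flour", "sugar"},
    {"butter", "sugar", "egg"},
]
print(similar_pairs(recipes))  # [(0, 1), (0, 2), (2, 3)]
```

Because butter appears in every toy recipe, its weight is log(4/4) = 0, so recipes that share only butter get a similarity of 0 and are never paired.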
A further challenge to obtaining reliable relative rankings of recipes is the variance introduced by having different users choose to rate different recipes. In addition, some users might not have a sufficient number of reviews under their belt to have calibrated their own rating scheme. To control for variation introduced by users, we examined recipe pairs where the same users rate both recipes and collectively express a preference for one recipe over the other. Specifically, we generated 62,031 recipe pairs (a, b) where $\mathrm{rating}_i(a) > \mathrm{rating}_i(b)$ for at least 10 users i, and for over 50% of the users who rated both recipe a and recipe b. Furthermore, each such user i had to be an active enough reviewer to have rated at least 8 other recipes.
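The user-controlled pair selection can be sketched as below. It assumes ratings arrive as a nested dictionary, reads the activity requirement as each counted user having rated at least 10 recipes (the pair plus 8 others), and counts tied ratings toward the number of users who rated both recipes; all of these are interpretive assumptions.

```python
from itertools import combinations


def preferred_pairs(ratings, min_users=10, min_other=8):
    """Return recipe pairs (a, b) where active users collectively prefer a over b.

    ratings: {user: {recipe: stars}}.
    """
    wins, both = {}, {}
    for rated in ratings.values():
        if len(rated) < min_other + 2:  # the pair itself plus >= min_other other recipes
            continue
        for a, b in combinations(sorted(rated), 2):
            both[(a, b)] = both.get((a, b), 0) + 1
            if rated[a] != rated[b]:
                winner, loser = (a, b) if rated[a] > rated[b] else (b, a)
                wins[(winner, loser)] = wins.get((winner, loser), 0) + 1
    return sorted(
        (a, b) for (a, b), w in wins.items()
        if w >= min_users and w > 0.5 * both[tuple(sorted((a, b)))]
    )
```

In a toy setting where 11 of 12 active users rate recipe r0 above r1 and the twelfth disagrees, (r0, r1) is kept and (r1, r0) is not.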
Features. In the prediction dataset, each observation consists of a set of predictor variables or features that represent information about two recipes, and the response variable is a binary indicator of which recipe receives the higher rating on average. To study the key aspects of recipe information, we constructed different sets of features, including:
• Baseline: This includes cooking methods, such as chopping, marinating, or grilling, and cooking effort descriptors, such as preparation time in minutes, as well as the number of servings produced, etc. These features are considered primary information about a recipe and are included in all other feature sets described below.
• Full ingredients: We selected up to 1000 popular ingredients to build a “full ingredient list”. In this feature set, each observed recipe pair contains a vector with entries indicating, for each recipe in the pair, whether each ingredient from the full list is present.
• Nutrition: This feature set does not include any ingredients, only nutrition information such as the total caloric content, as well as quantities of fats, carbohydrates, etc.
• Ingredient networks: In this set, we replaced the full ingredient list by structural information extracted from different ingredient networks, as described in Sections 4 and 5.3. Co-occurrence is treated in two ways: as a raw count, and as complementarity, captured by the PMI.
• Combined set: Finally, a combined feature set is constructed to test the performance of a combination of features, including baseline, nutrition and ingredient networks.
To build the ingredient network feature set, we extracted the following two types of structural information from the co-occurrence and substitution networks, as well as the complement network derived from the co-occurrence information:
Network positions are calculated to represent how a recipe’s ingredients occupy positions within the networks. Such position measures are likely to indicate whether a recipe contains any “popular” or “unusual” ingredients. To calculate the position measures, we first calculated various network centrality measures, including degree centrality, betweenness centrality, etc., from the ingredient networks. A centrality measure can be represented as a vector $\vec{g}$ where each entry indicates the centrality of an ingredient. The network position of a recipe, with its full ingredient list represented as a binary vector $\vec{f}$, can be summarized by $\vec{g}^{T} \cdot \vec{f}$, i.e., an aggregated centrality measure based on the centrality of its ingredients.

[Figure 8 bar chart: prediction accuracy (y-axis, 0.60 to 0.80) of the baseline, full ingredients, nutrition, ing. networks, and combined feature sets.]
Figure 8: Prediction performance. The nutrition information and ingredient networks are more effective features than full ingredients. The ingredient network features lead to impressive performance, close to the best performance.
Network communities provide information about which ingredients are likely to co-occur with a group of other ingredients in the network. A recipe consisting of ingredients that are frequently used with, complemented by or substituted by certain groups may be predictive of the ratings the recipe will receive. To obtain the network community information, we applied latent semantic analysis (LSA) to recipes. We first factorized each ingredient network, represented by a matrix W, using singular value decomposition (SVD). In the matrix W, each entry $W_{ij}$ indicates whether ingredient i co-occurs with, complements, or substitutes ingredient j. Suppose $W_k = U_k \Sigma_k V_k^{T}$ is a rank-k approximation of W; we can then transform each recipe’s full ingredient list using the low-dimensional representation $\Sigma_k^{-1} V_k^{T} \vec{f}$ as community information within a network. These low-dimensional vectors, together with the vectors of network positions, constitute the ingredient network features.
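Both kinds of network features can be sketched with NumPy for a single network. In this sketch, degree centrality stands in for the several centrality measures mentioned, the network is a small dense matrix, and k is a parameter (the experiments below settle on k = 50); singular vectors from `np.linalg.svd` are defined only up to sign, so the community values are not unique.

```python
import numpy as np


def network_features(W, f, k=2):
    """Network-position and community features of one recipe for one network.

    W: (n x n) ingredient network matrix; f: length-n binary ingredient vector.
    Assumes W has at least k nonzero singular values.
    """
    n = W.shape[0]
    g = (W != 0).sum(axis=1) / (n - 1)   # degree centrality of each ingredient
    position = g @ f                     # g^T f: aggregated centrality
    _, s, Vt = np.linalg.svd(W)          # W = U diag(s) V^T, s in descending order
    community = (Vt[:k] @ f) / s[:k]     # Sigma_k^{-1} V_k^T f
    return position, community


# Toy network: a triangle of ingredients 0, 1, 2 plus ingredient 3 linked to 2.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = np.array([1, 0, 1, 0])  # the recipe uses ingredients 0 and 2
position, community = network_features(W, f)
print(position)  # about 1.667, i.e. 2/3 + 3/3
```

Dividing by `s[:k]` element-wise is equivalent to multiplying by the diagonal matrix of inverse singular values, without forming it explicitly.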
Learning method. We applied discriminative machine learning methods such as support vector machines (SVM) [2] and stochastic gradient boosting trees [5] to our prediction problem. Here we report and discuss the detailed results based on the gradient boosting tree model. Like SVM, the gradient boosting tree model seeks a parameterized classifier, but unlike SVM, which considers all the features at once, the boosting tree model considers a subset of features at a time and iteratively combines the resulting trees according to their empirical errors. In practice, it not only has performance comparable to SVM, but can also serve as a feature ranking procedure [11].
[Figure 9 plot: relative importance (y-axis) of the top 100 features (x-axis) in the combined set, by feature group: ing. networks (84%), nutrition (6.5%), cook effort (5.0%), cook methods (3.9%).]
Figure 9: Relative importance of features in the combined set. The individual items from nutrition information are very indicative in differentiating highly rated recipes, while most of the prediction power comes from ingredient networks.

[Figure 10 plot: relative importance (y-axis) of the top 100 ingredient network features (x-axis), by network: substitution (39.8%), co-occurrence (30.9%), complement (29.2%).]
Figure 10: Relative importance of features representing the network structure. The substitution network has the strongest contribution (39.8%) to the total importance of network features, and it also has more influential features in the top 100 list, which suggests that the substitution network is complementary to other features.

In this work, we fitted a stochastic gradient boosting tree model with 8 terminal nodes under an exponential loss function. The dataset is roughly balanced in terms of which recipe is the higher-rated one within a pair. We randomly divided the dataset into a training set (2/3) and a testing set (1/3). The prediction performance is evaluated based on accuracy, and the feature performance is evaluated in terms of relative importance [8]. At each internal node of a single decision tree, one of the input variables, $x_j$, is used to partition the region associated with that node into two subregions in order to fit the response values. The squared relative importance of variable $x_j$, denoted imp(j), is the sum of such squared improvements over all internal nodes for which it was chosen as the splitting variable:
$$\mathrm{imp}(j) = \sum_k \hat{\imath}_k^2 \, I(\text{splits on } x_j),$$
where $\hat{\imath}_k^2$ is the empirical improvement from the k-th node splitting on $x_j$ at that point.
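The importance measure can be sketched independently of any particular boosting library. The sketch assumes each fitted tree has already been reduced to a list of (feature index, improvement) pairs, one per internal node, with the improvement playing the role of $\hat{\imath}_k^2$; averaging over trees and taking square roots follows the treatment of tree ensembles in Hastie et al. [8], while normalizing the result to sum to 1 is an assumed convention for the reported percentages.

```python
import math


def relative_importance(trees, n_features):
    """Relative importance of each feature in a tree ensemble.

    trees: list of trees, each a list of (j, improvement) pairs, one per internal
           node, where j is the index of the splitting variable x_j.
    Returns importances normalized to sum to 1 (an assumed convention).
    """
    squared = [0.0] * n_features
    for tree in trees:
        for j, improvement in tree:
            squared[j] += improvement  # imp(j), accumulated over all trees
    # Average over trees, then take square roots to get relative importance.
    scores = [math.sqrt(s / len(trees)) for s in squared]
    total = sum(scores)
    return [s / total for s in scores] if total else scores


# Two toy trees over three features; feature 2 is never used for a split.
print(relative_importance([[(0, 4.0), (1, 1.0)], [(0, 4.0)]], n_features=3))
```

Group-level shares such as those in Figures 9 and 10 can then be obtained by summing the normalized importances of the features in each group.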
6.2 Results
The overall prediction performance is shown in Fig. 8. Surprisingly, even with a full list of ingredients, the prediction accuracy improves only from .712 (baseline) to .746. In contrast, the nutrition information and ingredient networks are more effective (with accuracies of .753 and .786, respectively). Both of them have much lower dimensionality (from tens to several hundreds) than the full ingredients, which are represented by more than 2000 dimensions (1000 ingredients per recipe in the pair). The ingredient network features lead to impressive performance, close to the best performance given by the combined set (.792), indicating the power of network structures in recipe recommendation.

[Figure 11 plot: relative importance of the nutrition features: carbs (20.9%), calories (19.7%), cholesterol (17.7%), sodium (16.8%), fat (12.4%), fiber (12.3%).]
Figure 11: Relative importance of features from nutrition information. The carbs item is the most influential feature in predicting higher-rated recipes.
Figure 9 shows the influence of different features in the combined feature set. Up to 100 features with the highest relative importance are shown. The importance of a feature group is summarized by how much of the total importance is contributed by all features in the group. For example, the baseline, consisting of cooking effort and cooking methods, contributes 8.9% of the total importance. The individual items from nutrition information are very indicative in differentiating highly rated recipes, while most of the prediction power comes from ingredient networks (84%).
Figure 10 shows the top 100 features from the three networks. In terms of the total importance of ingredient network features, the substitution network has a slightly stronger contribution (39.8%) than the other two networks, and it also has more influential features in the top 100 list. This suggests that the structural information extracted from the substitution network is not only important but also complementary to information from other aspects.
Looking into the nutrition information (Fig. 11), we found that carbohydrates are the most influential feature in predicting higher-rated recipes. Since carbohydrates comprise around 50% or more of total calories, the high importance of this feature suggests that a recipe’s rating can be influenced by users’ concerns about nutrition and diet. Another interesting observation is that, while individual nutrition items are powerful predictors, a higher prediction accuracy can be reached by using ingredient networks alone, as shown in Fig. 8. This implies that information about nutrition may be encoded in the ingredient network structure, e.g. through substitutions of less healthful ingredients with “healthier” alternatives.
Constructing the ingredient network features involves reducing high-dimensional network information through SVD, as described in the previous section. The dimensionality can be determined by cross-validation. As shown in Fig. 12, features with a very large dimension tend to overfit the training data. Hence we chose k = 50 for the reduced dimension of all three networks. The figure also shows that using the information about the complement network alone is more effective in prediction than using either the co-occurrence or the substitute network, even in the case of low dimensions. Consistently, as shown in terms of relative importance (Fig. 10), the substitution network alone is not the most effective, but it provides more complementary information in the combined feature set.

[Figure 12 plot: accuracy (y-axis, 0.76 to 0.80) against the number of dimensions (x-axis, 10 to 70) for the combined, substitution, complement, and co-occurrence networks.]
Figure 12: Prediction performance over reduced dimensionality. The best performance is given by reduced dimension k = 50 when combining all three networks. In addition, using the information about the complement network alone is more effective in prediction than using the other two networks.

In Figure 13 we show the most representative ingredients in the decomposed matrix derived from the substitution network. We display the top five influential dimensions, evaluated based on relative importance, from the SVD resultant matrix $V_k$, and in each of these dimensions we extracted six representative ingredients based on their intensities in the dimension (the squared entry values). These representative ingredients suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension or the milk substitutes in the second dimension (which is similar to the cluster shown in Fig. 6), are particularly informative in predicting recipe ratings.

[Figure 13 heatmap: SVD dimensions (rows) against representative ingredients (columns), with a color key for entry values from −0.5 to 0.5.]
Figure 13: Influential substitution communities. The matrix shows the most influential feature dimensions extracted from the substitution network. For each dimension, the six representative ingredients with the highest intensity values are shown, with colors indicating their intensity. These features suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension, are particularly informative in prediction.

To summarize our observations, we find that we are able to effectively predict users’ preference for a recipe, but not by using a full list of ingredients. Instead, by using the structural information extracted from the relationships among ingredients, we can better uncover users’ preferences about recipes.

7. CONCLUSION
Recipes are little more than instructions for combining and processing sets of ingredients. Individual cookbooks, even the most expansive ones, contain single recipes for each dish. The web, however, permits collaborative recipe generation and modification, with tens of thousands of recipes contributed to individual websites. We have shown how this data can be used to glean insights about regional preferences and modifiability of individual ingredients, and also how it can be used to construct two kinds of networks, one of ingredient complements, the other of ingredient substitutes. These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.
In future work, we plan to extend ingredient networks to incorporate the cooking methods as well. It would also be of interest to generate region-specific and diet-specific ratings, depending on the users’ background and preferences. A whole host of user-interface features could be added for users who are interacting with recipes, whether the recipe is newly submitted, and hence unrated, or whether they are browsing a cookbook. In addition to automatically predicting a rating for the recipe, one could flag ingredients that can be omitted, ones whose quantity could be tweaked, as well as suggested additions and substitutions.
8. ACKNOWLEDGMENTS
This work was supported by MURI award FA9550-08-10265 from the Air Force Office of Scientific Research. The methodology used in this paper was developed with support from funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography. The authors gratefully acknowledge D. Lazer for support.
9. REFERENCES
[1] A hn, Y ., A hnert , S., Bagrow, J., and Barabasi, A .
Flavor network and t he principles of food pairing.
Bulletin of the American Physical Society 56 (2011).
[2] Cort es, C., and Vapnik, V . Support -vect or networks.
Machine learning 20, 3 (1995), 273–297.
[3] Forbes, P., and Zhu, M . Cont ent -boost ed mat rix
fact orizat ion for recommender syst ems: Experiment s
wit h recipe recommendat ion. Proceedings of
Recommender Systems (2011).
[4] Freyne, J., and Berkovsky, S. Int elligent food
planning: personalized recipe recommendat ion. In I UI ,
ACM (2010), 321–324.
[5] Friedman, J. St ochast ic gradient boost ing.
Computational Statistics & Data Analysis 38, 4
(2002), 367–378.
[6] Friedman, J., Hast ie, T ., and T ibshirani, R. A ddit ive
logist ic regression: a st at ist ical view of boost ing.
Annals of Statistics 28 (1998), 2000.
[7] Geleijnse, G., Nacht igall, P., van K aam, P., and
W ijgergangs, L. A personalized recipe advice syst em
t o promot e healt hful choices. In I UI , ACM (2011),
437–438.
[8] Hast ie, T ., T ibshirani, R., Friedman, J., and Franklin,
J. T he element s of st at ist ical learning: dat a mining,
inference and predict ion. T he Mathematical
I ntelligencer 27, 2 (2005).
[9] K amiet h, F., Braun, A ., and Schlehuber, C. A dapt ive
implicit int eract ion for healt hy nut rit ion and food
int ake supervision. Human-Computer I nteraction.
Towards Mobile and I ntelligent I nteraction
Environments (2011), 205–212.
[10] K inouchi, O., Diez-Garcia, R., Holanda, A .,
Zambianchi, P., and Roque, A . T he non-equilibrium
nat ure of culinary evolut ion. New Journal of Physics
10 (2008), 073020.
[11] Lu, Y ., Peng, F., Li, X ., and A hmed, N. Coupling
feat ure select ion and machine learning met hods for
navigat ional query ident ificat ion. In CI K M, ACM
(2006), 682–689.
[12] Rombauer, I., Becker, M ., Becker, E., and M aest ro, L.
Joy of cooking. Scribner Book Company, 1997.
[13] Rosvall, M ., and Bergst rom, C. M aps of random walks
on complex networks reveal community st ruct ure.
PNAS 105, 4 (2008), 1118.
[14] Shidochi, Y ., Takahashi, T ., Ide, I., and M urase, H.
Finding replaceable mat erials in cooking recipe t ext s
considering charact erist ic cooking act ions. In Proc. of
the ACM multimedia 2009 workshop on Multimedia
for cooking and eating activities, ACM (2009), 9–14.
[15] Svensson, M ., Höök, K ., and Cöst er, R. Designing and
evaluat ing kalas: A social navigat ion syst em for food
recipes. ACM Transactions on Computer-Human
I nteraction (T OCHI ) 12, 3 (2005), 374–400.
[16] Ueda, M ., Takahat a, M ., and Nakajima, S. User’s food
preference ext ract ion for personalized cooking recipe
recommendat ion. Proc. of the Second Workshop on
Semantic Personalized I nformation Management:
Retrieval and Recommendation (2011).
[17] Wang, L., Li, Q., Li, N., Dong, G., and Yang, Y .
Subst ruct ure similarity measurement in chinese
recipes. In WWW, ACM (2008), 979–988.
[18] W ikipedia. Out line of food preparat ion, 2011. [Online;
accessed 22-Oct -2011].
[19] Zhang, Q., Hu, R., M ac Namee, B., and Delany, S.
Back t o t he fut ure: K nowledge light case base cookery.
In Proc. of T he 9th European Conference on
Case-Based Reasoning Workshop (2008), 15.