Recipe recommendation using ingredient networks
Chun-Yuen Teng
School of Information, University of Michigan, Ann Arbor, MI, USA
chunyuen@umich.edu

Yu-Ru Lin
IQSS, Harvard University; CCS, Northeastern University, Boston, MA
yuruliny@gmail.com

Lada A. Adamic
School of Information, University of Michigan, Ann Arbor, MI, USA
ladamic@umich.edu

ABSTRACT

The recording and sharing of cooking recipes, a human activity dating back thousands of years, naturally became an early and prominent social use of the web. The resulting online recipe collections are repositories of ingredient combinations and cooking methods whose large scale and variety yield interesting insights about both the fundamentals of cooking and user preferences. At the level of an individual ingredient we measure whether it tends to be essential or can be dropped or added, and whether its quantity can be modified. We also construct two types of networks to capture the relationships between ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe. Our experiments reveal that recipe ratings can be well predicted with features derived from combinations of ingredient networks and nutrition information.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database applications - Data mining

General Terms
Measurement; Experimentation

Keywords
ingredient networks, recipe recommendation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. WebSci 2012, June 22–24, 2012, Evanston, Illinois, USA. Copyright 2012 ACM 978-1-4503-1228-8.…$10.00

1. INTRODUCTION

The web enables individuals to collaboratively share knowledge, and recipe websites are one of the earliest examples of collaborative knowledge sharing on the web. Allrecipes.com, the subject of our present study, was founded in 1997, years ahead of other collaborative websites such as Wikipedia. Recipe sites thrive because individuals are eager to share their recipes, from family recipes that had been passed down for generations, to new concoctions that they created that afternoon, having been motivated in part by the ability to share the result online. Once shared, the recipes are implemented and evaluated by other users, who supply ratings and comments.

The desire to look up recipes online may at first appear odd given that tomes of printed recipes can be found in almost every kitchen. The Joy of Cooking [12] alone contains 4,500 recipes spread over 1,000 pages. There is, however, substantial additional value in online recipes, beyond their accessibility. While the Joy of Cooking contains a single recipe for Swedish meatballs, Allrecipes.com hosts "Swedish Meatballs I", "II", and "III", submitted by different users, along with 4 other variants, including "The Amazing Swedish Meatball". Each variant has been reviewed, from 329 reviews for "Swedish Meatballs I" to 5 reviews for "Swedish Meatballs III". The reviews not only provide a crowd-sourced ranking of the different recipes, but also many suggestions on how to modify them, e.g.
using ground turkey instead of beef, skipping the "cream of wheat" because it is rarely on hand, etc.

The wealth of information captured by online collaborative recipe sharing sites is revealing not only of the fundamentals of cooking, but also of user preferences. The co-occurrence of ingredients in tens of thousands of recipes provides information about which ingredients go well together, and when a pairing is unusual. Users' reviews provide clues as to the flexibility of a recipe, and the ingredients within it. Can the amount of cinnamon be doubled? Can the nutmeg be omitted? If one is lacking a certain ingredient, can a substitute be found among supplies at hand without a trip to the grocery store? Unlike cookbooks, which will contain vetted but perhaps not the best variants for some individuals' tastes, ratings assigned to user-submitted recipes allow for the evaluation of what works and what does not.

In this paper, we seek to distill the collective knowledge and preference about cooking through mining a popular recipe-sharing website. To extract such information, we first parse the unstructured text of the recipes and the accompanying user reviews. We construct two types of networks that reflect different relationships between ingredients, in order to capture users' knowledge about how to combine ingredients. The complement network captures which ingredients tend to co-occur frequently, and is composed of two large communities: one savory, the other sweet. The substitute network, derived from user-generated suggestions for modifications, can be decomposed into many communities of functionally equivalent ingredients, and captures users' preference for healthier variants of a recipe.
Our experiments reveal that recipe ratings can be well predicted by features derived from combinations of ingredient networks and nutrition information (with accuracy 0.792), while most of the prediction power comes from the ingredient networks (84%).

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 describes the dataset. Section 4 discusses the extraction of the ingredient and complement networks and their characteristics. Section 5 presents the extraction of recipe modification information, as well as the construction and characteristics of the ingredient substitute network. Section 6 presents our experiments on recipe recommendation and Section 7 concludes.

2. RELATED WORK

Recipe recommendation has been the subject of much prior work. Typically the goal has been to suggest recipes to users based on their past recipe ratings [15][3] or browsing/cooking history [16]. The algorithms then find similar recipes based on overlapping ingredients, either treating each ingredient equally [4] or by identifying key ingredients [19]. Instead of modeling recipes using ingredients, Wang et al. [17] represent the recipes as graphs which are built on ingredients and cooking directions, and they demonstrate that graph representations can be used to easily aggregate Chinese dishes by the flow of cooking steps and the sequence of added ingredients. However, their approach only models the occurrence of ingredients or cooking methods, and doesn't take into account the relationships between ingredients. In contrast, in this paper we incorporate the likelihood of ingredients to co-occur, as well as the potential of one ingredient to act as a substitute for another.

Another branch of research has focused on recommending recipes based on desired nutritional intake or promoting healthy food choices. Geleijnse et al.
[7] designed a prototype of a personalized recipe advice system, which suggests recipes to users based on their past food selections and nutrition intake. In addition to nutrition information, Kamieth et al. [9] built a personalized recipe recommendation system based on availability of ingredients and personal nutritional needs. Shidochi et al. [14] proposed an algorithm to extract replaceable ingredients from recipes in order to satisfy users' various demands, such as calorie constraints and food availability. Their method identifies substitutable ingredients by matching the cooking actions that correspond to ingredient names. However, their assumption that substitutable ingredients are subject to the same processing methods is less direct and specific than extracting substitutions directly from user-contributed suggestions.

Ahn et al. [1] and Kinouchi et al. [10] examined networks involving ingredients derived from recipes, with the former modeling ingredients by their flavor bonds, and the latter examining the relationship between ingredients and recipes. In contrast, we derive direct ingredient-ingredient networks of both complements and substitutes. We also step beyond characterizing these networks to demonstrating that they can be used to predict which recipes will be successful.

3. DATASET

Allrecipes.com is one of the most popular recipe-sharing websites, where novice and expert cooks alike can upload and rate cooking recipes. It hosts 16 customized international sites for users to share their recipes in their native languages, of which we study only the main, English, version.
Recipes uploaded to the site contain specific instructions on how to prepare a dish: the list of ingredients, preparation steps, preparation and cook time, the number of servings produced, nutrition information, serving directions, and photos of the prepared dish. The uploaded recipes are enriched with user ratings and reviews, which comment on the quality of the recipe, and suggest changes and improvements. In addition to rating and commenting on recipes, users are able to save them as favorites or recommend them to others through a forum.

We downloaded 46,337 recipes from allrecipes.com together with all of the information listed above, as well as several classifications, such as a region (e.g. the midwest region of the US, or Europe), the course or meal the dish is appropriate for (e.g. appetizers or breakfast), and any holidays the dish may be associated with. In order to understand users' recipe preferences, we crawled 1,976,920 reviews, which include reviewers' ratings, review text, and the number of users who voted the review as useful.

3.1 Data preprocessing

The first step in processing the recipes is identifying the ingredients and cooking methods from the freeform text of the recipe. Usually, although not always, each ingredient is listed on a separate line. To extract the ingredients, we tried two approaches. In the first, we found the maximal match between a pre-curated list of ingredients and the text of the line. However, this missed too many ingredients, while misidentifying others. In the second approach, we used regular expression matching to remove non-ingredient terms from the line and identified the remainder as the ingredient. We removed quantifiers, such as "1 lb" or "2 cups", and words referring to consistency or temperature, e.g. chopped or cold, and applied a few other heuristics, such as removing content in parentheses.
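The second, regular-expression approach can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_ingredient` and the short `UNITS` and `DESCRIPTORS` term lists are hypothetical stand-ins, since the actual lists of removed terms are not given here.

```python
import re

# Hypothetical, deliberately short term lists; the real lists are not published here.
UNITS = r"(?:lbs?|pounds?|ounces?|oz|cups?|cans?|tablespoons?|teaspoons?|tbsp|tsp|packages?)"
DESCRIPTORS = r"(?:chopped|diced|minced|sliced|cold|melted|softened)"


def extract_ingredient(line: str) -> str:
    """Strip non-ingredient terms from one ingredient line and return the remainder."""
    text = line.lower()
    text = re.sub(r"\([^)]*\)", " ", text)           # drop content in parentheses
    text = re.sub(r"\d+(?:[/.]\d+)?", " ", text)     # drop quantities such as 1, 1/2, 2.5
    text = re.sub(rf"\b{UNITS}\b", " ", text)        # drop units and containers
    text = re.sub(rf"\b{DESCRIPTORS}\b", " ", text)  # drop consistency/temperature words
    text = re.sub(r"[^a-z' ]", " ", text)            # drop leftover punctuation
    return " ".join(text.split())                    # collapse whitespace


print(extract_ingredient("2 cups chopped onion"))
```

Because only listed terms are removed, a qualifier such as "sharp" in "sharp cheddar cheese" survives, so similar ingredients are kept apart rather than merged.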
For example, "1 (28 ounce) can baked beans (such as Bush's Original®)" is identified as "baked beans". By limiting the list of potential terms to remove from an ingredient entry, we erred on the side of not conflating potentially identical or highly similar ingredients, e.g. "cheddar cheese", used in 2450 recipes, was considered different from "sharp cheddar cheese", occurring in 394 recipes.

We then generated an ingredient list sorted by frequency of ingredient occurrence and selected the top 1000 common ingredient names as our finalized ingredient list. Each of the top 1000 ingredients occurred in 23 or more recipes, with plain salt making an appearance in 47.3% of recipes. These ingredients also accounted for 94.9% of ingredient entries in the recipe dataset. The remaining ingredients were missed either because of high specificity (e.g. yolk-free egg noodle), referencing brand names (e.g. Planters almonds), rarity (e.g. serviceberry), misspellings, or not being a food (e.g. "nylon netting").

The remaining processing task was to identify cooking processes from the directions. We first identified all heating methods using a listing in the Wikipedia entry on cooking [18]. For example, baking, boiling, and steaming are all ways of heating the food. We then identified mechanical ways of processing the food such as chopping and grinding, and other chemical techniques such as marinating and brining.

[Figure 1: The percentage of recipes by region that apply a specific heating method. x-axis: method (bake, boil, fry, grill, roast, simmer, marinate); y-axis: % in recipes; series: midwest, mountain, northeast, west coast, south.]

3.2 Regional preferences

Choosing one cooking method over another appears to be a question of regional taste. 5.8% of recipes were classified into one of five US regions: Mountain, Midwest, Northeast, South, and West Coast (including Alaska and Hawaii).
Figure 1 shows that preferences among 6 of the most popular cooking methods vary significantly across the US regions ($\chi^2$ test, p-value < 0.001). Boiling and simmering, both involving heating food in hot liquids, are more common in the South and Midwest. Marinating and grilling are relatively more popular in the West and Mountain regions, but in the West more grilling recipes involve seafood (18/42 = 42%) relative to other regions combined (7/106 = 6%). Frying is popular in the South and Northeast. Baking is a universally popular and versatile technique, which is often used for both sweet and savory dishes, and is slightly more popular in the Northeast and Midwest. Examination of individual recipes reflecting these frequencies shows that these differences in preference can be tied to differences in demographics, immigrant culture and availability of local ingredients, e.g. seafood.

4. INGREDIENT COMPLEMENT NETWORK

Can we learn how to combine ingredients from the data? Here we employ the occurrences of ingredients across recipes to distill users' knowledge about combining ingredients. We constructed an ingredient complement network based on pointwise mutual information (PMI) defined on pairs of ingredients (a, b):

$$\mathrm{PMI}(a, b) = \log \frac{p(a, b)}{p(a)\,p(b)},$$

where

$$p(a, b) = \frac{\#\text{ of recipes containing } a \text{ and } b}{\#\text{ of recipes}}, \qquad p(a) = \frac{\#\text{ of recipes containing } a}{\#\text{ of recipes}}, \qquad p(b) = \frac{\#\text{ of recipes containing } b}{\#\text{ of recipes}}.$$

The PMI compares the probability that two ingredients occur together against the probability that they would occur together if they appeared independently of each other. Complementary ingredients tend to occur together far more often than would be expected by chance.

Figure 2 shows a visualization of ingredient complementarity. Two distinct subcommunities of recipes are immediately apparent: one corresponding to savory dishes, the other to sweet ones. Some central ingredients, e.g.
egg and salt, are actually pushed to the periphery of the network. They are so ubiquitous that, although they have many edges, the edges are all weak, since they don't show particular complementarity with any single group of ingredients.

We further probed the structure of the complementarity network by applying a network clustering algorithm [13]. The algorithm confirmed the existence of two main clusters containing the vast majority of the ingredients. An interesting satellite cluster is that of mixed drink ingredients, which is evident as a constellation of small nodes located near the top of the sweet cluster in Figure 2. The cluster includes the following ingredients: lime, rum, ice, orange, pineapple juice, vodka, cranberry juice, lemonade, tequila, etc.

For each recipe we recorded the minimum, average, and maximum pairwise pointwise mutual information between ingredients. The intuition is that complementary ingredients would yield higher ratings, while ingredients that don't go together would lower the average rating. We found that while the average and minimum pointwise mutual information between ingredients is uncorrelated with ratings, the maximum is very slightly positively correlated with the average rating for the recipe ($\rho = 0.09$, p-value $< 10^{-10}$). This suggests that having at least two complementary ingredients very slightly boosts a recipe's prospects, but having clashing or unrelated ingredients does not seem to do harm.

5. RECIPE MODIFICATIONS

Co-occurrence of ingredients aggregated over individual recipes reveals the structure of cooking, but tells us little about how flexible the ingredient proportions are, or whether some ingredients could easily be left out or substituted.
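The co-occurrence statistic referred to here is the pointwise mutual information of Section 4. A minimal sketch of how it, and the per-recipe minimum, average, and maximum, might be computed is given below. The function names `pmi_table` and `recipe_pmi_stats` are hypothetical, recipes are assumed to be lists of ingredient names, and the natural logarithm is an assumption, since the base of the logarithm is not stated.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(recipes):
    """Return {(a, b): PMI(a, b)} for every ingredient pair that co-occurs at least once.

    Pairs are stored as alphabetically sorted tuples.
    """
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing each pair
    for recipe in recipes:
        items = sorted(set(recipe))
        single.update(items)
        pair.update(combinations(items, 2))
    return {
        (a, b): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


def recipe_pmi_stats(recipe, table):
    """Minimum, average, and maximum pairwise PMI within one recipe (None if no pairs)."""
    items = sorted(set(recipe))
    values = [table[p] for p in combinations(items, 2) if p in table]
    if not values:
        return None
    return min(values), sum(values) / len(values), max(values)


# Toy corpus, for illustration only.
toy = [["flour", "sugar"], ["flour", "sugar"], ["salt", "pepper"], ["salt", "flour"]]
table = pmi_table(toy)
print(recipe_pmi_stats(["salt", "flour"], table))
```

In the toy corpus, flour and sugar co-occur more often than independence would predict and so receive a positive PMI, while flour and salt receive a negative one. Thresholding such values would give the edges of a complement network, though the threshold used for Figure 2 is not stated here.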
An experienced cook may know that apple sauce is a low-fat alternative to oil, or may know that nutmeg is often optional, but a novice cook may implement recipes literally, afraid that deviating from the instructions may produce poor results. While a traditional hardcopy cookbook would provide few such hints, they are plentiful in the reviews submitted by users who implemented the recipes, e.g. "This is a great recipe, but using fresh tomatoes only adds a few minutes to the prep time and makes it taste so much better", or another comment about the same salsa recipe: "This is by far the best recipe we have ever come across. We did however change it just a little bit by adding extra onion."

As the examples illustrate, modifications are reported even when the user likes the recipe. In fact, we found that 60.1% of recipe reviews contain words signaling modification, such as "add", "omit", "instead", "extra" and 14 others. Furthermore, it is the reviews that include changes that have a statistically higher average rating (4.49 vs. 4.39, t-test p-value $< 10^{-10}$), and lower rating variance (0.82 vs. 1.05, Bartlett test p-value $< 10^{-10}$), as is evident in the distribution of ratings, shown in Fig. 3. This suggests that flexibility in recipes is not necessarily a bad thing, and that reviewers who don't mention modifications are more likely to think of the recipe as perfect, or to dislike it entirely.
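Flagging reviews that contain modification words can be sketched as below. Only the four cue words named above are used, because the other 14 are not listed. Matching simple inflected forms such as "adding" or "omitted" is an assumption, and the function names are hypothetical.

```python
import re

# Only the four cue words named in the text; inflected forms are an assumption.
# Whole-word matching keeps "extra" from matching "extract".
MODIFICATION_PATTERN = re.compile(
    r"\b(?:add(?:s|ed|ing)?|omit(?:s|ted|ting)?|instead|extra)\b",
    re.IGNORECASE,
)


def mentions_modification(review: str) -> bool:
    """True if the review contains at least one modification cue word."""
    return MODIFICATION_PATTERN.search(review) is not None


def modification_rate(reviews) -> float:
    """Fraction of reviews that contain a modification cue word."""
    reviews = list(reviews)
    if not reviews:
        return 0.0
    return sum(mentions_modification(r) for r in reviews) / len(reviews)


sample = [
    "We did however change it just a little bit by adding extra onion.",
    "This is by far the best recipe we have ever come across.",
]
print(modification_rate(sample))
```

A keyword test of this kind is crude: it cannot tell a suggested change from an incidental use of the same word, so the resulting rate is best read as an estimate.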
[Figure 2: Ingredient complement network. Two ingredients share an edge if they occur together more than would be expected by chance and if their pointwise mutual information exceeds a threshold.]

[Figure 3: The likelihood that a review suggests a modification to the recipe depends on the star rating the review is assigning to the recipe. x-axis: rating (1 to 5); y-axis: proportion of reviews with given rating; series: no modification, with modification.]

In the following, we describe the recipe modifications extracted from user reviews, including adjustment, deletion and addition. We then present how we constructed an ingredient substitute network based on the extracted information.
5.1 Adjustments

Some modifications involve increasing or decreasing the amount of an ingredient in the recipe. In this and the following analyses, we split the review on punctuation such as commas and periods. We used simple heuristics to detect when a review suggested a modification: adding/using more/less of an ingredient counted as an increase/decrease. Doubling or increasing counted as an increase, while reducing, cutting, or decreasing counted as a decrease. While it is likely that there are other expressions signaling the adjustment of ingredient quantities, using this set of terms allowed us to compare the relative rate of modification, as well as the frequency of increase vs. decrease between ingredients. The ingredients themselves were extracted by performing a maximal character match within a window following an adjustment term.

Figure 4 shows the ratios of the number of reviews suggesting modifications, either increases or decreases, to the number of recipes that contain the ingredient. Two patterns are immediately apparent. Ingredients that may be perceived as being unhealthy, such as fats and sugars, are, with the exception of vegetable oil and margarine, more likely to be modified, and to be decreased. On the other hand, flavor enhancers such as soy sauce, lemon juice, cinnamon, Worcestershire sauce, and toppings such as cheeses, bacon and mushrooms, are also likely to be modified; however, they tend to be added in greater, rather than lesser quantities. Combined, the patterns suggest that good-tasting but “unhealthy” ingredients can be reduced, if desired, while spices, extracts, and toppings can be increased to taste.

5.2 Deletions and additions

Recipes are also frequently modified such that ingredients are omitted entirely. We looked for words indicating that the reviewer did not have an ingredient (and hence did not use it), e.g. “had no” and “didn’t have”. We further used “omit/left out/left off/bother with” as indication that the reviewer had omitted the ingredients, potentially for other reasons. Because reviewers often used simplified terms, e.g. “vanilla” instead of “vanilla extract”, we compared words in proximity to the action words by constructing 4-character-grams and calculating the cosine similarity between the n-grams in the review and the list of ingredients for the recipe.

To identify additions, we simply looked for the word “add”, but omitted possible substitutions. For example, we would use “added cucumber”, but not “added cucumber instead of green pepper”, the latter of which we analyze in the following section. We then compared the addition to the list of ingredients in the recipes, and considered the addition valid only if the ingredient does not already belong in the recipe.

[Figure 4 plot: (# reviews adjusting up)/(# recipes) on the y-axis versus (# reviews adjusting down)/(# recipes) on the x-axis, with points labeled by ingredient.]

Figure 4: Suggested modifications of quantity for the 50 most common ingredients, derived from recipe reviews. The line denotes equal numbers of suggested quantity increases and decreases.

Table 1 shows the correlation between ingredient modifications.
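The ingredient matching used for deletions and additions in Section 5.2 can be illustrated with a minimal sketch. It assumes plain overlapping 4-character n-grams on lowercased text; the function names and the `min_sim` cutoff of 0.3 are hypothetical choices for illustration, since the text does not state the threshold that was used.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=4):
    """Count overlapping character n-grams in a lowercased string."""
    s = text.lower().strip()
    if len(s) < n:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(c1, c2):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(c1[g] * c2[g] for g in c1.keys() & c2.keys())
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def best_match(phrase, ingredients, min_sim=0.3):
    """Return the recipe ingredient most similar to a review phrase, or None.

    min_sim is an assumed cutoff, not a value taken from the paper.
    """
    grams = char_ngrams(phrase)
    scored = [(cosine(grams, char_ngrams(ing)), ing) for ing in ingredients]
    sim, ing = max(scored, default=(0.0, None))
    return ing if sim >= min_sim else None
```

With such a matcher, a reviewer's “vanilla” would be resolved to “vanilla extract” when that ingredient is in the recipe, while a phrase such as “cucumber” that matches nothing in the ingredient list returns `None` and could therefore be treated as a valid addition.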
As might be expected, the more frequently an ingredient occurs in a recipe, the more times its quantity has the opportunity to be modified, as is evident in the strong correlation between the number of recipes the ingredient occurs in and both increases and decreases recommended in reviews. However, the more common an ingredient, the more stable it appears to be. Recipe frequency is negatively correlated with deletions/recipe (ρ = −0.22), additions/recipe (ρ = −0.25), and increases/recipe (ρ = −0.26). For example, salt is so essential, appearing in over 21,000 recipes, that we detected only 18 reviews where it was explicitly dropped. In contrast, Worcestershire sauce, appearing in 1,542 recipes, is dropped explicitly in 148 reviews. As might also be expected, additions are positively correlated with increases, and deletions with decreases. However, additions and deletions are very weakly negatively correlated, indicating that an ingredient that is added frequently is not necessarily omitted more frequently as well.

Table 1: Correlations between ingredient modifications

            # recipes   addition   deletion   increase
addition    0.41
deletion    0.22        -0.15
increase    0.61        0.79       0.09
decrease    0.68        0.11       0.58       0.39

Figure 5: Ingredient substitute network. Nodes are sized according to the number of times they have been recommended as a substitute for another ingredient, and colored according to their indegree.

5.3 Ingredient substitute network

Replacement relationships show whether one ingredient is preferable to another. The preference could be based on taste, availability, or price. Some ingredient substitution tables can be found online (e.g., http://allrecipes.com/HowTo/common-ingredient-substitutions/detail.aspx), but they are neither extensive nor do they contain information about the relative frequencies of each substitution.
Thus, we found an alternative source for extracting replacement relationships: users’ comments, e.g. “I replaced the butter in the frosting by sour cream, just to soothe my conscience about all the fatty calories”.

To extract such knowledge, we first parsed the reviews as follows: we considered several phrases to signal replacement relationships, such as “replace a with b”, “substitute b for a”, “b instead of a”, etc., and matched a and b to our list of ingredients.

We constructed an ingredient substitute network to capture users’ knowledge about ingredient replacement. This weighted, directed network consists of ingredients as nodes. We thresholded and eliminated any suggested substitutions that occurred fewer than 5 times. We then determined the weight of each edge by $p(b|a)$, the proportion of substitutions of ingredient a that suggest ingredient b. For example, 68% of substitutions for white sugar were to splenda, an artificial sweetener, and hence the assigned weight for the sugar → splenda edge is 0.68.

The resulting substitution network, shown in Figure 5, exhibits strong clustering. We examined this structure by applying the map generator tool by Rosvall et al. [13], which uses a random walk approach to identify clusters in weighted, directed networks. The resulting clusters, and their relationships to one another, are shown in Fig. 6. The derived clusters could be used when following a relatively new recipe which may not have received many reviews, and therefore not many suggestions for ingredient substitutions. If one does not have all ingredients at hand, one could examine the content of one’s fridge and pantry and match it with other ingredients found in the same cluster as the ingredient called for by the recipe. Table 2 lists the contents of a few such sample ingredient clusters, and Fig. 7 shows two example clusters extracted from the substitute network.

Table 2: Clusters of ingredients that can be substituted for one another. A maximum of 5 additional ingredients for each cluster are listed, ordered by PageRank.

main                 other ingredients
chicken              turkey, beef, sausage, chicken breast, bacon
olive oil            butter, apple sauce, oil, banana, margarine
sweet potato         yam, potato, pumpkin, butternut squash, parsnip
baking powder        baking soda, cream of tartar
almond               pecan, walnut, cashew, peanut, sunflower s.
apple                peach, pineapple, pear, mango, pie filling
egg                  egg white, egg substitute, egg yolk
tilapia              cod, catfish, flounder, halibut, orange roughy
spinach              mushroom, broccoli, kale, carrot, zucchini
italian seasoning    basil, cilantro, oregano, parsley, dill
cabbage              coleslaw mix, sauerkraut, bok choy, napa cabbage

Finally, we examine whether the substitution network encodes preferences for one ingredient over another, as evidenced by the relative ratings of similar recipes, one which contains an original ingredient, and another which implements a substitution. To test this hypothesis, we construct a “preference network”, where one ingredient is preferred to another in terms of received ratings, and is constructed by creating an edge (a, b) between a pair of ingredients, where a and b are listed in two recipes X and Y respectively, if recipe ratings $R_X > R_Y$. For example, if recipe X includes beef, ketchup and cheese, and recipe Y contains beef and pickles, then this recipe pair contributes to two edges: one from pickles to ketchup, and the other from pickles to cheese. The aggregate edge weights are defined based on PMI. Because PMI is a symmetric quantity ($\mathrm{PMI}(a; b) = \mathrm{PMI}(b; a)$), we introduce a directed PMI measure to cope with the directionality of the preference network:

$$\mathrm{PMI}(a \to b) = \log \frac{p(a \to b)}{p(a)\,p(b)},$$

where

$$p(a \to b) = \frac{\#\text{ of recipe pairs from } a \text{ to } b}{\#\text{ of recipe pairs}},$$

and $p(a)$, $p(b)$ are defined as in the previous section. We find high correlation between this preference network and the substitution network (ρ = 0.72, p < 0.001). This observation suggests that the substitute network encodes users’ ingredient preference, which we use in the recipe prediction task described in the next section.

6. RECIPE RECOMMENDATION

We use the above insights to uncover novel recommendation algorithms suitable for recipe recommendations. We use ingredients and the relationships encoded between them in ingredient networks as our main feature sets to predict recipe ratings, and compare them against features encoding nutrition information, as well as other baseline features such as cooking methods, and preparation and cook time.

[Figure 6 image: cluster nodes labeled by representative ingredients omitted.]

Figure 6: Ingredient substitution clusters. Nodes represent clusters and edges indicate the presence of recommended substitutions that span clusters.
Each cluster represents a set of related ingredients which are frequently substituted for one another.

[Figure 7 image: (a) milk substitutes; (b) cinnamon substitutes.]

Figure 7: Relationships between ingredients located within two of the clusters from Fig. 6.

Then we apply a discriminative machine learning method, stochastic gradient boosting trees [6], to predict recipe ratings. In the experiments, we seek to answer the following three questions. (1) Can we predict users’ preference for a new recipe given the information present in the recipe? (2) What are the key aspects that determine users’ preference? (3) Does the structure of ingredient networks help in recipe recommendation, and how?

6.1 Recipe Pair Prediction

The goal of our prediction task is: given a pair of similar recipes, determine which one has the higher average rating. This task is designed particularly to help users who have a specific dish or meal in mind and are trying to decide between several recipe options for that dish.

Recipe pair data. The data for this prediction task consists of pairs of similar recipes. The reason for selecting similar recipes, with high ingredient overlap, is that while apples may be quite comparable to oranges in the context of recipes, especially if one is evaluating salads or desserts, lasagna may not be comparable to a mixed drink. To derive pairs of related recipes, we computed the cosine similarity between the ingredient lists for the two recipes, weighted by the inverse document frequency, $\log(\#\text{ of recipes} / \#\text{ of recipes containing the ingredient})$.
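The IDF-weighted similarity just described can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes each recipe is a binary ingredient vector whose nonzero entries are replaced by the ingredient's IDF weight, and the function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def idf_weights(recipes):
    """Map each ingredient to log(# recipes / # recipes containing it)."""
    n = len(recipes)
    df = Counter(ing for r in recipes for ing in set(r))
    return {ing: log(n / count) for ing, count in df.items()}


def recipe_similarity(a, b, idf):
    """Cosine similarity of two ingredient sets under IDF weighting.

    Assumes binary presence vectors scaled by IDF (an assumption).
    """
    a, b = set(a), set(b)
    dot = sum(idf[i] ** 2 for i in a & b)
    norm_a = sqrt(sum(idf[i] ** 2 for i in a))
    norm_b = sqrt(sum(idf[i] ** 2 for i in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Under this weighting an ingredient that appears in every recipe receives weight log(1) = 0, so two recipes that share only such an ingredient have similarity 0, whereas sharing a rarer ingredient raises the score.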
We considered only those pairs of recipes whose cosine similarity exceeded 0.2. The weighting is intended to identify higher similarity among recipes sharing more distinguishing ingredients, such as Brussels sprouts, as opposed to recipes sharing very common ones, such as butter.

A further challenge to obtaining reliable relative rankings of recipes is variance introduced by having different users choose to rate different recipes. In addition, some users might not have a sufficient number of reviews under their belt to have calibrated their own rating scheme. To control for variation introduced by users, we examined recipe pairs where the same users are rating both recipes and are collectively expressing a preference for one recipe over another. Specifically, we generated 62,031 recipe pairs (a, b) where $\mathrm{rating}_i(a) > \mathrm{rating}_i(b)$, for at least 10 users i, and over 50% of users who rated both recipe a and recipe b. Furthermore, each user i should be an active enough reviewer to have rated at least 8 other recipes.

Features. In the prediction dataset, each observation consists of a set of predictor variables or features that represent information about two recipes, and the response variable is a binary indicator of which gets the higher rating on average. To study the key aspects of recipe information, we constructed different sets of features, including:

• Baseline: This includes cooking methods, such as chopping, marinating, or grilling, and cooking effort descriptors, such as preparation time in minutes, as well as the number of servings produced, etc. These features are considered as primary information about a recipe and will be included in all other feature sets described below.
• Full ingredients: We selected up to 1000 popular ingredients to build a “full ingredient list”. In this feature set, each observed recipe pair contains a vector with entries indicating whether an ingredient from the full list is present in either recipe in the pair.
• Nutrition: This feature set does not include any ingredients but only nutrition information such as the total caloric content, as well as quantities of fats, carbohydrates, etc.
• Ingredient networks: In this set, we replaced the full ingredient list by structural information extracted from different ingredient networks, as described in Sections 4 and 5.3. Co-occurrence is treated separately, both as a raw count and as complementarity, captured by the PMI.
• Combined set: Finally, a combined feature set is constructed to test the performance of a combination of features, including baseline, nutrition and ingredient networks.

To build the ingredient network feature set, we extracted the following two types of structural information from the co-occurrence and substitution networks, as well as the complement network derived from the co-occurrence information:

Network positions are calculated to represent how a recipe’s ingredients occupy positions within the networks. Such position measures are likely to inform if a recipe contains any “popular” or “unusual” ingredients. To calculate the position measures, we first calculated various network centrality measures, including degree centrality, betweenness centrality, etc., from the ingredient networks. A centrality measure can be represented as a vector $\vec{g}$ where each entry indicates the centrality of an ingredient. The network position of a recipe, with its full ingredient list represented as a binary vector $\vec{f}$, can be summarized by $\vec{g}^T \cdot \vec{f}$, i.e., an aggregated centrality measure based on the centrality of its ingredients.

Network communities provide information about which ingredient is more likely to co-occur with a group of other ingredients in the network. A recipe consisting of ingredients that are frequently used with, complemented by or substituted by certain groups may be predictive of the ratings the recipe will receive. To obtain the network community information, we applied latent semantic analysis (LSA) on recipes. We first factorized each ingredient network, represented by matrix $W$, using singular value decomposition (SVD). In the matrix $W$, each entry $W_{ij}$ indicates whether ingredient i co-occurs with, complements or substitutes ingredient j. Suppose $W_k = U_k \Sigma_k V_k^T$ is a rank-k approximation of $W$; we can then transform each recipe’s full ingredient list using the low-dimensional representation, $\Sigma_k^{-1} V_k^T \vec{f}$, as community information within a network. These low-dimensional vectors, together with the vectors of network positions, constitute the ingredient network features.

[Figure 8 plot: accuracy for the baseline, full ingredients, nutrition, ing. networks, and combined feature sets.]

Figure 8: Prediction performance. The nutrition information and ingredient networks are more effective features than full ingredients. The ingredient network features lead to impressive performance, close to the best performance.

Learning method. We applied discriminative machine learning methods such as support vector machines (SVM) [2] and stochastic gradient boosting trees [5] to our prediction problem. Here we report and discuss the detailed results based on the gradient boosting tree model. Like SVM, the gradient boosting tree model seeks a parameterized classifier, but unlike SVM, which considers all the features at one time, the boosting tree model considers a set of features at a time and iteratively combines them according to their empirical errors. In practice, it not only has competitive performance comparable to SVM, but can serve as a feature ranking procedure [11].
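The two kinds of ingredient network features described above, the aggregated centrality $\vec{g}^T \cdot \vec{f}$ and the projection $\Sigma_k^{-1} V_k^T \vec{f}$, can be sketched with NumPy. This is a minimal illustration under stated assumptions: the function names are hypothetical, `W` is a small dense matrix, and the top k singular values are assumed to be nonzero.

```python
import numpy as np


def network_position(centrality, recipe_vec):
    """Aggregate centrality g^T f over the ingredients present in a recipe."""
    return float(centrality @ recipe_vec)


def community_features(W, recipe_vec, k):
    """Project a recipe onto the top-k singular directions: Sigma_k^{-1} V_k^T f.

    Assumes the k largest singular values of W are nonzero.
    """
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (Vt[:k] @ recipe_vec) / s[:k]


# Toy network with two disconnected ingredient pairs (hypothetical data).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 2.],
              [0., 0., 2., 0.]])
degree = W.sum(axis=1)             # weighted degree as a simple centrality
f = np.array([1., 1., 0., 0.])     # recipe containing ingredients 0 and 1
position = network_position(degree, f)
features = community_features(W, f, k=2)
```

In a real setting `W` would be one of the co-occurrence, complement, or substitution matrices, and a sparse truncated SVD would likely be preferable to the dense decomposition used here.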
In this work, we fitted a stochastic gradient boosting tree model with 8 terminal nodes under an exponential loss function. The dataset is roughly balanced in terms of which recipe is the higher-rated one within a pair. We randomly divided the dataset into a training set (2/3) and a testing set (1/3). The prediction performance is evaluated based on accuracy, and the feature performance is evaluated in terms of relative importance [8]. For each single decision tree, one of the input variables, $x_j$, is used to partition the region associated with that node into two subregions in order to fit to the response values. The squared relative importance of variable $x_j$ is the sum of such squared improvements over all internal nodes for which it was chosen as the splitting variable, as:

$$\mathrm{imp}(j) = \sum_k \hat{\imath}_k^2 \, I(\text{splits on } x_j),$$

where $\hat{\imath}_k^2$ is the empirical improvement by the k-th node splitting on $x_j$ at that point.

[Figure 9 plot: importance by feature for the combined set. Legend: ing. networks (84%), nutrition (6.5%), cook effort (5.0%), cook methods (3.9%).]

Figure 9: Relative importance of features in the combined set. The individual items from nutrition information are very indicative in differentiating highly rated recipes, while most of the prediction power comes from ingredient networks.

[Figure 10 plot: importance by feature for the three networks. Legend: substitution (39.8%), co-occurrence (30.9%), complement (29.2%).]

Figure 10: Relative importance of features representing the network structure. The substitution network has the strongest contribution (39.8%) to the total importance of network features, and it also has more influential features in the top 100 list, which suggests that the substitution network is complementary to other features.

[Figure 11 plot: importance by feature for nutrition information. Legend: carbs (20.9%), calories (19.7%), cholesterol (17.7%), sodium (16.8%), fat (12.4%), fiber (12.3%).]

Figure 11: Relative importance of features from nutrition information. The carbs item is the most influential feature in predicting higher-rated recipes.

6.2 Results

The overall prediction performance is shown in Fig. 8. Surprisingly, even with a full list of ingredients, the prediction accuracy is only improved from .712 (baseline) to .746. In contrast, the nutrition information and ingredient networks are more effective (with accuracy .753 and .786, respectively). Both of them have much lower dimensions (from tens to several hundreds), compared with the full ingredients that are represented by more than 2000 dimensions (1000 ingredients per recipe in the pair). The ingredient network features lead to impressive performance, close to the best performance given by the combined set (.792), indicating the power of network structures in recipe recommendation.

Figure 9 shows the influence of different features in the combined feature set. Up to 100 features with the highest relative importance are shown. The importance of a feature group is summarized by how much of the total importance is contributed by all features in the set. For example, the baseline, consisting of cooking effort and cooking methods, contributes 8.9% to the overall performance. The individual items from nutrition information are very indicative in differentiating highly-rated recipes, while most of the prediction power comes from ingredient networks (84%).

Figure 10 shows the top 100 features from the three networks. In terms of the total importance of ingredient network features, the substitution network has a slightly stronger contribution (39.8%) than the other two networks, and it also has more influential features in the top 100 list.
This suggests that the structural information extracted from the substitution network is not only important but also complementary to information from other aspects.

Looking into the nutrition information (Fig. 11), we found that carbohydrates are the most influential feature in predicting higher-rated recipes. Since carbohydrates comprise around 50% or more of total calories, the high importance of this feature interestingly suggests that a recipe’s rating can be influenced by users’ concerns about nutrition and diet. Another interesting observation is that, while individual nutrition items are powerful predictors, a higher prediction accuracy can be reached by using ingredient networks alone, as shown in Fig. 8. This implies the information about nutrition may have been encoded in the ingredient network structure, e.g. substitutions of less healthful ingredients with “healthier” alternatives.

Constructing the ingredient network feature involves reducing high-dimensional network information through SVD, as described in the previous section. The dimensionality can be determined by cross-validation. As shown in Fig. 12, features with a very large dimension tend to overfit the training data. Hence we chose k = 50 for the reduced dimension of all three networks. The figure also shows that using the information about the complement network alone is more effective in prediction than using either the co-occurrence or substitute networks, even in the case of low dimensions. Consistently, as shown in terms of relative importance (Fig. 10), the substitution network alone is not the most effective, but it provides more complementary information in the combined feature set.

[Figure 12 plot: accuracy versus dimensions for the combined, substitution, complement, and co-occurrence networks.]

Figure 12: Prediction performance over reduced dimensionality. The best performance is given by reduced dimension k = 50 when combining all three networks. In addition, using the information about the complement network alone is more effective in prediction than using the other two networks.

In Figure 13 we show the most representative ingredients in the decomposed matrix derived from the substitution network. We display the top five influential dimensions, evaluated based on the relative importance, from the SVD resultant matrix $V_k$, and in each of these dimensions we extracted six representative ingredients based on their intensities in the dimension (the squared entry values). These representative ingredients suggest that the communities of ingredient substitutes, such as the sweet and oil substitutes in the first dimension or the milk substitutes in the second dimension (which is similar to the cluster shown in Fig. 6), are particularly informative in predicting recipe ratings.

To summarize our observations, we find we are able to effectively predict users’ preference for a recipe, but the prediction is not through using a full list of ingredients. Instead, by using the structural information extracted from the relationships among ingredients, we can better uncover users’ preference about recipes.

[Figure 13 heat map: representative ingredients against SVD dimensions, colored by value.]

Figure 13: Influential substitution communities. The matrix shows the most influential feature dimensions extracted from the substitution network. For each dimension, the six representative ingredients with the highest intensity values are shown, with colors indicating their intensity. These features suggest that the communities of ingredient substitutes, such as the sweet and oil in the first dimension, are particularly informative in prediction.

7. CONCLUSION

Recipes are little more than instructions for combining and processing sets of ingredients. Individual cookbooks, even the most expansive ones, contain single recipes for each dish. The web, however, permits collaborative recipe generation and modification, with tens of thousands of recipes contributed in individual websites. We have shown how this data can be used to glean insights about regional preferences and modifiability of individual ingredients, and also how it can be used to construct two kinds of networks, one of ingredient complements, the other of ingredient substitutes. These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.

In future work, we plan to extend ingredient networks to incorporate the cooking methods as well. It would also be of interest to generate region-specific and diet-specific ratings, depending on the users’ background and preferences.
A whole host of user-interface features could be added for users who are interacting with recipes, whether the recipe is newly submitted, and hence unrated, or whether they are browsing a cookbook. In addition to automatically predicting a rating for the recipe, one could flag ingredients that can be omitted, ones whose quantity could be tweaked, as well as suggested additions and substitutions.

8. ACKNOWLEDGMENTS

This work was supported by MURI award FA9550-08-10265 from the Air Force Office of Scientific Research. The methodology used in this paper was developed with support from funding from the Army Research Office, Multi-University Research Initiative on Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography. The authors gratefully acknowledge D. Lazer for support.

9. REFERENCES

[1] Ahn, Y., Ahnert, S., Bagrow, J., and Barabasi, A. Flavor network and the principles of food pairing. Bulletin of the American Physical Society 56 (2011).
[2] Cortes, C., and Vapnik, V. Support-vector networks. Machine learning 20, 3 (1995), 273–297.
[3] Forbes, P., and Zhu, M. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. Proceedings of Recommender Systems (2011).
[4] Freyne, J., and Berkovsky, S. Intelligent food planning: personalized recipe recommendation. In IUI, ACM (2010), 321–324.
[5] Friedman, J. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 4 (2002), 367–378.
[6] Friedman, J., Hastie, T., and Tibshirani, R. Additive logistic regression: a statistical view of boosting. Annals of Statistics 28 (1998), 2000.
[7] Geleijnse, G., Nachtigall, P., van Kaam, P., and Wijgergangs, L. A personalized recipe advice system to promote healthful choices. In IUI, ACM (2011), 437–438.
[8] Hastie, T., Tibshirani, R., Friedman, J., and Franklin, J. The elements of statistical learning: data mining, inference and prediction. The Mathematical Intelligencer 27, 2 (2005).
[9] Kamieth, F., Braun, A., and Schlehuber, C. Adaptive implicit interaction for healthy nutrition and food intake supervision. Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments (2011), 205–212.
[10] Kinouchi, O., Diez-Garcia, R., Holanda, A., Zambianchi, P., and Roque, A. The non-equilibrium nature of culinary evolution. New Journal of Physics 10 (2008), 073020.
[11] Lu, Y., Peng, F., Li, X., and Ahmed, N. Coupling feature selection and machine learning methods for navigational query identification. In CIKM, ACM (2006), 682–689.
[12] Rombauer, I., Becker, M., Becker, E., and Maestro, L. Joy of cooking. Scribner Book Company, 1997.
[13] Rosvall, M., and Bergstrom, C. Maps of random walks on complex networks reveal community structure. PNAS 105, 4 (2008), 1118.
[14] Shidochi, Y., Takahashi, T., Ide, I., and Murase, H. Finding replaceable materials in cooking recipe texts considering characteristic cooking actions. In Proc. of the ACM multimedia 2009 workshop on Multimedia for cooking and eating activities, ACM (2009), 9–14.
[15] Svensson, M., Höök, K., and Cöster, R. Designing and evaluating kalas: A social navigation system for food recipes. ACM Transactions on Computer-Human Interaction (TOCHI) 12, 3 (2005), 374–400.
[16] Ueda, M., Takahata, M., and Nakajima, S. User’s food preference extraction for personalized cooking recipe recommendation. Proc. of the Second Workshop on Semantic Personalized Information Management: Retrieval and Recommendation (2011).
[17] Wang, L., Li, Q., Li, N., Dong, G., and Yang, Y. Substructure similarity measurement in chinese recipes. In WWW, ACM (2008), 979–988.
[18] Wikipedia. Outline of food preparation, 2011. [Online; accessed 22-Oct-2011].
[19] Zhang, Q., Hu, R., Mac Namee, B., and Delany, S. Back to the future: Knowledge light case base cookery. In Proc. of The 9th European Conference on Case-Based Reasoning Workshop (2008), 15.