Poznan University of Technology
Faculty of Computer Science and Management
Institute of Computing Science

Doctoral dissertation

FITNESS-DISTANCE ANALYSIS FOR ADAPTATION OF A MEMETIC ALGORITHM TO TWO PROBLEMS OF COMBINATORIAL OPTIMISATION

Marek Kubiak

Supervisor: Andrzej Jaszkiewicz, PhD Dr Habil.

Poznań, 2009

Acknowledgement

I would like to thank Dr Andrzej Jaszkiewicz for introducing me to the area of metaheuristics and to the property of fitness-distance correlation, and also for his support during my years as a PhD student and for his example of a successful scientist and engineer, which is a source of constant inspiration for me. I deeply thank Professor Roman Słowiński for encouragement, sympathy and help.

I appreciate the help, support and presence of: Przemysław Wesołek, Dawid Weiss, Maciej Komosiński, Wojciech Kotłowski, Izabela Szczęch, Krzysztof Dembczyński, Jerzy Błaszczyński, Marcin Szeląg and the whole staff of the Laboratory of Intelligent Decision Support Systems.

Without the help and example of my teachers of mathematics, physics and literature at all levels of education, my knowledge and experience would have been much closer to ignorance. Therefore, I thank Ewa Krompiewska, Halina Szalaty, Alicja Borowska and Dr Grzegorz Kubski for their excellent work as teachers and guides.

Finally, I would like to thank my family for the enormous support and encouragement they offered me during these last 6 years of scientific work.

Extended abstract

Chapter 1. Introduction

Context of the research subject

Optimisation problems are frequently encountered in modern economy, administration and science. This is because decision makers are usually interested in assigning the available resources to tasks efficiently, or in solving entirely new problems, never considered before. Such problems may be problems of continuous or of combinatorial (discrete) optimisation. This dissertation deals with two combinatorial problems. They require that, in a solution space with a finite number of elements, a solution be found which is optimal in the sense of some given objective function. However, the fact that such a space is finite does not mean that it is small or that the optimum is easy to find in it. As practice shows, many combinatorial problems are hard to solve in reasonable time, which the theory of computational complexity captures by the notion of NP-hardness. As a result, such problems are often solved approximately, by means of heuristic and metaheuristic algorithms. The goal of these algorithms is to generate, in reasonable time, solutions of good, acceptable quality, though not necessarily optimal ones. This dissertation addresses precisely the application of one kind of metaheuristic, the memetic algorithm, to two specific problems.

Metaheuristic algorithms, however, are not algorithms sensu stricto. They are meta-algorithms: schemes of algorithms which first have to be adapted to the considered problem in order to work efficiently. This adaptation requires choosing or designing those components of the meta-algorithm which are not explicitly specified in it. And, as the literature of the field shows, this adaptation may influence the efficiency of the resulting algorithm very strongly. For this reason the adaptation of the components of a metaheuristic to a specific problem should be performed with care and be well justified. Unfortunately, clear guidelines on how to design these unspecified components of metaheuristics are currently missing.
At present, the adaptation of a metaheuristic to a problem is more an art than engineering knowledge. Only some preliminary guidelines can be found in the literature. These guidelines suggest analysing the set of solutions of the considered problem before an algorithm is constructed. Such an analysis should provide knowledge about properties of the problem which can later be exploited in the designed components. One such property is the correlation of fitness and distance of solutions (fitness-distance correlation, also called global convexity). It consists in the fact that the better the solutions of a given problem are, the closer they lie to one another in the sense of some problem-specific distance measure. Additionally, it assumes that the best solutions (including the optimum) are located somewhere 'in the middle' of this trend. It is currently believed that one way of exploiting this property of the solution space is, for instance, the construction of distance-preserving crossover operators for a memetic algorithm.

The idea of constructing such operators based on the property of global convexity is relatively new; it has been applied in only a few cases so far. Even the analysis of fitness-distance correlation itself has not yet been performed for many problems. The tools necessary for it, distance measures for solutions of combinatorial problems, are currently lacking. Therefore, this dissertation takes up the adaptation of a memetic algorithm, i.e. the construction of distance-preserving operators, on the basis of fitness-distance analysis. Such an analysis and adaptation had not been performed before for the vehicle routing problem and the car sequencing problem considered here.

Main assumptions and the hypothesis of the dissertation

The first assumption of this work is that the two considered optimisation problems possess the property of global convexity. The second assumption states that the presence of global convexity facilitates the adaptation of a memetic algorithm to a specific problem. The main hypothesis of this dissertation claims that the adaptation of a memetic algorithm to a given problem should consist in the construction of distance-preserving crossover operators, provided that the problem exhibits global convexity. In such a case the resulting algorithm will generate solutions at least not worse, or even better, than the same metaheuristic with operators of a different kind.

Goal of the dissertation

The goal of this work is to carry out and evaluate a scheme of adaptation of a memetic algorithm based on the analysis of global convexity. The scheme is carried out and evaluated on the two problems from the title: the capacitated vehicle routing problem and the car sequencing problem.

Published work

Some elements of this dissertation have already been described by the author in the following publications: Kubiak (2004), Jaszkiewicz et al. (2004), Kubiak (2005), Kubiak et al. (2006), Kubiak (2006), Kubiak (2007), Kubiak and Wesołek (2007).

Chapter 2. Metaheuristics in combinatorial optimisation

The short review of metaheuristics and their applications shows in more detail that these are not algorithms which can be directly applied to solve an arbitrary problem. They are rather general schemes of algorithms which have to be adapted to the specific problem at hand, i.e. components of the chosen metaheuristic have to be designed or selected. Table 1 presents examples of such components for the reviewed metaheuristics. In most cases (including the case of the memetic algorithm) clear design guidelines, which could help in the practice of applying metaheuristics, are missing.
What can often be found instead are general statements that problem-specific knowledge should be introduced into such an algorithm.

Table 1: Components of metaheuristics which require adaptation to a problem.

Local search: generation of initial solutions; neighbourhood operator(s); improvement rule.
Ant colony optimisation: definition of a solution component; randomised construction heuristic; pheromone trail update rule.
Hyperheuristic: definition of a solution component; set of low-level heuristics; high-level hyperheuristic.
Evolutionary algorithm: generation of initial solutions; representation of solutions; crossover operator(s); mutation operator(s).
Memetic algorithm: all components of local search; all components of an evolutionary algorithm.

Chapter 3. The No Free Lunch theorems and their consequences for optimisation

There are no general optimisation methods for arbitrary problems

The No Free Lunch (NFL) theorems (Wolpert & Macready 1997, Schumacher et al. 2001) imply that there are no general optimisation methods which would solve equally well all problems from a very broad class. This is chiefly because when an algorithm does not exploit any knowledge of the problem, which must be the case when a sufficiently broad class of problems is considered, the algorithm is in a sense 'blind' and operates on a problem enclosed in a 'black box'. In such a situation, according to these theorems, the efficiency of the algorithm equals the efficiency of random search, a very poor optimisation method. In practice, then, the essence of applying general optimisation methods, such as metaheuristics, is the escape from the black-box model. This is not possible through purely syntactic manipulation of the problem's solutions, without exploiting the problem's semantics in the algorithm. The escape is only possible when the metaheuristic is properly adapted to the problem and equipped with knowledge about it. Hence one should conclude that there are no metaheuristics which would always be better than others in solving arbitrary optimisation problems. There are only better or worse adaptations of metaheuristics to specific problems.

Exploiting knowledge about the structure of the problem's space

The NFL theorems also indicate indirectly that there must be some kind of regularity in the solution space of the considered optimisation problem if some algorithm is to perform better on it than random search. Moreover, the mere presence of this regularity is not enough: it has to be known and directly exploited in the employed algorithm. In the case of metaheuristics this means that the general scheme of the algorithm, usually left unmodified in practice, must be filled with components equipped with knowledge about the problem they are applied to. For example (see Table 1), in local search appropriate neighbourhood operators have to be designed, and in an evolutionary algorithm the crossover and mutation operators.

What is the regularity of a space?

It is hard, however, to answer unambiguously the question of what constitutes the regularity of a problem's solution space and how to exploit it in the adaptation of an algorithm. According to some guidelines, such a regularity may be anything that accelerates the search of the space, e.g. strong locality of neighbourhood operators, which allows the objective function to be computed faster for neighbours than for arbitrary solutions.
According to others, it may be the existence of a fast procedure computing a lower bound on the objective function over some subspace of the problem's solutions. Thanks to such a procedure, the size of the searched space can often be considerably reduced.

Consequences for evolutionary algorithms

In the light of the NFL theorems, 'belief in using the evolutionary algorithm as a "blind" optimisation tool is misplaced'¹ and 'there is no reason to believe that a genetic algorithm will be generally more useful than any other approach to optimisation' (Culberson 1998). And yet many enthusiasts of neo-Darwinism would say that the results of natural evolution are the best proof that evolutionary processes based on the survival of the fittest give good results in practice: it is precisely such processes that have led to the emergence of complex organisms, perfectly adapted to their environments. But, as Culberson (1998) aptly puts it, 'the mere fact of natural evolution does not indicate where the areas of application of an evolutionary algorithm might be, and certainly gives no grounds to claim that this algorithm is a universal optimisation tool'. Mühlenbein (2003) holds a similar view: 'I am against popular arguments of the kind: this algorithm is a good optimisation method because its foundations are used in nature'. Reeves and Rowe (2003) also examined this issue closely and observe in the first chapter of their book that:
• neo-Darwinism is an attractive theory, and it often suffices to invoke the theory of evolution and the name of Darwin to justify the generality of evolutionary algorithms as optimisation methods;
• however, in many cases the mechanisms of evolution in nature are not yet well known and explained; there is instead much speculation about them, without solid evidence;
• it is likely that natural evolution does not optimise any objective function, or at least such an objective has not yet been identified; a justification for applying evolutionary algorithms to optimisation that starts from natural evolution is therefore missing.

Consequently, it has to be stated that the evolutionary algorithm (and with it the memetic one) is nothing more than an abstract mathematical construct, a certain optimisation scheme, which perhaps has little in common with evolution in nature. It is the adaptation of this algorithm to a specific problem that is the basis of success or failure in optimisation, as the No Free Lunch theorems indicate. For these reasons the adaptation of a certain kind of evolutionary algorithm is the main subject of this dissertation.

¹ All translations of quotations from English in this abstract are by the author of the dissertation.

Chapter 4. Methods of adaptation of an evolutionary algorithm to a combinatorial optimisation problem

The review of adaptation methods performed in this chapter leads to one basic conclusion: an evolutionary (and memetic) algorithm contains several components which have to be adapted to the considered problem before the algorithm is used, yet there are many possible choices for this adaptation, and practical guidelines as to which of them to apply and under what conditions are usually hard to find in the literature. Certain exceptions are the recently emerging guidelines based on the analysis of landscape ruggedness or of the global convexity of the fitness landscape. Indeed, serious works on evolutionary algorithms and metaheuristics complain about this state of affairs.
Michalewicz and Fogel (2000) admit that the theoretical foundations for designing hybrid evolutionary algorithms (e.g. memetic ones) are scarce. Hoos and Stutzle, in the epilogue of their book (Hoos & Stutzle 2004), summarise the current state in this way: many works on the design and application of metaheuristic algorithms resemble art rather than science (...); experience, more than understanding, is often the key to achieving the intended goals. Krasnogor and Smith (2005) take a similar view, admitting that 'the process of designing effective memetic algorithms is currently performed ad hoc and is often hidden behind problem-specific details'.

This large amount of intuition and experience needed to design good memetic algorithms (and, more generally, metaheuristics) is frequently the basis for criticism of these algorithms and for the opinion that their design is not founded on a systematic, scientific approach. For this reason, renowned authors in the field consider the explanation and prediction of the efficiency of evolutionary algorithms to be among the most important issues in the theory of computation (Hoos & Stutzle 2004, Moscato & Cotta 2003, Reeves & Rowe 2003). Moreover, in their opinion, research on this issue will most likely result in a better understanding of the relationships between the properties of combinatorial problems and of metaheuristic algorithms. This may, in turn, lead to stronger foundations for applying these algorithms in practice.

From this perspective, the most interesting ways of adapting a memetic algorithm presented in this chapter are those based on analyses of the searched solution space. These analyses examine the ruggedness or the global convexity of the fitness landscape. The author of this dissertation took up the issue of global convexity because, in the past, it was precisely memetic algorithm designs based on this property of a problem that led to good optimisation results (Galinier & Hao 1999, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000). Also because 'a systematic method for designing proper crossover and mutation operators would be very helpful' (Reeves & Rowe 2003, page 283), and the exploitation of global convexity may lead exactly to the design of such operators. By taking up this issue the author continues research conducted earlier by such scientists as: Kirkpatrick and Toulouse (1985), Mühlenbein (1991), Boese et al. (1995, 1994), Jones and Forrest (1995), Altenberg (1997), Merz (2000), Watson et al. (2003), Jaszkiewicz and Kominek (1999, 2003).

Chapter 5. Fitness-distance analysis of the solution space of a combinatorial problem as a basis for the adaptation of a memetic algorithm

Fitness-distance analysis of the fitness landscape

The fitness landscape L for an instance I of a combinatorial optimisation problem π is a triple L = (S, f, N), where S = S_π(I) is the set of solutions of instance I, f is the objective function, and N is the neighbourhood function on the set S, i.e. N : S → 2^S. Instead of the neighbourhood function, a distance function on solutions, d : S × S → R, may be used, which gives the same result. Fitness-distance analysis (FDA) searches the fitness landscape for a relationship between the quality of solutions and their distance to the target of the search, the global optimum.
So far it has most often been performed as a statistical analysis of a sample of good solutions, summarised by the fitness-distance correlation coefficient. For a maximisation problem the desired result of the analysis is a negative correlation, i.e. a decreasing distance of solutions to the global optimum as the value of the objective function increases. In such a case the search of the space by selection-based algorithms (e.g. evolutionary ones) should be easy, since there exists a path to the optimum through solutions of increasing fitness (Merz 2000).

The basic version of the analysis of global convexity requires at least one global optimum of the analysed problem instance to be known. First, a large random sample of good solutions of this instance is generated. A single solution is generated independently of the others by drawing a random starting point in the landscape and applying some randomised local search algorithm to it. In this way the sample contains random local optima. For each element s of the sample, the value of the objective function f(s) and the distance to the nearest global optimum d_opt(s) are computed. Also, for each pair of solutions in the sample, s_1, s_2, their mutual distance d(s_1, s_2) is computed.

Based on these measurements, the distribution of the mutual distance between solutions is assessed first. The average distance between local optima (the elements of the sample) is computed as:

$$\bar{d} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(s_i, s_j)$$

This value is compared to the average distance between random solutions or to the analytically computed diameter of the landscape. In this way one can answer the question whether the local optima of the examined instance are clustered in some fragment of the landscape or rather scattered across all of it. If the local optima are clustered, it means that they usually share many common features, which may be exploited in the design of crossover operators.

The second element of FDA is the assessment of the strength of the fitness-distance relationship in the collected sample. To this end, the value of the linear correlation coefficient between fitness and distance (the fitness-distance correlation, FDC) is computed. For a sample of the form s = {s_1, ..., s_N}, the FDC is computed as:

$$r = \frac{\operatorname{cov}(f, d_{opt})}{s_f \cdot s_{d_{opt}}}$$

where cov denotes the sample estimate of the covariance of the two variables:

$$\operatorname{cov}(f, d_{opt}) = \frac{1}{N} \sum_{i=1}^{N} \left(f(s_i) - \bar{f}\right)\left(d_{opt}(s_i) - \bar{d}_{opt}\right)$$

$\bar{f}$ and $\bar{d}_{opt}$ are the sample means of fitness and distance, and s is the sample standard deviation of the respective variable, e.g.:

$$s_{d_{opt}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(d_{opt}(s_i) - \bar{d}_{opt}\right)^2}$$

High positive values of the FDC (for a minimisation problem) suggest that the local optima of the instance are distributed around the global optimum, which lies at the centre. Moreover, the worse the local optima are, the further from this centre they lie (Reeves & Yamada 1998). According to Jones and Forrest (1995), such a fitness landscape is easy for genetic algorithms. An FDC equal to 1 would indicate a linear relationship between fitness and distance, and hence an easy search (Merz 2000). Negative values of r, in turn, reveal a deceptive problem, where better and better solutions lie further and further from the target of the search. Values around zero indicate no relationship between fitness and distance in the landscape, i.e. no guidance from the objective function during the search. A zero correlation may also indicate a nonlinear relationship which is not well captured by the linear correlation coefficient.
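The statistics above are straightforward to compute. A minimal sketch in Python with NumPy follows; the language, the function names and the matrix-based input are illustrative assumptions of this summary, not part of the dissertation:

```python
import numpy as np

def mean_pairwise_distance(D):
    """Average mutual distance of sampled local optima.

    D is an n x n symmetric matrix with D[i, j] = d(s_i, s_j);
    computes d_bar = 2 / (n (n - 1)) * sum over pairs i < j of d(s_i, s_j).
    """
    n = D.shape[0]
    i_upper = np.triu_indices(n, k=1)   # indices of all pairs with i < j
    return 2.0 * D[i_upper].sum() / (n * (n - 1))

def fdc(fitness, dist_to_opt):
    """Fitness-distance correlation r = cov(f, d_opt) / (s_f * s_d_opt)."""
    f = np.asarray(fitness, dtype=float)
    d = np.asarray(dist_to_opt, dtype=float)
    cov = np.mean((f - f.mean()) * (d - d.mean()))  # 1/N covariance estimate
    return cov / (f.std() * d.std())                # std() also uses 1/N here
```

For a minimisation problem, values of r close to 1 computed this way correspond to the globally convex landscapes discussed above; the scatter plot described next guards against nonlinear relationships that this single number would miss.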
For this reason it is worth inspecting the fitness-distance relationship on a scatter plot. Each element of the sample then constitutes a single point on a plot with axes f(s) and d_opt(s). Examples of such plots, taken from actual analyses of global convexity, are shown in Figure 1. For a minimisation problem the desired outcome is a trend in the observed set of points: as the value of the objective function decreases (horizontal axis), the distances to the optimum (vertical axis) should decrease as well.

Figure 1: Examples of fitness-distance scatter plots for two instances of the CVRP, with added regression lines (vertical axes: the distances d_e and d_pn). For the plot on the left r = 0.54; on the right r = 0.52.

Exploitation of global convexity in distance-preserving crossover operators

According to Merz (2000) and Jaszkiewicz and Kominek (2003), a memetic algorithm should employ a distance-preserving (respectful) crossover operator in the case of a globally convex fitness landscape. Such an operation preserves unchanged in the offspring the common features of the parent solutions, which makes the distance of the offspring to its parents no larger than the distance between the parents themselves. The goal of such a design is precisely that the offspring should not be arbitrarily distant from its parents; this distance should be controlled and dependent on the distance between the parents.

Distance-preserving crossover operators are useful when good solutions of the considered problem are clustered in a small fragment of the landscape. Then the offspring inherit many features common to their parents and the computation is intensified at a small distance from them, in exactly this small fragment. If, in addition, global convexity has been observed in this landscape, then an offspring which preserves the common features of its parents has a greater chance of a better objective value than an offspring which does not preserve them (Merz 2000, Mattfeld et al. 1999). Examples of such effective operators can be found in the literature (Merz 2000, Merz 2002, Merz & Freisleben 2000b, Merz & Freisleben 2000a, Jaszkiewicz & Kominek 2003, Jaszkiewicz 2004).

These literature examples and the positive results of his own experiments led Jaszkiewicz (2004) to formulate a scheme of adaptation of a memetic algorithm to a combinatorial optimisation problem based on the analysis of global convexity. The scheme consists of the following steps:
1. Generate sets of good and diverse solutions for the considered instances of the problem.
2. Formulate hypotheses about the features of solutions which might be important for good solutions of this problem.
3. For each feature and each instance, examine the importance of the feature for solution quality by means of fitness-distance correlation (i.e. perform the analysis of global convexity). The distance is defined so as to reflect the considered features.
4. Design a crossover operator preserving distance, i.e. the common features of solutions, provided that fitness-distance correlation has been observed for the given feature. One operator may preserve many features of different kinds (a sketch of the general shape of such an operator is given below).

The main goal of this adaptation scheme is to reduce the design effort needed to obtain a good optimisation algorithm. The reduction is achieved by avoiding trial-and-error design of operators, which happened in the past e.g. for the travelling salesman problem (see the remarks in Jaszkiewicz and Kominek (2003)).
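The general shape of a distance-preserving operator produced by step 4 can be sketched as follows; the set-of-features encoding and the problem-specific completion procedure are assumptions made for illustration only, not an operator from the dissertation:

```python
import random

def respectful_crossover(parent_a, parent_b, complete):
    """Sketch of a distance-preserving (respectful) crossover.

    parent_a, parent_b: solutions encoded as sets of features
    (e.g. edges of routes).  complete: a problem-specific procedure
    that extends a partial solution to a feasible one without
    removing the features it already contains.
    """
    common = parent_a & parent_b   # features shared by both parents
    offspring = set(common)        # every common feature is preserved
    # The remaining material comes from the parents' non-common features,
    # considered in random order; 'complete' keeps only what stays feasible.
    rest = list((parent_a | parent_b) - common)
    random.shuffle(rest)
    return complete(offspring, rest)
```

If the completion procedure uses only parental features, the offspring's distance to each parent, measured on those features, does not exceed the parents' mutual distance, which is exactly the property the scheme asks for.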
The use of the analysis of global convexity allows the designer to discover those features of solutions which should be preserved during the search in order to obtain good quality. In this way the necessary knowledge about the structure of the problem is introduced into a component of the memetic algorithm.

Conclusions

Fitness-distance analysis examines a very interesting aspect of the solution space of an optimisation problem: the relationship between the quality of solutions and their distance to one another or to the global optimum. If such a relationship exists, expressed by a positive fitness-distance correlation for a minimisation problem, it means that better and better solutions of the problem lie closer to one another and, at the same time, closer to the optimum. This relationship justifies introducing distance-preserving components into metaheuristics in order to increase the efficiency of the search. Such an idea, that better algorithm components can be designed on the basis of some distance measure between solutions of a combinatorial problem, had not been proposed before.

It has to be admitted, however, that FDA is not yet a fully developed method of analysis:
• a proper mathematical model of global convexity is missing,
• the rules for interpreting the value of the correlation coefficient are to some extent arbitrary,
• the result of the analysis depends on the instance and may be ambiguous for the problem as a whole,
• there exist different versions of the analysis procedure, with different properties,
• knowledge of global optima is required to obtain a correct result,
• without knowledge of global optima the result of the analysis is only approximate,
• there are theoretical arguments against the sensibility of the FDC as a good indicator of the difficulty of a problem instance.

Despite these arguments against FDA, it should be said that the analyses performed so far have clearly demonstrated the existence of such a phenomenon as global convexity in the spaces of several optimisation problems. It is therefore hard to deny the very existence of the phenomenon, even if the method of examining it is not yet good enough. In the author's opinion it is very likely that the phenomenon of global convexity also exists in other, not yet examined problems.

Looking from the other side, that of adaptation, one may have doubts about the existence of a relationship between the global convexity of a problem instance and the difficulty of this instance for a memetic algorithm (Bierwirth et al. 2004, Hoos & Stutzle 2004). The existence of this relationship has so far been confirmed only qualitatively, e.g. by the works cited in the dissertation. A quantitative expression of this or a similar relationship is still hard to find in the literature. There is a pioneering work by Watson et al. (2003), but it concerns the job shop scheduling problem and a tabu search algorithm. It appears, then, that much remains to be done to explain the existence of such a relationship and to demonstrate that it is strong.

Many researchers believe, however, that such a relationship exists. One of them is the author of this dissertation. He believes that there is a strong relationship between global convexity and the efficiency of a memetic algorithm with distance-preserving operators. This conviction is based primarily on the results of the earlier works of Boese et al. (1994), Reeves and Yamada (1998), the series of works by Merz (2000, 2004), Jaszkiewicz (1999, 2004) and also Kominek (2003). These works convince the author that designing distance-preserving operators for a memetic algorithm gives, in the case of global convexity, good optimisation results.
Chapter 6. The capacitated vehicle routing problem

This chapter describes the capacitated vehicle routing problem (CVRP). The problem consists in determining routes for the vehicles of a transportation company. The company has to deliver some goods from its depot to a number of customers. The customers are geographically dispersed, and the relevant distances between the customers and the depot are given in advance. Each customer has placed some demand for the goods in advance. The company owns identical vehicles with a given transport capacity constraint. The route of each vehicle starts at the company's depot, visits some customers, where the required amount of goods is unloaded, and then returns to the depot. At no point may a vehicle be overloaded, i.e. its capacity must not be exceeded. Each customer is served by exactly one vehicle. The company's goal is to determine a set of routes which ensures that all customers are served and minimises the total length of all routes. This total length of the whole set of routes models the cost of all deliveries performed by the company. The problem is computationally hard (NP-hard). Even for fairly small instances with 75 customers the global optima remain unknown.
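To make this formulation concrete, the following minimal Python sketch evaluates a candidate CVRP solution; the representation of a solution as a list of routes, and all identifiers, are assumptions of this illustration rather than the dissertation's implementation:

```python
def route_length(route, dist, depot=0):
    """Length of one route: depot -> customers in the given order -> depot."""
    tour = [depot] + route + [depot]
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

def evaluate_cvrp(routes, dist, demand, capacity, depot=0):
    """Total length of a solution, or None if a constraint is violated.

    routes: list of routes (lists of customer indices); dist: distance
    matrix; demand: dict mapping each customer to its demand;
    capacity: the common capacity of the identical vehicles.
    """
    served = [c for route in routes for c in route]
    if sorted(served) != sorted(demand):            # each customer exactly once
        return None
    for route in routes:
        if sum(demand[c] for c in route) > capacity:
            return None                             # vehicle would be overloaded
    return sum(route_length(route, dist, depot) for route in routes)
```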
Review of heuristics and metaheuristics

The author reviewed some of the existing heuristic and metaheuristic algorithms for this problem. The review shows that metaheuristic algorithms for the CVRP are rather complicated constructions, with many components, acceleration techniques, diversification strategies, and even procedures solving certain subproblems exactly. Nevertheless, some common ideas can be found in these metaheuristics, in particular in the evolutionary algorithms.

As far as local search is concerned, it is clearly necessary in an efficient adaptation of a metaheuristic to the CVRP. The best algorithms to date rely heavily on such search. Moreover, it has to be fast. In most of the presented algorithms it is local search that takes the largest share of computation time, so its design strongly influences the total time. The best metaheuristic designs also strive to accelerate local search by means of various techniques.

Specialised solution representations and recombination operators

The reviewed publications show that specialised representations and crossover and mutation operators are necessary for the CVRP. Several approaches are visible in these publications. One approach is represented by the works of Rochat and Taillard (1995) and Potvin and Bengio (1996). These researchers had the intuition that a crossover operator should create an offspring by inheriting whole routes from the parents. In the case of Rochat and Taillard this intuition was based on a visual analysis of the similarity of good heuristic solutions of the CVRP to the best-known solution. Hence their procedure for constructing an offspring: it receives as input a large number of complete routes taken from good solutions generated earlier and attempts, in a sense, to assemble an offspring from these routes. The RBX crossover of Potvin and Bengio likewise attempts to transfer complete routes from the parents to the offspring. The crossover proposed by Tavares et al. (2003) works very similarly.

These designs, however, were not supported by any analyses of the solution space which could confirm this intuition about the similarity of routes. Another approach to operator design was probably based on the similarity between the CVRP and the travelling salesman problem (TSP). After minor adaptation, operators characteristic of the TSP were also used for the CVRP (Gendreau et al. 2002). Recently, various edge recombination operators (edge recombination, edge-assembly crossover) have also been adapted to the CVRP (Alba & Dorronsoro 2004, Alba & Dorronsoro 2006, Nagata 2007). Such operators transfer certain route edges from the parents to the offspring. This is most likely a sensible strategy for the travelling salesman problem, since analyses of the fitness landscape of that problem indicated that it is the edges that carry the important information about solution quality. And although the similarity of the TSP and the CVRP suggests that the same may hold for the CVRP, no such analyses had been performed for the vehicle routing problem.

A separate case is the SPX operator of Prins (2001), recently also applied by Jozefowicz et al. (2007). It is in fact an order crossover applied to a permutation representation. It relies on an additional exact procedure decoding a permutation into a CVRP solution. This approach with exact decoding considerably reduces the size of the search space, since there are usually fewer permutations than permutations with splits (which is what CVRP solutions normally are). However, the use of order crossover on permutations was justified by Prins with optimisation results, not with analyses of the solution space; this order-based operator was chosen 'after some preliminary tests' (Prins 2001).

It appears, then, that the existing designs of recombination operators for the CVRP were based mainly on good intuition, on the similarity of the CVRP and the TSP, and on the results of preliminary computational experiments. The sensibility of the operators was later verified in further optimisation experiments. In none of the presented cases, however, was it based on theoretical or empirical analyses of the solution space of the vehicle routing problem. The author of this dissertation has not seen in the literature, for example, any analysis of global convexity for the CVRP, although the solution distance measures necessary to carry out such an analysis were proposed quite recently. The systematic construction of crossover operators for the CVRP based on the analysis of global convexity, which is the main subject of the next chapter, is therefore the first approach to crossover design relying on an analysis of the fitness landscape.

Chapter 7. Adaptation of the memetic algorithm to the vehicle routing problem based on the analysis of global convexity

This chapter presents the adaptation of the memetic algorithm to the vehicle routing problem. The chapter shows that this adaptation requires a number of design decisions. In particular, the author presented the chosen representation, the design of a fast local search, and the choice of heuristics for generating initial solutions. More importantly, the chapter presents the systematic construction of crossover operators based on the analysis of global convexity. It is in this work that such an analysis of the vehicle routing problem was performed for the first time. First, the author proposed and implemented distance measures for solutions of this problem; one of them, the edge-based d_e, is sketched below. He then used these measures in the fitness-distance analysis.
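Treating a solution as a set of undirected edges and normalising by the larger edge set are assumptions of this sketch, and need not match the exact definition used in the dissertation:

```python
def edge_set(routes, depot=0):
    """All undirected edges of a CVRP solution, including the depot legs."""
    edges = set()
    for route in routes:
        tour = [depot] + route + [depot]
        edges.update(frozenset(pair) for pair in zip(tour, tour[1:]))
    return edges

def edge_distance(sol_a, sol_b, depot=0):
    """Distance of two solutions: the fraction of edges they do not share."""
    ea, eb = edge_set(sol_a, depot), edge_set(sol_b, depot)
    return 1.0 - len(ea & eb) / max(len(ea), len(eb))
```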
The analysis showed that the average distances d_e, d_pn, d_eu, d_ear between local optima are about 30% smaller than the average distances between random solutions. This means that local optima are similar to one another in terms of the features measured by these kinds of distance and that they are, to some extent, concentrated in the examined landscapes. Moreover, the FDA demonstrated moderate values of the correlation between fitness and the distances d_e, d_pn, d_eu, which means that the better the local optima are, the more features (edges, clusters, subsequences of customers) they have in common. Example fitness-distance plots for the CVRP, showing the existence of this correlation, are presented in Figure 1. The presence of this correlation depends, however, on the analysed problem instance.

These FDA results confirm, to some extent, the research intuition expressed earlier e.g. by Rochat and Taillard (1995) that good CVRP solutions are similar to the best-known ones. In their work this intuition was based on the visual similarity of solutions. In this work, in contrast, it was expressed objectively in the form of certain solution distance measures and analysed empirically. It appears, however, that their intuition was true only to some extent (given the moderate correlation values) and not for all instances.

The same FDA results were then the basis for the author's design and implementation of four distance-preserving operators: CPX2, CEPX, CECPX2, GCECPX2. The first operator preserves d_pn, the second d_e, and the last two preserve both these distance measures. Additionally, a mutation operator CPM was designed, which preserves the distance d_pn but perturbs d_e. These operators, together with RBX and SPX taken from the literature, were tested in two experiments with the memetic algorithm. The results of the experiments allow the following conclusions to be drawn.

• In the experiment with short runs of 256 seconds each, the speed of the crossover operation and of the local search that follows it turned out to be important. RBX proved faster than all the operators designed by the author, and SPX faster than some of them. In addition, SPX and RBX gave better results in this experiment, which was confirmed by statistical tests.
• However, the distance-preserving operator CEPX exhibited a higher probability of generating ever better solutions in the memetic algorithm than RBX and SPX. This could be observed in the number of generations of the algorithm in the convergence experiment. Moreover, statistical tests showed that it was CEPX that generated solutions of the best quality in this experiment.
• The presence of the CPM mutation in the memetic algorithm had a positive influence on the quality of the generated solutions. This mutation was most needed by the operators that strongly disrupt the non-common features of the parents (e.g. CPX2), and least by SPX.
• The results of the memetic algorithm were very good. The average gap between the quality of the generated solutions and the quality of the best-known ones amounted to 0.5–0.7% for all kinds of crossover, with little variance. Moreover, for half of the analysed instances some runs of the algorithm found those best-known solutions. The best results in terms of quality were obtained by the memetic algorithm with the operators CEPX and CPM, both designed by the author.

According to the author, the presented experimental results mean that a good crossover operator for the CVRP should preserve the common features of the parents (edges, clusters) and, in addition, should not strongly disrupt the non-common ones.
When high speed of the operator is required, a subprocedure of greedy selection of the non-common features may be included in it (as in GCECPX2). This results in smaller disruption of these features and reduces the number of local search iterations after crossover. Of the operators tested in this dissertation, the author would choose Prins's SPX (2004) for short runs of the memetic algorithm. Under such conditions this operator generates good solutions for the CVRP quite quickly. When more computation time can be spent and the quality of the obtained solutions matters more, the author would recommend the edge-preserving crossover CEPX together with the mutation CPM, which preserves clusters while perturbing edges. It was this pair that generated the best solutions in the memetic algorithm.

The author also attempted to analyse a direct relationship between the solution quality obtained in one computational experiment and the results of the analysis of global convexity. This attempt failed: no relationship of this kind was found. Thus it was not possible to confirm the earlier claims of other authors that fitness-distance correlation may serve to predict the difficulty of a given problem for an evolutionary algorithm.

The quality of solutions generated by the memetic algorithm turned out to be related, to some degree, to the percentage of feasible neighbours of the best-known solutions: the smaller the fraction of feasible neighbours, the harder it is for the memetic algorithm to reach good solutions. It may therefore be worthwhile to also admit infeasible solutions into the population of this algorithm, in order to improve the exploration of the boundary between feasible and infeasible solutions. The memetic algorithm used in this work admitted only feasible solutions into the population.

In summary, in the author's opinion the method of systematic construction of recombination operators based on the analysis of global convexity gave a good result for the considered vehicle routing problem. The best of the designed operators, the edge-preserving CEPX, used together with the mutation CPM, generates the best solutions in terms of quality among all the tested operators.

Chapter 8. The car sequencing problem in Renault's factories

The car sequencing problem in Renault's factories (CarSP) requires that a given set of cars be arranged in some order (a sequence) on the production line. This order must respect the technological constraints of the production process at its individual stages: in the paint shop and in the assembly shop (the tasks of the first stage, the body shop, are not considered in this problem). The paint shop requires cars of the same body colour to be placed directly one after another in the sequence. Such an arrangement minimises the cost of cleaning the spray guns in the paint shop, since they have to be cleaned after every change of car colour on the line. The requirement of the assembly shop is, above all, an even distribution over the sequence of the workload needed to assemble the cars. This workload is related to the need to install certain additional elements (options) in the vehicles, e.g. a sunroof, a navigation system, electric windows. There are many required options, and the cars on the production line often require different sets of options. Hence the workload at certain stations along the line may vary very unevenly, depending precisely on the order of the cars. This can lead to frequent delays and production stoppages, and constitutes an additional production cost.
This cost is modelled by the number of violations of certain ratio constraints imposed on the options of the vehicles in the sequence. Ultimately, an optimal solution of the car sequencing problem minimises the weighted sum of the number of colour changes (paint shop) and the numbers of ratio constraint violations (assembly shop), while respecting the required production constraints. Practical instances of this problem are computationally hard (NP-hard) because of the ratio constraints. Minimising the number of colour changes in the paint shop alone, in contrast, is an easy problem.

Review of heuristics and metaheuristics

A review of several existing algorithms for the car sequencing problem leads to the following observations.

Local search is essential. It appears that local search is the basis of efficient algorithms for this problem. The best existing methods rely precisely on such search. Furthermore, all the reviewed algorithms use some subset of the same set of neighbourhood operators: swap, insert, reflection, random shuffle. These are not operators designed specifically for the CarSP; they are rather general operations which have already been applied to many problems. It is not entirely clear which of these general operators, or which combination of them, performs best for the CarSP and why. It is rather the speed of computing the objective value of neighbouring solutions that is the key to success when choosing an operator.

Good initial heuristics. Another element common to the reviewed algorithms is their use of the same good heuristic idea of Puchta and Gottlieb (2002): the dynamic sum of utilities (DSU). Besides its authors, it is also used by Estellon et al. (2006), while Zinflou et al. (2007) use it as an element of their NCPX crossover.

What besides local search? The best of the presented algorithms did not depart far from the idea of local search: Ribeiro et al. (2005) simply iterated it, employing certain perturbation (mutation) mechanisms; Estellon et al. (2006) additionally used a special k-permutation operator. Algorithms of other kinds are rare.

Recombination operators? It is rather hard to find good crossover operators for the CarSP. Very recently, Zinflou et al. (2007) made exactly this observation, which motivated their 3 new proposals of such operators. Apart from these, the classical one-point crossover and an adapted uniform crossover (uniform adaptive crossover, UAX) have been applied to the CarSP. However, the designs of these operators were motivated primarily by intuition, by certain heuristic ideas (DSU), or by experience with other problems. None of the cited authors attempted to discover, theoretically or empirically, what kind of information is most important for good quality of CarSP solutions. For this reason, an empirical analysis of global convexity was undertaken in this dissertation. Its results are then the basis for proposing the design of operators for the memetic algorithm, in accordance with the scheme of their systematic construction.
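Before moving on to that adaptation, the weighted-sum objective described in this chapter can be summarised in a short Python sketch; the encoding of cars as dictionaries and the sliding-window form of the ratio constraints (at most p cars with a given option in any q consecutive cars) are assumptions of this illustration, not Renault's exact specification:

```python
def colour_changes(seq):
    """Paint shop cost: the number of colour changes along the sequence."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a["colour"] != b["colour"])

def ratio_violations(seq, option, p, q):
    """Assembly shop cost for one option: in every window of q consecutive
    cars at most p may require the option; each excess car is a violation."""
    violations = 0
    for i in range(len(seq) - q + 1):
        with_option = sum(1 for car in seq[i:i + q] if option in car["options"])
        violations += max(0, with_option - p)
    return violations

def carsp_objective(seq, ratio_constraints, w_paint, w_assembly):
    """Weighted sum of colour changes and ratio-constraint violations."""
    total = sum(ratio_violations(seq, o, p, q) for (o, p, q) in ratio_constraints)
    return w_paint * colour_changes(seq) + w_assembly * total
```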
Chapter 9. Adaptation of the memetic algorithm to the car sequencing problem based on the analysis of global convexity

The subject of this chapter is the adaptation of the memetic algorithm to the Renault problem. It describes the following elements of the adaptation: the chosen representation, the design of local search, and the design of the crossover and mutation operators. In particular, the crossover operator was designed on the basis of the results of the analysis of global convexity.

Before the analysis, the author formulated several hypotheses about those features of the problem's solutions which may influence the value of the objective function. These features were reflected in the proposed similarity measures for solutions. The analysis of the similarity of local optima confirmed the initial hypotheses: the positions of vehicles on the production line do not matter in the CarSP, but the existence of identical subsequences of vehicles (irrespective of their place on the line) does. The similarity of solutions in terms of the succession of vehicles on the line also turned out to be important for quality, though to a smaller degree. Example plots of fitness and similarity, illustrating the correlations found, are shown in Figure 2. Note that the vertical axes of these plots show values of similarity, not distance.

Figure 2: Examples of fitness-similarity scatter plots for two instances of the CarSP, with added regression lines (vertical axes: the similarity sim_cs). For the plot on the left r = 0.68; on the right r = 0.57.

Unfortunately, it turned out that a high correlation of fitness and the two kinds of similarity is not a property of the CarSP as a whole, although it occurred for the majority of the analysed instances. It is rather a property of certain types of instances only, or even of single instances. Nevertheless, the results obtained provided grounds for designing the crossover operator CCSPX-2, which preserves in the offspring the subsequences of vehicles common to the parents. Similarly, the mutation RSM, which perturbs subsequences, was chosen for use. These operators were implemented and tested in the memetic algorithm. In two computational experiments they were compared to the operators proposed in the literature.

The results of the experiments showed that this pair of proposed operators was the best in terms of several quality indicators. First, it generated solutions of the best average quality in long runs of the algorithm, until convergence; for one instance it even generated a solution better than the best known so far. Second, these operators exhibited the highest probability of inserting new good solutions into the population. Third, they needed by far the smallest number of local search iterations to improve their offspring, which considerably accelerated the computations.

In short runs of the memetic algorithm, the designed mutation RSM turned out to be the most important operator. It had the largest contribution to the improvement of solutions in those runs. The crossover operator CCSPX-2 was second in importance, although for the largest instances its contribution was equal to that of the mutation. It appears, then, that the CCSPX-2 crossover was particularly useful in long runs of the algorithm and for large instances.

The author also attempted to relate the results of the computational experiments to the results of the FDA for subsequence-based similarity. This attempt failed and no relationship was found. The obstacle was probably the fact that many of the factors which usually influence the efficiency of an algorithm were not controlled in the conducted experiments.
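An illustrative sketch of a subsequence-based similarity of the kind analysed in this chapter is given below; counting shared contiguous runs of a fixed length k, irrespective of their position on the line, is an assumption of this sketch and not the exact definition of sim_cs:

```python
from collections import Counter

def runs(seq, k):
    """Multiset of all contiguous length-k runs of cars in a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def subsequence_similarity(seq_a, seq_b, k=3):
    """Fraction of length-k runs that two sequences have in common,
    regardless of where those runs occur on the production line."""
    ra, rb = runs(seq_a, k), runs(seq_b, k)
    shared = sum((ra & rb).values())        # multiset intersection
    total = max(sum(ra.values()), sum(rb.values()))
    return shared / total if total else 1.0
```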
In summary, the method of systematic construction of crossover operators based on the analysis of global convexity gave a good result in the case of the Renault problem. The designed operators, preserving or perturbing common subsequences, are the best operators for the memetic algorithm proposed so far.

Chapter 10. Summary

The main goal of this work was to perform and evaluate the adaptation of a memetic algorithm to two optimisation problems according to the proposed scheme, based on the analysis of global convexity. This goal has been achieved: the adaptation was performed and evaluated experimentally. The following elements of the work constitute the author's most important original results.

• The definition and implementation of distance (similarity) measures appropriate for the analysed problems: d_e, d_pn, d_pc for the CVRP and sim_cs, sim_csuc for the CarSP.
• The analysis of the global convexity of these two problems, using both the author's own measures and those proposed in the literature. It demonstrated that local optima are, to some extent, similar to one another and concentrated in the analysed fitness landscapes. Correlation between fitness and certain types of distance was found in the majority of the analysed instances.
• The construction of distance-preserving crossover operators and distance-perturbing mutation operators, directly on the basis of the results of the analysis of global convexity. These operators are: CPX2, CEPX, CECPX2, GCECPX2 and CPM for the CVRP; CCSPX-2 and RSM for the CarSP.
• The experimental comparison of the designed operators with similar operators from the literature. This comparison showed that the operators proposed in this work (CEPX and CPM; CCSPX-2 and RSM) generate the best solutions in long runs of the memetic algorithm, until convergence. These operators may not be unambiguously the best for short runs, but they remain in the leading group.

It can therefore be said that the method of constructing operators based on the analysis of global convexity gave a good result for the two considered problems. Together with the earlier analyses and designs of other authors, the results of this dissertation strengthen the foundations for applying this method in practice.

Certain elements of this work also constitute the author's additional contribution to research on the issue of global convexity.

• The review of the analyses of global convexity performed so far is most likely the broadest review of this kind available in the literature. It may be a valuable source of information for researchers interested in this method of analysis.
• The new version of the method of examining global convexity presumably has better statistical and practical properties than the versions used previously.

Perspectives for further research

Certain issues considered in this dissertation remain open after its completion:
• establishing a mathematical model of global convexity;
• determining the limits of the practical significance of fitness-distance correlation;
• determining the conditions for the existence of significant global convexity, e.g. the types of instances;
• a quantitative expression of the relationship between the strength of global convexity and the efficiency of a memetic algorithm exploiting it;
• verification of the correlations found in the considered problems by means of the method that uses global optima;
• performing analyses of global convexity for further combinatorial problems;
• an objective evaluation of the properties of the proposed version of the method of examining global convexity;
• an unambiguous determination of the relationship between the No Free Lunch theorems and practical optimisation problems.

A thorough examination of these issues will most likely lead, in the future, to a better understanding of the properties of the spaces of hard optimisation problems. As a result, it may help establish solid foundations for even more effective adaptations of metaheuristic algorithms to such problems.

Contents

Acknowledgement
Extended abstract in Polish
1 Introduction
1.1 Context of the research subject
1.2 Research subject
1.3 Initial assumptions and the main hypothesis
1.4 Goals of the dissertation
1.5 Published work
2 Metaheuristics in combinatorial optimisation
2.1 Problems of combinatorial optimisation
2.1.1 Basic definitions
2.1.2 Examples of problems
2.2 Computational complexity of algorithms and problems
2.3 Methods of dealing with hard combinatorial optimisation problems
2.4 Metaheuristics
2.4.1 Local search
2.4.2 Ant colony optimisation
2.4.3 Hyperheuristics
2.4.4 Evolutionary algorithms
2.4.5 Memetic algorithms
2.5 Metaheuristics: schemes of algorithms which require adaptation
3 The No Free Lunch theorems and their consequences for optimisation
3.1 Formulations of the theorems
3.1.1 The original formulation by Wolpert and Macready
3.1.2 The strengthened formulation by Schumacher et al.
3.2 Major consequences of the theorems
3.3 The No Free Lunch theorems vs. the practise of optimisation
3.3.1 Practical optimisation problems are not subject to No Free Lunch
3.3.2 Practical algorithms are not subject to No Free Lunch
3.3.3 Not only the sampled points matter
3.4 Conclusions: practical implications of the theorems
3.4.1 No general tools of optimisation
3.4.2 There is some structure in search space of particular problems
3.4.3 Structure should be exploited
3.4.4 Analysis first, exploitation second
3.4.5 What is structure?
3.4.6 Caution required while evaluating algorithms on benchmarks
3.5 Implications for evolutionary algorithms
4 Adaptation of an evolutionary algorithm to a combinatorial optimisation problem
4.1 Representation of solutions
4.2 Fitness function
4.3 Initial population
4.4 Crossover operators
4.4.1 Importance of crossover
4.4.2 The Schema Theorem and the choice of crossover
4.4.3 Adaptation of crossover to a problem
4.5 Mutation operators
4.6 Local search
4.6.1 Place for local search
4.6.2 Choice of a local search type
4.6.3 Choice of a neighbourhood
4.6.4 Neighbourhood and landscape structure
4.6.5 Efficiency of local search
4.7 Other components and techniques
4.8 Conclusions
5 Fitness-distance analysis
5.1 Fitness landscape
5.1.1 Neighbourhood-based definition
5.1.2 Distance-based definition
5.1.3 Comparison of definitions
5.1.4 Applications
5.1.5 Landscape and fitness function
5.1.6 Landscape and distance measure
5.2 Fitness-distance analysis
5.2.1 Basic approach
5.2.2 Examples of analyses from the literature
5.3 Exploitation of fitness-distance correlation in a memetic algorithm
5.3.1 Design of respectful recombination
5.3.2 Adaptation of mutation
5.3.3 Adaptation of local search
5.3.4 Adaptation of other components
5.4 Variants of the fitness-distance analysis
5.4.1 Analysis with only one global optimum known
5.4.2 Analysis with the distance to the best-known solution
5.4.3 Analysis with the average distance to all other local optima
5.4.4 Analysis with the average distance to not worse solutions
5.4.5 Tests for the value of the FDC
5.4.6 Analysis of a set of pairs of solutions
5.4.7 Comparison of all approaches
5.5 Summary and conclusions
5.5.1 Fitness-distance analysis
5.5.2 Exploitation of FDC in metaheuristic algorithms
6 The capacitated vehicle routing problem
6.1 Problem formulation
6.1.1 Versions and extensions
6.2 Instances used in this study
6.3 Heuristic algorithms for the CVRP
6.3.1 Savings algorithm by Clarke and Wright
6.3.2 Sweep algorithm by Gillet and Miller
6.3.3 First-Fit Decreasing algorithm for bin packing
6.4 Metaheuristic algorithms for the CVRP
6.4.1 Iterated tabu search by Taillard
6.4.2 Iterated tabu search by Rochat and Taillard
6.4.3 Route-based crossover by Potvin and Bengio
6.4.4 Memetic algorithm by Prins
6.4.5 Cellular genetic algorithm by Alba and Dorronsoro
6.4.6 Other algorithms
6.5 Summary
7 Adaptation of the memetic algorithm to the capacitated vehicle routing problem
7.1 Representation
7.2 Fitness function and constraints
7.3 Local search
7.3.1 Merge of 2 routes
7.3.2 Exchange of 2 edges
7.3.3 Exchange of 2 customers
7.3.4 Composition of neighbourhoods
7.3.5 Acceleration techniques
7.3.6 Measuring the speed of local search
7.4 Initial solutions
7.4.1 Heuristic solutions
7.4.2 Random solutions
7.5 Fitness-distance analysis
7.5.1 New distance metrics for solutions of the CVRP
7.5.2 Distance measures defined in the literature
7.5.3 Random solutions vs. local optima
7.5.4 Fitness-distance relationships
7.5.5 Main conclusions from the fitness-distance analysis
7.6 Recombination operators
7.6.1 CPX2: clusters preserving crossover
7.6.2 CEPX: common edges preserving crossover
7.6.3 CECPX2: common edges and clusters preserving crossover
7.6.4 GCECPX2: greedy CECPX2
7.7 CPM: clusters preserving mutation
7.8 Experiments with initial solutions
7.9 Experiments with memetic algorithm
7.9.1 Long runs until convergence
7.9.2 Runs limited by time
7.9.3 Quality vs. FDC
7.9.4 Quality vs. feasibility of neighbours
7.10 Summary and conclusions
8 The car sequencing problem
8.1 ROADEF Challenge 2005
8.2 Problem formulation
8.2.1 Other forms of the problem
8.2.2 Groups of cars
8.3 Computational complexity
8.4 Instances
8.5 Heuristic algorithms for the CarSP
8.5.1 Greedy heuristics by Gottlieb et al.
8.5.2 Insertion heuristic by Ribeiro et al.
8.6 Metaheuristic algorithms for the CarSP
8.6.1 Local search by Gottlieb et al.
8.6.2 Iterated local search by Ribeiro et al.
8.6.3 Local search and very large neighbourhood by Estellon et al.
8.6.4 Generic genetic algorithm by Warwick and Tsang
8.6.5 Genetic algorithm by Terada et al.
8.6.6 New crossover operators by Zinflou et al.
8.7 Summary
9 Adaptation of the memetic algorithm to the car sequencing problem 165
9.1 Representation 165
9.2 Fitness function and constraints 165
9.3 Local search 165
9.3.1 Insertion of a group index 165
9.3.2 Swap of two group indexes 167
9.4 Initial solutions 167
9.4.1 Exact algorithm for paint colour changes 167
9.4.2 Kominek's heuristic 169
9.4.3 Extended Gottlieb and Puchta's DSU heuristic 170
9.4.4 Random solution 170
9.5 Fitness-distance analysis 171
9.5.1 Similarity measures for solutions of the CarSP 171
9.5.2 Random solutions vs. local optima 174
9.5.3 Fitness-distance relationships 178
9.5.4 Main conclusions from the fitness-distance analysis 185
9.6 CCSPX: conservative common subsequence preserving crossover 185
9.7 RSM: random shuffle mutation 186
9.8 Adaptation of crossovers from the literature 187
9.8.1 Adaptation of NCPX 187
9.8.2 Adaptation of UAX 187
9.9 Experiments with initial solutions 188
9.10 Experiments with memetic algorithm 190
9.10.1 Long runs until convergence 190
9.10.2 Runs limited by time 196
9.10.3 Quality vs. FDC 197
9.11 Summary and conclusions 198
10 Conclusions 199
10.1 Summary of motivation 199
10.2 Contribution of the thesis 199
10.3 Perspectives for further work 200
A Names of instances of the car sequencing problem 203
B Detailed results of memetic algorithms 205
Bibliography 211

Chapter 1
Introduction

1.1 Context of the research subject

In the modern economy, administration and science, problems of optimisation are encountered very often, because decision makers are usually interested in a cost-effective assignment of available resources to tasks or in solutions to otherwise unsolvable problems.

Consider as an example a car manufacturer who is concerned with the number of cars produced during a shift. There are limits to the plant's throughput due to staff and technological constraints.
But the operations of staff and machines may be optimised without violating the constraints, e.g. by arranging production tasks in the proper order, so that the plant's throughput is increased.

Another example is a manager of a hospital unit who has to assign nurses to shifts (build a work schedule) for a period of a month. The goal of such a schedule is to balance the workload of nurses, satisfy their preferences concerning working hours and days off, and minimise personnel requirements. There are also some limiting legal regulations which influence the possible assignments. Thus, it is not easy for the manager to build a schedule which satisfies all the demands. This leads to an optimisation problem of building a schedule which improves the conditions of work.

Yet another example might be a biochemist who wants to determine the complete sequence of a genome. The whole sequence cannot be accurately checked at once (a technological constraint); it has to be fragmented before analysis. When the analysis is completed, the fragments have to be assembled into one. Since the process of analysis does not preserve the ordering of fragments, the original order is lost. This gives rise to an optimisation problem: short sequences have to be assembled into the original form, maximising, for example, the amount of overlap between fragments.

An optimisation problem may be of two major types: continuous or combinatorial. This thesis is concerned with certain combinatorial problems. The three examples sketched above are in fact the combinatorial optimisation problems of car sequencing, nurse scheduling and DNA assembly. Such problems deal with finite sets of potential solutions, e.g. sequences of some elements or subsets of a given set, as opposed to the infinite sets of potential solutions of continuous problems. The desired solution of such a problem is usually an object (e.g. a sequence, a subset) which maximises or minimises the given objective function (having the meaning of gain or cost). This feature of combinatorial problems does not always render them easy, since finite sets may still have huge cardinalities and may not be easy to search through. As practice demonstrates, many combinatorial optimisation problems are very hard to solve to optimality within acceptable time limits. The theory of computational complexity labels such problems NP-hard.

Consequently, computer scientists deal with such hard problems by means of approximate approaches: heuristic or metaheuristic algorithms. The goal of such algorithms is to generate some solutions in reasonable time. These solutions are usually only suboptimal, meaning that they may not be optimal, but still demonstrate good quality (e.g. low cost).

The literature on heuristic and metaheuristic algorithms is huge. When one decides to use a metaheuristic, there is a large number of types to choose from: local search, tabu search, simulated annealing, evolutionary (and memetic) algorithms, just to name a few. Moreover, metaheuristics are not ready-to-use algorithms; they are only schemes of algorithms which have to be adapted to the optimisation problem under consideration. Thus, when one decides upon a certain algorithm, one still has to design the components of the algorithm in order to obtain a functional program.

However, it appears that metaheuristic algorithms are not general tools of optimisation: they do not perform equally well across all possible optimisation problems, and for each metaheuristic there certainly exist cases in which it performs poorly.
This fact makes the choice of a proper algorithm for a problem the first important issue. Another crucial factor for the chosen algorithm's performance is the already mentioned design of its components. As may be seen in the literature, this design severely affects the final algorithm's performance (its speed and the quality of solutions it produces). As a consequence, this process of adaptation of a metaheuristic algorithm to the problem at hand should be performed with care and justified as well as possible.

Unfortunately, at the moment this process of adaptation of a metaheuristic appears to be more of a craft than a science or an engineering discipline. There is a clear lack of design guidelines for users of metaheuristics, a lack of schemes of adaptation of algorithms to problems.

1.2 Research subject

Some initial research on schemes of adaptation of metaheuristic algorithms to problems may be found in the literature. Such schemes are mainly based on the analysis of certain properties of problems. The rationale behind this approach is to exploit knowledge about the problem's properties in the design of the algorithm's components, since knowledge-less algorithms are expected to perform poorly.

One such property of problems is fitness-distance correlation (also called the 'big valley' or global convexity). It is a phenomenon occurring in the space of solutions of a problem's instance: as solutions become better, the distance between them decreases (with respect to some problem-specific distance measure). This property also assumes that the best solutions (global optima) lie roughly in-between other good solutions. The hypothesis of fitness-distance correlation was examined and the property was found for a number of classical combinatorial optimisation problems. Even though there existed a few instances of these problems which did not reveal the property, some algorithmic components exploiting fitness-distance correlation were proposed and eventually led to well-performing metaheuristics. One of the approaches was to design distance-preserving crossover operators for a metaheuristic called the memetic algorithm.

Still, fitness-distance analysis has not been performed for many hard problems of combinatorial optimisation. The main issue while testing fitness-distance correlation is the definition of some distance between solutions of the problem. Many researchers tend to understand distance only in terms of the Hamming or Euclidean metric, while for combinatorial problems (with sequences, sets or graphs involved) these measures are hardly applicable. Consequently, there is still scope for research in this area. If some new distance measures for combinatorial objects (solutions) are defined and fitness-distance correlation is found, then components of metaheuristics may be designed based on this feature (e.g. distance-preserving crossover operators for memetic algorithms).

Therefore, this thesis focuses on the scheme of adaptation of a memetic algorithm which is founded on positive results of fitness-distance analysis and leads to the design of distance-preserving operators. This scheme is applied to algorithms solving problems which have not yet been analysed for fitness-distance relationships: a vehicle routing problem and a car sequencing problem.
1.3 Initial assumptions and the main hypothesis

The assumptions of this thesis are the following:
• the two chosen problems of combinatorial optimisation reveal the property of fitness-distance correlation,
• the presence of fitness-distance correlation in the fitness landscape of a problem facilitates the design of efficient memetic algorithms.

The main hypothesis of this dissertation states that the adaptation of a memetic algorithm to a combinatorial optimisation problem should lead to the design of distance-preserving crossover operators, provided that fitness-distance correlation is revealed. In such a case the adapted algorithm will generate solutions of quality not worse, and possibly even better, than the same algorithm with other operators.

1.4 Goals of the dissertation

The goal of this dissertation is to perform and evaluate the scheme of adaptation of the memetic algorithm which is based on fitness-distance correlation. This scheme is applied to and evaluated on two chosen problems of combinatorial optimisation: the capacitated vehicle routing problem and the car sequencing problem.

This main goal consists of the following sub-goals:
• the design and implementation of fast local search procedures for the two problems,
• the examination of fitness-distance correlation in fitness landscapes of the problems (this includes defining proper distance/similarity measures for solutions of the problems),
• the design of distance-preserving crossover operators,
• the design and implementation of memetic algorithms which use the operators,
• the comparison of the operators with the ones which may be found in the literature for the same problems,
• the analysis of the performance of the resulting memetic algorithms.

1.5 Published work

Some contents of this thesis have already been published by the author in conference proceedings, scientific journals and a book chapter.
• The method of fitness-distance analysis based on the examination of a set of pairs of solutions was proposed and first employed by Kubiak (2005) and Kubiak (2007). Here it is discussed in more detail in section 5.4.6.
• Local search acceleration techniques for the capacitated vehicle routing problem were published by Kubiak & Wesołek (2007). They are discussed in section 7.3.5.
• The first elements of the fitness-distance analysis of the capacitated vehicle routing problem were published by Kubiak (2004) and Kubiak (2005). A complete analysis, which also used distance measures of other authors, was presented by Kubiak (2007). This analysis constitutes the contents of section 7.5.
• The first set of distance-preserving recombination operators for the problem was published by Kubiak (2004). Here they are further developed in section 7.6.
• The fitness-distance analysis of the car sequencing problem was partially published by Jaszkiewicz et al. (2004) and later extended by Kubiak et al. (2006). This is described in section 9.5.
• The first distance-preserving recombination operator and the memetic algorithm for the car sequencing problem were published by Jaszkiewicz et al. (2004). A modified operator is developed in this thesis in section 9.6.

Additionally, the analysis of distance between solutions of the capacitated vehicle routing problem generated by the memetic algorithm was published by Kubiak (2006). Although beyond the scope of the thesis, this subject is closely related to the fitness-distance analysis and the performance of the designed memetic algorithm.
Chapter 2
Metaheuristics in combinatorial optimisation

2.1 Problems of combinatorial optimisation

2.1.1 Basic definitions

Combinatorics deals with finite sets and structures, such as orderings, subsets, assignments, graphs, etc. (Bronshtein et al. 2004). Similarly, combinatorial optimisation is interested in such structures, which are the basis for defining its problems.

According to Błażewicz (1988) and Hoos & Stutzle (2004), a combinatorial optimisation problem (COP) defines a finite set of some combinatorial parameters, the values of which do not have to be entirely known in advance. A problem instance completes the definition by setting all the parameters to certain values. A feasible solution to a problem instance is a combinatorial object (e.g. a number, a set, a function, etc.) which observes the constraints on the parameters given in advance. In the problem there is also an objective function defined, which assigns a numerical value to each solution. This function is to be optimised, meaning that an optimal solution should be found. The optimal solution to a problem instance is a feasible solution which minimises or maximises the objective function; the direction of optimisation is always given in the definition of the problem.

The definition given above, although it stresses well the most important aspects of combinatorial optimisation and gives the necessary intuition about them, is rather informal. Precisely speaking, a combinatorial optimisation problem π consists of (Merz 2000, Kominek 2001):
• a set of problem instances Dπ,
• for each instance I ∈ Dπ, a finite set Sπ(I) of feasible solutions,
• an objective function fπ which assigns a rational number fπ(I, s) ∈ Q to each solution s ∈ Sπ(I) of each instance I ∈ Dπ,
• the direction of optimisation (either maximisation or minimisation).

An optimal solution s* of a problem instance I ∈ Dπ (for a minimisation problem π) is a feasible solution which has the minimum value of the objective function fπ among all feasible solutions:

∀s ∈ Sπ(I): fπ(I, s*) ≤ fπ(I, s)

For a maximisation problem only the direction of the inequality changes.

A combinatorial optimisation problem π is solved by an algorithm which generates an optimal solution for each instance of the problem or indicates that there is no feasible solution for the instance at all. As it turns out in practice, one of the most important questions regarding COPs deals with the running time of such algorithms (to be precise: with their time complexity).

2.1.2 Examples of problems

Before proceeding to a discussion of the time complexity of algorithms, it is useful to see some examples of combinatorial optimisation problems.

Minimisation of the number of paint colour changes in a paint shop

Consider a production day in a car factory. A set of cars is to be produced. Each car has a paint colour code assigned, which defines its final body colour. This set of cars is to be put on a production line, so the order (sequence) of cars has to be determined. But each change of colour between two consecutive cars in a sequence generates additional cost: if a colour changes, then the spray guns in the paint shop have to be purged. Therefore, the goal of scheduling the cars is to minimise the number of paint colour changes in the sequence. Additionally, a number is given which limits the maximum number of consecutive cars with the same colour. This paint batch limit (PBL) reflects the fact that the spray guns have to be purged regularly.

This problem is a part of a larger one, called the car sequencing problem (CarSP) (Cung 2005b), and was defined by the French car manufacturer Renault. The latter also defines other characteristics of cars and some more constraints and components of the objective function, but here only the colour objective is described. As an example, let us consider an instance of the problem with 10 cars of colour 1, 9 cars of colour 2 and 8 cars of colour 3. The paint batch limit is set to 5. A feasible and an optimal solution to the instance are shown in figure 2.1.

112233222221111122333331113
111112222233333111112222333

Figure 2.1: A feasible (top) and an optimal (bottom) solution to an exemplary instance of the car sequencing problem (the colour objective only). The feasible solution induces 8 colour changes between consecutive cars. The optimal one contains only 5 colour changes.
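The colour objective is simple to evaluate. The following minimal sketch (in Python; the function names are hypothetical, not taken from the thesis) counts the colour changes in a sequence and checks the paint batch limit; it reproduces the values reported in figure 2.1:

from itertools import groupby

def colour_changes(seq):
    # Count positions where the colour differs from the previous car.
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def respects_pbl(seq, pbl):
    # Every run of consecutive cars with the same colour must not exceed the limit.
    return all(len(list(run)) <= pbl for _, run in groupby(seq))

feasible = "112233222221111122333331113"  # the top sequence of figure 2.1
optimal = "111112222233333111112222333"   # the bottom sequence of figure 2.1
assert colour_changes(feasible) == 8 and colour_changes(optimal) == 5
assert respects_pbl(feasible, 5) and respects_pbl(optimal, 5)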
Minimisation of the cost of delivery of goods from a central depot to distributed customers

The second example is a problem concerned with deliveries. A transportation company has to deliver some goods (e.g. petrol) from its depot to a number of geographically distributed customers. All distances between customers and the depot are known. The company possesses some vehicles, all with the same capacity limit. These vehicles start their deliveries at the depot, travel to customers, unload the demanded amounts of goods and return to the company's depot. Each customer is serviced exactly once by one vehicle. The goal of the company is to create a delivery plan for its vehicles (i.e. for each vehicle the order of customers it visits) such that the total distance travelled by all the vehicles is minimised.

This informal description defines the capacitated vehicle routing problem (CVRP) (Toth & Vigo 2002b). Sketches of a feasible and an optimal solution for an instance of the CVRP are shown in figure 2.2. The instance contains 50 customers (indicated by circles); the depot is the centrally located circle without connected lines (for the sake of the figure's clarity). A solution contains edges (lines) between the depot and customers (circles). A sequence of edges starting and finishing at the depot (the half-drawn lines) defines the route of one vehicle.

Figure 2.2: A feasible (left) and an optimal (right) solution to an exemplary instance of the capacitated vehicle routing problem. The feasible solution consists of 5 routes with a total length (cost) of 579 units. The optimal solution also contains 5 routes; its cost equals 524.
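A CVRP solution can be evaluated in a similarly direct way. In the minimal sketch below (the names and the distance-matrix representation are assumptions made for illustration, not code from the thesis), a solution is a list of routes, each route being a list of customer indices:

def route_length(route, dist, depot=0):
    # Length of one route: depot -> customers in the given order -> depot.
    stops = [depot] + route + [depot]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))

def cvrp_cost(routes, dist, demand, capacity, depot=0):
    # Total travelled distance; None signals a capacity-infeasible solution.
    if any(sum(demand[c] for c in route) > capacity for route in routes):
        return None
    return sum(route_length(r, dist, depot) for r in routes)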
Other problems

The two problems described above are just some examples of COPs. The set of combinatorial optimisation problems is vast and diverse. There are classical problems, with short and easy formulations, such as:
• the problem of satisfiability of boolean expressions (Hoos & Stutzle 2004),
• the knapsack problem (Błażewicz 1988),
• the bin packing problem (Falkenauer 1998),
• the travelling salesman problem (TSP) (Cormen et al. 1990, Hoos & Stutzle 2004),
• the graph colouring problem (Galinier & Hao 1999, Falkenauer 1998).

There are also problems with more complex definitions, like diverse scheduling problems (Coffman 1976, Słowiński 1984, Błażewicz 1988) or vehicle routing problems (Toth & Vigo 2002b). These problems are not only theoretical ones; they arise in practical situations of management, where limited resources (people, time, rooms, machines, trucks, etc.) have to be assigned to tasks and the gain from this assignment has to be maximised (or its cost minimised).

This practical significance of combinatorial optimisation problems may also be seen in the cycle of challenges organised by ROADEF, the French society of operational research and decision support. This biennial series of computational challenges, launched in 1995, always deals with an optimisation problem posed by some institution or company which has to solve it in its everyday operations. For example, the challenge problems in the years 2003, 2005 and 2007 were formulated by ONERA and CNES (French space agencies) (Jaszkiewicz 2004), Renault (a car manufacturer) (Jaszkiewicz et al. 2004), and France Telecom (a telecommunication company).

2.2 Computational complexity of algorithms and problems

As was indicated earlier in the chapter, an important property of an algorithm for a given problem is its time complexity function. It is a function which depends on the instance size (the size of the input data) and bounds from above the number of steps of the algorithm (or its running time) (Błażewicz 1988). A crucial characteristic of this complexity function is whether it may be bounded from above by a polynomial of the instance size. If that is the case, then the algorithm is called polynomial; otherwise it is said to be exponential (Błażewicz 1988, Cormen et al. 1990).

An exponential algorithm usually cannot solve to optimality instances of practical sizes in reasonable time: the running time of such an algorithm quickly grows to infinity with increasing instance size. Even the incredible growth of the computational power of processors, which has been observed in recent years and is well described by Moore's Law, is not able to overcome this fundamental issue. That is why exponential algorithms are called inefficient (Błażewicz 1988). On the other hand, polynomial ones are called efficient.

Consequently, when dealing with new problems of combinatorial optimisation, the first step toward a solution requires searching for a polynomial algorithm. If one is found, then it may be said that the problem is computationally easy. Such problems (their decision counterparts, to be precise) belong to class P: solvable in polynomial time. If this step fails, however, it might mean that the problem at hand is a difficult one. For many COPs polynomial algorithms have not been found, which suggests that they are indeed more difficult than problems from P. This observation led to the construction of the class of NP-hard optimisation problems (NP-complete, for their decision versions) (Błażewicz 1988, Cormen et al. 1990).

Until now, it has not been proved whether NP-hard problems are really more difficult than problems from P. Nevertheless, the NP-hard class is constructed in such a way that if a polynomial algorithm is found for only one problem of this type, then it is found for all of them. Given the fact that research in the theory of computational complexity has been conducted for some 30–40 years now and no such algorithm has been found (instead, the class of NP-hard problems has been enlarged greatly), it is commonly believed that there are no polynomial algorithms for NP-hard problems (Błażewicz 1988, Cormen et al. 1990). This is also the author's personal opinion.
This assumption, or rather belief, has major consequences for the practice of combinatorial optimisation: for practically interesting problems it is most likely that there are no efficient algorithms which could solve them.

Of the two problems described briefly in section 2.1.2, the CVRP is NP-hard, whereas the CarSP (the decision counterpart with the colour criterion) is in P. The more practical version of the CarSP is, however, also NP-hard. As for the other problems mentioned earlier, almost all of them have been proved NP-hard. The exceptions are some simple versions of scheduling and vehicle routing problems, but still the majority of more complex versions of such problems are also computationally hard.

2.3 Methods of dealing with hard combinatorial optimisation problems

Nevertheless, many hard COPs have such important practical applications that they must be solved somehow. In the last 30–40 years, several general methods of solving hard problems were proposed (Błażewicz 1988, Cormen et al. 1990).

One of the approaches is to employ exponential algorithms. This may give satisfactory running times when the considered instances of a problem are small. Examples are branch and bound or pseudopolynomial algorithms (Błażewicz 1988, Michalewicz & Fogel 2000, Hoos & Stutzle 2004).

Another possibility is to use approximation algorithms. In this approach the goal is to 'generate a solution with the objective function value differing only slightly from the value of the optimal solution' (Błażewicz 1988; the author's translation from the Polish original). Thus, usually some measure of the relative error (excess) of a generated solution s with respect to the optimal solution s* is introduced (Cormen et al. 1990), which should be minimised:

ε(s) = |f(s) − f(s*)| / f(s*)
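As a quick illustration with the numbers from figure 2.2: the feasible CVRP solution shown there has cost f(s) = 579, while the optimal one has f(s*) = 524, so its relative error is ε(s) = |579 − 524| / 524 ≈ 0.105; the feasible solution is about 10.5% worse than the optimum.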
In this group of methods, examples are: special-purpose algorithms with a proved upper bound on the relative error (polynomial-time approximation schemes) (Cormen et al. 1990), heuristics (like greedy algorithms) or metaheuristics (Michalewicz & Fogel 2000, Hoos & Stutzle 2004). Since the memetic algorithm, a kind of metaheuristic, is the subject of this thesis, this group of methods is dealt with in more detail in a separate section.

2.4 Metaheuristics

Metaheuristics, in their original definition, are solution methods that orchestrate an interaction between local improvement procedures and higher-level strategies to create a process capable of escaping from local optima and performing a robust search of a solution space. Over time, these methods have also come to include any procedures that employ strategies for overcoming the trap of local optimality in complex solution spaces (Glover & Kochenberger 2003).

Due to this great diversity of methods it is hard to enumerate properties common to all metaheuristics. However, some characteristics which are frequent among these methods may be given.
1. Metaheuristics are not algorithms in the strict sense, but schemes of algorithms which are defined without any particular problem in mind.
2. They usually search the space of complete solutions of the given problem.
3. They work by iteratively repeating the same main steps.
4. Most of them are inspired by some natural phenomenon (e.g. physical, biological).

Metaheuristics are not exactly algorithms, because in their definitions mechanisms and components are only ambiguously described. Many design decisions are left to the practitioner who adapts a metaheuristic to the given problem.

These kinds of algorithms usually work on complete solutions of the problem at hand, using so-called perturbative search (Hoos & Stutzle 2004). This means that they employ some kind of transformation from one complete solution to another by changing some solution components. These components may be, for example:
• an edge or a vertex in the TSP,
• an assignment of a truth value to a binary variable in the MAX-SAT problem,
• an order of two tasks on a machine in job-shop scheduling,
• an assignment of a car to a position on a production line in the CarSP.

Just like any other approximate approach to COPs, metaheuristics do not guarantee that optimality is reached. However, they have proved to be very efficient in providing good suboptimal solutions to many complex, real-world problems (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Glover & Kochenberger 2003). That is why there is so much interest in the development of such methods, and so many types of metaheuristics have been proposed and tested on a variety of problems. These include:
• local search (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000),
• tabu search (Gendreau 2003, Hertz et al. 2003, Michalewicz & Fogel 2000),
• simulated annealing (Aarts et al. 2003, Henderson et al. 2003, Michalewicz & Fogel 2000),
• genetic algorithms and evolutionary computation (including memetic algorithms) (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Reeves 2003, Moscato & Cotta 2003, Muhlenbein 2003),
• scatter search (Glover et al. 2003),
• variable neighbourhood search (Hansen & Mladenović 2003, Hoos & Stutzle 2004),
• greedy randomised adaptive search procedures (Resende & Ribeiro 2003, Hoos & Stutzle 2004),
• ant colony optimisation (Dorigo & Stutzle 2003, Hoos & Stutzle 2004),
• hyperheuristics (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003).

Some of the methods are not described here, because the literature on metaheuristics is huge; the interested reader is referred to the cited works. Instead, only some metaheuristics are characterised more closely: the ones more recent and perhaps less known than e.g. tabu search or simulated annealing, or those of special interest to this thesis. In particular, evolutionary and memetic algorithms are presented in more detail.

2.4.1 Local search

Local search is a basic and simple metaheuristic. It 'starts off with an initial solution and then continually tries to find better solutions by searching neighbourhoods' (Aarts & Lenstra 2003a). The neighbourhood of a solution is a set of solutions which are in some sense close to it. For a given instance I ∈ Dπ of a given optimisation problem π the neighbourhood is usually defined as a function (Aarts & Lenstra 2003a, Michalewicz & Fogel 2000):

N : S(I) → 2^S(I)

The notion of a neighbourhood implies the fundamental notion of a locally optimal solution (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000). A solution s ∈ S(I) is a local optimum (minimum) with respect to neighbourhood N when:

∀sn ∈ N(s): f(s) ≤ f(sn)

Given these definitions, the local search starting from solution s is formulated in algorithm 1.
Algorithm 1 LocalSearch(s).
1: repeat {main local search loop}
2:   s' = s
3:   betterFound = false
4:   for all sn ∈ N(s) do {iterate over the neighbours of s}
5:     if f(sn) < f(s') then {check if this is the best neighbour so far}
6:       s' = sn {remember the better neighbour}
7:       betterFound = true
8:   if betterFound then
9:     s = s' {proceed to the better neighbour}
10: until betterFound == false {stop at a local optimum}
11: return s

It should be noted that local search always returns a locally optimal solution. This algorithm is called 'iterative improvement' by some authors (Aarts & Lenstra 2003a, Hoos & Stutzle 2004), in order to distinguish it from other, more complex methods collectively called by them '(stochastic) local search' (like simulated annealing, tabu search or genetic algorithms). While it is the authors' right to name methods as they please, in this thesis local search always and only refers to the 'iterative improvement' scheme given above.

There are two main versions of local search, which differ in the way an improving neighbour of s from N(s) is chosen (Aarts & Lenstra 2003a):
• best improvement: the whole neighbourhood is always examined and the best improving neighbour is chosen as the new current solution; this version is also called steepest local search and is the one shown in algorithm 1;
• first improvement: the first neighbour found in N(s) which improves the objective function is chosen as the new solution; this version is also called greedy local search and may be implemented by putting an additional break statement after line 7 of algorithm 1.

It is difficult to trace the inspiration for local search applied to COPs. Perhaps it was motivated by gradient search methods known earlier in numerical optimisation. According to Aarts & Lenstra (2003a), the first trials with local search in combinatorial optimisation were performed in the late 1950s on the TSP, with the use of an edge-exchange neighbourhood. Some examples of local search include:
• k-exchanges for the TSP (Hoos & Stutzle 2004),
• the Clarke and Wright algorithm invented originally for the CVRP (Clarke & Wright 1964) and also applied to the TSP (Aarts & Lenstra 2003a, Hoos & Stutzle 2004),
• edge-exchange-based algorithms for vehicle routing problems (Kindervater & Savelsbergh 2003).

More modern examples of pure local search are not easy to find, because it is relatively straightforward to extend this kind of algorithm to simulated annealing, tabu search or some other hybrid approach (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000). Hence, local search is usually a component of these more complex methods.

It can be seen from the description above that local search is not an 'out-of-the-box' solution to COPs; it requires adaptation to the specific problem being solved. It means that a designer has to define: the neighbourhood(s) being used, the way an initial solution is generated, and the improvement rule (either first or best improvement).
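Both versions fit in a few lines of Python. The sketch below is only an illustration under the assumption that the user supplies an objective function f (minimised) and a generator of neighbours; the names are hypothetical, not taken from the thesis:

def local_search(s, f, neighbours, first_improvement=False):
    # Iterative improvement: returns a local optimum of f.
    while True:
        best, best_f = None, f(s)
        for sn in neighbours(s):        # iterate over the neighbourhood N(s)
            fn = f(sn)
            if fn < best_f:             # an improving neighbour
                best, best_f = sn, fn
                if first_improvement:   # greedy variant: accept immediately
                    break
        if best is None:                # no improving neighbour: local optimum
            return s
        s = best                        # steepest variant: move to the best one

For instance, with f returning the total route length of a CVRP solution and neighbours enumerating edge exchanges, this reproduces the steepest scheme of algorithm 1.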
2.4.2 Ant colony optimisation

Ant colony optimisation algorithms (ACO) were inspired by the behaviour of foraging ants (Dorigo & Stutzle 2003). When ants search for food in the area surrounding their ant-hill, they usually roam randomly in order to find some. But when food is found, ants are able to optimise the route from the ant-hill to this place and back. This optimisation is performed by the ants collectively, using a chemical intermediate: the so-called pheromone trail. Some amount of pheromone is left by each ant on the route it traversed; the pheromone evaporates with time unless there is another ant which could sustain or amplify its level. This inspiration may be translated into the following algorithmic scheme (Hoos & Stutzle 2004).

Algorithm 2 Ant colony optimisation.
Initialise pheromone trails τi
while not stopping do
  for all solutions sj in the population do
    Construct sj by a randomised procedure based on a heuristic function h and pheromone trails τi
  Update the pheromone trails τi based on the current contents of all solutions sj
  Update the currently best-found solution
return the best-found solution

In this algorithm one solution sj corresponds to an ant in the biological metaphor. At the beginning of each iteration each ant creates its solution from scratch by successively choosing and inserting solution components into it. This choice is done in an almost greedy way: the components are chosen probabilistically based on their current evaluation (by the heuristic evaluation function h) and past evaluations (by the pheromone trail τi). When the solution of each ant is created, the pheromone trail of each solution component i is updated. At each step of the ACO algorithm the trail evaporates to some extent on each possible component i and is amplified on components i which are present in some solutions sj generated by the ants; the amount of amplification is related to solution quality.

The algorithm was invented by Marco Dorigo in the early 1990s, with the first application to the TSP. According to Hoos & Stutzle (2004), in later years a local search phase was added; it was performed on each solution separately after the construction phase, and irrespective of the levels of pheromone.

Some examples of ACO algorithms are described by Hoos & Stutzle (2004). A simple ACO for the TSP defines an edge in the input graph as a solution component. The heuristic function, which evaluates components during the construction phase, is the reciprocal of the edge weight. A pheromone trail is also defined for each edge. Other examples for the TSP are the max-min ant system (Hoos & Stutzle 2004) or an ACO algorithm described by Boryczka et al. (2006). An ant system for the CVRP is presented by Reimann et al. (2002). Gottlieb et al. (2003) and Gravel et al. (2005) present applications to the car sequencing problem.

Ant colony optimisation, like any other metaheuristic, has to be adapted to the problem to be solved. Here, this adaptation requires: a clear definition of a solution component (for the sake of a construction heuristic and pheromone data structures); a randomised construction heuristic; a rule of pheromone update.
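For the simple TSP variant mentioned above, these three choices (component = edge, heuristic = reciprocal of the edge weight, update = evaporation plus quality-proportional amplification) can be sketched as follows; this is a toy illustration with assumed parameter names, not an implementation from the literature:

import random

def aco_tsp(dist, n_ants=10, n_iters=100, rho=0.1, alpha=1.0, beta=2.0):
    # Toy ACO for the symmetric TSP; dist is an n x n matrix of positive distances.
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]      # pheromone trail on each edge
    eta = [[0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    best, best_len = None, float('inf')
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):              # each ant constructs a tour
            tour = [random.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                cand = [j for j in range(n) if j not in tour]
                w = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                tour.append(random.choices(cand, weights=w)[0])
            tours.append(tour)
        for i in range(n):                   # evaporation on every edge
            for j in range(n):
                tau[i][j] *= 1 - rho
        for tour in tours:                   # amplification related to quality
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best, best_len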
2.4.3 Hyperheuristics

This approach has been inspired by a very pragmatic and market-oriented point of view on COPs (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003): the well-known metaheuristic methods are too problem-specific and knowledge-intensive to be practically useful in cheap and easy-to-use computer systems. They are tailor-made for specific problems, and when the characteristics of the problem change slightly, they fail to deliver good solutions. Moreover, the extensive use of problem-specific information makes them too resource-intensive in development, which is unacceptable to small companies; these prefer 'good enough - soon enough - cheap enough' solutions (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003, Soubeiga 2003) which are more general than special-purpose methods.

Therefore, the advocates of hyperheuristics argue that combining simple heuristics is cheaper to implement and easier to use. This idea leads to 'using (meta-)heuristics to choose (meta-)heuristics to solve the problem in hand' (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003), which may be realised in the following algorithm. Let us assume that a set H of simple constructive heuristics is given, H = {h1, h2, ..., hm}. Hyperheuristics assume that each heuristic can be applied to a partial solution and adds only one component to it, so that these low-level heuristics can work alternately. At the beginning of the algorithm the solution is empty (s0) and the goal is to reach a complete solution (sn) at some stage n of the algorithm. A very basic hyperheuristic might work as shown in algorithm 3.

Algorithm 3 Basic hyperheuristic.
Create an empty initial solution s0
i = 0
while solution si is not complete do
  Choose a heuristic hj from H to apply to the current state of the built solution, si
  Apply hj to si, obtaining si+1
  i = i + 1
return si

In this scheme the control over the low-level heuristics is very simple and leads only to the construction of one solution. This high-level steering may be more sophisticated; examples in the literature include diverse approaches: a choice function, tabu search, a genetic algorithm or variable neighbourhood search hyperheuristics (Soubeiga 2003, Qu & Burke 2005, Remde et al. 2007).

An important property of a hyperheuristic is that the high-level control is completely detached from the problem it is trying to solve; it has no knowledge of the problem instance data, nor of the solutions it constructs, except for their objective function values. It works only on the supplied low-level heuristics (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003).

Although some authors indicate that the first hyperheuristics appeared as early as the 1960s (Soubeiga 2003), major interest in such methods could be seen in the 1990s, with a substantial increase in publications in the early years of the 21st century (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003, Soubeiga 2003) (mainly due to two British universities: the University of Nottingham and Napier University). Applications of hyperheuristics mainly concern real-world problems:
• personnel scheduling with a choice function or tabu search hyperheuristic (Soubeiga 2003, Burke, Kendall & Soubeiga 2003),
• workforce scheduling with a random or greedy hyperheuristic (Remde et al. 2007),
• exam timetabling at universities with a variable neighbourhood hyperheuristic (Qu & Burke 2005).

The inventors of hyperheuristics might argue that this type of algorithm is more general than other metaheuristics and does not require problem-specific components. However, to the author of this thesis it is clear that at least some adaptation of a hyperheuristic to the problem is required. Firstly, the set of low-level heuristics has to be provided. This also implies that they have to possess exactly the same interface (parameters, return values) and the same general behaviour (transition from one stage of a solution construction process to the next) in order to be used alternately. Moreover, it means that the concept of a state of a solution has to be clearly defined during the design and implementation of a hyperheuristic: the low-level heuristics have to share exactly the same concept. On top of that, the high-level control hyperheuristic, even though detached from the problem and its particularities, also has to be chosen in some way for a specific application.
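The shared interface just described boils down to each low-level heuristic mapping a partial solution to a slightly extended one. A minimal sketch of algorithm 3 under this assumption (all names hypothetical):

import random

def basic_hyperheuristic(empty_solution, heuristics, is_complete, choose=random.choice):
    # Repeatedly pick a low-level constructive heuristic and apply it
    # until the solution is complete.
    s = empty_solution
    while not is_complete(s):
        h = choose(heuristics)   # high-level control; here: uniform random choice
        s = h(s)                 # each heuristic adds one component to s
    return s

The choose hook is exactly where a more sophisticated controller (a choice function, tabu search, etc.) would plug in, without touching the problem-specific parts.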
2.4.4 Evolutionary algorithms

Evolutionary algorithms (EAs) were inspired by the phenomenon of natural selection in the world of living organisms (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). They mimic this phenomenon by performing optimisation on a set of solutions (a population) at each search step (a generation) and by repeating the artificial counterparts of crossover, mutation and selection. It is hoped that such an artificial evolution will lead to good solutions of the considered problem, just as natural evolution 'generated' complex living organisms which are well-adapted to demanding environments. The general scheme of an evolutionary algorithm is shown as algorithm 4 (Reeves & Rowe 2003).

Algorithm 4 General scheme of an evolutionary algorithm.
Generate a population of initial solutions
while termination condition not satisfied do
  while not sufficient offspring generated do
    if crossover condition satisfied then
      Select parent solutions
      Choose crossover parameters
      Perform crossover obtaining an offspring solution
    if mutation condition satisfied then
      Select a solution for mutation
      Choose mutation parameters
      Perform mutation obtaining an offspring solution
    Evaluate fitness of offspring
  Select new population from the current one
return the best solution in the population

The history of such algorithms reaches back to the early 1950s, and, as noted by Michalewicz & Fogel (2000), such an approach was invented approximately 10 times by different scientists and under different names: genetic algorithms (GAs), evolution strategies, evolutionary programming or genetic programming. In the 1990s the equivalence of these methods was demonstrated and most of them were finally combined, by borrowing ideas and mechanisms from each other, into what is now known as evolutionary algorithms.

Examples of EAs adapted to combinatorial problems are numerous in the literature:
• the book by Falkenauer (1998) is devoted to GAs applied to grouping problems (e.g. bin packing, graph colouring);
• a genetic algorithm for a timetabling problem is described by Lewis & Paechter (2005a);
• applications of EAs to production scheduling are the theme of the work by Pawlak (1999);
• the resource-constrained project scheduling problem is solved by a genetic algorithm due to Kominek (2001);
• genetic algorithms for vehicle routing problems were proposed by Potvin & Bengio (1996), Tavares et al. (2003), Baker & Ayechew (2003);
• GAs for car sequencing problems are described by Warwick & Tsang (1995), Terada et al. (2006), Zinflou et al. (2007).

Even more examples of EAs for COPs are given further on, together with the description of the basic mechanisms of artificial evolution.

Representation of solutions

Although not directly visible in the algorithm outline above, the issue of the representation of solutions (sometimes called the 'genetic' representation) used in an application of the EA is an important one, since it impacts the algorithm's performance. It also influences other components of the EA, namely crossover and mutation operators; the usage of certain representations makes the application of some operators much easier.

Binary representation

In this approach solutions are always represented as binary strings and only such strings are manipulated in an evolutionary algorithm (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003).
Some research indicated the usefulness of Gray codes for encoding integers (Michalewicz 1996). Goldberg also advocated the use of 'messy' binary encodings with variable length and redundancy (Goldberg 1989, Michalewicz 1996). There are problems for which this kind of representation is a natural one (e.g. NK-landscapes, binary quadratic programming, graph bi-partitioning (Merz 2000)), but in other cases this representation requires that specific encoding-decoding procedures are used.

Floating-point representation

Here, solution parameters are represented as vectors of real numbers. In some cases the direct problem parameters are accompanied by special values of difference used in mutations (the delta encoding (Michalewicz 1996)). This representation is well-suited to problems with numerical decision variables (Michalewicz & Fogel 2000).

Specific combinatorial representations

In the case of COPs, many problem-specific representations have been proposed in the literature. These representations reflect the diverse nature of combinatorial structures and constraints arising in practical applications. For example:
• there are adjacency, ordinal, path, edge-list and matrix representations for the TSP (Michalewicz 1996, Merz 2000);
• a permutation representation is used for the QAP (Merz 2000);
• in grouping problems (Falkenauer 1998, Michalewicz 1996) there are representations indicating the membership of an object in a group, the order of insertion into groups (requiring a decoding procedure) or specific 'group-oriented' representations;
• a matrix representation is used for the university course timetabling problem (Lewis & Paechter 2005a, Lewis & Paechter 2005b, Michalewicz 1996);
• for the CVRP several possibilities were examined: a 'genetic vehicle representation' (GVR) (Tavares et al. 2003), and a permutation representation without (Prins 2001) or with (Alba & Dorronsoro 2004) route delimiters;
• a sequence representation is employed for the CarSP (Cheng et al. 1999, Zinflou et al. 2007).

These numerous examples demonstrate that the choice or design of a suitable solution representation for COPs is indeed an issue in the field of evolutionary computation.

Crossover operators

The operation of crossover (or recombination) is most often applied to a pair of solutions (parents) and produces one new solution (an offspring). It aims at generating an offspring which inherits good properties (components) from both of the parents. The actual form of this operation depends on the problem being solved and the chosen representation. Well-known examples of crossover operators for binary representations are: one-point crossover, two-point and multi-point crossover, and uniform crossover (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003).
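For binary strings these classical operators are one-liners; a minimal sketch (hypothetical names, parents as equal-length lists of 0/1 genes):

import random

def one_point_crossover(p1, p2):
    # Cut both parents at one random point and glue the opposite halves.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def uniform_crossover(p1, p2):
    # Copy each gene from a randomly chosen parent.
    return [random.choice(pair) for pair in zip(p1, p2)]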
In the case of floating-point representations one-point and multi-point crossovers are also used, but there are some more specialised operators: arithmetic and heuristic crossovers, which make use of the idea of a linear combination of vectors (Michalewicz 1996, Michalewicz & Fogel 2000).

The clearly visible diversity of specialised crossover operators, however, starts with combinatorial problems. The TSP, as the test-bed of metaheuristics for COPs, has seen more than 10 different recombinations (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Merz 2000): partially-mapped crossover, order crossover, cyclic crossover, one-point crossover for the ordinal representation, heuristic greedy crossover, edge recombination crossover, edge assembly crossover, maximum-preservative crossover, cut and merge operators for a matrix representation, matrix crossover, and distance-preserving crossover.

Similarly, there are many proposals of recombination for other COPs. In the case of the CVRP these are: order and partially-mapped crossovers (Gendreau et al. 2002), two-point and uniform crossovers (Baker & Ayechew 2003) or generic and specific crossovers (Tavares et al. 2003). For the closely related vehicle routing problem with time windows (VRPTW) a sequence-based and a route-based crossover were proposed (Potvin & Bengio 1996). Car sequencing problems have also seen several variants of recombination: uniform adaptive crossover (Warwick & Tsang 1995), a cross-switching operator (Cheng et al. 1999) and the three crossover operators proposed very recently by Zinflou et al. (2007): interest-based, uniform interest and non-conflict position crossovers.

Although at the very beginning of GAs the prevailing opinion was that one-point crossover is general and sufficient enough to be successful for any problem (Goldberg 1989, Michalewicz & Fogel 2000, Reeves & Rowe 2003), currently it is a universally shared belief that crossover operators should be well-adapted to the problem at hand. The story of the TSP and other COPs seems to confirm this point of view, although it took some 30–40 years of research to reach it. However, it is still not clearly understood how to choose or design a good crossover for a given task.

Mutation operators

Mutation is an operation which generates a solution (an offspring or a mutant) by a slight perturbation of another one. The goal of this perturbation is usually to increase the diversity in the population, to explore new traits in solutions or to escape from local optima (Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003). The definition of mutation also depends on the problem and the representation of solutions.

For binary problems the most popular is the bit-flip mutation (Reeves & Rowe 2003). In the case of floating-point encodings several other options have been devised in the literature: non-uniform and border mutations (Michalewicz 1996) or mutations based on probability distributions: Gaussian, Cauchy or uniform (Michalewicz & Fogel 2000).

Again, applications of EAs to combinatorial problems have seen numerous alternatives. For the TSP these are: edge-exchange (also known as inversion), relocation of a vertex, relocation (displacement) of a path, and vertex-exchange (Michalewicz & Fogel 2000). Mutations for the CVRP are: remove-and-reinsert or swap mutations (Gendreau et al. 2002), and inversion of a path or path displacement (Tavares et al. 2003). Greedy local search with several neighbourhood operators was also used as a mutation operator for this problem (Prins 2001). In the case of car sequencing several specialised mutations were employed (Cheng et al. 1999, Zinflou et al. 2007): switching two non-identical and non-overlapping subsequences of the same length (block switching); switching two non-identical vehicles (unit switching); inversion of a subsequence (block inversion); random reallocation of a subsequence (shuffle); displacement of a subsequence.

These examples demonstrate that the form of mutation depends greatly on the problem being solved. Moreover, despite the early opinion that mutation has a secondary function in EAs (Goldberg 1989), the mutation operator is now perceived as an important one (Michalewicz 1996).
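Two of the simplest of these operators, sketched in Python for illustration (hypothetical names; solutions as lists):

import random

def bit_flip(s, p):
    # Bit-flip mutation for binary strings: flip each gene with probability p.
    return [1 - g if random.random() < p else g for g in s]

def block_inversion(seq):
    # Inversion of a random subsequence, e.g. for sequencing problems.
    i, j = sorted(random.sample(range(len(seq) + 1), 2))
    return seq[:i] + seq[i:j][::-1] + seq[j:]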
2007): switching two non-identical and non-overlapping subsequences of the same length (block switching); switching two non-identical vehicles (unit switching); inversion of a subsequence (block inversion); random reallocation of a subsequence (shuffle); displacement of a subsequence. These examples demonstrate that the form of mutation depends greatly on the problem being solved. Moreover, despite the early opinion that mutation plays only a secondary role in EAs (Goldberg 1989), the mutation operator is now perceived as an important one (Michalewicz 1996).

Selection
This mechanism decides which solutions from the current population are selected for the next one. It usually probabilistically prefers better solutions. The selection procedure is independent of the problem and representation, i.e. it does not require any knowledge about them, except for the evaluations of solutions. Yet, it influences the search process considerably, since it determines the selection pressure: the probability of selecting a solution relative to the probability of selecting the best one from the population (Reeves & Rowe 2003). This way poor solutions may be either quickly abandoned by an EA or, conversely, left to breed in the hope of generating good successors. Several procedures of selection have been proposed in the literature. Some of them are: proportional (or roulette-wheel) selection, truncation selection, selection based on ranking, stochastic universal selection, and random tournaments (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). Closely related to selection is the issue of fitness scaling, which decides how the original values of the objective function are scaled to obtain selection probabilities; there are also several options for this choice (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). Some researchers influence the selection mechanism with the ‘no duplication’ policy: no two identical solutions may exist in the population (Reeves & Rowe 2003, Prins 2004). Another selection-related mechanism is the so-called elitist model invented by De Jong (see Goldberg (1989) or Reeves & Rowe (2003)) in order to improve the optimisation performance of GAs. This model requires that the currently best individual in the population is preserved. Yet another modification of the selection mechanism is the steady-state version of the evolutionary algorithm. This version completely abandons the idea of generations in EAs. Instead, only one offspring is generated in each iteration of the algorithm (either by crossover or by mutation) and, if good enough, it replaces one (usually the worst) solution in the population.

Initialisation of the population
The way solutions are generated for an initial population is problem- and representation-specific. However, hardly any detailed guidelines concerning initialisation exist in the literature; only some very general methods are given. Firstly, the population may be initialised at random, using simple random sampling (Merz 2000, Michalewicz & Fogel 2000, Reeves & Rowe 2003). This method may be easily applied to problems with solutions encoded in a binary alphabet; problems with more complicated representations, e.g. permutations, require specialised procedures to ensure uniform randomness (Manly 1997). Secondly, the amount of randomness may be somewhat controlled with more systematic sampling, e.g. by the method based on Latin hypercube sampling described by Reeves & Rowe (2003).
Moreover, randomness may be totally removed from the initial population by completely systematic sampling (Michalewicz & Fogel 2000). These methods are useful in the case of binary or floating-point representations. Finally, many authors mention that the initial population may include solutions of high quality obtained from some other heuristic techniques (Reeves & Rowe 2003, Michalewicz & Fogel 2000). These are usually fast greedy heuristics, rather problem-specific.

2.4.5 Memetic algorithms
The inspiration for memetic algorithms (MAs) comes from several sources and they have many inventors. The first is the notion of cultural evolution (Moscato & Cotta 2003, Merz 2000), which is supposedly faster than the genetic one in improving its objects: memes, the units of culture (ideas, designs, tunes, etc.). In such cultural evolution random changes, like mutation, are less probable. Rather, the variation of memes is performed on purpose by intelligent individuals. On top of that, cultural evolution requires fewer resources than its genetic counterpart; it does not have to physically build living creatures, but only requires some of their memory. The second source of inspiration is the Lamarckian point of view on evolution (Reeves & Rowe 2003, Merz 2000, Michalewicz & Fogel 2000). Here, an individual created by evolution has the ability to learn something which is not encoded in its genotype, improve its fitness during its lifetime, and pass these acquired characteristics to its descendants. The third source is a rather pragmatic one. In many experiments with evolutionary algorithms applied to COPs it was noted that they have a very limited ability to locally tune the generated solutions (Reeves & Rowe 2003, Hoos & Stutzle 2004). Some attempts at enriching EAs with other heuristic techniques, local search among them, demonstrated that such hybridisation very often results in considerably improved performance of evolutionary optimisation. This inspiration led to the integration of other heuristics, carrying problem-specific knowledge, into evolutionary algorithms (Reeves & Rowe 2003), giving rise to memetic algorithms. Most often these heuristics are local search algorithms (in the broad sense) (Reeves & Rowe 2003, Michalewicz & Fogel 2000, Merz 2000, Hoos & Stutzle 2004), although some authors also include exact methods, approximation algorithms and specialised recombinations in the list (Moscato & Cotta 2003).

The scheme shown in algorithm 5 represents the steady-state version of the memetic algorithm, with the replacement of the worst solution in the population.

Algorithm 5 Steady-state memetic algorithm.
  Generate a population of initial solutions
  Apply local search to each solution in the population
  while termination condition is not satisfied do
    if crossover condition satisfied then
      Select parent solutions
      Choose crossover parameters
      Perform crossover obtaining an offspring solution
      Apply local search to the offspring
    else
      Select a solution for mutation
      Choose mutation parameters
      Perform mutation obtaining an offspring solution
      Apply local search to the offspring
    if the offspring is better than the worst solution in the population then
      Replace the worst solution with the offspring
  return the best solution in the population

Some authors present the algorithm as a generational one (Hoos & Stutzle 2004, Merz 2000). Others explicitly include some restart mechanisms (Moscato & Cotta 2003).
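A direct transliteration of algorithm 5 into Python may clarify how the problem-specific components plug into the fixed control flow. This is only a sketch under simplifying assumptions (minimisation, random mating selection, a fixed iteration budget); the parameter names are invented for this illustration and every problem-dependent part is passed in as a function.

    import random

    def steady_state_ma(init, evaluate, local_search, crossover, mutate,
                        pop_size=20, iterations=1000, p_crossover=0.8):
        # Control flow of algorithm 5; every problem-specific decision is
        # delegated to the functions given as arguments.
        pop = [local_search(init()) for _ in range(pop_size)]
        for _ in range(iterations):
            if random.random() < p_crossover:
                p1, p2 = random.sample(pop, 2)   # select parent solutions
                offspring = local_search(crossover(p1, p2))
            else:
                offspring = local_search(mutate(random.choice(pop)))
            worst = max(pop, key=evaluate)       # minimisation is assumed
            if evaluate(offspring) < evaluate(worst):
                pop[pop.index(worst)] = offspring
        return min(pop, key=evaluate)

Note how every problem-dependent decision is deliberately kept outside the skeleton; the adaptation of exactly these components is the subject of the following chapters.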
From the perspective of metaheuristics there are two main points of view on memetic algorithms:
• it is an evolutionary algorithm which manipulates only locally optimal solutions (Gendreau et al. 2002, Michalewicz & Fogel 2000, Jaszkiewicz & Kominek 2003);
• it is a local search algorithm restarted multiple times from intelligently generated starting points (by crossover and mutation) (Gendreau et al. 2002, Jaszkiewicz & Kominek 2003).
This perspective requires any designer of a memetic algorithm to first compare the algorithm with an ordinary EA and with iterated LS in order to demonstrate the usefulness of the design.

According to Moscato & Cotta (2003) the term ‘memetic algorithm’ was first used in 1989 to describe a hybrid of a genetic algorithm and simulated annealing. However, hybridisation of genetic and local search methods was also attempted some years earlier, and hence the term ‘hybrid genetic algorithm’ is also employed to describe this approach. On the other hand, some authors call such methods ‘genetic local search’ (Merz 2000, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003) or still use the general term ‘evolutionary algorithms’.

There are many examples of applications of MAs to combinatorial problems. Michalewicz & Fogel (2000) cite a publication on a MA applied to the TSP with very good results. Jaszkiewicz & Kominek (2003) use a genetic local search algorithm to solve a vehicle routing problem, while Prins (2001) applies local search to all individuals in his genetic algorithm for the CVRP. Zinflou et al. (2007) apply local search in some of their experiments with a GA for a CarSP. Peter Merz’s Ph.D. thesis (2000) is entirely devoted to the design of memetic algorithms for classical combinatorial optimisation problems. Numerous applications of MAs are listed by Moscato & Cotta (2003).

An alert reader will note that there are many components in the memetic algorithm which have to be adapted to the problem under study before the algorithm can actually run. These are all the evolutionary components (representation, crossover, mutation, initialisation) and all the local search components (neighbourhood(s), improvement rule).

Table 2.1: Components of metaheuristics which require adaptation to the problem.
  Metaheuristic            Components requiring adaptation
  Local search             generation of an initial solution; neighbourhood operator(s); improvement rule
  Ant colony optimisation  definition of a solution component; randomised construction heuristic; pheromone update rule
  Hyperheuristic           definition of a solution component; set of low-level heuristics; high-level control hyperheuristic
  Evolutionary algorithm   generation of initial solutions; representation; crossover operator(s); mutation operator(s)
  Memetic algorithm        all components of local search; all components of an evolutionary algorithm

2.5 Metaheuristics: schemes of algorithms which require adaptation
This short survey of metaheuristics and their applications shows that these are not algorithms ready to use in every application. Rather, these are general ideas and schemes of algorithms. They have to be further adapted when considering application to a specific problem: components of a metaheuristic have to be designed or chosen (from the already existing ones). A list of such components of the surveyed metaheuristics is presented in table 2.1.
In the majority of cases (also in the case of the memetic algorithm) there are no design guidelines which could help a practitioner adapt a metaheuristic to a problem, except rather general statements that problem-specific knowledge should be introduced into such algorithms. As the next chapter demonstrates, this design of components influences the efficiency of the obtained algorithm considerably. This is the reason why this thesis undertakes the subject of adaptation of the memetic algorithm to certain problems.

Chapter 3
The No Free Lunch theorems and their consequences for optimisation

The No Free Lunch (NFL) theorems (Wolpert & Macready 1997, Schumacher et al. 2001) are supposed to impose serious limits on the performance of search and optimisation algorithms with respect to some large sets of functions (problems). Thus, in a text on optimisation they cannot be omitted; they will be briefly described and their interpretations discussed, with consequences for solving practical combinatorial optimisation problems. The proofs of the theorems will not be given, since they are not essential to the discussion and may be easily found elsewhere.

3.1 Formulations of the theorems
3.1.1 The original formulation by Wolpert and Macready
In their important article Wolpert & Macready (1997) raised the issue of the performance of optimisation algorithms with respect to all possible discrete functions (i.e. problems and their instances). In order to address the problem they used the notion of black-box search: when an optimisation algorithm is run on some function, it is only allowed to evaluate one point in the domain at a time and to guide its further search based only on evaluations performed earlier. The algorithm knows nothing more about the function than these previously sampled points. Yet another assumption is that points in the domain are not revisited by the algorithm.

Further, the authors attempted to define sensible performance measures of algorithms executed in such an environment. They rightly stated that the evaluation of performance (solution quality) after m steps of an algorithm had to be based on the sample d^y_m of m evaluations of the optimised function the algorithm performed. Hence, they firstly focused on the distribution of such samples after m evaluations, putting the issue of performance measures aside. Next, they considered the distribution of samples d^y_m for any two algorithms a_1, a_2 when all possible discrete functions f were equally likely. Wolpert and Macready came to the conclusion that:

    \sum_f P(d^y_m \mid f, m, a_1) = \sum_f P(d^y_m \mid f, m, a_2)    (3.1)

It means that for any two algorithms a_1, a_2 the probability of obtaining some sample d^y_m after a number m of search steps is equal when summed across all possible discrete functions f. Given this result they concluded that it did not matter what performance measure was used to evaluate algorithms (provided it was based on samples of points); since the distributions of samples were exactly the same for all algorithms, so were the performance measures.

What is also important, Wolpert & Macready (1997) showed that an algorithm need not be deterministic for the theorem to apply; it is also true for stochastic algorithms.

3.1.2 The strengthened formulation by Schumacher et al.
A stronger formulation of the No Free Lunch theorem was presented several years later by Schumacher et al. (2001).
They proved that ‘a No Free Lunch result holds for a set of functions F if and only if F is closed under permutation’. Firstly, they demonstrated that a discrete function might be described by a sequence of points (values) from its domain and the corresponding sequence of evaluations of the points (values from a co-domain). Then they showed that, assuming a certain ‘canonical’ ordering of domain points, a sufficient means of describing a function was the sequence of evaluations for this ordering. Secondly, Schumacher et al. (2001) recalled the notion of a permutation of a function. For example, if a function was described as a sequence f = (1, 2, 3, 2, 1) and a permutation π = (2, 1, 5, 3, 4) was given, then the permuted f, π(f) = (2, 1, 2, 1, 3), was some other discrete function with the same domain and co-domain as the original f. Thirdly, the authors defined a set F of functions as closed under permutation if for every function f from F and any permutation π applicable to f, the permuted function π(f) was also in this set F. Finally, they proved that equality (3.1) shown by Wolpert and Macready held if and only if the set of functions F (the basis for summation) was closed under permutation.

This formulation is stronger than the original one because it also shows cases when a No Free Lunch result cannot hold: when a set of functions is not closed under permutation. It also demonstrates that such a result may hold for a very limited set of functions. Schumacher et al. (2001) give an example of ‘needle-in-a-haystack’ functions F = {(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)}, which is clearly closed under permutation; hence, the No Free Lunch holds in this tiny case.

3.2 Major consequences of the theorems
Wolpert & Macready (1997) commented on their formulation that it ‘explicitly demonstrates that what an algorithm gains in performance on one class of problems is necessarily offset by its performance on the remaining problems’. Whitley & Watson (2006) rightly note that this is valid for all possible performance measures based on sampling the search space. This is a crucial consequence: there are no general optimisation algorithms which perform well on all possible problems. If an algorithm performs better than average on some set of functions, it has to perform worse on the complement of this set (with respect to the universe of all discrete functions).

To emphasise this consequence, some authors (Culberson 1998, Whitley & Watson 2006, Wolpert & Macready 1997) also apply the No Free Lunch theorem to the algorithm of random enumeration. It appears that across all possible discrete functions there is no algorithm that performs better than this random enumeration, no matter the performance measure; actually, all algorithms have equal performance. This applies to both deterministic and stochastic algorithms that conform to the assumed black-box environment. As pointed out by Whitley & Watson (2006), this consequence stopped the arguments in the optimisation community as to which algorithm was more powerful and general than others; in the black-box environment there are no better algorithms.

It also appears that reasoning similar to the one in the NFL theorems may be applied to the issue of encodings. This is an important part of many well-performing evolutionary algorithms and it has been stated many times that a good encoding of solutions is the foundation of good EAs (see e.g. Michalewicz (1996)).
However, when all possible discrete functions are considered, the effect of encoding is always the same (Culberson 1998, Reeves & Rowe 2003, Whitley & Watson 2006).

Metaheuristics, on the other hand, are not subject to the No Free Lunch theorems. Yet this is not good news; rather, it recalls that metaheuristics are not ready-to-use algorithms, but only schemes of algorithms which have to be further specified to be fully operational. But when a metaheuristic is completely specified (consider the Simple Genetic Algorithm from Goldberg (1989)), it becomes an ordinary algorithm. If this algorithm were considered a general tool of optimisation, it would de facto be put in the black-box environment and instantly become subject to the NFL theorems, like any other blindly applied algorithm. This means that when investigating metaheuristics, we also have to remember the No Free Lunch results.

3.3 The No Free Lunch theorems vs. the practice of optimisation
As noted above, the NFL theorems require some assumptions to be made about algorithms and problems. These assumptions have been the basis of criticism of the theorems and their applicability to the practice of optimisation.

3.3.1 Argument no. 1: practical optimisation problems are not subject to No Free Lunch
The number and compressibility of all discrete functions
According to Whitley & Watson (2006), some critics of the NFL claim that the original theorem is not applicable to real-life functions because the set of all possible discrete functions (the basis of the result by Wolpert and Macready) is not representative of real-world problems. First of all, the critics say, the set of all functions is uncountably infinite, while the set of functions practically implementable in any digital computer is only countably infinite (Whitley & Watson 2006). Therefore, the conclusions from the NFL theorem do not apply to the set of implementable functions, but only to this larger set of rather abstract ones. Moreover, even this countably infinite set does not represent what may be called a practical problem, because the majority of functions in this infinite set are incompressible (Whitley & Watson 2006, Culberson 1998, Reeves & Rowe 2003). This means that the majority of such functions have to be described with a string of a length comparable to the size of the corresponding search space. Why is this a problem? Because real-life problems of optimisation (e.g. NP-hard problems) have very concise formulations (instance descriptions), yet their search spaces are usually orders of magnitude larger and hard to search through. So a set consisting mainly of incompressible functions cannot be representative of the hard, yet compressible ones.

It is hard to disagree with these arguments, since they show certain weaknesses in the original formulation of the NFL. However, the strengthened formulation by Schumacher et al. (2001) demonstrates that the No Free Lunch result may apply to finite and sometimes very small sets of functions (note their ‘needle-in-a-haystack’ example). It also shows that compressible functions may be subject to the NFL. Therefore, the argument that all possible discrete functions are not representative of real-world problems may not be used to disregard the consequences of the No Free Lunch theorems. Due to the strengthened formulation the danger of the NFL comes closer to practical optimisation problems than in the case of the original formulation.
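The strengthened formulation lends itself to a small computational check. The sketch below (an illustration written for this text, not code from the cited papers) enumerates the permutation closure of the ‘needle-in-a-haystack’ set from section 3.1.2 and verifies that equality (3.1) holds for two deterministic, non-revisiting algorithms modelled simply as fixed visiting orders; adaptive and stochastic algorithms, though also covered by the theorem, are not modelled here.

    from itertools import permutations
    from collections import Counter

    def closure(f):
        # Permutation closure of a function given as the tuple of its values.
        return {tuple(f[i] for i in p) for p in permutations(range(len(f)))}

    def sample(f, order, m):
        # The sample d^y_m of a deterministic, non-revisiting algorithm
        # that inspects domain points in a fixed order.
        return tuple(f[x] for x in order[:m])

    F = closure((0, 0, 0, 1))   # the 'needle-in-a-haystack' set, |F| = 4
    a1 = (0, 1, 2, 3)           # two different visiting orders
    a2 = (3, 1, 0, 2)
    for m in range(1, 5):
        d1 = Counter(sample(f, a1, m) for f in F)
        d2 = Counter(sample(f, a2, m) for f in F)
        assert d1 == d2         # identical distribution of samples over F
    print('equality (3.1) holds over the permutation closure')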
The uniform distribution of all functions
Wolpert & Macready (1997) assumed that all possible discrete functions are uniformly distributed. Yet, they were aware that for practical problems this distribution would not be uniform. Some interpretations of the original NFL theorem state, therefore, that this is the cause of the non-applicability of the theorem to practical problems (Kominek 2001), and that in this case there may exist algorithms which outperform at least some of the other ones (e.g. random enumeration). Anticipating such interpretations, Wolpert and Macready formulated two counterarguments:
1. In practice the function to be optimised is usually completely specified, or at least its general form is known and only some parameters may vary across instances. Yet even in this case of quite detailed knowledge some important characteristics of the function are still unknown (e.g. the optima). In effect, one knows nothing or very little about the optimised function, and this ignorance may be expressed with the assumption that any function is equally likely (uniformly distributed).
2. If one has some knowledge about the properties of the problem to be solved, but this knowledge is not included in the proposed algorithm, then the distribution of functions this algorithm encounters is effectively uniform. In this case there is simply no guarantee that an arbitrarily chosen algorithm will perform well on a function it knows nothing about.
These counterarguments raise the issue that algorithms, in order to perform well on a certain problem, have to have some knowledge about the problem implemented in them. This issue will be commented upon in more detail later.

The claim that the unequal probability of functions in practice undermines the NFL theorem may also be disputed to some extent with the strengthened version of the theorem. Whitley & Watson (2006) rightly point out that the properties of the permutation closure allow some unequally distributed cases. Thus, even for unequally distributed functions the No Free Lunch result may hold. It should be clearly stated, however, that these cases are of a very specific type and certainly do not cover all possible non-uniform distributions.

Proved cases of non-applicability of the No Free Lunch theorems
Nevertheless, there is some support for the non-applicability of NFL arguments to ‘real problems’, as Reeves and Rowe say (2003). They cite a work by Igel and Toussaint where these authors prove that sets of functions with certain properties (e.g. some kind of neighbourhood relation and a reasonable limit on the number of local optima) are not closed under permutation and are not subject to the No Free Lunch result. Some support for this conclusion is given by Whitley & Watson (2006). They cite other research where it was proved that the NFL does not hold for sets of polynomials of a single variable and bounded complexity.

Summary
It may be said that the NFL theorems might not apply to practical problems when something more is known about them, but at the moment there are still too few arguments to be sure, and the notion of ‘something more’ remains very vague. Therefore, the NFL theorems may not be generally disregarded due to these issues yet.
3.3.2 Argument no. 2: practical algorithms are not subject to No Free Lunch
Another line of criticism of the NFL theorems concerns the issue of whether practical algorithms satisfy their assumptions.

Revisiting previously sampled points
According to Reeves & Rowe (2003) the idealised algorithm assumed in the NFL theorems differs considerably from real-world algorithms. They note that Wolpert & Macready (1997) assumed no revisiting of previously sampled points by an algorithm, and state that such an assumption is debatable. Firstly, they say that revisiting of solutions by any algorithm may happen very often if some countermeasures are not adopted (like in tabu search, for example). Moreover, such revisiting is not costless (in terms of computing time), so the amount of revisiting an algorithm does may be the basis of some difference in performance between it and some other algorithm for very broad sets of functions (even all possible discrete functions). Secondly, Reeves and Rowe indicate that it follows from the NFL theorem itself that revisiting should be avoided when possible, because an algorithm which revisits fewer points cannot be on average worse than others. They also note that the idea of limiting revisits was the very basis of many algorithmic innovations which are said to perform well in practice. Therefore, the amount of revisiting cannot be omitted in practical considerations.

There is much truth in these arguments: revisiting of previously sampled points is an issue in practice and incurs additional computation cost. It is hard to agree, however, that this issue undermines the whole theorem, because when we imagine all algorithms equipped with some sophisticated no-revisiting policies, they basically become subject to the No Free Lunch theorem again. Limiting revisits or inducing diversity may be (and most probably are) good algorithmic ideas, but they do not solve the problem of equal performance across all functions.

The assumption of black-box search
Culberson (1998) stresses the fact that the original formulation of the NFL theorem applies only to black-box search: the algorithm knows nothing about the optimised function except the previously sampled points. It may be seen that this is also the assumption of the strengthened version of the theorem (Schumacher et al. 2001). This black-box assumption is seen by Culberson as an important weakness of the NFL theorems: in practice the problem to be solved is most often known beforehand (as is assumed in classical computational complexity theory) and the designer may (and usually does) incorporate some problem-specific knowledge into the solving algorithm. Hence, practical algorithms are not left problem-blind but are made problem-aware by their designers. Whitley & Watson (2006) state that the NFL theorems hold only due to this black-box assumption: no algorithm is able to efficiently search the given function, because ‘we do not have information about what a solution might look like or information about how the evaluation [function] is constructed that might allow us to search more intelligently’. For instance, no lower bound of the evaluation function may be computed in any subspace of the search space, because absolutely any value of the objective function may happen to be there, including the optimum. Hence, no cut-offs are possible, e.g. in a branch-and-bound algorithm.
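The revisiting argument above can be made tangible with a small wrapper that exposes a function to a search algorithm only as a black box while caching repeated queries; this is a minimal sketch, with all names invented for this illustration.

    class BlackBox:
        # Exposes an objective function to a search algorithm only through
        # point queries; repeated queries are served from a cache, so the
        # difference between 'calls' and 'distinct' measures the revisiting
        # effort. Points must be hashable.
        def __init__(self, f):
            self._f = f
            self._cache = {}
            self.calls = 0          # total queries, including revisits
        def __call__(self, x):
            self.calls += 1
            if x not in self._cache:
                self._cache[x] = self._f(x)
            return self._cache[x]
        @property
        def distinct(self):
            return len(self._cache)

Comparing calls with distinct after a run shows how much effort a search method spends on revisits, which is precisely the cost component that the NFL performance measures ignore.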
Culberson’s comments on the No Free Lunch raise again the issue of implementing problem-specific knowledge in search and optimisation algorithms: they should have such knowledge incorporated in order to perform well on a specific problem. What is more important, Culberson’s paper (1998) shows that when the problem to be solved is known in advance, there is a chance of escaping the No Free Lunch result, because algorithms no longer operate in the black-box environment. They operate in the more specific setting of classical computational complexity theory.

Black-box search vs. computational complexity theory
Culberson (1998) demonstrates that there is a huge gap between the assumptions of the original NFL theorem and the assumptions of computational complexity theory. He describes five cases of an algorithm’s knowledge about a problem, the first being full knowledge (the classical theory) and the last being no knowledge (black-box search). Then he presents different implications of those cases. The most important one is that the last case includes a set of problems even larger than the whole class NP. It means that by assuming black-box search one is not trying to solve a particular (probably hard) problem, but all such problems at once. Therefore, it is not surprising that in such a case all algorithms have equal performance, as the NFL theorem says. Culberson concludes:

  It is difficult to see how throwing away information and enlarging the problem classes with which an algorithm must cope can improve the search capabilities. The NFL theorems are much weaker than the intractability [i.e. complexity] theory in the sense that the claims of difficulty are made on much larger classes of problems.

This argument shows that when problem-specific knowledge is incorporated in an algorithm, it may escape the original NFL result. It appears, however, that it is still subject to the issues of classical computational complexity theory, where better and worse algorithms for particular problems do exist. Nevertheless, there still is the strengthened NFL theorem, which says the NFL result holds only for sets of functions closed under permutation. This may be seen as an argument against Culberson’s point of view, since there are sets of functions smaller than NP for which the NFL theorem is true. However, Whitley & Watson (2006) note that there is no proved example of a problem in NP which is closed under permutation. Moreover, if there were any, it would also imply P ≠ NP, solving the most famous problem in computational complexity theory, so this is not very likely. Hence, the strengthened NFL formulation appears not to undermine Culberson’s argumentation.

Summary
The black-box assumption behind the No Free Lunch theorem looks like its weakest point, because when problem-specific knowledge is implemented in an algorithm, it may escape the consequences of the theorem. But it does not escape the issues of computational complexity theory.

3.3.3 Argument no. 3: not only the sampled points matter
Wolpert and Macready are well aware of the fact that when comparing algorithms not only the sampled points matter. In this context, they say of the NFL theorem (Wolpert & Macready 1997): ‘measures of performance based on other factors than [the sampled points] d^y_m (e.g. wall clock time) are outside the scope of our result’. They mention computation time probably because
it is usually one of the key criteria of an algorithm’s performance (Hoos & Stutzle 2004). Yet, this criterion is not addressed in the No Free Lunch theorems. The revisiting argument by Reeves & Rowe (2003), which was mentioned in the previous section, also hints at this issue: revisiting of points in the search space is not costless and imposes some additional computation time on algorithms. Thus, the performance of algorithms may be different across all problems when the time of computation is considered.

Culberson (1998) sees this issue of computational effort as a strong argument against the practical applicability of the NFL theorems. He notes that when we assume that only performance measures based on sampled points are used and we have some additional knowledge about the problem to be solved, then we get nonsense with respect to the computational effort as a result. He gives an example showing that when the black-box assumption is relaxed, the No Free Lunch reasoning clearly fails to identify the effort. Consider an algorithm which is given a black-box function to optimise, yet it knows that inside that box there is some specific function (e.g. an instance of the NP-hard problem of minimum graph colouring). Then the algorithm may ignore the black box and compute the optimal solutions without sampling the black box even once, by simulating the given function on its own. From the point of view of the NFL theorems this algorithm is a perfect one, since it generates the proper solution ‘instantly’ (no samples required). From the rational point of view, the algorithm probably requires a huge amount of time to complete on reasonably-sized instances (unless P=NP). This illustrates the practical nonsense of estimating performance based only on samples of evaluations in the presence of additional knowledge about the optimised function.

This issue is also visible when comparing algorithms which solve exactly the same problem. It may happen that two algorithms employ the very same search policy (i.e. they always visit the same points in the search space in the same order), but they differ in some implementation details, so the first algorithm is simply faster than the second. While from the point of view of the NFL performance measures the algorithms are effectively equal, the first algorithm is clearly better. Of course, such a situation may happen only when the algorithms have implemented some knowledge about the problem being solved.

3.4 Conclusions: practical implications of the theorems
In all the arguments against the practical applicability of the No Free Lunch theorems there is a point where it comes down to the issue of the algorithm’s knowledge about the optimised function. Each of these arguments becomes strong when it is assumed that the problem to be solved is known to the algorithm, because:
• most probably the set of considered functions is not closed under permutation (although this is only a conjecture),
• the algorithm no longer runs in the black-box environment,
• otherwise identical algorithms may differ in speed due to implementation details.
It appears, therefore, that this issue of problem-specific knowledge indicates the most important practical implications of the NFL theorems.

3.4.1 No general tools of optimisation
The theorems imply that there are no general, ‘magic’ tools of optimisation which perform well across a large variety of problems.
This is because if there is no problem-specific knowledge exploited in an algorithm, it works in the black-box environment and inevitably becomes subject to the NFL result (its performance is equal to that of random enumeration). The point in designing and applying algorithms in practice is to escape the black-box environment (Culberson 1998, Wolpert & Macready 1997), and this cannot be done only with syntactic manipulations on bit strings describing solutions of some unknown problem.

Hence, there are also no metaheuristics which could always be better or more general than others. This applies to all types of metaheuristics (those mentioned in chapter 2 among others).

3.4.2 There is some structure in the search space of particular problems
The NFL theorems also indirectly indicate that there must be some structure in the search space of a problem if an algorithm is to perform better than random enumeration. Reeves & Rowe (2003) confirm this implication: ‘Practitioners seem to believe that there is something about their problems that make them amenable to solutions by a particular method (such as a GA) more rapidly and efficiently than by random search’. Kominek (2001) also subscribes to this point of view.

3.4.3 Structure should be exploited
However, the structure in the search space is not enough. Wolpert & Macready (1997) state that: ‘while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification’. Very similar statements were given by Culberson (1998), Schumacher et al. (2001), Reeves & Rowe (2003) and Whitley & Watson (2006): the algorithm should be matched to the structure in the search space.

In the case of metaheuristics this means that the general outline of an algorithm has to be specifically adapted to the given problem. Since the main steps of a metaheuristic are usually not modified, the adaptation should focus on the components only vaguely described in the general scheme. These are, for instance, neighbourhood operators in local search or recombination operators in evolutionary algorithms (see table 2.1). Problem-specific knowledge should be introduced in such components for the sake of efficiency (Jaszkiewicz & Kominek 2003).

3.4.4 Analysis first, exploitation second
Wolpert & Macready (1997) note that ‘while serious optimization practitioners almost always perform such matching, it is usually on a heuristic basis’ and ask the question: ‘can such matching be formally analyzed?’ There is no guarantee that an informal matching of an algorithm to a problem, one very much based on experience, will eventually lead to a good algorithm. Perhaps it should be done some other way. First, some analysis of the search space of the given problem should be performed. The obtained knowledge of the problem characteristics should then be exploited in the designed algorithm. This is also David Wolpert’s point of view (Wolpert 2005).

3.4.5 What is structure?
Reeves & Rowe (2003) say that there is an important difficulty with the analysis of a problem’s structure: ‘while the existence of structure is undeniable for many practical problems, it is not easy to pin down exactly what it means, nor how it might be possible to make use of it’.
Perhaps the strengthened formulation of the NFL theorem may shed some light on the problem. Permutation closure is the finest level of granularity at which the No Free Lunch result may hold (Schumacher et al. 2001). Thus, it may be the case that the difference between the given problem and the smallest set of functions closed under permutation which contains the problem indicates what the structure in the problem is. At first sight it appears to be a difficult research area, though.

Some other kind of problem-specific knowledge may be anything that accelerates computation. These may be techniques akin to those used in neighbourhood-based search methods:
• computation of the difference of the objective function between neighbouring points instead of evaluating each point from scratch (Merz 2000, Hoos & Stutzle 2004, Kindervater & Savelsbergh 2003);
• setting the appropriate order of evaluations of points in the neighbourhood (Kindervater & Savelsbergh 2003);
• caching evaluations of neighbours from previous search steps (Merz 2000, Hoos & Stutzle 2004).
Yet another type of knowledge about a problem may have the form of a proved upper bound on the optimum value of the objective function. Whitley & Watson (2006) mention an example of an approximation algorithm for the Euclidean TSP which may be used to make sure certain parts of the search space are never examined, e.g. in a branch-and-bound algorithm.

3.4.6 Caution required while evaluating algorithms on benchmarks
Wolpert & Macready (1997) say in their conclusions from the No Free Lunch theorem that it is dangerous to compare algorithms based on their performance on a small sample of test problems. This is an important remark, since an algorithm may perform well on one set of test problems and yet exhibit poor performance on some other set. Reeves & Rowe (2003) and Whitley & Watson (2006) also express much concern about evaluating algorithms on test problems, especially when instances are created by some random generator. In the light of the NFL theorems, ‘there is no guarantee that algorithms developed and evaluated using synthetic benchmarks will perform well on more realistic problem instances’ (Whitley & Watson 2006).

It seems, therefore, that great caution is required when comparing algorithms based on experimental testing. When synthetic benchmarks are used, it should be ensured that they are not biased, but exhibit properties of real-world instances, because we surely do not want our algorithms to be tuned to artificial problems. But if it is difficult to do so (e.g. due to the mentioned problems with properly defining characteristics of practical instances), examples of problems from real-world applications should be used. Nevertheless, even if test cases are realistic, it still may happen that designed algorithms are overly tuned to these cases. This is a threat especially to algorithms which have been refined for years (Whitley & Watson 2006). A very similar problem exists with parametrised algorithms, for which massive search across the space of parameters is performed. The resulting algorithm may perform well on test instances, but if there is no knowledge on how to adjust the parameters to new instances, it may fail to perform well on such new examples. Perhaps a procedure similar to resampling or cross-validation (Manly 1997, Duda et al. 2001) would also be of use in optimisation.
These procedures are widely used in machine learning, in the classification task, to avoid over-fitting of a classifier to a learning set. In the case of optimisation some test instances should be hidden from the algorithm designer and made available only when the algorithm is completely specified. This idea is well implemented in some computational contests, e.g. the series of ROADEF Challenges (Cung 2005b) or the ACM Collegiate Programming Contests.

3.5 Implications for evolutionary algorithms
In the light of the No Free Lunch theorems ‘the faith in using an EA as a blind search optimization engine is misplaced’ and ‘there is no reason to believe that a genetic algorithm will be of any more general use than any other approach to optimization’ (Culberson 1998).

Even so, many enthusiasts of neo-Darwinism would say that natural evolution is the best evidence that an evolutionary engine based on survival of the fittest works in practice; after all, it was able to create complex organisms perfectly adapted to difficult dynamic environments (see e.g. works cited by Reeves & Rowe (2003)). While reading the book by Michalewicz & Fogel (2000) one may get the impression that these authors subscribe to some extent to this enthusiastic point of view. However, as Culberson (1998) neatly puts it, ‘the fact of natural evolution does not indicate where these areas of applicability [of EAs] might be, and it certainly yields no basis to claim EAs as a universal magic bullet’. Similarly, Muhlenbein (2003) says: ‘I object to popular arguments along the lines: this algorithm is a good optimization method because the principle is used in nature’. Reeves & Rowe (2003) also thoroughly investigated the issue and in the first chapter of their book they notice that:
• neo-Darwinism is a seductive theory and very often it is sufficient to quote the theory of evolution and the name of Darwin to justify evolutionary algorithms as general tools of optimisation without any further explanation;
• at the same time, the mechanisms of natural evolution are in many cases unknown or insufficiently explained, and there is much speculation about them, without sound proofs;
• natural evolution most probably optimises nothing (at least no objective has been shown yet), hence it is no justification for the use of evolutionary algorithms as a tool for optimisation.
Very similar arguments about the seductive power of neo-Darwinism were presented by Andrew Tuson during a conference on operational research (Tuson 2005).

Therefore, it should be concluded that evolutionary algorithms are nothing more than an abstract, mathematical construction, an optimisation engine which has little in common with natural evolution. It is the adaptation of the engine to the given problem which is the basis of success (or failure) in optimisation, as indicated by the No Free Lunch theorems. That is the reason why the adaptation of a kind of EA is the focus of attention of this thesis, and some ways of adaptation will be discussed in the next chapter.

Chapter 4
Adaptation of an evolutionary algorithm to a combinatorial optimisation problem

Since it is the adaptation of a metaheuristic algorithm to a combinatorial optimisation problem which greatly affects the algorithm’s performance, it is necessary to know which components of an evolutionary algorithm should be adapted. The methods of adaptation which are given in the literature will also be discussed here.
The components of an EA which should be adapted are (see also table 2.1): the representation (encoding) of solutions, the fitness function, the method (or methods) of generating solutions for the initial population, the ‘genetic’ operators: crossover and mutation, and local search (if a memetic algorithm is designed). Other components of evolutionary algorithms are tuned rather than adapted to a problem. These are, among others: the selection method, the stop condition, the population size, and the probabilities of crossover and mutation.

The difference between adaptation and tuning of components is subtle but important. The adapted components, like the fitness function, depend mainly on the meaning of the considered problem or, like representation and operators, on the contents and structure of solutions of the problem. Usually, they cannot be easily changed during the execution of an algorithm. Conversely, the tuned components do not depend on these issues, but rather on the evaluation of solutions in the population. Therefore, they are parameters of the method, either numerical or procedural, which may be changed during execution without much effort.

This distinction between adapted components and tuned parameters is not universally agreed upon in the literature. There are authors who by adaptation mean exactly the tuning of an EA’s parameters, either before or during an algorithm’s run (Michalewicz & Fogel 2000). Some other authors (Bonissone et al. 2006) mix adaptation with tuning, focusing on the latter. The author of this thesis is against such mixing; he agrees with scientists who concentrate on the components to be adapted, like Hoos & Stutzle (2004) or Falkenauer (1998), ‘because if those are not appropriate, the GA will not deliver, whatever the settings [of parameters]’.

4.1 Representation of solutions
Many authors stress the issue of a good representation of solutions for the given combinatorial optimisation problem (Michalewicz 1996, Falkenauer 1998, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Hoos & Stutzle 2004).

At the beginning of evolutionary (genetic) algorithms there was the prevailing opinion that binary encoding should be used independently of the problem being solved (Goldberg 1989). This opinion was based mainly on Holland’s Schema Theorem, which was deemed the foundation of genetic algorithms. The advantage of binary representation was supposed to be the largest possible number of schemata generated by this representation in comparison to other ones (Goldberg 1989). However, this computation of the number of schemata for other representations appeared to have had some important flaws (Reeves & Rowe 2003). Moreover, the Schema Theorem itself faced severe criticism. Altenberg (1995) concluded his analysis: ‘the Schema Theorem has no implications for how well a GA is performing’. Muhlenbein (2003) expressed similar objections. Therefore, this theorem cannot be the basis for the design of representations in evolutionary algorithms, and the importance of binary representation diminishes.

There were some attempts at varying the representation of solutions during the run of an EA. Michalewicz & Fogel (2000) cite experiments with changes of precision in binary-encoded variables, with delta encodings and with messy GAs. However, these attempts focused only on binary encodings and led to no general conclusions about where and when such techniques should be used.
After the publication of the book by Michalewicz (1996) it appeared that a good representation is one which is in some sense ‘natural’ and close to the definition of the given optimisation task. This was the main idea of Michalewicz, based on intuition and experiments with diverse problems: binary, numerical and combinatorial. Falkenauer’s experiments with the so-called grouping problems (e.g. graph colouring or bin-packing) showed that solutions should be encoded in a way that is ‘natural’ to the problem, as well. In his case the notion of a group was stressed in the representation, as the one most important in grouping problems; his observations were also used in the algorithms designed by Lewis and Paechter (2005a, 2005b). Other authors also emphasise the importance of problem-specific representations used in EAs for combinatorial problems (Hoos & Stutzle 2004, Reeves & Rowe 2003), but hardly give any more specific guidelines.

Somewhat linked to the argument of ‘naturality’ is the fact that in the family of all bijective representations of solutions of the given problem there is no choice of a representation which is better than any other one (Michalewicz & Fogel 2000). Further, Michalewicz and Fogel conclude that in the light of this argument the best option is to use data structures which are intuitive with respect to the formulation of the considered problem.

One can see that despite many years of research on applications of evolutionary algorithms to combinatorial problems, there are still hardly any specific guidelines concerning the design of a good representation, except the advice of naturality and intuition, and some more specifics in the case of certain problems (e.g. the grouping ones). However, the importance of representation is somewhat lessened by the fact of the duality of representations and operators (Altenberg 1995): the effect of a change in representation may also be achieved with the same representation and different operators of crossover and mutation. It appears, therefore, that the issue of representation may be considered a rather technical one: a container for solutions. The structure of this container should mainly enable fast operations: evaluation, crossover, mutation and local search (if used). The direction of search should be chosen by these operators and not necessarily by the representation of solutions.

4.2 Fitness function
In many cases the fitness function which guides evolutionary search is simply identical to the objective function specified in the formulation of the given problem (Hoos & Stutzle 2004). Michalewicz & Fogel (2000) say, however, that this happens mainly for problems taken from the literature, where the objective function is indeed described, as in the TSP, the knapsack problem, or other problems in operational research. They claim that real-world problems are not that easy from this point of view, but give examples of problems of optimal control, not combinatorial optimisation.

Nevertheless, it happens in combinatorial optimisation that the originally given objective function is too coarse-grained to properly guide the search. This is the case for the problem of satisfiability of boolean expressions (SAT) (Hoos & Stutzle 2004, Michalewicz & Fogel 2000), which is, in a way, a needle-in-a-haystack problem: only the optimal solutions (needles) have the objective equal to one; all other solutions have it equal to zero. That is why in optimisation a more fine-grained objective function is considered, which gives rise to the MAXSAT problem (Hoos & Stutzle 2004).
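The contrast between the coarse SAT objective and its fine-grained MAXSAT counterpart can be stated in a few lines. The sketch below assumes a DIMACS-like clause encoding (positive and negative integers for literals); it is an illustration written for this text, not an implementation from the cited works.

    def sat_fitness(clauses, assignment):
        # Coarse, needle-in-a-haystack objective: 1 only for a satisfying
        # assignment, 0 otherwise.
        return int(all(any(assignment[abs(l)] == (l > 0) for l in c)
                       for c in clauses))

    def maxsat_fitness(clauses, assignment):
        # Fine-grained variant: the number of satisfied clauses, which can
        # discriminate between non-optimal solutions and guide the search.
        return sum(any(assignment[abs(l)] == (l > 0) for l in c)
                   for c in clauses)

    # Clauses over variables 1..3: (x1 or not x2) and (x2 or x3)
    clauses = [(1, -2), (2, 3)]
    assignment = {1: False, 2: False, 3: True}
    print(sat_fitness(clauses, assignment), maxsat_fitness(clauses, assignment))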
A very similar difficulty was encountered by Falkenauer (1998) in the bin-packing problem; he changed the fitness function to be more fine-grained in order to discriminate between worse and better solutions more easily. The same strategy of adaptation of the fitness function was applied by Lewis & Paechter (2005b) in the case of a university course timetabling problem, with some success.

It is not only in these circumstances that the fitness function is somehow adapted to both the problem and the algorithm. More generally, if the original objective function is hard to optimise, e.g. highly multimodal and rugged, its modification may make the search process easier. But then one has to remember that the modified function should at least have the same optima as the original one. Moreover, it is desirable that this new function be rather smooth and without many modes (Michalewicz & Fogel 2000). Except for those general remarks, though, it is hard to find more detailed rules for such a modification. Some authors also tried to modify the fitness function dynamically during the run of an EA (Michalewicz 1996, Michalewicz & Fogel 2000). The modifications concerned mainly some parameters of one fitness formula given in advance, not the whole formula itself. They were also focused only on highly constrained problems.

Michalewicz & Fogel (2000) discuss the issue of how to adapt the fitness function to a problem with constraints. They stress the fact that real-world problems always come with some additional constraints on the feasibility of solutions. In such cases it may be beneficial to accept infeasible solutions into the population, but, they say, these have to be assigned sensible values of fitness. The first option they propose, though, is to always eliminate infeasible solutions. They claim that it is one of the most sensible heuristics to maintain the feasibility of solutions in the population all the time through the application of specialised representations and operators. It is especially useful when the space of feasible solutions is convex and constitutes a large part of the whole search space. They recommend a second option when these conditions are not satisfied, e.g. when it is hard to generate any feasible solution from scratch. In such a case they suggest accepting infeasible solutions with some penalty in fitness values. This way any feasible solution is almost always better than the penalised solutions and will be accepted into the population whenever it is generated. Michalewicz & Fogel (2000) warn, however, that there are no general and useful heuristics on how to define such a penalty function; the designer of an EA has to rely on their intuition and experience, it seems.
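A minimal sketch of this second option, the penalty approach, is given below; the weight, the violation measure and all names are hypothetical, and, as Michalewicz & Fogel warn, choosing them well is precisely the hard part.

    from collections import namedtuple

    def penalised_fitness(objective, violation, weight):
        # Adds a penalty proportional to the degree of constraint violation;
        # with a large enough weight any feasible solution beats any
        # infeasible one (minimisation is assumed).
        return lambda s: objective(s) + weight * violation(s)

    # Illustrative knapsack-like use; the structure and values are invented.
    Sol = namedtuple('Sol', 'profits weights capacity')
    fitness = penalised_fitness(
        objective=lambda s: -sum(s.profits),
        violation=lambda s: max(0.0, sum(s.weights) - s.capacity),
        weight=1000.0)
    print(fitness(Sol(profits=[5, 4], weights=[3, 9], capacity=10)))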
4.3 Initial population
In the literature one may find two general goals of the generation of an initial population, goals which are somewhat conflicting. The first goal is to ensure good coverage of the search space with the generated solutions. It is to make sure that potentially any solution may be generated by the subsequent artificial evolution process; this goal is also referred to as enabling high diversity of the initial population. The second goal, though no less important, is to generate solutions of good quality, since optimisation is the ultimate task of the designed algorithm.

One of the suggested methods of initialisation consists in the completely random generation of solutions (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003), which usually means a number of independent, unbiased draws from the search space (simple random sampling (Cochran 1953)). This method is supposed to create an initial population with high diversity, but this actually depends on pure luck, as some remark (Reeves & Rowe 2003). Moreover, it might be difficult to implement for practical combinatorial optimisation problems. The notion of a random structure (e.g. tree, graph, assignment, family of sets) may have very different meanings there, depending on the actual context. The possible presence of constraints also limits the application of purely random sampling. Finally, from the point of view of optimisation (the second goal), the resulting population is usually of rather poor quality.

Another way of initialisation focuses even more closely on the issue of diversity of the generated solutions. It requires that some more sophisticated statistical methods of sampling are used instead of simple sampling, in order not to depend on luck only; the initial population is to uniformly cover the search space on purpose. The methods suggested in this approach include the generation of solutions located in a mesh (Michalewicz & Fogel 2000) or through a generalisation of the notion of a Latin hypercube (Reeves & Rowe 2003). However, these methods are mainly applicable to binary, integer or continuous optimisation problems, where free variables are available that can be independently assigned. Similarly to the first initialisation method, it is usually quite difficult to apply these ideas of systematic sampling to combinatorial structures. The quality of the obtained solutions is also problematic.

Since it is difficult to create an initial solution to a combinatorial problem by random sampling, Michalewicz & Fogel (2000) suggest a more direct way of ensuring high diversity in the first population. They propose that each solution added to the initial set should be checked for its distance to the already generated ones. If this distance is too low, then the solution should not be admitted to the population. This way only a diverse enough population is produced, although the whole procedure may be significantly more time-consuming than simple random sampling. One may notice that a sensible distance or diversity measure is required to implement this method in practice (a sketch of this procedure is given at the end of this section).

The methods discussed so far specifically addressed the issue of diversity of the initial population and disregarded the question of the quality of solutions. Therefore, with the goal of optimisation in mind, many authors (Falkenauer 1998, Hoos & Stutzle 2004, Merz 2000, Michalewicz & Fogel 2000, Pawlak 1999, Reeves & Rowe 2003) suggest that the initial population be seeded with known good solutions coming e.g. from other heuristic techniques, specific to the problem being solved. This method, one can see, is also in accordance with the conclusions from the No Free Lunch theorems (see chapter 3) that problem-specific knowledge about good solutions should be exploited in the designed algorithms. Although it usually helps in finding good solutions quickly, it may also be the source of premature convergence of the following evolution process to sub-optimal solutions, the authors warn. Thus, this approach should be used with some care, often backed up by additional randomisation (Merz 2000) and applied, perhaps, only to a fraction of the initial population.
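The admission-by-distance procedure of Michalewicz & Fogel mentioned above can be sketched as follows; Hamming distance over bit strings and all parameter values are merely illustrative assumptions, and in practice a problem-specific distance measure would be substituted.

    import random

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def diverse_population(generate, distance, pop_size, min_dist,
                           max_tries=10000):
        # Admit a newly generated solution only if it is far enough from
        # every solution already in the population; may terminate with a
        # smaller population if max_tries is exhausted.
        pop = []
        for _ in range(max_tries):
            if len(pop) == pop_size:
                break
            s = generate()
            if all(distance(s, t) >= min_dist for t in pop):
                pop.append(s)
        return pop

    random.seed(1)
    pop = diverse_population(
        generate=lambda: [random.randint(0, 1) for _ in range(20)],
        distance=hamming, pop_size=10, min_dist=5)
    print(len(pop))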
Thus, seeding should be used with some care, often backed up by additional randomisation (Merz 2000) and applied, perhaps, only to a fraction of the initial population.

To summarise, there are some guidelines concerning the initialisation of an evolutionary algorithm. One can use simple or more systematic sampling, initialisation based on diversity, or some problem-specific, randomised heuristic methods. However, except for the last recommendation, there seems to be no focus on the adaptation of the initialisation procedure to the problem being solved, only on the aspects of diversity and quality of the generated solutions. Even the last suggestion, of problem-specific heuristics, is rather vague and leaves the choice of appropriate procedures to the designer of an EA.

4.4 Crossover operators

Crossover is an operation which usually creates one or two new solutions (offspring) based on a pair of given solutions (parents). Sometimes a generalisation of this procedure with multiple parents is called recombination (Merz 2000, Michalewicz & Fogel 2000).

4.4.1 Importance of crossover

Opinions as to the importance of crossover (recombination) in EAs vary between authors. Hoos & Stutzle (2004) say that designing recombination operators is one major challenge in designing evolutionary algorithms. In his seminal book, Michalewicz (1996) also subscribes to this point of view, explaining the power of EAs precisely by the proper exchange of information through crossover. Michalewicz & Fogel (2000) also noticed that the design of recombination operators was the most important part of much research they had seen. However, they claimed that the focus on crossover had been mainly historically motivated, and that there was no reason not to design operators of other types, e.g. akin to their well-performing inver-over operator for the TSP. Moreover, Reeves & Rowe (2003) cite dissenting opinions about the importance of crossover operators which claim that mutation is superior.

Despite these arguments, the author of this thesis still deems crossover an important part of EAs. Michalewicz & Fogel (2000) might have been right with the argument concerning the history of EAs, but one can see in their discussion that they almost directly hint at their inver-over operator as an alternative to crossover. This operator, however, is actually a kind of crossover coupled with local search, so it is not a real alternative, especially for memetic algorithms. The arguments cited by Reeves & Rowe (2003) are perhaps more powerful, but in the author's opinion they are not enough to diminish the major importance of crossover operators in EAs, as it is generally perceived in the evolutionary computation community; crossover usually considerably accelerates computation, they admit (Reeves & Rowe 2003).

4.4.2 The Schema Theorem and the choice of crossover

Concerning the adaptation of crossover to specific problems of combinatorial optimisation, the first thing to say is that designs based mainly on Holland's Schema Theorem and the building-blocks hypothesis (Goldberg 1989) are no longer valid. As already mentioned in section 4.1 on representation, the theorem received adverse criticism in the 1990s (Reeves & Rowe 2003). Culberson (1998) directly points out that the Schema Theorem is an attempt at 'getting a free lunch' (see chapter 3), because it assumes nothing about the optimised function.
He also mentions that very similar theorems may easily be proved for virtually any definition of a schema; it is only necessary to modify crossover and mutation accordingly. Altenberg (1995) puts forward an argument of performance, proving with Price's Covariance and Selection Theorem that 'the Schema Theorem (. . . ) does not address the search component of genetic algorithms on which performance depends, and cannot distinguish genetic algorithms that are performing well from those that are not'. He concludes that 'schemata are therefore not informative structures for operators and representations in general', their relevance depending on the application of single-point crossover. Thus, from the point of view of the author of this thesis, the time of universal application of this crossover operator to any problem has most likely passed.

4.4.3 Adaptation of crossover to a problem

Michalewicz (1996) was one of the first researchers to emphasise the importance of a crossover operator adapted to a given problem. As a matter of fact, the idea of problem-dependent crossovers is one of the cornerstones of his extremely popular book. But while he correctly noticed that each crossover operator performed well for certain problems and poorly for others (an excellent conclusion in the time before the No Free Lunch Theorems), he did not give much guidance on how to design a crossover. Later, with Fogel (2000), they only admitted that there was no proper choice of crossover for all possible problems.

When attempting to design a crossover operator for a specific task, the first issue to be analysed is the role of the operator in the developed algorithm. Merz (2000) rightly points out the difference in this role between evolutionary and memetic algorithms. In EAs crossover is required to produce offspring with fitness higher than that of their parents, so that the offspring can survive selection. In memetic algorithms crossover plays a different role: its goal is to produce a starting point for the subsequent local search. Crossover actually has to produce offspring in the attraction region of a local optimum with high fitness, but the offspring themselves do not have to be extremely good solutions. Merz concludes his analysis by stating that a crossover (and mutation) designed for an evolutionary algorithm does not have to fit a memetic one well.

The second adaptation issue to be addressed while designing crossover is its mutual dependence with the representation of solutions. Michalewicz & Fogel (2000) say, for example, that it is crossover which strongly depends on the chosen representation. On the other hand, Altenberg (1995) notices the effect of duality of representation and crossover (already mentioned in section 4.1). As an illustration of this duality one may consider a representation of solutions and a number of crossovers, each applicable only to some different representation. In such a case a number of encoding-decoding procedures would be required to enable the crossovers, but, in effect, the crossovers would not depend on this basic representation at all. Of course, the application of an encoding-decoding procedure in a real algorithm would be time-consuming, but this argument only stresses the issue of efficiency of the pair (representation, crossover), not any crucial semantic dependency.
Therefore, one may say that there is no strong dependency between a representation and some crossover operators, unless efficiency is at stake.

The third issue of adaptation of the designed algorithm to a problem concerns the feasibility of generated offspring. Usually, simple crossover operators from the literature do not produce valid solution candidates (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003). Therefore, especially for constrained problems, specific operators have to be devised which do not produce infeasible solutions, or do so only with small probability. Lewis & Paechter (2004) emphasise the importance of feasibility in the case of a timetabling problem, for example.

Apart from these few general guidelines on how to design crossover operators, until rather recently there was no clear advice in the literature on how to adapt a crossover to a given problem. One may notice it while reading the book by Michalewicz & Fogel (2000), for example. These authors clearly place the full responsibility for the adaptation on the designer, only suggesting that the designer's intuition about the search space is important. Another example of such a point of view may be found in the book by Reeves & Rowe (2003), where they say that 'we should try to make the representation and operators as 'natural' as possible for the problem' and that 'a systematic method of designing appropriate mixing [i.e. crossover and mutation] operators would be extremely helpful'. Even though apparently at the moment there are no such systematic methods and the design of crossover remains generally a difficult task, Reeves & Rowe (2003) note in the summary of their book that the design may not be as hard a job:

if we happen to know which properties ought to be preserved during the search process. In this case, there are results that may help, but they rely on having a good understanding of the nature of the search space.

Therefore, in order to see what property (or properties) of the search space these authors have in mind and what it means to preserve it, some examples of crossover designs for particular problems of combinatorial optimisation will be discussed in the following sections.

Travelling salesman problem

Due to its central role as a benchmark problem of combinatorial optimisation, the TSP received special attention from researchers working in evolutionary computation, and a large number of diverse crossover operators were devised (Merz 2000, Michalewicz & Fogel 2000). Moreover, the design of crossover (recombination) was, together with local search, the focus of most research effort on EAs for the problem, as Hoos and Stutzle say (2004). Hence, there is much serious work on the issue which may serve as a good source of knowledge (e.g. Hoos & Stutzle (2004), Michalewicz (1996), Michalewicz & Fogel (2000)), and the interested reader is referred there. Here, only the issues most important for this thesis are addressed.

Historically, the first attempts at designing crossovers for the TSP concentrated on adjusting the operator so that it worked on the chosen representation of solutions (Michalewicz & Fogel 2000). It was observed that the binary representation was inadequate for the problem (Michalewicz 1996), but despite this fact there were attempts at applying one-point crossover to solutions described in an adjacency-based representation.
Criticising this poorly performing approach, Michalewicz & Fogel (2000) stated that it was not enough to create feasible solutions through crossover; offspring had to inherit good properties of their parents. However, at the time it was unclear what a good parental trait might be.

Later, some more 'natural' representations for the TSP were designed and, consequently, new crossovers appeared. These were (Michalewicz 1996, Michalewicz & Fogel 2000): partially-mapped crossover (PMX), order crossover (OX), cyclic crossover (CX) and operators applied to matrix representations. Some of these operators preserved certain characteristics of parents in the resulting offspring, e.g. the relative order of vertices, their absolute order or their position in the genotype. Nevertheless, it was apparently still unknown which property of crossover was the most important for the TSP.

However, many researchers (Michalewicz & Fogel 2000, Reeves & Rowe 2003) had the intuition that since it was the edges of a TSP solution which contributed most directly to its quality, a crossover operator had to focus on edges. In effect, many crossover operators were proposed which attempted to preserve parental edges in offspring. These were: maximum preservative crossover (MPX) by Gorges-Schleuter (Merz 2000), edge recombination (ER) by Whitley et al. (1989), and edge assembly crossover (EAX) by Nagata & Kobayashi (1997). The idea behind MPX and ER was to preserve in offspring as many parental edges as possible; ER was actually better at the task, resulting in a higher inheritance rate of edges. EAX, on the other hand, not only preserved edges from parents, but also administered a dose of implicit mutations and local optimisation. This operator was the best crossover for the TSP for some time. Edge preservation is also present in the inver-over procedure proposed by Tao and Michalewicz (described also by Michalewicz & Fogel (2000)). It is neither a recombination nor a local search operator, but it merges the two concepts into one, with good optimisation results.

There were even more examples of crossover operators which tried to preserve edges in TSP solutions. Hoos & Stutzle (2004) cite work by Seo and Moon where a well-performing crossover of this type was described. It seems, therefore, that the presence of certain edges is an important property of solutions to this problem. Moreover, the preservation of parental edges in offspring solutions is a crucial part of any crossover for the TSP, although this knowledge was acquired after many years of research on the problem and mainly through intuitive insights into its nature.

These intuitions and results were finally confirmed by more rigorous empirical analyses of the search space of TSP instances. Kirkpatrick & Toulouse (1985), Boese et al. (1994), Boese (1995) and later Merz (2000) performed fitness-distance analyses of the problem and concluded that it was visible in the search space itself that the preservation of edges was a crucial property of any metaheuristic algorithm for the TSP. They exploited this result either by designing efficient local search algorithms (Boese et al. 1994) or memetic algorithms with edge-preserving crossover operators (Merz 2000). Merz designed his operators based exactly on the results of the mentioned fitness-distance analyses. His experiments confirmed that the designs based on this analysis resulted in well-performing memetic algorithms.
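As an illustration of this edge-preservation idea, consider the following minimal sketch in Python, which greedily follows edges found in either parent and falls back on a random unvisited city otherwise. It is a simplification in the spirit of operators such as ER and MPX, not a faithful reimplementation of any of the cited operators; cities are assumed to be numbered 0..n-1.

    import random

    def tour_edges(tour):
        # The undirected edge set of a cyclic tour.
        n = len(tour)
        return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

    def edge_preserving_crossover(p1, p2):
        # Build an offspring tour step by step, preferring moves along
        # parental edges; only when no such edge leads to an unvisited
        # city is a random unvisited city chosen (a 'foreign' edge).
        parent_edges = tour_edges(p1) | tour_edges(p2)
        n = len(p1)
        current = random.choice(p1)
        offspring, visited = [current], {current}
        while len(offspring) < n:
            candidates = [c for c in range(n)
                          if c not in visited
                          and frozenset((current, c)) in parent_edges]
            if not candidates:
                candidates = [c for c in range(n) if c not in visited]
            nxt = random.choice(candidates)
            offspring.append(nxt)
            visited.add(nxt)
            current = nxt
        return offspring

The fraction of parental edges retained by such a procedure is exactly the inheritance rate discussed above.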
To summarise the case of the TSP, one may say that the researchers' intuition about the importance of edges in the problem and their preservation in the designed crossover was finally confirmed and reinforced by an empirical analysis of the search space of the TSP, the fitness-distance analysis. Hence, if it is possible to perform such an analysis for other combinatorial optimisation problems and obtain similar results, it may serve as the basis for crossover design and yield powerful operators.

Grouping problems

Under the notion of grouping problems one usually finds the problems of bin-packing and graph colouring (Michalewicz 1996, Falkenauer 1998), although some more specific problems also contain the aspect of grouping (e.g. the university course timetabling problem (Lewis & Paechter 2005a), the workshop layout problem (Falkenauer 1998), etc.). Here, mainly the case of the graph colouring problem will be discussed, with some remarks concerning other problems.

The first genetic approaches to the problem focused on the choice of crossovers which would be adequate to the chosen representations. According to Michalewicz (1996), Davies encoded solutions of the problem with permutations. A colouring was obtained from a permutation by means of a greedy decoding procedure. The chosen crossover was based on order; it was actually the operator originally devised for the TSP. Michalewicz also mentions a very similar approach by Jones and Beltramo, who applied the crossover operators taken from the TSP as well. It was simple to do so, since TSP solutions were also encoded as permutations.

According to Michalewicz (1996), Jones and Beltramo experimented also with other representations and, consequently, other operators. They considered an encoding of solutions as permutations with separators, which directly indicated the beginning and the end of each group (colour). Thus, no decoding through the greedy procedure was required. For this representation, the authors employed some standard (at the time) operators: order crossover (again) and the partially-mapped crossover (PMX, taken from the TSP). However, with this approach the feasibility of offspring was at issue: the crossovers usually had to be applied several times to such a representation in order not to produce empty groups (i.e. separators standing next to each other).

Another approach to the problem with GAs was connected with the encoding of groups with numbers (Michalewicz 1996). Here, Jones and Beltramo, and Hans Mühlenbein experimented first. The former authors tried to apply three different crossover operators: one-point crossover, uniform crossover and edge-based crossover. They concluded that with this representation the last crossover was best, although it was outperformed by the algorithm based on the greedy decoding procedure mentioned above. Michalewicz (1996) concludes, however, that this poor performance of the edge-based crossover was not due to its irrelevance to the problem, but due to the incompatibility of the operator and the representation: it was simply more time-consuming to decode edges from a string describing groups than, say, to greedily decode a permutation. Mühlenbein, on the other hand, designed his own 'intelligent' crossover which attempted to transmit whole parts of a colouring, not separate vertices, as Michalewicz reports (1996).
In the opinion of the author of this thesis, what unites all these approaches is some degree of arbitrariness in the choice of crossover operators: if an operator could be applied to the given representation, it was used in experiments. One can see that simple binary operators (one-point or uniform crossovers) and operators taken from GAs for the TSP (OX, PMX, edge-based) were mainly used. (It appears that at the time it was only Mühlenbein who was able to devise his own, specialised crossover for the graph colouring problem.)

However, as Michalewicz (1996) and Falkenauer (1998) point out, the nature of grouping problems is completely different from, say, the travelling salesman problem. In the TSP the focus is almost completely on the edges of a graph, while in grouping problems the notions of a vertex and a group of vertices should be stressed. It was probably Falkenauer who first explicitly emphasised the notion of a group in these problems, based on his intuitive analysis of their objective functions:

The cost functions of grouping problems thus yields better values in points of the search space that represent solutions containing high-quality groups. Even more so, it yields better values where high-quality groups of groups are located. The high quality groups and their aggregates constitute a regularity of the functional landscape of grouping problems (. . . ) It is this landscape that is common to all grouping problems.

Due to this characteristic of grouping problems, Falkenauer (1998) criticised the use of the crossovers mentioned above: the general binary operators failed to preserve the membership of vertices in groups; the more specialised operators were better at the task, but usually transmitted only one group to offspring. He also proposed his own algorithm, the so-called Grouping Genetic Algorithm (GGA), which used a specialised representation and a crossover operator focused directly on groups of elements (vertices in the case of graph colouring). The crossover operates on whole groups and completely ignores their labels; only their contents matter. It tries to merge groups of vertices from the parents in a way which always preserves groups common to both parents (or groups of one parent which are completely contained in some group of the other). The groups which are not common to the parents are usually disrupted, and new groups are heuristically constructed from scratch. As Falkenauer reports (1998), the operator was an efficient one, faring better than heuristic algorithms or naive evolution without crossover.

This approach was well received in the community. Michalewicz (1996) comments that Falkenauer's crossover operator was able to transmit as much meaningful information as possible from parents to offspring and that the optimisation results were very good. Reeves & Rowe (2003) also confirm Falkenauer's intuitions concerning the role of subset selection (i.e. groups) in his crossover, instead of a simple linear combination of strings (as in simple binary crossovers, for example). However, one has to notice that Falkenauer's approach was based entirely on his intuitive analysis of the nature of grouping problems; neither formal nor empirical analysis was performed.

The next important step in the design of a crossover operator for the graph colouring problem was taken by Galinier & Hao (1999).
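Before turning to their operator, the group-preservation idea just described can be made concrete with a rough Python sketch. It is an illustration in the spirit of Falkenauer's crossover, not the GGA operator itself; colourings are assumed to be given as lists of vertex sets (groups), vertices as integers, and adj as the adjacency sets of the graph.

    def group_preserving_crossover(parent1, parent2, adj):
        # Keep the groups of parent1 that also appear in parent2
        # (or are fully contained in one of its groups)...
        common = [g for g in parent1
                  if any(g == h or g <= h for h in parent2)]
        offspring = [set(g) for g in common]
        placed = set().union(*offspring) if offspring else set()
        leftover = set().union(*parent1) - placed
        # ...then heuristically rebuild the rest: each remaining vertex
        # goes into the first group with no conflicting neighbour,
        # or into a fresh group of its own.
        for v in sorted(leftover):
            for group in offspring:
                if not (adj[v] & group):
                    group.add(v)
                    break
            else:
                offspring.append({v})
        return offspring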
Following the same intuitions as Falkenauer's, Galinier & Hao (1999) say that:

in general, designing a crossover requires first the identification of some good properties of the problem which must be transmitted from parents to offspring and then the development of an appropriate recombination mechanism.

Then, they contrast two possible approaches to the colouring problem: the assignment and the partition approach. The former focuses on the assignment of a label (colour) to a vertex and, hence, emphasises the role of a pair (vertex, colour) in a solution. 'Nevertheless, such a couple, considered isolatedly, is meaningless for graph colouring, since all the colours play exactly the same role', they say (Galinier & Hao 1999). The partition approach, on the other hand, concentrates on relationships between vertices. They conclude that the latter approach is more promising, since 'for the colouring problem, it will be more appropriate and significant to consider a pair or a set of vertices belonging to a same class' (emphasis by Galinier & Hao (1999)).

Therefore, they decided to employ the partition approach in the design of their algorithm and proposed the Greedy Partition Crossover (GPX). This operator iteratively transmits one group from a parent to the offspring (parents are considered alternately), taking the largest groups first. If some vertices are already present in the offspring solution at some stage of the process, they are removed from the parents in order to prevent potential infeasibility in subsequent steps. Experimental results confirmed that this crossover, coupled with some local search procedure (and thus resulting in a memetic algorithm), was an excellent design (Galinier & Hao 1999). Hoos & Stutzle (2004) comment that this algorithm is probably one of the best performing for the problem.

One could think that Galinier and Hao's approach is yet another example of the analysis of a problem and the design of a crossover based only on intuition, similar to Falkenauer's. This would be true but for one short paragraph in the paper by these authors (Galinier & Hao 1999). It says that they performed a rough analysis of the search space of some random graphs and examined the frequency of assignments of the same colour to pairs of vertices. The analysis revealed, they say, that certain pairs of vertices were assigned the same colour more often than others. This way they confirmed Falkenauer's intuitive analysis of the objective function and laid a firmer foundation for their design of crossover.

To sum up, similarly to the designs of crossovers for the TSP, one can see in the case of the graph colouring problem a gradual shift from general-purpose crossovers (e.g. binary ones or those taken from the TSP), through more specialised operators based on intuition, to designs based on some empirical analysis of the search space of the considered problem. Specifically, the analysis of the frequency of the same colour in pairs of vertices (Galinier & Hao 1999) revealed that for graph colouring problems it is beneficial for a crossover to preserve groups of vertices with the same colour. Analyses similar to this one might be the basis for the design of crossovers for other problems, as well.

Job shop scheduling problem

The next example concerns the well-known job shop scheduling problem (JSP) (Coffman 1976, Mattfeld et al. 1999). Many different metaheuristic algorithms have been applied to this problem, with tabu search being superior in most cases (Watson et al. 2003).
However, this section covers only some examples of crossover operators designed for the problem.

Concerning crossover, Bierwirth et al. (1996) stated that this operator had to respect the semantic properties of the underlying problem representation in order to be efficient, but that these properties were usually unknown in advance. In their work they examined this issue by checking how three different crossover operators preserved in offspring a certain type of distance between parents. They considered generalised versions of two operators taken from the TSP, the generalised order crossover (GOX) and the generalised partially-mapped crossover (GPMX), together with a specialised crossover of their own, called the precedence-preservative crossover (PPX). This last crossover was designed exactly to preserve in offspring the absolute order of jobs found in the parents. It was not a surprise, therefore, that they found PPX to be the best at preserving the mentioned distance, which actually measured the difference in orderings of jobs between two solutions. More importantly, their results showed that this crossover was also the best of the three in optimisation, although it was not enough to obtain state-of-the-art results. Nevertheless, their experiments confirmed to some extent the authors' hypothesis that it was exactly the absolute order of jobs which was an important characteristic of good JSP solutions.

Later, the same authors (Mattfeld et al. 1999) attempted to shed some light on the issue of local search and recombination design for the JSP by means of a search space analysis. Concerning recombination (of which crossover is a special case), they stated that the 'well-known condition for the success of recombination is seen in the partial similarity of local optima', an opinion which they reinforce with a reference to the earlier-cited work on the TSP by Freisleben & Merz (1996). They experimentally examined this similarity by means of the same measure as before. This investigation revealed that local optima of the JSP were no more similar to each other than random solutions. Some additional analysis also revealed that there were large plateaus (i.e. subsets of interconnected solutions with the same objective value) in the search space. These results convinced the authors that the JSP was probably difficult for recombination-based algorithms and that other methods (like local search or tabu search) should be preferred.

And yet, some authors attempted to design and use specialised recombination operators to solve the job shop problem. For example, Tezuka et al. (2000) designed their common cluster crossover (CCX) focusing on the preservation of certain subsequences in offspring solutions. They stated that 'preserving good sub-sequences that reduces the setup times [of jobs] is important to reach the good solutions quickly'. Despite this apparently good motivation, which was in concordance with the intuition behind the crossover of Bierwirth et al. (1996), they somehow failed to achieve the purpose; their CCX operator performed poorly in the task of subsequence preservation, sometimes being no more than a one-point crossover. And although it was better in optimisation than the general order-based operator (OX), it fared rather worse than e.g. tabu search algorithms.

Another attempt at crossover design for the JSP was made by Zhang et al. (2005).
They also started with the preservation of certain characteristics of parents as the operator's goal. They chose to preserve in offspring the position of some randomly selected jobs and the relative order of the others, thus borrowing the idea of order preservation from Bierwirth et al. (1996). The comparison of their hybrid genetic algorithm employing this precedence operation crossover (POX) with well-known tabu search algorithms demonstrated that this was a better design than CCX; the quality of solutions was comparable to that of tabu search, although the computation time was longer.

To summarise, in the example of crossover designs for the job shop scheduling problem one can see yet another type of solution characteristic which should be preserved by a crossover operator: the absolute order of elements (jobs) (Bierwirth et al. 1996, Mattfeld et al. 1999). Empirical analysis of the search space of the problem revealed, however, that the use of recombination-based algorithms (EAs, MAs) may not be a good idea for this problem at all, and that some other metaheuristics should be used (tabu search, most likely). Therefore, the JSP may serve as a negative example for crossover design: under certain circumstances (results of an analysis of the search space) crossover design should not be attempted at all.

Crossover adaptation — summary

From the examples of crossover design described above one can draw several conclusions. Firstly, it appears that the application of 'standard' operators (one-point, UX, OX, CX, PMX) is not a good idea; a crossover should be adapted to the considered problem if the efficiency of the resulting algorithm matters. Secondly, the design of a specialised crossover should focus on those aspects of solutions which are important to their quality; the properties of parents which are inherited by offspring must not be arbitrary, but directly linked to the problem semantics. One might say that this is the meaning of the frequently stressed, but somewhat vague, notion of the 'naturality' of a crossover in a certain domain. Thirdly, the more recent designs indicate that analysis of the search space of the considered problem may shed some light on the issue of what an important solution property is. In particular, the designer's intuition about a property may be empirically confirmed or rejected by a statistical analysis of the frequency of the property in good solutions of the problem. Such confirmation was obtained by means of the fitness-distance analysis in the case of the TSP, and by the analysis of the frequency of pairs of vertices with the same colour for the graph colouring problem. On the other hand, the hypothesis of similarity of good solutions with respect to precedence relations between jobs was rejected in the case of the JSP. It can be said, therefore, that such empirical analysis should be performed on a problem before the design of a crossover is actually attempted.

Such analyses usually rely on the notion of the fitness landscape of the considered problem (Altenberg 1997, Hoos & Stutzle 2004, Merz 2000, Reeves & Rowe 2003). Therefore, the author of this thesis thinks that the adaptation of a crossover operator to a particular combinatorial optimisation problem may be performed on the basis of an empirical analysis of the fitness landscape of instances of the problem, namely the fitness-distance analysis.
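For reference, the central quantity of this analysis, introduced by Jones & Forrest (1995) and defined formally in the next chapter, is the fitness-distance correlation coefficient: the sample correlation, over a set of m solutions (typically local optima), between the fitness f_i of a solution and its distance d_i to the nearest global (or best-known) optimum. In its standard, textbook form,

\[
r_{FD} = \frac{\frac{1}{m}\sum_{i=1}^{m}\left(f_i - \bar{f}\right)\left(d_i - \bar{d}\right)}{\sigma_f\,\sigma_d},
\]

where \bar{f}, \bar{d} are the sample means and \sigma_f, \sigma_d the sample standard deviations of fitness and distance; the exact variant used in this thesis is presented in the next chapter.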
In the author's opinion such an analysis is a way of empirical and objective exploration of the properties of the problem, followed by their exploitation in an algorithm; a way which is clearly indicated by the conclusions from the No Free Lunch Theorems (see chapter 3). Hence, this approach to crossover design is one of the central issues of this thesis and will be elaborated upon in the next chapter.

4.5 Mutation operators

The issue of the design of a mutation operator is closely linked to the role this operator plays in evolutionary algorithms. Initially, in genetic algorithms, operators of this kind were perceived as of much lesser importance than crossover. The role of mutation was basically to maintain diversity in the population, the level of which was usually constantly decreased by the operations of crossover and selection (Goldberg 1989, Merz 2000). In GAs mainly the bit-flip mutation was used, without much concern about whether it fitted the problem or not. No problem-specific adaptation was considered in mutation design.

In evolution strategies, on the other hand, mutation had major importance in the algorithm (Merz 2000, Michalewicz 1996). However, this group of methods was used primarily in numeric optimisation, and mutation usually had the form of a perturbation applied to a real-valued variable. One of the most commonly used operators was Gaussian mutation with µ = 0 and adaptive σ. Sometimes operators based on other distributions were used (uniform, Cauchy, etc.).

Evolutionary algorithms applied to combinatorial optimisation problems usually employed some kind of mutation, although for a long time the role of this operator was only secondary (a perspective inherited from genetic algorithms). Due to the nature of combinatorial problems, though, simple bit-flip or Gaussian mutation became meaningless. Thus, some more problem-specific operators were devised, mainly based on intuition or on time-consuming experimental comparisons of the final algorithms. Often, the choice of mutation was inspired by some neighbourhood operator used in local search methods, although the goal of such mutation was to randomly perturb a solution, not to improve it through an iterative process. Michalewicz (1996) describes several mutation operators of this kind for the TSP. The operator which is usually called inversion is actually the edge-exchange neighbourhood operator well known in local search algorithms. Evolution strategies for the same problem also employed neighbourhood operators as mutation: insertion (remove a vertex from a tour and reinsert it in a random location), relocation of a subpath (very similar to Or-opt), and an exchange of two vertices (Michalewicz 1996).

In the case of grouping problems, Davies designed a mutation operator with sublist mixing or an exchange of two fields in a chromosome (Michalewicz 1996), in conjunction with an indirect representation of solutions. Falkenauer was probably the first to use a kind of heuristic mutation for these problems; in bin-packing he chose to remove the contents of some random bins and then heuristically reinsert the removed elements into the solution, as noted by Michalewicz (1996). This procedure somewhat resembles local search. Later, Falkenauer (1998) introduced mutations operating on whole groups of elements: creation of a new group, removal of a group, and a random exchange of elements between two groups.
Other mutations, similar to neighbourhood operators for these problems, were used by von Laszewski, and by Jones and Beltramo, as Michalewicz (1996) reports. The job shop scheduling problem also saw mutation operators inspired by neighbourhood-based methods: a change of the relative order of two jobs, an exchange of two jobs on a critical path, and sublist mixing (as in the TSP) (Michalewicz 1996, Pawlak 1999). Even quite recently, Zhang et al. (2005) employed the first of these operators in their hybrid EA for the JSP.

In the above examples one can see that mutation was chosen or designed in a rather arbitrary manner, usually based on some neighbourhood operator. Moreover, since mutation was deemed of lesser importance than crossover, the issue of adaptation of this operator to a problem was rarely raised, except perhaps for methods of adapting the mutation probability at run time (Michalewicz & Fogel 2000); this seems more like parameter tuning than problem-specific design, though.

However, more recently a change of opinion concerning mutation has been visible in the literature. Michalewicz (1996) says that the role of mutation in GAs was rather underestimated and that there is evidence it may be even more important than crossover, especially when a complex representation is used. This point of view is also shared by Hoos & Stutzle (2004). Nevertheless, these authors have little to say about how a mutation should be designed, except for giving examples of designs for certain combinatorial problems.

Even so, the ideas of mutation taken from genetic and evolutionary algorithms may have limited application in the case of memetic ones, since the role of the operator changes. In MAs local search is used after crossover and mutation, so the latter should be able to generate a jump in the search space which is long enough to leave the local search attractor of the mutated solution (Merz 2000). If mutation were equivalent to one step of the neighbourhood operator, as it usually was in EAs, local search would usually revert its effect and return to the starting point. Therefore, in MAs the mutation operator should at least be based on a different neighbourhood than the one local search uses. Moreover, the length of the mutation jump should be evaluated in order to check whether the operator is able to escape the attractor. On the other hand, the length of a mutation jump cannot be too large (the operator should not be too disruptive), since it might result in a completely random solution instead of one with many properties inherited from the mutated solution (Merz 2000).

With these remarks in mind, Merz designed mutation operators for the TSP, the binary quadratic programming problem (BQP) and the quadratic assignment problem (QAP) (Merz 2000). In the TSP he used a double-bridge move (also called a non-sequential four-exchange), an operation which is not easily reverted by the Lin-Kernighan local search he implemented. For the BQP he set the length of the mutation jump to the average distance between two local optima. In the QAP he applied an iterative exchange of two elements of a solution until the distance of the mutant from the original was above a certain threshold. Merz also linked the mutation and crossover designs together and related them to the fitness-distance analysis.
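As a concrete example of such a mutation jump, the double-bridge move mentioned above can be sketched in a few lines of Python (the representation of a tour as a list of city indices and the function name are illustrative assumptions):

    import random

    def double_bridge(tour):
        # Cut the tour into four non-empty segments A|B|C|D at three
        # random positions and reconnect them as A|C|B|D.  This changes
        # four edges at once, a perturbation that sequential edge
        # exchanges (as in Lin-Kernighan) cannot easily revert.
        n = len(tour)  # requires n >= 4
        i, j, k = sorted(random.sample(range(1, n), 3))
        return tour[:i] + tour[j:k] + tour[i:j] + tour[k:]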
As already mentioned in the section on crossover (4.4), if fitness-distance analysis reveals certain properties of the search space (a large distance between local optima; no correlation between fitness and distance), mutation should have major importance in an MA and its jump should be rather long (Merz 2000). This idea was exploited in Merz's MA for the QAP, where no fitness-distance correlation was discovered. As Hoos & Stutzle (2004) note, this algorithm abandoned recombination and employed only mutation, with very good results.

It seems, therefore, that the designer of a mutation operator for an MA should focus on the issues emphasised by Merz: mutation should escape the attractors of local optima by means of an operation which differs from the neighbourhood operator used in local search; it should not be too disruptive, in order to retain inherited properties; and it should gain major importance when certain structure in the search space (fitness-distance correlation) is not revealed.

4.6 Local search

In evolutionary algorithms 'hybridization should be used whenever possible, a GA is not a black box, but should use any problem-specific information available', as Reeves & Rowe (2003) say. The application of local search methods in EAs is a major way of such hybridisation. It was considered as early as 1989 by Goldberg (1989) and later advocated e.g. by Culberson (1998), Merz (2000), Michalewicz & Fogel (2000) and Moscato & Cotta (2003).

The history of the application of evolutionary algorithms hybridised with local search demonstrates that such methods may yield very good optimisation results. Therefore, many authors emphasise the utility of such hybrid evolutionary (or memetic) algorithms (Hoos & Stutzle 2004, Merz 2000, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003). For example, Hoos & Stutzle (2004) state that pure EAs usually have a limited capability of search intensification, and the introduction of local search optimisation into the evolutionary engine very often increases this capability, with better algorithm performance as a consequence. This can be seen e.g. in efficient memetic algorithms for the TSP, which employ local search with edge-exchange neighbourhoods (Hoos & Stutzle 2004, Michalewicz & Fogel 2000).

However, the application of some arbitrary local search method to a problem cannot be seen as a universal remedy for the search limitations imposed by the No Free Lunch Theorems. The local search part of a memetic algorithm has to be adapted to the considered problem in order to improve the final algorithm. Concerning this adaptation, though, there are many choices to be made by a designer.

4.6.1 Place for local search

The local search procedure may be invoked in several places in a memetic algorithm. Most of the literature advocates using it after the generation of initial solutions and after each variation (crossover or mutation), as was shown in chapter 2 (Merz 2000, Krasnogor & Smith 2005, Michalewicz 1996, Michalewicz & Fogel 2000, Moscato & Cotta 2003, Jaszkiewicz & Kominek 2003). Nevertheless, some authors also see other options, like making the LS a part of a recombination operator. Examples include the inver-over operator by Michalewicz (1996) and the EAX operator by Nagata & Kobayashi (1997) for the TSP, or Falkenauer's (1998) operators for bin packing. Another possibility is to completely replace mutation with local search (Krasnogor & Smith 2005).
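To make this standard placement concrete, below is a minimal, schematic sketch of a memetic algorithm main loop in Python. All components (init, crossover, mutate, local_search, fitness) are placeholders to be supplied for a concrete problem, and the steady-state replacement and the mutation rate are illustrative assumptions, not prescriptions from the cited literature; fitness is assumed to be maximised.

    import random

    def memetic_algorithm(init, crossover, mutate, local_search, fitness,
                          pop_size=50, iterations=1000):
        # Local search is invoked on every initial solution and after
        # each variation step (crossover or mutation), as most of the
        # cited literature recommends.
        population = [local_search(init()) for _ in range(pop_size)]
        for _ in range(iterations):
            p1, p2 = random.sample(population, 2)
            child = local_search(crossover(p1, p2))
            if random.random() < 0.2:  # illustrative mutation rate
                child = local_search(mutate(child))
            # Steady-state scheme: the child replaces the worst
            # individual if it is better.
            worst = min(range(pop_size),
                        key=lambda idx: fitness(population[idx]))
            if fitness(child) > fitness(population[worst]):
                population[worst] = child
        return max(population, key=fitness)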
4.6.2 Choice of a local search type

Another design decision is related to the type of local search to be used. There are several alternatives here as well, but no clear guidelines on when to use each of them. One fairly common choice is to use some standard greedy or steepest local search, and to always run it until a local minimum is found (Jaszkiewicz & Kominek 2003, Krasnogor & Smith 2005). Jaszkiewicz & Kominek (2003) found that in their problem the best option was to use the greedy version to optimise initial solutions and the steepest one to improve offspring after crossover (they did not consider mutation). Krasnogor & Smith (2005) note, however, that such a strategy may be rather time-consuming and that local search should be stopped (truncated) earlier in order not to waste CPU time. This approach may speed up the algorithm, but it also requires that a sensible enough truncation strategy be developed, which rather adds complexity to an already complicated process of algorithm design. Another possibility is to use a local search algorithm in the broad sense of the word, namely some more sophisticated procedure like tabu search (Galinier & Hao 1999) or simulated annealing (Zhang et al. 2005).

4.6.3 Choice of a neighbourhood

Yet another important issue of local search adaptation concerns the choice of the neighbourhood operator to be used. According to Merz (2000), such an operator should ideally generate near-optimum solutions and induce only a few local optima in the search space.

Some authors openly state that there is no universal rule for the choice of a neighbourhood, though. They say that it is entirely dependent on the problem being solved (Krasnogor & Smith 2005, Moscato & Cotta 2003) and that only some limited advice may be given. Krasnogor & Smith (2005) suggest using multiple neighbourhoods at the same time and adapting the probabilities of their application at run time, but this suggestion seems to avoid the problem of choice rather than solve it.

Despite this opinion, other authors do give some advice concerning neighbourhoods. Mattfeld et al. (1999) suggest that a neighbour should be only a slight modification of the original solution. One can see that many practically used operators actually adhere to this rule, e.g. the operators used in the TSP (edge exchanges (Michalewicz 1996)) or vehicle routing problems (vertex shift (Jaszkiewicz & Kominek 2003)). Another suggestion given by Mattfeld et al. (1999) is to avoid infeasibility of the generated neighbours. Moscato & Cotta (2003) say that a local search operator should also potentially link any two solutions in the search space.

In his Ph.D. thesis, Merz (2000) suggests that the size of a neighbourhood is a good indicator of its practical utility. He says that usually the greater the neighbourhood, the better the quality of results, but too large a neighbourhood quickly becomes impractical. He gives the limit of O(n²) for a reasonable neighbourhood size, where n denotes the instance size.

4.6.4 Neighbourhood and landscape structure

Although the advice given above may make the decision on a neighbourhood easier, it seems that it is not enough to fully discriminate between neighbourhoods, since there may be several move operators which only slightly change a solution, produce feasible neighbours, link all solutions together and have a reasonable size.
Therefore, some more guidelines are required to properly choose a neighbourhood and adapt local search to the considered problem. One of them is related to a property of the fitness landscape of the addressed problem. This property is called landscape ruggedness (or, conversely, smoothness) and it intuitively describes how different (or similar) the evaluations of neighbouring solutions are. Merz (2000) says that 'a fitness landscape is said to be rugged if there is low correlation between neighbouring points of the landscape'. Obviously, ruggedness depends on the employed fitness function and neighbourhood relation.

Ruggedness is usually measured by the so-called correlation length of a random walk in the landscape (Bierwirth et al. 2004, Hoos & Stutzle 2004, Mattfeld et al. 1999, Merz 2000, Reeves & Rowe 2003). The value of the correlation length may be derived analytically in some cases, but usually it is computed based on a sample of random walks in the landscape. The latter method requires the landscape to be isotropic, i.e. to have the same properties in the whole search space. This is not always the case, though, as demonstrated recently for the JSP (Bierwirth et al. 2004).

The knowledge of the correlation length may be exploited exactly in the choice of a move operator for local search. However, there is some confusion about the interpretation of the value. Mattfeld et al. (1999) state that the lower the correlation length, the easier the problem for a local search based on the related neighbourhood. Exactly the opposite interpretation is suggested in the more recent works of Merz (2000) and Hoos & Stutzle (2004). The latter interpretation provided the basis for the distinction between edge-based and vertex-based neighbourhoods for the TSP; the edge-based moves induce higher correlation lengths, so this result confirms the universally held belief that in this problem manipulating edges is better than manipulating vertices (Merz 2000).

4.6.5 Efficiency of local search

As noted in section 3.4, adaptation of a metaheuristic to a problem may also concern the speed of the designed algorithm. From this point of view there are several options concerning local search, since many authors stress the importance of local search efficiency (Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000).

One of the proposed techniques is to employ some form of neighbourhood pruning (Hoos & Stutzle 2004), either strict or probabilistic. In some cases, a large number of neighbours of a solution may be provably worse than the current solution (strict pruning); this may be seen e.g. in the JSP. In other cases, there is a high probability that some neighbours do not lead to any improvement, so they are examined only with small probability (probabilistic pruning).

Another method of accelerating local search is to replace the calculation of a neighbour's objective value with the calculation of the difference in objective value between the current solution and the neighbour (Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000). This so-called incremental update scheme may be used in problems where a slight change in the contents of a solution does not require recomputation of the objective value from scratch (the locality of transformation). However, not all objective functions have this desired property (e.g. the flowshop scheduling problem with the makespan minimisation objective; see the paper by Ishibuchi et al. (2003)).
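As an illustration of such an incremental update, consider the symmetric TSP with a 2-edge exchange move; a minimal sketch in Python (dist is assumed to be a precomputed distance matrix) obtains the change in tour length from just four matrix entries instead of re-evaluating the whole tour:

    def two_opt_delta(tour, dist, i, j):
        # A 2-edge exchange with i < j removes edges (a, b) and (c, d)
        # and inserts (a, c) and (b, d); for a symmetric distance matrix
        # the change in tour length is therefore:
        n = len(tour)
        a, b = tour[i], tour[(i + 1) % n]
        c, d = tour[j], tour[(j + 1) % n]
        return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]

A negative value of the delta indicates an improving move, so a whole neighbourhood can be scanned at constant cost per neighbour.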
But when this property is present in the objective function, the 'don't look bits' (caching) technique may also be used to speed up the search process (Bentley 1990, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003). It consists in storing the evaluations of neighbouring solutions in some auxiliary memory (cache). If the employed move operator only slightly changes the current solution, most of the neighbourhood stays intact and does not have to be evaluated again. Of course, a part of the neighbourhood does change and has to be recomputed, and it is the size of this part which determines the performance of the technique; if it is too large, cache management may be too expensive. Moreover, this technique is more useful with steepest local search than with the greedy one (Hoos & Stutzle 2004). The former always scans the whole neighbourhood, so cached evaluations are always useful. This is not the case with greedy search; it usually scans only a small fragment of the neighbourhood and, hence, the cache is used less often.

Yet another technique of acceleration and adaptation to a problem considers the interaction of local search with crossover in the presence of positive fitness-distance correlation. Some designs of local search based on this feature resulted in significant acceleration of whole memetic algorithms (Jaszkiewicz 1999, Merz 2000).

4.7 Other components and techniques

There are also other techniques of adaptation of an evolutionary algorithm to a problem, which are perhaps less known and popular, like diversity maintenance. They are outside the scope of this thesis and are not elaborated upon; the interested reader is referred to major books and papers on the subject (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Krasnogor & Smith 2005).

4.8 Conclusions

The review of methods of adaptation given in this chapter may be the source of one major conclusion: there are several components of memetic algorithms which have to be adapted to the considered problem before the algorithm is actually executed, but there are multiple choices for each of them and usually there is little guidance in the literature for a practitioner on when and how to use them (except, perhaps, the advice based on landscape ruggedness or fitness-distance measurement).

Indeed, some serious studies complain about this state of the art. Michalewicz & Fogel (2000) admit that there is little theoretical background that could help in designing hybrid evolutionary algorithms. Hoos and Stutzle, in the epilogue of their book (2004), summarise this state by saying that:

much of the work on designing and applying SLS [i.e. metaheuristic] algorithms in many ways resembles a craft rather than a science (. . . ), experience, rather than intellectual understanding, is often the key to achieving the underlying goals.

Krasnogor & Smith (2005) also subscribe to this point of view, admitting that 'the process of designing effective and efficient MAs currently remains fairly ad hoc and is frequently hidden behind problem-specific details'. This large amount of intuition and experience required to design good MAs (and metaheuristics in general) is the basis of scepticism about such algorithms and of the opinion that their design lacks a systematic, scientific approach (Hoos & Stutzle 2004, Moscato & Cotta 2003, Hammond 2003)².
That is why well-known authors in the field see the problem of explaining and predicting the performance of evolutionary algorithms as one of the most important in the theory of computation (Hoos & Stutzle 2004, Moscato & Cotta 2003, Reeves & Rowe 2003). Moreover, research conducted in this direction will most likely result in a deeper understanding of the relationships between properties of combinatorial optimisation problems and metaheuristic algorithms, and eventually provide a solid basis for practical designs (Hoos & Stutzle 2004). Due to these facts the issue of a systematic design and adaptation of the memetic algorithm (and, more generally, metaheuristics) to a combinatorial optimisation problem is of major importance for the practice of computation.

It is also worth noting that, because of this lack of knowledge about the relationships between problems and algorithms, some authors strongly recommend avoiding the overloading of algorithms with numerous and diverse components; the computational method experimented with should be as simple as possible in order to make the understanding of the basic relationships easier (Krasnogor & Smith 2005, Michalewicz & Fogel 2000, Moscato & Cotta 2003).

From this perspective, the most interesting approaches to the adaptation of the MA described in this chapter are the ones based on some analysis of the problem to be solved: the analysis of landscape ruggedness and of the relationship between solution fitness and distance. The author of this thesis chose the latter as its subject, because in the past designs based on fitness-distance analysis led to efficient algorithms (Galinier & Hao 1999, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000), and also because 'a systematic method for designing appropriate mixing [i.e. crossover and mutation] operators would be extremely helpful' (Reeves & Rowe 2003, page 283). Therefore, this thesis goes further along the lines of research drawn earlier by scientists like Kirkpatrick & Toulouse (1985), Muhlenbein (1991), Boese (1995), Boese et al. (1994), Jones & Forrest (1995), Altenberg (1997), Merz (2000), Watson et al. (2003), Jaszkiewicz (1999) and Jaszkiewicz & Kominek (2003). The concepts of the fitness landscape and the fitness-distance correlation, which were central to their research and are the basis for the construction of crossover operators, will be presented in more detail in the next chapter.

² Chris Stephens said in the interview (Hammond 2003): 'I think it [i.e. evolutionary computation] lacks a systematic 'scientific' point of view — people are always tending to move on to the next problem, invent the next little widget for their algorithms rather than do a more systematic, more thorough analysis of what they have already done'.

Chapter 5
Fitness-distance analysis for adaptation of the memetic algorithm to a combinatorial optimisation problem

Fitness-distance analysis relies heavily on the notion of a fitness landscape. Therefore, the latter will be defined and commented on first.

5.1 Fitness landscape

Intuitively, a fitness landscape is a graph where solutions play the role of vertices and edges indicate the neighbourhood relation. It is also labelled on vertices with real values of the fitness function. This graph may be imagined as a three-dimensional discrete surface (hence the name, landscape): the first two dimensions are spanned by the solutions and the third one indicates fitness.
However, in practice this imagined structure may have too few dimensions to describe precisely the usually multidimensional search spaces of combinatorial problems (Reeves 1999).

From a formal perspective, two different definitions of a landscape may be found in the literature on evolutionary computation.

5.1.1 Neighbourhood-based definition

A landscape L of an instance I of a combinatorial optimisation problem π is a triple L = (S, f, N), where S = Sπ(I) is the set of solutions of this instance, f is the fitness function and N is a neighbourhood defined for solutions in S. This is the most commonly found definition (Merz 2000, Moscato & Cotta 2003, Reeves & Rowe 2003, Mattfeld et al. 1999, Bierwirth et al. 2004, Michalewicz & Fogel 2000, Schiavinotto & Stutzle 2007).

5.1.2 Distance-based definition

A less common definition says that a landscape is a triple L = (S, f, d), differing from the one above only in that the neighbourhood N is replaced by d, a distance measure d : S × S → R (Merz 2000, Reeves & Rowe 2003).

It is usually desired that d possess the properties of a distance metric, namely:

∀s, t ∈ S: d(s, t) ≥ 0 (non-negativity)
∀s, t ∈ S: d(s, t) = 0 ⇔ s = t (identity)
∀s, t ∈ S: d(s, t) = d(t, s) (symmetry)
∀s, t, u ∈ S: d(s, u) ≤ d(s, t) + d(t, u) (triangle inequality)

Such properties usually facilitate the interpretation of the values of the measure, but it is not always necessary or possible to define a measure which possesses them all. In particular, symmetry may not be required in practice, while the triangle inequality may be difficult to satisfy or prove. Without the triangle inequality a measure is usually called a semi-metric.

5.1.3 Comparison of definitions

Reeves & Rowe (2003) comment on the two definitions that the neighbourhood-based one is perhaps more common because a neighbourhood relation may easily be converted into a distance metric: the distance between solutions s, t ∈ S is simply measured as the minimum number of neighbourhood moves required for s to become t. However, this is not always a practical thing to do, and sometimes an operator-independent measure is needed, they say. Nevertheless, it seems that in most cases the two definitions have identical meaning and may be used interchangeably.

5.1.4 Applications

When a fitness landscape is determined, many of its properties may be precisely defined and examined. According to Hoos & Stutzle (2004) these are, among others: the types of positions (solutions) in the landscape and their distributions (e.g. local minima, interior plateaus); the number and density of local minima; fitness-distance correlation; landscape ruggedness; exits and exit distributions; and plateau connection graphs. Most of these properties have some influence on the efficiency of optimisation algorithms, these authors say. Fitness-distance correlation is only one of them.

5.1.5 Landscape and fitness function

One can see in the definition of a landscape that it clearly depends on the analysed fitness function (usually being the original objective, as well). In practice, especially when approaching a new problem, it may happen that the fitness function is not yet entirely defined and the algorithm's designer is also an analyst of the related real-world situation. In such a case it is possible to define the fitness function in a way which makes the landscape easier for optimisation, possibly giving it some desired properties.
On the other hand, if it is required that global optima are found, the fitness function may also be modified, provided that the optima of the modified fitness are exactly the same. Similarly to the previous case, the resulting landscape may possess some more desirable properties than the one for the original fitness.

Another possibility is to keep the same fitness function and modify its decision variables. This may also help create a landscape which is easier for optimisation. An example of such a modification was given by Altenberg (1995) on a function defined by Michalewicz (1996). As a result, for the same fitness function a very different landscape was obtained, which was much smoother and had fewer local optima.

5.1.6 Landscape and distance measure

The form of a landscape depends also on the employed distance measure $d$ (or the neighbourhood relation $N$). If the given fitness function and its variables are unmodifiable (e.g. explicitly given in the problem formulation, as in the case of classical problems), then only a change in the definition of distance may change the landscape. In this context two issues regarding landscape definition and utility arise.

Landscape form depends on a distance measure

This dependence was not fully realised at first. Falkenauer (1998), for example, suggested that a distance measure should not be explicitly defined because that would result in a loss of generality of reasoning. He explained that in order to precisely define a distance between two points in a landscape, one had to state what the search operator in the designed algorithm is, and then the focus of research would be on the fitness function and the operator, not on the function itself. However, without a clear definition of distance one does not exactly know the structure imposed on the search space: which solutions are connected, which are far apart. Only with an unambiguous distance does a landscape exist, and all its properties become clearly defined and may be examined (Altenberg 1995, Hoos & Stutzle 2004, Reeves & Rowe 2003). Moreover, it is not required to design a complete algorithm first in order to have a landscape, although some correspondence between the two may be important.

Concerning the existence of distance measures which could help define landscapes for diverse search spaces, Reeves & Rowe (2003) state that in the continuous case there is little problem compared to the combinatorial one. They say that for continuous variables 'the landscape is determined only by the fitness function', meaning most likely that there is a natural distance measure for continuous spaces: the Euclidean metric. Although they may be right that this is usually the first and natural choice, one should remember that it is only a special case of the $L_p$ metric, with $p = 2$. Putting $p = 1$ one gets the city-block distance, and for $p = +\infty$ it becomes the Tchebycheff distance. Nevertheless, Reeves and Rowe are definitely right in saying that the choice or design of a distance measure for a combinatorial search space is difficult, perhaps due to the larger variety of considered objects (not only vectors of real numbers) and less research in this direction in the past.

Despite this, some advice on distance measures for combinatorial landscapes may be found in the literature. Firstly, the distance measure employed in the definition of a landscape should not be directly based on the analysed fitness function. Bonissone et al.
(2006) state it clearly when they consider distance used as a diversity measure in an EA: in such a case distance could not differentiate solutions that lie, for example, near different global optima with similar fitness. Secondly, distance should take into account those properties of solutions which are important for fitness. Here, Hoos & Stutzle (2004) give an example of the distance most commonly used for TSP solutions, the bond distance (Boese 1995). This measure takes into account the relative order of vertices in a tour, rather than some arbitrary properties of any representation of the tour. Thirdly, there are some examples of distance measures defined for combinatorial structures encountered in optimisation problems. One of them is the already mentioned bond distance for TSP solutions. Another widely used one is the Hamming distance for bit strings, which has been employed in numerous different contexts (e.g. the set covering problem (Finger et al. 2002), the QAP, the BQP, the GBP (Merz 2000)). Reeves (1999), also with Yamada (Reeves & Yamada 1998), analysed the landscape of the flowshop scheduling problem using several simple distance measures. The fitness landscape of some JSP instances was defined using the disjunctive graph distance (which was also inspired by the Hamming distance to some extent) (Bierwirth et al. 1996, Mattfeld et al. 1999, Bierwirth et al. 2004). Some more distance measures may also be found in the literature, specifically for permutations (Schiavinotto & Stutzle 2007) or under the notion of diversity (Mattiussi et al. 2004). It seems, however, that there is still a lack of appropriate measures for combinatorial structures, especially those more involved than permutations. This fact somewhat slows down research on fitness landscapes of practical problems.

Finally, one has to remember that distance should be efficiently computable. It would be of no practical use otherwise. An example of a distance measure which is hard to compute (NP-hard) is the one based on the 2-edge exchange neighbourhood operator for the TSP. Because of this fact it was eventually replaced by an approximation, the bond distance (Boese 1995, Hoos & Stutzle 2004, Schiavinotto & Stutzle 2007).

Perception of a landscape by a search algorithm

Falkenauer (1998) was reluctant to clearly define distance because he did not want his reasoning on fitness functions to become dependent on any particular search operator (algorithm). It can be seen from the definition of a landscape that it does not involve any complex algorithm, although it does involve a neighbourhood operator or a distance measure. Therefore, a landscape may exist and be analysed without any complex algorithm in mind. What Falkenauer was right about, though, was that for a landscape analysis to be of any practical use it is important that it somehow corresponds to the eventually designed algorithm. If landscape analysis is performed in order to gain insight into the search space structure and the obtained information is to be exploited in some algorithm, then the algorithm should perceive the search space from the same perspective, the same landscape. Otherwise there would be no link between the algorithm and the landscape analysis performed earlier. This viewpoint is also shared by Reeves & Rowe (2003), and Hoos & Stutzle (2004).
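To make the notion of an operator-independent distance concrete, below is a minimal Python sketch of the bond distance for TSP tours mentioned above: the number of edges of one tour that the other tour does not contain. The function names are the author's illustration, not taken from any library.

```python
from typing import List, Set, Tuple

def tour_edges(tour: List[int]) -> Set[Tuple[int, int]]:
    """Set of undirected edges of a cyclic tour (each edge stored sorted)."""
    n = len(tour)
    return {tuple(sorted((tour[i], tour[(i + 1) % n]))) for i in range(n)}

def bond_distance(t1: List[int], t2: List[int]) -> int:
    """Number of edges of t1 that are absent from t2.

    Both tours have exactly len(t1) edges, so the measure is symmetric."""
    return len(tour_edges(t1) - tour_edges(t2))

# These 5-city tours share edges (0,1), (2,3) and (0,4); they differ in 2.
print(bond_distance([0, 1, 2, 3, 4], [0, 1, 3, 2, 4]))  # prints 2
```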
5.2 Fitness-distance analysis

Fitness-distance analysis (FDA) examines a fitness landscape in search of a relationship between the fitness (quality) of solutions and their distance to the search goal, a global optimum (Jones & Forrest 1995, Reeves 1999, Merz 2000, Hoos & Stutzle 2004). Most often it is a form of statistical analysis of a sample of good solutions of a problem instance which is summarised by a value of the correlation coefficient between fitness and distance. The desired result of this analysis (in maximisation problems) is a strong relationship with a highly negative correlation, because 'if fitness increases when the distance to the optimum becomes smaller, then search is expected to be easy for selection-based algorithms, since there is a "path" to the optimum via solutions with increasing fitness' (Merz 2000). This statement of Merz's about practical profits from negative correlation in selection-based algorithms (e.g. EAs) may also be supported by similar opinions expressed by Jones & Forrest (1995), and Hoos & Stutzle (2004). This desirable structure in the fitness landscape is sometimes also referred to as a 'big valley' (Boese 1995, Boese et al. 1994, Reeves 1999), 'central massif' (Reeves & Yamada 1998) or 'global convexity' (Boese 1995, Boese et al. 1994, Jaszkiewicz & Kominek 2003).

5.2.1 Basic approach

Probably the most common approach to the FDA requires that at least one global optimum of each analysed instance is known in advance (Jones & Forrest 1995, Boese 1995, Merz 2000, Hoos & Stutzle 2004).

Landscape sampling

A sample is taken of good solutions of the analysed problem instance. This sample is usually large (e.g. 500, 1000 or even more solutions) and most often consists of local optima only. The solutions are generated by starting from randomly chosen points in the landscape; then some simple, randomised local search is performed independently on each of them (Boese 1995, Merz 2000, Hoos & Stutzle 2004, Reeves 1999). Such a sampling procedure may seem to be biased, since local search does not uniformly sample the space of all solutions. This is indeed the case and is done on purpose, because optimisation algorithms (e.g. MAs) are usually highly biased toward good solutions. The purpose of this sampling is to approximate the set of solutions an optimisation algorithm may encounter during search, not the set of all solutions (Hoos & Stutzle 2004). In the case of MAs and many other algorithms based on local search only local optima are accepted into the population; hence, local optima are sampled.

For each sampled solution $s$ two values are computed: quality $f(s)$ and distance to the nearest known global optimum $d_{opt}(s)$. Also, for each pair $s_1, s_2$ of solutions in the sample the distance between them is calculated, $d(s_1, s_2)$.

Distance between local optima

One stage of the analysis is the examination of distance between local optima in the sample and its comparison to the distance arbitrary solutions of the problem instance may have. Usually the average distance between local optima in the sample is computed:

$$\bar{d} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(s_i, s_j)$$

and is compared to the average distance between random solutions (Boese et al. 1994, Mattfeld et al. 1999) or to the analytically computed extent (diameter) of the search space (Boese 1995, Merz 2000), i.e. the maximum distance two solutions in the problem instance may have. This way two questions may be answered.
• Are local optima of the analysed instance spread all over the search space, or rather confined to a smaller fragment of it?
• What is the size of the subspace containing local optima?

If it transpires from the analysis that local optima are more tightly clustered in the search space than arbitrary (random) solutions are, it means that they share some common properties. This fact may be later exploited in the designed algorithm (Boese 1995, Boese et al. 1994, Merz 2000).

Fitness-distance correlation

The relationship between values of $f$ and $d_{opt}$ in the sample is summarised using the linear correlation coefficient, which in this context is called the fitness-distance correlation (FDC). Given the sample $\{s_1, \ldots, s_N\}$, the FDC is computed as (Jones & Forrest 1995, Merz 2000, Hoos & Stutzle 2004, Reeves & Rowe 2003, Bierwirth et al. 2004):

$$r = \frac{\mathrm{cov}(f, d_{opt})}{s_f \cdot s_{d_{opt}}}$$

where $\mathrm{cov}$ denotes the estimate of the covariance of two variables:

$$\mathrm{cov}(f, d_{opt}) = \frac{1}{N} \sum_{i=1}^{N} \left( f(s_i) - \bar{f} \right) \left( d_{opt}(s_i) - \bar{d}_{opt} \right)$$

$\bar{f}$ and $\bar{d}_{opt}$ are the means of fitness and distance in the sample, and $s$ is the estimate of the standard deviation of a variable, e.g.:

$$s_{d_{opt}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_{opt}(s_i) - \bar{d}_{opt} \right)^2}$$

Being simply a linear correlation coefficient, the value of FDC always belongs to the closed interval $[-1, 1]$. It should be noted that in this context FDC is only a descriptive statistic (Jones & Forrest 1995); no probabilistic model of the relationship between fitness and distance is assumed here which would include FDC as a parameter.

Interpretation of values of FDC obviously depends on the direction of optimisation in the analysed problem. For minimisation problems, 'high correlations suggest that the local optima are radially distributed in the problem space relative to a global optimum at the centre; the more distant the local optima are from the centre, the worse their objective function values' (Reeves & Yamada 1998); Jones and Forrest call such a landscape 'straightforward' for GAs. FDC equal to 1 would indicate a perfectly linear relationship between fitness and distance, and promise the search through the landscape to be easy (Merz 2000). On the other hand, highly negative values of $r$ indicate a misleading problem, where solutions with better quality tend to be further and further away from the search goal. Finally, $r \approx 0$ indicates either no relationship between fitness and distance (hence, no guidance for search from the fitness function) or some nonlinear relationship, which is poorly summarised with linear correlation (Jones & Forrest 1995, Hoos & Stutzle 2004).

Scatter plots of fitness vs. distance

The relationship between fitness and distance may also be visualised in a scatter plot. This is a plot with fitness on one axis and distance on the other, with each sampled solution's pair $(f(s), d_{opt}(s))$ shown as one point (Jones & Forrest 1995, Merz 2000, Hoos & Stutzle 2004, Reeves 1999, Reeves & Rowe 2003). This fitness-distance (FD) plot is said to be useful especially in cases with $r \approx 0$, when a nonlinear relationship might otherwise be overlooked (Jones & Forrest 1995, Hoos & Stutzle 2004). Merz (2000) even says this plot is more informative than the FDC coefficient. For a minimisation problem, the desired shape should contain a tendency: with decreasing fitness (increasing quality) the values of distance should be smaller. Ideally, there should be only such a tendency, without any noise, outliers or other components (FDC near 1). Examples of FD plots are given in figure 5.1.
They were generated from real FD analyses. Note the possible differences in shapes (a flattened oval going slightly up (left) or a cloud with several horizontal groups of points below (right)). Note also the related values of FDC and how well (or poorly) they correspond to the visible shapes. Some very interesting, diverse plots may also be found in the paper by Jones & Forrest (1995) and Merz's Ph.D. thesis (2000).

[Figure 5.1: two scatter plots of fitness (horizontal axis) vs. distance (vertical axis).] Figure 5.1: Examples of scatter plots of fitness vs. distance (FD plots) with superimposed lines of first-order regression. In the left plot r = 0.487; in the right plot r = 0.214.

5.2.2 Examples of analyses from the literature

Table 5.1 lists examples of the fitness-distance analysis that may be found in the literature. For each problem (in column 1) the authors of the analysis are noted, together with the year of the related publication (2). The next column (3) indicates the distance (or similarity) measure used in the analysis. Further, the applied sample size is given (4) and the kind of reference solution(s) to which distance was computed (5). Next, the analysed instances are briefly described (6) and the obtained values of FDC coefficients are given (7). The last column (8) indicates the classification of the result as given by the author(s) of the analysis; a '+' means that the fitness-distance analysis result was evaluated as positive (i.e. a 'big valley' exists), while a '−' indicates no significant FDC (no 'big valley'). Note that an empty entry in the table means the entry has exactly the same value as the one immediately above it (in the previous row).

It should be noted that not all of the listed analyses were conducted with the basic approach described earlier. The most important differences are in the reference solution(s) used to compute the distance of sampled solutions to (column 5). In many cases global optima of the analysed instances, which are required in the basic approach in order to compute $d_{opt}(s)$, were unknown. In these cases some other solution (e.g. the best-known) or a group of other solutions (all other local optima in the sample; all not worse in the sample) were used. Moreover, one can see that there were also major differences in sizes of samples. All these issues will be discussed later.

The author's first impression while looking at table 5.1 is that FDA is not a completely new analysis technique; it has already been performed a number of times and applied to a variety of optimisation problems. It has also been a subject of publications in serious journals and conference proceedings.

[Table 5.1: Fitness-distance analyses described in the literature. The table, spanning several pages in the original, covers: OneMax, Porcupine, Needle in a haystack and a 4-bit fully deceptive problem (Jones & Forrest 1995; Hamming distance); NK-landscapes (Merz 2000); the Graph Bipartitioning Problem (Boese et al. 1994, Merz 2000; a modified Hamming distance); the Travelling Salesman Problem (Boese 1995, Merz 2000, Hoos & Stutzle 2004; bond distance); the Flowshop Scheduling Problem (Reeves 1999, Watson et al. 2002; position, precedence and adjacency metrics); the Binary Quadratic Programming Problem (Merz 2000; Hamming distance); the Quadratic Assignment Problem (Merz 2000; Hamming distance for permutations); the Set Covering Problem (Finger et al. 2002, Hoos & Stutzle 2004; number of different elements); a real-world Vehicle Routing Problem (Jaszkiewicz & Kominek 2003; percentage of common features, 4 measures); the Job-shop Scheduling Problem (Beck & Watson 2003, Watson 2005; disjunctive graph distance); a satellite management problem (Jaszkiewicz 2004; percentage of common features, 5 measures); and the Optimal Golomb Ruler Problem (Cotta & Fernández 2005; 2 measures, one for each encoding). Sample sizes ranged from 50 up to 10000 solutions (sometimes not given); reference solutions were global optima, best-known solutions, all other local optima in the sample, or all not worse solutions. The individual FDC values and classifications are discussed in the text below.]

Significance of FDC

Looking at the values of the fitness-distance correlation one may notice that the values classified as revealing a 'big valley' structure in the landscape were usually moderate. Values of r as large as 0.4, or even 0.3, were deemed significant, e.g. for NK-landscapes, the TSP, the SCP.
Such values were usually backed up by visible trends in fitness-distance plots. However, from the statistical point of view, $r = 0.3$ is rather weak and means that only $r^2 = 0.09$, i.e. 9%, of the variance of one variable (e.g. $d_{opt}$) may be explained by changes in the second one (e.g. $f$) through a linear regression model (Ferguson & Takane 1989). For $r = 0.4$ the explained variability amounts only to $r^2 = 16\%$. And yet, such results were perceived as supporting the 'big valley' hypothesis about the related fitness landscape.

Dependence of FDC on instance type

Another important phenomenon visible in table 5.1 is the dependence of FDA results on the type of the analysed instances (column 6); the phenomenon was also noticed by Merz (2000). This means that the FDC value is not a characteristic of a problem, but of a problem instance. This dependence is clearly visible in almost all listed problems: NK-landscapes, the TSP, the JSP, the SCP and, perhaps to a lesser extent, in the GBP and the QAP.

In NK-landscapes one may see that the FDC value depends on K, the second of the two numbers denoting the instance type. In this problem, K determines the number of variables each of the N binary variables in the problem is dependent on. It appears as though values of K larger than 4 make FDC fall to zero.

In the TSP the borderline between instance types is less apparent. Instances based on geographic data (e.g. cities in the USA) usually give rise to high FDCs. The same happens for 'fractal' instances. However, real-world drilling problems result in instances with virtually no FDC and it is difficult to judge why this type is so different from the previous ones; perhaps many edges of the same length pose a problem to FDC.

Most benchmark instances of the quadratic assignment problem reveal no significant FDC values. However, random instances of a special type (low flow dominance) may have correlations larger than zero.

The set covering problem provides an interesting case (Finger et al. 2002, Hoos & Stutzle 2004). On one hand, 29 randomly generated benchmarks from the OR-Library usually exhibit correlations significantly larger than zero. On the other hand, 15 real-world instances stemming from applications in airline crew scheduling reveal no correlation of fitness and distance.

Small random instances of the job-shop scheduling problem usually have high FDCs, while well-known random benchmarks are divided in two with respect to this coefficient, Watson (2005) reports.

This dependence on the instance type is not visible for the real-world vehicle routing problem (Jaszkiewicz & Kominek 2003), the satellite management problem (Jaszkiewicz 2004) and the optimal Golomb ruler problem (in the last case there are no instance types, though; only instance size may vary). In all these cases all the analysed instances revealed high FDCs.

To summarise the issue, the dependence of FDC on the instance type usually exists, and it is not good news, because FDC is then not entirely a characteristic of a problem. Moreover, it is difficult for the author to say what actually decides that one instance has high fitness-distance correlation and another has not. As far as the author knows, there is no satisfactory explanation for this dependence in the literature. What is also important, some of the results covered here indicate that it might be dangerous to generalise about FD properties of a problem based only on the analysis of randomly generated instances.
Such instances may have quite different landscapes from their real-world counterparts, as was the case for some TSP instances or the SCP.

Dependence of FDC on distance measure

The contents of table 5.1 also confirm that a change in the distance measure strongly influences the properties of the analysed fitness landscape (it actually changes the whole landscape, as noted in section 5.1.6). This influence is clearly visible in the FDC values for the flowshop scheduling problem, for which 4 different distance measures were employed in the analysis. Two adjacency-based measures give rise to landscapes without 'big valleys', while the measures based on position and precedence relations let FDA reveal significant correlations. Also in the case of the real-world VRP and the satellite management problem a change in a similarity measure influences landscape properties. Here, this influence is perhaps less apparent due to the fact that all defined measures are positively correlated with fitness.

This dependence of FDC on a distance measure may be interpreted as good news: a problem instance may be analysed from different points of view (i.e. with different measures) and it is enough that one measure correlates with fitness to have a 'big valley' in the landscape.

Dependence of FDC on the type of sampled solutions

The computed value of FDC may depend on the type of algorithm used to generate solutions for a sample. Several authors experimented with different algorithms for the same instances and it appeared that such a change may somewhat alter the value of correlation. Merz (2000), for example, analysed TSP instances with two types of local optima: 3-opt and Lin-Kernighan. It transpired that the use of local optima of the second type usually resulted in higher FDCs, although the final 'big valley' status of an instance remained the same. He obtained similar results for the GBP, where the use of a more powerful local search also resulted in more artificial-looking fitness-distance plots. Another example may be the work of Reeves (1999). He experimented with 5 different types of local optima and his FDA results were, to some small extent, different for each of the types.

Distance between local optima

Table 5.1 could not fit the results concerning distances between local optima, so they will be discussed here. According to Boese (1995), Kirkpatrick & Toulouse (1985) had already noticed that local optima of the TSP were surprisingly similar (close) to one another. Their findings were confirmed by Boese et al. (1994) in the case of randomly generated instances on the unit square: the maximum distance between local optima was less than half of the average distance between random solutions. Later, Boese (1995) also confirmed this fact for one geographical TSP instance with 532 vertices. Merz (2000) extended these previous studies of the TSP by examining different types of instances. His findings were similar with respect to geographic and 'fractal' instances ($\bar{d} \leq 1/4$ of the search space diameter), but for one 'drilling' instance the average distance between local optima was rather large (0.41 of the diameter); note that he computed the average distance, as opposed to the maximum one used by Boese (1995).

Similar analysis was performed by Boese et al. (1994) on random GBP instances. There, they found that the maximum distance between local optima sometimes happened to be as large as the average distance between random solutions, quite contrary to the results for the TSP.
Merz (2000) obtained similar results for the same type of instances, but regular and grid-like graphs had more similar local optima. Interestingly, similar optima happened to exist in instances with larger FDC values.

The comparison of distances between local optima and random solutions was also performed on the JSP. In this case 'local optima are spread all over the fitness landscape', Mattfeld et al. (1999) concluded. Reeves (1999), on the other hand, concluded from a similar analysis of the flowshop scheduling problem that local optima seemed to be close to each other, although he did not dwell on the matter.

Merz (2000) also analysed the NK-landscapes, the BQP and the QAP from this point of view. His results were similar to the conclusions on FDC:

• NK-landscapes with K = 2 had very concentrated local optima; for K = 4 this concentration was smaller;
• NK-landscapes with K = 11 had local optima distributed in a similar way to a uniform distribution of points in the search space (no concentration);
• local optima of the BQP were highly clustered in the search space of all solutions (1/10 to 1/4 of the diameter);
• local optima of the QAP were spread all over the landscape, with one exceptional instance (the one with high FDC).

The real-world VRP analysed by Jaszkiewicz & Kominek (2003) had quite similar local optima from the point of view of 2 similarity measures, while the same solutions had rather low similarity with respect to the 2 other measures.

To summarise, one can see that the clustering of local optima is also dependent on the analysed problem, instance type and distance measure. In some problems local optima happened to be very close to one another, and then this landscape feature may be exploited in the designed algorithm.

Historical remarks

The works that probably inspired most of the literature on FDA are due to Kirkpatrick & Toulouse (1985), and Muhlenbein (1991). As far as the author knows, they did not consider measuring the correlation between solution quality and distance to a global optimum, but their focus on the similarity of local optima of the TSP and the related results convinced many researchers to follow their ideas.

The first computations of FDC are most probably due to Boese (1995), and Jones & Forrest (1995). Their works, published in the mid-1990s, are likely the most often cited on fitness-distance correlation. In the late 1990s and early 2000s, Merz and Freisleben contributed enormously to this area of research (Merz 2000, Merz 2001, Merz & Freisleben 1999, Merz & Freisleben 2000a, Merz & Freisleben 2000b). At the same time two important papers on flowshop scheduling appeared, by Reeves (1999), and Reeves & Yamada (1998). Afterwards, a number of other researchers also focused their work on fitness-distance analysis and its applications, convinced to some extent by these first results that statistical landscape analysis may be an area promising further improvement in the design of metaheuristics.

Position: 1 2 3 4 5 6 7 8 9 10
p0:       0 1 0 0 1 1 0 0 1 0
p1:       1 0 1 0 0 1 1 0 0 1
o:        1 1 0 0 0 1 1 0 0 0

Figure 5.2: An example of a case of a respectful recombination of parents p0 and p1 on a binary representation. Common assignments to positions of the parents and the offspring o are emphasised (positions 4, 6 and 8). This operator preserves the Hamming distance.
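The operation of figure 5.2 is easy to state in code. Below is a minimal Python sketch (the author's illustration; the random completion of non-common positions stands in for a HUX-style uniform choice). The assertions express the distance-preservation property discussed in the next section.

```python
import random

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def respectful_crossover(p0: str, p1: str) -> str:
    """Keep every bit on which the parents agree; fill the rest randomly.

    Consequently d_H(o, p0) <= d_H(p0, p1) and d_H(o, p1) <= d_H(p0, p1)."""
    return ''.join(a if a == b else random.choice('01')
                   for a, b in zip(p0, p1))

p0, p1 = '0100110010', '1010011001'   # the parents of figure 5.2
o = respectful_crossover(p0, p1)      # positions 4, 6 and 8 are inherited
assert o[3] == '0' and o[5] == '1' and o[7] == '0'
assert hamming(p0, o) <= hamming(p0, p1)
assert hamming(p1, o) <= hamming(p0, p1)
```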
5.3 Exploitation of fitness-distance correlation in a memetic algorithm

However, 'it is not sufficient to have a landscape with "nice" properties. It is necessary also to have a means of exploiting the landscape that makes use of these properties' (Reeves & Rowe 2003).

5.3.1 Design of respectful recombination

One of the major conclusions of Merz (2000) states that 'memetic algorithms should employ respectful recombination operators on correlated landscapes' because they can exploit this structure, i.e. FDC. The same opinion may be found in the paper by Jaszkiewicz & Kominek (2003).

Definition

In the case of binary representations, a respectful crossover may be one that preserves in an offspring the positions (bits) which have identical values in (are common to) its parents (Merz 2000) (see figure 5.2). Thinking more generally, a respectful operator preserves in an offspring all solution properties that are common to the parents, irrespective of how a property is actually defined: some value at a position in a vector, the presence of an edge in a tour, a precedence relation between jobs, etc.

From the perspective of measuring distance in terms of the same properties, all the common properties of parents do not contribute to the distance; only the different ones do. Thus, these different ones make the distance between parents larger than zero, $d(p_0, p_1) = d$; if there are no different properties in the parents, then the distance with respect to these very properties should be zero. Now, if a respectful crossover is performed, the resulting offspring $o$ always inherits the common properties of its parents. Therefore, the distance of this offspring to its parents, with respect to the same distance measure, may not be larger than the distance between the parents: $d(p_0, o) \leq d$ and $d(p_1, o) \leq d$. This is the reason why a respectful operator is also called a distance-preserving one (Merz 2000, Jaszkiewicz 2004, Jaszkiewicz & Kominek 2003).

The idea of distance preservation is intuitively shown in figure 5.3. Note that this figure is very similar to the one given by Merz (2000, page 58). Of course, such a figure is only an intuitive, 2-dimensional illustration of what happens; in combinatorial problems of reasonable instance size the search space usually has many more dimensions. Such a recombination operator does not generate arbitrary jumps in the search space, but always results in an offspring which is located in the region spanned between its parents.

[Figure 5.3: parents p0 and p1 at distance d(p0, p1), with the offspring o lying between them.] Figure 5.3: Intuitive picture of a case of a distance-preserving (respectful) crossover in 2-dimensional Euclidean space.

Respectful recombination exploits FDC

Merz explains in the conclusions of his thesis (2000) how a memetic algorithm with such a recombination operator can exploit fitness-distance correlation:

The most important property of landscapes on which MAs have been shown to be highly effective is the correlation of fitness of the local optima and their distance to the global optimum. MAs employing recombination are capable of exploiting this structure since with respectful recombination offspring are produced located in a subspace defined by the parents. These offspring are used as starting solutions for a local search that likely ends in a local optimum relatively near the starting point and thus near the parents. Due to the correlation structure of the search space this point has more likely a higher fitness than local optima more distant in an arbitrary direction from both parents.
Viewing the evolutionary process as a whole, recombination is capable of decreasing the distance between solutions (and the distance to the optimum) if the landscape is correlated, while fitness is increased by selection: the higher the fitness of the local optima and the closer they are in terms of distance, the more likely the optimum is found in the vicinity of those solutions.

This is not only Merz's point of view. Mattfeld et al. (1999) express a similar opinion when they discuss the applicability of adaptive search approaches to combinatorial optimisation problems. They say that in problems like the TSP, where many local optima are located near the global optimum (i.e. a 'big valley' exists), recombination operators like Merz's are a good option.

Respectful recombination is especially useful if there is high similarity between local optima of the considered problem (they are clustered in the search space). In this case it is very likely that the parents of recombination share many common properties, so that a large part of the constructed offspring is determined by the parents.

When not to use respectful recombination

There are cases when a respectful recombination operator should not be used. One of them happens when there is no correlation between fitness and distance to the search goal. Then, a mutation operator should be preferred to recombination, since directed jumps with recombination are aimless (Merz 2000). Another one is when the fitness landscape of the analysed problem is deceptive. In such a case an operator should be used which is disrespectful on purpose (Merz 2000). The last case listed by Merz (2000) happens when the analysis of a fitness landscape reveals that there is significant FDC, but local optima are too close to each other (he encountered such a landscape in the BQP). In these circumstances respectful recombination may frequently produce offspring identical to one of the parents. A mutation with a constant jump distance should be used in an MA for such a problem, Merz suggests.

Greedy choices to complete an offspring

When a distance-preserving crossover is used in an MA, the common parental properties alone are usually not sufficient to complete an offspring. This can be seen in figure 5.2: common parental properties (positions with the same values in this case) are not enough to define a complete offspring; some additional values have to be specified. This is quite important if local optima of the considered problem are not very similar to each other (the average distance between them is high, so usually there are few common properties), which usually happens at the beginning of a run of artificial evolution. Merz (2000) suggests that greedy choices may be used to complete an offspring with the common properties already in place, especially in problems with low epistasis. Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003) follow Merz's example.

Examples of respectful recombination

Not only does Merz (2000) argue in favour of the design of respectful crossover in the presence of a positive FDC, but he also gives numerous examples of such a design for the problems he investigated.

He designed and implemented two respectful crossovers for NK-landscapes: a modification of the uniform crossover (UX) which preserves the Hamming distance, called HUX, and a greedy recombination operator (GX), which besides distance preservation also employed the aforementioned greedy completion of an offspring.
Both of the operators, when used in a memetic algorithm, performed much better than competitors on NK-landscapes with positive FDC (K < 5). The competitors were: a multi-start local search and a genetic algorithm with uniform crossover, one-point crossover or mutation only.

Merz's experiments on the BQP also demonstrate that distance-preserving crossovers perform well in a memetic algorithm for a problem with high FDC. For this unconstrained binary problem he also employed the HUX respectful operator in a memetic algorithm and it appeared that it was better than good competitors (tabu search, simulated annealing) taken from the literature. Nevertheless, a mutation-based MA fared even better, due to the very high similarity of local optima in the BQP, as indicated above.

The experiments on the TSP support the idea of respectful recombination, as well. Merz (2000, 2002) designed two respectful operators for this problem. The distance-preserving crossover (DPX) preserved in an offspring all edges common to the parents and completed it with foreign (uncommon) edges in a greedy way. The generic greedy recombination operator (GX) was also a distance-preserving one with respect to the bond distance, but the completion of an offspring followed a slightly different procedure: more parental edges could be inherited. Computational experiments (Merz 2002) compared the two operators with the maximum preservative crossover (MPX; see also section 4.4.3) in the memetic algorithm framework and revealed that GX was the best, DPX the second best. These MAs were also better than iterated local search and a mutation-based MA. Respectfulness and a high inheritance rate of parental edges appeared to be the most important properties of a good crossover for the TSP.

Following the generally positive results of his fitness-distance analysis of the GBP, Merz implemented 3 respectful crossovers for the problem (Merz 2000, Merz & Freisleben 2000b): the uniform crossover with distance preservation, the HUX crossover and the greedy crossover (GX). The latter operator differs from the two former ones in that it uses a greedy procedure to complete an offspring once the common parental bits are in place. Experiments with memetic algorithms confirmed that this GX, when used together with mutation, was superior to the two others, and also to other metaheuristics (simulated annealing, tabu search).

The quadratic assignment problem was the last one investigated by Merz & Freisleben (2000a).
They used it in their memetic algorithm and compared to two other crossovers they implemented, which preserved some of the 4 features, but not all of them together. Computational experiments demonstrated that this DPX was the best, outperforming also some other metaheuristics (multi-start local search, simulated annealing and an evolutionary algorithm). Distance preservation in the crossover appeared to be crucial for good performance of the memetic algorithm for this VRP. Positive results of the fitness-distance analysis also convinced Jaszkiewicz (2004) to design a distance-preserving crossover for the satellite management problem he considered. There were 5 types of features which positively correlated with fitness in the problem, but one was dependent on some other one. Therefore, Jaszkiewicz designed an operator preserving all 4 independent features. He also implemented weaker versions of the operator, preserving fewer of the features, in order to test if the inheritance of all common features of parents in an offspring was essential. Results of his experiments showed that the more important types of features were preserved by an operator, the better its performance in the memetic algorithm was. Moreover, the memetic algorithm were better than iterated initial solution heuristic, demonstrating that the rational crossover design based on FDA was well worth the effort. Adaptation pattern: systematic design of a crossover operator for the memetic algorithm Examples from the literature and positive experience with crossover operator design based on fitness-distance analysis led Jaszkiewicz (2004) to the formulation of an adaptation pattern of the memetic algorithm to a combinatorial optimisation problem. The patter was formulated as1 : 1. Generate sets of good and diversified solutions for a set of instances of a given problem. 2. Formulate a number of hypotheses about solution features important for a given problem. 3. For each feature and each instance, test the importance of this feature with a correlation between the value of the objective and similarity of good solutions. The similarity is measured with respect to this feature. 4. Design distance preserving recombination operator assuring (or aiming at) preservation of common instances of features for which positive correlations were observed. The operator may preserve common instances of several features. (This pattern, although not directly formulated yet, was also followed by Jaszkiewicz & Kominek (2003) in their work on a real-world VRP ). 1 The list of steps in the pattern is a citation from Jaszkiewicz (2004) 5.3. Exploitation of fitness-distance correlation in a memetic algorithm 67 According to Jaszkiewicz (2004), the main goal of this pattern is to reduce the effort required to design a good optimisation algorithm, by avoiding not promising paths in the development of the algorithm. By effort he means the designer time and the time of computational experiments. Indeed, one certainly sees that in some cases the design of well-performing algorithms takes years (e.g. the TSP case, see comments by Jaszkiewicz & Kominek (2003)). By employing fitnessdistance analysis, which is actually the core of the pattern (points 1 through 3), the designer may discover which features are important for high quality of solutions (high FDC) and then design a recombination operator with this knowledge in mind, thus avoiding experiments with diverse operators in the trial-and-error manner. 
5.3.2 Adaptation of mutation

Merz (2000, 2004) says that mutation is especially useful in problems (instances) where no fitness-distance correlation is revealed (he calls them 'unstructured landscapes'). In such a case this operator simply makes jumps out of the basin of attraction of the current local optimum and enables the subsequent local search to find another one.

But he also states that mutation becomes a valuable operation if local optima in the landscape are very close to each other (in a structured landscape). In landscapes of this kind respectful recombination becomes ineffective, very often producing one of the given parents. Thus, mutation with a constant jump length appears to be more suitable. This point of view on the negative effect of recombination (convergence) and the need for more mutation is also expressed by Reeves & Yamada (1998) in their paper on flowshop scheduling.

What may also be deemed significant is that in the presence of high FDC Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003) did not consider the use of mutation in their memetic algorithms at all. It should be noted, though, that they had a small amount of time for computation and the danger of convergence was likely to be small in their cases.

Examples of application

The profitability of mutation was clearly visible in the experiments conducted by Merz (2000) on NK-landscapes with K ≥ 5, where FDC was negligible. He employed bit-flip mutation in his MA, with 3 bits being flipped simultaneously. This mutation-based MA was the best algorithm among all tested: the MA with HUX crossover, other GAs, and multi-start local search.

The same conclusion might be derived from Merz's experiments on the QAP (Merz 2000): for instances without significant FDC the memetic algorithm with mutation only was the best one. The mutation was a jump of a predefined distance in a random direction. All crossover-based MAs were worse. This conclusion was also arrived at by Hoos & Stutzle (2004).

The BQP was the problem with high FDC and very closely located local optima, where recombination became unproductive in later stages of the algorithm. Merz's (2000) experiments revealed that even in the genetic algorithm mutation was superior to uniform crossover (HUX); here, mutation flipped a large, constant number of bits, equal to the average distance between local optima. For the memetic algorithm, the version combining mutation and crossover was best, better than tabu search and simulated annealing, as well.

An interesting mutation operator for the flowshop scheduling problem was designed by Reeves & Yamada (1998). In their memetic algorithm they used an operator which required two parent solutions: one to be mutated and the other to be a reference point in the search space. The mutated solution was then modified in such a way that the mutant was further away from the reference point than the original parent. In other words, it was a guided mutation. This design was motivated by a path relinking viewpoint, which will be briefly discussed later. Their memetic algorithm was excellent in computational experiments, beating the competitors and producing some new best-known solutions at the time (although this performance was also due to other components of their algorithm).

5.3.3 Adaptation of local search

Jaszkiewicz (1999) extended the idea of respectful recombination to local search.
He argued that if the problem to be solved exhibits high fitness-distance correlation, then it is very likely that the common parental properties which have been inherited by an offspring during crossover are also common to other good solutions. Therefore, these properties should not be modified during local search, which is launched after each recombination. This leads to the idea of moves in local search which are forbidden (locked) after recombination. Merz (2000) also considered such a technique. He concluded that it also resulted in significantly faster local search, due to smaller neighbourhoods to be evaluated.

What Jaszkiewicz (1999) also noticed was that such local search with forbidden moves might result in offspring which were not local optima in the original, unconstrained landscape. That is why he also considered a two-phase local search after respectful recombination: the first phase with forbidden moves, the second without them (the ordinary local search).

Ideas similar to that of locked LS moves may also be found in Boese's paper (Boese 1995), where he also considered forbidding certain moves based on the fitness-distance analysis.

This technique is supposed to improve the MA's performance if there is high similarity between local optima in the fitness landscape. In such a case the number of locked moves is likely to be very high after each recombination, accelerating local search considerably.

Yet, Moscato & Cotta (2003) advise exactly the opposite approach: to enable local search to change components of an offspring which are common to the parent solutions. They argue that the global optimum may be completely different from local minima, so common properties should be modified. The author of this thesis thinks that although global optima may indeed happen to be like this when we do not know them, the strategy proposed by Moscato and Cotta seems to be a wrong choice when positive fitness-distance correlation is present and there is therefore justified hope that global optima share many common properties with local ones. That technique of theirs may be useful, though, where there is no fitness-distance correlation in the considered problem.

Examples of application

Merz (2000) experimented with the technique of forbidden moves on several problems: the BQP, the TSP, the GBP and the QAP. In the case of the TSP he concluded that it led to significantly reduced computation time, with very good quality of results. The very same conclusion resulted from experiments on the QAP. In the GBP he also obtained increased performance of the MA (70%–100% more generations could be computed within the same time limit), but at the cost of a slight deterioration in the quality of final solutions.

The TSP was also the subject of Jaszkiewicz's (1999) investigations. The one-phase approach in his MA also led to a significant increase in performance; as much as 4 to 8 times fewer evaluations of the objective function had to be performed to obtain the same quality of results as with unconstrained local search. Jaszkiewicz also noticed a slight deterioration in the quality of generated solutions when the same population size was used, but concluded that with a larger population for the one-phase approach the final quality was the same, while still reducing the MA's running time considerably.

His two-phase approach (first, forbid some moves; second, use the unconstrained neighbourhood) resulted in smaller accelerations but better quality.
Thus, it was intermediate between the ordinary, unconstrained version and the one-phase technique.

Jaszkiewicz also employed the one-phase technique in his memetic algorithm for the satellite management problem (Jaszkiewicz 2004). The recombination operators which locked some of the properties of an offspring were the best performing ones.

5.3.4 Adaptation of other components

Another possibility of exploiting fitness-distance correlation is to embed in the memetic algorithm a path relinking procedure, as Reeves & Yamada (1998) indicate. By path relinking (path tracing) they mean 'tracing local optima step by step, moving from one optimum to a nearby slightly better one, without being trapped'. These authors put this idea to work in their MA for the flowshop scheduling problem (Reeves & Yamada 1998) and embedded a path relinking procedure into the crossover operator they designed. In the crossover the common elements of the parents were always inherited (it was thus a respectful crossover), but the choice of the remaining ones was up to the relinking procedure. It started from one parent and moved step by step toward the other, thus exploring the space between them in an organised fashion. 'The results of this approach (...) were far superior to those of methods based on simpler crossover operators, such as PMX and others' (Reeves & Rowe 2003). Reeves & Yamada (1998) conclude their paper by saying that this embedded path relinking procedure may be fruitful if 'big valley' is a feature of many combinatorial optimisation problems.

5.4 Variants of the fitness-distance analysis

Not all the analyses conducted in the past were performed with the basic approach described earlier in section 5.2.1. In fact, there are multiple variants of the analysis present in the literature, the most important difference being the way distance is aggregated.

5.4.1 Analysis with only one global optimum known

Reeves (1999) was probably the only one to have considered the possible difference in values of the FDC when multiple global optima are substituted with only one of them. He cautiously concluded from his experiments on the flowshop scheduling problem that the results of the analysis with one optimum only were the same as with multiple optima available (when the distance $d_{opt}$ to the nearest one is computed). Obviously, Reeves could not have generalised his results to all problems, having performed the comparison on one problem only. However, the author thinks that it is likely that a high FDC from the analysis with one optimum will translate to a high FDC when multiple optima are used.

5.4.2 Analysis with the distance to the best-known solution

Nevertheless, in practice it is much more probable that none of the global optima of the considered problem is known in advance; the knowledge of any global optimum could imply the problem is already solved and there is no need to analyse it in order to design an algorithm. Therefore, most of the variants of FDA focus on some substitution of global optima with other reference points, so the analysis can be applied in practice.

One of the variants uses the best-known solution instead of global optima, and $d_{opt}$ is substituted with $d_{best}$; the formula for FDC stays the same. This variant of FDA was performed by Boese et al. (1994) for the TSP and the GBP. Also Merz (2000) conducted this type of analysis for some instances of the GBP, NK-landscapes, the BQP and the QAP. This approach was also followed by Finger et al. (2002) in the case of the SCP.
Interestingly, some of these results were confirmed with analyses where global optima were available for the same types of instances:

• results of Boese et al. (1994) concerning a random TSP instance and regular graphs in the GBP were confirmed by Merz (2000);
• Merz's (2000) analysis of NK-landscapes with large N may to some extent be reinforced by the earlier results of Jones & Forrest (1995), who experimented with much smaller values of N.

Although not all analyses performed with this variant were verified by FDAs with a known global optimum, the cases listed above may be the basis for a cautious hope that such concordance of results exists in other problems and types of instances.

But despite this hope a question must be asked whether in the fitness-distance analysis one may freely substitute global optima with some other reference point. The author expects the answer to this question to be generally negative; the chosen reference point should ideally be an extremely good solution which is also close to some global optimum (Reeves & Rowe 2003). This cannot be verified when a global optimum is unknown, though. Therefore, the reference point should be chosen very carefully, since its form may influence the result of the analysis. Were it distant from the unknown optimum, the result might be deceptive, meaning that local optima of increasing quality may tend to be more similar to the reference point, but at the same time more distant from the global optimum. This is an inherent danger of FDA with unknown global optima.

5.4.3 Analysis with the average distance to all other local optima

Another approach to FDA with global optima not known is to use, instead of $d_{opt}$, the average value of the distance of each sampled solution to all others in the sample:

$$d_{avg}(s_i) = \frac{1}{n-1} \sum_{j=1, j \neq i}^{n} d(s_i, s_j)$$

This type of analysis was performed by Boese et al. (1994) on the TSP and the GBP, besides the previous type. Boese (1995) compared this variant with the basic approach in the case of a geographical TSP instance. The same comparison was conducted by Reeves & Yamada (1998) for the flowshop scheduling problem. Watson et al. (2002) employed only this variant.

There are dangers of this approach, though. The values of $d_{opt}$ and $d_{avg}$ are not equivalent with respect to the size of the samples they are based on: $d_{opt}$ is based on only one observation, while $d_{avg}$, being an average, on $(n-1)$ observations. This way the variance of each $d_{avg}(s_i)$ is substantially decreased compared to $d_{opt}(s_i)$ (the operation of taking the average usually decreases variance). Moreover, the effect of this reduced variance may be visible in the computed value of FDC: correlations of $f$ with $d_{avg}$ may be higher than those with $d_{opt}$. In fact, there is such an effect in all cases where comparison is possible: in the study of the TSP by Boese (1995) and the analysis of the flowshop scheduling problem by Reeves & Yamada (1998). There the values of correlations when $d_{opt}$ is used are usually lower by 0.1–0.2, which is a substantial value given the fact that even small correlations may be deemed significant. Moreover, the scatter plots presented in these works clearly show the decreased variance of $d_{avg}$ compared to $d_{opt}$.

What may also be important, there is doubt whether the values of $d_{avg}$ in the sample are actually realisations of independent random variables, as is usually assumed while computing Pearson's correlation coefficient.
Despite the two important drawbacks of this FDC variant, the author thinks it has one advantage over the variant with dbest: the reference point is not a single solution, but the whole sample and, therefore, it is less likely that the result of this analysis leads away from the best solutions. The best-known solution can always be added to the sample, anyway, which introduces only a tiny bias, the sample being usually large.

5.4.4 Analysis with the average distance to not worse solutions

A different way of substituting the global optimum was employed by Jaszkiewicz & Kominek (2003); instead of davg they computed the average distance of solution si to all not worse solutions in the sample. Assuming that all sampled solutions have different evaluations and are sorted with the best one at the beginning (a minimisation problem), f(s1) < f(s2) < . . . < f(sn), this quantity is computed as:

d_better(s_i) = (1 / (i − 1)) · Σ_{j=1}^{i−1} d(s_i, s_j)   for i > 1

and stays undefined for the best solution in the sample, s1. FDC is computed between f and dbetter in the sample without s1.

However, with this approach the problem of variances arises as well: they are heterogeneous. For example, dbetter(s2) is based on one observation, while dbetter(s101) on 100 and dbetter(s501) on 500 of them. Thus, especially the first elements of the sorted sample will have high variability of dbetter. This phenomenon may impact the correlation coefficient and the scatter plot in a hardly predictable way, and may introduce artifacts into the result that have no counterpart in the analysed landscape.

Due to this fact Jaszkiewicz (2004) proposed another modification. In this study the average distance was not computed from si to the not worse solutions, but within the whole set of solutions not worse than f(si):

dˆ_better(s_i) = dˆ_better(f(s_i)) = (2 / (i(i − 1))) · Σ_{j=1}^{i−1} Σ_{k=j+1}^{i} d(s_j, s_k)   for i > 1

This way the size of the sample behind each dˆbetter(si) was made larger than for dbetter(si), but the problem of unequal variances remained unsolved. To address this problem, Jaszkiewicz proposed to remove the best 20 solutions from the sample after dˆbetter was computed. Hence, the smallest sample of distances actually used to compute the aggregated distance would be approximately of size 200, quite a large size for a sample. Nevertheless, the problem of unequal variances in the other part of the sample still exists, though it is less visible. Also, the removal of a number of best solutions may to some extent decrease the estimated value of correlation; this is called univariate selection in statistics (Ferguson & Takane 1989). Moreover, the most interesting group of solutions (the best ones) is removed from the sample in an arbitrary manner. All these issues raise doubt as to the objectivity of this approach.

5.4.5 Tests for the value of the FDC

Some authors applied statistical tests in their attempts to objectively assess the significance of fitness-distance correlation.

Classical tests

One such test that at first sight may seem appropriate is the test employing the t statistic (Ferguson & Takane 1989, Krysicki et al. 1998):

t = r · sqrt((n − 2) / (1 − r²))

where r is the value of the correlation coefficient computed from the sample of n solutions.
With the null hypothesis being H0: ρ = 0, the t statistic is supposed to have the t distribution with n − 2 degrees of freedom. It is likely that Boese (1995) applied this very test in his TSP study, although he provided neither the null hypothesis nor the formula of the statistic. However, this test assumes that the involved variables follow the bivariate normal distribution (Krysicki et al. 1998), which is a strong assumption that was not even reported to have been checked.

There is another test for the significance of a correlation coefficient, which does not require this assumption to be met (Krysicki et al. 1998). Under the null hypothesis H0: ρ = 0 and the alternative H1: ρ ≠ 0, the statistic:

χ² = n · r²

has the χ² distribution with 1 degree of freedom. There is an additional requirement that the sample be large (at least several hundred elements), but this is easily met by the samples usually employed in FDA.

But this test also has drawbacks, as do all tests which require a simple random sample to be drawn. This was noticed first by Reeves (1999): the values in the sample appear to be dependent; ‘for example, if local optima A and B are close in terms of their objective function values, and B is also close to C, then so are A and C’. Hence, if a classical test were used to assess the significance of correlation, the result could be wrong because of an improperly chosen model for the real-world situation; this is sometimes called the type III error in statistics (Rao 1989).

Randomisation test

Reeves (1999) proposed a different approach to testing: a randomisation test. He reported that a similar problem of assessing the significance of correlation between distance matrices had been studied in psychology and biology, and solved by Mantel’s test (Mantel 1967, Manly 1997).

The test starts with two distance (similarity) matrices, Xij, Yij. In the case considered here, one of them could hold fitness differences and the other values of the employed distance measure. It is assumed that Xii = Yii = 0 for all i. The null hypothesis is:

H0: there is no clustering of objects with respect to both X and Y

and the alternative:

H1: there is significant clustering,

the notion of clustering meaning that there are objects which are close to one another in terms of both X and Y.

The test statistic is:

Z = Σ_{i=1}^{n} Σ_{j=1}^{n} Xij · Yij

The distribution of the Z statistic under the null hypothesis is not established theoretically, as in classical tests, but by means of randomisation. If the null hypothesis were true, it should be of no importance which values of Xij are paired with which values of Yij; there should be no clustering in any pairing, and any permutation of labels (values) of Xij should give the same ‘no clustering’ result in the value of Z. Thus, a large number of random permutations of one matrix (say, Xij) is generated, for each of them the value znull of Z is obtained, and the set of these values defines the null distribution of the Z statistic. Now the value z obtained from the original data is computed and compared to the null distribution. For a positive z, the fraction of values for which the condition znull ≥ z ∨ znull ≤ −z holds is the estimated probability that the null hypothesis is true (the two-tailed test). If this fraction is smaller than the initially assumed level of significance α (usually α = 0.05 or α = 0.01), the null hypothesis is rejected in favour of the alternative one: there is clustering of the observed objects with respect to distances in X and Y.
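A minimal Python sketch of this randomisation scheme (the author’s illustration; the matrices X, Y and the number of permutations are assumed inputs, not values from Reeves’ study):

import numpy as np

def mantel_test(X, Y, n_perm=1000, seed=None):
    """Two-tailed Mantel randomisation test for square matrices X and Y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    z_obs = np.sum(X * Y)                    # Z statistic on the original data
    z_null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)               # random relabelling of the objects of X
        z_null[k] = np.sum(X[np.ix_(p, p)] * Y)
    # fraction of null values at least as extreme as the observed one
    p_value = np.mean((z_null >= abs(z_obs)) | (z_null <= -abs(z_obs)))
    return z_obs, p_value

If the returned p_value is below the assumed α, the hypothesis of no clustering would be rejected.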
Reeves (1999) employed this test in his analyses and reported having found numerous significant correlations between fitness and two distance measures (see table 5.1). Also Watson et al. (2002) reported having used the test to check the significance of correlation between fitness and davg, but specifics were not given.

This test, like all statistical tests, provides a means of checking whether the observed objects are really clustered with respect to fitness and distance, or whether the values of f and d happened to look like this merely by chance. As such, the test does not provide an assessment of the practical utility of the observed phenomenon.

5.4.6 Analysis of a set of pairs of solutions

The author of this thesis proposes yet another approach to the fitness-distance analysis. Since there is doubt concerning the independence of sampled fitnesses and distances in most of the other approaches, he proposes to draw a sample of pairs of solutions P = (P1, P2, . . . , Pn), where Pi = (si1, si2) contains two independently drawn solutions (local optima). For each element Pi of the sample three quantities are computed:

f1(Pi) = f(si1),  f2(Pi) = f(si2),  d(Pi) = d(si1, si2)

and, consequently, three correlation coefficients r(f1, d), r(f2, d) and r(f1, f2), with the first two being fitness-distance correlations. In order to get one fitness-distance relationship indicator from this kind of sample, one should compute the aggregated effect of the two:

r² = r²(f1, d) + r²(f2, d)

which is the coefficient of determination between distance and both fitnesses. The two squared correlations may simply be added, because the variables f1 and f2 are independent in the sampling model described above. This may be verified by checking the value of r(f1, f2), which must not differ significantly from zero. If a coefficient comparable to the FDC values of the other approaches were required, one may take the square root of the proposed determination coefficient.

This coefficient was chosen as an F-D relationship indicator because it has a sound interpretation in statistics (Ferguson & Takane 1989). It may be interpreted as the fraction of variance of one variable (here, d) that may be explained by variation in the other variables (here, f1 and f2) in a 3-dimensional linear regression model.

In this approach a scatter plot may also be examined, although it is a 3-dimensional one. A 2-dimensional plot may be obtained by cutting a slice of the original, e.g. along the plane f1 = f2 of pairs of solutions with the same fitness.

The proposed sampling model has several positive statistical features:

• the measured quantities are sampled independently in each pair Pi;
• the distributions of d(Pi) are exactly the same for all i; there is no difference in variances;
• there is no aggregation of distance in the sampling procedure, so there is no artificial increase in the values of correlation coefficients.

However, as in all approaches where global optima are unknown, the result of this one might be deceptive, i.e. there may be a visible trend that better solutions are more similar to each other than worse ones, but at the same time the similarity to the (unknown) global optima does not have to increase.
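A minimal Python sketch of this sampling model (the author’s illustration; sample_local_optimum, fitness and dist stand for problem-specific routines):

import numpy as np

def pairs_fdc(sample_local_optimum, fitness, dist, n=1000):
    """F-D indicator from a sample of n independently drawn pairs of local optima."""
    pairs = [(sample_local_optimum(), sample_local_optimum())
             for _ in range(n)]
    f1 = np.array([fitness(a) for a, b in pairs])
    f2 = np.array([fitness(b) for a, b in pairs])
    d = np.array([dist(a, b) for a, b in pairs])
    r_f1d = np.corrcoef(f1, d)[0, 1]
    r_f2d = np.corrcoef(f2, d)[0, 1]
    r_f1f2 = np.corrcoef(f1, f2)[0, 1]   # should not differ significantly from zero
    r2 = r_f1d ** 2 + r_f2d ** 2         # the proposed determination coefficient
    return r2, r_f1d, r_f2d, r_f1f2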
5.4.7 Comparison of all approaches

Table 5.2 gathers all the discussed approaches to the fitness-distance analysis and their indicated negative features. One can see in the table that the last two approaches, Mantel’s randomisation test and the sampling of pairs of solutions, have the fewest negative features of all the approaches which may be applied to problems where global optima are unknown. Therefore, one of the two should be employed in practice. But whenever possible, results obtained with these approaches should be confirmed with the result of Mantel’s test with known global optima.

Table 5.2: Comparison of features of all the discussed approaches to the fitness-distance analysis.

feature \ approach             | dopt | dbest | davg | dbetter | dˆbetter | Mantel’s test with glob. opt. | Mantel’s test without glob. opt. | sample of pairs
global optimum required        | yes  | no    | no   | no      | no       | yes                           | no                               | no
one arbitrary reference point  | no   | yes   | no   | no      | no       | no                            | no                               | no
decreased variance of d·       | no   | no    | yes  | yes     | yes      | no                            | no                               | no
unequal variance of d·         | no   | no    | no   | yes     | yes      | no                            | no                               | no
univariate selection           | no   | no    | no   | no      | yes      | no                            | no                               | no
probably dependent d· or f     | yes  | yes   | yes  | yes     | yes      | no                            | no                               | no
possibly deceptive result      | no   | yes   | yes  | yes     | yes      | no                            | yes                              | yes

5.5 Summary and conclusions

5.5.1 Fitness-distance analysis

Fitness-distance analysis explores a very interesting aspect of a search space of a combinatorial optimisation problem: the relationship between the quality of solutions and their distance (either mutual or to a global optimum). If such a positive relationship exists, meaning that better solutions tend to be closer to each other and to the global optimum (a high FDC for a minimisation problem), it justifies the introduction into a metaheuristic algorithm of some distance-preserving components which should improve the algorithm’s performance. The concept that some components of an algorithm may be improved based on a measure of distance between solutions which does not depend on fitness had not been explicitly considered before FDA was proposed.

However, the author of this thesis admits that FDA is not yet a properly developed method of analysis. There is no good mathematical model of what a ‘big valley’ actually is, and FDC is not a parameter of such a model; at the moment FDC is simply a descriptive statistic of an intuitively understandable, yet vaguely defined, phenomenon of a ‘big valley’. Secondly, the result of this analysis is in the form of a linear correlation coefficient and a fitness-distance scatter plot. The rules for interpretation of this result are arbitrary to some extent and differ between authors. Some say that even relatively small values of FDC and F-D plots with just a trace of a positive trend are valuable and may be profitable if exploited in an algorithm. Thirdly, as noted in section 5.2.2, the FDA result strongly depends on the analysed landscape: the fitness function and the measure of distance. It is also dependent on a problem instance, not on the problem itself. Therefore, the result may be ambiguous for one problem. Fourthly, there were multiple versions of the analysis procedure proposed in the literature and there is no standard for it yet. This is not a ready-to-use method of analysis. Fifthly, the practical usefulness of the analysis requires that it be applicable to problems with unknown global optima.
Yet, without them the analysis may happen to provide wrong guidance: better solutions may tend to be more similar to each other and to some arbitrary reference point, but at the same time more distant from the search goal.

What is also important, and was not discussed in this chapter, is that there are some arguments against FDC as a reliable indicator discerning between problems. For example, Reeves & Rowe (2003) show that there is a formula for modifying a landscape in a way which is undetectable to FDC. Although it is difficult to say how a landscape actually changes after such a modification (is it harder to optimise?), this argument casts doubt on FDC as a predictor of problem difficulty for evolutionary algorithms.

Yet, despite all these arguments against FDA it should be said that in the past this method of analysis provided undeniably positive results in relatively numerous cases. Thus, it is hard to deny the existence of the phenomenon of a ‘big valley’, even if the method of analysis has weaknesses. The author of this thesis expects the ‘big valley’ to be present in other, not yet analysed landscapes.

5.5.2 Exploitation of FDC in metaheuristic algorithms

There are doubts concerning the link between FDC and problem difficulty for some metaheuristics, e.g. genetic or memetic algorithms (Bierwirth et al. 2004, Hoos & Stutzle 2004). This link was only to some extent qualitatively confirmed by the works listed in table 5.1. There is some pioneering work on this subject by Watson et al. (2003), but it concerns the JSP and tabu search only. Therefore, it seems that there remains much to be done in order to clarify whether the link exists and is strong, although many seem to believe it does exist.

The author of this thesis believes there is a strong link between FDC and the efficiency of the memetic algorithm with certain algorithmic components. This belief is based mainly on the results of Boese et al. (1994), Reeves & Yamada (1998), Merz (2000, 2004), Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003), which convince the author that components based on ideas of distance-preservation are highly effective in the presence of a positive FDC (although some other types of components may also be able to exploit it). That is why the adaptation of the memetic algorithm to two chosen problems of combinatorial optimisation, which is described in the next chapters, is based on results of the fitness-distance analysis.

Chapter 6

The capacitated vehicle routing problem

6.1 Problem formulation

The capacitated vehicle routing problem (CVRP) was informally described in section 2.1.2. More formally, let G(V, E) be an undirected graph, where V = {v0, v1, . . . , vN}, N ≥ 1, is the set of vertices; v0 represents the depot, while the other vertices are customers. Let c be the function of cost of an edge in G:

c : E → R+ ∪ {0}

and d be the customers’ demand function:

d : V → R+

where the demand of the depot is set to zero: d(v0) = 0. It is also assumed that ∀ v ∈ V : d(v) ≤ C, where C > 0 is the capacity constraint of a vehicle. Under these assumptions the quadruple I = (G, c, d, C) is an instance of the CVRP.

A solution s of the CVRP is a set of T(s) routes: s = {t1, t2, . . . , tT(s)}. A route has the form ti = (v0, vi,1, vi,2, . . . , vi,n(ti)) for i = 1, . . . , T(s), where n(ti) is the number of customers in route ti, the following constraints being satisfied:

∀ i ∈ {1, . . . , T(s)} ∀ ki ∈ {1, . . . , n(ti)} : vi,ki ∈ V \ {v0}   (6.1)

∀ i, j ∈ {1, . . . , T(s)} ∀ ki ∈ {1, . . . , n(ti)}, kj ∈ {1, . . . , n(tj)} : (i ≠ j) ∨ (ki ≠ kj) ⇒ (vi,ki ≠ vj,kj)   (6.2)

∀ v ∈ V \ {v0} ∃ i ∈ {1, . . . , T(s)}, ki ∈ {1, . . . , n(ti)} : v = vi,ki   (6.3)

∀ ti ∈ s : Σ_{ki=1}^{n(ti)} d(vi,ki) ≤ C   (6.4)

Constraint 6.1 means that each vertex of each route (except for v0, the depot) represents some customer; constraints 6.2 and 6.3 require that each customer is visited exactly once in a solution; condition 6.4 ensures that the sum of demands of customers on each route (serviced by one vehicle) does not exceed the capacity constraint.

Let S denote the set of all feasible solutions of the form s. Let f denote the function of cost of a solution (the cost of all edges traversed by vehicles):

f : S → R+ ∪ {0}

f(s) = Σ_{ti ∈ s} ( c(v0, vi,1) + c(vi,n(ti), v0) + Σ_{ki=1}^{n(ti)−1} c(vi,ki, vi,ki+1) )   (6.5)

The goal of the CVRP is to find such sopt ∈ S that ∀ s ∈ S : f(sopt) ≤ f(s).
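To ground this formulation, the following small Python sketch (the author’s illustration; the cost matrix, demands and the customer list are assumed inputs) evaluates a solution according to equation 6.5 and checks constraints 6.2–6.4; the depot is vertex 0 and each route is a list of customer indices:

def route_cost(route, cost):
    """Cost of one route: depot -> customers in the given order -> depot (eq. 6.5)."""
    legs = [0] + route + [0]                 # vertex 0 is the depot
    return sum(cost[a][b] for a, b in zip(legs, legs[1:]))

def total_cost(solution, cost):
    """Total cost f(s) of a set of routes."""
    return sum(route_cost(r, cost) for r in solution)

def is_feasible(solution, demand, capacity, customers):
    """Each customer visited exactly once (6.2, 6.3), within capacity (6.4)."""
    visited = [v for route in solution for v in route]
    return (sorted(visited) == sorted(customers)
            and all(sum(demand[v] for v in route) <= capacity
                    for route in solution))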
The CVRP is NP-hard in the strong sense (Toth & Vigo 2002a), since it contains the TSP as a subproblem. Even for some relatively small instances of the CVRP with only 75 customers global optima remain unknown.

Informally speaking, the problem contains two subproblems (Laporte & Semet 2002, Altinel & Oncan 2005):

1. The grouping (clustering, bin-packing) subproblem: the goal is to partition the set of all customers into well-packed subsets which later form separate routes.
2. The routing subproblem: the goal is to solve the TSP in each group (cluster) of customers.

6.1.1 Versions and extensions

There are several versions of the basic problem (Toth & Vigo 2002a):

• a constraint on the maximum distance travelled by a vehicle may be set;
• some additional cost is specified for the use of a vehicle and another objective is to minimise the cost of vehicles in a solution;
• a fleet of diverse vehicles may be available;
• routes with single customers may be prohibited.

Moreover, there are also some extensions of the vehicle routing problem considered in the literature. These are, among others, the problems with: time window constraints, split deliveries, multiple depots, pickup and delivery (Toth & Vigo 2002a).

The basic CVRP was also considered from the multi-objective point of view (Jozefowiez et al. 2007), with one objective being the usual cost objective (as defined by equation 6.5). The other was the equilibrium objective, where the difference between the cost of the longest and the shortest route should be minimised.

None of these variants and extensions is considered in this thesis.

6.2 Instances used in this study

Instances used in this study (see table 6.1) are taken from Taillard’s website (Taillard 2008), but they have been used in many studies as benchmarks and come from several sources (according to Toth & Vigo (2002a)):

• c50, c75 and c100 are taken from a work by Christofides and Eilon,
• c100b, c120, c150 and c199 originated from Christofides, Mingozzi and Toth,
• f71 and f134 are the two larger of the 3 Fisher instances,
• all 13 tai* instances come from Taillard (Rochat & Taillard 1995).

All these instances represent the Euclidean CVRP (vertices are located in a plane). Their basic properties are listed in table 6.1.
Column ‘name’ lists the names of these instances as given by Taillard (2008), while column ‘name in the VRP’ gives the names used in Toth & Vigo (2002a), which have also been used in numerous CVRP studies (the names marked with † were created by the author of this thesis following the convention used in (Toth & Vigo 2002a), because this book did not consider those instances). The column ‘best-known cost’ gives the cost of the best-known solution for each instance; an asterisk (∗) marks costs that are known to belong to globally optimal solutions. The column ‘best-known status from’ references the works which were the author’s source of information about the best-known solutions.

Table 6.1: Basic information about the used instances of the CVRP.

name     | name in the VRP | size (N) | best-known cost | best-known status from
c50      | E051-05e        | 50       | ∗524.61         | (Prins 2004, Gendreau et al. 2002)
c75      | E076-10e        | 75       | 835.26          | (Prins 2004, Gendreau et al. 2002)
c100     | E101-08e        | 100      | 826.14          | (Prins 2004, Gendreau et al. 2002)
c100b    | E101-10c        | 100      | ∗819.56         | (Prins 2004, Gendreau et al. 2002)
c120     | E121-07c        | 120      | 1042.11         | (Prins 2004, Gendreau et al. 2002)
c150     | E151-12c        | 150      | 1028.42         | (Prins 2004, Gendreau et al. 2002)
c199     | E200-17c        | 199      | 1291.29         | (Ho & Gendreau 2006)
f71      | E072-04f        | 71       | ∗241.97         | (Toth & Vigo 2002b)
f134     | E135-07f        | 134      | ∗1162.96        | (Toth & Vigo 2002b)
tai75a   | †E076a10t       | 75       | 1618.36         | (Alba & Dorronsoro 2006)
tai75b   | †E076b10t       | 75       | 1344.62         | (Alba & Dorronsoro 2006)
tai75c   | †E076c09t       | 75       | 1291.01         | (Alba & Dorronsoro 2006)
tai75d   | †E076d09t       | 75       | 1365.42         | (Alba & Dorronsoro 2006)
tai100a  | †E101a11t       | 100      | 2041.34         | (Alba & Dorronsoro 2006)
tai100b  | †E101b11t       | 100      | 1939.90         | (Alba & Dorronsoro 2006)
tai100c  | †E101c11t       | 100      | 1406.20         | (Alba & Dorronsoro 2006)
tai100d  | †E101d11t       | 100      | 1581.25         | (Alba & Dorronsoro 2006)
tai150a  | †E151a15t       | 150      | 3055.23         | (Alba & Dorronsoro 2006)
tai150b  | †E151b14t       | 150      | 2656.47         | (Alba & Dorronsoro 2006)
tai150c  | †E151c15t       | 150      | 2341.84         | (Alba & Dorronsoro 2006)
tai150d  | †E151d14t       | 150      | 2645.39         | (Alba & Dorronsoro 2006)
tai385   | †E386-47t       | 385      | 24431.44        | (Alba & Dorronsoro 2006)

6.3 Heuristic algorithms for the CVRP

The capacitated vehicle routing problem was formulated in the 1950s, and since then a large variety of exact, heuristic and metaheuristic algorithms have been proposed to solve it. Therefore, this review will focus on algorithms that have some importance for this thesis: evolutionary algorithms, tabu search (because this type provided the best-known solutions in the past) and others that were implemented by the author. For a broader review the reader is referred to the survey by Aronson (1996) or the monograph by Toth & Vigo (2002b).

6.3.1 Savings algorithm by Clarke and Wright

Clarke & Wright’s (1964) heuristic was one of the first formulated for the problem. The algorithm starts with a ‘daisy-shaped’ solution, i.e. one where each customer is put in its own route. Then it uses a merge move to improve the solution; a merge removes the last edge of one route and the first edge of another route, and merges the two routes into one by creating a direct link between the customers whose edges were removed. The merge actually performed is the one with the largest saving (i.e. improvement in cost) among all feasible route merges. The algorithm stops when there is no feasible and improving merge. This heuristic is more formally described in algorithm 6.

Algorithm 6 The savings algorithm (the parallel savings version).
s = ∅ {an empty solution}
for all customers v do {build a separate route for each customer}
  t = (v0, v)
  s = s ∪ t
repeat {find and perform the merge of two routes with the largest saving}
  bestMergeCost = 0
  for all routes ti in solution s do
    for all routes tj in solution s, tj ≠ ti do
      s′ = mergeRoutes(ti, tj, s) {s′ is s with routes ti and tj merged}
      if isFeasible(s′) then {check if it is feasible to merge ti and tj}
        if f(s′) − f(s) < bestMergeCost then
          bestMerge = s′ {remember this better merge}
          bestMergeCost = f(s′) − f(s)
  if bestMergeCost < 0 then
    s = bestMerge {perform the best-found merge}
until bestMergeCost == 0
return s

This original version of the method is deterministic and provides only one solution to each CVRP instance. Laporte & Semet (2002) and Altinel & Oncan (2005) say this heuristic is one of the most popular in commercial routing packages, because it is fast, relatively easy to implement and easy to adapt to more complex versions of the VRP. Perhaps this is the reason why it has been modified and enhanced many times. The basic version itself may be implemented in two ways (sequential or parallel savings). Moreover, the saving of a merge may be parametrised (Laporte & Semet 2002, Altinel & Oncan 2005) in order to give preference to merging: customers near each other, customers approximately equidistant from the depot, or customers with large demands. Finally, some specialised data structures may be used to accelerate the algorithm (Altinel & Oncan 2005), which has O(N³) pessimistic time complexity.

6.3.2 Sweep algorithm by Gillet and Miller

The heuristic described by Gillet & Miller (1974) is of the category usually called ‘cluster first/route second’, because in the first stage it addresses the clustering subproblem, leaving the routing aspect to the second stage. This method may be used for planar instances of the CVRP only: it relies heavily on the values of coordinates of vertices in 2 dimensions. It starts with any straight line going through the depot vertex and ‘sweeps’ the customers into a cluster by rotating the line around the depot; if a cluster is full (i.e. the next swept customer would result in the capacity constraint being violated), then it is stored and a new one is created, and the rotation continues. Thus, this method is a greedy one, in a way. Finally, TSPs are solved independently in each cluster. The heuristic is described more formally in algorithm 7.

Algorithm 7 The sweep algorithm.

choose any straight line going through the depot
sort the list L of all customers with respect to the angle between the chosen line and the line from the depot to each customer
R = ∅ {an empty cluster of customers}
S = ∅ {an empty set of clusters}
while list L contains unassigned customers do
  v = firstUnassigned(L) {first in the list without a cluster assigned}
  R′ = R ∪ {v} {temporarily add the customer to the cluster}
  if Σ_{u ∈ R′} d(u) > C then {adding v permanently would exceed capacity}
    S = S ∪ R {add the feasible cluster to the set}
    R = {v} {start a new cluster}
  else
    R = R′ {add v to the current cluster}
S = S ∪ R {store the last, possibly not full, cluster}
s = ∅ {an empty solution}
for all R ∈ S do {construct a solution based on the set of clusters}
  t = solveTSP(R) {solve the TSP in the set of customers R}
  s = s ∪ t
return s

Gillet & Miller’s (1974) method concentrates on the clustering subproblem of the CVRP, leaving lots of freedom of implementation concerning the routing subproblem. According to Laporte & Semet (2002), the TSPs may be solved by virtually any method (either exact or approximate). Therefore, multiple versions of the algorithm may exist.
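As an illustration of the clustering stage only, consider this short Python sketch written by the author (coordinates xy, demands and capacity C are assumed inputs; the positive x-axis serves as the fixed reference line, and the routing stage is left out):

import math

def sweep_clusters(xy, demand, C, depot=0):
    """Group customers into capacity-feasible clusters by polar angle."""
    dx, dy = xy[depot]
    customers = [v for v in range(len(xy)) if v != depot]
    # sort customers by the angle of the line from the depot to each of them
    customers.sort(key=lambda v: math.atan2(xy[v][1] - dy, xy[v][0] - dx))
    clusters, current, load = [], [], 0
    for v in customers:
        if current and load + demand[v] > C:   # next customer would exceed capacity
            clusters.append(current)           # store the full cluster
            current, load = [], 0
        current.append(v)
        load += demand[v]
    if current:
        clusters.append(current)               # store the last cluster
    return clusters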
This heuristic may potentially be used to generate multiple different solutions, because it may be initialised with any straight line going through the depot, perhaps resulting in different clusters. The time complexity of this algorithm is undefined until the TSP algorithm is given. The pessimistic time complexity of the clustering stage is O(N log N), due to the call to a sorting procedure.

6.3.3 First-Fit Decreasing algorithm for bin packing

Another algorithm focused on the clustering subproblem is the First-Fit Decreasing heuristic (Falkenauer 1998). It is actually an algorithm designed for the bin packing problem, which is a clustering problem only and ignores any routing requirements. Thus, it does not exactly produce routes, but clusters of customers; the TSPs in each cluster have to be solved afterwards. The routes may be of extremely poor quality, because very distant customers may be put together in a cluster. However, the algorithm may also generate fewer routes than other heuristics, because it is not bound by distances and is completely focused on clustering.

The idea of the algorithm is very simple. First, sort the customers in decreasing order of their demands (volumes). Then, consider the customers in that order and put each customer in the first cluster (bin) with sufficient remaining space. If there is no such bin, a new one is created. The details of the procedure are shown in algorithm 8.

Algorithm 8 The First-Fit Decreasing algorithm.

sort the customers by decreasing demands d(v); the index i of vi reflects the order
numBins = 1 {start with a single bin}
binDemand(0) = 0 {it is empty}
for i = 1 to N do
  binFound = false
  for bin = 0 to numBins − 1 do
    if binDemand(bin) + d(vi) ≤ C then
      binDemand(bin) = binDemand(bin) + d(vi)
      customerBin(vi) = bin
      binFound = true
      break
  if not binFound then {open a new bin for vi}
    binDemand(numBins) = d(vi)
    customerBin(vi) = numBins
    numBins = numBins + 1
return customerBin

The time complexity of this clustering algorithm is O(N²) in the worst case.

6.4 Metaheuristic algorithms for the CVRP

6.4.1 Iterated tabu search by Taillard

Taillard (1993) developed an iterated tabu search algorithm for both the CVRP and the distance-constrained CVRP. For the Christofides instances (c*) this algorithm produced solutions that stayed best-known for at least several years (see Prins (2004)), so it is worth seeing how it works.

The algorithm was divided into two levels. The higher-level algorithm generates decompositions of a CVRP instance into several smaller ones, usually involving 4 to 8 routes. The lower-level algorithm solves each part independently of the others by means of tabu search, and returns the best-found solution to the higher level.

Lower level tabu search

This level starts with a ‘daisy-shaped’ solution (each customer in its own route; see also section 6.3.1). It is modified by two neighbourhood operators, with solution feasibility always maintained:

• swap: swaps two customers between two different routes;
• move: moves one customer from its route to some other one.

Both of these operators were implemented in a way that first removed a customer from its route and then inserted it in the best possible place in the second route. Taillard devoted some effort to implementing the operators efficiently.
He noticed that a swap or a move influenced only some of the moves that would be considered in the next iteration, so the unaffected evaluations were stored between iterations; this design accelerated the whole process.

As is usual in tabu search, a tabu list is maintained in the algorithm. It stores the indexes of moved customers and their original routes. A move is tabu if it attempts to insert a customer back into the route it was placed in before, unless the move improves on the best solution found so far. The length of the list is proportional to the number of customers in the solved instance.

Another technique employed by Taillard was to forbid in tabu search the moves that were performed too frequently. He noticed that without such a mechanism some customers (those near the depot) were moved very often, which resulted in a less diversified search (although diversity was not measured).

Finally, once every 200 iterations or when a solution is reached with cost less than 0.2% above the best-found, the tabu search launches an exact algorithm to independently solve the TSPs defined by each route. Since the instances considered by Taillard (1993) had routes with fewer than 30 customers, the exact algorithm completed in a short time, producing optimal routes at a small computational cost.

Higher level decompositions

This level was designed in two different ways for two types of instances: uniform and non-uniform Euclidean (see section 6.2). The decision on which of the two methods should be used for an instance was made manually.

Uniform problems: polar and concentric sectors

Taillard (1993) noticed that all but one (c120) of the Christofides instances were uniform and that this type of instance had an interesting locality property: ‘it is useless to try to move a city [(customer)] to every other tour [(route)]; it is sufficient to try to move this city to some of the nearest tours’. Therefore, he recommended partitioning such instances into polar and, for larger instances, concentric sectors, each containing a small number of vehicles (routes). Once the partition is performed, each sector is solved independently and the final solutions from the sectors are merged into one.

This decomposition was not performed in an entirely automatic way. Taillard manually set the total number of vehicles (routes) to be distributed among subproblems.

When the low-level tabu search is finished, the decomposition algorithm is launched again, but now there are more types of objects to be considered: complete routes, unsatisfied customers and unused vehicles. Taillard decided not to partition the routes, but to consider them as supercustomers located at the route’s centre of gravity. With this assumption another decomposition is generated; unused vehicles are randomly assigned to subproblems.

Non-uniform problems: branches of a spanning tree

The set of instances considered by Taillard (1993) also included 3 non-uniform ones: c100b, c120 and tai385. Taillard expected that the decomposition into polar and concentric sectors would not work well for non-uniform instances, so he resolved to partition them in another way, based on branches of a spanning tree.
This approach was backed up by the intuition ‘that cities that are near to each other will probably belong to the same branch of the arborescence [(spanning tree)] and thus will probably belong to the same subproblem.’

Taillard probably thought that an ordinary minimum-cost spanning tree was not exactly what he wanted, because of the special role of the depot vertex in the CVRP: all routes start and eventually finish at the depot. He decided to build a minimum-cost spanning tree, but for a somewhat modified matrix of distances between vertices. Generally speaking, the modification added a distance-proportional penalty to edges that had a larger angle with the edge directed to the depot (a larger cosine of the angle). The effect was that the generated spanning tree preferred edges in the direction of the depot to the ones perpendicular to this direction. This tree was further decomposed into branches (subproblems) by a greedy algorithm. In each step this procedure cut a branch of the tree that was largest in terms of vertices, had the largest sum of demands and was furthest from the depot in terms of the number of edges.

Contrary to the procedure for uniform problems, this decomposition procedure was not iterated. Instead, each series of tabu search was launched starting from the best solution found in the previous series.

Computational results

Taillard (1993) did not provide an exact stopping condition for his algorithm. However, concerning only the quality of results for the 7 Christofides CVRPs, this algorithm was able to generate best-known solutions (new at the time) for 3 instances (c75, c150, c199) and also to find the best-known for the other 4. The algorithm improved some best-known solutions of the distance-constrained CVRPs.

Concerning the time of computation, Taillard’s algorithm generally performed better for uniform problems than an earlier state-of-the-art tabu search by Gendreau, Hertz and Laporte (see Taillard (1993) for a reference). It generated solutions of desired average quality (5%, 2%, 1% above the best-known) in significantly shorter time.

Taillard’s (1993) spanning tree-based algorithm performed better than polar partitioning for two non-uniform instances, c120 and tai385, although c120 was solved to the best-known value by the latter.

Overall, this algorithm provided results that were undeniably excellent at the time of publication (1993) and also for several years that followed.

Comments

Taillard’s motivation to use tabu search was rather pragmatic: the best previous approach was also a tabu search algorithm and he wanted to generate solutions of very high quality. The algorithm presented above is not a pure TS; it is an early hybrid tabu search, because it also involves an exact algorithm to solve TSP subproblems (this hybridisation was also motivated by the goal of generating solutions of very high quality).

In the opinion of the author of this thesis, Taillard’s algorithm may be generally classified as a ‘cluster first/route second’ approach, although at the higher level a decomposition into subproblems with several clusters is performed, not into single clusters. Nevertheless, from the beginning the algorithm focuses on the decompositions, which are modified during search only to some small extent, or not modified at all for non-uniform problems. One of the important elements of the algorithm appeared to be the initial partition of a uniform instance into subproblems.
Taillard noticed that he had not found any rules for preferring a particular partition over another and that this initial decomposition had great influence on the final solution.

What one may also notice while reading Taillard’s (1993) paper is that several times he motivated his heuristic approach through comparison with the form of the best-known solutions. The geometric form of these solutions was exactly the motivation for the decomposition into polar and concentric sectors. It was also the basis for the modification of the initial decomposition in further steps of the algorithm. It seems, therefore, that Taillard wanted his heuristics to produce solutions similar to the best-known ones (most likely in terms of a decomposition) so that tabu search could easily find them, working in their proximity. One might even think that this algorithm tried somehow to re-engineer the best-known solutions of uniform instances, which resulted in a success.

It also seems that Taillard had very good intuition concerning his polar decomposition algorithm. The locality property he mentioned seems to be true for these uniform instances, although it would have to be measured to be confirmed. Nevertheless, it seems that such a partition method saves his algorithm a lot of effort.

Taillard’s focus on the decompositions, his remarks that the initial partition frequently determined the quality of the final result, and the excellent quality of results of his algorithm seem to indicate that the clustering subproblem in the CVRP is more important than the routing one.

6.4.2 Iterated tabu search by Rochat and Taillard

Rochat & Taillard (1995) decided to improve the previous algorithm of Taillard (1993), because the latter had been designed to handle mainly uniform instances of the CVRP, while real-world problems are usually non-uniform. They also did it because Taillard’s algorithm could easily be trapped in poor local optima. Their improved iterated local search could also handle the VRP with time windows.

Motivation for components of the algorithm

These authors gave a very interesting motivation for their design, especially for the intensification procedure they implemented, which was based on the notions of strongly determined and consistent variables. ‘A strongly determined variable is one whose values cannot be changed except by inducing a disruptive effect on the objective function value or on the value of other variables’, while ‘a consistent variable is one that is frequently strongly determined at a particular value (or in a narrow range). Specifically, the more often a variable receives a particular value in the best solutions (. . . ), the more it qualifies as consistent’.

Thus, the identification of consistent variables required defining what the variables in the CVRP actually are and then measuring their frequency in the best solutions. Rochat & Taillard (1995) decided to use as a variable the presence in a solution of a route of a certain form. They based this definition on the hope that solutions generated by local search (initial solutions in particular) already contain ‘all the information necessary to create solutions of very high quality (. . . ), but in a non-apparent way’, because ‘this information is included in the tours [i.e. routes]. So one hopes that the initialization creates a set of tours that included members not very different from the tours of a good solution¹’.

¹ Emphasised by the author of this thesis.
To somehow support their intuition they showed in a figure a set of routes from some initial local optima and compared it to the best-known (at the time) solution of the same instance (c199), concluding ‘that several tours are similar, two of them being identical’. Given this definition of a variable, they focused their algorithm on measuring and weighting the frequency of routes in good solutions, and then on using the gathered weighted frequencies in generating initial solutions for tabu search.

Low level tabu search

The tabu search of Taillard (1993) was used in this algorithm, although slightly changed: the decomposition procedure had been improved and calls to the exact TSP solver replaced by calls to some heuristic. Details of the changes were not provided; Rochat & Taillard (1995) concluded only that they improved the behaviour of the method on non-uniform problem instances.

Initialisation of the multiset of routes

In order to discover the consistent variables, one large, weighted multiset of routes is maintained in the algorithm. The multiset is initialised with the routes of 20 different local optima, resulting from different problem decompositions followed by tabu search, except for routes with one customer only (‘since they do not contain interesting information’ (Rochat & Taillard 1995)). Each route is labelled (weighted) with the cost of the solution it belonged to.

Construction of a solution based on the multiset of routes; iteration

The multiset of routes is then used in the construction of an initial solution for the subsequent tabu search. This construction creates a solution by sequentially adding routes from the multiset. This process is randomised and gives probabilistic preference to routes that originated from better solutions, but avoids routes with customers chosen in previous steps. This way a solution is created, and the process stops when no route can be added. If this results in a partial solution, it is somehow completed in a feasible way.

Then the original decomposition procedure of Taillard (1993) follows. In this algorithm, the procedure also accepts routes of the constructed solution as input. Next, the low-level tabu search is launched.

The solution generated in this way is then used to enrich the multiset of routes, in the same way as during the initialisation phase. However, no more than some specified number of routes (e.g. 260) is stored. If necessary, the worst routes are abandoned. Finally, the loop closes when the next construction phase is started.

Post-optimisation

Rochat & Taillard (1995) noticed that the construction phase was sometimes able to improve the best solution found so far even without the help of tabu search. This meant that some subsets of routes from the multiset defined solutions better than all those encountered earlier. Therefore, they decided to introduce a post-optimisation phase in their algorithm that would attempt to construct a good solution based on the multiset only. This phase actually consists in solving the set-partitioning problem induced by the routes in the multiset. This problem is solved in an exact way by a commercial mixed integer programming solver.

Computational results

The iterated tabu search was first tested on the uniform Christofides instances and the authors concluded that it was generally slower than the method of Taillard (1993), which had been designed especially for such instances.
Conversely, the results on the non-uniform instances c120, f71, f134 and tai385 were better than those of Taillard (1993); the new algorithm was faster in finding very good solutions of a given quality above the best-known.

Rochat & Taillard (1995) also performed experiments on 12 non-uniform instances they generated (tai75*, tai100*, tai150*) and compared their algorithm again to that of Taillard (1993), and to the tabu search of Gendreau, Hertz and Laporte that was mentioned earlier (see the works of Taillard for a reference). Again, their algorithm outperformed the competitors in quickly finding extremely good solutions.

Some additional experiments were performed with their post-optimisation technique. After 5 runs of the main algorithm with 50 (70) steps for instances with 100 (150) customers, 250 routes of the best generated solutions were added to the multiset. These routes defined the set-partitioning problem, which was solved in the exact way. This experiment allowed Rochat & Taillard to find 3 new best-known solutions for instances f134, c199 and tai385, and also 2 new best-known solutions for the new instances tai100a and tai100c. The best-known values were obtained for the other two tai100* problems as well, but without further improvement. No improvement of the best-known was obtained for the tai150* instances. No computation time was mentioned in this case.

Overall, it seems that this algorithm produced excellent results for non-uniform instances, being somewhat worse than the previous method of Taillard (1993) on the uniform ones.

Comments

The method of Rochat & Taillard (1995) is yet another example of a hybrid algorithm. Not only does it contain an exact procedure in the post-optimisation step, but the construction of a solution based on the multiset of routes also resembles an adapted multi-parent recombination operator (a fact the authors noticed themselves). Thus, a person who classified this method as an adapted memetic steady-state algorithm would not be entirely wrong.

What struck the author of this thesis in the motivation of this algorithm was that Rochat & Taillard based their design on the notion of a consistent variable. This notion seems to be very similar to the notion of a solution property that is important for the objective function, which is used while hypothesising about fitness-distance correlation (this was discussed in chapter 5). Although expressed in a different language, the motivation of Rochat & Taillard seems to be based exactly on the hypothesis that good solutions of the CVRP have many properties in common. They intuitively chose this property (a consistent variable) to be the presence in a solution of a route of a certain form. However, except for the visual similarity of routes of local optima to routes of one best-known solution, they did not provide any other objective, measurable argument that their hypothesis about routes being consistent variables was true.

One might say, of course, that the very good computational results of their algorithm confirm their hypothesis and the whole approach, but in a complicated algorithm such as this it is extremely difficult to tell which component was actually responsible for the excellent final solutions. The effects of the post-optimisation technique shed some light on this issue.
This technique was able to find or even improve the best-known solutions for the 4 tai100* instances, and this fact proves that the best-known or even better solutions may be assembled out of routes of other good ones. However, the tai150* instances are examples of exactly the opposite case: none of the best-known solutions could be assembled from pieces in the multiset of 250 routes, proving that the best-known solutions to these instances are not as similar to other good solutions as Rochat & Taillard would have liked them to be. Therefore, it seems that the intuition behind this algorithm was in several cases rather right, while in several others rather wrong.

6.4.3 Route-based crossover by Potvin and Bengio

The hybrid genetic algorithm by Potvin & Bengio (1996) was one of the first for vehicle routing problems, specifically for the CVRP with time windows. In particular, these authors were the first to propose dedicated recombination operators: sequence-based crossover (SBX) and route-based crossover (RBX). SBX seems to be specific to the time-windows case, so it is not considered here. On the other hand, RBX was employed in an EA for the bi-objective CVRP (Jozefowiez et al. 2007), so it will be presented here in the version for this CVRP.

The general RBX idea is to combine complete routes of two parent solutions. In the first stage, the operator chooses a random number of routes to be copied from the first parent. Then the actual routes, also chosen randomly, are copied to the offspring. In the second stage all routes of the second parent are copied to the offspring, but the offspring’s feasibility is maintained all the time: the customers already in the offspring are omitted in the copy process.

With such a design, Potvin & Bengio wanted firstly to guarantee the feasibility of the generated offspring. Then, their goal was to generate better solutions through recombination of good characteristics of parent solutions. And although they did not explain what a good characteristic actually was, their idea seems to be clear: good routes make a good solution. Thus, RBX seems to be similar in spirit to the construction algorithm based on a multiset of routes, developed by Rochat & Taillard (1995). However, in RBX there are no weights assigned to routes and there are only two parents involved.

6.4.4 Memetic algorithm by Prins

Prins (2001, 2004) argued that there had been no evolutionary algorithm for the CVRP that could have competed with the best tabu search approaches available at the time (a point of view also expressed by Laporte & Semet (2002)). Therefore, he resolved to develop such an algorithm. The one he constructed has several interesting features (some of them will be used in this thesis) and indeed proved to provide very good results at the time, so it will be described here. A formal description will not be given, since the algorithm generally fits the scheme of an MA given in section 2.4.5. This MA is also able to solve distance-constrained CVRPs (see section 6.1.1).

Sequential representation

Prins chose a permutation representation for his MA: a CVRP solution was represented as a sequence of customer indexes, without any route delimiters. His choice was inspired by a work by Beasley (see Prins (2004) for a reference), who had solved the problem with a ‘route first/cluster second’ approach. In this approach the routing subproblem of the CVRP is solved first, while the clustering one is left to a second stage.
Prins exploited the fact that there is a procedure which, for a given sequence of customers, produces the optimum (for this sequence) set of routes in polynomial time.

Prins (2004) gives the following example of the way the procedure works. Let us assume that an instance of the CVRP with 5 customers is given. The graph with some edges is shown in figure 6.1. Vertices are denoted by v0, v1, . . . , v5, with v0 being the depot. Edges are labelled with costs, customers with demands (in parentheses). Now let us assume that a sequence of customers is given, s0 = (v1, v2, v3, v4, v5), which represents a CVRP solution without route delimiters.

The procedure (called Split) requires that an auxiliary directed graph be built, with all vertices but one representing customers; one additional vertex is added, representing an auxiliary source (i.e. a vertex immediately before v1 in sequence s0). An arc (u, v) is added to the graph if there is a feasible route in the original graph that starts with the customer immediately after u in the sequence s0 and finishes with v (including all the customers in-between, in the order given by s0). Such an arc is labelled with the cost of the corresponding route. The auxiliary graph for the considered example is shown in figure 6.2.

[Figure 6.1: Graph of a CVRP instance - Prins’ example.]

[Figure 6.2: Auxiliary graph for the given sequence s0 - Prins’ example.]

When the graph is built, the Split procedure finds the shortest path from the auxiliary source vertex to the last customer vertex (here, v5). This path determines the optimum split of the given sequence into feasible routes. In the example the shortest path is indicated in figure 6.2 with thick lines and defines a solution with 3 routes, s = {(v0, v1, v2), (v0, v3), (v0, v4, v5)}. The pessimistic time complexity of the Split procedure is O(N²), due to the auxiliary graph construction phase.

Sequential representation - comments

The choice of this representation undoubtedly has positive sides: there should be a large reduction in the size of the search space for an evolutionary algorithm, because there are fewer sequences of customers than sets of routes for the same instance (although the actual reduction would have to be enumerated to be sure about its extent). Moreover, as Prins (2004) notes, the optimum solution may always be represented in this form, even by more than one permutation (depending on the number of routes). However, the obtained, reduced problem is still NP-hard and the space to be searched is of exponential size.

On the negative side, there is the effect that not every CVRP solution may be represented as a sequence. Of course, its routes may be concatenated and stored as a sequence of customers, but after decoding the final solution may be different from the initial one. And although this final solution would be of at least equal quality, there would be an uncontrolled change to the underlying solution, which could be described as a mutation implicit in the Split procedure.

Moreover, in this sequential representation there is a problem with the application of neighbourhood operators usually used for CVRP solutions, because there are no explicitly encoded routes. Again, a CVRP operator may be syntactically applied to a sequence, but the result may be different from the desired effect: the Split procedure may somehow alter the contents of some routes.
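A minimal Python sketch of the Split idea, written by the author as an illustration (the interface is hypothetical and Prins’ implementation differs in details): it builds the arc costs of the auxiliary graph implicitly and finds the shortest path by dynamic programming over the sequence.

def split(seq, cost, demand, C):
    """Optimally split a customer sequence into capacity-feasible routes.

    seq: customer indices in permutation order (the depot is vertex 0);
    cost: cost matrix; demand: demand per vertex; C: vehicle capacity.
    Returns the total cost and the list of routes.
    """
    n = len(seq)
    INF = float("inf")
    best = [0.0] + [INF] * n     # best[i]: cost of serving the first i customers
    pred = [0] * (n + 1)
    for i in range(n):           # arc (i, j) is one route serving seq[i..j-1]
        load, route_cost = 0, 0.0
        for j in range(i + 1, n + 1):
            load += demand[seq[j - 1]]
            if load > C:         # the route is no longer feasible
                break
            if j == i + 1:       # route with a single customer
                route_cost = cost[0][seq[i]] + cost[seq[i]][0]
            else:                # extend the route by one more customer
                route_cost += (cost[seq[j - 2]][seq[j - 1]]
                               + cost[seq[j - 1]][0] - cost[seq[j - 2]][0])
            if best[i] + route_cost < best[j]:
                best[j] = best[i] + route_cost
                pred[j] = i
    routes, j = [], n            # recover the shortest path (route boundaries)
    while j > 0:
        routes.append(seq[pred[j]:j])
        j = pred[j]
    return best[n], routes[::-1]

Applied to a sequence such as s0 above, the recovered route boundaries correspond to the shortest path in the auxiliary graph.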
To summarise, this representation may be categorised as an indirect one, where the Split procedure is used as a special decoder. Such representations were used in the past e.g. for scheduling problems (Michalewicz 1996, Pawlak 1999) or bin packing (Michalewicz & Fogel 2000).

Local search as mutation operator

The title of this paragraph is taken from Prins (2004). He argued that in order for an EA to be competitive with tabu search or simulated annealing it was important to hybridise the EA with some improvement procedure (a point of view which is in agreement with the remarks given in section 4.6). Hence, he chose to use local search instead of ordinary mutation.

Due to the difficulty of applying neighbourhood operators to sequences, which was mentioned above, Prins decided to decode each solution from the sequential representation before local search. As operators he chose only the ones with O(N²) neighbourhood size, in order to obtain reasonable running times. For each pair of customers he examined 9 different types of neighbours, which could be classified into 3 super-types:

• move: move one customer to a place after the second one; move a pair of successive customers after the second one; move a pair, but reverse its order in the insertion place;
• swap: swap two customers; swap a pair of successive customers with the second customer; swap two pairs of successive customers;
• edge-exchange: exchange two edges in the same route; exchange two edges of different routes in two possible ways.

Local search checks all these neighbourhoods together, but chooses the first encountered improvement (the greedy approach). This local search is applied to a solution (an offspring of crossover) only with some small probability, usually 0.2. When it is launched, it always stops at a local optimum (no truncation).

Order crossover

Because the sequential representation was used to encode solutions, Prins was able to employ in his MA some standard recombination operators usually applied to permutations. After some initial testing he decided to use order crossover (OX) (see section 4.4.3; a minimal sketch is also given below). This crossover is able to preserve in an offspring a chosen subsequence of customers from one parent and the relative order of some customers from the second parent. It may also happen that some subsequences from the second parent are preserved. This operator may be implemented with O(N) time complexity.

Prins (2004) commented on the operator: ‘the solutions that can be reached by crossover from a given population define a kind of wide neighbourhood that brings enough diversification [compared to local search]’. Thus, by using OX he aimed at preserving the diversity of solutions in the population.

Initial population: heuristic and random solutions

In the initial population Prins put one solution of the savings heuristic, one of the sweep heuristic (both were described in sections 6.3.1 and 6.3.2) and one solution of the algorithm proposed by Mole and Jameson (see Laporte & Semet (2002) and Prins (2004) for a reference). The remaining part of the population was filled with random permutations. All these solutions were stored in the sequential representation, which means they were optimised by the Split procedure. However, some of them may have been removed from the population if they represented clones of some other solution.
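Returning to the order crossover mentioned above, here is a minimal Python sketch of a common OX variant (the author’s illustration, not Prins’ code; the classical operator fills positions wrapping around the cut points, while this version fills the remaining positions from left to right):

import random

def order_crossover(p1, p2, rng=random):
    """OX: keep a slice of parent 1, fill the rest in the order of parent 2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]             # chosen subsequence of parent 1
    kept = set(child[i:j + 1])
    rest = [v for v in p2 if v not in kept]  # relative order taken from parent 2
    for k in range(n):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child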
Diversity maintenance; restarts

Other techniques employed by Prins were: a diversity maintenance policy which forbids clones (equivalent solutions) in the population, and restarts of the whole algorithm.

Computational results

In his experiments, Prins (2004) employed the well-known benchmark set of Christofides and the one of Golden et al. He also included instances of the distance-constrained CVRP. Prins' results were extremely good. The best results for the instances of Christofides placed his MA 2nd in a ranking of many metaheuristics, including the state-of-the-art tabu search. The results of a standard MA setting put his algorithm in the 4th position in the same ranking. Concerning the test set of Golden et al., Prins' algorithm discovered 13 out of 20 solutions best-known at the time, on average beating all other algorithms proposed before.

Interesting remarks

In his paper, Prins (2004) also included several interesting remarks concerning the algorithm. Firstly, he examined the behaviour and results of his MA without each of the techniques he implemented. The most important techniques appeared to be the local search phase and the policy of forbidden clones: when either was removed from the algorithm, the results deteriorated drastically. Secondly, he noticed that with a large population and a strong diversity preservation policy 'an excessive rejection rate (that is most GA iterations are unproductive) appears' and also that 'this phenomenon worsens with a high mutation [local search] rate $p_m$, because the local search compresses the population in a small cost interval'. In other words, the policy of forbidden clones had a strong effect on the admittance of generated local optima into the population. Finally, Prins concluded that the local search phase was the most time-consuming part of the MA, typically absorbing 95% of CPU time.

6.4.5 Cellular genetic algorithm by Alba and Dorronsoro

This algorithm, described in two papers by Alba & Dorronsoro (2004, 2006), solves the CVRP and its distance-constrained version.

Population structure: toroidal grid

The population of this algorithm is cellular, i.e. organised in a 2D toroidal grid of size 10×10, with each node holding only one solution. Hence, each solution has 4 different neighbours in the grid. The authors included in this neighbourhood also the solution itself. They motivated this choice of structure by saying that 'these overlapped small neighbourhoods help in exploring the search space because the induced slow diffusion of solutions through the population provides a kind of exploration, while exploitation takes place inside each neighbourhood by genetic operators'. In this population, the selection of one recombination parent is made only in each solution's neighbourhood; the other parent is the solution itself.

Representation: permutation with route delimiters

A solution to the CVRP is represented by a sequence of customer indexes (from $0$ to $N-1$) and route delimiters (from $N$ upwards). This representation allows empty routes, encoded by putting two delimiters beside each other.

Fitness function with infeasibility penalty

Alba & Dorronsoro (2004) modify the objective function of the CVRP by adding a penalty to infeasible solutions, which are apparently potentially accepted to the population. For the CVRP the fitness becomes:

$$f'(s) = f(s) + \lambda' \cdot \mathrm{overcapacity}(s)$$

where $\lambda' > 0$ is a parameter and $\mathrm{overcapacity}(s)$ is the sum, over all routes whose demand exceeds the vehicle capacity, of the route demand minus the capacity.
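A minimal sketch of this penalised evaluation, assuming decoded solutions given explicitly as lists of routes (each route a list of customers, depot omitted); the names are illustrative:

def penalised_fitness(routes, dist, demand, capacity, lam=1000.0):
    """f'(s) = f(s) + lambda' * overcapacity(s), cf. Alba & Dorronsoro (2004).

    'lam' plays the role of lambda'; 1000 is the value they report using.
    """
    cost, overcap = 0.0, 0.0
    for route in routes:
        tour = [0] + route + [0]              # close the route at the depot
        cost += sum(dist[u][v] for u, v in zip(tour, tour[1:]))
        load = sum(demand[v] for v in route)
        overcap += max(0, load - capacity)    # demand exceeding the capacity
    return cost + lam * overcap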
Recombination operator: ERX

This algorithm uses edge recombination, which 'builds an offspring by preserving edges from both parents' (see section 4.4.3 on the recombination operators for the TSP).

Mutation: combined neighbourhood operators

The mutation operator in this algorithm is a combination of three operators that are also sometimes used as neighbourhood operators (see section 6.4.4): insertion (moves one customer to another location in a sequence), swap (swaps two customers in a sequence) and inversion (reverses a subsequence between two locations in a sequence). The authors note that all of these modifications may be applied in an intra-route and inter-route manner.

Local search: edge-exchange and λ-interchange

Alba & Dorronsoro (2004) employ local search in their GA mainly because 'it is very clear after all the literature on VRP that local search is almost mandatory to achieve results of high quality'. They use two neighbourhood operators: edge-exchange (2-opt), which works inside a route only, and λ-interchange. The second operator exchanges a subset of customers of one route with a subset from some other route, with the sizes of the subsets limited by λ. In this algorithm, only λ = 1 or λ = 2 is used. The authors employ the steepest (best-improvement) version of local search after each recombination/mutation. In case λ-interchange is used, the LS is limited (truncated) to at most 20 iterations (the rationale is not given).

Computational results

In their first paper, Alba & Dorronsoro (2004) provided some results on 4 Christofides instances of smaller sizes: c50, c75, c100 and c100b, and reported on the best solutions found; no averages were provided. They compared 4 versions of their cGA: without local search, with edge-exchange local search, with edge-exchange and 1-interchange, and with edge-exchange and 2-interchange. The best version of their algorithm appeared to be the last but one, with edge-exchange and 1-interchange. It generated best-known solutions for all the considered instances, but it was not said whether this happened once, several times or in all runs.

The authors also compared their best version to other heuristic and metaheuristic algorithms, including those of Rochat & Taillard (1995) and Prins (2004), and again only with respect to the best-found solutions. This comparison revealed that their best cGA was of equal performance to the two mentioned algorithms.

The second paper by Alba & Dorronsoro (2006) was entirely devoted to experiments with the best version of the cGA on some other sets of instances: those of Taillard, Van Breedam and Golden et al. (see section 6.2). With respect to the best-found solutions, the cGA was able to improve the best-known solutions for 8 instances of Van Breedam and 1 instance of Taillard (tai75b). It also found 3 best-known solutions for the other Taillard instances of size 75, another instance of Van Breedam, and one of Golden et al. (with the additional distance constraint). However, the overall results of their algorithm were not that impressive. It was able to get as close as 0% of excess above the best-known solutions for the smallest instances, but 2.4% for the largest one of Taillard (tai385); these statistics reflect the best solutions found, not averages.

Comments

Alba & Dorronsoro (2004, 2006) potentially accepted to the population of their cGA some infeasible solutions.
On closer examination, however, it appears that almost any feasible solution was preferred in their algorithm to even the best-cost infeasible one, because they used $\lambda' = 1000$ in all experiments. Knowing that for the instances of Christofides, Taillard, and also those of Golden et al. (see Alba & Dorronsoro), the best-known solutions cost at most several thousand units, and that a random local optimum usually costs less than 10% more, it seems that any random local optimum was able to replace any infeasible solution. Hence, this cGA only potentially accepts infeasible solutions; in practice, further iterations of the algorithm reject infeasible solutions with high probability, even if any are present in the initial population.

The best version of this algorithm, with 1-interchange, seems to use almost the same operations for local search and for mutation, thus making mutation almost completely ineffective. When one recalls the definition of λ-interchange, it transpires that 1-interchange is exactly equivalent to insertion and swap, which are used in mutation. Only the inversion operator differs somewhat from local search: inversion, when applied to one route only, becomes edge-exchange in one route; when applied to more than one route it is equivalent to an exchange of 2 edges between routes, and this operation is not employed in local search. Therefore, it appears that only inversion, and only when applied to more than one route, is not immediately reversed by the subsequent local search. This implies that mutation may be of no importance for the whole algorithm, since the best results were obtained when mutation was ineffective.

What is also interesting in the papers by Alba & Dorronsoro (2004, 2006) is that they introduced, for the first time in the history of EAs for the CVRP, the edge-recombination operator, but without any motivation or justification for this design decision.

Overall, the author of this thesis thinks that the cGA contains several arbitrary, sometimes hardly motivated design decisions (toroidal grid, infeasibility penalty, mutation, recombination), and fair results which are at the same time difficult to explain given the design.

6.4.6 Other algorithms

The presented review of algorithms is by no means a complete one. Many other metaheuristics for the CVRP have been proposed. An interested reader might find the work of Toth & Vigo (2002b) useful when searching for a broader review. Concerning EAs, some more algorithms for the CVRP were presented by: Berger & Barkaoui (2003), Tavares et al. (2003), Baker & Ayechew (2003) and, very recently, by Nagata (2007).

6.5 Summary

Except for the first heuristic ideas, the presented metaheuristic algorithms seem to be rather complicated machinery, with many components, additional acceleration techniques, diversification strategies, even some exact algorithms for solving subproblems. Nevertheless, the author of this text draws from the review several conclusions concerning the design of metaheuristics, and especially evolutionary algorithms.

Mandatory local search

It seems that some local search procedure is indispensable in an efficient algorithm. The tabu search algorithms by Taillard and by Rochat & Taillard are heavily based on LS. Prins argues that LS is necessary in an EA for the CVRP, and so do Alba & Dorronsoro and Nagata. All of these teams employ it in their designs.

Fast local search

The employed LS has to be fast, as well.
In most of the presented algorithms it consumes the largest part of the computation time, so the design of LS influences the overall running time enormously. The best designs try to accelerate local search e.g. by exploiting the locality of transformations (the incremental update scheme), caching evaluations of neighbours for later use (Taillard 1993, Rochat & Taillard 1995), truncating LS before it reaches a local optimum (Alba & Dorronsoro 2004), or applying local search only to a fraction of the generated solutions (Prins 2001, Prins 2004).

Specialised representation and recombination operators

Gendreau et al. (2002) noticed that in the case of EAs 'the classical [binary] approach (...) is not appropriate for sequencing problems, like the TSP or the VRP'. They argued that the bit string representation is not natural for such problems, and neither are typical binary crossover or mutation operators (although some of these have been employed, e.g. the two-point crossover by Baker & Ayechew (2003)). They called for the development of specialised ones. It can be seen in the existing EA designs that such operators were indeed developed and employed. There are several lines of such designs.

One line is represented by Rochat & Taillard (1995) and Potvin & Bengio (1996). These researchers had the intuition that an offspring had to inherit complete routes from its parents. Rochat & Taillard were probably the first to explicitly express this intuition, based on the visual form of good heuristic solutions and their similarity to the best-known one. Hence their design of the construction procedure which takes as input a large number of complete routes. Similarly, the RBX by Potvin & Bengio attempts to let an offspring inherit complete routes from two parents. Moreover, the crossover designs by Tavares et al. (2003) are very similar to the RBX. Nevertheless, these designs were not backed up by any verification of the implicit or expressed intuition.

The second line of designs is most probably based on the similarity between the CVRP and the TSP, which was mentioned in section 6.1. Thus, after a little adaptation, operators like order crossover or partially-mapped crossover were used in EAs for the CVRP (Gendreau et al. 2002). More recently, edge recombination (Alba & Dorronsoro 2004, Alba & Dorronsoro 2006) and edge-assembly crossover (Nagata 2007) were also employed. The latter two build an offspring by preserving edges from parents, though using different approaches. This is a perfectly sensible strategy for the TSP, given the analyses of TSP landscapes presented earlier in sections 4.4.3 and 5.2.2. But the sensibility of edge preservation, although it seems obvious given the similarity of the two problems, was not verified for the CVRP by such analyses.

The SPX crossover by Prins (also recently employed by Jozefowiez et al. (2007)) seems to be a separate case, because it works on the permutation representation and is backed up by the exact decoding procedure. This design reduces the size of the space to be searched, which seems to work very well. On the other hand, the choice of order crossover as a part of the SPX seems to be based only on optimisation results; this operator was chosen simply 'after some initial testing' (Prins 2001).

Hence, it appears that the existing recombination designs were based mainly on good intuition, the similarity of the CVRP to the TSP, or the results of initial optimisation tests.
The applicability of these designs to the CVRP was then checked by final computational experiments. It was not based on any theoretical or empirical insight into the search space. In particular, the author of this text has not yet seen any fitness-distance analysis performed for the CVRP, although some distance measures for CVRP solutions have been proposed recently, as described in the next chapter. Therefore, the systematic construction of recombination operators for the CVRP based on FDA results, which is the theme of the next chapter, is the very first attempt at a recombination design based on empirical search space analysis.

Chapter 7

Adaptation of the memetic algorithm to the capacitated vehicle routing problem based on fitness-distance analysis

This chapter describes the adaptation of the memetic algorithm scheme to the first of the combinatorial optimisation problems considered in this thesis. It provides a step-by-step description of those components of the algorithm that are specific to the CVRP. Most importantly, a description is given of the adaptation of recombination operators to the problem based on the fitness-distance analysis.

7.1 Representation

For the representation of CVRP solutions 3 options were considered: path (used several times for the TSP (Michalewicz 1996)), adjacency, and sequential (see section 6.4.4). From the three identified options, the adjacency representation was finally chosen for the designed algorithm.

In the TSP case (see Michalewicz (1996)) this representation is an array of size $N$ which stores indexes of vertices: at index $v_i$ of this array the index of $v_j$, its successor on a route, is stored. Adaptation of this array to the CVRP case should take into account that the depot vertex usually has more than one successor; in the pessimistic case it may have as many as $N$ of them (for a 'daisy-shaped' solution). Therefore, the depot successors are separated from the others into an additional array, which stores, for each route, the index of the first customer on the route; hence the name of the array: First. The array of successors of customer vertices (Next) remains unchanged, since each customer always has exactly one successor in a valid CVRP solution. An exemplary solution $s = \{(v_0, v_1, v_5, v_4), (v_0, v_2, v_6), (v_0, v_3, v_7)\}$ in this adjacency representation is shown in figure 7.1.

Figure 7.1: The adjacency representation of the exemplary solution $s = \{(v_0, v_1, v_5, v_4), (v_0, v_2, v_6), (v_0, v_3, v_7)\}$: First = [1, 2, 3] (indexed by route), Next = [5, 6, 7, 0, 4, 0, 0] (indexed by customer).

Technically speaking, the number of routes has to be stored together with the two arrays in order to properly read their contents. Overall, $O(N)$ integers are required to store a solution. Concerning the time complexity, access to the first customer on a route or to the successor of a given customer requires $O(1)$ time. However, the time to check the index of the route a given customer belongs to is bounded by $O(N)$ (pessimistically, all routes have to be traversed). This last access time may be improved, though only by a constant factor in the pessimistic case, by a slight modification of the array Next: instead of having 0 as the successor of each route's last customer, the index of the route may be stored there (as a negative number, to avoid ambiguity).
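The following minimal Python sketch illustrates the two arrays, already including the modification of Next just described (the thesis implementation is not in Python; the class and method names are illustrative):

class AdjacencyCVRP:
    """Adjacency representation with the First and (modified) Next arrays.

    Customers are numbered 1..N, vertex 0 is the depot. The last customer
    of route r stores -(r + 1) in Next, so its route index can be
    recovered without scanning all routes.
    """
    def __init__(self, routes):               # routes: lists of customers
        n = sum(len(r) for r in routes)
        self.first = [r[0] for r in routes]   # first customer of each route
        self.next = [0] * (n + 1)
        for r_idx, route in enumerate(routes):
            for u, v in zip(route, route[1:]):
                self.next[u] = v
            self.next[route[-1]] = -(r_idx + 1)   # negative end-of-route marker

    def successor(self, v):
        """O(1); returns 0 (the depot) when v is the last customer on its route."""
        return max(self.next[v], 0)

    def route_of(self, v):
        """Walks to the end of v's route: O(n(t_i)) instead of O(N)."""
        while self.next[v] > 0:
            v = self.next[v]
        return -self.next[v] - 1

For the exemplary solution the constructor produces exactly the arrays of figures 7.1 and 7.2: First = [1, 2, 3] and Next = [5, 6, 7, -1, 4, -2, -3].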
The form of the modified Next array is shown in figure 7.2. This way the time to check the index of the route of a given customer is reduced to $O(n(t_i))$, where $n(t_i)$ is the number of customers in that route. This still becomes $O(N)$ in the pessimistic case, however.

Figure 7.2: The modified Next array for the adjacency representation of the exemplary solution: Next = [5, 6, 7, -1, 4, -2, -3].

The chosen representation does not induce problems with local search, like the sequential representation does (see section 6.4.4). And although it is as 'natural' and memory-consuming as the path representation, it requires less time for some basic operations. Therefore, due to this technical advantage (which is probably the most important advantage of a representation, as emphasised in section 4.1), it was chosen for implementation. Note that this representation was devised by the author of this text in his master's thesis (Kubiak 2002), independently of Tavares et al. (2003) and before Kytojoki et al. (2007).

7.2 Fitness function and constraints

The fitness function for the designed memetic algorithm is simply the objective function of the CVRP (see section 6.1). Infeasible solutions are not accepted at any stage of the algorithm: neither in local search, nor in the population of the MA. This is a realisation of some of the design guidelines described earlier in section 4.2. It seems, however, that acceptance of infeasible solutions with a penalty in the form used by Alba & Dorronsoro (2004) (but with other parameter values) would also be a sensible option.

7.3 Local search

The general scheme of local search was provided in algorithm 1, page 11. Here, the specifics concerning the adaptation of local search to the CVRP are given. According to the remarks presented in section 4.6, several design decisions have to be made before implementation.

Concerning the local search type, only the basic ones, steepest and greedy, are considered in this work. Any more complex version (e.g. simulated annealing or simple tabu search) would result in additional design choices to be made or parameters to be set. Therefore, these two simple types were preferred.

No local search truncation is performed in this work. Although truncation may improve the efficiency of search, it is yet another way of introducing a parameter into the method (e.g. the truncation probability or the number of iterations of search). The author of this thesis wanted to avoid such parameters.

Local search is applied to all solutions in the population of the designed MA. Greedy LS is applied to initial solutions, while the steepest one to offspring of genetic operators. During some initial testing, this appeared to be the fastest combination.

Another important choice with respect to local search concerned neighbourhood operators. The literature describes many possibilities: e.g. edge-exchange (λ-opt), Or-opt, λ-interchange, b-cyclic k-transfer exchanges (Laporte & Semet 2002, Gendreau et al. 2002, Prins 2001, Prins 2004). However, it is not the goal of this work to consider, implement and evaluate all operators from the literature, but to have a good-enough LS for the fitness-distance analysis and the memetic algorithm. After some experimental testing the author decided that having 3 operators was enough to achieve this goal; the choice was arbitrary to some extent. These operators are described below.
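A minimal sketch of the two basic LS types is given below; the neighbourhood, evaluation and move-application abstractions are illustrative, not the actual interfaces of the implementation:

def local_search(s, neighbourhood, evaluate, apply_move, steepest=True):
    """Greedy (first-improvement) or steepest (best-improvement) LS."""
    improved = True
    while improved:
        improved = False
        best_move, best_delta = None, 0.0
        for move in neighbourhood(s):         # feasible moves only
            delta = evaluate(s, move)         # cost change of the move
            if delta < best_delta:
                best_move, best_delta = move, delta
                if not steepest:              # greedy: take the first improvement
                    break
        if best_move is not None:
            apply_move(s, best_move)
            improved = True
    return s                                  # a local optimum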
Considerable effort was devoted to the efficiency of these operators and of local search:

• additional data structures enriched the adjacency representation;
• all the operators were implemented using the incremental update scheme;
• a special lexicographic order of configurations of one operator was introduced;
• a cache of neighbour evaluations for 2 operators was designed, implemented and evaluated;
• low-level implementation changes were introduced.

The mentioned data structures are:

• an additional array of size $O(N)$ with the current cost of each route,
• an additional array of size $O(N)$ with the current sum of demands of customers on each route.

Some remarks concerning the incremental update scheme are given with the description of each operator. The additional acceleration techniques are presented later.

7.3.1 Merge of 2 routes

This merge operator was inspired by the algorithm of Clarke & Wright (1964), described earlier in section 6.3.1. Therefore, it will sometimes be referred to as the CW operator. It works by merging the sequences of customers from two routes into one sequence (one route), provided that such a merge does not violate the capacity constraint.

The idea of the merge operator is shown in figure 7.3 in one of its 4 possible modes. The left-hand side of the figure presents a solution (two routes) before the merge; the scheme in the middle shows the same solution being modified; the right-hand scheme presents the solution after the modification. A merge removes exactly 2 edges from the initial solution; these are the dotted edges connected to the depot ($v_0$). Then, each mode connects the two routes by adding one edge that links together the vertices detached earlier from the depot; this is the dashed edge.

Figure 7.3: One mode of the merge (CW) operator.

The modes may be described in the following way. Let the merged routes $t_i$, $t_j$ be:

$$t_i = (v_0, v_{ip}, \ldots, v_{ik}) \qquad t_j = (v_0, v_{jp}, \ldots, v_{jk})$$

Let the new route, the one resulting from the merge, be denoted by $t_k$. Depending on the mode, the new route has the form:

1. $t_k = (v_0, v_{ip}, \ldots, v_{ik}, v_{jp}, \ldots, v_{jk})$
2. $t_k = (v_0, v_{jp}, \ldots, v_{jk}, v_{ip}, \ldots, v_{ik})$
3. $t_k = (v_0, v_{ip}, \ldots, v_{ik}, v_{jk}, \ldots, v_{jp})$
4. $t_k = (v_0, v_{ik}, \ldots, v_{ip}, v_{jp}, \ldots, v_{jk})$

For example, the third mode builds route $t_k$ by traversing all customers in $t_i$ (in the same order), skipping the depot, traversing the customers from $t_j$ in the reverse order (starting from the end of this route), and returning to the depot. This mode is shown in figure 7.3.

The size of the neighbourhood $N_{CW}(s)$ of a solution $s$ depends only on the number of routes $T(s)$ in the solution (Kubiak 2002):

$$|N_{CW}(s)| = O(T(s)^2)$$

An application of the merge operator may potentially lead to an infeasible solution, if the sum of demands of the merged routes exceeds the vehicle capacity. Therefore, checking the feasibility of a neighbour in the plain adjacency representation requires $O(N)$ time, because the merged routes have to be traversed first in order to compute their demands. This can be accelerated if a vector of route demands is stored alongside the adjacency representation; in such a case the feasibility check requires only $O(1)$ time. The evaluation of $\Delta f = f(s_n) - f(s)$, the difference between the cost of a neighbour $s_n$ and the current solution $s$, requires $O(1)$ time: in each mode the costs of 3 edges have to be accessed and added with appropriate signs.
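The following sketch makes this constant-time evaluation explicit; the four returned values correspond to the four modes listed above (symmetric costs assumed, routes given as customer lists without the depot):

def merge_deltas(route_i, route_j, dist):
    """Cost change of each merge (CW) mode; each mode removes two depot
    edges and adds one linking edge, so only 3 edge costs are touched."""
    ip, ik = route_i[0], route_i[-1]          # first and last customer of t_i
    jp, jk = route_j[0], route_j[-1]          # first and last customer of t_j
    return [
        dist[ik][jp] - dist[ik][0] - dist[0][jp],   # mode 1: t_i + t_j
        dist[jk][ip] - dist[jk][0] - dist[0][ip],   # mode 2: t_j + t_i
        dist[ik][jk] - dist[ik][0] - dist[jk][0],   # mode 3: t_i + reversed t_j
        dist[ip][jp] - dist[0][ip] - dist[0][jp],   # mode 4: reversed t_i + t_j
    ]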
Once a neighbour $s_n$ is chosen to become the new solution (e.g. in local search), the modification of $s$ takes $O(1)$ time in modes 1 and 2, and $O(N)$ time in modes 3 and 4 (in the latter modes the order of customers in one route has to be reversed).

This merge operation was chosen as a neighbourhood operator because, intuitively, in the CVRP it is important to have as few routes in a solution as possible (in metric instances, in any case). It is very likely that the number of routes is positively correlated with the cost of a solution, and although this was not verified empirically, minimising the number of routes appears to be a good idea when searching for a least-cost solution. Another argument for implementing the operator was that the Clarke and Wright algorithm generates fair-enough solutions in very short time.

7.3.2 Exchange of 2 edges

The idea of this operation comes from the TSP, where it is sometimes referred to as 2opt. It substitutes two edges of a solution by some other two edges, in such a way that the resulting solution is a valid one. In the TSP the exchange of edges is performed only within a route (there is one route all the time); in the CVRP two different routes may also be involved.

Figure 7.4 presents the edge-exchange that is characteristic of the CVRP: one of the two possible inter-route modes. It removes exactly two edges (dotted) from the two routes shown. The parts of the routes are then connected by dashed edges: the presented mode links the beginning part of one route with the beginning part of the other into one route, and the end with the end into the other route. The other mode (not shown) links the beginning part of one route with the end part of the other, and vice versa.

Figure 7.4: One inter-route mode of the edge-exchange (2opt) operator.

Formally, the intra-route mode modifies one route $t_i$ of a solution:

$$t_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,p+1}, \ldots, v_{i,k}, v_{i,k+1}, \ldots)$$

to become:

$$t'_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,k}, \ldots, v_{i,p+1}, v_{i,k+1}, \ldots)$$

One can see that in this mode not only are two edges exchanged, but also the part of route $t_i$ between $v_{i,p+1}$ and $v_{i,k}$ is reversed.

Let us assume that the inter-route modes perform their modifications on routes $t_i$, $t_j$:

$$t_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,p+1}, \ldots, v_{i,n(t_i)})$$
$$t_j = (v_0, v_{j,1}, \ldots, v_{j,k}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

The first inter-route mode, the one shown in figure 7.4, alters the routes to have the form:

$$t'_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{j,k}, v_{j,k-1}, \ldots, v_{j,1})$$
$$t'_j = (v_0, v_{i,n(t_i)}, \ldots, v_{i,p+1}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

while the second mode changes them to have the form:

$$t''_i = (v_0, v_{j,1}, \ldots, v_{j,k}, v_{i,p+1}, \ldots, v_{i,n(t_i)})$$
$$t''_j = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

It can be seen that the first mode reverses some parts of the modified routes, while the second mode reverses nothing.

The size of this neighbourhood $N_{2opt}(s)$ (all intra- and inter-route modes together) depends on the number of customers in an instance and the number of routes in a solution (Kubiak 2002):

$$|N_{2opt}(s)| = O((N + T(s))^2)$$

An application of this edge-exchange operator does not generate infeasible solutions in the intra-route mode: the set of customers of a route is never modified, so the capacity constraint is never violated.
In the intra-route mode the cost of the modified solution may be evaluated in constant time in the adjacency representation, so overall this modification may be evaluated in $O(1)$ time. However, the inter-route modes may produce infeasible solutions, since route contents change. Therefore, before the cost of the modified solution is computed, its feasibility status has to be checked. Without any additional measures, this takes $O(N)$ time, because the demands of parts of routes have to be computed. After this computation, the cost of the modified solution may be evaluated in constant time (two subtractions and two additions). The modification of solution $s$ by means of the edge-exchange operator to produce a neighbour $s_n$ takes $O(N)$ time, since some modes require that a part of a route be reversed.

The motivation for using this operator in the CVRP has several sources:

• in the TSP it is a very fast operator performing basic improvements of routes, and although it is not able to conduct more complicated changes, it is very frequently used as a base point for local search;
• again, in the symmetric TSP case this operator has the largest correlation length in a group of several operators that are usually considered for implementation (Merz 2000), which makes the operator very suitable for local search; perhaps it is similar in the CVRP case;
• the inter-route modes of 2opt make it possible to exchange sequences of customers between routes, and such an ability seems necessary to address the grouping subproblem of the CVRP;
• 2opt seems to be the most elementary operation on the edges of a CVRP solution; in the opinion of the author it is hardly possible to change fewer edges in a solution to produce some other one, and hence 2opt nicely fits the general requirement for a neighbourhood operator: that it should produce solutions close to the original in some sense (see section 2.4.1).

7.3.3 Exchange of 2 customers

This operator is based on the 1-interchange which is sometimes used in the literature (see section 6.4.5). It exchanges (swaps) two customers between their locations in a solution. In the CVRP it has two modes: the intra- and the inter-route one.

Formally, the intra-route mode changes route $t_i$:

$$t_i = (v_0, \ldots, v_{i,p-1}, v_{i,p}, v_{i,p+1}, \ldots, v_{i,k-1}, v_{i,k}, v_{i,k+1}, \ldots)$$

into:

$$t'_i = (v_0, \ldots, v_{i,p-1}, v_{i,k}, v_{i,p+1}, \ldots, v_{i,k-1}, v_{i,p}, v_{i,k+1}, \ldots)$$

except for the case when $v_{i,k}$ is the immediate successor of $v_{i,p}$ ($v_{i,k} = v_{i,p+1}$), in which a swap is equivalent to the intra-route 2opt on edges $(v_{i,p-1}, v_{i,p})$ and $(v_{i,k}, v_{i,k+1})$.

The inter-route mode, shown in figure 7.5, modifies two routes $t_i$, $t_j$:

$$t_i = (v_0, \ldots, v_{i,p-1}, v_{i,p}, v_{i,p+1}, \ldots) \qquad t_j = (v_0, \ldots, v_{j,k-1}, v_{j,k}, v_{j,k+1}, \ldots)$$

so that they become:

$$t'_i = (v_0, \ldots, v_{i,p-1}, v_{j,k}, v_{i,p+1}, \ldots) \qquad t'_j = (v_0, \ldots, v_{j,k-1}, v_{i,p}, v_{j,k+1}, \ldots)$$

Figure 7.5: Inter-route mode of the customer-exchange (swap) operator.

The size of this neighbourhood depends only on the number of customers:

$$|N_{swap}(s)| = O(N^2)$$

An application of this operation may produce an infeasible neighbour only in the inter-route mode, and only when the demands of the exchanged customers differ; then the demand of one route increases after the swap and may violate the capacity constraint. Thus, the constraint has to be checked, but this takes only $O(1)$ time.
The cost of a neighbour may also be computed efficiently, in constant time, since there are always 4 edges exchanged in a solution: the ones incident to the swapped customers. When a neighbour of swap is chosen to become the new solution in local search, the necessary modification of the solution also takes $O(1)$ time.

The main motivation for using this operator in the CVRP was to have in local search the ability to address the grouping subproblem even more directly than 2opt does. It seems that swap, by exchanging single customers between routes, is more dedicated to this goal than 2opt, which exchanges whole subsequences of customers and hence may be more prone to infeasibility of neighbours.

7.3.4 Composition of neighbourhoods

The chosen operators were implemented in a way which allows free composition of any of the neighbourhoods into one. The goal of this design was to potentially increase the search abilities of the LS procedure: if a solution is a local minimum in one neighbourhood, it does not have to be a minimum in some other one. Also, the feasibility of neighbours may have some importance here; for example, swap may help 2opt optimise the load of vehicles.

7.3.5 Acceleration techniques

Speeding up 2opt feasibility tests

The main computation cost of finding an improving 2opt move is related to the feasibility tests of inter-route 2opt moves. Kindervater & Savelsbergh (2003) described a technique which reduces this cost to a constant. It is based on the observation that the demands of parts of routes may be stored and simply updated when iterating over the neighbours of the current solution in the right order, called the lexicographic order. An example of this order is given in figure 7.6. The top of the figure shows a 2opt move removing edges (2, 3) and (8, 9) from two routes (the dashed edges); the demands of the parts (1, 2), (3, 4, 5, 6), (7, 8), (9, 10, 11, 12) are required for a feasibility test. The bottom of figure 7.6 shows the immediately next 2opt move in the lexicographic order, the one removing edges (2, 3) and (9, 10). The required demands of parts (1, 2) and (3, 4, 5, 6) have just been computed in the previous iteration and may simply be reused; the demands of parts (7, 8, 9) and (10, 11, 12) may be computed from the previous values at the cost of two additions. This reduces the time of a feasibility test from $O(N)$ to $O(1)$ if the tests are performed in a loop and in the lexicographic order.

Figure 7.6: Edge exchanges in the lexicographic order: 2opt for edges (2, 3) and (8, 9) (top) and 2opt for edges (2, 3) and (9, 10) (bottom).

Cache for evaluations of neighbours

As described in section 4.6.5, this technique involves caching (storing in auxiliary memory) the values of objective function differences between local search iterations.

The general scheme

The operations of merge, 2opt and swap modify only a small fragment of a CVRP solution; large parts of the solution stay intact after a performed modification. Consequently, a large number of the moves which could be applied to the original solution may also be performed on the new one, and their changes of the objective function stay the same. Therefore, there is no need to recompute these changes; they may be stored in a cache for later use. Nevertheless, some moves available from the original solution are affected by the actually performed move. These modified moves must not be kept; they have to be removed from the cache or marked as invalid there.
The set of such moves strongly depends on the performed move. These remarks lead to algorithm 9, local search with cache (compare to algorithm 1):

Algorithm 9 LocalSearch(s).
  initialise empty cache
  repeat {main local search loop}
    s′ = s
    betterFound = false
    for all sn ∈ N(s) do {iterate over the neighbours of s}
      if ∆f(sn, s) is valid in the cache then
        ∆f = ∆f(sn, s) {take ∆f from the cache}
      else
        ∆f = f(sn) − f(s) {compute ∆f from scratch}
        ∆f(sn, s) = ∆f {store ∆f in the cache for later use}
      if f(s) + ∆f < f(s′) then {check if this is the best neighbour so far}
        s′ = sn {remember the better neighbour}
        betterFound = true
    if betterFound then
      for all sa ∈ N(s) which are affected by the chosen move from s to s′ do
        mark ∆f(sa, s) as invalid in the cache
      s = s′ {proceed to the better neighbour}
  until betterFound == false {stop at a local optimum}
  return s

One may notice the possible source of gain in local search speed: instead of computing $\Delta f(s_n, s) = f(s_n) - f(s)$ for each neighbour $s_n$ of $s$ from scratch, the value is taken from the cache if it was computed earlier. However, the operation of cache update has to be called after a move is found and performed, in order to ensure the cache stays valid. This is a possible source of computation cost. The goal of caching is to make the gain higher than the cost.

Cache requirements in the CVRP

Firstly, in the CVRP not only the objective function matters. There is also the capacity constraint, which involves complete routes, not only single customers or edges. If the capacity constraint is violated for a neighbour then this neighbour is infeasible, and such moves are forbidden in local search. Therefore, not only the change in the objective function has to be stored in the cache, but also the feasibility status of a neighbour.

Secondly, because the considered operators merge, 2opt and swap have different semantics, a cache must be designed and implemented independently for each of them (separate data structures).

Thirdly, the local search considered here assumes that the neighbourhoods of these operators may be composed to form one large neighbourhood. This also means that the order of execution of operators cannot be determined in advance (it may be e.g.: merge, merge, 2opt, swap, 2opt, . . . ; it may be any other order). For the cache this possibility creates the requirement that when one type of move is performed, the caches of all operators have to be updated.

Finally, the neighbourhoods of the operators have different sizes; the neighbourhoods of 2opt and swap are considerably larger than the one of merge. Moreover, the merge operation is very specific: the number of its applications is always strictly limited by the minimum possible number of routes. Initial experiments with MAs indicated that the number of applications of this operator amounts to 5–10% of the total number of applications of all operators; the majority of the search effort is spent on 2opt and swap. Therefore, the cache was implemented for these two operators only. The size of memory for the cache structures is the same as the size of the related neighbourhoods.

Implementation issues

Another factor which influences the speed of local search is the implementation itself. On the low level, making use of certain techniques accelerates computation.
The techniques used in this study were method inlining and avoiding calls to expensive copy constructors for large objects by passing references to them.

7.3.6 Measuring the speed of local search

Local search execution profiles

The author decided to make detailed profiles of local search executions in order to check the cost of, and gain from, the acceleration techniques; analytical computation of these values is difficult. Two variants of LS were tested: greedy and steepest. The neighbourhood was merge followed by the composed 2opt and swap. The acceleration techniques were employed in 3 variants: without cache and lexicographic order (denoted 'nc'), with cache and lexicographic order ('c'), and the most promising one: with the swap cache, without the 2opt cache and with the lexicographic order ('c∗'). All the variants considered in this experiment included the low-level implementation changes.

Only two instances were tested in detail, tai100d and c120. One run of multiple start local search (MSLS) was conducted for each setting and instance, consisting of 5 independent LS processes. The runs were limited and short because code profiling considerably increased the run time, due to the injection of timing routines into the original code.

The profiling results of greedy LS for instance c120 are presented in table 7.1. They contain the times of operations for each profiled LS setting, together with percentages of the total run time of the base version (nc). The merge operator is not reported due to the insignificant cost of its operations (1–2% of the total run time in all runs).

Table 7.1: Times of execution of search and cache operations in greedy LS, instance c120; time [s] (percent of the nc total).

operation                   |      nc       |       c       |      c∗
2opt: eval. of neighbours   | 146.3 (50.4)  |  47.6 (16.4)  |  57.1 (19.7)
2opt: cache read/write      |   0.0  (0.0)  |  30.1 (10.4)  |   2.6  (0.9)
2opt: cache update          |   0.0  (0.0)  |  19.8  (6.8)  |   0.0  (0.0)
2opt: total search cost     | 146.3 (50.4)  |  97.5 (33.6)  |  59.7 (20.6)
swap: eval. of neighbours   |  33.2 (11.4)  |  13.3  (4.6)  |  11.5  (4.0)
swap: cache read/write      |   0.0  (0.0)  |   8.1  (2.8)  |   7.2  (2.5)
swap: cache update          |   0.0  (0.0)  |   5.1  (1.8)  |   5.1  (1.8)
swap: total search cost     |  33.2 (11.4)  |  26.5  (9.2)  |  23.8  (8.2)
operators: total            | 179.5 (61.8)  | 124.0 (42.7)  |  83.5 (28.8)
greedy LS: total            | 290.2 (100.0) | 209.9 (72.3)  | 173.1 (59.6)

One can see in the table that the variant without the algorithmic techniques (nc) was the slowest: it took 290.2 s in total, of which the search itself took 179.5 s. The very high cost of search with the 2opt operator is clearly visible (50.4% of the total run time). The cost of swap is lower, although still considerable (11.4%). Consequently, there is room for improvement in this variant.

The introduction of the cache and the 2opt lexicographic order decreased the time of computation to 72.3% of the base variant, which is good news. The 2opt evaluation time dropped to 16.4%. However, the 2opt cache introduces new cost components: cache reads and writes (10.4%) and cache updates after a performed move (6.8%). In total, the 2opt search time was reduced from 50.4% to 33.6%. The situation is similar for swap: the evaluation time dropped from 11.4% to 4.6%, but cache management (read/write and update) took another 4.6%, making the cache only slightly profitable.

The last variant, c∗, used the 2opt lexicographic order but did not use the 2opt cache; it seemed that the cache management cost for this operator was too high.
The results show that this approach gives the highest gain for the greedy LS: the evaluation cost for 2opt amounts to 19.7%, while there is no cache management cost (the 0.9% figure reflects the time of calls to the empty cache, which is never updated). In conjunction with the gain from the swap cache and the lexicographic order, the overall time of computation was reduced by 40.4% compared to the base variant, nc.

The analysis of cache usage in the c variant demonstrated that only 28.1% of the 2opt cache was used, while for swap it was 58.8%. As predicted, the cached values were rarely used in the greedy version, because improving steps were usually found very quickly (the neighbourhood was not completely searched through, filling the cache only sparsely with valid values). Moreover, these numbers indicate that 2opt updates invalidated large parts of the cache, while for the swap operator most of the cache remained valid after an improving move was performed. In the case of instance tai100d (the detailed results are not reported) the cache management cost was generally higher, making cache usage rather expensive.

The steepest version differed mainly in the cache usage: 62.0% of 2opt neighbours were evaluated based on the cache contents, and as much as 77.2% in the case of swap. Therefore, the gains from the cache were higher.

To summarise this experiment, the execution profiles indicate a high 2opt cache management cost which is generally compensated by the gain in evaluation of neighbours, but brings no further significant improvement. On the other hand, the lexicographic order in 2opt and the cache of swap improve the efficiency of greedy local search, and the improvement is even higher for the steepest version. Therefore, except for the 2opt cache, all the tested techniques should be used in local search.

Local search execution times

In order to assess the influence of all the acceleration techniques on local search in an unbiased way, an experiment without injected timing routines was conducted. It also involved several instances of different sizes, to check the effect of scale. Two local search versions were tested: greedy and steepest. Two versions with respect to the acceleration techniques were involved: without any of them (denoted 'before') and with all of them except for the costly 2opt cache (denoted 'after'). Each combination was run in a multiple start local search (MSLS) algorithm which started from random solutions. MSLS was run 10 times; each run stopped after 100 independent LS processes. Average times of computation in this experiment are given in table 7.2. It can be seen in the table that without the acceleration techniques the computation is approximately 10 times longer. This is a constant effect, independent of instance size. The steepest algorithm reduces the time of computation a bit more than the greedy one (by 91.6% compared to 90.0%, on average). The overall result is very good, so this experiment confirms that the chosen acceleration techniques should be used in LS. The large experiments with the memetic algorithm, presented further in section 7.9, were practically possible mainly due to this positive effect of the acceleration techniques.

Table 7.2: Times of execution of local search without and with the additional speed-up techniques, and the reduction in computation time due to the techniques.
           |         greedy n-1-2            |        steepest n-1-2
instance   | before [s]  after [s]  red. [%] | before [s]  after [s]  red. [%]
c50        |      6.0        0.5      91.7   |     22.3        2.0      91.0
tai75d     |     24.0        2.5      89.6   |     98.2        8.1      91.8
tai100d    |     50.7        5.5      89.2   |    218.5       18.5      91.5
c120       |     89.5        9.2      89.7   |    385.0       31.0      91.9
tai150b    |    172.3       17.7      89.7   |    792.2       68.8      91.3
c199       |    307.8       29.5      90.4   |   1816.5      147.1      91.9
avg.       |       -          -       90.0   |       -          -       91.6

7.4 Initial solutions

Local search and memetic algorithms usually start computation from complete solutions of the given problem. This section presents the methods used for generating such initial solutions for these algorithms.

7.4.1 Heuristic solutions

Several heuristics have been implemented:

• the Clarke and Wright savings algorithm (see section 6.3.1),
• the Gillet and Miller sweep algorithm (see section 6.3.2),
• the First-Fit Decreasing algorithm (see section 6.3.3),
• the Gain-Fit Decreasing algorithm.

The first algorithm is simply the steepest local search with the merge (CW) operator, started from a 'daisy-shaped' solution. It is deterministic.

The second algorithm requires a TSP heuristic; the well-known algorithm based on a minimum spanning tree (Cormen et al. 1990) was chosen to solve the TSP subproblems. The sweep algorithm was also slightly modified so that it generates more than one unique solution. As mentioned in section 6.3.2, the algorithm may be initialised with different straight lines going through the depot. In the implementation different solutions are obtained exactly in this way, by starting from straight lines going through the depot and different customer vertices.

The First-Fit Decreasing algorithm, just like the Gillet and Miller one, uses the minimum spanning-tree heuristic to solve the TSPs.

Gain-Fit Decreasing heuristic

The First-Fit Decreasing heuristic, although it performs well in clustering demands and minimising the total number of routes, does not take the distances between customers into account. To amend this, the author developed the Gain-Fit Decreasing algorithm, which is presented in algorithm 10. The main idea of the method is to choose, for the currently considered customer, the cluster which is closest among all clusters, in the hope of creating consistent ones. However, such a simple extension of the First-Fit Decreasing heuristic hardly results in any change: when there are no clusters to choose from (as is always the case at the very beginning), the customer is still put in the first possible one. Therefore, the Gain-Fit Decreasing algorithm is initialised with a number of cluster seeds which should be distant from each other.

Algorithm 10 GainFitDecreasing(ClusterSeeds)
  Clusters = ∅
  sort the customers by decreasing demands d(v); the list Cust reflects that order
  for all seed ∈ ClusterSeeds do
    c = {seed}
    Clusters = Clusters ∪ {c}
    d(c) = d(seed)
    Cust = Cust \ c
  for i = 1 to |Cust| do
    vi is the i-th customer from the list Cust
    PossibleClusters = {c ∈ Clusters : d(c) + d(vi) ≤ C}
    if PossibleClusters ≠ ∅ then
      c′ = arg min over c ∈ PossibleClusters of distToCluster(vi, c)
      c′ = c′ ∪ {vi}
      d(c′) = d(c′) + d(vi)
    else
      c′ = {vi}
      d(c′) = d(vi)
      Clusters = Clusters ∪ {c′}
  return Clusters

The function distToCluster(vi, c) defines the distance of customer vi to a cluster of customers c. In the area of clustering (Larose 2005, Weiss 2006) there are multiple ways of defining such a distance, the simplest ones being $\min_{v \in c} c(v_i, v)$ (single linkage), $\sum_{v \in c} c(v_i, v)$ (average linkage) or $\max_{v \in c} c(v_i, v)$ (complete linkage).
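A minimal sketch of these three variants of distToCluster (the signature is illustrative; 'c' here is the instance's cost matrix):

def dist_to_cluster(v, cluster, c, linkage='single'):
    """Distance of customer v to a cluster of customers."""
    ds = [c[v][u] for u in cluster]
    if linkage == 'single':
        return min(ds)        # distance to the closest member
    if linkage == 'complete':
        return max(ds)        # distance to the farthest member
    return sum(ds)            # 'average' linkage as listed in the text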
The result of the algorithm is converted into a complete solution by the minimum spanning-tree method.

Still, the method for choosing cluster seeds has to be given. It was directly inspired by the procedure used by Weiss (2006) to initialise the k-Means clustering algorithm. The method selects as the first seed the customer with the largest demand. It proceeds by selecting each next seed as the customer farthest from the current seeds. In this way a set of seeds distant from each other is obtained.

7.4.2 Random solutions

The procedure for building a random solution is presented in algorithm 11.

Algorithm 11 Building a random CVRP solution.
1: repeat {until a feasible solution is found}
2:   build a random sequence seq of all customers
3:   draw the number of routes $T$ from the distribution $P(T = splits + 1) = \binom{N-1}{splits} \cdot 2^{-(N-1)}$ for $splits = 0, \ldots, N-1$
4:   divide seq randomly into T subsequences
5:   build solution s from the subsequences
6: until isFeasible(s)
7: return s

This design is an attempt at ensuring that all CVRP solutions of a given instance have an equal probability of being constructed. This is achieved by first generating a random sequence of customers (line 2). Then, this sequence is randomly split into a number of routes (line 4). The probability of having $T$ as the number of routes is proportional to the number of different CVRP solutions that may be generated with those $T$ routes, i.e. it constitutes a binomial distribution (line 3). Finally, if an infeasible solution is constructed, the process is repeated from scratch. Such a design was also necessary in order to have unbiased random solutions available for the fitness-distance analysis.

7.5 Fitness-distance analysis

7.5.1 New distance metrics for solutions of the CVRP

In order to perform the fitness-distance analysis of the problem, some measures of distance between solutions have to be defined. The distance metrics presented in this section correspond to certain structural properties of solutions of the CVRP: the existence of certain edges (or even paths), or specific ways of partitioning the set of customers into routes (clusters). The FD analysis attempts to answer the question which properties are important for the objective function, i.e. which correlate with its values. Although the presented metrics might seem simple at first sight, their strength lies in the fact that they are linked directly to the mentioned properties of CVRP solutions, not to any specific solution representation.

Distance in terms of edges: $d_e$

The idea of this metric is based on a very similar concept formulated for the TSP: the number of common edges in TSP tours (Boese 1995). Due to the similarity between solutions of the TSP (one tour) and the CVRP (a set of disjoint tours/routes), the idea of common edges may be easily adapted to the latter. In order to properly define the distance metric some definitions are required:

$$E(t_i) = \{\{v_0, v_{i,1}\}\} \cup \Big( \bigcup_{j=1}^{n(t_i)-1} \{\{v_{i,j}, v_{i,j+1}\}\} \Big) \cup \{\{v_{i,n(t_i)}, v_0\}\}$$

$$E(s) = \bigcup_{t_i \in s} E(t_i)$$

$E(t_i)$ is the multiset of undirected edges appearing in route $t_i$; $E(s)$ is the multiset of edges in solution $s$. The notion of a multiset is required here, because routes in some solutions of the CVRP may include certain edges twice (the edges to and from the depot).
Using the general concept of distance between subsets of the same universal set, as defined by Marczewski & Steinhaus (1958) (cited after Karoński & Palka (1977)), the distance $d_e$ between two solutions $s_1$, $s_2$ of the same CVRP instance may be defined as:

$$d_e(s_1, s_2) = \frac{|E(s_1) \cup E(s_2)| - |E(s_1) \cap E(s_2)|}{|E(s_1) \cup E(s_2)|}$$

Since $d_e$ is only a special case of the Marczewski-Steinhaus distance, it inherits all its properties of a metric; its values are also normalised to the interval [0, 1]. This distance metric perceives solutions of the CVRP as multisets of edges: solutions close to each other have many common edges; distant solutions have few common ones. However, a closer investigation of the metric reveals that it is not intuitively 'linear' (although it is 'monotonic'): e.g. $d_e = 0.5$ does not mean that exactly half of each $E(s_i)$ is common; 50% of common edges implies $d_e \approx 2/3$.

Distance in terms of partitions of customers: $d_{pc}$

The concept behind the second distance metric is based on the 'cluster first/route second' heuristic approach to the CVRP (Aronson 1996, Laporte & Semet 2002) (see also section 6.3.2): first find a good partition of the customers into clusters and then try to find routes (solve TSPs) within these clusters, separately. According to this idea, the distance metric should identify dissimilarities between solutions perceived as partitions of the set of customers. Therefore, it should directly address the grouping subproblem in the CVRP.

An example of a distance metric for partitions of a given set may be found in the work by Karoński & Palka (it is defined even more generally there, for hypergraphs or binary trees). This example is easily adaptable to solutions of the CVRP. Let us define:

$$C(s) = \{c_1(s), c_2(s), \ldots, c_{T(s)}(s)\}$$

$$c_i(s) = \{v_{i,1}, v_{i,2}, \ldots, v_{i,n(t_i)}\}$$

$$\sigma(c_i(s_1), c_j(s_2)) = \frac{|c_i(s_1) \cup c_j(s_2)| - |c_i(s_1) \cap c_j(s_2)|}{|c_i(s_1) \cup c_j(s_2)|}$$

$C(s)$ is the partition of the set of customers into clusters; one cluster, $c_i(s)$, holds the customers from route $t_i$ of $s$; $\sigma(\cdot)$ is the Marczewski-Steinhaus distance between two clusters (sets). According to Karoński & Palka (1977), the distance between solutions may be defined as:

$$d_{pc}(s_1, s_2) = \frac{1}{2} \Big( \max_{i=1}^{T(s_1)} \min_{j=1}^{T(s_2)} \sigma(c_i(s_1), c_j(s_2)) + \max_{j=1}^{T(s_2)} \min_{i=1}^{T(s_1)} \sigma(c_i(s_1), c_j(s_2)) \Big)$$

This function is a distance metric for partitions; it is also normalised. It is not exactly a metric for solutions of the CVRP, because $d_{pc}(s_1, s_2) = 0$ does not imply $s_1 = s_2$ (the number of solutions which are not discriminated by $d_{pc}$ may be exponentially large).

The formula for $d_{pc}$ has the following sense: firstly, the best-possible assignment of clusters from $C(s_1)$ to clusters from $C(s_2)$ is made (the one which minimises $\sigma(\cdot)$), and vice versa; that is the idea behind the internal min operators. Secondly, the two worst assignments are chosen among those pairs (the max operators), and the distance over these two assignments is averaged to form the overall distance between partitions. Thus, it may be concluded that $d_{pc}$ is somewhat 'pessimistic' in its choice of 'optimistic' matches of clusters; the same observation was made by Robardet & Feschet (2000) in the context of clustering algorithms. This mixture of max and min operators in $d_{pc}$ makes the interpretation of its values difficult. Certainly, values near 0 indicate great similarity of solutions.
However, larger values do not necessarily indicate very dissimilar partitions; it is sufficient that there are 'outliers' in the partitions, which can hardly be well assigned to clusters in the other solution, and the max operator will produce large values, implying distant solutions.

Distance in terms of pairs of nodes: $d_{pn}$

The third distance metric, $d_{pn}$, is based on the same idea as $d_{pc}$: distance between solutions viewed as partitions of the set of customers. However, this idea has a different, more straightforward, mathematical formulation in $d_{pn}$. Here, the Marczewski-Steinhaus (1958) concept of distance is applied to sets of pairs of nodes (customers). Let us define:

$$PN(t_i) = \bigcup_{j=1}^{n(t_i)-1} \bigcup_{k=j+1}^{n(t_i)} \{\{v_{i,j}, v_{i,k}\}\}$$

$$PN(s) = \bigcup_{t_i \in s} PN(t_i)$$

$PN(t_i)$ is the set of undirected pairs of nodes (customers) which are assigned to the same route $t_i$ (it forms a complete graph defined over the set of customers in route $t_i$). The depot node is not considered here, since every customer is in the same cluster with it. $PN(s)$ is the set of all undirected pairs in solution $s$. The distance $d_{pn}$ between solutions is defined as:

$$d_{pn}(s_1, s_2) = \frac{|PN(s_1) \cup PN(s_2)| - |PN(s_1) \cap PN(s_2)|}{|PN(s_1) \cup PN(s_2)|}$$

Similarly to $d_{pc}$, this function is not exactly a metric for solutions of the CVRP, but for the partitions implied by those solutions. The formula for $d_{pn}$ has a more straightforward sense than the one for $d_{pc}$: here, the value of the distance roughly indicates how large the parts of the sets of pairs are which are not shared by the two compared solutions. If $d_{pn} = 0$ then the two solutions imply identical partitions; $d_{pn} = 1$ implies completely different partitions (not even one pair of nodes is assigned to a route in the same way in $s_1$ and $s_2$). A concept of cluster contamination, very similar in spirit to $d_{pn}$, was formulated by Weiss (2006) in the context of clustering text documents.

7.5.2 Distance measures defined in the literature

In recent years some more distance measures and metrics for solutions of the CVRP have been described in the literature. These are:

• the edit distance (Sorensen 2003, Sorensen 2007),
• the add-remove edit distance for the sequential representation (Sorensen et al. 2005),
• the stop-centric and route-centric distance measures (Woodruff & Lokketangen 2005).

The author managed to analyse and implement the first two measures, so they are described in the sections below. He did not manage, however, to implement the measures developed at the same time by Woodruff & Lokketangen, due to limited time, so these distances are not considered here. Nevertheless, it is important to remember that such measures exist and should also, in the near future, be compared to those described in this work.

It is not the purpose of this thesis to provide detailed definitions of all existing distance measures for solutions of the CVRP. Therefore, in the sections below the measures are only briefly described and their properties discussed. For the detailed definitions and implementation issues the interested reader is referred to the cited publications.

The edit distance for CVRP solutions: $d_{eu}$

Sorensen (2003, 2007) defined a distance measure for solutions of the CVRP based on the concept of the edit distance between strings. An edit operation on a string is a modification of one of its characters by means of an elementary operation: insertion, deletion or substitution.
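For reference, this plain edit distance on sequences, which underlies Sorensen's constructions, is computed by the standard dynamic programme (a textbook sketch; the symbols may be customer indices):

def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertion, deletion and
    substitution, computed in O(|a| * |b|) time."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                           # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                           # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j] + 1,      # deletion
                          d[i][j-1] + 1,      # insertion
                          d[i-1][j-1] + (a[i-1] != b[j-1]))  # substitution
    return d[m][n]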
7.5.2 Distance measures defined in the literature

In recent years some more distance measures and metrics for solutions of the CVRP have been described in the literature. These are:

• the edit distance (Sorensen 2003, Sorensen 2007),
• the add-remove edit distance for the sequential representation (Sorensen et al. 2005),
• the stop-centric and route-centric distance measures (Woodruff & Lokketangen 2005).

The author analysed and implemented the first two measures, so they are described in the sections below. He did not manage, however, to implement the measures developed at the same time by Woodruff & Lokketangen, due to limited time, so those distances are not considered here. Nevertheless, it is important to remember that such measures exist; in the near future they should also be compared to those described in this work.

It is not the purpose of this thesis to provide detailed definitions of all existing distance measures for solutions of the CVRP. Therefore, in the sections below the measures are only shortly described and their properties discussed. For the detailed definitions and implementation issues the interested reader is referred to the cited publications.

The edit distance for CVRP solutions: deu

Sorensen (2003, 2007) defined a distance measure for solutions of the CVRP based on the concept of the edit distance between strings. An edit operation on a string is a modification of one of its characters by means of an elementary operation: insertion, deletion or substitution. Sorensen describes how to define an edit distance on permutations. Further, he extends this distance to the case of permutations with reversal independence (or undirected permutations, like single routes in the CVRP) and to the case of sets of such permutations (like solutions of the CVRP). In this process the sets of permutations (routes) of two CVRP solutions are matched in an optimal way by solving the minimal-cost assignment problem. Therefore, it is possible to determine which routes in one solution correspond to which routes in the other solution. This distance measure will be called deu in this thesis (edit distance for undirected routes).

Although the edit distance for strings and undirected permutations is a metric, it is not clear whether deu is a metric for solutions of the CVRP; this matter is not clarified by Sorensen (although this is not the most important property of a measure and is not required here). This measure is also not normalised. The value of this distance is the minimal number of elementary edit operations required to transform one set of permutations (a CVRP solution) into another such set. Thus, deu = 0 implies that the compared solutions are identical (no edit operation is required).

This measure focuses on the same order of customers in the matched routes; if this order is disturbed somehow, then some edit operations are required to perform the necessary transformations. In this sense, the function deu is similar to de, which also stresses the aspect of order (by inspecting edges and paths). It seems, however, that for this edit distance it is also important that long identical subpaths occupy the same places (absolute positions) in the routes of two solutions. Even if such long subpaths exist in matched routes, a difference in their positions in these routes may incur some additional edit cost. In consequence, this property of deu makes it different from de, which ignores positions of customers in routes and takes only edges into account.

Since the order of customers in routes is important for deu, the same sets of vertices in routes of two solutions (the same clusters) are not enough for this measure to make these solutions close. This fact should make it different from the metrics which concentrate on clusters only: dpc and dpn.

It is also worth noting that the distance deu is inflated when the numbers of routes in two compared solutions differ. This is due to the fact that the assignment problem involved in the distance computation has to match some routes of one solution to artificially added empty routes in the other one; this implies performing additional insertions or deletions.

The add-remove edit distance for the sequential representation: dear

In their proposal of a path relinking procedure for the CVRP, Sorensen et al. (2005) demonstrated another kind of distance measure for CVRP solutions, which is also based on the concept of edit operations. This measure, however, compares solutions encoded in the sequential representation (proposed earlier by Prins, see section 6.4.4). The distance between such permutations defined by Sorensen et al. is the edit distance without the operation of substitution; only insertions and deletions are considered. The authors call it the 'add-remove edit distance', so it will be denoted dear hereafter. The cost of one such edit operation is set to 1/2 by Sorensen et al., but here it is assumed to be equal to 1.
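Because dear uses only insertions and deletions, it can be computed via the longest common subsequence (LCS): with unit operation costs, the distance between sequences p1 and p2 equals |p1| + |p2| − 2·LCS(p1, p2). The sketch below illustrates this route; it is not Sorensen et al.'s implementation:

    def add_remove_distance(p1, p2):
        """Insertion/deletion-only edit distance between two sequences,
        via the longest common subsequence:  d = n + m - 2*LCS."""
        n, m = len(p1), len(p2)
        lcs = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if p1[i - 1] == p2[j - 1]:
                    lcs[i][j] = lcs[i - 1][j - 1] + 1
                else:
                    lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
        return n + m - 2 * lcs[n][m]

    # two sequential (giant-tour) encodings of the same customer set
    print(add_remove_distance([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # -> 4

For two permutations of the same N customers this equals 2(N − LCS), which is why a factor of 1/(2N) suffices for normalisation.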
This measure is not normalised, but this might be amended easily by introducing into its formula a factor equal to the reciprocal of twice the number of customers. The measure is a metric for permutations, but not exactly a metric for CVRP solutions, because not every solution of the problem may be encoded in the sequential representation and decoded back without changes. Nevertheless, this does not seem to be a great disadvantage if one imagines an algorithm working only on solutions encoded in this representation (like the one by Prins (2001, 2004), described in section 6.4.4). However, if any two CVRP solutions had to be compared by a distance measure, this distance would not be directly useful unless all solutions were encoded as permutations (perhaps with some loss of information on the actual routes). In order to make a comparison of the measures possible, this approach is applied in this work.

It is harder to interpret the values of this measure than in the previous case. The value is, of course, the minimal number of edit operations required to transform one permutation into another, but it is not clear how an edit operation influences the actual underlying CVRP solution. Due to the nature of the sequential representation it is unknown which edges actually exist in a solution, and where each route starts and finishes. Thus, an edit operation on a permutation may imply, in the decoded solutions, additional modifications of vertices which are not directly involved in the edit operation itself. This phenomenon is visible in the example provided by Sorensen et al. (2005) (page 844, the last 3 move operations). It seems that this property of dear might considerably decrease its utility.

7.5.3 Random solutions vs. local optima

One stage of the fitness-distance analysis focuses on possible differences between sets of local optima and random solutions of a given instance in terms of distance in these sets (see section 5.2.1). In order to check these differences in case of the CVRP, large random samples of 2000 different solutions of each type were generated. Random solutions were produced using algorithm 11, described earlier in section 7.4.2. Local optima were generated starting from random solutions and proceeding with a greedy local search; a composite neighbourhood of 3 operators was used in the algorithm: merge, 2opt, swap. In these sets distance of each type was computed for 1000 different pairs of solutions, with deu and dear normalised. Finally, statistics on the values of distance in these samples were computed: the average distance for each instance, the aggregate average and standard deviation over all instances, and rN, the correlation of the average distance and instance size. These values are shown in table 7.3 and figure 7.7. Note that for local optima the table shows the difference between the average distance for these solutions and for random ones, but the values of standard deviation and correlation are computed for the original averages, not for the differences.
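The sampling scheme itself is straightforward; a sketch of the per-instance statistics follows (the pool size and the number of pairs follow the text above; the use of the Pearson coefficient for rN is an assumption, as the text does not name the correlation type):

    import random, statistics

    def distance_stats(pool, dist, n_pairs=1000):
        """Average and standard deviation of a distance measure over
        random pairs drawn from a pool of (here 2000) solutions."""
        values = [dist(*random.sample(pool, 2)) for _ in range(n_pairs)]
        return statistics.mean(values), statistics.stdev(values)

    def r_n(avg_distances, instance_sizes):
        """Correlation of the per-instance average distance with N."""
        return statistics.correlation(avg_distances, instance_sizes)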
Table 7.3: Average values of distance in sets of random solutions and local optima.

                       random solutions (rand)                  difference for local optima (lopt)
instance      d̄e     d̄pc     d̄pn  (1/N)d̄eu (1/2N)d̄ear     d̄e     d̄pc     d̄pn  (1/N)d̄eu (1/2N)d̄ear
c50         0.715   0.834   0.982   0.824    0.775       -0.109  -0.090  -0.323  -0.174   -0.324
f71         0.717   0.847   0.987   0.825    0.806       -0.240  -0.201  -0.648  -0.297   -0.472
c75         0.711   0.838   0.989   0.812    0.811       -0.023  -0.039  -0.237  -0.117   -0.306
tai75a      0.699   0.839   0.989   0.803    0.811       -0.078  -0.044  -0.274  -0.137   -0.351
tai75b      0.702   0.839   0.989   0.803    0.810       -0.085  -0.031  -0.256  -0.116   -0.321
tai75c      0.701   0.841   0.988   0.800    0.811       -0.119  -0.053  -0.306  -0.153   -0.359
tai75d      0.709   0.843   0.989   0.812    0.811       -0.190  -0.080  -0.440  -0.243   -0.435
c100        0.725   0.861   0.991   0.825    0.832       -0.052  -0.095  -0.260  -0.076   -0.284
c100b       0.725   0.860   0.991   0.826    0.833       -0.185  -0.156  -0.402  -0.276   -0.457
tai100a     0.712   0.852   0.991   0.807    0.832       -0.117  -0.067  -0.301  -0.159   -0.378
tai100b     0.712   0.852   0.991   0.810    0.832       -0.102  -0.046  -0.293  -0.134   -0.358
tai100c     0.715   0.853   0.991   0.812    0.833       -0.089  -0.044  -0.284  -0.145   -0.349
tai100d     0.712   0.853   0.991   0.809    0.833       -0.103  -0.029  -0.314  -0.098   -0.335
c120        0.727   0.866   0.992   0.823    0.846       -0.014  -0.072  -0.246  -0.057   -0.269
f134        0.730   0.870   0.993   0.825    0.852       -0.097  -0.089  -0.283  -0.042   -0.291
c150        0.728   0.871   0.994   0.820    0.860       -0.049  -0.061  -0.234  -0.069   -0.298
tai150a     0.718   0.865   0.994   0.809    0.860       -0.091  -0.046  -0.263  -0.096   -0.348
tai150b     0.723   0.868   0.994   0.814    0.860       -0.060  -0.042  -0.263  -0.094   -0.331
tai150c     0.720   0.867   0.994   0.812    0.860       -0.091  -0.055  -0.258  -0.105   -0.351
tai150d     0.719   0.865   0.994   0.809    0.861       -0.083  -0.037  -0.244  -0.051   -0.322
c199        0.731   0.877   0.995   0.817    0.877       -0.030  -0.051  -0.204  -0.044   -0.287
tai385      0.713   0.880   0.998   0.789    0.909        0.032  -0.001  -0.137   0.024   -0.290
avg.        0.716   0.856   0.991   0.813    0.837       -0.090  -0.065  -0.294  -0.121   -0.342
std. dev.   0.009   0.014   0.003   0.010    0.029        0.063   0.048   0.102   0.075    0.072
rN          0.291   0.781   0.808  -0.420    0.893        0.613   0.600   0.553   0.627    0.661

Figure 7.7: Aggregate average values of distance in sets of random solutions and local optima.

Comment on de

Looking at solutions from the point of view of de, one can see that random solutions share surprisingly many edges, on average (1 − 0.716) · 100% ≈ 28%. However, a closer look at the solutions revealed that this number is artificially high, because random solutions have many routes and, hence, many depot edges. These are the edges that contribute to this high similarity. When they are removed, the distance becomes close to 1.0. For example, in instance c50 the common edges constitute 27.9% of all of them in the compared solutions, but only 0.1% of them are edges not connected to the depot. The same happens for other instances (e.g. 0.1% for tai385). Therefore, d̄e = 0.716 is artificially low for random solutions and is an effect of the sampling procedure. In local optima, however, the proportions change: in c50 there are on average 39.5% common edges and as many as 69% of them are not connected to the depot. Consequently, the change between the rand and lopt pools is higher than indicated by the numbers in the table. The author estimates it at approximately 0.3.

One should note a change in the values of the standard deviation of d̄e between rand and lopt. It rises from 0.009 to 0.063, meaning that while in random solutions the values of the average distance do not really differ between instances, the differences become higher for local optima. For example, d̄e is much smaller than the aggregate average in case of f71, c100b, tai75d, tai75c, tai100a. Much larger values of distance appear for c120, c199 and tai385.
For the last instance the distance between local optima seems to grow compared to random solutions, but this is also an effect of the artificially low distance in the rand pool due to the mentioned numerous depot edges.

The value of rN, the correlation coefficient between d̄e and N, the instance size, is higher for local optima than for random solutions, and in the former group it amounts to 0.61. This means that d̄e depends to some extent on the instance size.

Comment on dpn

Contrary to d̄e, the values of d̄pn in the rand pool are extremely high, nearly reaching the maximum possible value. In lopt these values drop significantly, to 0.7 on average, meaning that approximately 30% of all pairs of nodes are in the same routes in local optima, irrespective of the order and adjacency of these nodes. In such solutions certain nodes should not be put in separate routes.

The comparison of standard deviations between rand and lopt reveals that the differences between instances become considerable when looking at local optima: the coefficient of variation rises 148 times in the second pool. For example, d̄pn drops from 0.99 to 0.34 in case of f71 and from 0.99 to 0.55 for tai75d. These are very large changes compared to the drops of 0.13 and 0.2 for tai385 and c199, respectively.

The value of correlation rN changes from 0.8 for rand to 0.55 for lopt. This means that while the distance between random solutions depends highly on instance size, the dependence loses some strength in sets of local optima.

Comment on dpc

Distance d̄pc between random solutions does not reach the theoretical maximum of 1.0 here, similarly to the case of de. This is probably due to the fact that there is some probability of finding at least one common customer vertex in the optimistically matched routes of any two solutions. Nevertheless, in local optima the distance is smaller on average, and amounts to 0.79. This means that even the most pessimistic of the optimistic matches of routes reveal some common vertices. This effect is better measured by dpn, the author deems.

The increase in the value of standard deviation between rand and lopt, from 0.01 to 0.05, indicates slightly greater variability between instances with respect to d̄pc when local optima are examined. Here, the positive examples are again f71, c100b, and perhaps c100 and c50. The negative ones are tai385, tai75b, tai100d.

Comment on deu

The values of d̄eu in the rand pool seem to be approximately the same for all instances, around 0.81. Again, the theoretical maximum of 1.0 is not reached by the measure, perhaps for the same reasons as for de or dpc. In the other pool, lopt, the value drops by 0.12, meaning that 12% fewer edit operations are required to convert one local optimum into another than in the case of random solutions. This is a significant change, although not a huge one. It means that some subsequences of routes are conserved in local search.

The coefficient of variation of d̄eu increases 9 times between random solutions and local optima. This signifies that differences between instances are much more visible when local optima are considered. For example, in case of f71, c100b and tai75d the values of d̄eu change by more than 0.2, while in case of tai385, f134 and c199 the change is by less than 0.05. Somewhat surprising is the fact that the value of correlation rN changes its sign when shifting from rand (-0.42) to lopt (0.63).
Comment on dear

In the pool of random solutions d̄ear seems to depend largely on instance size: rN is as high as 0.89. With increasing instance size d̄ear appears to slowly approach 1.0, which looks sensible given that it means 2 add or remove operations per customer to change one random permutation into another.

This value drops significantly in the lopt pool: by 0.34 on average. This is the largest drop of all the considered measures. It indicates that there are important regularities in local optima which are not present in random permutations.

The standard deviation of d̄ear rises from 0.03 to 0.07, meaning that the differences between instances are slightly more visible when examining local optima. Here, the positive examples are again f71, c100b and tai75d, the negative ones being c199, c120, f134 and tai385.

7.5.4 Fitness-distance relationships

The second stage of the fitness-distance analysis is an attempt to find trends in the sets of local optima themselves and to verify the 'big valley' hypothesis: whether better solutions tend to be closer (more similar) to each other. In this study positive values of the fitness-distance correlation would indicate a 'big valley' structure. The verification was performed with the method of analysis of a set of pairs of local optima (see section 5.4.6), which leads to the computation of values of the linear determination coefficient r² between fitness and distance as an indicator of FDC. The same sets of 1000 pairs of local optima were used as in the previous section. The obtained values of r² are given in table 7.4. All the values marked as significant correspond to positive correlations. The author's classification of each instance according to the 'big valley' status is given in the last column of the table.
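The exact formulation of the indicator is given in section 5.4.6 and is not reproduced here; the sketch below shows one plausible reading consistent with the discussion that follows — the determination coefficient of a least-squares linear model relating the distance in a pair of local optima to the two fitness values:

    import numpy as np

    def determination_coefficient(f1, f2, d):
        """r^2 of the linear model d ~ a*f1 + b*f2 + c, fitted by least
        squares to a sample of pairs of local optima; f1, f2 are the
        fitness values of the two solutions in a pair, d their distance."""
        f1, f2, d = map(np.asarray, (f1, f2, d))
        X = np.column_stack([f1, f2, np.ones(len(d))])
        coef, *_ = np.linalg.lstsq(X, d, rcond=None)
        res = d - X @ coef
        ss_res = float(res @ res)
        ss_tot = float(((d - d.mean()) ** 2).sum())
        return 1.0 - ss_res / ss_tot

Under this reading, two independent correlations r(f1, d) = r(f2, d) = 0.3 give r² ≈ 0.3² + 0.3² = 0.18, the significance threshold used below.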
Values of r² greater than 0.18 are treated as significant indicators of FDC. One such value corresponds to two independent values of the correlations r(f1, d) and r(f2, d) being at least 0.3. Although these values are not large as correlations, the author thinks they are significant as indicators of FDC (compare the values in table 5.1). Jones & Forrest (1995) employed the value of 0.15 as a border between significant and insignificant correlations, so the 0.3 used here is much more prudent (twice as high), although still arbitrary.

Table 7.4: Values of the linear determination coefficient r² between fitness and each distance measure for all instances.

instance    r²e    r²pc   r²pn   r²eu   r²ear   big valley?
c50        0.167  0.069  0.173  0.150  0.016   amb.
f71        0.237  0.046  0.378  0.326  0.010   yes
c75        0.112  0.011  0.112  0.079  0.001   no
tai75a     0.165  0.021  0.156  0.163  0.017   amb.
tai75b     0.025  0.026  0.075  0.070  0.008   no
tai75c     0.187  0.034  0.258  0.200  0.016   yes
tai75d     0.244  0.059  0.278  0.199  0.006   yes
c100       0.197  0.062  0.126  0.116  0.004   yes
c100b      0.347  0.211  0.468  0.414  0.027   yes
tai100a    0.079  0.034  0.110  0.114  0.020   no
tai100b    0.238  0.032  0.233  0.212  0.008   yes
tai100c    0.246  0.036  0.380  0.290  0.031   yes
tai100d    0.178  0.082  0.272  0.198  0.017   yes
c120       0.020  0.091  0.152  0.065  0.014   amb.
f134       0.160  0.099  0.045  0.083  0.010   amb.
c150       0.296  0.015  0.232  0.207  0.025   yes
tai150a    0.009  0.003  0.000  0.002  0.001   no
tai150b    0.072  0.035  0.223  0.126  0.021   yes
tai150c    0.190  0.046  0.217  0.216  0.038   yes
tai150d    0.042  0.011  0.057  0.044  0.010   no
c199       0.237  0.014  0.242  0.204  0.008   yes
tai385     0.123  0.047  0.158  0.106  0.001   amb.

avg.       0.162  0.049  0.197  0.163  0.014
std. dev.  0.093  0.045  0.116  0.097  0.010

All cases with r² ∈ [0.15, 0.18) are deemed 'borderline cases': perhaps there exists a 'big valley', but there is more doubt about it. An instance is classified as 'big valley'='yes' if there is at least one r² value not less than 0.18. It is in the 'no' class when no r² is larger than 0.15. Otherwise it is said to be ambiguous ('amb.').

Comment on distance measures

The first general observation based on the values in table 7.4 is that dear is not correlated with fitness at all. Thus, it seems that this type of distance does not reveal any 'big valley' in the CVRP. A very similar conclusion may be derived from the values of r²pc: dpc does not correlate with fitness, with one exception, c100b, which has the largest values of r² for all types of distance. The properties measured by these distances, whatever they may be, are not important for the objective function.

Conclusions are different in case of the three other measures. Firstly, de reveals fitness-distance correlation for 10–14 instances out of 22. Significant values of FDC indicate that in these cases better solutions tend to contain more common edges, on average, so the presence of some edges is important for good quality of solutions. The highest FDC values are obtained for instances c100b, c150 and tai100c. Virtually zero correlations occur for tai150a, c120 and tai75b. For the largest instance, tai385, the correlation is very small: r²=0.12.

Secondly, when dpn is taken into account, it appears as though there are 'big valleys' in 11–15 cases (mostly the same as for de). It means that for these instances better local optima usually contain more similar clusters (assignments of customers to routes) than poorer ones, and that certain contents of clusters are important for good quality of solutions. The best instances from this point of view are c100b, tai100c and f71. The worst are tai150a, f134 and tai150d. The largest instance, with r²=0.158, seems to be an ambiguous case, where some traces of a 'big valley' structure may be found.

Finally, deu reveals reasonable fitness-distance correlations in 10–11 cases (again, usually the same as for de). This result suggests that better solutions of the CVRP are more closely related in terms of edit operations on routes than worse local optima. The highest values of r² are for c100b (again), f71 and tai100c, while the lowest are for tai150a (again), c120 and c75.

Comment on instances

Looking at table 7.4 from the point of view of instances, one can see that 9 instances out of 22 reveal 'big valleys' for 3 distance measures: de, dpn and deu. 3 more problem examples reveal FDC for at least one of the measures. The most obvious example here is c100b, with the highest r² coefficients of all the studied instances, while tai100c, c150, f71 and tai75d follow.

There are also 5 instances which do not reveal any trace of fitness-distance correlation with respect to any distance measure used here: c75, tai100a, tai150a, tai150d, tai75b. The values of FDC for each of them are very small. A negative example is tai150a: all its r² values are virtually zero.

The other instances listed in table 7.4 are intermediate cases: there is some indication of a 'big valley' with respect to some measures, but not when some other distances are taken into account.

Fitness-distance plots

The conclusions derived from the values of FDC may be further verified through inspection of fitness-distance plots.
In this study, 2-dimensional FD plots are constructed from the mentioned 3-dimensional observations (the two fitness values and the distance in a pair of local optima) by cutting a slice through the 3 dimensions along the line of solutions with approximately the same fitness (at most 2% difference in quality).

In figure 7.8 one can see FD plots generated for instance c150 and all types of distance. As revealed by the values of r² in table 7.4, fitness is hardly correlated with dear and dpc: there is large variance of distance in the presented sample and no significant trend. Distances de, dpn and deu look different in the figure. Although the scatter of distance is still large, a moderate trend is also visible: solutions with smaller fitnesses tend to be slightly closer to each other, as indicated by the superimposed lines of first-order regression. The strongest trend, with the smallest variance of distance, appears for de, as predicted by the largest r² for this distance measure: r²=0.296.

Other examples of FD plots confirming the presence of 'big valleys' are shown in figure 7.9. The plot generated for instance tai150c and dpn shows that with improving solution quality the average distance between local optima decreases, as indicated by the regression line; here r²=0.217. However, in this plot the variance of distance seems to grow at the same time. There is no such phenomenon for c100b and deu: here the variance of distance appears to be constant, while the average distance between local optima drops significantly with increasing solution quality (r²=0.414).

Some more plots for the positive 'big valley' cases are shown in figures 7.10 and 7.11. They generally confirm the earlier conclusions based on r²: the fitness landscapes of these instances reveal the 'big valley' structure.

Instance tai385 with distance dpn (r²=0.158) and tai75a with de (r²=0.165) are examples of the ambiguous 'big valley' status (figure 7.12). Here, there are some visible changes in the average distance between lopt solutions, but at the same time the variance of distance is rather large and obscures the trends to a high extent.

There are also negative examples in the studied set of instances, tai150a being one of them. FD scatter plots for this problem example are shown in figure 7.13.

Figure 7.8: Fitness-distance plots with local optima for instance c150 and all types of distance, together with lines of regression.

Figure 7.9: Fitness-distance plots with local optima for instance tai150c, distance dpn (left) and instance c100b, distance deu (right), together with lines of regression.
Figure 7.10: Fitness-distance plots with local optima for instance tai100b, distance de (left) and instance c100, distance de (right), together with lines of regression.

Figure 7.11: Fitness-distance plots with local optima for instance tai100d, distance dpn (left) and instance tai100c, distance deu (right), together with lines of regression.

Figure 7.12: Fitness-distance plots with local optima for instance tai385, distance dpn (left) and instance tai75a, distance de (right), together with lines of regression.

One can clearly see that there are no traces of a positive trend; in case of de, dpn and deu the average distance between local optima even seems to increase with decreasing fitness, but the variance of distance is large at the same time. Surely, there is no 'big valley' here from the point of view of this sample of local optima.

Figure 7.13: Fitness-distance plots with local optima for instance tai150a and all types of distance, together with lines of regression.

7.5.5 Main conclusions from the fitness-distance analysis

Generally speaking, local optima are closer to one another than random solutions, but the decrease in distance is moderate. The greatest difference is observed for dear (-34% on average), and then for dpn (-29%). For de and deu the computed differences are smaller, but the real ones are estimated at approximately 30% as well. This effect is somewhat fainter for instances c199, tai385 and c75, while it is much stronger for f71, c100b and tai75d.

The average distance between local optima is to some extent correlated with instance size, irrespective of distance type, with rN around 0.6. Hence, it seems that larger instances have local optima generally more spread in the fitness landscape than smaller ones. This is rather a negative effect, since one would wish larger (and harder) instances to have local optima more clustered.

The standard deviations of d̄ across instances are higher for local optima than for random solutions. This indicates that the average distance in the lopt pool depends on the particular instance.

In spite of those two rather negative effects, the decrease in the average distance between rand and lopt is a fact. It confirms that a metaheuristic algorithm for the CVRP should contain a local search component to increase efficiency, since such an approach helps to reduce to some extent the size of the space to be searched for good solutions. At the same time the author admits that the decrease of approximately 30% in distance is not a huge one.
The average similarity of 30% means that there are not enough common properties of solutions to define a complete and good new solution; still, about 70% of its properties (edges, clusters, subsequences) have to be added to the common ones. However, in terms of the size of the space to be searched this 30% may mean a considerable reduction and gain for a search algorithm; compare the computation of the reduction in the TSP case (Boese 1995).

The result that about 70% of solution properties have to be added to the common ones agrees to some extent with one of the conclusions of Reeves (1999) made for the flowshop scheduling problem: there appeared to be some 'wall' between local optima and the global optimum which did not allow the former to be arbitrarily close to the latter. This is visible e.g. in the case of tai385 and dpn (see figure 7.12), where no solutions are closer to each other than dpn = 0.8.

The landscapes also revealed moderate correlations between fitness and distance in more than half of the studied instances. The correlations are significant mainly for the measures dpn, deu and de, the other measures being generally uncorrelated. It means that better local optima tend to have more clusters, subsequences or edges in common than worse ones, although considerable variance of distance is also visible along the trends. 9 instances out of the studied 22 reveal 'big valleys' for the 3 types of distance; 3 more instances reveal it for at least one of the distances. According to the hypothesis that the presence of such a structure facilitates search by an evolutionary algorithm with distance-preserving operators, these instances should be easy for such an algorithm. Apparently, there is no FDC in case of 5 instances: c75, tai75b, tai100a, tai150a, tai150d. These instances should be rather difficult for optimisation by such an algorithm, unless some other properties of the instances make them easy. Finally, 5 ambiguous cases were found.

Therefore, it seems, somewhat sadly, that the presence of a 'big valley' structure is a property of an instance of the CVRP, not of the problem itself. FDC is not a reliable problem property on which a metaheuristic algorithm may be founded, because it may or may not exist in the fitness landscape. The design of an algorithm which assumed that the positive correlation always existed could be prone to inefficiency, unless some other components were implemented, e.g. mutation.

Concerning the distance measures presented in this chapter, the results for the distance in terms of edges (de) confirm to some extent that the intuitive idea of preserving parental edges in recombination operators for the CVRP does make sense. This idea was the cornerstone of the operators employed in the efficient algorithms by Alba & Dorronsoro (2004) (ERX) and Nagata (2007) (EAX), both inspired by similar results for the TSP.

The results for the distance in terms of pairs of nodes (dpn) confirm that approaches based on the 'cluster first/route second' paradigm are also sensible. Taillard's (1993) and Rochat & Taillard's (1995) algorithms were founded exactly on this idea: a good heuristic clustering should be changed only slightly in order to obtain good solutions. The second of the mentioned algorithms even combined this approach with the previous one, based on the preservation of routes (edges), and this was probably the cause of its long-lasting success.
The last measure which correlates with fitness, deu, reveals that in some instances it is important to preserve certain subsequences of customers in routes. This result might be exploitable in an optimisation algorithm, but this will not be attempted in this thesis.

7.6 Recombination operators

Based on the results of the FDA, the designed recombination operators should be distance-preserving with respect to dpn and de. All the presented operators preserve the feasibility of offspring if the parents are also feasible.

7.6.1 CPX2: clusters preserving crossover

Algorithm 12 computes the clusters of customers which are common to two given solutions. It is used in the definition of the crossover operator, algorithm 13.

Algorithm 12 CommonClusters(s1, s2)
Ensure: ∀c ∈ Customers ∃cc ∈ Clusters : c ∈ cc
  Clusters = ∅ {the common clusters}
  C1 = C(s1)
  C2 = C(s2)
  for all ci ∈ C1 do
    for all cj ∈ C2 do
      cc = ci ∩ cj
      if |cc| ≥ 1 then
        Clusters = Clusters ∪ {cc}
  return Clusters

The idea behind the CPX2 procedure is quite straightforward: build a random route in each common cluster. If a cluster contains only one customer, a single-customer route is built.

Algorithm 13 CPX2(p1, p2)
  o = ∅
  Clusters = CommonClusters(p1, p2)
  for all cc ∈ Clusters do
    o = o ∪ {RandomRoute(cc)}
  return o

The auxiliary procedure RandomRoute(cc) simply creates a random permutation of the given customers cc and builds a route from this permutation. Therefore, the result of the operator is randomised.

The first version of this algorithm, CPX (Kubiak 2004), preserved only the largest intersection of each cluster from one solution with a cluster from the other one. This has been improved in CPX2 by correcting the definition of CommonClusters. Most likely, although this has not been formally proved, the operator preserves distance dpn, so that the following condition holds:

$$d_{pn}(p_1, o) \le d_{pn}(p_1, p_2) \;\wedge\; d_{pn}(p_2, o) \le d_{pn}(p_1, p_2)$$

An example offspring of this crossover is generated from the parents shown in figure 7.14. These are solutions of instance tai75a. One of the parents (on the left side) is the best-known solution for this instance, while the other is a local optimum of the CW, 2opt and swap operators. The offspring of CPX2 is shown in figure 7.15, on the left. In the figure the customers belonging to some common cluster are emphasised. The actual clusters are difficult to present in a black and white figure, but one may verify the common clusters against the parents in figure 7.14. It may be seen that the common parental clusters are preserved in the offspring, but not the common edges, because the order of customers in clusters is chosen randomly. Due to this fact, such a solution may be very expensive, unless local search is performed. The distance dpn between the parents is preserved in this example: dpn(p1, p2) = 0.755, dpn(p1, o) = 0.654, dpn(p2, o) = 0.664.
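In the illustrative list-of-routes representation used earlier, algorithms 12 and 13 take only a few lines; a hedged Python sketch, not the author's implementation:

    import random

    def common_clusters(s1, s2):
        """Algorithm 12: all non-empty intersections of a cluster of s1
        with a cluster of s2; every customer falls into exactly one."""
        C1 = [set(r) for r in s1]
        C2 = [set(r) for r in s2]
        return [c1 & c2 for c1 in C1 for c2 in C2 if c1 & c2]

    def cpx2(p1, p2):
        """Algorithm 13: one random route per common cluster.  Feasibility
        is preserved: each common cluster is a subset of a parental route,
        so its total demand fits the vehicle capacity."""
        offspring = []
        for cc in common_clusters(p1, p2):
            route = list(cc)
            random.shuffle(route)        # RandomRoute(cc)
            offspring.append(route)
        return offspring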
7.6.2 CEPX: common edges preserving crossover

This operator, shown as algorithm 14, aims at the preservation of distance de. It builds routes out of common edges (paths). All customers which are not incident with any common edge are put in separate single-customer routes in the offspring.

Algorithm 14 CEPX(p1, p2)
  o = ∅
  C = Customers
  CE = E(p1) ∩ E(p2)
  while CE ≠ ∅ do
    ce is an edge in CE
    p = MaximumPath(ce, CE)
    o = o ∪ Route(p)
    for all customers v on path p do
      C = C \ {v}
    for all edges e in path p do
      CE = CE \ {e}
  for all v ∈ C do
    o = o ∪ {(v0, v)}
  return o

The set of common edges, CE, contains all the edges that are common to parents p1 and p2. These edges may (and frequently do) define longer paths, some of them containing the depot vertex. Therefore, the procedure MaximumPath(ce, CE) builds a maximal path p from those edges in CE which are connected with the chosen edge ce. Each such path is later converted into a route by a call to Route(p). This results in a deterministic offspring for each pair of parents.

An exemplary offspring, generated from the same parents, is shown in figure 7.15 (right). The edges common to the parents are emphasised. There are especially many such edges around the depot vertex and near the convex hull of the set of vertices. All of these are preserved in the offspring and no foreign customer-to-customer edge is added; hence there are many routes. Distance de is almost preserved in this case: de(p1, p2) = 0.621, de(p1, o) = 0.640, de(p2, o) = 0.656. It is not exactly preserved because of the mentioned numerous routes: the parents have fewer routes than the offspring and hence fewer differing edges; the offspring has many depot edges which may be found in neither parent, so these edges additionally contribute to the value of its distance to the parents.

Figure 7.14: Solutions used as parents in crossover examples. Instance tai75a.

Figure 7.15: Offspring of the CPX2 (left) and CEPX (right) operators.
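Because every customer has at most two incident route edges in each parent, the common edges form vertex-disjoint simple paths, which makes the path assembly easy. The following simplified sketch chains only customer-to-customer common edges (the full operator also follows common edges through the depot) and turns isolated customers into single-customer routes:

    def cc_edges(solution):
        """Customer-to-customer edges of a solution (depot edges omitted)."""
        es = set()
        for route in solution:
            for a, b in zip(route, route[1:]):
                es.add(frozenset((a, b)))
        return es

    def maximum_paths(common_edges, customers):
        """Maximal chains of common edges; since every customer has at
        most two incident edges, the components are simple paths."""
        adj = {c: [] for c in customers}
        for e in common_edges:
            a, b = tuple(e)
            adj[a].append(b)
            adj[b].append(a)
        seen, paths = set(), []
        for start in customers:
            if start in seen or len(adj[start]) == 2:
                continue                 # start walks at endpoints only
            path, prev, cur = [start], None, start
            seen.add(start)
            while True:
                nxt = [n for n in adj[cur] if n != prev]
                if not nxt:
                    break
                prev, cur = cur, nxt[0]
                path.append(cur)
                seen.add(cur)
            paths.append(path)
        return paths

    def cepx(p1, p2):
        """Simplified CEPX sketch: each maximal common path (or isolated
        customer) becomes one route of the offspring."""
        customers = {c for route in p1 for c in route}
        common = cc_edges(p1) & cc_edges(p2)
        return maximum_paths(common, customers)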
7.6.3 CECPX2: common edges and clusters preserving crossover

This operator preserves in an offspring the sets of common clusters and common edges of its parents. Thus, it aims at the preservation of both dpn and de. It uses a helper procedure, algorithm 15, to find in a given cluster cc all paths which can be assembled out of the common edges CE. The CECPX2 procedure (algorithm 16) then creates routes within common clusters using paths made out of common edges. If there is no edge (path) connecting customers in a cluster, they are linked randomly. Similarly to CPX2, single-customer clusters define single-customer routes. In algorithm 16 the operation t = t · p is a concatenation of the path p at the end of route t.

Algorithm 15 MaximumPathsInCluster(CE, cc)
Ensure: ∀c ∈ cc ∃p ∈ Paths : p contains c
  Paths = ∅
  while cc ≠ ∅ do
    v is a customer in cc
    if CE contains an edge ce with v then
      p = MaximumPath(ce, CE)
      Paths = Paths ∪ {p}
      for all customers v on path p do
        cc = cc \ {v}
    else
      Paths = Paths ∪ {(v)} {a single-customer path (without edges)}
      cc = cc \ {v}
  return Paths

Algorithm 16 CECPX2(p1, p2)
  o = ∅
  Clusters = CommonClusters(p1, p2)
  CE = E(p1) ∩ E(p2)
  for all cc ∈ Clusters do
    Paths = MaximumPathsInCluster(CE, cc)
    t = (v0) {an empty route}
    if Paths contains a path p1 from the depot then
      t = t · p1 {start from the depot side}
      Paths = Paths \ {p1}
    if Paths contains another path p2 from the depot then
      Paths = Paths \ {p2} {remember it for later use}
    while Paths ≠ ∅ do
      choose randomly a path p from Paths
      t = t · p
      Paths = Paths \ {p}
    if p2 exists then
      t = t · p2 {finish at the depot side}
    o = o ∪ {t}
  return o

An exemplary offspring of CECPX2 is shown in figure 7.16 (left). The common properties of the parents (edges and clusters) are emphasised as in the CPX2 and CEPX cases. Comparing the offspring to its parents (figure 7.14), one may note that all the common properties are preserved in the generated solution. The result is slightly randomised.

In this example distances de and dpn are preserved: de(p1, p2) = 0.621, de(p1, o) = 0.582, de(p2, o) = 0.590; dpn(p1, p2) = 0.755, dpn(p1, o) = 0.564, dpn(p2, o) = 0.641. The first distance is preserved because this offspring has fewer routes than the one of CEPX.

7.6.4 GCECPX2: greedy CECPX2

This operator (algorithm 17) extends CECPX2 by connecting common paths in a greedy way and by also considering the possibility of merging different common clusters. This greedy approach is motivated by the results of the analysis of the average distance between local optima (section 7.5.3), which indicated that preserving only the common parental features in an offspring may not be enough to create a good solution.

While creating an offspring, it may happen that a cluster of customers contains only one depot-to-depot path. In this case the cluster must not be merged with any other, because some common edges would have to be broken. Therefore, such clusters define complete, separate routes. This is covered by lines 9 to 11 of algorithm 17.

In other cases the common paths do not define complete routes. Thus, they may potentially be merged, even if they belong to different clusters.
However, the conditions under which a particular common path p (line 19) must not be attached to the current route t are quite complex:

• it is already present in the offspring (line 21);
• it is in the same cluster as the current route, it leads from the depot, and there are still some other paths in the same cluster (line 24); in this case taking p would force finishing the route, coming back to the depot, and in effect breaking the common cluster;
• it is a path in a different cluster and:
  – the sum of demands of this and the current cluster would exceed the vehicle capacity (line 29);
  – this cluster contains a depot-to-depot path, i.e. defines a complete route on its own (line 31);
  – this cluster contains two depot paths; in case of a cluster merge one of them would have to be broken (also line 31);
  – this cluster contains one depot path but the current route's cluster also contains one; one of them would have to be broken (line 33);
  – this path leads from the depot and there are some other paths in this or the current route's cluster; coming back to the depot would break a common cluster (line 35).

Algorithm 17 GCECPX2(p1, p2)
 1: o = ∅
 2: Clusters = CommonClusters(p1, p2)
 3: CE = E(p1) ∩ E(p2)
 4: AllPaths = MaximumPathsInCluster(CE, Customers)
 5: while Clusters ≠ ∅ do
 6:   cc is a randomly chosen cluster from Clusters
 7:   Paths = MaximumPathsInCluster(CE, cc)
 8:   t = (v0)
 9:   if Paths contains a depot-to-depot path p then
10:     t = t · p
11:     Paths = Paths \ {p}
12:   else
13:     if Paths contains a path p1 from the depot then
14:       t = t · p1
15:       Paths = Paths \ {p1}
16:     if Paths contains another path from the depot then
17:       p2 is this path from the depot
18:   while Paths ≠ ∅ do
19:     for all paths p in AllPaths do
20:       IsFeasiblePath(p) = true {assume the path may be attached to route t}
21:       if p is already present in o then
22:         IsFeasiblePath(p) = false
23:       else if p ∈ Paths then
24:         if (p = p2) and (|Paths| > 1) then
25:           IsFeasiblePath(p) = false
26:       else
27:         cc2 is the cluster of this path p
28:         Paths2 = MaximumPathsInCluster(CE, cc2)
29:         if d(cc) + d(cc2) > C then
30:           IsFeasiblePath(p) = false
31:         else if Paths2 contains a depot-to-depot path or two depot paths then
32:           IsFeasiblePath(p) = false
33:         else if (Paths2 contains one depot path) and (p2 exists) then
34:           IsFeasiblePath(p) = false
35:         else if (p is a path to the depot) and ((|Paths| > 1) or (|Paths2| > 1)) then
36:           IsFeasiblePath(p) = false
37:     MinPathDist = ∞
38:     for all p ∈ AllPaths do
39:       Dist = GetDistance(LastCustomer(t), p)
40:       if (Dist < MinPathDist) and (IsFeasiblePath(p)) then
41:         MinPathDist = Dist
42:         p0 = p
43:     if MinPathDist < ∞ then
44:       if p0 ∉ Paths then {merge the cluster of p0 into the cluster of t}
45:         cc2 is the cluster of this path
46:         Paths2 = MaximumPathsInCluster(CE, cc2)
47:         cc = cc ∪ cc2
48:         Paths = Paths ∪ Paths2
49:       t = t · p0
50:       Paths = Paths \ {p0}
51:   Clusters = Clusters \ {cc}
52:   o = o ∪ {t}
53: return o

Once it is established which paths are feasible to add (i.e. the function IsFeasiblePath(p) is computed), the path chosen is the one closest to the last customer of the current route t. This is the greedy part of the operator (lines 37 to 42). If this chosen path p0 comes from some other cluster, then this cluster and its paths are merged with the current cluster and its paths (lines 44 to 48).
After attaching path p0 to the current route t, this process of building the offspring is continued with another path, and then for all remaining common clusters. Although the operator is a greedy and distance-preserving one, its result is randomised, because of the random order of considering common clusters in algorithm 17 (line 6).

The offspring generated from the exemplary parents is shown in figure 7.16 (right). Common parental properties are emphasised and, as one can see, also preserved in the offspring, exactly as in the CECPX2 case. GCECPX2, however, also connected some of the common clusters; one example is the top-left route of the offspring, which corresponds to two routes in the CECPX2 result. Therefore, an offspring of GCECPX2 will usually have fewer routes than its CECPX2 counterpart. Distances de and dpn are preserved for this offspring: de(p1, p2) = 0.621, de(p1, o) = 0.546, de(p2, o) = 0.517; dpn(p1, p2) = 0.755, dpn(p1, o) = 0.604, dpn(p2, o) = 0.605.

7.7 CPM: clusters preserving mutation

The idea of this mutation is to alter a parent in a way which does not change the contents of any route (the clusters). Only the order of customers in one route is changed, as can be seen in algorithm 18. The route to be altered is usually chosen randomly, with uniform distribution over routes.

Algorithm 18 CPM(p, t)
  m = ∅
  copy all routes from p to m except t
  cc is the set of customers of t
  m = m ∪ {RandomRoute(cc)}
  return m

This way the operator builds mutant m based on parent p so that dpn(p, m) = 0. An example of this mutation is shown in figure 7.17. The mutated solution is the left parent from figure 7.14. The mutant edges which differ from the parent are emphasised. In this example obviously dpn(p, m) = 0; de(p, m) = 0.111 and deu(p, m) = 4.
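In the same illustrative representation, CPM is little more than a call to random.shuffle; a sketch of algorithm 18:

    import random

    def cpm(parent, route_index=None):
        """Algorithm 18: reshuffle the customer order inside one route,
        leaving every cluster intact, so that d_pn(parent, mutant) = 0."""
        mutant = [list(route) for route in parent]
        if route_index is None:                    # uniform route choice
            route_index = random.randrange(len(mutant))
        random.shuffle(mutant[route_index])        # RandomRoute(cc)
        return mutant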
For the CW algorithm this improvement amounts only to 1%, while for other types of heuristics the drop of solution cost is considerable: to 15.6% for FF, 10.2% for GM and 9.2% for GF. The CW heuristic remains the best with the average excess of 4.7%. The steepest algorithm changes little. There is approx. 1% of improvement for the random, GM and GF algorithms, no change for the CW heuristic, and, quite surprisingly, some drop in quality for the FF heuristic. The examination of the worst solutions across instances revealed that there was no great deterioration. For example, without local search the worst CW solution had excess of only 11.5%, while 10% with LS. Other types of solutions were worse than this, but this result shows than even in the worst case the Clarke & Wright algorithm may be improved only by 10% by any metaheuristic. However, in the optimistic scenario it may reach solutions as close as 1% to the best-known ones (e.g. for instances tai75b, c120). Concerning the time of computation, the generation of all but CW solutions took next to no time. The CW heuristic took slightly more time, since it is a steepest algorithm. With local search, all algorithms always finish computation within 10 s, even for the largest instances, with steepest LS being slightly slower than greedy. Quite surprisingly, it was also found that local search made only small changes to CW solutions, thus proving that this heuristic indeed produces results of high quality and near local optima; the average distance between the heuristic solution before and after LS amounted only to de = 0.17 and dpn = 0.07. This was quite different for the other algorithms, e.g. for the GM algorithm it was de = 0.6 and dpn = 0.55, and this was the next smallest change. Predictably, random solutions were almost completely changed by local search: de = 0.93 and dpn = 0.98. Overall, the CW heuristic generates the best solutions, on average, with GM being second and GF third. With the help of LS this ranking changes a bit: CW still comes first, then GF and GM or random. These solutions (after greedy LS) are put in the initial population of the MA. 131 7.8. Experiments with initial solutions Figure 7.16: Offspring of the CECPX2 (left) and GCECPX2 operators (right). Figure 7.17: CPM mutant of the best-known solution of instance tai75a. average excess [%] 70% 60% 50% no LS 40% greedy steepest 30% 20% 10% 0% random CW GM FF GF initial solution type Figure 7.18: Average quality of heuristic solutions over all instances, without and with local search. 132 Adaptation of the memetic algorithm to the capacitated vehicle routing problem 7.9 7.9.1 Experiments with memetic algorithm Long runs until convergence The goal of this experiment was to assess the limits of the potential of the memetic algorithm with different crossover operators. The main question here was what quality of solutions the MA could generate for a reasonably-sized population, irrespective of the time of computation. Therefore, very long runs of the algorithm were allowed, very likely until complete convergence: the population of 30 individuals was modified until 150 consecutive generations without any change in this population, and the best-found solution was recorded. The time of computation was also an important aspect and was recorded, as well. Both the designed crossovers and some operators from the literature were considered: CPX2, CEPX, CECPX2, GCECPX2, RBX and SPX (see sections 6.4.3 and 6.4.4). This resulted in 6 versions of the basic (pure) MA. 
CPM always played the role of mutation (if mutation was enabled), and greedy LS was applied to initial solutions and offspring: first with the CW operator and then with the combined 2opt and swap neighbourhoods, as in the experiments with local search speed (section 7.3.6).

Another issue was the impact of mutation on the results of the MA. Therefore, one configuration without mutation was also considered (denoted 'noMutation').

Each version and configuration of the MA was executed 15 times on each of the 22 considered benchmark instances of the CVRP, in order to get reliable-enough estimates of the average and standard deviation of two quantities of interest: the time of computation and the solution quality. All these runs were performed on identical desktop PCs with Intel Pentium 4 2.6 GHz processors and 2 GB RAM, running MS Windows XP.

Quality of solutions: aggregated results

The aggregated results of the experiment are presented first. Although such aggregation obscures the differences between instances, it may be useful in getting an idea of the general performance of all the algorithm versions. Figure 7.19 presents the average (bars) and one standard deviation (whiskers) of the results of all MAs, aggregated across 22 instances and 15 runs per instance.

Figure 7.19: Aggregated quality of solutions generated by different versions and configurations of the memetic algorithm; long runs.

The figure shows that all versions of the basic algorithm (with mutation) perform very well: the average quality of solutions exceeds the quality of the best-known ones only by a fraction of one percent (0.5–0.7%), with a deviation of no more than another percent.

The very good quality of the results of the basic MA versions is also confirmed by the number of instances on which the best-known solutions have been found. This is shown in table 7.5, which is an aggregated view of the detailed tables B.1 and B.2.

Table 7.5: The number of instances for which each basic version of the MA found the best-known solution in long runs. Best: the best run of 15 for a version; all: all runs of a version.

MA version   best   all
CPX2          11     5
CEPX          11     5
CECPX2        10     6
GCECPX2        9     4
RBX            9     3
SPX           10     5

In case of CPX2 and CEPX the best-known solutions were found in some runs for 11 instances out of 22.
The column ‘totals’ gives the sums of the won and lost comparisons for each row. The column ‘sum’ is the sum of the wins and losses, so it is the net flow score for each algorithm. The Cochran-Cox test for the difference in two population means was employed (Krysicki et al. 1998, Ferguson & Takane 1989). This test does not assume that variances in the compared populations are equal, like Student’s test does. Each instance of the CVRP was tested separately, and for each pair of the algorithm versions the tested hypothesis was: H0 : µ1 = µ2 against the X̄1 −X̄2 . alternative: H1 : µ1 6= µ2 (the two-sided test). The test statistic was: C = √ 2 2 S1 /(n1 −1)+S2 /(n2 −1) The level of significance was always set to 0.05. The table shows that the best MA version employs CEPX; it wins 26 direct comparisons and loses only 2. The runner-up is SPX (17/-4). CPX2 is slightly worse (16/-5) and comes third, then CECPX2 and GCECPX2. The worst results are due to RBX, which loses 29 comparisons and wins none: it is not statistically better than any of the algorithms on any instance in long runs. A direct comparison of the best and the second-best algorithms, CEPX and SPX, reveals that the former is statistically better on 3 instances (these are: c120, tai100a, tai150c), while the latter only on one (f134). In other cases the observed differences are not statistically significant. For most of the versions (crossover operators) the presence of the CPM operator seems to be 134 Adaptation of the memetic algorithm to the capacitated vehicle routing problem Table 7.6: Comparison of the basic MA versions with the Cochran-Cox statistical test for the significance of the difference of averages; long runs. CPX2 CEPX CECPX2 GCECPX2 RBX SPX CPX2 CEPX CECPX2 GCECPX2 RBX SPX totals sum 0/0 2/0 1/-2 0/-5 0/-9 2/0 0/-2 0/0 1/-4 0/-8 0/-9 1/-3 2/-1 4/-1 0/0 1/-4 0/-4 2/-1 5/0 8/0 4/-1 0/0 0/0 5/0 9/0 9/0 4/0 0/0 0/0 7/0 0/-2 3/-1 1/-2 0/-5 0/-7 0/ 0 16/-5 26/-2 11/-9 1/-22 0/-29 17/-4 11 24 2 -21 -29 13 important. Without this mutation only the SPX version generates solutions of comparable quality (deterioration by less than 0.15%); all other crossover operators produce visibly worse solutions. CPM is most important for CPX2 (deterioration by 1.25% without mutation) and CEPX (0.8%), while its impact on CECPX2, GCECPX2 and RBX is somewhat smaller (less than 0.4%). Compared to the quality of the best heuristic solutions (the Clarke and Wright heuristic), the quality of the basic MA solutions is, on average, better by 4%, which is a moderate gain. Quality of solutions: basic MA Here, some more detailed results of runs of the basic configuration of the MA for the larger CVRP instances are presented. Figure 7.20 shows the averages and the standard deviations of the final solution quality for instances with more than 100 customers. The averages were taken over 15 runs. The actual values of the presented statistics for all instances are presented in the appendix in tables B.1 and B.2. 3,5% average excess [%] 3,0% 2,5% 2,0% 1,5% 1,0% 0,5% 0,0% c120 f134 c150 tai150a tai150b tai150c tai150d c199 tai385 instance CPX2 CEPX CECPX2 GCECPX2 RBX SPX Figure 7.20: Average quality of solutions generated by the basic MA configuration for larger instances; long runs. One can see in the figure that the averages differ somewhat between instances. 
Quality of solutions: basic MA

Here, some more detailed results of the runs of the basic configuration of the MA on the larger CVRP instances are presented. Figure 7.20 shows the averages and the standard deviations of the final solution quality for instances with more than 100 customers. The averages were taken over 15 runs. The actual values of the presented statistics for all instances are given in the appendix, in tables B.1 and B.2.

Figure 7.20: Average quality of solutions generated by the basic MA configuration for larger instances; long runs.

One can see in the figure that the averages differ somewhat between instances. For some of them almost the best-known solution qualities are consistently obtained (c120, f134, tai150a), while for some others none of the versions of the MA is able to reach an average excess of 1% (c199, tai150b). It seems that these two instances are hard for the designed MA, irrespective of the crossover operator employed.

Compared to the aggregated results presented in figure 7.19, these results confirm the earlier conclusions: the versions which appear good on average (e.g. CEPX) are also good on the presented instances. The same effect was observed on the subset of smaller instances. There, only one instance, tai100a, had an average excess larger than 1% (1.5% exactly).

The noMutation configuration of the MA yielded very similar results, confirming the observations presented above, so its details are not shown here.

Time of computation: basic MA

Figure 7.21 presents the average time of computation of each version of the basic MA configuration as a function of instance size. In addition to the raw data, the figure also shows curves of regression of the form time = size^a · b (the power function). For all those curves the values of r² exceed 0.9, so they approximate the actual data very well.

Figure 7.21: Average time of computation of the basic MA, together with curves of power regression; long runs.

Comparing the curves one can clearly see that there are major differences between the considered versions with respect to the time of computation until convergence. The process of computation finishes earliest for the RBX version, and the difference with the other versions is large. Then the order is: GCECPX2, CECPX2 and SPX (probably indiscernible), CEPX and, the slowest, CPX2. The exponents of the power regression functions are: 3.45 (RBX), 3.99 (GCECPX2), 3.86 (SPX), 4.18 (CECPX2), 4.38 (CEPX) and 4.13 (CPX2).
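The document does not state how these curves were fitted; a common choice, assumed here, is an ordinary least-squares fit in log-log space:

    import numpy as np

    def fit_power_law(sizes, times):
        """Fit time = b * size**a by least squares on log-transformed
        data; returns (a, b, r2) for curves like those in figure 7.21."""
        x = np.log(np.asarray(sizes, float))
        y = np.log(np.asarray(times, float))
        a, log_b = np.polyfit(x, y, 1)
        r2 = float(np.corrcoef(x, y)[0, 1] ** 2)
        return a, float(np.exp(log_b)), r2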
The number of generations is related to the probability of some new individual being put into the population. Clearly, this probability is highest for the CPX2 version, since the algorithm is not able to stop as early as the other ones. On the other hand, this probability is smallest for RBX, meaning that the process of evolution fades out relatively early in this case, and that the combination of RBX, CPM and greedy LS is not able to generate anything new for the population. This means that RBX or GCECPX2 may be profitable in short runs, but they lead to premature convergence. Hence, when more time is available, CEPX and CPX2, the long-runners with a higher potential for generating new solutions, provide better results.

The last quantity which may give some insight into the nature of the evolution process is the number of performed local search iterations per generation. This is presented in figure 7.23, as a function of instance size. Once more the lines of regression are shown, as they generalise the observed data well (all r² > 0.89).

The figure reveals that some of the MA versions differ greatly in the average number of LS iterations. CPX2 performs more than twice as many as the second-placed CEPX. In this ranking CECPX2 comes third, with SPX and GCECPX2 being more or less equal; RBX performs slightly fewer iterations per generation.

In the author's opinion, these differences may be closely related to the completeness of the offspring generated by the considered crossover operators. Here, by completeness the author means the degree to which the generated routes (vehicles) are filled with demands and are optimised in terms of distance. In the case of CPX2 these indicators are most probably much lower than for the other operators (see figure 7.15), meaning that CPX2 is too disruptive. The results presented here support the hypothesis that only SPX, GCECPX2 and RBX generate offspring complete enough to need few LS iterations to become local optima. CECPX2, CEPX and CPX2 need substantially more iterations of local search for their offspring.

[Figure: average LS iterations per generation vs instance size, with regression lines, for all MA versions.]
Figure 7.23: Average number of local search iterations per generation of the basic MA, together with lines of regression; long runs.

This completeness of offspring is also important for the quality of solutions when there is no CPM mutation (see again figure 7.19): the disruptive CPX2 and CEPX substantially deteriorate their results without this mutation. Therefore, it seems that the more complete an offspring is, the better. In the case of the systematically constructed operators this would mean that the more common parental properties are preserved in an offspring, the faster the computation converges, while still reaching good-quality solutions. The disruptive operators need more help from CPM and LS. On the other hand, it may be said that a high degree of completeness of offspring is not always beneficial in a long process of computation: the most disruptive and LS-consuming operators, CPX2 and CEPX, are also the ones that produce the best solutions. This comes at the cost of a long evolution, though.
Impact of CPM mutation

Figure 7.24 presents the average time of computation of the memetic algorithm for the basic and the noMutation configurations. The time is plotted as a function of instance size, together with lines of power regression. For all the presented lines the values of r² exceed 0.9.

The comparison of the basic and the noMutation series reveals that, generally, the absence of CPM mutation accelerates convergence. The degree of this acceleration differs between MA versions, being the largest for CPX2 and CEPX, and the smallest (or even none) for CECPX2. Significant speedup is usually visible only for larger instances (size above 100); for the smaller ones the effect is negligible. However, except for CPX2 there is no clear relationship between this acceleration and the instance size. It seems that this phenomenon depends on factors other than size.

Nevertheless, even from a qualitative point of view this result says something about the importance of CPM mutation for the designed MA. First, CPX2 and CEPX especially need the presence of this mutation in order to slow down their convergence. Although CPM makes computation much longer in their case, the convergence without CPM is premature. This is confirmed by the quality of results, which deteriorates without mutation (see figure 7.19). Second, CECPX2, GCECPX2 and RBX also need mutation, but to a lesser extent. It appears that these operators, coupled with greedy LS, can sustain a high probability of generating good new solutions on their own. At the same time they produce very good results, almost as good as with mutation. From this point of view SPX and LS is the most self-sufficient pair.

Overall, the presence of CPM seems to improve the quality of results of all MA versions. The crossovers which create only partially complete solutions (CPX2, CEPX) need its presence to a high degree. The other ones, which produce more complete offspring, need it somewhat less.

[Figure: six panels of average time of computation [s] vs instance size, one per MA version.]
Figure 7.24: Time of computation: CPX2 and CEPX (top), CECPX2 and GCECPX2 (middle), RBX and SPX (bottom); long runs. Basic MA: squares, solid line; noMutation MA: diamonds, dotted line.

7.9.2 Runs limited by time

The goal of the second experiment was to check the quality of results that may be generated by MAs which are given exactly the same time of computation. The time limit for one run was set to 256 seconds (just over 4 minutes). Exactly the same versions of the MA were employed in this experiment: CPX2, CEPX, CECPX2, GCECPX2, RBX, SPX, with the same population size (30). Additionally, a multiple start local search (MSLS) was run, in order to check whether the recombinations and the mutation contribute anything over the basic local search. Indeed, the MA may be perceived as MSLS with different starting points, provided by the ‘genetic’ operators (Jaszkiewicz & Kominek 2003). The MSLS had exactly the same set of initial solutions as the MAs; when no more heuristic solutions could be generated, random ones were used. Again, the impact of mutation on the MAs was of interest, so the noMutation configuration was launched as well. Each considered version and configuration of the algorithms was executed 15 times on the 22 considered CVRP instances. The same set of PCs was used in this second experiment.
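The MSLS baseline can be sketched generically as follows; the function names and the cost interface are illustrative, not the thesis's actual code. The key point is that MSLS differs from the MA only in where the starting points of local search come from.

    import time

    def msls(initial_solutions, random_solution, local_search, cost, limit_s=256):
        """Multiple start local search: run LS from consecutive starting
        points (heuristic ones first, then random ones) and keep the best
        local optimum found within the time limit."""
        deadline = time.time() + limit_s
        starts = iter(initial_solutions)
        best = None
        while time.time() < deadline:
            s = next(starts, None)
            if s is None:              # heuristic starts exhausted
                s = random_solution()  # fall back to random solutions
            s = local_search(s)
            if best is None or cost(s) < cost(best):
                best = s
        return best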
Quality of solutions: aggregated results

Figure 7.25 presents the average (bars) and one standard deviation (whiskers) of the results of all MAs and MSLS, aggregated across the 22 instances and 15 runs per instance.

[Figure: average excess [%] for MSLS and each MA version, basic and noMutation configurations; MSLS at about 2.5%, the basic MA versions at 0.6–0.8%.]
Figure 7.25: Aggregated quality of solutions generated by different versions and configurations of the memetic algorithm; runs limited by 256 s.

In the figure one can see that MSLS is the worst configuration of all, whether the MAs use mutation or not. This is good news from the point of view of the designed operators: their presence in a population really matters and improves the final result by 1–2%.

Comparing the basic results to the previous experiment, one can see that the averages are worse here, but only slightly: e.g. GCECPX2 generates solutions of almost the same quality, and SPX is worse by less than 0.1%. RBX, quite surprisingly, improves its result. This is perhaps due to the fact that in the previous experiment it actually converged before 256 s had passed. CPX2 is hit hardest by the time limit of 256 s, but even its result worsens only by 0.26%. This would mean that the 256 s limit generally allows the algorithms to converge on the tested set of instances.

In this experiment, the ranking based on multiple Cochran-Cox tests is: SPX (16/-1), RBX (11/0), CECPX2 (8/-3), GCECPX2 (8/-4), CEPX (8/-8) and CPX2 (0/-35). A probable cause of the change in the ranking compared to the previous one (especially of RBX coming second) is the speed of the operators and the completeness of the generated offspring, which were commented on earlier. This should also be visible later, e.g. in the numbers of performed generations. Still, the average results of all basic MAs are less than 1% worse than the best-known solutions. These are very good results.

The impact of mutation on the quality of results

Concerning the configuration without mutation, it can be seen that SPX performs best in such conditions. Generally, all MA versions (recombination operators) produce worse results, but in the case of SPX the deterioration is smallest. Looking at the noMutation series, it seems that, generally, the more complete the offspring produced by an operator, the better the solutions the related MA generates. This is clearly seen for the operators designed in this thesis: GCECPX2 is best, then CECPX2, CEPX and finally CPX2.

The author of the thesis thinks there are two possible causes of this effect. One is the mentioned offspring completeness: a more complete offspring consumes fewer LS iterations to become a local optimum, hence more generations of the MA may be performed. The other cause may be that some recombinations, like SPX and GCECPX2, also generate implicit mutations, while CEPX or RBX may be more deterministic.

This observation and the comparison of the basic and noMutation series give rise to an interesting hypothesis: that the presence of mutation (either CPM or implicit in recombination) is important in a memetic algorithm for the CVRP. This might be due to fast convergence of MAs or to a multimodal landscape. Either of these explanations remains a hypothesis without further evidence.
Speed of the algorithms

Figure 7.26 shows the number of generations of each MA as a function of instance size. A point in the figure represents one instance and the average number of generations for this instance, based on 15 runs. There are also lines of power regression. For all of them the values of r² exceed 0.95, so they approximate the raw data very well.

[Figure: average number of generations (logarithmic scale) vs instance size, with regression lines, for all MA versions and MSLS.]
Figure 7.26: Average numbers of generations of the basic MA configurations; runs limited by 256 s.

Firstly, the comparison of these lines shows that multiple start local search performs the smallest number of generations (iterations). CPX2 comes next, then CEPX, CECPX2, GCECPX2 and SPX, and finally RBX. This confirms the result of the previous experiment: RBX coupled with LS is the fastest pair.

This effect seems to be caused by the number of iterations of local search after recombination. To demonstrate this, LS iterations as a function of the generation number are shown in figure 7.27, for the two largest instances.

The top chart (c199) shows that recombination operators really do improve the situation over random restarts: all operators result in offspring which are closer to local optima (in terms of iterations) than a random solution in MSLS. Moreover, all the operators reduce the number of LS iterations over time (generations). The order of operators with respect to the decreasing number of iterations is the same as with respect to the number of generations (figure 7.26): CPX2, CEPX, CECPX2, GCECPX2 and SPX, RBX. It is also consistent with a chart of the average quality of solutions as a function of time (not shown).

[Figure: two panels of average LS iterations per generation vs generation number, for all MA versions and MSLS.]
Figure 7.27: Average numbers of LS iterations per generation: c199 (top) and tai385 (bottom); runs limited by 256 s.

It can also be seen that the more ‘complete’ the offspring generated by a recombination, the fewer LS iterations it needs to become a local optimum, and the faster the drop in these iterations. Compare e.g. CPX2 (500 iterations, slowly decreasing) and CECPX2 (a drop from an initial 400 to less than 100). It may mean that the more preserving an operator is, the faster the population is made uniform.

The chart showing results on instance tai385 is very similar, although it looks like the one for c199 truncated at the 200th generation: MSLS has just finished processing heuristic solutions (approx. 500 LS iterations per solution) and started with random ones (1900 iterations); CEPX is still slower than CPX2, and GCECPX2 slower than SPX. The distance-preserving operators are slow at the beginning.

The main conclusion is that the more ‘complete’ an offspring is, the faster the computation. Since there is a substantial gap between CECPX2 and GCECPX2 in figure 7.27, it also seems that a distance-preserving recombination operator should include some greedy completion procedure.
7.9.3 Quality vs. FDC

The author also analysed the relationship between the quality of results of the MAs and the fitness-distance determination coefficients. As a quality indicator the average excess over the best-known solution was employed, for each instance and algorithm version. The basic configuration from the first experiment was considered (long runs; see tables B.1 and B.2). Values of the fitness-distance determination coefficient (r²) for each distance measure and instance were taken from table 7.4 as the second observed variable. The strength of the analysed relationship was itself measured by the determination coefficient (r²).

The scatter plots are not presented here, because no visible relationships could be found. All the computed r² values were below 0.15, indicating the ‘no relationship’ case. Moreover, for the meaningful measures d_e, d_pn and d_eu these values were virtually zero. Therefore, it seems that there is no relationship between the average excess and the FD determination.

Yet another attempt to relate FDC to the quality of results was made. This time the results were simplified to a high extent: instances were classified as easy/hard to solve and as with/without FDC. A hard instance was one with at least 3 algorithms with an average excess above 1%. In practice it meant that no algorithm was better than 0.5% on such an instance. The classification with respect to FDC was taken from table 7.4 and simplified even further: the ‘no’ and ‘ambiguous’ cases were merged into ‘no FDC’. Then all the instances were aggregated, as shown in figure 7.28.

[Figure: number of easy and hard instances in the ‘no FDC’ and ‘FDC’ classes.]
Figure 7.28: The relationship between FDC and the hardness of CVRP instances.

One can see that the ‘no FDC’ class contains mainly easy instances, contrary to the expected result. Moreover, the ‘FDC’ class is mixed, containing almost the same number of easy and hard instances. Overall, it appears that even in this simplified classification there is no relationship between FD determination coefficients and the quality of results. This would mean that in the CVRP the FD coefficients should not be used for the prediction of instance hardness for the memetic algorithm.
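The strength measure used in this analysis is simply the squared Pearson correlation of the two observed variables; a minimal sketch:

    import numpy as np

    def determination(x, y):
        """Coefficient of determination r^2 between two observed variables,
        e.g. per-instance FD determination vs. average excess of an MA."""
        r = np.corrcoef(x, y)[0, 1]
        return r * r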
7.9.4 Quality vs. feasibility of neighbours

During the preparation of this thesis numerous analyses of solutions were conducted. One of them concerned the feasibility of the neighbours of local optima and of the best-known solutions. The author had the hypothesis that at least some instances may have best solutions with many infeasible neighbours (somewhat ‘hidden’ in the solution space), thus hindering the designed local search.

The author knows the form of the best-known solutions for 18 out of the 22 instances. These solutions remain unknown for: tai100a, tai100b, tai150b, tai150c. All 2opt, swap and merge neighbours of the available best solutions and local optima were examined for feasibility; the sets of local optima employed earlier in the FDA were used here. The fraction of feasible neighbours is shown in figure 7.29.

[Figure: percentage of feasible neighbours per instance (c50 through tai385), for best-known solutions and local optima.]
Figure 7.29: The percentage of feasible neighbours of the best-known solutions and local optima.

This figure reveals interesting properties of solutions. For 15 instances the best-known solutions have fewer feasible neighbours than an average local optimum; the exceptions are: c100b, c120 and f134. The drop in the fraction amounts to 7 percentage points on average, from 28% for local optima to 21% for best-known solutions.

The actual percentage of feasible neighbours of best-known solutions varies greatly across instances. It is very low for several of them, e.g. 9% for tai385, 11% for c199, 13% for c75. For some others it is much higher: 40% (c100b), 38% (f71), 34% (f134). This means that instances differ greatly with respect to the accessibility of the best-known solutions by means of the employed local search; the best-known solutions of some instances lie more on the edge of feasibility than those of others. This may have an impact on the quality of results and ultimately on the hardness of instances.

Therefore, the relationship between the quality of results and the feasibility of neighbours was further examined. A scatter plot of it is shown in figure 7.30; the values of r² for the presented data series are given in table 7.7.

One can see in the table that as much as 44% of the variability in the average excess of results of MA-RBX may be explained by the variability of the percentage of feasible neighbours of the best-known solutions. Compared to the relationship between FDC and quality, this is a surprisingly strong relationship. For the other versions of the MA the values of r² are a bit smaller, but except for GCECPX2 they are all around 40%. On top of that, the figure shows that there really may be some impact of the feasible neighbours on the quality of results, in contrast to FDC.

[Figure: scatter plot of average excess [%] vs percentage of feasible neighbours of the best-known solution, for all MA versions.]
Figure 7.30: The relationship between the percentage of feasible neighbours and the quality of results obtained by basic MAs in long runs.

Table 7.7: Values of the r² determination coefficient for the relationship between the percentage of feasible neighbours of the best-known solutions and the average excess of results in long runs.

    CPX2   CEPX   CECPX2   GCECPX2   RBX    SPX
    0.38   0.40   0.39     0.27      0.43   0.37
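As an illustration of the kind of computation behind figure 7.29, the sketch below counts the fraction of feasible neighbours under a swap-like operator. It assumes a route-list representation of a CVRP solution and checks only the vehicle capacity, which is a simplification of the actual 2opt, swap and merge neighbourhoods used in the thesis.

    def feasible_swap_fraction(routes, demand, capacity):
        """Fraction of feasible swap neighbours: exchange two customers
        from different routes and test whether both route loads still
        respect the vehicle capacity."""
        loads = [sum(demand[c] for c in r) for r in routes]
        feasible = total = 0
        for a in range(len(routes)):
            for b in range(a + 1, len(routes)):
                for ca in routes[a]:
                    for cb in routes[b]:
                        total += 1
                        ok_a = loads[a] - demand[ca] + demand[cb] <= capacity
                        ok_b = loads[b] - demand[cb] + demand[ca] <= capacity
                        feasible += ok_a and ok_b
        return feasible / total if total else 1.0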
7.10 Summary and conclusions

This chapter presented the process of adaptation of the memetic algorithm to the capacitated vehicle routing problem. It was clearly visible in the chapter that the adaptation of an MA requires many design decisions to be made, as for any other metaheuristic. In particular, the author presented the chosen representation, the design of the local search algorithm, and the choice and design of initial solutions.

More importantly, the systematic design of recombination operators based on fitness-distance analysis was performed. The author was the first to analyse the relationship between fitness and distance in the CVRP. Firstly, he designed and implemented distance measures for solutions of the problem. These measures were then used in the fitness-distance analysis. FDA revealed that the average distances d_e, d_pn, d_eu, d_ear between local optima are approximately 30% smaller than the average distance between random solutions: local optima share some features and are to some extent concentrated in the analysed landscapes. Moreover, moderate values of FDC exist in the case of 3 measures: d_e, d_pn, d_eu; this means that better local optima of the CVRP tend to have more common features (edges, clusters, subsequences of customers) than worse ones. However, it was also observed that the existence of FDC depends on the analysed problem instance.

These FDA results confirm to some extent the research intuition expressed earlier, e.g. by Taillard (1993) and Rochat & Taillard (1995), that good solutions of the CVRP are similar to the best-known ones. There, the intuition was based on the visual similarity of solutions, while here it was expressed with distance measures and analysed empirically. It appears, however, that the intuition was right only to some extent (moderate correlations) and only for some instances.

The results were then the basis for the design and implementation of 4 distance-preserving recombination operators: CPX2, CEPX, CECPX2, GCECPX2. Some of these operators preserve d_pn, some d_e, some both distance measures. Additionally, the mutation operator CPM was designed to preserve d_pn, but disturb d_e. These operators, together with RBX and SPX taken from the literature, were tested in two experiments with memetic algorithms. The presented results of the experiments support the following conclusions.

• In runs limited by time the speed of the pair (recombination, LS) appeared to be an issue. RBX proved to be faster than all the systematically designed operators, and SPX faster than some of them. At the same time, SPX and RBX provided better results in these short runs, as confirmed by statistical tests.

• The distance-preserving CEPX has a higher probability of generating better solutions in the MA than RBX and SPX. This was visible in the numbers of generations in long runs. It was also confirmed by statistical tests on the quality of results in these runs.

• The presence of CPM mutation in all MAs had a positive effect on the quality of the generated solutions. The more disruptive an operator is, the more it requires CPM in the MA. CPM was most important to CPX2, and least to SPX.

• The overall results of the MAs were very good. The average excess above the best-known solutions amounted to 0.5–0.7% for all recombination versions. For half of the studied instances some of the long runs of the MAs found the best-known solutions. The best results were generated by the MA employing CEPX and CPM, both designed in this thesis.

The author concludes that a recombination operator for the CVRP should preserve common parental features (edges, clusters) and try not to be too disruptive. If speed is an issue, it may contain a greedy component to make an offspring more complete and require fewer LS iterations. Overall, from the group of the tested recombinations the author of this thesis would choose Prins’s (2004) SPX for short runs. It proved to generate good solutions and to be reasonably fast when coupled with local search. For longer runs, when the best quality is required, CEPX with CPM should be used; this pair has the highest probability of generating better solutions.

The author also attempted to relate the results of the long experiment to the fitness-distance determination coefficients. This attempt failed: no relationship of this kind was found which would confirm earlier hypotheses and results of other authors suggesting that FDC may be a good predictor of problem hardness for evolutionary algorithms.
Surprisingly, the quality of results of the MAs appeared to be linked to some extent with the feasibility of neighbours of the best-known solutions. This means that an exploration of the edge between feasible and infeasible solutions might ease optimisation; the algorithms presented here explored it only from the side of feasible solutions.

Concerning the method of systematic construction of recombination operators based on fitness-distance analysis, the author deems that it provided good results. The best distance-preserving recombination, coupled with the designed mutation, generated solutions of the best quality among all the tested operators.

Chapter 8

The car sequencing problem

8.1 ROADEF Challenge 2005

The second problem addressed in this thesis was the subject of an open computational contest, ROADEF Challenge 2005 (Cung 2005b), organised by the French Society of Operations Research and Decision Analysis. The formulation of the problem was given by the French car manufacturer Renault. The publication of this problem in July 2003 started the challenge.

The author of this thesis participated in the contest in a senior team from Poznan University of Technology (PUT), together with Andrzej Jaszkiewicz and Paweł Kominek. There were also 6 other teams from PUT registered in the contest, 5 of which were junior teams led by Grzegorz Pawlak. These were the only teams from Poland. Overall, 55 teams from 15 countries took part in the challenge.

There were two stages: qualification and final. Based on the ranking of results of programs submitted by participants in March 2004, the jury selected 24 teams for the final stage. This included two PUT teams: one led by Pawlak and one by Jaszkiewicz. In the final stage the qualified teams could improve their code based on additional instances provided by Renault. Improved programs were submitted in October 2004. Final results were announced in February 2005.

The winner was a team from Université de Marseilles, France: Bertrand Estellon, Frederic Gardi and Karim Nouioua, also the authors of the publication (Estellon et al. 2006). The second team was from Brazil, led by Celso Ribeiro, the co-author of (Ribeiro et al. 2005). The best team from PUT finished 10th: Grzegorz Pawlak, Maciej Płaza, Przemysław Piechowiak and Marek Rucinski. The second-best team from PUT, led by Jaszkiewicz, finished the competition in 13th place. Detailed results may be found at the challenge web page (Cung 2005b).

8.2 Problem formulation

Renault’s car sequencing problem (CarSP) requires that a set of cars to be produced during a working day is put in some order (sequence) on a production line. This order has to observe the constraints of the whole technological process in the paint shop and on the assembly line (Nguyen 2003).

The paint shop requires that cars to be manufactured are grouped together in subsequences of the same colour, because this minimises the cost of purging spray guns when the colour changes. At the same time, the colour has to be changed regularly; this is a hard constraint (see section 2.1.2).

The assembly line requires that the workload is distributed evenly along the line. The workload is related to some car options (e.g. sun-roof, navigation system, power windows). While the assembly line should advance in regular time intervals, some options require more work to be done at certain assembly stations and delay this advance. Therefore, for each option some constraints are given in the form of ratios N/D.
Such a ratio constraint states that at most N cars may require an option in any continuous subsequence of D cars. Otherwise, violations of the constraint are computed. A perfect solution for an assembly line causes no violations along the whole line. If that is not possible, the number of violations should be minimised. Since some options are more important than others, the sum of violations is weighted by the given priorities.

The overall goal of the problem is to establish a feasible sequence of cars which minimises the weighted sum of paint colour changes and violations of ratio constraints. A formal description of the problem is given below.

Input data

A set of N cars is given to be manufactured in the current production day. Each car is uniquely identified by a number. Here it may be assumed that the cars are numbered by consecutive integers from 1 to N.

Each car has a paint colour code assigned which describes its required body colour. Let us assume that these codes are consecutive integers from 1 to C. Then, the function col(i) = c ∈ {1, . . . , C} describes the colour required for car i. Related to the colour information is the paint batch limit, PBL, a natural number.

Another part of the input data is connected with the options a car may possess. There is a number of options given, O. For each option j a function opt_j indicates whether a car requires the option to be assembled: opt_j(i) = 1 if car i needs the option (the option is active for the car), opt_j(i) = 0 otherwise. All cars are described this way. With each option there is always a fractional number given in the form N_j/D_j, where N_j, D_j ≤ N and D_j ≥ N_j > 0. This pair of numbers describes the ratio constraint related to option j. There is always exactly one constraint of this kind related to one option.

Moreover, each option also has a priority assigned, in the form of a binary value prio(j). If prio(j) = 1, then the related option is of high priority; otherwise (prio(j) = 0) it has low priority in the planning and manufacturing process. This way the set of options (and related ratio constraints) is partitioned into two subsets: high and low priority ones. Additionally, the subset of high priority ratio constraints is said to be either easy (dif = 0) or difficult (dif = 1) in an instance of the CarSP.

On top of that, a final fragment of the previous production day’s sequence, consisting of N_prev cars, is given. These cars may be identified by negative indices i from −N_prev + 1 to 0 (to distinguish them from the current-day cars). The same pieces of information are given for these cars as for the ones from the current production day: options opt_j and colours col. These cars are required to properly define the objective function of the problem.

Finally, the instance data contains a vector of three natural weights, one for each of the three components of the objective function: w = (w_HPRCs, w_LPRCs, w_PCC). The set of all possible values of w contains only 3 vectors: {(10^6, 10^3, 1), (10^6, 1, 10^3), (10^3, 1, 10^6)}.

Feasible solution

A solution s to an instance of the CarSP is a sequence (permutation) of all cars: s = (s_1, s_2, . . . , s_N), where ∀l ∈ {1, . . . , N}: s_l ∈ {1, . . . , N}; s_l = i means that car i appears at position l in the sequence. Obviously, a feasible solution is required to be a permutation, i.e. each car is present exactly once in the sequence:

\[ \forall l_1, l_2 \in \{1, \ldots, N\},\; l_1 \neq l_2: \quad s_{l_1} \neq s_{l_2} \]

There is one hard constraint in the problem.
It is defined by the paint batch limit, PBL, and states that there must not be any continuous subsequence of cars of the same colour in s which is longer than PBL:

\[ \forall l \in \{1, \ldots, N - PBL\}: \quad col(s_{l+1}) = col(s_{l+2}) = \ldots = col(s_{l+PBL}) \;\Rightarrow\; col(s_l) \neq col(s_{l+1}) \]

Objective function

In order to properly and easily define the objective function, each solution s to the CarSP has to be extended with cars from the previous day and some ‘dummy’ cars from the following day.

There are N_prev cars from the previous day. They are given in a sequence starting from −N_prev + 1 and finishing at 0 (the position exactly adjacent to the first current-day car). Let D_max = max_{j=1,...,O} {D_j} be the maximum of all denominators of ratio constraints. There are D_max − 1 dummy cars added. They are numbered i = N + 1, . . . , N + D_max − 1. Each such car i requires no options to be mounted, so opt_j(i) = 0 for all j. The colour of these cars is irrelevant, so it may as well be col(i) = 1.

Under these assumptions, for each solution s to the CarSP there is a solution s′ defined, which is unambiguously extended to the previous and the following day:

\[ s' = (s'_{-N_{prev}+1}, s'_{-N_{prev}+2}, \ldots, s'_0, s'_1, s'_2, \ldots, s'_N, s'_{N+1}, s'_{N+2}, \ldots, s'_{N+D_{max}-1}) \]

This is a sequence of N_prev cars from the previous day, followed by the N cars from the current day and D_max − 1 ‘dummy’ cars from the next day. Hence it is required that:

\[ s'_l = l \text{ for } l \in \{-N_{prev}+1, \ldots, 0\}, \qquad s'_l = s_l \text{ for } l \in \{1, \ldots, N\}, \qquad s'_l = l \text{ for } l \in \{N+1, \ldots, N+D_{max}-1\} \]

The number of paint colour changes, PCC(s), is one component of the objective function:

\[ PCC(s) = PCC(s') = |\{ l \in \{1, \ldots, N\} : col(s'_l) \neq col(s'_{l-1}) \}| \]

In other words, it is the number of changes of the paint colour between two subsequent cars in sequence s, possibly including one colour change at the very beginning of the sequence (hence the need for a previous-day car).

Then there are two components of the objective function related to the binary options j. Let us define ac_j as the number of cars with option j active in the subsequence of D_j consecutive cars of s′ that starts at position i:

\[ ac_j(s', i) = \sum_{l=i}^{i+D_j-1} opt_j(s'_l) \]

where i ∈ {−D_j + 2, −D_j + 3, . . . , −1, 0, 1, . . . , N}.

The ratio constraint RC_j, defined by the ratio N_j/D_j for option j, states that in any sequence of D_j consecutive cars (called a window) there should be no more than N_j cars with option j active. If for a window starting at position i this number of active options is exceeded, the number of violations of the ratio constraint in this window is computed as:

\[ vn_j(s', i) = \begin{cases} ac_j(s', i) - N_j & \text{if } ac_j(s', i) > N_j \\ 0 & \text{otherwise} \end{cases} \]

The number of violations of a ratio constraint in the entire sequence s is defined as the sum of violations over all windows:

\[ VN_j(s) = VN_j(s') = \sum_{i=-D_j+2}^{N} vn_j(s', i) \]

These violations are thus computed also for all those windows starting in the previous-day sequence which contain at least one current-day car (hence the first i equals −D_j + 2), and for all the windows extending into the following day which contain at least N_j + 1 current-day cars. Such a form of VN_j(s) ensures that when this number is minimised, the workload on the assembly line is evenly distributed: the workload cannot be artificially greater at the beginning or the end of a working day, hence the previous-day and following-day cars are needed.
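The definition of VN_j translates directly into code. The sketch below (with illustrative names) assumes the previous-day fragment is at least D_j − 1 cars long and that dummy cars never activate any option; windows reaching into the following day then need no special treatment, because a window with at most N_j current-day cars cannot exceed N_j active options.

    def ratio_violations(prev, current, opt_j, N_j, D_j):
        """VN_j for one ratio constraint N_j/D_j: check every window of
        D_j consecutive cars containing at least one current-day car;
        dummy cars (no active options) pad the following day."""
        cars = prev + current + [None] * (D_j - 1)   # None marks a dummy car
        active = [0 if c is None else opt_j(c) for c in cars]
        first = len(prev) - D_j + 1                  # window at i = -D_j + 2
        vn = 0
        for start in range(max(first, 0), len(prev) + len(current)):
            ac = sum(active[start:start + D_j])      # ac_j of this window
            vn += max(ac - N_j, 0)                   # vn_j of this window
        return vn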
Since the given options are partitioned into two subsets, they give rise to two subsets of ratio constraints: high priority ratio constraints (HPRCs) and low priority ones (LPRCs). The sums of violations in these sets define the two components of the objective function mentioned earlier:

\[ VN_{HPRCs}(s) = \sum_{j=1}^{O} prio(j) \cdot VN_j(s) \]

\[ VN_{LPRCs}(s) = \sum_{j=1}^{O} (1 - prio(j)) \cdot VN_j(s) \]

Finally, the objective function f(s) to be minimised is a weighted sum of all the components:

\[ f(s) = w_{HPRCs} \cdot VN_{HPRCs}(s) + w_{LPRCs} \cdot VN_{LPRCs}(s) + w_{PCC} \cdot PCC(s) \]

It reflects the wish to distribute the workload evenly along the assembly line and simultaneously minimise the cost of paint colour changes in the paint shop. Since these goals may be conflicting, the vector of weights w gives the order in which the components of f(s) should be minimised. For example, when w = (w_HPRCs, w_LPRCs, w_PCC) = (10^3, 1, 10^6), then the number of paint colour changes PCC(s) should be minimised first, then the number of violations of high priority ratio constraints VN_HPRCs(s), and finally, with the smallest weight, the violations of LPRCs, VN_LPRCs(s).

8.2.1 Other forms of the problem

According to Kis (2004), the car sequencing problem appeared in the literature in the mid 1980s as a particular version of the job-shop scheduling problem. Later, in the 1990s, it was addressed in the constraint programming community (Gent & Walsh 1999). In the 1990s and early 2000s it emerged in papers on metaheuristics (Warwick & Tsang 1995, Cheng et al. 1999, Schneider et al. 2000). Consequently, Renault’s formulation of the CarSP is not the only one that may be found in the literature. There are at least several formulations, and some of them are also connected with the names of certain manufacturers.

1. Warwick & Tsang’s (1995) CarSP. This form of the problem addresses the assembly line requirements. There are constraints similar to RCs. Penalties are computed in a different way, though: only the fact of violating a constraint is counted, not the number of violations. This implies that optimum solutions without violations are the same as in Renault’s case, but for overconstrained instances they will be somewhat different. This version, taken from (Warwick & Tsang 1995), was most probably considered earlier in the constraint programming context.

2. CSPLib CarSP. Only the assembly line requirements are considered, with penalties exactly equivalent to Renault’s ratio constraints. There are no colours of cars involved, no priorities of RCs, no previous-day cars. Hence, this problem may be modelled as Renault’s problem with all cars painted black, PBL = N and only LPRCs. This form of the problem was addressed by Gent (1998), Gottlieb et al. (2003), Kis (2004), Gravel et al. (2005), Terada et al. (2006) and Zinflou et al. (2007).

3. Ford’s CarSP. This version is quite different from Renault’s problem. It seems to consider all stages of manufacturing (body shop, paint shop, assembly line). There are two additional, specific hard constraints. Despite the presence of PCC(s) in the weighted objective function, penalties for violated ratio constraints are counted by a different formula. The problem in this form was considered only by Cheng et al. (1999).

4. BMW’s CarSP. Yet another version was inspired by a real production problem in BMW factories (Schneider et al. 2000). It addresses assembly line requirements only. Penalties similar in form to RCs of 1/D or (D − 1)/D are computed, but by a different formula.
Additionally, a specific component is present in the objective function, quite similar to the cost function of the TSP. This form of the CarSP was described exclusively by Schneider et al. (2000).

5. Puchta & Gottlieb’s (2002) CarSP. Again, only the assembly line is addressed. There are as many as 5 different types of soft constraints given, two of them resembling RCs and the paint batch limit. A major difference here is that quadratic penalty functions are employed instead of linear ones. The CarSP of this kind was attacked only by Puchta & Gottlieb (2002).

Based on a detailed inspection of the mentioned formulations, the author of this thesis considers only the CSPLib CarSP as similar enough to Renault’s problem; the other forms differ too much. Therefore, the algorithms proposed for the CSPLib problem may be a source of direct inspiration for, and comparison with, algorithms for the problem considered in this thesis. Algorithms designed for the other problems are not directly comparable, although they may be a source of some inspiration. This remark is especially valid when one notices that the colour-related constraint and objective are actually computationally easy to solve, making the CSPLib and Renault’s problems equivalently hard at their core. This is explained in more detail in section 8.3 on the complexity of the CarSP.

8.2.2 Groups of cars

The original Renault problem description does not mention any similarities between the cars to be manufactured. Each car has its own unique identifier in the input, and in the output a feasible permutation of such identifiers is required. However, the CSPLib CarSP (Gent & Walsh 1999, Kis 2004) explicitly states that the input data contains G groups (or classes) of cars; each class represents n_g cars (g = 1, . . . , G) requiring exactly the same options. The same holds in Ford’s case (Cheng et al. 1999) and in Warwick & Tsang’s (1995). Therefore, it is useful to pose Renault’s problem in a similar way. The sole difference is that here the colour has to be accounted for: a group (a class) of cars is a set of cars with the same options and paint colour code.

All the objects defined above (solution s, option opt_j, etc.) may easily be redefined to handle groups of cars. For example, a solution s is still a sequence s = (s_1, s_2, . . . , s_N), but ∀i ∈ {1, . . . , N}: s_i ∈ {1, . . . , G}; s_i = g means that a car from group g appears at position i. Moreover, s is feasible if all the required cars are indeed produced:

\[ \forall g \in \{1, \ldots, G\}: \quad |\{ i : s_i = g \}| = n_g \]

This approach is useful for breaking symmetry in the problem. The search space of the reformulated problem is certainly not larger, and usually much smaller, than the original space. This is because there are more permutations of size N than permutations of the same size with certain repetitions. The search space stays exponentially large in practical cases, though.

In the context of Renault’s CarSP and the ROADEF Challenge 2005 the notion of a group of cars was introduced e.g. by Jaszkiewicz et al. (2004). On the other hand, Ribeiro et al. (2005) do not mention any groups of cars.
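A sketch of this reformulation in code: grouping cars by identical option vectors and paint colour (the car attribute names are illustrative).

    from collections import Counter

    def make_groups(cars):
        """Partition cars into groups of interchangeable cars, i.e. cars
        with identical option vectors and the same paint colour code."""
        counts = Counter((tuple(car.options), car.colour) for car in cars)
        groups = list(counts)                  # group g = its (options, colour) key
        sizes = [counts[g] for g in groups]    # n_g: number of cars in group g
        return groups, sizes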
8.3 Computational complexity

The first proof of the NP-completeness of the decision version of the CarSP was given by Gent (1998) for the CSPLib form. However, Kis (2004) demonstrated a flaw in this proof and provided his own, though only for a subset of the instances addressed by Gent. This subset contains instances with the number of cars of each group bounded by a polynomial in the number of classes: max_g {n_g} ≤ poly(G). In other cases it is not even known whether the CarSP belongs to NP. Kis (2004) also demonstrated with his polynomial transformation that this subset of instances is strongly NP-hard.

Concerning the colour component of Renault’s problem, it was found by several authors simultaneously that optimising PCC(s) while observing the paint batch limit is an easy problem. At least Pawlak (2007) and the author of this thesis designed polynomial algorithms dealing with colours in an exact way. In the light of this last observation it appears that the hardness of Renault’s and the CSPLib forms is related to exactly the same element of the problem: multiple ratio constraints.

8.4 Instances

Four sets of instances were made available by Renault for the purpose of the challenge: A, B, X and T (Cung 2005b). The first 3 sets were used to evaluate the programs submitted by participants; set T contained 4 test instances.

Table 8.1 lists some basic properties of all A, B, X instances grouped by their general type. There are 5 types, which result from 2 values of dif and 3 values of w (one combination is meaningless). The type of each instance is encoded as a two-digit number WD, which results directly from dif and w (this is explained in appendix A). The table also gives the number of instances of each type, the numbers of cars and ratio constraints, and the paint batch limits.

Table 8.1: Basic description of the types of instances of Renault’s CarSP, sets: A, B, X.

    #inst.  type WD  weights w          dif  #cars (min–max)  #HPRCs (min–max)  #LPRCs (min–max)  PBL (min–max)
    13      00       (10^6, 10^3, 1)    0    65–1231          1–11              1–19              10–500
    9       01       (10^6, 10^3, 1)    1    128–1315         2–9               2–16              10–150
    25      30       (10^6, 1, 10^3)    0    65–1319          1–11              0–19              10–500
    11      31       (10^6, 1, 10^3)    1    128–1315         2–9               0–16              10–150
    22      60       (10^3, 1, 10^6)    0    65–1270          1–11              0–19              10–1000

Overall there are 80 instances in these sets, so their detailed list would be too long to cite here. However, the 19 instances from set X, together with some reference results, are given in table 8.2. These results were generated in the final stage of the challenge, using the same machine and time limits for all submitted programs. The results include: the best average of 5 runs across all teams, the best result of the winning team (Cung 2005a) and the average of 5 runs of the program submitted by the author’s team.

Table 8.2: Values of the objective function in the final stage of the challenge (set X): best average result, winning team best, author’s team average. Components of the objective function given in brackets: (VN_HPRCs/VN_LPRCs/PCC).

    instance      best average                    winner best                     author’s team average
    022X60-0704   12002003.0 (2.0/3.0/12.0)       12002003.0 (2.0/3.0/12.0)       12002003.0 (2.0/3.0/12.0)
    023X30-1260   192466.0 (0.0/66.0/192.4)       191066.0 (0.0/66.0/191.0)       246268.2 (0.0/68.2/246.2)
    024X30-1319   337006.0 (0.0/6.0/337.0)        336006.0 (0.0/6.0/336.0)        421425.0 (0.0/25.0/421.4)
    025X00-0996   160407.6 (0.0/160.0/407.6)      1181284.0 (0.0/1181.0/284.0)    189390.2 (0.0/188.8/590.2)
    028X00-0325   36341495.4 (36.0/341.4/95.4)    36361091.0 (36.0/361.0/91.0)    36377907.2 (36.0/377.8/107.2)
    028X00-0065   3.0 (0.0/0.0/3.0)               3.0 (0.0/0.0/3.0)               3.0 (0.0/0.0/3.0)
    029X30-0780   110298.4 (0.0/98.4/110.2)       110069.0 (0.0/69.0/110.0)       120855.0 (0.0/55.0/120.8)
    034X30-0921   55994.8 (0.0/794.8/55.2)        55589.0 (0.0/589.0/55.0)        76217.6 (0.0/617.6/75.6)
    034X30-0231   8087035.8 (8.0/35.8/87.0)       8087036.0 (8.0/36.0/87.0)       8091450.2 (8.0/50.2/91.4)
    035X60-0090   5010000.0 (10.0/0.0/5.0)        5010000.0 (10.0/0.0/5.0)        no solution (–/–/–)
    035X60-0376   6056000.0 (56.0/0.0/6.0)        6056000.0 (56.0/0.0/6.0)        6056000.0 (56.0/0.0/6.0)
    039X30-1247   69239.0 (0.0/239.0/69.0)        69238.0 (0.0/238.0/69.0)        69455.6 (0.0/455.6/69.0)
    039X30-1037   231030.0 (0.0/30.0/231.0)       231030.0 (0.0/30.0/231.0)       239593.2 (0.0/193.2/239.4)
    048X30-0519   197005.6 (0.0/1005.6/196.0)     197111.0 (0.0/1111.0/196.0)     206509.6 (0.0/1109.6/205.4)
    048X31-0459   31077916.2 (31.0/1116.2/76.8)   31077131.0 (31.0/1131.0/76.0)   31104598.8 (31.0/998.8/103.6)
    064X30-0875   61187229.8 (61.0/29.8/187.2)    61185029.0 (61.0/29.0/185.0)    61229518.8 (61.0/118.8/229.4)
    064X30-0273   37000.0 (0.0/0.0/37.0)          37000.0 (0.0/0.0/37.0)          40400.0 (0.0/0.0/40.4)
    655X30-0264   30000.0 (0.0/0.0/30.0)          30000.0 (0.0/0.0/30.0)          30000.0 (0.0/0.0/30.0)
    655X30-0219   153034000.0 (153.0/0.0/34.0)    153034000.0 (153.0/0.0/34.0)    153035200.0 (153.0/0.0/35.2)

A competitor in the challenge will instantly note that the original names of Renault’s instances were changed here. This was done because the original names were sometimes extremely long and hard to manage, so they were shortened. The employed mapping of instance names is given in appendix A. For instance 035X60-0090 the author’s program did not provide any solution in the challenge. This was due to a bug in the program code, in the input data parsing procedure.

8.5 Heuristic algorithms for the CarSP

8.5.1 Greedy heuristics by Gottlieb et al.

Gottlieb et al. (2003) proposed a set of greedy heuristics for the CSPLib CarSP. These constructive heuristics start building a solution from an empty sequence. At each step one feasible group index is appended to the current partial sequence; the chosen group minimises the total number of new violations caused by the extension.

In order to properly define this heuristic, the notion of a partial sequence of groups (a partial solution) has to be defined. Informally speaking, it is a sequence of groups which occupy only some initial, consecutive positions of a complete sequence.
Formally, it is a sequence s_p of length l, where 0 ≤ l ≤ N, of the form s_p = (s_p,1, s_p,2, . . . , s_p,l). Let |s_p| = l denote the length of this partial sequence and s_p = () denote an empty sequence. Let also s_p · g be sequence s_p extended at the end (concatenated) with group g, thus creating a sequence with one more car.

The notion of the number of violations of a ratio constraint RC_j may easily be extended to a partial sequence s_p if we assume that the undefined positions of s_p (from |s_p| + 1 up to N) are filled with ‘dummy’ cars. Such a completed current-day sequence may then be unambiguously extended to the previous and the following day, giving s′_p. Hence, the number of violations of RC_j in s_p is computed as:

\[ VN_j(s_p) = VN_j(s'_p) = \sum_{i=-D_j+2}^{|s_p|} vn_j(s'_p, i) \]

The number of new violations of all ratio constraints caused by the extension of s_p with a car from group g is defined as:

\[ \Delta VN(s_p, g) = \sum_{j=1}^{O} \left( VN_j(s_p \cdot g) - VN_j(s_p) \right) \]

and this is the main heuristic function guiding the greedy constructive algorithm.

However, Gottlieb et al. noticed that the set of groups which minimised the new violations usually contained more than one group, resulting in a tie. Thus, they defined a set of additional tie-breaking helper functions.
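Putting the pieces together, the greedy heuristics of this section can be sketched as follows; delta_vn and tie_break stand for ΔVN and one of the tie-breaking helper functions defined next (e.g. DSU or DHU). Whether the highest or the lowest helper value wins a tie is assumed here for illustration, and any hard-constraint feasibility filter is omitted.

    def greedy_construct(groups, n_g, delta_vn, tie_break):
        """Greedy constructive heuristic: append the group causing the
        fewest new violations; break ties with a utilisation-rate helper."""
        s, avail = [], dict(zip(groups, n_g))
        while any(avail.values()):
            cands = [g for g in groups if avail[g] > 0]
            best = min(delta_vn(s, g) for g in cands)
            tied = [g for g in cands if delta_vn(s, g) == best]
            g = max(tied, key=lambda t: tie_break(s, t))  # assumed: highest wins
            s.append(g)
            avail[g] -= 1
        return s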
The two best performing ones were based on the notion of a (dynamic) utilisation rate of an option j in a partial solution s_p. The definition of such a rate requires several other objects to be defined first.

Let used(s_p) be a vector of length G describing how many cars of each group are used in s_p: used(s_p) = (n′_1, n′_2, . . . , n′_G). Of course, ∀g ∈ {1, . . . , G}: 0 ≤ n′_g ≤ n_g, meaning that no more cars of a group may be used than the total number of cars in this group. Moreover, ∀g ∈ {1, . . . , G}: n′_g = |{i ∈ {1, . . . , |s_p|} : s_p,i = g}|, i.e. the component n′_g represents the number of cars of group g in s_p. It will also be denoted used_g(s_p). For an empty sequence s_p = () we have used_g(()) = 0 for all g.

The converse notion, the vector of cars still available given some partial sequence s_p, may be defined as: avail(s_p) = (n_1, n_2, . . . , n_G) − used(s_p). Similarly, avail_g(s_p) = n_g − used_g(s_p).

The number of available cars requiring option opt_j given some partial solution s_p is defined as:

\[ r_j(s_p) = \sum_{g=1}^{G} opt_j(g) \cdot avail_g(s_p) \]

and the (dynamic) utilisation rate of option opt_j (ratio constraint RC_j) in the cars remaining after s_p is given by:

\[ utilRate(j, s_p) = \frac{r_j(s_p) \cdot D_j}{N_j \cdot (N - |s_p|)} \]

This function was called dynamic in contrast to a static one, which does not depend on s_p and was simply defined by Gottlieb et al. (2003) as utilRate(j, ()).

Given these definitions, the two most important tie-breaking functions may be defined as:

\[ DSU(s_p, g) = \sum_{j=1}^{O} opt_j(g) \cdot utilRate(j, s_p) \]

which is the dynamic sum of utilisation rates of group g given partial sequence s_p, and:

\[ DHU(s_p, g) = \sum_{j=1}^{O} opt_j(g) \cdot 2^{rank(j)} \]

which is the dynamic highest utilisation rate of group g given s_p, where rank(j) = r for the option j with the rth smallest utilRate(j, s_p).

Gottlieb et al. studied empirically the performance of these two heuristics, and 4 others, on two sets of CSPLib instances. This study revealed that DSU and DHU were the best for the harder instances with cars requiring many options (overconstrained instances). While these heuristics were worse on the second set of easier (satisfiable) instances, neither of them performed badly, being generally the second and the third performer. The best one for this set, called the dynamic even distribution heuristic (DED), was on the other hand worse for the harder instances.

8.5.2 Insertion heuristic by Ribeiro et al.

Ribeiro et al. (2005) proposed a constructive heuristic for Renault’s CarSP. It was used as a part of their iterated local search algorithm (see section 8.6.2). The heuristic starts with an empty partial solution and in each step inserts one car into the sequence, until all cars are inserted. At each step, the car to be put in the sequence is chosen randomly from the set of available cars. The insert position is chosen heuristically, based on the change of the objective function induced by the potential insert. The position with the best overall change (possibly negative) is chosen. Insertions leading to infeasible solutions are discarded.

8.6 Metaheuristic algorithms for the CarSP

8.6.1 Local search by Gottlieb et al.

Gottlieb et al. published two papers on local search for CarSPs (Puchta & Gottlieb 2002, Gottlieb et al. 2003). They considered several types of neighbourhood operators there:

• Insert: removes a group index from some position and puts it back in some other one. The subsequence in between is shifted by one position. Neighbourhood size: O(N²).

• Swap: exchanges two different group indexes at least 2 positions apart. Size: O(N²).

• SwapT: exchanges two different adjacent group indexes. This is also called a transposition. Size: O(N).
• SwapS: exchanges two different groups like Swap, but only if they are similar enough. This is measured by the Hamming distance between the vectors of options: it is required that the distance between the exchanged groups g_1, g_2 satisfies 1 ≤ d_H(g_1, g_2) ≤ 2. Neighbourhood size: O(N²), but in practice it may be much smaller.

• Lin2Opt: inverts a subsequence of at least 2 elements. Size: O(N²).

• Random: randomly shuffles a chosen subsequence of at least two elements. Size: unknown.

In order to speed up local search they employed the well-known technique of computing only the change of the objective function when evaluating a move (the incremental update scheme, see section 4.6.5). This was especially important for moves that affect only small parts of a solution, like SwapT or SwapS. Additional acceleration of LS was obtained by restricting moves to smaller parts of the whole sequence. Puchta & Gottlieb set the restriction to N/20, meaning that e.g. swapped cars could be at most N/20 positions apart. Gottlieb et al. increased this limit to N/4. No rationale for this decision was given, though.

Results presented by Puchta & Gottlieb (2002) are difficult to discuss, since they were obtained for a different version of the CarSP (see section 8.2.1). However, the same LS was employed for the CSPLib CarSP by Gottlieb et al. (2003). They reported results comparing the designed iterated local search, iterated heuristics and an ant colony optimisation algorithm (ACO). The algorithms were initialised either with a random permutation or with one of their dynamic greedy heuristics: DSU, DHU, DED (see section 8.5.1).

For the easier (satisfiable) instances, the results showed that most of the tested algorithms were able to solve them to optimality. These algorithms were: iterated DSU/DHU, ACO DSU/DHU, LS DED and sometimes randomly initialised LS. On the other hand, the ACO with random initial solutions was significantly worse than the LS and the iterated heuristics. This result indicates that a good initial solution was crucial for Gottlieb et al.’s (2003) algorithms, especially the ACO.

For the harder instances from CSPLib, ACO DSU/DHU was the best among the presented algorithms. Iterated DSU/DHU and LS DED were only slightly worse, though. However, LS DSU/DHU was not even reported, although it would most probably be the best combination of local search with a greedy heuristic. This is at least odd, since a fair comparison of the algorithms called for the combination of LS with DSU/DHU. The potential of this combination is visible in the presented results: for some instances iterated DSU/DHU performs better than the presented LS DED. Moreover, in longer runs the presented LS DED performed as well as the ACO. On top of that, a comparison of success rates (the percentages of runs in which best-known solutions were found) reveals that LS DED is indiscernible in quality from ACO DSU/DHU, and that these algorithms generate best-known solutions in almost every run.

To summarise the results of Gottlieb et al., it seems that the proposed local search was a good design for the CarSP, indeed not worse than their ant colony optimisation. The results showed the importance of a good initial solution for LS, either DSU or DHU.

Which move operator should be used in local search for the CarSP? It is not clear from the results of Gottlieb et al.
They used 4 structurally different types of moves (6 operators in total) with equal probabilities of application and did not provide any answer to this question. Nevertheless, they mention that ‘it is well known for some combinatorial optimisation problems that not all information types are relevant’ (Puchta & Gottlieb 2002). They recall the case of the TSP, where edges are the important element of a solution. Further, they say that in their CarSP ‘several information types are relevant’. They mention adjacency and absolute positions for their more involved version of the CarSP, but do not verify this hypothesis in any way.

8.6.2 Iterated local search by Ribeiro et al.

The method designed by Ribeiro et al. (2005) for the initial stage of the ROADEF challenge is an iterated local search algorithm (ILS). It maintains a single solution that is being modified, plus the best solution found so far. The algorithm uses two neighbourhood operators in local search: swap and insert. It is not stated whether the LS procedure is greedy or steepest, or some more involved one. Two additional neighbourhood operators are used in the perturbation (mutation) procedures:

• group exchange: exchanges two maximal subsequences of cars with the same colour;

• group reinsertion: a maximal subsequence of one colour is removed from a solution and the solution is made complete again by the constructive heuristic.

The algorithm is initialised with the constructive heuristic described above (section 8.5.2). This solution is then improved by swap local search and the main ILS loop starts. The loop usually iterates only two steps: a perturbation of the current solution by a random group exchange, followed by swap local search. If the result of the local search is not worse than the current solution, the latter is updated.

In case the best solution found so far is not improved for 200 consecutive iterations of the main loop, an intensification step is launched: it uses insert local search to improve the current solution. Moreover, if the best solution found so far is not improved for 1000 iterations, a diversification step is performed: it perturbs the current solution with a random group reinsertion move, followed by swap local search. When the given time limit is reached, the algorithm returns the best solution found so far.

This method was ranked 1st in the initial stage of the challenge. On 16 Renault instances from set A it generated very good results. For several of them it provided results which remained the best even in the final stage: 3 instances of type WD=60 and 2 of type WD=30. A possibly improved version of this algorithm was ranked 2nd in the final stage of the challenge.

8.6.3 Local search and very large neighbourhood by Estellon et al.

Although Estellon et al. (2006) considered in their paper only the CSPLib version of the CarSP, they mentioned that their algorithm was the basis for their success in the ROADEF Challenge 2005.

Local search

They called their LS procedure a very fast local search (VFLS). It starts from a solution of the DSU heuristic. What follows is a greedy algorithm in which the first found neighbour not worse than the current solution is accepted. The algorithm uses 3 types of neighbourhood operators: swap, insert and reflection (called Lin2Opt by Gottlieb et al. (2003)). In all these operators a particular move is specified by picking two positions in a solution, i and k.
8.6.3 Local search and very large neighbourhood by Estellon et al.

Although Estellon et al. (2006) considered in their paper only the CSPLib version of the CarSP, they mentioned that their algorithm was the basis for their success in the ROADEF Challenge 2005.

Local search

They called their LS procedure a very fast local search (VFLS). It starts from a solution of the DSU heuristic. What follows is a greedy algorithm in which the first found neighbour which is not worse than the current solution is accepted. The algorithm uses 3 types of neighbourhood operators: swap, insert and reflection (called Lin2Opt by Gottlieb et al. (2003)). In all these operators a particular move is specified by picking two positions in a solution, i and k. Estellon et al. said that for i and k 'clever choices are necessary' and proposed several strategies for making this choice:

• generic: choose i and k randomly;
• consecutive: choose i randomly and set k = i + 1;
• similar: the cars at the chosen positions share some options (are similar);
• denominator: choose position i and some option j randomly, and set k = i + Dj.

These strategies are applied with different probabilities for each operator. The probabilities seem to be specially tailored to the problem, because they are rather nonuniform and given with one decimal digit of precision: for example, the generic swap is attempted 69.6% of the time.

The local search is accelerated by the use of dedicated data structures. Estellon et al. did not give details, but said that the structures help to exploit the strong locality of the used transformations. In effect, the local search is sped up approximately 10 times. The authors of the VFLS conclude: 'the efficiency of this approach relies on the huge number of attempted transformations'.

Large neighbourhood of k-permutation

On top of the local search procedure, Estellon et al. considered an integer linear programming formulation of the CarSP. In this context they made an interesting discovery about an operator with a very large neighbourhood, which in many practical cases may be examined for its best element in polynomial time. The operator is called k-permutation and permutes cars assigned to k different positions in a solution. In the general case, given k different cars at k positions, the number of possible modifications of a solution with this operator in exactly this configuration is equal to k!. Moreover, the number of all possible configurations of the chosen positions is $\binom{N}{k}$. This is practically prohibitive for larger values of k (say, 10 or more). However, Estellon et al. proved that if these positions are distant enough from each other, the optimal assignment of the considered cars to the positions may be found in polynomial time. Assuming that $D_{max} = \max_{j=1}^{O} D_j$, each permuted car has to be no less than Dmax positions away from any other one. In this case the problem to be solved becomes the linear assignment problem (LAP), which belongs to class P.

There remains one decision to be made: the choice of positions for the k-permutation move. The authors of this approach recommended choosing some of them randomly and completing the set with positions where violations of RCs appear.
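Because the chosen positions are pairwise at least Dmax apart, the cost of placing a car at one position does not depend on the placement of the other permuted cars, which is exactly what makes the move a LAP. A minimal sketch of such a move, using the Hungarian-type solver available in SciPy, could look as follows; the callback assignment_cost(car, position) is an assumption standing in for the evaluation of one car at one position.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def k_permutation_move(seq, positions, assignment_cost):
        """Optimal reassignment of the k cars at `positions` via the LAP,
        in the spirit of Estellon et al.'s k-permutation."""
        cars = [seq[p] for p in positions]
        cost = np.array([[assignment_cost(car, p) for p in positions]
                         for car in cars])
        rows, cols = linear_sum_assignment(cost)  # polynomial-time LAP solver
        new_seq = list(seq)
        for r, c in zip(rows, cols):
            new_seq[positions[c]] = cars[r]
        return new_seq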
Estellon et al. made this LAP-based k-permutation another neighbourhood operator in their local search. It is attempted with a probability of 0.2%, so it is used extremely rarely. But the authors said that when attempted, the k-permutation very often improved the current solution; at the same time, they noted, it contributed to the diversification of the search.

Results

The results of this method were impressive. VFLS alone proved to be very efficient indeed. It generated the best-known results for the largest CSPLib instances (200–400 cars) in approximately one minute, on average. It even improved the best-known solutions of 3 instances. After introducing the LAP-based k-permutation into VFLS, the algorithm accelerated by 15–20% and the number of LS iterations decreased by approximately 60%. This means that the large neighbourhood indeed contributed enormously to the chance of success of a single LS move, although it also consumed a large part of the saved computation time.

As mentioned earlier, the design based on this method won the ROADEF Challenge 2005. The results labelled 'winner' in table 8.2 were generated by the algorithm developed by Estellon et al.

8.6.4 Generic genetic algorithm by Warwick and Tsang

Probably the first evolutionary algorithm for a CarSP was proposed by Warwick & Tsang (1995). It was actually a memetic algorithm, which employed local search (hill-climbing) instead of mutation. With respect to other properties, it was a generational MA with elitism: 10% of the best solutions in the population always survived selection.

As representation they used a sequence of indexes of groups, exactly equivalent to the form of solution s given earlier. This was further extended with a supplementary binary string b of length N, in order to enable their uniform adaptive crossover (UAX). An example of such a representation is given in figure 8.1: there is a sequence of groups p0 and its related binary string b0. These additional strings are generated randomly during initialisation and processed exclusively by the proposed crossover.

Warwick & Tsang (1995) say that 'the engine of GAcSP [(MA)] is a crossover operator (UAX) that remembers valuable crossover points in order to aid in retention of useful building blocks that may be separated in the string representation'. These crossover points are stored exactly in the additional binary strings. Therefore, it is good to know how the operator works. The pseudocode of UAX is given in algorithm 19.

Algorithm 19 (o, b) = UAX(p0, b0, p1, b1)
  o = b = (0, 0, . . . , 0) {start with empty results}
  draw random a ∈ {0, 1} {index of the first active parent}
  aprev = a {the previous active parent: active in the last assignment}
  i = 1 {first copied position}
  while (i ≤ N) do
    while ((b0,i ≠ b1,i) or (a ≠ aprev)) and (i ≤ N) do
      oi = pa,i {copy the car group index}
      bi = ba,i {copy the associated bit}
      aprev = a {remember the donor of the last assignment}
      i = i + 1 {advance in the sequence}
    a = (1 − a) {switch the active parent}
  repair o
  return (o, b)

The authors of UAX give an example of how it operates. Let us assume that the parents p0, p1 and their binary strings b0, b1 are given as in figure 8.1, and that the first active parent is p0 (a = 0). The offspring then iteratively inherits its values of o and b from the active parent, starting from position i = 1. The active parent is switched whenever the corresponding bits b0,i and b1,i match. In the example this means that the offspring receives positions 1–3 from p0, 4–5 from p1, 6–7 again from p0 and finally 8–10 from p1.

position:   1 2 3 4 5 6 7 8 9 10
p0 = (1 2 3 4 2 5 4 1 3 5)    b0 = (0 1 0 0 1 1 0 0 1 0)
p1 = (2 1 3 2 4 1 5 3 4 5)    b1 = (1 0 1 0 0 1 1 0 0 1)
o  = (1 2 3 2 4 5 4 3 4 5)    b  = (0 1 0 0 0 1 0 0 0 1)

Figure 8.1: Example of UAX: parents p0, p1 with binary strings b0, b1 and their offspring o with string b.

The UAX result may be infeasible: some groups may be overrepresented and others underrepresented. Therefore, a repair procedure is always invoked afterwards, and it may be perceived as a part of UAX. The repairer randomly finds an overrepresented value (position i), then finds the first underrepresented value in the following subsequence (position l > i). The value of oi is set to ol and the process continues for increasing positions i. When the solution is made valid, the process stops. If it is not, and the end of the solution is reached, the process is restarted. The repairer does not modify the accompanying binary string.
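A runnable rendering of algorithm 19 is given below (0-based indices). The repair shown here is a simplified variant of the procedure described above: it overwrites overrepresented group indexes with underrepresented ones until the counts match the demand, whereas the original scans for them in a more involved, position-based way. The argument target, mapping a group index to the required number of its cars, is an assumption of this sketch.

    import random
    from collections import Counter

    def uax(p0, b0, p1, b1, target):
        n = len(p0)
        o, b = [None] * n, [None] * n
        parents, bits = (p0, p1), (b0, b1)
        a = random.randint(0, 1)      # index of the first active parent
        a_prev = a
        i = 0
        while i < n:
            while i < n and (bits[0][i] != bits[1][i] or a != a_prev):
                o[i], b[i] = parents[a][i], bits[a][i]
                a_prev = a            # remember the donor of the last assignment
                i += 1
            a = 1 - a                 # bits matched: switch the active parent
        repair(o, target)
        return o, b

    def repair(o, target):
        """Simplified repair; the binary string is deliberately left untouched."""
        counts = Counter(o)
        deficit = []
        for g, need in target.items():
            deficit.extend([g] * max(0, need - counts[g]))
        random.shuffle(deficit)
        for i, g in enumerate(o):
            if counts[g] > target.get(g, 0) and deficit:
                counts[g] -= 1
                o[i] = deficit.pop()
                counts[o[i]] += 1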
Local search uses a swap move to improve the offspring. The first element to be swapped is one with a high number of violations (low fitness). The swapped counterpart is the one which causes the maximum gain in the objective function. This results in a steepest LS. However, this component is optional in the whole algorithm: it is launched only until a given limit on the total local search time is reached.

8.6.5 Genetic algorithm by Terada et al.

Terada et al. (2006) proposed a genetic algorithm for the CarSP and compared it with a series of algorithms called 'squeaky-wheel optimization' (SWO). Their GA was a steady-state algorithm with 50 individuals, tournament selection, elitist replacement and without local search.

These authors employed an indirect representation with two levels of decoders. Their chromosome was a sequence of length N of floating-point numbers. A car was represented by a position in this sequence, while the entry at that position was the priority of the car. The first decoder was simply a sorting procedure which established some order of cars based on their priorities. Then, most probably, a second decoder was invoked (this is not completely clear from the given description). This decoder was a polynomial-time construction algorithm that created an actual solution (a sequence of cars) based on the intermediate sequence. It iteratively added the identifiers of cars to a solution in the given order, one at a time, by finding the earliest possible position in which the car would not cause any violation of constraints. If such a position could not be found, the resulting solution had some positions left unfilled and an invalid solution was constructed.
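As far as it can be reconstructed from this description, the two-level decoding could look like the sketch below. The constraint test causes_violation(partial, pos, car) is an assumed callback, and whether a larger priority means earlier placement is a guess; the sketch is illustrative only.

    def decode(priorities, causes_violation):
        """Two-level decoder in the spirit of Terada et al.: sort cars by
        priority, then place each at the earliest non-violating position."""
        n = len(priorities)
        order = sorted(range(n), key=lambda car: priorities[car], reverse=True)
        partial = [None] * n
        for car in order:
            for pos in range(n):
                if partial[pos] is None and not causes_violation(partial, pos, car):
                    partial[pos] = car
                    break
            # if no feasible position exists, the position stays unfilled and
            # the decoded solution is invalid, as in the text
        return partial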
The genetic algorithm employed crossover, namely the well-known single-point operator. Mutation was not used, because the authors noticed it had not improved results in any setup. They considered only random mutation, though.

The algorithms presented by Terada et al. were evaluated on the smaller sets of CSPLib instances, with 100 and 200 cars. The basic experiment compared all the proposed algorithms with respect to the percentage of runs in which optimal solutions were found (most of these instances were known to have at least one solution without violations). All algorithms were allowed to run for 1000 iterations, but no computation times were reported. This basic experiment revealed that the GA was the best of all 9 tested algorithms. It was also the best when some hardest instances were selected for another comparison. In yet another special experiment the authors allowed all algorithms to run longer on the hardest instances, for 5000 iterations. In this case one version of SWO was best, but it was the one that had most in common with the GA: it used a population of solutions and the same crossover operator. Moreover, since the computation times were not given, it cannot be said that in this special experiment the SWO was indeed better given the same computational resources. Finally, all other 6 versions of SWO were worse than the GA; some of them even worse than random search followed by decoding.

Terada et al. compared their general results to those of Warwick & Tsang (1995) on the same set of instances. This comparison showed that the GA described here performed significantly worse than the MA of Warwick & Tsang. Terada et al. report that they could improve the results to a comparable level by means of some 'domain-specific post-processing step', but except for the name of the step no detail is given.

To summarise, the best algorithm proposed by Terada et al., the genetic algorithm, was worse than the one of Warwick & Tsang. It therefore seems that the combination of a two-level indirect representation, single-point crossover and no local search was a poor design for the CarSP. For this reason alone it is not worth implementing the GA for comparison with the algorithms developed for this thesis. Besides, Terada et al.'s (2006) description of some elements of their algorithm is vague, which makes a proper implementation of this algorithm hardly possible.

8.6.6 New crossover operators by Zinflou et al.

Zinflou et al. (2007) noticed that genetic algorithms were rarely used for solving the car sequencing problem. They explained the situation 'by the difficulty of defining specific and efficient genetic operators for the problem. In fact, traditional genetic operators (. . . ) cannot deal adequately with the specifics of car sequencing'. Therefore, they proposed 3 new recombination operators for the CSPLib CarSP: Interest Based Crossover, Uniform Interest Crossover and Non Conflict Position Crossover (NCPX). These operators were tested in a standard, generational GA without local search. However, some neighbourhood operators were employed as mutation: swap, reflection (Lin2Opt), shuffle (random) and displacement (insert). Overall, on the CSPLib instances the best performer was NCPX, so only this operator is presented below.

The Non Conflict Position Crossover is based on the notion of a conflict position. A position i in solution s is said to be a conflicting one if for any ratio constraint (option) there exists a constraint window covering this position in which some violations of the RC occur (Zinflou 2008). More precisely, for position i ∈ {1, . . . , N} there is a conflict in s, conf(s, i) = 1, if:

∃j ∈ {1, . . . , O} ∃k ∈ {i − Dj + 1, . . . , i} : vnj(s′, k) > 0

Otherwise conf(s, i) = 0.
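The definition translates directly into a test such as the following sketch (0-based indices). Here vn[j][k] is assumed to store the number of violations of ratio constraint j in the window starting at position k, mirroring vnj(s′, k) from the formula.

    def conflict(vn, i, D):
        """conf(s, i) after Zinflou's definition: position i is conflicting if
        some RC window covering it contains violations."""
        for j, d_j in enumerate(D):
            for k in range(max(0, i - d_j + 1), min(i + 1, len(vn[j]))):
                if vn[j][k] > 0:
                    return 1
        return 0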
NCPX works on two parent solutions. In the first stage, the recombination operator counts in the first parent all the positions without conflicts. Then a random number numCopy1 of such positions is drawn, together with a random position in the parent. Starting from that position, the first numCopy1 non-conflicting positions of the first parent are copied directly to the same positions in the offspring.

In the second stage, NCPX starts by gathering from the first parent the groups at positions which were not copied to the offspring. These groups establish the vector of yet available cars. These cars are used to fill the remaining positions in the offspring. A random position which is not yet filled with a group index is drawn. The offspring completion process starts from this position and continues for the subsequent empty ones; if necessary, it also continues from the beginning of the offspring, until the offspring is made complete. The group index of a car to be inserted at a particular position is chosen from the vector of currently available cars in a hybrid random-heuristic manner. With probability 0.05 the choice is made randomly; in such a case a group is chosen with probability proportional to the number of available cars of this group. Otherwise, the group g to be inserted at position i of an incomplete solution sp is the one which maximises the heuristic function I:

I(sp, i, g) = −∆VN(sp, i, g)   if ∆VN(sp, i, g) > 0
I(sp, i, g) = DSU(sp, g)       otherwise

The notation has been somewhat abused here. It means, however, that before assigning a group to position i in an incomplete solution, the number of new violations of ratio constraints due to that insertion has to be computed. Only if there are no new violations does the heuristic evaluation by the DSU function matter. In case several groups maximise I, the tie has to be broken. If the same position is non-conflicting in the second parent and is occupied there by a group with maximum I, this group is copied to the offspring. Otherwise, a random group with maximum I is chosen.

One may notice that this NCPX operator generates an offspring mainly based on the contents of one parent. In the first stage, some non-conflicting positions are copied from this parent. In the second stage the remaining positions are filled to some small extent randomly, to a large extent heuristically, and finally, in some unknown percentage of cases, they are taken from the second parent. This unknown percentage depends on how many times different groups induce equal values of I while at the same time there is no conflict at the considered position in the second parent. At first glance, the chance that the generated offspring inherits anything from the second parent seems small. Therefore, it appears as though NCPX is in fact some large heuristic mutation of the first parent, rather than a recombination operator based on two parents. Without further experimental data this issue remains unclear, though.
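The hybrid random-heuristic choice can be sketched as follows. The callbacks delta_vn(sp, i, g) and dsu(sp, g) stand for ∆VN and the DSU evaluation and are assumptions of this sketch; the tie-break by the second parent is omitted for brevity.

    import random

    def choose_group(sp, i, available, delta_vn, dsu):
        """Choice of the group inserted at position i in NCPX (sketch)."""
        groups = [g for g, count in available.items() if count > 0]
        if random.random() < 0.05:
            # random branch: probability proportional to the available cars
            weights = [available[g] for g in groups]
            return random.choices(groups, weights=weights, k=1)[0]
        def interest(g):                      # the heuristic function I
            d = delta_vn(sp, i, g)
            return -d if d > 0 else dsu(sp, g)
        best = max(interest(g) for g in groups)
        return random.choice([g for g in groups if interest(g) == best])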
8.7 Summary

The review presented above allows some conclusions to be formed about good designs for CarSPs.

Importance of local search
It seems that local search is the very foundation of efficient algorithms for such problems. Indeed, the best algorithms rely heavily on this method: the designs of Estellon et al. (2006), Ribeiro et al. (2005) and Puchta & Gottlieb (2002). Moreover, Warwick & Tsang (1995) used it in their MA; Zinflou et al. (2007) employed neighbourhood operators as mutation and planned to include local search in their EA to improve efficiency. On the other hand, Terada et al. (2006) did not implement any LS, and their GA design was worse than the MA of Warwick & Tsang published 10 years earlier. Finally, the comparison of pure ACO algorithms with pure LS by Gottlieb et al. (2003) revealed the LS to be not worse than ACO.

Generic neighbourhood operators
The reviewed algorithms all use some subset of one set of neighbourhood operators: swap, insert, reflection, random shuffle. These are not special-purpose operators defined exclusively for CarSP problems. Rather, they are generic operators that may be found employed for different problems, especially those concerned with permutations, like the TSP (Merz 2000, Hoos & Stutzle 2004), the CVRP (see section 6.4), flowshop scheduling (Reeves 1999, Hoos & Stutzle 2004) and the QAP (Hoos & Stutzle 2004). The k-permutation operator of Estellon et al. seems to be an exception to this rule. However, it is used very rarely by its authors (0.2% of all evaluated neighbours) and in fact does not have a major impact on the results: the average quality of solutions remains the same, while the algorithm speeds up by 15–20%. Thus, the real strength of VFLS lies in the generic operators.

It is not exactly known which generic operator, or combination of operators, works best for CarSPs and why. The arguments for using them seem to be the strong locality of the transformations and good practical results on benchmarks. The excellent results of Estellon et al. and Ribeiro et al. seem to indicate that swap is the key operator, but no theoretical foundation for this choice exists, as far as the author knows. In particular, he has not seen any published ruggedness analysis of these operators, which could shed some light on the matter.

Fast evaluation of neighbours
Rather, the speed of evaluation of neighbours seems to be the key issue when choosing an operator, and this conclusion agrees with the guidelines inferred from other problems (see section 4.6). The mentioned strong locality of the operators plays an important role here. Indeed, Puchta & Gottlieb (2002) speed up their algorithm by using the incremental update scheme and also by restricting the neighbourhood to only small parts of a sequence. Estellon et al. argue that fast evaluations are the key success factor; they employ special data structures to accelerate the evaluations.

Good constructive heuristic
Finally, what is common to some of these algorithms is their use of the good heuristic idea of Puchta & Gottlieb (2002): the dynamic sum of utilities (DSU). It is used by its authors, but also by Estellon et al. On top of that, Zinflou et al. employ DSU as a heuristic guiding insertions in their NCPX operator.

Beyond local search?
What other metaheuristics may be used to efficiently solve CarSPs? How should they be adapted to the problem? The best algorithms did not go far from the basic LS idea. Ribeiro et al. simply iterated it with an additional perturbation or LS phase. Estellon et al. found another, large neighbourhood which may be searched efficiently. There seems to be no good design of a different kind.

Recombination operators
This state of affairs also concerns evolutionary algorithms for CarSPs, especially when one considers their most 'evolutionary' part: recombination operators. Very recently Zinflou et al. (2007) stated that it was difficult to find good problem-specific recombinations for CarSPs. The author of this thesis agrees with their observation. Currently, only several recombination designs are available for CarSPs: single-point crossover, UAX and the 3 operators of Zinflou et al. Single-point crossover is the most classical operator in EAs. It was not designed for CarSPs, but for problems with binary representation. It frequently produces infeasible sequences of cars, and computational results indicated it was evidently worse than UAX. Similarly, UAX is the uniform crossover proposed for binary problems and adapted to the CSPLib CarSP by adding a helper binary string; it has to be followed by a repair procedure in order to generate feasible offspring. Finally, NCPX was designed especially for the CSPLib CarSP in order to fill the gap in recombinations. It was based on intuition and the good heuristic idea of DSU. However, it is not even clear whether it is indeed a recombination operator and not a macromutation. The other operators of Zinflou et al. (2007) were also based on ideas from other problems: partially-mapped crossover and uniform crossover. It appears, therefore, that intuition, good heuristics and experience from other problems were the basis for the existing recombination designs.
No author has tried to verify, theoretically or empirically, what kind of information should be preserved or changed by a recombination operator for CarSPs. That is why the next chapter presents an attempt to find what kind of information is important in good CarSP solutions, by means of fitness-distance analysis. The results of this analysis are further used as a basis for recombination design. Finally, the computational comparison of these operators with UAX and NCPX will indicate whether this systematic design of recombination operators is indeed a good guideline that may lead a designer of a metaheuristic for the considered CarSP beyond local search.

Chapter 9

Adaptation of the memetic algorithm to the car sequencing problem based on fitness-distance analysis

9.1 Representation

Only one representation was considered for the CarSP: a sequence (vector, array) of group indexes. In this representation each solution is a vector containing:

• indexes of groups of previous-day cars (never modified nor moved), followed by
• indexes of groups of current-day cars.

The order of elements in this vector reflects the order in which cars are put on the production line. During computation only indexes of groups are processed. Grouping of cars is performed while reading the input data. If necessary, the generated solution is translated into a sequence of original car identifiers at the end of computation. The indexes of previous-day groups are included in each solution for technical reasons: they simplify the computation of violations of ratio constraints.

9.2 Fitness function and constraints

The fitness function for the designed MA is the original objective function of the CarSP. All designed algorithms manipulate feasible solutions only; infeasible ones are not accepted at any state of computation.

9.3 Local search

The local search designed for the CarSP allows no compositions of neighbourhoods: each operator is implemented as a separate LS process. Local search accepts moves without a change of the objective function (neutral moves).

9.3.1 Insertion of a group index

The operator removes a group index from some position i and inserts it at another position, l. When an index is removed, all successive indexes are moved one position back. Then, when an index is inserted at position l, the index at position l and all successive indexes are moved one position forward. Formally, this move transforms solution s into its neighbour sn:

s  = (s1, s2, . . . , si−1, si, si+1, . . . , sl−1, sl, . . . , sN)
sn = (s1, s2, . . . , si−1, si+1, . . . , sl−1, si, sl, . . . , sN)

The size of the generated neighbourhood is O(N^2).

In certain circumstances infeasible solutions may be generated by this operation:

• when removing si, if the colour of si−1 and si+1 is the same and the colour of si is different; then two subsequences of the same colour, adjacent to position i, are merged;
• when inserting si, if the colour of si is the same as that of sl−1 or sl (or both); then at least one subsequence of the same colour is made longer.

These conditions may be tested in constant time given a helper vector FullColSeqLen(i) of length N. This vector stores at position i the length of the subsequence of cars of the same colour as i which contains this position i.
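The move itself and the first of the two feasibility tests can be sketched as follows (0-based indices). This is an illustration of the definitions above, not the thesis implementation; col is assumed to map positions to colours.

    def insert_move(s, i, l):
        """The insert (shift) move: the index at position i is removed and
        reinserted just before the element originally at position l."""
        s = list(s)
        g = s.pop(i)
        s.insert(l if l < i else l - 1, g)
        return s

    def removal_merges_colour_runs(col, i):
        """First infeasibility condition: removing position i would glue two
        neighbouring subsequences of one colour."""
        return 0 < i < len(col) - 1 and \
               col[i - 1] == col[i + 1] and col[i - 1] != col[i]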
The actually implemented local search is a mixture of the greedy and steepest approaches. Firstly, the procedure chooses one removal position i in a random manner (the greedy approach). The index at this position is then removed. Secondly, the local search chooses an insertion position l for the removed index. The best insertion position is chosen among all possible ones (the steepest approach). If the insertion at the best position results in an overall improvement of the objective function, taking into account the changes due to both the removal and the insertion, the shift move is performed.

Since the size of the neighbourhood defined by the shift operator is relatively large (approximately 10^6 for N = 1000), a significant effort has been made to improve the efficiency of the algorithm. First of all, the neighbour solutions are not constructed explicitly; only the modifications of the objective function caused by particular moves are evaluated (as described in section 4.6.5). In this case, since each group index may be inserted at about 1000 positions, most of the CPU time is consumed by the search for the best insertion position, while the time of evaluation of removals is practically negligible. In addition, the CPU time is consumed mainly by the evaluation of changes of ratio constraint violations. Thus, the effort has been focused on efficient evaluation of group index insertions from the point of view of RCs.

Assume that the insertion of an index at position l is evaluated with respect to ratio constraint j, defined by ratio Nj/Dj. It may modify the number of active cars associated with this ratio constraint in the constraint windows starting at positions {l − Dj + 1, . . . , l}. Therefore, only these modified windows have to be taken into account. Furthermore, the evaluation of insertion at position l + 1 may be accelerated further by taking into account the results calculated for position l: insertion at position l modifies the windows starting at positions {l − Dj + 1, . . . , l}, while insertion at position l + 1 modifies the windows starting at {l − Dj + 2, . . . , l + 1}. Thus, it is enough to exclude from the calculations the change due to the window starting at position l − Dj + 1 and add the change due to the window starting at position l + 1.

The efficiency of local search is further improved by storing information about the modifications of ratio constraint violations at particular positions. For each insertion position l and ratio constraint j, the evaluation of the insertion of an index of a group both active and not active on this constraint is stored when calculated for the first time. When the insertion of the next indexes is considered, these stored values are reused. Of course, the stored values are cleared in a range of positions defined by Dj when a move is performed.

Insert was one of the operators considered earlier by Puchta & Gottlieb (2002) (see section 8.6.1). For the purpose of the challenge it was designed and implemented by Andrzej Jaszkiewicz.
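The sliding evaluation of insertions at consecutive positions can be sketched as follows. The callback window_delta(start), returning the violation change in the window starting at a given position, is an assumption; the sketch also assumes the caller clamps l_min and l_max so that all needed window starts are valid.

    def insertion_deltas(window_delta, l_min, l_max, d_j):
        """Deltas for insertions at positions l_min..l_max for one ratio
        constraint with window length D_j; the value for l + 1 reuses the
        value computed for l, as described above."""
        value = sum(window_delta(s) for s in range(l_min - d_j + 1, l_min + 1))
        deltas = {l_min: value}
        for l in range(l_min, l_max):
            value -= window_delta(l - d_j + 1)   # window no longer affected
            value += window_delta(l + 1)         # newly affected window
            deltas[l + 1] = value
        return deltas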
9.3.2 Swap of two group indexes

This operator exchanges (swaps) group indexes between two positions i and l in the sequence. Formally, it transforms solution s into its neighbour sn:

s  = (s1, s2, . . . , si−1, si, si+1, . . . , sl−1, sl, sl+1, . . . , sN)
sn = (s1, s2, . . . , si−1, sl, si+1, . . . , sl−1, si, sl+1, . . . , sN)

The size of the generated neighbourhood is O(N^2).

Similarly to the case of the insert move, swap may sometimes produce infeasible solutions. Therefore, before evaluation, each swap is tested for infeasibility. This may be performed in constant time given two helper vectors. One is FullColSeqLen(i), the same as for insert. The other is ColSeqLen(i), which stores the length of the subsequence of cars of the same colour as position i which finishes at i (any further cars of the same colour are not counted). Hence, ColSeqLen(i) ≤ FullColSeqLen(i) for all i.

In the implemented local search, for each position i (the first swapped) all other positions l are tested. If any of these pairs (i, l) leads to an improvement after the swap, then the best one is performed. This yields an algorithm which is steepest with respect to l, but greedy with respect to i.

As required for a fast local search (see section 4.6.5), only the modification of the objective function due to a swap is computed for each (i, l). Concerning PCC(s), the change may be computed in constant time with the help of the vectors ColSeqLen and FullColSeqLen. Violations of ratio constraints are more time-consuming, so special effort has been put into their efficient evaluation. Firstly, the modification with respect to option j is computed only if the swapped groups really differ on that option; otherwise, the sequence of cars does not change from the point of view of this constraint. Secondly, only the RC windows which overlap positions i or l are considered in swap evaluation. This speeds up the process drastically if Dj << N, which is a frequent case in Renault's instances. Finally, when |i − l| < Dj, even fewer windows have to be evaluated, since nothing changes in windows which overlap both positions i and l.

This operator was considered earlier by Warwick & Tsang (1995) and Puchta & Gottlieb (2002) (see sections 8.6.4 and 8.6.1). In the ROADEF Challenge 2005 it was also used by Ribeiro's team (see section 8.6.2) and Estellon's (section 8.6.3). For the purpose of the challenge the swap operator was designed and implemented by Andrzej Jaszkiewicz and the author of this text.
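Both helper vectors follow directly from their definitions and can be built in one pass over a solution, as in the following sketch (0-based indices; col maps positions to colours). This is an illustration of the definitions, not the challenge code.

    def colour_run_vectors(col):
        """ColSeqLen[i]: length of the run of col[i] ending at i.
        FullColSeqLen[i]: length of the whole run of col[i] containing i."""
        n = len(col)
        col_seq = [1] * n
        for i in range(1, n):
            if col[i] == col[i - 1]:
                col_seq[i] = col_seq[i - 1] + 1
        full = [0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and col[j + 1] == col[i]:
                j += 1
            for k in range(i, j + 1):   # the whole run gets its total length
                full[k] = j - i + 1
            i = j + 1
        return col_seq, full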
9.4 Initial solutions

9.4.1 Exact algorithm for paint colour changes

The general idea of this algorithm is:

• to start the current-day sequence with the last previous-day colour, if at all possible,
• to alternate maximum allowed subsequences (of length PBL) of cars with the same colour,
• if one colour is in large excess, to separate maximum subsequences of this colour with single cars of some other colours for as long as necessary,
• to signal infeasibility if no feasible solution exists.

In order to describe such an algorithm properly, some definitions are required. First, let us define the number of cars with a specific colour c as ncc:

ncc = |{i ∈ {1, . . . , N} : col(i) = c}|

Let us also define usedCol(sp), the vector of colours already used in the partial solution sp:

usedCol(sp) = (nc′1, nc′2, . . . , nc′C)

where each component c of the vector is the number of cars in sp with colour c:

usedColc(sp) = nc′c = |{i ∈ {1, 2, . . . , |sp|} : col(sp,i) = c}|

Similarly to the case of groups, the vector of colours still available given sp is defined as:

availCol(sp) = (nc1, nc2, . . . , ncC) − usedCol(sp)

and the colour with the maximum number of available cars as:

maxAvailCol(sp) = arg max_{c=1..C} {availColc(sp)}

PCC_LB(sp, g) is the lower bound on the number of further colour changes given some partial solution sp and a group of cars g that is to be appended at the end of sp. It may be computed as the sum of the number of changes immediately caused by g and the minimum number of further changes that will happen when g is appended:

PCC_LB(sp, g) = CurrentCC(sp, g) + FurtherCC(sp, g)

The component functions of the lower bound are given in algorithms 20 and 21.

Algorithm 20 CurrentCC(sp, g)
  curCC = 1 {the default result: there is a colour change}
  len = |sp|
  if col(g) = col(s′p,len) then {the last colour is continued}
    if (LastColSeqLen(s′p) + 1 ≤ PBL) or (len = 0) then
      curCC = 0 {paint batch limit not exceeded, or first car}
    else
      curCC = +∞ {paint batch limit exceeded}
  return curCC

The algorithm computing CurrentCC(sp, g) is rather simple. It checks if there is a colour change between the last colour in sp and the colour of the group g to be appended. It takes into account the border between the previous and the current day, as well. It also checks if appending g is feasible; it signals infeasibility through an infinite result. LastColSeqLen(s′p) is a helper function which returns the length of the continuous subsequence of cars with the same colour as the last car in s′p, where the subsequence ends at the last position of s′p. It does not count the previous-day cars, even if they are of the required colour. If s′p contains only the previous-day cars, the function returns 0.

The algorithm which computes FurtherCC(sp, g) is more involved.

Algorithm 21 FurtherCC(sp, g)
  furCC = 0
  cmax = maxAvailCol(sp)
  isMaxCol = 0 {is g of the maximum colour? no by default}
  if col(g) = cmax then
    isMaxCol = 1
  lastLen = LastColSeqLen(s′p) {assume g continues the last colour}
  len = |sp|
  if col(g) ≠ col(s′p,len) then {the colour changes with g}
    lastLen = 0
  minSeqCMax = ⌈(availCol_cmax(sp) + isMaxCol · lastLen)/PBL⌉
  feasibleBound = Σ_{c=1, c≠cmax}^{C} availColc(sp) + isMaxCol
  fullSeqBound = Σ_{c=1, c≠cmax, c≠col(g)}^{C} ⌈availColc(sp)/PBL⌉ + isMaxCol
                 + (1 − isMaxCol) · ⌈(availCol_col(g)(sp) + lastLen)/PBL⌉
  if minSeqCMax > feasibleBound then
    furCC = +∞
  else if minSeqCMax > fullSeqBound then
    furCC = (minSeqCMax − 1) · 2 + (1 − isMaxCol)
  else
    furCC = Σ_{c=1, c≠col(g)}^{C} ⌈availColc(sp)/PBL⌉
            + ⌈(availCol_col(g)(sp) + lastLen)/PBL⌉ − 1
  return furCC

The first task of this function is to check whether a feasible solution exists after appending g; this is true if the condition (minSeqCMax > feasibleBound) is false. The second task of the function is to establish whether maximum sequences of colours may be alternated; this is possible if the condition (minSeqCMax > fullSeqBound) is false. When these conditions are checked, the appropriate scenario for further colour changes is established and the number of changes is computed. Infeasibility is signalled through +∞ in the result.

The algorithm for minimising PCC(s) starts with an empty partial solution sp and iteratively appends any group of cars g which minimises the current lower bound PCC_LB(sp, g). If there are many possibilities, the actual group is chosen randomly. This algorithm is guaranteed to globally minimise PCC(s) of the generated solution, if only such a solution exists.
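For illustration, algorithm 20 translates into the following small function. It is a simplified rendering: last_colour stands for the colour at the end of the sequence built so far (possibly the last previous-day car), and last_run_len plays the role of LastColSeqLen(s′p); both are assumed to be maintained by the caller.

    import math

    def current_cc(last_colour, last_run_len, g_colour, pbl, first_car):
        """Immediate colour-change cost of appending a car of group g
        (a runnable sketch of algorithm 20)."""
        if g_colour == last_colour:
            if last_run_len + 1 <= pbl or first_car:
                return 0        # colour continued within the paint batch limit
            return math.inf     # the paint batch limit would be exceeded
        return 1                # a colour change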
9.4.2 Kominek's heuristic

Kominek based his constructive heuristic on the notion of the utility rate of an RC, similarly to Gottlieb et al. (2003). He employed it in a slightly different way, though. First, let us define for each option j the weight of violations of the option, wj, which equals either wHPRCs or wLPRCs, depending on the related priority:

wj = wHPRCs^prio(j) · wLPRCs^(1−prio(j))

The constructive algorithm starts with the empty partial sequence sp. In each step it appends to sp one car of a group g with some cars still available. The chosen g maximises the heuristic evaluation:

evalK(sp, g) = Σ_{j=1}^{O} wj · evalKj(sp, g) + wPCC · evalK_PCC(sp, g)

where:

evalKj(sp, g) = utilRate(j, sp)              if (VNj(sp · g) − VNj(sp)) = 0
evalKj(sp, g) = −(VNj(sp · g) − VNj(sp))     otherwise

This means that a group gains wj · utilRate(j, sp) on each constraint j on which it causes no violations. On the other hand, it costs the weighted number of violations if such violations occur.

The colour term, for len = |sp|, is defined as:

evalK_PCC(sp, g) = −∞   if col(g) = col(s′p,len) and LastColSeqLen(s′p) + 1 > PBL
evalK_PCC(sp, g) = 1    otherwise, if col(g) = col(s′p,len)
evalK_PCC(sp, g) = −1   otherwise

It sets a unit of gain for the group g if it causes no colour change and feasibility is maintained. It adds a unit of cost if the colour changes. Finally, it adds an infinite cost for groups leading to immediate infeasibility. The procedure returns an empty partial solution if infeasibility could not be avoided.

9.4.3 Extended Gottlieb and Puchta's DSU heuristic

The original algorithm of Gottlieb et al. (2003) was designed for the CarSP without colours and priorities of ratio constraints. Therefore, it has to be extended to handle instances of Renault's CarSP properly. First, the function ∆VN(sp, g) defined in section 8.5.1 is extended with RC priorities and weights into ∆VNw(sp, g):

∆VNw(sp, g) = Σ_{j=1}^{O} (VNj(sp · g) − VNj(sp)) · wj

The extended heuristic algorithm also starts with the empty partial solution sp = (). In each step it chooses for appending to sp a group g which minimises the function appCost(sp, g):

appCost(sp, g) = ∞                                        if availg(sp) = 0
appCost(sp, g) = ∞                                        otherwise, if PCC_LB(sp, g) = ∞
appCost(sp, g) = ∆VNw(sp, g)                              otherwise, if wHPRCs > wPCC
appCost(sp, g) = ∆VNw(sp, g) + wPCC · CurrentCC(sp, g)    otherwise

If there are many such groups, the one which maximises the tie-breaker evalDSU is chosen. This is actually the DSU function extended with RC priorities, weights and the lower bound on paint colour changes:

evalDSU(sp, g) = Σ_{j=1}^{O} wj · optj(g) · utilRate(j, sp) − wPCC · PCC_LB(sp, g)

Given the 3 possible values of the vector of weights w, there are actually 3 different heuristics generated by this algorithm. These will be denoted DSU0, DSU3 and DSU6, depending on the value of wPCC.
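The appending cost rewrites directly into code; in the sketch below the argument functions mirror the symbols of the text and are assumed callbacks, so this is an illustration of the case analysis rather than the thesis implementation.

    import math

    def app_cost(sp, g, avail, pcc_lb, delta_vn_w, current_cc, w_hprcs, w_pcc):
        """The extended DSU appending cost appCost(sp, g)."""
        if avail(sp, g) == 0:
            return math.inf                  # no car of this group left
        if pcc_lb(sp, g) == math.inf:
            return math.inf                  # appending g kills feasibility
        if w_hprcs > w_pcc:
            return delta_vn_w(sp, g)
        return delta_vn_w(sp, g) + w_pcc * current_cc(sp, g)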
9.4.4 Random solution

The procedure which constructs a random solution is given in algorithm 22. It starts with the empty partial solution sp. In each step it chooses randomly a group to be appended to sp. The probability P(g) of choosing a group is proportional to the number of still available cars of this group. This probability is changed only if infeasibility of the constructed solution has to be avoided. In such a case a group of a different colour is always chosen, with uniform probability over such groups, provided there is at least one of them. Although this design may somewhat bias the generated solutions, this should happen rarely: the author expects a rather small probability of generating a sequence of PBL cars of the same colour, except for some biased cases of input data.

Algorithm 22 Random initial solution
  sp = ()
  for i = 1 to N do
    lastCol = col(s′p,i−1) {the last colour so far}
    if LastColSeqLen(s′p) + 1 > PBL then {infeasibility possible; try to avoid it}
      GotherCol = |{g ∈ {1, . . . , G} : col(g) ≠ lastCol}| {groups of other colours}
      if GotherCol = 0 then {no feasible sequence possible}
        sp = ()
        break
      for g = 1 to G do
        P(g) = 0
        if col(g) ≠ lastCol then
          P(g) = 1/GotherCol
    else {any group is feasible}
      for g = 1 to G do
        P(g) = 0
        if availg(sp) > 0 then
          P(g) = availg(sp)/(N − i + 1)
    choose g randomly with probability P(g)
    sp = sp · g
  return sp

9.5 Fitness-distance analysis

9.5.1 Similarity measures for solutions of the CarSP

In the case of the CarSP similarity measures are used instead of distances, for historical reasons. Similarity has the opposite meaning to distance: it has higher values for objects which are closer. It does not have to possess the properties of a metric.

At the initial stage of the challenge, the intuition of the author's team indicated that the existence and preservation of certain subsequences of cars in solutions may be crucial for good values of the objective. It can be clearly seen from the definition of this function, at least from the point of view of RCs, that cars with active options should usually be separated by cars with inactive ones in order not to cause violations. Moreover, the colour subcriterion (PCC(s)) should also, to some extent, force certain groups of cars to come in subsequences in good solutions, e.g. long subsequences of the same colour. Therefore, the hypothesis was that in good solutions certain sequences of groups, those that ensure good separation of active cars and good subsequences of colours, may be more frequent than others. This idea led to the definition of similarity in terms of common subsequences.

Another concept was to check whether consecutive couples of cars are similar in good solutions. It was considered a simplified version of the idea of common subsequences, since it would not check triples or longer parts of solutions. This idea is reflected in the definition of similarity as common succession relations.

The last idea was to compare solutions with respect to the positions of cars, and this motivated the definition of similarity as common positions. The author's team wanted to check if good solutions should preserve certain positions in sequences. The initial intuition was that this should not be the case; the author expected positions of cars to be irrelevant to the objective function. Consequently, it was interesting to see if FDA could confirm this kind of 'no relevance' hypothesis.

Groups and weaker groups of cars

The author's challenge team relaxed the concept of groups and defined so-called weaker groups. These are sets of cars which have only the most important properties identical, i.e. those with the associated weight 10^6. Other properties of vehicles within one weaker group may be completely different: e.g. colour or low-priority options if wHPRCs = 10^6. The idea behind weaker grouping came from the observation that for instances with dif=1 (with HPRCs most important and difficult) the impact of colour and LPRCs on the values of the objective function may be minimal, due to the fact that VNHPRCs(s) is very unlikely to reach zero. Thus, these properties should not be considered at all during the computation of similarity.
Since indexes of weaker groups are easily computable, they were also used with the other types of instances, in order to see the outcome.

Similarity as common positions: simcp

This similarity measure is based on the idea of Hamming distance: it compares solutions position by position. The value of simcp is the number of positions with the same indexes of groups in the compared solutions (common positions, cp). In figure 9.1 an example of such a comparison is given for two simple solutions s1 and s2.

position: 1 2 3 4 5 6 7 8 9 10
s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
simcp(s1, s2) = 4

Figure 9.1: Comparison of two sequences by the simcp measure; the common positions are positions 1–4.

Similarity as common subsequences: simcs and simcswg

The second measure evaluates similarity in the sense of common subsequences (cs). This concept, although developed independently (see Jaszkiewicz et al. (2004)), is similar to the one described by Mattiussi et al. (2004), where sets of common subsequences are computed to determine the similarity of two genomes. The concept has a different mathematical formulation here, with weights of subsequences proportional to their length. Moreover, it was introduced by Mattiussi et al. to measure diversity, not to perform any fitness-distance analysis.

The algorithm for computing simcs works as follows. First, all common subsequences of the two compared solutions are computed using a generalised suffix tree (Gusfield 1997). A subsequence of at least two elements is common to two solutions if its elements may be found somewhere in these solutions in the same order and without additional spacing. Second, in each solution separately, a subset of maximal subsequences is found. These are subsequences which have maximum length and are not completely included in any other subsequence, though they may partially overlap each other. An illustration of a result of this process is shown in figure 9.2: two solutions containing only two types of cars (indexes 1 and 2), with the maximal common subsequences marked below them. As mentioned before, the sets of such subsequences may be different in the compared solutions; the subsequences also partially overlap in some cases.

When the common subsequences are found, their lengths are summed up with additional weights. To this end, the length of each subsequence is increased by the lengths of all its proper, shorter subsequences which finish at the same position. The subsequences which are contained in the next maximal common subsequence are not added, in order not to inflate the value of similarity. The purpose of the weights is to give preference to single, very long subsequences over many shorter but overlapping ones. In figure 9.2 these shorter subsequences included in the computation are indicated with dashed lines. The sum of the lengths of all subsequences found so far in both solutions (the solid and dashed intervals in figure 9.2) defines the value of sumcs(s1, s2).

s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
sumcs(s1, s2) = 70, simcs(s1, s2) = 8

Figure 9.2: Comparison of two sequences by the simcs measure. Solid intervals indicate maximal common subsequences; dashed ones show their proper subsequences included in the computation of similarity (see text).
Finally, the value of similarity is computed with the following formula:

simcs(s1, s2) = ( sqrt(4 · sumcs(s1, s2) + 9) − 1 ) / 2

This expression ensures a sort of normalisation of the values of this measure: the minimum is 1 (no common subsequences) and the maximum is equal to the length of a solution (identical solutions compared). The value of simcs is also equivalent to the length of the single maximal subsequence which would be the only common one in the two compared solutions.

The third measure, simcswg, is computed in exactly the same way as simcs, though the compared sequences contain indexes of weaker groups only.

Similarity as common succession relations: simcsuc and simcsucwg

The fourth measure of similarity counts the number of common relations of succession (immediate neighbourhood, adjacency) between indexes of groups in the compared solutions. A pair of indexes of groups is in this relation if the second index immediately succeeds the first one. An example of the computation of simcsuc may be seen in figure 9.3. In the two presented solutions there are exactly 8 common succession relations. Note that the last pair in solution s1, (1, 2), is not a common succession relation, because it does not have its counterpart in s2 (there are 5 such pairs in s1, but only 4 in s2).

s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
simcsuc(s1, s2) = 8

Figure 9.3: Comparison of two sequences by the simcsuc measure. Arcs indicate common succession relations (see text).

The last measure of similarity, simcsucwg, is identical in definition to the previous one, simcsuc, but it is computed on the basis of indexes of weaker groups (just like simcswg).
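For reference, the two simpler measures and the normalisation of the subsequence measure can be sketched as follows; the suffix-tree computation of sumcs itself is omitted. The values produced by this sketch agree with figures 9.1 to 9.3.

    from collections import Counter

    def sim_cp(s1, s2):
        """Common positions (Hamming-type similarity); 4 for figure 9.1."""
        return sum(a == b for a, b in zip(s1, s2))

    def sim_csuc(s1, s2):
        """Common succession relations, counted with multiplicities;
        8 for figure 9.3."""
        pairs1 = Counter(zip(s1, s1[1:]))
        pairs2 = Counter(zip(s2, s2[1:]))
        return sum(min(c, pairs2[p]) for p, c in pairs1.items())

    def sim_cs_from_sum(sumcs):
        """Normalisation of the subsequence measure; sumcs = 70 yields 8,
        as in figure 9.2."""
        return ((4 * sumcs + 9) ** 0.5 - 1) / 2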
9.5.2 Random solutions vs. local optima

The first stage of the fitness-distance analysis tests possible differences between sets of local optima and random solutions of a given instance, in terms of distance within these sets (see section 5.2.1). To check these differences in the case of the CarSP, large random samples of 1000 different solutions of each type were generated for sets A, B and X of instances. Random solutions were produced using algorithm 22, described earlier in section 9.4.4. Local optima were generated by starting from random solutions and proceeding with local search employing insert (swap was not yet implemented at the time of this experiment). In these sets, similarities of each type were computed for 500 different pairs of solutions. All values of similarity were properly normalised (divided by the instance size). Finally, statistics of the obtained values were computed: the average similarity for each instance, and the aggregate average and standard deviation over all instances. For instance set X, these values are shown in table 9.1. Note that for local optima the table actually shows the difference between the average similarity for these solutions and for random ones, but the values of standard deviation are computed for the original averages, not for the differences.

Table 9.1: Average values of normalised similarity in sets of random solutions and local optima. Set X of instances.

                          random solutions (rand)                    difference for local optima (lopt)
instance        sim_cp  sim_cs  sim_cswg  sim_csuc  sim_csucwg    sim_cp  sim_cs  sim_cswg  sim_csuc  sim_csucwg
028X00-0065     0.344   0.277   0.560     0.714     0.968         0.008   0.007   0.126     0.017     0.024
035X60-0090     0.195   0.207   0.276     0.635     0.818         0.033   0.218   0.227     0.237     0.093
655X30-0219     0.199   0.141   0.200     0.673     0.896         0.000   0.027   0.015     0.100     0.018
034X30-0231     0.069   0.100   0.170     0.444     0.851         0.028   0.014   0.042     0.072     0.004
655X30-0264     0.192   0.137   0.500     0.730     0.995         0.008   0.063   0.011     0.132     0.003
064X30-0273     0.051   0.078   0.157     0.338     0.843         0.005   0.044   0.011     0.275     0.040
028X00-0325     0.024   0.039   0.126     0.111     0.741         0.002   0.007   0.018     0.043     0.071
035X60-0376     0.084   0.097   0.129     0.613     0.850         0.007   0.070   0.148     0.274     0.096
048X31-0459     0.046   0.052   0.112     0.247     0.766         0.001   0.010   0.014     0.082     0.042
048X30-0519     0.017   0.029   0.106     0.097     0.803         0.000   0.018   0.031     0.132     0.105
022X60-0704     0.043   0.053   0.093     0.374     0.838         0.004   0.033   0.145     0.321     0.108
029X30-0780     0.056   0.059   0.137     0.488     0.961         0.003   0.033   0.016     0.257     0.010
064X30-0875     0.023   0.036   0.089     0.233     0.859         0.001   0.017   0.028     0.209     0.062
034X30-0921     0.035   0.048   0.118     0.401     0.960         0.001   0.018   0.029     0.224     0.009
025X00-0996     0.042   0.040   0.074     0.312     0.791         0.000   0.005   0.018     0.048     0.033
039X30-1037     0.053   0.047   0.140     0.421     0.980         0.002   0.019   0.035     0.220     0.020
039X30-1247     0.016   0.024   1.000     0.147     1.000         0.001   0.022   0.000     0.314     0.000
023X30-1260     0.050   0.044   0.102     0.447     0.937         0.001   0.015   0.010     0.200     0.017
024X30-1319     0.031   0.034   0.072     0.288     0.870         0.001   0.004   0.011     0.066     0.030
avg.            0.083   0.081   0.219     0.406     0.880         0.006   0.034   0.049     0.170     0.041
std. dev.       0.084   0.065   0.226     0.193     0.078         0.088   0.097   0.234     0.215     0.061

Comment on simcp

The values of this measure are rather small, if one recalls that the normalised simcp has the meaning of the percentage of common positions. For rand it is 8.3% on average; for lopt it is only 0.6% larger, with a similar deviation across instances. Only in the case of two instances, 034X30-0231 and 035X60-0090, is the difference between rand and lopt larger than 1%; it still remains small in these cases, though: 2.8% and 3.3%.

There are several instances for which the normalised simcp > 0.1 in rand, which would suggest extraordinarily high similarity in this set of instances. But when one looks into the details of the instance data, it transpires that there are one or two groups of cars which dominate the others. Hence, in these cases it was easy for a pair of random solutions to have a large number of common positions. Overall, it may be concluded that, with respect to common positions, local optima are practically no more similar to each other than random solutions are.

Comment on simcs

At first sight, the values of normalised simcs also appear to be small. On average, simcs = 8.1% for rand, while simcs = 11.5% for lopt. The difference is 3.4%, so it is larger than in the case of simcp, but it still seems tiny. This is not the case, though, since simcs is heavily influenced by the lengths of common subsequences. The value of simcs does not exactly reflect the coverage of solutions by such subsequences, where coverage is the percentage of positions covered by some common subsequence. To show the differences in this coverage due to the 3.4% change in simcs, a separate series of values was derived from the same rand and lopt solutions. The average values of coverage for rand and the differences for lopt are shown in table 9.2.

These values show that the average coverage by common subsequences rises from 65.5% in rand to 80.3% in lopt. It means that almost 15% more positions are covered by some common subsequence in local optima than in random solutions. The largest changes in coverage happen for instances 039X30-1247 (by 44.9%) and 064X30-0875 (by 29.7%). These are rather large increases, and for large instances (1247 and 875 cars). Very small changes happen e.g. for 025X00-0996 (only by 5.6%). Also for a number of smaller instances the increase is tiny, but these are the ones with already large coverage for random solutions, more than 90% (e.g. 028X00-0065 or the 655* instances). These small instances are exactly the ones with several dominating groups of cars, so it was easy to find short common subsequences in pairs of random solutions.
Table 9.2: Average values of coverage in sets of random solutions and local optima for the similarity measures based on subsequences. Set X of instances.

                        rand                 difference for lopt
instance        sim_cs    sim_cswg        sim_cs    sim_cswg
028X00-0065     0.932     0.999           0.007     0.001
035X60-0090     0.914     0.985           0.075     0.010
655X30-0219     0.927     0.999           0.037     0.001
034X30-0231     0.772     0.997           0.080     0.001
655X30-0264     0.954     1.000           0.029     0.000
064X30-0273     0.627     0.989           0.269     0.005
028X00-0325     0.226     0.957           0.082     0.020
035X60-0376     0.939     0.999           0.058     0.000
048X31-0459     0.465     0.957           0.098     0.019
048X30-0519     0.200     0.985           0.233     0.014
022X60-0704     0.670     0.994           0.271     0.003
029X30-0780     0.797     1.000           0.150     0.000
064X30-0875     0.461     0.990           0.297     0.007
034X30-0921     0.723     1.000           0.214     0.000
025X00-0996     0.560     0.978           0.056     0.008
039X30-1037     0.708     1.000           0.175     0.000
039X30-1247     0.299     1.000           0.449     0.000
023X30-1260     0.731     0.997           0.142     0.002
024X30-1319     0.535     0.997           0.090     0.002
avg.            0.655     0.991           0.148     0.005
std. dev.       0.235     0.013           0.197     0.007

But even though the average coverage change by 15% is a significant value, it seems to be moderate. After all, as much as 65.5% of positions are already covered in rand, and the average increase of simcs amounts only to 3.4%. However, the values for lopt solutions reflect a significant change compared to rand. The inspection of the actual common subsequences in several pairs of solutions for different instances revealed that there indeed is quite a difference between rand and lopt solutions. In order to illustrate this significance, figures 9.4 and 9.5 show the common subsequences of two random solutions and of two local optima. The figures were generated for instance 064X30-0273, which is small enough to be illustrative. At the same time the computed values of simcs and coverage are near the observed averages in set X.
Figures 9.4 and 9.5 each show a pair of sequences of 273 characters. One character always corresponds to one position in a sequence. A '-' sign means a position that is not covered by any common subsequence; '(' opens a common subsequence, '*' fills it and ')' ends it. The algorithms computing simcs also involve proper subsequences of a common sequence and overlaps between subsequences; since these are hard to show in one sequence of characters, such subsequences are merged. Note that common sequences which are exactly adjacent are not merged.

--()(*)-(****)()------(**)----(*)-(***)()()-----()(***)()-()()----(*)(
**)()---()--()-()()()()-()(**)---()--()--(*)()(*)(***)---(***)--()(**)
--()()-(*)---(***)--(*)------(**)---()---------()--()-()()(*****)-()-(
)-()--()-()-(**)--(*)-(*)-(**)-(*)(******)(**)(*)()---()---()-(***)---()-(***)(***)()-()-----()--(*****)----()------(*)-(****)--(*)(*)-()------()(*)---()-(**)-----(*)-()---()-()-()(*)-(**)()-(**)-()-(*
**)----()---(*)--(*)-(*)-(*)---()()-(*****)--(****)(*)-()-(**)()--(*)--()---------()----()(**)(**)()--()()(*)(*)--()--(*)-(*)-()(**)

Figure 9.4: Common subsequences in two random solutions of instance 064X30-0273. Overlapping sequences are always merged into one. Normalised simcs = 0.079. Average covered length: 65.2%.

(***)()-()--(****)(*************)-(***)(*********)(**)-()(**********)-(******)-()(********)(*****)-()(****)(********)(***)()-(**)-(****)-(**
**)-(*****)-(*****)-(********)(***)(***)-(**)---(****)(*****)(****)(**
******)(****)()----(*********)()(*********)()(*)---(***)---(**)
(*****)(***)--(****)()(*)(*******)--(************)-()(****)()(**)()(*)
(****)--()-()-(*)(**************)----()(*****)(*)(*)(***)-(*)-()(*****
****)-(*)()-(**********)(*)(********)----(****)-()--()---()-(*********
******)-(************)-()(***)(*************)--()()(**)()-()(*)

Figure 9.5: Common subsequences in two local optima of instance 064X30-0273. Overlapping sequences are always merged into one. Normalised simcs = 0.126. Average covered length: 88.6%.

One can see in these figures the difference between the similarity of random solutions and that of local optima. The pair from rand is generally covered by many short common subsequences, which happen to cover 65.2% of all positions; there are 62 such sequences in the upper solution in figure 9.4. The lopt pair is covered in a different way: there are fewer common subsequences (41 in the upper solution in figure 9.5), but they are substantially longer, covering 88.6% of the two solutions on average. This example clearly shows that although a change in simcs by 4.7% seems tiny, it actually reflects a qualitative change in the underlying common subsequences.
Comment on simcswg

The average change between rand and lopt seems to be larger here than in the case of simcs: as much as 4.9%. At the same time, the deviation of similarity in lopt is much larger for set X, so the average is not really representative of the whole set. This is clearly visible for certain instances. For 3 of them, those with wPCC = 10^6, the average change in simcswg is 17.4%. In these problem examples a weaker group is simply the colour of a car. Thus it is hardly surprising that local optima, with good sequences of colours, are much more similar to each other than random sequences of cars.

On the other hand, for instance 039X30-1247, the average simcswg = 100% already for random solutions, so the change for local optima could not be larger than 0; equality is exactly the case. This is due to the fact that there is only one HPRC in this instance and actually no car with the related option active. All solutions of this instance are equivalent with respect to simcswg.

When one excludes these extraordinary instances from the analysis, the average change in similarity between rand and lopt amounts to 2.7%. This is less than in the case of simcs, although still quite a lot given the same interpretation of the values of this measure. More importantly, however, the case of the instance with dif = 1 is less optimistic. This is 048X31-0459, which was the actual basis for introducing weaker groups in set X (see section 9.5.1). Here, the change of the average simcswg is even lower, 1.4%, although a bit larger than for simcs (1.0%). The coverage rises even less than for ordinary groups: by 1.9% compared to 9.8%.

To summarise, it appears that simcswg in lopt rises considerably for instances with PCC(s) being the most important component of the objective function. For other types of instances the change is smaller than with respect to simcs, although it still seems significant. Nevertheless, for the single instance for which weaker groups were designed the change is rather small. Local optima are only a bit more similar than random solutions from the point of view of subsequences of weaker groups.

Comment on simcsuc

The average similarity for random solutions amounts to 40.6%, while for local optima it rises by as much as 17%. Standard deviations are considerable, indicating large differences in simcsuc between instances. Large changes in similarity, by more than 20%, happen for 10 instances out of 19. A change lower than 5% happens only for 3 of them, all of type 00. This may indicate that in good solutions of type 00 instances there is much noise in succession relations. Overall, it seems that similarity of lopt is significantly higher than that of rand: on average 17% more positions are covered by common succession relations. Local optima are closer to each other than random solutions are, except for type 00 instances.

Comment on simcsucwg

Random solutions are already very similar to each other: on average 88% of succession relations are common on weaker groups, with a reasonably small deviation of 7.8%. The smallest average simcsucwg is equal to 74.1%. The average change in this measure amounts only to 4.1%, which is a rather small value compared to 17% for simcsuc. It seems that local optima cannot be much more similar than random solutions from the point of view of this measure; the latter already have very many common succession relations on weaker groups. This is also the case for 048X31-0459, the only instance with dif = 1: the change of the average simcsucwg = 4.2%, which is a small value.

9.5.3 Fitness-distance relationships

This section investigates whether there are trends in sets of local optima which would confirm the 'big valley' hypothesis: that better solutions tend to be more similar to each other than worse ones.
This investigation was performed with the method of the analysis of a set of pairs of local optima (see section 5.4.6). The computation was performed on the same set of 500 pairs of local optima as used in the previous section. Raw values of r^2 obtained for set X of instances are given in table 9.3. Table 9.4 presents aggregated values for all sets and types of instances, in order to give the reader a general view of results for Renault's CarSP.

Similarly to the CVRP case, the values of r^2 emphasised in boldface in table 9.3 are those not smaller than 0.18; these are deemed significant. All cases with r^2 in [0.15, 0.18) are typeset in italic. Moreover, each instance from set X was manually classified as 'big valley?'='yes', 'no' or 'amb.' (ambiguous) based on the observed r^2.

The conclusions related to values of FD determination coefficients are also confirmed through inspection of fitness-distance plots. Here, 2-dimensional plots are obtained by cutting a slice through clouds of 3-dimensional points, along the plane f(s1) = f(s2). Points included in the plot satisfy the constraint |f(s1) - f(s2)| < ε · f(s1). Usually ε was set to 5%, but sometimes it was necessary to put ε = 10% in order to have any points in a scatter plot.

Table 9.3: Values of the linear determination coefficient r^2 between fitness and each similarity measure for instances from set X.

instance       r^2cp    r^2cs    r^2cswg   r^2csuc   r^2csucwg   big valley?
028X00-0065    0.001    0.000     0.002     0.006      0.002      no
035X60-0090    0.008    0.380     0.415     0.454      0.376      yes
655X30-0219    0.008    0.460     0.014     0.171      0.002      yes
034X30-0231    0.002    0.009     0.020     0.000      0.003      no
655X30-0264    0.006    0.361     0.008     0.392      0.006      yes
064X30-0273    0.004    0.577     0.045     0.424      0.004      yes
028X00-0325    0.000    0.003     0.003     0.006      0.001      no
035X60-0376    0.002    0.586     0.541     0.249      0.322      yes
048X31-0459    0.002    0.003     0.003     0.001      0.006      amb.
048X30-0519    0.001    0.289     0.002     0.219      0.007      yes
022X60-0704    0.007    0.474     0.584     0.196      0.522      yes
029X30-0780    0.007    0.401     0.140     0.157      0.002      yes
064X30-0875    0.000    0.024     0.002     0.026      0.011      no
034X30-0921    0.005    0.324     0.000     0.122      0.002      yes
025X00-0996    0.003    0.007     0.001     0.006      0.009      no
039X30-1037    0.004    0.370     0.018     0.244      0.012      yes
039X30-1247    0.006    0.122     0.001     0.055      0.001      no
023X30-1260    0.005    0.458     0.011     0.243      0.003      yes
024X30-1319    0.003    0.005       -       0.003        -        no

avg.           0.004    0.255     0.101     0.157      0.072
std. dev.      0.003    0.213     0.190     0.148      0.154

Comment on simcp

In case of simcp no correlation between similarity and fitness has been revealed. All the related values of r^2 in table 9.3, and also all the averages in table 9.4, are virtually zero. This lack of relationship is also clearly visible in most FD plots, e.g. for 039X30-1037 (figure 9.6) and 025X00-0996 (figure 9.7). The view on the potential relationship is somewhat obscured in case of instance 048X31-0459 by the existence of separate vertical clouds of points (figure 9.8, top-left plot). When one zooms in on the group of best solutions (bottom-left plot in the figure), the lack of relationship is more clear. To summarise, this Hamming-type similarity measure reveals no 'big valley' in any of the studied instances.
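For reference, the linear determination coefficient used in tables 9.3 and 9.4 can be computed directly from the sample of (fitness, similarity) observations. The following C++ sketch uses the textbook formula r^2 = cov(f, sim)^2 / (var(f) · var(sim)); it assumes non-degenerate variances and does not reproduce the exact pairing scheme of section 5.4.6.

#include <cstdio>
#include <vector>

// One observation: the fitness of a local optimum and its similarity to
// the second solution of the pair.
struct Observation { double fitness; double similarity; };

// Linear determination coefficient: the squared Pearson correlation.
double determination(const std::vector<Observation>& obs) {
    double sf = 0, ss = 0, sff = 0, sss = 0, sfs = 0;
    const double n = double(obs.size());
    for (const Observation& o : obs) {
        sf  += o.fitness;
        ss  += o.similarity;
        sff += o.fitness * o.fitness;
        sss += o.similarity * o.similarity;
        sfs += o.fitness * o.similarity;
    }
    const double cov  = sfs - sf * ss / n;   // n-fold covariance
    const double varF = sff - sf * sf / n;   // n-fold variances
    const double varS = sss - ss * ss / n;
    return (cov * cov) / (varF * varS);      // the n factors cancel out
}

int main() {
    std::vector<Observation> obs = { {1.0, 0.9}, {2.0, 0.7}, {3.0, 0.4} };
    std::printf("r^2 = %.3f\n", determination(obs));
}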
Comment on simcs

Conclusions are quite different in case of simcs. Some significant values of the determination coefficient are observed here. It seems that the presence of such values depends heavily on instance type, or even on the particular instance, though.

Firstly, one can see in table 9.4 that types (WD) 30 and 60 have average r^2cs values for all instance sets rather high: 0.26 and 0.24, respectively. High FD determinations are also found in table 9.3: all 3 instances of type 60 reveal high values (0.38-0.58), and 8 of 12 type 30 instances have them high as well; 4 instances of the type reveal no significant FDC and no trends in FD plots. Concerning the plots, all type 60 instances have visible trends, similar to the one in figure 9.9: better solutions are more similar to each other than worse ones. Most type 30 instances have the trends visible as well, although the noise in similarity values is also considerable; see e.g. figures 9.6 and 9.10.

Secondly, instances of type 00 reveal no correlation between fitness and simcs. All 3 instances of the type in set X have r^2cs near zero. Moreover, the FD plots show no visible trends, see e.g. figure 9.7. This appears to be the case also in set A of instances.

Figure 9.6: Fitness-distance plots for instance 039X30-1037 and all similarity measures.

Figure 9.7: Fitness-distance plots for instance 025X00-0996, simcp and simcs.

Table 9.4: Average values of the linear determination coefficient r^2 between fitness and each similarity measure, grouped by instance set and type.

set    WD    #inst.   avg(r^2cp)   avg(r^2cs)   avg(r^2cswg)   avg(r^2csuc)   avg(r^2csucwg)
A      00       1       0.001        0.001         0.001          0.001           0.023
A      01       3       0.007        0.010         0.346          0.005           0.011
A      30       4       0.006        0.349         0.100          0.223           0.014
A      31       4       0.003        0.012         0.241          0.008           0.005
A      60       4       0.003        0.291         0.443          0.164           0.241
B      00       9       0.009        0.088         0.007          0.033           0.004
B      01       6       0.008        0.033         0.065          0.013           0.003
B      30       9       0.006        0.199         0.005          0.090           0.006
B      31       6       0.006        0.082         0.047          0.048           0.005
B      60      15       0.005        0.180         0.395          0.083           0.051
X      00       3       0.001        0.003         0.002          0.006           0.004
X      01       0         -            -             -              -               -
X      30      12       0.004        0.283         0.024          0.171           0.005
X      31       1       0.002        0.003         0.003          0.001           0.006
X      60       3       0.006        0.480         0.513          0.300           0.407
A      avg.    16       0.003        0.163         0.196          0.099           0.065
B      avg.    45       0.007        0.133         0.149          0.060           0.020
X      avg.    19       0.004        0.255         0.095          0.157           0.068
avg.   00      13       0.007        0.061         0.006          0.024           0.005
avg.   01       9       0.007        0.025         0.158          0.010           0.006
avg.   30      25       0.005        0.263         0.029          0.150           0.007
avg.   31      11       0.005        0.049         0.113          0.029           0.005
avg.   60      22       0.005        0.241         0.419          0.127           0.134
avg.   avg.    80       0.005        0.168         0.159          0.091           0.041

Figure 9.8: Fitness-distance plots for instance 048X31-0459, simcp and simcs: whole fitness axis (top); zoom on the best solutions (bottom).
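The FD plots referenced in this comment are the 2-dimensional slices described at the beginning of this section: only pairs with close fitness values are kept. A minimal C++ sketch of this filtering, assuming pairs are stored as (f1, f2, sim) triples with positive fitness:

#include <cmath>
#include <cstdio>
#include <vector>

struct PairObs { double f1, f2, sim; };  // fitness of both solutions, similarity

// Keep the pairs lying near the plane f(s1) = f(s2):
// |f(s1) - f(s2)| < eps * f(s1), with eps typically 0.05 or 0.10.
std::vector<PairObs> slicePlot(const std::vector<PairObs>& pairs, double eps) {
    std::vector<PairObs> kept;
    for (const PairObs& p : pairs)
        if (std::fabs(p.f1 - p.f2) < eps * p.f1)
            kept.push_back(p);
    return kept;
}

int main() {
    std::vector<PairObs> pairs = { {100, 104, 0.3}, {100, 140, 0.2} };
    std::printf("kept %zu of %zu pairs\n",
                slicePlot(pairs, 0.05).size(), pairs.size());
}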
For set B there may exist some outliers, since the average r^2 = 0.088; this was not investigated further, though.

Thirdly, the only type 31 instance in set X has low r^2cs, equal to 0.003, which indicates the 'no relationship' case. But the FD plot is more informative here. The plot with all generated local optima (figure 9.8, top right) is divided into vertical clouds of points; obviously, these cannot result in high values of the linear determination. But when one zooms in on the best solutions, like in case of simcp earlier (figure 9.8, bottom right), a weak trend between fitness and simcs seems to be visible. With r^2cs = 0.176, this trend allows the classification of the instance to be changed from 'no' to 'ambiguous' in table 9.3. This example shows that conclusions based only on values of FDC may indeed sometimes be wrong, and that a look at an FD plot may reveal more information.

Similarly to this one instance, the average value of r^2cs for type 31 is very low in set A (0.012) and rather low in set B (0.082), perhaps indicating some outliers in the latter. A look at FD plots reveals no trends in any instance of set A (see figure 9.11 for an example). However, in set B there were two cases very similar to 048X31-0459, namely 023B31-1110 and 029B31-0730. In these instances significant trends were found for groups of the best solutions, with r^2cs = 0.33 and r^2cs = 0.21, respectively. Other set B instances did not reveal any relationship between f and simcs. Therefore, the existence of a 'big valley' in type 31 seems to be rather unlikely, but depends on the analysed instance.

Type 01 is not represented in set X, so sets A and B were investigated. The average values of r^2cs are low in these sets, indicating a small chance for 'big valleys'. This is exactly the case, except for one instance, 023B01-1110. Indeed, r^2cs = 0.05 for this problem example, but again FD plots help the analysis, as shown in figure 9.12. The basic plot (top left) shows separated vertical groups of local optima. The first zoom on the best group (top right plot in the figure) shows a similar structure, yet with some indication of a slope. Yet another zoom on the best group finally reveals a considerable trend of strength r^2 = 0.34. In this group of solutions better ones are more similar to each other than worse ones.

The conclusions concerning simcs:
• it is correlated with fitness in all studied type 60 instances and in most of type 30;
• it is not correlated for type 00;
• there is rather no correlation for type 31, although with some counterexamples; the correlation seems to depend on the particular instance;
• it is mostly uncorrelated with fitness for type 01, with one counterexample found.

Figure 9.9: Fitness-distance plots for instance 022X60-0704 and simcs, simcswg.

Figure 9.10: Fitness-distance plots for instances 023X30-1260 and 034X30-0921, simcs.
Figure 9.11: Fitness-distance plots for instance 024A31-1260, simcs and simcswg.

Figure 9.12: Fitness-distance plots for instance 023B01-1110 and simcs: whole fitness axis (top left); first zoom on the best solutions (top right); second zoom on the best solutions (bottom).

Comment on simcswg

This measure was supposed to be a replacement for simcs in case of types 01 and 31. There is only one instance of these types in set X, so it is difficult to form any conclusions based on this one case. Moreover, this instance is 048X31-0459, the one for which correlation of f and simcs was found (see figure 9.8). No such correlation with simcswg was revealed.

Type 01 instances from set A reveal high r^2cswg, the average being 0.346. Significant trends were also confirmed in FD plots. The obtained values for type 31 in this set are high in 3 of 4 cases, being 0.25 or higher; FD plots show the trends as well, with an example in figure 9.11. The sole exception is instance 039A31-0954, for which r^2cswg = 0.

In set B, however, types 01 and 31 reveal no correlation of fitness and simcswg. In some of these instances (type 31) high r^2cs values were found instead, as discussed earlier.

Unexpectedly, high r^2cswg values were found for all instances of type 60, where the colour is the most important property of a car. An exemplary FD plot of such a case is shown in figure 9.9.

Therefore, it seems that the existence of a 'big valley' with respect to simcswg depends on the analysed instance. It is likely to exist for types 01 and 31 in set A, while it is rather unlikely in set B. Only the type 60 instances behave in a consistent way, but this is not that important knowing that PCC(s) is easily optimisable.

Comment on simcsuc

This measure is closely related to simcs, but considers only the immediate neighbours in a sequence. It was found to correlate with fitness in exactly the same cases as simcs, but usually more weakly. This can to some extent be seen for instance 039X30-1037 in figure 9.6: the trend for simcs (r^2 = 0.37) is a bit more concentrated around the regression line than the trend for simcsuc (r^2 = 0.244).

This general observation suggests that the relation of succession (neighbourhood) between certain groups of cars is important for the overall quality of a sequence. This conclusion was to some extent predictable, given the results for simcs. However, the comparison of results for the two measures indicates that sequences longer than 2 are also of importance, so that simcs reveals slightly stronger trends in sets of local optima.

Comment on simcsucwg

This similarity measure correlates to some extent with fitness in case of type 60 only, where the colour of a car is its only identifier. This is not surprising given the fact that in type 60 PCC(s) is the most important subcriterion, forcing good solutions to have cars of the same colour put in continuous subsequences. For other types of instances no significant FD determination values were found, which is visible in tables 9.4 and 9.3.
Even the instances for which simcswg is correlated with fitness show no positive results for simcsucwg; the values of r^2 are very low and no trends are visible in FD plots.

Moreover, simcsucwg seems to degenerate for some instances, like 039X30-1037 shown in figure 9.6. In this example there are only two possible values of the measure, while for 039X30-1247 there is only one (hence no variance of similarity and no correlation). This happens when high priority options are scarce; e.g. for 039X30-1037 there are two HPRCs, but only 3 combinations of the related options. Thus, it seems that something more than the succession relations between weaker groups is needed to find a 'big valley' in the studied instances.

Comment on instances

In set X there are 7 instances which appear to have no 'big valleys' with respect to any of the presented similarity measures. These are all type 00 instances and 4 of the 12 type 30 ones. In the 12 other instances some correlation of fitness and simcs was revealed. Some of them, all type 60 included, reveal fitness-distance correlation also from the point of view of simcswg, simcsuc or simcsucwg.

9.5.4 Main conclusions from the fitness-distance analysis

The initial guess that positions of groups of cars in a sequence do not matter in the CarSP has been confirmed in this analysis. Local optima are not significantly more similar to each other than random solutions, nor is there any correlation between fitness and simcp in sets of local optima.

On the other hand, the hypothesis that subsequences of groups of cars (no matter their location) play an important role in good solutions of the CarSP has been to some extent confirmed here, for some types of instances. Similarity with respect to simcs is significantly higher in sets of local optima than in random solutions, except for type 00. Important correlations of f and simcs exist in types 60 and 30 (with some exceptions). For types 01 and 31 the correlation is rather unlikely and depends on an instance. Instances of type 00 show no correlation for this similarity.

The introduction of weaker groups and the examination of simcswg revealed that in instances of types 01 and 31 the correlation of fitness and similarity may exist, especially in set A. The result is the inverse of the expected one in sets B and X, though. High r^2cswg values exist for instances of type 60.

Common succession relations, as measured by simcsuc, are important in local optima, because their similarity is significantly higher than that of random solutions. The observed correlations are considerable for the same instances as for simcs, but usually weaker. This indicates that simcs is a better choice than simcsuc for a measure of similarity of CarSP solutions. The same conclusion applies to simcsucwg when compared to simcswg: only instances of type 60 have high correlations of this measure and fitness, and they are lower than r^2cswg. Moreover, the change in average similarity between rand and lopt is lower.

Therefore, the author draws the following conclusions concerning the properties of 'genetic' operators in the designed memetic algorithm:
• for types 60 and 30 a recombination operator preserving simcs should be of particular use;
• types 31 and 01 might prefer simcswg, but the results of FDA are somewhat confusing here, with a number of exceptions and 'no big valley' cases;
• type 00 should prefer mutation to recombination, since no regularities in the analysed landscapes have been found.
9.6 CCSPX: conservative common subsequence preserving crossover

Based on the results of the fitness-distance analysis, the idea of this crossover is to preserve all common parental subsequences of groups or weaker groups, depending on instance type. Moreover, the operator is supposed to be conservative and not disruptive, i.e. to change as little as possible in the order of common subsequences and in the order of positions not covered by these subsequences.

The general steps of CCSPX are given in algorithm 23. The operator is parametrised: groupLevel indicates which groups of cars should be used (ordinary or weaker); numSwaps is the number of pairs of subsequences which are swapped in the offspring compared to one of its parents, the chosen donor. All random choices in CCSPX use uniform probability over the considered set of objects.

Algorithm 23 o = CCSPX(p1, p2, groupLevel, numSwaps)
  compute all maximal common subsequences in p1 and p2 using groups on level groupLevel
  choose randomly the donor of subsequences, donor = p1 or donor = p2
  merge overlapping common subsequences of the donor
  merge consecutive non-common positions of the donor into subsequences
  get the vector of all subsequences from the donor in their original order
  swap numSwaps randomly chosen pairs in the vector of subsequences
  assemble the offspring o from the subsequences in the order given in the vector
  return o
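A minimal C++ sketch of the last three steps of algorithm 23, swapping randomly chosen pairs in the vector of subsequences and assembling the offspring, is shown below. It assumes the donor has already been cut into an ordered vector of segments (merged common subsequences and runs of non-common positions); the extraction of these segments is omitted, and the two indices of a swapped pair may coincide here, a detail the actual operator may handle differently.

#include <random>
#include <utility>
#include <vector>

using Group = int;                       // one group index stands for one car
using Segment = std::vector<Group>;      // a subsequence of the donor

Segment ccspxAssemble(std::vector<Segment> segments, int numSwaps,
                      std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, segments.size() - 1);
    for (int k = 0; k < numSwaps; ++k)
        std::swap(segments[pick(rng)], segments[pick(rng)]);  // swap one pair
    Segment offspring;                   // concatenate in the new order
    for (const Segment& s : segments)
        offspring.insert(offspring.end(), s.begin(), s.end());
    return offspring;
}

int main() {
    std::mt19937 rng(42);
    std::vector<Segment> segments = { {1, 2, 3}, {4}, {5, 6}, {7, 8, 9} };
    Segment o = ccspxAssemble(segments, 2, rng);  // numSwaps = 2, as in CCSPX-2
    (void)o;
}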
An example of how CCSPX operates is shown in figures 9.13 and 9.14. The example uses ordinary groups of cars. The first figure shows common subsequences of two parents, using the same notation as in figure 9.4. Figure 9.14 illustrates the same subsequences in their CCSPX offspring, where the donor was p2. Here, the positions not covered by common subsequences are not marked with a '-' sign, because they were merged into subsequences by CCSPX. Moreover, a '.' indicates subsequences which were swapped. In this example 2 pairs were swapped: subsequence number 18 with 37 (the two longer ones) and 53 with 15 (the pair of shorter ones).

----(****)(*********)(******)(***)(***********)--(**)---(*)-(********)
-(************)-(*)()(********)(************)--(***)(**)-(*********)(*
*)(****)-(**)(***)(******)()(**)(**********)-(*)(***)-(**)-(**)(***)(*
**)-(**)--(**)(*****)(**)--(**)()-(************)(***)()()()(**)
(**)-()(***)--()-(****)(**********)(******)(*****)-(**)(**************
)-(*******)(*****)(*********)(*)()()()(*************)(*****)(*****)(**
***)--(*)()-()(************)(***)-(*******)(**)(****************)-(***
*****)()()--(******)(**)()-()-(*)()(***)(*)---()(*)(******)(**)

Figure 9.13: Common subsequences in two local optima of instance 064X30-0273, the parents in the CCSPX example. Overlapping sequences are merged into one. Normalised simcs = 0.128. Average covered length: 91.9%. f(p1) = 68000 (top) and f(p2) = 54000 (bottom).

(**)(()(***)()()((****)(**********)(******)(*****)((**)(**************
)...(*******)(*****)..................(*)()()()(*************)(*****)(
*****)(*****)()(*)()(()(************)(***)((*******)(**)...........((*
*******)()()()(******)(**)()(()((*)()(***)(*).()(*)(******)(**)

Figure 9.14: Subsequences in a CCSPX offspring o of the parents shown in figure 9.13. donor = p2. Dots (.) indicate swapped subsequences. Normalised simcs(p1, o) = 0.128, simcs(p2, o) = 0.466, f(o) = 6057000.

One can see that certain positions of cars were disrupted by CCSPX, especially between the swapped subsequences 18 and 37. But the similarity of the offspring to its parents in terms of subsequences is not lower than the similarity between the parents: simcs is preserved by CCSPX. Moreover, the offspring is rather more similar to the donor than to the other parent. This is an obvious effect of inheriting the subsequences directly from one parent, with only some small changes (swaps). Yet the offspring is different from the donor.

The exemplary offspring is worse than both of the parents. After local search it improves, so that f(o') = 58000. At the same time, the similarity to the parents remains preserved to a high degree: simcs(p1, o') = 0.125, simcs(p2, o') = 0.169.

9.7 RSM: random shuffle mutation

The proposed CCSPX crossover preserves subsequences from parents, and local search further improves them. Therefore, intuitively, the mutation operator should somehow disrupt the subsequences contained in a solution. The random shuffle operator was chosen for this task; it was employed earlier, e.g. by Gottlieb et al. in their local search (see section 8.6.1).

The operator randomly reorders a part of the given sequence. Firstly, it draws randomly (uniformly) the length of the mutated subsequence from the given interval [lowBound, upBound]. Secondly, it chooses randomly (uniformly) the actual starting position for the subsequence, so that it fits entirely in the solution. Thirdly, all groups of cars are removed from the subsequence. Finally, all these groups are reinserted into the emptied part, one by one, starting from its beginning. The group to be inserted is chosen randomly from the set of available ones, with probability proportional to the number of available cars. The only exception is the case when some groups might immediately violate sequence feasibility (the paint batch limit); then, only the groups which cannot lead to infeasibility are considered. If there is no such possibility, infeasibility of the mutant is signalled.
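Below is a minimal C++ sketch of the unconstrained core of RSM. The solution is represented simply as a vector of group indices, the paint-batch feasibility handling described above is omitted, and upBound is assumed not to exceed the solution length; all names are illustrative.

#include <map>
#include <random>
#include <vector>

using Solution = std::vector<int>;  // a sequence of group indices

void rsm(Solution& s, int lowBound, int upBound, std::mt19937& rng) {
    // 1. draw the length of the mutated subsequence
    int len = std::uniform_int_distribution<int>(lowBound, upBound)(rng);
    // 2. draw a starting position so that the subsequence fits entirely
    int start = std::uniform_int_distribution<int>(0, int(s.size()) - len)(rng);

    // 3. remove all groups of cars from the chosen window
    std::map<int, int> avail;           // group -> number of removed cars
    for (int i = start; i < start + len; ++i) ++avail[s[i]];

    // 4. reinsert them one by one; a group is drawn with probability
    //    proportional to the number of its still-available cars
    for (int i = start; i < start + len; ++i) {
        std::vector<int> groups, counts;
        for (const auto& gc : avail)
            if (gc.second > 0) { groups.push_back(gc.first); counts.push_back(gc.second); }
        std::discrete_distribution<int> wheel(counts.begin(), counts.end());
        int g = groups[wheel(rng)];
        s[i] = g;
        --avail[g];
    }
}

int main() {
    std::mt19937 rng(1);
    Solution s = { 0, 1, 2, 0, 1, 2, 0, 1 };
    rsm(s, 3, 5, rng);
}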
9.8 Adaptation of crossovers from the literature

9.8.1 Adaptation of NCPX

NCPX was originally designed for the CSPLib CarSP, so it does not take weights of RCs and colour into account (see section 8.6.6). In order not to worsen its results on Renault's CarSP, the operator was additionally adapted to this version of the problem. Thus, in the second stage of NCPX each group index to be heuristically inserted into the offspring is chosen by means of the modified function I':

I'(s_p, i, g) = \begin{cases} -\Delta VN_w(s_p, i, g) & \text{if } \Delta VN_w(s_p, i, g) > 0 \\ DSU_w(s_p, g) & \text{otherwise} \end{cases}

where ΔVNw is defined as in the extended Gottlieb and Puchta's heuristic (see section 9.4.3), and DSUw(sp, g) is the weighted DSU heuristic evaluation which was used as a part of evalDSU(sp, g) in section 9.4.3:

DSU_w(s_p, g) = \sum_{j=1}^{O} w_j \cdot opt_j(g) \cdot utilRate(j, s_p)

The colour of cars was not considered in this extended version of NCPX. Taking colour into account is not a straightforward task: the partial solution in NCPX is not simply a sequence of consecutive positions that ends before all cars are inserted, but may contain gaps of different sizes. This makes the heuristic evaluation of colour changes more difficult.

9.8.2 Adaptation of UAX

The original repair method in UAX does not take into account the situation when an underrepresented group is completely missing from the offspring (see section 8.6.4). In such a case the repair method cannot find any car belonging to this group in the offspring sequence, and the whole crossover has to fail.

In initial tests on Renault's CarSP it was observed that this causes some problems: a large fraction of crossovers on local optima did not generate feasible offspring; only 10-20% of trials were feasible. Therefore, a modified repair method was proposed by the author, which improves over the original.

This extended repair method proceeds as the original one, with one exception. When an overrepresented group is found and no underrepresented group exists in the offspring, the set of all underrepresented groups is gathered based on a comparison of the problem data and the offspring. Then, an underrepresented group is chosen randomly, with probability proportional to the number of cars of the group which are missing from the offspring. In initial tests it was noticed that this approach results in a higher fraction of feasible offspring, between 50% and 100%, with the majority above 80%. Thus, the slightly modified repair function makes UAX more applicable to Renault's problem.
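The random choice in this modified repair step is a simple roulette wheel over the counts of missing cars. A minimal C++ sketch, with illustrative names:

#include <random>
#include <vector>

// missing[g] = number of cars of group g absent from the offspring.
// Returns a group index drawn with probability proportional to missing[g].
int drawUnderrepresentedGroup(const std::vector<int>& missing,
                              std::mt19937& rng) {
    std::discrete_distribution<int> wheel(missing.begin(), missing.end());
    return wheel(rng);
}

int main() {
    std::mt19937 rng(7);
    std::vector<int> missing = { 0, 3, 1 };  // group 1 drawn 3x as often as group 2
    int g = drawUnderrepresentedGroup(missing, rng);
    (void)g;
}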
9.9 Experiments with initial solutions

In this experiment the quality and the time of generation of initial solutions were investigated, also when coupled with local search. All the algorithms described in section 9.4 were tested here. Local search was employed in 4 randomised variants: insert, swap, insert+swap, swap+insert. Solutions without local search were examined as well. Each combination of a heuristic and local search was run 30 times on each set X instance. All experiments were performed with the code developed by the author in C++, with several procedures implemented by members of the challenge team: Andrzej Jaszkiewicz and Paweł Kominek.

Instances of types 00, 30, 31

The average quality of results for all types of initial solutions and LS is shown in figure 9.15. The quality is measured with respect to the average best solution obtained in the challenge (see table 8.2, column 'best average'). The results are averaged over instances of types 00, 30 and 31 only; instances of type 60 are dealt with separately, because of the exact polynomial algorithm for PCC(s).

Figure 9.15: Average quality of heuristic solutions over type 00, 30, 31 instances; without and with local search.

One can see in the figure that local search after the initial heuristics is mandatory in order to obtain good quality. The best LS variant is insert+swap; insert alone is second-best. The variants involving swap as the first move type give poorer results. On average, the best initial solutions are generated by the DSU6 algorithm coupled with insert+swap; the average excess amounts to 34%. Kominek's heuristic, DSU3 and exact PCC are not much worse, with results varying between 36% and 40%. However, compared to the best average quality in the challenge, there is still much scope for improvement.

The time of computation of initial solutions depends mainly on instance size and the LS variant. Therefore, the results presented in figure 9.16 are averaged over instances and types of algorithms. The figure also presents lines of power regression (all r^2 > 0.85).

Figure 9.16: Average time of computation of initial heuristics followed by local search.

It can be seen in the figure that local search slows down the generation of solutions considerably. While without LS a solution for the largest instance is generated in less than 0.5 s, with local search it may take 50 times longer, on average. Still, the times are reasonable, being less than 25 seconds per solution for the slowest LS variant, insert+swap. This is the best variant of LS in terms of solution quality, though.

Instances of type 60

In case of these instances the exact algorithm for PCC(s) ensures that the generated solutions are already of good quality. Coupled with insert+swap LS, the average result of this heuristic is usually very close to the best from the challenge, with less than 0.02% of excess, as table 9.5 indicates. Moreover, the best solution in 30 runs always has an evaluation equal to the best challenge result.

Table 9.5: Average and best quality of solutions to type 60 instances generated by the exact PCC algorithm followed by insert+swap local search.

instance       best challenge   average initial   best initial
035X60-0090        5010000         5010800.0         5010000
035X60-0376        6056000         6057066.6         6056000
022X60-0704       12002003        12002205.1        12002003

Together with local search, it always takes less than 2 seconds to generate such a solution for each type 60 instance. It seems that it is sufficient to use this algorithm alone to solve these instances.

Composition of initial population

Given the results of this experiment, the initial population of the memetic algorithm always contains solutions improved with insert+swap local search. One solution of each DSU heuristic is included, and one of the exact PCC algorithm. If the population is larger than 4 elements, it is completed with different solutions of Kominek's heuristic.
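A minimal C++ sketch of this composition is given below. The heuristics and the insert+swap local search are those of sections 9.4 and 9.9; they are represented here only by trivial stubs so that the sketch is self-contained.

#include <cstddef>
#include <vector>

using Solution = std::vector<int>;

// Stand-ins for the real generators and for insert+swap local search.
Solution dsu0()     { return {}; }
Solution dsu3()     { return {}; }
Solution dsu6()     { return {}; }
Solution exactPCC() { return {}; }
Solution kominek()  { return {}; }          // randomised: repeated calls differ
Solution improve(Solution s) { return s; }  // insert+swap local search

std::vector<Solution> initialPopulation(std::size_t size) {
    // assumes size >= 4, as in the experiments
    std::vector<Solution> pop = { improve(dsu0()), improve(dsu3()),
                                  improve(dsu6()), improve(exactPCC()) };
    while (pop.size() < size)               // complete with Kominek's heuristic
        pop.push_back(improve(kominek()));
    return pop;
}

int main() { auto pop = initialPopulation(15); (void)pop; }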
9.10 Experiments with memetic algorithm

9.10.1 Long runs until convergence

In this experiment long runs of the memetic algorithm were allowed, most probably until complete convergence. The algorithm stopped after 75 unproductive generations, i.e. when nothing changed in the population. For a population of 15 solutions this meant, on average, 5 crossover and mutation trials on each member before convergence; the same value was used in the experiments on the CVRP (section 7.9).

A population of size 15 was chosen, half the size used for the CVRP. This choice was made due to the larger sizes of instances in the CarSP, which caused the initialisation phase to be considerably longer. For the largest instance the size of 15 means around 200-250 s for initialisation itself; the size of 30 would mean 400-500 s for this phase, around 80% of the time limit in the ROADEF Challenge 2005. The author considered this to be too long. The contents of the initial population were described in the previous section.

Several versions of the memetic algorithm were run. The basic version employed RSM as mutation and one crossover: NCPX, UAX or CCSPX applied on the ordinary groups with numSwaps = 2 (denoted CCSPX-2). Additionally, one version without crossover was used (MUT) and one without any 'genetic' operator, starting in the main loop from random solutions, hence resulting in multiple start local search (MSLS). Moreover, in order to assess the impact of mutation on results of the MA, one configuration without mutation was used for each recombination; this is denoted 'noMutation' or 'noMut'. Each configuration and version of the MA was run 15 times on each of the 19 set X instances. The same set of computers as in the CVRP experiments was used.

Quality of solutions: aggregated results

The values of excess over the best average challenge solution, aggregated over all instances, are presented in figure 9.17: the average of averages on the left, the average of the best in 15 runs on the right. The actual evaluations of solutions, on which these statistics are based, are presented in tables B.3 and B.4 in the appendix.

Figure 9.17: Quality of results: average over instances of the average result (left) and of the best result (right) in 15 long runs.

Concerning the average quality of solutions of the basic versions, one can see in the figure that MSLS is able to get as close as 9% to the best solutions. The best MA, CCSPX-2, improves this result by around 5%, to 4.09% of excess. The other basic versions of the MA are a little worse: UAX has 4.35% of excess, NCPX 4.37%. However, RSM mutation alone is able to get as close as 4.45%. Standard deviations are less than 1% in most of the cases.

Aggregated results of statistical tests for the difference between mean quality of results of the basic MAs are shown in table 9.6. The same procedure as for the CVRP was used here (see section 7.9.1).

Table 9.6: Comparison of the basic algorithm versions with the Cochran-Cox statistical test for the significance of the difference of averages; long runs. Each entry gives the number of instances won/lost by the row algorithm against the column algorithm.

           MSLS    MUT     CCSPX-2   NCPX    UAX     totals    sum
MSLS       0/0     0/-15   0/-15     0/-15   0/-15   0/-60     -60
MUT        15/0    0/0     0/-6      0/0     0/-1    15/-7       8
CCSPX-2    15/0    6/0     0/0       6/0     6/0     33/0       33
NCPX       15/0    0/0     0/-6      0/0     0/0     15/-6       9
UAX        15/0    1/0     0/-6      0/0     0/0     16/-6      10

The highest net flow score in the table ('sum') is obtained by CCSPX-2. This is the best algorithm in direct comparisons with the other ones, winning 33 and losing none. It is statistically better than MUT, NCPX and UAX on 6 instances each, including the two largest ones (024X30-1319, 023X30-1260). After a substantial gap, UAX, NCPX and MUT follow, having almost the same score. MSLS is last, winning no comparison and losing 15 to each version of the MA.

The results of the configurations without mutation reveal the strength of the recombination operators. The excess of NCPX and UAX jumps to 8% and more without RSM, indicating that without mutation these operators are hardly able to generate better solutions than MSLS. From this point of view especially UAX seems to be unable to generate such solutions; NCPX is slightly better. Conversely, CCSPX-2 without mutation is worse than with RSM, but by less than 1%. This indicates that the crossover alone is able to improve over MSLS by approximately 3%, on average.
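For reference, a minimal C++ sketch of the Cochran-Cox test used in table 9.6, in its common formulation: the statistic is the usual unequal-variance t statistic, and the critical value is a weighted combination of the two per-sample critical values t1 (n1 - 1 degrees of freedom) and t2 (n2 - 1 degrees of freedom). Looking these up is left to the reader, and the exact procedure of section 7.9.1 is not reproduced here.

#include <cmath>
#include <cstdio>

// Unequal-variance t statistic for the difference of two means.
double ccStatistic(double mean1, double var1, int n1,
                   double mean2, double var2, int n2) {
    return (mean1 - mean2) / std::sqrt(var1 / n1 + var2 / n2);
}

// Cochran-Cox critical value: a weighted mix of the two critical t values.
double ccCritical(double t1, double t2, double var1, int n1,
                  double var2, int n2) {
    const double w1 = var1 / n1, w2 = var2 / n2;
    return (w1 * t1 + w2 * t2) / (w1 + w2);
}

int main() {
    // toy numbers only; t1 and t2 would come from Student's t tables
    double t = ccStatistic(4.09, 0.5, 15, 4.45, 0.6, 15);
    double crit = ccCritical(2.145, 2.145, 0.5, 15, 0.6, 15);
    std::printf("reject = %d\n", std::fabs(t) > crit);
}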
While looking at the best results of the MAs (the chart on the right in figure 9.17), one can see that the quality of solutions improves by an additional 0.8-1.2% compared to the averages. The winner among the tested algorithms is the same: CCSPX-2 with RSM. MUT comes second, then NCPX and UAX. Again, the impact of mutation on the algorithms is clear: CCSPX-2 loses only 1.2% without RSM, while NCPX loses 2.5% and UAX 3.8%, the latter being almost equal to MSLS.

The best-known solutions

Basic versions of the MA are able to generate best-known solutions for 5-6 out of 19 instances. The actual numbers are shown in table 9.7; they are based on tables B.3 and B.4. Among the 5 instances common to all best runs there are all 3 instances of type 60, for which the MA is not required. CCSPX-2 is better than the other algorithms in the 'best' column by one instance; it once generated a solution better than the best in the challenge, on instance 048X30-0519 (see table B.4).

Table 9.7: The number of instances for which each basic version of the MA found the best-known solutions in long runs. Best: the best run of 15 for a version; all: all runs of a version.

MA version   best   all
MSLS           5      4
MUT            5      5
CCSPX-2        6      5
NCPX           5      5
UAX            5      5

The hardest instances

For 2/3 of the considered instances the MAs are able to generate very good solutions. However, for 5-6 instances some of the algorithms are not able to get closer than 10% to the best challenge solution. The average results for these instances are shown in figure 9.18.

Figure 9.18: Average quality for the instances hardest for the MA (024X30-1319, 023X30-1260, 025X00-0996, 034X30-0921, 029X30-0780, 064X30-0273); long runs.

It can be seen that the application of 'genetic' operators improves the results of MSLS considerably on these instances, in two cases even by more than 20%. CCSPX-2 generates the best results in 4 cases, MUT and UAX in one case each.

Quality of solutions: summary

All these results suggest that, on average, RSM mutation seems to be the most important operator for the tested MAs, and that crossover is only a helper operator. NCPX and UAX seem to be almost redundant when RSM is employed. CCSPX-2, when coupled with RSM, generates the best overall results, so its presence in the algorithm is important.

Time of computation: basic MA

The average time of computation of the basic MA versions, as a function of instance size, is shown in figure 9.19, together with lines of power regression.

Figure 9.19: Average time of computation of the basic MA versions, together with lines of power regression; long runs.

In all the presented cases the values of r^2 are larger than 0.77, which indicates a fairly good fit. The quality of regression is rather worse than in the CVRP case, though. There are two causes of this fact. The first is the presence of instances of type 60, which have unusually short running times. The second is the number of ratio constraints, which varies across instances and was found to be an important factor influencing the time of computation: 16-28% of variance in the average time can be attributed to this number. Nevertheless, the author decided to form conclusions based on the regression lines presented in figure 9.19.

It can be observed in the figure that MSLS usually stops first. MUT is a slightly more time-consuming version of the algorithm, but in some rare cases it takes less time than MSLS. The third version, CCSPX-2, takes around twice as much time as MUT, but only 68% of UAX and 53% of NCPX, on average.
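The regression lines mentioned above are power laws of the form t = a · n^b. The thesis does not state the fitting procedure; the following C++ sketch shows the common approach, an ordinary least-squares fit in log-log space.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct PowerFit { double a, b; };  // model: t = a * n^b

PowerFit fitPower(const std::vector<double>& n, const std::vector<double>& t) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    const double m = double(n.size());
    for (std::size_t i = 0; i < n.size(); ++i) {
        const double x = std::log(n[i]), y = std::log(t[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    const double b = (m * sxy - sx * sy) / (m * sxx - sx * sx);
    const double loga = (sy - b * sx) / m;
    return { std::exp(loga), b };
}

int main() {
    std::vector<double> size = { 100, 400, 1000 };   // toy data
    std::vector<double> time = { 2.0, 30.0, 180.0 };
    PowerFit f = fitPower(size, time);
    std::printf("t ~ %.3f * n^%.3f\n", f.a, f.b);
}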
The running times of the MA are generally quite long. The maximum time for CCSPX-2 is approx. 4 h, while NCPX and UAX may take even 6 h for the largest instances.

Impact of RSM mutation

The average times of computation for the basic and noMutation configurations of the MA are shown in figure 9.20, together with lines of power regression.

Figure 9.20: Average time of computation: CCSPX-2, NCPX (top) and UAX (bottom); long runs. Basic MA: squares, solid line; noMutation MA: diamonds, dotted line.

While with RSM mutation NCPX and UAX take considerably more time than CCSPX-2 to converge, without it the times are more or less comparable. Therefore, it may be said that RSM has a much stronger impact on the MA in case of NCPX and UAX than in case of CCSPX-2. Generally, the presence of RSM mutation seems to substantially increase the algorithm's ability to generate new good solutions for the population.

'Genetic' operators: effort of local search

The author wanted to see what the efficiency of the 'genetic' operators actually is, e.g. how much effort local search needs to optimise the offspring they generate. Thus, the average number of LS iterations per MA generation was analysed for several instances. Two representative plots of these numbers are shown in figure 9.21. Each line in the plots is an average over 15 independent runs, further processed with a moving average (window size 10) in order to smooth the lines and make them more readable.

Figure 9.21: Average numbers of LS iterations per generation: 025X00-0996 (top) and 048X30-0519 (bottom); long runs.

One can see in the plots that it is MSLS which requires the largest number of LS iterations per generation. This is an obvious result, since random solutions are the starting points for LS in this algorithm.

Quite surprisingly, UAX alone (the 'UAX-noMut' series) generates solutions which require comparable LS effort, as if they were random solutions. Moreover, the process of computation stops almost as soon as in the case of MSLS. Hence, UAX appears to be a highly disruptive operator.

NCPX is slightly better from this point of view. For the majority of instances it behaves in a way similar to UAX (like for instance 048X30-0519 in figure 9.21). But in some rare cases (like instance 025X00-0996) it is able, after a number of generations, to generate offspring which are closer to local optima in terms of LS iterations than those of UAX. Nevertheless, the offspring of NCPX still require considerable LS effort to become local optima.

RSM mutation generates solutions much closer to local optima in these terms, thus saving much computational effort. It does so from the very beginning of the computation, and the number of iterations stays at a constant level until the end. This end of artificial evolution comes much later than for UAX and NCPX, meaning that mutation is able to introduce new solutions to the population for much longer. CCSPX-2 alone seems to be the best from this perspective. Moreover, it does not stop before MUT does, so it is also able to sustain the process of artificial evolution (it does not converge earlier).
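The smoothing applied to these curves is a plain moving average. A minimal C++ sketch (window size w; above, w = 10):

#include <cstddef>
#include <vector>

// Moving average with window w; output value k smooths inputs [k, k+w).
std::vector<double> movingAverage(const std::vector<double>& xs, std::size_t w) {
    std::vector<double> out;
    if (w == 0 || xs.size() < w) return out;
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        if (i >= w) sum -= xs[i - w];           // drop the element leaving the window
        if (i + 1 >= w) out.push_back(sum / double(w));
    }
    return out;
}

int main() {
    std::vector<double> xs = { 1, 2, 3, 4, 5 };
    auto smoothed = movingAverage(xs, 3);       // yields {2, 3, 4}
    (void)smoothed;
}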
At the beginning of the computation, the versions of the MA which employ both the mutation and a crossover generate offspring which require, quite simply, the average of the numbers of LS iterations needed after the mutation alone and after the same crossover alone. The picture usually changes much later. The NCPX-RSM pair starts decreasing the number of LS iterations per generation after some 1500-2000 generations. The decrease is rather slow and reaches a level comparable with CCSPX-2 after some 1000 more generations, when the MA nearly stops. The picture for UAX is quite similar, with the decreasing trend starting perhaps slightly earlier. Predictably, it is CCSPX-2 and RSM which are closer to local optima and require less LS effort. Perhaps a little surprisingly, the number of LS iterations increases in case of 025X00-0996, but the trend is extremely slow and the whole process is near convergence (the last of the 15 runs stops at generation 3260).

Looking at figure 9.21 one can also confirm the earlier observation that the presence of RSM mutation hugely influences the ability of MA-NCPX and MA-UAX to sustain the process of computation. Conversely, CCSPX-2 alone can generate new solutions for the population as long as the mutation can.

'Genetic' operators: successful trials

To see the efficiency of the operators from a different angle, the author also computed the average percentage of successful trials of each operator in each run. Figure 9.22 presents these percentages averaged over all runs and instances. Obviously, the percentages cannot be constant during a run, because the process of computation would not stop otherwise. But such a figure may give a general idea of how successful each operator is in inserting new solutions into the population.

Figure 9.22: Average percentage of successful operations: crossovers, mutations, crossovers and mutations together, crossovers alone (noMutation); long runs.

The figure shows that RSM mutation is the most successful operator. It is able to insert (after LS) almost 15% of its mutants into the population. The efficiency of RSM is almost the same when it is coupled with NCPX and UAX. It drops, however, when used with CCSPX-2; instead, a considerable percentage (5%) of CCSPX-2 offspring is inserted into the population. Compared to only approx. 1% of successes of NCPX and UAX, CCSPX-2 seems to be much more efficient. When crossover is a stand-alone operator, the percentage of successes increases. Yet CCSPX-2 remains the best one, with 11% of successful trials; NCPX and UAX are able to insert only 6-7% of their offspring into the population.
To sum up these results, CCSPX-2 is undoubtedly the best of the 3 tested crossovers in long runs: it has a higher percentage of successes than NCPX and UAX; it generates better starting points for local search, thus contributing to lower times of computation; and it generates better solutions.

9.10.2 Runs limited by time

The goal of the second experiment was to assess the efficiency of the tested operators in MAs given a limited amount of time. Therefore, the same versions and configurations of the MA as in the previous experiment were run with a limit of 600 s (the limit used in the ROADEF Challenge 2005). Each algorithm was again executed 15 times on the 19 set X instances. The same set of computers was used.

Quality of solutions: aggregated results

In figure 9.23 one can see the aggregated quality of results: the average over instances of the average excess over the best average challenge solution.

Figure 9.23: Quality of results: average over instances of the average result in 15 short runs.

The figure shows that the worst algorithm is again MSLS. This confirms that the introduction of mutation and crossover generally contributes to the quality of results. The best MA is the version with mutation alone (5.2% of excess), while the one employing RSM and CCSPX-2 is worse only by a fraction of a percent (5.23% of excess). The other MAs with crossover and mutation produce slightly worse solutions. The worst is NCPX, which comes as a surprise, since this operator is equipped with some direct heuristic knowledge about the CarSP, contrary to UAX.

The ranking of the basic versions based on the Cochran-Cox tests confirms the above conclusions. The ranking is: MUT (34/-2), CCSPX-2 (32/-2), UAX (13/-15), NCPX (12/-20), MSLS (0/-52). The direct comparison of the best and the second-best algorithms, MUT and CCSPX-2, yields a draw: MUT wins on two instances (034X30-0231, 048X31-0459), while CCSPX-2 wins on two other ones (023X30-1260, 064X30-0875); in the other cases the results are not statistically different.

The ranking of the MAs without mutation is the same as in the previous experiment: CCSPX-2 is the best and nearly as good as with RSM mutation; NCPX comes second and slightly improves over MSLS; UAX is the worst and contributes nothing to the search compared to MSLS.

9.10.3 Quality vs. FDC

The relationship between the quality of results generated by the MAs and fitness-distance correlation was also of interest. As the indicator of quality, the average gain of an MA version over MSLS in long runs was employed, i.e. the difference of excess over the best-known solution for each instance. Values of the determination coefficient r^2 between fitness and simcs were taken from table 9.3 as indicators of the strength of 'big valleys'; this similarity measure was directly exploited in CCSPX. The scatter plots of these two variables for the basic and the noMutation MAs are presented in figure 9.24.

Figure 9.24: Gain in quality of MA versions over MSLS versus fitness-distance determination coefficients for simcs: basic MA (left) and noMutation MA (right).
There is no relationship visible in the presented plots. Surely, there is no linear relationship in any series of points; this is confirmed by very low values of the linear determination coefficient, which for all series is well below 0.1. One can see that there are high MA gains both for instances with low and with high r^2cs, and vice versa.

More importantly, there is no visible trend for MA-CCSPX-2, either in the basic or in the noMutation configuration. This is the MA employing CCSPX, which was designed based on the results of the fitness-distance analysis.

This lack of the expected direct relationship may be due to the mixture of instances, with strongly variable sizes, numbers of ratio constraints, types and sources (Renault factories); possibly all these factors influence instance hardness. Also the reference quality of solutions, taken from MSLS, is a variable that may be influenced by these factors.

9.11 Summary and conclusions

This chapter presented the adaptation of the memetic algorithm to Renault's car sequencing problem. Crucial elements of this adaptation were described, such as the choice of a representation, the design of local search, and the operators of crossover and mutation.

In particular, the design of 'genetic' operators was performed based on the results of the fitness-distance analysis. Before this analysis, several initial hypotheses about properties of solutions which may influence the objective function were formulated. These properties were expressed in terms of similarity measures, further employed in the FDA.

The analysis confirmed the initial hypotheses: positions of groups (cars) do not matter in the CarSP, while it is important for good solutions to contain similar subsequences of groups (correlation of fitness and similarity in terms of subsequences exists in many instances). Succession relations (adjacency of groups) are also important for quality, but to a lesser extent than subsequences.

Unfortunately, high FDC with respect to simcs or simcswg is not a property of the CarSP as a whole. Rather, it is a property of particular instances: type 60, most of type 30, some of types 31 and 01. It is present in none of the type 00 instances. Perhaps some more detailed analyses of those instances could reveal the 'big valley' structure, but this is a subject for another investigation.

In any case, based on the positive FDA results for most of the analysed instances, a crossover operator preserving common parental subsequences of groups was designed, CCSPX-2. Also a mutation disruptive for subsequences was chosen, RSM. These operators were tested in the memetic algorithm and compared to operators taken from the literature in two computational experiments.

These experiments revealed that the proposed pair of operators, CCSPX-2 and RSM, was the best design in terms of several performance indicators. It generated solutions of the best average quality in long MA runs and produced one new best-known solution. These operators had the highest probability of generating new solutions for the population. Moreover, they generated offspring which required the least LS effort to become local optima, thus accelerating the process of artificial evolution to a high extent.

Considered separately, RSM mutation seems to be the most important operator in the designed MA. On average, it had the largest contribution to the population and the best results in short runs.
CCSPX-2 was the second-best, although for some instances, especially the larger ones, its operations were indispensable. Moreover, CCSPX-2 was most important in long runs until convergence, so it seems that this crossover gives rise to a 'long-runner' MA.

The attempt to relate the quality of results of the designed MAs to the results of the FDA failed. No relationship between the chosen indicators of quality and FDC for simcs was found. Perhaps too many factors influencing solution quality were left uncontrolled in the conducted experiments.

To summarise, the method of systematic construction of recombination operators based on fitness-distance analysis gave a good result for Renault's CarSP. The designed operators, preserving or disturbing subsequences of cars, are the best operators of this kind proposed so far.

Chapter 10

Conclusions

10.1 Summary of motivation

Metaheuristics are not algorithms in the strict sense of the word. To be practically applicable, metaheuristics have to be adapted to the considered optimisation problem. This adaptation requires that certain components be chosen or designed (chapter 2).

A universal choice or design of such algorithmic components for all possible problems does not exist, as the No Free Lunch theorems indicate. Rather, the design must be based on properties of the considered problem. If the components do not include problem-specific knowledge, the adapted metaheuristic may in effect become a black-box search algorithm, threatened by the No Free Lunch result. The memetic algorithm considered in this thesis is no exception (chapter 3).

Yet at the moment there are no clear design guidelines available for components of metaheuristics. This state of the art is openly complained about in the literature. In particular, the lack of such clear guidelines was demonstrated in the case of evolutionary algorithms (chapter 4).

However, a detailed review of designs for several problems revealed that efficient designs of components of evolutionary algorithms were sometimes based on observed similarities of good solutions to those problems. This led several authors to analysing fitness landscapes of those problems in search of properties which could be exploited in designed components (chapter 4).

Fitness-distance correlation is one such property. Many authors believe that its presence in a considered problem may be exploited in the memetic algorithm by means of distance-preserving crossover operators. This belief was to some extent confirmed qualitatively by several existing good operator designs. Hence came the idea of the scheme of adaptation of the memetic algorithm: the construction of distance-preserving crossover operators when fitness-distance correlation is found in the considered problem (chapter 5).

10.2 Contribution of the thesis

This was also the source of the main hypothesis of this work and of its main goal (chapter 1): to perform this scheme of adaptation on the capacitated vehicle routing problem (chapter 6) and the car sequencing problem (chapter 8). The goal was achieved, as documented in chapters 7 and 9. The following elements form the core of the author's original contribution presented in these chapters.

• The definition and implementation of distance/similarity measures appropriate to the analysed problems: de, dpn, dpc for the CVRP and simcs, simcsuc for the CarSP.

• The fitness-distance analysis of these problems.
Therefore, it may be said that the method of systematic construction of crossover operators based on fitness-distance analysis gave a good result for the two considered problems. Together with the cases analysed earlier by other authors, this thesis puts the practical applicability of this scheme on a much firmer basis.

The author also contributed to the area of fitness-distance analysis itself (chapter 5).

• The review of fitness-distance analyses performed in the past is most probably the broadest review of this kind currently available in the literature. It may be a valuable source of references for researchers interested in the subject.
• The new version of the method for computing FDC proposed by the author is likely to have better statistical and practical properties than the methods proposed earlier (a schematic illustration of an FDC computation follows this list).
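To make the notion concrete, the following schematic Python sketch shows one way a fitness-distance correlation coefficient can be computed for a sample of local optima when global optima are not available: each optimum is paired with its distance to the nearest strictly better optimum in the sample. This illustrates the general idea only and is not the author's exact procedure from chapter 5; the function names are illustrative.

from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation of two equally long samples.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def fdc(optima, fitness, distance):
    # fitness(s): objective value to be minimised;
    # distance(s, t): distance between two solutions.
    fs, ds = [], []
    for s in optima:
        better = [t for t in optima if fitness(t) < fitness(s)]
        if not better:
            continue  # the best sampled solution has no better neighbour
        fs.append(fitness(s))
        ds.append(min(distance(s, t) for t in better))
    return pearson(fs, ds) if len(fs) > 1 else 0.0

A strong positive coefficient then indicates that worse local optima tend to lie further away from better ones, which is exactly the trend that distance-preserving crossovers are meant to exploit.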
10.3 Perspectives for further work

With the completion of this thesis the author perceives some open research issues more clearly.

The issue of what fitness-distance correlation actually is still remains open to some extent. Currently, FDC is simply a descriptive statistic of a trend observed in the analysed landscape; there is no appropriate mathematical model of this property. Moreover, the practical significance of the observed trend is determined in an arbitrary way (this thesis is no exception). The existence of some statistical tests does not resolve the problem: there is a difference between practical and statistical significance. Thus, a proper model of FDC needs to be established.

The existence of FDC in a space of solutions depends on the problem, the employed distance measure and, most importantly, the analysed instance. This result of the thesis confirms some earlier observations, as indicated in the review of FDAs. From this perspective, it is interesting to find out under what conditions FDC exists. In which types of instances may the correlation be found? How are these types related to real-world instances?

One of the most important open issues is the relationship between FDC and the efficiency of algorithms which are supposed to exploit it during search. At the moment, the existing arguments in favour of such a relationship are mainly qualitative: they are simply the existing good designs of memetic algorithms based on fitness-distance analysis or some of its elements. The existence of this relationship was not confirmed in this thesis, either, most probably because too many factors possibly influencing instance hardness were left uncontrolled in the conducted experiments. A more rigorous analysis of the relationship between FDC and the efficiency of algorithms would be very welcome.

The verification of the FDCs discovered in this thesis would also be welcome. The analyses presented here were conducted using the approximate method, which does not involve global optima. This is more practical than the basic approach, but since the obtained results are approximate, they should be verified wherever global optima of the considered instances are known. Luckily, these are known for 4 of the analysed instances of the CVRP. Some instances of the CarSP have most probably been solved to optimality, as well. The introduced approximate version of the fitness-distance analysis should also be evaluated on problems with known global optima in order to verify its predicted properties.

There are still some combinatorial optimisation problems for which fitness-distance analysis has not been considered. For these, appropriate distance/similarity measures should be defined and the landscapes analysed. Such analyses can provide important insight into the properties of the landscapes, e.g. the location of global optima relative to other solutions.

Finally, in the author's opinion it is important to clarify the relationship between the No Free Lunch theorems and practical problems of optimisation: are the theorems applicable to them or not? Some very recent work by He et al. (2007) proves that in the black-box setting the prediction of the hardness of a given problem instance is impossible. This result would apply to the FDC when used in such a prediction, but it is still unclear whether the black-box assumption is true for practical optimisation problems. The author's guess is that this assumption does not hold for NP-hard problems.

The resolution of these open issues will most likely provide a deeper understanding of the properties of hard problems of combinatorial optimisation. Hopefully, this will result in better justified designs of even more efficient metaheuristics.

Appendix A
Names of instances of the car sequencing problem

The original names of instances, as used in the ROADEF Challenge 2005 (Cung 2005b), are difficult to manage. These are long strings, with some information being redundant. They probably have meaning to Renault, but for the competitors they were simply labels. Moreover, they were probably encoded in some way to disguise their origin (e.g. the factory location). Therefore, the author of this text proposes a mapping of the original challenge names to more manageable ones.

The proposed name always consists of 11 characters in the form OOOSWD-NNNN, where:
• OOO is the first 3-character part of the original name, usually a number.
• S is one character describing the set from which the instance originated; it is one of the values: A, B, X, T.
• W is one decimal digit indicating the vector of weights of an instance.
• D is equal to the HPRCs difficulty bit dif (see section 8.2).
• NNNN is a four-digit number giving the size of an instance, i.e. the number of cars of the current day.

The fields OOO and S are employed to facilitate the use of the map: from the original name to the proposed one and back. The W field encodes the vector of weights w in an instance (see section 8.2): the value of w_PCC unambiguously indicates the whole vector, and w_PCC is always of the form 10^W, with W ∈ {0, 3, 6}; hence the value of W. The pair of fields WD encodes the general type of an instance (see section 8.4). Only set X instances are presented in this map; instances from the other sets may have their names easily transformed by the same rule.
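A brief sketch of this encoding rule is given below. The helper signature and the field extraction are the author's illustration (in practice the fields would be read from the original ROADEF instance file), while the rule itself is exactly the one described above.

from math import log10

def proposed_name(original_name, instance_set, w_pcc, dif, n_cars):
    # Build the OOOSWD-NNNN label: OOO from the original name,
    # S in {A, B, X, T}, W such that w_PCC = 10**W,
    # D = the HPRCs difficulty bit, NNNN = number of cars.
    ooo = original_name.split()[0][:3]    # first 3-character part
    w = int(round(log10(w_pcc)))          # 1 -> 0, 10**3 -> 3, 10**6 -> 6
    assert w in (0, 3, 6)
    return "%s%s%d%d-%04d" % (ooo, instance_set, w, dif, n_cars)

# e.g. proposed_name("022 RAF EP ENP S49 J2", "X", 10**6, 0, 704)
# gives "022X60-0704", the first entry of Table A.1 below.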
Table A.1: Map of instance names (set X).

Set  Original name (ROADEF)                Proposed name
X    022 RAF EP ENP S49 J2                 022X60-0704
X    023 EP RAF ENP S49 J2                 023X30-1260
X    024 EP RAF ENP S49 J2                 024X30-1319
X    025 EP ENP RAF S49 J1                 025X00-0996
X    028 CH1 EP ENP RAF S50 J4             028X00-0325
X    028 CH2 EP ENP RAF S51 J1             028X00-0065
X    029 EP RAF ENP S49 J5                 029X30-0780
X    034 VP EP RAF ENP S51 J1 J2 J3        034X30-0921
X    034 VU EP RAF ENP S51 J1 J2 J3        034X30-0231
X    035 CH1 RAF EP S50 J4                 035X60-0090
X    035 CH2 RAF EP S50 J4                 035X60-0376
X    039 CH1 EP RAF ENP S49 J1             039X30-1247
X    039 CH3 EP RAF ENP S49 J1             039X30-1037
X    048 CH1 EP RAF ENP S50 J4             048X30-0519
X    048 CH2 EP RAF ENP S49 J5             048X31-0459
X    064 CH1 EP RAF ENP S49 J1             064X30-0875
X    064 CH2 EP RAF ENP S49 J4             064X30-0273
X    655 CH1 EP RAF ENP S51 J2 J3 J4       655X30-0264
X    655 CH2 EP RAF ENP S52 J1 J2 S01 J1   655X30-0219

Appendix B
Detailed results of memetic algorithms

In the tables that follow, the plus symbol (+) beside a solution quality for an instance means that the quality is equal to the best-known solution for this instance. The asterisk (∗) means that the quality of the generated solution is better than that of the best-known one.

Table B.1: Quality of results of the basic MA version in long runs for the CVRP: CPX2, CEPX, CECPX2 (per operator: best quality, average quality, standard deviation).

instance  best-known   CPX2 (best / avg / dev)       CEPX (best / avg / dev)       CECPX2 (best / avg / dev)
c50       524.61       +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00
c75       835.26       +835.26 / 837.01 / 2.41       835.32 / 837.00 / 2.61        835.77 / 837.81 / 2.61
c100      826.14       827.39 / 829.01 / 1.34        +826.14 / 828.80 / 1.26       827.85 / 829.71 / 0.96
c100b     819.56       +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00
c120      1042.11      1042.12 / 1043.22 / 0.66      1042.12 / 1042.40 / 0.47      1042.97 / 1043.63 / 0.37
c150      1028.42      1029.16 / 1037.34 / 4.32      1031.07 / 1037.80 / 4.15      1031.96 / 1040.96 / 4.69
c199      1291.29      1306.64 / 1313.25 / 4.48      1300.79 / 1311.48 / 5.87      1306.25 / 1314.96 / 5.71
f71       241.97       +241.97 / +241.97 / 0.00      +241.97 / +241.97 / 0.00      +241.97 / +241.97 / 0.00
f134      1162.96      +1162.96 / 1163.15 / 0.22     +1162.96 / 1163.10 / 0.14     +1162.96 / +1162.96 / 0.00
tai75a    1618.36      +1618.36 / 1618.93 / 0.94     +1618.36 / 1618.72 / 1.36     +1618.36 / 1618.84 / 1.22
tai75b    1344.62      +1344.62 / 1344.89 / 0.30     +1344.62 / 1344.98 / 0.30     +1344.62 / 1344.87 / 0.29
tai75c    1291.01      +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00
tai75d    1365.42      +1365.42 / +1365.42 / 0.00    +1365.42 / +1365.42 / 0.00    +1365.42 / +1365.42 / 0.00
tai100a   2041.34      2047.90 / 2064.77 / 10.89     2047.90 / 2062.75 / 10.90     2071.52 / 2072.69 / 2.32
tai100b   1939.90      1940.61 / 1940.81 / 0.51      1940.61 / 1940.66 / 0.04      1940.61 / 1940.65 / 0.04
tai100c   1406.20      +1406.20 / 1411.83 / 4.24     +1406.20 / 1413.08 / 3.92     +1406.20 / 1413.93 / 3.86
tai100d   1581.25      1585.07 / 1595.72 / 4.19      1586.33 / 1596.31 / 2.95      1596.97 / 1596.98 / 0.04
tai150a   3055.23      +3055.23 / 3056.71 / 1.49     +3055.23 / 3059.16 / 7.47     +3055.23 / 3057.11 / 2.78
tai150b   2656.47      2727.67 / 2732.07 / 3.62      2727.96 / 2733.10 / 3.84      2727.99 / 2736.65 / 5.41
tai150c   2341.84      2361.62 / 2376.61 / 10.83     2362.56 / 2364.83 / 2.03      2362.56 / 2379.25 / 16.35
tai150d   2645.39      2659.63 / 2667.64 / 3.31      2663.29 / 2668.31 / 2.02      2669.26 / 2669.31 / 0.14
tai385    24431.44     24540.51 / 24633.01 / 42.00   24529.96 / 24613.77 / 50.14   24540.33 / 24627.54 / 63.78
Table B.2: Quality of results of the basic MA version in long runs for the CVRP: GCECPX2, RBX, SPX (per operator: best quality, average quality, standard deviation).

instance  best-known   GCECPX2 (best / avg / dev)    RBX (best / avg / dev)        SPX (best / avg / dev)
c50       524.61       +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00
c75       835.26       835.77 / 838.41 / 3.10        835.32 / 839.41 / 2.61        835.32 / 837.27 / 2.78
c100      826.14       827.39 / 829.32 / 1.07        827.39 / 829.82 / 1.11        +826.14 / 828.84 / 1.35
c100b     819.56       +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00
c120      1042.11      1042.12 / 1043.58 / 0.74      1042.97 / 1043.58 / 0.44      1042.12 / 1043.17 / 1.01
c150      1028.42      1031.10 / 1040.37 / 3.53      1035.22 / 1040.33 / 3.18      1031.07 / 1039.08 / 4.18
c199      1291.29      1304.58 / 1315.07 / 6.14      1308.77 / 1317.36 / 4.73      1305.73 / 1312.05 / 3.99
f71       241.97       +241.97 / 243.13 / 1.68       +241.97 / 242.34 / 0.94       +241.97 / 242.32 / 0.87
f134      1162.96      +1162.96 / 1163.04 / 0.11     +1162.96 / 1163.14 / 0.23     +1162.96 / +1162.96 / 0.00
tai75a    1618.36      +1618.36 / 1621.62 / 2.93     +1618.36 / 1621.98 / 2.69     +1618.36 / +1618.36 / 0.00
tai75b    1344.62      +1344.62 / 1344.98 / 0.31     +1344.62 / 1344.82 / 0.26     +1344.62 / 1344.90 / 0.32
tai75c    1291.01      +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00
tai75d    1365.42      +1365.42 / +1365.42 / 0.00    +1365.42 / 1365.68 / 0.25     +1365.42 / 1365.45 / 0.12
tai100a   2041.34      2071.52 / 2074.61 / 2.89      2071.52 / 2073.91 / 2.79      2047.90 / 2070.31 / 6.11
tai100b   1939.90      1940.61 / 1940.67 / 0.06      1940.61 / 1941.02 / 1.19      1940.61 / 1940.79 / 0.56
tai100c   1406.20      1406.87 / 1413.14 / 3.79      +1406.20 / 1415.48 / 2.87     1406.87 / 1413.60 / 3.37
tai100d   1581.25      1596.96 / 1598.38 / 1.62      1596.97 / 1598.87 / 3.46      1596.97 / 1597.48 / 0.75
tai150a   3055.23      +3055.23 / 3059.59 / 7.48     3055.27 / 3064.23 / 10.48     +3055.23 / 3060.05 / 7.77
tai150b   2656.47      2732.52 / 2738.00 / 3.64      2732.46 / 2738.68 / 4.77      2727.78 / 2732.02 / 2.28
tai150c   2341.84      2362.86 / 2384.87 / 14.64     2366.27 / 2395.93 / 19.61     2362.75 / 2374.49 / 15.94
tai150d   2645.39      2663.24 / 2667.17 / 3.16      2663.21 / 2668.99 / 6.53      2661.72 / 2668.01 / 2.57
tai385    24431.44     24563.92 / 24657.98 / 56.56   24575.69 / 24664.09 / 50.63   24504.19 / 24606.86 / 54.84

Table B.3: Quality of results of the basic MA version in long runs for the CarSP: MSLS, MUT (per algorithm: best quality, average quality, standard deviation).

instance     best average   MSLS (best / avg / dev)              MUT (best / avg / dev)
028X00-0065  3.0            +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0
035X60-0090  5010000.0      5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0
655X30-0219  153034000.0    153039000 / 153040133.3 / 805.5      153035000 / 153036466.7 / 1147.0
034X30-0231  8087035.8      8095038 / 8096643.2 / 949.0          8090034 / 8091045.1 / 892.8
655X30-0264  30000.0        +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0
064X30-0273  37000.0        47000 / 48866.7 / 1024.2             39000 / 40733.3 / 1388.8
028X00-0325  36341495.4     36416093 / 37685092.3 / 574255.4     36346088 / 36356554.4 / 5596.7
035X60-0376  6056000.0      +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0
048X31-0459  31077916.2     31105011 / 31107405.3 / 1949.9       31091031 / 31092657.9 / 1203.6
048X30-0519  197005.6       207046 / 208241.1 / 741.8            198016 / 199339.8 / 865.8
022X60-0704  12002003.0     +12002003 / 12002003.7 / 0.7         +12002003 / +12002003.0 / 0.0
029X30-0780  110298.4       136004 / 139673.0 / 2053.4           116003 / 118470.3 / 1258.1
064X30-0875  61187229.8     61234058 / 61237522.3 / 2185.7       61213051 / 61217655.7 / 2468.8
034X30-0921  55994.8        70627 / 72282.5 / 939.4              67631 / 70731.1 / 1621.5
025X00-0996  160407.6       179559 / 186890.8 / 4903.7           161507 / 166317.5 / 2568.1
039X30-1037  231030.0       240072 / 241608.3 / 1083.0           232055 / 234181.5 / 1016.9
039X30-1247  69239.0        69299 / 69317.3 / 8.4                69273 / 69293.5 / 12.4
023X30-1260  192466.0       239037 / 244179.6 / 2187.6           221048 / 227108.3 / 3211.9
024X30-1319  337006.0       414004 / 419537.8 / 2186.6           386009 / 392005.6 / 3384.8
Table B.4: Quality of results of the basic MA version in long runs for the CarSP: CCSPX-2, NCPX, UAX (per operator: best quality, average quality, standard deviation).

instance     best average   CCSPX-2 (best / avg / dev)           NCPX (best / avg / dev)              UAX (best / avg / dev)
028X00-0065  3.0            +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0
035X60-0090  5010000.0      5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0
655X30-0219  153034000.0    153035000 / 153036200.0 / 541.6      153035000 / 153036333.3 / 788.8      153035000 / 153036333.3 / 596.3
034X30-0231  8087035.8      8089046 / 8090773.6 / 1060.6         8089047 / 8091578.6 / 1147.4         8089058 / 8091375.9 / 939.7
655X30-0264  30000.0        +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0
064X30-0273  37000.0        38000 / 40466.7 / 1203.7             39000 / 40133.3 / 1203.7             39000 / 39933.3 / 771.7
028X00-0325  36341495.4     36348083 / 36358154.7 / 5067.1       36343084 / 36357618.3 / 6345.1       36350088 / 36423154.4 / 245628.1
035X60-0376  6056000.0      +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0
048X31-0459  31077916.2     31088106 / 31091802.7 / 1901.9       31090060 / 31092665.9 / 1843.2       31089066 / 31092323.7 / 1836.6
048X30-0519  197005.6       ∗196986 / 198475.8 / 717.0           197995 / 199411.2 / 885.1            198002 / 199471.1 / 954.8
022X60-0704  12002003.0     +12002003 / +12002003.0 / 0.0        +12002003 / +12002003.0 / 0.0        +12002003 / +12002003.0 / 0.0
029X30-0780  110298.4       115003 / 116803.8 / 1274.7           116003 / 117670.2 / 1247.0           116003 / 117938.5 / 1651.0
064X30-0875  61187229.8     61212055 / 61213990.2 / 1521.4       61215064 / 61218256.6 / 2101.2       61215051 / 61217123.1 / 1947.6
034X30-0921  55994.8        67593 / 70998.7 / 1768.8             67612 / 70984.2 / 1949.4             68602 / 71198.3 / 1351.9
025X00-0996  160407.6       161506 / 164982.1 / 2219.6           163516 / 167189.0 / 1811.0           164518 / 167655.0 / 2336.6
039X30-1037  231030.0       231031 / 232168.3 / 717.7            232030 / 233437.3 / 1144.0           232030 / 233246.9 / 1107.5
039X30-1247  69239.0        69275 / 69294.8 / 11.2               69278 / 69294.7 / 8.4                69279 / 69300.3 / 11.4
023X30-1260  192466.0       219045 / 224576.7 / 2678.5           222051 / 227843.0 / 2734.6           222037 / 226844.2 / 2663.8
024X30-1319  337006.0       380005 / 386005.3 / 2582.0           387007 / 390471.9 / 2247.3           384005 / 389605.1 / 2823.1

Bibliography

Aarts, E., Korst, J. H. M. & van Laarhoven, P. J. M. (2003), Simulated annealing, in Aarts & Lenstra (2003b), chapter 4.
Aarts, E. & Lenstra, J. K. (2003a), Introduction, in E. Aarts & J. K. Lenstra, eds, ‘Local search in combinatorial optimization’, chapter 1.
Aarts, E. & Lenstra, J. K., eds (2003b), Local search in combinatorial optimization, Princeton University Press.
Alba, E. & Dorronsoro, B. (2004), Solving the vehicle routing problem by using cellular genetic algorithms, in J. Gottlieb & G. R. Raidl, eds, ‘Evolutionary Computation in Combinatorial Optimization’, Vol. 3004 of LNCS, pp. 11–20.
Alba, E. & Dorronsoro, B. (2006), ‘Computing nine best-so-far solutions for capacitated VRP with a cellular genetic algorithm’, Information Processing Letters (98), 225–230.
Altenberg, L. (1995), The Schema Theorem and Price’s Theorem, in D. Whitley & M. Vose, eds, ‘Foundations of Genetic Algorithms 3’, Morgan Kaufmann, San Francisco, pp. 23–49.
Altenberg, L. (1997), Fitness distance correlation analysis: an instructive counterexample, in T. Baeck, ed., ‘Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97)’, Morgan Kaufmann, San Francisco.
Altinel, I. K. & Oncan, T. (2005), ‘A new enhancement of the Clarke and Wright savings heuristic for the capacitated vehicle routing problem’, Journal of the Operational Research Society 56, 954–961.
Aronson, L. D. (1996), Algorithms for vehicle routing - a survey, Technical Report DUT-TWI-96-21, Delft University of Technology, The Netherlands.
Błażewicz, J. (1988), Złożoność obliczeniowa problemów kombinatorycznych, Wydawnictwa Naukowo-Techniczne, Warszawa. (In Polish).
Baker, B. M. & Ayechew, M. A. (2003), ‘A genetic algorithm for the vehicle routing problem’, Computers and Operations Research 30, 787–800.
Beck, J. C. & Watson, J.-P. (2003), Adaptive search algorithms and fitness-distance correlation, in ‘MIC’2003 – 5th Metaheuristics International Conference’, Kyoto, Japan.
Bentley, J. L. (1990), Experiments on travelling salesman heuristics, in ‘Proceedings of the first annual ACM-SIAM symposium on discrete algorithms’, pp. 91–99.
Berger, J. & Barkaoui, M. (2003), ‘A new hybrid genetic algorithm for the capacitated vehicle routing problem’, Journal of the Operational Research Society 54, 1254–1262.
Bierwirth, C., Mattfeld, D. C. & Kopfer, H. (1996), On permutation representations for scheduling problems, in H.-M. Voigt, W. Ebeling, I. Rechenberg & H.-P. Schwefel, eds, ‘Parallel Problem Solving from Nature IV’, Vol. 1141 of LNCS, Springer, pp. 310–318.
Bierwirth, C., Mattfeld, D. C. & Watson, J.-P. (2004), Landscape regularity and random walks for the job shop scheduling problem, in J. Gottlieb & G. R. Raidl, eds, ‘Evolutionary Computation in Combinatorial Optimization’, Vol. 3004 of LNCS, Springer, pp. 21–30.
Boese, K. D. (1995), Cost versus distance in the traveling salesman problem, Technical Report TR-950018, UCLA CS Department.
Boese, K. D., Kahng, A. B. & Muddu, S. (1994), ‘A new adaptive multi-start technique for combinatorial global optimization’, Operations Research Letters 16(2), 101–113.
Bonissone, P. P., Subbu, R., Eklund, N. & Kiehl, T. R. (2006), ‘Evolutionary algorithms + domain knowledge = real-world evolutionary computation’, IEEE Transactions on Evolutionary Computation 10(3), 256–280.
Boryczka, U., Skinderowicz, R. & Świstowski, D. (2006), Comparative study: ACO and EC for TSP, in J. Arabas, ed., ‘Evolutionary Computation and Global Optimization 2006’, Oficyna Wydawnicza Politechniki Warszawskiej, Murzasichle, Poland.
Bronshtein, I. N., Semendyayev, K. A., Musiol, G. & Muehlig, H. (2004), Handbook of Mathematics, Springer-Verlag.
Burke, E. K., Kendall, G. & Soubeiga, E. (2003), ‘A tabu search hyperheuristic for timetabling and rostering’, Journal of Heuristics 9, 451–470.
Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P. & Schulenburg, S. (2003), Hyper-heuristics: an Emerging Direction in Modern Search Technology, in Glover & Kochenberger (2003).
Cheng, J., Lu, Y., Puskorius, G., Bergeon, S. & Xiao, J. (1999), Vehicle sequencing based on evolutionary computation, in ‘Proceedings of the 1999 Congress on Evolutionary Computation’, Vol. 2, Washington, USA, pp. 1207–1214.
Clarke, G. & Wright, J. (1964), ‘Scheduling of vehicles from a central depot to a number of delivery points’, Operations Research 12, 568–582.
Cochran, W. G. (1953), Sampling Techniques, John Wiley and Sons, New York.
Coffman, Jr, E. G., ed. (1976), Computer and Job-Shop Scheduling Theory, John Wiley and Sons.
Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990), Introduction to Algorithms, Massachusetts Institute of Technology.
Cotta, C. & Fernández, A. J. (2005), Analyzing fitness landscapes for the optimal Golomb ruler problem, in Raidl & Gottlieb (2005), pp. 68–79.
Cotta, C. & van Hemert, J., eds (2007), Evolutionary Computation in Combinatorial Optimization, Vol. 4446 of LNCS, Springer Verlag.
Culberson, J. C. (1998), ‘On the futility of blind search: an algorithmic view of ‘no free lunch’’, Evolutionary Computation 6(2), 109–127.
Cung, V.-D. (2005a), ‘Personal communication’.
Cung, V.-D. (2005b), ‘ROADEF Challenge 2005 webpage’, http://www.prism.uvsq.fr/vdc/ROADEF/CHALLENGES/2005/challenge2005 en.html, accessed September 2008.
Dorigo, M. & Stutzle, T. (2003), The Ant Colony Optimization Metaheuristic: Algorithms, Applications, and Advances, in Glover & Kochenberger (2003), chapter 9.
Duda, R. O., Hart, P. E. & Stork, D. G. (2001), Pattern classification, John Wiley and Sons.
Estellon, B., Gardi, F. & Nouioua, K. (2006), ‘Large neighbourhood improvements for solving car sequencing problems’, RAIRO Operations Research 40, 355–379.
Falkenauer, E. (1998), Genetic algorithms and grouping problems, John Wiley and Sons.
Ferguson, G. A. & Takane, Y. (1989), Statistical Analysis in Psychology and Education, McGraw-Hill.
Finger, M., Stutzle, T. & Lourenco, H. (2002), Exploiting fitness distance correlation of set covering problems, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2279 of LNCS, Springer, pp. 61–71.
Freisleben, B. & Merz, P. (1996), New genetic local search operators for the travelling salesman problem, in H.-M. Voigt, W. Ebeling, I. Rechenberg & H.-P. Schwefel, eds, ‘Parallel Problem Solving from Nature IV’, Vol. 1141 of LNCS, Springer, pp. 890–899.
Galinier, P. & Hao, J.-K. (1999), ‘Hybrid evolutionary algorithms for graph colouring’, Journal of Combinatorial Optimization 3, 379–397.
Gendreau, M. (2003), An Introduction to Tabu Search, in Glover & Kochenberger (2003), chapter 2.
Gendreau, M., Laporte, G. & Potvin, J.-Y. (2002), Metaheuristics for the capacitated VRP, in Toth & Vigo (2002b), chapter 6.
Gent, I. P. (1998), Two results on car-sequencing problems, Technical report, APES-02-1998.
Gent, I. P. & Walsh, T. (1999), CSPLib: a benchmark library for constraints, Technical report, APES-09-1999. Available from http://www.csplib.org/. A shorter version appears in the Proceedings of the 5th International Conference on Principles and Practices of Constraint Programming (CP-99).
Gillet, B. E. & Miller, L. R. (1974), ‘A heuristic algorithm for the vehicle dispatch problem’, Operations Research 22, 340–349.
Glover, F. & Kochenberger, G. A., eds (2003), Handbook of metaheuristics, Kluwer Academic Publishers.
Glover, F., Laguna, M. & Martí, R. (2003), Scatter Search and Path Relinking: Advances and Applications, in Glover & Kochenberger (2003), chapter 1.
Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company.
Gottlieb, J., Puchta, M. & Solnon, C. (2003), A study of greedy, local search and ant colony optimization approaches for car sequencing problems, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2611 of LNCS, Springer, pp. 246–257.
Gravel, M., Gagne, C. & Price, W. L. (2005), ‘Review and comparison of three methods for the solution of the car sequencing problem’, Journal of the Operational Research Society 56, 1287–1295.
Gusfield, D. (1997), Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press.
Hammond, M. (2003), ‘Chris Stephens on why EC needs a unifying theory’, EvoNet website http://evonet.lri.fr/evoweb/news events/news features/article.php?id=216, accessed January 2008.
Hansen, P. & Mladenović, N. (2003), Variable Neighbourhood Search, in Glover & Kochenberger (2003), chapter 1.
He, J., Reeves, C. R., Witt, C. & Yao, X. (2007), ‘A note on problem difficulty measures in black-box optimization: classification, realizations and predictability’, Evolutionary Computation 15(4), 435–443.
Henderson, D., Jacobson, S. H. & Johnson, A. W. (2003), The Theory and Practice of Simulated Annealing, in Glover & Kochenberger (2003), chapter 10.
Hertz, A., Taillard, E. & de Werra, D. (2003), Tabu search, in Aarts & Lenstra (2003b), chapter 5.
Ho, S. C. & Gendreau, M. (2006), ‘Path relinking for the vehicle routing problem’, Journal of Heuristics 12, 55–72.
Hoos, H. & Stutzle, T. (2004), Stochastic Local Search: Foundations & Applications, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Ishibuchi, H., Yoshida, T. & Murata, T. (2003), ‘Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling’, IEEE Transactions on Evolutionary Computation 7(2), 204–223.
Jaszkiewicz, A. (1999), ‘Improving performance of genetic local search by changing local search space topology’, Foundations of Computing and Decision Sciences 24(2), 77–84.
Jaszkiewicz, A. (2004), Adaptation of the genetic local search algorithm to the management of earth observation satellites, in ‘Seventh National Conference on Evolutionary Computation and Global Optimization’, Kazimierz Dolny, Poland, pp. 67–74.
Jaszkiewicz, A. & Kominek, P. (2003), ‘Genetic local search with distance preserving recombination operator for a vehicle routing problem’, European Journal of Operational Research 151, 352–364.
Jaszkiewicz, A., Kominek, P. & Kubiak, M. (2004), Adaptation of the genetic local search algorithm to a car sequencing problem, in ‘Seventh National Conference on Evolutionary Computation and Global Optimization’, Kazimierz Dolny, Poland, pp. 75–82.
Jones, T. & Forrest, S. (1995), Fitness distance correlation as a measure of problem difficulty for genetic algorithms, in L. J. Eshelman, ed., ‘Proceedings of the 6th International Conference on Genetic Algorithms’, Morgan Kaufmann, pp. 184–192.
Jozefowiez, N., Semet, F. & Talbi, E.-G. (2007), ‘Target aiming pareto search and its application to the vehicle routing problem with route balancing’, Journal of Heuristics 13(5), 455–469.
Karoński, M. & Palka, Z. (1977), ‘On Marczewski-Steinhaus type distance between hypergraphs’, Applicationes Mathematicae 16(1), 47–57.
Kindervater, G. A. P. & Savelsbergh, M. W. P. (2003), Vehicle routing: handling edge exchanges, in Aarts & Lenstra (2003b), chapter 10, pp. 337–360.
Kirkpatrick, S. & Toulouse, G. (1985), ‘Configuration space analysis of traveling salesman problems’, Journal de Physique 46, 1277–1292.
Kis, T. (2004), ‘On the complexity of the car sequencing problem’, Operations Research Letters 32, 331–335.
Kominek, P. (2001), Zastosowanie algorytmów metaheurystycznych sterowanych wiedzą do rozwiązywania złożonych problemów optymalizacji kombinatorycznej, PhD thesis, Poznan University of Technology, Poland. (In Polish).
Krasnogor, N. & Smith, J. (2005), ‘A tutorial for competent memetic algorithms: model, taxonomy, and design issues’, IEEE Transactions on Evolutionary Computation 9(5), 474–488.
Krysicki, W., Bartos, J., Dyczka, W., Królikowska, K. & Wasilewski, M. (1998), Rachunek prawdopodobieństwa i statystyka matematyczna w zadaniach. Część II. Statystyka matematyczna, PWN, Warszawa. (In Polish).
Kubiak, M. (2002), Genetic local search algorithm for the vehicle routing problem, Master's thesis, Poznan University of Technology, Poznan, Poland. (In Polish and English).
Kubiak, M. (2004), ‘Systematic construction of recombination operators for the vehicle routing problem’, Foundations of Computing and Decision Sciences 29(3), 205–226.
Kubiak, M. (2005), Distance metrics and fitness-distance analysis for the capacitated vehicle routing problem, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 603–610.
Kubiak, M. (2006), Analysis of distance between vehicle routing problem solutions generated by memetic algorithms, in J. Arabas, ed., ‘Evolutionary Computation and Global Optimization’, number 156 in ‘Prace Naukowe, Elektronika’, Oficyna Wydawnicza Politechniki Warszawskiej, pp. 223–236.
Kubiak, M. (2007), Distance measures and fitness-distance analysis for the capacitated vehicle routing problem, Operations Research/Computer Science Interfaces, Springer, chapter 18, pp. 345–364.
Kubiak, M., Jaszkiewicz, A. & Kominek, P. (2006), ‘Fitness-distance analysis of a car sequencing problem’, Foundations of Computing and Decision Sciences 31(3–4), 263–276.
Kubiak, M. & Wesołek, P. (2007), Accelerating local search in a memetic algorithm for the capacitated vehicle routing problem, in Cotta & van Hemert (2007), pp. 96–107.
Kytojoki, J., Nuortio, T., Braysy, O. & Gendreau, M. (2007), ‘An efficient variable neighbourhood search heuristic for very large scale vehicle routing problems’, Computers and Operations Research 34, 2743–2757.
Laporte, G. & Semet, F. (2002), Classical heuristics for the Capacitated VRP, in Toth & Vigo (2002b), chapter 5.
Larose, D. T. (2005), Discovering Knowledge in Data. An Introduction to DATA MINING, John Wiley and Sons.
Lewis, R. & Paechter, B. (2004), New crossover operators for timetabling with evolutionary algorithms, in A. Lofti, ed., ‘Proceedings of the 5th International Conference on Recent Advances in Soft Computing (RASC 2004)’, pp. 189–195.
Lewis, R. & Paechter, B. (2005a), Application of the grouping genetic algorithm to university course timetabling, in Raidl & Gottlieb (2005), pp. 144–153.
Lewis, R. & Paechter, B. (2005b), An empirical analysis of the grouping genetic algorithm: the timetabling case, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Manly, B. F. J. (1997), Randomization, bootstrap and Monte Carlo methods in biology, Chapman and Hall, London.
Mantel, N. (1967), ‘The detection of disease clustering and a generalized regression approach’, Cancer Research 27(1), 209–220.
Marczewski, E. & Steinhaus, H. (1958), ‘On a certain distance of sets and the corresponding distance of functions’, Colloquium Mathematicum 6, 319–327.
Mattfeld, D. C., Bierwirth, C. & Kopfer, H. (1999), ‘A search space analysis of the job shop scheduling problem’, Annals of Operations Research 86, 441–453.
Mattiussi, C., Waibel, M. & Floreano, D. (2004), ‘Measures of diversity for populations and distances between individuals with highly reorganizable genomes’, Evolutionary Computation 12(4), 495–515.
Merz, P. (2000), Memetic Algorithms for Combinatorial Optimization Problems: Fitness Landscapes and Effective Search Strategies, PhD thesis, University of Siegen, Germany.
Merz, P. (2001), On the performance of memetic algorithms in combinatorial optimization, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Merz, P. (2002), A comparison of memetic recombination operators for the traveling salesman problem, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Merz, P. (2004), ‘Advanced fitness landscape analysis and the performance of memetic algorithms’, Evolutionary Computation 12(3), 303–325.
Merz, P. & Freisleben, B. (1999), Fitness landscapes and memetic algorithm design, in D. Corne, M. Dorigo & F. Glover, eds, ‘New Ideas in Optimization’, McGraw-Hill, chapter 3, pp. 245–260.
Merz, P. & Freisleben, B. (2000a), ‘Fitness landscape analysis and memetic algorithms for the quadratic assignment problem’, IEEE Transactions on Evolutionary Computation 4(4), 159–164.
Merz, P. & Freisleben, B. (2000b), ‘Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning’, Evolutionary Computation 8(1), 61–91.
Michalewicz, Z. (1996), Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag.
Michalewicz, Z. & Fogel, D. B. (2000), How to Solve It: Modern Heuristics, Springer-Verlag.
Moscato, P. & Cotta, C. (2003), A Gentle Introduction to Memetic Algorithms, in Glover & Kochenberger (2003), chapter 5.
Muhlenbein, H. (1991), Evolution in time and space - the parallel genetic algorithm, in G. J. E. Rawlins, ed., ‘Foundations of Genetic Algorithms’, Morgan Kaufmann.
Muhlenbein, H. (2003), Genetic algorithms, in Aarts & Lenstra (2003b), chapter 6.
Nagata, Y. (2007), Edge assembly crossover for the capacitated vehicle routing problem, in Cotta & van Hemert (2007), pp. 142–153.
Nagata, Y. & Kobayashi, S. (1997), Edge assembly crossover: a high-power genetic algorithm for the traveling salesman problem, in ‘Proceedings of the 7th International Conference on Genetic Algorithms’, pp. 450–457.
Nguyen, A. (2003), ‘Challenge ROADEF 2005 car sequencing problem’, http://www.prism.uvsq.fr/vdc/ROADEF/CHALLENGES/2005/challenge2005 en.html.
Pawlak, G. (2007), ‘Personal communication’. A note during a seminar of the Institute of Computing Science, Poznan University of Technology, Poland.
Pawlak, M. (1999), Algorytmy ewolucyjne jako narzędzie harmonogramowania produkcji, Wydawnictwo Naukowe PWN, Warszawa. (In Polish).
Potvin, J.-Y. & Bengio, S. (1996), ‘The vehicle routing problem with time windows part II: genetic search’, INFORMS Journal of Computing 8(2), 165–172.
Prins, C. (2001), A simple and effective evolutionary algorithm for the vehicle routing problem, in J. P. de Sousa, ed., ‘MIC’2001 – 4th Metaheuristics International Conference’, pp. 143–147.
Prins, C. (2004), ‘A simple and effective evolutionary algorithm for the vehicle routing problem’, Computers and Operations Research 31, 1985–2002.
Puchta, M. & Gottlieb, J. (2002), Solving car sequencing problems by local optimization, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2279 of LNCS, Springer.
Qu, R. & Burke, E. K. (2005), Hybrid variable neighbourhood hyperheuristics for exam timetabling problems, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 781–786.
Raidl, G. R. & Gottlieb, J., eds (2005), Evolutionary Computation in Combinatorial Optimization, Vol. 3448 of LNCS, Springer-Verlag.
Rao, C. R. (1989), Ramanujan Memorial Lectures. Statistics and Truth. Putting Chance to Work, Council of Scientific and Industrial Research, New Delhi, India.
Reeves, C. R. (1999), ‘Landscapes, operators and heuristic search’, Annals of Operations Research 86, 473–490.
Reeves, C. R. (2003), Genetic Algorithms, in Glover & Kochenberger (2003), chapter 3.
Reeves, C. R. & Rowe, J. E. (2003), Genetic algorithms: principles and perspectives: a guide to GA theory, Kluwer Academic Publishers.
Reeves, C. R. & Yamada, T. (1998), ‘Genetic algorithms, path relinking, and the flowshop sequencing problem’, Evolutionary Computation 6(1), 45–60.
Reimann, M., Stummer, M. & Doerner, K. (2002), A savings based ant system for the vehicle routing problem, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 1317–1326.
Remde, S., Cowling, P., Dahal, K. & Colledge, N. (2007), Exact/heuristic hybrid using rVNS and hyperheuristics for workforce scheduling, in Cotta & van Hemert (2007), pp. 188–197.
Resende, M. G. C. & Ribeiro, C. C. (2003), Greedy Randomised Adaptive Search Procedures, in Glover & Kochenberger (2003), chapter 8.
Ribeiro, C. C., Aloise, D., Noronha, T. F., Rocha, C. & Urrutia, S. (2005), A heuristic for a real-life car sequencing problem with multiple requirements, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 799–804.
Robardet, C. & Feschet, F. (2000), A new methodology to compare clustering algorithms, in K.-S. Leung, L.-W. Chan & H. Meng, eds, ‘Intelligent Data Engineering and Automated Learning - IDEAL 2000. Data Mining, Financial Engineering, and Intelligent Agents’, Vol. 1983 of LNCS, Springer.
Rochat, Y. & Taillard, E. D. (1995), ‘Probabilistic diversification and intensification in local search for vehicle routing’, Journal of Heuristics 1, 147–167.
Schiavinotto, T. & Stutzle, T. (2007), ‘A review of metrics on permutations for search landscape analysis’, Computers and Operations Research 34, 3143–3153.
Schneider, J., Britze, J., Ebersbach, A., Morgenstern, I. & Puchta, M. (2000), ‘Optimization of production planning problems - a case study for assembly lines’, International Journal of Modern Physics C 11(5), 949–972.
Schumacher, C., Vose, M. D. & Whitley, L. D. (2001), The no free lunch and problem description length, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Morgan Kaufmann, pp. 565–570.
Słowiński, R. (1984), ‘Preemptive scheduling of independent jobs on parallel machines subject to financial constraints’, European Journal of Operational Research 15(3), 366–373.
Sorensen, K. (2003), Distance measures based on the edit distance for permutation-type representations, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Chicago.
Sorensen, K. (2007), ‘Distance measures based on the edit distance for permutation-type representations’, Journal of Heuristics 13(1), 35–47.
Sorensen, K., Reimann, M. & Prins, C. (2005), Path relinking for the vehicle routing problem using the edit distance, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 839–846.
Soubeiga, E. (2003), Development and application of hyperheuristics to personnel scheduling, PhD thesis, University of Nottingham, United Kingdom.
Taillard, E. (1993), ‘Parallel iterative search methods for vehicle routing problems’, Networks 23, 661–673.
Taillard, E. (2008), ‘Instances of the capacitated vehicle routing problem’, http://mistic.heig-vd.ch/taillard/problemes.dir/vrp.dir/vrp.html, last access April 2008.
Tavares, J., Pereira, F. B., Machado, P. & Costa, E. (2003), Crossover and diversity: a study about GVR, in ‘Proceedings of the Analysis and Design of Representations and Operators (ADORO) workshop, Genetic and Evolutionary Computation Conference (GECCO)’.
Terada, J., Vo, H. & Joslin, D. (2006), Combining genetic algorithms with squeaky-wheel optimization, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Seattle, USA.
Tezuka, M., Hiji, M., Miyabayashi, K. & Okumura, K. (2000), A new genetic representation and common cluster crossover for job shop scheduling problem, in S. Cagnoni et al., eds, ‘Real-World Applications of Evolutionary Computing’, Vol. 1803 of LNCS, Springer, pp. 297–306.
Toth, P. & Vigo, D. (2002a), An overview of vehicle routing problems, in The Vehicle Routing Problem (Toth & Vigo 2002b), chapter 1.
Toth, P. & Vigo, D., eds (2002b), The Vehicle Routing Problem, SIAM, Philadelphia.
Tuson, A. (2005), ‘Are evolutionary metaphors applicable to evolutionary optimisation?’, presented at the 14th Young Operational Research Conference (YOR14), Bath, UK.
Warwick, T. & Tsang, E. (1995), ‘Tackling car sequencing problems using a generic genetic algorithm’, Evolutionary Computation 3(3), 267–298.
Watson, J.-P. (2005), On metaheuristics ‘failure modes’: a case study in tabu search for job-shop scheduling, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 910–915.
Watson, J.-P., Barbulescu, L., Whitley, L. D. & Howe, A. E. (2002), ‘Contrasting structured and random permutation flow-shop scheduling problems: search-space topology and algorithm performance’, INFORMS Journal on Computing 14, 98–123.
Watson, J.-P., Beck, J. C., Howe, A. E. & Whitley, L. D. (2003), ‘Problem difficulty for tabu search in job-shop scheduling’, Artificial Intelligence 143, 189–217.
Weiss, D. (2006), Descriptive clustering as a method for exploring text collections, PhD thesis, Poznan University of Technology, Poznań, Poland.
Whitley, D., Starkweather, T. & Fuquay, D. (1989), Scheduling problems and traveling salesman: the genetic edge recombination operator, in ‘Proceedings of the Third International Conference on Genetic Algorithms’, Morgan Kaufmann, pp. 133–140.
Whitley, D. & Watson, J. P. (2006), Complexity theory and the No Free Lunch Theorem, Springer-Verlag, chapter 11, pp. 317–339.
Wolpert, D. H. (2005), ‘Personal communication during the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK’.
Wolpert, D. H. & Macready, W. G. (1997), ‘No free lunch theorems for optimization’, IEEE Transactions on Evolutionary Computation 1(1), 67–82.
Woodruff, D. L. & Lokketangen, A. (2005), Similarity and distance functions to support VRP metaheuristics, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 929–933.
Zhang, C., Li, P., Rao, Y. & Li, S. (2005), A new hybrid GA/SA algorithm for the job shop scheduling problem, in Raidl & Gottlieb (2005), pp. 246–259.
Zinflou, A. (2008), ‘Personal communication’.
Zinflou, A., Gagné, C. & Gravel, M. (2007), Crossover operators for the car sequencing problem, in Cotta & van Hemert (2007), pp. 229–239.

© 2009 Marek Kubiak
Poznan University of Technology
Faculty of Computer Science and Management
Institute of Computing Science
Typeset using LaTeX in Computer Modern.
BibTeX:

@phdthesis{key,
  author  = "Marek Kubiak",
  title   = "{Fitness-distance Analysis for Adaptation of a Memetic Algorithm to Two Problems of Combinatorial Optimisation}",
  school  = "Poznan University of Technology",
  address = "Pozna{\'n}, Poland",
  year    = "2009",
}