Poznan University of Technology
Faculty of Computer Science and Management
Institute of Computing Science

Doctoral dissertation

FITNESS-DISTANCE ANALYSIS FOR ADAPTATION OF A MEMETIC ALGORITHM TO TWO PROBLEMS OF COMBINATORIAL OPTIMISATION

Marek Kubiak

Supervisor: Andrzej Jaszkiewicz, PhD Dr Habil.

Poznań, 2009

Acknowledgement

I would like to thank Dr Andrzej Jaszkiewicz for introducing me to the area of metaheuristics and to the property of fitness-distance correlation, and also for his support during my years as a PhD student and for his example of a successful scientist and engineer, which is a source of constant inspiration for me. I deeply thank Professor Roman Słowiński for encouragement, sympathy and help.

I appreciate the help, support and presence of: Przemysław Wesołek, Dawid Weiss, Maciej Komosiński, Wojciech Kotłowski, Izabela Szczęch, Krzysztof Dembczyński, Jerzy Błaszczyński, Marcin Szeląg and the whole staff of the Laboratory of Intelligent Decision Support Systems.

Without the help and example of my teachers of mathematics, physics and literature at all levels of education, my knowledge and experience would have been much closer to ignorance. Therefore, I thank Ewa Krompiewska, Halina Szalaty, Alicja Borowska and Dr Grzegorz Kubski for their excellent work as teachers and guides.

Finally, I would like to thank my family for the enormous support and encouragement they offered me during these last 6 years of scientific work.

Extended abstract

Chapter 1. Introduction

Context of the research subject

Optimisation problems are frequently encountered in modern economy, administration and science. This is because decision makers are usually interested in assigning the available resources to tasks efficiently, or in solving entirely new problems, never considered before. Such problems may be problems of continuous or of combinatorial (discrete) optimisation. This dissertation deals with two combinatorial problems. They require that, in a solution space with a finite number of elements, a solution be found which is optimal in the sense of some given objective function. However, the fact that such a space is finite does not mean that it is small or that the optimum is easy to find in it. As practice shows, many combinatorial problems are hard to solve in reasonable time, which the theory of computational complexity captures by the notion of NP-hardness. As a result, such problems are often solved approximately, by means of heuristic and metaheuristic algorithms. The goal of these algorithms is to generate, in reasonable time, solutions of good, acceptable quality, though not necessarily optimal ones. This dissertation addresses precisely the application of one kind of metaheuristic, the memetic algorithm, to two specific problems.

Metaheuristic algorithms, however, are not algorithms sensu stricto. They are meta-algorithms: schemes of algorithms which first have to be adapted to the considered problem in order to work efficiently. This adaptation requires choosing or designing those components of the meta-algorithm which are not explicitly specified in it. And, as the literature of the field shows, this adaptation may influence the efficiency of the resulting algorithm very strongly. For this reason the adaptation of the components of a metaheuristic to a specific problem should be performed with care and be well justified. Unfortunately, clear guidelines on how to design these unspecified components of metaheuristics are currently missing.
At present, the adaptation of a metaheuristic to a problem is more an art than engineering knowledge. Only some preliminary guidelines can be found in the literature. These guidelines suggest analysing the set of solutions of the considered problem before an algorithm is constructed. Such an analysis should provide knowledge about properties of the problem which can later be exploited in the designed components. One such property is the correlation of fitness and distance of solutions (fitness-distance correlation, also called global convexity). It consists in the fact that the better the solutions of a given problem are, the closer they lie to one another in the sense of some problem-specific distance measure. Additionally, it assumes that the best solutions (including the optimum) are located somewhere 'in the middle' of this trend. It is currently believed that one way of exploiting this property of the solution space is, for instance, the construction of distance-preserving crossover operators for a memetic algorithm.

The idea of constructing such operators based on the property of global convexity is relatively new; it has been applied in only a few cases so far. Even the analysis of fitness-distance correlation itself has not yet been performed for many problems. The tools necessary for it, distance measures for solutions of combinatorial problems, are currently lacking. Therefore, this dissertation takes up the adaptation of a memetic algorithm, i.e. the construction of distance-preserving operators, on the basis of fitness-distance analysis. Such an analysis and adaptation had not been performed before for the vehicle routing problem and the car sequencing problem considered here.

Main assumptions and the hypothesis of the dissertation

The first assumption of this work is that the two considered optimisation problems possess the property of global convexity. The second assumption states that the presence of global convexity facilitates the adaptation of a memetic algorithm to a specific problem. The main hypothesis of this dissertation claims that the adaptation of a memetic algorithm to a given problem should consist in the construction of distance-preserving crossover operators, provided that the problem exhibits global convexity. In such a case the resulting algorithm will generate solutions at least not worse, or even better, than the same metaheuristic with operators of a different kind.

Goal of the dissertation

The goal of this work is to carry out and evaluate a scheme of adaptation of a memetic algorithm based on the analysis of global convexity. The scheme is carried out and evaluated on the two problems from the title: the capacitated vehicle routing problem and the car sequencing problem.

Published work

Some elements of this dissertation have already been described by the author in the following publications: Kubiak (2004), Jaszkiewicz et al. (2004), Kubiak (2005), Kubiak et al. (2006), Kubiak (2006), Kubiak (2007), Kubiak and Wesołek (2007).

Chapter 2. Metaheuristics in combinatorial optimisation

The short review of metaheuristics and their applications shows in more detail that these are not algorithms which can be directly applied to solve an arbitrary problem. They are rather general schemes of algorithms which have to be adapted to the specific problem at hand, i.e. components of the chosen metaheuristic have to be designed or selected. Table 1 presents examples of such components for the reviewed metaheuristics. In most cases (including the case of the memetic algorithm) clear design guidelines, which could help in the practice of applying metaheuristics, are missing.
What can often be found instead are general statements that problem-specific knowledge should be introduced into such an algorithm.

Table 1: Components of metaheuristics which require adaptation to a problem.

Local search: generation of initial solutions; neighbourhood operator(s); improvement rule.
Ant colony optimisation: definition of a solution component; randomised construction heuristic; pheromone trail update rule.
Hyperheuristic: definition of a solution component; set of low-level heuristics; high-level hyperheuristic.
Evolutionary algorithm: generation of initial solutions; representation of solutions; crossover operator(s); mutation operator(s).
Memetic algorithm: all components of local search; all components of an evolutionary algorithm.

Chapter 3. The No Free Lunch theorems and their consequences for optimisation

There are no general optimisation methods for arbitrary problems

The No Free Lunch (NFL) theorems (Wolpert & Macready 1997, Schumacher et al. 2001) imply that there are no general optimisation methods which would solve equally well all problems from a very broad class. This is chiefly because when an algorithm does not exploit any knowledge of the problem, which must be the case when a sufficiently broad class of problems is considered, the algorithm is in a sense 'blind' and operates on a problem enclosed in a 'black box'. In such a situation, according to these theorems, the efficiency of the algorithm equals the efficiency of random search, a very poor optimisation method. In practice, then, the essence of applying general optimisation methods, such as metaheuristics, is the escape from the black-box model. This is not possible through purely syntactic manipulation of the problem's solutions, without exploiting the problem's semantics in the algorithm. The escape is only possible when the metaheuristic is properly adapted to the problem and equipped with knowledge about it. Hence one should conclude that there are no metaheuristics which would always be better than others in solving arbitrary optimisation problems. There are only better or worse adaptations of metaheuristics to specific problems.

Exploiting knowledge about the structure of the problem's space

The NFL theorems also indicate indirectly that there must be some kind of regularity in the solution space of the considered optimisation problem if some algorithm is to perform better on it than random search. Moreover, the mere presence of this regularity is not enough: it has to be known and directly exploited in the employed algorithm. In the case of metaheuristics this means that the general scheme of the algorithm, usually left unmodified in practice, must be filled with components equipped with knowledge about the problem they are applied to. For example (see Table 1), in local search appropriate neighbourhood operators have to be designed, and in an evolutionary algorithm the crossover and mutation operators.

What is the regularity of a space?

It is hard, however, to answer unambiguously the question of what constitutes the regularity of a problem's solution space and how to exploit it in the adaptation of an algorithm. According to some guidelines, such a regularity may be anything that accelerates the search of the space, e.g. strong locality of neighbourhood operators, which allows the objective function to be computed faster for neighbours than for arbitrary solutions.
According to others, it may be the existence of a fast procedure computing a lower bound on the objective function over some subspace of the problem's solutions. Thanks to such a procedure, the size of the searched space can often be considerably reduced.

Consequences for evolutionary algorithms

In the light of the NFL theorems, 'belief in using the evolutionary algorithm as a "blind" optimisation tool is misplaced'¹ and 'there is no reason to believe that a genetic algorithm will be generally more useful than any other approach to optimisation' (Culberson 1998). And yet many enthusiasts of neo-Darwinism would say that the results of natural evolution are the best proof that evolutionary processes based on the survival of the fittest give good results in practice: it is precisely such processes that have led to the emergence of complex organisms, perfectly adapted to their environments. But, as Culberson (1998) aptly puts it, 'the mere fact of natural evolution does not indicate where the areas of application of an evolutionary algorithm might be, and certainly gives no grounds to claim that this algorithm is a universal optimisation tool'. Mühlenbein (2003) holds a similar view: 'I am against popular arguments of the kind: this algorithm is a good optimisation method because its foundations are used in nature'. Reeves and Rowe (2003) also examined this issue closely and observe in the first chapter of their book that:
• neo-Darwinism is an attractive theory, and it often suffices to invoke the theory of evolution and the name of Darwin to justify the generality of evolutionary algorithms as optimisation methods;
• however, in many cases the mechanisms of evolution in nature are not yet well known and explained; there is instead much speculation about them, without solid evidence;
• it is likely that natural evolution does not optimise any objective function, or at least such an objective has not yet been identified; a justification for applying evolutionary algorithms to optimisation that starts from natural evolution is therefore missing.

Consequently, it has to be stated that the evolutionary algorithm (and with it the memetic one) is nothing more than an abstract mathematical construct, a certain optimisation scheme, which perhaps has little in common with evolution in nature. It is the adaptation of this algorithm to a specific problem that is the basis of success or failure in optimisation, as the No Free Lunch theorems indicate. For these reasons the adaptation of a certain kind of evolutionary algorithm is the main subject of this dissertation.

¹ All translations of quotations from English in this abstract are by the author of the dissertation.

Chapter 4. Methods of adaptation of an evolutionary algorithm to a combinatorial optimisation problem

The review of adaptation methods performed in this chapter leads to one basic conclusion: an evolutionary (and memetic) algorithm contains several components which have to be adapted to the considered problem before the algorithm is used, yet there are many possible choices for this adaptation, and practical guidelines as to which of them to apply and under what conditions are usually hard to find in the literature. Certain exceptions are the recently emerging guidelines based on the analysis of landscape ruggedness or of the global convexity of the fitness landscape. Indeed, serious works on evolutionary algorithms and metaheuristics complain about this state of affairs.
Michalewicz and Fogel (2000) admit that the theoretical foundations for designing hybrid evolutionary algorithms (e.g. memetic ones) are scarce. Hoos and Stutzle, in the epilogue of their book (Hoos & Stutzle 2004), summarise the current state in this way: many works on the design and application of metaheuristic algorithms resemble art rather than science (...); experience, more than understanding, is often the key to achieving the intended goals. Krasnogor and Smith (2005) take a similar view, admitting that 'the process of designing effective memetic algorithms is currently performed ad hoc and is often hidden behind problem-specific details'.

This large amount of intuition and experience needed to design good memetic algorithms (and, more generally, metaheuristics) is frequently the basis for criticism of these algorithms and for the opinion that their design is not founded on a systematic, scientific approach. For this reason, renowned authors in the field consider the explanation and prediction of the efficiency of evolutionary algorithms to be among the most important issues in the theory of computation (Hoos & Stutzle 2004, Moscato & Cotta 2003, Reeves & Rowe 2003). Moreover, in their opinion, research on this issue will most likely result in a better understanding of the relationships between the properties of combinatorial problems and of metaheuristic algorithms. This may, in turn, lead to stronger foundations for applying these algorithms in practice.

From this perspective, the most interesting ways of adapting a memetic algorithm presented in this chapter are those based on analyses of the searched solution space. These analyses examine the ruggedness or the global convexity of the fitness landscape. The author of this dissertation took up the issue of global convexity because, in the past, it was precisely memetic algorithm designs based on this property of a problem that led to good optimisation results (Galinier & Hao 1999, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000). Also because 'a systematic method for designing proper crossover and mutation operators would be very helpful' (Reeves & Rowe 2003, page 283), and the exploitation of global convexity may lead exactly to the design of such operators. By taking up this issue the author continues research conducted earlier by such scientists as: Kirkpatrick and Toulouse (1985), Mühlenbein (1991), Boese et al. (1995, 1994), Jones and Forrest (1995), Altenberg (1997), Merz (2000), Watson et al. (2003), Jaszkiewicz and Kominek (1999, 2003).

Chapter 5. Fitness-distance analysis of the solution space of a combinatorial problem as a basis for the adaptation of a memetic algorithm

Fitness-distance analysis of the fitness landscape

The fitness landscape L for an instance I of a combinatorial optimisation problem π is a triple L = (S, f, N), where S = S_π(I) is the set of solutions of instance I, f is the objective function, and N is the neighbourhood function on the set S, i.e. N : S → 2^S. Instead of the neighbourhood function, a distance function on solutions, d : S × S → R, may be used, which gives the same result. Fitness-distance analysis (FDA) searches the fitness landscape for a relationship between the quality of solutions and their distance to the target of the search, the global optimum.
So far it has most often been performed as a statistical analysis of a sample of good solutions, summarised by the fitness-distance correlation coefficient. For a maximisation problem the desired result of the analysis is a negative correlation, i.e. a decreasing distance of solutions to the global optimum as the value of the objective function increases. In such a case the search of the space by selection-based algorithms (e.g. evolutionary ones) should be easy, since there exists a path to the optimum through solutions of increasing fitness (Merz 2000).

The basic version of the analysis of global convexity requires at least one global optimum of the analysed problem instance to be known. First, a large random sample of good solutions of this instance is generated. A single solution is generated independently of the others by drawing a random starting point in the landscape and applying some randomised local search algorithm to it. In this way the sample contains random local optima. For each element s of the sample, the value of the objective function f(s) and the distance to the nearest global optimum d_opt(s) are computed. Also, for each pair of solutions in the sample, s_1, s_2, their mutual distance d(s_1, s_2) is computed.

Based on these measurements, the distribution of the mutual distance between solutions is assessed first. The average distance between local optima (the elements of the sample) is computed as:

$$\bar{d} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(s_i, s_j)$$

This value is compared to the average distance between random solutions or to the analytically computed diameter of the landscape. In this way one can answer the question whether the local optima of the examined instance are clustered in some fragment of the landscape or rather scattered across all of it. If the local optima are clustered, it means that they usually share many common features, which may be exploited in the design of crossover operators.

The second element of FDA is the assessment of the strength of the fitness-distance relationship in the collected sample. To this end, the value of the linear correlation coefficient between fitness and distance (the fitness-distance correlation, FDC) is computed. For a sample of the form s = {s_1, ..., s_N}, the FDC is computed as:

$$r = \frac{\operatorname{cov}(f, d_{opt})}{s_f \cdot s_{d_{opt}}}$$

where cov denotes the sample estimate of the covariance of the two variables:

$$\operatorname{cov}(f, d_{opt}) = \frac{1}{N} \sum_{i=1}^{N} \left(f(s_i) - \bar{f}\right)\left(d_{opt}(s_i) - \bar{d}_{opt}\right)$$

$\bar{f}$ and $\bar{d}_{opt}$ are the sample means of fitness and distance, and s is the sample standard deviation of the respective variable, e.g.:

$$s_{d_{opt}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(d_{opt}(s_i) - \bar{d}_{opt}\right)^2}$$

High positive values of the FDC (for a minimisation problem) suggest that the local optima of the instance are distributed around the global optimum, which lies at the centre. Moreover, the worse the local optima are, the further from this centre they lie (Reeves & Yamada 1998). According to Jones and Forrest (1995), such a fitness landscape is easy for genetic algorithms. An FDC equal to 1 would indicate a linear relationship between fitness and distance, and hence an easy search (Merz 2000). Negative values of r, in turn, reveal a deceptive problem, where better and better solutions lie further and further from the target of the search. Values around zero indicate no relationship between fitness and distance in the landscape, i.e. no guidance from the objective function during the search. A zero correlation may also indicate a nonlinear relationship which is not well captured by the linear correlation coefficient.
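The statistics above are straightforward to compute. A minimal sketch in Python with NumPy follows; the language, the function names and the matrix-based input are illustrative assumptions of this summary, not part of the dissertation:

```python
import numpy as np

def mean_pairwise_distance(D):
    """Average mutual distance of sampled local optima.

    D is an n x n symmetric matrix with D[i, j] = d(s_i, s_j);
    computes d_bar = 2 / (n (n - 1)) * sum over pairs i < j of d(s_i, s_j).
    """
    n = D.shape[0]
    i_upper = np.triu_indices(n, k=1)   # indices of all pairs with i < j
    return 2.0 * D[i_upper].sum() / (n * (n - 1))

def fdc(fitness, dist_to_opt):
    """Fitness-distance correlation r = cov(f, d_opt) / (s_f * s_d_opt)."""
    f = np.asarray(fitness, dtype=float)
    d = np.asarray(dist_to_opt, dtype=float)
    cov = np.mean((f - f.mean()) * (d - d.mean()))  # 1/N covariance estimate
    return cov / (f.std() * d.std())                # std() also uses 1/N here
```

For a minimisation problem, values of r close to 1 computed this way correspond to the globally convex landscapes discussed above; the scatter plot described next guards against nonlinear relationships that this single number would miss.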
For this reason it is worth inspecting the fitness-distance relationship on a scatter plot. Each element of the sample then constitutes a single point on a plot with axes f(s) and d_opt(s). Examples of such plots, taken from actual analyses of global convexity, are shown in Figure 1. For a minimisation problem the desired outcome is a trend in the observed set of points: as the value of the objective function decreases (horizontal axis), the distances to the optimum (vertical axis) should decrease as well.

Figure 1: Examples of fitness-distance scatter plots for two instances of the CVRP, with added regression lines (vertical axes: the distances d_e and d_pn). For the plot on the left r = 0.54; on the right r = 0.52.

Exploitation of global convexity in distance-preserving crossover operators

According to Merz (2000) and Jaszkiewicz and Kominek (2003), a memetic algorithm should employ a distance-preserving (respectful) crossover operator in the case of a globally convex fitness landscape. Such an operation preserves unchanged in the offspring the common features of the parent solutions, which makes the distance of the offspring to its parents no larger than the distance between the parents themselves. The goal of such a design is precisely that the offspring should not be arbitrarily distant from its parents; this distance should be controlled and dependent on the distance between the parents.

Distance-preserving crossover operators are useful when good solutions of the considered problem are clustered in a small fragment of the landscape. Then the offspring inherit many features common to their parents and the computation is intensified at a small distance from them, in exactly this small fragment. If, in addition, global convexity has been observed in this landscape, then an offspring which preserves the common features of its parents has a greater chance of a better objective value than an offspring which does not preserve them (Merz 2000, Mattfeld et al. 1999). Examples of such effective operators can be found in the literature (Merz 2000, Merz 2002, Merz & Freisleben 2000b, Merz & Freisleben 2000a, Jaszkiewicz & Kominek 2003, Jaszkiewicz 2004).

These literature examples and the positive results of his own experiments led Jaszkiewicz (2004) to formulate a scheme of adaptation of a memetic algorithm to a combinatorial optimisation problem based on the analysis of global convexity. The scheme consists of the following steps:
1. Generate sets of good and diverse solutions for the considered instances of the problem.
2. Formulate hypotheses about the features of solutions which might be important for good solutions of this problem.
3. For each feature and each instance, examine the importance of the feature for solution quality by means of fitness-distance correlation (i.e. perform the analysis of global convexity). The distance is defined so as to reflect the considered features.
4. Design a crossover operator preserving distance, i.e. the common features of solutions, provided that fitness-distance correlation has been observed for the given feature. One operator may preserve many features of different kinds (a sketch of the general shape of such an operator is given below).

The main goal of this adaptation scheme is to reduce the design effort needed to obtain a good optimisation algorithm. The reduction is achieved by avoiding trial-and-error design of operators, which happened in the past e.g. for the travelling salesman problem (see the remarks in Jaszkiewicz and Kominek (2003)).
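The general shape of a distance-preserving operator produced by step 4 can be sketched as follows; the set-of-features encoding and the problem-specific completion procedure are assumptions made for illustration only, not an operator from the dissertation:

```python
import random

def respectful_crossover(parent_a, parent_b, complete):
    """Sketch of a distance-preserving (respectful) crossover.

    parent_a, parent_b: solutions encoded as sets of features
    (e.g. edges of routes).  complete: a problem-specific procedure
    that extends a partial solution to a feasible one without
    removing the features it already contains.
    """
    common = parent_a & parent_b   # features shared by both parents
    offspring = set(common)        # every common feature is preserved
    # The remaining material comes from the parents' non-common features,
    # considered in random order; 'complete' keeps only what stays feasible.
    rest = list((parent_a | parent_b) - common)
    random.shuffle(rest)
    return complete(offspring, rest)
```

If the completion procedure uses only parental features, the offspring's distance to each parent, measured on those features, does not exceed the parents' mutual distance, which is exactly the property the scheme asks for.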
The use of the analysis of global convexity allows the designer to discover those features of solutions which should be preserved during the search in order to obtain good quality. In this way the necessary knowledge about the structure of the problem is introduced into a component of the memetic algorithm.

Conclusions

Fitness-distance analysis examines a very interesting aspect of the solution space of an optimisation problem: the relationship between the quality of solutions and their distance to one another or to the global optimum. If such a relationship exists, expressed by a positive fitness-distance correlation for a minimisation problem, it means that better and better solutions of the problem lie closer to one another and, at the same time, closer to the optimum. This relationship justifies introducing distance-preserving components into metaheuristics in order to increase the efficiency of the search. Such an idea, that better algorithm components can be designed on the basis of some distance measure between solutions of a combinatorial problem, had not been proposed before.

It has to be admitted, however, that FDA is not yet a fully developed method of analysis:
• a proper mathematical model of global convexity is missing,
• the rules for interpreting the value of the correlation coefficient are to some extent arbitrary,
• the result of the analysis depends on the instance and may be ambiguous for the problem as a whole,
• there exist different versions of the analysis procedure, with different properties,
• knowledge of global optima is required to obtain a correct result,
• without knowledge of global optima the result of the analysis is only approximate,
• there are theoretical arguments against the sensibility of the FDC as a good indicator of the difficulty of a problem instance.

Despite these arguments against FDA, it should be said that the analyses performed so far have clearly demonstrated the existence of such a phenomenon as global convexity in the spaces of several optimisation problems. It is therefore hard to deny the very existence of the phenomenon, even if the method of examining it is not yet good enough. In the author's opinion it is very likely that the phenomenon of global convexity also exists in other, not yet examined problems.

Looking from the other side, that of adaptation, one may have doubts about the existence of a relationship between the global convexity of a problem instance and the difficulty of this instance for a memetic algorithm (Bierwirth et al. 2004, Hoos & Stutzle 2004). The existence of this relationship has so far been confirmed only qualitatively, e.g. by the works cited in the dissertation. A quantitative expression of this or a similar relationship is still hard to find in the literature. There is a pioneering work by Watson et al. (2003), but it concerns the job shop scheduling problem and a tabu search algorithm. It appears, then, that much remains to be done to explain the existence of such a relationship and to demonstrate that it is strong.

Many researchers believe, however, that such a relationship exists. One of them is the author of this dissertation. He believes that there is a strong relationship between global convexity and the efficiency of a memetic algorithm with distance-preserving operators. This conviction is based primarily on the results of the earlier works of Boese et al. (1994), Reeves and Yamada (1998), the series of works by Merz (2000, 2004), Jaszkiewicz (1999, 2004) and also Kominek (2003). These works convince the author that designing distance-preserving operators for a memetic algorithm gives, in the case of global convexity, good optimisation results.
Chapter 6. The capacitated vehicle routing problem

This chapter describes the capacitated vehicle routing problem (CVRP). The problem consists in determining routes for the vehicles of a transportation company. The company has to deliver some goods from its depot to a number of customers. The customers are geographically dispersed, and the relevant distances between the customers and the depot are given in advance. Each customer has placed some demand for the goods in advance. The company owns identical vehicles with a given transport capacity constraint. The route of each vehicle starts at the company's depot, visits some customers, where the required amount of goods is unloaded, and then returns to the depot. At no point may a vehicle be overloaded, i.e. its capacity must not be exceeded. Each customer is served by exactly one vehicle. The company's goal is to determine a set of routes which ensures that all customers are served and minimises the total length of all routes. This total length of the whole set of routes models the cost of all deliveries performed by the company. The problem is computationally hard (NP-hard). Even for fairly small instances with 75 customers the global optima remain unknown.
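To make this formulation concrete, the following minimal Python sketch evaluates a candidate CVRP solution; the representation of a solution as a list of routes, and all identifiers, are assumptions of this illustration rather than the dissertation's implementation:

```python
def route_length(route, dist, depot=0):
    """Length of one route: depot -> customers in the given order -> depot."""
    tour = [depot] + route + [depot]
    return sum(dist[a][b] for a, b in zip(tour, tour[1:]))

def evaluate_cvrp(routes, dist, demand, capacity, depot=0):
    """Total length of a solution, or None if a constraint is violated.

    routes: list of routes (lists of customer indices); dist: distance
    matrix; demand: dict mapping each customer to its demand;
    capacity: the common capacity of the identical vehicles.
    """
    served = [c for route in routes for c in route]
    if sorted(served) != sorted(demand):            # each customer exactly once
        return None
    for route in routes:
        if sum(demand[c] for c in route) > capacity:
            return None                             # vehicle would be overloaded
    return sum(route_length(route, dist, depot) for route in routes)
```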
Review of heuristics and metaheuristics

The author reviewed some of the existing heuristic and metaheuristic algorithms for this problem. The review shows that metaheuristic algorithms for the CVRP are rather complicated constructions, with many components, acceleration techniques, diversification strategies, and even procedures solving certain subproblems exactly. Nevertheless, some common ideas can be found in these metaheuristics, in particular in the evolutionary algorithms.

As far as local search is concerned, it is clearly necessary in an efficient adaptation of a metaheuristic to the CVRP. The best algorithms to date rely heavily on such search. Moreover, it has to be fast. In most of the presented algorithms it is local search that takes the largest share of computation time, so its design strongly influences the total time. The best metaheuristic designs also strive to accelerate local search by means of various techniques.

Specialised solution representations and recombination operators

The reviewed publications show that specialised representations and crossover and mutation operators are necessary for the CVRP. Several approaches are visible in these publications. One approach is represented by the works of Rochat and Taillard (1995) and Potvin and Bengio (1996). These researchers had the intuition that a crossover operator should create an offspring by inheriting whole routes from the parents. In the case of Rochat and Taillard this intuition was based on a visual analysis of the similarity of good heuristic solutions of the CVRP to the best-known solution. Hence their procedure for constructing an offspring: it receives as input a large number of complete routes taken from good solutions generated earlier and attempts, in a sense, to assemble an offspring from these routes. The RBX crossover of Potvin and Bengio likewise attempts to transfer complete routes from the parents to the offspring. The crossover proposed by Tavares et al. (2003) works very similarly.

These designs, however, were not supported by any analyses of the solution space which could confirm this intuition about the similarity of routes. Another approach to operator design was probably based on the similarity between the CVRP and the travelling salesman problem (TSP). After minor adaptation, operators characteristic of the TSP were also used for the CVRP (Gendreau et al. 2002). Recently, various edge recombination operators (edge recombination, edge-assembly crossover) have also been adapted to the CVRP (Alba & Dorronsoro 2004, Alba & Dorronsoro 2006, Nagata 2007). Such operators transfer certain route edges from the parents to the offspring. This is most likely a sensible strategy for the travelling salesman problem, since analyses of the fitness landscape of that problem indicated that it is the edges that carry the important information about solution quality. And although the similarity of the TSP and the CVRP suggests that the same may hold for the CVRP, no such analyses had been performed for the vehicle routing problem.

A separate case is the SPX operator of Prins (2001), recently also applied by Jozefowicz et al. (2007). It is in fact an order crossover applied to a permutation representation. It relies on an additional exact procedure decoding a permutation into a CVRP solution. This approach with exact decoding considerably reduces the size of the search space, since there are usually fewer permutations than permutations with splits (which is what CVRP solutions normally are). However, the use of order crossover on permutations was justified by Prins with optimisation results, not with analyses of the solution space; this order-based operator was chosen 'after some preliminary tests' (Prins 2001).

It appears, then, that the existing designs of recombination operators for the CVRP were based mainly on good intuition, on the similarity of the CVRP and the TSP, and on the results of preliminary computational experiments. The sensibility of the operators was later verified in further optimisation experiments. In none of the presented cases, however, was it based on theoretical or empirical analyses of the solution space of the vehicle routing problem. The author of this dissertation has not seen in the literature, for example, any analysis of global convexity for the CVRP, although the solution distance measures necessary to carry out such an analysis were proposed quite recently. The systematic construction of crossover operators for the CVRP based on the analysis of global convexity, which is the main subject of the next chapter, is therefore the first approach to crossover design relying on an analysis of the fitness landscape.

Chapter 7. Adaptation of the memetic algorithm to the vehicle routing problem based on the analysis of global convexity

This chapter presents the adaptation of the memetic algorithm to the vehicle routing problem. The chapter shows that this adaptation requires a number of design decisions. In particular, the author presented the chosen representation, the design of a fast local search, and the choice of heuristics for generating initial solutions. More importantly, the chapter presents the systematic construction of crossover operators based on the analysis of global convexity. It is in this work that such an analysis of the vehicle routing problem was performed for the first time. First, the author proposed and implemented distance measures for solutions of this problem; one of them, the edge-based d_e, is sketched below. He then used these measures in the fitness-distance analysis.
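Treating a solution as a set of undirected edges and normalising by the larger edge set are assumptions of this sketch, and need not match the exact definition used in the dissertation:

```python
def edge_set(routes, depot=0):
    """All undirected edges of a CVRP solution, including the depot legs."""
    edges = set()
    for route in routes:
        tour = [depot] + route + [depot]
        edges.update(frozenset(pair) for pair in zip(tour, tour[1:]))
    return edges

def edge_distance(sol_a, sol_b, depot=0):
    """Distance of two solutions: the fraction of edges they do not share."""
    ea, eb = edge_set(sol_a, depot), edge_set(sol_b, depot)
    return 1.0 - len(ea & eb) / max(len(ea), len(eb))
```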
The analysis showed that the average distances d_e, d_pn, d_eu, d_ear between local optima are about 30% smaller than the average distances between random solutions. This means that local optima are similar to one another in terms of the features measured by these kinds of distance and that they are, to some extent, concentrated in the examined landscapes. Moreover, the FDA demonstrated moderate values of the correlation between fitness and the distances d_e, d_pn, d_eu, which means that the better the local optima are, the more features (edges, clusters, subsequences of customers) they have in common. Example fitness-distance plots for the CVRP, showing the existence of this correlation, are presented in Figure 1. The presence of this correlation depends, however, on the analysed problem instance.

These FDA results confirm, to some extent, the research intuition expressed earlier e.g. by Rochat and Taillard (1995) that good CVRP solutions are similar to the best-known ones. In their work this intuition was based on the visual similarity of solutions. In this work, in contrast, it was expressed objectively in the form of certain solution distance measures and analysed empirically. It appears, however, that their intuition was true only to some extent (given the moderate correlation values) and not for all instances.

The same FDA results were then the basis for the author's design and implementation of four distance-preserving operators: CPX2, CEPX, CECPX2, GCECPX2. The first operator preserves d_pn, the second d_e, and the last two preserve both these distance measures. Additionally, a mutation operator CPM was designed, which preserves the distance d_pn but perturbs d_e. These operators, together with RBX and SPX taken from the literature, were tested in two experiments with the memetic algorithm. The results of the experiments allow the following conclusions to be drawn.

• In the experiment with short runs of 256 seconds each, the speed of the crossover operation and of the local search that follows it turned out to be important. RBX proved faster than all the operators designed by the author, and SPX faster than some of them. In addition, SPX and RBX gave better results in this experiment, which was confirmed by statistical tests.
• However, the distance-preserving operator CEPX exhibited a higher probability of generating ever better solutions in the memetic algorithm than RBX and SPX. This could be observed in the number of generations of the algorithm in the convergence experiment. Moreover, statistical tests showed that it was CEPX that generated solutions of the best quality in this experiment.
• The presence of the CPM mutation in the memetic algorithm had a positive influence on the quality of the generated solutions. This mutation was most needed by the operators that strongly disrupt the non-common features of the parents (e.g. CPX2), and least by SPX.
• The results of the memetic algorithm were very good. The average gap between the quality of the generated solutions and the quality of the best-known ones amounted to 0.5–0.7% for all kinds of crossover, with little variance. Moreover, for half of the analysed instances some runs of the algorithm found those best-known solutions. The best results in terms of quality were obtained by the memetic algorithm with the operators CEPX and CPM, both designed by the author.

According to the author, the presented experimental results mean that a good crossover operator for the CVRP should preserve the common features of the parents (edges, clusters) and, in addition, should not strongly disrupt the non-common ones.
When high speed of the operator is required, a subprocedure of greedy selection of the non-common features may be included in it (as in GCECPX2). This results in smaller disruption of these features and reduces the number of local search iterations after crossover. Of the operators tested in this dissertation, the author would choose Prins's SPX (2004) for short runs of the memetic algorithm. Under such conditions this operator generates good solutions for the CVRP quite quickly. When more computation time can be spent and the quality of the obtained solutions matters more, the author would recommend the edge-preserving crossover CEPX together with the mutation CPM, which preserves clusters while perturbing edges. It was this pair that generated the best solutions in the memetic algorithm.

The author also attempted to analyse a direct relationship between the solution quality obtained in one computational experiment and the results of the analysis of global convexity. This attempt failed: no relationship of this kind was found. Thus it was not possible to confirm the earlier claims of other authors that fitness-distance correlation may serve to predict the difficulty of a given problem for an evolutionary algorithm.

The quality of solutions generated by the memetic algorithm turned out to be related, to some degree, to the percentage of feasible neighbours of the best-known solutions: the smaller the fraction of feasible neighbours, the harder it is for the memetic algorithm to reach good solutions. It may therefore be worthwhile to also admit infeasible solutions into the population of this algorithm, in order to improve the exploration of the boundary between feasible and infeasible solutions. The memetic algorithm used in this work admitted only feasible solutions into the population.

In summary, in the author's opinion the method of systematic construction of recombination operators based on the analysis of global convexity gave a good result for the considered vehicle routing problem. The best of the designed operators, the edge-preserving CEPX, used together with the mutation CPM, generates the best solutions in terms of quality among all the tested operators.

Chapter 8. The car sequencing problem in Renault's factories

The car sequencing problem in Renault's factories (CarSP) requires that a given set of cars be arranged in some order (a sequence) on the production line. This order must respect the technological constraints of the production process at its individual stages: in the paint shop and in the assembly shop (the tasks of the first stage, the body shop, are not considered in this problem). The paint shop requires cars of the same body colour to be placed directly one after another in the sequence. Such an arrangement minimises the cost of cleaning the spray guns in the paint shop, since they have to be cleaned after every change of car colour on the line. The requirement of the assembly shop is, above all, an even distribution over the sequence of the workload needed to assemble the cars. This workload is related to the need to install certain additional elements (options) in the vehicles, e.g. a sunroof, a navigation system, electric windows. There are many required options, and the cars on the production line often require different sets of options. Hence the workload at certain stations along the line may vary very unevenly, depending precisely on the order of the cars. This can lead to frequent delays and production stoppages, and constitutes an additional production cost.
This cost is modelled by the number of violations of certain ratio constraints imposed on the options of the vehicles in the sequence. Ultimately, an optimal solution of the car sequencing problem minimises the weighted sum of the number of colour changes (paint shop) and the numbers of ratio constraint violations (assembly shop), while respecting the required production constraints. Practical instances of this problem are computationally hard (NP-hard) because of the ratio constraints. Minimising the number of colour changes in the paint shop alone, in contrast, is an easy problem.

Review of heuristics and metaheuristics

A review of several existing algorithms for the car sequencing problem leads to the following observations.

Local search is essential. It appears that local search is the basis of efficient algorithms for this problem. The best existing methods rely precisely on such search. Furthermore, all the reviewed algorithms use some subset of the same set of neighbourhood operators: swap, insert, reflection, random shuffle. These are not operators designed specifically for the CarSP; they are rather general operations which have already been applied to many problems. It is not entirely clear which of these general operators, or which combination of them, performs best for the CarSP and why. It is rather the speed of computing the objective value of neighbouring solutions that is the key to success when choosing an operator.

Good initial heuristics. Another element common to the reviewed algorithms is their use of the same good heuristic idea of Puchta and Gottlieb (2002): the dynamic sum of utilities (DSU). Besides its authors, it is also used by Estellon et al. (2006), while Zinflou et al. (2007) use it as an element of their NCPX crossover.

What besides local search? The best of the presented algorithms did not depart far from the idea of local search: Ribeiro et al. (2005) simply iterated it, employing certain perturbation (mutation) mechanisms; Estellon et al. (2006) additionally used a special k-permutation operator. Algorithms of other kinds are rare.

Recombination operators? It is rather hard to find good crossover operators for the CarSP. Very recently, Zinflou et al. (2007) made exactly this observation, which motivated their 3 new proposals of such operators. Apart from these, the classical one-point crossover and an adapted uniform crossover (uniform adaptive crossover, UAX) have been applied to the CarSP. However, the designs of these operators were motivated primarily by intuition, by certain heuristic ideas (DSU), or by experience with other problems. None of the cited authors attempted to discover, theoretically or empirically, what kind of information is most important for good quality of CarSP solutions. For this reason, an empirical analysis of global convexity was undertaken in this dissertation. Its results are then the basis for proposing the design of operators for the memetic algorithm, in accordance with the scheme of their systematic construction.
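Before moving on to that adaptation, the weighted-sum objective described in this chapter can be summarised in a short Python sketch; the encoding of cars as dictionaries and the sliding-window form of the ratio constraints (at most p cars with a given option in any q consecutive cars) are assumptions of this illustration, not Renault's exact specification:

```python
def colour_changes(seq):
    """Paint shop cost: the number of colour changes along the sequence."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a["colour"] != b["colour"])

def ratio_violations(seq, option, p, q):
    """Assembly shop cost for one option: in every window of q consecutive
    cars at most p may require the option; each excess car is a violation."""
    violations = 0
    for i in range(len(seq) - q + 1):
        with_option = sum(1 for car in seq[i:i + q] if option in car["options"])
        violations += max(0, with_option - p)
    return violations

def carsp_objective(seq, ratio_constraints, w_paint, w_assembly):
    """Weighted sum of colour changes and ratio-constraint violations."""
    total = sum(ratio_violations(seq, o, p, q) for (o, p, q) in ratio_constraints)
    return w_paint * colour_changes(seq) + w_assembly * total
```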
Chapter 9. Adaptation of the memetic algorithm to the car sequencing problem based on the analysis of global convexity

The subject of this chapter is the adaptation of the memetic algorithm to the Renault problem. It describes the following elements of the adaptation: the chosen representation, the design of local search, and the design of the crossover and mutation operators. In particular, the crossover operator was designed on the basis of the results of the analysis of global convexity.

Before the analysis, the author formulated several hypotheses about those features of the problem's solutions which may influence the value of the objective function. These features were reflected in the proposed similarity measures for solutions. The analysis of the similarity of local optima confirmed the initial hypotheses: the positions of vehicles on the production line do not matter in the CarSP, but the existence of identical subsequences of vehicles (irrespective of their place on the line) does. The similarity of solutions in terms of the succession of vehicles on the line also turned out to be important for quality, though to a smaller degree. Example plots of fitness and similarity, illustrating the correlations found, are shown in Figure 2. Note that the vertical axes of these plots show values of similarity, not distance.

Figure 2: Examples of fitness-similarity scatter plots for two instances of the CarSP, with added regression lines (vertical axes: the similarity sim_cs). For the plot on the left r = 0.68; on the right r = 0.57.

Unfortunately, it turned out that a high correlation of fitness and the two kinds of similarity is not a property of the CarSP as a whole, although it occurred for the majority of the analysed instances. It is rather a property of certain types of instances only, or even of single instances. Nevertheless, the results obtained provided grounds for designing the crossover operator CCSPX-2, which preserves in the offspring the subsequences of vehicles common to the parents. Similarly, the mutation RSM, which perturbs subsequences, was chosen for use. These operators were implemented and tested in the memetic algorithm. In two computational experiments they were compared to the operators proposed in the literature.

The results of the experiments showed that this pair of proposed operators was the best in terms of several quality indicators. First, it generated solutions of the best average quality in long runs of the algorithm, until convergence; for one instance it even generated a solution better than the best known so far. Second, these operators exhibited the highest probability of inserting new good solutions into the population. Third, they needed by far the smallest number of local search iterations to improve their offspring, which considerably accelerated the computations.

In short runs of the memetic algorithm, the designed mutation RSM turned out to be the most important operator. It had the largest contribution to the improvement of solutions in those runs. The crossover operator CCSPX-2 was second in importance, although for the largest instances its contribution was equal to that of the mutation. It appears, then, that the CCSPX-2 crossover was particularly useful in long runs of the algorithm and for large instances.

The author also attempted to relate the results of the computational experiments to the results of the FDA for subsequence-based similarity. This attempt failed and no relationship was found. The obstacle was probably the fact that many of the factors which usually influence the efficiency of an algorithm were not controlled in the conducted experiments.
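An illustrative sketch of a subsequence-based similarity of the kind analysed in this chapter is given below; counting shared contiguous runs of a fixed length k, irrespective of their position on the line, is an assumption of this sketch and not the exact definition of sim_cs:

```python
from collections import Counter

def runs(seq, k):
    """Multiset of all contiguous length-k runs of cars in a sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))

def subsequence_similarity(seq_a, seq_b, k=3):
    """Fraction of length-k runs that two sequences have in common,
    regardless of where those runs occur on the production line."""
    ra, rb = runs(seq_a, k), runs(seq_b, k)
    shared = sum((ra & rb).values())        # multiset intersection
    total = max(sum(ra.values()), sum(rb.values()))
    return shared / total if total else 1.0
```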
In summary, the method of systematic construction of crossover operators based on the analysis of global convexity gave a good result in the case of the Renault problem. The designed operators, preserving or perturbing common subsequences, are the best operators for the memetic algorithm proposed so far.

Chapter 10. Summary

The main goal of this work was to perform and evaluate the adaptation of a memetic algorithm to two optimisation problems according to the proposed scheme, based on the analysis of global convexity. This goal has been achieved: the adaptation was performed and evaluated experimentally. The following elements of the work constitute the author's most important original results.

• The definition and implementation of distance (similarity) measures appropriate for the analysed problems: d_e, d_pn, d_pc for the CVRP and sim_cs, sim_csuc for the CarSP.
• The analysis of the global convexity of these two problems, using both the author's own measures and those proposed in the literature. It demonstrated that local optima are, to some extent, similar to one another and concentrated in the analysed fitness landscapes. Correlation between fitness and certain types of distance was found in the majority of the analysed instances.
• The construction of distance-preserving crossover operators and distance-perturbing mutation operators, directly on the basis of the results of the analysis of global convexity. These operators are: CPX2, CEPX, CECPX2, GCECPX2 and CPM for the CVRP; CCSPX-2 and RSM for the CarSP.
• The experimental comparison of the designed operators with similar operators from the literature. This comparison showed that the operators proposed in this work (CEPX and CPM; CCSPX-2 and RSM) generate the best solutions in long runs of the memetic algorithm, until convergence. These operators may not be unambiguously the best for short runs, but they remain in the leading group.

It can therefore be said that the method of constructing operators based on the analysis of global convexity gave a good result for the two considered problems. Together with the earlier analyses and designs of other authors, the results of this dissertation strengthen the foundations for applying this method in practice.

Certain elements of this work also constitute the author's additional contribution to research on the issue of global convexity.

• The review of the analyses of global convexity performed so far is most likely the broadest review of this kind available in the literature. It may be a valuable source of information for researchers interested in this method of analysis.
• The new version of the method of examining global convexity presumably has better statistical and practical properties than the versions used previously.

Perspectives for further research

Certain issues considered in this dissertation remain open after its completion:
• establishing a mathematical model of global convexity;
• determining the limits of the practical significance of fitness-distance correlation;
• determining the conditions for the existence of significant global convexity, e.g. the types of instances;
• a quantitative expression of the relationship between the strength of global convexity and the efficiency of a memetic algorithm exploiting it;
• verification of the correlations found in the considered problems by means of the method that uses global optima;
• performing analyses of global convexity for further combinatorial problems;
• an objective evaluation of the properties of the proposed version of the method of examining global convexity;
• an unambiguous determination of the relationship between the No Free Lunch theorems and practical optimisation problems.

A thorough examination of these issues will most likely lead, in the future, to a better understanding of the properties of the spaces of hard optimisation problems. As a result, it may help establish solid foundations for even more effective adaptations of metaheuristic algorithms to such problems.

Contents

Acknowledgement
Extended abstract in Polish
1 Introduction
1.1 Context of the research subject
1.2 Research subject
1.3 Initial assumptions and the main hypothesis
1.4 Goals of the dissertation
1.5 Published work
2 Metaheuristics in combinatorial optimisation
2.1 Problems of combinatorial optimisation
2.1.1 Basic definitions
2.1.2 Examples of problems
2.2 Computational complexity of algorithms and problems
2.3 Methods of dealing with hard combinatorial optimisation problems
2.4 Metaheuristics
2.4.1 Local search
2.4.2 Ant colony optimisation
2.4.3 Hyperheuristics
2.4.4 Evolutionary algorithms
2.4.5 Memetic algorithms
2.5 Metaheuristics: schemes of algorithms which require adaptation
3 The No Free Lunch theorems and their consequences for optimisation
3.1 Formulations of the theorems
3.1.1 The original formulation by Wolpert and Macready
3.1.2 The strengthened formulation by Schumacher et al.
3.2 Major consequences of the theorems
3.3 The No Free Lunch theorems vs. the practise of optimisation
3.3.1 Practical optimisation problems are not subject to No Free Lunch
3.3.2 Practical algorithms are not subject to No Free Lunch
3.3.3 Not only the sampled points matter
3.4 Conclusions: practical implications of the theorems
3.4.1 No general tools of optimisation
3.4.2 There is some structure in search space of particular problems
3.4.3 Structure should be exploited
3.4.4 Analysis first, exploitation second
3.4.5 What is structure?
3.4.6 Caution required while evaluating algorithms on benchmarks
3.5 Implications for evolutionary algorithms
4 Adaptation of an evolutionary algorithm to a combinatorial optimisation problem
4.1 Representation of solutions
4.2 Fitness function
4.3 Initial population
4.4 Crossover operators
4.4.1 Importance of crossover
4.4.2 The Schema Theorem and the choice of crossover
4.4.3 Adaptation of crossover to a problem
4.5 Mutation operators
4.6 Local search
4.6.1 Place for local search
4.6.2 Choice of a local search type
4.6.3 Choice of a neighbourhood
4.6.4 Neighbourhood and landscape structure
4.6.5 Efficiency of local search
4.7 Other components and techniques
4.8 Conclusions
5 Fitness-distance analysis
5.1 Fitness landscape
5.1.1 Neighbourhood-based definition
5.1.2 Distance-based definition
5.1.3 Comparison of definitions
5.1.4 Applications
5.1.5 Landscape and fitness function
5.1.6 Landscape and distance measure
5.2 Fitness-distance analysis
5.2.1 Basic approach
5.2.2 Examples of analyses from the literature
5.3 Exploitation of fitness-distance correlation in a memetic algorithm
5.3.1 Design of respectful recombination
5.3.2 Adaptation of mutation
5.3.3 Adaptation of local search
5.3.4 Adaptation of other components
5.4 Variants of the fitness-distance analysis
5.4.1 Analysis with only one global optimum known
5.4.2 Analysis with the distance to the best-known solution
5.4.3 Analysis with the average distance to all other local optima
5.4.4 Analysis with the average distance to not worse solutions
5.4.5 Tests for the value of the FDC
5.4.6 Analysis of a set of pairs of solutions
5.4.7 Comparison of all approaches
5.5 Summary and conclusions
5.5.1 Fitness-distance analysis
5.5.2 Exploitation of FDC in metaheuristic algorithms
6 The capacitated vehicle routing problem
6.1 Problem formulation
6.1.1 Versions and extensions
6.2 Instances used in this study
6.3 Heuristic algorithms for the CVRP
6.3.1 Savings algorithm by Clarke and Wright
6.3.2 Sweep algorithm by Gillet and Miller
6.3.3 First-Fit Decreasing algorithm for bin packing
6.4 Metaheuristic algorithms for the CVRP
6.4.1 Iterated tabu search by Taillard
6.4.2 Iterated tabu search by Rochat and Taillard
6.4.3 Route-based crossover by Potvin and Bengio
6.4.4 Memetic algorithm by Prins
6.4.5 Cellular genetic algorithm by Alba and Dorronsoro
6.4.6 Other algorithms
6.5 Summary
7 Adaptation of the memetic algorithm to the capacitated vehicle routing problem
7.1 Representation
7.2 Fitness function and constraints
7.3 Local search
7.3.1 Merge of 2 routes
7.3.2 Exchange of 2 edges
7.3.3 Exchange of 2 customers
7.3.4 Composition of neighbourhoods
7.3.5 Acceleration techniques
7.3.6 Measuring the speed of local search
7.4 Initial solutions
7.4.1 Heuristic solutions
7.4.2 Random solutions
7.5 Fitness-distance analysis
7.5.1 New distance metrics for solutions of the CVRP
7.5.2 Distance measures defined in the literature
7.5.3 Random solutions vs. local optima
7.5.4 Fitness-distance relationships
7.5.5 Main conclusions from the fitness-distance analysis
7.6 Recombination operators
7.6.1 CPX2: clusters preserving crossover
7.6.2 CEPX: common edges preserving crossover
7.6.3 CECPX2: common edges and clusters preserving crossover
7.6.4 GCECPX2: greedy CECPX2
7.7 CPM: clusters preserving mutation
7.8 Experiments with initial solutions
7.9 Experiments with memetic algorithm
7.9.1 Long runs until convergence
7.9.2 Runs limited by time
7.9.3 Quality vs. FDC
7.9.4 Quality vs. feasibility of neighbours
7.10 Summary and conclusions
8 The car sequencing problem
8.1 ROADEF Challenge 2005
8.2 Problem formulation
8.2.1 Other forms of the problem
8.2.2 Groups of cars
8.3 Computational complexity
8.4 Instances
8.5 Heuristic algorithms for the CarSP
8.5.1 Greedy heuristics by Gottlieb et al.
8.5.2 Insertion heuristic by Ribeiro et al.
8.6 Metaheuristic algorithms for the CarSP
8.6.1 Local search by Gottlieb et al.
8.6.2 Iterated local search by Ribeiro et al.
8.6.3 Local search and very large neighbourhood by Estellon et al.
8.6.4 Generic genetic algorithm by Warwick and Tsang
8.6.5 Genetic algorithm by Terada et al.
8.6.6 New crossover operators by Zinflou et al.
8.7 Summary
9 Adaptation of the memetic algorithm to the car sequencing problem 165
9.1 Representation 165
9.2 Fitness function and constraints 165
9.3 Local search 165
9.3.1 Insertion of a group index 165
9.3.2 Swap of two group indexes 167
9.4 Initial solutions 167
9.4.1 Exact algorithm for paint colour changes 167
9.4.2 Kominek's heuristic 169
9.4.3 Extended Gottlieb and Puchta's DSU heuristic 170
9.4.4 Random solution 170
9.5 Fitness-distance analysis 171
9.5.1 Similarity measures for solutions of the CarSP 171
9.5.2 Random solutions vs. local optima 174
9.5.3 Fitness-distance relationships 178
9.5.4 Main conclusions from the fitness-distance analysis 185
9.6 CCSPX: conservative common subsequence preserving crossover 185
9.7 RSM: random shuffle mutation 186
9.8 Adaptation of crossovers from the literature 187
9.8.1 Adaptation of NCPX 187
9.8.2 Adaptation of UAX 187
9.9 Experiments with initial solutions 188
9.10 Experiments with memetic algorithm 190
9.10.1 Long runs until convergence 190
9.10.2 Runs limited by time 196
9.10.3 Quality vs. FDC 197
9.11 Summary and conclusions 198
10 Conclusions 199
10.1 Summary of motivation 199
10.2 Contribution of the thesis 199
10.3 Perspectives for further work 200
A Names of instances of the car sequencing problem 203
B Detailed results of memetic algorithms 205
Bibliography 211

Chapter 1
Introduction

1.1 Context of the research subject

In the modern economy, administration and science, problems of optimisation are encountered very often, because decision makers are usually interested in a cost-effective assignment of available resources to tasks or in solutions to otherwise unsolvable problems.

Consider as an example a car manufacturer who is concerned with the number of cars produced during a shift. There are limits to the plant's throughput due to staff and technological constraints.
But the operations of staff and machines may be optimised without violating the constraints, e.g. by arranging production tasks in the proper order, so that the plant's throughput is increased.

Another example is a manager of a hospital unit who has to assign nurses to shifts (build a work schedule) for a period of a month. The goal of such a schedule is to balance the workload of nurses, satisfy their preferences concerning working hours and days off, and minimise personnel requirements. There are also some limiting legal regulations which influence the possible assignments. Thus, it is not easy for the manager to build a schedule which satisfies all the demands. This leads to an optimisation problem of building a schedule which improves the conditions of work.

Yet another example might be a biochemist who wants to determine the complete sequence of a genome. The whole sequence cannot be accurately checked at once (a technological constraint); it has to be fragmented before analysis. When the analysis is completed, the fragments have to be assembled into one. Since the process of analysis does not preserve the ordering of fragments, the original order is lost. This gives rise to an optimisation problem: short sequences have to be assembled into the original form, maximising, for example, the amount of overlap between fragments.

An optimisation problem may be of two major types: continuous or combinatorial. This thesis is concerned with certain combinatorial problems. The three examples sketched above are in fact the combinatorial optimisation problems of car sequencing, nurse scheduling and DNA assembly. Such problems deal with finite sets of potential solutions, e.g. sequences of some elements or subsets of a given set, as opposed to the infinite sets of potential solutions of continuous problems. The desired solution of such a problem is usually an object (e.g. a sequence, a subset) which maximises or minimises the given objective function (having the meaning of gain or cost). This feature of combinatorial problems does not always render them easy, since finite sets may still have huge cardinalities and may not be easy to search through. As practice demonstrates, many combinatorial optimisation problems are very hard to solve to optimality within acceptable time limits. The theory of computational complexity labels such problems NP-hard.

Consequently, computer scientists deal with such hard problems by means of approximate approaches: heuristic or metaheuristic algorithms. The goal of such algorithms is to generate some solutions in reasonable time. These solutions are usually only suboptimal, meaning that they may not be optimal, but still demonstrate good quality (e.g. low cost).

The literature on heuristic and metaheuristic algorithms is huge. When one decides to use a metaheuristic, there is a large number of types to choose from: local search, tabu search, simulated annealing, evolutionary (and memetic) algorithms, just to name a few. Moreover, metaheuristics are not ready-to-use algorithms; they are only schemes of algorithms which have to be adapted to the optimisation problem under consideration. Thus, when one decides upon a certain algorithm, one still has to design the components of the algorithm in order to obtain a functional program.

However, it appears that metaheuristic algorithms are not general tools of optimisation: they do not perform equally well across all possible optimisation problems, and for each metaheuristic there certainly exist cases in which it performs poorly.
This fact makes the choice of a proper algorithm for a problem the first important issue. Another crucial factor for the chosen algorithm's performance is the already mentioned design of its components. As may be seen in the literature, this design severely affects the final algorithm's performance (its speed and the quality of solutions it produces). As a consequence, this process of adaptation of a metaheuristic algorithm to the problem at hand should be performed with care and justified as well as possible.

Unfortunately, at the moment this process of adaptation of a metaheuristic appears to be more of a craft than a science or an engineering discipline. There is a clear lack of design guidelines for users of metaheuristics, a lack of schemes of adaptation of algorithms to problems.

1.2 Research subject

Some initial research on schemes of adaptation of metaheuristic algorithms to problems may be found in the literature. Such schemes are mainly based on the analysis of certain properties of problems. The rationale behind this approach is to exploit knowledge about the problem's properties in the design of the algorithm's components, since knowledge-less algorithms are expected to perform poorly.

One such property of problems is fitness-distance correlation (also called the 'big valley' or global convexity). It is a phenomenon occurring in the space of solutions of a problem's instance: as solutions become better, the distance between them decreases (with respect to some problem-specific distance measure). This property also assumes that the best solutions (global optima) lie roughly in-between other good solutions. The hypothesis of fitness-distance correlation was examined and the property was found for a number of classical combinatorial optimisation problems. Even though there existed a few instances of these problems which did not reveal the property, some algorithmic components exploiting fitness-distance correlation were proposed and eventually led to well-performing metaheuristics. One of the approaches was to design distance-preserving crossover operators for a metaheuristic called the memetic algorithm.

Still, fitness-distance analysis has not been performed for many hard problems of combinatorial optimisation. The main issue while testing fitness-distance correlation is the definition of some distance between solutions of the problem. Many researchers tend to understand distance only in terms of the Hamming or Euclidean metric, while for combinatorial problems (with sequences, sets or graphs involved) these measures are hardly applicable. Consequently, there is still scope for research in this area. If some new distance measures for combinatorial objects (solutions) are defined and fitness-distance correlation is found, then components of metaheuristics may be designed based on this feature (e.g. distance-preserving crossover operators for memetic algorithms).

Therefore, this thesis focuses on the scheme of adaptation of a memetic algorithm which is founded on positive results of fitness-distance analysis and leads to the design of distance-preserving operators. This scheme is applied to algorithms solving problems which have not yet been analysed for fitness-distance relationships: a vehicle routing problem and a car sequencing problem.
1.3 Initial assumptions and the main hypothesis

The assumptions of this thesis are the following:
• the two chosen problems of combinatorial optimisation reveal the property of fitness-distance correlation,
• the presence of fitness-distance correlation in the fitness landscape of a problem facilitates the design of efficient memetic algorithms.

The main hypothesis of this dissertation states that the adaptation of a memetic algorithm to a combinatorial optimisation problem should lead to the design of distance-preserving crossover operators, provided that fitness-distance correlation is revealed. In such a case the adapted algorithm will generate solutions of quality not worse, and possibly even better, than the same algorithm with other operators.

1.4 Goals of the dissertation

The goal of this dissertation is to perform and evaluate the scheme of adaptation of the memetic algorithm which is based on fitness-distance correlation. This scheme is applied to and evaluated on two chosen problems of combinatorial optimisation: the capacitated vehicle routing problem and the car sequencing problem.

This main goal consists of the following sub-goals:
• the design and implementation of fast local search procedures for the two problems,
• the examination of fitness-distance correlation in fitness landscapes of the problems (this includes defining proper distance/similarity measures for solutions of the problems),
• the design of distance-preserving crossover operators,
• the design and implementation of memetic algorithms which use the operators,
• the comparison of the operators with the ones which may be found in the literature for the same problems,
• the analysis of the performance of the resulting memetic algorithms.

1.5 Published work

Some contents of this thesis have already been published by the author in conference proceedings, scientific journals and a book chapter.
• The method of fitness-distance analysis based on the examination of a set of pairs of solutions was proposed and first employed by Kubiak (2005) and Kubiak (2007). Here it is discussed in more detail in section 5.4.6.
• Local search acceleration techniques for the capacitated vehicle routing problem were published by Kubiak & Wesołek (2007). They are discussed in section 7.3.5.
• The first elements of the fitness-distance analysis of the capacitated vehicle routing problem were published by Kubiak (2004) and Kubiak (2005). A complete analysis, which also used distance measures of other authors, was presented by Kubiak (2007). This analysis constitutes the contents of section 7.5.
• The first set of distance-preserving recombination operators for the problem was published by Kubiak (2004). Here they are further developed in section 7.6.
• The fitness-distance analysis of the car sequencing problem was partially published by Jaszkiewicz et al. (2004) and later extended by Kubiak et al. (2006). This is described in section 9.5.
• The first distance-preserving recombination operator and the memetic algorithm for the car sequencing problem were published by Jaszkiewicz et al. (2004). A modified operator is developed in this thesis in section 9.6.

Additionally, the analysis of distance between solutions of the capacitated vehicle routing problem generated by the memetic algorithm was published by Kubiak (2006). Although beyond the scope of the thesis, this subject is closely related to the fitness-distance analysis and the performance of the designed memetic algorithm.
Chapter 2
Metaheuristics in combinatorial optimisation

2.1 Problems of combinatorial optimisation

2.1.1 Basic definitions

Combinatorics deals with finite sets and structures, such as orderings, subsets, assignments, graphs, etc. (Bronshtein et al. 2004). Similarly, combinatorial optimisation is interested in such structures, which are the basis for defining its problems.

According to Błażewicz (1988) and Hoos & Stutzle (2004), a combinatorial optimisation problem (COP) defines a finite set of some combinatorial parameters, the values of which do not have to be entirely known in advance. A problem instance completes the definition by setting all the parameters to certain values. A feasible solution to a problem instance is a combinatorial object (e.g. a number, a set, a function, etc.) which observes the constraints on the parameters given in advance. In the problem there is also an objective function defined, which assigns a numerical value to each solution. This function is to be optimised, meaning that an optimal solution should be found. The optimal solution to a problem instance is a feasible solution which minimises or maximises the objective function; the direction of optimisation is always given in the definition of the problem.

The definition given above, although it stresses well the most important aspects of combinatorial optimisation and gives the necessary intuition about them, is rather informal. Precisely speaking, a combinatorial optimisation problem π consists of (Merz 2000, Kominek 2001):
• a set of problem instances Dπ,
• for each instance I ∈ Dπ, a finite set Sπ(I) of feasible solutions,
• an objective function fπ which assigns a rational number fπ(I, s) ∈ Q to each solution s ∈ Sπ(I) of each instance I ∈ Dπ,
• the direction of optimisation (either maximisation or minimisation).

An optimal solution s* of a problem instance I ∈ Dπ (for a minimisation problem π) is a feasible solution which has the minimum value of the objective function fπ among all feasible solutions:

∀s ∈ Sπ(I): fπ(I, s*) ≤ fπ(I, s)

For a maximisation problem only the direction of the inequality changes.

A combinatorial optimisation problem π is solved by an algorithm which generates an optimal solution for each instance of the problem or indicates that there is no feasible solution for the instance at all. As it turns out in practice, one of the most important questions regarding COPs deals with the running time of such algorithms (to be precise: with their time complexity).

2.1.2 Examples of problems

Before proceeding to a discussion of the time complexity of algorithms, it is useful to see some examples of combinatorial optimisation problems.

Minimisation of the number of paint colour changes in a paint shop

Consider a production day in a car factory. A set of cars is to be produced. Each car has a paint colour code assigned, which defines its final body colour. This set of cars is to be put on a production line, so the order (sequence) of cars has to be determined. But each change of colour between two consecutive cars in a sequence generates additional cost: if a colour changes, then the spray guns in the paint shop have to be purged. Therefore, the goal of scheduling the cars is to minimise the number of paint colour changes in the sequence. Additionally, a number is given which limits the maximum number of consecutive cars with the same colour. This paint batch limit (PBL) reflects the fact that the spray guns have to be purged regularly.

This problem is a part of a larger one, called the car sequencing problem (CarSP) (Cung 2005b), and was defined by the French car manufacturer Renault. The latter also defines other characteristics of cars and some more constraints and components of the objective function, but here only the colour objective is described. As an example, let us consider an instance of the problem with 10 cars of colour 1, 9 cars of colour 2 and 8 cars of colour 3. The paint batch limit is set to 5. A feasible and an optimal solution to the instance are shown in figure 2.1.

112233222221111122333331113
111112222233333111112222333

Figure 2.1: A feasible (top) and an optimal (bottom) solution to an exemplary instance of the car sequencing problem (the colour objective only). The feasible solution induces 8 colour changes between consecutive cars. The optimal one contains only 5 colour changes.
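The colour objective is simple to evaluate. The following minimal sketch (in Python; the function names are hypothetical, not taken from the thesis) counts the colour changes in a sequence and checks the paint batch limit; it reproduces the values reported in figure 2.1:

from itertools import groupby

def colour_changes(seq):
    # Count positions where the colour differs from the previous car.
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def respects_pbl(seq, pbl):
    # Every run of consecutive cars with the same colour must not exceed the limit.
    return all(len(list(run)) <= pbl for _, run in groupby(seq))

feasible = "112233222221111122333331113"  # the top sequence of figure 2.1
optimal = "111112222233333111112222333"   # the bottom sequence of figure 2.1
assert colour_changes(feasible) == 8 and colour_changes(optimal) == 5
assert respects_pbl(feasible, 5) and respects_pbl(optimal, 5)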
Minimisation of the cost of delivery of goods from a central depot to distributed customers

The second example is a problem concerned with deliveries. A transportation company has to deliver some goods (e.g. petrol) from its depot to a number of geographically distributed customers. All distances between customers and the depot are known. The company possesses some vehicles, all with the same capacity limit. These vehicles start their deliveries at the depot, travel to customers, unload the demanded amounts of goods and return to the company's depot. Each customer is serviced exactly once by one vehicle. The goal of the company is to create a delivery plan for its vehicles (i.e. for each vehicle the order of customers it visits) such that the total distance travelled by all the vehicles is minimised.

This informal description defines the capacitated vehicle routing problem (CVRP) (Toth & Vigo 2002b). Sketches of a feasible and an optimal solution for an instance of the CVRP are shown in figure 2.2. The instance contains 50 customers (indicated by circles); the depot is the centrally located circle without connected lines (for the sake of the figure's clarity). A solution contains edges (lines) between the depot and customers (circles). A sequence of edges starting and finishing at the depot (the half-drawn lines) defines the route of one vehicle.

Figure 2.2: A feasible (left) and an optimal (right) solution to an exemplary instance of the capacitated vehicle routing problem. The feasible solution consists of 5 routes with a total length (cost) of 579 units. The optimal solution also contains 5 routes; its cost equals 524.
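A CVRP solution can be evaluated in a similarly direct way. In the minimal sketch below (the names and the distance-matrix representation are assumptions made for illustration, not code from the thesis), a solution is a list of routes, each route being a list of customer indices:

def route_length(route, dist, depot=0):
    # Length of one route: depot -> customers in the given order -> depot.
    stops = [depot] + route + [depot]
    return sum(dist[a][b] for a, b in zip(stops, stops[1:]))

def cvrp_cost(routes, dist, demand, capacity, depot=0):
    # Total travelled distance; None signals a capacity-infeasible solution.
    if any(sum(demand[c] for c in route) > capacity for route in routes):
        return None
    return sum(route_length(r, dist, depot) for r in routes)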
Other problems

The two problems described above are just some examples of COPs. The set of combinatorial optimisation problems is vast and diverse. There are classical problems, with short and easy formulations, such as:
• the problem of satisfiability of boolean expressions (Hoos & Stutzle 2004),
• the knapsack problem (Błażewicz 1988),
• the bin packing problem (Falkenauer 1998),
• the travelling salesman problem (TSP) (Cormen et al. 1990, Hoos & Stutzle 2004),
• the graph colouring problem (Galinier & Hao 1999, Falkenauer 1998).

There are also problems with more complex definitions, like diverse scheduling problems (Coffman 1976, Słowiński 1984, Błażewicz 1988) or vehicle routing problems (Toth & Vigo 2002b). These problems are not only theoretical ones; they arise in practical situations of management, where limited resources (people, time, rooms, machines, trucks, etc.) have to be assigned to tasks and the gain from this assignment has to be maximised (or its cost minimised).

This practical significance of combinatorial optimisation problems may also be seen in the cycle of challenges organised by ROADEF, the French society of operational research and decision support. This biennial series of computational challenges, launched in 1995, always deals with an optimisation problem posed by some institution or company which has to solve it in its everyday operations. For example, the challenge problems in the years 2003, 2005 and 2007 were formulated by ONERA and CNES (French space agencies) (Jaszkiewicz 2004), Renault (a car manufacturer) (Jaszkiewicz et al. 2004), and France Telecom (a telecommunication company).

2.2 Computational complexity of algorithms and problems

As was indicated earlier in the chapter, an important property of an algorithm for a given problem is its time complexity function. It is a function which depends on the instance size (the size of the input data) and bounds from above the number of steps of the algorithm (or its running time) (Błażewicz 1988). A crucial characteristic of this complexity function is whether it may be bounded from above by a polynomial of the instance size. If that is the case, then the algorithm is called polynomial; otherwise it is said to be exponential (Błażewicz 1988, Cormen et al. 1990).

An exponential algorithm usually cannot solve to optimality instances of practical sizes in reasonable time: the running time of such an algorithm quickly grows to infinity with increasing instance size. Even the incredible growth of the computational power of processors, which has been observed in recent years and is well described by Moore's Law, is not able to overcome this fundamental issue. That is why exponential algorithms are called inefficient (Błażewicz 1988). On the other hand, polynomial ones are called efficient.

Consequently, when dealing with new problems of combinatorial optimisation, the first step toward a solution requires searching for a polynomial algorithm. If one is found, then it may be said that the problem is computationally easy. Such problems (their decision counterparts, to be precise) belong to class P: solvable in polynomial time. If this step fails, however, it might mean that the problem at hand is a difficult one. For many COPs polynomial algorithms have not been found, which suggests that they are indeed more difficult than problems from P. This observation led to the construction of the class of NP-hard optimisation problems (NP-complete, for their decision versions) (Błażewicz 1988, Cormen et al. 1990).

Until now, it has not been proved whether NP-hard problems are really more difficult than problems from P. Nevertheless, the NP-hard class is constructed in such a way that if a polynomial algorithm is found for only one problem of this type, then it is found for all of them. Given the fact that research in the theory of computational complexity has been conducted for some 30–40 years now and no such algorithm has been found (instead, the class of NP-hard problems has been enlarged greatly), it is commonly believed that there are no polynomial algorithms for NP-hard problems (Błażewicz 1988, Cormen et al. 1990). This is also the author's personal opinion.
This assumption, or rather belief, has major consequences for the practice of combinatorial optimisation: for practically interesting problems it is most likely that there are no efficient algorithms which could solve them.

Of the two problems described briefly in section 2.1.2, the CVRP is NP-hard, whereas the CarSP (the decision counterpart with the colour criterion) is in P. The more practical version of the CarSP is, however, also NP-hard. As for the other problems mentioned earlier, almost all of them have been proved NP-hard. The exceptions are some simple versions of scheduling and vehicle routing problems, but still the majority of more complex versions of such problems are also computationally hard.

2.3 Methods of dealing with hard combinatorial optimisation problems

Nevertheless, many hard COPs have such important practical applications that they must be solved somehow. In the last 30–40 years, several general methods of solving hard problems were proposed (Błażewicz 1988, Cormen et al. 1990).

One of the approaches is to employ exponential algorithms. This may give satisfactory running times when the considered instances of a problem are small. Examples are branch and bound or pseudopolynomial algorithms (Błażewicz 1988, Michalewicz & Fogel 2000, Hoos & Stutzle 2004).

Another possibility is to use approximation algorithms. In this approach the goal is to 'generate a solution with the objective function value differing only slightly from the value of the optimal solution' (Błażewicz 1988; the author's translation from the Polish original). Thus, usually some measure of the relative error (excess) of a generated solution s with respect to the optimal solution s* is introduced (Cormen et al. 1990), which should be minimised:

ε(s) = |f(s) − f(s*)| / f(s*)
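As a quick illustration with the numbers from figure 2.2: the feasible CVRP solution shown there has cost f(s) = 579, while the optimal one has f(s*) = 524, so its relative error is ε(s) = |579 − 524| / 524 ≈ 0.105; the feasible solution is about 10.5% worse than the optimum.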
In this group of methods, examples are: special-purpose algorithms with a proved upper bound on the relative error (polynomial-time approximation schemes) (Cormen et al. 1990), heuristics (like greedy algorithms) or metaheuristics (Michalewicz & Fogel 2000, Hoos & Stutzle 2004). Since the memetic algorithm, a kind of metaheuristic, is the subject of this thesis, this group of methods is dealt with in more detail in a separate section.

2.4 Metaheuristics

Metaheuristics, in their original definition, are solution methods that orchestrate an interaction between local improvement procedures and higher-level strategies to create a process capable of escaping from local optima and performing a robust search of a solution space. Over time, these methods have also come to include any procedures that employ strategies for overcoming the trap of local optimality in complex solution spaces (Glover & Kochenberger 2003).

Due to this great diversity of methods it is hard to enumerate properties common to all metaheuristics. However, some characteristics which are frequent among these methods may be given.
1. Metaheuristics are not algorithms in the strict sense, but schemes of algorithms which are defined without any particular problem in mind.
2. They usually search the space of complete solutions of the given problem.
3. They work by iteratively repeating the same main steps.
4. Most of them are inspired by some natural phenomenon (e.g. physical, biological).

Metaheuristics are not exactly algorithms, because in their definitions mechanisms and components are only ambiguously described. Many design decisions are left to the practitioner who adapts a metaheuristic to the given problem.

These kinds of algorithms usually work on complete solutions of the problem at hand, using so-called perturbative search (Hoos & Stutzle 2004). This means that they employ some kind of transformation from one complete solution to another by changing some solution components. These components may be, for example:
• an edge or a vertex in the TSP,
• an assignment of a truth value to a binary variable in the MAX-SAT problem,
• an order of two tasks on a machine in job-shop scheduling,
• an assignment of a car to a position on a production line in the CarSP.

Just like any other approximate approach to COPs, metaheuristics do not guarantee that optimality is reached. However, they have proved to be very efficient in providing good suboptimal solutions to many complex, real-world problems (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Glover & Kochenberger 2003). That is why there is so much interest in the development of such methods, and so many types of metaheuristics have been proposed and tested on a variety of problems. These include:
• local search (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000),
• tabu search (Gendreau 2003, Hertz et al. 2003, Michalewicz & Fogel 2000),
• simulated annealing (Aarts et al. 2003, Henderson et al. 2003, Michalewicz & Fogel 2000),
• genetic algorithms and evolutionary computation (including memetic algorithms) (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Reeves 2003, Moscato & Cotta 2003, Muhlenbein 2003),
• scatter search (Glover et al. 2003),
• variable neighbourhood search (Hansen & Mladenović 2003, Hoos & Stutzle 2004),
• greedy randomised adaptive search procedures (Resende & Ribeiro 2003, Hoos & Stutzle 2004),
• ant colony optimisation (Dorigo & Stutzle 2003, Hoos & Stutzle 2004),
• hyperheuristics (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003).

Some of the methods are not described here, because the literature on metaheuristics is huge; the interested reader is referred to the cited works. Instead, only some metaheuristics are characterised more closely: the ones more recent and perhaps less known than e.g. tabu search or simulated annealing, or those of special interest to this thesis. In particular, evolutionary and memetic algorithms are presented in more detail.

2.4.1 Local search

Local search is a basic and simple metaheuristic. It 'starts off with an initial solution and then continually tries to find better solutions by searching neighbourhoods' (Aarts & Lenstra 2003a). The neighbourhood of a solution is a set of solutions which are in some sense close to it. For a given instance I ∈ Dπ of a given optimisation problem π the neighbourhood is usually defined as a function (Aarts & Lenstra 2003a, Michalewicz & Fogel 2000):

N : S(I) → 2^S(I)

The notion of a neighbourhood implies the fundamental notion of a locally optimal solution (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000). A solution s ∈ S(I) is a local optimum (minimum) with respect to neighbourhood N when:

∀sn ∈ N(s): f(s) ≤ f(sn)

Given these definitions, the local search starting from solution s is formulated in algorithm 1.
Algorithm 1 LocalSearch(s).
1: repeat {main local search loop}
2:   s' = s
3:   betterFound = false
4:   for all sn ∈ N(s) do {iterate over the neighbours of s}
5:     if f(sn) < f(s') then {check if this is the best neighbour so far}
6:       s' = sn {remember the better neighbour}
7:       betterFound = true
8:   if betterFound then
9:     s = s' {proceed to the better neighbour}
10: until betterFound == false {stop at a local optimum}
11: return s

It should be noted that local search always returns a locally optimal solution. This algorithm is called 'iterative improvement' by some authors (Aarts & Lenstra 2003a, Hoos & Stutzle 2004), in order to distinguish it from other, more complex methods collectively called by them '(stochastic) local search' (like simulated annealing, tabu search or genetic algorithms). While it is the authors' right to name methods as they please, in this thesis local search always and only refers to the 'iterative improvement' scheme given above.

There are two main versions of local search, which differ in the way an improving neighbour of s from N(s) is chosen (Aarts & Lenstra 2003a):
• best improvement: the whole neighbourhood is always examined and the best improving neighbour is chosen as the new current solution; this version is also called steepest local search and is the one shown in algorithm 1;
• first improvement: the first neighbour found in N(s) which improves the objective function is chosen as the new solution; this version is also called greedy local search and may be implemented by putting an additional break statement after line 7 of algorithm 1.

It is difficult to trace the inspiration for local search applied to COPs. Perhaps it was motivated by gradient search methods known earlier in numerical optimisation. According to Aarts & Lenstra (2003a), the first trials with local search in combinatorial optimisation were performed in the late 1950s on the TSP, with the use of an edge-exchange neighbourhood. Some examples of local search include:
• k-exchanges for the TSP (Hoos & Stutzle 2004),
• the Clarke and Wright algorithm invented originally for the CVRP (Clarke & Wright 1964) and also applied to the TSP (Aarts & Lenstra 2003a, Hoos & Stutzle 2004),
• edge-exchange-based algorithms for vehicle routing problems (Kindervater & Savelsbergh 2003).

More modern examples of pure local search are not easy to find, because it is relatively straightforward to extend this kind of algorithm to simulated annealing, tabu search or some other hybrid approach (Aarts & Lenstra 2003a, Hoos & Stutzle 2004, Michalewicz & Fogel 2000). Hence, local search is usually a component of these more complex methods.

It can be seen from the description above that local search is not an 'out-of-the-box' solution to COPs; it requires adaptation to the specific problem being solved. It means that a designer has to define: the neighbourhood(s) being used, the way an initial solution is generated, and the improvement rule (either first or best improvement).
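Both versions fit in a few lines of Python. The sketch below is only an illustration under the assumption that the user supplies an objective function f (minimised) and a generator of neighbours; the names are hypothetical, not taken from the thesis:

def local_search(s, f, neighbours, first_improvement=False):
    # Iterative improvement: returns a local optimum of f.
    while True:
        best, best_f = None, f(s)
        for sn in neighbours(s):        # iterate over the neighbourhood N(s)
            fn = f(sn)
            if fn < best_f:             # an improving neighbour
                best, best_f = sn, fn
                if first_improvement:   # greedy variant: accept immediately
                    break
        if best is None:                # no improving neighbour: local optimum
            return s
        s = best                        # steepest variant: move to the best one

For instance, with f returning the total route length of a CVRP solution and neighbours enumerating edge exchanges, this reproduces the steepest scheme of algorithm 1.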
2.4.2 Ant colony optimisation

Ant colony optimisation algorithms (ACO) were inspired by the behaviour of foraging ants (Dorigo & Stutzle 2003). When ants search for food in the area surrounding their ant-hill, they usually roam randomly in order to find some. But when food is found, ants are able to optimise the route from the ant-hill to this place and back. This optimisation is performed by the ants collectively, using a chemical intermediate: the so-called pheromone trail. Some amount of pheromone is left by each ant on the route it traversed; the pheromone evaporates with time unless there is another ant which could sustain or amplify its level. This inspiration may be translated into the following algorithmic scheme (Hoos & Stutzle 2004).

Algorithm 2 Ant colony optimisation.
Initialise pheromone trails τi
while not stopping do
  for all solutions sj in the population do
    Construct sj by a randomised procedure based on a heuristic function h and pheromone trails τi
  Update the pheromone trails τi based on the current contents of all solutions sj
  Update the currently best-found solution
return the best-found solution

In this algorithm one solution sj corresponds to an ant in the biological metaphor. At the beginning of each iteration each ant creates its solution from scratch by successively choosing and inserting solution components into it. This choice is done in an almost greedy way: the components are chosen probabilistically based on their current evaluation (by the heuristic evaluation function h) and past evaluations (by the pheromone trail τi). When the solution of each ant is created, the pheromone trail of each solution component i is updated. At each step of the ACO algorithm the trail evaporates to some extent on each possible component i and is amplified on components i which are present in some solutions sj generated by the ants; the amount of amplification is related to solution quality.

The algorithm was invented by Marco Dorigo in the early 1990s, with the first application to the TSP. According to Hoos & Stutzle (2004), in later years a local search phase was added; it was performed on each solution separately after the construction phase, and irrespective of the levels of pheromone.

Some examples of ACO algorithms are described by Hoos & Stutzle (2004). A simple ACO for the TSP defines an edge in the input graph as a solution component. The heuristic function, which evaluates components during the construction phase, is the reciprocal of the edge weight. A pheromone trail is also defined for each edge. Other examples for the TSP are the max-min ant system (Hoos & Stutzle 2004) or an ACO algorithm described by Boryczka et al. (2006). An ant system for the CVRP is presented by Reimann et al. (2002). Gottlieb et al. (2003) and Gravel et al. (2005) present applications to the car sequencing problem.

Ant colony optimisation, like any other metaheuristic, has to be adapted to the problem to be solved. Here, this adaptation requires: a clear definition of a solution component (for the sake of a construction heuristic and pheromone data structures); a randomised construction heuristic; a rule of pheromone update.
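For the simple TSP variant mentioned above, these three choices (component = edge, heuristic = reciprocal of the edge weight, update = evaporation plus quality-proportional amplification) can be sketched as follows; this is a toy illustration with assumed parameter names, not an implementation from the literature:

import random

def aco_tsp(dist, n_ants=10, n_iters=100, rho=0.1, alpha=1.0, beta=2.0):
    # Toy ACO for the symmetric TSP; dist is an n x n matrix of positive distances.
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]      # pheromone trail on each edge
    eta = [[0 if i == j else 1.0 / dist[i][j] for j in range(n)] for i in range(n)]
    best, best_len = None, float('inf')
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):              # each ant constructs a tour
            tour = [random.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                cand = [j for j in range(n) if j not in tour]
                w = [tau[i][j] ** alpha * eta[i][j] ** beta for j in cand]
                tour.append(random.choices(cand, weights=w)[0])
            tours.append(tour)
        for i in range(n):                   # evaporation on every edge
            for j in range(n):
                tau[i][j] *= 1 - rho
        for tour in tours:                   # amplification related to quality
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best, best_len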
2.4.3 Hyperheuristics

This approach has been inspired by a very pragmatic and market-oriented point of view on COPs (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003): the well-known metaheuristic methods are too problem-specific and knowledge-intensive to be practically useful in cheap and easy-to-use computer systems. They are tailor-made for specific problems, and when the characteristics of the problem change slightly, they fail to deliver good solutions. Moreover, the extensive use of problem-specific information makes them too resource-intensive in development, which is unacceptable to small companies; these prefer 'good enough - soon enough - cheap enough' solutions (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003, Soubeiga 2003) which are more general than special-purpose methods.

Therefore, the advocates of hyperheuristics argue that combining simple heuristics is cheaper to implement and easier to use. This idea leads to 'using (meta-)heuristics to choose (meta-)heuristics to solve the problem in hand' (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003), which may be realised in the following algorithm. Let us assume that a set H of simple constructive heuristics is given, H = {h1, h2, ..., hm}. Hyperheuristics assume that each heuristic can be applied to a partial solution and adds only one component to it, so that these low-level heuristics can work alternately. At the beginning of the algorithm the solution is empty (s0) and the goal is to reach a complete solution (sn) at some stage n of the algorithm. A very basic hyperheuristic might work as shown in algorithm 3.

Algorithm 3 Basic hyperheuristic.
Create an empty initial solution s0
i = 0
while solution si is not complete do
  Choose a heuristic hj from H to apply to the current state of the built solution, si
  Apply hj to si, obtaining si+1
  i = i + 1
return si

In this scheme the control over the low-level heuristics is very simple and leads only to the construction of one solution. This high-level steering may be more sophisticated; examples in the literature include diverse approaches: a choice function, tabu search, a genetic algorithm or variable neighbourhood search hyperheuristics (Soubeiga 2003, Qu & Burke 2005, Remde et al. 2007).

An important property of a hyperheuristic is that the high-level control is completely detached from the problem it is trying to solve; it has no knowledge of the problem instance data, nor of the solutions it constructs, except for their objective function values. It works only on the supplied low-level heuristics (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003).

Although some authors indicate that the first hyperheuristics appeared as early as the 1960s (Soubeiga 2003), major interest in such methods could be seen in the 1990s, with a substantial increase in publications in the early years of the 21st century (Burke, Kendall, Nevall, Hart, Ross & Schulenburg 2003, Soubeiga 2003) (mainly due to two British universities: the University of Nottingham and Napier University). Applications of hyperheuristics mainly concern real-world problems:
• personnel scheduling with a choice function or tabu search hyperheuristic (Soubeiga 2003, Burke, Kendall & Soubeiga 2003),
• workforce scheduling with a random or greedy hyperheuristic (Remde et al. 2007),
• exam timetabling at universities with a variable neighbourhood hyperheuristic (Qu & Burke 2005).

The inventors of hyperheuristics might argue that this type of algorithm is more general than other metaheuristics and does not require problem-specific components. However, to the author of this thesis it is clear that at least some adaptation of a hyperheuristic to the problem is required. Firstly, the set of low-level heuristics has to be provided. This also implies that they have to possess exactly the same interface (parameters, return values) and the same general behaviour (transition from one stage of a solution construction process to the next) in order to be used alternately. Moreover, it means that the concept of a state of a solution has to be clearly defined during the design and implementation of a hyperheuristic: the low-level heuristics have to share exactly the same concept. On top of that, the high-level control hyperheuristic, even though detached from the problem and its particularities, also has to be chosen in some way for a specific application.
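The shared interface just described boils down to each low-level heuristic mapping a partial solution to a slightly extended one. A minimal sketch of algorithm 3 under this assumption (all names hypothetical):

import random

def basic_hyperheuristic(empty_solution, heuristics, is_complete, choose=random.choice):
    # Repeatedly pick a low-level constructive heuristic and apply it
    # until the solution is complete.
    s = empty_solution
    while not is_complete(s):
        h = choose(heuristics)   # high-level control; here: uniform random choice
        s = h(s)                 # each heuristic adds one component to s
    return s

The choose hook is exactly where a more sophisticated controller (a choice function, tabu search, etc.) would plug in, without touching the problem-specific parts.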
2.4.4 Evolutionary algorithms

Evolutionary algorithms (EAs) were inspired by the phenomenon of natural selection in the world of living organisms (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). They mimic this phenomenon by performing optimisation on a set of solutions (a population) at each search step (a generation) and by repeating the artificial counterparts of crossover, mutation and selection. It is hoped that such an artificial evolution will lead to good solutions of the considered problem, just as natural evolution 'generated' complex living organisms which are well-adapted to demanding environments. The general scheme of an evolutionary algorithm is shown as algorithm 4 (Reeves & Rowe 2003).

Algorithm 4 General scheme of an evolutionary algorithm.
Generate a population of initial solutions
while termination condition not satisfied do
  while not sufficient offspring generated do
    if crossover condition satisfied then
      Select parent solutions
      Choose crossover parameters
      Perform crossover obtaining an offspring solution
    if mutation condition satisfied then
      Select a solution for mutation
      Choose mutation parameters
      Perform mutation obtaining an offspring solution
    Evaluate fitness of offspring
  Select new population from the current one
return the best solution in the population

The history of such algorithms reaches back to the early 1950s, and, as noted by Michalewicz & Fogel (2000), such an approach was invented approximately 10 times by different scientists and under different names: genetic algorithms (GAs), evolution strategies, evolutionary programming or genetic programming. In the 1990s the equivalence of these methods was demonstrated and most of them were finally combined, by borrowing ideas and mechanisms from each other, into what is now known as evolutionary algorithms.

Examples of EAs adapted to combinatorial problems are numerous in the literature:
• the book by Falkenauer (1998) is devoted to GAs applied to grouping problems (e.g. bin packing, graph colouring);
• a genetic algorithm for a timetabling problem is described by Lewis & Paechter (2005a);
• applications of EAs to production scheduling are the theme of the work by Pawlak (1999);
• the resource-constrained project scheduling problem is solved by a genetic algorithm due to Kominek (2001);
• genetic algorithms for vehicle routing problems were proposed by Potvin & Bengio (1996), Tavares et al. (2003), Baker & Ayechew (2003);
• GAs for car sequencing problems are described by Warwick & Tsang (1995), Terada et al. (2006), Zinflou et al. (2007).

Even more examples of EAs for COPs are given further on, together with the description of the basic mechanisms of artificial evolution.

Representation of solutions

Although not directly visible in the algorithm outline above, the issue of the representation of solutions (sometimes called the 'genetic' representation) used in an application of the EA is an important one, since it impacts the algorithm's performance. It also influences other components of the EA, namely crossover and mutation operators; the usage of certain representations makes the application of some operators much easier.

Binary representation

In this approach solutions are always represented as binary strings and only such strings are manipulated in an evolutionary algorithm (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003).
Some research indicated the usefulness of Gray codes for encoding integers (Michalewicz 1996). Goldberg also advocated the use of 'messy' binary encodings with variable length and redundancy (Goldberg 1989, Michalewicz 1996). There are problems for which this kind of representation is a natural one (e.g. NK-landscapes, binary quadratic programming, graph bi-partitioning (Merz 2000)), but in other cases this representation requires that specific encoding-decoding procedures are used.

Floating-point representation

Here, solution parameters are represented as vectors of real numbers. In some cases the direct problem parameters are accompanied by special values of difference used in mutations (the delta encoding (Michalewicz 1996)). This representation is well-suited to problems with numerical decision variables (Michalewicz & Fogel 2000).

Specific combinatorial representations

In the case of COPs, many problem-specific representations have been proposed in the literature. These representations reflect the diverse nature of combinatorial structures and constraints arising in practical applications. For example:
• there are adjacency, ordinal, path, edge-list and matrix representations for the TSP (Michalewicz 1996, Merz 2000);
• a permutation representation is used for the QAP (Merz 2000);
• in grouping problems (Falkenauer 1998, Michalewicz 1996) there are representations indicating the membership of an object in a group, the order of insertion into groups (requiring a decoding procedure) or specific 'group-oriented' representations;
• a matrix representation is used for the university course timetabling problem (Lewis & Paechter 2005a, Lewis & Paechter 2005b, Michalewicz 1996);
• for the CVRP several possibilities were examined: a 'genetic vehicle representation' (GVR) (Tavares et al. 2003), and a permutation representation without (Prins 2001) or with (Alba & Dorronsoro 2004) route delimiters;
• a sequence representation is employed for the CarSP (Cheng et al. 1999, Zinflou et al. 2007).

These numerous examples demonstrate that the choice or design of a suitable solution representation for COPs is indeed an issue in the field of evolutionary computation.

Crossover operators

The operation of crossover (or recombination) is most often applied to a pair of solutions (parents) and produces one new solution (an offspring). It aims at generating an offspring which inherits good properties (components) from both of the parents. The actual form of this operation depends on the problem being solved and the chosen representation. Well-known examples of crossover operators for binary representations are: one-point crossover, two-point and multi-point crossover, and uniform crossover (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003).
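For binary strings these classical operators are one-liners; a minimal sketch (hypothetical names, parents as equal-length lists of 0/1 genes):

import random

def one_point_crossover(p1, p2):
    # Cut both parents at one random point and glue the opposite halves.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def uniform_crossover(p1, p2):
    # Copy each gene from a randomly chosen parent.
    return [random.choice(pair) for pair in zip(p1, p2)]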
In the case of floating-point representations one-point and multi-point crossovers are also used, but there are some more specialised operators: arithmetic and heuristic crossovers, which make use of the idea of a linear combination of vectors (Michalewicz 1996, Michalewicz & Fogel 2000).

The clearly visible diversity of specialised crossover operators, however, starts with combinatorial problems. The TSP, as the test-bed of metaheuristics for COPs, has seen more than 10 different recombinations (Goldberg 1989, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Merz 2000): partially-mapped crossover, order crossover, cyclic crossover, one-point crossover for the ordinal representation, heuristic greedy crossover, edge recombination crossover, edge assembly crossover, maximum-preservative crossover, cut and merge operators for a matrix representation, matrix crossover, and distance-preserving crossover.

Similarly, there are many proposals of recombination for other COPs. In the case of the CVRP these are: order and partially-mapped crossovers (Gendreau et al. 2002), two-point and uniform crossovers (Baker & Ayechew 2003) or generic and specific crossovers (Tavares et al. 2003). For the closely related vehicle routing problem with time windows (VRPTW) a sequence-based and a route-based crossover were proposed (Potvin & Bengio 1996). Car sequencing problems have also seen several variants of recombination: uniform adaptive crossover (Warwick & Tsang 1995), a cross-switching operator (Cheng et al. 1999) and the three crossover operators proposed very recently by Zinflou et al. (2007): interest-based, uniform interest and non-conflict position crossovers.

Although at the very beginning of GAs the prevailing opinion was that one-point crossover is general and sufficient enough to be successful for any problem (Goldberg 1989, Michalewicz & Fogel 2000, Reeves & Rowe 2003), currently it is a universally shared belief that crossover operators should be well-adapted to the problem at hand. The story of the TSP and other COPs seems to confirm this point of view, although it took some 30–40 years of research to reach it. However, it is still not clearly understood how to choose or design a good crossover for a given task.

Mutation operators

Mutation is an operation which generates a solution (an offspring or a mutant) by a slight perturbation of another one. The goal of this perturbation is usually to increase the diversity in the population, to explore new traits in solutions or to escape from local optima (Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003). The definition of mutation also depends on the problem and the representation of solutions.

For binary problems the most popular is the bit-flip mutation (Reeves & Rowe 2003). In the case of floating-point encodings several other options have been devised in the literature: non-uniform and border mutations (Michalewicz 1996) or mutations based on probability distributions: Gaussian, Cauchy or uniform (Michalewicz & Fogel 2000).

Again, applications of EAs to combinatorial problems have seen numerous alternatives. For the TSP these are: edge-exchange (also known as inversion), relocation of a vertex, relocation (displacement) of a path, and vertex-exchange (Michalewicz & Fogel 2000). Mutations for the CVRP are: remove-and-reinsert or swap mutations (Gendreau et al. 2002), and inversion of a path or path displacement (Tavares et al. 2003). Greedy local search with several neighbourhood operators was also used as a mutation operator for this problem (Prins 2001). In the case of car sequencing several specialised mutations were employed (Cheng et al. 1999, Zinflou et al. 2007): switching two non-identical and non-overlapping subsequences of the same length (block switching); switching two non-identical vehicles (unit switching); inversion of a subsequence (block inversion); random reallocation of a subsequence (shuffle); displacement of a subsequence.

These examples demonstrate that the form of mutation depends greatly on the problem being solved. Moreover, despite the early opinion that mutation has a secondary function in EAs (Goldberg 1989), the mutation operator is now perceived as an important one (Michalewicz 1996).
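Two of the simplest of these operators, sketched in Python for illustration (hypothetical names; solutions as lists):

import random

def bit_flip(s, p):
    # Bit-flip mutation for binary strings: flip each gene with probability p.
    return [1 - g if random.random() < p else g for g in s]

def block_inversion(seq):
    # Inversion of a random subsequence, e.g. for sequencing problems.
    i, j = sorted(random.sample(range(len(seq) + 1), 2))
    return seq[:i] + seq[i:j][::-1] + seq[j:]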
2007): switching two non-identical and non-overlapping subsequences of the same length (block switching); switching two non-identical vehicles (unit switching); inversion of a subsequence (block inversion); random reallocation of a subsequence (shuffle); displacement of a subsequence. These examples demonstrate that the form of mutation depends greatly on the problem being solved. Moreover, despite the early opinion that mutation plays only a secondary role in EAs (Goldberg 1989), the mutation operator is now perceived as an important one (Michalewicz 1996).

Selection
This mechanism decides which solutions from the current population are selected for the next one. It usually probabilistically prefers better solutions. The selection procedure is independent of the problem and representation, i.e. it does not require any knowledge about them, except for the evaluations of solutions. Yet, it influences the search process considerably, since it determines the selection pressure: the probability of selecting a solution relative to the probability of selecting the best one from the population (Reeves & Rowe 2003). This way poor solutions may be either quickly abandoned by an EA or, conversely, left to breed in the hope of generating good successors. Several procedures of selection have been proposed in the literature. Some of them are: proportional (or roulette-wheel) selection, truncation selection, selection based on ranking, stochastic universal selection, and random tournaments (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). Closely related to selection is the issue of fitness scaling, which decides how the original values of the objective function are scaled to obtain selection probabilities; there are also several options for this choice (Goldberg 1989, Michalewicz 1996, Reeves & Rowe 2003). Some researchers influence the selection mechanism with the ‘no duplication’ policy: no two identical solutions may exist in the population (Reeves & Rowe 2003, Prins 2004). Another selection-related mechanism is the so-called elitist model invented by De Jong (see Goldberg (1989) or Reeves & Rowe (2003)) in order to improve the optimisation performance of GAs. This model requires that the currently best individual in the population is preserved. Yet another modification of the selection mechanism is the steady-state version of the evolutionary algorithm. This version completely abandons the idea of generations in EAs. Instead, only one offspring is generated in each iteration of the algorithm (either by crossover or by mutation) and, if good enough, it replaces one (usually the worst) solution in the population.

Initialisation of the population
The way solutions are generated for an initial population is problem- and representation-specific. However, hardly any detailed guidelines concerning initialisation exist in the literature; only some very general methods are given. Firstly, the population may be initialised at random, using simple random sampling (Merz 2000, Michalewicz & Fogel 2000, Reeves & Rowe 2003). This method may be easily applied to problems with solutions encoded in a binary alphabet; problems with more complicated representations, e.g. permutations, require specialised procedures to ensure uniform randomness (Manly 1997). Secondly, the amount of randomness may be somewhat controlled with more systematic sampling, e.g. by the method based on Latin hypercube sampling described by Reeves & Rowe (2003).
Moreover, randomness may be totally removed from the initial population by completely systematic sampling (Michalewicz & Fogel 2000). These methods are useful in the case of binary or floating-point representations. Finally, many authors mention that the initial population may include solutions of high quality obtained from some other heuristic techniques (Reeves & Rowe 2003, Michalewicz & Fogel 2000). These are usually fast greedy heuristics, rather problem-specific.

2.4.5 Memetic algorithms
The inspiration for memetic algorithms (MAs) comes from several sources and they have many inventors. The first is the notion of cultural evolution (Moscato & Cotta 2003, Merz 2000), which is supposedly faster than the genetic one in improving its objects: memes, the units of culture (ideas, designs, tunes, etc.). In such cultural evolution random changes, like mutation, are less probable. Rather, the variation of memes is performed on purpose by intelligent individuals. On top of that, cultural evolution requires fewer resources than its genetic counterpart; it does not have to physically build living creatures, but only requires some of their memory. The second source of inspiration is the Lamarckian point of view on evolution (Reeves & Rowe 2003, Merz 2000, Michalewicz & Fogel 2000). Here, an individual created by evolution has the ability to learn something which is not encoded in its genotype, improve its fitness during its lifetime, and pass these acquired characteristics to its descendants. The third source is a rather pragmatic one. In many experiments with evolutionary algorithms applied to COPs it was noted that they have a very limited ability to locally tune the generated solutions (Reeves & Rowe 2003, Hoos & Stutzle 2004). Some attempts at enriching EAs with other heuristic techniques, local search among them, demonstrated that such hybridisation very often results in considerably improved performance of evolutionary optimisation. This inspiration led to the integration of other heuristics, carrying problem-specific knowledge, into evolutionary algorithms (Reeves & Rowe 2003), giving rise to memetic algorithms. Most often these heuristics are local search algorithms (in the broad sense) (Reeves & Rowe 2003, Michalewicz & Fogel 2000, Merz 2000, Hoos & Stutzle 2004), although some authors also include exact methods, approximation algorithms and specialised recombinations in the list (Moscato & Cotta 2003).

The scheme shown in algorithm 5 represents the steady-state version of the memetic algorithm, with the replacement of the worst solution in the population.

Algorithm 5 Steady-state memetic algorithm.
  Generate a population of initial solutions
  Apply local search to each solution in the population
  while termination condition is not satisfied do
    if crossover condition satisfied then
      Select parent solutions
      Choose crossover parameters
      Perform crossover obtaining an offspring solution
      Apply local search to the offspring
    else
      Select a solution for mutation
      Choose mutation parameters
      Perform mutation obtaining an offspring solution
      Apply local search to the offspring
    if the offspring is better than the worst solution in the population then
      Replace the worst solution with the offspring
  return the best solution in the population

Some authors present the algorithm as a generational one (Hoos & Stutzle 2004, Merz 2000). Others explicitly include some restart mechanisms (Moscato & Cotta 2003).
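A direct transliteration of algorithm 5 into Python may clarify how the problem-specific components plug into the fixed control flow. This is only a sketch under simplifying assumptions (minimisation, random mating selection, a fixed iteration budget); the parameter names are invented for this illustration and every problem-dependent part is passed in as a function.

    import random

    def steady_state_ma(init, evaluate, local_search, crossover, mutate,
                        pop_size=20, iterations=1000, p_crossover=0.8):
        # Control flow of algorithm 5; every problem-specific decision is
        # delegated to the functions given as arguments.
        pop = [local_search(init()) for _ in range(pop_size)]
        for _ in range(iterations):
            if random.random() < p_crossover:
                p1, p2 = random.sample(pop, 2)   # select parent solutions
                offspring = local_search(crossover(p1, p2))
            else:
                offspring = local_search(mutate(random.choice(pop)))
            worst = max(pop, key=evaluate)       # minimisation is assumed
            if evaluate(offspring) < evaluate(worst):
                pop[pop.index(worst)] = offspring
        return min(pop, key=evaluate)

Note how every problem-dependent decision is deliberately kept outside the skeleton; the adaptation of exactly these components is the subject of the following chapters.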
From the perspective of metaheuristics there are two main points of view on memetic algorithms:
• it is an evolutionary algorithm which manipulates only locally optimal solutions (Gendreau et al. 2002, Michalewicz & Fogel 2000, Jaszkiewicz & Kominek 2003);
• it is a local search algorithm restarted multiple times from intelligently generated starting points (by crossover and mutation) (Gendreau et al. 2002, Jaszkiewicz & Kominek 2003).
This perspective requires any designer of a memetic algorithm to first compare the algorithm with an ordinary EA and with iterated LS in order to demonstrate the usefulness of the design.

According to Moscato & Cotta (2003) the term ‘memetic algorithm’ was first used in 1989 to describe a hybrid of a genetic algorithm and simulated annealing. However, hybridisation of genetic and local search methods was also attempted some years earlier, and hence the term ‘hybrid genetic algorithm’ is also employed to describe this approach. On the other hand, some authors call such methods ‘genetic local search’ (Merz 2000, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003) or still use the general term ‘evolutionary algorithms’.

There are many examples of applications of MAs to combinatorial problems. Michalewicz & Fogel (2000) cite a publication on a MA applied to the TSP with very good results. Jaszkiewicz & Kominek (2003) use a genetic local search algorithm to solve a vehicle routing problem, while Prins (2001) applies local search to all individuals in his genetic algorithm for the CVRP. Zinflou et al. (2007) apply local search in some of their experiments with a GA for a CarSP. Peter Merz’s Ph.D. thesis (2000) is entirely devoted to the design of memetic algorithms for classical combinatorial optimisation problems. Numerous applications of MAs are listed by Moscato & Cotta (2003).

An alert reader will note that there are many components in the memetic algorithm which have to be adapted to the problem under study before the algorithm can actually run. These are all the evolutionary components (representation, crossover, mutation, initialisation) and all the local search components (neighbourhood(s), improvement rule).

Table 2.1: Components of metaheuristics which require adaptation to the problem.
  Metaheuristic            Components requiring adaptation
  Local search             generation of an initial solution; neighbourhood operator(s); improvement rule
  Ant colony optimisation  definition of a solution component; randomised construction heuristic; pheromone update rule
  Hyperheuristic           definition of a solution component; set of low-level heuristics; high-level control hyperheuristic
  Evolutionary algorithm   generation of initial solutions; representation; crossover operator(s); mutation operator(s)
  Memetic algorithm        all components of local search; all components of an evolutionary algorithm

2.5 Metaheuristics: schemes of algorithms which require adaptation
This short survey of metaheuristics and their applications shows that these are not algorithms ready to use in every application. Rather, these are general ideas and schemes of algorithms. They have to be further adapted when considering application to a specific problem: components of a metaheuristic have to be designed or chosen (from the already existing ones). A list of such components of the surveyed metaheuristics is presented in table 2.1.
In the majority of cases (also in the case of the memetic algorithm) there are no design guidelines which could help a practitioner adapt a metaheuristic to a problem, except rather general statements that problem-specific knowledge should be introduced into such algorithms. As the next chapter demonstrates, this design of components influences the efficiency of the obtained algorithm considerably. This is the reason why this thesis undertakes the subject of adaptation of the memetic algorithm to certain problems.

Chapter 3
The No Free Lunch theorems and their consequences for optimisation

The No Free Lunch (NFL) theorems (Wolpert & Macready 1997, Schumacher et al. 2001) are supposed to impose serious limits on the performance of search and optimisation algorithms with respect to some large sets of functions (problems). Thus, in a text on optimisation they cannot be omitted; they will be briefly described and their interpretations discussed, with consequences for solving practical combinatorial optimisation problems. The proofs of the theorems will not be given, since they are not essential to the discussion and may be easily found elsewhere.

3.1 Formulations of the theorems
3.1.1 The original formulation by Wolpert and Macready
In their important article Wolpert & Macready (1997) raised the issue of the performance of optimisation algorithms with respect to all possible discrete functions (i.e. problems and their instances). In order to address the problem they used the notion of black-box search: when an optimisation algorithm is run on some function, it is only allowed to evaluate one point in the domain at a time and to guide its further search based only on evaluations performed earlier. The algorithm knows nothing more about the function than these previously sampled points. Yet another assumption is that points in the domain are not revisited by the algorithm.

Further, the authors attempted to define sensible performance measures of algorithms executed in such an environment. They rightly stated that the evaluation of performance (solution quality) after m steps of an algorithm had to be based on the sample d^y_m of m evaluations of the optimised function the algorithm performed. Hence, they firstly focused on the distribution of such samples after m evaluations, putting the issue of performance measures aside. Next, they considered the distribution of samples d^y_m for any two algorithms a_1, a_2 when all possible discrete functions f were equally likely. Wolpert and Macready came to the conclusion that:

    \sum_f P(d^y_m \mid f, m, a_1) = \sum_f P(d^y_m \mid f, m, a_2)    (3.1)

It means that for any two algorithms a_1, a_2 the probability of obtaining some sample d^y_m after a number m of search steps is equal when summed across all possible discrete functions f. Given this result they concluded that it did not matter what performance measure was used to evaluate algorithms (provided it was based on samples of points); since the distributions of samples were exactly the same for all algorithms, so were the performance measures.

What is also important, Wolpert & Macready (1997) showed that an algorithm need not be deterministic for the theorem to apply; it is also true for stochastic algorithms.

3.1.2 The strengthened formulation by Schumacher et al.
A stronger formulation of the No Free Lunch theorem was presented several years later by Schumacher et al. (2001).
They proved that ‘a No Free Lunch result holds for a set of functions F if and only if F is closed under permutation’. Firstly, they demonstrated that a discrete function might be described by a sequence of points (values) from its domain and the corresponding sequence of evaluations of the points (values from a co-domain). Then they showed that, assuming a certain ‘canonical’ ordering of domain points, a sufficient means of describing a function was the sequence of evaluations for this ordering. Secondly, Schumacher et al. (2001) recalled the notion of a permutation of a function. For example, if a function was described as a sequence f = (1, 2, 3, 2, 1) and a permutation π = (2, 1, 5, 3, 4) was given, then the permuted f, π(f) = (2, 1, 2, 1, 3), was some other discrete function with the same domain and co-domain as the original f. Thirdly, the authors defined a set F of functions as closed under permutation if for every function f from F and any permutation π applicable to f, the permuted function π(f) was also in this set F. Finally, they proved that equality (3.1) shown by Wolpert and Macready held if and only if the set of functions F (the basis for summation) was closed under permutation.

This formulation is stronger than the original one because it also shows cases when a No Free Lunch result cannot hold: when a set of functions is not closed under permutation. It also demonstrates that such a result may hold for a very limited set of functions. Schumacher et al. (2001) give an example of ‘needle-in-a-haystack’ functions F = {(0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0)}, which is clearly closed under permutation; hence, the No Free Lunch holds in this tiny case.

3.2 Major consequences of the theorems
Wolpert & Macready (1997) commented on their formulation that it ‘explicitly demonstrates that what an algorithm gains in performance on one class of problems is necessarily offset by its performance on the remaining problems’. Whitley & Watson (2006) rightly note that this is valid for all possible performance measures based on sampling the search space. This is a crucial consequence: there are no general optimisation algorithms which perform well on all possible problems. If an algorithm performs better than average on some set of functions, it has to perform worse on the complement of this set (with respect to the universe of all discrete functions).

To emphasise this consequence, some authors (Culberson 1998, Whitley & Watson 2006, Wolpert & Macready 1997) also apply the No Free Lunch theorem to the algorithm of random enumeration. It appears that across all possible discrete functions there is no algorithm that performs better than this random enumeration, no matter the performance measure; actually, all algorithms have equal performance. This applies to both deterministic and stochastic algorithms that conform to the assumed black-box environment. As pointed out by Whitley & Watson (2006), this consequence stopped the arguments in the optimisation community as to which algorithm was more powerful and general than others; in the black-box environment there are no better algorithms.

It also appears that reasoning similar to the one in the NFL theorems may be applied to the issue of encodings. This is an important part of many well-performing evolutionary algorithms and it has been stated many times that a good encoding of solutions is the foundation of good EAs (see e.g. Michalewicz (1996)).
However, when all possible discrete functions are considered, the effect of encoding is always the same (Culberson 1998, Reeves & Rowe 2003, Whitley & Watson 2006).

Metaheuristics, on the other hand, are not subject to the No Free Lunch theorems. Yet this is not good news; rather, it recalls that metaheuristics are not ready-to-use algorithms, but only schemes of algorithms which have to be further specified to be fully operational. But when a metaheuristic is completely specified (consider the Simple Genetic Algorithm from Goldberg (1989)), it becomes an ordinary algorithm. If this algorithm were considered a general tool of optimisation, it would de facto be put in the black-box environment and instantly become subject to the NFL theorems, like any other blindly applied algorithm. This means that when investigating metaheuristics, we also have to remember the No Free Lunch results.

3.3 The No Free Lunch theorems vs. the practice of optimisation
As noted above, the NFL theorems require some assumptions to be made about algorithms and problems. These assumptions have been the basis of criticism of the theorems and their applicability to the practice of optimisation.

3.3.1 Argument no. 1: practical optimisation problems are not subject to No Free Lunch
The number and compressibility of all discrete functions
According to Whitley & Watson (2006), some critics of the NFL claim that the original theorem is not applicable to real-life functions because the set of all possible discrete functions (the basis of the result by Wolpert and Macready) is not representative of real-world problems. First of all, the critics say, the set of all functions is uncountably infinite, while the set of functions practically implementable in any digital computer is only countably infinite (Whitley & Watson 2006). Therefore, the conclusions from the NFL theorem do not apply to the set of implementable functions, but only to this larger set of rather abstract ones. Moreover, even this countably infinite set does not represent what may be called a practical problem, because the majority of functions in this infinite set are incompressible (Whitley & Watson 2006, Culberson 1998, Reeves & Rowe 2003). This means that the majority of such functions have to be described with a string of a length comparable to the size of the corresponding search space. Why is this a problem? Because real-life problems of optimisation (e.g. NP-hard problems) have very concise formulations (instance descriptions), yet their search spaces are usually orders of magnitude larger and hard to search through. So a set consisting mainly of incompressible functions cannot be representative of the hard, yet compressible ones.

It is hard to disagree with these arguments, since they show certain weaknesses in the original formulation of the NFL. However, the strengthened formulation by Schumacher et al. (2001) demonstrates that the No Free Lunch result may apply to finite and sometimes very small sets of functions (note their ‘needle-in-a-haystack’ example). It also shows that compressible functions may be subject to the NFL. Therefore, the argument that all possible discrete functions are not representative of real-world problems may not be used to disregard the consequences of the No Free Lunch theorems. Due to the strengthened formulation the danger of the NFL comes closer to practical optimisation problems than in the case of the original formulation.
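The strengthened formulation lends itself to a small computational check. The sketch below (an illustration written for this text, not code from the cited papers) enumerates the permutation closure of the ‘needle-in-a-haystack’ set from section 3.1.2 and verifies that equality (3.1) holds for two deterministic, non-revisiting algorithms modelled simply as fixed visiting orders; adaptive and stochastic algorithms, though also covered by the theorem, are not modelled here.

    from itertools import permutations
    from collections import Counter

    def closure(f):
        # Permutation closure of a function given as the tuple of its values.
        return {tuple(f[i] for i in p) for p in permutations(range(len(f)))}

    def sample(f, order, m):
        # The sample d^y_m of a deterministic, non-revisiting algorithm
        # that inspects domain points in a fixed order.
        return tuple(f[x] for x in order[:m])

    F = closure((0, 0, 0, 1))   # the 'needle-in-a-haystack' set, |F| = 4
    a1 = (0, 1, 2, 3)           # two different visiting orders
    a2 = (3, 1, 0, 2)
    for m in range(1, 5):
        d1 = Counter(sample(f, a1, m) for f in F)
        d2 = Counter(sample(f, a2, m) for f in F)
        assert d1 == d2         # identical distribution of samples over F
    print('equality (3.1) holds over the permutation closure')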
The uniform distribution of all functions
Wolpert & Macready (1997) assumed that all possible discrete functions are uniformly distributed. Yet, they were aware that for practical problems this distribution would not be uniform. Some interpretations of the original NFL theorem state, therefore, that this is the cause of the non-applicability of the theorem to practical problems (Kominek 2001), and that in this case there may exist algorithms which outperform at least some of the other ones (e.g. random enumeration). Anticipating such interpretations, Wolpert and Macready formulated two counterarguments:
1. In practice the function to be optimised is usually completely specified, or at least its general form is known and only some parameters may vary across instances. Yet even in this case of quite detailed knowledge some important characteristics of the function are still unknown (e.g. the optima). In effect, one knows nothing or very little about the optimised function, and this ignorance may be expressed with the assumption that any function is equally likely (uniformly distributed).
2. If one has some knowledge about the properties of the problem to be solved, but this knowledge is not included in the proposed algorithm, then the distribution of functions this algorithm encounters is effectively uniform. In this case there is simply no guarantee that an arbitrarily chosen algorithm will perform well on a function it knows nothing about.
These counterarguments raise the issue that algorithms, in order to perform well on a certain problem, have to have some knowledge about the problem implemented in them. This issue will be commented upon in more detail later.

The claim that the unequal probability of functions in practice undermines the NFL theorem may also be disputed to some extent with the strengthened version of the theorem. Whitley & Watson (2006) rightly point out that the properties of the permutation closure allow some unequally distributed cases. Thus, even for unequally distributed functions the No Free Lunch result may hold. It should be clearly stated, however, that these cases are of a very specific type and certainly do not cover all possible non-uniform distributions.

Proved cases of non-applicability of the No Free Lunch theorems
Nevertheless, there is some support for the non-applicability of NFL arguments to ‘real problems’, as Reeves and Rowe say (2003). They cite a work by Igel and Toussaint where these authors prove that sets of functions with certain properties (e.g. some kind of neighbourhood relation and a reasonable limit on the number of local optima) are not closed under permutation and are not subject to the No Free Lunch result. Some support for this conclusion is given by Whitley & Watson (2006). They cite other research where it was proved that the NFL does not hold for sets of polynomials of a single variable and bounded complexity.

Summary
It may be said that the NFL theorems might not apply to practical problems when something more is known about them, but at the moment there are still too few arguments to be sure, and the notion of ‘something more’ remains very vague. Therefore, the NFL theorems may not be generally disregarded due to these issues yet.
3.3.2 Argument no. 2: practical algorithms are not subject to No Free Lunch
Another line of criticism of the NFL theorems concerns the issue of whether practical algorithms satisfy their assumptions.

Revisiting previously sampled points
According to Reeves & Rowe (2003) the idealised algorithm assumed in the NFL theorems differs considerably from real-world algorithms. They note that Wolpert & Macready (1997) assumed no revisiting of previously sampled points by an algorithm, and state that such an assumption is debatable. Firstly, they say that revisiting of solutions by any algorithm may happen very often if some countermeasures are not adopted (like in tabu search, for example). Moreover, such revisiting is not costless (in terms of computing time), so the amount of revisiting an algorithm does may be the basis of some difference in performance between it and some other algorithm for very broad sets of functions (even all possible discrete functions). Secondly, Reeves and Rowe indicate that it follows from the NFL theorem itself that revisiting should be avoided when possible, because an algorithm which revisits fewer points cannot be on average worse than others. They also note that the idea of limiting revisits was the very basis of many algorithmic innovations which are said to perform well in practice. Therefore, the amount of revisiting cannot be omitted in practical considerations.

There is much truth in these arguments: revisiting of previously sampled points is an issue in practice and incurs additional computation cost. It is hard to agree, however, that this issue undermines the whole theorem, because when we imagine all algorithms equipped with some sophisticated no-revisiting policies, they basically become subject to the No Free Lunch theorem again. Limiting revisits or inducing diversity may be (and most probably are) good algorithmic ideas, but they do not solve the problem of equal performance across all functions.

The assumption of black-box search
Culberson (1998) stresses the fact that the original formulation of the NFL theorem applies only to black-box search: the algorithm knows nothing about the optimised function except the previously sampled points. It may be seen that this is also the assumption of the strengthened version of the theorem (Schumacher et al. 2001). This black-box assumption is seen by Culberson as an important weakness of the NFL theorems: in practice the problem to be solved is most often known beforehand (as is assumed in classical computational complexity theory) and the designer may (and usually does) incorporate some problem-specific knowledge into the solving algorithm. Hence, practical algorithms are not left problem-blind but are made problem-aware by their designers. Whitley & Watson (2006) state that the NFL theorems hold only due to this black-box assumption: no algorithm is able to efficiently search the given function, because ‘we do not have information about what a solution might look like or information about how the evaluation [function] is constructed that might allow us to search more intelligently’. For instance, no lower bound of the evaluation function may be computed in any subspace of the search space, because absolutely any value of the objective function may happen to be there, including the optimum. Hence, no cut-offs are possible, e.g. in a branch-and-bound algorithm.
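The revisiting argument above can be made tangible with a small wrapper that exposes a function to a search algorithm only as a black box while caching repeated queries; this is a minimal sketch, with all names invented for this illustration.

    class BlackBox:
        # Exposes an objective function to a search algorithm only through
        # point queries; repeated queries are served from a cache, so the
        # difference between 'calls' and 'distinct' measures the revisiting
        # effort. Points must be hashable.
        def __init__(self, f):
            self._f = f
            self._cache = {}
            self.calls = 0          # total queries, including revisits
        def __call__(self, x):
            self.calls += 1
            if x not in self._cache:
                self._cache[x] = self._f(x)
            return self._cache[x]
        @property
        def distinct(self):
            return len(self._cache)

Comparing calls with distinct after a run shows how much effort a search method spends on revisits, which is precisely the cost component that the NFL performance measures ignore.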
Culberson’s comments on the No Free Lunch raise again the issue of implementing problem-specific knowledge in search and optimisation algorithms: they should have such knowledge incorporated in order to perform well on a specific problem. What is more important, Culberson’s paper (1998) shows that when the problem to be solved is known in advance, there is a chance of escaping the No Free Lunch result, because algorithms no longer operate in the black-box environment. They operate in the more specific setting of classical computational complexity theory.

Black-box search vs. computational complexity theory
Culberson (1998) demonstrates that there is a huge gap between the assumptions of the original NFL theorem and the assumptions of computational complexity theory. He describes five cases of an algorithm’s knowledge about a problem, the first being full knowledge (the classical theory) and the last being no knowledge (black-box search). Then he presents different implications of those cases. The most important one is that the last case includes a set of problems even larger than the whole class NP. It means that by assuming black-box search one is not trying to solve a particular (probably hard) problem, but all such problems at once. Therefore, it is not surprising that in such a case all algorithms have equal performance, as the NFL theorem says. Culberson concludes:

  It is difficult to see how throwing away information and enlarging the problem classes with which an algorithm must cope can improve the search capabilities. The NFL theorems are much weaker than the intractability [i.e. complexity] theory in the sense that the claims of difficulty are made on much larger classes of problems.

This argument shows that when problem-specific knowledge is incorporated in an algorithm, it may escape the original NFL result. It appears, however, that it is still subject to the issues of classical computational complexity theory, where better and worse algorithms for particular problems do exist. Nevertheless, there still is the strengthened NFL theorem, which says the NFL result holds only for sets of functions closed under permutation. This may be seen as an argument against Culberson’s point of view, since there are sets of functions smaller than NP for which the NFL theorem is true. However, Whitley & Watson (2006) note that there is no proved example of a problem in NP which is closed under permutation. Moreover, if there were any, it would also imply P ≠ NP, solving the most famous problem in computational complexity theory, so this is not very likely. Hence, the strengthened NFL formulation appears not to undermine Culberson’s argumentation.

Summary
The black-box assumption behind the No Free Lunch theorem looks like its weakest point, because when problem-specific knowledge is implemented in an algorithm, it may escape the consequences of the theorem. But it does not escape the issues of computational complexity theory.

3.3.3 Argument no. 3: not only the sampled points matter
Wolpert and Macready are well aware of the fact that when comparing algorithms not only the sampled points matter. In this context, they say of the NFL theorem (Wolpert & Macready 1997): ‘measures of performance based on other factors than [the sampled points] d^y_m (e.g. wall clock time) are outside the scope of our result’. They mention computation time probably because
it is usually one of the key criteria of an algorithm’s performance (Hoos & Stutzle 2004). Yet, this criterion is not addressed in the No Free Lunch theorems. The revisiting argument by Reeves & Rowe (2003), which was mentioned in the previous section, also hints at this issue: revisiting of points in the search space is not costless and imposes some additional computation time on algorithms. Thus, the performance of algorithms may be different across all problems when the time of computation is considered.

Culberson (1998) sees this issue of computational effort as a strong argument against the practical applicability of the NFL theorems. He notes that when we assume that only performance measures based on sampled points are used and we have some additional knowledge about the problem to be solved, then we get nonsense with respect to the computational effort as a result. He gives an example showing that when the black-box assumption is relaxed, the No Free Lunch reasoning clearly fails to identify the effort. Consider an algorithm which is given a black-box function to optimise, yet it knows that inside that box there is some specific function (e.g. an instance of the NP-hard problem of minimum graph colouring). Then the algorithm may ignore the black box and compute the optimal solutions without sampling the black box even once, by simulating the given function on its own. From the point of view of the NFL theorems this algorithm is a perfect one, since it generates the proper solution ‘instantly’ (no samples required). From the rational point of view, the algorithm probably requires a huge amount of time to complete on reasonably-sized instances (unless P=NP). This illustrates the practical nonsense of estimating performance based only on samples of evaluations in the presence of additional knowledge about the optimised function.

This issue is also visible when comparing algorithms which solve exactly the same problem. It may happen that two algorithms employ the very same search policy (i.e. they always visit the same points in the search space in the same order), but they differ in some implementation details, so the first algorithm is simply faster than the second. While from the point of view of the NFL performance measures the algorithms are effectively equal, the first algorithm is clearly better. Of course, such a situation may happen only when the algorithms have implemented some knowledge about the problem being solved.

3.4 Conclusions: practical implications of the theorems
In all the arguments against the practical applicability of the No Free Lunch theorems there is a point where it comes down to the issue of the algorithm’s knowledge about the optimised function. Each of these arguments becomes strong when it is assumed that the problem to be solved is known to the algorithm, because:
• most probably the set of considered functions is not closed under permutation (although this is only a conjecture),
• the algorithm no longer runs in the black-box environment,
• otherwise identical algorithms may differ in speed due to implementation details.
It appears, therefore, that this issue of problem-specific knowledge indicates the most important practical implications of the NFL theorems.

3.4.1 No general tools of optimisation
The theorems imply that there are no general, ‘magic’ tools of optimisation which perform well across a large variety of problems.
This is because if there is no problem-specific knowledge exploited in an algorithm, it works in the black-box environment and inevitably becomes subject to the NFL result (its performance is equal to that of random enumeration). The point in designing and applying algorithms in practice is to escape the black-box environment (Culberson 1998, Wolpert & Macready 1997), and this cannot be done only with syntactic manipulations on bit strings describing solutions of some unknown problem.

Hence, there are also no metaheuristics which could always be better or more general than others. This applies to all types of metaheuristics (those mentioned in chapter 2 among others).

3.4.2 There is some structure in the search space of particular problems
The NFL theorems also indirectly indicate that there must be some structure in the search space of a problem if an algorithm is to perform better than random enumeration. Reeves & Rowe (2003) confirm this implication: ‘Practitioners seem to believe that there is something about their problems that make them amenable to solutions by a particular method (such as a GA) more rapidly and efficiently than by random search’. Kominek (2001) also subscribes to this point of view.

3.4.3 Structure should be exploited
However, the structure in the search space is not enough. Wolpert & Macready (1997) state that: ‘while most classes of problems will certainly have some structure which, if known, might be exploitable, the simple existence of that structure does not justify choice of a particular algorithm; that structure must be known and reflected directly in the choice of algorithm to serve as such a justification’. Very similar statements were given by Culberson (1998), Schumacher et al. (2001), Reeves & Rowe (2003) and Whitley & Watson (2006): the algorithm should be matched to the structure in the search space.

In the case of metaheuristics this means that the general outline of an algorithm has to be specifically adapted to the given problem. Since the main steps of a metaheuristic are usually not modified, the adaptation should focus on the components only vaguely described in the general scheme. These are, for instance, neighbourhood operators in local search or recombination operators in evolutionary algorithms (see table 2.1). Problem-specific knowledge should be introduced in such components for the sake of efficiency (Jaszkiewicz & Kominek 2003).

3.4.4 Analysis first, exploitation second
Wolpert & Macready (1997) note that ‘while serious optimization practitioners almost always perform such matching, it is usually on a heuristic basis’ and ask the question: ‘can such matching be formally analyzed?’ There is no guarantee that an informal matching of an algorithm to a problem, one very much based on experience, will eventually lead to a good algorithm. Perhaps it should be done some other way. First, some analysis of the search space of the given problem should be performed. The obtained knowledge of the problem characteristics should then be exploited in the designed algorithm. This is also David Wolpert’s point of view (Wolpert 2005).

3.4.5 What is structure?
Reeves & Rowe (2003) say that there is an important difficulty with the analysis of a problem’s structure: ‘while the existence of structure is undeniable for many practical problems, it is not easy to pin down exactly what it means, nor how it might be possible to make use of it’.
Perhaps the strengthened formulation of the NFL theorem may shed some light on the problem. Permutation closure is the finest level of granularity at which the No Free Lunch result may hold (Schumacher et al. 2001). Thus, it may be the case that the difference between the given problem and the smallest set of functions closed under permutation which contains the problem indicates what the structure in the problem is. At first sight it appears to be a difficult research area, though.

Some other kind of problem-specific knowledge may be anything that accelerates computation. These may be techniques akin to those used in neighbourhood-based search methods:
• computation of the difference of the objective function between neighbouring points instead of evaluating each point from scratch (Merz 2000, Hoos & Stutzle 2004, Kindervater & Savelsbergh 2003);
• setting the appropriate order of evaluations of points in the neighbourhood (Kindervater & Savelsbergh 2003);
• caching evaluations of neighbours from previous search steps (Merz 2000, Hoos & Stutzle 2004).
Yet another type of knowledge about a problem may have the form of a proved upper bound on the optimum value of the objective function. Whitley & Watson (2006) mention an example of an approximation algorithm for the Euclidean TSP which may be used to make sure certain parts of the search space are never examined, e.g. in a branch-and-bound algorithm.

3.4.6 Caution required while evaluating algorithms on benchmarks
Wolpert & Macready (1997) say in their conclusions from the No Free Lunch theorem that it is dangerous to compare algorithms based on their performance on a small sample of test problems. This is an important remark, since an algorithm may perform well on one set of test problems and yet exhibit poor performance on some other set. Reeves & Rowe (2003) and Whitley & Watson (2006) also express much concern about evaluating algorithms on test problems, especially when instances are created by some random generator. In the light of the NFL theorems, ‘there is no guarantee that algorithms developed and evaluated using synthetic benchmarks will perform well on more realistic problem instances’ (Whitley & Watson 2006).

It seems, therefore, that great caution is required when comparing algorithms based on experimental testing. When synthetic benchmarks are used, it should be ensured that they are not biased, but exhibit properties of real-world instances, because we surely do not want our algorithms to be tuned to artificial problems. But if it is difficult to do so (e.g. due to the mentioned problems with properly defining characteristics of practical instances), examples of problems from real-world applications should be used. Nevertheless, even if test cases are realistic, it still may happen that designed algorithms are overly tuned to these cases. This is a threat especially to algorithms which have been refined for years (Whitley & Watson 2006). A very similar problem exists with parametrised algorithms, for which massive search across the space of parameters is performed. The resulting algorithm may perform well on test instances, but if there is no knowledge on how to adjust the parameters to new instances, it may fail to perform well on such new examples. Perhaps a procedure similar to resampling or cross-validation (Manly 1997, Duda et al. 2001) would also be of use in optimisation.
These procedures are widely used in machine learning, in the classification task, to avoid over-fitting of a classifier to a learning set. In the case of optimisation some test instances should be hidden from the algorithm designer and made available only when the algorithm is completely specified. This idea is well implemented in some computational contests, e.g. the series of ROADEF Challenges (Cung 2005b) or the ACM Collegiate Programming Contests.

3.5 Implications for evolutionary algorithms
In the light of the No Free Lunch theorems ‘the faith in using an EA as a blind search optimization engine is misplaced’ and ‘there is no reason to believe that a genetic algorithm will be of any more general use than any other approach to optimization’ (Culberson 1998).

Even so, many enthusiasts of neo-Darwinism would say that natural evolution is the best evidence that an evolutionary engine based on survival of the fittest works in practice; after all, it was able to create complex organisms perfectly adapted to difficult dynamic environments (see e.g. works cited by Reeves & Rowe (2003)). While reading the book by Michalewicz & Fogel (2000) one may get the impression that these authors subscribe to some extent to this enthusiastic point of view. However, as Culberson (1998) neatly puts it, ‘the fact of natural evolution does not indicate where these areas of applicability [of EAs] might be, and it certainly yields no basis to claim EAs as a universal magic bullet’. Similarly, Muhlenbein (2003) says: ‘I object to popular arguments along the lines: this algorithm is a good optimization method because the principle is used in nature’. Reeves & Rowe (2003) also thoroughly investigated the issue and in the first chapter of their book they notice that:
• neo-Darwinism is a seductive theory and very often it is sufficient to quote the theory of evolution and the name of Darwin to justify evolutionary algorithms as general tools of optimisation without any further explanation;
• at the same time, the mechanisms of natural evolution are in many cases unknown or insufficiently explained, and there is much speculation about them, without sound proofs;
• natural evolution most probably optimises nothing (at least no objective has been shown yet), hence it is no justification for the use of evolutionary algorithms as a tool for optimisation.
Very similar arguments about the seductive power of neo-Darwinism were presented by Andrew Tuson during a conference on operational research (Tuson 2005).

Therefore, it should be concluded that evolutionary algorithms are nothing more than an abstract, mathematical construction, an optimisation engine which has little in common with natural evolution. It is the adaptation of the engine to the given problem which is the basis of success (or failure) in optimisation, as indicated by the No Free Lunch theorems. That is the reason why the adaptation of a kind of EA is the focus of attention of this thesis, and some ways of adaptation will be discussed in the next chapter.

Chapter 4
Adaptation of an evolutionary algorithm to a combinatorial optimisation problem

Since it is the adaptation of a metaheuristic algorithm to a combinatorial optimisation problem which greatly affects the algorithm’s performance, it is necessary to know which components of an evolutionary algorithm should be adapted. The methods of adaptation which are given in the literature will also be discussed here.
The components of an EA which should be adapted are (see also table 2.1): the representation (encoding) of solutions, the fitness function, the method (or methods) of generating solutions for the initial population, the ‘genetic’ operators: crossover and mutation, and local search (if a memetic algorithm is designed). Other components of evolutionary algorithms are tuned rather than adapted to a problem. These are, among others: the selection method, the stop condition, the population size, and the probabilities of crossover and mutation.

The difference between adaptation and tuning of components is subtle but important. The adapted components, like the fitness function, depend mainly on the meaning of the considered problem or, like representation and operators, on the contents and structure of solutions of the problem. Usually, they cannot be easily changed during the execution of an algorithm. Conversely, the tuned components do not depend on these issues, but rather on the evaluation of solutions in the population. Therefore, they are parameters of the method, either numerical or procedural, which may be changed during execution without much effort.

This distinction between adapted components and tuned parameters is not universally agreed upon in the literature. There are authors who by adaptation mean exactly the tuning of an EA’s parameters, either before or during an algorithm’s run (Michalewicz & Fogel 2000). Some other authors (Bonissone et al. 2006) mix adaptation with tuning, focusing on the latter. The author of this thesis is against such mixing; he agrees with scientists who concentrate on the components to be adapted, like Hoos & Stutzle (2004) or Falkenauer (1998), ‘because if those are not appropriate, the GA will not deliver, whatever the settings [of parameters]’.

4.1 Representation of solutions
Many authors stress the issue of a good representation of solutions for the given combinatorial optimisation problem (Michalewicz 1996, Falkenauer 1998, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Hoos & Stutzle 2004).

At the beginning of evolutionary (genetic) algorithms there was the prevailing opinion that binary encoding should be used independently of the problem being solved (Goldberg 1989). This opinion was based mainly on Holland’s Schema Theorem, which was deemed the foundation of genetic algorithms. The advantage of binary representation was supposed to be the largest possible number of schemata generated by this representation in comparison to other ones (Goldberg 1989). However, this computation of the number of schemata for other representations appeared to have had some important flaws (Reeves & Rowe 2003). Moreover, the Schema Theorem itself faced severe criticism. Altenberg (1995) concluded his analysis: ‘the Schema Theorem has no implications for how well a GA is performing’. Muhlenbein (2003) expressed similar objections. Therefore, this theorem cannot be the basis for the design of representations in evolutionary algorithms, and the importance of binary representation diminishes.

There were some attempts at varying the representation of solutions during the run of an EA. Michalewicz & Fogel (2000) cite experiments with changes of precision in binary-encoded variables, with delta encodings and with messy GAs. However, these attempts focused only on binary encodings and led to no general conclusions about where and when such techniques should be used.
After the publication of the book by Michalewicz (1996) it appeared that a good representation is one which is in some sense ‘natural’ and close to the definition of the given optimisation task. This was the main idea of Michalewicz, based on intuition and experiments with diverse problems: binary, numerical and combinatorial. Falkenauer’s experiments with the so-called grouping problems (e.g. graph colouring or bin-packing) showed that solutions should be encoded in a way that is ‘natural’ to the problem, as well. In his case the notion of a group was stressed in the representation, as the one most important in grouping problems; his observations were also used in the algorithms designed by Lewis and Paechter (2005a, 2005b). Other authors also emphasise the importance of problem-specific representations used in EAs for combinatorial problems (Hoos & Stutzle 2004, Reeves & Rowe 2003), but hardly give any more specific guidelines.

Somewhat linked to the argument of ‘naturality’ is the fact that in the family of all bijective representations of solutions of the given problem there is no choice of a representation which is better than any other one (Michalewicz & Fogel 2000). Further, Michalewicz and Fogel conclude that in the light of this argument the best option is to use data structures which are intuitive with respect to the formulation of the considered problem.

One can see that despite many years of research on applications of evolutionary algorithms to combinatorial problems, there are still hardly any specific guidelines concerning the design of a good representation, except the advice of naturality and intuition, and some more specifics in the case of certain problems (e.g. the grouping ones). However, the importance of representation is somewhat lessened by the fact of the duality of representations and operators (Altenberg 1995): the effect of a change in representation may also be achieved with the same representation and different operators of crossover and mutation. It appears, therefore, that the issue of representation may be considered a rather technical one: a container for solutions. The structure of this container should mainly enable fast operations: evaluation, crossover, mutation and local search (if used). The direction of search should be chosen by these operators and not necessarily by the representation of solutions.

4.2 Fitness function
In many cases the fitness function which guides evolutionary search is simply identical to the objective function specified in the formulation of the given problem (Hoos & Stutzle 2004). Michalewicz & Fogel (2000) say, however, that this happens mainly for problems taken from the literature, where the objective function is indeed described, as in the TSP, the knapsack problem, or other problems in operational research. They claim that real-world problems are not that easy from this point of view, but give examples of problems of optimal control, not combinatorial optimisation.

Nevertheless, it happens in combinatorial optimisation that the originally given objective function is too coarse-grained to properly guide the search. This is the case for the problem of satisfiability of boolean expressions (SAT) (Hoos & Stutzle 2004, Michalewicz & Fogel 2000), which is, in a way, a needle-in-a-haystack problem: only the optimal solutions (needles) have the objective equal to one; all other solutions have it equal to zero. That is why in optimisation a more fine-grained objective function is considered, which gives rise to the MAXSAT problem (Hoos & Stutzle 2004).
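The contrast between the coarse SAT objective and its fine-grained MAXSAT counterpart can be stated in a few lines. The sketch below assumes a DIMACS-like clause encoding (positive and negative integers for literals); it is an illustration written for this text, not an implementation from the cited works.

    def sat_fitness(clauses, assignment):
        # Coarse, needle-in-a-haystack objective: 1 only for a satisfying
        # assignment, 0 otherwise.
        return int(all(any(assignment[abs(l)] == (l > 0) for l in c)
                       for c in clauses))

    def maxsat_fitness(clauses, assignment):
        # Fine-grained variant: the number of satisfied clauses, which can
        # discriminate between non-optimal solutions and guide the search.
        return sum(any(assignment[abs(l)] == (l > 0) for l in c)
                   for c in clauses)

    # Clauses over variables 1..3: (x1 or not x2) and (x2 or x3)
    clauses = [(1, -2), (2, 3)]
    assignment = {1: False, 2: False, 3: True}
    print(sat_fitness(clauses, assignment), maxsat_fitness(clauses, assignment))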
A very similar difficulty was encountered by Falkenauer (1998) in the bin-packing problem; he changed the fitness function to be more fine-grained in order to discriminate between worse and better solutions more easily. The same strategy of adaptation of the fitness function was applied by Lewis & Paechter (2005b) in the case of a university course timetabling problem, with some success.

It is not only in these circumstances that the fitness function is somehow adapted to both the problem and the algorithm. More generally, if the original objective function is hard to optimise, e.g. highly multimodal and rugged, its modification may make the search process easier. But then one has to remember that the modified function should at least have the same optima as the original one. Moreover, it is desirable that this new function be rather smooth and without many modes (Michalewicz & Fogel 2000). Except for those general remarks, though, it is hard to find more detailed rules for such a modification. Some authors also tried to modify the fitness function dynamically during the run of an EA (Michalewicz 1996, Michalewicz & Fogel 2000). The modifications concerned mainly some parameters of one fitness formula given in advance, not the whole formula itself. They were also focused only on highly constrained problems.

Michalewicz & Fogel (2000) discuss the issue of how to adapt the fitness function to a problem with constraints. They stress the fact that real-world problems always come with some additional constraints on the feasibility of solutions. In such cases it may be beneficial to accept infeasible solutions into the population, but, they say, these have to be assigned sensible values of fitness. The first option they propose, though, is to always eliminate infeasible solutions. They claim that it is one of the most sensible heuristics to maintain the feasibility of solutions in the population all the time through the application of specialised representations and operators. It is especially useful when the space of feasible solutions is convex and constitutes a large part of the whole search space. They recommend a second option when these conditions are not satisfied, e.g. when it is hard to generate any feasible solution from scratch. In such a case they suggest accepting infeasible solutions with some penalty in fitness values. This way any feasible solution is almost always better than the penalised solutions and will be accepted into the population whenever it is generated. Michalewicz & Fogel (2000) warn, however, that there are no general and useful heuristics on how to define such a penalty function; the designer of an EA has to rely on their intuition and experience, it seems.
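A minimal sketch of this second option, the penalty approach, is given below; the weight, the violation measure and all names are hypothetical, and, as Michalewicz & Fogel warn, choosing them well is precisely the hard part.

    from collections import namedtuple

    def penalised_fitness(objective, violation, weight):
        # Adds a penalty proportional to the degree of constraint violation;
        # with a large enough weight any feasible solution beats any
        # infeasible one (minimisation is assumed).
        return lambda s: objective(s) + weight * violation(s)

    # Illustrative knapsack-like use; the structure and values are invented.
    Sol = namedtuple('Sol', 'profits weights capacity')
    fitness = penalised_fitness(
        objective=lambda s: -sum(s.profits),
        violation=lambda s: max(0.0, sum(s.weights) - s.capacity),
        weight=1000.0)
    print(fitness(Sol(profits=[5, 4], weights=[3, 9], capacity=10)))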
4.3 Initial population
In the literature one may find two general goals of the generation of an initial population, goals which are somewhat conflicting. The first goal is to ensure good coverage of the search space with the generated solutions. It is to make sure that potentially any solution may be generated by the subsequent artificial evolution process; this goal is also referred to as enabling high diversity of the initial population. The second goal, though no less important, is to generate solutions of good quality, since optimisation is the ultimate task of the designed algorithm.

One of the suggested methods of initialisation consists in the completely random generation of solutions (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003), which usually means a number of independent, unbiased draws from the search space (simple random sampling (Cochran 1953)). This method is supposed to create an initial population with high diversity, but this actually depends on pure luck, as some remark (Reeves & Rowe 2003). Moreover, it might be difficult to implement for practical combinatorial optimisation problems. The notion of a random structure (e.g. tree, graph, assignment, family of sets) may have very different meanings there, depending on the actual context. The possible presence of constraints also limits the application of purely random sampling. Finally, from the point of view of optimisation (the second goal), the resulting population is usually of rather poor quality.

Another way of initialisation focuses even more closely on the issue of diversity of the generated solutions. It requires that some more sophisticated statistical methods of sampling are used instead of simple sampling, in order not to depend on luck only; the initial population is to uniformly cover the search space on purpose. The methods suggested in this approach include the generation of solutions located in a mesh (Michalewicz & Fogel 2000) or through a generalisation of the notion of a Latin hypercube (Reeves & Rowe 2003). However, these methods are mainly applicable to binary, integer or continuous optimisation problems, where free variables are available that can be independently assigned. Similarly to the first initialisation method, it is usually quite difficult to apply these ideas of systematic sampling to combinatorial structures. The quality of the obtained solutions is also problematic.

Since it is difficult to create an initial solution to a combinatorial problem by random sampling, Michalewicz & Fogel (2000) suggest a more direct way of ensuring high diversity in the first population. They propose that each solution added to the initial set should be checked for its distance to the already generated ones. If this distance is too low, then the solution should not be admitted to the population. This way only a diverse enough population is produced, although the whole procedure may be significantly more time-consuming than simple random sampling. One may notice that a sensible distance or diversity measure is required to implement this method in practice (a sketch of this procedure is given at the end of this section).

The methods discussed so far specifically addressed the issue of diversity of the initial population and disregarded the question of the quality of solutions. Therefore, with the goal of optimisation in mind, many authors (Falkenauer 1998, Hoos & Stutzle 2004, Merz 2000, Michalewicz & Fogel 2000, Pawlak 1999, Reeves & Rowe 2003) suggest that the initial population be seeded with known good solutions coming e.g. from other heuristic techniques, specific to the problem being solved. This method, one can see, is also in accordance with the conclusions from the No Free Lunch theorems (see chapter 3) that problem-specific knowledge about good solutions should be exploited in the designed algorithms. Although it usually helps in finding good solutions quickly, it may also be the source of premature convergence of the following evolution process to sub-optimal solutions, the authors warn. Thus, this approach should be used with some care, often backed up by additional randomisation (Merz 2000) and applied, perhaps, only to a fraction of the initial population.
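The admission-by-distance procedure of Michalewicz & Fogel mentioned above can be sketched as follows; Hamming distance over bit strings and all parameter values are merely illustrative assumptions, and in practice a problem-specific distance measure would be substituted.

    import random

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def diverse_population(generate, distance, pop_size, min_dist,
                           max_tries=10000):
        # Admit a newly generated solution only if it is far enough from
        # every solution already in the population; may terminate with a
        # smaller population if max_tries is exhausted.
        pop = []
        for _ in range(max_tries):
            if len(pop) == pop_size:
                break
            s = generate()
            if all(distance(s, t) >= min_dist for t in pop):
                pop.append(s)
        return pop

    random.seed(1)
    pop = diverse_population(
        generate=lambda: [random.randint(0, 1) for _ in range(20)],
        distance=hamming, pop_size=10, min_dist=5)
    print(len(pop))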
Thus, seeding should be used with some care, often backed up by additional randomisation (Merz 2000) and applied, perhaps, only to a fraction of the initial population.

To summarise, there are some guidelines concerning the initialisation of an evolutionary algorithm. One can use simple or more systematic sampling, initialisation based on diversity, or some problem-specific, randomised heuristic methods. However, except for the last recommendation, there seems to be no focus on the adaptation of the initialisation procedure to the problem being solved, only on the aspects of diversity and quality of the generated solutions. Even the last suggestion, of problem-specific heuristics, is rather vague and leaves the choice of appropriate procedures to the designer of an EA.

4.4 Crossover operators

Crossover is an operation which usually creates one or two new solutions (offspring) based on a pair of given solutions (parents). Sometimes a generalisation of this procedure with multiple parents is called recombination (Merz 2000, Michalewicz & Fogel 2000).

4.4.1 Importance of crossover

Opinions as to the importance of crossover (recombination) in EAs vary between authors. Hoos & Stutzle (2004) say that designing recombination operators is one major challenge in designing evolutionary algorithms. In his seminal book, Michalewicz (1996) also subscribes to this point of view, explaining the power of EAs precisely by the proper exchange of information through crossover. Michalewicz & Fogel (2000) also noticed that the design of recombination operators was the most important part of much research they had seen. However, they claimed that the focus on crossover had been mainly historically motivated, and that there was no reason not to design operators of other types, e.g. akin to their well-performing inver-over operator for the TSP. Moreover, Reeves & Rowe (2003) cite dissenting opinions about the importance of crossover operators which claim that mutation is superior.

Despite these arguments, the author of this thesis still deems crossover an important part of EAs. Michalewicz & Fogel (2000) might have been right with the argument concerning the history of EAs, but one can see in their discussion that they almost directly hint at their inver-over operator as an alternative to crossover. This operator, however, is actually a kind of crossover coupled with local search, so it is not a real alternative, especially for memetic algorithms. The arguments cited by Reeves & Rowe (2003) are perhaps more powerful, but in the author's opinion they are not enough to diminish the major importance of crossover operators in EAs, as it is generally perceived in the evolutionary computation community; crossover usually considerably accelerates computation, they admit (Reeves & Rowe 2003).

4.4.2 The Schema Theorem and the choice of crossover

Concerning the adaptation of crossover to specific problems of combinatorial optimisation, the first thing to say is that designs based mainly on Holland's Schema Theorem and the building-blocks hypothesis (Goldberg 1989) are no longer valid. As already mentioned in section 4.1 on representation, the theorem received adverse criticism in the 1990s (Reeves & Rowe 2003). Culberson (1998) directly points out that the Schema Theorem is an attempt at 'getting a free lunch' (see chapter 3), because it assumes nothing about the optimised function.
He also mentions that very similar theorems may easily be proved for virtually any definition of a schema; it is only necessary to modify crossover and mutation accordingly. Altenberg (1995) puts forward an argument of performance, proving with Price's Covariance and Selection Theorem that 'the Schema Theorem (. . . ) does not address the search component of genetic algorithms on which performance depends, and cannot distinguish genetic algorithms that are performing well from those that are not'. He concludes that 'schemata are therefore not informative structures for operators and representations in general', their relevance depending on the application of single-point crossover. Thus, from the point of view of the author of this thesis, the time of universal application of this crossover operator to any problem has most likely passed.

4.4.3 Adaptation of crossover to a problem

Michalewicz (1996) was one of the first researchers to emphasise the importance of a crossover operator adapted to a given problem. As a matter of fact, the idea of problem-dependent crossovers is one of the cornerstones of his extremely popular book. But while he correctly noticed that each crossover operator performed well for certain problems and poorly for others (an excellent conclusion in the time before the No Free Lunch Theorems), he did not give much guidance on how to design a crossover. Later, with Fogel (2000), they only admitted that there was no proper choice of crossover for all possible problems.

When attempting to design a crossover operator for a specific task, the first issue to be analysed is the role of the operator in the developed algorithm. Merz (2000) rightly points out the difference in this role between evolutionary and memetic algorithms. In EAs crossover is required to produce offspring with fitness higher than that of their parents, so that the offspring can survive selection. In memetic algorithms crossover plays a different role: its goal is to produce a starting point for the subsequent local search. Crossover actually has to produce offspring in the attraction region of a local optimum with high fitness, but the offspring themselves do not have to be extremely good solutions. Merz concludes his analysis by stating that a crossover (and mutation) designed for an evolutionary algorithm does not have to fit a memetic one well.

The second adaptation issue to be addressed while designing crossover is its mutual dependence with the representation of solutions. Michalewicz & Fogel (2000) say, for example, that it is crossover which strongly depends on the chosen representation. On the other hand, Altenberg (1995) notices the effect of duality of representation and crossover (already mentioned in section 4.1). As an illustration of this duality one may consider a representation of solutions and a number of crossovers, each applicable only to some different representation. In such a case a number of encoding-decoding procedures would be required to enable the crossovers, but, in effect, the crossovers would not depend on this basic representation at all. Of course, the application of an encoding-decoding procedure in a real algorithm would be time-consuming, but this argument only stresses the issue of efficiency of the pair (representation, crossover), not any crucial semantic dependency.
Therefore, one may say that there is no strong dependency between a representation and some crossover operators, unless efficiency is at stake.

The third issue of adaptation of the designed algorithm to a problem concerns the feasibility of generated offspring. Usually, simple crossover operators from the literature do not produce valid solution candidates (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003). Therefore, especially for constrained problems, specific operators have to be devised which do not produce infeasible solutions, or do so only with small probability. Lewis & Paechter (2004) emphasise the importance of feasibility in the case of a timetabling problem, for example.

Apart from these few general guidelines on how to design crossover operators, until rather recently there was no clear advice in the literature on how to adapt a crossover to a given problem. One may notice it while reading the book by Michalewicz & Fogel (2000), for example. These authors clearly place the full responsibility for the adaptation on the designer, only suggesting that the designer's intuition about the search space is important. Another example of such a point of view may be found in the book by Reeves & Rowe (2003), where they say that 'we should try to make the representation and operators as 'natural' as possible for the problem' and that 'a systematic method of designing appropriate mixing [i.e. crossover and mutation] operators would be extremely helpful'. Even though apparently at the moment there are no such systematic methods and the design of crossover remains generally a difficult task, Reeves & Rowe (2003) note in the summary of their book that the design may not be as hard a job:

if we happen to know which properties ought to be preserved during the search process. In this case, there are results that may help, but they rely on having a good understanding of the nature of the search space.

Therefore, in order to see what property (or properties) of the search space these authors have in mind and what it means to preserve it, some examples of crossover designs for particular problems of combinatorial optimisation will be discussed in the following sections.

Travelling salesman problem

Due to its central role as a benchmark problem of combinatorial optimisation, the TSP received special attention from researchers working in evolutionary computation, and a large number of diverse crossover operators were devised (Merz 2000, Michalewicz & Fogel 2000). Moreover, the design of crossover (recombination) was, together with local search, the focus of most research effort on EAs for the problem, as Hoos and Stutzle say (2004). Hence, there is much serious work on the issue which may serve as a good source of knowledge (e.g. Hoos & Stutzle (2004), Michalewicz (1996), Michalewicz & Fogel (2000)), and the interested reader is referred there. Here, only the issues most important for this thesis are addressed.

Historically, the first attempts at designing crossovers for the TSP concentrated on adjusting the operator so that it worked on the chosen representation of solutions (Michalewicz & Fogel 2000). It was observed that the binary representation was inadequate for the problem (Michalewicz 1996), but despite this fact there were attempts at applying one-point crossover to solutions described in an adjacency-based representation.
Criticising this poorly performing approach, Michalewicz & Fogel (2000) stated that it was not enough to create feasible solutions through crossover; offspring had to inherit good properties of their parents. However, at the time it was unclear what a good parental trait might be.

Later, some more 'natural' representations for the TSP were designed and, consequently, new crossovers appeared. These were (Michalewicz 1996, Michalewicz & Fogel 2000): partially-mapped crossover (PMX), order crossover (OX), cyclic crossover (CX) and operators applied to matrix representations. Some of these operators preserved certain characteristics of parents in the resulting offspring, e.g. the relative order of vertices, their absolute order or their position in the genotype. Nevertheless, it was apparently still unknown which property of crossover was the most important for the TSP.

However, many researchers (Michalewicz & Fogel 2000, Reeves & Rowe 2003) had the intuition that since it was the edges of a TSP solution which contributed most directly to its quality, a crossover operator had to focus on edges. In effect, many crossover operators were proposed which attempted to preserve parental edges in offspring. These were: maximum preservative crossover (MPX) by Gorges-Schleuter (Merz 2000), edge recombination (ER) by Whitley et al. (1989), and edge assembly crossover (EAX) by Nagata & Kobayashi (1997). The idea behind MPX and ER was to preserve in offspring as many parental edges as possible; ER was actually better at the task, resulting in a higher inheritance rate of edges. EAX, on the other hand, not only preserved edges from parents, but also administered a dose of implicit mutations and local optimisation. This operator was the best crossover for the TSP for some time. Edge preservation is also present in the inver-over procedure proposed by Tao and Michalewicz (described also by Michalewicz & Fogel (2000)). It is neither a recombination nor a local search operator, but it merges the two concepts into one, with good optimisation results.

There were even more examples of crossover operators which tried to preserve edges in TSP solutions. Hoos & Stutzle (2004) cite work by Seo and Moon where a well-performing crossover of this type was described. It seems, therefore, that the presence of certain edges is an important property of solutions to this problem. Moreover, the preservation of parental edges in offspring solutions is a crucial part of any crossover for the TSP, although this knowledge was acquired after many years of research on the problem and mainly through intuitive insights into its nature.

These intuitions and results were finally confirmed by more rigorous empirical analyses of the search space of TSP instances. Kirkpatrick & Toulouse (1985), Boese et al. (1994), Boese (1995) and later Merz (2000) performed fitness-distance analyses of the problem and concluded that it was visible in the search space itself that the preservation of edges was a crucial property of any metaheuristic algorithm for the TSP. They exploited this result either by designing efficient local search algorithms (Boese et al. 1994) or memetic algorithms with edge-preserving crossover operators (Merz 2000). Merz designed his operators based exactly on the results of the mentioned fitness-distance analyses. His experiments confirmed that the designs based on this analysis resulted in well-performing memetic algorithms.
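As an illustration of this edge-preservation idea, consider the following minimal sketch in Python, which greedily follows edges found in either parent and falls back on a random unvisited city otherwise. It is a simplification in the spirit of operators such as ER and MPX, not a faithful reimplementation of any of the cited operators; cities are assumed to be numbered 0..n-1.

    import random

    def tour_edges(tour):
        # The undirected edge set of a cyclic tour.
        n = len(tour)
        return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

    def edge_preserving_crossover(p1, p2):
        # Build an offspring tour step by step, preferring moves along
        # parental edges; only when no such edge leads to an unvisited
        # city is a random unvisited city chosen (a 'foreign' edge).
        parent_edges = tour_edges(p1) | tour_edges(p2)
        n = len(p1)
        current = random.choice(p1)
        offspring, visited = [current], {current}
        while len(offspring) < n:
            candidates = [c for c in range(n)
                          if c not in visited
                          and frozenset((current, c)) in parent_edges]
            if not candidates:
                candidates = [c for c in range(n) if c not in visited]
            nxt = random.choice(candidates)
            offspring.append(nxt)
            visited.add(nxt)
            current = nxt
        return offspring

The fraction of parental edges retained by such a procedure is exactly the inheritance rate discussed above.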
To summarise the case of the TSP, one may say that the researchers' intuition about the importance of edges in the problem and their preservation in the designed crossover was finally confirmed and reinforced by an empirical analysis of the search space of the TSP, the fitness-distance analysis. Hence, if it is possible to perform such an analysis for other combinatorial optimisation problems and obtain similar results, it may serve as the basis for crossover design and yield powerful operators.

Grouping problems

Under the notion of grouping problems one usually finds the problems of bin-packing and graph colouring (Michalewicz 1996, Falkenauer 1998), although some more specific problems also contain the aspect of grouping (e.g. the university course timetabling problem (Lewis & Paechter 2005a), the workshop layout problem (Falkenauer 1998), etc.). Here, mainly the case of the graph colouring problem will be discussed, with some remarks concerning other problems.

The first genetic approaches to the problem focused on the choice of crossovers which would be adequate to the chosen representations. According to Michalewicz (1996), Davies encoded solutions of the problem with permutations. A colouring was obtained from a permutation by means of a greedy decoding procedure. The chosen crossover was based on order; it was actually the operator originally devised for the TSP. Michalewicz also mentions a very similar approach by Jones and Beltramo, who applied the crossover operators taken from the TSP as well. It was simple to do so, since TSP solutions were also encoded as permutations.

According to Michalewicz (1996), Jones and Beltramo experimented also with other representations and, consequently, other operators. They considered an encoding of solutions as permutations with separators, which directly indicated the beginning and the end of each group (colour). Thus, no decoding through the greedy procedure was required. For this representation, the authors employed some standard (at the time) operators: order crossover (again) and the partially-mapped crossover (PMX, taken from the TSP). However, with this approach the feasibility of offspring was at issue: the crossovers usually had to be applied several times to such a representation in order not to produce empty groups (i.e. separators standing next to each other).

Another approach to the problem with GAs was connected with the encoding of groups with numbers (Michalewicz 1996). Here, Jones and Beltramo, and Hans Mühlenbein experimented first. The former authors tried to apply three different crossover operators: one-point crossover, uniform crossover and edge-based crossover. They concluded that with this representation the last crossover was best, although it was outperformed by the algorithm based on the greedy decoding procedure mentioned above. Michalewicz (1996) concludes, however, that this poor performance of the edge-based crossover was not due to its irrelevance to the problem, but due to the incompatibility of the operator and the representation: it was simply more time-consuming to decode edges from a string describing groups than, say, to greedily decode a permutation. Mühlenbein, on the other hand, designed his own 'intelligent' crossover which attempted to transmit whole parts of a colouring, not separate vertices, as Michalewicz reports (1996).
In the opinion of the author of this thesis, what unites all these approaches is some degree of arbitrariness in the choice of crossover operators: if an operator could be applied to the given representation, it was used in experiments. One can see that simple binary operators (one-point or uniform crossovers) and operators taken from GAs for the TSP (OX, PMX, edge-based) were mainly used. (It appears that at the time it was only Mühlenbein who was able to devise his own, specialised crossover for the graph colouring problem.)

However, as Michalewicz (1996) and Falkenauer (1998) point out, the nature of grouping problems is completely different from, say, the travelling salesman problem. In the TSP the focus is almost completely on the edges of a graph, while in grouping problems the notions of a vertex and a group of vertices should be stressed. It was probably Falkenauer who first explicitly emphasised the notion of a group in these problems, based on his intuitive analysis of their objective functions:

The cost functions of grouping problems thus yields better values in points of the search space that represent solutions containing high-quality groups. Even more so, it yields better values where high-quality groups of groups are located. The high quality groups and their aggregates constitute a regularity of the functional landscape of grouping problems (. . . ) It is this landscape that is common to all grouping problems.

Due to this characteristic of grouping problems, Falkenauer (1998) criticised the use of the crossovers mentioned above: the general binary operators failed to preserve the membership of vertices in groups; the more specialised operators were better at the task, but usually transmitted only one group to offspring. He also proposed his own algorithm, the so-called Grouping Genetic Algorithm (GGA), which used a specialised representation and a crossover operator focused directly on groups of elements (vertices in the case of graph colouring). The crossover operates on whole groups and completely ignores their labels; only their contents matter. It tries to merge groups of vertices from the parents in a way which always preserves groups common to both parents (or groups of one parent which are completely contained in some group of the other). The groups which are not common to the parents are usually disrupted, and new groups are heuristically constructed from scratch. As Falkenauer reports (1998), the operator was an efficient one, faring better than heuristic algorithms or naive evolution without crossover.

This approach was well received in the community. Michalewicz (1996) comments that Falkenauer's crossover operator was able to transmit as much meaningful information as possible from parents to offspring and that the optimisation results were very good. Reeves & Rowe (2003) also confirm Falkenauer's intuitions concerning the role of subset selection (i.e. groups) in his crossover, instead of a simple linear combination of strings (as in simple binary crossovers, for example). However, one has to notice that Falkenauer's approach was based entirely on his intuitive analysis of the nature of grouping problems; neither formal nor empirical analysis was performed.

The next important step in the design of a crossover operator for the graph colouring problem was taken by Galinier & Hao (1999).
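Before turning to their operator, the group-preservation idea just described can be made concrete with a rough Python sketch. It is an illustration in the spirit of Falkenauer's crossover, not the GGA operator itself; colourings are assumed to be given as lists of vertex sets (groups), vertices as integers, and adj as the adjacency sets of the graph.

    def group_preserving_crossover(parent1, parent2, adj):
        # Keep the groups of parent1 that also appear in parent2
        # (or are fully contained in one of its groups)...
        common = [g for g in parent1
                  if any(g == h or g <= h for h in parent2)]
        offspring = [set(g) for g in common]
        placed = set().union(*offspring) if offspring else set()
        leftover = set().union(*parent1) - placed
        # ...then heuristically rebuild the rest: each remaining vertex
        # goes into the first group with no conflicting neighbour,
        # or into a fresh group of its own.
        for v in sorted(leftover):
            for group in offspring:
                if not (adj[v] & group):
                    group.add(v)
                    break
            else:
                offspring.append({v})
        return offspring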
Following the same intuitions as Falkenauer's, Galinier & Hao (1999) say that:

in general, designing a crossover requires first the identification of some good properties of the problem which must be transmitted from parents to offspring and then the development of an appropriate recombination mechanism.

Then, they contrast two possible approaches to the colouring problem: the assignment and the partition approach. The former focuses on the assignment of a label (colour) to a vertex and, hence, emphasises the role of a pair (vertex, colour) in a solution. 'Nevertheless, such a couple, considered isolatedly, is meaningless for graph colouring, since all the colours play exactly the same role', they say (Galinier & Hao 1999). The partition approach, on the other hand, concentrates on relationships between vertices. They conclude that the latter approach is more promising, since 'for the colouring problem, it will be more appropriate and significant to consider a pair or a set of vertices belonging to a same class' (emphasis by Galinier & Hao (1999)).

Therefore, they decided to employ the partition approach in the design of their algorithm and proposed the Greedy Partition Crossover (GPX). This operator iteratively transmits one group from a parent to the offspring (parents are considered alternately), taking the largest groups first. If some vertices are already present in the offspring solution at some stage of the process, they are removed from the parents in order to prevent potential infeasibility in subsequent steps. Experimental results confirmed that this crossover, coupled with some local search procedure (and thus resulting in a memetic algorithm), was an excellent design (Galinier & Hao 1999). Hoos & Stutzle (2004) comment that this algorithm is probably one of the best performing for the problem.

One could think that Galinier and Hao's approach is yet another example of the analysis of a problem and the design of a crossover based only on intuition, similar to Falkenauer's. This would be true but for one short paragraph in the paper by these authors (Galinier & Hao 1999). It says that they performed a rough analysis of the search space of some random graphs and examined the frequency of assignments of the same colour to pairs of vertices. The analysis revealed, they say, that certain pairs of vertices were assigned the same colour more often than others. This way they confirmed Falkenauer's intuitive analysis of the objective function and laid a firmer foundation for their design of crossover.

To sum up, similarly to the designs of crossovers for the TSP, one can see in the case of the graph colouring problem a gradual shift from general-purpose crossovers (e.g. binary ones or those taken from the TSP), through more specialised operators based on intuition, to designs based on some empirical analysis of the search space of the considered problem. Specifically, the analysis of the frequency of the same colour in pairs of vertices (Galinier & Hao 1999) revealed that for graph colouring problems it is beneficial for a crossover to preserve groups of vertices with the same colour. Analyses similar to this one might be the basis for the design of crossovers for other problems, as well.

Job shop scheduling problem

The next example concerns the well-known job shop scheduling problem (JSP) (Coffman 1976, Mattfeld et al. 1999). Many different metaheuristic algorithms have been applied to this problem, with tabu search being superior in most cases (Watson et al. 2003).
However, this section covers only some examples of crossover operators designed for the problem.

Concerning crossover, Bierwirth et al. (1996) stated that this operator had to respect the semantic properties of the underlying problem representation in order to be efficient, but that these properties were usually unknown in advance. In their work they examined this issue by checking how three different crossover operators preserved in offspring a certain type of distance between parents. They considered generalised versions of two operators taken from the TSP, the generalised order crossover (GOX) and the generalised partially-mapped crossover (GPMX), together with a specialised crossover of their own, called the precedence-preservative crossover (PPX). This last crossover was designed exactly to preserve in offspring the absolute order of jobs found in the parents. It was not a surprise, therefore, that they found PPX to be the best at preserving the mentioned distance, which actually measured the difference in orderings of jobs between two solutions. More importantly, their results showed that this crossover was also the best of the three in optimisation, although it was not enough to obtain state-of-the-art results. Nevertheless, their experiments confirmed to some extent the authors' hypothesis that it was exactly the absolute order of jobs which was an important characteristic of good JSP solutions.

Later, the same authors (Mattfeld et al. 1999) attempted to shed some light on the issue of local search and recombination design for the JSP by means of a search space analysis. Concerning recombination (of which crossover is a special case), they stated that the 'well-known condition for the success of recombination is seen in the partial similarity of local optima', an opinion which they reinforce with a reference to the earlier-cited work on the TSP by Freisleben & Merz (1996). They experimentally examined this similarity by means of the same measure as before. This investigation revealed that local optima of the JSP were no more similar to each other than random solutions. Some additional analysis also revealed that there were large plateaus (i.e. subsets of interconnected solutions with the same objective value) in the search space. These results convinced the authors that the JSP was probably difficult for recombination-based algorithms and that other methods (like local search or tabu search) should be preferred.

And yet, some authors attempted to design and use specialised recombination operators to solve the job shop problem. For example, Tezuka et al. (2000) designed their common cluster crossover (CCX) focusing on the preservation of certain subsequences in offspring solutions. They stated that 'preserving good sub-sequences that reduces the setup times [of jobs] is important to reach the good solutions quickly'. Despite this apparently good motivation, which was in concordance with the intuition behind the crossover of Bierwirth et al. (1996), they somehow failed to achieve the purpose; their CCX operator performed poorly in the task of subsequence preservation, sometimes being no more than a one-point crossover. And although it was better in optimisation than the general order-based operator (OX), it fared rather worse than e.g. tabu search algorithms.

Another attempt at crossover design for the JSP was made by Zhang et al. (2005).
They also started with the preservation of certain characteristics of parents as the operator's goal. They chose to preserve in offspring the position of some randomly selected jobs and the relative order of the others, thus borrowing the idea of order preservation from Bierwirth et al. (1996). The comparison of their hybrid genetic algorithm employing this precedence operation crossover (POX) with well-known tabu search algorithms demonstrated that this was a better design than CCX; the quality of solutions was comparable to that of tabu search, although the computation time was longer.

To summarise, in the example of crossover designs for the job shop scheduling problem one can see yet another type of solution characteristic which should be preserved by a crossover operator: the absolute order of elements (jobs) (Bierwirth et al. 1996, Mattfeld et al. 1999). Empirical analysis of the search space of the problem revealed, however, that the use of recombination-based algorithms (EAs, MAs) may not be a good idea for this problem at all, and that some other metaheuristics should be used (tabu search, most likely). Therefore, the JSP may serve as a negative example for crossover design: under certain circumstances (results of an analysis of the search space) crossover design should not be attempted at all.

Crossover adaptation — summary

From the examples of crossover design described above one can draw several conclusions. Firstly, it appears that the application of 'standard' operators (one-point, UX, OX, CX, PMX) is not a good idea; a crossover should be adapted to the considered problem if the efficiency of the resulting algorithm matters. Secondly, the design of a specialised crossover should focus on those aspects of solutions which are important to their quality; the properties of parents which are inherited by offspring must not be arbitrary, but directly linked to the problem semantics. One might say that this is the meaning of the frequently stressed, but somewhat vague, notion of the 'naturality' of a crossover in a certain domain. Thirdly, the more recent designs indicate that analysis of the search space of the considered problem may shed some light on the issue of what an important solution property is. In particular, the designer's intuition about a property may be empirically confirmed or rejected by a statistical analysis of the frequency of the property in good solutions of the problem. Such confirmation was obtained by means of the fitness-distance analysis in the case of the TSP, and by the analysis of the frequency of pairs of vertices with the same colour for the graph colouring problem. On the other hand, the hypothesis of similarity of good solutions with respect to precedence relations between jobs was rejected in the case of the JSP. It can be said, therefore, that such empirical analysis should be performed on a problem before the design of a crossover is actually attempted.

Such analyses usually rely on the notion of the fitness landscape of the considered problem (Altenberg 1997, Hoos & Stutzle 2004, Merz 2000, Reeves & Rowe 2003). Therefore, the author of this thesis thinks that the adaptation of a crossover operator to a particular combinatorial optimisation problem may be performed on the basis of an empirical analysis of the fitness landscape of instances of the problem, namely the fitness-distance analysis.
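For reference, the central quantity of this analysis, introduced by Jones & Forrest (1995) and defined formally in the next chapter, is the fitness-distance correlation coefficient: the sample correlation, over a set of m solutions (typically local optima), between the fitness f_i of a solution and its distance d_i to the nearest global (or best-known) optimum. In its standard, textbook form,

\[
r_{FD} = \frac{\frac{1}{m}\sum_{i=1}^{m}\left(f_i - \bar{f}\right)\left(d_i - \bar{d}\right)}{\sigma_f\,\sigma_d},
\]

where \bar{f}, \bar{d} are the sample means and \sigma_f, \sigma_d the sample standard deviations of fitness and distance; the exact variant used in this thesis is presented in the next chapter.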
In the author's opinion such an analysis is a way of empirical and objective exploration of the properties of the problem, followed by their exploitation in an algorithm; a way which is clearly indicated by the conclusions from the No Free Lunch Theorems (see chapter 3). Hence, this approach to crossover design is one of the central issues of this thesis and will be elaborated upon in the next chapter.

4.5 Mutation operators

The issue of the design of a mutation operator is closely linked to the role this operator plays in evolutionary algorithms. Initially, in genetic algorithms, operators of this kind were perceived as of much lesser importance than crossover. The role of mutation was basically to maintain diversity in the population, the level of which was usually constantly decreased by the operations of crossover and selection (Goldberg 1989, Merz 2000). In GAs mainly the bit-flip mutation was used, without much concern about whether it fitted the problem or not. No problem-specific adaptation was considered in mutation design.

In evolution strategies, on the other hand, mutation had major importance in the algorithm (Merz 2000, Michalewicz 1996). However, this group of methods was used primarily in numeric optimisation, and mutation usually had the form of a perturbation applied to a real-valued variable. One of the most commonly used operators was Gaussian mutation with µ = 0 and adaptive σ. Sometimes operators based on other distributions were used (uniform, Cauchy, etc.).

Evolutionary algorithms applied to combinatorial optimisation problems usually employed some kind of mutation, although for a long time the role of this operator was only secondary (a perspective inherited from genetic algorithms). Due to the nature of combinatorial problems, though, simple bit-flip or Gaussian mutation became meaningless. Thus, some more problem-specific operators were devised, mainly based on intuition or on time-consuming experimental comparisons of the final algorithms. Often, the choice of mutation was inspired by some neighbourhood operator used in local search methods, although the goal of such mutation was to randomly perturb a solution, not to improve it through an iterative process. Michalewicz (1996) describes several mutation operators of this kind for the TSP. The operator which is usually called inversion is actually the edge-exchange neighbourhood operator well known in local search algorithms. Evolution strategies for the same problem also employed neighbourhood operators as mutation: insertion (remove a vertex from a tour and reinsert it in a random location), relocation of a subpath (very similar to Or-opt), and an exchange of two vertices (Michalewicz 1996).

In the case of grouping problems, Davies designed a mutation operator with sublist mixing or an exchange of two fields in a chromosome (Michalewicz 1996), in conjunction with an indirect representation of solutions. Falkenauer was probably the first to use a kind of heuristic mutation for these problems; in bin-packing he chose to remove the contents of some random bins and then heuristically reinsert the removed elements into the solution, as noted by Michalewicz (1996). This procedure somewhat resembles local search. Later, Falkenauer (1998) introduced mutations operating on whole groups of elements: creation of a new group, removal of a group, and a random exchange of elements between two groups.
Other mutations, similar to neighbourhood operators for these problems, were used by von Laszewski, and by Jones and Beltramo, as Michalewicz (1996) reports. The job shop scheduling problem also saw mutation operators inspired by neighbourhood-based methods: a change of the relative order of two jobs, an exchange of two jobs on a critical path, and sublist mixing (as in the TSP) (Michalewicz 1996, Pawlak 1999). Even quite recently, Zhang et al. (2005) employed the first of these operators in their hybrid EA for the JSP.

In the above examples one can see that mutation was chosen or designed in a rather arbitrary manner, usually based on some neighbourhood operator. Moreover, since mutation was deemed of lesser importance than crossover, the issue of adaptation of this operator to a problem was rarely raised, except perhaps for methods of adapting the mutation probability at run time (Michalewicz & Fogel 2000); this seems more like parameter tuning than problem-specific design, though.

However, more recently a change of opinion concerning mutation has been visible in the literature. Michalewicz (1996) says that the role of mutation in GAs was rather underestimated and that there is evidence it may be even more important than crossover, especially when a complex representation is used. This point of view is also shared by Hoos & Stutzle (2004). Nevertheless, these authors have little to say about how a mutation should be designed, except for giving examples of designs for certain combinatorial problems.

Even so, the ideas of mutation taken from genetic and evolutionary algorithms may have limited application in the case of memetic ones, since the role of the operator changes. In MAs local search is used after crossover and mutation, so the latter should be able to generate a jump in the search space which is long enough to leave the local search attractor of the mutated solution (Merz 2000). If mutation were equivalent to one step of the neighbourhood operator, as it usually was in EAs, local search would usually revert its effect and return to the starting point. Therefore, in MAs the mutation operator should at least be based on a different neighbourhood than the one local search uses. Moreover, the length of the mutation jump should be evaluated in order to check whether the operator is able to escape the attractor. On the other hand, the length of a mutation jump cannot be too large (the operator should not be too disruptive), since it might result in a completely random solution instead of one with many properties inherited from the mutated solution (Merz 2000).

With these remarks in mind, Merz designed mutation operators for the TSP, the binary quadratic programming problem (BQP) and the quadratic assignment problem (QAP) (Merz 2000). In the TSP he used a double-bridge move (also called a non-sequential four-exchange), an operation which is not easily reverted by the Lin-Kernighan local search he implemented. For the BQP he set the length of the mutation jump to the average distance between two local optima. In the QAP he applied an iterative exchange of two elements of a solution until the distance of the mutant from the original was above a certain threshold. Merz also linked the mutation and crossover designs together and related them to the fitness-distance analysis.
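As a concrete example of such a mutation jump, the double-bridge move mentioned above can be sketched in a few lines of Python (the representation of a tour as a list of city indices and the function name are illustrative assumptions):

    import random

    def double_bridge(tour):
        # Cut the tour into four non-empty segments A|B|C|D at three
        # random positions and reconnect them as A|C|B|D.  This changes
        # four edges at once, a perturbation that sequential edge
        # exchanges (as in Lin-Kernighan) cannot easily revert.
        n = len(tour)  # requires n >= 4
        i, j, k = sorted(random.sample(range(1, n), 3))
        return tour[:i] + tour[j:k] + tour[i:j] + tour[k:]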
As already mentioned in the section on crossover (4.4), if fitness-distance analysis reveals certain properties of the search space (a large distance between local optima; no correlation between fitness and distance), mutation should have major importance in an MA and its jump should be rather long (Merz 2000). This idea was exploited in Merz's MA for the QAP, where no fitness-distance correlation was discovered. As Hoos & Stutzle (2004) note, this algorithm abandoned recombination and employed only mutation, with very good results.

It seems, therefore, that the designer of a mutation operator for an MA should focus on the issues emphasised by Merz: mutation should escape the attractors of local optima by means of an operation which differs from the neighbourhood operator used in local search; it should not be too disruptive, in order to retain inherited properties; and it should gain major importance when certain structure in the search space (fitness-distance correlation) is not revealed.

4.6 Local search

In evolutionary algorithms 'hybridization should be used whenever possible, a GA is not a black box, but should use any problem-specific information available', as Reeves & Rowe (2003) say. The application of local search methods in EAs is a major way of such hybridisation. It was considered as early as 1989 by Goldberg (1989) and later advocated e.g. by Culberson (1998), Merz (2000), Michalewicz & Fogel (2000) and Moscato & Cotta (2003).

The history of the application of evolutionary algorithms hybridised with local search demonstrates that such methods may yield very good optimisation results. Therefore, many authors emphasise the utility of such hybrid evolutionary (or memetic) algorithms (Hoos & Stutzle 2004, Merz 2000, Michalewicz 1996, Michalewicz & Fogel 2000, Reeves & Rowe 2003). For example, Hoos & Stutzle (2004) state that pure EAs usually have a limited capability of search intensification, and the introduction of local search optimisation into the evolutionary engine very often increases this capability, with better algorithm performance as a consequence. This can be seen e.g. in efficient memetic algorithms for the TSP, which employ local search with edge-exchange neighbourhoods (Hoos & Stutzle 2004, Michalewicz & Fogel 2000).

However, the application of some arbitrary local search method to a problem cannot be seen as a universal remedy for the search limitations imposed by the No Free Lunch Theorems. The local search part of a memetic algorithm has to be adapted to the considered problem in order to improve the final algorithm. Concerning this adaptation, though, there are many choices to be made by a designer.

4.6.1 Place for local search

The local search procedure may be invoked in several places in a memetic algorithm. Most of the literature advocates using it after the generation of initial solutions and after each variation (crossover or mutation), as was shown in chapter 2 (Merz 2000, Krasnogor & Smith 2005, Michalewicz 1996, Michalewicz & Fogel 2000, Moscato & Cotta 2003, Jaszkiewicz & Kominek 2003). Nevertheless, some authors also see other options, like making the LS a part of a recombination operator. Examples include the inver-over operator by Michalewicz (1996) and the EAX operator by Nagata & Kobayashi (1997) for the TSP, or Falkenauer's (1998) operators for bin packing. Another possibility is to completely replace mutation with local search (Krasnogor & Smith 2005).
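To make this standard placement concrete, below is a minimal, schematic sketch of a memetic algorithm main loop in Python. All components (init, crossover, mutate, local_search, fitness) are placeholders to be supplied for a concrete problem, and the steady-state replacement and the mutation rate are illustrative assumptions, not prescriptions from the cited literature; fitness is assumed to be maximised.

    import random

    def memetic_algorithm(init, crossover, mutate, local_search, fitness,
                          pop_size=50, iterations=1000):
        # Local search is invoked on every initial solution and after
        # each variation step (crossover or mutation), as most of the
        # cited literature recommends.
        population = [local_search(init()) for _ in range(pop_size)]
        for _ in range(iterations):
            p1, p2 = random.sample(population, 2)
            child = local_search(crossover(p1, p2))
            if random.random() < 0.2:  # illustrative mutation rate
                child = local_search(mutate(child))
            # Steady-state scheme: the child replaces the worst
            # individual if it is better.
            worst = min(range(pop_size),
                        key=lambda idx: fitness(population[idx]))
            if fitness(child) > fitness(population[worst]):
                population[worst] = child
        return max(population, key=fitness)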
4.6.2 Choice of a local search type

Another design decision is related to the type of local search to be used. There are several alternatives here as well, but no clear guidelines on when to use each of them. One fairly common choice is to use some standard greedy or steepest local search, and to always run it until a local minimum is found (Jaszkiewicz & Kominek 2003, Krasnogor & Smith 2005). Jaszkiewicz & Kominek (2003) found that in their problem the best option was to use the greedy version to optimise initial solutions and the steepest one to improve offspring after crossover (they did not consider mutation). Krasnogor & Smith (2005) note, however, that such a strategy may be rather time-consuming and that local search should be stopped (truncated) earlier in order not to waste CPU time. This approach may speed up the algorithm, but it also requires that a sensible enough truncation strategy be developed, which rather adds complexity to an already complicated process of algorithm design. Another possibility is to use a local search algorithm in the broad sense of the word, namely some more sophisticated procedure like tabu search (Galinier & Hao 1999) or simulated annealing (Zhang et al. 2005).

4.6.3 Choice of a neighbourhood

Yet another important issue of local search adaptation concerns the choice of the neighbourhood operator to be used. According to Merz (2000), such an operator should ideally generate near-optimum solutions and induce only a few local optima in the search space.

Some authors openly state that there is no universal rule for the choice of a neighbourhood, though. They say that it is entirely dependent on the problem being solved (Krasnogor & Smith 2005, Moscato & Cotta 2003) and that only some limited advice may be given. Krasnogor & Smith (2005) suggest using multiple neighbourhoods at the same time and adapting the probabilities of their application at run time, but this suggestion seems to avoid the problem of choice rather than solve it.

Despite this opinion, other authors do give some advice concerning neighbourhoods. Mattfeld et al. (1999) suggest that a neighbour should be only a slight modification of the original solution. One can see that many practically used operators actually adhere to this rule, e.g. the operators used in the TSP (edge exchanges (Michalewicz 1996)) or vehicle routing problems (vertex shift (Jaszkiewicz & Kominek 2003)). Another suggestion given by Mattfeld et al. (1999) is to avoid infeasibility of the generated neighbours. Moscato & Cotta (2003) say that a local search operator should also potentially link any two solutions in the search space.

In his Ph.D. thesis, Merz (2000) suggests that the size of a neighbourhood is a good indicator of its practical utility. He says that usually the greater the neighbourhood, the better the quality of results, but too large a neighbourhood quickly becomes impractical. He gives the limit of O(n²) for a reasonable neighbourhood size, where n denotes the instance size.

4.6.4 Neighbourhood and landscape structure

Although the advice given above may make the decision on a neighbourhood easier, it seems that it is not enough to fully discriminate between neighbourhoods, since there may be several move operators which only slightly change a solution, produce feasible neighbours, link all solutions together and have a reasonable size.
Therefore, some more guidelines are required to properly choose a neighbourhood and adapt local search to the considered problem. One of them is related to a property of the fitness landscape of the addressed problem. This property is called landscape ruggedness (or, conversely, smoothness) and it intuitively describes how different (or similar) the evaluations of neighbouring solutions are. Merz (2000) says that 'a fitness landscape is said to be rugged if there is low correlation between neighbouring points of the landscape'. Obviously, ruggedness depends on the employed fitness function and neighbourhood relation.

Ruggedness is usually measured by the so-called correlation length of a random walk in the landscape (Bierwirth et al. 2004, Hoos & Stutzle 2004, Mattfeld et al. 1999, Merz 2000, Reeves & Rowe 2003). The value of the correlation length may be derived analytically in some cases, but usually it is computed based on a sample of random walks in the landscape. The latter method requires the landscape to be isotropic, i.e. to have the same properties in the whole search space. This is not always the case, though, as demonstrated recently for the JSP (Bierwirth et al. 2004).

The knowledge of the correlation length may be exploited exactly in the choice of a move operator for local search. However, there is some confusion about the interpretation of the value. Mattfeld et al. (1999) state that the lower the correlation length, the easier the problem for a local search based on the related neighbourhood. Exactly the opposite interpretation is suggested in the more recent works of Merz (2000) and Hoos & Stutzle (2004). The latter interpretation provided the basis for the distinction between edge-based and vertex-based neighbourhoods for the TSP; the edge-based moves induce higher correlation lengths, so this result confirms the universally held belief that in this problem manipulating edges is better than manipulating vertices (Merz 2000).

4.6.5 Efficiency of local search

As noted in section 3.4, adaptation of a metaheuristic to a problem may also concern the speed of the designed algorithm. From this point of view there are several options concerning local search, since many authors stress the importance of local search efficiency (Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000).

One of the proposed techniques is to employ some form of neighbourhood pruning (Hoos & Stutzle 2004), either strict or probabilistic. In some cases, a large number of neighbours of a solution may be provably worse than the current solution (strict pruning); this may be seen e.g. in the JSP. In other cases, there is a high probability that some neighbours do not lead to any improvement, so they are examined only with small probability (probabilistic pruning).

Another method of accelerating local search is to replace the calculation of a neighbour's objective value with the calculation of the difference in objective value between the current solution and the neighbour (Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000). This so-called incremental update scheme may be used in problems where a slight change in the contents of a solution does not require recomputation of the objective value from scratch (the locality of transformation). However, not all objective functions have this desired property (e.g. the flowshop scheduling problem with the makespan minimisation objective; see the paper by Ishibuchi et al. (2003)).
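As an illustration of such an incremental update, consider the symmetric TSP with a 2-edge exchange move; a minimal sketch in Python (dist is assumed to be a precomputed distance matrix) obtains the change in tour length from just four matrix entries instead of re-evaluating the whole tour:

    def two_opt_delta(tour, dist, i, j):
        # A 2-edge exchange with i < j removes edges (a, b) and (c, d)
        # and inserts (a, c) and (b, d); for a symmetric distance matrix
        # the change in tour length is therefore:
        n = len(tour)
        a, b = tour[i], tour[(i + 1) % n]
        c, d = tour[j], tour[(j + 1) % n]
        return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]

A negative value of the delta indicates an improving move, so a whole neighbourhood can be scanned at constant cost per neighbour.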
But when this property is present in the objective function, the 'don't look bits' (caching) technique may also be used to speed up the search process (Bentley 1990, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003). It consists in storing the evaluations of neighbouring solutions in some auxiliary memory (cache). If the employed move operator only slightly changes the current solution, most of the neighbourhood stays intact and does not have to be evaluated again. Of course, a part of the neighbourhood does change and has to be recomputed, and it is the size of this part which determines the performance of the technique; if it is too large, cache management may be too expensive. Moreover, this technique is more useful with steepest local search than with the greedy one (Hoos & Stutzle 2004). The former always scans the whole neighbourhood, so cached evaluations are always useful. This is not the case with greedy search; it usually scans only a small fragment of the neighbourhood and, hence, the cache is used less often.

Yet another technique of acceleration and adaptation to a problem considers the interaction of local search with crossover in the presence of positive fitness-distance correlation. Some designs of local search based on this feature resulted in significant acceleration of whole memetic algorithms (Jaszkiewicz 1999, Merz 2000).

4.7 Other components and techniques

There are also other techniques of adaptation of an evolutionary algorithm to a problem, which are perhaps less known and popular, like diversity maintenance. They are outside the scope of this thesis and are not elaborated upon; the interested reader is referred to major books and papers on the subject (Hoos & Stutzle 2004, Michalewicz & Fogel 2000, Reeves & Rowe 2003, Krasnogor & Smith 2005).

4.8 Conclusions

The review of methods of adaptation given in this chapter may be the source of one major conclusion: there are several components of memetic algorithms which have to be adapted to the considered problem before the algorithm is actually executed, but there are multiple choices for each of them and usually there is little guidance in the literature for a practitioner on when and how to use them (except, perhaps, the advice based on landscape ruggedness or fitness-distance measurement).

Indeed, some serious studies complain about this state of the art. Michalewicz & Fogel (2000) admit that there is little theoretical background that could help in designing hybrid evolutionary algorithms. Hoos and Stutzle, in the epilogue of their book (2004), summarise this state by saying that:

much of the work on designing and applying SLS [i.e. metaheuristic] algorithms in many ways resembles a craft rather than a science (. . . ), experience, rather than intellectual understanding, is often the key to achieving the underlying goals.

Krasnogor & Smith (2005) also subscribe to this point of view, admitting that 'the process of designing effective and efficient MAs currently remains fairly ad hoc and is frequently hidden behind problem-specific details'. This large amount of intuition and experience required to design good MAs (and metaheuristics in general) is the basis of scepticism about such algorithms and of the opinion that their design lacks a systematic, scientific approach (Hoos & Stutzle 2004, Moscato & Cotta 2003, Hammond 2003)².
That is why well-known authors in the field see the problem of explaining and predicting the performance of evolutionary algorithms as one of the most important in the theory of computation (Hoos & Stutzle 2004, Moscato & Cotta 2003, Reeves & Rowe 2003). Moreover, research conducted in this direction will most likely result in a deeper understanding of the relationships between properties of combinatorial optimisation problems and metaheuristic algorithms, and eventually provide a solid basis for practical designs (Hoos & Stutzle 2004). Due to these facts the issue of a systematic design and adaptation of the memetic algorithm (and, more generally, metaheuristics) to a combinatorial optimisation problem is of major importance for the practice of computation.

It is also worth noting that, because of this lack of knowledge about the relationships between problems and algorithms, some authors strongly recommend avoiding the overloading of algorithms with numerous and diverse components; the computational method experimented with should be as simple as possible in order to make the understanding of the basic relationships easier (Krasnogor & Smith 2005, Michalewicz & Fogel 2000, Moscato & Cotta 2003).

From this perspective, the most interesting approaches to the adaptation of the MA described in this chapter are the ones based on some analysis of the problem to be solved: the analysis of landscape ruggedness and of the relationship between solution fitness and distance. The author of this thesis chose the latter as its subject, because in the past designs based on fitness-distance analysis led to efficient algorithms (Galinier & Hao 1999, Hoos & Stutzle 2004, Jaszkiewicz & Kominek 2003, Merz 2000), and also because 'a systematic method for designing appropriate mixing [i.e. crossover and mutation] operators would be extremely helpful' (Reeves & Rowe 2003, page 283). Therefore, this thesis goes further along the lines of research drawn earlier by scientists like Kirkpatrick & Toulouse (1985), Muhlenbein (1991), Boese (1995), Boese et al. (1994), Jones & Forrest (1995), Altenberg (1997), Merz (2000), Watson et al. (2003), Jaszkiewicz (1999) and Jaszkiewicz & Kominek (2003). The concepts of the fitness landscape and the fitness-distance correlation, which were central to their research and are the basis for the construction of crossover operators, will be presented in more detail in the next chapter.

² Chris Stephens said in the interview (Hammond 2003): 'I think it [i.e. evolutionary computation] lacks a systematic 'scientific' point of view — people are always tending to move on to the next problem, invent the next little widget for their algorithms rather than do a more systematic, more thorough analysis of what they have already done'.

Chapter 5
Fitness-distance analysis for adaptation of the memetic algorithm to a combinatorial optimisation problem

Fitness-distance analysis relies heavily on the notion of a fitness landscape. Therefore, the latter will be defined and commented on first.

5.1 Fitness landscape

Intuitively, a fitness landscape is a graph where solutions play the role of vertices and edges indicate the neighbourhood relation. It is also labelled on vertices with real values of the fitness function. This graph may be imagined as a three-dimensional discrete surface (hence the name, landscape): the first two dimensions are spanned by the solutions and the third one indicates fitness.
However, in practice this imagined structure may have too few dimensions to describe precisely the usually multidimensional search spaces of combinatorial problems (Reeves 1999).

From a formal perspective, two different definitions of a landscape may be found in the literature on evolutionary computation.

5.1.1 Neighbourhood-based definition

A landscape L of an instance I of a combinatorial optimisation problem π is a triple L = (S, f, N), where S = Sπ(I) is the set of solutions of this instance, f is the fitness function and N is a neighbourhood defined for solutions in S. This is the most commonly found definition (Merz 2000, Moscato & Cotta 2003, Reeves & Rowe 2003, Mattfeld et al. 1999, Bierwirth et al. 2004, Michalewicz & Fogel 2000, Schiavinotto & Stutzle 2007).

5.1.2 Distance-based definition

A less common definition says that a landscape is a triple L = (S, f, d), differing from the one above only in that the neighbourhood N is replaced by d, a distance measure d : S × S → R (Merz 2000, Reeves & Rowe 2003).

It is usually desired that d possess the properties of a distance metric, namely:

∀s, t ∈ S: d(s, t) ≥ 0 (non-negativity)
∀s, t ∈ S: d(s, t) = 0 ⇔ s = t (identity)
∀s, t ∈ S: d(s, t) = d(t, s) (symmetry)
∀s, t, u ∈ S: d(s, u) ≤ d(s, t) + d(t, u) (triangle inequality)

Such properties usually facilitate the interpretation of the values of the measure, but it is not always necessary or possible to define a measure which possesses them all. In particular, symmetry may not be required in practice, while the triangle inequality may be difficult to satisfy or prove. Without the triangle inequality a measure is usually called a semi-metric.

5.1.3 Comparison of definitions

Reeves & Rowe (2003) comment on the two definitions that the neighbourhood-based one is perhaps more common because a neighbourhood relation may easily be converted into a distance metric: the distance between solutions s, t ∈ S is simply measured as the minimum number of neighbourhood moves required for s to become t. However, this is not always a practical thing to do, and sometimes an operator-independent measure is needed, they say. Nevertheless, it seems that in most cases the two definitions have identical meaning and may be used interchangeably.

5.1.4 Applications

When a fitness landscape is determined, many of its properties may be precisely defined and examined. According to Hoos & Stutzle (2004) these are, among others: the types of positions (solutions) in the landscape and their distributions (e.g. local minima, interior plateaus); the number and density of local minima; fitness-distance correlation; landscape ruggedness; exits and exit distributions; and plateau connection graphs. Most of these properties have some influence on the efficiency of optimisation algorithms, these authors say. Fitness-distance correlation is only one of them.

5.1.5 Landscape and fitness function

One can see in the definition of a landscape that it clearly depends on the analysed fitness function (usually being the original objective, as well). In practice, especially when approaching a new problem, it may happen that the fitness function is not yet entirely defined and the algorithm's designer is also an analyst of the related real-world situation. In such a case it is possible to define the fitness function in a way which makes the landscape easier for optimisation, possibly giving it some desired properties.
On the other hand, if it is required that global optima are found, the fitness function may also be modified, provided that the optima of the modified fitness are exactly the same. Similarly to the previous case, the resulting landscape may possess some more desirable properties than the one for the original fitness.

Another possibility is to keep the same fitness function and modify its decision variables. This may also help create a landscape which is easier for optimisation. An example of such a modification was given by Altenberg (1995) on a function defined by Michalewicz (1996). As a result, for the same fitness function a very different landscape was obtained, which was much smoother and had fewer local optima.

5.1.6 Landscape and distance measure

The form of a landscape depends also on the employed distance measure $d$ (or the neighbourhood relation $N$). If the given fitness function and its variables are unmodifiable (e.g. explicitly given in the problem formulation, as in the case of classical problems), then only a change in the definition of distance may change the landscape. In this context two issues regarding landscape definition and utility arise.

Landscape form depends on a distance measure

This dependence was not fully realised at first. Falkenauer (1998), for example, suggested that a distance measure should not be explicitly defined because that would result in a loss of generality of reasoning. He explained that in order to precisely define a distance between two points in a landscape, one had to state what the search operator in the designed algorithm is, and then the focus of research would be on the fitness function and the operator, not on the function itself. However, without a clear definition of distance one does not exactly know the structure imposed on the search space: which solutions are connected, which are far apart. Only with an unambiguous distance does a landscape exist, and all its properties become clearly defined and may be examined (Altenberg 1995, Hoos & Stutzle 2004, Reeves & Rowe 2003). Moreover, it is not required to design a complete algorithm first in order to have a landscape, although some correspondence between the two may be important.

Concerning the existence of distance measures which could help define landscapes for diverse search spaces, Reeves & Rowe (2003) state that in the continuous case there is little problem compared to the combinatorial one. They say that for continuous variables 'the landscape is determined only by the fitness function', meaning most likely that there is a natural distance measure for continuous spaces: the Euclidean metric. Although they may be right that this is usually the first and natural choice, one should remember that it is only a special case of the $L_p$ metric, with $p = 2$. Putting $p = 1$ one gets the city-block distance, and for $p = +\infty$ it becomes the Tchebycheff distance. Nevertheless, Reeves and Rowe are definitely right in saying that the choice or design of a distance measure for a combinatorial search space is difficult, perhaps due to the larger variety of considered objects (not only vectors of real numbers) and less research in this direction in the past.

Despite this, some advice on distance measures for combinatorial landscapes may be found in the literature. Firstly, the distance measure employed in the definition of a landscape should not be directly based on the analysed fitness function. Bonissone et al.
(2006) state it clearly when they consider distance used as a diversity measure in an EA: in such a case distance could not differentiate solutions that lie, for example, near different global optima with similar fitness. Secondly, distance should take into account those properties of solutions which are important for fitness. Here, Hoos & Stutzle (2004) give an example of the distance most commonly used for TSP solutions, the bond distance (Boese 1995). This measure takes into account the relative order of vertices in a tour, rather than some arbitrary properties of any representation of the tour. Thirdly, there are some examples of distance measures defined for combinatorial structures encountered in optimisation problems. One of them is the already mentioned bond distance for TSP solutions. Another widely used one is the Hamming distance for bit strings, which has been employed in numerous different contexts (e.g. the set covering problem (Finger et al. 2002), the QAP, the BQP, the GBP (Merz 2000)). Reeves (1999), also with Yamada (Reeves & Yamada 1998), analysed the landscape of the flowshop scheduling problem using several simple distance measures. The fitness landscape of some JSP instances was defined using the disjunctive graph distance (which was also inspired by the Hamming distance to some extent) (Bierwirth et al. 1996, Mattfeld et al. 1999, Bierwirth et al. 2004). Some more distance measures may also be found in the literature, specifically for permutations (Schiavinotto & Stutzle 2007) or under the notion of diversity (Mattiussi et al. 2004). It seems, however, that there is still a lack of appropriate measures for combinatorial structures, especially those more involved than permutations. This fact somewhat slows down research on fitness landscapes of practical problems.

Finally, one has to remember that distance should be efficiently computable. It would be of no practical use otherwise. An example of a distance measure which is hard to compute (NP-hard) is the one based on the 2-edge exchange neighbourhood operator for the TSP. Because of this fact it was eventually replaced by an approximation, the bond distance (Boese 1995, Hoos & Stutzle 2004, Schiavinotto & Stutzle 2007).

Perception of a landscape by a search algorithm

Falkenauer (1998) was reluctant to clearly define distance because he did not want his reasoning on fitness functions to become dependent on any particular search operator (algorithm). It can be seen from the definition of a landscape that it does not involve any complex algorithm, although it does involve a neighbourhood operator or a distance measure. Therefore, a landscape may exist and be analysed without any complex algorithm in mind. What Falkenauer was right about, though, was that for a landscape analysis to be of any practical use it is important that it somehow corresponds to the eventually designed algorithm. If landscape analysis is performed in order to gain insight into the search space structure and the obtained information is to be exploited in some algorithm, then the algorithm should perceive the search space from the same perspective, the same landscape. Otherwise there would be no link between the algorithm and the landscape analysis performed earlier. This viewpoint is also shared by Reeves & Rowe (2003), and Hoos & Stutzle (2004).
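To make the notion of an operator-independent distance concrete, below is a minimal Python sketch of the bond distance for TSP tours mentioned above: the number of edges of one tour that the other tour does not contain. The function names are the author's illustration, not taken from any library.

```python
from typing import List, Set, Tuple

def tour_edges(tour: List[int]) -> Set[Tuple[int, int]]:
    """Set of undirected edges of a cyclic tour (each edge stored sorted)."""
    n = len(tour)
    return {tuple(sorted((tour[i], tour[(i + 1) % n]))) for i in range(n)}

def bond_distance(t1: List[int], t2: List[int]) -> int:
    """Number of edges of t1 that are absent from t2.

    Both tours have exactly len(t1) edges, so the measure is symmetric."""
    return len(tour_edges(t1) - tour_edges(t2))

# These 5-city tours share edges (0,1), (2,3) and (0,4); they differ in 2.
print(bond_distance([0, 1, 2, 3, 4], [0, 1, 3, 2, 4]))  # prints 2
```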
5.2 Fitness-distance analysis

Fitness-distance analysis (FDA) examines a fitness landscape in search of a relationship between the fitness (quality) of solutions and their distance to the search goal, a global optimum (Jones & Forrest 1995, Reeves 1999, Merz 2000, Hoos & Stutzle 2004). Most often it is a form of statistical analysis of a sample of good solutions of a problem instance which is summarised by a value of the correlation coefficient between fitness and distance. The desired result of this analysis (in maximisation problems) is a strong relationship with a highly negative correlation, because 'if fitness increases when the distance to the optimum becomes smaller, then search is expected to be easy for selection-based algorithms, since there is a "path" to the optimum via solutions with increasing fitness' (Merz 2000). This statement of Merz's about practical profits from negative correlation in selection-based algorithms (e.g. EAs) may also be supported by similar opinions expressed by Jones & Forrest (1995), and Hoos & Stutzle (2004). This desirable structure in the fitness landscape is sometimes also referred to as a 'big valley' (Boese 1995, Boese et al. 1994, Reeves 1999), 'central massif' (Reeves & Yamada 1998) or 'global convexity' (Boese 1995, Boese et al. 1994, Jaszkiewicz & Kominek 2003).

5.2.1 Basic approach

Probably the most common approach to the FDA requires that at least one global optimum of each analysed instance is known in advance (Jones & Forrest 1995, Boese 1995, Merz 2000, Hoos & Stutzle 2004).

Landscape sampling

A sample is taken of good solutions of the analysed problem instance. This sample is usually large (e.g. 500, 1000 or even more solutions) and most often consists of local optima only. The solutions are generated by starting from randomly chosen points in the landscape; then some simple, randomised local search is performed independently on each of them (Boese 1995, Merz 2000, Hoos & Stutzle 2004, Reeves 1999). Such a sampling procedure may seem to be biased, since local search does not uniformly sample the space of all solutions. This is indeed the case and is done on purpose, because optimisation algorithms (e.g. MAs) are usually highly biased toward good solutions. The purpose of this sampling is to approximate the set of solutions an optimisation algorithm may encounter during search, not the set of all solutions (Hoos & Stutzle 2004). In the case of MAs and many other algorithms based on local search only local optima are accepted into the population; hence, local optima are sampled.

For each sampled solution $s$ two values are computed: quality $f(s)$ and distance to the nearest known global optimum $d_{opt}(s)$. Also, for each pair $s_1, s_2$ of solutions in the sample the distance between them is calculated, $d(s_1, s_2)$.

Distance between local optima

One stage of the analysis is the examination of distance between local optima in the sample and its comparison to the distance arbitrary solutions of the problem instance may have. Usually the average distance between local optima in the sample is computed:

$$\bar{d} = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} d(s_i, s_j)$$

and is compared to the average distance between random solutions (Boese et al. 1994, Mattfeld et al. 1999) or to the analytically computed extent (diameter) of the search space (Boese 1995, Merz 2000), i.e. the maximum distance two solutions in the problem instance may have. This way two questions may be answered.
• Are local optima of the analysed instance spread all over the search space, or rather confined to a smaller fragment of it?
• What is the size of the subspace containing local optima?

If it transpires from the analysis that local optima are more tightly clustered in the search space than arbitrary (random) solutions are, it means that they share some common properties. This fact may be later exploited in the designed algorithm (Boese 1995, Boese et al. 1994, Merz 2000).

Fitness-distance correlation

The relationship between values of $f$ and $d_{opt}$ in the sample is summarised using the linear correlation coefficient, which in this context is called the fitness-distance correlation (FDC). Given the sample $\{s_1, \ldots, s_N\}$, the FDC is computed as (Jones & Forrest 1995, Merz 2000, Hoos & Stutzle 2004, Reeves & Rowe 2003, Bierwirth et al. 2004):

$$r = \frac{\mathrm{cov}(f, d_{opt})}{s_f \cdot s_{d_{opt}}}$$

where $\mathrm{cov}$ denotes the estimate of the covariance of two variables:

$$\mathrm{cov}(f, d_{opt}) = \frac{1}{N} \sum_{i=1}^{N} \left( f(s_i) - \bar{f} \right) \left( d_{opt}(s_i) - \bar{d}_{opt} \right)$$

$\bar{f}$ and $\bar{d}_{opt}$ are the means of fitness and distance in the sample, and $s$ is the estimate of the standard deviation of a variable, e.g.:

$$s_{d_{opt}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_{opt}(s_i) - \bar{d}_{opt} \right)^2}$$

Being simply a linear correlation coefficient, the value of FDC always belongs to the closed interval $[-1, 1]$. It should be noted that in this context FDC is only a descriptive statistic (Jones & Forrest 1995); no probabilistic model of the relationship between fitness and distance is assumed here which would include FDC as a parameter.

Interpretation of values of FDC obviously depends on the direction of optimisation in the analysed problem. For minimisation problems, 'high correlations suggest that the local optima are radially distributed in the problem space relative to a global optimum at the centre; the more distant the local optima are from the centre, the worse their objective function values' (Reeves & Yamada 1998); Jones and Forrest call such a landscape 'straightforward' for GAs. FDC equal to 1 would indicate a perfectly linear relationship between fitness and distance, and promise the search through the landscape to be easy (Merz 2000). On the other hand, highly negative values of $r$ indicate a misleading problem, where solutions with better quality tend to be further and further away from the search goal. Finally, $r \approx 0$ indicates either no relationship between fitness and distance (hence, no guidance for search from the fitness function) or some nonlinear relationship, which is poorly summarised with linear correlation (Jones & Forrest 1995, Hoos & Stutzle 2004).

Scatter plots of fitness vs. distance

The relationship between fitness and distance may also be visualised in a scatter plot. This is a plot with fitness on one axis and distance on the other, with each sampled solution's pair $(f(s), d_{opt}(s))$ shown as one point (Jones & Forrest 1995, Merz 2000, Hoos & Stutzle 2004, Reeves 1999, Reeves & Rowe 2003). This fitness-distance (FD) plot is said to be useful especially in cases with $r \approx 0$, when a nonlinear relationship might otherwise be overlooked (Jones & Forrest 1995, Hoos & Stutzle 2004). Merz (2000) even says this plot is more informative than the FDC coefficient. For a minimisation problem, the desired shape should contain a tendency: with decreasing fitness (increasing quality) the values of distance should be smaller. Ideally, there should be only such a tendency, without any noise, outliers or other components (FDC near 1). Examples of FD plots are given in figure 5.1.
They were generated from real FD analyses. Note the possible differences in shapes (a flattened oval going slightly up (left) or a cloud with several horizontal groups of points below (right)). Note also the related values of FDC and how well (or poorly) they correspond to the visible shapes. Some very interesting, diverse plots may also be found in the paper by Jones & Forrest (1995) and Merz's Ph.D. thesis (2000).

[Figure 5.1: two scatter plots of fitness (horizontal axis) vs. distance (vertical axis).] Figure 5.1: Examples of scatter plots of fitness vs. distance (FD plots) with superimposed lines of first-order regression. In the left plot r = 0.487; in the right plot r = 0.214.

5.2.2 Examples of analyses from the literature

Table 5.1 lists examples of the fitness-distance analysis that may be found in the literature. For each problem (in column 1) the authors of the analysis are noted, together with the year of the related publication (2). The next column (3) indicates the distance (or similarity) measure used in the analysis. Further, the applied sample size is given (4) and the kind of reference solution(s) to which distance was computed (5). Next, the analysed instances are briefly described (6) and the obtained values of FDC coefficients are given (7). The last column (8) indicates the classification of the result as given by the author(s) of the analysis; a '+' means that the fitness-distance analysis result was evaluated as positive (i.e. a 'big valley' exists), while a '−' indicates no significant FDC (no 'big valley'). Note that an empty entry in the table means the entry has exactly the same value as the one immediately above it (in the previous row).

It should be noted that not all of the listed analyses were conducted with the basic approach described earlier. The most important differences are in the reference solution(s) used to compute the distance of sampled solutions to (column 5). In many cases global optima of the analysed instances, which are required in the basic approach in order to compute $d_{opt}(s)$, were unknown. In these cases some other solution (e.g. the best-known) or a group of other solutions (all other local optima in the sample; all not worse in the sample) were used. Moreover, one can see that there were also major differences in sizes of samples. All these issues will be discussed later.

The author's first impression while looking at table 5.1 is that FDA is not a completely new analysis technique; it has already been performed a number of times and applied to a variety of optimisation problems. It has also been a subject of publications in serious journals and conference proceedings.

[Table 5.1: Fitness-distance analyses described in the literature. The table, spanning several pages in the original, covers: OneMax, Porcupine, Needle in a haystack and a 4-bit fully deceptive problem (Jones & Forrest 1995; Hamming distance); NK-landscapes (Merz 2000); the Graph Bipartitioning Problem (Boese et al. 1994, Merz 2000; a modified Hamming distance); the Travelling Salesman Problem (Boese 1995, Merz 2000, Hoos & Stutzle 2004; bond distance); the Flowshop Scheduling Problem (Reeves 1999, Watson et al. 2002; position, precedence and adjacency metrics); the Binary Quadratic Programming Problem (Merz 2000; Hamming distance); the Quadratic Assignment Problem (Merz 2000; Hamming distance for permutations); the Set Covering Problem (Finger et al. 2002, Hoos & Stutzle 2004; number of different elements); a real-world Vehicle Routing Problem (Jaszkiewicz & Kominek 2003; percentage of common features, 4 measures); the Job-shop Scheduling Problem (Beck & Watson 2003, Watson 2005; disjunctive graph distance); a satellite management problem (Jaszkiewicz 2004; percentage of common features, 5 measures); and the Optimal Golomb Ruler Problem (Cotta & Fernández 2005; 2 measures, one for each encoding). Sample sizes ranged from 50 up to 10000 solutions (sometimes not given); reference solutions were global optima, best-known solutions, all other local optima in the sample, or all not worse solutions. The individual FDC values and classifications are discussed in the text below.]

Significance of FDC

Looking at the values of the fitness-distance correlation one may notice that the values classified as revealing a 'big valley' structure in the landscape were usually moderate. Values of r as large as 0.4, or even 0.3, were deemed significant, e.g. for NK-landscapes, the TSP, the SCP.
Such values were usually backed up by visible trends in fitness-distance plots. However, from the statistical point of view, $r = 0.3$ is rather weak and means that only $r^2 = 0.09$, i.e. 9%, of the variance of one variable (e.g. $d_{opt}$) may be explained by changes in the second one (e.g. $f$) through a linear regression model (Ferguson & Takane 1989). For $r = 0.4$ the explained variability amounts only to $r^2 = 16\%$. And yet, such results were perceived as supporting the 'big valley' hypothesis about the related fitness landscape.

Dependence of FDC on instance type

Another important phenomenon visible in table 5.1 is the dependence of FDA results on the type of the analysed instances (column 6); the phenomenon was also noticed by Merz (2000). This means that the FDC value is not a characteristic of a problem, but of a problem instance. This dependence is clearly visible in almost all listed problems: NK-landscapes, the TSP, the JSP, the SCP and, perhaps to a lesser extent, in the GBP and the QAP.

In NK-landscapes one may see that the FDC value depends on K, the second of the two numbers denoting the instance type. In this problem, K determines the number of variables each of the N binary variables in the problem is dependent on. It appears as though values of K larger than 4 make FDC fall to zero.

In the TSP the borderline between instance types is less apparent. Instances based on geographic data (e.g. cities in the USA) usually give rise to high FDCs. The same happens for 'fractal' instances. However, real-world drilling problems result in instances with virtually no FDC and it is difficult to judge why this type is so different from the previous ones; perhaps many edges of the same length pose a problem to FDC.

Most benchmark instances of the quadratic assignment problem reveal no significant FDC values. However, random instances of a special type (low flow dominance) may have correlations larger than zero.

The set covering problem provides an interesting case (Finger et al. 2002, Hoos & Stutzle 2004). On one hand, 29 randomly generated benchmarks from the OR-Library usually exhibit correlations significantly larger than zero. On the other hand, 15 real-world instances stemming from applications in airline crew scheduling reveal no correlation of fitness and distance.

Small random instances of the job-shop scheduling problem usually have high FDCs, while well-known random benchmarks are divided in two with respect to this coefficient, Watson (2005) reports.

This dependence on the instance type is not visible for the real-world vehicle routing problem (Jaszkiewicz & Kominek 2003), the satellite management problem (Jaszkiewicz 2004) and the optimal Golomb ruler problem (in the last case there are no instance types, though; only instance size may vary). In all these cases all the analysed instances revealed high FDCs.

To summarise the issue, the dependence of FDC on the instance type usually exists, and it is not good news, because FDC is then not entirely a characteristic of a problem. Moreover, it is difficult for the author to say what actually decides that one instance has high fitness-distance correlation and another has not. As far as the author knows, there is no satisfactory explanation for this dependence in the literature. What is also important, some of the results covered here indicate that it might be dangerous to generalise about FD properties of a problem based only on the analysis of randomly generated instances.
Such instances may have quite different landscapes from their real-world counterparts, as was the case for some TSP instances or the SCP.

Dependence of FDC on distance measure

The contents of table 5.1 also confirm that a change in the distance measure strongly influences the properties of the analysed fitness landscape (it actually changes the whole landscape, as noted in section 5.1.6). This influence is clearly visible in the FDC values for the flowshop scheduling problem, for which 4 different distance measures were employed in the analysis. Two adjacency-based measures give rise to landscapes without 'big valleys', while the measures based on position and precedence relations let FDA reveal significant correlations. Also in the case of the real-world VRP and the satellite management problem a change in a similarity measure influences landscape properties. Here, this influence is perhaps less apparent due to the fact that all defined measures are positively correlated with fitness.

This dependence of FDC on a distance measure may be interpreted as good news: a problem instance may be analysed from different points of view (i.e. with different measures) and it is enough that one measure correlates with fitness to have a 'big valley' in the landscape.

Dependence of FDC on the type of sampled solutions

The computed value of FDC may depend on the type of algorithm used to generate solutions for a sample. Several authors experimented with different algorithms for the same instances and it appeared that such a change may somewhat alter the value of correlation. Merz (2000), for example, analysed TSP instances with two types of local optima: 3-opt and Lin-Kernighan. It transpired that the use of local optima of the second type usually resulted in higher FDCs, although the final 'big valley' status of an instance remained the same. He obtained similar results for the GBP, where the use of a more powerful local search also resulted in more artificial-looking fitness-distance plots. Another example may be the work of Reeves (1999). He experimented with 5 different types of local optima and his FDA results were, to some small extent, different for each of the types.

Distance between local optima

Table 5.1 could not fit the results concerning distances between local optima, so they will be discussed here. According to Boese (1995), Kirkpatrick & Toulouse (1985) had already noticed that local optima of the TSP were surprisingly similar (close) to one another. Their findings were confirmed by Boese et al. (1994) in the case of randomly generated instances on the unit square: the maximum distance between local optima was less than half of the average distance between random solutions. Later, Boese (1995) also confirmed this fact for one geographical TSP instance with 532 vertices. Merz (2000) extended these previous studies of the TSP by examining different types of instances. His findings were similar with respect to geographic and 'fractal' instances ($\bar{d} \leq 1/4$ of the search space diameter), but for one 'drilling' instance the average distance between local optima was rather large (0.41 of the diameter); note that he computed the average distance, as opposed to the maximum one used by Boese (1995).

Similar analysis was performed by Boese et al. (1994) on random GBP instances. There, they found that the maximum distance between local optima sometimes happened to be as large as the average distance between random solutions, quite contrary to the results for the TSP.
Merz (2000) obtained similar results for the same type of instances, but regular and grid-like graphs had more similar local optima. Interestingly, similar optima happened to exist in instances with larger FDC values.

The comparison of distances between local optima and random solutions was also performed on the JSP. In this case 'local optima are spread all over the fitness landscape', Mattfeld et al. (1999) concluded. Reeves (1999), on the other hand, concluded from a similar analysis of the flowshop scheduling problem that local optima seemed to be close to each other, although he did not dwell on the matter.

Merz (2000) also analysed the NK-landscapes, the BQP and the QAP from this point of view. His results were similar to the conclusions on FDC:

• NK-landscapes with K = 2 had very concentrated local optima; for K = 4 this concentration was smaller;
• NK-landscapes with K = 11 had local optima distributed in a similar way to a uniform distribution of points in the search space (no concentration);
• local optima of the BQP were highly clustered in the search space of all solutions (1/10 to 1/4 of the diameter);
• local optima of the QAP were spread all over the landscape, with one exceptional instance (the one with high FDC).

The real-world VRP analysed by Jaszkiewicz & Kominek (2003) had quite similar local optima from the point of view of 2 similarity measures, while the same solutions had rather low similarity with respect to the 2 other measures.

To summarise, one can see that the clustering of local optima is also dependent on the analysed problem, instance type and distance measure. In some problems local optima happened to be very close to one another, and then this landscape feature may be exploited in the designed algorithm.

Historical remarks

The works that probably inspired most of the literature on FDA are due to Kirkpatrick & Toulouse (1985), and Muhlenbein (1991). As far as the author knows, they did not consider measuring the correlation between solution quality and distance to a global optimum, but their focus on the similarity of local optima of the TSP and the related results convinced many researchers to follow their ideas.

The first computations of FDC are most probably due to Boese (1995), and Jones & Forrest (1995). Their works, published in the mid-1990s, are likely the most often cited on fitness-distance correlation. In the late 1990s and early 2000s, Merz and Freisleben contributed enormously to this area of research (Merz 2000, Merz 2001, Merz & Freisleben 1999, Merz & Freisleben 2000a, Merz & Freisleben 2000b). At the same time two important papers on flowshop scheduling appeared, by Reeves (1999), and Reeves & Yamada (1998). Afterwards, a number of other researchers also focused their work on fitness-distance analysis and its applications, convinced to some extent by these first results that statistical landscape analysis may be an area promising further improvement in the design of metaheuristics.

Position: 1 2 3 4 5 6 7 8 9 10
p0:       0 1 0 0 1 1 0 0 1 0
p1:       1 0 1 0 0 1 1 0 0 1
o:        1 1 0 0 0 1 1 0 0 0

Figure 5.2: An example of a case of a respectful recombination of parents p0 and p1 on a binary representation. Common assignments to positions of the parents and the offspring o are emphasised (positions 4, 6 and 8). This operator preserves the Hamming distance.
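The operation of figure 5.2 is easy to state in code. Below is a minimal Python sketch (the author's illustration; the random completion of non-common positions stands in for a HUX-style uniform choice). The assertions express the distance-preservation property discussed in the next section.

```python
import random

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def respectful_crossover(p0: str, p1: str) -> str:
    """Keep every bit on which the parents agree; fill the rest randomly.

    Consequently d_H(o, p0) <= d_H(p0, p1) and d_H(o, p1) <= d_H(p0, p1)."""
    return ''.join(a if a == b else random.choice('01')
                   for a, b in zip(p0, p1))

p0, p1 = '0100110010', '1010011001'   # the parents of figure 5.2
o = respectful_crossover(p0, p1)      # positions 4, 6 and 8 are inherited
assert o[3] == '0' and o[5] == '1' and o[7] == '0'
assert hamming(p0, o) <= hamming(p0, p1)
assert hamming(p1, o) <= hamming(p0, p1)
```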
5.3 Exploitation of fitness-distance correlation in a memetic algorithm

However, 'it is not sufficient to have a landscape with "nice" properties. It is necessary also to have a means of exploiting the landscape that makes use of these properties' (Reeves & Rowe 2003).

5.3.1 Design of respectful recombination

One of the major conclusions of Merz (2000) states that 'memetic algorithms should employ respectful recombination operators on correlated landscapes' because they can exploit this structure, i.e. FDC. The same opinion may be found in the paper by Jaszkiewicz & Kominek (2003).

Definition

In the case of binary representations, a respectful crossover may be one that preserves in an offspring the positions (bits) which have identical values in (are common to) its parents (Merz 2000) (see figure 5.2). Thinking more generally, a respectful operator preserves in an offspring all solution properties that are common to the parents, irrespective of how a property is actually defined: some value at a position in a vector, the presence of an edge in a tour, a precedence relation between jobs, etc.

From the perspective of measuring distance in terms of the same properties, all the common properties of parents do not contribute to the distance; only the different ones do. Thus, these different ones make the distance between parents larger than zero, $d(p_0, p_1) = d$; if there are no different properties in the parents, then the distance with respect to these very properties should be zero. Now, if a respectful crossover is performed, the resulting offspring $o$ always inherits the common properties of its parents. Therefore, the distance of this offspring to its parents, with respect to the same distance measure, may not be larger than the distance between the parents: $d(p_0, o) \leq d$ and $d(p_1, o) \leq d$. This is the reason why a respectful operator is also called a distance-preserving one (Merz 2000, Jaszkiewicz 2004, Jaszkiewicz & Kominek 2003).

The idea of distance preservation is intuitively shown in figure 5.3. Note that this figure is very similar to the one given by Merz (2000, page 58). Of course, such a figure is only an intuitive, 2-dimensional illustration of what happens; in combinatorial problems of reasonable instance size the search space usually has many more dimensions. Such a recombination operator does not generate arbitrary jumps in the search space, but always results in an offspring which is located in the region spanned between its parents.

[Figure 5.3: parents p0 and p1 at distance d(p0, p1), with the offspring o lying between them.] Figure 5.3: Intuitive picture of a case of a distance-preserving (respectful) crossover in 2-dimensional Euclidean space.

Respectful recombination exploits FDC

Merz explains in the conclusions of his thesis (2000) how a memetic algorithm with such a recombination operator can exploit fitness-distance correlation:

The most important property of landscapes on which MAs have been shown to be highly effective is the correlation of fitness of the local optima and their distance to the global optimum. MAs employing recombination are capable of exploiting this structure since with respectful recombination offspring are produced located in a subspace defined by the parents. These offspring are used as starting solutions for a local search that likely ends in a local optimum relatively near the starting point and thus near the parents. Due to the correlation structure of the search space this point has more likely a higher fitness than local optima more distant in an arbitrary direction from both parents.
Viewing the evolutionary process as a whole, recombination is capable of decreasing the distance between solutions (and the distance to the optimum) if the landscape is correlated, while fitness is increased by selection: the higher the fitness of the local optima and the closer they are in terms of distance, the more likely the optimum is found in the vicinity of those solutions.

This is not only Merz's point of view. Mattfeld et al. (1999) express a similar opinion when they discuss the applicability of adaptive search approaches to combinatorial optimisation problems. They say that in problems like the TSP, where many local optima are located near the global optimum (i.e. a 'big valley' exists), recombination operators like Merz's are a good option.

Respectful recombination is especially useful if there is high similarity between local optima of the considered problem (they are clustered in the search space). In this case it is very likely that the parents of recombination share many common properties, so that a large part of the constructed offspring is determined by the parents.

When not to use respectful recombination

There are cases when a respectful recombination operator should not be used. One of them happens when there is no correlation between fitness and distance to the search goal. Then, a mutation operator should be preferred to recombination, since directed jumps with recombination are aimless (Merz 2000). Another one is when the fitness landscape of the analysed problem is deceptive. In such a case an operator should be used which is disrespectful on purpose (Merz 2000). The last case listed by Merz (2000) happens when the analysis of a fitness landscape reveals that there is significant FDC, but local optima are too close to each other (he encountered such a landscape in the BQP). In these circumstances respectful recombination may frequently produce offspring identical to one of the parents. A mutation with a constant jump distance should be used in an MA for such a problem, Merz suggests.

Greedy choices to complete an offspring

When a distance-preserving crossover is used in an MA, the common parental properties alone are usually not sufficient to complete an offspring. This can be seen in figure 5.2: common parental properties (positions with the same values in this case) are not enough to define a complete offspring; some additional values have to be specified. This is quite important if local optima of the considered problem are not very similar to each other (the average distance between them is high, so usually there are few common properties), which usually happens at the beginning of a run of artificial evolution. Merz (2000) suggests that greedy choices may be used to complete an offspring with the common properties already in place, especially in problems with low epistasis. Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003) follow Merz's example.

Examples of respectful recombination

Not only does Merz (2000) argue in favour of the design of respectful crossover in the presence of a positive FDC, but he also gives numerous examples of such a design for the problems he investigated.

He designed and implemented two respectful crossovers for NK-landscapes: a modification of the uniform crossover (UX) which preserves the Hamming distance, called HUX, and a greedy recombination operator (GX), which besides distance preservation also employed the aforementioned greedy completion of an offspring.
Both of the operators, when used in a memetic algorithm, performed much better than competitors on NK-landscapes with positive FDC (K < 5). The competitors were: a multi-start local search and a genetic algorithm with uniform crossover, one-point crossover or mutation only.

Merz's experiments on the BQP also demonstrate that distance-preserving crossovers perform well in a memetic algorithm for a problem with high FDC. For this unconstrained binary problem he also employed the HUX respectful operator in a memetic algorithm and it appeared that it was better than good competitors (tabu search, simulated annealing) taken from the literature. Nevertheless, a mutation-based MA fared even better, due to the very high similarity of local optima in the BQP, as indicated above.

The experiments on the TSP support the idea of respectful recombination, as well. Merz (2000, 2002) designed two respectful operators for this problem. The distance-preserving crossover (DPX) preserved in an offspring all edges common to the parents and completed it with foreign (uncommon) edges in a greedy way. The generic greedy recombination operator (GX) was also a distance-preserving one with respect to the bond distance, but the completion of an offspring followed a slightly different procedure: more parental edges could be inherited. Computational experiments (Merz 2002) compared the two operators with the maximum preservative crossover (MPX; see also section 4.4.3) in the memetic algorithm framework and revealed that GX was the best, DPX the second best. These MAs were also better than iterated local search and a mutation-based MA. Respectfulness and a high inheritance rate of parental edges appeared to be the most important properties of a good crossover for the TSP.

Following the generally positive results of his fitness-distance analysis of the GBP, Merz implemented 3 respectful crossovers for the problem (Merz 2000, Merz & Freisleben 2000b): the uniform crossover with distance preservation, the HUX crossover and the greedy crossover (GX). The latter operator differs from the two former ones in that it uses a greedy procedure to complete an offspring once the common parental bits are in place. Experiments with memetic algorithms confirmed that this GX, when used together with mutation, was superior to the two others, and also to other metaheuristics (simulated annealing, tabu search).

The quadratic assignment problem was the last one investigated by Merz & Freisleben (2000a).
They used it in their memetic algorithm and compared to two other crossovers they implemented, which preserved some of the 4 features, but not all of them together. Computational experiments demonstrated that this DPX was the best, outperforming also some other metaheuristics (multi-start local search, simulated annealing and an evolutionary algorithm). Distance preservation in the crossover appeared to be crucial for good performance of the memetic algorithm for this VRP. Positive results of the fitness-distance analysis also convinced Jaszkiewicz (2004) to design a distance-preserving crossover for the satellite management problem he considered. There were 5 types of features which positively correlated with fitness in the problem, but one was dependent on some other one. Therefore, Jaszkiewicz designed an operator preserving all 4 independent features. He also implemented weaker versions of the operator, preserving fewer of the features, in order to test if the inheritance of all common features of parents in an offspring was essential. Results of his experiments showed that the more important types of features were preserved by an operator, the better its performance in the memetic algorithm was. Moreover, the memetic algorithm were better than iterated initial solution heuristic, demonstrating that the rational crossover design based on FDA was well worth the effort. Adaptation pattern: systematic design of a crossover operator for the memetic algorithm Examples from the literature and positive experience with crossover operator design based on fitness-distance analysis led Jaszkiewicz (2004) to the formulation of an adaptation pattern of the memetic algorithm to a combinatorial optimisation problem. The patter was formulated as1 : 1. Generate sets of good and diversified solutions for a set of instances of a given problem. 2. Formulate a number of hypotheses about solution features important for a given problem. 3. For each feature and each instance, test the importance of this feature with a correlation between the value of the objective and similarity of good solutions. The similarity is measured with respect to this feature. 4. Design distance preserving recombination operator assuring (or aiming at) preservation of common instances of features for which positive correlations were observed. The operator may preserve common instances of several features. (This pattern, although not directly formulated yet, was also followed by Jaszkiewicz & Kominek (2003) in their work on a real-world VRP ). 1 The list of steps in the pattern is a citation from Jaszkiewicz (2004) 5.3. Exploitation of fitness-distance correlation in a memetic algorithm 67 According to Jaszkiewicz (2004), the main goal of this pattern is to reduce the effort required to design a good optimisation algorithm, by avoiding not promising paths in the development of the algorithm. By effort he means the designer time and the time of computational experiments. Indeed, one certainly sees that in some cases the design of well-performing algorithms takes years (e.g. the TSP case, see comments by Jaszkiewicz & Kominek (2003)). By employing fitnessdistance analysis, which is actually the core of the pattern (points 1 through 3), the designer may discover which features are important for high quality of solutions (high FDC) and then design a recombination operator with this knowledge in mind, thus avoiding experiments with diverse operators in the trial-and-error manner. 
5.3.2 Adaptation of mutation

Merz (2000, 2004) says that mutation is especially useful in problems (instances) where no fitness-distance correlation is revealed (he calls them 'unstructured landscapes'). In such a case this operator simply makes jumps out of the basin of attraction of the current local optimum and enables the subsequent local search to find another one.

But he also states that mutation becomes a valuable operation if local optima in the landscape are very close to each other (in a structured landscape). In landscapes of this kind respectful recombination becomes ineffective, very often producing one of the given parents. Thus, mutation with a constant jump length appears to be more suitable. This point of view on the negative effect of recombination (convergence) and the need for more mutation is also expressed by Reeves & Yamada (1998) in their paper on flowshop scheduling.

What may also be deemed significant is that in the presence of high FDC Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003) did not consider the use of mutation in their memetic algorithms at all. It should be noted, though, that they had a small amount of time for computation and the danger of convergence was likely to be small in their cases.

Examples of application

The profitability of mutation was clearly visible in the experiments conducted by Merz (2000) on NK-landscapes with K ≥ 5, where FDC was negligible. He employed bit-flip mutation in his MA, with 3 bits being flipped simultaneously. This mutation-based MA was the best algorithm among all tested: the MA with HUX crossover, other GAs, and multi-start local search.

The same conclusion might be derived from Merz's experiments on the QAP (Merz 2000): for instances without significant FDC the memetic algorithm with mutation only was the best one. The mutation was a jump of a predefined distance in a random direction. All crossover-based MAs were worse. This conclusion was also arrived at by Hoos & Stutzle (2004).

The BQP was the problem with high FDC and very closely located local optima, where recombination became unproductive in later stages of the algorithm. Merz's (2000) experiments revealed that even in the genetic algorithm mutation was superior to uniform crossover (HUX); here, mutation flipped a large, constant number of bits, equal to the average distance between local optima. For the memetic algorithm, the version combining mutation and crossover was best, better than tabu search and simulated annealing, as well.

An interesting mutation operator for the flowshop scheduling problem was designed by Reeves & Yamada (1998). In their memetic algorithm they used an operator which required two parent solutions: one to be mutated and the other to be a reference point in the search space. The mutated solution was then modified in such a way that the mutant was further away from the reference point than the original parent. In other words, it was a guided mutation. This design was motivated by a path relinking viewpoint, which will be briefly discussed later. Their memetic algorithm was excellent in computational experiments, beating the competitors and producing some new best-known solutions at the time (although this performance was also due to other components of their algorithm).

5.3.3 Adaptation of local search

Jaszkiewicz (1999) extended the idea of respectful recombination to local search.
He argued that if the problem to be solved exhibits high fitness-distance correlation, then it is very likely that the common parental properties which have been inherited by an offspring during crossover are also common to other good solutions. Therefore, these properties should not be modified during local search, which is launched after each recombination. This leads to the idea of moves in local search which are forbidden (locked) after recombination. Merz (2000) also considered such a technique. He concluded that it also resulted in significantly faster local search, due to smaller neighbourhoods to be evaluated.

What Jaszkiewicz (1999) also noticed was that such local search with forbidden moves might result in offspring which were not local optima in the original, unconstrained landscape. That is why he also considered a two-phase local search after respectful recombination: the first phase with forbidden moves, the second without them (the ordinary local search).

Ideas similar to that of locked LS moves may also be found in Boese's paper (Boese 1995), where he also considered forbidding certain moves based on the fitness-distance analysis.

This technique is supposed to improve the MA's performance if there is high similarity between local optima in the fitness landscape. In such a case the number of locked moves is likely to be very high after each recombination, accelerating local search considerably.

Yet, Moscato & Cotta (2003) advise exactly the opposite approach: to enable local search to change components of an offspring which are common to the parent solutions. They argue that the global optimum may be completely different from local minima, so common properties should be modified. The author of this thesis thinks that although global optima may indeed happen to be like this when we do not know them, the strategy proposed by Moscato and Cotta seems to be a wrong choice when positive fitness-distance correlation is present and there is therefore justified hope that global optima share many common properties with local ones. That technique of theirs may be useful, though, where there is no fitness-distance correlation in the considered problem.

Examples of application

Merz (2000) experimented with the technique of forbidden moves on several problems: the BQP, the TSP, the GBP and the QAP. In the case of the TSP he concluded that it led to significantly reduced computation time, with very good quality of results. The very same conclusion resulted from experiments on the QAP. In the GBP he also obtained increased performance of the MA (70%–100% more generations could be computed within the same time limit), but at the cost of a slight deterioration in the quality of final solutions.

The TSP was also the subject of Jaszkiewicz's (1999) investigations. The one-phase approach in his MA also led to a significant increase in performance; as much as 4 to 8 times fewer evaluations of the objective function had to be performed to obtain the same quality of results as with unconstrained local search. Jaszkiewicz also noticed a slight deterioration in the quality of generated solutions when the same population size was used, but concluded that with a larger population for the one-phase approach the final quality was the same, while still reducing the MA's running time considerably.

His two-phase approach (first, forbid some moves; second, use the unconstrained neighbourhood) resulted in smaller accelerations but better quality.
Thus, it was intermediate between the ordinary, unconstrained version and the one-phase technique.

Jaszkiewicz also employed the one-phase technique in his memetic algorithm for the satellite management problem (Jaszkiewicz 2004). The recombination operators which locked some of the properties of an offspring were the best performing ones.

5.3.4 Adaptation of other components

Another possibility of exploiting fitness-distance correlation is to embed in the memetic algorithm a path relinking procedure, as Reeves & Yamada (1998) indicate. By path relinking (path tracing) they mean 'tracing local optima step by step, moving from one optimum to a nearby slightly better one, without being trapped'. These authors put this idea to work in their MA for the flowshop scheduling problem (Reeves & Yamada 1998) and embedded a path relinking procedure into the crossover operator they designed. In the crossover the common elements of the parents were always inherited (it was thus a respectful crossover), but the choice of the remaining ones was up to the relinking procedure. It started from one parent and moved step by step toward the other, thus exploring the space between them in an organised fashion. 'The results of this approach (...) were far superior to those of methods based on simpler crossover operators, such as PMX and others' (Reeves & Rowe 2003). Reeves & Yamada (1998) conclude their paper by saying that this embedded path relinking procedure may be fruitful if 'big valley' is a feature of many combinatorial optimisation problems.

5.4 Variants of the fitness-distance analysis

Not all the analyses conducted in the past were performed with the basic approach described earlier in section 5.2.1. In fact, there are multiple variants of the analysis present in the literature, the most important difference being the way distance is aggregated.

5.4.1 Analysis with only one global optimum known

Reeves (1999) was probably the only one to have considered the possible difference in values of the FDC when multiple global optima are substituted with only one of them. He cautiously concluded from his experiments on the flowshop scheduling problem that the results of the analysis with one optimum only were the same as with multiple optima available (when the distance $d_{opt}$ to the nearest one is computed). Obviously, Reeves could not have generalised his results to all problems, having performed the comparison on one problem only. However, the author thinks that it is likely that a high FDC from the analysis with one optimum will translate to a high FDC when multiple optima are used.

5.4.2 Analysis with the distance to the best-known solution

Nevertheless, in practice it is much more probable that none of the global optima of the considered problem is known in advance; the knowledge of any global optimum could imply the problem is already solved and there is no need to analyse it in order to design an algorithm. Therefore, most of the variants of FDA focus on some substitution of global optima with other reference points, so the analysis can be applied in practice.

One of the variants uses the best-known solution instead of global optima, and $d_{opt}$ is substituted with $d_{best}$; the formula for FDC stays the same. This variant of FDA was performed by Boese et al. (1994) for the TSP and the GBP. Also Merz (2000) conducted this type of analysis for some instances of the GBP, NK-landscapes, the BQP and the QAP. This approach was also followed by Finger et al. (2002) in the case of the SCP.
Interestingly, some of these results were confirmed with analyses where global optima were available for the same types of instances:

• results of Boese et al. (1994) concerning a random TSP instance and regular graphs in the GBP were confirmed by Merz (2000);
• Merz's (2000) analysis of NK-landscapes with large N may to some extent be reinforced by the earlier results of Jones & Forrest (1995), who experimented with much smaller values of N.

Although not all analyses performed with this variant were verified by FDAs with a known global optimum, the cases listed above may be the basis for a cautious hope that such concordance of results exists in other problems and types of instances.

But despite this hope a question must be asked whether in the fitness-distance analysis one may freely substitute global optima with some other reference point. The author expects the answer to this question to be generally negative; the chosen reference point should ideally be an extremely good solution which is also close to some global optimum (Reeves & Rowe 2003). This cannot be verified when a global optimum is unknown, though. Therefore, the reference point should be chosen very carefully, since its form may influence the result of the analysis. Were it distant from the unknown optimum, the result might be deceptive, meaning that local optima of increasing quality may tend to be more similar to the reference point, but at the same time more distant from the global optimum. This is an inherent danger of FDA with unknown global optima.

5.4.3 Analysis with the average distance to all other local optima

Another approach to FDA with global optima not known is to use, instead of $d_{opt}$, the average value of the distance of each sampled solution to all others in the sample:

$$d_{avg}(s_i) = \frac{1}{n-1} \sum_{j=1, j \neq i}^{n} d(s_i, s_j)$$

This type of analysis was performed by Boese et al. (1994) on the TSP and the GBP, besides the previous type. Boese (1995) compared this variant with the basic approach in the case of a geographical TSP instance. The same comparison was conducted by Reeves & Yamada (1998) for the flowshop scheduling problem. Watson et al. (2002) employed only this variant.

There are dangers of this approach, though. The values of $d_{opt}$ and $d_{avg}$ are not equivalent with respect to the size of the samples they are based on: $d_{opt}$ is based on only one observation, while $d_{avg}$, being an average, on $(n-1)$ observations. This way the variance of each $d_{avg}(s_i)$ is substantially decreased compared to $d_{opt}(s_i)$ (the operation of taking the average usually decreases variance). Moreover, the effect of this reduced variance may be visible in the computed value of FDC: correlations of $f$ with $d_{avg}$ may be higher than those with $d_{opt}$. In fact, there is such an effect in all cases where comparison is possible: in the study of the TSP by Boese (1995) and the analysis of the flowshop scheduling problem by Reeves & Yamada (1998). There the values of correlations when $d_{opt}$ is used are usually lower by 0.1–0.2, which is a substantial value given the fact that even small correlations may be deemed significant. Moreover, the scatter plots presented in these works clearly show the decreased variance of $d_{avg}$ compared to $d_{opt}$.

What may also be important, there is doubt whether the values of $d_{avg}$ in the sample are actually realisations of independent random variables, as is usually assumed while computing Pearson's correlation coefficient.
Despite the two important drawbacks of this FDC variant, the author thinks it has one advantage over the variant with dbest: the reference point is not a single solution, but the whole sample and, therefore, it is less likely that the result of this analysis leads away from the best solutions. The best-known solution can always be added to the sample, anyway, which introduces only a tiny bias, the sample being usually large.

5.4.4 Analysis with the average distance to not worse solutions

A different way of substituting the global optimum was employed by Jaszkiewicz & Kominek (2003); instead of davg they computed the average distance of solution si to all not worse solutions in the sample. Assuming that all sampled solutions have different evaluations and are sorted with the best one at the beginning (a minimisation problem), f(s1) < f(s2) < . . . < f(sn), this quantity is computed as:

d_better(s_i) = (1 / (i − 1)) · Σ_{j=1}^{i−1} d(s_i, s_j)   for i > 1

and stays undefined for the best solution in the sample, s1. FDC is computed between f and dbetter in the sample without s1.

However, with this approach the problem of variances arises as well: they are heterogeneous. For example, dbetter(s2) is based on one observation, while dbetter(s101) on 100 and dbetter(s501) on 500 of them. Thus, especially the first elements of the sorted sample will have high variability of dbetter. This phenomenon may impact the correlation coefficient and the scatter plot in a hardly predictable way, and may introduce artifacts into the result that have no counterpart in the analysed landscape.

Due to this fact Jaszkiewicz (2004) proposed another modification. In this study the average distance was not computed from si to the not worse solutions, but within the whole set of solutions not worse than f(si):

dˆ_better(s_i) = dˆ_better(f(s_i)) = (2 / (i(i − 1))) · Σ_{j=1}^{i−1} Σ_{k=j+1}^{i} d(s_j, s_k)   for i > 1

This way the size of the sample behind each dˆbetter(si) was made larger than for dbetter(si), but the problem of unequal variances remained unsolved. To address this problem, Jaszkiewicz proposed to remove the best 20 solutions from the sample after dˆbetter was computed. Hence, the smallest sample of distances actually used to compute the aggregated distance would be approximately of size 200, quite a large size for a sample. Nevertheless, the problem of unequal variances in the other part of the sample still exists, though it is less visible. Also, the removal of a number of best solutions may to some extent decrease the estimated value of correlation; this is called univariate selection in statistics (Ferguson & Takane 1989). Moreover, the most interesting group of solutions (the best ones) is removed from the sample in an arbitrary manner. All these issues raise doubt as to the objectivity of this approach.

5.4.5 Tests for the value of the FDC

Some authors applied statistical tests in their attempts to objectively assess the significance of fitness-distance correlation.

Classical tests

One such test that at first sight may seem appropriate is the test employing the t statistic (Ferguson & Takane 1989, Krysicki et al. 1998):

t = r · sqrt((n − 2) / (1 − r²))

where r is the value of the correlation coefficient computed from the sample of n solutions.
With the null hypothesis being H0: ρ = 0, the t statistic is supposed to have the t distribution with n − 2 degrees of freedom. It is likely that Boese (1995) applied this very test in his TSP study, although he provided neither the null hypothesis nor the formula of the statistic. However, this test assumes that the involved variables follow the bivariate normal distribution (Krysicki et al. 1998), which is a strong assumption that was not even reported to have been checked.

There is another test for the significance of a correlation coefficient, which does not require this assumption to be met (Krysicki et al. 1998). Under the null hypothesis H0: ρ = 0 and the alternative H1: ρ ≠ 0, the statistic:

χ² = n · r²

has the χ² distribution with 1 degree of freedom. There is an additional requirement that the sample be large (at least several hundred elements), but this is easily met by the samples usually employed in FDA.

But this test also has drawbacks, as do all tests which require a simple random sample to be drawn. This was noticed first by Reeves (1999): the values in the sample appear to be dependent; ‘for example, if local optima A and B are close in terms of their objective function values, and B is also close to C, then so are A and C’. Hence, if a classical test were used to assess the significance of correlation, the result could be wrong because of an improperly chosen model for the real-world situation; this is sometimes called the type III error in statistics (Rao 1989).

Randomisation test

Reeves (1999) proposed a different approach to testing: a randomisation test. He reported that a similar problem of assessing the significance of correlation between distance matrices had been studied in psychology and biology, and solved by Mantel’s test (Mantel 1967, Manly 1997).

The test starts with two distance (similarity) matrices, Xij, Yij. In the case considered here, one of them could hold fitness differences and the other values of the employed distance measure. It is assumed that Xii = Yii = 0 for all i. The null hypothesis is:

H0: there is no clustering of objects with respect to both X and Y

and the alternative:

H1: there is significant clustering,

the notion of clustering meaning that there are objects which are close to one another in terms of both X and Y.

The test statistic is:

Z = Σ_{i=1}^{n} Σ_{j=1}^{n} Xij · Yij

The distribution of the Z statistic under the null hypothesis is not established theoretically, as in classical tests, but by means of randomisation. If the null hypothesis were true, it should be of no importance which values of Xij are paired with which values of Yij; there should be no clustering in any pairing, and any permutation of labels (values) of Xij should give the same ‘no clustering’ result in the value of Z. Thus, a large number of random permutations of one matrix (say, Xij) is generated, for each of them the value znull of Z is obtained, and the set of these values defines the null distribution of the Z statistic. Now the value z obtained from the original data is computed and compared to the null distribution. For a positive z, the fraction of values for which the condition znull ≥ z ∨ znull ≤ −z holds is the estimated probability that the null hypothesis is true (the two-tailed test). If this fraction is smaller than the initially assumed level of significance α (usually α = 0.05 or α = 0.01), the null hypothesis is rejected in favour of the alternative one: there is clustering of the observed objects with respect to distances in X and Y.
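A minimal Python sketch of this randomisation scheme (the author’s illustration; the matrices X, Y and the number of permutations are assumed inputs, not values from Reeves’ study):

import numpy as np

def mantel_test(X, Y, n_perm=1000, seed=None):
    """Two-tailed Mantel randomisation test for square matrices X and Y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    z_obs = np.sum(X * Y)                    # Z statistic on the original data
    z_null = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)               # random relabelling of the objects of X
        z_null[k] = np.sum(X[np.ix_(p, p)] * Y)
    # fraction of null values at least as extreme as the observed one
    p_value = np.mean((z_null >= abs(z_obs)) | (z_null <= -abs(z_obs)))
    return z_obs, p_value

If the returned p_value is below the assumed α, the hypothesis of no clustering would be rejected.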
Reeves (1999) employed this test in his analyses and reported having found numerous significant correlations between fitness and two distance measures (see table 5.1). Also Watson et al. (2002) reported having used the test to check the significance of correlation between fitness and davg, but specifics were not given.

This test, like all statistical tests, provides a means of checking whether the observed objects are really clustered with respect to fitness and distance, or whether the values of f and d happened to look like this merely by chance. As such, the test does not provide an assessment of the practical utility of the observed phenomenon.

5.4.6 Analysis of a set of pairs of solutions

The author of this thesis proposes yet another approach to the fitness-distance analysis. Since there is doubt concerning the independence of sampled fitnesses and distances in most of the other approaches, he proposes to draw a sample of pairs of solutions P = (P1, P2, . . . , Pn), where Pi = (si1, si2) contains two independently drawn solutions (local optima). For each element Pi of the sample three quantities are computed:

f1(Pi) = f(si1),  f2(Pi) = f(si2),  d(Pi) = d(si1, si2)

and, consequently, three correlation coefficients r(f1, d), r(f2, d) and r(f1, f2), with the first two being fitness-distance correlations. In order to get one fitness-distance relationship indicator from this kind of sample, one should compute the aggregated effect of the two:

r² = r²(f1, d) + r²(f2, d)

which is the coefficient of determination between distance and both fitnesses. The two squared correlations may simply be added, because the variables f1 and f2 are independent in the sampling model described above. This may be verified by checking the value of r(f1, f2), which must not differ significantly from zero. If a coefficient comparable to the FDC values of the other approaches were required, one may take the square root of the proposed determination coefficient.

This coefficient was chosen as an F-D relationship indicator because it has a sound interpretation in statistics (Ferguson & Takane 1989). It may be interpreted as the fraction of variance of one variable (here, d) that may be explained by variation in the other variables (here, f1 and f2) in a 3-dimensional linear regression model.

In this approach a scatter plot may also be examined, although it is a 3-dimensional one. A 2-dimensional plot may be obtained by cutting a slice of the original, e.g. along the plane f1 = f2 of pairs of solutions with the same fitness.

The proposed sampling model has several positive statistical features:

• the measured quantities are sampled independently in each pair Pi;
• the distributions of d(Pi) are exactly the same for all i; there is no difference in variances;
• there is no aggregation of distance in the sampling procedure, so there is no artificial increase in the values of correlation coefficients.

However, as in all approaches where global optima are unknown, the result of this one might be deceptive, i.e. there may be a visible trend that better solutions are more similar to each other than worse ones, but at the same time the similarity to the (unknown) global optima does not have to increase.
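A minimal Python sketch of this sampling model (the author’s illustration; sample_local_optimum, fitness and dist stand for problem-specific routines):

import numpy as np

def pairs_fdc(sample_local_optimum, fitness, dist, n=1000):
    """F-D indicator from a sample of n independently drawn pairs of local optima."""
    pairs = [(sample_local_optimum(), sample_local_optimum())
             for _ in range(n)]
    f1 = np.array([fitness(a) for a, b in pairs])
    f2 = np.array([fitness(b) for a, b in pairs])
    d = np.array([dist(a, b) for a, b in pairs])
    r_f1d = np.corrcoef(f1, d)[0, 1]
    r_f2d = np.corrcoef(f2, d)[0, 1]
    r_f1f2 = np.corrcoef(f1, f2)[0, 1]   # should not differ significantly from zero
    r2 = r_f1d ** 2 + r_f2d ** 2         # the proposed determination coefficient
    return r2, r_f1d, r_f2d, r_f1f2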
5.4.7 Comparison of all approaches

Table 5.2 gathers all the discussed approaches to the fitness-distance analysis and their indicated negative features. One can see in the table that the last two approaches, Mantel’s randomisation test and the sampling of pairs of solutions, have the fewest negative features of all the approaches which may be applied to problems where global optima are unknown. Therefore, one of the two should be employed in practice. But whenever possible, results obtained with these approaches should be confirmed with the result of Mantel’s test with known global optima.

Table 5.2: Comparison of features of all the discussed approaches to the fitness-distance analysis.

feature \ approach             | dopt | dbest | davg | dbetter | dˆbetter | Mantel’s test with glob. opt. | Mantel’s test without glob. opt. | sample of pairs
global optimum required        | yes  | no    | no   | no      | no       | yes                           | no                               | no
one arbitrary reference point  | no   | yes   | no   | no      | no       | no                            | no                               | no
decreased variance of d·       | no   | no    | yes  | yes     | yes      | no                            | no                               | no
unequal variance of d·         | no   | no    | no   | yes     | yes      | no                            | no                               | no
univariate selection           | no   | no    | no   | no      | yes      | no                            | no                               | no
probably dependent d· or f     | yes  | yes   | yes  | yes     | yes      | no                            | no                               | no
possibly deceptive result      | no   | yes   | yes  | yes     | yes      | no                            | yes                              | yes

5.5 Summary and conclusions

5.5.1 Fitness-distance analysis

Fitness-distance analysis explores a very interesting aspect of a search space of a combinatorial optimisation problem: the relationship between the quality of solutions and their distance (either mutual or to a global optimum). If such a positive relationship exists, meaning that better solutions tend to be closer to each other and to the global optimum (a high FDC for a minimisation problem), it justifies the introduction into a metaheuristic algorithm of some distance-preserving components which should improve the algorithm’s performance. The concept that some components of an algorithm may be improved based on a measure of distance between solutions which does not depend on fitness had not been explicitly considered before FDA was proposed.

However, the author of this thesis admits that FDA is not yet a properly developed method of analysis. There is no good mathematical model of what a ‘big valley’ actually is, and FDC is not a parameter of such a model; at the moment FDC is simply a descriptive statistic of an intuitively understandable, yet vaguely defined, phenomenon of a ‘big valley’. Secondly, the result of this analysis is in the form of a linear correlation coefficient and a fitness-distance scatter plot. The rules for interpretation of this result are arbitrary to some extent and differ between authors. Some say that even relatively small values of FDC and F-D plots with just a trace of a positive trend are valuable and may be profitable if exploited in an algorithm. Thirdly, as noted in section 5.2.2, the FDA result strongly depends on the analysed landscape: the fitness function and the measure of distance. It is also dependent on a problem instance, not on the problem itself. Therefore, the result may be ambiguous for one problem. Fourthly, there were multiple versions of the analysis procedure proposed in the literature and there is no standard for it yet. This is not a ready-to-use method of analysis. Fifthly, the practical usefulness of the analysis requires that it be applicable to problems with unknown global optima.
Yet, without them the analysis may happen to provide wrong guidance: better solutions may tend to be more similar to each other and to some arbitrary reference point, but at the same time more distant from the search goal.

What is also important, and was not discussed in this chapter, is that there are some arguments against FDC as a reliable indicator discerning between problems. For example, Reeves & Rowe (2003) show that there is a formula for modifying a landscape in a way which is undetectable to FDC. Although it is difficult to say how a landscape actually changes after such a modification (is it harder to optimise?), this argument casts doubt on FDC as a predictor of problem difficulty for evolutionary algorithms.

Yet, despite all these arguments against FDA it should be said that in the past this method of analysis provided undeniably positive results in relatively numerous cases. Thus, it is hard to deny the existence of the phenomenon of a ‘big valley’, even if the method of analysis has weaknesses. The author of this thesis expects the ‘big valley’ to be present in other, not yet analysed landscapes.

5.5.2 Exploitation of FDC in metaheuristic algorithms

There are doubts concerning the link between FDC and problem difficulty for some metaheuristics, e.g. genetic or memetic algorithms (Bierwirth et al. 2004, Hoos & Stutzle 2004). This link was only to some extent qualitatively confirmed by the works listed in table 5.1. There is some pioneering work on this subject by Watson et al. (2003), but it concerns the JSP and tabu search only. Therefore, it seems that there remains much to be done in order to clarify whether the link exists and is strong, although many seem to believe it does exist.

The author of this thesis believes there is a strong link between FDC and the efficiency of the memetic algorithm with certain algorithmic components. This belief is based mainly on the results of Boese et al. (1994), Reeves & Yamada (1998), Merz (2000, 2004), Jaszkiewicz (2004) and Jaszkiewicz & Kominek (2003), which convince the author that components based on ideas of distance-preservation are highly effective in the presence of a positive FDC (although some other types of components may also be able to exploit it). That is why the adaptation of the memetic algorithm to two chosen problems of combinatorial optimisation, which is described in the next chapters, is based on results of the fitness-distance analysis.

Chapter 6

The capacitated vehicle routing problem

6.1 Problem formulation

The capacitated vehicle routing problem (CVRP) was informally described in section 2.1.2. More formally, let G(V, E) be an undirected graph, where V = {v0, v1, . . . , vN}, N ≥ 1, is the set of vertices; v0 represents the depot, while the other vertices are customers. Let c be the function of cost of an edge in G:

c : E → R+ ∪ {0}

and d be the customers’ demand function:

d : V → R+

where the demand of the depot is set to zero: d(v0) = 0. It is also assumed that ∀ v ∈ V : d(v) ≤ C, where C > 0 is the capacity constraint of a vehicle. Under these assumptions the quadruple I = (G, c, d, C) is an instance of the CVRP.

A solution s of the CVRP is a set of T(s) routes: s = {t1, t2, . . . , tT(s)}. A route has the form ti = (v0, vi,1, vi,2, . . . , vi,n(ti)) for i = 1, . . . , T(s), where n(ti) is the number of customers in route ti, the following constraints being satisfied:

∀ i ∈ {1, . . . , T(s)} ∀ ki ∈ {1, . . . , n(ti)} : vi,ki ∈ V \ {v0}   (6.1)

∀ i, j ∈ {1, . . . , T(s)} ∀ ki ∈ {1, . . . , n(ti)}, kj ∈ {1, . . . , n(tj)} : (i ≠ j) ∨ (ki ≠ kj) ⇒ (vi,ki ≠ vj,kj)   (6.2)

∀ v ∈ V \ {v0} ∃ i ∈ {1, . . . , T(s)}, ki ∈ {1, . . . , n(ti)} : v = vi,ki   (6.3)

∀ ti ∈ s : Σ_{ki=1}^{n(ti)} d(vi,ki) ≤ C   (6.4)

Constraint 6.1 means that each vertex of each route (except for v0, the depot) represents some customer; constraints 6.2 and 6.3 require that each customer is visited exactly once in a solution; condition 6.4 ensures that the sum of demands of customers on each route (serviced by one vehicle) does not exceed the capacity constraint.

Let S denote the set of all feasible solutions of the form s. Let f denote the function of cost of a solution (the cost of all edges traversed by vehicles):

f : S → R+ ∪ {0}

f(s) = Σ_{ti ∈ s} ( c(v0, vi,1) + c(vi,n(ti), v0) + Σ_{ki=1}^{n(ti)−1} c(vi,ki, vi,ki+1) )   (6.5)

The goal of the CVRP is to find such sopt ∈ S that ∀ s ∈ S : f(sopt) ≤ f(s).
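To ground this formulation, the following small Python sketch (the author’s illustration; the cost matrix, demands and the customer list are assumed inputs) evaluates a solution according to equation 6.5 and checks constraints 6.2–6.4; the depot is vertex 0 and each route is a list of customer indices:

def route_cost(route, cost):
    """Cost of one route: depot -> customers in the given order -> depot (eq. 6.5)."""
    legs = [0] + route + [0]                 # vertex 0 is the depot
    return sum(cost[a][b] for a, b in zip(legs, legs[1:]))

def total_cost(solution, cost):
    """Total cost f(s) of a set of routes."""
    return sum(route_cost(r, cost) for r in solution)

def is_feasible(solution, demand, capacity, customers):
    """Each customer visited exactly once (6.2, 6.3), within capacity (6.4)."""
    visited = [v for route in solution for v in route]
    return (sorted(visited) == sorted(customers)
            and all(sum(demand[v] for v in route) <= capacity
                    for route in solution))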
The CVRP is NP-hard in the strong sense (Toth & Vigo 2002a), since it contains the TSP as a subproblem. Even for some relatively small instances of the CVRP with only 75 customers global optima remain unknown.

Informally speaking, the problem contains two subproblems (Laporte & Semet 2002, Altinel & Oncan 2005):

1. The grouping (clustering, bin-packing) subproblem: the goal is to partition the set of all customers into well-packed subsets which later form separate routes.
2. The routing subproblem: the goal is to solve the TSP in each group (cluster) of customers.

6.1.1 Versions and extensions

There are several versions of the basic problem (Toth & Vigo 2002a):

• a constraint on the maximum distance travelled by a vehicle may be set;
• some additional cost is specified for the use of a vehicle and another objective is to minimise the cost of vehicles in a solution;
• a fleet of diverse vehicles may be available;
• routes with single customers may be prohibited.

Moreover, there are also some extensions of the vehicle routing problem considered in the literature. These are, among others, the problems with: time window constraints, split deliveries, multiple depots, pickup and delivery (Toth & Vigo 2002a).

The basic CVRP was also considered from the multi-objective point of view (Jozefowiez et al. 2007), with one objective being the usual cost objective (as defined by equation 6.5). The other was the equilibrium objective, where the difference between the cost of the longest and the shortest route should be minimised.

None of these variants and extensions is considered in this thesis.

6.2 Instances used in this study

Instances used in this study (see table 6.1) are taken from Taillard’s website (Taillard 2008), but they have been used in many studies as benchmarks and come from several sources (according to Toth & Vigo (2002a)):

• c50, c75 and c100 are taken from a work by Christofides and Eilon,
• c100b, c120, c150 and c199 originated from Christofides, Mingozzi and Toth,
• f71 and f134 are the two larger of the 3 Fisher instances,
• all 13 tai* instances come from Taillard (Rochat & Taillard 1995).

All these instances represent the Euclidean CVRP (vertices are located in a plane). Their basic properties are listed in table 6.1.
Column ‘name’ lists the names of these instances as given by Taillard (2008), while column ‘name in the VRP’ gives the names used in Toth & Vigo (2002a), which have also been used in numerous CVRP studies (the names marked with † were created by the author of this thesis following the convention used in (Toth & Vigo 2002a), because this book did not consider those instances). The column ‘best-known cost’ gives the cost of the best-known solution for each instance; an asterisk (∗) marks costs that are known to belong to globally optimal solutions. The column ‘best-known status from’ references the works which were the author’s source of information about the best-known solutions.

Table 6.1: Basic information about the used instances of the CVRP.

name     | name in the VRP | size (N) | best-known cost | best-known status from
c50      | E051-05e        | 50       | ∗524.61         | (Prins 2004, Gendreau et al. 2002)
c75      | E076-10e        | 75       | 835.26          | (Prins 2004, Gendreau et al. 2002)
c100     | E101-08e        | 100      | 826.14          | (Prins 2004, Gendreau et al. 2002)
c100b    | E101-10c        | 100      | ∗819.56         | (Prins 2004, Gendreau et al. 2002)
c120     | E121-07c        | 120      | 1042.11         | (Prins 2004, Gendreau et al. 2002)
c150     | E151-12c        | 150      | 1028.42         | (Prins 2004, Gendreau et al. 2002)
c199     | E200-17c        | 199      | 1291.29         | (Ho & Gendreau 2006)
f71      | E072-04f        | 71       | ∗241.97         | (Toth & Vigo 2002b)
f134     | E135-07f        | 134      | ∗1162.96        | (Toth & Vigo 2002b)
tai75a   | †E076a10t       | 75       | 1618.36         | (Alba & Dorronsoro 2006)
tai75b   | †E076b10t       | 75       | 1344.62         | (Alba & Dorronsoro 2006)
tai75c   | †E076c09t       | 75       | 1291.01         | (Alba & Dorronsoro 2006)
tai75d   | †E076d09t       | 75       | 1365.42         | (Alba & Dorronsoro 2006)
tai100a  | †E101a11t       | 100      | 2041.34         | (Alba & Dorronsoro 2006)
tai100b  | †E101b11t       | 100      | 1939.90         | (Alba & Dorronsoro 2006)
tai100c  | †E101c11t       | 100      | 1406.20         | (Alba & Dorronsoro 2006)
tai100d  | †E101d11t       | 100      | 1581.25         | (Alba & Dorronsoro 2006)
tai150a  | †E151a15t       | 150      | 3055.23         | (Alba & Dorronsoro 2006)
tai150b  | †E151b14t       | 150      | 2656.47         | (Alba & Dorronsoro 2006)
tai150c  | †E151c15t       | 150      | 2341.84         | (Alba & Dorronsoro 2006)
tai150d  | †E151d14t       | 150      | 2645.39         | (Alba & Dorronsoro 2006)
tai385   | †E386-47t       | 385      | 24431.44        | (Alba & Dorronsoro 2006)

6.3 Heuristic algorithms for the CVRP

The capacitated vehicle routing problem was formulated in the 1950s, and since then a large variety of exact, heuristic and metaheuristic algorithms have been proposed to solve it. Therefore, this review will focus on algorithms that have some importance for this thesis: evolutionary algorithms, tabu search (because this type provided the best-known solutions in the past) and others that were implemented by the author. For a broader review the reader is referred to the survey by Aronson (1996) or the monograph by Toth & Vigo (2002b).

6.3.1 Savings algorithm by Clarke and Wright

Clarke & Wright’s (1964) heuristic was one of the first formulated for the problem. The algorithm starts with a ‘daisy-shaped’ solution, i.e. one where each customer is put in its own route. Then it uses a merge move to improve the solution; a merge removes the last edge of one route and the first edge of another route, and merges the two routes into one by creating a direct link between the customers whose edges were removed. The merge actually performed is the one with the largest saving (i.e. improvement in cost) among all feasible route merges. The algorithm stops when there is no feasible and improving merge. This heuristic is more formally described in algorithm 6.

Algorithm 6 The savings algorithm (the parallel savings version).
s = ∅ {an empty solution}
for all customers v do {build a separate route for each customer}
  t = (v0, v)
  s = s ∪ t
repeat {find and perform the merge of two routes with the largest saving}
  bestMergeCost = 0
  for all routes ti in solution s do
    for all routes tj in solution s, tj ≠ ti do
      s′ = mergeRoutes(ti, tj, s) {s′ is s with routes ti and tj merged}
      if isFeasible(s′) then {check if it is feasible to merge ti and tj}
        if f(s′) − f(s) < bestMergeCost then
          bestMerge = s′ {remember this better merge}
          bestMergeCost = f(s′) − f(s)
  if bestMergeCost < 0 then
    s = bestMerge {perform the best-found merge}
until bestMergeCost == 0
return s

This original version of the method is deterministic and provides only one solution to each CVRP instance. Laporte & Semet (2002) and Altinel & Oncan (2005) say this heuristic is one of the most popular in commercial routing packages, because it is fast, relatively easy to implement and easy to adapt to more complex versions of the VRP. Perhaps this is the reason why it has been modified and enhanced many times. The basic version itself may be implemented in two ways (sequential or parallel savings). Moreover, the saving of a merge may be parametrised (Laporte & Semet 2002, Altinel & Oncan 2005) in order to give preference to merging: customers near each other, customers approximately equidistant from the depot, or customers with large demands. Finally, some specialised data structures may be used to accelerate the algorithm (Altinel & Oncan 2005), which has O(N³) pessimistic time complexity.

6.3.2 Sweep algorithm by Gillet and Miller

The heuristic described by Gillet & Miller (1974) is of the category usually called ‘cluster first/route second’, because in the first stage it addresses the clustering subproblem, leaving the routing aspect to the second stage. This method may be used for planar instances of the CVRP only: it relies heavily on the values of coordinates of vertices in 2 dimensions. It starts with any straight line going through the depot vertex and ‘sweeps’ the customers into a cluster by rotating the line around the depot; if a cluster is full (i.e. the next swept customer would result in the capacity constraint being violated), then it is stored and a new one is created, and the rotation continues. Thus, this method is a greedy one, in a way. Finally, TSPs are solved independently in each cluster. The heuristic is described more formally in algorithm 7.

Algorithm 7 The sweep algorithm.

choose any straight line going through the depot
sort the list L of all customers with respect to the angle between the chosen line and the line from the depot to each customer
R = ∅ {an empty cluster of customers}
S = ∅ {an empty set of clusters}
while list L contains unassigned customers do
  v = firstUnassigned(L) {first in the list without a cluster assigned}
  R′ = R ∪ {v} {temporarily add the customer to the cluster}
  if Σ_{u ∈ R′} d(u) > C then {adding v permanently would exceed capacity}
    S = S ∪ R {add the feasible cluster to the set}
    R = {v} {start a new cluster}
  else
    R = R′ {add v to the current cluster}
S = S ∪ R {store the last, possibly not full, cluster}
s = ∅ {an empty solution}
for all R ∈ S do {construct a solution based on the set of clusters}
  t = solveTSP(R) {solve the TSP in the set of customers R}
  s = s ∪ t
return s

Gillet & Miller’s (1974) method concentrates on the clustering subproblem of the CVRP, leaving lots of freedom of implementation concerning the routing subproblem. According to Laporte & Semet (2002), the TSPs may be solved by virtually any method (either exact or approximate). Therefore, multiple versions of the algorithm may exist.
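As an illustration of the clustering stage only, consider this short Python sketch written by the author (coordinates xy, demands and capacity C are assumed inputs; the positive x-axis serves as the fixed reference line, and the routing stage is left out):

import math

def sweep_clusters(xy, demand, C, depot=0):
    """Group customers into capacity-feasible clusters by polar angle."""
    dx, dy = xy[depot]
    customers = [v for v in range(len(xy)) if v != depot]
    # sort customers by the angle of the line from the depot to each of them
    customers.sort(key=lambda v: math.atan2(xy[v][1] - dy, xy[v][0] - dx))
    clusters, current, load = [], [], 0
    for v in customers:
        if current and load + demand[v] > C:   # next customer would exceed capacity
            clusters.append(current)           # store the full cluster
            current, load = [], 0
        current.append(v)
        load += demand[v]
    if current:
        clusters.append(current)               # store the last cluster
    return clusters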
This heuristic may potentially be used to generate multiple different solutions, because it may be initialised with any straight line going through the depot, perhaps resulting in different clusters. The time complexity of this algorithm is undefined until the TSP algorithm is given. The pessimistic time complexity of the clustering stage is O(N log N), due to the call to a sorting procedure.

6.3.3 First-Fit Decreasing algorithm for bin packing

Another algorithm focused on the clustering subproblem is the First-Fit Decreasing heuristic (Falkenauer 1998). It is actually an algorithm designed for the bin packing problem, which is a clustering problem only and ignores any routing requirements. Thus, it does not exactly produce routes, but clusters of customers; the TSPs in each cluster have to be solved afterwards. The routes may be of extremely poor quality, because very distant customers may be put together in a cluster. However, the algorithm may also generate fewer routes than other heuristics, because it is not bound by distances and is completely focused on clustering.

The idea of the algorithm is very simple. First, sort the customers in decreasing order of their demands (volumes). Then, consider the customers in that order and put each customer in the first cluster (bin) with sufficient remaining space. If there is no such bin, a new one is created. The details of the procedure are shown in algorithm 8.

Algorithm 8 The First-Fit Decreasing algorithm.

sort the customers by decreasing demands d(v); the index i of vi reflects the order
numBins = 1 {start with a single bin}
binDemand(0) = 0 {it is empty}
for i = 1 to N do
  binFound = false
  for bin = 0 to numBins − 1 do
    if binDemand(bin) + d(vi) ≤ C then
      binDemand(bin) = binDemand(bin) + d(vi)
      customerBin(vi) = bin
      binFound = true
      break
  if not binFound then {open a new bin for vi}
    binDemand(numBins) = d(vi)
    customerBin(vi) = numBins
    numBins = numBins + 1
return customerBin

The time complexity of this clustering algorithm is O(N²) in the worst case.

6.4 Metaheuristic algorithms for the CVRP

6.4.1 Iterated tabu search by Taillard

Taillard (1993) developed an iterated tabu search algorithm for both the CVRP and the distance-constrained CVRP. For the Christofides instances (c*) this algorithm produced solutions that stayed best-known for at least several years (see Prins (2004)), so it is worth seeing how it works.

The algorithm was divided into two levels. The higher-level algorithm generates decompositions of a CVRP instance into several smaller ones, usually involving 4 to 8 routes. The lower-level algorithm solves each part independently of the others by means of tabu search, and returns the best-found solution to the higher level.

Lower level tabu search

This level starts with a ‘daisy-shaped’ solution (each customer in its own route; see also section 6.3.1). It is modified by two neighbourhood operators, with solution feasibility always maintained:

• swap: swaps two customers between two different routes;
• move: moves one customer from its route to some other one.

Both of these operators were implemented in a way that first removed a customer from its route and then inserted it in the best possible place in the second route. Taillard devoted some effort to implementing the operators efficiently.
He noticed that a swap or a move influenced only some of the moves that would be considered in the next iteration, so the unaffected evaluations were stored between iterations; this design accelerated the whole process.

As is usual in tabu search, a tabu list is maintained in the algorithm. It stores the indexes of moved customers and their original routes. A move is tabu if it attempts to insert a customer back into the route it was placed in before, unless the move improves on the best solution found so far. The length of the list is proportional to the number of customers in the solved instance.

Another technique employed by Taillard was to forbid in tabu search the moves that were performed too frequently. He noticed that without such a mechanism some customers (those near the depot) were moved very often, which resulted in a less diversified search (although diversity was not measured).

Finally, once every 200 iterations or when a solution is reached with cost less than 0.2% above the best-found, the tabu search launches an exact algorithm to independently solve the TSPs defined by each route. Since the instances considered by Taillard (1993) had routes with fewer than 30 customers, the exact algorithm completed in a short time, producing optimal routes at a small computational cost.

Higher level decompositions

This level was designed in two different ways for two types of instances: uniform and non-uniform Euclidean (see section 6.2). The decision on which of the two methods should be used for an instance was made manually.

Uniform problems: polar and concentric sectors

Taillard (1993) noticed that all but one (c120) of the Christofides instances were uniform and that this type of instance had an interesting locality property: ‘it is useless to try to move a city [(customer)] to every other tour [(route)]; it is sufficient to try to move this city to some of the nearest tours’. Therefore, he recommended partitioning such instances into polar and, for larger instances, concentric sectors, each containing a small number of vehicles (routes). Once the partition is performed, each sector is solved independently and the final solutions from the sectors are merged into one.

This decomposition was not performed in an entirely automatic way. Taillard manually set the total number of vehicles (routes) to be distributed among subproblems.

When the low-level tabu search is finished, the decomposition algorithm is launched again, but now there are more types of objects to be considered: complete routes, unsatisfied customers and unused vehicles. Taillard decided not to partition the routes, but to consider them as supercustomers located at the route’s centre of gravity. With this assumption another decomposition is generated; unused vehicles are randomly assigned to subproblems.

Non-uniform problems: branches of a spanning tree

The set of instances considered by Taillard (1993) also included 3 non-uniform ones: c100b, c120 and tai385. Taillard expected that the decomposition into polar and concentric sectors would not work well for non-uniform instances, so he resolved to partition them in another way, based on branches of a spanning tree.
This approach was backed up by the intuition ‘that cities that are near to each other will probably belong to the same branch of the arborescence [(spanning tree)] and thus will probably belong to the same subproblem.’

Taillard probably thought that an ordinary minimum-cost spanning tree was not exactly what he wanted, because of the special role of the depot vertex in the CVRP: all routes start and eventually finish at the depot. He decided to build a minimum-cost spanning tree, but for a somewhat modified matrix of distances between vertices. Generally speaking, the modification added a distance-proportional penalty to edges that had a larger angle with the edge directed to the depot (a larger cosine of the angle). The effect was that the generated spanning tree preferred edges in the direction of the depot to the ones perpendicular to this direction. This tree was further decomposed into branches (subproblems) by a greedy algorithm. In each step this procedure cut a branch of the tree that was largest in terms of vertices, had the largest sum of demands and was furthest from the depot in terms of the number of edges.

Contrary to the procedure for uniform problems, this decomposition procedure was not iterated. Instead, each series of tabu search was launched starting from the best solution found in the previous series.

Computational results

Taillard (1993) did not provide an exact stopping condition for his algorithm. However, concerning only the quality of results for the 7 Christofides CVRPs, this algorithm was able to generate best-known solutions (new at the time) for 3 instances (c75, c150, c199) and also to find the best-known for the other 4. The algorithm improved some best-known solutions of the distance-constrained CVRPs.

Concerning the time of computation, Taillard’s algorithm generally performed better for uniform problems than an earlier state-of-the-art tabu search by Gendreau, Hertz and Laporte (see Taillard (1993) for a reference). It generated solutions of desired average quality (5%, 2%, 1% above the best-known) in significantly shorter time.

Taillard’s (1993) spanning tree-based algorithm performed better than polar partitioning for two non-uniform instances, c120 and tai385, although c120 was solved to the best-known value by the latter.

Overall, this algorithm provided results that were undeniably excellent at the time of publication (1993) and also for several years that followed.

Comments

Taillard’s motivation to use tabu search was rather pragmatic: the best previous approach was also a tabu search algorithm and he wanted to generate solutions of very high quality. The algorithm presented above is not a pure TS; it is an early hybrid tabu search, because it also involves an exact algorithm to solve TSP subproblems (this hybridisation was also motivated by the goal of generating solutions of very high quality).

In the opinion of the author of this thesis, Taillard’s algorithm may be generally classified as a ‘cluster first/route second’ approach, although at the higher level a decomposition into subproblems with several clusters is performed, not into single clusters. Nevertheless, from the beginning the algorithm focuses on the decompositions, which are modified during search only to some small extent, or not modified at all for non-uniform problems. One of the important elements of the algorithm appeared to be the initial partition of a uniform instance into subproblems.
Taillard noticed that he had not found any rules for preferring a particular partition over another and that this initial decomposition had great influence on the final solution.

What one may also notice while reading Taillard’s (1993) paper is that several times he motivated his heuristic approach through comparison with the form of the best-known solutions. The geometric form of these solutions was exactly the motivation for the decomposition into polar and concentric sectors. It was also the basis for the modification of the initial decomposition in further steps of the algorithm. It seems, therefore, that Taillard wanted his heuristics to produce solutions similar to the best-known ones (most likely in terms of a decomposition) so that tabu search could easily find them, working in their proximity. One might even think that this algorithm tried somehow to re-engineer the best-known solutions of uniform instances, which resulted in a success.

It also seems that Taillard had very good intuition concerning his polar decomposition algorithm. The locality property he mentioned seems to be true for these uniform instances, although it would have to be measured to be confirmed. Nevertheless, it seems that such a partition method saves his algorithm a lot of effort.

Taillard’s focus on the decompositions, his remarks that the initial partition frequently determined the quality of the final result, and the excellent quality of results of his algorithm seem to indicate that the clustering subproblem in the CVRP is more important than the routing one.

6.4.2 Iterated tabu search by Rochat and Taillard

Rochat & Taillard (1995) decided to improve the previous algorithm of Taillard (1993), because the latter had been designed to handle mainly uniform instances of the CVRP, while real-world problems are usually non-uniform. They also did it because Taillard’s algorithm could easily be trapped in poor local optima. Their improved iterated local search could also handle the VRP with time windows.

Motivation for components of the algorithm

These authors gave a very interesting motivation for their design, especially for the intensification procedure they implemented, which was based on the notions of strongly determined and consistent variables. ‘A strongly determined variable is one whose values cannot be changed except by inducing a disruptive effect on the objective function value or on the value of other variables’, while ‘a consistent variable is one that is frequently strongly determined at a particular value (or in a narrow range). Specifically, the more often a variable receives a particular value in the best solutions (. . . ), the more it qualifies as consistent’.

Thus, the identification of consistent variables required defining what the variables in the CVRP actually are and then measuring their frequency in the best solutions. Rochat & Taillard (1995) decided to use as a variable the presence in a solution of a route of a certain form. They based this definition on the hope that solutions generated by local search (initial solutions in particular) already contain ‘all the information necessary to create solutions of very high quality (. . . ), but in a non-apparent way’, because ‘this information is included in the tours [i.e. routes]. So one hopes that the initialization creates a set of tours that included members not very different from the tours of a good solution¹’.

¹ Emphasised by the author of this thesis.
To somehow support their intuition they showed in a figure a set of routes from some initial local optima and compared it to the best-known (at the time) solution of the same instance (c199), concluding ‘that several tours are similar, two of them being identical’. Given this definition of a variable, they focused their algorithm on measuring and weighting the frequency of routes in good solutions, and then on using the gathered weighted frequencies in generating initial solutions for tabu search.

Low level tabu search

The tabu search of Taillard (1993) was used in this algorithm, although slightly changed: the decomposition procedure had been improved and calls to the exact TSP solver replaced by calls to some heuristic. Details of the changes were not provided; Rochat & Taillard (1995) concluded only that they improved the behaviour of the method on non-uniform problem instances.

Initialisation of the multiset of routes

In order to discover the consistent variables, one large, weighted multiset of routes is maintained in the algorithm. The multiset is initialised with the routes of 20 different local optima, resulting from different problem decompositions followed by tabu search, except for routes with one customer only (‘since they do not contain interesting information’ (Rochat & Taillard 1995)). Each route is labelled (weighted) with the cost of the solution it belonged to.

Construction of a solution based on the multiset of routes; iteration

The multiset of routes is then used in the construction of an initial solution for the subsequent tabu search. This construction creates a solution by sequentially adding routes from the multiset. This process is randomised and gives probabilistic preference to routes that originated from better solutions, but avoids routes with customers chosen in previous steps. This way a solution is created, and the process stops when no route can be added. If this results in a partial solution, it is somehow completed in a feasible way.

Then the original decomposition procedure of Taillard (1993) follows. In this algorithm, the procedure also accepts routes of the constructed solution as input. Next, the low-level tabu search is launched.

The solution generated in this way is then used to enrich the multiset of routes, in the same way as during the initialisation phase. However, no more than some specified number of routes (e.g. 260) is stored. If necessary, the worst routes are abandoned. Finally, the loop closes when the next construction phase is started.

Post-optimisation

Rochat & Taillard (1995) noticed that the construction phase was sometimes able to improve the best solution found so far even without the help of tabu search. This meant that some subsets of routes from the multiset defined solutions better than all those encountered earlier. Therefore, they decided to introduce a post-optimisation phase in their algorithm that would attempt to construct a good solution based on the multiset only. This phase actually consists in solving the set-partitioning problem induced by the routes in the multiset. This problem is solved in an exact way by a commercial mixed integer programming solver.

Computational results

The iterated tabu search was first tested on the uniform Christofides instances and the authors concluded that it was generally slower than the method of Taillard (1993), which had been designed especially for such instances.
Conversely, the results on the non-uniform instances c120, f71, f134 and tai385 were better than those of Taillard (1993); the new algorithm was faster in finding very good solutions of a given quality above the best-known.

Rochat & Taillard (1995) also performed experiments on 12 non-uniform instances they generated (tai75*, tai100*, tai150*) and compared their algorithm again to that of Taillard (1993), and to the tabu search of Gendreau, Hertz and Laporte that was mentioned earlier (see the works of Taillard for a reference). Again, their algorithm outperformed the competitors in quickly finding extremely good solutions.

Some additional experiments were performed with their post-optimisation technique. After 5 runs of the main algorithm with 50 (70) steps for instances with 100 (150) customers, 250 routes of the best generated solutions were added to the multiset. These routes defined the set-partitioning problem, which was solved in the exact way. This experiment allowed Rochat & Taillard to find 3 new best-known solutions for instances f134, c199 and tai385, and also 2 new best-known solutions for the new instances tai100a and tai100c. The best-known values were obtained for the other two tai100* problems as well, but without further improvement. No improvement of the best-known was obtained for the tai150* instances. No computation time was mentioned in this case.

Overall, it seems that this algorithm produced excellent results for non-uniform instances, being somewhat worse than the previous method of Taillard (1993) on the uniform ones.

Comments

The method of Rochat & Taillard (1995) is yet another example of a hybrid algorithm. Not only does it contain an exact procedure in the post-optimisation step, but the construction of a solution based on the multiset of routes also resembles an adapted multi-parent recombination operator (a fact the authors noticed themselves). Thus, a person who classified this method as an adapted memetic steady-state algorithm would not be entirely wrong.

What struck the author of this thesis in the motivation of this algorithm was that Rochat & Taillard based their design on the notion of a consistent variable. This notion seems to be very similar to the notion of a solution property that is important for the objective function, which is used while hypothesising about fitness-distance correlation (this was discussed in chapter 5). Although expressed in a different language, the motivation of Rochat & Taillard seems to be based exactly on the hypothesis that good solutions of the CVRP have many properties in common. They intuitively chose this property (a consistent variable) to be the presence in a solution of a route of a certain form. However, except for the visual similarity of routes of local optima to routes of one best-known solution, they did not provide any other objective, measurable argument that their hypothesis about routes being consistent variables was true.

One might say, of course, that the very good computational results of their algorithm confirm their hypothesis and the whole approach, but in a complicated algorithm such as this it is extremely difficult to tell which component was actually responsible for the excellent final solutions. The effects of the post-optimisation technique shed some light on this issue.
This technique was able to find or even improve the best-known solutions for the 4 tai100* instances, and this fact proves that the best-known or even better solutions may be assembled out of routes of other good ones. However, the tai150* instances are examples of exactly the opposite case: none of the best-known solutions could be assembled from pieces in the multiset of 250 routes, proving that the best-known solutions to these instances are not as similar to other good solutions as Rochat & Taillard would have liked them to be. Therefore, it seems that the intuition behind this algorithm was in several cases rather right, while in several others rather wrong.

6.4.3 Route-based crossover by Potvin and Bengio

The hybrid genetic algorithm by Potvin & Bengio (1996) was one of the first for vehicle routing problems, specifically for the CVRP with time windows. In particular, these authors were the first to propose dedicated recombination operators: sequence-based crossover (SBX) and route-based crossover (RBX). SBX seems to be specific to the time-windows case, so it is not considered here. On the other hand, RBX was employed in an EA for the bi-objective CVRP (Jozefowiez et al. 2007), so it will be presented here in the version for this CVRP.

The general RBX idea is to combine complete routes of two parent solutions. In the first stage, the operator chooses a random number of routes to be copied from the first parent. Then the actual routes, also chosen randomly, are copied to the offspring. In the second stage all routes of the second parent are copied to the offspring, but the offspring’s feasibility is maintained all the time: the customers already in the offspring are omitted in the copy process.

With such a design, Potvin & Bengio wanted firstly to guarantee the feasibility of the generated offspring. Then, their goal was to generate better solutions through recombination of good characteristics of parent solutions. And although they did not explain what a good characteristic actually was, their idea seems to be clear: good routes make a good solution. Thus, RBX seems to be similar in spirit to the construction algorithm based on a multiset of routes, developed by Rochat & Taillard (1995). However, in RBX there are no weights assigned to routes and there are only two parents involved.

6.4.4 Memetic algorithm by Prins

Prins (2001, 2004) argued that there had been no evolutionary algorithm for the CVRP that could have competed with the best tabu search approaches available at the time (a point of view also expressed by Laporte & Semet (2002)). Therefore, he resolved to develop such an algorithm. The one he constructed has several interesting features (some of them will be used in this thesis) and indeed proved to provide very good results at the time, so it will be described here. A formal description will not be given, since the algorithm generally fits the scheme of an MA given in section 2.4.5. This MA is also able to solve distance-constrained CVRPs (see section 6.1.1).

Sequential representation

Prins chose a permutation representation for his MA: a CVRP solution was represented as a sequence of customer indexes, without any route delimiters. His choice was inspired by a work by Beasley (see Prins (2004) for a reference), who had solved the problem with a ‘route first/cluster second’ approach. In this approach the routing subproblem of the CVRP is solved first, while the clustering one is left to a second stage.
Prins exploited the fact that there is a procedure which, for a given sequence of customers, produces the optimum (for this sequence) set of routes in polynomial time.

Prins (2004) gives the following example of the way the procedure works. Let us assume that an instance of the CVRP with 5 customers is given. The graph with some edges is shown in figure 6.1. Vertices are denoted by v0, v1, . . . , v5, with v0 being the depot. Edges are labelled with costs, customers with demands (in parentheses). Now let us assume that a sequence of customers is given, s0 = (v1, v2, v3, v4, v5), which represents a CVRP solution without route delimiters.

The procedure (called Split) requires that an auxiliary directed graph be built, with all vertices but one representing customers; one additional vertex is added, representing an auxiliary source (i.e. a vertex immediately before v1 in sequence s0). An arc (u, v) is added to the graph if there is a feasible route in the original graph that starts with the customer immediately after u in the sequence s0 and finishes with v (including all the customers in-between, in the order given by s0). Such an arc is labelled with the cost of the corresponding route. The auxiliary graph for the considered example is shown in figure 6.2.

[Figure 6.1: Graph of a CVRP instance - Prins’ example.]

[Figure 6.2: Auxiliary graph for the given sequence s0 - Prins’ example.]

When the graph is built, the Split procedure finds the shortest path from the auxiliary source vertex to the last customer vertex (here, v5). This path determines the optimum split of the given sequence into feasible routes. In the example the shortest path is indicated in figure 6.2 with thick lines and defines a solution with 3 routes, s = {(v0, v1, v2), (v0, v3), (v0, v4, v5)}. The pessimistic time complexity of the Split procedure is O(N²), due to the auxiliary graph construction phase.

Sequential representation - comments

The choice of this representation undoubtedly has positive sides: there should be a large reduction in the size of the search space for an evolutionary algorithm, because there are fewer sequences of customers than sets of routes for the same instance (although the actual reduction would have to be enumerated to be sure about its extent). Moreover, as Prins (2004) notes, the optimum solution may always be represented in this form, even by more than one permutation (depending on the number of routes). However, the obtained, reduced problem is still NP-hard and the space to be searched is of exponential size.

On the negative side, there is the effect that not every CVRP solution may be represented as a sequence. Of course, its routes may be concatenated and stored as a sequence of customers, but after decoding the final solution may be different from the initial one. And although this final solution would be of at least equal quality, there would be an uncontrolled change to the underlying solution, which could be described as a mutation implicit in the Split procedure.

Moreover, in this sequential representation there is a problem with the application of neighbourhood operators usually used for CVRP solutions, because there are no explicitly encoded routes. Again, a CVRP operator may be syntactically applied to a sequence, but the result may be different from the desired effect: the Split procedure may somehow alter the contents of some routes.
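A minimal Python sketch of the Split idea, written by the author as an illustration (the interface is hypothetical and Prins’ implementation differs in details): it builds the arc costs of the auxiliary graph implicitly and finds the shortest path by dynamic programming over the sequence.

def split(seq, cost, demand, C):
    """Optimally split a customer sequence into capacity-feasible routes.

    seq: customer indices in permutation order (the depot is vertex 0);
    cost: cost matrix; demand: demand per vertex; C: vehicle capacity.
    Returns the total cost and the list of routes.
    """
    n = len(seq)
    INF = float("inf")
    best = [0.0] + [INF] * n     # best[i]: cost of serving the first i customers
    pred = [0] * (n + 1)
    for i in range(n):           # arc (i, j) is one route serving seq[i..j-1]
        load, route_cost = 0, 0.0
        for j in range(i + 1, n + 1):
            load += demand[seq[j - 1]]
            if load > C:         # the route is no longer feasible
                break
            if j == i + 1:       # route with a single customer
                route_cost = cost[0][seq[i]] + cost[seq[i]][0]
            else:                # extend the route by one more customer
                route_cost += (cost[seq[j - 2]][seq[j - 1]]
                               + cost[seq[j - 1]][0] - cost[seq[j - 2]][0])
            if best[i] + route_cost < best[j]:
                best[j] = best[i] + route_cost
                pred[j] = i
    routes, j = [], n            # recover the shortest path (route boundaries)
    while j > 0:
        routes.append(seq[pred[j]:j])
        j = pred[j]
    return best[n], routes[::-1]

Applied to a sequence such as s0 above, the recovered route boundaries correspond to the shortest path in the auxiliary graph.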
To summarise, this representation may be categorised as an indirect one, where the Split procedure is used as a special decoder. Such representations were used in the past e.g. for scheduling problems (Michalewicz 1996, Pawlak 1999) or bin packing (Michalewicz & Fogel 2000).

Local search as mutation operator

The title of this paragraph is taken from Prins (2004). He argued that in order for an EA to be competitive with tabu search or simulated annealing it was important to hybridise the EA with some improvement procedure (a point of view which is in agreement with the remarks given in section 4.6). Hence, he chose to use local search instead of ordinary mutation.

Due to the difficulty of applying neighbourhood operators to sequences, which was mentioned above, Prins decided to decode each solution from the sequential representation before local search. As operators he chose only the ones with O(N²) neighbourhood size, in order to obtain reasonable running times. For each pair of customers he examined 9 different types of neighbours, which could be classified into 3 super-types:

• move: move one customer to a place after the second one; move a pair of successive customers after the second one; move a pair, but reverse its order in the insertion place;
• swap: swap two customers; swap a pair of successive customers with the second customer; swap two pairs of successive customers;
• edge-exchange: exchange two edges in the same route; exchange two edges of different routes in two possible ways.

Local search checks all these neighbourhoods together, but chooses the first encountered improvement (the greedy approach). This local search is applied to a solution (an offspring of crossover) only with some small probability, usually 0.2. When it is launched, it always stops at a local optimum (no truncation).

Order crossover

Because the sequential representation was used to encode solutions, Prins was able to employ in his MA some standard recombination operators usually applied to permutations. After some initial testing he decided to use order crossover (OX) (see section 4.4.3; a minimal sketch is also given below). This crossover is able to preserve in an offspring a chosen subsequence of customers from one parent and the relative order of some customers from the second parent. It may also happen that some subsequences from the second parent are preserved. This operator may be implemented with O(N) time complexity.

Prins (2004) commented on the operator: ‘the solutions that can be reached by crossover from a given population define a kind of wide neighbourhood that brings enough diversification [compared to local search]’. Thus, by using OX he aimed at preserving the diversity of solutions in the population.

Initial population: heuristic and random solutions

In the initial population Prins put one solution of the savings heuristic, one of the sweep heuristic (both were described in sections 6.3.1 and 6.3.2) and one solution of the algorithm proposed by Mole and Jameson (see Laporte & Semet (2002) and Prins (2004) for a reference). The remaining part of the population was filled with random permutations. All these solutions were stored in the sequential representation, which means they were optimised by the Split procedure. However, some of them may have been removed from the population if they represented clones of some other solution.
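Returning to the order crossover mentioned above, here is a minimal Python sketch of a common OX variant (the author’s illustration, not Prins’ code; the classical operator fills positions wrapping around the cut points, while this version fills the remaining positions from left to right):

import random

def order_crossover(p1, p2, rng=random):
    """OX: keep a slice of parent 1, fill the rest in the order of parent 2."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]             # chosen subsequence of parent 1
    kept = set(child[i:j + 1])
    rest = [v for v in p2 if v not in kept]  # relative order taken from parent 2
    for k in range(n):
        if child[k] is None:
            child[k] = rest.pop(0)
    return child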
Diversity maintenance; restarts

Other techniques employed by Prins were: a diversity maintenance policy which forbids clones (equivalent solutions) in the population, and restarts of the whole algorithm.

Computational results

In his experiments, Prins (2004) employed the well-known benchmark set of Christofides and the one of Golden et al. He also included instances of the distance-constrained CVRP. Prins' results were extremely good. The best results for the instances of Christofides placed his MA 2nd in a ranking of many metaheuristics, including the state-of-the-art tabu search. The results of a standard MA setting put his algorithm in the 4th position in the same ranking. Concerning the test set of Golden et al., Prins' algorithm discovered 13 out of 20 solutions best-known at the time, on average beating all other algorithms proposed before.

Interesting remarks

In his paper, Prins (2004) also included several interesting remarks concerning the algorithm. Firstly, he examined the behaviour and results of his MA without each of the techniques he implemented. The most important techniques appeared to be the local search phase and the policy of forbidden clones: when either was removed from the algorithm, the results deteriorated drastically. Secondly, he noticed that with a large population and a strong diversity preservation policy 'an excessive rejection rate (that is most GA iterations are unproductive) appears' and also that 'this phenomenon worsens with a high mutation [local search] rate $p_m$, because the local search compresses the population in a small cost interval'. In other words, the policy of forbidden clones had a strong effect on the admittance of generated local optima into the population. Finally, Prins concluded that the local search phase was the most time-consuming part of the MA, typically absorbing 95% of CPU time.

6.4.5 Cellular genetic algorithm by Alba and Dorronsoro

This algorithm, described in two papers by Alba & Dorronsoro (2004, 2006), solves the CVRP and its distance-constrained version.

Population structure: toroidal grid

The population of this algorithm is cellular, i.e. organised in a 2D toroidal grid of size 10×10, with each node holding only one solution. Hence, each solution has 4 different neighbours in the grid. The authors included in this neighbourhood also the solution itself. They motivated this choice of structure by saying that 'these overlapped small neighbourhoods help in exploring the search space because the induced slow diffusion of solutions through the population provides a kind of exploration, while exploitation takes place inside each neighbourhood by genetic operators'. In this population, the selection of one recombination parent is made only in each solution's neighbourhood; the other parent is the solution itself.

Representation: permutation with route delimiters

A solution to the CVRP is represented by a sequence of customer indexes (from $0$ to $N-1$) and route delimiters (from $N$ upwards). This representation allows empty routes, encoded by putting two delimiters beside each other.

Fitness function with infeasibility penalty

Alba & Dorronsoro (2004) modify the objective function of the CVRP by adding a penalty to infeasible solutions, which are apparently potentially accepted to the population. For the CVRP the fitness becomes:

$$f'(s) = f(s) + \lambda' \cdot \mathrm{overcapacity}(s)$$

where $\lambda' > 0$ is a parameter and $\mathrm{overcapacity}(s)$ is the sum, over all routes whose demand exceeds the vehicle capacity, of the route demand minus the capacity.
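A minimal sketch of this penalised evaluation, assuming decoded solutions given explicitly as lists of routes (each route a list of customers, depot omitted); the names are illustrative:

def penalised_fitness(routes, dist, demand, capacity, lam=1000.0):
    """f'(s) = f(s) + lambda' * overcapacity(s), cf. Alba & Dorronsoro (2004).

    'lam' plays the role of lambda'; 1000 is the value they report using.
    """
    cost, overcap = 0.0, 0.0
    for route in routes:
        tour = [0] + route + [0]              # close the route at the depot
        cost += sum(dist[u][v] for u, v in zip(tour, tour[1:]))
        load = sum(demand[v] for v in route)
        overcap += max(0, load - capacity)    # demand exceeding the capacity
    return cost + lam * overcap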
Recombination operator: ERX

This algorithm uses edge recombination, which 'builds an offspring by preserving edges from both parents' (see section 4.4.3 on the recombination operators for the TSP).

Mutation: combined neighbourhood operators

The mutation operator in this algorithm is a combination of three operators that are also sometimes used as neighbourhood operators (see section 6.4.4): insertion (moves one customer to another location in a sequence), swap (swaps two customers in a sequence) and inversion (reverses a subsequence between two locations in a sequence). The authors note that all of these modifications may be applied in an intra-route and inter-route manner.

Local search: edge-exchange and λ-interchange

Alba & Dorronsoro (2004) employ local search in their GA mainly because 'it is very clear after all the literature on VRP that local search is almost mandatory to achieve results of high quality'. They use two neighbourhood operators: edge-exchange (2-opt), which works inside a route only, and λ-interchange. The second operator exchanges a subset of customers of one route with a subset from some other route, with the sizes of the subsets limited by λ. In this algorithm, only λ = 1 or λ = 2 is used. The authors employ the steepest (best-improvement) version of local search after each recombination/mutation. In case λ-interchange is used, the LS is limited (truncated) to at most 20 iterations (the rationale is not given).

Computational results

In their first paper, Alba & Dorronsoro (2004) provided some results on 4 Christofides instances of smaller sizes: c50, c75, c100 and c100b, and reported on the best solutions found; no averages were provided. They compared 4 versions of their cGA: without local search, with edge-exchange local search, with edge-exchange and 1-interchange, and with edge-exchange and 2-interchange. The best version of their algorithm appeared to be the last but one, with edge-exchange and 1-interchange. It generated best-known solutions for all the considered instances, but it was not said whether this happened once, several times or in all runs.

The authors also compared their best version to other heuristic and metaheuristic algorithms, including those of Rochat & Taillard (1995) and Prins (2004), and again only with respect to the best-found solutions. This comparison revealed that their best cGA was of equal performance to the two mentioned algorithms.

The second paper by Alba & Dorronsoro (2006) was entirely devoted to experiments with the best version of the cGA on some other sets of instances: those of Taillard, Van Breedam and Golden et al. (see section 6.2). With respect to the best-found solutions, the cGA was able to improve the best-known solutions for 8 instances of Van Breedam and 1 instance of Taillard (tai75b). It also found 3 best-known solutions for the other Taillard instances of size 75, another instance of Van Breedam, and one of Golden et al. (with the additional distance constraint). However, the overall results of their algorithm were not that impressive. It was able to get as close as 0% of excess above the best-known solutions for the smallest instances, but 2.4% for the largest one of Taillard (tai385); these statistics reflect the best solutions found, not averages.

Comments

Alba & Dorronsoro (2004, 2006) potentially accepted to the population of their cGA some infeasible solutions.
On closer examination, however, it appears that almost any feasible solution was preferred in their algorithm to even the best-cost infeasible one, because they used $\lambda' = 1000$ in all experiments. Knowing that for the instances of Christofides, Taillard, and also those of Golden et al. (see Alba & Dorronsoro), the best-known solutions cost at most several thousand units, and that a random local optimum usually costs less than 10% more, it seems that any random local optimum was able to replace any infeasible solution. Hence, this cGA only potentially accepts infeasible solutions; in practice, further iterations of the algorithm reject infeasible solutions with high probability, even if any are present in the initial population.

The best version of this algorithm, with 1-interchange, seems to use almost the same operations for local search and for mutation, thus making mutation almost completely ineffective. When one recalls the definition of λ-interchange, it transpires that 1-interchange is exactly equivalent to insertion and swap, which are used in mutation. Only the inversion operator differs somewhat from local search: inversion, when applied to one route only, becomes edge-exchange in one route; when applied to more than one route it is equivalent to an exchange of 2 edges between routes, and this operation is not employed in local search. Therefore, it appears that only inversion, and only when applied to more than one route, is not immediately reversed by the subsequent local search. This implies that mutation may be of no importance for the whole algorithm, since the best results were obtained when mutation was ineffective.

What is also interesting in the papers by Alba & Dorronsoro (2004, 2006) is that they introduced, for the first time in the history of EAs for the CVRP, the edge-recombination operator, but without any motivation or justification for this design decision.

Overall, the author of this thesis thinks that the cGA contains several arbitrary, sometimes hardly motivated design decisions (toroidal grid, infeasibility penalty, mutation, recombination), and fair results which are at the same time difficult to explain given the design.

6.4.6 Other algorithms

The presented review of algorithms is by no means a complete one. Many other metaheuristics for the CVRP have been proposed. An interested reader might find the work of Toth & Vigo (2002b) useful when searching for a broader review. Concerning EAs, some more algorithms for the CVRP were presented by: Berger & Barkaoui (2003), Tavares et al. (2003), Baker & Ayechew (2003) and, very recently, by Nagata (2007).

6.5 Summary

Except for the first heuristic ideas, the presented metaheuristic algorithms seem to be rather complicated machinery, with many components, additional acceleration techniques, diversification strategies, even some exact algorithms for solving subproblems. Nevertheless, the author of this text draws from the review several conclusions concerning the design of metaheuristics, and especially evolutionary algorithms.

Mandatory local search

It seems that some local search procedure is indispensable in an efficient algorithm. The tabu search algorithms by Taillard and by Rochat & Taillard are heavily based on LS. Prins argues that LS is necessary in an EA for the CVRP, and so do Alba & Dorronsoro and Nagata. All of these teams employ it in their designs.

Fast local search

The employed LS has to be fast, as well.
In most of the presented algorithms it consumes the largest part of the computation time, so the design of LS influences the overall running time enormously. The best designs try to accelerate local search e.g. by exploiting the locality of transformations (the incremental update scheme), caching evaluations of neighbours for later use (Taillard 1993, Rochat & Taillard 1995), truncating LS before it reaches a local optimum (Alba & Dorronsoro 2004), or applying local search only to a fraction of the generated solutions (Prins 2001, Prins 2004).

Specialised representation and recombination operators

Gendreau et al. (2002) noticed that in the case of EAs 'the classical [binary] approach (...) is not appropriate for sequencing problems, like the TSP or the VRP'. They argued that the bit string representation is not natural for such problems, and neither are typical binary crossover or mutation operators (although some of these have been employed, e.g. the two-point crossover by Baker & Ayechew (2003)). They called for the development of specialised ones. It can be seen in the existing EA designs that such operators were indeed developed and employed. There are several lines of such designs.

One line is represented by Rochat & Taillard (1995) and Potvin & Bengio (1996). These researchers had the intuition that an offspring had to inherit complete routes from its parents. Rochat & Taillard were probably the first to explicitly express this intuition, based on the visual form of good heuristic solutions and their similarity to the best-known one. Hence their design of the construction procedure which takes as input a large number of complete routes. Similarly, the RBX by Potvin & Bengio attempts to let an offspring inherit complete routes from two parents. Moreover, the crossover designs by Tavares et al. (2003) are very similar to the RBX. Nevertheless, these designs were not backed up by any verification of the implicit or expressed intuition.

The second line of designs is most probably based on the similarity between the CVRP and the TSP, which was mentioned in section 6.1. Thus, after a little adaptation, operators like order crossover or partially-mapped crossover were used in EAs for the CVRP (Gendreau et al. 2002). More recently, edge recombination (Alba & Dorronsoro 2004, Alba & Dorronsoro 2006) and edge-assembly crossover (Nagata 2007) were also employed. The latter two build an offspring by preserving edges from parents, though using different approaches. This is a perfectly sensible strategy for the TSP, given the analyses of TSP landscapes presented earlier in sections 4.4.3 and 5.2.2. But the sensibility of edge preservation, although it seems obvious given the similarity of the two problems, was not verified for the CVRP by such analyses.

The SPX crossover by Prins (also recently employed by Jozefowiez et al. (2007)) seems to be a separate case, because it works on the permutation representation and is backed up by the exact decoding procedure. This design reduces the size of the space to be searched, which seems to work very well. On the other hand, the choice of order crossover as a part of the SPX seems to be based only on optimisation results; this operator was chosen simply 'after some initial testing' (Prins 2001).

Hence, it appears that the existing recombination designs were based mainly on good intuition, the similarity of the CVRP to the TSP, or the results of initial optimisation tests.
The applicability of these designs to the CVRP was then checked by final computational experiments. It was not based on any theoretical or empirical insight into the search space. In particular, the author of this text has not yet seen any fitness-distance analysis performed for the CVRP, although some distance measures for CVRP solutions have been proposed recently, as described in the next chapter. Therefore, the systematic construction of recombination operators for the CVRP based on FDA results, which is the theme of the next chapter, is the very first attempt at a recombination design based on empirical search space analysis.

Chapter 7

Adaptation of the memetic algorithm to the capacitated vehicle routing problem based on fitness-distance analysis

This chapter describes the adaptation of the memetic algorithm scheme to the first of the combinatorial optimisation problems considered in this thesis. It provides a step-by-step description of those components of the algorithm that are specific to the CVRP. Most importantly, a description is given of the adaptation of recombination operators to the problem based on the fitness-distance analysis.

7.1 Representation

For the representation of CVRP solutions 3 options were considered: path (used several times for the TSP (Michalewicz 1996)), adjacency, and sequential (see section 6.4.4). From the three identified options, the adjacency representation was finally chosen for the designed algorithm.

In the TSP case (see Michalewicz (1996)) this representation is an array of size $N$ which stores indexes of vertices: at index $v_i$ of this array the index of $v_j$, its successor on a route, is stored. Adaptation of this array to the CVRP case should take into account that the depot vertex usually has more than one successor; in the pessimistic case it may have as many as $N$ of them (for a 'daisy-shaped' solution). Therefore, the depot successors are separated from the others into an additional array, which stores, for each route, the index of the first customer on the route; hence the name of the array: First. The array of successors of customer vertices (Next) remains unchanged, since each customer always has exactly one successor in a valid CVRP solution. An exemplary solution $s = \{(v_0, v_1, v_5, v_4), (v_0, v_2, v_6), (v_0, v_3, v_7)\}$ in this adjacency representation is shown in figure 7.1.

Figure 7.1: The adjacency representation of the exemplary solution $s = \{(v_0, v_1, v_5, v_4), (v_0, v_2, v_6), (v_0, v_3, v_7)\}$: First = [1, 2, 3] (indexed by route), Next = [5, 6, 7, 0, 4, 0, 0] (indexed by customer).

Technically speaking, the number of routes has to be stored together with the two arrays in order to properly read their contents. Overall, $O(N)$ integers are required to store a solution. Concerning the time complexity, access to the first customer on a route or to the successor of a given customer requires $O(1)$ time. However, the time to check the index of the route a given customer belongs to is bounded by $O(N)$ (pessimistically, all routes have to be traversed). This last access time may be improved, though only by a constant factor in the pessimistic case, by a slight modification of the array Next: instead of having 0 as the successor of each route's last customer, the index of the route may be stored there (as a negative number, to avoid ambiguity).
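The following minimal Python sketch illustrates the two arrays, already including the modification of Next just described (the thesis implementation is not in Python; the class and method names are illustrative):

class AdjacencyCVRP:
    """Adjacency representation with the First and (modified) Next arrays.

    Customers are numbered 1..N, vertex 0 is the depot. The last customer
    of route r stores -(r + 1) in Next, so its route index can be
    recovered without scanning all routes.
    """
    def __init__(self, routes):               # routes: lists of customers
        n = sum(len(r) for r in routes)
        self.first = [r[0] for r in routes]   # first customer of each route
        self.next = [0] * (n + 1)
        for r_idx, route in enumerate(routes):
            for u, v in zip(route, route[1:]):
                self.next[u] = v
            self.next[route[-1]] = -(r_idx + 1)   # negative end-of-route marker

    def successor(self, v):
        """O(1); returns 0 (the depot) when v is the last customer on its route."""
        return max(self.next[v], 0)

    def route_of(self, v):
        """Walks to the end of v's route: O(n(t_i)) instead of O(N)."""
        while self.next[v] > 0:
            v = self.next[v]
        return -self.next[v] - 1

For the exemplary solution the constructor produces exactly the arrays of figures 7.1 and 7.2: First = [1, 2, 3] and Next = [5, 6, 7, -1, 4, -2, -3].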
The form of the modified Next array is shown in figure 7.2. This way the time to check the index of the route of a given customer is reduced to $O(n(t_i))$, where $n(t_i)$ is the number of customers in that route. This still becomes $O(N)$ in the pessimistic case, however.

Figure 7.2: The modified Next array for the adjacency representation of the exemplary solution: Next = [5, 6, 7, -1, 4, -2, -3].

The chosen representation does not induce problems with local search, like the sequential representation does (see section 6.4.4). And although it is as 'natural' and memory-consuming as the path representation, it requires less time for some basic operations. Therefore, due to this technical advantage (which is probably the most important advantage of a representation, as emphasised in section 4.1), it was chosen for implementation. Note that this representation was devised by the author of this text in his master's thesis (Kubiak 2002), independently of Tavares et al. (2003) and before Kytojoki et al. (2007).

7.2 Fitness function and constraints

The fitness function for the designed memetic algorithm is simply the objective function of the CVRP (see section 6.1). Infeasible solutions are not accepted at any stage of the algorithm: neither in local search, nor in the population of the MA. This is a realisation of some of the design guidelines described earlier in section 4.2. It seems, however, that acceptance of infeasible solutions with a penalty in the form used by Alba & Dorronsoro (2004) (but with other parameter values) would also be a sensible option.

7.3 Local search

The general scheme of local search was provided in algorithm 1, page 11. Here, the specifics concerning the adaptation of local search to the CVRP are given. According to the remarks presented in section 4.6, several design decisions have to be made before implementation.

Concerning the local search type, only the basic ones, steepest and greedy, are considered in this work. Any more complex version (e.g. simulated annealing or simple tabu search) would result in additional design choices to be made or parameters to be set. Therefore, these two simple types were preferred.

No local search truncation is performed in this work. Although truncation may improve the efficiency of search, it is yet another way of introducing a parameter into the method (e.g. the truncation probability or the number of iterations of search). The author of this thesis wanted to avoid such parameters.

Local search is applied to all solutions in the population of the designed MA. Greedy LS is applied to initial solutions, while the steepest one to offspring of genetic operators. During some initial testing, this appeared to be the fastest combination.

Another important choice with respect to local search concerned neighbourhood operators. The literature describes many possibilities: e.g. edge-exchange (λ-opt), Or-opt, λ-interchange, b-cyclic k-transfer exchanges (Laporte & Semet 2002, Gendreau et al. 2002, Prins 2001, Prins 2004). However, it is not the goal of this work to consider, implement and evaluate all operators from the literature, but to have a good-enough LS for the fitness-distance analysis and the memetic algorithm. After some experimental testing the author decided that having 3 operators was enough to achieve this goal; the choice was arbitrary to some extent. These operators are described below.
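A minimal sketch of the two basic LS types is given below; the neighbourhood, evaluation and move-application abstractions are illustrative, not the actual interfaces of the implementation:

def local_search(s, neighbourhood, evaluate, apply_move, steepest=True):
    """Greedy (first-improvement) or steepest (best-improvement) LS."""
    improved = True
    while improved:
        improved = False
        best_move, best_delta = None, 0.0
        for move in neighbourhood(s):         # feasible moves only
            delta = evaluate(s, move)         # cost change of the move
            if delta < best_delta:
                best_move, best_delta = move, delta
                if not steepest:              # greedy: take the first improvement
                    break
        if best_move is not None:
            apply_move(s, best_move)
            improved = True
    return s                                  # a local optimum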
Considerable effort was devoted to the efficiency of these operators and of local search:

• additional data structures enriched the adjacency representation;
• all the operators were implemented using the incremental update scheme;
• a special lexicographic order of configurations of one operator was introduced;
• a cache of neighbour evaluations for 2 operators was designed, implemented and evaluated;
• low-level implementation changes were introduced.

The mentioned data structures are:

• an additional array of size $O(N)$ with the current cost of each route,
• an additional array of size $O(N)$ with the current sum of demands of customers on each route.

Some remarks concerning the incremental update scheme are given with the description of each operator. The additional acceleration techniques are presented later.

7.3.1 Merge of 2 routes

This merge operator was inspired by the algorithm of Clarke & Wright (1964), described earlier in section 6.3.1. Therefore, it will sometimes be referred to as the CW operator. It works by merging the sequences of customers from two routes into one sequence (one route), provided that such a merge does not violate the capacity constraint.

The idea of the merge operator is shown in figure 7.3 in one of its 4 possible modes. The left-hand side of the figure presents a solution (two routes) before the merge; the scheme in the middle shows the same solution being modified; the right-hand scheme presents the solution after the modification. A merge removes exactly 2 edges from the initial solution; these are the dotted edges connected to the depot ($v_0$). Then, each mode connects the two routes by adding one edge that links together the vertices detached earlier from the depot; this is the dashed edge.

Figure 7.3: One mode of the merge (CW) operator.

The modes may be described in the following way. Let the merged routes $t_i$, $t_j$ be:

$$t_i = (v_0, v_{ip}, \ldots, v_{ik}) \qquad t_j = (v_0, v_{jp}, \ldots, v_{jk})$$

Let the new route, the one resulting from the merge, be denoted by $t_k$. Depending on the mode, the new route has the form:

1. $t_k = (v_0, v_{ip}, \ldots, v_{ik}, v_{jp}, \ldots, v_{jk})$
2. $t_k = (v_0, v_{jp}, \ldots, v_{jk}, v_{ip}, \ldots, v_{ik})$
3. $t_k = (v_0, v_{ip}, \ldots, v_{ik}, v_{jk}, \ldots, v_{jp})$
4. $t_k = (v_0, v_{ik}, \ldots, v_{ip}, v_{jp}, \ldots, v_{jk})$

For example, the third mode builds route $t_k$ by traversing all customers in $t_i$ (in the same order), skipping the depot, traversing the customers from $t_j$ in the reverse order (starting from the end of this route), and returning to the depot. This mode is shown in figure 7.3.

The size of the neighbourhood $N_{CW}(s)$ of a solution $s$ depends only on the number of routes $T(s)$ in the solution (Kubiak 2002):

$$|N_{CW}(s)| = O(T(s)^2)$$

An application of the merge operator may potentially lead to an infeasible solution, if the sum of demands of the merged routes exceeds the vehicle capacity. Therefore, checking the feasibility of a neighbour in the plain adjacency representation requires $O(N)$ time, because the merged routes have to be traversed first in order to compute their demands. This can be accelerated if a vector of route demands is stored alongside the adjacency representation; in such a case the feasibility check requires only $O(1)$ time. The evaluation of $\Delta f = f(s_n) - f(s)$, the difference between the cost of a neighbour $s_n$ and the current solution $s$, requires $O(1)$ time: in each mode the costs of 3 edges have to be accessed and added with appropriate signs.
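The following sketch makes this constant-time evaluation explicit; the four returned values correspond to the four modes listed above (symmetric costs assumed, routes given as customer lists without the depot):

def merge_deltas(route_i, route_j, dist):
    """Cost change of each merge (CW) mode; each mode removes two depot
    edges and adds one linking edge, so only 3 edge costs are touched."""
    ip, ik = route_i[0], route_i[-1]          # first and last customer of t_i
    jp, jk = route_j[0], route_j[-1]          # first and last customer of t_j
    return [
        dist[ik][jp] - dist[ik][0] - dist[0][jp],   # mode 1: t_i + t_j
        dist[jk][ip] - dist[jk][0] - dist[0][ip],   # mode 2: t_j + t_i
        dist[ik][jk] - dist[ik][0] - dist[jk][0],   # mode 3: t_i + reversed t_j
        dist[ip][jp] - dist[0][ip] - dist[0][jp],   # mode 4: reversed t_i + t_j
    ]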
Once a neighbour $s_n$ is chosen to become the new solution (e.g. in local search), the modification of $s$ takes $O(1)$ time in modes 1 and 2, and $O(N)$ time in modes 3 and 4 (in the latter modes the order of customers in one route has to be reversed).

This merge operation was chosen as a neighbourhood operator because, intuitively, in the CVRP it is important to have as few routes in a solution as possible (in metric instances, in any case). It is very likely that the number of routes is positively correlated with the cost of a solution, and although this was not verified empirically, minimising the number of routes appears to be a good idea when searching for a least-cost solution. Another argument for implementing the operator was that the Clarke and Wright algorithm generates fair-enough solutions in very short time.

7.3.2 Exchange of 2 edges

The idea of this operation comes from the TSP, where it is sometimes referred to as 2opt. It substitutes two edges of a solution by some other two edges, in such a way that the resulting solution is a valid one. In the TSP the exchange of edges is performed only within a route (there is one route all the time); in the CVRP two different routes may also be involved.

Figure 7.4 presents the edge-exchange that is characteristic of the CVRP: one of the two possible inter-route modes. It removes exactly two edges (dotted) from the two routes shown. The parts of the routes are then connected by dashed edges: the presented mode links the beginning part of one route with the beginning part of the other into one route, and the end with the end into the other route. The other mode (not shown) links the beginning part of one route with the end part of the other, and vice versa.

Figure 7.4: One inter-route mode of the edge-exchange (2opt) operator.

Formally, the intra-route mode modifies one route $t_i$ of a solution:

$$t_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,p+1}, \ldots, v_{i,k}, v_{i,k+1}, \ldots)$$

to become:

$$t'_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,k}, \ldots, v_{i,p+1}, v_{i,k+1}, \ldots)$$

One can see that in this mode not only are two edges exchanged, but also the part of route $t_i$ between $v_{i,p+1}$ and $v_{i,k}$ is reversed.

Let us assume that the inter-route modes perform their modifications on routes $t_i$, $t_j$:

$$t_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{i,p+1}, \ldots, v_{i,n(t_i)})$$
$$t_j = (v_0, v_{j,1}, \ldots, v_{j,k}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

The first inter-route mode, the one shown in figure 7.4, alters the routes to have the form:

$$t'_i = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{j,k}, v_{j,k-1}, \ldots, v_{j,1})$$
$$t'_j = (v_0, v_{i,n(t_i)}, \ldots, v_{i,p+1}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

while the second mode changes them to have the form:

$$t''_i = (v_0, v_{j,1}, \ldots, v_{j,k}, v_{i,p+1}, \ldots, v_{i,n(t_i)})$$
$$t''_j = (v_0, v_{i,1}, \ldots, v_{i,p}, v_{j,k+1}, \ldots, v_{j,n(t_j)})$$

It can be seen that the first mode reverses some parts of the modified routes, while the second mode reverses nothing.

The size of this neighbourhood $N_{2opt}(s)$ (all intra- and inter-route modes together) depends on the number of customers in an instance and the number of routes in a solution (Kubiak 2002):

$$|N_{2opt}(s)| = O((N + T(s))^2)$$

An application of this edge-exchange operator does not generate infeasible solutions in the intra-route mode: the set of customers of a route is never modified, so the capacity constraint is never violated.
In the intra-route mode the cost of the modified solution may be evaluated in constant time in the adjacency representation, so overall this modification may be evaluated in $O(1)$ time. However, the inter-route modes may produce infeasible solutions, since route contents change. Therefore, before the cost of the modified solution is computed, its feasibility status has to be checked. Without any additional measures, this takes $O(N)$ time, because the demands of parts of routes have to be computed. After this computation, the cost of the modified solution may be evaluated in constant time (two subtractions and two additions). The modification of solution $s$ by means of the edge-exchange operator to produce a neighbour $s_n$ takes $O(N)$ time, since some modes require that a part of a route be reversed.

The motivation for using this operator in the CVRP has several sources:

• in the TSP it is a very fast operator performing basic improvements of routes, and although it is not able to conduct more complicated changes, it is very frequently used as a base point for local search;
• again, in the symmetric TSP case this operator has the largest correlation length in a group of several operators that are usually considered for implementation (Merz 2000), which makes the operator very suitable for local search; perhaps it is similar in the CVRP case;
• the inter-route modes of 2opt make it possible to exchange sequences of customers between routes, and such an ability seems necessary to address the grouping subproblem of the CVRP;
• 2opt seems to be the most elementary operation on the edges of a CVRP solution; in the opinion of the author it is hardly possible to change fewer edges in a solution to produce some other one, and hence 2opt nicely fits the general requirement for a neighbourhood operator: that it should produce solutions close to the original in some sense (see section 2.4.1).

7.3.3 Exchange of 2 customers

This operator is based on the 1-interchange which is sometimes used in the literature (see section 6.4.5). It exchanges (swaps) two customers between their locations in a solution. In the CVRP it has two modes: the intra- and the inter-route one.

Formally, the intra-route mode changes route $t_i$:

$$t_i = (v_0, \ldots, v_{i,p-1}, v_{i,p}, v_{i,p+1}, \ldots, v_{i,k-1}, v_{i,k}, v_{i,k+1}, \ldots)$$

into:

$$t'_i = (v_0, \ldots, v_{i,p-1}, v_{i,k}, v_{i,p+1}, \ldots, v_{i,k-1}, v_{i,p}, v_{i,k+1}, \ldots)$$

except for the case when $v_{i,k}$ is the immediate successor of $v_{i,p}$ ($v_{i,k} = v_{i,p+1}$), in which a swap is equivalent to the intra-route 2opt on edges $(v_{i,p-1}, v_{i,p})$ and $(v_{i,k}, v_{i,k+1})$.

The inter-route mode, shown in figure 7.5, modifies two routes $t_i$, $t_j$:

$$t_i = (v_0, \ldots, v_{i,p-1}, v_{i,p}, v_{i,p+1}, \ldots) \qquad t_j = (v_0, \ldots, v_{j,k-1}, v_{j,k}, v_{j,k+1}, \ldots)$$

so that they become:

$$t'_i = (v_0, \ldots, v_{i,p-1}, v_{j,k}, v_{i,p+1}, \ldots) \qquad t'_j = (v_0, \ldots, v_{j,k-1}, v_{i,p}, v_{j,k+1}, \ldots)$$

Figure 7.5: Inter-route mode of the customer-exchange (swap) operator.

The size of this neighbourhood depends only on the number of customers:

$$|N_{swap}(s)| = O(N^2)$$

An application of this operation may produce an infeasible neighbour only in the inter-route mode, and only when the demands of the exchanged customers differ; then the demand of one route increases after the swap and may violate the capacity constraint. Thus, the constraint has to be checked, but this takes only $O(1)$ time.
The cost of a neighbour may also be computed efficiently, in constant time, since there are always 4 edges exchanged in a solution: the ones incident to the swapped customers. When a neighbour of swap is chosen to become the new solution in local search, the necessary modification of the solution also takes $O(1)$ time.

The main motivation for using this operator in the CVRP was to have in local search the ability to address the grouping subproblem even more directly than 2opt does. It seems that swap, by exchanging single customers between routes, is more dedicated to this goal than 2opt, which exchanges whole subsequences of customers and hence may be more prone to infeasibility of neighbours.

7.3.4 Composition of neighbourhoods

The chosen operators were implemented in a way which allows free composition of any of the neighbourhoods into one. The goal of this design was to potentially increase the search abilities of the LS procedure: if a solution is a local minimum in one neighbourhood, it does not have to be a minimum in some other one. Also, the feasibility of neighbours may have some importance here; for example, swap may help 2opt optimise the load of vehicles.

7.3.5 Acceleration techniques

Speeding up 2opt feasibility tests

The main computation cost of finding an improving 2opt move is related to the feasibility tests of inter-route 2opt moves. Kindervater & Savelsbergh (2003) described a technique which reduces this cost to a constant. It is based on the observation that the demands of parts of routes may be stored and simply updated when iterating over the neighbours of the current solution in the right order, called the lexicographic order. An example of this order is given in figure 7.6. The top of the figure shows a 2opt move removing edges (2, 3) and (8, 9) from two routes (the dashed edges); the demands of the parts (1, 2), (3, 4, 5, 6), (7, 8), (9, 10, 11, 12) are required for a feasibility test. The bottom of figure 7.6 shows the immediately next 2opt move in the lexicographic order, the one removing edges (2, 3) and (9, 10). The required demands of parts (1, 2) and (3, 4, 5, 6) have just been computed in the previous iteration and may simply be reused; the demands of parts (7, 8, 9) and (10, 11, 12) may be computed from the previous values at the cost of two additions. This reduces the time of a feasibility test from $O(N)$ to $O(1)$ if the tests are performed in a loop and in the lexicographic order.

Figure 7.6: Edge exchanges in the lexicographic order: 2opt for edges (2, 3) and (8, 9) (top) and 2opt for edges (2, 3) and (9, 10) (bottom).

Cache for evaluations of neighbours

As described in section 4.6.5, this technique involves caching (storing in auxiliary memory) the values of objective function differences between local search iterations.

The general scheme

The operations of merge, 2opt and swap modify only a small fragment of a CVRP solution; large parts of the solution stay intact after a performed modification. Consequently, a large number of the moves which could be applied to the original solution may also be performed on the new one, and their changes of the objective function stay the same. Therefore, there is no need to recompute these changes; they may be stored in a cache for later use. Nevertheless, some moves available from the original solution are affected by the actually performed move. These modified moves must not be kept; they have to be removed from the cache or marked as invalid there.
The set of such moves strongly depends on the performed move. These remarks lead to algorithm 9, local search with cache (compare to algorithm 1):

Algorithm 9 LocalSearch(s).
  initialise empty cache
  repeat {main local search loop}
    s′ = s
    betterFound = false
    for all sn ∈ N(s) do {iterate over the neighbours of s}
      if ∆f(sn, s) is valid in the cache then
        ∆f = ∆f(sn, s) {take ∆f from the cache}
      else
        ∆f = f(sn) − f(s) {compute ∆f from scratch}
        ∆f(sn, s) = ∆f {store ∆f in the cache for later use}
      if f(s) + ∆f < f(s′) then {check if this is the best neighbour so far}
        s′ = sn {remember the better neighbour}
        betterFound = true
    if betterFound then
      for all sa ∈ N(s) which are affected by the chosen move from s to s′ do
        mark ∆f(sa, s) as invalid in the cache
      s = s′ {proceed to the better neighbour}
  until betterFound == false {stop at a local optimum}
  return s

One may notice the possible source of gain in local search speed: instead of computing $\Delta f(s_n, s) = f(s_n) - f(s)$ for each neighbour $s_n$ of $s$ from scratch, the value is taken from the cache if it was computed earlier. However, the operation of cache update has to be called after a move is found and performed, in order to ensure the cache stays valid. This is a possible source of computation cost. The goal of caching is to make the gain higher than the cost.

Cache requirements in the CVRP

Firstly, in the CVRP not only the objective function matters. There is also the capacity constraint, which involves complete routes, not only single customers or edges. If the capacity constraint is violated for a neighbour then this neighbour is infeasible, and such moves are forbidden in local search. Therefore, not only the change in the objective function has to be stored in the cache, but also the feasibility status of a neighbour.

Secondly, because the considered operators merge, 2opt and swap have different semantics, a cache must be designed and implemented independently for each of them (separate data structures).

Thirdly, the local search considered here assumes that the neighbourhoods of these operators may be composed to form one large neighbourhood. This also means that the order of execution of operators cannot be determined in advance (it may be e.g.: merge, merge, 2opt, swap, 2opt, . . . ; it may be any other order). For the cache this possibility creates the requirement that when one type of move is performed, the caches of all operators have to be updated.

Finally, the neighbourhoods of the operators have different sizes; the neighbourhoods of 2opt and swap are considerably larger than the one of merge. Moreover, the merge operation is very specific: the number of its applications is always strictly limited by the minimum possible number of routes. Initial experiments with MAs indicated that the number of applications of this operator amounts to 5–10% of the total number of applications of all operators; the majority of the search effort is spent on 2opt and swap. Therefore, the cache was implemented for these two operators only. The size of memory for the cache structures is the same as the size of the related neighbourhoods.

Implementation issues

Another factor which influences the speed of local search is the implementation itself. On the low level, making use of certain techniques accelerates computation.
The techniques used in this study were method inlining and avoiding calls to expensive copy constructors for large objects by passing references to them.

7.3.6 Measuring the speed of local search

Local search execution profiles

The author decided to make detailed profiles of local search executions in order to check the cost of, and gain from, the acceleration techniques; analytical computation of these values is difficult. Two variants of LS were tested: greedy and steepest. The neighbourhood was merge followed by the composed 2opt and swap. The acceleration techniques were employed in 3 variants: without cache and lexicographic order (denoted 'nc'), with cache and lexicographic order ('c'), and the most promising one: with the swap cache, without the 2opt cache and with the lexicographic order ('c∗'). All the variants considered in this experiment included the low-level implementation changes.

Only two instances were tested in detail, tai100d and c120. One run of multiple start local search (MSLS) was conducted for each setting and instance, consisting of 5 independent LS processes. The runs were limited and short because code profiling considerably increased the run time, due to the injection of timing routines into the original code.

The profiling results of greedy LS for instance c120 are presented in table 7.1. They contain the times of operations for each profiled LS setting, together with percentages of the total run time of the base version (nc). The merge operator is not reported due to the insignificant cost of its operations (1–2% of the total run time in all runs).

Table 7.1: Times of execution of search and cache operations in greedy LS, instance c120; time [s] (percent of the nc total).

operation                   |      nc       |       c       |      c∗
2opt: eval. of neighbours   | 146.3 (50.4)  |  47.6 (16.4)  |  57.1 (19.7)
2opt: cache read/write      |   0.0  (0.0)  |  30.1 (10.4)  |   2.6  (0.9)
2opt: cache update          |   0.0  (0.0)  |  19.8  (6.8)  |   0.0  (0.0)
2opt: total search cost     | 146.3 (50.4)  |  97.5 (33.6)  |  59.7 (20.6)
swap: eval. of neighbours   |  33.2 (11.4)  |  13.3  (4.6)  |  11.5  (4.0)
swap: cache read/write      |   0.0  (0.0)  |   8.1  (2.8)  |   7.2  (2.5)
swap: cache update          |   0.0  (0.0)  |   5.1  (1.8)  |   5.1  (1.8)
swap: total search cost     |  33.2 (11.4)  |  26.5  (9.2)  |  23.8  (8.2)
operators: total            | 179.5 (61.8)  | 124.0 (42.7)  |  83.5 (28.8)
greedy LS: total            | 290.2 (100.0) | 209.9 (72.3)  | 173.1 (59.6)

One can see in the table that the variant without the algorithmic techniques (nc) was the slowest: it took 290.2 s in total, of which the search itself took 179.5 s. The very high cost of search with the 2opt operator is clearly visible (50.4% of the total run time). The cost of swap is lower, although still considerable (11.4%). Consequently, there is room for improvement in this variant.

The introduction of the cache and the 2opt lexicographic order decreased the time of computation to 72.3% of the base variant, which is good news. The 2opt evaluation time dropped to 16.4%. However, the 2opt cache introduces new cost components: cache reads and writes (10.4%) and cache updates after a performed move (6.8%). In total, the 2opt search time was reduced from 50.4% to 33.6%. The situation is similar for swap: the evaluation time dropped from 11.4% to 4.6%, but cache management (read/write and update) took another 4.6%, making the cache only slightly profitable.

The last variant, c∗, used the 2opt lexicographic order but did not use the 2opt cache; it seemed that the cache management cost for this operator was too high.
The results show that this approach gives the highest gain for the greedy LS: the evaluation cost for 2opt amounts to 19.7%, while there is no cache management cost (the 0.9% figure reflects the time of calls to the empty cache, which is never updated). In conjunction with the gain from the swap cache and the lexicographic order, the overall time of computation was reduced by 40.4% compared to the base variant, nc.

The analysis of cache usage in the c variant demonstrated that only 28.1% of the 2opt cache was used, while for swap it was 58.8%. As predicted, the cached values were rarely used in the greedy version, because improving steps were usually found very quickly (the neighbourhood was not completely searched through, filling the cache only sparsely with valid values). Moreover, these numbers indicate that 2opt updates invalidated large parts of the cache, while for the swap operator most of the cache remained valid after an improving move was performed. In the case of instance tai100d (the detailed results are not reported) the cache management cost was generally higher, making cache usage rather expensive.

The steepest version differed mainly in the cache usage: 62.0% of 2opt neighbours were evaluated based on the cache contents, and as much as 77.2% in the case of swap. Therefore, the gains from the cache were higher.

To summarise this experiment, the execution profiles indicate a high 2opt cache management cost which is generally compensated by the gain in evaluation of neighbours, but brings no further significant improvement. On the other hand, the lexicographic order in 2opt and the cache of swap improve the efficiency of greedy local search, and the improvement is even higher for the steepest version. Therefore, except for the 2opt cache, all the tested techniques should be used in local search.

Local search execution times

In order to assess the influence of all the acceleration techniques on local search in an unbiased way, an experiment without injected timing routines was conducted. It also involved several instances of different sizes, to check the effect of scale. Two local search versions were tested: greedy and steepest. Two versions with respect to the acceleration techniques were involved: without any of them (denoted 'before') and with all of them except for the costly 2opt cache (denoted 'after'). Each combination was run in a multiple start local search (MSLS) algorithm which started from random solutions. MSLS was run 10 times; each run stopped after 100 independent LS processes. Average times of computation in this experiment are given in table 7.2. It can be seen in the table that without the acceleration techniques the computation is approximately 10 times longer. This is a constant effect, independent of instance size. The steepest algorithm reduces the time of computation a bit more than the greedy one (by 91.6% compared to 90.0%, on average). The overall result is very good, so this experiment confirms that the chosen acceleration techniques should be used in LS. The large experiments with the memetic algorithm, presented further in section 7.9, were practically possible mainly due to this positive effect of the acceleration techniques.

Table 7.2: Times of execution of local search without and with the additional speed-up techniques, and the reduction in computation time due to the techniques.
           |         greedy n-1-2            |        steepest n-1-2
instance   | before [s]  after [s]  red. [%] | before [s]  after [s]  red. [%]
c50        |      6.0        0.5      91.7   |     22.3        2.0      91.0
tai75d     |     24.0        2.5      89.6   |     98.2        8.1      91.8
tai100d    |     50.7        5.5      89.2   |    218.5       18.5      91.5
c120       |     89.5        9.2      89.7   |    385.0       31.0      91.9
tai150b    |    172.3       17.7      89.7   |    792.2       68.8      91.3
c199       |    307.8       29.5      90.4   |   1816.5      147.1      91.9
avg.       |       -          -       90.0   |       -          -       91.6

7.4 Initial solutions

Local search and memetic algorithms usually start computation from complete solutions of the given problem. This section presents the methods used for generating such initial solutions for these algorithms.

7.4.1 Heuristic solutions

Several heuristics have been implemented:

• the Clarke and Wright savings algorithm (see section 6.3.1),
• the Gillet and Miller sweep algorithm (see section 6.3.2),
• the First-Fit Decreasing algorithm (see section 6.3.3),
• the Gain-Fit Decreasing algorithm.

The first algorithm is simply the steepest local search with the merge (CW) operator, started from a 'daisy-shaped' solution. It is deterministic.

The second algorithm requires a TSP heuristic; the well-known algorithm based on a minimum spanning tree (Cormen et al. 1990) was chosen to solve the TSP subproblems. The sweep algorithm was also slightly modified so that it generates more than one unique solution. As mentioned in section 6.3.2, the algorithm may be initialised with different straight lines going through the depot. In the implementation different solutions are obtained exactly in this way, by starting from straight lines going through the depot and different customer vertices.

The First-Fit Decreasing algorithm, just like the Gillet and Miller one, uses the minimum spanning-tree heuristic to solve the TSPs.

Gain-Fit Decreasing heuristic

The First-Fit Decreasing heuristic, although it performs well in clustering demands and minimising the total number of routes, does not take the distances between customers into account. To amend this, the author developed the Gain-Fit Decreasing algorithm, which is presented in algorithm 10. The main idea of the method is to choose, for the currently considered customer, the cluster which is closest among all clusters, in the hope of creating consistent ones. However, such a simple extension of the First-Fit Decreasing heuristic hardly results in any change: when there are no clusters to choose from (as is always the case at the very beginning), the customer is still put in the first possible one. Therefore, the Gain-Fit Decreasing algorithm is initialised with a number of cluster seeds which should be distant from each other.

Algorithm 10 GainFitDecreasing(ClusterSeeds)
  Clusters = ∅
  sort the customers by decreasing demands d(v); the list Cust reflects that order
  for all seed ∈ ClusterSeeds do
    c = {seed}
    Clusters = Clusters ∪ {c}
    d(c) = d(seed)
    Cust = Cust \ c
  for i = 1 to |Cust| do
    vi is the i-th customer from the list Cust
    PossibleClusters = {c ∈ Clusters : d(c) + d(vi) ≤ C}
    if PossibleClusters ≠ ∅ then
      c′ = arg min over c ∈ PossibleClusters of distToCluster(vi, c)
      c′ = c′ ∪ {vi}
      d(c′) = d(c′) + d(vi)
    else
      c′ = {vi}
      d(c′) = d(vi)
      Clusters = Clusters ∪ {c′}
  return Clusters

The function distToCluster(vi, c) defines the distance of customer vi to a cluster of customers c. In the area of clustering (Larose 2005, Weiss 2006) there are multiple ways of defining such a distance, the simplest ones being $\min_{v \in c} c(v_i, v)$ (single linkage), $\sum_{v \in c} c(v_i, v)$ (average linkage) or $\max_{v \in c} c(v_i, v)$ (complete linkage).
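A minimal sketch of these three variants of distToCluster (the signature is illustrative; 'c' here is the instance's cost matrix):

def dist_to_cluster(v, cluster, c, linkage='single'):
    """Distance of customer v to a cluster of customers."""
    ds = [c[v][u] for u in cluster]
    if linkage == 'single':
        return min(ds)        # distance to the closest member
    if linkage == 'complete':
        return max(ds)        # distance to the farthest member
    return sum(ds)            # 'average' linkage as listed in the text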
The result of the algorithm is converted into a complete solution by the minimum spanning-tree method.

Still, the method for choosing cluster seeds has to be given. It was directly inspired by the procedure used by Weiss (2006) to initialise the k-Means clustering algorithm. The method selects as the first seed the customer with the largest demand. It proceeds by selecting each next seed as the customer farthest from the current seeds. In this way a set of seeds distant from each other is obtained.

7.4.2 Random solutions

The procedure for building a random solution is presented in algorithm 11.

Algorithm 11 Building a random CVRP solution.
1: repeat {until a feasible solution is found}
2:   build a random sequence seq of all customers
3:   draw the number of routes $T$ from the distribution $P(T = splits + 1) = \binom{N-1}{splits} \cdot 2^{-(N-1)}$ for $splits = 0, \ldots, N-1$
4:   divide seq randomly into T subsequences
5:   build solution s from the subsequences
6: until isFeasible(s)
7: return s

This design is an attempt at ensuring that all CVRP solutions of a given instance have an equal probability of being constructed. This is achieved by first generating a random sequence of customers (line 2). Then, this sequence is randomly split into a number of routes (line 4). The probability of having $T$ as the number of routes is proportional to the number of different CVRP solutions that may be generated with those $T$ routes, i.e. it constitutes a binomial distribution (line 3). Finally, if an infeasible solution is constructed, the process is repeated from scratch. Such a design was also necessary in order to have unbiased random solutions available for the fitness-distance analysis.

7.5 Fitness-distance analysis

7.5.1 New distance metrics for solutions of the CVRP

In order to perform the fitness-distance analysis of the problem, some measures of distance between solutions have to be defined. The distance metrics presented in this section correspond to certain structural properties of solutions of the CVRP: the existence of certain edges (or even paths), or specific ways of partitioning the set of customers into routes (clusters). The FD analysis attempts to answer the question which properties are important for the objective function, i.e. which correlate with its values. Although the presented metrics might seem simple at first sight, their strength lies in the fact that they are linked directly to the mentioned properties of CVRP solutions, not to any specific solution representation.

Distance in terms of edges: $d_e$

The idea of this metric is based on a very similar concept formulated for the TSP: the number of common edges in TSP tours (Boese 1995). Due to the similarity between solutions of the TSP (one tour) and the CVRP (a set of disjoint tours/routes), the idea of common edges may be easily adapted to the latter. In order to properly define the distance metric some definitions are required:

$$E(t_i) = \{\{v_0, v_{i,1}\}\} \cup \Big( \bigcup_{j=1}^{n(t_i)-1} \{\{v_{i,j}, v_{i,j+1}\}\} \Big) \cup \{\{v_{i,n(t_i)}, v_0\}\}$$

$$E(s) = \bigcup_{t_i \in s} E(t_i)$$

$E(t_i)$ is the multiset of undirected edges appearing in route $t_i$; $E(s)$ is the multiset of edges in solution $s$. The notion of a multiset is required here, because routes in some solutions of the CVRP may include certain edges twice (the edges to and from the depot).
Using the general concept of distance between subsets of the same universal set, as defined by Marczewski & Steinhaus (1958) (cited after Karoński & Palka (1977)), the distance $d_e$ between two solutions $s_1$, $s_2$ of the same CVRP instance may be defined as:

$$d_e(s_1, s_2) = \frac{|E(s_1) \cup E(s_2)| - |E(s_1) \cap E(s_2)|}{|E(s_1) \cup E(s_2)|}$$

Since $d_e$ is only a special case of the Marczewski-Steinhaus distance, it inherits all its properties of a metric; its values are also normalised to the interval [0, 1]. This distance metric perceives solutions of the CVRP as multisets of edges: solutions close to each other have many common edges; distant solutions have few common ones. However, a closer investigation of the metric reveals that it is not intuitively 'linear' (although it is 'monotonic'): e.g. $d_e = 0.5$ does not mean that exactly half of each $E(s_i)$ is common; 50% of common edges implies $d_e \approx 2/3$.

Distance in terms of partitions of customers: $d_{pc}$

The concept behind the second distance metric is based on the 'cluster first/route second' heuristic approach to the CVRP (Aronson 1996, Laporte & Semet 2002) (see also section 6.3.2): first find a good partition of the customers into clusters and then try to find routes (solve TSPs) within these clusters, separately. According to this idea, the distance metric should identify dissimilarities between solutions perceived as partitions of the set of customers. Therefore, it should directly address the grouping subproblem in the CVRP.

An example of a distance metric for partitions of a given set may be found in the work by Karoński & Palka (it is defined even more generally there, for hypergraphs or binary trees). This example is easily adaptable to solutions of the CVRP. Let us define:

$$C(s) = \{c_1(s), c_2(s), \ldots, c_{T(s)}(s)\}$$

$$c_i(s) = \{v_{i,1}, v_{i,2}, \ldots, v_{i,n(t_i)}\}$$

$$\sigma(c_i(s_1), c_j(s_2)) = \frac{|c_i(s_1) \cup c_j(s_2)| - |c_i(s_1) \cap c_j(s_2)|}{|c_i(s_1) \cup c_j(s_2)|}$$

$C(s)$ is the partition of the set of customers into clusters; one cluster, $c_i(s)$, holds the customers from route $t_i$ of $s$; $\sigma(\cdot)$ is the Marczewski-Steinhaus distance between two clusters (sets). According to Karoński & Palka (1977), the distance between solutions may be defined as:

$$d_{pc}(s_1, s_2) = \frac{1}{2} \Big( \max_{i=1}^{T(s_1)} \min_{j=1}^{T(s_2)} \sigma(c_i(s_1), c_j(s_2)) + \max_{j=1}^{T(s_2)} \min_{i=1}^{T(s_1)} \sigma(c_i(s_1), c_j(s_2)) \Big)$$

This function is a distance metric for partitions; it is also normalised. It is not exactly a metric for solutions of the CVRP, because $d_{pc}(s_1, s_2) = 0$ does not imply $s_1 = s_2$ (the number of solutions which are not discriminated by $d_{pc}$ may be exponentially large).

The formula for $d_{pc}$ has the following sense: firstly, the best-possible assignment of clusters from $C(s_1)$ to clusters from $C(s_2)$ is made (the one which minimises $\sigma(\cdot)$), and vice versa; that is the idea behind the internal min operators. Secondly, the two worst assignments are chosen among those pairs (the max operators), and the distance over these two assignments is averaged to form the overall distance between partitions. Thus, it may be concluded that $d_{pc}$ is somewhat 'pessimistic' in its choice of 'optimistic' matches of clusters; the same observation was made by Robardet & Feschet (2000) in the context of clustering algorithms. This mixture of max and min operators in $d_{pc}$ makes the interpretation of its values difficult. Certainly, values near 0 indicate great similarity of solutions.
However, larger values do not necessarily indicate very dissimilar partitions; it is sufficient that there are 'outliers' in the partitions, which can hardly be well assigned to clusters in the other solution, and the max operator will produce large values, implying distant solutions.

Distance in terms of pairs of nodes: $d_{pn}$

The third distance metric, $d_{pn}$, is based on the same idea as $d_{pc}$: distance between solutions viewed as partitions of the set of customers. However, this idea has a different, more straightforward, mathematical formulation in $d_{pn}$. Here, the Marczewski-Steinhaus (1958) concept of distance is applied to sets of pairs of nodes (customers). Let us define:

$$PN(t_i) = \bigcup_{j=1}^{n(t_i)-1} \bigcup_{k=j+1}^{n(t_i)} \{\{v_{i,j}, v_{i,k}\}\}$$

$$PN(s) = \bigcup_{t_i \in s} PN(t_i)$$

$PN(t_i)$ is the set of undirected pairs of nodes (customers) which are assigned to the same route $t_i$ (it forms a complete graph defined over the set of customers in route $t_i$). The depot node is not considered here, since every customer is in the same cluster with it. $PN(s)$ is the set of all undirected pairs in solution $s$. The distance $d_{pn}$ between solutions is defined as:

$$d_{pn}(s_1, s_2) = \frac{|PN(s_1) \cup PN(s_2)| - |PN(s_1) \cap PN(s_2)|}{|PN(s_1) \cup PN(s_2)|}$$

Similarly to $d_{pc}$, this function is not exactly a metric for solutions of the CVRP, but for the partitions implied by those solutions. The formula for $d_{pn}$ has a more straightforward sense than the one for $d_{pc}$: here, the value of the distance roughly indicates how large the parts of the sets of pairs are which are not shared by the two compared solutions. If $d_{pn} = 0$ then the two solutions imply identical partitions; $d_{pn} = 1$ implies completely different partitions (not even one pair of nodes is assigned to a route in the same way in $s_1$ and $s_2$). A concept of cluster contamination, very similar in spirit to $d_{pn}$, was formulated by Weiss (2006) in the context of clustering text documents.

7.5.2 Distance measures defined in the literature

In recent years some more distance measures and metrics for solutions of the CVRP have been described in the literature. These are:

• the edit distance (Sorensen 2003, Sorensen 2007),
• the add-remove edit distance for the sequential representation (Sorensen et al. 2005),
• the stop-centric and route-centric distance measures (Woodruff & Lokketangen 2005).

The author managed to analyse and implement the first two measures, so they are described in the sections below. He did not manage, however, to implement the measures developed at the same time by Woodruff & Lokketangen, due to limited time, so these distances are not considered here. Nevertheless, it is important to remember that such measures exist and should also, in the near future, be compared to those described in this work.

It is not the purpose of this thesis to provide detailed definitions of all existing distance measures for solutions of the CVRP. Therefore, in the sections below the measures are only briefly described and their properties discussed. For the detailed definitions and implementation issues the interested reader is referred to the cited publications.

The edit distance for CVRP solutions: $d_{eu}$

Sorensen (2003, 2007) defined a distance measure for solutions of the CVRP based on the concept of the edit distance between strings. An edit operation on a string is a modification of one of its characters by means of an elementary operation: insertion, deletion or substitution.
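For reference, this plain edit distance on sequences, which underlies Sorensen's constructions, is computed by the standard dynamic programme (a textbook sketch; the symbols may be customer indices):

def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertion, deletion and
    substitution, computed in O(|a| * |b|) time."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                           # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j                           # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j] + 1,      # deletion
                          d[i][j-1] + 1,      # insertion
                          d[i-1][j-1] + (a[i-1] != b[j-1]))  # substitution
    return d[m][n]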
7.5.2 Distance measures defined in the literature

In recent years some more distance measures and metrics for solutions of the CVRP have been described in the literature. These are:

• the edit distance (Sorensen 2003, Sorensen 2007),
• the add-remove edit distance for the sequential representation (Sorensen et al. 2005),
• the stop-centric and route-centric distance measures (Woodruff & Lokketangen 2005).

The author analysed and implemented the first two measures, so they are described in the sections below. He did not manage, however, to implement the measures developed at the same time by Woodruff & Lokketangen, due to limited time, so those distances are not considered here. Nevertheless, it is important to remember that such measures exist; in the near future they should also be compared to those described in this work.

It is not the purpose of this thesis to provide detailed definitions of all existing distance measures for solutions of the CVRP. Therefore, in the sections below the measures are only shortly described and their properties discussed. For the detailed definitions and implementation issues the interested reader is referred to the cited publications.

The edit distance for CVRP solutions: deu

Sorensen (2003, 2007) defined a distance measure for solutions of the CVRP based on the concept of the edit distance between strings. An edit operation on a string is a modification of one of its characters by means of an elementary operation: insertion, deletion or substitution. Sorensen describes how to define an edit distance on permutations. Further, he extends this distance to the case of permutations with reversal independence (or undirected permutations, like single routes in the CVRP) and to the case of sets of such permutations (like solutions of the CVRP). In this process the sets of permutations (routes) of two CVRP solutions are matched in an optimal way by solving the minimal-cost assignment problem. Therefore, it is possible to determine which routes in one solution correspond to which routes in the other solution. This distance measure will be called deu in this thesis (edit distance for undirected routes).

Although the edit distance for strings and undirected permutations is a metric, it is not clear whether deu is a metric for solutions of the CVRP; this matter is not clarified by Sorensen (although this is not the most important property of a measure and is not required here). This measure is also not normalised. The value of this distance is the minimal number of elementary edit operations required to transform one set of permutations (a CVRP solution) into another such set. Thus, deu = 0 implies that the compared solutions are identical (no edit operation is required).

This measure focuses on the same order of customers in the matched routes; if this order is disturbed somehow, then some edit operations are required to perform the necessary transformations. In this sense, the function deu is similar to de, which also stresses the aspect of order (by inspecting edges and paths). It seems, however, that for this edit distance it is also important that long identical subpaths occupy the same places (absolute positions) in the routes of two solutions. Even if such long subpaths exist in matched routes, a difference in their positions in these routes may incur some additional edit cost. In consequence, this property of deu makes it different from de, which ignores positions of customers in routes and takes only edges into account.

Since the order of customers in routes is important for deu, the same sets of vertices in routes of two solutions (the same clusters) are not enough for this measure to make these solutions close. This fact should make it different from the metrics which concentrate on clusters only: dpc and dpn.

It is also worth noting that the distance deu is inflated when the numbers of routes in two compared solutions differ. This is due to the fact that the assignment problem involved in the distance computation has to match some routes of one solution to artificially added empty routes in the other one; this implies performing additional insertions or deletions.

The add-remove edit distance for the sequential representation: dear

In their proposal of a path relinking procedure for the CVRP, Sorensen et al. (2005) demonstrated another kind of distance measure for CVRP solutions, which is also based on the concept of edit operations. This measure, however, compares solutions encoded in the sequential representation (proposed earlier by Prins, see section 6.4.4). The distance between such permutations defined by Sorensen et al. is the edit distance without the operation of substitution; only insertions and deletions are considered. The authors call it the 'add-remove edit distance', so it will be denoted dear hereafter. The cost of one such edit operation is set to 1/2 by Sorensen et al., but here it is assumed to be equal to 1.
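Because dear uses only insertions and deletions, it can be computed via the longest common subsequence (LCS): with unit operation costs, the distance between sequences p1 and p2 equals |p1| + |p2| − 2·LCS(p1, p2). The sketch below illustrates this route; it is not Sorensen et al.'s implementation:

    def add_remove_distance(p1, p2):
        """Insertion/deletion-only edit distance between two sequences,
        via the longest common subsequence:  d = n + m - 2*LCS."""
        n, m = len(p1), len(p2)
        lcs = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if p1[i - 1] == p2[j - 1]:
                    lcs[i][j] = lcs[i - 1][j - 1] + 1
                else:
                    lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
        return n + m - 2 * lcs[n][m]

    # two sequential (giant-tour) encodings of the same customer set
    print(add_remove_distance([1, 2, 3, 4, 5], [2, 1, 3, 5, 4]))  # -> 4

For two permutations of the same N customers this equals 2(N − LCS), which is why a factor of 1/(2N) suffices for normalisation.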
This measure is not normalised, but this might be amended easily by introducing into its formula a factor equal to the reciprocal of twice the number of customers. The measure is a metric for permutations, but not exactly a metric for CVRP solutions, because not every solution of the problem may be encoded in the sequential representation and decoded back without changes. Nevertheless, this does not seem to be a great disadvantage if one imagines an algorithm working only on solutions encoded in this representation (like the one by Prins (2001, 2004), described in section 6.4.4). However, if any two CVRP solutions had to be compared by a distance measure, this distance would not be directly useful unless all solutions were encoded as permutations (perhaps with some loss of information on the actual routes). In order to make a comparison of the measures possible, this approach is applied in this work.

It is harder to interpret the values of this measure than in the previous case. The value is, of course, the minimal number of edit operations required to transform one permutation into another, but it is not clear how an edit operation influences the actual underlying CVRP solution. Due to the nature of the sequential representation it is unknown which edges actually exist in a solution, and where each route starts and finishes. Thus, an edit operation on a permutation may imply, in the decoded solutions, additional modifications of vertices which are not directly involved in the edit operation itself. This phenomenon is visible in the example provided by Sorensen et al. (2005) (page 844, the last 3 move operations). It seems that this property of dear might considerably decrease its utility.

7.5.3 Random solutions vs. local optima

One stage of the fitness-distance analysis focuses on possible differences between sets of local optima and random solutions of a given instance in terms of distance in these sets (see section 5.2.1). In order to check these differences in case of the CVRP, large random samples of 2000 different solutions of each type were generated. Random solutions were produced using algorithm 11, described earlier in section 7.4.2. Local optima were generated starting from random solutions and proceeding with a greedy local search; a composite neighbourhood of 3 operators was used in the algorithm: merge, 2opt, swap. In these sets distance of each type was computed for 1000 different pairs of solutions, with deu and dear normalised. Finally, statistics on the values of distance in these samples were computed: the average distance for each instance, the aggregate average and standard deviation over all instances, and rN, the correlation of the average distance and instance size. These values are shown in table 7.3 and figure 7.7. Note that for local optima the table shows the difference between the average distance for these solutions and for random ones, but the values of standard deviation and correlation are computed for the original averages, not for the differences.
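The sampling scheme itself is straightforward; a sketch of the per-instance statistics follows (the pool size and the number of pairs follow the text above; the use of the Pearson coefficient for rN is an assumption, as the text does not name the correlation type):

    import random, statistics

    def distance_stats(pool, dist, n_pairs=1000):
        """Average and standard deviation of a distance measure over
        random pairs drawn from a pool of (here 2000) solutions."""
        values = [dist(*random.sample(pool, 2)) for _ in range(n_pairs)]
        return statistics.mean(values), statistics.stdev(values)

    def r_n(avg_distances, instance_sizes):
        """Correlation of the per-instance average distance with N."""
        return statistics.correlation(avg_distances, instance_sizes)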
Table 7.3: Average values of distance in sets of random solutions and local optima.

                       random solutions (rand)                  difference for local optima (lopt)
instance      d̄e     d̄pc     d̄pn  (1/N)d̄eu (1/2N)d̄ear     d̄e     d̄pc     d̄pn  (1/N)d̄eu (1/2N)d̄ear
c50         0.715   0.834   0.982   0.824    0.775       -0.109  -0.090  -0.323  -0.174   -0.324
f71         0.717   0.847   0.987   0.825    0.806       -0.240  -0.201  -0.648  -0.297   -0.472
c75         0.711   0.838   0.989   0.812    0.811       -0.023  -0.039  -0.237  -0.117   -0.306
tai75a      0.699   0.839   0.989   0.803    0.811       -0.078  -0.044  -0.274  -0.137   -0.351
tai75b      0.702   0.839   0.989   0.803    0.810       -0.085  -0.031  -0.256  -0.116   -0.321
tai75c      0.701   0.841   0.988   0.800    0.811       -0.119  -0.053  -0.306  -0.153   -0.359
tai75d      0.709   0.843   0.989   0.812    0.811       -0.190  -0.080  -0.440  -0.243   -0.435
c100        0.725   0.861   0.991   0.825    0.832       -0.052  -0.095  -0.260  -0.076   -0.284
c100b       0.725   0.860   0.991   0.826    0.833       -0.185  -0.156  -0.402  -0.276   -0.457
tai100a     0.712   0.852   0.991   0.807    0.832       -0.117  -0.067  -0.301  -0.159   -0.378
tai100b     0.712   0.852   0.991   0.810    0.832       -0.102  -0.046  -0.293  -0.134   -0.358
tai100c     0.715   0.853   0.991   0.812    0.833       -0.089  -0.044  -0.284  -0.145   -0.349
tai100d     0.712   0.853   0.991   0.809    0.833       -0.103  -0.029  -0.314  -0.098   -0.335
c120        0.727   0.866   0.992   0.823    0.846       -0.014  -0.072  -0.246  -0.057   -0.269
f134        0.730   0.870   0.993   0.825    0.852       -0.097  -0.089  -0.283  -0.042   -0.291
c150        0.728   0.871   0.994   0.820    0.860       -0.049  -0.061  -0.234  -0.069   -0.298
tai150a     0.718   0.865   0.994   0.809    0.860       -0.091  -0.046  -0.263  -0.096   -0.348
tai150b     0.723   0.868   0.994   0.814    0.860       -0.060  -0.042  -0.263  -0.094   -0.331
tai150c     0.720   0.867   0.994   0.812    0.860       -0.091  -0.055  -0.258  -0.105   -0.351
tai150d     0.719   0.865   0.994   0.809    0.861       -0.083  -0.037  -0.244  -0.051   -0.322
c199        0.731   0.877   0.995   0.817    0.877       -0.030  -0.051  -0.204  -0.044   -0.287
tai385      0.713   0.880   0.998   0.789    0.909        0.032  -0.001  -0.137   0.024   -0.290
avg.        0.716   0.856   0.991   0.813    0.837       -0.090  -0.065  -0.294  -0.121   -0.342
std. dev.   0.009   0.014   0.003   0.010    0.029        0.063   0.048   0.102   0.075    0.072
rN          0.291   0.781   0.808  -0.420    0.893        0.613   0.600   0.553   0.627    0.661

Figure 7.7: Aggregate average values of distance in sets of random solutions and local optima.

Comment on de

Looking at solutions from the point of view of de, one can see that random solutions share surprisingly many edges, on average (1 − 0.716) · 100% ≈ 28%. However, a closer look at the solutions revealed that this number is artificially high, because random solutions have many routes and, hence, many depot edges. These are the edges that contribute to this high similarity. When they are removed, the distance becomes close to 1.0. For example, in instance c50 the common edges constitute 27.9% of all of them in the compared solutions, but only 0.1% of them are edges not connected to the depot. The same happens for other instances (e.g. 0.1% for tai385). Therefore, d̄e = 0.716 is artificially low for random solutions and is an effect of the sampling procedure. In local optima, however, the proportions change: in c50 there are on average 39.5% common edges and as many as 69% of them are not connected to the depot. Consequently, the change between the rand and lopt pools is higher than indicated by the numbers in the table. The author estimates it at approximately 0.3.

One should note a change in the values of the standard deviation of d̄e between rand and lopt. It rises from 0.009 to 0.063, meaning that while in random solutions the values of the average distance do not really differ between instances, the differences become higher for local optima. For example, d̄e is much smaller than the aggregate average in case of f71, c100b, tai75d, tai75c, tai100a. Much larger values of distance appear for c120, c199 and tai385.
For the last instance the distance between local optima seems to grow compared to random solutions, but this is also an effect of the artificially low distance in the rand pool due to the mentioned numerous depot edges.

The value of rN, the correlation coefficient between d̄e and N, the instance size, is higher for local optima than for random solutions, and in the former group it amounts to 0.61. This means that d̄e depends to some extent on the instance size.

Comment on dpn

Contrary to d̄e, the values of d̄pn in the rand pool are extremely high, nearly reaching the maximum possible value. In lopt these values drop significantly, to 0.7 on average, meaning that approximately 30% of all pairs of nodes are in the same routes in local optima, irrespective of the order and adjacency of these nodes. In such solutions certain nodes should not be put in separate routes.

The comparison of standard deviations between rand and lopt reveals that the differences between instances become considerable when looking at local optima: the coefficient of variation rises 148 times in the second pool. For example, d̄pn drops from 0.99 to 0.34 in case of f71 and from 0.99 to 0.55 for tai75d. These are very large changes compared to the drops of 0.13 and 0.2 for tai385 and c199, respectively.

The value of correlation rN changes from 0.8 for rand to 0.55 for lopt. This means that while the distance between random solutions depends highly on instance size, the dependence loses some strength in sets of local optima.

Comment on dpc

Distance d̄pc between random solutions does not reach the theoretical maximum of 1.0 here, similarly to the case of de. This is probably due to the fact that there is some probability of finding at least one common customer vertex in the optimistically matched routes of any two solutions. Nevertheless, in local optima the distance is smaller on average, and amounts to 0.79. This means that even the most pessimistic of the optimistic matches of routes reveal some common vertices. This effect is better measured by dpn, the author deems.

The increase in the value of standard deviation between rand and lopt, from 0.01 to 0.05, indicates slightly greater variability between instances with respect to d̄pc when local optima are examined. Here, the positive examples are again f71, c100b, and perhaps c100 and c50. The negative ones are tai385, tai75b, tai100d.

Comment on deu

The values of d̄eu in the rand pool seem to be approximately the same for all instances, around 0.81. Again, the theoretical maximum of 1.0 is not reached by the measure, perhaps for the same reasons as for de or dpc. In the other pool, lopt, the value drops by 0.12, meaning that 12% fewer edit operations are required to convert one local optimum into another than in the case of random solutions. This is a significant change, although not a huge one. It means that some subsequences of routes are conserved in local search.

The coefficient of variation of d̄eu increases 9 times between random solutions and local optima. This signifies that differences between instances are much more visible when local optima are considered. For example, in case of f71, c100b and tai75d the values of d̄eu change by more than 0.2, while in case of tai385, f134 and c199 the change is by less than 0.05. Somewhat surprising is the fact that the value of correlation rN changes its sign when shifting from rand (-0.42) to lopt (0.63).
Comment on dear

In the pool of random solutions d̄ear seems to depend largely on instance size: rN is as high as 0.89. With increasing instance size d̄ear appears to slowly approach 1.0, which looks sensible given that it means 2 add or remove operations per customer to change one random permutation into another.

This value drops significantly in the lopt pool: by 0.34 on average. This is the largest drop of all the considered measures. It indicates that there are important regularities in local optima which are not present in random permutations.

The standard deviation of d̄ear rises from 0.03 to 0.07, meaning that the differences between instances are slightly more visible when examining local optima. Here, the positive examples are again f71, c100b and tai75d, the negative ones being c199, c120, f134 and tai385.

7.5.4 Fitness-distance relationships

The second stage of the fitness-distance analysis is an attempt to find trends in the sets of local optima themselves and to verify the 'big valley' hypothesis: whether better solutions tend to be closer (more similar) to each other. In this study positive values of the fitness-distance correlation would indicate a 'big valley' structure. The verification was performed with the method of analysis of a set of pairs of local optima (see section 5.4.6), which leads to the computation of values of the linear determination coefficient r² between fitness and distance as an indicator of FDC. The same sets of 1000 pairs of local optima were used as in the previous section. The obtained values of r² are given in table 7.4. All the values marked as significant correspond to positive correlations. The author's classification of each instance according to the 'big valley' status is given in the last column of the table.
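The exact formulation of the indicator is given in section 5.4.6 and is not reproduced here; the sketch below shows one plausible reading consistent with the discussion that follows — the determination coefficient of a least-squares linear model relating the distance in a pair of local optima to the two fitness values:

    import numpy as np

    def determination_coefficient(f1, f2, d):
        """r^2 of the linear model d ~ a*f1 + b*f2 + c, fitted by least
        squares to a sample of pairs of local optima; f1, f2 are the
        fitness values of the two solutions in a pair, d their distance."""
        f1, f2, d = map(np.asarray, (f1, f2, d))
        X = np.column_stack([f1, f2, np.ones(len(d))])
        coef, *_ = np.linalg.lstsq(X, d, rcond=None)
        res = d - X @ coef
        ss_res = float(res @ res)
        ss_tot = float(((d - d.mean()) ** 2).sum())
        return 1.0 - ss_res / ss_tot

Under this reading, two independent correlations r(f1, d) = r(f2, d) = 0.3 give r² ≈ 0.3² + 0.3² = 0.18, the significance threshold used below.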
Values of r² greater than 0.18 are treated as significant indicators of FDC. One such value corresponds to two independent values of the correlations r(f1, d) and r(f2, d) being at least 0.3. Although these values are not large as correlations, the author thinks they are significant as indicators of FDC (compare the values in table 5.1). Jones & Forrest (1995) employed the value of 0.15 as a border between significant and insignificant correlations, so the 0.3 used here is much more prudent (twice as high), although still arbitrary.

Table 7.4: Values of the linear determination coefficient r² between fitness and each distance measure for all instances.

instance    r²e    r²pc   r²pn   r²eu   r²ear   big valley?
c50        0.167  0.069  0.173  0.150  0.016   amb.
f71        0.237  0.046  0.378  0.326  0.010   yes
c75        0.112  0.011  0.112  0.079  0.001   no
tai75a     0.165  0.021  0.156  0.163  0.017   amb.
tai75b     0.025  0.026  0.075  0.070  0.008   no
tai75c     0.187  0.034  0.258  0.200  0.016   yes
tai75d     0.244  0.059  0.278  0.199  0.006   yes
c100       0.197  0.062  0.126  0.116  0.004   yes
c100b      0.347  0.211  0.468  0.414  0.027   yes
tai100a    0.079  0.034  0.110  0.114  0.020   no
tai100b    0.238  0.032  0.233  0.212  0.008   yes
tai100c    0.246  0.036  0.380  0.290  0.031   yes
tai100d    0.178  0.082  0.272  0.198  0.017   yes
c120       0.020  0.091  0.152  0.065  0.014   amb.
f134       0.160  0.099  0.045  0.083  0.010   amb.
c150       0.296  0.015  0.232  0.207  0.025   yes
tai150a    0.009  0.003  0.000  0.002  0.001   no
tai150b    0.072  0.035  0.223  0.126  0.021   yes
tai150c    0.190  0.046  0.217  0.216  0.038   yes
tai150d    0.042  0.011  0.057  0.044  0.010   no
c199       0.237  0.014  0.242  0.204  0.008   yes
tai385     0.123  0.047  0.158  0.106  0.001   amb.

avg.       0.162  0.049  0.197  0.163  0.014
std. dev.  0.093  0.045  0.116  0.097  0.010

All cases with r² ∈ [0.15, 0.18) are deemed 'borderline cases': perhaps there exists a 'big valley', but there is more doubt about it. An instance is classified as 'big valley'='yes' if there is at least one r² value not less than 0.18. It is in the 'no' class when no r² is larger than 0.15. Otherwise it is said to be ambiguous ('amb.').

Comment on distance measures

The first general observation based on the values in table 7.4 is that dear is not correlated with fitness at all. Thus, it seems that this type of distance does not reveal any 'big valley' in the CVRP. A very similar conclusion may be derived from the values of r²pc: dpc does not correlate with fitness, with one exception, c100b, which has the largest values of r² for all types of distance. The properties measured by these distances, whatever they may be, are not important for the objective function.

Conclusions are different in case of the three other measures. Firstly, de reveals fitness-distance correlation for 10–14 instances out of 22. Significant values of FDC indicate that in these cases better solutions tend to contain more common edges, on average, so the presence of some edges is important for good quality of solutions. The highest FDC values are obtained for instances c100b, c150 and tai100c. Virtually zero correlations occur for tai150a, c120 and tai75b. For the largest instance, tai385, the correlation is very small: r²=0.12.

Secondly, when dpn is taken into account, it appears as though there are 'big valleys' in 11–15 cases (mostly the same as for de). It means that for these instances better local optima usually contain more similar clusters (assignments of customers to routes) than poorer ones, and that certain contents of clusters are important for good quality of solutions. The best instances from this point of view are c100b, tai100c and f71. The worst are tai150a, f134 and tai150d. The largest instance, with r²=0.158, seems to be an ambiguous case, where some traces of a 'big valley' structure may be found.

Finally, deu reveals reasonable fitness-distance correlations in 10–11 cases (again, usually the same as for de). This result suggests that better solutions of the CVRP are more closely related in terms of edit operations on routes than worse local optima. The highest values of r² are for c100b (again), f71 and tai100c, while the lowest are for tai150a (again), c120 and c75.

Comment on instances

Looking at table 7.4 from the point of view of instances, one can see that 9 instances out of 22 reveal 'big valleys' for 3 distance measures: de, dpn and deu. 3 more problem examples reveal FDC for at least one of the measures. The most obvious example here is c100b, with the highest r² coefficients of all the studied instances, while tai100c, c150, f71 and tai75d follow.

There are also 5 instances which do not reveal any trace of fitness-distance correlation with respect to any distance measure used here: c75, tai100a, tai150a, tai150d, tai75b. The values of FDC for each of them are very small. A negative example is tai150a: all its r² values are virtually zero.

The other instances listed in table 7.4 are intermediate cases: there is some indication of a 'big valley' with respect to some measures, but not when some other distances are taken into account.

Fitness-distance plots

The conclusions derived from the values of FDC may be further verified through inspection of fitness-distance plots.
In this study, 2-dimensional FD plots are constructed from the mentioned 3-dimensional observations (the two fitness values and the distance in a pair of local optima) by cutting a slice through the 3 dimensions along the line of solutions with approximately the same fitness (at most 2% difference in quality).

In figure 7.8 one can see FD plots generated for instance c150 and all types of distance. As revealed by the values of r² in table 7.4, fitness is hardly correlated with dear and dpc: there is large variance of distance in the presented sample and no significant trend. Distances de, dpn and deu look different in the figure. Although the scatter of distance is still large, a moderate trend is also visible: solutions with smaller fitnesses tend to be slightly closer to each other, as indicated by the superimposed lines of first-order regression. The strongest trend, with the smallest variance of distance, appears for de, as predicted by the largest r² for this distance measure: r²=0.296.

Other examples of FD plots confirming the presence of 'big valleys' are shown in figure 7.9. The plot generated for instance tai150c and dpn shows that with improving solution quality the average distance between local optima decreases, as indicated by the regression line; here r²=0.217. However, in this plot the variance of distance seems to grow at the same time. There is no such phenomenon for c100b and deu: here the variance of distance appears to be constant, while the average distance between local optima drops significantly with increasing solution quality (r²=0.414).

Some more plots for the positive 'big valley' cases are shown in figures 7.10 and 7.11. They generally confirm the earlier conclusions based on r²: the fitness landscapes of these instances reveal the 'big valley' structure.

Instance tai385 with distance dpn (r²=0.158) and tai75a with de (r²=0.165) are examples of the ambiguous 'big valley' status (figure 7.12). Here, there are some visible changes in the average distance between lopt solutions, but at the same time the variance of distance is rather large and obscures the trends to a high extent.

There are also negative examples in the studied set of instances, tai150a being one of them. FD scatter plots for this problem example are shown in figure 7.13.

Figure 7.8: Fitness-distance plots with local optima for instance c150 and all types of distance, together with lines of regression.

Figure 7.9: Fitness-distance plots with local optima for instance tai150c, distance dpn (left) and instance c100b, distance deu (right), together with lines of regression.
Figure 7.10: Fitness-distance plots with local optima for instance tai100b, distance de (left) and instance c100, distance de (right), together with lines of regression.

Figure 7.11: Fitness-distance plots with local optima for instance tai100d, distance dpn (left) and instance tai100c, distance deu (right), together with lines of regression.

Figure 7.12: Fitness-distance plots with local optima for instance tai385, distance dpn (left) and instance tai75a, distance de (right), together with lines of regression.

One can clearly see that there are no traces of a positive trend; in case of de, dpn and deu the average distance between local optima even seems to increase with decreasing fitness, but the variance of distance is large at the same time. Surely, there is no 'big valley' here from the point of view of this sample of local optima.

Figure 7.13: Fitness-distance plots with local optima for instance tai150a and all types of distance, together with lines of regression.

7.5.5 Main conclusions from the fitness-distance analysis

Generally speaking, local optima are closer to one another than random solutions, but the decrease in distance is moderate. The greatest difference is observed for dear (-34% on average), and then for dpn (-29%). For de and deu the computed differences are smaller, but the real ones are estimated at approximately 30% as well. This effect is somewhat fainter for instances c199, tai385 and c75, while it is much stronger for f71, c100b and tai75d.

The average distance between local optima is to some extent correlated with instance size, irrespective of distance type, with rN around 0.6. Hence, it seems that larger instances have local optima generally more spread in the fitness landscape than smaller ones. This is rather a negative effect, since one would wish larger (and harder) instances to have local optima more clustered.

The standard deviations of d̄ across instances are higher for local optima than for random solutions. This indicates that the average distance in the lopt pool depends on the particular instance.

In spite of those two rather negative effects, the decrease in the average distance between rand and lopt is a fact. It confirms that a metaheuristic algorithm for the CVRP should contain a local search component to increase efficiency, since such an approach helps to reduce to some extent the size of the space to be searched for good solutions. At the same time the author admits that the decrease of approximately 30% in distance is not a huge one.
The average similarity of 30% means that there are not enough common properties of solutions to define a complete and good new solution; still, about 70% of its properties (edges, clusters, subsequences) have to be added to the common ones. However, in terms of the size of the space to be searched this 30% may mean a considerable reduction and gain for a search algorithm; compare the computation of the reduction in the TSP case (Boese 1995).

The result that about 70% of solution properties have to be added to the common ones agrees to some extent with one of the conclusions of Reeves (1999) made for the flowshop scheduling problem: there appeared to be some 'wall' between local optima and the global optimum which did not allow the former to be arbitrarily close to the latter. This is visible e.g. in the case of tai385 and dpn (see figure 7.12), where no solutions are closer to each other than dpn = 0.8.

The landscapes also revealed moderate correlations between fitness and distance in more than half of the studied instances. The correlations are significant mainly for the measures dpn, deu and de, the other measures being generally uncorrelated. It means that better local optima tend to have more clusters, subsequences or edges in common than worse ones, although considerable variance of distance is also visible along the trends. 9 instances out of the studied 22 reveal 'big valleys' for the 3 types of distance; 3 more instances reveal it for at least one of the distances. According to the hypothesis that the presence of such a structure facilitates search by an evolutionary algorithm with distance-preserving operators, these instances should be easy for such an algorithm. Apparently, there is no FDC in case of 5 instances: c75, tai75b, tai100a, tai150a, tai150d. These instances should be rather difficult for optimisation by such an algorithm, unless some other properties of the instances make them easy. Finally, 5 ambiguous cases were found.

Therefore, it seems, somewhat sadly, that the presence of a 'big valley' structure is a property of an instance of the CVRP, not of the problem itself. FDC is not a reliable problem property on which a metaheuristic algorithm may be founded, because it may or may not exist in the fitness landscape. The design of an algorithm which assumed that the positive correlation always existed could be prone to inefficiency, unless some other components were implemented, e.g. mutation.

Concerning the distance measures presented in this chapter, the results for the distance in terms of edges (de) confirm to some extent that the intuitive idea of preserving parental edges in recombination operators for the CVRP does make sense. This idea was the cornerstone of the operators employed in the efficient algorithms by Alba & Dorronsoro (2004) (ERX) and Nagata (2007) (EAX), both inspired by similar results for the TSP.

The results for the distance in terms of pairs of nodes (dpn) confirm that approaches based on the 'cluster first/route second' paradigm are also sensible. Taillard's (1993) and Rochat & Taillard's (1995) algorithms were founded exactly on this idea: a good heuristic clustering should be changed only slightly in order to obtain good solutions. The second of the mentioned algorithms even combined this approach with the previous one, based on the preservation of routes (edges), and this was probably the cause of its long-lasting success.
The last measure which correlates with fitness, deu, reveals that in some instances it is important to preserve certain subsequences of customers in routes. This result might be exploitable in an optimisation algorithm, but this will not be attempted in this thesis.

7.6 Recombination operators

Based on the results of the FDA, the designed recombination operators should be distance-preserving with respect to dpn and de. All the presented operators preserve the feasibility of offspring if the parents are also feasible.

7.6.1 CPX2: clusters preserving crossover

Algorithm 12 computes the clusters of customers which are common to two given solutions. It is used in the definition of the crossover operator, algorithm 13.

Algorithm 12 CommonClusters(s1, s2)
Ensure: ∀c ∈ Customers ∃cc ∈ Clusters : c ∈ cc
  Clusters = ∅ {the common clusters}
  C1 = C(s1)
  C2 = C(s2)
  for all ci ∈ C1 do
    for all cj ∈ C2 do
      cc = ci ∩ cj
      if |cc| ≥ 1 then
        Clusters = Clusters ∪ {cc}
  return Clusters

The idea behind the CPX2 procedure is quite straightforward: build a random route in each common cluster. If a cluster contains only one customer, a single-customer route is built.

Algorithm 13 CPX2(p1, p2)
  o = ∅
  Clusters = CommonClusters(p1, p2)
  for all cc ∈ Clusters do
    o = o ∪ {RandomRoute(cc)}
  return o

The auxiliary procedure RandomRoute(cc) simply creates a random permutation of the given customers cc and builds a route from this permutation. Therefore, the result of the operator is randomised.

The first version of this algorithm, CPX (Kubiak 2004), preserved only the largest intersection of each cluster from one solution with a cluster from the other one. This has been improved in CPX2 by correcting the definition of CommonClusters. Most likely, although this has not been formally proved, the operator preserves distance dpn, so that the following condition holds:

$$d_{pn}(p_1, o) \le d_{pn}(p_1, p_2) \;\wedge\; d_{pn}(p_2, o) \le d_{pn}(p_1, p_2)$$

An example offspring of this crossover is generated from the parents shown in figure 7.14. These are solutions of instance tai75a. One of the parents (on the left side) is the best-known solution for this instance, while the other is a local optimum of the CW, 2opt and swap operators. The offspring of CPX2 is shown in figure 7.15, on the left. In the figure the customers belonging to some common cluster are emphasised. The actual clusters are difficult to present in a black and white figure, but one may verify the common clusters against the parents in figure 7.14. It may be seen that the common parental clusters are preserved in the offspring, but not the common edges, because the order of customers in clusters is chosen randomly. Due to this fact, such a solution may be very expensive, unless local search is performed. The distance dpn between the parents is preserved in this example: dpn(p1, p2) = 0.755, dpn(p1, o) = 0.654, dpn(p2, o) = 0.664.
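In the illustrative list-of-routes representation used earlier, algorithms 12 and 13 take only a few lines; a hedged Python sketch, not the author's implementation:

    import random

    def common_clusters(s1, s2):
        """Algorithm 12: all non-empty intersections of a cluster of s1
        with a cluster of s2; every customer falls into exactly one."""
        C1 = [set(r) for r in s1]
        C2 = [set(r) for r in s2]
        return [c1 & c2 for c1 in C1 for c2 in C2 if c1 & c2]

    def cpx2(p1, p2):
        """Algorithm 13: one random route per common cluster.  Feasibility
        is preserved: each common cluster is a subset of a parental route,
        so its total demand fits the vehicle capacity."""
        offspring = []
        for cc in common_clusters(p1, p2):
            route = list(cc)
            random.shuffle(route)        # RandomRoute(cc)
            offspring.append(route)
        return offspring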
7.6.2 CEPX: common edges preserving crossover

This operator, shown as algorithm 14, aims at the preservation of distance de. It builds routes out of common edges (paths). All customers which are not incident with any common edge are put in separate single-customer routes in the offspring.

Algorithm 14 CEPX(p1, p2)
  o = ∅
  C = Customers
  CE = E(p1) ∩ E(p2)
  while CE ≠ ∅ do
    ce is an edge in CE
    p = MaximumPath(ce, CE)
    o = o ∪ Route(p)
    for all customers v on path p do
      C = C \ {v}
    for all edges e in path p do
      CE = CE \ {e}
  for all v ∈ C do
    o = o ∪ {(v0, v)}
  return o

The set of common edges, CE, contains all the edges that are common to parents p1 and p2. These edges may (and frequently do) define longer paths, some of them containing the depot vertex. Therefore, the procedure MaximumPath(ce, CE) builds a maximal path p from those edges in CE which are connected with the chosen edge ce. Each such path is later converted into a route by a call to Route(p). This results in a deterministic offspring for each pair of parents.

An exemplary offspring, generated from the same parents, is shown in figure 7.15 (right). The edges common to the parents are emphasised. There are especially many such edges around the depot vertex and near the convex hull of the set of vertices. All of these are preserved in the offspring and no foreign customer-to-customer edge is added; hence there are many routes. Distance de is almost preserved in this case: de(p1, p2) = 0.621, de(p1, o) = 0.640, de(p2, o) = 0.656. It is not exactly preserved because of the mentioned numerous routes: the parents have fewer routes than the offspring and hence fewer differing edges; the offspring has many depot edges which may be found in neither parent, so these edges additionally contribute to the value of its distance to the parents.

Figure 7.14: Solutions used as parents in crossover examples. Instance tai75a.

Figure 7.15: Offspring of the CPX2 (left) and CEPX (right) operators.
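Because every customer has at most two incident route edges in each parent, the common edges form vertex-disjoint simple paths, which makes the path assembly easy. The following simplified sketch chains only customer-to-customer common edges (the full operator also follows common edges through the depot) and turns isolated customers into single-customer routes:

    def cc_edges(solution):
        """Customer-to-customer edges of a solution (depot edges omitted)."""
        es = set()
        for route in solution:
            for a, b in zip(route, route[1:]):
                es.add(frozenset((a, b)))
        return es

    def maximum_paths(common_edges, customers):
        """Maximal chains of common edges; since every customer has at
        most two incident edges, the components are simple paths."""
        adj = {c: [] for c in customers}
        for e in common_edges:
            a, b = tuple(e)
            adj[a].append(b)
            adj[b].append(a)
        seen, paths = set(), []
        for start in customers:
            if start in seen or len(adj[start]) == 2:
                continue                 # start walks at endpoints only
            path, prev, cur = [start], None, start
            seen.add(start)
            while True:
                nxt = [n for n in adj[cur] if n != prev]
                if not nxt:
                    break
                prev, cur = cur, nxt[0]
                path.append(cur)
                seen.add(cur)
            paths.append(path)
        return paths

    def cepx(p1, p2):
        """Simplified CEPX sketch: each maximal common path (or isolated
        customer) becomes one route of the offspring."""
        customers = {c for route in p1 for c in route}
        common = cc_edges(p1) & cc_edges(p2)
        return maximum_paths(common, customers)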
7.6.3 CECPX2: common edges and clusters preserving crossover

This operator preserves in an offspring the sets of common clusters and common edges of its parents. Thus, it aims at the preservation of both dpn and de. It uses a helper procedure, algorithm 15, to find in a given cluster cc all paths which can be assembled out of the common edges CE. The CECPX2 procedure (algorithm 16) then creates routes within common clusters using paths made out of common edges. If there is no edge (path) connecting customers in a cluster, they are linked randomly. Similarly to CPX2, single-customer clusters define single-customer routes. In algorithm 16 the operation t = t · p is a concatenation of the path p at the end of route t.

Algorithm 15 MaximumPathsInCluster(CE, cc)
Ensure: ∀c ∈ cc ∃p ∈ Paths : p contains c
  Paths = ∅
  while cc ≠ ∅ do
    v is a customer in cc
    if CE contains an edge ce with v then
      p = MaximumPath(ce, CE)
      Paths = Paths ∪ {p}
      for all customers v on path p do
        cc = cc \ {v}
    else
      Paths = Paths ∪ {(v)} {a single-customer path (without edges)}
      cc = cc \ {v}
  return Paths

Algorithm 16 CECPX2(p1, p2)
  o = ∅
  Clusters = CommonClusters(p1, p2)
  CE = E(p1) ∩ E(p2)
  for all cc ∈ Clusters do
    Paths = MaximumPathsInCluster(CE, cc)
    t = (v0) {an empty route}
    if Paths contains a path p1 from the depot then
      t = t · p1 {start from the depot side}
      Paths = Paths \ {p1}
    if Paths contains another path p2 from the depot then
      Paths = Paths \ {p2} {remember it for later use}
    while Paths ≠ ∅ do
      choose randomly a path p from Paths
      t = t · p
      Paths = Paths \ {p}
    if p2 exists then
      t = t · p2 {finish at the depot side}
    o = o ∪ {t}
  return o

An exemplary offspring of CECPX2 is shown in figure 7.16 (left). The common properties of the parents (edges and clusters) are emphasised as in the CPX2 and CEPX cases. Comparing the offspring to its parents (figure 7.14), one may note that all the common properties are preserved in the generated solution. The result is slightly randomised.

In this example distances de and dpn are preserved: de(p1, p2) = 0.621, de(p1, o) = 0.582, de(p2, o) = 0.590; dpn(p1, p2) = 0.755, dpn(p1, o) = 0.564, dpn(p2, o) = 0.641. The first distance is preserved because this offspring has fewer routes than the one of CEPX.

7.6.4 GCECPX2: greedy CECPX2

This operator (algorithm 17) extends CECPX2 by connecting common paths in a greedy way and by also considering the possibility of merging different common clusters. This greedy approach is motivated by the results of the analysis of the average distance between local optima (section 7.5.3), which indicated that preserving only the common parental features in an offspring may not be enough to create a good solution.

While creating an offspring, it may happen that a cluster of customers contains only one depot-to-depot path. In this case the cluster must not be merged with any other, because some common edges would have to be broken. Therefore, such clusters define complete, separate routes. This is covered by lines 9 to 11 of algorithm 17.

In other cases the common paths do not define complete routes. Thus, they may potentially be merged, even if they belong to different clusters.
However, the conditions under which a particular common path p (line 19) must not be attached to the current route t are quite complex:

• it is already present in the offspring (line 21);
• it is in the same cluster as the current route, it leads from the depot, and there are still some other paths in the same cluster (line 24); in this case taking p would force finishing the route, coming back to the depot, and in effect breaking the common cluster;
• it is a path in a different cluster and:
  – the sum of demands of this and the current cluster would exceed the vehicle capacity (line 29);
  – this cluster contains a depot-to-depot path, i.e. defines a complete route on its own (line 31);
  – this cluster contains two depot paths; in case of a cluster merge one of them would have to be broken (also line 31);
  – this cluster contains one depot path but the current route's cluster also contains one; one of them would have to be broken (line 33);
  – this path leads from the depot and there are some other paths in this or the current route's cluster; coming back to the depot would break a common cluster (line 35).

Algorithm 17 GCECPX2(p1, p2)
 1: o = ∅
 2: Clusters = CommonClusters(p1, p2)
 3: CE = E(p1) ∩ E(p2)
 4: AllPaths = MaximumPathsInCluster(CE, Customers)
 5: while Clusters ≠ ∅ do
 6:   cc is a randomly chosen cluster from Clusters
 7:   Paths = MaximumPathsInCluster(CE, cc)
 8:   t = (v0)
 9:   if Paths contains a depot-to-depot path p then
10:     t = t · p
11:     Paths = Paths \ {p}
12:   else
13:     if Paths contains a path p1 from the depot then
14:       t = t · p1
15:       Paths = Paths \ {p1}
16:     if Paths contains another path from the depot then
17:       p2 is this path from the depot
18:   while Paths ≠ ∅ do
19:     for all paths p in AllPaths do
20:       IsFeasiblePath(p) = true {assume the path may be attached to route t}
21:       if p is already present in o then
22:         IsFeasiblePath(p) = false
23:       else if p ∈ Paths then
24:         if (p = p2) and (|Paths| > 1) then
25:           IsFeasiblePath(p) = false
26:       else
27:         cc2 is the cluster of this path p
28:         Paths2 = MaximumPathsInCluster(CE, cc2)
29:         if d(cc) + d(cc2) > C then
30:           IsFeasiblePath(p) = false
31:         else if Paths2 contains a depot-to-depot path or two depot paths then
32:           IsFeasiblePath(p) = false
33:         else if (Paths2 contains one depot path) and (p2 exists) then
34:           IsFeasiblePath(p) = false
35:         else if (p is a path to the depot) and ((|Paths| > 1) or (|Paths2| > 1)) then
36:           IsFeasiblePath(p) = false
37:     MinPathDist = ∞
38:     for all p ∈ AllPaths do
39:       Dist = GetDistance(LastCustomer(t), p)
40:       if (Dist < MinPathDist) and (IsFeasiblePath(p)) then
41:         MinPathDist = Dist
42:         p0 = p
43:     if MinPathDist < ∞ then
44:       if p0 ∉ Paths then {merge the cluster of p0 into the cluster of t}
45:         cc2 is the cluster of this path
46:         Paths2 = MaximumPathsInCluster(CE, cc2)
47:         cc = cc ∪ cc2
48:         Paths = Paths ∪ Paths2
49:       t = t · p0
50:       Paths = Paths \ {p0}
51:   Clusters = Clusters \ {cc}
52:   o = o ∪ {t}
53: return o

Once it is established which paths are feasible to add (i.e. the function IsFeasiblePath(p) is computed), the path chosen is the one closest to the last customer of the current route t. This is the greedy part of the operator (lines 37 to 42). If this chosen path p0 comes from some other cluster, then this cluster and its paths are merged with the current cluster and its paths (lines 44 to 48).
After attaching path p0 to the current route t, this process of building the offspring is continued with another path, and then for all remaining common clusters. Although the operator is a greedy and distance-preserving one, its result is randomised, because of the random order of considering common clusters in algorithm 17 (line 6).

The offspring generated from the exemplary parents is shown in figure 7.16 (right). Common parental properties are emphasised and, as one can see, also preserved in the offspring, exactly as in the CECPX2 case. GCECPX2, however, also connected some of the common clusters; one example is the top-left route of the offspring, which corresponds to two routes in the CECPX2 result. Therefore, an offspring of GCECPX2 will usually have fewer routes than its CECPX2 counterpart. Distances de and dpn are preserved for this offspring: de(p1, p2) = 0.621, de(p1, o) = 0.546, de(p2, o) = 0.517; dpn(p1, p2) = 0.755, dpn(p1, o) = 0.604, dpn(p2, o) = 0.605.

7.7 CPM: clusters preserving mutation

The idea of this mutation is to alter a parent in a way which does not change the contents of any route (the clusters). Only the order of customers in one route is changed, as can be seen in algorithm 18. The route to be altered is usually chosen randomly, with uniform distribution over routes.

Algorithm 18 CPM(p, t)
  m = ∅
  copy all routes from p to m except t
  cc is the set of customers of t
  m = m ∪ {RandomRoute(cc)}
  return m

This way the operator builds mutant m based on parent p so that dpn(p, m) = 0. An example of this mutation is shown in figure 7.17. The mutated solution is the left parent from figure 7.14. The mutant edges which differ from the parent are emphasised. In this example obviously dpn(p, m) = 0; de(p, m) = 0.111 and deu(p, m) = 4.
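In the same illustrative representation, CPM is little more than a call to random.shuffle; a sketch of algorithm 18:

    import random

    def cpm(parent, route_index=None):
        """Algorithm 18: reshuffle the customer order inside one route,
        leaving every cluster intact, so that d_pn(parent, mutant) = 0."""
        mutant = [list(route) for route in parent]
        if route_index is None:                    # uniform route choice
            route_index = random.randrange(len(mutant))
        random.shuffle(mutant[route_index])        # RandomRoute(cc)
        return mutant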
For the CW algorithm this improvement amounts only to 1%, while for other types of heuristics the drop of solution cost is considerable: to 15.6% for FF, 10.2% for GM and 9.2% for GF. The CW heuristic remains the best with the average excess of 4.7%. The steepest algorithm changes little. There is approx. 1% of improvement for the random, GM and GF algorithms, no change for the CW heuristic, and, quite surprisingly, some drop in quality for the FF heuristic. The examination of the worst solutions across instances revealed that there was no great deterioration. For example, without local search the worst CW solution had excess of only 11.5%, while 10% with LS. Other types of solutions were worse than this, but this result shows than even in the worst case the Clarke & Wright algorithm may be improved only by 10% by any metaheuristic. However, in the optimistic scenario it may reach solutions as close as 1% to the best-known ones (e.g. for instances tai75b, c120). Concerning the time of computation, the generation of all but CW solutions took next to no time. The CW heuristic took slightly more time, since it is a steepest algorithm. With local search, all algorithms always finish computation within 10 s, even for the largest instances, with steepest LS being slightly slower than greedy. Quite surprisingly, it was also found that local search made only small changes to CW solutions, thus proving that this heuristic indeed produces results of high quality and near local optima; the average distance between the heuristic solution before and after LS amounted only to de = 0.17 and dpn = 0.07. This was quite different for the other algorithms, e.g. for the GM algorithm it was de = 0.6 and dpn = 0.55, and this was the next smallest change. Predictably, random solutions were almost completely changed by local search: de = 0.93 and dpn = 0.98. Overall, the CW heuristic generates the best solutions, on average, with GM being second and GF third. With the help of LS this ranking changes a bit: CW still comes first, then GF and GM or random. These solutions (after greedy LS) are put in the initial population of the MA. 131 7.8. Experiments with initial solutions Figure 7.16: Offspring of the CECPX2 (left) and GCECPX2 operators (right). Figure 7.17: CPM mutant of the best-known solution of instance tai75a. average excess [%] 70% 60% 50% no LS 40% greedy steepest 30% 20% 10% 0% random CW GM FF GF initial solution type Figure 7.18: Average quality of heuristic solutions over all instances, without and with local search. 132 Adaptation of the memetic algorithm to the capacitated vehicle routing problem 7.9 7.9.1 Experiments with memetic algorithm Long runs until convergence The goal of this experiment was to assess the limits of the potential of the memetic algorithm with different crossover operators. The main question here was what quality of solutions the MA could generate for a reasonably-sized population, irrespective of the time of computation. Therefore, very long runs of the algorithm were allowed, very likely until complete convergence: the population of 30 individuals was modified until 150 consecutive generations without any change in this population, and the best-found solution was recorded. The time of computation was also an important aspect and was recorded, as well. Both the designed crossovers and some operators from the literature were considered: CPX2, CEPX, CECPX2, GCECPX2, RBX and SPX (see sections 6.4.3 and 6.4.4). This resulted in 6 versions of the basic (pure) MA. 
CPM always played the role of mutation (if mutation was enabled), and greedy LS was applied to initial solutions and offspring: first with the CW operator and then with the combined 2opt and swap neighbourhoods, as in the experiments with local search speed (section 7.3.6).

Another issue was the impact of mutation on the results of the MA. Therefore, one configuration without mutation was also considered (denoted 'noMutation').

Each version and configuration of the MA was executed 15 times on each of the 22 considered benchmark instances of the CVRP, in order to get reliable-enough estimates of the average and standard deviation of two quantities of interest: the time of computation and the solution quality. All these runs were performed on identical desktop PCs with Intel Pentium 4 2.6 GHz processors and 2 GB RAM, running MS Windows XP.

Quality of solutions: aggregated results

The aggregated results of the experiment are presented first. Although such aggregation obscures the differences between instances, it may be useful in getting an idea of the general performance of all the algorithm versions. Figure 7.19 presents the average (bars) and one standard deviation (whiskers) of the results of all MAs, aggregated across 22 instances and 15 runs per instance.

Figure 7.19: Aggregated quality of solutions generated by different versions and configurations of the memetic algorithm; long runs.

The figure shows that all versions of the basic algorithm (with mutation) perform very well: the average quality of solutions exceeds the quality of the best-known ones only by a fraction of one percent (0.5–0.7%), with a deviation of no more than another percent.

The very good quality of the results of the basic MA versions is also confirmed by the number of instances on which the best-known solutions have been found. This is shown in table 7.5, which is an aggregated view of the detailed tables B.1 and B.2.

Table 7.5: The number of instances for which each basic version of the MA found the best-known solution in long runs. Best: the best run of 15 for a version; all: all runs of a version.

MA version   best   all
CPX2          11     5
CEPX          11     5
CECPX2        10     6
GCECPX2        9     4
RBX            9     3
SPX           10     5

In case of CPX2 and CEPX the best-known solutions were found in some runs for 11 instances out of 22.
The column ‘totals’ gives the sums of the won and lost comparisons for each row. The column ‘sum’ is the sum of the wins and losses, so it is the net flow score for each algorithm. The Cochran-Cox test for the difference in two population means was employed (Krysicki et al. 1998, Ferguson & Takane 1989). This test does not assume that variances in the compared populations are equal, like Student’s test does. Each instance of the CVRP was tested separately, and for each pair of the algorithm versions the tested hypothesis was: H0 : µ1 = µ2 against the X̄1 −X̄2 . alternative: H1 : µ1 6= µ2 (the two-sided test). The test statistic was: C = √ 2 2 S1 /(n1 −1)+S2 /(n2 −1) The level of significance was always set to 0.05. The table shows that the best MA version employs CEPX; it wins 26 direct comparisons and loses only 2. The runner-up is SPX (17/-4). CPX2 is slightly worse (16/-5) and comes third, then CECPX2 and GCECPX2. The worst results are due to RBX, which loses 29 comparisons and wins none: it is not statistically better than any of the algorithms on any instance in long runs. A direct comparison of the best and the second-best algorithms, CEPX and SPX, reveals that the former is statistically better on 3 instances (these are: c120, tai100a, tai150c), while the latter only on one (f134). In other cases the observed differences are not statistically significant. For most of the versions (crossover operators) the presence of the CPM operator seems to be 134 Adaptation of the memetic algorithm to the capacitated vehicle routing problem Table 7.6: Comparison of the basic MA versions with the Cochran-Cox statistical test for the significance of the difference of averages; long runs. CPX2 CEPX CECPX2 GCECPX2 RBX SPX CPX2 CEPX CECPX2 GCECPX2 RBX SPX totals sum 0/0 2/0 1/-2 0/-5 0/-9 2/0 0/-2 0/0 1/-4 0/-8 0/-9 1/-3 2/-1 4/-1 0/0 1/-4 0/-4 2/-1 5/0 8/0 4/-1 0/0 0/0 5/0 9/0 9/0 4/0 0/0 0/0 7/0 0/-2 3/-1 1/-2 0/-5 0/-7 0/ 0 16/-5 26/-2 11/-9 1/-22 0/-29 17/-4 11 24 2 -21 -29 13 important. Without this mutation only the SPX version generates solutions of comparable quality (deterioration by less than 0.15%); all other crossover operators produce visibly worse solutions. CPM is most important for CPX2 (deterioration by 1.25% without mutation) and CEPX (0.8%), while its impact on CECPX2, GCECPX2 and RBX is somewhat smaller (less than 0.4%). Compared to the quality of the best heuristic solutions (the Clarke and Wright heuristic), the quality of the basic MA solutions is, on average, better by 4%, which is a moderate gain. Quality of solutions: basic MA Here, some more detailed results of runs of the basic configuration of the MA for the larger CVRP instances are presented. Figure 7.20 shows the averages and the standard deviations of the final solution quality for instances with more than 100 customers. The averages were taken over 15 runs. The actual values of the presented statistics for all instances are presented in the appendix in tables B.1 and B.2. 3,5% average excess [%] 3,0% 2,5% 2,0% 1,5% 1,0% 0,5% 0,0% c120 f134 c150 tai150a tai150b tai150c tai150d c199 tai385 instance CPX2 CEPX CECPX2 GCECPX2 RBX SPX Figure 7.20: Average quality of solutions generated by the basic MA configuration for larger instances; long runs. One can see in the figure that the averages differ somewhat between instances. 
Quality of solutions: basic MA

Here, some more detailed results of the runs of the basic configuration of the MA on the larger CVRP instances are presented. Figure 7.20 shows the averages and the standard deviations of the final solution quality for instances with more than 100 customers. The averages were taken over 15 runs. The actual values of the presented statistics for all instances are given in the appendix, in tables B.1 and B.2.

Figure 7.20: Average quality of solutions generated by the basic MA configuration for larger instances; long runs.

One can see in the figure that the averages differ somewhat between instances. For some of them almost the best-known solution qualities are consistently obtained (c120, f134, tai150a), while for some others none of the versions of the MA is able to reach an average excess of 1% (c199, tai150b). It seems that these two instances are hard for the designed MA, irrespective of the crossover operator employed.

Compared to the aggregated results presented in figure 7.19, these results confirm the earlier conclusions: the versions which appear good on average (e.g. CEPX) are also good on the presented instances. The same effect was observed on the subset of smaller instances. There, only one instance, tai100a, had an average excess larger than 1% (1.5% exactly).

The noMutation configuration of the MA yielded very similar results, confirming the observations presented above, so its details are not shown here.

Time of computation: basic MA

Figure 7.21 presents the average time of computation of each version of the basic MA configuration as a function of instance size. In addition to the raw data, the figure also shows curves of regression of the form time = size^a · b (the power function). For all those curves the values of r² exceed 0.9, so they approximate the actual data very well.

Figure 7.21: Average time of computation of the basic MA, together with curves of power regression; long runs.

Comparing the curves one can clearly see that there are major differences between the considered versions with respect to the time of computation until convergence. The process of computation finishes earliest for the RBX version, and the difference with the other versions is large. Then the order is: GCECPX2, CECPX2 and SPX (probably indiscernible), CEPX and, the slowest, CPX2. The exponents of the power regression functions are: 3.45 (RBX), 3.99 (GCECPX2), 3.86 (SPX), 4.18 (CECPX2), 4.38 (CEPX) and 4.13 (CPX2).
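The document does not state how these curves were fitted; a common choice, assumed here, is an ordinary least-squares fit in log-log space:

    import numpy as np

    def fit_power_law(sizes, times):
        """Fit time = b * size**a by least squares on log-transformed
        data; returns (a, b, r2) for curves like those in figure 7.21."""
        x = np.log(np.asarray(sizes, float))
        y = np.log(np.asarray(times, float))
        a, log_b = np.polyfit(x, y, 1)
        r2 = float(np.corrcoef(x, y)[0, 1] ** 2)
        return a, float(np.exp(log_b)), r2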
The number of generations is related to the probability of some new individual being put into the population. Clearly, this probability is highest for the CPX2 version, since the algorithm is not able to stop as early as the other ones. On the other hand, this probability is smallest for RBX, meaning that the process of evolution fades out relatively early in this case, and that the combination of RBX, CPM and greedy LS is not able to generate anything new for the population. This means that RBX or GCECPX2 may be profitable in short runs, but they lead to premature convergence. Hence, when more time is available, CEPX and CPX2, the long-runners with a higher potential for generating new solutions, provide better results.

The last quantity which may give some insight into the nature of the evolution process is the number of performed local search iterations per generation. This is presented in figure 7.23, as a function of instance size. Once more the lines of regression are shown, as they generalise the observed data well (all r² > 0.89).

The figure reveals that some of the MA versions differ greatly in the average number of LS iterations. CPX2 performs more than twice as many as the second-placed CEPX. In this ranking CECPX2 comes third, with SPX and GCECPX2 being more or less equal; RBX performs slightly fewer iterations per generation.

In the author's opinion, these differences may be closely related to the completeness of the offspring generated by the considered crossover operators. Here, by completeness the author means the degree to which the generated routes (vehicles) are filled with demands and are optimised in terms of distance. In the case of CPX2 these indicators are most probably much lower than for the other operators (see figure 7.15), meaning that CPX2 is too disruptive. The results presented here support the hypothesis that only SPX, GCECPX2 and RBX generate offspring complete enough to need few LS iterations to become local optima. CECPX2, CEPX and CPX2 need substantially more iterations of local search for their offspring.

[Figure: average LS iterations per generation vs instance size, with regression lines, for all MA versions.]
Figure 7.23: Average number of local search iterations per generation of the basic MA, together with lines of regression; long runs.

This completeness of offspring is also important for the quality of solutions when there is no CPM mutation (see again figure 7.19): the disruptive CPX2 and CEPX substantially deteriorate their results without this mutation. Therefore, it seems that the more complete an offspring is, the better. In the case of the systematically constructed operators this would mean that the more common parental properties are preserved in an offspring, the faster the computation converges, while still reaching good-quality solutions. The disruptive operators need more help from CPM and LS. On the other hand, it may be said that a high degree of completeness of offspring is not always beneficial in a long process of computation: the most disruptive and LS-consuming operators, CPX2 and CEPX, are also the ones that produce the best solutions. This comes at the cost of a long evolution, though.
Impact of CPM mutation

Figure 7.24 presents the average time of computation of the memetic algorithm for the basic and the noMutation configurations. The time is plotted as a function of instance size, together with lines of power regression. For all the presented lines the values of r² exceed 0.9.

The comparison of the basic and the noMutation series reveals that, generally, the absence of CPM mutation accelerates convergence. The degree of this acceleration differs between MA versions, being the largest for CPX2 and CEPX, and the smallest (or even none) for CECPX2. Significant speedup is usually visible only for larger instances (size above 100); for the smaller ones the effect is negligible. However, except for CPX2 there is no clear relationship between this acceleration and the instance size. It seems that this phenomenon depends on factors other than size.

Nevertheless, even from a qualitative point of view this result says something about the importance of CPM mutation for the designed MA. First, CPX2 and CEPX especially need the presence of this mutation in order to slow down their convergence. Although CPM makes computation much longer in their case, the convergence without CPM is premature. This is confirmed by the quality of results, which deteriorates without mutation (see figure 7.19). Second, CECPX2, GCECPX2 and RBX also need mutation, but to a lesser extent. It appears that these operators, coupled with greedy LS, can sustain a high probability of generating good new solutions on their own. At the same time they produce very good results, almost as good as with mutation. From this point of view SPX and LS is the most self-sufficient pair.

Overall, the presence of CPM seems to improve the quality of results of all MA versions. The crossovers which create only partially complete solutions (CPX2, CEPX) need its presence to a high degree. The other ones, which produce more complete offspring, need it somewhat less.

[Figure: six panels of average time of computation [s] vs instance size, one per MA version.]
Figure 7.24: Time of computation: CPX2 and CEPX (top), CECPX2 and GCECPX2 (middle), RBX and SPX (bottom); long runs. Basic MA: squares, solid line; noMutation MA: diamonds, dotted line.

7.9.2 Runs limited by time

The goal of the second experiment was to check the quality of results that may be generated by MAs which are given exactly the same time of computation. The time limit for one run was set to 256 seconds (just over 4 minutes). Exactly the same versions of the MA were employed in this experiment: CPX2, CEPX, CECPX2, GCECPX2, RBX, SPX, with the same population size (30). Additionally, a multiple start local search (MSLS) was run, in order to check whether the recombinations and the mutation contribute anything over the basic local search. Indeed, the MA may be perceived as MSLS with different starting points, provided by the ‘genetic’ operators (Jaszkiewicz & Kominek 2003). The MSLS had exactly the same set of initial solutions as the MAs; when no more heuristic solutions could be generated, random ones were used. Again, the impact of mutation on the MAs was of interest, so the noMutation configuration was launched as well. Each considered version and configuration of the algorithms was executed 15 times on the 22 considered CVRP instances. The same set of PCs was used in this second experiment.
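The MSLS baseline can be sketched generically as follows; the function names and the cost interface are illustrative, not the thesis's actual code. The key point is that MSLS differs from the MA only in where the starting points of local search come from.

    import time

    def msls(initial_solutions, random_solution, local_search, cost, limit_s=256):
        """Multiple start local search: run LS from consecutive starting
        points (heuristic ones first, then random ones) and keep the best
        local optimum found within the time limit."""
        deadline = time.time() + limit_s
        starts = iter(initial_solutions)
        best = None
        while time.time() < deadline:
            s = next(starts, None)
            if s is None:              # heuristic starts exhausted
                s = random_solution()  # fall back to random solutions
            s = local_search(s)
            if best is None or cost(s) < cost(best):
                best = s
        return best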
Quality of solutions: aggregated results

Figure 7.25 presents the average (bars) and one standard deviation (whiskers) of the results of all MAs and MSLS, aggregated across the 22 instances and 15 runs per instance.

[Figure: average excess [%] for MSLS and each MA version, basic and noMutation configurations; MSLS at about 2.5%, the basic MA versions at 0.6–0.8%.]
Figure 7.25: Aggregated quality of solutions generated by different versions and configurations of the memetic algorithm; runs limited by 256 s.

In the figure one can see that MSLS is the worst configuration of all, whether the MAs use mutation or not. This is good news from the point of view of the designed operators: their presence in a population really matters and improves the final result by 1–2%.

Comparing the basic results to the previous experiment, one can see that the averages are worse here, but only slightly: e.g. GCECPX2 generates solutions of almost the same quality, and SPX is worse by less than 0.1%. RBX, quite surprisingly, improves its result. This is perhaps due to the fact that in the previous experiment it actually converged before 256 s had passed. CPX2 is hit hardest by the time limit of 256 s, but even its result worsens only by 0.26%. This would mean that the 256 s limit generally allows the algorithms to converge on the tested set of instances.

In this experiment, the ranking based on multiple Cochran-Cox tests is: SPX (16/-1), RBX (11/0), CECPX2 (8/-3), GCECPX2 (8/-4), CEPX (8/-8) and CPX2 (0/-35). A probable cause of the change in the ranking compared to the previous one (especially of RBX coming second) is the speed of the operators and the completeness of the generated offspring, which were commented on earlier. This should also be visible later, e.g. in the numbers of performed generations. Still, the average results of all basic MAs are less than 1% worse than the best-known solutions. These are very good results.

The impact of mutation on the quality of results

Concerning the configuration without mutation, it can be seen that SPX performs best in such conditions. Generally, all MA versions (recombination operators) produce worse results, but in the case of SPX the deterioration is smallest. Looking at the noMutation series, it seems that, generally, the more complete the offspring produced by an operator, the better the solutions the related MA generates. This is clearly seen for the operators designed in this thesis: GCECPX2 is best, then CECPX2, CEPX and finally CPX2.

The author of the thesis thinks there are two possible causes of this effect. One is the mentioned offspring completeness: a more complete offspring consumes fewer LS iterations to become a local optimum, hence more generations of the MA may be performed. The other cause may be that some recombinations, like SPX and GCECPX2, also generate implicit mutations, while CEPX or RBX may be more deterministic.

This observation and the comparison of the basic and noMutation series give rise to an interesting hypothesis: that the presence of mutation (either CPM or implicit in recombination) is important in a memetic algorithm for the CVRP. This might be due to fast convergence of MAs or to a multimodal landscape. Either of these explanations remains a hypothesis without further evidence.
Speed of the algorithms

Figure 7.26 shows the number of generations of each MA as a function of instance size. A point in the figure represents one instance and the average number of generations for this instance, based on 15 runs. There are also lines of power regression. For all of them the values of r² exceed 0.95, so they approximate the raw data very well.

[Figure: average number of generations (logarithmic scale) vs instance size, with regression lines, for all MA versions and MSLS.]
Figure 7.26: Average numbers of generations of the basic MA configurations; runs limited by 256 s.

Firstly, the comparison of these lines shows that multiple start local search performs the smallest number of generations (iterations). CPX2 comes next, then CEPX, CECPX2, GCECPX2 and SPX, and finally RBX. This confirms the result of the previous experiment: RBX coupled with LS is the fastest pair.

This effect seems to be caused by the number of iterations of local search after recombination. To demonstrate this, LS iterations as a function of the generation number are shown in figure 7.27, for the two largest instances.

The top chart (c199) shows that recombination operators really do improve the situation over random restarts: all operators result in offspring which are closer to local optima (in terms of iterations) than a random solution in MSLS. Moreover, all the operators reduce the number of LS iterations over time (generations). The order of operators with respect to the decreasing number of iterations is the same as with respect to the number of generations (figure 7.26): CPX2, CEPX, CECPX2, GCECPX2 and SPX, RBX. It is also consistent with a chart of the average quality of solutions as a function of time (not shown).

[Figure: two panels of average LS iterations per generation vs generation number, for all MA versions and MSLS.]
Figure 7.27: Average numbers of LS iterations per generation: c199 (top) and tai385 (bottom); runs limited by 256 s.

It can also be seen that the more ‘complete’ the offspring generated by a recombination, the fewer LS iterations it needs to become a local optimum, and the faster the drop in these iterations. Compare e.g. CPX2 (500 iterations, slowly decreasing) and CECPX2 (a drop from an initial 400 to less than 100). It may mean that the more preserving an operator is, the faster the population is made uniform.

The chart showing results on instance tai385 is very similar, although it looks like the one for c199 truncated at the 200th generation: MSLS has just finished processing heuristic solutions (approx. 500 LS iterations per solution) and started with random ones (1900 iterations); CEPX is still slower than CPX2, and GCECPX2 slower than SPX. The distance-preserving operators are slow at the beginning.

The main conclusion is that the more ‘complete’ an offspring is, the faster the computation. Since there is a substantial gap between CECPX2 and GCECPX2 in figure 7.27, it also seems that a distance-preserving recombination operator should include some greedy completion procedure.
7.9.3 Quality vs. FDC

The author also analysed the relationship between the quality of results of the MAs and the fitness-distance determination coefficients. As a quality indicator the average excess over the best-known solution was employed, for each instance and algorithm version. The basic configuration from the first experiment was considered (long runs; see tables B.1 and B.2). Values of the fitness-distance determination coefficient (r²) for each distance measure and instance were taken from table 7.4 as the second observed variable. The strength of the analysed relationship was itself measured by the determination coefficient (r²).

The scatter plots are not presented here, because no visible relationships could be found. All the computed r² values were below 0.15, indicating the ‘no relationship’ case. Moreover, for the meaningful measures d_e, d_pn and d_eu these values were virtually zero. Therefore, it seems that there is no relationship between the average excess and the FD determination.

Yet another attempt to relate FDC to the quality of results was made. This time the results were simplified to a high extent: instances were classified as easy/hard to solve and as with/without FDC. A hard instance was one with at least 3 algorithms with an average excess above 1%. In practice it meant that no algorithm was better than 0.5% on such an instance. The classification with respect to FDC was taken from table 7.4 and simplified even further: the ‘no’ and ‘ambiguous’ cases were merged into ‘no FDC’. Then all the instances were aggregated, as shown in figure 7.28.

[Figure: number of easy and hard instances in the ‘no FDC’ and ‘FDC’ classes.]
Figure 7.28: The relationship between FDC and the hardness of CVRP instances.

One can see that the ‘no FDC’ class contains mainly easy instances, contrary to the expected result. Moreover, the ‘FDC’ class is mixed, containing almost the same number of easy and hard instances. Overall, it appears that even in this simplified classification there is no relationship between FD determination coefficients and the quality of results. This would mean that in the CVRP the FD coefficients should not be used for the prediction of instance hardness for the memetic algorithm.
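The strength measure used in this analysis is simply the squared Pearson correlation of the two observed variables; a minimal sketch:

    import numpy as np

    def determination(x, y):
        """Coefficient of determination r^2 between two observed variables,
        e.g. per-instance FD determination vs. average excess of an MA."""
        r = np.corrcoef(x, y)[0, 1]
        return r * r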
7.9.4 Quality vs. feasibility of neighbours

During the preparation of this thesis numerous analyses of solutions were conducted. One of them concerned the feasibility of the neighbours of local optima and of the best-known solutions. The author had the hypothesis that at least some instances may have best solutions with many infeasible neighbours (somewhat ‘hidden’ in the solution space), thus hindering the designed local search.

The author knows the form of the best-known solutions for 18 out of the 22 instances. These solutions remain unknown for: tai100a, tai100b, tai150b, tai150c. All 2opt, swap and merge neighbours of the available best solutions and local optima were examined for feasibility; the sets of local optima employed earlier in the FDA were used here. The fraction of feasible neighbours is shown in figure 7.29.

[Figure: percentage of feasible neighbours per instance (c50 through tai385), for best-known solutions and local optima.]
Figure 7.29: The percentage of feasible neighbours of the best-known solutions and local optima.

This figure reveals interesting properties of solutions. For 15 instances the best-known solutions have fewer feasible neighbours than an average local optimum; the exceptions are: c100b, c120 and f134. The drop in the fraction amounts to 7 percentage points on average, from 28% for local optima to 21% for best-known solutions.

The actual percentage of feasible neighbours of best-known solutions varies greatly across instances. It is very low for several of them, e.g. 9% for tai385, 11% for c199, 13% for c75. For some others it is much higher: 40% (c100b), 38% (f71), 34% (f134). This means that instances differ greatly with respect to the accessibility of the best-known solutions by means of the employed local search; the best-known solutions of some instances lie more on the edge of feasibility than those of others. This may have an impact on the quality of results and ultimately on the hardness of instances.

Therefore, the relationship between the quality of results and the feasibility of neighbours was further examined. A scatter plot of it is shown in figure 7.30; the values of r² for the presented data series are given in table 7.7.

One can see in the table that as much as 44% of the variability in the average excess of results of MA-RBX may be explained by the variability of the percentage of feasible neighbours of the best-known solutions. Compared to the relationship between FDC and quality, this is a surprisingly strong relationship. For the other versions of the MA the values of r² are a bit smaller, but except for GCECPX2 they are all around 40%. On top of that, the figure shows that there really may be some impact of the feasible neighbours on the quality of results, in contrast to FDC.

[Figure: scatter plot of average excess [%] vs percentage of feasible neighbours of the best-known solution, for all MA versions.]
Figure 7.30: The relationship between the percentage of feasible neighbours and the quality of results obtained by basic MAs in long runs.

Table 7.7: Values of the r² determination coefficient for the relationship between the percentage of feasible neighbours of the best-known solutions and the average excess of results in long runs.

    CPX2   CEPX   CECPX2   GCECPX2   RBX    SPX
    0.38   0.40   0.39     0.27      0.43   0.37
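As an illustration of the kind of computation behind figure 7.29, the sketch below counts the fraction of feasible neighbours under a swap-like operator. It assumes a route-list representation of a CVRP solution and checks only the vehicle capacity, which is a simplification of the actual 2opt, swap and merge neighbourhoods used in the thesis.

    def feasible_swap_fraction(routes, demand, capacity):
        """Fraction of feasible swap neighbours: exchange two customers
        from different routes and test whether both route loads still
        respect the vehicle capacity."""
        loads = [sum(demand[c] for c in r) for r in routes]
        feasible = total = 0
        for a in range(len(routes)):
            for b in range(a + 1, len(routes)):
                for ca in routes[a]:
                    for cb in routes[b]:
                        total += 1
                        ok_a = loads[a] - demand[ca] + demand[cb] <= capacity
                        ok_b = loads[b] - demand[cb] + demand[ca] <= capacity
                        feasible += ok_a and ok_b
        return feasible / total if total else 1.0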
7.10 Summary and conclusions

This chapter presented the process of adaptation of the memetic algorithm to the capacitated vehicle routing problem. It was clearly visible in the chapter that the adaptation of an MA requires many design decisions to be made, as for any other metaheuristic. In particular, the author presented the chosen representation, the design of the local search algorithm, and the choice and design of initial solutions.

More importantly, the systematic design of recombination operators based on fitness-distance analysis was performed. The author was the first to analyse the relationship between fitness and distance in the CVRP. Firstly, he designed and implemented distance measures for solutions of the problem. These measures were then used in the fitness-distance analysis. FDA revealed that the average distances d_e, d_pn, d_eu, d_ear between local optima are approximately 30% smaller than the average distance between random solutions: local optima share some features and are to some extent concentrated in the analysed landscapes. Moreover, moderate values of FDC exist in the case of 3 measures: d_e, d_pn, d_eu; this means that better local optima of the CVRP tend to have more common features (edges, clusters, subsequences of customers) than worse ones. However, it was also observed that the existence of FDC depends on the analysed problem instance.

These FDA results confirm to some extent the research intuition expressed earlier, e.g. by Taillard (1993) and Rochat & Taillard (1995), that good solutions of the CVRP are similar to the best-known ones. There, the intuition was based on the visual similarity of solutions, while here it was expressed with distance measures and analysed empirically. It appears, however, that the intuition was right only to some extent (moderate correlations) and only for some instances.

The results were then the basis for the design and implementation of 4 distance-preserving recombination operators: CPX2, CEPX, CECPX2, GCECPX2. Some of these operators preserve d_pn, some d_e, some both distance measures. Additionally, the mutation operator CPM was designed to preserve d_pn, but disturb d_e. These operators, together with RBX and SPX taken from the literature, were tested in two experiments with memetic algorithms. The presented results of the experiments support the following conclusions.

• In runs limited by time the speed of the pair (recombination, LS) appeared to be an issue. RBX proved to be faster than all the systematically designed operators, and SPX faster than some of them. At the same time, SPX and RBX provided better results in these short runs, as confirmed by statistical tests.

• The distance-preserving CEPX has a higher probability of generating better solutions in the MA than RBX and SPX. This was visible in the numbers of generations in long runs. It was also confirmed by statistical tests on the quality of results in these runs.

• The presence of CPM mutation in all MAs had a positive effect on the quality of the generated solutions. The more disruptive an operator is, the more it requires CPM in the MA. CPM was most important to CPX2, and least to SPX.

• The overall results of the MAs were very good. The average excess above the best-known solutions amounted to 0.5–0.7% for all recombination versions. For half of the studied instances some of the long runs of the MAs found the best-known solutions. The best results were generated by the MA employing CEPX and CPM, both designed in this thesis.

The author concludes that a recombination operator for the CVRP should preserve common parental features (edges, clusters) and try not to be too disruptive. If speed is an issue, it may contain a greedy component to make an offspring more complete and require fewer LS iterations. Overall, from the group of the tested recombinations the author of this thesis would choose Prins’s (2004) SPX for short runs. It proved to generate good solutions and to be reasonably fast when coupled with local search. For longer runs, when the best quality is required, CEPX with CPM should be used; this pair has the highest probability of generating better solutions.

The author also attempted to relate the results of the long experiment to the fitness-distance determination coefficients. This attempt failed: no relationship of this kind was found which would confirm earlier hypotheses and results of other authors suggesting that FDC may be a good predictor of problem hardness for evolutionary algorithms.
Surprisingly, the quality of results of the MAs appeared to be linked to some extent with the feasibility of neighbours of the best-known solutions. This means that an exploration of the edge between feasible and infeasible solutions might ease optimisation; the algorithms presented here explored it only from the side of feasible solutions.

Concerning the method of systematic construction of recombination operators based on fitness-distance analysis, the author deems that it provided good results. The best distance-preserving recombination, coupled with the designed mutation, generated solutions of the best quality among all the tested operators.

Chapter 8

The car sequencing problem

8.1 ROADEF Challenge 2005

The second problem addressed in this thesis was the subject of an open computational contest, ROADEF Challenge 2005 (Cung 2005b), organised by the French Society of Operations Research and Decision Analysis. The formulation of the problem was given by the French car manufacturer Renault. The publication of this problem in July 2003 started the challenge.

The author of this thesis participated in the contest in a senior team from Poznan University of Technology (PUT), together with Andrzej Jaszkiewicz and Paweł Kominek. There were also 6 other teams from PUT registered in the contest, 5 of which were junior teams led by Grzegorz Pawlak. These were the only teams from Poland. Overall, 55 teams from 15 countries took part in the challenge.

There were two stages: qualification and final. Based on the ranking of results of programs submitted by participants in March 2004, the jury selected 24 teams for the final stage. This included two PUT teams: one led by Pawlak and one by Jaszkiewicz. In the final stage the qualified teams could improve their code based on additional instances provided by Renault. Improved programs were submitted in October 2004. Final results were announced in February 2005.

The winner was a team from Université de Marseilles, France: Bertrand Estellon, Frederic Gardi and Karim Nouioua, also the authors of the publication (Estellon et al. 2006). The second team was from Brazil, led by Celso Ribeiro, the co-author of (Ribeiro et al. 2005). The best team from PUT finished 10th: Grzegorz Pawlak, Maciej Płaza, Przemysław Piechowiak and Marek Rucinski. The second-best team from PUT, led by Jaszkiewicz, finished the competition in 13th place. Detailed results may be found at the challenge web page (Cung 2005b).

8.2 Problem formulation

Renault’s car sequencing problem (CarSP) requires that a set of cars to be produced during a working day is put in some order (sequence) on a production line. This order has to observe the constraints of the whole technological process in the paint shop and on the assembly line (Nguyen 2003).

The paint shop requires that cars to be manufactured are grouped together in subsequences of the same colour, because this minimises the cost of purging spray guns when the colour changes. At the same time, the colour has to be changed regularly; this is a hard constraint (see section 2.1.2).

The assembly line requires that the workload is distributed evenly along the line. The workload is related to some car options (e.g. sun-roof, navigation system, power windows). While the assembly line should advance in regular time intervals, some options require more work to be done at certain assembly stations and delay this advance. Therefore, for each option some constraints are given in the form of ratios N/D.
Such a ratio constraint states that at most N cars may require an option in any continuous subsequence of D cars. Otherwise, violations of the constraint are computed. A perfect solution for an assembly line causes no violations along the whole line. If that is not possible, the number of violations should be minimised. Since some options are more important than others, the sum of violations is weighted by the given priorities.

The overall goal of the problem is to establish a feasible sequence of cars which minimises the weighted sum of paint colour changes and violations of ratio constraints. A formal description of the problem is given below.

Input data

A set of N cars is given to be manufactured in the current production day. Each car is uniquely identified by a number. Here it may be assumed that the cars are numbered by consecutive integers from 1 to N.

Each car has a paint colour code assigned which describes its required body colour. Let us assume that these codes are consecutive integers from 1 to C. Then, the function col(i) = c ∈ {1, . . . , C} describes the colour required for car i. Related to the colour information is the paint batch limit, PBL, a natural number.

Another part of the input data is connected with the options a car may possess. There is a number of options given, O. For each option j a function opt_j indicates whether a car requires the option to be assembled: opt_j(i) = 1 if car i needs the option (the option is active for the car), opt_j(i) = 0 otherwise. All cars are described this way. With each option there is always a fractional number given in the form N_j/D_j, where N_j, D_j ≤ N and D_j ≥ N_j > 0. This pair of numbers describes the ratio constraint related to option j. There is always exactly one constraint of this kind related to one option.

Moreover, each option also has a priority assigned, in the form of a binary value prio(j). If prio(j) = 1, then the related option is of high priority; otherwise (prio(j) = 0) it has low priority in the planning and manufacturing process. This way the set of options (and related ratio constraints) is partitioned into two subsets: high and low priority ones. Additionally, the subset of high priority ratio constraints is said to be either easy (dif = 0) or difficult (dif = 1) in an instance of the CarSP.

On top of that, a final fragment of the previous production day’s sequence, consisting of N_prev cars, is given. These cars may be identified by negative indices i from −N_prev + 1 to 0 (to distinguish them from the current-day cars). The same pieces of information are given for these cars as for the ones from the current production day: options opt_j and colours col. These cars are required to properly define the objective function of the problem.

Finally, the instance data contains a vector of three natural weights, one for each of the three components of the objective function: w = (w_HPRCs, w_LPRCs, w_PCC). The set of all possible values of w contains only 3 vectors: {(10^6, 10^3, 1), (10^6, 1, 10^3), (10^3, 1, 10^6)}.

Feasible solution

A solution s to an instance of the CarSP is a sequence (permutation) of all cars: s = (s_1, s_2, . . . , s_N), where ∀l ∈ {1, . . . , N}: s_l ∈ {1, . . . , N}; s_l = i means that car i appears at position l in the sequence. Obviously, a feasible solution is required to be a permutation, i.e. each car is present exactly once in the sequence:

\[ \forall l_1, l_2 \in \{1, \ldots, N\},\; l_1 \neq l_2: \quad s_{l_1} \neq s_{l_2} \]

There is one hard constraint in the problem.
It is defined by the paint batch limit, PBL, and states that there must not be any continuous subsequence of cars of the same colour in s which is longer than PBL:

\[ \forall l \in \{1, \ldots, N - PBL\}: \quad col(s_{l+1}) = col(s_{l+2}) = \ldots = col(s_{l+PBL}) \;\Rightarrow\; col(s_l) \neq col(s_{l+1}) \]

Objective function

In order to properly and easily define the objective function, each solution s to the CarSP has to be extended with cars from the previous day and some ‘dummy’ cars from the following day.

There are N_prev cars from the previous day. They are given in a sequence starting from −N_prev + 1 and finishing at 0 (the position exactly adjacent to the first current-day car). Let D_max = max_{j=1,...,O} {D_j} be the maximum of all denominators of ratio constraints. There are D_max − 1 dummy cars added. They are numbered i = N + 1, . . . , N + D_max − 1. Each such car i requires no options to be mounted, so opt_j(i) = 0 for all j. The colour of these cars is irrelevant, so it may as well be col(i) = 1.

Under these assumptions, for each solution s to the CarSP there is a solution s′ defined, which is unambiguously extended to the previous and the following day:

\[ s' = (s'_{-N_{prev}+1}, s'_{-N_{prev}+2}, \ldots, s'_0, s'_1, s'_2, \ldots, s'_N, s'_{N+1}, s'_{N+2}, \ldots, s'_{N+D_{max}-1}) \]

This is a sequence of N_prev cars from the previous day, followed by the N cars from the current day and D_max − 1 ‘dummy’ cars from the next day. Hence it is required that:

\[ s'_l = l \text{ for } l \in \{-N_{prev}+1, \ldots, 0\}, \qquad s'_l = s_l \text{ for } l \in \{1, \ldots, N\}, \qquad s'_l = l \text{ for } l \in \{N+1, \ldots, N+D_{max}-1\} \]

The number of paint colour changes, PCC(s), is one component of the objective function:

\[ PCC(s) = PCC(s') = |\{ l \in \{1, \ldots, N\} : col(s'_l) \neq col(s'_{l-1}) \}| \]

In other words, it is the number of changes of the paint colour between two subsequent cars in sequence s, possibly including one colour change at the very beginning of the sequence (hence the need for a previous-day car).

Then there are two components of the objective function related to the binary options j. Let us define ac_j as the number of cars with option j active in the subsequence of D_j consecutive cars of s′ that starts at position i:

\[ ac_j(s', i) = \sum_{l=i}^{i+D_j-1} opt_j(s'_l) \]

where i ∈ {−D_j + 2, −D_j + 3, . . . , −1, 0, 1, . . . , N}.

The ratio constraint RC_j, defined by the ratio N_j/D_j for option j, states that in any sequence of D_j consecutive cars (called a window) there should be no more than N_j cars with option j active. If for a window starting at position i this number of active options is exceeded, the number of violations of the ratio constraint in this window is computed as:

\[ vn_j(s', i) = \begin{cases} ac_j(s', i) - N_j & \text{if } ac_j(s', i) > N_j \\ 0 & \text{otherwise} \end{cases} \]

The number of violations of a ratio constraint in the entire sequence s is defined as the sum of violations over all windows:

\[ VN_j(s) = VN_j(s') = \sum_{i=-D_j+2}^{N} vn_j(s', i) \]

These violations are thus computed also for all those windows starting in the previous-day sequence which contain at least one current-day car (hence the first i equals −D_j + 2), and for all the windows extending into the following day which contain at least N_j + 1 current-day cars. Such a form of VN_j(s) ensures that when this number is minimised, the workload on the assembly line is evenly distributed: the workload cannot be artificially greater at the beginning or the end of a working day, hence the previous-day and following-day cars are needed.
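The definition of VN_j translates directly into code. The sketch below (with illustrative names) assumes the previous-day fragment is at least D_j − 1 cars long and that dummy cars never activate any option; windows reaching into the following day then need no special treatment, because a window with at most N_j current-day cars cannot exceed N_j active options.

    def ratio_violations(prev, current, opt_j, N_j, D_j):
        """VN_j for one ratio constraint N_j/D_j: check every window of
        D_j consecutive cars containing at least one current-day car;
        dummy cars (no active options) pad the following day."""
        cars = prev + current + [None] * (D_j - 1)   # None marks a dummy car
        active = [0 if c is None else opt_j(c) for c in cars]
        first = len(prev) - D_j + 1                  # window at i = -D_j + 2
        vn = 0
        for start in range(max(first, 0), len(prev) + len(current)):
            ac = sum(active[start:start + D_j])      # ac_j of this window
            vn += max(ac - N_j, 0)                   # vn_j of this window
        return vn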
Since the given options are partitioned into two subsets, they give rise to two subsets of ratio constraints: high priority ratio constraints (HPRCs) and low priority ones (LPRCs). The sums of violations in these sets define the two components of the objective function mentioned earlier:

\[ VN_{HPRCs}(s) = \sum_{j=1}^{O} prio(j) \cdot VN_j(s) \]

\[ VN_{LPRCs}(s) = \sum_{j=1}^{O} (1 - prio(j)) \cdot VN_j(s) \]

Finally, the objective function f(s) to be minimised is a weighted sum of all the components:

\[ f(s) = w_{HPRCs} \cdot VN_{HPRCs}(s) + w_{LPRCs} \cdot VN_{LPRCs}(s) + w_{PCC} \cdot PCC(s) \]

It reflects the wish to distribute the workload evenly along the assembly line and simultaneously minimise the cost of paint colour changes in the paint shop. Since these goals may be conflicting, the vector of weights w gives the order in which the components of f(s) should be minimised. For example, when w = (w_HPRCs, w_LPRCs, w_PCC) = (10^3, 1, 10^6), then the number of paint colour changes PCC(s) should be minimised first, then the number of violations of high priority ratio constraints VN_HPRCs(s), and finally, with the smallest weight, the violations of LPRCs, VN_LPRCs(s).

8.2.1 Other forms of the problem

According to Kis (2004), the car sequencing problem appeared in the literature in the mid 1980s as a particular version of the job-shop scheduling problem. Later, in the 1990s, it was addressed in the constraint programming community (Gent & Walsh 1999). In the 1990s and early 2000s it emerged in papers on metaheuristics (Warwick & Tsang 1995, Cheng et al. 1999, Schneider et al. 2000). Consequently, Renault’s formulation of the CarSP is not the only one that may be found in the literature. There are at least several formulations, and some of them are also connected with the names of certain manufacturers.

1. Warwick & Tsang’s (1995) CarSP. This form of the problem addresses the assembly line requirements. There are constraints similar to RCs. Penalties are computed in a different way, though: only the fact of violating a constraint is counted, not the number of violations. This implies that optimum solutions without violations are the same as in Renault’s case, but for overconstrained instances they will be somewhat different. This version, taken from (Warwick & Tsang 1995), was most probably considered earlier in the constraint programming context.

2. CSPLib CarSP. Only the assembly line requirements are considered, with penalties exactly equivalent to Renault’s ratio constraints. There are no colours of cars involved, no priorities of RCs, no previous-day cars. Hence, this problem may be modelled as Renault’s problem with all cars painted black, PBL = N and only LPRCs. This form of the problem was addressed by Gent (1998), Gottlieb et al. (2003), Kis (2004), Gravel et al. (2005), Terada et al. (2006) and Zinflou et al. (2007).

3. Ford’s CarSP. This version is quite different from Renault’s problem. It seems to consider all stages of manufacturing (body shop, paint shop, assembly line). There are two additional, specific hard constraints. Despite the presence of PCC(s) in the weighted objective function, penalties for violated ratio constraints are counted by a different formula. The problem in this form was considered only by Cheng et al. (1999).

4. BMW’s CarSP. Yet another version was inspired by a real production problem in BMW factories (Schneider et al. 2000). It addresses assembly line requirements only. Penalties similar in form to RCs of 1/D or (D − 1)/D are computed, but by a different formula.
Additionally, a specific component is present in the objective function, quite similar to the cost function of the TSP. This form of the CarSP was described exclusively by Schneider et al. (2000).

5. Puchta & Gottlieb’s (2002) CarSP. Again, only the assembly line is addressed. There are as many as 5 different types of soft constraints given, two of them resembling RCs and the paint batch limit. A major difference here is that quadratic penalty functions are employed instead of linear ones. The CarSP of this kind was attacked only by Puchta & Gottlieb (2002).

Based on a detailed inspection of the mentioned formulations, the author of this thesis considers only the CSPLib CarSP as similar enough to Renault’s problem; the other forms differ too much. Therefore, the algorithms proposed for the CSPLib problem may be a source of direct inspiration for, and comparison with, algorithms for the problem considered in this thesis. Algorithms designed for the other problems are not directly comparable, although they may be a source of some inspiration. This remark is especially valid when one notices that the colour-related constraint and objective are actually computationally easy to solve, making the CSPLib and Renault’s problems equivalently hard at their core. This is explained in more detail in section 8.3 on the complexity of the CarSP.

8.2.2 Groups of cars

The original Renault problem description does not mention any similarities between the cars to be manufactured. Each car has its own unique identifier in the input, and in the output a feasible permutation of such identifiers is required. However, the CSPLib CarSP (Gent & Walsh 1999, Kis 2004) explicitly states that the input data contains G groups (or classes) of cars; each class represents n_g cars (g = 1, . . . , G) requiring exactly the same options. The same holds in Ford’s case (Cheng et al. 1999) and in Warwick & Tsang’s (1995). Therefore, it is useful to pose Renault’s problem in a similar way. The sole difference is that here the colour has to be accounted for: a group (a class) of cars is a set of cars with the same options and paint colour code.

All the objects defined above (solution s, option opt_j, etc.) may easily be redefined to handle groups of cars. For example, a solution s is still a sequence s = (s_1, s_2, . . . , s_N), but ∀i ∈ {1, . . . , N}: s_i ∈ {1, . . . , G}; s_i = g means that a car from group g appears at position i. Moreover, s is feasible if all the required cars are indeed produced:

\[ \forall g \in \{1, \ldots, G\}: \quad |\{ i : s_i = g \}| = n_g \]

This approach is useful for breaking symmetry in the problem. The search space of the reformulated problem is certainly not larger, and usually much smaller, than the original space. This is because there are more permutations of size N than permutations of the same size with certain repetitions. The search space stays exponentially large in practical cases, though.

In the context of Renault’s CarSP and the ROADEF Challenge 2005 the notion of a group of cars was introduced e.g. by Jaszkiewicz et al. (2004). On the other hand, Ribeiro et al. (2005) do not mention any groups of cars.
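A sketch of this reformulation in code: grouping cars by identical option vectors and paint colour (the car attribute names are illustrative).

    from collections import Counter

    def make_groups(cars):
        """Partition cars into groups of interchangeable cars, i.e. cars
        with identical option vectors and the same paint colour code."""
        counts = Counter((tuple(car.options), car.colour) for car in cars)
        groups = list(counts)                  # group g = its (options, colour) key
        sizes = [counts[g] for g in groups]    # n_g: number of cars in group g
        return groups, sizes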
8.3 Computational complexity

The first proof of the NP-completeness of the decision version of the CarSP was given by Gent (1998) for the CSPLib form. However, Kis (2004) demonstrated a flaw in this proof and provided his own, though only for a subset of the instances addressed by Gent. This subset contains instances with the number of cars of each group bounded by a polynomial in the number of classes: max_g {n_g} ≤ poly(G). In other cases it is not even known whether the CarSP belongs to NP. Kis (2004) also demonstrated with his polynomial transformation that this subset of instances is strongly NP-hard.

Concerning the colour component of Renault’s problem, it was found by several authors simultaneously that optimising PCC(s) while observing the paint batch limit is an easy problem. At least Pawlak (2007) and the author of this thesis designed polynomial algorithms dealing with colours in an exact way. In the light of this last observation it appears that the hardness of Renault’s and the CSPLib forms is related to exactly the same element of the problem: multiple ratio constraints.

8.4 Instances

Four sets of instances were made available by Renault for the purpose of the challenge: A, B, X and T (Cung 2005b). The first 3 sets were used to evaluate the programs submitted by participants; set T contained 4 test instances.

Table 8.1 lists some basic properties of all A, B, X instances grouped by their general type. There are 5 types, which result from 2 values of dif and 3 values of w (one combination is meaningless). The type of each instance is encoded as a two-digit number WD, which results directly from dif and w (this is explained in appendix A). The table also gives the number of instances of each type, the numbers of cars and ratio constraints, and the paint batch limits.

Table 8.1: Basic description of the types of instances of Renault’s CarSP, sets: A, B, X.

    #inst.  type WD  weights w          dif  #cars (min–max)  #HPRCs (min–max)  #LPRCs (min–max)  PBL (min–max)
    13      00       (10^6, 10^3, 1)    0    65–1231          1–11              1–19              10–500
    9       01       (10^6, 10^3, 1)    1    128–1315         2–9               2–16              10–150
    25      30       (10^6, 1, 10^3)    0    65–1319          1–11              0–19              10–500
    11      31       (10^6, 1, 10^3)    1    128–1315         2–9               0–16              10–150
    22      60       (10^3, 1, 10^6)    0    65–1270          1–11              0–19              10–1000

Overall there are 80 instances in these sets, so their detailed list would be too long to cite here. However, the 19 instances from set X, together with some reference results, are given in table 8.2. These results were generated in the final stage of the challenge, using the same machine and time limits for all submitted programs. The results include: the best average of 5 runs across all teams, the best result of the winning team (Cung 2005a) and the average of 5 runs of the program submitted by the author’s team.

Table 8.2: Values of the objective function in the final stage of the challenge (set X): best average result, winning team best, author’s team average. Components of the objective function given in brackets: (VN_HPRCs/VN_LPRCs/PCC).

    instance      best average                    winner best                     author’s team average
    022X60-0704   12002003.0 (2.0/3.0/12.0)       12002003.0 (2.0/3.0/12.0)       12002003.0 (2.0/3.0/12.0)
    023X30-1260   192466.0 (0.0/66.0/192.4)       191066.0 (0.0/66.0/191.0)       246268.2 (0.0/68.2/246.2)
    024X30-1319   337006.0 (0.0/6.0/337.0)        336006.0 (0.0/6.0/336.0)        421425.0 (0.0/25.0/421.4)
    025X00-0996   160407.6 (0.0/160.0/407.6)      1181284.0 (0.0/1181.0/284.0)    189390.2 (0.0/188.8/590.2)
    028X00-0325   36341495.4 (36.0/341.4/95.4)    36361091.0 (36.0/361.0/91.0)    36377907.2 (36.0/377.8/107.2)
    028X00-0065   3.0 (0.0/0.0/3.0)               3.0 (0.0/0.0/3.0)               3.0 (0.0/0.0/3.0)
    029X30-0780   110298.4 (0.0/98.4/110.2)       110069.0 (0.0/69.0/110.0)       120855.0 (0.0/55.0/120.8)
    034X30-0921   55994.8 (0.0/794.8/55.2)        55589.0 (0.0/589.0/55.0)        76217.6 (0.0/617.6/75.6)
    034X30-0231   8087035.8 (8.0/35.8/87.0)       8087036.0 (8.0/36.0/87.0)       8091450.2 (8.0/50.2/91.4)
    035X60-0090   5010000.0 (10.0/0.0/5.0)        5010000.0 (10.0/0.0/5.0)        no solution (–/–/–)
    035X60-0376   6056000.0 (56.0/0.0/6.0)        6056000.0 (56.0/0.0/6.0)        6056000.0 (56.0/0.0/6.0)
    039X30-1247   69239.0 (0.0/239.0/69.0)        69238.0 (0.0/238.0/69.0)        69455.6 (0.0/455.6/69.0)
    039X30-1037   231030.0 (0.0/30.0/231.0)       231030.0 (0.0/30.0/231.0)       239593.2 (0.0/193.2/239.4)
    048X30-0519   197005.6 (0.0/1005.6/196.0)     197111.0 (0.0/1111.0/196.0)     206509.6 (0.0/1109.6/205.4)
    048X31-0459   31077916.2 (31.0/1116.2/76.8)   31077131.0 (31.0/1131.0/76.0)   31104598.8 (31.0/998.8/103.6)
    064X30-0875   61187229.8 (61.0/29.8/187.2)    61185029.0 (61.0/29.0/185.0)    61229518.8 (61.0/118.8/229.4)
    064X30-0273   37000.0 (0.0/0.0/37.0)          37000.0 (0.0/0.0/37.0)          40400.0 (0.0/0.0/40.4)
    655X30-0264   30000.0 (0.0/0.0/30.0)          30000.0 (0.0/0.0/30.0)          30000.0 (0.0/0.0/30.0)
    655X30-0219   153034000.0 (153.0/0.0/34.0)    153034000.0 (153.0/0.0/34.0)    153035200.0 (153.0/0.0/35.2)

A competitor in the challenge will instantly note that the original names of Renault’s instances were changed here. This was done because the original names were sometimes extremely long and hard to manage, so they were shortened. The employed mapping of instance names is given in appendix A. For instance 035X60-0090 the author’s program did not provide any solution in the challenge. This was due to a bug in the program code, in the input data parsing procedure.

8.5 Heuristic algorithms for the CarSP

8.5.1 Greedy heuristics by Gottlieb et al.

Gottlieb et al. (2003) proposed a set of greedy heuristics for the CSPLib CarSP. These constructive heuristics start building a solution from an empty sequence. At each step one feasible group index is appended to the current partial sequence; the chosen group minimises the total number of new violations caused by the extension.

In order to properly define this heuristic, the notion of a partial sequence of groups (a partial solution) has to be defined. Informally speaking, it is a sequence of groups which occupy only some initial, consecutive positions of a complete sequence.
Formally, it is a sequence s_p of length l, where 0 ≤ l ≤ N, of the form s_p = (s_p,1, s_p,2, . . . , s_p,l). Let |s_p| = l denote the length of this partial sequence and s_p = () denote an empty sequence. Let also s_p · g be sequence s_p extended at the end (concatenated) with group g, thus creating a sequence with one more car.

The notion of the number of violations of a ratio constraint RC_j may easily be extended to a partial sequence s_p if we assume that the undefined positions of s_p (from |s_p| + 1 up to N) are filled with ‘dummy’ cars. Such a completed current-day sequence may then be unambiguously extended to the previous and the following day, giving s′_p. Hence, the number of violations of RC_j in s_p is computed as:

\[ VN_j(s_p) = VN_j(s'_p) = \sum_{i=-D_j+2}^{|s_p|} vn_j(s'_p, i) \]

The number of new violations of all ratio constraints caused by the extension of s_p with a car from group g is defined as:

\[ \Delta VN(s_p, g) = \sum_{j=1}^{O} \left( VN_j(s_p \cdot g) - VN_j(s_p) \right) \]

and this is the main heuristic function guiding the greedy constructive algorithm.

However, Gottlieb et al. noticed that the set of groups which minimised the new violations usually contained more than one group, resulting in a tie. Thus, they defined a set of additional tie-breaking helper functions.
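Putting the pieces together, the greedy heuristics of this section can be sketched as follows; delta_vn and tie_break stand for ΔVN and one of the tie-breaking helper functions defined next (e.g. DSU or DHU). Whether the highest or the lowest helper value wins a tie is assumed here for illustration, and any hard-constraint feasibility filter is omitted.

    def greedy_construct(groups, n_g, delta_vn, tie_break):
        """Greedy constructive heuristic: append the group causing the
        fewest new violations; break ties with a utilisation-rate helper."""
        s, avail = [], dict(zip(groups, n_g))
        while any(avail.values()):
            cands = [g for g in groups if avail[g] > 0]
            best = min(delta_vn(s, g) for g in cands)
            tied = [g for g in cands if delta_vn(s, g) == best]
            g = max(tied, key=lambda t: tie_break(s, t))  # assumed: highest wins
            s.append(g)
            avail[g] -= 1
        return s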
The two best performing ones were based on the notion of a (dynamic) utilisation rate of an option j in a partial solution s_p. The definition of such a rate requires several other objects to be defined first.

Let used(s_p) be a vector of length G describing how many cars of each group are used in s_p: used(s_p) = (n′_1, n′_2, . . . , n′_G). Of course, ∀g ∈ {1, . . . , G}: 0 ≤ n′_g ≤ n_g, meaning that no more cars of a group may be used than the total number of cars in this group. Moreover, ∀g ∈ {1, . . . , G}: n′_g = |{i ∈ {1, . . . , |s_p|} : s_p,i = g}|, i.e. the component n′_g represents the number of cars of group g in s_p. It will also be denoted used_g(s_p). For an empty sequence s_p = () we have used_g(()) = 0 for all g.

The converse notion, the vector of cars still available given some partial sequence s_p, may be defined as: avail(s_p) = (n_1, n_2, . . . , n_G) − used(s_p). Similarly, avail_g(s_p) = n_g − used_g(s_p).

The number of available cars requiring option opt_j given some partial solution s_p is defined as:

\[ r_j(s_p) = \sum_{g=1}^{G} opt_j(g) \cdot avail_g(s_p) \]

and the (dynamic) utilisation rate of option opt_j (ratio constraint RC_j) in the cars remaining after s_p is given by:

\[ utilRate(j, s_p) = \frac{r_j(s_p) \cdot D_j}{N_j \cdot (N - |s_p|)} \]

This function was called dynamic in contrast to a static one, which does not depend on s_p and was simply defined by Gottlieb et al. (2003) as utilRate(j, ()).

Given these definitions, the two most important tie-breaking functions may be defined as:

\[ DSU(s_p, g) = \sum_{j=1}^{O} opt_j(g) \cdot utilRate(j, s_p) \]

which is the dynamic sum of utilisation rates of group g given partial sequence s_p, and:

\[ DHU(s_p, g) = \sum_{j=1}^{O} opt_j(g) \cdot 2^{rank(j)} \]

which is the dynamic highest utilisation rate of group g given s_p, where rank(j) = r for the option j with the rth smallest utilRate(j, s_p).

Gottlieb et al. studied empirically the performance of these two heuristics, and 4 others, on two sets of CSPLib instances. This study revealed that DSU and DHU were the best for the harder instances with cars requiring many options (overconstrained instances). While these heuristics were worse on the second set of easier (satisfiable) instances, neither of them performed badly, being generally the second and the third performer. The best one for this set, called the dynamic even distribution heuristic (DED), was on the other hand worse for the harder instances.

8.5.2 Insertion heuristic by Ribeiro et al.

Ribeiro et al. (2005) proposed a constructive heuristic for Renault’s CarSP. It was used as a part of their iterated local search algorithm (see section 8.6.2). The heuristic starts with an empty partial solution and in each step inserts one car into the sequence, until all cars are inserted. At each step, the car to be put in the sequence is chosen randomly from the set of available cars. The insert position is chosen heuristically, based on the change of the objective function induced by the potential insert. The position with the best overall change (possibly negative) is chosen. Insertions leading to infeasible solutions are discarded.

8.6 Metaheuristic algorithms for the CarSP

8.6.1 Local search by Gottlieb et al.

Gottlieb et al. published two papers on local search for CarSPs (Puchta & Gottlieb 2002, Gottlieb et al. 2003). They considered several types of neighbourhood operators there:

• Insert: removes a group index from some position and puts it back in some other one. The subsequence in between is shifted by one position. Neighbourhood size: O(N²).

• Swap: exchanges two different group indexes at least 2 positions apart. Size: O(N²).

• SwapT: exchanges two different adjacent group indexes. This is also called a transposition. Size: O(N).
• SwapS: exchanges two different groups like Swap, but only if they are similar enough. This is measured by the Hamming distance between the vectors of options: it is required that the distance between the exchanged groups g_1, g_2 satisfies 1 ≤ d_H(g_1, g_2) ≤ 2. Neighbourhood size: O(N²), but in practice it may be much smaller.

• Lin2Opt: inverts a subsequence of at least 2 elements. Size: O(N²).

• Random: randomly shuffles a chosen subsequence of at least two elements. Size: unknown.

In order to speed up local search they employed the well-known technique of computing only the change of the objective function when evaluating a move (the incremental update scheme, see section 4.6.5). This was especially important for moves that affect only small parts of a solution, like SwapT or SwapS. Additional acceleration of LS was obtained by restricting moves to smaller parts of the whole sequence. Puchta & Gottlieb set the restriction to N/20, meaning that e.g. swapped cars could be at most N/20 positions apart. Gottlieb et al. increased this limit to N/4. No rationale for this decision was given, though.

Results presented by Puchta & Gottlieb (2002) are difficult to discuss, since they were obtained for a different version of the CarSP (see section 8.2.1). However, the same LS was employed for the CSPLib CarSP by Gottlieb et al. (2003). They reported results comparing the designed iterated local search, iterated heuristics and an ant colony optimisation algorithm (ACO). The algorithms were initialised either with a random permutation or with one of their dynamic greedy heuristics: DSU, DHU, DED (see section 8.5.1).

For the easier (satisfiable) instances, the results showed that most of the tested algorithms were able to solve them to optimality. These algorithms were: iterated DSU/DHU, ACO DSU/DHU, LS DED and sometimes randomly initialised LS. On the other hand, the ACO with random initial solutions was significantly worse than the LS and the iterated heuristics. This result indicates that a good initial solution was crucial for Gottlieb et al.’s (2003) algorithms, especially the ACO.

For the harder instances from CSPLib, ACO DSU/DHU was the best among the presented algorithms. Iterated DSU/DHU and LS DED were only slightly worse, though. However, LS DSU/DHU was not even reported, although it would most probably be the best combination of local search with a greedy heuristic. This is at least odd, since a fair comparison of the algorithms called for the combination of LS with DSU/DHU. The potential of this combination is visible in the presented results: for some instances iterated DSU/DHU performs better than the presented LS DED. Moreover, in longer runs the presented LS DED performed as well as the ACO. On top of that, a comparison of success rates (the percentages of runs in which best-known solutions were found) reveals that LS DED is indiscernible in quality from ACO DSU/DHU, and that these algorithms generate best-known solutions in almost every run.

To summarise the results of Gottlieb et al., it seems that the proposed local search was a good design for the CarSP, indeed not worse than their ant colony optimisation. The results showed the importance of a good initial solution for LS, either DSU or DHU.

Which move operator should be used in local search for the CarSP? It is not clear from the results of Gottlieb et al.
They used 4 structurally different types of moves (6 operators in total) with equal probabilities of application and did not provide any answer to this question. Nevertheless, they mention that ‘it is well known for some combinatorial optimisation problems that not all information types are relevant’ (Puchta & Gottlieb 2002). They recall the case of the TSP, where edges are the important element of a solution. Further, they say that in their CarSP ‘several information types are relevant’. They mention adjacency and absolute positions for their more involved version of the CarSP, but do not verify this hypothesis in any way.

8.6.2 Iterated local search by Ribeiro et al.

The method designed by Ribeiro et al. (2005) for the initial stage of the ROADEF challenge is an iterated local search algorithm (ILS). It maintains a single solution that is being modified, plus the best solution found so far. The algorithm uses two neighbourhood operators in local search: swap and insert. It is not stated whether the LS procedure is greedy or steepest, or some more involved one. Two additional neighbourhood operators are used in the perturbation (mutation) procedures:

• group exchange: exchanges two maximal subsequences of cars with the same colour;

• group reinsertion: a maximal subsequence of one colour is removed from a solution and the solution is made complete again by the constructive heuristic.

The algorithm is initialised with the constructive heuristic described above (section 8.5.2). This solution is then improved by swap local search and the main ILS loop starts. The loop usually iterates only two steps: a perturbation of the current solution by a random group exchange, followed by swap local search. If the result of the local search is not worse than the current solution, the latter is updated.

In case the best solution found so far is not improved for 200 consecutive iterations of the main loop, an intensification step is launched: it uses insert local search to improve the current solution. Moreover, if the best solution found so far is not improved for 1000 iterations, a diversification step is performed: it perturbs the current solution with a random group reinsertion move, followed by swap local search. When the given time limit is reached, the algorithm returns the best solution found so far.

This method was ranked 1st in the initial stage of the challenge. On 16 Renault instances from set A it generated very good results. For several of them it provided results which remained the best even in the final stage: 3 instances of type WD=60 and 2 of type WD=30. A possibly improved version of this algorithm was ranked 2nd in the final stage of the challenge.

8.6.3 Local search and very large neighbourhood by Estellon et al.

Although Estellon et al. (2006) considered in their paper only the CSPLib version of the CarSP, they mentioned that their algorithm was the basis for their success in the ROADEF Challenge 2005.

Local search

They called their LS procedure a very fast local search (VFLS). It starts from a solution of the DSU heuristic. What follows is a greedy algorithm in which the first found neighbour not worse than the current solution is accepted. The algorithm uses 3 types of neighbourhood operators: swap, insert and reflection (called Lin2Opt by Gottlieb et al. (2003)). In all these operators a particular move is specified by picking two positions in a solution, i and k.
8.6.3 Local search and very large neighbourhood by Estellon et al.

Although Estellon et al. (2006) considered in their paper only the CSPLib version of the CarSP, they mentioned that their algorithm was the basis for their success in the ROADEF Challenge 2005.

Local search

They called their LS procedure a very fast local search (VFLS). It starts from a solution of the DSU heuristic. What follows is a greedy algorithm in which the first found neighbour which is not worse than the current solution is accepted. The algorithm uses 3 types of neighbourhood operators: swap, insert and reflection (called Lin2Opt by Gottlieb et al. (2003)). In all these operators a particular move is specified by picking two positions in a solution, i and k. Estellon et al. said that for i and k 'clever choices are necessary' and proposed several strategies for making this choice:

• generic: choose i and k randomly;
• consecutive: choose i randomly and set k = i + 1;
• similar: the cars at the chosen positions share some options (are similar);
• denominator: choose position i and some option j randomly, and set k = i + Dj.

These strategies are applied with different probabilities for each operator. The probabilities seem to be specially tailored to the problem, because they are rather nonuniform and given with one decimal digit of precision: for example, the generic swap is attempted 69.6% of the time.

The local search is accelerated by the use of dedicated data structures. Estellon et al. did not give details, but said that the structures help to exploit the strong locality of the used transformations. In effect, the local search is sped up approximately 10 times. The authors of the VFLS conclude: 'the efficiency of this approach relies on the huge number of attempted transformations'.

Large neighbourhood of k-permutation

On top of the local search procedure, Estellon et al. considered an integer linear programming formulation of the CarSP. In this context they made an interesting discovery about an operator with a very large neighbourhood, which in many practical cases may be examined for its best element in polynomial time. The operator is called k-permutation and permutes cars assigned to k different positions in a solution. In the general case, given k different cars at k positions, the number of possible modifications of a solution with this operator in exactly this configuration is equal to k!. Moreover, the number of all possible configurations of the chosen positions is $\binom{N}{k}$. This is practically prohibitive for larger values of k (say, 10 or more). However, Estellon et al. proved that if these positions are distant enough from each other, the optimal assignment of the considered cars to the positions may be found in polynomial time. Assuming that $D_{max} = \max_{j=1}^{O} D_j$, each permuted car has to be no less than Dmax positions away from any other one. In this case the problem to be solved becomes the linear assignment problem (LAP), which belongs to class P.

There remains one decision to be made: the choice of positions for the k-permutation move. The authors of this approach recommended choosing some of them randomly and completing the set with positions where violations of RCs appear.
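Because the chosen positions are pairwise at least Dmax apart, the cost of placing a car at one position does not depend on the placement of the other permuted cars, which is exactly what makes the move a LAP. A minimal sketch of such a move, using the Hungarian-type solver available in SciPy, could look as follows; the callback assignment_cost(car, position) is an assumption standing in for the evaluation of one car at one position.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def k_permutation_move(seq, positions, assignment_cost):
        """Optimal reassignment of the k cars at `positions` via the LAP,
        in the spirit of Estellon et al.'s k-permutation."""
        cars = [seq[p] for p in positions]
        cost = np.array([[assignment_cost(car, p) for p in positions]
                         for car in cars])
        rows, cols = linear_sum_assignment(cost)  # polynomial-time LAP solver
        new_seq = list(seq)
        for r, c in zip(rows, cols):
            new_seq[positions[c]] = cars[r]
        return new_seq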
Estellon et al. made this LAP-based k-permutation another neighbourhood operator in their local search. It is attempted with a probability of 0.2%, so it is used extremely rarely. But the authors said that when attempted, the k-permutation very often improved the current solution; at the same time, they noted, it contributed to the diversification of the search.

Results

The results of this method were impressive. VFLS alone proved to be very efficient indeed. It generated the best-known results for the largest CSPLib instances (200–400 cars) in approximately one minute, on average. It even improved the best-known solutions of 3 instances. After introducing the LAP-based k-permutation into VFLS, the algorithm accelerated by 15–20% and the number of LS iterations decreased by approximately 60%. This means that the large neighbourhood indeed contributed enormously to the chance of success of a single LS move, although it also consumed a large part of the saved computation time.

As mentioned earlier, the design based on this method won the ROADEF Challenge 2005. The results labelled 'winner' in table 8.2 were generated by the algorithm developed by Estellon et al.

8.6.4 Generic genetic algorithm by Warwick and Tsang

Probably the first evolutionary algorithm for a CarSP was proposed by Warwick & Tsang (1995). It was actually a memetic algorithm, which employed local search (hill-climbing) instead of mutation. With respect to other properties, it was a generational MA with elitism: 10% of the best solutions in the population always survived selection.

As representation they used a sequence of indexes of groups, exactly equivalent to the form of solution s given earlier. This was further extended with a supplementary binary string b of length N, in order to enable their uniform adaptive crossover (UAX). An example of such a representation is given in figure 8.1: there is a sequence of groups p0 and its related binary string b0. These additional strings are generated randomly during initialisation and processed exclusively by the proposed crossover.

Warwick & Tsang (1995) say that 'the engine of GAcSP [(MA)] is a crossover operator (UAX) that remembers valuable crossover points in order to aid in retention of useful building blocks that may be separated in the string representation'. These crossover points are stored exactly in the additional binary strings. Therefore, it is good to know how the operator works. The pseudocode of UAX is given in algorithm 19.

Algorithm 19 (o, b) = UAX(p0, b0, p1, b1)
  o = b = (0, 0, . . . , 0) {start with empty results}
  draw random a ∈ {0, 1} {index of the first active parent}
  aprev = a {the previous active parent: active in the last assignment}
  i = 1 {first copied position}
  while (i ≤ N) do
    while ((b0,i ≠ b1,i) or (a ≠ aprev)) and (i ≤ N) do
      oi = pa,i {copy the car group index}
      bi = ba,i {copy the associated bit}
      aprev = a {remember the donor of the last assignment}
      i = i + 1 {advance in the sequence}
    a = (1 − a) {switch the active parent}
  repair o
  return (o, b)

The authors of UAX give an example of how it operates. Let us assume that the parents p0, p1 and their binary strings b0, b1 are given as in figure 8.1, and that the first active parent is p0 (a = 0). The offspring then iteratively inherits its values of o and b from the active parent, starting from position i = 1. The active parent is switched whenever the corresponding bits b0,i and b1,i match. In the example this means that the offspring receives positions 1–3 from p0, 4–5 from p1, 6–7 again from p0 and finally 8–10 from p1.

position:   1 2 3 4 5 6 7 8 9 10
p0 = (1 2 3 4 2 5 4 1 3 5)    b0 = (0 1 0 0 1 1 0 0 1 0)
p1 = (2 1 3 2 4 1 5 3 4 5)    b1 = (1 0 1 0 0 1 1 0 0 1)
o  = (1 2 3 2 4 5 4 3 4 5)    b  = (0 1 0 0 0 1 0 0 0 1)

Figure 8.1: Example of UAX: parents p0, p1 with binary strings b0, b1 and their offspring o with string b.

The UAX result may be infeasible: some groups may be overrepresented and others underrepresented. Therefore, a repair procedure is always invoked afterwards, and it may be perceived as a part of UAX. The repairer randomly finds an overrepresented value (position i), then finds the first underrepresented value in the following subsequence (position l > i). The value of oi is set to ol and the process continues for increasing positions i. When the solution is made valid, the process stops. If it is not, and the end of the solution is reached, the process is restarted. The repairer does not modify the accompanying binary string.
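A runnable rendering of algorithm 19 is given below (0-based indices). The repair shown here is a simplified variant of the procedure described above: it overwrites overrepresented group indexes with underrepresented ones until the counts match the demand, whereas the original scans for them in a more involved, position-based way. The argument target, mapping a group index to the required number of its cars, is an assumption of this sketch.

    import random
    from collections import Counter

    def uax(p0, b0, p1, b1, target):
        n = len(p0)
        o, b = [None] * n, [None] * n
        parents, bits = (p0, p1), (b0, b1)
        a = random.randint(0, 1)      # index of the first active parent
        a_prev = a
        i = 0
        while i < n:
            while i < n and (bits[0][i] != bits[1][i] or a != a_prev):
                o[i], b[i] = parents[a][i], bits[a][i]
                a_prev = a            # remember the donor of the last assignment
                i += 1
            a = 1 - a                 # bits matched: switch the active parent
        repair(o, target)
        return o, b

    def repair(o, target):
        """Simplified repair; the binary string is deliberately left untouched."""
        counts = Counter(o)
        deficit = []
        for g, need in target.items():
            deficit.extend([g] * max(0, need - counts[g]))
        random.shuffle(deficit)
        for i, g in enumerate(o):
            if counts[g] > target.get(g, 0) and deficit:
                counts[g] -= 1
                o[i] = deficit.pop()
                counts[o[i]] += 1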
Local search uses a swap move to improve the offspring. The first element to be swapped is one with a high number of violations (low fitness). The swapped counterpart is the one which causes the maximum gain in the objective function. This results in a steepest LS. However, this component is optional in the whole algorithm: it is launched only until a given limit on the total local search time is reached.

8.6.5 Genetic algorithm by Terada et al.

Terada et al. (2006) proposed a genetic algorithm for the CarSP and compared it with a series of algorithms called 'squeaky-wheel optimization' (SWO). Their GA was a steady-state algorithm with 50 individuals, tournament selection, elitist replacement and without local search.

These authors employed an indirect representation with two levels of decoders. Their chromosome was a sequence of length N of floating-point numbers. A car was represented by a position in this sequence, while the entry at that position was the priority of the car. The first decoder was simply a sorting procedure which established some order of cars based on their priorities. Then, most probably, a second decoder was invoked (this is not completely clear from the given description). This decoder was a polynomial-time construction algorithm that created an actual solution (a sequence of cars) based on the intermediate sequence. It iteratively added the identifiers of cars to a solution in the given order, one at a time, by finding the earliest possible position in which the car would not cause any violation of constraints. If such a position could not be found, the resulting solution had some positions left unfilled and an invalid solution was constructed.
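As far as it can be reconstructed from this description, the two-level decoding could look like the sketch below. The constraint test causes_violation(partial, pos, car) is an assumed callback, and whether a larger priority means earlier placement is a guess; the sketch is illustrative only.

    def decode(priorities, causes_violation):
        """Two-level decoder in the spirit of Terada et al.: sort cars by
        priority, then place each at the earliest non-violating position."""
        n = len(priorities)
        order = sorted(range(n), key=lambda car: priorities[car], reverse=True)
        partial = [None] * n
        for car in order:
            for pos in range(n):
                if partial[pos] is None and not causes_violation(partial, pos, car):
                    partial[pos] = car
                    break
            # if no feasible position exists, the position stays unfilled and
            # the decoded solution is invalid, as in the text
        return partial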
The genetic algorithm employed crossover, namely the well-known single-point operator. Mutation was not used, because the authors noticed it had not improved results in any setup. They considered only random mutation, though.

The algorithms presented by Terada et al. were evaluated on the smaller sets of CSPLib instances, with 100 and 200 cars. The basic experiment compared all the proposed algorithms with respect to the percentage of runs in which optimal solutions were found (most of these instances were known to have at least one solution without violations). All algorithms were allowed to run for 1000 iterations, but no computation times were reported. This basic experiment revealed that the GA was the best of all 9 tested algorithms. It was also the best when some hardest instances were selected for another comparison. In yet another special experiment the authors allowed all algorithms to run longer on the hardest instances, for 5000 iterations. In this case one version of SWO was best, but it was the one that had most in common with the GA: it used a population of solutions and the same crossover operator. Moreover, since the computation times were not given, it cannot be said that in this special experiment the SWO was indeed better given the same computational resources. Finally, all other 6 versions of SWO were worse than the GA; some of them even worse than random search followed by decoding.

Terada et al. compared their general results to those of Warwick & Tsang (1995) on the same set of instances. This comparison showed that the GA described here performed significantly worse than the MA of Warwick & Tsang. Terada et al. report that they could improve the results to a comparable level by means of some 'domain-specific post-processing step', but except for the name of the step no detail is given.

To summarise, the best algorithm proposed by Terada et al., the genetic algorithm, was worse than the one of Warwick & Tsang. It therefore seems that the combination of a two-level indirect representation, single-point crossover and no local search was a poor design for the CarSP. For this reason alone it is not worth implementing the GA for comparison with the algorithms developed for this thesis. Besides, Terada et al.'s (2006) description of some elements of their algorithm is vague, which makes a proper implementation of this algorithm hardly possible.

8.6.6 New crossover operators by Zinflou et al.

Zinflou et al. (2007) noticed that genetic algorithms were rarely used for solving the car sequencing problem. They explained the situation 'by the difficulty of defining specific and efficient genetic operators for the problem. In fact, traditional genetic operators (. . . ) cannot deal adequately with the specifics of car sequencing'. Therefore, they proposed 3 new recombination operators for the CSPLib CarSP: Interest Based Crossover, Uniform Interest Crossover and Non Conflict Position Crossover (NCPX). These operators were tested in a standard, generational GA without local search. However, some neighbourhood operators were employed as mutation: swap, reflection (Lin2Opt), shuffle (random) and displacement (insert). Overall, on the CSPLib instances the best performer was NCPX, so only this operator is presented below.

The Non Conflict Position Crossover is based on the notion of a conflict position. A position i in solution s is said to be a conflicting one if for any ratio constraint (option) there exists a constraint window covering this position in which some violations of the RC occur (Zinflou 2008). More precisely, for position i ∈ {1, . . . , N} there is a conflict in s, conf(s, i) = 1, if:

∃j ∈ {1, . . . , O} ∃k ∈ {i − Dj + 1, . . . , i} : vnj(s′, k) > 0

Otherwise conf(s, i) = 0.
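The definition translates directly into a test such as the following sketch (0-based indices). Here vn[j][k] is assumed to store the number of violations of ratio constraint j in the window starting at position k, mirroring vnj(s′, k) from the formula.

    def conflict(vn, i, D):
        """conf(s, i) after Zinflou's definition: position i is conflicting if
        some RC window covering it contains violations."""
        for j, d_j in enumerate(D):
            for k in range(max(0, i - d_j + 1), min(i + 1, len(vn[j]))):
                if vn[j][k] > 0:
                    return 1
        return 0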
NCPX works on two parent solutions. In the first stage, the recombination operator counts in the first parent all the positions without conflicts. Then a random number numCopy1 of such positions is drawn, together with a random position in the parent. Starting from that position, the first numCopy1 non-conflicting positions of the first parent are copied directly to the same positions in the offspring.

In the second stage, NCPX starts by gathering from the first parent the groups at positions which were not copied to the offspring. These groups establish the vector of yet available cars. These cars are used to fill the remaining positions in the offspring. A random position which is not yet filled with a group index is drawn. The offspring completion process starts from this position and continues for the subsequent empty ones; if necessary, it also continues from the beginning of the offspring, until the offspring is made complete. The group index of a car to be inserted at a particular position is chosen from the vector of currently available cars in a hybrid random-heuristic manner. With probability 0.05 the choice is made randomly; in such a case a group is chosen with probability proportional to the number of available cars of this group. Otherwise, the group g to be inserted at position i of an incomplete solution sp is the one which maximises the heuristic function I:

I(sp, i, g) = −∆VN(sp, i, g)   if ∆VN(sp, i, g) > 0
I(sp, i, g) = DSU(sp, g)       otherwise

The notation has been somewhat abused here. It means, however, that before assigning a group to position i in an incomplete solution, the number of new violations of ratio constraints due to that insertion has to be computed. Only if there are no new violations does the heuristic evaluation by the DSU function matter. In case several groups maximise I, the tie has to be broken. If the same position is non-conflicting in the second parent and is occupied there by a group with maximum I, this group is copied to the offspring. Otherwise, a random group with maximum I is chosen.

One may notice that this NCPX operator generates an offspring mainly based on the contents of one parent. In the first stage, some non-conflicting positions are copied from this parent. In the second stage the remaining positions are filled to some small extent randomly, to a large extent heuristically, and finally, in some unknown percentage of cases, they are taken from the second parent. This unknown percentage depends on how many times different groups induce equal values of I while at the same time there is no conflict at the considered position in the second parent. At first glance, the chance that the generated offspring inherits anything from the second parent seems small. Therefore, it appears as though NCPX is in fact some large heuristic mutation of the first parent, rather than a recombination operator based on two parents. Without further experimental data this issue remains unclear, though.
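The hybrid random-heuristic choice can be sketched as follows. The callbacks delta_vn(sp, i, g) and dsu(sp, g) stand for ∆VN and the DSU evaluation and are assumptions of this sketch; the tie-break by the second parent is omitted for brevity.

    import random

    def choose_group(sp, i, available, delta_vn, dsu):
        """Choice of the group inserted at position i in NCPX (sketch)."""
        groups = [g for g, count in available.items() if count > 0]
        if random.random() < 0.05:
            # random branch: probability proportional to the available cars
            weights = [available[g] for g in groups]
            return random.choices(groups, weights=weights, k=1)[0]
        def interest(g):                      # the heuristic function I
            d = delta_vn(sp, i, g)
            return -d if d > 0 else dsu(sp, g)
        best = max(interest(g) for g in groups)
        return random.choice([g for g in groups if interest(g) == best])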
8.7 Summary

The review presented above allows some conclusions to be formed about good designs for CarSPs.

Importance of local search
It seems that local search is the very foundation of efficient algorithms for such problems. Indeed, the best algorithms rely heavily on this method: the designs of Estellon et al. (2006), Ribeiro et al. (2005) and Puchta & Gottlieb (2002). Moreover, Warwick & Tsang (1995) used it in their MA; Zinflou et al. (2007) employed neighbourhood operators as mutation and planned to include local search in their EA to improve efficiency. On the other hand, Terada et al. (2006) did not implement any LS, and their GA design was worse than the MA of Warwick & Tsang published 10 years earlier. Finally, the comparison of pure ACO algorithms with pure LS by Gottlieb et al. (2003) revealed the LS to be not worse than ACO.

Generic neighbourhood operators
The reviewed algorithms all use some subset of one set of neighbourhood operators: swap, insert, reflection, random shuffle. These are not special-purpose operators defined exclusively for CarSP problems. Rather, they are generic operators that may be found employed for different problems, especially those concerned with permutations, like the TSP (Merz 2000, Hoos & Stutzle 2004), the CVRP (see section 6.4), flowshop scheduling (Reeves 1999, Hoos & Stutzle 2004) and the QAP (Hoos & Stutzle 2004). The k-permutation operator of Estellon et al. seems to be an exception to this rule. However, it is used very rarely by its authors (0.2% of all evaluated neighbours) and in fact does not have a major impact on the results: the average quality of solutions remains the same, while the algorithm speeds up by 15–20%. Thus, the real strength of VFLS lies in the generic operators.

It is not exactly known which generic operator, or combination of operators, works best for CarSPs and why. The arguments for using them seem to be the strong locality of the transformations and good practical results on benchmarks. The excellent results of Estellon et al. and Ribeiro et al. seem to indicate that swap is the key operator, but no theoretical foundation for this choice exists, as far as the author knows. In particular, he has not seen any published ruggedness analysis of these operators, which could shed some light on the matter.

Fast evaluation of neighbours
Rather, the speed of evaluation of neighbours seems to be the key issue when choosing an operator, and this conclusion agrees with the guidelines inferred from other problems (see section 4.6). The mentioned strong locality of the operators plays an important role here. Indeed, Puchta & Gottlieb (2002) speed up their algorithm by using the incremental update scheme and also by restricting the neighbourhood to only small parts of a sequence. Estellon et al. argue that fast evaluations are the key success factor; they employ special data structures to accelerate the evaluations.

Good constructive heuristic
Finally, what is common to some of these algorithms is their use of the good heuristic idea of Puchta & Gottlieb (2002): the dynamic sum of utilities (DSU). It is used by its authors, but also by Estellon et al. On top of that, Zinflou et al. employ DSU as a heuristic guiding insertions in their NCPX operator.

Beyond local search?
What other metaheuristics may be used to efficiently solve CarSPs? How should they be adapted to the problem? The best algorithms did not go far from the basic LS idea. Ribeiro et al. simply iterated it with an additional perturbation or LS phase. Estellon et al. found another, large neighbourhood which may be searched efficiently. There seems to be no good design of a different kind.

Recombination operators
This state of affairs also concerns evolutionary algorithms for CarSPs, especially when one considers their most 'evolutionary' part: recombination operators. Very recently Zinflou et al. (2007) stated that it was difficult to find good problem-specific recombinations for CarSPs. The author of this thesis agrees with their observation. Currently, only several recombination designs are available for CarSPs: single-point crossover, UAX and the 3 operators of Zinflou et al. Single-point crossover is the most classical operator in EAs. It was not designed for CarSPs, but for problems with binary representation. It frequently produces infeasible sequences of cars, and computational results indicated it was evidently worse than UAX. Similarly, UAX is the uniform crossover proposed for binary problems and adapted to the CSPLib CarSP by adding a helper binary string; it has to be followed by a repair procedure in order to generate feasible offspring. Finally, NCPX was designed especially for the CSPLib CarSP in order to fill the gap in recombinations. It was based on intuition and the good heuristic idea of DSU. However, it is not even clear whether it is indeed a recombination operator and not a macromutation. The other operators of Zinflou et al. (2007) were also based on ideas from other problems: partially-mapped crossover and uniform crossover. It appears, therefore, that intuition, good heuristics and experience from other problems were the basis for the existing recombination designs.
No author has tried to verify, theoretically or empirically, what kind of information should be preserved or changed by a recombination operator for CarSPs. That is why the next chapter presents an attempt to find what kind of information is important in good CarSP solutions, by means of fitness-distance analysis. The results of this analysis are further used as a basis for recombination design. Finally, the computational comparison of these operators with UAX and NCPX will indicate whether this systematic design of recombination operators is indeed a good guideline that may lead a designer of a metaheuristic for the considered CarSP beyond local search.

Chapter 9

Adaptation of the memetic algorithm to the car sequencing problem based on fitness-distance analysis

9.1 Representation

Only one representation was considered for the CarSP: a sequence (vector, array) of group indexes. In this representation each solution is a vector containing:

• indexes of groups of previous-day cars (never modified nor moved), followed by
• indexes of groups of current-day cars.

The order of elements in this vector reflects the order in which cars are put on the production line. During computation only indexes of groups are processed. Grouping of cars is performed while reading the input data. If necessary, the generated solution is translated into a sequence of original car identifiers at the end of computation. The indexes of previous-day groups are included in each solution for technical reasons: they simplify the computation of violations of ratio constraints.

9.2 Fitness function and constraints

The fitness function for the designed MA is the original objective function of the CarSP. All designed algorithms manipulate feasible solutions only; infeasible ones are not accepted at any state of computation.

9.3 Local search

The local search designed for the CarSP allows no compositions of neighbourhoods: each operator is implemented as a separate LS process. Local search accepts moves without a change of the objective function (neutral moves).

9.3.1 Insertion of a group index

The operator removes a group index from some position i and inserts it at another position, l. When an index is removed, all successive indexes are moved one position back. Then, when an index is inserted at position l, the index at position l and all successive indexes are moved one position forward. Formally, this move transforms solution s into its neighbour sn:

s  = (s1, s2, . . . , si−1, si, si+1, . . . , sl−1, sl, . . . , sN)
sn = (s1, s2, . . . , si−1, si+1, . . . , sl−1, si, sl, . . . , sN)

The size of the generated neighbourhood is O(N^2).

In certain circumstances infeasible solutions may be generated by this operation:

• when removing si, if the colour of si−1 and si+1 is the same and the colour of si is different; then two subsequences of the same colour, adjacent to position i, are merged;
• when inserting si, if the colour of si is the same as that of sl−1 or sl (or both); then at least one subsequence of the same colour is made longer.

These conditions may be tested in constant time given a helper vector FullColSeqLen(i) of length N. This vector stores at position i the length of the subsequence of cars of the same colour as i which contains this position i.
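The move itself and the first of the two feasibility tests can be sketched as follows (0-based indices). This is an illustration of the definitions above, not the thesis implementation; col is assumed to map positions to colours.

    def insert_move(s, i, l):
        """The insert (shift) move: the index at position i is removed and
        reinserted just before the element originally at position l."""
        s = list(s)
        g = s.pop(i)
        s.insert(l if l < i else l - 1, g)
        return s

    def removal_merges_colour_runs(col, i):
        """First infeasibility condition: removing position i would glue two
        neighbouring subsequences of one colour."""
        return 0 < i < len(col) - 1 and \
               col[i - 1] == col[i + 1] and col[i - 1] != col[i]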
The actually implemented local search is a mixture of the greedy and steepest approaches. Firstly, the procedure chooses one removal position i in a random manner (the greedy approach). The index at this position is then removed. Secondly, the local search chooses an insertion position l for the removed index. The best insertion position is chosen among all possible ones (the steepest approach). If the insertion at the best position results in an overall improvement of the objective function, taking into account the changes due to both the removal and the insertion, the shift move is performed.

Since the size of the neighbourhood defined by the shift operator is relatively large (approximately 10^6 for N = 1000), a significant effort has been made to improve the efficiency of the algorithm. First of all, the neighbour solutions are not constructed explicitly; only the modifications of the objective function caused by particular moves are evaluated (as described in section 4.6.5). In this case, since each group index may be inserted at about 1000 positions, most of the CPU time is consumed by the search for the best insertion position, while the time of evaluation of removals is practically negligible. In addition, the CPU time is consumed mainly by the evaluation of changes of ratio constraint violations. Thus, the effort has been focused on efficient evaluation of group index insertions from the point of view of RCs.

Assume that the insertion of an index at position l is evaluated with respect to ratio constraint j, defined by ratio Nj/Dj. It may modify the number of active cars associated with this ratio constraint in the constraint windows starting at positions {l − Dj + 1, . . . , l}. Therefore, only these modified windows have to be taken into account. Furthermore, the evaluation of insertion at position l + 1 may be accelerated further by taking into account the results calculated for position l: insertion at position l modifies the windows starting at positions {l − Dj + 1, . . . , l}, while insertion at position l + 1 modifies the windows starting at {l − Dj + 2, . . . , l + 1}. Thus, it is enough to exclude from the calculations the change due to the window starting at position l − Dj + 1 and add the change due to the window starting at position l + 1.

The efficiency of local search is further improved by storing information about the modifications of ratio constraint violations at particular positions. For each insertion position l and ratio constraint j, the evaluation of the insertion of an index of a group both active and not active on this constraint is stored when calculated for the first time. When the insertion of the next indexes is considered, these stored values are reused. Of course, the stored values are cleared in a range of positions defined by Dj when a move is performed.

Insert was one of the operators considered earlier by Puchta & Gottlieb (2002) (see section 8.6.1). For the purpose of the challenge it was designed and implemented by Andrzej Jaszkiewicz.
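The sliding evaluation of insertions at consecutive positions can be sketched as follows. The callback window_delta(start), returning the violation change in the window starting at a given position, is an assumption; the sketch also assumes the caller clamps l_min and l_max so that all needed window starts are valid.

    def insertion_deltas(window_delta, l_min, l_max, d_j):
        """Deltas for insertions at positions l_min..l_max for one ratio
        constraint with window length D_j; the value for l + 1 reuses the
        value computed for l, as described above."""
        value = sum(window_delta(s) for s in range(l_min - d_j + 1, l_min + 1))
        deltas = {l_min: value}
        for l in range(l_min, l_max):
            value -= window_delta(l - d_j + 1)   # window no longer affected
            value += window_delta(l + 1)         # newly affected window
            deltas[l + 1] = value
        return deltas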
9.3.2 Swap of two group indexes

This operator exchanges (swaps) group indexes between two positions i and l in the sequence. Formally, it transforms solution s into its neighbour sn:

s  = (s1, s2, . . . , si−1, si, si+1, . . . , sl−1, sl, sl+1, . . . , sN)
sn = (s1, s2, . . . , si−1, sl, si+1, . . . , sl−1, si, sl+1, . . . , sN)

The size of the generated neighbourhood is O(N^2).

Similarly to the case of the insert move, swap may sometimes produce infeasible solutions. Therefore, before evaluation, each swap is tested for infeasibility. This may be performed in constant time given two helper vectors. One is FullColSeqLen(i), the same as for insert. The other is ColSeqLen(i), which stores the length of the subsequence of cars of the same colour as position i which finishes at i (any further cars of the same colour are not counted). Hence, ColSeqLen(i) ≤ FullColSeqLen(i) for all i.

In the implemented local search, for each position i (the first swapped) all other positions l are tested. If any of these pairs (i, l) leads to an improvement after the swap, then the best one is performed. This yields an algorithm which is steepest with respect to l, but greedy with respect to i.

As required for a fast local search (see section 4.6.5), only the modification of the objective function due to a swap is computed for each (i, l). Concerning PCC(s), the change may be computed in constant time with the help of the vectors ColSeqLen and FullColSeqLen. Violations of ratio constraints are more time-consuming, so special effort has been put into their efficient evaluation. Firstly, the modification with respect to option j is computed only if the swapped groups really differ on that option; otherwise, the sequence of cars does not change from the point of view of this constraint. Secondly, only the RC windows which overlap positions i or l are considered in swap evaluation. This speeds up the process drastically if Dj << N, which is a frequent case in Renault's instances. Finally, when |i − l| < Dj, even fewer windows have to be evaluated, since nothing changes in windows which overlap both positions i and l.

This operator was considered earlier by Warwick & Tsang (1995) and Puchta & Gottlieb (2002) (see sections 8.6.4 and 8.6.1). In the ROADEF Challenge 2005 it was also used by Ribeiro's team (see section 8.6.2) and Estellon's (section 8.6.3). For the purpose of the challenge the swap operator was designed and implemented by Andrzej Jaszkiewicz and the author of this text.
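Both helper vectors follow directly from their definitions and can be built in one pass over a solution, as in the following sketch (0-based indices; col maps positions to colours). This is an illustration of the definitions, not the challenge code.

    def colour_run_vectors(col):
        """ColSeqLen[i]: length of the run of col[i] ending at i.
        FullColSeqLen[i]: length of the whole run of col[i] containing i."""
        n = len(col)
        col_seq = [1] * n
        for i in range(1, n):
            if col[i] == col[i - 1]:
                col_seq[i] = col_seq[i - 1] + 1
        full = [0] * n
        i = 0
        while i < n:
            j = i
            while j + 1 < n and col[j + 1] == col[i]:
                j += 1
            for k in range(i, j + 1):   # the whole run gets its total length
                full[k] = j - i + 1
            i = j + 1
        return col_seq, full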
9.4 Initial solutions

9.4.1 Exact algorithm for paint colour changes

The general idea of this algorithm is:

• to start the current-day sequence with the last previous-day colour, if at all possible,
• to alternate maximum allowed subsequences (of length PBL) of cars with the same colour,
• if one colour is in large excess, to separate maximum subsequences of this colour with single cars of some other colours for as long as necessary,
• to signal infeasibility if no feasible solution exists.

In order to describe such an algorithm properly, some definitions are required. First, let us define the number of cars with a specific colour c as ncc:

ncc = |{i ∈ {1, . . . , N} : col(i) = c}|

Let us also define usedCol(sp), the vector of colours already used in the partial solution sp:

usedCol(sp) = (nc′1, nc′2, . . . , nc′C)

where each component c of the vector is the number of cars in sp with colour c:

usedColc(sp) = nc′c = |{i ∈ {1, 2, . . . , |sp|} : col(sp,i) = c}|

Similarly to the case of groups, the vector of colours still available given sp is defined as:

availCol(sp) = (nc1, nc2, . . . , ncC) − usedCol(sp)

and the colour with the maximum number of available cars as:

maxAvailCol(sp) = arg max_{c=1..C} {availColc(sp)}

PCC_LB(sp, g) is the lower bound on the number of further colour changes given some partial solution sp and a group of cars g that is to be appended at the end of sp. It may be computed as the sum of the number of changes immediately caused by g and the minimum number of further changes that will happen when g is appended:

PCC_LB(sp, g) = CurrentCC(sp, g) + FurtherCC(sp, g)

The component functions of the lower bound are given in algorithms 20 and 21.

Algorithm 20 CurrentCC(sp, g)
  curCC = 1 {the default result: there is a colour change}
  len = |sp|
  if col(g) = col(s′p,len) then {the last colour is continued}
    if (LastColSeqLen(s′p) + 1 ≤ PBL) or (len = 0) then
      curCC = 0 {paint batch limit not exceeded, or first car}
    else
      curCC = +∞ {paint batch limit exceeded}
  return curCC

The algorithm computing CurrentCC(sp, g) is rather simple. It checks if there is a colour change between the last colour in sp and the colour of the group g to be appended. It takes into account the border between the previous and the current day, as well. It also checks if appending g is feasible; it signals infeasibility through an infinite result. LastColSeqLen(s′p) is a helper function which returns the length of the continuous subsequence of cars with the same colour as the last car in s′p, where the subsequence ends at the last position of s′p. It does not count the previous-day cars, even if they are of the required colour. If s′p contains only the previous-day cars, the function returns 0.

The algorithm which computes FurtherCC(sp, g) is more involved.

Algorithm 21 FurtherCC(sp, g)
  furCC = 0
  cmax = maxAvailCol(sp)
  isMaxCol = 0 {is g of the maximum colour? no by default}
  if col(g) = cmax then
    isMaxCol = 1
  lastLen = LastColSeqLen(s′p) {assume g continues the last colour}
  len = |sp|
  if col(g) ≠ col(s′p,len) then {the colour changes with g}
    lastLen = 0
  minSeqCMax = ⌈(availCol_cmax(sp) + isMaxCol · lastLen)/PBL⌉
  feasibleBound = Σ_{c=1, c≠cmax}^{C} availColc(sp) + isMaxCol
  fullSeqBound = Σ_{c=1, c≠cmax, c≠col(g)}^{C} ⌈availColc(sp)/PBL⌉ + isMaxCol
                 + (1 − isMaxCol) · ⌈(availCol_col(g)(sp) + lastLen)/PBL⌉
  if minSeqCMax > feasibleBound then
    furCC = +∞
  else if minSeqCMax > fullSeqBound then
    furCC = (minSeqCMax − 1) · 2 + (1 − isMaxCol)
  else
    furCC = Σ_{c=1, c≠col(g)}^{C} ⌈availColc(sp)/PBL⌉
            + ⌈(availCol_col(g)(sp) + lastLen)/PBL⌉ − 1
  return furCC

The first task of this function is to check whether a feasible solution exists after appending g; this is true if the condition (minSeqCMax > feasibleBound) is false. The second task of the function is to establish whether maximum sequences of colours may be alternated; this is possible if the condition (minSeqCMax > fullSeqBound) is false. When these conditions are checked, the appropriate scenario for further colour changes is established and the number of changes is computed. Infeasibility is signalled through +∞ in the result.

The algorithm for minimising PCC(s) starts with an empty partial solution sp and iteratively appends any group of cars g which minimises the current lower bound PCC_LB(sp, g). If there are many possibilities, the actual group is chosen randomly. This algorithm is guaranteed to globally minimise PCC(s) of the generated solution, if only such a solution exists.
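For illustration, algorithm 20 translates into the following small function. It is a simplified rendering: last_colour stands for the colour at the end of the sequence built so far (possibly the last previous-day car), and last_run_len plays the role of LastColSeqLen(s′p); both are assumed to be maintained by the caller.

    import math

    def current_cc(last_colour, last_run_len, g_colour, pbl, first_car):
        """Immediate colour-change cost of appending a car of group g
        (a runnable sketch of algorithm 20)."""
        if g_colour == last_colour:
            if last_run_len + 1 <= pbl or first_car:
                return 0        # colour continued within the paint batch limit
            return math.inf     # the paint batch limit would be exceeded
        return 1                # a colour change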
9.4.2 Kominek's heuristic

Kominek based his constructive heuristic on the notion of the utility rate of an RC, similarly to Gottlieb et al. (2003). He employed it in a slightly different way, though. First, let us define for each option j the weight of violations of the option, wj, which equals either wHPRCs or wLPRCs, depending on the related priority:

wj = wHPRCs^prio(j) · wLPRCs^(1−prio(j))

The constructive algorithm starts with the empty partial sequence sp. In each step it appends to sp one car of a group g with some cars still available. The chosen g maximises the heuristic evaluation:

evalK(sp, g) = Σ_{j=1}^{O} wj · evalKj(sp, g) + wPCC · evalK_PCC(sp, g)

where:

evalKj(sp, g) = utilRate(j, sp)              if (VNj(sp · g) − VNj(sp)) = 0
evalKj(sp, g) = −(VNj(sp · g) − VNj(sp))     otherwise

This means that a group gains wj · utilRate(j, sp) on each constraint j on which it causes no violations. On the other hand, it costs the weighted number of violations if such violations occur.

The colour term, for len = |sp|, is defined as:

evalK_PCC(sp, g) = −∞   if col(g) = col(s′p,len) and LastColSeqLen(s′p) + 1 > PBL
evalK_PCC(sp, g) = 1    otherwise, if col(g) = col(s′p,len)
evalK_PCC(sp, g) = −1   otherwise

It sets a unit of gain for the group g if it causes no colour change and feasibility is maintained. It adds a unit of cost if the colour changes. Finally, it adds an infinite cost for groups leading to immediate infeasibility. The procedure returns an empty partial solution if infeasibility could not be avoided.

9.4.3 Extended Gottlieb and Puchta's DSU heuristic

The original algorithm of Gottlieb et al. (2003) was designed for the CarSP without colours and priorities of ratio constraints. Therefore, it has to be extended to handle instances of Renault's CarSP properly. First, the function ∆VN(sp, g) defined in section 8.5.1 is extended with RC priorities and weights into ∆VNw(sp, g):

∆VNw(sp, g) = Σ_{j=1}^{O} (VNj(sp · g) − VNj(sp)) · wj

The extended heuristic algorithm also starts with the empty partial solution sp = (). In each step it chooses for appending to sp a group g which minimises the function appCost(sp, g):

appCost(sp, g) = ∞                                        if availg(sp) = 0
appCost(sp, g) = ∞                                        otherwise, if PCC_LB(sp, g) = ∞
appCost(sp, g) = ∆VNw(sp, g)                              otherwise, if wHPRCs > wPCC
appCost(sp, g) = ∆VNw(sp, g) + wPCC · CurrentCC(sp, g)    otherwise

If there are many such groups, the one which maximises the tie-breaker evalDSU is chosen. This is actually the DSU function extended with RC priorities, weights and the lower bound on paint colour changes:

evalDSU(sp, g) = Σ_{j=1}^{O} wj · optj(g) · utilRate(j, sp) − wPCC · PCC_LB(sp, g)

Given the 3 possible values of the vector of weights w, there are actually 3 different heuristics generated by this algorithm. These will be denoted DSU0, DSU3 and DSU6, depending on the value of wPCC.
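The appending cost rewrites directly into code; in the sketch below the argument functions mirror the symbols of the text and are assumed callbacks, so this is an illustration of the case analysis rather than the thesis implementation.

    import math

    def app_cost(sp, g, avail, pcc_lb, delta_vn_w, current_cc, w_hprcs, w_pcc):
        """The extended DSU appending cost appCost(sp, g)."""
        if avail(sp, g) == 0:
            return math.inf                  # no car of this group left
        if pcc_lb(sp, g) == math.inf:
            return math.inf                  # appending g kills feasibility
        if w_hprcs > w_pcc:
            return delta_vn_w(sp, g)
        return delta_vn_w(sp, g) + w_pcc * current_cc(sp, g)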
9.4.4 Random solution

The procedure which constructs a random solution is given in algorithm 22. It starts with the empty partial solution sp. In each step it chooses randomly a group to be appended to sp. The probability P(g) of choosing a group is proportional to the number of still available cars of this group. This probability is changed only if infeasibility of the constructed solution has to be avoided. In such a case a group of a different colour is always chosen, with uniform probability over such groups, provided there is at least one of them. Although this design may somewhat bias the generated solutions, this should happen rarely: the author expects a rather small probability of generating a sequence of PBL cars of the same colour, except for some biased cases of input data.

Algorithm 22 Random initial solution
  sp = ()
  for i = 1 to N do
    lastCol = col(s′p,i−1) {the last colour so far}
    if LastColSeqLen(s′p) + 1 > PBL then {infeasibility possible; try to avoid it}
      GotherCol = |{g ∈ {1, . . . , G} : col(g) ≠ lastCol}| {groups of other colours}
      if GotherCol = 0 then {no feasible sequence possible}
        sp = ()
        break
      for g = 1 to G do
        P(g) = 0
        if col(g) ≠ lastCol then
          P(g) = 1/GotherCol
    else {any group is feasible}
      for g = 1 to G do
        P(g) = 0
        if availg(sp) > 0 then
          P(g) = availg(sp)/(N − i + 1)
    choose g randomly with probability P(g)
    sp = sp · g
  return sp

9.5 Fitness-distance analysis

9.5.1 Similarity measures for solutions of the CarSP

In the case of the CarSP similarity measures are used instead of distances, for historical reasons. Similarity has the opposite meaning to distance: it has higher values for objects which are closer. It does not have to possess the properties of a metric.

At the initial stage of the challenge, the intuition of the author's team indicated that the existence and preservation of certain subsequences of cars in solutions may be crucial for good values of the objective. It can be clearly seen from the definition of this function, at least from the point of view of RCs, that cars with active options should usually be separated by cars with inactive ones in order not to cause violations. Moreover, the colour subcriterion (PCC(s)) should also, to some extent, force certain groups of cars to come in subsequences in good solutions, e.g. long subsequences of the same colour. Therefore, the hypothesis was that in good solutions certain sequences of groups, those that ensure good separation of active cars and good subsequences of colours, may be more frequent than others. This idea led to the definition of similarity in terms of common subsequences.

Another concept was to check whether consecutive couples of cars are similar in good solutions. It was considered a simplified version of the idea of common subsequences, since it would not check triples or longer parts of solutions. This idea is reflected in the definition of similarity as common succession relations.

The last idea was to compare solutions with respect to the positions of cars, and this motivated the definition of similarity as common positions. The author's team wanted to check if good solutions should preserve certain positions in sequences. The initial intuition was that this should not be the case; the author expected positions of cars to be irrelevant to the objective function. Consequently, it was interesting to see if FDA could confirm this kind of 'no relevance' hypothesis.

Groups and weaker groups of cars

The author's challenge team relaxed the concept of groups and defined so-called weaker groups. These are sets of cars which have only the most important properties identical, i.e. those with the associated weight 10^6. Other properties of vehicles within one weaker group may be completely different: e.g. colour or low-priority options if wHPRCs = 10^6. The idea behind weaker grouping came from the observation that for instances with dif=1 (with HPRCs most important and difficult) the impact of colour and LPRCs on the values of the objective function may be minimal, due to the fact that VNHPRCs(s) is very unlikely to reach zero. Thus, these properties should not be considered at all during the computation of similarity.
Since indexes of weaker groups are easily computable, they were also used with the other types of instances, in order to see the outcome.

Similarity as common positions: simcp

This similarity measure is based on the idea of Hamming distance: it compares solutions position by position. The value of simcp is the number of positions with the same indexes of groups in the compared solutions (common positions, cp). In figure 9.1 an example of such a comparison is given for two simple solutions s1 and s2.

position: 1 2 3 4 5 6 7 8 9 10
s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
simcp(s1, s2) = 4

Figure 9.1: Comparison of two sequences by the simcp measure; the common positions are positions 1–4.

Similarity as common subsequences: simcs and simcswg

The second measure evaluates similarity in the sense of common subsequences (cs). This concept, although developed independently (see Jaszkiewicz et al. (2004)), is similar to the one described by Mattiussi et al. (2004), where sets of common subsequences are computed to determine the similarity of two genomes. The concept has a different mathematical formulation here, with weights of subsequences proportional to their length. Moreover, it was introduced by Mattiussi et al. to measure diversity, not to perform any fitness-distance analysis.

The algorithm for computing simcs works as follows. First, all common subsequences of the two compared solutions are computed using a generalised suffix tree (Gusfield 1997). A subsequence of at least two elements is common to two solutions if its elements may be found somewhere in these solutions in the same order and without additional spacing. Second, in each solution separately, a subset of maximal subsequences is found. These are subsequences which have maximum length and are not completely included in any other subsequence, though they may partially overlap each other. An illustration of a result of this process is shown in figure 9.2: two solutions containing only two types of cars (indexes 1 and 2), with the maximal common subsequences marked below them. As mentioned before, the sets of such subsequences may be different in the compared solutions; the subsequences also partially overlap in some cases.

When the common subsequences are found, their lengths are summed up with additional weights. To this end, the length of each subsequence is increased by the lengths of all its proper, shorter subsequences which finish at the same position. The subsequences which are contained in the next maximal common subsequence are not added, in order not to inflate the value of similarity. The purpose of the weights is to give preference to single, very long subsequences over many shorter but overlapping ones. In figure 9.2 these shorter subsequences included in the computation are indicated with dashed lines. The sum of the lengths of all subsequences found so far in both solutions (the solid and dashed intervals in figure 9.2) defines the value of sumcs(s1, s2).

s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
sumcs(s1, s2) = 70, simcs(s1, s2) = 8

Figure 9.2: Comparison of two sequences by the simcs measure. Solid intervals indicate maximal common subsequences; dashed ones show their proper subsequences included in the computation of similarity (see text).
Finally, the value of similarity is computed with the following formula:

simcs(s1, s2) = ( sqrt(4 · sumcs(s1, s2) + 9) − 1 ) / 2

This expression ensures a sort of normalisation of the values of this measure: the minimum is 1 (no common subsequences) and the maximum is equal to the length of a solution (identical solutions compared). The value of simcs is also equivalent to the length of the single maximal subsequence which would be the only common one in the two compared solutions.

The third measure, simcswg, is computed in exactly the same way as simcs, though the compared sequences contain indexes of weaker groups only.

Similarity as common succession relations: simcsuc and simcsucwg

The fourth measure of similarity counts the number of common relations of succession (immediate neighbourhood, adjacency) between indexes of groups in the compared solutions. A pair of indexes of groups is in this relation if the second index immediately succeeds the first one. An example of the computation of simcsuc may be seen in figure 9.3. In the two presented solutions there are exactly 8 common succession relations. Note that the last pair in solution s1, (1, 2), is not a common succession relation, because it does not have its counterpart in s2 (there are 5 such pairs in s1, but only 4 in s2).

s1 = (1 2 1 2 1 2 1 2 1 2)
s2 = (1 2 1 2 2 1 2 1 2 1)
simcsuc(s1, s2) = 8

Figure 9.3: Comparison of two sequences by the simcsuc measure. Arcs indicate common succession relations (see text).

The last measure of similarity, simcsucwg, is identical in definition to the previous one, simcsuc, but it is computed on the basis of indexes of weaker groups (just like simcswg).
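For reference, the two simpler measures and the normalisation of the subsequence measure can be sketched as follows; the suffix-tree computation of sumcs itself is omitted. The values produced by this sketch agree with figures 9.1 to 9.3.

    from collections import Counter

    def sim_cp(s1, s2):
        """Common positions (Hamming-type similarity); 4 for figure 9.1."""
        return sum(a == b for a, b in zip(s1, s2))

    def sim_csuc(s1, s2):
        """Common succession relations, counted with multiplicities;
        8 for figure 9.3."""
        pairs1 = Counter(zip(s1, s1[1:]))
        pairs2 = Counter(zip(s2, s2[1:]))
        return sum(min(c, pairs2[p]) for p, c in pairs1.items())

    def sim_cs_from_sum(sumcs):
        """Normalisation of the subsequence measure; sumcs = 70 yields 8,
        as in figure 9.2."""
        return ((4 * sumcs + 9) ** 0.5 - 1) / 2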
9.5.2 Random solutions vs. local optima

The first stage of the fitness-distance analysis tests possible differences between sets of local optima and random solutions of a given instance, in terms of distance within these sets (see section 5.2.1). To check these differences in the case of the CarSP, large random samples of 1000 different solutions of each type were generated for sets A, B and X of instances. Random solutions were produced using algorithm 22, described earlier in section 9.4.4. Local optima were generated by starting from random solutions and proceeding with local search employing insert (swap was not yet implemented at the time of this experiment). In these sets, similarities of each type were computed for 500 different pairs of solutions. All values of similarity were properly normalised (divided by the instance size). Finally, statistics of the obtained values were computed: the average similarity for each instance, and the aggregate average and standard deviation over all instances. For instance set X, these values are shown in table 9.1. Note that for local optima the table actually shows the difference between the average similarity for these solutions and for random ones, but the values of standard deviation are computed for the original averages, not for the differences.

Table 9.1: Average values of normalised similarity in sets of random solutions and local optima. Set X of instances.

                          random solutions (rand)                    difference for local optima (lopt)
instance        sim_cp  sim_cs  sim_cswg  sim_csuc  sim_csucwg    sim_cp  sim_cs  sim_cswg  sim_csuc  sim_csucwg
028X00-0065     0.344   0.277   0.560     0.714     0.968         0.008   0.007   0.126     0.017     0.024
035X60-0090     0.195   0.207   0.276     0.635     0.818         0.033   0.218   0.227     0.237     0.093
655X30-0219     0.199   0.141   0.200     0.673     0.896         0.000   0.027   0.015     0.100     0.018
034X30-0231     0.069   0.100   0.170     0.444     0.851         0.028   0.014   0.042     0.072     0.004
655X30-0264     0.192   0.137   0.500     0.730     0.995         0.008   0.063   0.011     0.132     0.003
064X30-0273     0.051   0.078   0.157     0.338     0.843         0.005   0.044   0.011     0.275     0.040
028X00-0325     0.024   0.039   0.126     0.111     0.741         0.002   0.007   0.018     0.043     0.071
035X60-0376     0.084   0.097   0.129     0.613     0.850         0.007   0.070   0.148     0.274     0.096
048X31-0459     0.046   0.052   0.112     0.247     0.766         0.001   0.010   0.014     0.082     0.042
048X30-0519     0.017   0.029   0.106     0.097     0.803         0.000   0.018   0.031     0.132     0.105
022X60-0704     0.043   0.053   0.093     0.374     0.838         0.004   0.033   0.145     0.321     0.108
029X30-0780     0.056   0.059   0.137     0.488     0.961         0.003   0.033   0.016     0.257     0.010
064X30-0875     0.023   0.036   0.089     0.233     0.859         0.001   0.017   0.028     0.209     0.062
034X30-0921     0.035   0.048   0.118     0.401     0.960         0.001   0.018   0.029     0.224     0.009
025X00-0996     0.042   0.040   0.074     0.312     0.791         0.000   0.005   0.018     0.048     0.033
039X30-1037     0.053   0.047   0.140     0.421     0.980         0.002   0.019   0.035     0.220     0.020
039X30-1247     0.016   0.024   1.000     0.147     1.000         0.001   0.022   0.000     0.314     0.000
023X30-1260     0.050   0.044   0.102     0.447     0.937         0.001   0.015   0.010     0.200     0.017
024X30-1319     0.031   0.034   0.072     0.288     0.870         0.001   0.004   0.011     0.066     0.030
avg.            0.083   0.081   0.219     0.406     0.880         0.006   0.034   0.049     0.170     0.041
std. dev.       0.084   0.065   0.226     0.193     0.078         0.088   0.097   0.234     0.215     0.061

Comment on simcp

The values of this measure are rather small, if one recalls that the normalised simcp has the meaning of the percentage of common positions. For rand it is 8.3% on average; for lopt it is only 0.6% larger, with a similar deviation across instances. Only in the case of two instances, 034X30-0231 and 035X60-0090, is the difference between rand and lopt larger than 1%; it still remains small in these cases, though: 2.8% and 3.3%.

There are several instances for which the normalised simcp > 0.1 in rand, which would suggest extraordinarily high similarity in this set of instances. But when one looks into the details of the instance data, it transpires that there are one or two groups of cars which dominate the others. Hence, in these cases it was easy for a pair of random solutions to have a large number of common positions. Overall, it may be concluded that, with respect to common positions, local optima are practically no more similar to each other than random solutions are.

Comment on simcs

At first sight, the values of normalised simcs also appear to be small. On average, simcs = 8.1% for rand, while simcs = 11.5% for lopt. The difference is 3.4%, so it is larger than in the case of simcp, but it still seems tiny. This is not the case, though, since simcs is heavily influenced by the lengths of common subsequences. The value of simcs does not exactly reflect the coverage of solutions by such subsequences, where coverage is the percentage of positions covered by some common subsequence. To show the differences in this coverage due to the 3.4% change in simcs, a separate series of values was derived from the same rand and lopt solutions. The average values of coverage for rand and the differences for lopt are shown in table 9.2.

These values show that the average coverage by common subsequences rises from 65.5% in rand to 80.3% in lopt. It means that almost 15% more positions are covered by some common subsequence in local optima than in random solutions. The largest changes in coverage happen for instances 039X30-1247 (by 44.9%) and 064X30-0875 (by 29.7%). These are rather large increases, and for large instances (1247 and 875 cars). Very small changes happen e.g. for 025X00-0996 (only by 5.6%). Also for a number of smaller instances the increase is tiny, but these are the ones with already large coverage for random solutions, more than 90% (e.g. 028X00-0065 or the 655* instances). These small instances are exactly the ones with several dominating groups of cars, so it was easy to find short common subsequences in pairs of random solutions.
Table 9.2: Average values of coverage in sets of random solutions and local optima for the similarity measures based on subsequences. Set X of instances.

                        rand                 difference for lopt
instance        sim_cs    sim_cswg        sim_cs    sim_cswg
028X00-0065     0.932     0.999           0.007     0.001
035X60-0090     0.914     0.985           0.075     0.010
655X30-0219     0.927     0.999           0.037     0.001
034X30-0231     0.772     0.997           0.080     0.001
655X30-0264     0.954     1.000           0.029     0.000
064X30-0273     0.627     0.989           0.269     0.005
028X00-0325     0.226     0.957           0.082     0.020
035X60-0376     0.939     0.999           0.058     0.000
048X31-0459     0.465     0.957           0.098     0.019
048X30-0519     0.200     0.985           0.233     0.014
022X60-0704     0.670     0.994           0.271     0.003
029X30-0780     0.797     1.000           0.150     0.000
064X30-0875     0.461     0.990           0.297     0.007
034X30-0921     0.723     1.000           0.214     0.000
025X00-0996     0.560     0.978           0.056     0.008
039X30-1037     0.708     1.000           0.175     0.000
039X30-1247     0.299     1.000           0.449     0.000
023X30-1260     0.731     0.997           0.142     0.002
024X30-1319     0.535     0.997           0.090     0.002
avg.            0.655     0.991           0.148     0.005
std. dev.       0.235     0.013           0.197     0.007

But even though the average coverage change by 15% is a significant value, it seems to be moderate. After all, as much as 65.5% of positions are already covered in rand, and the average increase of simcs amounts only to 3.4%. However, the values for lopt solutions reflect a significant change compared to rand. The inspection of the actual common subsequences in several pairs of solutions for different instances revealed that there indeed is quite a difference between rand and lopt solutions. In order to illustrate this significance, figures 9.4 and 9.5 show the common subsequences of two random solutions and of two local optima. The figures were generated for instance 064X30-0273, which is small enough to be illustrative. At the same time the computed values of simcs and coverage are near the observed averages in set X.
Figures 9.4 and 9.5 each show a pair of sequences of 273 characters. One character always corresponds to one position in a sequence. A '-' sign means a position that is not covered by any common subsequence; '(' opens a common subsequence, '*' fills it and ')' ends it. The algorithms computing simcs also involve proper subsequences of a common sequence and overlaps between subsequences; since these are hard to show in one sequence of characters, such subsequences are merged. Note that common sequences which are exactly adjacent are not merged.

--()(*)-(****)()------(**)----(*)-(***)()()-----()(***)()-()()----(*)(
**)()---()--()-()()()()-()(**)---()--()--(*)()(*)(***)---(***)--()(**)
--()()-(*)---(***)--(*)------(**)---()---------()--()-()()(*****)-()-(
)-()--()-()-(**)--(*)-(*)-(**)-(*)(******)(**)(*)()---()---()-(***)---()-(***)(***)()-()-----()--(*****)----()------(*)-(****)--(*)(*)-()------()(*)---()-(**)-----(*)-()---()-()-()(*)-(**)()-(**)-()-(*
**)----()---(*)--(*)-(*)-(*)---()()-(*****)--(****)(*)-()-(**)()--(*)--()---------()----()(**)(**)()--()()(*)(*)--()--(*)-(*)-()(**)

Figure 9.4: Common subsequences in two random solutions of instance 064X30-0273. Overlapping sequences are always merged into one. Normalised simcs = 0.079. Average covered length: 65.2%.

(***)()-()--(****)(*************)-(***)(*********)(**)-()(**********)-(******)-()(********)(*****)-()(****)(********)(***)()-(**)-(****)-(**
**)-(*****)-(*****)-(********)(***)(***)-(**)---(****)(*****)(****)(**
******)(****)()----(*********)()(*********)()(*)---(***)---(**)
(*****)(***)--(****)()(*)(*******)--(************)-()(****)()(**)()(*)
(****)--()-()-(*)(**************)----()(*****)(*)(*)(***)-(*)-()(*****
****)-(*)()-(**********)(*)(********)----(****)-()--()---()-(*********
******)-(************)-()(***)(*************)--()()(**)()-()(*)

Figure 9.5: Common subsequences in two local optima of instance 064X30-0273. Overlapping sequences are always merged into one. Normalised simcs = 0.126. Average covered length: 88.6%.

One can see in these figures the difference between the similarity of random solutions and that of local optima. The pair from rand is generally covered by many short common subsequences, which happen to cover 65.2% of all positions; there are 62 such sequences in the upper solution in figure 9.4. The lopt pair is covered in a different way: there are fewer common subsequences (41 in the upper solution in figure 9.5), but they are substantially longer, covering 88.6% of the two solutions on average. This example clearly shows that although a change in simcs by 4.7% seems tiny, it actually reflects a qualitative change in the underlying common subsequences.
Comment on simcswg

The average change between rand and lopt seems to be larger here than in the case of simcs: as much as 4.9%. At the same time, the deviation of similarity in lopt is much larger for set X, so the average is not really representative of the whole set. This is clearly visible for certain instances. For 3 of them, those with wPCC = 10^6, the average change in simcswg is 17.4%. In these problem examples a weaker group is simply the colour of a car. Thus it is hardly surprising that local optima, with good sequences of colours, are much more similar to each other than random sequences of cars.

On the other hand, for instance 039X30-1247, the average simcswg = 100% already for random solutions, so the change for local optima could not be larger than 0; equality is exactly the case. This is due to the fact that there is only one HPRC in this instance and actually no car with the related option active. All solutions of this instance are equivalent with respect to simcswg.

When one excludes these extraordinary instances from the analysis, the average change in similarity between rand and lopt amounts to 2.7%. This is less than in the case of simcs, although still quite a lot given the same interpretation of the values of this measure. More importantly, however, the case of the instance with dif = 1 is less optimistic. This is 048X31-0459, which was the actual basis for introducing weaker groups in set X (see section 9.5.1). Here, the change of the average simcswg is even lower, 1.4%, although a bit larger than for simcs (1.0%). The coverage rises even less than for ordinary groups: by 1.9% compared to 9.8%.

To summarise, it appears that simcswg in lopt rises considerably for instances with PCC(s) being the most important component of the objective function. For other types of instances the change is smaller than with respect to simcs, although it still seems significant. Nevertheless, for the single instance for which weaker groups were designed the change is rather small. Local optima are only a bit more similar than random solutions from the point of view of subsequences of weaker groups.

Comment on simcsuc

The average similarity for random solutions amounts to 40.6%, while for local optima it rises by as much as 17%. Standard deviations are considerable, indicating large differences in simcsuc between instances. Large changes in similarity, by more than 20%, happen for 10 instances out of 19. A change lower than 5% happens only for 3 of them, all of type 00. This may indicate that in good solutions of type 00 instances there is much noise in succession relations. Overall, it seems that similarity of lopt is significantly higher than that of rand: on average 17% more positions are covered by common succession relations. Local optima are closer to each other than random solutions are, except for type 00 instances.

Comment on simcsucwg

Random solutions are already very similar to each other: on average 88% of succession relations are common on weaker groups, with a reasonably small deviation of 7.8%. The smallest average simcsucwg is equal to 74.1%. The average change in this measure amounts only to 4.1%, which is a rather small value compared to 17% for simcsuc. It seems that local optima cannot be much more similar than random solutions from the point of view of this measure; the latter already have very many common succession relations on weaker groups. This is also the case for 048X31-0459, the only instance with dif = 1: the change of the average simcsucwg = 4.2%, which is a small value.

9.5.3 Fitness-distance relationships

This section investigates whether there are trends in sets of local optima which would confirm the 'big valley' hypothesis: that better solutions tend to be more similar to each other than worse ones.
This investigation was performed with the method of the analysis of a set of pairs of local optima (see section 5.4.6). The computation was performed on the same set of 500 pairs of local optima as used in the previous section. Raw values of r^2 obtained for set X of instances are given in table 9.3. Table 9.4 presents aggregated values for all sets and types of instances, in order to give the reader a general view of results for Renault's CarSP.

Similarly to the CVRP case, the values of r^2 emphasised in boldface in table 9.3 are those not smaller than 0.18; these are deemed significant. All cases with r^2 in [0.15, 0.18) are typeset in italic. Moreover, each instance from set X was manually classified as 'big valley?'='yes', 'no' or 'amb.' (ambiguous) based on the observed r^2.

The conclusions related to values of FD determination coefficients are also confirmed through inspection of fitness-distance plots. Here, 2-dimensional plots are obtained by cutting a slice through clouds of 3-dimensional points, along the plane f(s1) = f(s2). Points included in the plot satisfy the constraint |f(s1) - f(s2)| < ε · f(s1). Usually ε was set to 5%, but sometimes it was necessary to put ε = 10% in order to have any points in a scatter plot.

Table 9.3: Values of the linear determination coefficient r^2 between fitness and each similarity measure for instances from set X.

instance       r^2cp    r^2cs    r^2cswg   r^2csuc   r^2csucwg   big valley?
028X00-0065    0.001    0.000     0.002     0.006      0.002      no
035X60-0090    0.008    0.380     0.415     0.454      0.376      yes
655X30-0219    0.008    0.460     0.014     0.171      0.002      yes
034X30-0231    0.002    0.009     0.020     0.000      0.003      no
655X30-0264    0.006    0.361     0.008     0.392      0.006      yes
064X30-0273    0.004    0.577     0.045     0.424      0.004      yes
028X00-0325    0.000    0.003     0.003     0.006      0.001      no
035X60-0376    0.002    0.586     0.541     0.249      0.322      yes
048X31-0459    0.002    0.003     0.003     0.001      0.006      amb.
048X30-0519    0.001    0.289     0.002     0.219      0.007      yes
022X60-0704    0.007    0.474     0.584     0.196      0.522      yes
029X30-0780    0.007    0.401     0.140     0.157      0.002      yes
064X30-0875    0.000    0.024     0.002     0.026      0.011      no
034X30-0921    0.005    0.324     0.000     0.122      0.002      yes
025X00-0996    0.003    0.007     0.001     0.006      0.009      no
039X30-1037    0.004    0.370     0.018     0.244      0.012      yes
039X30-1247    0.006    0.122     0.001     0.055      0.001      no
023X30-1260    0.005    0.458     0.011     0.243      0.003      yes
024X30-1319    0.003    0.005       -       0.003        -        no

avg.           0.004    0.255     0.101     0.157      0.072
std. dev.      0.003    0.213     0.190     0.148      0.154

Comment on simcp

In case of simcp no correlation between similarity and fitness has been revealed. All the related values of r^2 in table 9.3, and also all the averages in table 9.4, are virtually zero. This lack of relationship is also clearly visible in most FD plots, e.g. for 039X30-1037 (figure 9.6) and 025X00-0996 (figure 9.7). The view on the potential relationship is somewhat obscured in case of instance 048X31-0459 by the existence of separate vertical clouds of points (figure 9.8, top-left plot). When one zooms in on the group of best solutions (bottom-left plot in the figure), the lack of relationship is more clear. To summarise, this Hamming-type similarity measure reveals no 'big valley' in any of the studied instances.
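For reference, the linear determination coefficient used in tables 9.3 and 9.4 can be computed directly from the sample of (fitness, similarity) observations. The following C++ sketch uses the textbook formula r^2 = cov(f, sim)^2 / (var(f) · var(sim)); it assumes non-degenerate variances and does not reproduce the exact pairing scheme of section 5.4.6.

#include <cstdio>
#include <vector>

// One observation: the fitness of a local optimum and its similarity to
// the second solution of the pair.
struct Observation { double fitness; double similarity; };

// Linear determination coefficient: the squared Pearson correlation.
double determination(const std::vector<Observation>& obs) {
    double sf = 0, ss = 0, sff = 0, sss = 0, sfs = 0;
    const double n = double(obs.size());
    for (const Observation& o : obs) {
        sf  += o.fitness;
        ss  += o.similarity;
        sff += o.fitness * o.fitness;
        sss += o.similarity * o.similarity;
        sfs += o.fitness * o.similarity;
    }
    const double cov  = sfs - sf * ss / n;   // n-fold covariance
    const double varF = sff - sf * sf / n;   // n-fold variances
    const double varS = sss - ss * ss / n;
    return (cov * cov) / (varF * varS);      // the n factors cancel out
}

int main() {
    std::vector<Observation> obs = { {1.0, 0.9}, {2.0, 0.7}, {3.0, 0.4} };
    std::printf("r^2 = %.3f\n", determination(obs));
}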
Comment on simcs

Conclusions are quite different in case of simcs. Some significant values of the determination coefficient are observed here. It seems that the presence of such values depends heavily on instance type, or even on the particular instance, though.

Firstly, one can see in table 9.4 that types (WD) 30 and 60 have average r^2cs values for all instance sets rather high: 0.26 and 0.24, respectively. High FD determinations are also found in table 9.3: all 3 instances of type 60 reveal high values (0.38-0.58), and 8 of 12 type 30 instances have them high as well; 4 instances of the type reveal no significant FDC and no trends in FD plots. Concerning the plots, all type 60 instances have visible trends, similar to the one in figure 9.9: better solutions are more similar to each other than worse ones. Most type 30 instances have the trends visible as well, although the noise in similarity values is also considerable; see e.g. figures 9.6 and 9.10.

Secondly, instances of type 00 reveal no correlation between fitness and simcs. All 3 instances of the type in set X have r^2cs near zero. Moreover, the FD plots show no visible trends, see e.g. figure 9.7. This appears to be the case also in set A of instances.

Figure 9.6: Fitness-distance plots for instance 039X30-1037 and all similarity measures.

Figure 9.7: Fitness-distance plots for instance 025X00-0996, simcp and simcs.

Table 9.4: Average values of the linear determination coefficient r^2 between fitness and each similarity measure, grouped by instance set and type.

set    WD    #inst.   avg(r^2cp)   avg(r^2cs)   avg(r^2cswg)   avg(r^2csuc)   avg(r^2csucwg)
A      00       1       0.001        0.001         0.001          0.001           0.023
A      01       3       0.007        0.010         0.346          0.005           0.011
A      30       4       0.006        0.349         0.100          0.223           0.014
A      31       4       0.003        0.012         0.241          0.008           0.005
A      60       4       0.003        0.291         0.443          0.164           0.241
B      00       9       0.009        0.088         0.007          0.033           0.004
B      01       6       0.008        0.033         0.065          0.013           0.003
B      30       9       0.006        0.199         0.005          0.090           0.006
B      31       6       0.006        0.082         0.047          0.048           0.005
B      60      15       0.005        0.180         0.395          0.083           0.051
X      00       3       0.001        0.003         0.002          0.006           0.004
X      01       0         -            -             -              -               -
X      30      12       0.004        0.283         0.024          0.171           0.005
X      31       1       0.002        0.003         0.003          0.001           0.006
X      60       3       0.006        0.480         0.513          0.300           0.407
A      avg.    16       0.003        0.163         0.196          0.099           0.065
B      avg.    45       0.007        0.133         0.149          0.060           0.020
X      avg.    19       0.004        0.255         0.095          0.157           0.068
avg.   00      13       0.007        0.061         0.006          0.024           0.005
avg.   01       9       0.007        0.025         0.158          0.010           0.006
avg.   30      25       0.005        0.263         0.029          0.150           0.007
avg.   31      11       0.005        0.049         0.113          0.029           0.005
avg.   60      22       0.005        0.241         0.419          0.127           0.134
avg.   avg.    80       0.005        0.168         0.159          0.091           0.041

Figure 9.8: Fitness-distance plots for instance 048X31-0459, simcp and simcs: whole fitness axis (top); zoom on the best solutions (bottom).
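The FD plots referenced in this comment are the 2-dimensional slices described at the beginning of this section: only pairs with close fitness values are kept. A minimal C++ sketch of this filtering, assuming pairs are stored as (f1, f2, sim) triples with positive fitness:

#include <cmath>
#include <cstdio>
#include <vector>

struct PairObs { double f1, f2, sim; };  // fitness of both solutions, similarity

// Keep the pairs lying near the plane f(s1) = f(s2):
// |f(s1) - f(s2)| < eps * f(s1), with eps typically 0.05 or 0.10.
std::vector<PairObs> slicePlot(const std::vector<PairObs>& pairs, double eps) {
    std::vector<PairObs> kept;
    for (const PairObs& p : pairs)
        if (std::fabs(p.f1 - p.f2) < eps * p.f1)
            kept.push_back(p);
    return kept;
}

int main() {
    std::vector<PairObs> pairs = { {100, 104, 0.3}, {100, 140, 0.2} };
    std::printf("kept %zu of %zu pairs\n",
                slicePlot(pairs, 0.05).size(), pairs.size());
}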
For set B there may exist some outliers, since the average r^2 = 0.088; this was not investigated further, though.

Thirdly, the only type 31 instance in set X has low r^2cs, equal to 0.003, which indicates the 'no relationship' case. But the FD plot is more informative here. The plot with all generated local optima (figure 9.8, top right) is divided into vertical clouds of points; obviously, these cannot result in high values of the linear determination. But when one zooms in on the best solutions, like in case of simcp earlier (figure 9.8, bottom right), a weak trend between fitness and simcs seems to be visible. With r^2cs = 0.176, this trend allows the classification of the instance to be changed from 'no' to 'ambiguous' in table 9.3. This example shows that conclusions based only on values of FDC may indeed sometimes be wrong, and that a look at an FD plot may reveal more information.

Similarly to this one instance, the average value of r^2cs for type 31 is very low in set A (0.012) and rather low in set B (0.082), perhaps indicating some outliers in the latter. A look at FD plots reveals no trends in any instance of set A (see figure 9.11 for an example). However, in set B there were two cases very similar to 048X31-0459, namely 023B31-1110 and 029B31-0730. In these instances significant trends were found for groups of the best solutions, with r^2cs = 0.33 and r^2cs = 0.21, respectively. Other set B instances did not reveal any relationship between f and simcs. Therefore, the existence of a 'big valley' in type 31 seems to be rather unlikely, but depends on the analysed instance.

Type 01 is not represented in set X, so sets A and B were investigated. The average values of r^2cs are low in these sets, indicating a small chance for 'big valleys'. This is exactly the case, except for one instance, 023B01-1110. Indeed, r^2cs = 0.05 for this problem example, but again FD plots help the analysis, as shown in figure 9.12. The basic plot (top left) shows separated vertical groups of local optima. The first zoom on the best group (top right plot in the figure) shows a similar structure, yet with some indication of a slope. Yet another zoom on the best group finally reveals a considerable trend of strength r^2 = 0.34. In this group of solutions better ones are more similar to each other than worse ones.

The conclusions concerning simcs:
• it is correlated with fitness in all studied type 60 instances and in most of type 30;
• it is not correlated for type 00;
• there is rather no correlation for type 31, although with some counterexamples; the correlation seems to depend on the particular instance;
• it is mostly uncorrelated with fitness for type 01, with one counterexample found.

Figure 9.9: Fitness-distance plots for instance 022X60-0704 and simcs, simcswg.

Figure 9.10: Fitness-distance plots for instances 023X30-1260 and 034X30-0921, simcs.
Figure 9.11: Fitness-distance plots for instance 024A31-1260, simcs and simcswg.

Figure 9.12: Fitness-distance plots for instance 023B01-1110 and simcs: whole fitness axis (top left); first zoom on the best solutions (top right); second zoom on the best solutions (bottom).

Comment on simcswg

This measure was supposed to be a replacement for simcs in case of types 01 and 31. There is only one instance of these types in set X, so it is difficult to form any conclusions based on this one case. Moreover, this instance is 048X31-0459, the one for which correlation of f and simcs was found (see figure 9.8). No such correlation with simcswg was revealed.

Type 01 instances from set A reveal high r^2cswg, the average being 0.346. Significant trends were also confirmed in FD plots. The obtained values for type 31 in this set are high in 3 of 4 cases, being 0.25 or higher; FD plots show the trends as well, with an example in figure 9.11. The sole exception is instance 039A31-0954, for which r^2cswg = 0.

In set B, however, types 01 and 31 reveal no correlation of fitness and simcswg. In some of these instances (type 31) high r^2cs values were found instead, as discussed earlier.

Unexpectedly, high r^2cswg values were found for all instances of type 60, where the colour is the most important property of a car. An exemplary FD plot of such a case is shown in figure 9.9.

Therefore, it seems that the existence of a 'big valley' with respect to simcswg depends on the analysed instance. It is likely to exist for types 01 and 31 in set A, while it is rather unlikely in set B. Only the type 60 instances behave in a consistent way, but this is not that important knowing that PCC(s) is easily optimisable.

Comment on simcsuc

This measure is closely related to simcs, but considers only the immediate neighbours in a sequence. It was found to correlate with fitness in exactly the same cases as simcs, but usually more weakly. This can to some extent be seen for instance 039X30-1037 in figure 9.6: the trend for simcs (r^2 = 0.37) is a bit more concentrated around the regression line than the trend for simcsuc (r^2 = 0.244).

This general observation suggests that the relation of succession (neighbourhood) between certain groups of cars is important for the overall quality of a sequence. This conclusion was to some extent predictable, given the results for simcs. However, the comparison of results for the two measures indicates that sequences longer than 2 are also of importance, so that simcs reveals slightly stronger trends in sets of local optima.

Comment on simcsucwg

This similarity measure correlates to some extent with fitness in case of type 60 only, where the colour of a car is its only identifier. This is not surprising given the fact that in type 60 PCC(s) is the most important subcriterion, forcing good solutions to have cars of the same colour put in continuous subsequences. For other types of instances no significant FD determination values were found, which is visible in tables 9.4 and 9.3.
Even the instances for which simcswg is correlated with fitness show no positive results for simcsucwg; the values of r^2 are very low and no trends are visible in FD plots.

Moreover, simcsucwg seems to degenerate for some instances, like 039X30-1037 shown in figure 9.6. In this example there are only two possible values of the measure, while for 039X30-1247 there is only one (hence no variance of similarity and no correlation). This happens when high priority options are scarce; e.g. for 039X30-1037 there are two HPRCs, but only 3 combinations of the related options. Thus, it seems that something more than the succession relations between weaker groups is needed to find a 'big valley' in the studied instances.

Comment on instances

In set X there are 7 instances which appear to have no 'big valleys' with respect to any of the presented similarity measures. These are all type 00 instances and 4 of the 12 type 30 ones. In the 12 other instances some correlation of fitness and simcs was revealed. Some of them, all type 60 included, reveal fitness-distance correlation also from the point of view of simcswg, simcsuc or simcsucwg.

9.5.4 Main conclusions from the fitness-distance analysis

The initial guess that positions of groups of cars in a sequence do not matter in the CarSP has been confirmed in this analysis. Local optima are not significantly more similar to each other than random solutions, nor is there any correlation between fitness and simcp in sets of local optima.

On the other hand, the hypothesis that subsequences of groups of cars (no matter their location) play an important role in good solutions of the CarSP has been to some extent confirmed here, for some types of instances. Similarity with respect to simcs is significantly higher in sets of local optima than in random solutions, except for type 00. Important correlations of f and simcs exist in types 60 and 30 (with some exceptions). For types 01 and 31 the correlation is rather unlikely and depends on an instance. Instances of type 00 show no correlation for this similarity.

The introduction of weaker groups and the examination of simcswg revealed that in instances of types 01 and 31 the correlation of fitness and similarity may exist, especially in set A. The result is the inverse of the expected one in sets B and X, though. High r^2cswg values exist for instances of type 60.

Common succession relations, as measured by simcsuc, are important in local optima, because their similarity is significantly higher than that of random solutions. The observed correlations are considerable for the same instances as for simcs, but usually weaker. This indicates that simcs is a better choice than simcsuc for a measure of similarity of CarSP solutions. The same conclusion applies to simcsucwg when compared to simcswg: only instances of type 60 have high correlations of this measure and fitness, and they are lower than r^2cswg. Moreover, the change in average similarity between rand and lopt is lower.

Therefore, the author draws the following conclusions concerning the properties of 'genetic' operators in the designed memetic algorithm:
• for types 60 and 30 a recombination operator preserving simcs should be of particular use;
• types 31 and 01 might prefer simcswg, but the results of FDA are somewhat confusing here, with a number of exceptions and 'no big valley' cases;
• type 00 should prefer mutation to recombination, since no regularities in the analysed landscapes have been found.
9.6 CCSPX: conservative common subsequence preserving crossover

Based on the results of the fitness-distance analysis, the idea of this crossover is to preserve all common parental subsequences of groups or weaker groups, depending on instance type. Moreover, the operator is supposed to be conservative and not disruptive, i.e. to change as little as possible in the order of common subsequences and in the order of positions not covered by these subsequences.

The general steps of CCSPX are given in algorithm 23. The operator is parametrised: groupLevel indicates which groups of cars should be used (ordinary or weaker); numSwaps is the number of pairs of subsequences which are swapped in the offspring compared to one of its parents, the chosen donor. All random choices in CCSPX use uniform probability over the considered set of objects.

Algorithm 23 o = CCSPX(p1, p2, groupLevel, numSwaps)
  compute all maximal common subsequences in p1 and p2 using groups on level groupLevel
  choose randomly the donor of subsequences, donor = p1 or donor = p2
  merge overlapping common subsequences of the donor
  merge consecutive non-common positions of the donor into subsequences
  get the vector of all subsequences from the donor in their original order
  swap numSwaps randomly chosen pairs in the vector of subsequences
  assemble the offspring o from the subsequences in the order given in the vector
  return o
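A minimal C++ sketch of the last three steps of algorithm 23, swapping randomly chosen pairs in the vector of subsequences and assembling the offspring, is shown below. It assumes the donor has already been cut into an ordered vector of segments (merged common subsequences and runs of non-common positions); the extraction of these segments is omitted, and the two indices of a swapped pair may coincide here, a detail the actual operator may handle differently.

#include <random>
#include <utility>
#include <vector>

using Group = int;                       // one group index stands for one car
using Segment = std::vector<Group>;      // a subsequence of the donor

Segment ccspxAssemble(std::vector<Segment> segments, int numSwaps,
                      std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, segments.size() - 1);
    for (int k = 0; k < numSwaps; ++k)
        std::swap(segments[pick(rng)], segments[pick(rng)]);  // swap one pair
    Segment offspring;                   // concatenate in the new order
    for (const Segment& s : segments)
        offspring.insert(offspring.end(), s.begin(), s.end());
    return offspring;
}

int main() {
    std::mt19937 rng(42);
    std::vector<Segment> segments = { {1, 2, 3}, {4}, {5, 6}, {7, 8, 9} };
    Segment o = ccspxAssemble(segments, 2, rng);  // numSwaps = 2, as in CCSPX-2
    (void)o;
}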
An example of how CCSPX operates is shown in figures 9.13 and 9.14. The example uses ordinary groups of cars. The first figure shows common subsequences of two parents, using the same notation as in figure 9.4. Figure 9.14 illustrates the same subsequences in their CCSPX offspring, where the donor was p2. Here, the positions not covered by common subsequences are not marked with a '-' sign, because they were merged into subsequences by CCSPX. Moreover, a '.' indicates subsequences which were swapped. In this example 2 pairs were swapped: subsequence number 18 with 37 (the two longer ones) and 53 with 15 (the pair of shorter ones).

----(****)(*********)(******)(***)(***********)--(**)---(*)-(********)
-(************)-(*)()(********)(************)--(***)(**)-(*********)(*
*)(****)-(**)(***)(******)()(**)(**********)-(*)(***)-(**)-(**)(***)(*
**)-(**)--(**)(*****)(**)--(**)()-(************)(***)()()()(**)
(**)-()(***)--()-(****)(**********)(******)(*****)-(**)(**************
)-(*******)(*****)(*********)(*)()()()(*************)(*****)(*****)(**
***)--(*)()-()(************)(***)-(*******)(**)(****************)-(***
*****)()()--(******)(**)()-()-(*)()(***)(*)---()(*)(******)(**)

Figure 9.13: Common subsequences in two local optima of instance 064X30-0273, the parents in the CCSPX example. Overlapping sequences are merged into one. Normalised simcs = 0.128. Average covered length: 91.9%. f(p1) = 68000 (top) and f(p2) = 54000 (bottom).

(**)(()(***)()()((****)(**********)(******)(*****)((**)(**************
)...(*******)(*****)..................(*)()()()(*************)(*****)(
*****)(*****)()(*)()(()(************)(***)((*******)(**)...........((*
*******)()()()(******)(**)()(()((*)()(***)(*).()(*)(******)(**)

Figure 9.14: Subsequences in a CCSPX offspring o of the parents shown in figure 9.13. donor = p2. Dots (.) indicate swapped subsequences. Normalised simcs(p1, o) = 0.128, simcs(p2, o) = 0.466, f(o) = 6057000.

One can see that certain positions of cars were disrupted by CCSPX, especially between the swapped subsequences 18 and 37. But the similarity of the offspring to its parents in terms of subsequences is not lower than the similarity between the parents: simcs is preserved by CCSPX. Moreover, the offspring is rather more similar to the donor than to the other parent. This is an obvious effect of inheriting the subsequences directly from one parent, with only some small changes (swaps). Yet the offspring is different from the donor.

The exemplary offspring is worse than both of the parents. After local search it improves, so that f(o') = 58000. At the same time, the similarity to the parents remains preserved to a high degree: simcs(p1, o') = 0.125, simcs(p2, o') = 0.169.

9.7 RSM: random shuffle mutation

The proposed CCSPX crossover preserves subsequences from parents, and local search further improves them. Therefore, intuitively, the mutation operator should somehow disrupt the subsequences contained in a solution. The random shuffle operator was chosen for this task; it was employed earlier, e.g. by Gottlieb et al. in their local search (see section 8.6.1).

The operator randomly reorders a part of the given sequence. Firstly, it draws randomly (uniformly) the length of the mutated subsequence from the given interval [lowBound, upBound]. Secondly, it chooses randomly (uniformly) the actual starting position for the subsequence, so that it fits entirely in the solution. Thirdly, all groups of cars are removed from the subsequence. Finally, all these groups are reinserted into the emptied part, one by one, starting from its beginning. The group to be inserted is chosen randomly from the set of available ones, with probability proportional to the number of available cars. The only exception is the case when some groups might immediately violate sequence feasibility (the paint batch limit); then, only the groups which cannot lead to infeasibility are considered. If there is no such possibility, infeasibility of the mutant is signalled.
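Below is a minimal C++ sketch of the unconstrained core of RSM. The solution is represented simply as a vector of group indices, the paint-batch feasibility handling described above is omitted, and upBound is assumed not to exceed the solution length; all names are illustrative.

#include <map>
#include <random>
#include <vector>

using Solution = std::vector<int>;  // a sequence of group indices

void rsm(Solution& s, int lowBound, int upBound, std::mt19937& rng) {
    // 1. draw the length of the mutated subsequence
    int len = std::uniform_int_distribution<int>(lowBound, upBound)(rng);
    // 2. draw a starting position so that the subsequence fits entirely
    int start = std::uniform_int_distribution<int>(0, int(s.size()) - len)(rng);

    // 3. remove all groups of cars from the chosen window
    std::map<int, int> avail;           // group -> number of removed cars
    for (int i = start; i < start + len; ++i) ++avail[s[i]];

    // 4. reinsert them one by one; a group is drawn with probability
    //    proportional to the number of its still-available cars
    for (int i = start; i < start + len; ++i) {
        std::vector<int> groups, counts;
        for (const auto& gc : avail)
            if (gc.second > 0) { groups.push_back(gc.first); counts.push_back(gc.second); }
        std::discrete_distribution<int> wheel(counts.begin(), counts.end());
        int g = groups[wheel(rng)];
        s[i] = g;
        --avail[g];
    }
}

int main() {
    std::mt19937 rng(1);
    Solution s = { 0, 1, 2, 0, 1, 2, 0, 1 };
    rsm(s, 3, 5, rng);
}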
9.8 Adaptation of crossovers from the literature

9.8.1 Adaptation of NCPX

NCPX was originally designed for the CSPLib CarSP, so it does not take weights of RCs and colour into account (see section 8.6.6). In order not to worsen its results on Renault's CarSP, the operator was additionally adapted to this version of the problem. Thus, in the second stage of NCPX each group index to be heuristically inserted into the offspring is chosen by means of the modified function I':

I'(s_p, i, g) = \begin{cases} -\Delta VN_w(s_p, i, g) & \text{if } \Delta VN_w(s_p, i, g) > 0 \\ DSU_w(s_p, g) & \text{otherwise} \end{cases}

where ΔVNw is defined as in the extended Gottlieb and Puchta's heuristic (see section 9.4.3), and DSUw(sp, g) is the weighted DSU heuristic evaluation which was used as a part of evalDSU(sp, g) in section 9.4.3:

DSU_w(s_p, g) = \sum_{j=1}^{O} w_j \cdot opt_j(g) \cdot utilRate(j, s_p)

The colour of cars was not considered in this extended version of NCPX. Taking colour into account is not a straightforward task: the partial solution in NCPX is not simply a sequence of consecutive positions that ends before all cars are inserted, but may contain gaps of different sizes. This makes the heuristic evaluation of colour changes more difficult.

9.8.2 Adaptation of UAX

The original repair method in UAX does not take into account the situation when an underrepresented group is completely missing from the offspring (see section 8.6.4). In such a case the repair method cannot find any car belonging to this group in the offspring sequence, and the whole crossover has to fail.

In initial tests on Renault's CarSP it was observed that this causes some problems: a large fraction of crossovers on local optima did not generate feasible offspring; only 10-20% of trials were feasible. Therefore, a modified repair method was proposed by the author, which improves over the original.

This extended repair method proceeds as the original one, with one exception. When an overrepresented group is found and no underrepresented group exists in the offspring, the set of all underrepresented groups is gathered based on a comparison of the problem data and the offspring. Then, an underrepresented group is chosen randomly, with probability proportional to the number of cars of the group which are missing from the offspring. In initial tests it was noticed that this approach results in a higher fraction of feasible offspring, between 50% and 100%, with the majority above 80%. Thus, the slightly modified repair function makes UAX more applicable to Renault's problem.
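The random choice in this modified repair step is a simple roulette wheel over the counts of missing cars. A minimal C++ sketch, with illustrative names:

#include <random>
#include <vector>

// missing[g] = number of cars of group g absent from the offspring.
// Returns a group index drawn with probability proportional to missing[g].
int drawUnderrepresentedGroup(const std::vector<int>& missing,
                              std::mt19937& rng) {
    std::discrete_distribution<int> wheel(missing.begin(), missing.end());
    return wheel(rng);
}

int main() {
    std::mt19937 rng(7);
    std::vector<int> missing = { 0, 3, 1 };  // group 1 drawn 3x as often as group 2
    int g = drawUnderrepresentedGroup(missing, rng);
    (void)g;
}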
9.9 Experiments with initial solutions

In this experiment the quality and the time of generation of initial solutions were investigated, also when coupled with local search. All the algorithms described in section 9.4 were tested here. Local search was employed in 4 randomised variants: insert, swap, insert+swap, swap+insert. Solutions without local search were examined as well. Each combination of a heuristic and local search was run 30 times on each set X instance. All experiments were performed with the code developed by the author in C++, with several procedures implemented by members of the challenge team: Andrzej Jaszkiewicz and Paweł Kominek.

Instances of types 00, 30, 31

The average quality of results for all types of initial solutions and LS is shown in figure 9.15. The quality is measured with respect to the average best solution obtained in the challenge (see table 8.2, column 'best average'). The results are averaged over instances of types 00, 30 and 31 only; instances of type 60 are dealt with separately, because of the exact polynomial algorithm for PCC(s).

Figure 9.15: Average quality of heuristic solutions over type 00, 30, 31 instances; without and with local search.

One can see in the figure that local search after the initial heuristics is mandatory in order to obtain good quality. The best LS variant is insert+swap; insert alone is second-best. The variants involving swap as the first move type give poorer results. On average, the best initial solutions are generated by the DSU6 algorithm coupled with insert+swap; the average excess amounts to 34%. Kominek's heuristic, DSU3 and exact PCC are not much worse, with results varying between 36% and 40%. However, compared to the best average quality in the challenge, there is still much scope for improvement.

The time of computation of initial solutions depends mainly on instance size and the LS variant. Therefore, the results presented in figure 9.16 are averaged over instances and types of algorithms. The figure also presents lines of power regression (all r^2 > 0.85).

Figure 9.16: Average time of computation of initial heuristics followed by local search.

It can be seen in the figure that local search slows down the generation of solutions considerably. While without LS a solution for the largest instance is generated in less than 0.5 s, with local search it may take 50 times longer, on average. Still, the times are reasonable, being less than 25 seconds per solution for the slowest LS variant, insert+swap. This is the best variant of LS in terms of solution quality, though.

Instances of type 60

In case of these instances the exact algorithm for PCC(s) ensures that the generated solutions are already of good quality. Coupled with insert+swap LS, the average result of this heuristic is usually very close to the best from the challenge, with less than 0.02% of excess, as table 9.5 indicates. Moreover, the best solution in 30 runs always has an evaluation equal to the best challenge result.

Table 9.5: Average and best quality of solutions to type 60 instances generated by the exact PCC algorithm followed by insert+swap local search.

instance       best challenge   average initial   best initial
035X60-0090        5010000         5010800.0         5010000
035X60-0376        6056000         6057066.6         6056000
022X60-0704       12002003        12002205.1        12002003

Together with local search, it always takes less than 2 seconds to generate such a solution for each type 60 instance. It seems that it is sufficient to use this algorithm alone to solve these instances.

Composition of initial population

Given the results of this experiment, the initial population of the memetic algorithm always contains solutions improved with insert+swap local search. One solution of each DSU heuristic is included, and one of the exact PCC algorithm. If the population is larger than 4 elements, it is completed with different solutions of Kominek's heuristic.
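A minimal C++ sketch of this composition is given below. The heuristics and the insert+swap local search are those of sections 9.4 and 9.9; they are represented here only by trivial stubs so that the sketch is self-contained.

#include <cstddef>
#include <vector>

using Solution = std::vector<int>;

// Stand-ins for the real generators and for insert+swap local search.
Solution dsu0()     { return {}; }
Solution dsu3()     { return {}; }
Solution dsu6()     { return {}; }
Solution exactPCC() { return {}; }
Solution kominek()  { return {}; }          // randomised: repeated calls differ
Solution improve(Solution s) { return s; }  // insert+swap local search

std::vector<Solution> initialPopulation(std::size_t size) {
    // assumes size >= 4, as in the experiments
    std::vector<Solution> pop = { improve(dsu0()), improve(dsu3()),
                                  improve(dsu6()), improve(exactPCC()) };
    while (pop.size() < size)               // complete with Kominek's heuristic
        pop.push_back(improve(kominek()));
    return pop;
}

int main() { auto pop = initialPopulation(15); (void)pop; }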
9.10 Experiments with memetic algorithm

9.10.1 Long runs until convergence

In this experiment long runs of the memetic algorithm were allowed, most probably until complete convergence. The algorithm stopped after 75 unproductive generations, i.e. when nothing changed in the population. For a population of 15 solutions this meant, on average, 5 crossover and mutation trials on each member before convergence; the same value was used in the experiments on the CVRP (section 7.9).

A population of size 15 was chosen, half the size used for the CVRP. This choice was made due to the larger sizes of instances in the CarSP, which caused the initialisation phase to be considerably longer. For the largest instance the size of 15 means around 200-250 s for initialisation itself; the size of 30 would mean 400-500 s for this phase, around 80% of the time limit in the ROADEF Challenge 2005. The author considered this to be too long. The contents of the initial population were described in the previous section.

Several versions of the memetic algorithm were run. The basic version employed RSM as mutation and one crossover: NCPX, UAX or CCSPX applied on the ordinary groups with numSwaps = 2 (denoted CCSPX-2). Additionally, one version without crossover was used (MUT) and one without any 'genetic' operator, starting in the main loop from random solutions, hence resulting in multiple start local search (MSLS). Moreover, in order to assess the impact of mutation on results of the MA, one configuration without mutation was used for each recombination; this is denoted 'noMutation' or 'noMut'. Each configuration and version of the MA was run 15 times on each of the 19 set X instances. The same set of computers as in the CVRP experiments was used.

Quality of solutions: aggregated results

The values of excess over the best average challenge solution, aggregated over all instances, are presented in figure 9.17: the average of averages on the left, the average of the best in 15 runs on the right. The actual evaluations of solutions, on which these statistics are based, are presented in tables B.3 and B.4 in the appendix.

Figure 9.17: Quality of results: average over instances of the average result (left) and of the best result (right) in 15 long runs.

Concerning the average quality of solutions of the basic versions, one can see in the figure that MSLS is able to get as close as 9% to the best solutions. The best MA, CCSPX-2, improves this result by around 5%, to 4.09% of excess. The other basic versions of the MA are a little worse: UAX has 4.35% of excess, NCPX 4.37%. However, RSM mutation alone is able to get as close as 4.45%. Standard deviations are less than 1% in most of the cases.

Aggregated results of statistical tests for the difference between mean quality of results of the basic MAs are shown in table 9.6. The same procedure as for the CVRP was used here (see section 7.9.1).

Table 9.6: Comparison of the basic algorithm versions with the Cochran-Cox statistical test for the significance of the difference of averages; long runs. Each entry gives the number of instances won/lost by the row algorithm against the column algorithm.

           MSLS    MUT     CCSPX-2   NCPX    UAX     totals    sum
MSLS       0/0     0/-15   0/-15     0/-15   0/-15   0/-60     -60
MUT        15/0    0/0     0/-6      0/0     0/-1    15/-7       8
CCSPX-2    15/0    6/0     0/0       6/0     6/0     33/0       33
NCPX       15/0    0/0     0/-6      0/0     0/0     15/-6       9
UAX        15/0    1/0     0/-6      0/0     0/0     16/-6      10

The highest net flow score in the table ('sum') is obtained by CCSPX-2. This is the best algorithm in direct comparisons with the other ones, winning 33 and losing none. It is statistically better than MUT, NCPX and UAX on 6 instances each, including the two largest ones (024X30-1319, 023X30-1260). After a substantial gap, UAX, NCPX and MUT follow, having almost the same score. MSLS is last, winning no comparison and losing 15 to each version of the MA.

The results of the configurations without mutation reveal the strength of the recombination operators. The excess of NCPX and UAX jumps to 8% and more without RSM, indicating that without mutation these operators are hardly able to generate better solutions than MSLS. From this point of view especially UAX seems to be unable to generate such solutions; NCPX is slightly better. Conversely, CCSPX-2 without mutation is worse than with RSM, but by less than 1%. This indicates that the crossover alone is able to improve over MSLS by approximately 3%, on average.
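For reference, a minimal C++ sketch of the Cochran-Cox test used in table 9.6, in its common formulation: the statistic is the usual unequal-variance t statistic, and the critical value is a weighted combination of the two per-sample critical values t1 (n1 - 1 degrees of freedom) and t2 (n2 - 1 degrees of freedom). Looking these up is left to the reader, and the exact procedure of section 7.9.1 is not reproduced here.

#include <cmath>
#include <cstdio>

// Unequal-variance t statistic for the difference of two means.
double ccStatistic(double mean1, double var1, int n1,
                   double mean2, double var2, int n2) {
    return (mean1 - mean2) / std::sqrt(var1 / n1 + var2 / n2);
}

// Cochran-Cox critical value: a weighted mix of the two critical t values.
double ccCritical(double t1, double t2, double var1, int n1,
                  double var2, int n2) {
    const double w1 = var1 / n1, w2 = var2 / n2;
    return (w1 * t1 + w2 * t2) / (w1 + w2);
}

int main() {
    // toy numbers only; t1 and t2 would come from Student's t tables
    double t = ccStatistic(4.09, 0.5, 15, 4.45, 0.6, 15);
    double crit = ccCritical(2.145, 2.145, 0.5, 15, 0.6, 15);
    std::printf("reject = %d\n", std::fabs(t) > crit);
}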
While looking at the best results of the MAs (the chart on the right in figure 9.17), one can see that the quality of solutions improves by an additional 0.8-1.2% compared to the averages. The winner among the tested algorithms is the same: CCSPX-2 with RSM. MUT comes second, then NCPX and UAX. Again, the impact of mutation on the algorithms is clear: CCSPX-2 loses only 1.2% without RSM, while NCPX loses 2.5% and UAX 3.8%, the latter being almost equal to MSLS.

The best-known solutions

Basic versions of the MA are able to generate best-known solutions for 5-6 out of 19 instances. The actual numbers are shown in table 9.7; they are based on tables B.3 and B.4. Among the 5 instances common to all best runs there are all 3 instances of type 60, for which the MA is not required. CCSPX-2 is better than the other algorithms in the 'best' column by one instance; it once generated a solution better than the best in the challenge, on instance 048X30-0519 (see table B.4).

Table 9.7: The number of instances for which each basic version of the MA found the best-known solutions in long runs. Best: the best run of 15 for a version; all: all runs of a version.

MA version   best   all
MSLS           5      4
MUT            5      5
CCSPX-2        6      5
NCPX           5      5
UAX            5      5

The hardest instances

For 2/3 of the considered instances the MAs are able to generate very good solutions. However, for 5-6 instances some of the algorithms are not able to get closer than 10% to the best challenge solution. The average results for these instances are shown in figure 9.18.

Figure 9.18: Average quality for the instances hardest for the MA (024X30-1319, 023X30-1260, 025X00-0996, 034X30-0921, 029X30-0780, 064X30-0273); long runs.

It can be seen that the application of 'genetic' operators improves the results of MSLS considerably on these instances, in two cases even by more than 20%. CCSPX-2 generates the best results in 4 cases, MUT and UAX in one case each.

Quality of solutions: summary

All these results suggest that, on average, RSM mutation seems to be the most important operator for the tested MAs, and that crossover is only a helper operator. NCPX and UAX seem to be almost redundant when RSM is employed. CCSPX-2, when coupled with RSM, generates the best overall results, so its presence in the algorithm is important.

Time of computation: basic MA

The average time of computation of the basic MA versions, as a function of instance size, is shown in figure 9.19, together with lines of power regression.

Figure 9.19: Average time of computation of the basic MA versions, together with lines of power regression; long runs.

In all the presented cases the values of r^2 are larger than 0.77, which indicates a fairly good fit. The quality of regression is rather worse than in the CVRP case, though. There are two causes of this fact. The first is the presence of instances of type 60, which have unusually short running times. The second is the number of ratio constraints, which varies across instances and was found to be an important factor influencing the time of computation: 16-28% of variance in the average time can be attributed to this number. Nevertheless, the author decided to form conclusions based on the regression lines presented in figure 9.19.

It can be observed in the figure that MSLS usually stops first. MUT is a slightly more time-consuming version of the algorithm, but in some rare cases it takes less time than MSLS. The third version, CCSPX-2, takes around twice as much time as MUT, but only 68% of UAX and 53% of NCPX, on average.
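The regression lines mentioned above are power laws of the form t = a · n^b. The thesis does not state the fitting procedure; the following C++ sketch shows the common approach, an ordinary least-squares fit in log-log space.

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct PowerFit { double a, b; };  // model: t = a * n^b

PowerFit fitPower(const std::vector<double>& n, const std::vector<double>& t) {
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    const double m = double(n.size());
    for (std::size_t i = 0; i < n.size(); ++i) {
        const double x = std::log(n[i]), y = std::log(t[i]);
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    const double b = (m * sxy - sx * sy) / (m * sxx - sx * sx);
    const double loga = (sy - b * sx) / m;
    return { std::exp(loga), b };
}

int main() {
    std::vector<double> size = { 100, 400, 1000 };   // toy data
    std::vector<double> time = { 2.0, 30.0, 180.0 };
    PowerFit f = fitPower(size, time);
    std::printf("t ~ %.3f * n^%.3f\n", f.a, f.b);
}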
The running times of the MA are generally quite long. The maximum time for CCSPX-2 is approx. 4 h, while NCPX and UAX may take even 6 h for the largest instances.

Impact of RSM mutation

The average times of computation for the basic and noMutation configurations of the MA are shown in figure 9.20, together with lines of power regression.

Figure 9.20: Average time of computation: CCSPX-2, NCPX (top) and UAX (bottom); long runs. Basic MA: squares, solid line; noMutation MA: diamonds, dotted line.

While with RSM mutation NCPX and UAX take considerably more time than CCSPX-2 to converge, without it the times are more or less comparable. Therefore, it may be said that RSM has a much stronger impact on the MA in case of NCPX and UAX than in case of CCSPX-2. Generally, the presence of RSM mutation seems to substantially increase the algorithm's ability to generate new good solutions for the population.

'Genetic' operators: effort of local search

The author wanted to see what the efficiency of the 'genetic' operators actually is, e.g. how much effort local search needs to optimise the offspring they generate. Thus, the average number of LS iterations per MA generation was analysed for several instances. Two representative plots of these numbers are shown in figure 9.21. Each line in the plots is an average over 15 independent runs, further processed with a moving average (window size 10) in order to smooth the lines and make them more readable.

Figure 9.21: Average numbers of LS iterations per generation: 025X00-0996 (top) and 048X30-0519 (bottom); long runs.

One can see in the plots that it is MSLS which requires the largest number of LS iterations per generation. This is an obvious result, since random solutions are the starting points for LS in this algorithm.

Quite surprisingly, UAX alone (the 'UAX-noMut' series) generates solutions which require comparable LS effort, as if they were random solutions. Moreover, the process of computation stops almost as soon as in the case of MSLS. Hence, UAX appears to be a highly disruptive operator.

NCPX is slightly better from this point of view. For the majority of instances it behaves in a way similar to UAX (like for instance 048X30-0519 in figure 9.21). But in some rare cases (like instance 025X00-0996) it is able, after a number of generations, to generate offspring which are closer to local optima in terms of LS iterations than those of UAX. Nevertheless, the offspring of NCPX still require considerable LS effort to become local optima.

RSM mutation generates solutions much closer to local optima in these terms, thus saving much computational effort. It does so from the very beginning of the computation, and the number of iterations stays at a constant level until the end. This end of artificial evolution comes much later than for UAX and NCPX, meaning that mutation is able to introduce new solutions to the population for much longer. CCSPX-2 alone seems to be the best from this perspective. Moreover, it does not stop before MUT does, so it is also able to sustain the process of artificial evolution (it does not converge earlier).
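The smoothing applied to these curves is a plain moving average. A minimal C++ sketch (window size w; above, w = 10):

#include <cstddef>
#include <vector>

// Moving average with window w; output value k smooths inputs [k, k+w).
std::vector<double> movingAverage(const std::vector<double>& xs, std::size_t w) {
    std::vector<double> out;
    if (w == 0 || xs.size() < w) return out;
    double sum = 0.0;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        sum += xs[i];
        if (i >= w) sum -= xs[i - w];           // drop the element leaving the window
        if (i + 1 >= w) out.push_back(sum / double(w));
    }
    return out;
}

int main() {
    std::vector<double> xs = { 1, 2, 3, 4, 5 };
    auto smoothed = movingAverage(xs, 3);       // yields {2, 3, 4}
    (void)smoothed;
}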
At the beginning of the computation, the versions of the MA which employ both the mutation and a crossover generate offspring which require, quite simply, the average of the numbers of LS iterations needed after the mutation alone and after the same crossover alone. The picture usually changes much later. The NCPX-RSM pair starts decreasing the number of LS iterations per generation after some 1500-2000 generations. The decrease is rather slow and reaches a level comparable with CCSPX-2 after some 1000 more generations, when the MA nearly stops. The picture for UAX is quite similar, with the decreasing trend starting perhaps slightly earlier. Predictably, it is CCSPX-2 and RSM which are closer to local optima and require less LS effort. Perhaps a little surprisingly, the number of LS iterations increases in case of 025X00-0996, but the trend is extremely slow and the whole process is near convergence (the last of the 15 runs stops at generation 3260).

Looking at figure 9.21 one can also confirm the earlier observation that the presence of RSM mutation hugely influences the ability of MA-NCPX and MA-UAX to sustain the process of computation. Conversely, CCSPX-2 alone can generate new solutions for the population as long as the mutation can.

'Genetic' operators: successful trials

To see the efficiency of the operators from a different angle, the author also computed the average percentage of successful trials of each operator in each run. Figure 9.22 presents these percentages averaged over all runs and instances. Obviously, the percentages cannot be constant during a run, because the process of computation would not stop otherwise. But such a figure may give a general idea of how successful each operator is in inserting new solutions into the population.

Figure 9.22: Average percentage of successful operations: crossovers, mutations, crossovers and mutations together, crossovers alone (noMutation); long runs.

The figure shows that RSM mutation is the most successful operator. It is able to insert (after LS) almost 15% of its mutants into the population. The efficiency of RSM is almost the same when it is coupled with NCPX and UAX. It drops, however, when used with CCSPX-2; instead, a considerable percentage (5%) of CCSPX-2 offspring is inserted into the population. Compared to only approx. 1% of successes of NCPX and UAX, CCSPX-2 seems to be much more efficient. When crossover is a stand-alone operator, the percentage of successes increases. Yet CCSPX-2 remains the best one, with 11% of successful trials; NCPX and UAX are able to insert only 6-7% of their offspring into the population.
To sum up these results, CCSPX-2 is undoubtedly the best of the 3 tested crossovers in long runs: it has a higher percentage of successes than NCPX and UAX; it generates better starting points for local search, thus contributing to lower times of computation; and it generates better solutions.

9.10.2 Runs limited by time

The goal of the second experiment was to assess the efficiency of the tested operators in MAs given a limited amount of time. Therefore, the same versions and configurations of the MA as in the previous experiment were run with a limit of 600 s (the limit used in the ROADEF Challenge 2005). Each algorithm was again executed 15 times on the 19 set X instances. The same set of computers was used.

Quality of solutions: aggregated results

In figure 9.23 one can see the aggregated quality of results: the average over instances of the average excess over the best average challenge solution.

Figure 9.23: Quality of results: average over instances of the average result in 15 short runs.

The figure shows that the worst algorithm is again MSLS. This confirms that the introduction of mutation and crossover generally contributes to the quality of results. The best MA is the version with mutation alone (5.2% of excess), while the one employing RSM and CCSPX-2 is worse only by a fraction of a percent (5.23% of excess). The other MAs with crossover and mutation produce slightly worse solutions. The worst is NCPX, which comes as a surprise, since this operator is equipped with some direct heuristic knowledge about the CarSP, contrary to UAX.

The ranking of the basic versions based on the Cochran-Cox tests confirms the above conclusions. The ranking is: MUT (34/-2), CCSPX-2 (32/-2), UAX (13/-15), NCPX (12/-20), MSLS (0/-52). The direct comparison of the best and the second-best algorithms, MUT and CCSPX-2, yields a draw: MUT wins on two instances (034X30-0231, 048X31-0459), while CCSPX-2 wins on two other ones (023X30-1260, 064X30-0875); in the other cases the results are not statistically different.

The ranking of the MAs without mutation is the same as in the previous experiment: CCSPX-2 is the best and nearly as good as with RSM mutation; NCPX comes second and slightly improves over MSLS; UAX is the worst and contributes nothing to the search compared to MSLS.

9.10.3 Quality vs. FDC

The relationship between the quality of results generated by the MAs and fitness-distance correlation was also of interest. As the indicator of quality, the average gain of an MA version over MSLS in long runs was employed, i.e. the difference of excess over the best-known solution for each instance. Values of the determination coefficient r^2 between fitness and simcs were taken from table 9.3 as indicators of the strength of 'big valleys'; this similarity measure was directly exploited in CCSPX. The scatter plots of these two variables for the basic and the noMutation MAs are presented in figure 9.24.

Figure 9.24: Gain in quality of MA versions over MSLS versus fitness-distance determination coefficients for simcs: basic MA (left) and noMutation MA (right).
There is no relationship visible in the presented plots. Surely, there is no linear relationship in any series of points; this is confirmed by very low values of the linear determination coefficient, which for all series is well below 0.1. One can see that there are high MA gains both for instances with low and with high r^2cs, and vice versa.

More importantly, there is no visible trend for MA-CCSPX-2, either in the basic or in the noMutation configuration. This is the MA employing CCSPX, which was designed based on the results of the fitness-distance analysis.

This lack of the expected direct relationship may be due to the mixture of instances, with strongly variable sizes, numbers of ratio constraints, types and sources (Renault factories); possibly all these factors influence instance hardness. Also the reference quality of solutions, taken from MSLS, is a variable that may be influenced by these factors.

9.11 Summary and conclusions

This chapter presented the adaptation of the memetic algorithm to Renault's car sequencing problem. Crucial elements of this adaptation were described, such as the choice of a representation, the design of local search, and the operators of crossover and mutation.

In particular, the design of 'genetic' operators was performed based on the results of the fitness-distance analysis. Before this analysis, several initial hypotheses about properties of solutions which may influence the objective function were formulated. These properties were expressed in terms of similarity measures, further employed in the FDA.

The analysis confirmed the initial hypotheses: positions of groups (cars) do not matter in the CarSP, while it is important for good solutions to contain similar subsequences of groups (correlation of fitness and similarity in terms of subsequences exists in many instances). Succession relations (adjacency of groups) are also important for quality, but to a lesser extent than subsequences.

Unfortunately, high FDC with respect to simcs or simcswg is not a property of the CarSP as a whole. Rather, it is a property of particular instances: type 60, most of type 30, some of types 31 and 01. It is present in none of the type 00 instances. Perhaps some more detailed analyses of those instances could reveal the 'big valley' structure, but this is a subject for another investigation.

In any case, based on the positive FDA results for most of the analysed instances, a crossover operator preserving common parental subsequences of groups was designed, CCSPX-2. Also a mutation disruptive for subsequences was chosen, RSM. These operators were tested in the memetic algorithm and compared to operators taken from the literature in two computational experiments.

These experiments revealed that the proposed pair of operators, CCSPX-2 and RSM, was the best design in terms of several performance indicators. It generated solutions of the best average quality in long MA runs and produced one new best-known solution. These operators had the highest probability of generating new solutions for the population. Moreover, they generated offspring which required the least LS effort to become local optima, thus accelerating the process of artificial evolution to a high extent.

Considered separately, RSM mutation seems to be the most important operator in the designed MA. On average, it had the largest contribution to the population and the best results in short runs.
CCSPX-2 was the second-best, although for some instances, especially the larger ones, its operations were indispensable. Moreover, CCSPX-2 was most important in long runs until convergence, so it seems that this crossover gives rise to a 'long-runner' MA.

The attempt to relate the quality of results of the designed MAs to the results of the FDA failed. No relationship between the chosen indicators of quality and FDC for simcs was found. Perhaps too many factors influencing solution quality were left uncontrolled in the conducted experiments.

To summarise, the method of systematic construction of recombination operators based on fitness-distance analysis gave a good result for Renault's CarSP. The designed operators, preserving or disturbing subsequences of cars, are the best operators of this kind proposed so far.

Chapter 10

Conclusions

10.1 Summary of motivation

Metaheuristics are not algorithms in the strict sense of the word. To be practically applicable, metaheuristics have to be adapted to the considered optimisation problem. This adaptation requires that certain components be chosen or designed (chapter 2).

A universal choice or design of such algorithmic components for all possible problems does not exist, as the No Free Lunch theorems indicate. Rather, the design must be based on properties of the considered problem. If the components do not include problem-specific knowledge, the adapted metaheuristic may in effect become a black-box search algorithm, threatened by the No Free Lunch result. The memetic algorithm considered in this thesis is no exception (chapter 3).

Yet at the moment there are no clear design guidelines available for components of metaheuristics. This state of the art is openly complained about in the literature. In particular, the lack of such clear guidelines was demonstrated in the case of evolutionary algorithms (chapter 4).

However, a detailed review of designs for several problems revealed that efficient designs of components of evolutionary algorithms were sometimes based on observed similarities of good solutions to those problems. This led several authors to analysing fitness landscapes of those problems in search of properties which could be exploited in designed components (chapter 4).

Fitness-distance correlation is one such property. Many authors believe that its presence in a considered problem may be exploited in the memetic algorithm by means of distance-preserving crossover operators. This belief was to some extent confirmed qualitatively by several existing good operator designs. Hence came the idea of the scheme of adaptation of the memetic algorithm: the construction of distance-preserving crossover operators when fitness-distance correlation is found in the considered problem (chapter 5).

10.2 Contribution of the thesis

This was also the source of the main hypothesis of this work and of its main goal (chapter 1): to perform this scheme of adaptation on the capacitated vehicle routing problem (chapter 6) and the car sequencing problem (chapter 8). The goal was achieved, as documented in chapters 7 and 9. The following elements form the core of the author's original contribution presented in these chapters.

• The definition and implementation of distance/similarity measures appropriate to the analysed problems: de, dpn, dpc for the CVRP and simcs, simcsuc for the CarSP.

• The fitness-distance analysis of these problems.
Therefore, it may be said that the method of systematic construction of crossover operators based on fitness-distance analysis gave a good result for the two considered problems. Together with the cases analysed earlier by other authors, this thesis puts the practical applicability of this scheme on a much firmer basis.

The author also contributed to the area of fitness-distance analysis itself (chapter 5).

• The review of fitness-distance analyses performed in the past is most probably the broadest review of this kind currently available in the literature. It may be a valuable source of references for researchers interested in the subject.
• The new version of the method for computing FDC proposed by the author is likely to have better statistical and practical properties than the methods proposed earlier (a schematic illustration of an FDC computation follows this list).
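To make the notion concrete, the following schematic Python sketch shows one way a fitness-distance correlation coefficient can be computed for a sample of local optima when global optima are not available: each optimum is paired with its distance to the nearest strictly better optimum in the sample. This illustrates the general idea only and is not the author's exact procedure from chapter 5; the function names are illustrative.

from math import sqrt

def pearson(xs, ys):
    # Plain Pearson correlation of two equally long samples.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def fdc(optima, fitness, distance):
    # fitness(s): objective value to be minimised;
    # distance(s, t): distance between two solutions.
    fs, ds = [], []
    for s in optima:
        better = [t for t in optima if fitness(t) < fitness(s)]
        if not better:
            continue  # the best sampled solution has no better neighbour
        fs.append(fitness(s))
        ds.append(min(distance(s, t) for t in better))
    return pearson(fs, ds) if len(fs) > 1 else 0.0

A strong positive coefficient then indicates that worse local optima tend to lie further away from better ones, which is exactly the trend that distance-preserving crossovers are meant to exploit.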
10.3 Perspectives for further work

With the completion of this thesis the author perceives some open research issues more clearly.

The issue of what fitness-distance correlation actually is still remains open to some extent. Currently, FDC is simply a descriptive statistic of a trend observed in the analysed landscape; there is no appropriate mathematical model of this property. Moreover, the practical significance of the observed trend is determined in an arbitrary way (this thesis is no exception). The existence of some statistical tests does not resolve the problem: there is a difference between practical and statistical significance. Thus, a proper model of FDC needs to be established.

The existence of FDC in a space of solutions depends on the problem, the employed distance measure and, most importantly, the analysed instance. This result of the thesis confirms some earlier observations, as indicated in the review of FDAs. From this perspective, it is interesting to find out under what conditions FDC exists. In which types of instances may the correlation be found? How are these types related to real-world instances?

One of the most important open issues is the relationship between FDC and the efficiency of algorithms which are supposed to exploit it during search. At the moment, the existing arguments in favour of such a relationship are mainly qualitative: they are simply the existing good designs of memetic algorithms based on fitness-distance analysis or some of its elements. The existence of this relationship was not confirmed in this thesis, either, most probably because too many factors possibly influencing instance hardness were left uncontrolled in the conducted experiments. A more rigorous analysis of the relationship between FDC and the efficiency of algorithms would be very welcome.

The verification of the FDCs discovered in this thesis would also be welcome. The analyses presented here were conducted using the approximate method, which does not involve global optima. This is more practical than the basic approach, but since the obtained results are approximate, they should be verified wherever global optima of the considered instances are known. Luckily, these are known for 4 of the analysed instances of the CVRP. Some instances of the CarSP have most probably been solved to optimality, as well. The introduced approximate version of the fitness-distance analysis should also be evaluated on problems with known global optima in order to verify its predicted properties.

There are still some combinatorial optimisation problems for which fitness-distance analysis has not been considered. For these, appropriate distance/similarity measures should be defined and the landscapes analysed. Such analyses can provide important insight into the properties of the landscapes, e.g. the location of global optima relative to other solutions.

Finally, in the author's opinion it is important to clarify the relationship between the No Free Lunch theorems and practical problems of optimisation: are the theorems applicable to them or not? Some very recent work by He et al. (2007) proves that in the black-box setting the prediction of the hardness of a given problem instance is impossible. This result would apply to the FDC when used in such a prediction, but it is still unclear whether the black-box assumption is true for practical optimisation problems. The author's guess is that this assumption does not hold for NP-hard problems.

The resolution of these open issues will most likely provide a deeper understanding of the properties of hard problems of combinatorial optimisation. Hopefully, this will result in better justified designs of even more efficient metaheuristics.

Appendix A
Names of instances of the car sequencing problem

The original names of instances, as used in the ROADEF Challenge 2005 (Cung 2005b), are difficult to manage. These are long strings, with some information being redundant. They probably have meaning to Renault, but for the competitors they were simply labels. Moreover, they were probably encoded in some way to disguise their origin (e.g. the factory location). Therefore, the author of this text proposes a mapping of the original challenge names to more manageable ones.

The proposed name always consists of 11 characters in the form OOOSWD-NNNN, where:
• OOO is the first 3-character part of the original name, usually a number.
• S is one character describing the set from which the instance originated; it is one of the values: A, B, X, T.
• W is one decimal digit indicating the vector of weights of an instance.
• D is equal to the HPRCs difficulty bit dif (see section 8.2).
• NNNN is a four-digit number giving the size of an instance, i.e. the number of cars of the current day.

The fields OOO and S are employed to facilitate the use of the map: from the original name to the proposed one and back. The W field encodes the vector of weights w in an instance (see section 8.2): the value of w_PCC unambiguously indicates the whole vector, and w_PCC is always of the form 10^W, with W ∈ {0, 3, 6}; hence the value of W. The pair of fields WD encodes the general type of an instance (see section 8.4). Only set X instances are presented in this map; instances from the other sets may have their names easily transformed by the same rule.
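A brief sketch of this encoding rule is given below. The helper signature and the field extraction are the author's illustration (in practice the fields would be read from the original ROADEF instance file), while the rule itself is exactly the one described above.

from math import log10

def proposed_name(original_name, instance_set, w_pcc, dif, n_cars):
    # Build the OOOSWD-NNNN label: OOO from the original name,
    # S in {A, B, X, T}, W such that w_PCC = 10**W,
    # D = the HPRCs difficulty bit, NNNN = number of cars.
    ooo = original_name.split()[0][:3]    # first 3-character part
    w = int(round(log10(w_pcc)))          # 1 -> 0, 10**3 -> 3, 10**6 -> 6
    assert w in (0, 3, 6)
    return "%s%s%d%d-%04d" % (ooo, instance_set, w, dif, n_cars)

# e.g. proposed_name("022 RAF EP ENP S49 J2", "X", 10**6, 0, 704)
# gives "022X60-0704", the first entry of Table A.1 below.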
Table A.1: Map of instance names (set X).

Set  Original name (ROADEF)                Proposed name
X    022 RAF EP ENP S49 J2                 022X60-0704
X    023 EP RAF ENP S49 J2                 023X30-1260
X    024 EP RAF ENP S49 J2                 024X30-1319
X    025 EP ENP RAF S49 J1                 025X00-0996
X    028 CH1 EP ENP RAF S50 J4             028X00-0325
X    028 CH2 EP ENP RAF S51 J1             028X00-0065
X    029 EP RAF ENP S49 J5                 029X30-0780
X    034 VP EP RAF ENP S51 J1 J2 J3        034X30-0921
X    034 VU EP RAF ENP S51 J1 J2 J3        034X30-0231
X    035 CH1 RAF EP S50 J4                 035X60-0090
X    035 CH2 RAF EP S50 J4                 035X60-0376
X    039 CH1 EP RAF ENP S49 J1             039X30-1247
X    039 CH3 EP RAF ENP S49 J1             039X30-1037
X    048 CH1 EP RAF ENP S50 J4             048X30-0519
X    048 CH2 EP RAF ENP S49 J5             048X31-0459
X    064 CH1 EP RAF ENP S49 J1             064X30-0875
X    064 CH2 EP RAF ENP S49 J4             064X30-0273
X    655 CH1 EP RAF ENP S51 J2 J3 J4       655X30-0264
X    655 CH2 EP RAF ENP S52 J1 J2 S01 J1   655X30-0219

Appendix B
Detailed results of memetic algorithms

In the tables that follow, the plus symbol (+) beside a solution quality for an instance means that the quality is equal to the best-known solution for this instance. The asterisk (∗) means that the quality of the generated solution is better than that of the best-known one.

Table B.1: Quality of results of the basic MA version in long runs for the CVRP: CPX2, CEPX, CECPX2 (per operator: best quality, average quality, standard deviation).

instance  best-known   CPX2 (best / avg / dev)       CEPX (best / avg / dev)       CECPX2 (best / avg / dev)
c50       524.61       +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00
c75       835.26       +835.26 / 837.01 / 2.41       835.32 / 837.00 / 2.61        835.77 / 837.81 / 2.61
c100      826.14       827.39 / 829.01 / 1.34        +826.14 / 828.80 / 1.26       827.85 / 829.71 / 0.96
c100b     819.56       +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00
c120      1042.11      1042.12 / 1043.22 / 0.66      1042.12 / 1042.40 / 0.47      1042.97 / 1043.63 / 0.37
c150      1028.42      1029.16 / 1037.34 / 4.32      1031.07 / 1037.80 / 4.15      1031.96 / 1040.96 / 4.69
c199      1291.29      1306.64 / 1313.25 / 4.48      1300.79 / 1311.48 / 5.87      1306.25 / 1314.96 / 5.71
f71       241.97       +241.97 / +241.97 / 0.00      +241.97 / +241.97 / 0.00      +241.97 / +241.97 / 0.00
f134      1162.96      +1162.96 / 1163.15 / 0.22     +1162.96 / 1163.10 / 0.14     +1162.96 / +1162.96 / 0.00
tai75a    1618.36      +1618.36 / 1618.93 / 0.94     +1618.36 / 1618.72 / 1.36     +1618.36 / 1618.84 / 1.22
tai75b    1344.62      +1344.62 / 1344.89 / 0.30     +1344.62 / 1344.98 / 0.30     +1344.62 / 1344.87 / 0.29
tai75c    1291.01      +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00
tai75d    1365.42      +1365.42 / +1365.42 / 0.00    +1365.42 / +1365.42 / 0.00    +1365.42 / +1365.42 / 0.00
tai100a   2041.34      2047.90 / 2064.77 / 10.89     2047.90 / 2062.75 / 10.90     2071.52 / 2072.69 / 2.32
tai100b   1939.90      1940.61 / 1940.81 / 0.51      1940.61 / 1940.66 / 0.04      1940.61 / 1940.65 / 0.04
tai100c   1406.20      +1406.20 / 1411.83 / 4.24     +1406.20 / 1413.08 / 3.92     +1406.20 / 1413.93 / 3.86
tai100d   1581.25      1585.07 / 1595.72 / 4.19      1586.33 / 1596.31 / 2.95      1596.97 / 1596.98 / 0.04
tai150a   3055.23      +3055.23 / 3056.71 / 1.49     +3055.23 / 3059.16 / 7.47     +3055.23 / 3057.11 / 2.78
tai150b   2656.47      2727.67 / 2732.07 / 3.62      2727.96 / 2733.10 / 3.84      2727.99 / 2736.65 / 5.41
tai150c   2341.84      2361.62 / 2376.61 / 10.83     2362.56 / 2364.83 / 2.03      2362.56 / 2379.25 / 16.35
tai150d   2645.39      2659.63 / 2667.64 / 3.31      2663.29 / 2668.31 / 2.02      2669.26 / 2669.31 / 0.14
tai385    24431.44     24540.51 / 24633.01 / 42.00   24529.96 / 24613.77 / 50.14   24540.33 / 24627.54 / 63.78
Table B.2: Quality of results of the basic MA version in long runs for the CVRP: GCECPX2, RBX, SPX (per operator: best quality, average quality, standard deviation).

instance  best-known   GCECPX2 (best / avg / dev)    RBX (best / avg / dev)        SPX (best / avg / dev)
c50       524.61       +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00      +524.61 / +524.61 / 0.00
c75       835.26       835.77 / 838.41 / 3.10        835.32 / 839.41 / 2.61        835.32 / 837.27 / 2.78
c100      826.14       827.39 / 829.32 / 1.07        827.39 / 829.82 / 1.11        +826.14 / 828.84 / 1.35
c100b     819.56       +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00      +819.56 / +819.56 / 0.00
c120      1042.11      1042.12 / 1043.58 / 0.74      1042.97 / 1043.58 / 0.44      1042.12 / 1043.17 / 1.01
c150      1028.42      1031.10 / 1040.37 / 3.53      1035.22 / 1040.33 / 3.18      1031.07 / 1039.08 / 4.18
c199      1291.29      1304.58 / 1315.07 / 6.14      1308.77 / 1317.36 / 4.73      1305.73 / 1312.05 / 3.99
f71       241.97       +241.97 / 243.13 / 1.68       +241.97 / 242.34 / 0.94       +241.97 / 242.32 / 0.87
f134      1162.96      +1162.96 / 1163.04 / 0.11     +1162.96 / 1163.14 / 0.23     +1162.96 / +1162.96 / 0.00
tai75a    1618.36      +1618.36 / 1621.62 / 2.93     +1618.36 / 1621.98 / 2.69     +1618.36 / +1618.36 / 0.00
tai75b    1344.62      +1344.62 / 1344.98 / 0.31     +1344.62 / 1344.82 / 0.26     +1344.62 / 1344.90 / 0.32
tai75c    1291.01      +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00    +1291.01 / +1291.01 / 0.00
tai75d    1365.42      +1365.42 / +1365.42 / 0.00    +1365.42 / 1365.68 / 0.25     +1365.42 / 1365.45 / 0.12
tai100a   2041.34      2071.52 / 2074.61 / 2.89      2071.52 / 2073.91 / 2.79      2047.90 / 2070.31 / 6.11
tai100b   1939.90      1940.61 / 1940.67 / 0.06      1940.61 / 1941.02 / 1.19      1940.61 / 1940.79 / 0.56
tai100c   1406.20      1406.87 / 1413.14 / 3.79      +1406.20 / 1415.48 / 2.87     1406.87 / 1413.60 / 3.37
tai100d   1581.25      1596.96 / 1598.38 / 1.62      1596.97 / 1598.87 / 3.46      1596.97 / 1597.48 / 0.75
tai150a   3055.23      +3055.23 / 3059.59 / 7.48     3055.27 / 3064.23 / 10.48     +3055.23 / 3060.05 / 7.77
tai150b   2656.47      2732.52 / 2738.00 / 3.64      2732.46 / 2738.68 / 4.77      2727.78 / 2732.02 / 2.28
tai150c   2341.84      2362.86 / 2384.87 / 14.64     2366.27 / 2395.93 / 19.61     2362.75 / 2374.49 / 15.94
tai150d   2645.39      2663.24 / 2667.17 / 3.16      2663.21 / 2668.99 / 6.53      2661.72 / 2668.01 / 2.57
tai385    24431.44     24563.92 / 24657.98 / 56.56   24575.69 / 24664.09 / 50.63   24504.19 / 24606.86 / 54.84

Table B.3: Quality of results of the basic MA version in long runs for the CarSP: MSLS, MUT (per algorithm: best quality, average quality, standard deviation).

instance     best average   MSLS (best / avg / dev)              MUT (best / avg / dev)
028X00-0065  3.0            +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0
035X60-0090  5010000.0      5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0
655X30-0219  153034000.0    153039000 / 153040133.3 / 805.5      153035000 / 153036466.7 / 1147.0
034X30-0231  8087035.8      8095038 / 8096643.2 / 949.0          8090034 / 8091045.1 / 892.8
655X30-0264  30000.0        +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0
064X30-0273  37000.0        47000 / 48866.7 / 1024.2             39000 / 40733.3 / 1388.8
028X00-0325  36341495.4     36416093 / 37685092.3 / 574255.4     36346088 / 36356554.4 / 5596.7
035X60-0376  6056000.0      +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0
048X31-0459  31077916.2     31105011 / 31107405.3 / 1949.9       31091031 / 31092657.9 / 1203.6
048X30-0519  197005.6       207046 / 208241.1 / 741.8            198016 / 199339.8 / 865.8
022X60-0704  12002003.0     +12002003 / 12002003.7 / 0.7         +12002003 / +12002003.0 / 0.0
029X30-0780  110298.4       136004 / 139673.0 / 2053.4           116003 / 118470.3 / 1258.1
064X30-0875  61187229.8     61234058 / 61237522.3 / 2185.7       61213051 / 61217655.7 / 2468.8
034X30-0921  55994.8        70627 / 72282.5 / 939.4              67631 / 70731.1 / 1621.5
025X00-0996  160407.6       179559 / 186890.8 / 4903.7           161507 / 166317.5 / 2568.1
039X30-1037  231030.0       240072 / 241608.3 / 1083.0           232055 / 234181.5 / 1016.9
039X30-1247  69239.0        69299 / 69317.3 / 8.4                69273 / 69293.5 / 12.4
023X30-1260  192466.0       239037 / 244179.6 / 2187.6           221048 / 227108.3 / 3211.9
024X30-1319  337006.0       414004 / 419537.8 / 2186.6           386009 / 392005.6 / 3384.8
Table B.4: Quality of results of the basic MA version in long runs for the CarSP: CCSPX-2, NCPX, UAX (per operator: best quality, average quality, standard deviation).

instance     best average   CCSPX-2 (best / avg / dev)           NCPX (best / avg / dev)              UAX (best / avg / dev)
028X00-0065  3.0            +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0                      +3 / +3.0 / 0.0
035X60-0090  5010000.0      5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0            5010000 / 5010000.0 / 0.0
655X30-0219  153034000.0    153035000 / 153036200.0 / 541.6      153035000 / 153036333.3 / 788.8      153035000 / 153036333.3 / 596.3
034X30-0231  8087035.8      8089046 / 8090773.6 / 1060.6         8089047 / 8091578.6 / 1147.4         8089058 / 8091375.9 / 939.7
655X30-0264  30000.0        +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0              +30000 / +30000.0 / 0.0
064X30-0273  37000.0        38000 / 40466.7 / 1203.7             39000 / 40133.3 / 1203.7             39000 / 39933.3 / 771.7
028X00-0325  36341495.4     36348083 / 36358154.7 / 5067.1       36343084 / 36357618.3 / 6345.1       36350088 / 36423154.4 / 245628.1
035X60-0376  6056000.0      +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0          +6056000 / +6056000.0 / 0.0
048X31-0459  31077916.2     31088106 / 31091802.7 / 1901.9       31090060 / 31092665.9 / 1843.2       31089066 / 31092323.7 / 1836.6
048X30-0519  197005.6       ∗196986 / 198475.8 / 717.0           197995 / 199411.2 / 885.1            198002 / 199471.1 / 954.8
022X60-0704  12002003.0     +12002003 / +12002003.0 / 0.0        +12002003 / +12002003.0 / 0.0        +12002003 / +12002003.0 / 0.0
029X30-0780  110298.4       115003 / 116803.8 / 1274.7           116003 / 117670.2 / 1247.0           116003 / 117938.5 / 1651.0
064X30-0875  61187229.8     61212055 / 61213990.2 / 1521.4       61215064 / 61218256.6 / 2101.2       61215051 / 61217123.1 / 1947.6
034X30-0921  55994.8        67593 / 70998.7 / 1768.8             67612 / 70984.2 / 1949.4             68602 / 71198.3 / 1351.9
025X00-0996  160407.6       161506 / 164982.1 / 2219.6           163516 / 167189.0 / 1811.0           164518 / 167655.0 / 2336.6
039X30-1037  231030.0       231031 / 232168.3 / 717.7            232030 / 233437.3 / 1144.0           232030 / 233246.9 / 1107.5
039X30-1247  69239.0        69275 / 69294.8 / 11.2               69278 / 69294.7 / 8.4                69279 / 69300.3 / 11.4
023X30-1260  192466.0       219045 / 224576.7 / 2678.5           222051 / 227843.0 / 2734.6           222037 / 226844.2 / 2663.8
024X30-1319  337006.0       380005 / 386005.3 / 2582.0           387007 / 390471.9 / 2247.3           384005 / 389605.1 / 2823.1

Bibliography

Aarts, E., Korst, J. H. M. & van Laarhoven, P. J. M. (2003), Simulated annealing, in Aarts & Lenstra (2003b), chapter 4.
Aarts, E. & Lenstra, J. K. (2003a), Introduction, in E. Aarts & J. K. Lenstra, eds, ‘Local search in combinatorial optimization’, chapter 1.
Aarts, E. & Lenstra, J. K., eds (2003b), Local search in combinatorial optimization, Princeton University Press.
Alba, E. & Dorronsoro, B. (2004), Solving the vehicle routing problem by using cellular genetic algorithms, in J. Gottlieb & G. R. Raidl, eds, ‘Evolutionary Computation in Combinatorial Optimization’, Vol. 3004 of LNCS, pp. 11–20.
Alba, E. & Dorronsoro, B. (2006), ‘Computing nine best-so-far solutions for capacitated VRP with a cellular genetic algorithm’, Information Processing Letters (98), 225–230.
Altenberg, L. (1995), The Schema Theorem and Price’s Theorem, in D. Whitley & M. Vose, eds, ‘Foundations of Genetic Algorithms 3’, Morgan Kaufmann, San Francisco, pp. 23–49.
Altenberg, L. (1997), Fitness distance correlation analysis: an instructive counterexample, in T. Baeck, ed., ‘Proceedings of the Seventh International Conference on Genetic Algorithms (ICGA97)’, Morgan Kaufmann, San Francisco.
Altinel, I. K. & Oncan, T. (2005), ‘A new enhancement of the Clarke and Wright savings heuristic for the capacitated vehicle routing problem’, Journal of the Operational Research Society 56, 954–961.
Aronson, L. D. (1996), Algorithms for vehicle routing - a survey, Technical Report DUT-TWI-96-21, Delft University of Technology, The Netherlands.
Błażewicz, J. (1988), Złożoność obliczeniowa problemów kombinatorycznych, Wydawnictwa Naukowo-Techniczne, Warszawa. (In Polish).
Baker, B. M. & Ayechew, M. A. (2003), ‘A genetic algorithm for the vehicle routing problem’, Computers and Operations Research 30, 787–800.
Beck, J. C. & Watson, J.-P. (2003), Adaptive search algorithms and fitness-distance correlation, in ‘MIC’2003 – 5th Metaheuristics International Conference’, Kyoto, Japan.
Bentley, J. L. (1990), Experiments on travelling salesman heuristics, in ‘Proceedings of the first annual ACM-SIAM symposium on discrete algorithms’, pp. 91–99.
Berger, J. & Barkaoui, M. (2003), ‘A new hybrid genetic algorithm for the capacitated vehicle routing problem’, Journal of the Operational Research Society 54, 1254–1262.
Bierwirth, C., Mattfeld, D. C. & Kopfer, H. (1996), On permutation representations for scheduling problems, in H.-M. Voigt, W. Ebeling, I. Rechenberg & H.-P. Schwefel, eds, ‘Parallel Problem Solving from Nature IV’, Vol. 1141 of LNCS, Springer, pp. 310–318.
Bierwirth, C., Mattfeld, D. C. & Watson, J.-P. (2004), Landscape regularity and random walks for the job shop scheduling problem, in J. Gottlieb & G. R. Raidl, eds, ‘Evolutionary Computation in Combinatorial Optimization’, Vol. 3004 of LNCS, Springer, pp. 21–30.
Boese, K. D. (1995), Cost versus distance in the traveling salesman problem, Technical Report TR-950018, UCLA CS Department.
Boese, K. D., Kahng, A. B. & Muddu, S. (1994), ‘A new adaptive multi-start technique for combinatorial global optimization’, Operations Research Letters 16(2), 101–113.
Bonissone, P. P., Subbu, R., Eklund, N. & Kiehl, T. R. (2006), ‘Evolutionary algorithms + domain knowledge = real-world evolutionary computation’, IEEE Transactions on Evolutionary Computation 10(3), 256–280.
Boryczka, U., Skinderowicz, R. & Świstowski, D. (2006), Comparative study: ACO and EC for TSP, in J. Arabas, ed., ‘Evolutionary Computation and Global Optimization 2006’, Oficyna Wydawnicza Politechniki Warszawskiej, Murzasichle, Poland.
Bronshtein, I. N., Semendyayev, K. A., Musiol, G. & Muehlig, H. (2004), Handbook of Mathematics, Springer-Verlag.
Burke, E. K., Kendall, G. & Soubeiga, E. (2003), ‘A tabu search hyperheuristic for timetabling and rostering’, Journal of Heuristics 9, 451–470.
Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P. & Schulenburg, S. (2003), Hyper-heuristics: an Emerging Direction in Modern Search Technology, in Glover & Kochenberger (2003).
Cheng, J., Lu, Y., Puskorius, G., Bergeon, S. & Xiao, J. (1999), Vehicle sequencing based on evolutionary computation, in ‘Proceedings of the 1999 Congress on Evolutionary Computation’, Vol. 2, Washington, USA, pp. 1207–1214.
Clarke, G. & Wright, J. (1964), ‘Scheduling of vehicles from a central depot to a number of delivery points’, Operations Research 12, 568–582.
Cochran, W. G. (1953), Sampling Techniques, John Wiley and Sons, New York.
Coffman, Jr, E. G., ed. (1976), Computer and Job-Shop Scheduling Theory, John Wiley and Sons.
Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990), Introduction to Algorithms, Massachusetts Institute of Technology.
Cotta, C. & Fernández, A. J. (2005), Analyzing fitness landscapes for the optimal Golomb ruler problem, in Raidl & Gottlieb (2005), pp. 68–79.
Cotta, C. & van Hemert, J., eds (2007), Evolutionary Computation in Combinatorial Optimization, Vol. 4446 of LNCS, Springer Verlag.
Culberson, J. C. (1998), ‘On the futility of blind search: an algorithmic view of ‘no free lunch’’, Evolutionary Computation 6(2), 109–127.
Cung, V.-D. (2005a), ‘Personal communication’.
Cung, V.-D. (2005b), ‘ROADEF Challenge 2005 webpage’, http://www.prism.uvsq.fr/vdc/ROADEF/CHALLENGES/2005/challenge2005 en.html, accessed September 2008.
Dorigo, M. & Stutzle, T. (2003), The Ant Colony Optimization Metaheuristic: Algorithms, Applications, and Advances, in Glover & Kochenberger (2003), chapter 9.
Duda, R. O., Hart, P. E. & Stork, D. G. (2001), Pattern classification, John Wiley and Sons.
Estellon, B., Gardi, F. & Nouioua, K. (2006), ‘Large neighbourhood improvements for solving car sequencing problems’, RAIRO Operations Research 40, 355–379.
Falkenauer, E. (1998), Genetic algorithms and grouping problems, John Wiley and Sons.
Ferguson, G. A. & Takane, Y. (1989), Statistical Analysis in Psychology and Education, McGraw-Hill.
Finger, M., Stutzle, T. & Lourenco, H. (2002), Exploiting fitness distance correlation of set covering problems, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2279 of LNCS, Springer, pp. 61–71.
Freisleben, B. & Merz, P. (1996), New genetic local search operators for the travelling salesman problem, in H.-M. Voigt, W. Ebeling, I. Rechenberg & H.-P. Schwefel, eds, ‘Parallel Problem Solving from Nature IV’, Vol. 1141 of LNCS, Springer, pp. 890–899.
Galinier, P. & Hao, J.-K. (1999), ‘Hybrid evolutionary algorithms for graph colouring’, Journal of Combinatorial Optimization 3, 379–397.
Gendreau, M. (2003), An Introduction to Tabu Search, in Glover & Kochenberger (2003), chapter 2.
Gendreau, M., Laporte, G. & Potvin, J.-Y. (2002), Metaheuristics for the capacitated VRP, in Toth & Vigo (2002b), chapter 6.
Gent, I. P. (1998), Two results on car-sequencing problems, Technical report, APES-02-1998.
Gent, I. P. & Walsh, T. (1999), CSPLib: a benchmark library for constraints, Technical report, APES-09-1999. Available from http://www.csplib.org/. A shorter version appears in the Proceedings of the 5th International Conference on Principles and Practices of Constraint Programming (CP-99).
Gillet, B. E. & Miller, L. R. (1974), ‘A heuristic algorithm for the vehicle dispatch problem’, Operations Research 22, 340–349.
Glover, F. & Kochenberger, G. A., eds (2003), Handbook of metaheuristics, Kluwer Academic Publishers.
Glover, F., Laguna, M. & Martí, R. (2003), Scatter Search and Path Relinking: Advances and Applications, in Glover & Kochenberger (2003), chapter 1.
Goldberg, D. E. (1989), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company.
Gottlieb, J., Puchta, M. & Solnon, C. (2003), A study of greedy, local search and ant colony optimization approaches for car sequencing problems, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2611 of LNCS, Springer, pp. 246–257.
Gravel, M., Gagne, C. & Price, W. L. (2005), ‘Review and comparison of three methods for the solution of the car sequencing problem’, Journal of the Operational Research Society 56, 1287–1295.
Gusfield, D. (1997), Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge University Press.
Hammond, M. (2003), ‘Chris Stephens on why EC needs a unifying theory’, EvoNet website http://evonet.lri.fr/evoweb/news events/news features/article.php?id=216, accessed January 2008.
Hansen, P. & Mladenović, N. (2003), Variable Neighbourhood Search, in Glover & Kochenberger (2003), chapter 1.
He, J., Reeves, C. R., Witt, C. & Yao, X. (2007), ‘A note on problem difficulty measures in black-box optimization: classification, realizations and predictability’, Evolutionary Computation 15(4), 435–443.
Henderson, D., Jacobson, S. H. & Johnson, A. W. (2003), The Theory and Practice of Simulated Annealing, in Glover & Kochenberger (2003), chapter 10.
Hertz, A., Taillard, E. & de Werra, D. (2003), Tabu search, in Aarts & Lenstra (2003b), chapter 5.
Ho, S. C. & Gendreau, M. (2006), ‘Path relinking for the vehicle routing problem’, Journal of Heuristics 12, 55–72.
Hoos, H. & Stutzle, T. (2004), Stochastic Local Search: Foundations & Applications, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Ishibuchi, H., Yoshida, T. & Murata, T. (2003), ‘Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling’, IEEE Transactions on Evolutionary Computation 7(2), 204–223.
Jaszkiewicz, A. (1999), ‘Improving performance of genetic local search by changing local search space topology’, Foundations of Computing and Decision Sciences 24(2), 77–84.
Jaszkiewicz, A. (2004), Adaptation of the genetic local search algorithm to the management of earth observation satellites, in ‘Seventh National Conference on Evolutionary Computation and Global Optimization’, Kazimierz Dolny, Poland, pp. 67–74.
Jaszkiewicz, A. & Kominek, P. (2003), ‘Genetic local search with distance preserving recombination operator for a vehicle routing problem’, European Journal of Operational Research 151, 352–364.
Jaszkiewicz, A., Kominek, P. & Kubiak, M. (2004), Adaptation of the genetic local search algorithm to a car sequencing problem, in ‘Seventh National Conference on Evolutionary Computation and Global Optimization’, Kazimierz Dolny, Poland, pp. 75–82.
Jones, T. & Forrest, S. (1995), Fitness distance correlation as a measure of problem difficulty for genetic algorithms, in L. J. Eshelman, ed., ‘Proceedings of the 6th International Conference on Genetic Algorithms’, Morgan Kaufmann, pp. 184–192.
Jozefowiez, N., Semet, F. & Talbi, E.-G. (2007), ‘Target aiming pareto search and its application to the vehicle routing problem with route balancing’, Journal of Heuristics 13(5), 455–469.
Karoński, M. & Palka, Z. (1977), ‘On Marczewski-Steinhaus type distance between hypergraphs’, Applicationes Mathematicae 16(1), 47–57.
Kindervater, G. A. P. & Savelsbergh, M. W. P. (2003), Vehicle routing: handling edge exchanges, in Aarts & Lenstra (2003b), chapter 10, pp. 337–360.
Kirkpatrick, S. & Toulouse, G. (1985), ‘Configuration space analysis of traveling salesman problems’, Journal de Physique 46, 1277–1292.
Kis, T. (2004), ‘On the complexity of the car sequencing problem’, Operations Research Letters 32, 331–335.
Kominek, P. (2001), Zastosowanie algorytmów metaheurystycznych sterowanych wiedzą do rozwiązywania złożonych problemów optymalizacji kombinatorycznej, PhD thesis, Poznan University of Technology, Poland. (In Polish).
Krasnogor, N. & Smith, J. (2005), ‘A tutorial for competent memetic algorithms: model, taxonomy, and design issues’, IEEE Transactions on Evolutionary Computation 9(5), 474–488.
Krysicki, W., Bartos, J., Dyczka, W., Królikowska, K. & Wasilewski, M. (1998), Rachunek prawdopodobieństwa i statystyka matematyczna w zadaniach. Część II. Statystyka matematyczna, PWN, Warszawa. (In Polish).
Kubiak, M. (2002), Genetic local search algorithm for the vehicle routing problem, Master's thesis, Poznan University of Technology, Poznan, Poland. (In Polish and English).
Kubiak, M. (2004), ‘Systematic construction of recombination operators for the vehicle routing problem’, Foundations of Computing and Decision Sciences 29(3), 205–226.
Kubiak, M. (2005), Distance metrics and fitness-distance analysis for the capacitated vehicle routing problem, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 603–610.
Kubiak, M. (2006), Analysis of distance between vehicle routing problem solutions generated by memetic algorithms, in J. Arabas, ed., ‘Evolutionary Computation and Global Optimization’, number 156 in ‘Prace Naukowe, Elektronika’, Oficyna Wydawnicza Politechniki Warszawskiej, pp. 223–236.
Kubiak, M. (2007), Distance measures and fitness-distance analysis for the capacitated vehicle routing problem, Operations Research/Computer Science Interfaces, Springer, chapter 18, pp. 345–364.
Kubiak, M., Jaszkiewicz, A. & Kominek, P. (2006), ‘Fitness-distance analysis of a car sequencing problem’, Foundations of Computing and Decision Sciences 31(3–4), 263–276.
Kubiak, M. & Wesołek, P. (2007), Accelerating local search in a memetic algorithm for the capacitated vehicle routing problem, in Cotta & van Hemert (2007), pp. 96–107.
Kytojoki, J., Nuortio, T., Braysy, O. & Gendreau, M. (2007), ‘An efficient variable neighbourhood search heuristic for very large scale vehicle routing problems’, Computers and Operations Research 34, 2743–2757.
Laporte, G. & Semet, F. (2002), Classical heuristics for the Capacitated VRP, in Toth & Vigo (2002b), chapter 5.
Larose, D. T. (2005), Discovering Knowledge in Data. An Introduction to DATA MINING, John Wiley and Sons.
Lewis, R. & Paechter, B. (2004), New crossover operators for timetabling with evolutionary algorithms, in A. Lofti, ed., ‘Proceedings of the 5th International Conference on Recent Advances in Soft Computing (RASC 2004)’, pp. 189–195.
Lewis, R. & Paechter, B. (2005a), Application of the grouping genetic algorithm to university course timetabling, in Raidl & Gottlieb (2005), pp. 144–153.
Lewis, R. & Paechter, B. (2005b), An empirical analysis of the grouping genetic algorithm: the timetabling case, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Manly, B. F. J. (1997), Randomization, bootstrap and Monte Carlo methods in biology, Chapman and Hall, London.
Mantel, N. (1967), ‘The detection of disease clustering and a generalized regression approach’, Cancer Research 27(1), 209–220.
Marczewski, E. & Steinhaus, H. (1958), ‘On a certain distance of sets and the corresponding distance of functions’, Colloquium Mathematicum 6, 319–327.
Mattfeld, D. C., Bierwirth, C. & Kopfer, H. (1999), ‘A search space analysis of the job shop scheduling problem’, Annals of Operations Research 86, 441–453.
Mattiussi, C., Waibel, M. & Floreano, D. (2004), ‘Measures of diversity for populations and distances between individuals with highly reorganizable genomes’, Evolutionary Computation 12(4), 495–515.
Merz, P. (2000), Memetic Algorithms for Combinatorial Optimization Problems: Fitness Landscapes and Effective Search Strategies, PhD thesis, University of Siegen, Germany.
Merz, P. (2001), On the performance of memetic algorithms in combinatorial optimization, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Merz, P. (2002), A comparison of memetic recombination operators for the traveling salesman problem, in ‘Genetic and Evolutionary Computation Conference (GECCO)’.
Merz, P. (2004), ‘Advanced fitness landscape analysis and the performance of memetic algorithms’, Evolutionary Computation 12(3), 303–325.
Merz, P. & Freisleben, B. (1999), Fitness landscapes and memetic algorithm design, in D. Corne, M. Dorigo & F. Glover, eds, ‘New Ideas in Optimization’, McGraw-Hill, chapter 3, pp. 245–260.
Merz, P. & Freisleben, B. (2000a), ‘Fitness landscape analysis and memetic algorithms for the quadratic assignment problem’, IEEE Transactions on Evolutionary Computation 4(4), 159–164.
Merz, P. & Freisleben, B. (2000b), ‘Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning’, Evolutionary Computation 8(1), 61–91.
Michalewicz, Z. (1996), Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag.
Michalewicz, Z. & Fogel, D. B. (2000), How to Solve It: Modern Heuristics, Springer-Verlag.
Moscato, P. & Cotta, C. (2003), A Gentle Introduction to Memetic Algorithms, in Glover & Kochenberger (2003), chapter 5.
Muhlenbein, H. (1991), Evolution in time and space - the parallel genetic algorithm, in G. J. E. Rawlins, ed., ‘Foundations of Genetic Algorithms’, Morgan Kaufmann.
Muhlenbein, H. (2003), Genetic algorithms, in Aarts & Lenstra (2003b), chapter 6.
Nagata, Y. (2007), Edge assembly crossover for the capacitated vehicle routing problem, in Cotta & van Hemert (2007), pp. 142–153.
Nagata, Y. & Kobayashi, S. (1997), Edge assembly crossover: a high-power genetic algorithm for the traveling salesman problem, in ‘Proceedings of the 7th International Conference on Genetic Algorithms’, pp. 450–457.
Nguyen, A. (2003), ‘Challenge ROADEF 2005 car sequencing problem’, http://www.prism.uvsq.fr/vdc/ROADEF/CHALLENGES/2005/challenge2005 en.html.
Pawlak, G. (2007), ‘Personal communication’. A note during a seminar of the Institute of Computing Science, Poznan University of Technology, Poland.
Pawlak, M. (1999), Algorytmy ewolucyjne jako narzędzie harmonogramowania produkcji, Wydawnictwo Naukowe PWN, Warszawa. (In Polish).
Potvin, J.-Y. & Bengio, S. (1996), ‘The vehicle routing problem with time windows part II: genetic search’, INFORMS Journal of Computing 8(2), 165–172.
Prins, C. (2001), A simple and effective evolutionary algorithm for the vehicle routing problem, in J. P. de Sousa, ed., ‘MIC’2001 – 4th Metaheuristics International Conference’, pp. 143–147.
Prins, C. (2004), ‘A simple and effective evolutionary algorithm for the vehicle routing problem’, Computers and Operations Research 31, 1985–2002.
Puchta, M. & Gottlieb, J. (2002), Solving car sequencing problems by local optimization, in S. Cagnoni et al., eds, ‘Applications of Evolutionary Computing’, Vol. 2279 of LNCS, Springer.
Qu, R. & Burke, E. K. (2005), Hybrid variable neighbourhood hyperheuristics for exam timetabling problems, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 781–786.
Raidl, G. R. & Gottlieb, J., eds (2005), Evolutionary Computation in Combinatorial Optimization, Vol. 3448 of LNCS, Springer-Verlag.
Rao, C. R. (1989), Ramanujan Memorial Lectures. Statistics and Truth. Putting Chance to Work, Council of Scientific and Industrial Research, New Delhi, India.
Reeves, C. R. (1999), ‘Landscapes, operators and heuristic search’, Annals of Operations Research 86, 473–490.
Reeves, C. R. (2003), Genetic Algorithms, in Glover & Kochenberger (2003), chapter 3.
Reeves, C. R. & Rowe, J. E. (2003), Genetic algorithms: principles and perspectives: a guide to GA theory, Kluwer Academic Publishers.
Reeves, C. R. & Yamada, T. (1998), ‘Genetic algorithms, path relinking, and the flowshop sequencing problem’, Evolutionary Computation 6(1), 45–60.
Reimann, M., Stummer, M. & Doerner, K. (2002), A savings based ant system for the vehicle routing problem, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 1317–1326.
Remde, S., Cowling, P., Dahal, K. & Colledge, N. (2007), Exact/heuristic hybrid using rVNS and hyperheuristics for workforce scheduling, in Cotta & van Hemert (2007), pp. 188–197.
Resende, M. G. C. & Ribeiro, C. C. (2003), Greedy Randomised Adaptive Search Procedures, in Glover & Kochenberger (2003), chapter 8.
Ribeiro, C. C., Aloise, D., Noronha, T. F., Rocha, C. & Urrutia, S. (2005), A heuristic for a real-life car sequencing problem with multiple requirements, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 799–804.
Robardet, C. & Feschet, F. (2000), A new methodology to compare clustering algorithms, in K.-S. Leung, L.-W. Chan & H. Meng, eds, ‘Intelligent Data Engineering and Automated Learning - IDEAL 2000. Data Mining, Financial Engineering, and Intelligent Agents’, Vol. 1983 of LNCS, Springer.
Rochat, Y. & Taillard, E. D. (1995), ‘Probabilistic diversification and intensification in local search for vehicle routing’, Journal of Heuristics 1, 147–167.
Schiavinotto, T. & Stutzle, T. (2007), ‘A review of metrics on permutations for search landscape analysis’, Computers and Operations Research 34, 3143–3153.
Schneider, J., Britze, J., Ebersbach, A., Morgenstern, I. & Puchta, M. (2000), ‘Optimization of production planning problems - a case study for assembly lines’, International Journal of Modern Physics C 11(5), 949–972.
Schumacher, C., Vose, M. D. & Whitley, L. D. (2001), The no free lunch and problem description length, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Morgan Kaufmann, pp. 565–570.
Słowiński, R. (1984), ‘Preemptive scheduling of independent jobs on parallel machines subject to financial constraints’, European Journal of Operational Research 15(3), 366–373.
Sorensen, K. (2003), Distance measures based on the edit distance for permutation-type representations, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Chicago.
Sorensen, K. (2007), ‘Distance measures based on the edit distance for permutation-type representations’, Journal of Heuristics 13(1), 35–47.
Sorensen, K., Reimann, M. & Prins, C. (2005), Path relinking for the vehicle routing problem using the edit distance, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 839–846.
Soubeiga, E. (2003), Development and application of hyperheuristics to personnel scheduling, PhD thesis, University of Nottingham, United Kingdom.
Taillard, E. (1993), ‘Parallel iterative search methods for vehicle routing problems’, Networks 23, 661–673.
Taillard, E. (2008), ‘Instances of the capacitated vehicle routing problem’, http://mistic.heig-vd.ch/taillard/problemes.dir/vrp.dir/vrp.html, last access April 2008.
Tavares, J., Pereira, F. B., Machado, P. & Costa, E. (2003), Crossover and diversity: a study about GVR, in ‘Proceedings of the Analysis and Design of Representations and Operators (ADORO) workshop, Genetic and Evolutionary Computation Conference (GECCO)’.
Terada, J., Vo, H. & Joslin, D. (2006), Combining genetic algorithms with squeaky-wheel optimization, in ‘Genetic and Evolutionary Computation Conference (GECCO)’, Seattle, USA.
Tezuka, M., Hiji, M., Miyabayashi, K. & Okumura, K. (2000), A new genetic representation and common cluster crossover for job shop scheduling problem, in S. Cagnoni et al., eds, ‘Real-World Applications of Evolutionary Computing’, Vol. 1803 of LNCS, Springer, pp. 297–306.
Toth, P. & Vigo, D. (2002a), An overview of vehicle routing problems, in The Vehicle Routing Problem (Toth & Vigo 2002b), chapter 1.
Toth, P. & Vigo, D., eds (2002b), The Vehicle Routing Problem, SIAM, Philadelphia.
Tuson, A. (2005), ‘Are evolutionary metaphors applicable to evolutionary optimisation?’, presented at the 14th Young Operational Research Conference (YOR14), Bath, UK.
Warwick, T. & Tsang, E. (1995), ‘Tackling car sequencing problems using a generic genetic algorithm’, Evolutionary Computation 3(3), 267–298.
Watson, J.-P. (2005), On metaheuristics ‘failure modes’: a case study in tabu search for job-shop scheduling, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 910–915.
Watson, J.-P., Barbulescu, L., Whitley, L. D. & Howe, A. E. (2002), ‘Contrasting structured and random permutation flow-shop scheduling problems: search-space topology and algorithm performance’, INFORMS Journal on Computing 14, 98–123.
Watson, J.-P., Beck, J. C., Howe, A. E. & Whitley, L. D. (2003), ‘Problem difficulty for tabu search in job-shop scheduling’, Artificial Intelligence 143, 189–217.
Weiss, D. (2006), Descriptive clustering as a method for exploring text collections, PhD thesis, Poznan University of Technology, Poznań, Poland.
Whitley, D., Starkweather, T. & Fuquay, D. (1989), Scheduling problems and traveling salesman: the genetic edge recombination operator, in ‘Proceedings of the Third International Conference on Genetic Algorithms’, Morgan Kaufmann, pp. 133–140.
Whitley, D. & Watson, J. P. (2006), Complexity theory and the No Free Lunch Theorem, Springer-Verlag, chapter 11, pp. 317–339.
Wolpert, D. H. (2005), ‘Personal communication during the 2005 IEEE Congress on Evolutionary Computation, Edinburgh, UK’.
Wolpert, D. H. & Macready, W. G. (1997), ‘No free lunch theorems for optimization’, IEEE Transactions on Evolutionary Computation 1(1), 67–82.
Woodruff, D. L. & Lokketangen, A. (2005), Similarity and distance functions to support VRP metaheuristics, in ‘MIC’2005 – 6th Metaheuristics International Conference’, Vienna, Austria, pp. 929–933.
Zhang, C., Li, P., Rao, Y. & Li, S. (2005), A new hybrid GA/SA algorithm for the job shop scheduling problem, in Raidl & Gottlieb (2005), pp. 246–259.
Zinflou, A. (2008), ‘Personal communication’.
Zinflou, A., Gagné, C. & Gravel, M. (2007), Crossover operators for the car sequencing problem, in Cotta & van Hemert (2007), pp. 229–239.

© 2009 Marek Kubiak
Poznan University of Technology
Faculty of Computer Science and Management
Institute of Computing Science
Typeset using LaTeX in Computer Modern.
BibTeX:

@phdthesis{key,
  author  = "Marek Kubiak",
  title   = "{Fitness-distance Analysis for Adaptation of a Memetic Algorithm to Two Problems of Combinatorial Optimisation}",
  school  = "Poznan University of Technology",
  address = "Pozna{\'n}, Poland",
  year    = "2009",
}