Aalborg University Copenhagen
Department of Medialogy
Semester: 5th semester
Semester Coordinator: Henrik Schønau Fog
Secretary: Judi Stærk Poulsen
Phone: 9940 2468
Title: SmileReactor
Project Period: 2nd September 2008 – 18th December 2008
Semester Theme: Audiovisual Experiments
Supervisor(s): Daniel Grest (Main supervisor), Henrik Schønau Fog
Project group no.: 08ml582
Members: Sune Bagge, Poul Martin Winther Christensen, Kim Etzerodt, Mikkel Berentsen Jensen, Mikkel Lykkegaard Jensen, Heino Jørgensen
Copies: 3
Pages: 238 including Appendix
Finished: 18th December 2008

Aalborg University Copenhagen, Lautrupvang 15, 2750 Ballerup, Denmark
judi@media.aau.dk
https://internal.media.aau.dk/

Abstract: This project concerns movie playback that reacts to when and how much the user is smiling, in order to choose a type of humor which the user finds amusing. By analyzing various techniques of character modeling, animation and types of humor, three different movie clips have been created. Techniques of facial expression detection, and in particular smile detection, have been analyzed and implemented into a program which plays the movie clips according to ratings gained from the detection of the user's facial expressions. The final solution has been tested with regard to smile detection rate, user preferences, and the timing of user reactions in relation to the intended humorous parts of the movie clips. After the conclusion of the project, a discussion of the relevance and execution of the project is presented. The project is put into perspective in the chapter Future Perspectives, where proposals for further development and alternative approaches within the field of passive interaction are described and evaluated.

Copyright © 2008. This report and/or appended material may not be partly or completely published or copied without prior written approval from the authors. Neither may the contents be used for commercial purposes without this written approval.

Preface

This project has been developed during the 5th semester of Medialogy at Aalborg University Copenhagen, in the period from the 1st of September 2008 to the 18th of December 2008. The project is based on the semester theme defined as: Audiovisual Experiments – Computer Graphics and Animation. The product is based on facial reading – more specifically smile detection – in order to control the playback of pre-rendered animated movie clips. This report will document and discuss the steps of the production process that the group went through during the project period. The report covers issues such as modeling, animation, rendering and programming in C++. It should be noted that all sounds used in the implementation are copyrighted to their respective owners.

We want to thank stud.scient. in Computer Vision Esben Plenge for valuable feedback on the smile detection method.

Reader's Guide

The report is organized in the following chapters: Pre-Analysis, Analysis, Design, Implementation, Testing, Discussion, Conclusion and Future Perspectives. Each chapter is briefly described at its start and features a sub-conclusion at its end. Concerning the Analysis, Design and Implementation chapters, the sub-chapters are ordered according to the product flow, meaning that first the movie clips are shown, and then the program registers a smile and controls further movie clip playback.
Therefore, these chapters are structured in a similar manner, with the topics concerning the creation of the movies discussed first, followed by the topics regarding the creation of the program. The header contains the current chapter number, chapter name and sub-chapter name on the right side, to always provide a point of reference. The footer contains the current page number in the outermost corner. Furthermore, when a notation such as "Expo 67¹" appears, a short explanation of the notation is given in the footer.

The APA quotation standard is used when referring to sources and quoting: "To be, or not to be, that is the question" (Shakespeare, 1603). The full list of sources can be found in chapter 10 Bibliography. When referring to other chapters or illustrations, the text is in italics, as can be seen in the line above. Illustrations are numbered X.Y, where X is the chapter number and Y is the number of the illustration within that chapter. Illustration text is easily distinguishable because the font is blue and the font size is smaller than the body text, which is set in the TrueType font "Cambria" in size 12. Code examples and variables are formatted with Courier New, and code examples are replaced with pseudo-code when the implementation is too long to cite in the report. A full list of illustrations with sources can be found in chapter 11 Illustration List. Chapter 12 Appendix contains all storyboards, implementation code and additional graphs and data which are not shown in the report itself. Additional data which is too large to be represented in the appendix can be found on the enclosed CD-ROM.

Content

1. Introduction ..... 1
1.1 Motivation ..... 1
1.2 State of the art ..... 3
1.3 Initial Problem Formulation ..... 6
2. Pre-analysis ..... 7
2.1 Narrowing down emotions ..... 7
2.2 Storytelling in a movie ..... 11
2.3 Rendering methods ..... 14
2.4 Target group ..... 18
2.5 Testing the product ..... 19
2.6 Delimitation ..... 21
2.7 Final Problem Formulation ..... 22
3. Analysis ..... 23
3.1 Delimitation of the character ..... 23
3.2 Narrative structures ..... 24
3.3 Lighting, sound and acting ..... 28
3.4 Camera movements and angles ..... 32
3.5 Humor ..... 39
3.6 A reacting narrative ..... 45
3.7 Programming languages ..... 47
3.8 Smile detection ..... 49
3.9 Analysis Conclusion ..... 56
4. Design ..... 58
4.1 Character Design ..... 58
4.2 Humor types ..... 71
4.3 Drafty Storyboards ..... 73
4.4 Animation ..... 82
4.5 Smile Detection Program ..... 106
4.6 Design conclusion ..... 111
5. Implementation ..... 113
5.1 Modeling ..... 113
5.2 Rigging ..... 117
5.3 Animation tools ..... 129
5.4 Smile Picture Capture ..... 138
5.5 Training program ..... 140
5.6 Smile detection ..... 144
5.7 Implementation Conclusion ..... 149
6. Testing ..... 150
6.1 Cross Validation Test ..... 150
6.2 DECIDE framework ..... 154
6.3 Analyzing the results ..... 158
6.4 Conclusion of the test ..... 169
7. Discussion ..... 171
8. Conclusion ..... 172
9. Future perspective ..... 175
10. Bibliography ..... 176
11. Illustration List ..... 182
12. Appendix ..... 185
12.1 Storyboards ..... 185
12.2 Implementation Code ..... 197
12.3 Forms ..... 225
12.4 Test Results ..... 228

1. Introduction

The introduction of this report will serve as a guide through the first chapter. After this short introduction, the motivation of the problem will be presented.
The motivation explains why the group has found the problem interesting to work with and why it has been chosen for the project. The next step will be to examine other work in this area, not only to achieve inspiration for the product of this report, but furthermore to avoid creating a product which is merely a duplicate of an already existing product. Lastly, the information uncovered in this chapter will assist the group in formulating the initial problem formulation for the project, which will become the basis on which all further research is conducted. 1.1 Motivation The motivation for the problem of the project began with a simple thought: Imagine yourself or any person you know watch a movie - be it in the cinema, the theater or the TV. What is the most prominent action you imagine that person performing while experiencing this movie? There is a good chance the answer will be: “Not really anything”. Because that is often what happens when regular people sits down and watches a movie play out in front of them. Normally we take in the movie through our eyes and ears – and maybe feeling it somewhat, when we cry or get goose bumps – but it all takes place with us sitting perfectly still and doing nothing to contribute to the cause of the story. Usually, if the story of a movie is good enough, it is easy for us to be swept away and immerse our selves fully in it. Many people can imagine a person watching a scary movie and being so enticed by it, that they begin shouting advice like: “NO – don’t go in there!”, to the actor on the screen, completely oblivious to the fact that it will not influence the movie in any way whatsoever. Or maybe we can recognize the sensation of watching a movie that actually provokes such reactions from us, such as tears, joy or goose bumps. But all these influences are merely one-way. Everything that happens between the movie and us as the audience is that the movie can affect us in some way – it is not the other way around. 1 Group 08ml582 Reactive movie playback 1. Introduction Motivation What this project aims to establish, is the possibility for the audience to influence the movie, rather than only having the movie influence the audience. When considering what type of influence to have over the movie, the goal is to avoid rather simple interactions such as pressing either a red button or a blue button at a given time in the movie. Such a way of influencing a story exists already and we have to look no further than to popular mainstream videogames, such as Mass Effect (Bioware, 2007) or Metal Gear Solid (Konami, 1998) to find a story that is influenced by simple button presses. Illustration 1.1: The game Mass Effect uses user interaction – by pressing buttons – to alter the narrative of the game. This project aims at incorporating a more subtle form of influence over a story. Referring back to the example of the person shouting warnings at the character in the scary movie: What if the fear and dread experienced by this person actually determined what would happen in the story from that point on? Or what if a character in a movie came to a crossroad – facing a choice of whether or not to help another character out of a sticky situation – and would make his choice based on how the audience felt about this other character at this time in the movie? This all fall under the attempt to have the users’ emotion influence the story. 
By focusing on emotions and not simple button presses affecting how the story plays out, it is possible to let the user experience the story based on how he feels and not by what he does or simply how the director wanted the story to play out to begin with. It presents the opportunity of having a user alter the course of the movie, maybe without them even noticing or maybe not noticing until he watches the movie again and unconsciously moves the story in a different direction. This leads to many interesting questions to investigate, such as how personal the movie feels to the user, how the user would react to experiencing the movie a second time, having it 2 Group 08ml582 Reactive movie playback 1. Introduction State of the art change based on his emotions, but without him knowing about it and generally creates a great many possibilities to have the movie and the user influence and interact with each other in ways that the user might not expect or even notice. And this is why making the users’ emotions having an influence over the story forms a strong motivation and starting point for this project. The task is now to convert this motivation into a suitable project problem. 1.2 State of the art In order to know what has been done already in the area of interactive movies and measurements of users’ emotions, this chapter will look at previous attempts at doing interactive movies and examine what has been done in the area of facial expression detection, in order to interpret the emotions of a user. It is important to determine which ideas have already been realized in the area of interactive movies, since the making of a product mimicking the functions or ideas of a previous product is futile, offering nothing new to examine, test and analyze. Rather, it would be prudent to – at most – take inspiration from previous products but still ensure that the final product of this project will be unique and new, so that it provides a reason to test and evaluate it. 1.2.1 Interactive movies True interactive movies have existed in various forms for more than 40 years, but it is still a technology, which is not widely used when producing movies for the general cinema audience. In 1967, the Czechoslovakian director Radúz Činčera directed the movie Kinoautomat: One Man and His House, which was a black comedy movie, presented at the Expo 67 1; the movie was created to run on a system developed by Činčera, called Kinoautomat and made it the worlds’ first interactive movie (Kinoautomat, 2007). The movie was produced such that at certain points during the movie, a moderator would appear on stage in front of the audience and ask how the movie should develop from this point on. The audience had before them a red and a green button and could choose one of the options presented to them by pressing the corresponding button. The movie would develop according to how the majority of the audience voted. The movie was well received at the Expo 67 and the New Yorker wrote: The World Fair of Montreal 1967. A cultural exhibition held from the 27th of April to the 29th of October 1967 and received more than 50 million visitors in that period (Expo 67, 2007) 1 3 Group 08ml582 Reactive movie playback 1. 
Introduction State of the art “The Kinoautomat in the Czechoslovak Pavilion is a guaranteed hit of the World Exposition, and the Czechs should build a monument to the man who conceived the idea, Raduz Cincera.” (Kinoautomat, 2007) Furthermore, the movie was revived in 2007, by Činčera’s daughter Alena Cincerova (Cincerova, 2007). Despite the success back in 1967, the system was never widely used and interactive movies did not experience a great development in the following years. In 1995, the company Interfilm Technologies claimed to be the first to produce an interactive movie for viewing in cinemas, when they presented the movie Mr. Payback (Redifer, 2008). However, this movie was based on exactly the same principles as Činčera did in 1967. Mr. Payback was also made such that the cinemas showing it were required to have special equipment installed in their seats, in order for the audience to interact with the movie. Compared to Činčera’s movie, with a running time of 63 minutes, Mr. Payback would only reach a maximum playtime of approximately 20 minutes (Redifer, 2008). The movie never turned out to be a great success, and even though Interfilm Technologies did produce a couple of more of these interactive movies, the company was quickly shut down. With the development of personal computers, it has become possible to show interactive movies for people in their own homes, meaning that the interactive movie becomes more like a game, where the user can personally decide the branching of the story, instead of being “overruled” by other members of an audience. This has lead to experiments within interactive movies and it can be difficult to differ between an interactive movie and a game. One of the earliest approaches to an interactive movie as a computer game was the game called Rollercoaster from 1982 (NationMaster, 2005). The game was based on the movie of the same name, released in 1977. It was a purely text based game, a game which use text instead of graphics, describing everything in the game with words, but with the ability of triggering playback from a laserdisc, showing parts of the actual Rollercoaster movie. In 1983, the game Dragon’s Lair was released, which was the first commercial interactive movie game (Biordi, 2008). Opposite of a regular interactive movie, Dragon’s Lair had a winning and a losing condition, making choices in the game more like challenges than plain choices. Dragon’s Lair 4 Group 08ml582 Reactive movie playback 1. Introduction State of the art was quite successful, partially because of the great quality of the cartoon - produced by previous Disney animator Don Bluth - on which the game was built (NationMaster, 2005). A later experiment is the interactive drama Façade, which is an attempt to produce an interactive movie that involves the user as an actual actor in the drama that Façade is based on. Façade is developed by Andrew Stern and Michael Mateas, who together call themselves Procedural Arts. Andrew Stern is a designer, researcher, writer and engineer and has worked on other award winning software such as Virtual Babyz, Dogz and Catz (Stern & Mateas, 2006). Michael Mateas is an assistant professor at Georgia Tech and has worked on expressive AI in projects such as Office Plant #1 and Terminal Time (Stern & Mateas, 2006). Stern and Mateas claims to focus on people who are not used to playing computer games, but are more used to watching movies in cinemas and going to the theatre (Procedural Arts, 2006). 
1.2.2 Facial Expression Detection Having seen how previous attempts at interactive movies have been conducted, it is necessary to look at how it is possible to use other forms of interaction methods to let the user interact with a movie. As mentioned in chapter 1.1 Motivation one way could be to somehow measure the emotions of the audience. In this chapter, recent approaches to facial expression detection will be discussed, to give an idea of how to detect a user’s reaction to certain situations. In 2007 Sony introduced the groundbreaking “Smile Shutter” function for low budget pocket cameras. The cameras used a proprietary algorithm to make the camera shoot only when a smile was detected. The cameras also recognize faces to guide the autofocus function, making the cameras even more suitable and nearly infallible for point-and-shoot situations. The technology has also been integrated into video cameras, so that the camera only records when people are smiling (Sony Corp., 2007). Facial expressions often convey massive amounts of information about the mood and emotional status of a person. Since every human face is different, the detection of facial features such as eyes and mouth has been one of the major issues of image processing. Due to the complexity of the face, it is complicated to map all muscles and skin to an equation. Furthermore, the difference between human faces regarding colors, sizes, shapes, muscle contraction paths and hair growth makes the identification even more complex. Advanced face identification uses so-called hybrid approaches combining advanced neural networks, 5 Group 08ml582 Reactive movie playback 1. Introduction Initial Problem Formulation which can “learn” to detect expressions by training (which will be elaborated in chapter 3.8.1 Training). While neural networks will not be described in this report due to the semester theme, it is unavoidable when describing state of the art technology in this field. More simple approaches involve large, tagged photo libraries which are being feature-matched with the pictures to be checked for facial expressions. 1.3 Initial Problem Formulation Based on what has been researched in the area of interactive movies and face recognition, an initial problem formulation can be described. What was discovered through this chapter was that most interactive movies previously created, were based on users interacting through pressing buttons, at which points the movies were paused, to ask the users of their opinions. With the modern technologies within the field of face detection, there might be an opportunity to create a new form of interactive movies, which would not require the movie to pause at important dramatic situations. Thus, the following initial problem formulation for this project can be given: How can we create an interactive, uninterrupted movie, which is controlled by the users’ emotions? 6 Group 08ml582 Reactive movie playback 2. Pre-analysis Narrowing down emotions 2. Pre-analysis Having found an initial problem formulation which is kept general by default, the task is now to narrow it down and make it as specific as possible in order to have a focused goal on which to base the further work of the project. First off, emotions will be further specified and narrowed down to one specific emotion. Second, it will be determined how this emotion will be measured (facial expression, heart-beat measurement etc.). Overall elements of telling a story will be examined, such as defining the narrative, introducing Mise-en-scène. 
The method of implementation will be discussed, by comparing methods such as pre-rendering with run time rendering. The target group for the project will be discussed and determined in order to exclude inappropriate test participants. Finally the testing method for the project will be introduced and discussed, in order to explain what tests the product will undergo. 2.1 Narrowing down emotions Having decided upon using emotions as the method of user interaction with a story and all the elements within this story, the task is now to narrow the choices down and determine exactly which emotion to focus on. Because simply using “emotions” is far too wide a term, being that human beings are capable of displaying joy, fear, boredom, curiosity, depression, jealousy, bewilderment and many, many others, with new emotions continuously being discovered (The Onion, 2007). Therefore, aiming to include a wide variety of different emotions in this project presents the risk of completely eliminating any possibility of finishing on time. Before deciding which emotion should influence the narrative, it would be prudent to briefly discuss how to discover or measure which emotion the user is portraying – how to read the user. Should it be by measuring the pulse or heartbeat? The amount of perspiration of the user? Reading the facial expression or maybe even a combination of these methods? What would be preferred is not to require the user to undergo a long preparation process of attaching various sensors to his body, in order to conduct various body-examinations either during or after watching the movie. Requiring this of the user, produces the risk that he will become distracted from watching the actual movie, or maybe be too aware of sensors and the measuring equipment to be able to fully concentrate on the movie itself. Or he might simply lose interest in the movie, if too much preparation is required. Instead, a far better alternative 7 Group 08ml582 Reactive movie playback 2. Pre-analysis Narrowing down emotions would be to simply include non-intrusive measuring and have the user watch and interact with the movie in a way that ensures he can focus on the movie and nothing else. And so, the project will focus on reading the face of the user and read his emotions that way. But while many emotions are deeply connected to e.g. the heartbeat, such as stress or excitement (Bio-Medicine.org, 2007), there is still many different emotions that can be read from the face alone. These include among others happiness, sadness, disgust and surprise (Hager, 2003). When looking to mediate an effect to a user, it is prudent to ask: Do we want to make people happy, sad, bored, disgusted etc? A large part of this project will be to evoke an emotional response in the user and as it can be seen in chapter 1.1 Motivation, TV and movies are very adept at doing exactly this. When looking at the emotions found at the website: A Human Face – Emotion and Facial Expression (Hager, 2003), such as happiness, fear or disgust, and comparing them to movies and TV, there are generally two of the emotions that stand out: Happiness and fear. Two entire movie-genres focus almost entirely on these emotions: The comedy and the horror movie genre. A few well-known examples of comedies, are Ace Venture: Pet Detective (Shadyac, 1994) and Monty Python and the Holy Grail (Gilliam & Jones, 1975), while famous examples of horror films are such films as The Shining (Kubrick, 1980) and Psycho (Hitchcock, 1960). 
Happiness is also heavily featured on TV, with many sitcoms showing, focusing solely on making people laugh. Shows like Friends (Crane & Kaufman, 1994), Seinfeld (Larry & Seinfeld, 1990) and Everybody Loves Raymond (Rosenthal, 1996) are very popular examples of sitcoms. And even if happiness is not the main purpose of the production, a multitude of movies and TV-shows include jokes and humor. Horror is also featured in TV-series, although somewhat less prominently. Shows like The Scariest Places on Earth (Conrad & Kroopnick, 2001) Are You Afraid of the Dark (Peters & Pryce, 1991) focus primarily of horror and suspense and are examples of horror as the main goal on TV-shows. 8 Group 08ml582 Reactive movie playback 2. Pre-analysis Narrowing down emotions When looking at the other emotions described by Hager, such as anger or disgust, few movies or TV-series exist that focus mainly on making people angry or disgusted, with a movie like Hostel (Roth, 2005) being one of few movies focusing on disgusting the audience to achieve an effect. And while the element of surprise play a big role in movies like The Usual Suspects (Singer, 1994) and The Game (Fincher, The Game, 1997), focusing on the particular emotions of anger or disgust occurs with far less frequency than fear or happiness. Based on the spread of these emotions in screen media, it is reasonable to suggest that fear and happiness are the two most widely used emotions to focus on. And so, the final choice comes down to either one of these. Looking at a user of the product of this project, it should not be a requirement to invest a long time to be able to naturally interact with the product. What is meant by this can be explained with how either happiness or fear is achieved: When considering how to achieve fear in a movie, suspense is often involved, e.g. when watching the movie and knowing that danger lurks nearby. But it is not often that fear is achieved in a few seconds without any prior action or knowledge. With happiness, it can vary from involving a long story that ends in a humorous climax or it can be as short as a one-liner, such as a play-on-words like: “Why is it that when a door is open, it's ajar, but when a jar is open, it's not a door? “ People who e.g. suffer from arachnophobia will feel fear immediately when seeing a spider, but phobias are very individual and as such, a certain phobia cannot form the basis of making the user feel fear. This means that there are more possibilities for producing happiness than fear, regardless of how much time the user invests. Next, it is prudent to think about how to recognize the features of a facial expression showing happiness and fear. When thinking of a person smiling, it is extremely hard not to think of that person as happy in some way, be it joyful, satisfied, cheery etc. The degree of the smile will vary based on the emotion portrayed, but no matter what, a smile will generally be associated with the feeling of happiness and it is therefore arguable that detecting happiness can be simplified to detecting a smile. 9 Group 08ml582 Reactive movie playback 2. Pre-analysis Narrowing down emotions But is it just as easy to determine the expression of fear? Looking at the fearful expression such as the one on Illustration 2.1, we can see open eyes and a wide mouth to indicate the emotion, but neither the eyes nor the mouth suggests the emotion alone. Illustration 2.1: The open eyes and wide mouth a both necessary to express fear. 
Some people suggest that the most important part of the face when showing fear is the eyes since they widen when we are afraid (American Association for the Advancement of Science, 2004), but wide eyes alone do not exclusively show fear. Amazement also includes wide-open eyes as shown on Illustration 2.2. Illustration 2.2: Here it can be seen, how wide eyes also help in express amazement. Finally, different mouth-shapes span a far greater area of the face than the eye-shapes, resulting in a higher degree of variations between expressions, making it easier to focus on 10 Group 08ml582 Reactive movie playback 2. Pre-analysis Storytelling in a movie the mouth than the eyes when recognizing emotions. And because happiness can be more easily defined by the mouth alone than fear, which needs more than just the mouth, it becomes the easier expression to recognize. Based on these traits of the emotion of happiness, along with it being the easier emotion to recognize, happiness will be chosen as the emotion to focus on in this project. As discussed in chapter 1.1 Motivation, there can be many ways to invoke certain emotions of a person watching a movie, a theatre play etc. In order to make a user smile, it would be prudent to use humor, given the following definition from The American Heritage® Dictionary of the English Language: 1. The quality that makes something laughable or amusing; […] 2. That which is intended to induce laughter or amusement. (American Heritage® Dictionary, 2008) Based on this definition, humor is very useful for the purpose of this project, given that the purpose is to find a smile on the user. As humor is the quality that makes something laughable, it is possible to make the user smile, given that he understands the humor. Therefore, it might also be prudent to investigate different types of humor, to make it fit the user. This will be discussed in chapter 3.5 Humor. This indicates that several types of humor can be necessary in order to increase the possibility of a user smiling. 2.2 Storytelling in a movie It is important to know how to tell a story in order to captivate the viewer, convey a message through a story or even trigger emotions such as fear, sadness, happiness. Different genres use different techniques and methods in order to invoke certain feelings in the viewer, and as such, knowing how to incorporate and use these is an important part of telling a story. Telling a story is not as simple as it might seem. It involves several factors which each plays a vital part in the overall concept of telling a story. Firstly, the person to which the story is told, who will be referred to as the user, is bound to have certain expectations. The user will expect the story to contain one or more characters, and he will expect that a series of events, which are connected in some way, will happen to this character. The user will furthermore assume that some problem or conflict will arise during the story, but also that this will be resolved or at least handled in some way (Bordwell & Thompson, 2008, s. 74). 11 Group 08ml582 Reactive movie playback 2. Pre-analysis Storytelling in a movie When we are watching a film, we are being told a story. During this story, we pick up clues, we store information of what happened when, where, how, why, and our profound expectations are being manipulated in order to create surprises, suspense and curiosity. When we get to the end, we usually find that our expectations have been satisfied. 
Although it can also occur that they have been cheated which makes us see the past events in the story from a new perspective (Bordwell & Thompson, 2008). The group members are to take on the role as filmmakers, it is important to have an understanding of how to evoke suspension, surprises etc. in the audience. Specific methods to create e.g. suspense and expectations will be explained in chapter 3.3 Lighting, sound and acting and 3.4 Camera movements and angles. 2.2.1 The Narrative David Bordwell holds a master’s degree and a doctorate in film from the University of Iowa. Furthermore he has won a University Distinguished Teaching Award and has been awarded an honorary degree by the University of Copenhagen. Of his several books are Narration in the Fiction Film, On the history of film and Figures traced in light: On cinematic staging (Bordwell & Thompson, 2008). Kristin Thompson holds a master’s degree in film from the University of Iowa and a doctorate in film from the University of Wisconsin-Madison. Her publications include Storytelling in the new Hollywood: Understanding classical narrative technique and Storytelling in film and television (Bordwell & Thompson, 2008). Bordwell and Thompson define the narrative as: “[…] a chain of events in cause-effect relationships occurring in time and space.” (Bordwell & and define the story as: Thompson, 2008, s. 75) “The set of all the events in a narrative […].” (Bordwell & Thompson, 2008, s. 76) The narrative begins with a situation. Then a series of changes or alterations happen, which are all related to a cause-effect pattern. And in the end, a new situation appears, which may be that our expectations have been fulfilled or that we see the pattern of cause and effect from 12 Group 08ml582 Reactive movie playback 2. Pre-analysis Storytelling in a movie another perspective (Bordwell & Thompson, 2008), such as in The Usual Suspects, where the ending reveals that the entire story has been made up by Verbal Kint. This knowledge will be valuable when making the storyboards. The basic building blocks of a story needs to be in place, which means that the chain of events in a cause-effect relationship has to be maintained. 2.2.2 Mise-en-Scène The meaning of this term is “putting into the scene”. It is used to describe what appears within the film frame, which could e.g. be the location, what kind of props, characters etc. to include. It is a term that describes the director’s power and control with regards to the event that the camera is to record. The following are aspects of Mise-en-Scène, and they will each be shortly introduced (Bordwell & Thompson, 2008, s. 112): • • • • Setting Costumes and makeup Lighting Staging There are several ways to control the setting. The director can e.g. choose to film at a location with already existing materials, or he can construct the setting. This means building a studio which in many aspects such as sound and lighting, provides the director with more control. Furthermore the setting often plays a vital role in films in the way that it can contain clues and/or other information about the story or characters. Similarly costumes and makeup also have important purposes. Usually they assist the viewer by providing visual information, e.g. to emphasize a poor family, they would all wear worn-out dirty clothes and have sad looks. But should this family find financial fortune, the costumes and makeup could help illustrate this transformation, e.g. by outfitting them with better clothes, new jewelry etc. 
Manipulation of lighting has an enormous influence on how we are affected when watching a film. Lighting can guide our attention to certain objects or characters merely by using darker and lighter areas, e.g. to illuminate important objects in a scene. Suspense can be built up by concealing details in shadows, and key gestures or clues can be pointed out by using brighter light on them than in the rest of the scene. The various techniques, theories and methods of lighting will be elaborated in chapter 3.3 Lighting, sound and acting. Staging refers to movement and performance, which involves several elements like gestures, facial expressions, sound etc., and generally setting up the entire scene. It is the control of what the figures or characters should do while being in the scene and how they should act (Bordwell & Thompson, 2008). The tools of Mise-en-Scène represent control of what is in the scene. Thus, these tools will become very useful when designing storyboards. They can be used as a guideline or checklist for what the director needs to be aware of, specifically in the design of the storyboards, which is where the original stories will be created.

2.3 Rendering methods

Referring to the preface, it is a direct requirement of the semester theme that 3D animation is incorporated in the project. This excludes other mediums of animation, such as hand-drawn animation. In order to achieve the best end result regarding the animation, various rendering methods must be examined. There are two main methods of rendering that need to be taken into consideration before doing any kind of computer-animated video production: real-time rendering and pre-rendering. As the names indicate, one method renders everything as it is used, while the other renders all the material before it is to be used. There are pros and cons to both methods, and this chapter will explain them and thereby find the best possible rendering method for this project. First of all, it is necessary to explain what rendering actually is. The reason for doing rendering – at least in the form that is suitable for this project – is to get 3D models transformed into a 2D image, projected on a screen. A screen is a plane and as such it can only display two dimensions (height and width), so when doing 3D modeling on a computer, rendering is constantly taking place, in order to show the modeler what he is doing to his (perceived) 3D model.

Illustration 2.3: The 3D model is projected onto the image plane, point by point. By projecting all points and connecting them in the right order, a 2D projection of the model is obtained.

Illustration 2.3 shows that the image plane is what will be the screen in a rendering of a computer image. In the illustration, not all points of the cube are shown being projected, although all of them should be; the projection of all the points onto the plane would give a 2D representation of the 3D object (Owen, 1999). For any object that is within the scene spanned by the image plane, such a projection would have to be made for each point (or vertex, as it is more commonly called in computer graphics) on each object. If the scene were a game or similar, with multiple objects and many things happening simultaneously, there would be millions of vertices to project.
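To make the projection step concrete, the following small C++ fragment sketches how a single vertex could be projected onto the image plane using a simple pinhole model. It only illustrates the principle behind Illustration 2.3 and is not code from the project's implementation; the camera position at the origin and the focal length f are assumptions made for the example.

#include <iostream>

struct Vec3 { double x, y, z; };   // a vertex of the 3D model
struct Vec2 { double x, y; };      // its projection on the image plane

// Perspective projection with the camera at the origin, looking along the z-axis.
// 'f' is the assumed distance from the camera to the image plane.
Vec2 project(const Vec3& v, double f)
{
    // Similar triangles: x' = f * x / z and y' = f * y / z
    return Vec2{ f * v.x / v.z, f * v.y / v.z };
}

int main()
{
    Vec3 corner{ 1.0, 2.0, 5.0 };             // one corner of a cube, as in Illustration 2.3
    Vec2 p = project(corner, 1.0);
    std::cout << p.x << ", " << p.y << "\n";  // prints 0.2, 0.4
    return 0;
}

Repeating this computation for every vertex of every object, frame after frame, is what makes the amount of work grow so quickly, as discussed next.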
As it is impossible to predict how a user would move around such a scene (the user should be allowed to move around inside the game world), the projection has to be done frame by frame. This means that for every frame the computer will have to compute the projection of each of the points in the scene onto the image plane. There are many computations and the reason why computer games tend to use rendering methods, which are as cheap as possible with regards to the amount of computations. 15 Group 08ml582 Reactive movie playback 2. Pre-analysis Rendering methods In animated movies however, there is only one way the scene can evolve, which means that the scene can be rendered before the user sees the movie. In this situation, there is theoretically unlimited time to do the rendering, which means that it is possible to choose very computationally complex rendering methods. Methods like ray-tracing (which will be explained in chapter 2.3.2 Pre-rendering) and radiosity 2 are examples of methods used primarily in animation video rendering. The following chapters will look briefly into real-time rendering and pre-rendering, to justify the final choice of rendering method for this project, in terms of production time and image quality. 2.3.1 Real-time rendering Real-time rendering is as the name suggests rendering which happen in real-time. The realtime rendering method makes the content highly customizable since the variables of the movie can be changed real-time. This makes it possible to create a multitude of different movies, just by changing one parameter like for example camera angle. The dynamic of the real-time rendering is the big advantage. One disadvantage of real-time rendering is that the rendering has to be very fast in order to maintain a proper frame rate. The minimum frame rate sets an upper limit for the complexity of the rendering. This means that heavy computational effects, like the before mentioned raytracing or radiosity, might not be useable. High polygon models are also less appropriate to use when utilizing a real-time rendering engine. This indicates that real-time rendering will be less suited for this project, since image quality is at risk of being compromised in order to maintain a decent frame rate. Furthermore, the use of high polygon models might be required in the animation, which also indicates real-time rendering as being a poor choice. 2.3.2 Pre-rendering As opposed to real-time rendering, pre-rendering will do all the work before the final rendering is to be shown to a user. As already mentioned in this chapter, this presents an opportunity for doing rendering, which is much more computationally complex, since the rendering time is not an issue. Typical examples of software for doing pre-renderings are programs such as MAYA, 3D Studio Max or Blender. Some are open source programs (for 2 Radiosity is the computation of the color of one diffuse object, based on reflected color from all other surfaces in the scene (Spencer, 1993) 16 Group 08ml582 Reactive movie playback 2. Pre-analysis Rendering methods instance Blender) and some are expensive software (MAYA or 3D Studio Max), but the programs are similar in functionality, so choosing one over the other is mainly based on personal preferences. However, MAYA is considered to be the industry standard in professional character animation studios (Animation Mentor, 2008), so this indicates that Maya can be the best choice for this project. 
One of the possible techniques that can be used in pre-rendering is ray-tracing. Ray-tracing is based on tracing rays from the eye/camera to a light source. Ray-tracing will not be discussed deeply in this chapter, but the theory will shortly be discussed, to give an impression of the opportunities of pre-rendering compared to real-time rendering. Illustration 2.4 shows the effects of doing ray-tracing. Illustration 2.4: By using ray-tracing it is possible to do realistic reflections and simulation of lighting, as the theory is based on tracing rays of lights in the image. Since every ray should ideally be traced all the way from the eye to a light source – or a diffuse object, meaning an object which reflects light equally in all directions – there will be many rays to do computations on. Some rays may be refracted and some reflected and for each new point which a ray hits, a new computation is to be done. In programming, this would mean a recursive function, which should follow each ray through each of its reflections and refractions. This gives more computations to perform than the renderings done real-time, which will lead to reduced frame rate. 17 Group 08ml582 Reactive movie playback 2. Pre-analysis Target group Since this project is focused on doing an animated movie, the quality of the image has a high priority. Consequently, the rendering method will be pre-rendering, since this offers the opportunity to produce images of higher quality, by e.g. including methods, such as ray tracing, in order to obtain realistic reflections in water or mirrors, if needed in the product. And since there is no need for moving around in the animations, there is no need to compromise image quality. With regards to what tool to use in creating the animated movie clips, there are - as mentioned – several possibly programs to choose from, but since MAYA is considered industry standard and that MAYA is available in a free learning edition, which offers almost all the functionality of the full version, MAYA will be the choice of software for 3D modeling, animation and rending in this project. 2.4 Target group In this chapter, a target group will be defined for the project. To shape the initial problem statement into the final problem statement, it is necessary to identify what audience the solution aims at. This will be done mostly by choices made by group consensus and will not include complex, social group classifications and lifestyle attitude segments. This chapter will analyze the choices made in order to establish who the target group will be. For the purpose of this project, a target group could be defined as being the audience of interactive exhibitions and museums. Since the goal of the project is to test whether or not a certain technology works in collaboration with a reacting movie, the audience for such an application is wide. What is needed is users who are interested and wants to actively participate in an interactive communication. Many museums feature non-interactive, non-animated artworks which do not captivate the audience in the same way as for example a science center exhibition. The need for this kind of interactive exhibition manifests itself in for example the amount of visits at the science center Experimentarium in Hellerup. The visits per year have been steady since 1991 with around 350.000 per year (Experimentarium, 2008) which is almost as many as the biggest “ordinary” museum: Statens Museum for Kunst (Ohrt & Kjeldsen, 2008). 
The target group of this project could be the audience of the museum or exhibition in which the solution of this project would be located. However, the product is not restricted in its application area to museums and/or exhibitions; these are simply suitable areas for testing the product. There is no need for an upper age limit in the target group, since the users only have to be able to watch a movie. However, there is a lower age limit for the interactivity element. As a lower limit, an age of 18 years is chosen. This is primarily due to legal considerations, as people over 18 do not need permission from their parents or guardian to participate in the test. The placement in an exhibition provides one major advantage: it is easier to attract an audience, because people at an exhibition or museum are most likely already interested and curious about what is happening. An eye-catching feature of the product is therefore less important, and production time in the project can be spent elsewhere. The interpretation and understanding of art and art installations is extremely subjective, hence it is complex to measure the experience and its effect. The understanding of the art is not the main issue in the test of the product, since it is possible to produce quantifiable results from measuring how long and how often the users are smiling. Furthermore, quantifiable results can be obtained by asking users to fill out questionnaires afterwards.

2.5 Testing the product

For this project there are two parts to be tested: the interaction method itself and the consequences the interaction has on the users. For the interaction method – the smiling of the user – a so-called cross validation will serve as a test of its successfulness. Cross validation works by taking a predefined amount, usually 10%, of the training data (for example a set of images) and using it as test data against the remaining 90% of the training data, in order to determine the validity of the training data and data processing (Moore, 2005). In the case of this project, the program can read the user as either "smiling" or "not smiling", so the cross validation will only test these two options. So, taking e.g. 10 pictures from the training data for the smiles and 10 pictures from the training data for the non-smiles, the result could be a table like the one seen in Table 2.1.

              Detected as smile   Detected as non-smile
Smiles        85%                 15%
Non-smiles    13%                 87%

Table 2.1: An imaginary example of the results of a cross validation. The higher the numbers in the diagonal from top left to bottom right, the better.

The more of the smiles from the training data that are actually identified as smiles, the better. The same applies for the non-smiles. Ideally, there should be a 100% detection rate, but that is not realistic to achieve. However, values around 50% would amount to random detection, so the detection rate should be closer to 100% than to 50%, meaning at least 75% is the goal for the detection rate of this cross validation. Furthermore, a higher detection rate is desired, so a detection rate of 80% is going to be the success criterion for the smile detection application. For the second test, users will have to be involved.
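Before turning to the user test, the tallying of such a cross validation can be illustrated with a short C++ sketch. It is a hypothetical example rather than the project's actual test code; the vectors actual and detected are assumed inputs holding, for each held-out image, the true label and the detector's output.

#include <cstddef>
#include <cstdio>
#include <vector>

// Tallies the 2x2 table from Table 2.1. Rows are the true labels of the
// held-out images (smile / non-smile); columns are what the detector reported.
void printDetectionRates(const std::vector<bool>& actual,    // true = the image really is a smile
                         const std::vector<bool>& detected)  // true = the detector reported a smile
{
    int counts[2][2] = { { 0, 0 }, { 0, 0 } };                // [actual][detected]
    for (std::size_t i = 0; i < actual.size(); ++i)
        ++counts[actual[i] ? 0 : 1][detected[i] ? 0 : 1];

    int smiles    = counts[0][0] + counts[0][1];
    int nonSmiles = counts[1][0] + counts[1][1];
    // The diagonal of Table 2.1; both rates should reach at least 80% to meet the success criterion.
    if (smiles > 0)
        std::printf("Smiles detected as smiles:         %.0f%%\n", 100.0 * counts[0][0] / smiles);
    if (nonSmiles > 0)
        std::printf("Non-smiles detected as non-smiles: %.0f%%\n", 100.0 * counts[1][1] / nonSmiles);
}

In the actual test, such a routine would be fed the results of classifying the held-out portion (e.g. 10%) of the labelled training images.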
The goal of the final test is to determine whether the program’s interpretation of when the user smiles, is consistent with when the user himself thought the movie was funny. In order to ensure a good possibility of having the user smile and thus enable to program to register this smile, it would be sensible to include short movie clips containing several different types of humor, rather than merely showing him one long movie containing only a single type of humor. This would result in the program adapting to the user’s preferred humor type and focusing on this particular type of humor. It is important that the users are not informed of the specific way of interacting with the movie. The goal is to let the program register natural smiles, rather than having the users deliberately smile in order to change the way the movie is played, and it becomes difficult to determine if the user thought the movie was funny, if he is merely smiling with the objective of making the program react. This is a very subjective perception, which could vary from test person to test person and thus, it can be difficult to produce measurable data. One method of obtaining quantifiable results would be to make the users fill out questionnaires about their experience with the product. However, qualitative results, such as the opinions expressed through an interview could also be appropriate, in order to get an understanding of the users’ experience. In chapter 0. 20 Group 08ml582 Reactive movie playback 2. Pre-analysis Delimitation Testing, a more detailed description of the test will be presented. The testing framework called DECIDE, will be used to plan and evaluate the tests. The DECIDE framework is widely acknowledged within professional companies such as Nokia for the purpose of usability testing (Sharp, Rogers, & Preece, 2006) and will be sufficient for the usability tests that is to be conducted in this project. 2.6 Delimitation During this pre-analysis, it has been decided that rather than measuring users’ emotions as a whole, the project will focus only on registering their smile and make the movie react according to that, so the product will essentially focus on creating a reactive movie – where the user subconsciously controls the movie (i.e. the user will not be constantly be informed of the need to interact with the movie) – rather than regular interactive movie, such as the examples described in chapter 1.2.1 Interactive movies, which requires the user to part himself for the immersion of the movie to make a conscious and deliberate choice of how it should unfold. Using pre-rendering methods ensures a high quality of the movie. Since the movie is not supposed to allow the user to move around freely, there is no need to compromise image quality, such as ray tracing, by choosing a game engine. The target group will be people above the age of 18, who are set into a pre-decided test- environment, being unaware that they will influence our story. The testing will be performed in two steps. The first step will focus on the reaction method itself, which involves performing a cross validation test on this method. The second step will involve users, by making the users try the final product, which will involve a functional smile detection program and fully animated movie clips. Questionnaires will be used in order to produce hard, measurable data in this case. 
With the goal of creating a program, which can adapt to a user’s reaction to certain movie clips, it has been established that the product needs to detect when the user is smiling and use this information to change the type of humor of the movie that the user is watching. Thus a final problem formulation can be expressed. 21 Group 08ml582 Reactive movie playback 2. Pre-analysis Final Problem Formulation 2.7 Final Problem Formulation How can we, through the use of smile detection, create an adaptive program, which reacts to the smile of a user by changing the type of humor in a pre-rendered movie? 22 Group 08ml582 Reactive movie playback 3. Analysis Delimitation of the character 3. Analysis This chapter will analyze the various aspects of the final problem statement and go into depth with them. First of all, in order to be able to create the necessary movie clips, theory of narration has to be explored, since a structure must be applied to the movie clips to ensure that they become structured and connected and appear as part of the same whole. Afterwards, theories of light, sound and acting will be discussed and it will be explained how these elements can be used in correlation with this project and how to use cameras properly in order to obtain what is needed of the movie. Since this project will use humor to obtain the results needed, this chapter will also go into discussing humor types in connection to animated movies. Finally, various programming languages and environments for creating the smile detection will be discussed and theories of detecting both a face and a smile will be examined. 3.1 Delimitation of the character The reason for this subchapter is to avoid unnecessary analysis with regards to the humor in the movie clips. There will not be any dialog in the animations, because the project character is not going to speak. The reason for this involves lip synchronization, also known as lip-sync which is the term for matching the lip movements with a voice. Among animators it is considered that lipsync is a very time consuming and animation-wise often very challenging process to make it look right. According to Richard Williams The Animator’s Survival Kit (Williams, 2001), the animator should focus on the body and facial attitudes first, and then decide if it is necessary to include working with the mouth. (Williams, 2001, s. 314) 23 Group 08ml582 Reactive movie playback 3. Analysis Narrative structures Furthermore Williams suggests that one should not have too much going on with regards to e.g. how many poses a character should have for a certain sentence. “Keep it simple” is the advice, and aspects of dialogue such as accents, the sharpness of the voice and making the words appears as words, not just a collection of letters, lip-synching requires far too much effort to make it look right, for it to be worth including in the animation, given the time available for this project. This is the argumentation for excluding the mouth and thereby dialogue in the animations. The character design will aim to comply with the “keep it simple” mantra, and the goal will therefore be to make the viewer understand what is going on in the scenes, using the movements, gestures and facial expressions of the character. 3.2 Narrative structures When performing any kind of storytelling, whether it is on a screen, in a book or through any other media, it is important to be aware of how to tell the story. The structure of the narrative has a big influence on how the story is perceived by the viewer. 
Without a narrative structure, a movie can be very chaotic in terms of e.g. the level of excitement throughout the story or the number of acts of the story. This chapter will focus on different narrative structures, with the goal of finding the most suitable structure for the purpose of this project. 3.2.1 The dramatic Structure One of the structures, which have been researched the most throughout history, is the dramatic structure. The Greek philosopher Aristotle was the first to critically describe the theories of the drama (Clark, 1918, s. 4-5). Aristotle is the author of the book named Poetics (350 B.C.), in which he describes the origin and development of poetry and the tragedy. Aristotle also mentions the structure of the comedy in the Poetics, but this description of the comedy is believed to have been lost, so only the tragedy can be described. However, as this has been the basis of so many dramatic and poetic theories since then, such as using Aristotle’s 3 act structure in the movie Gladiator (Scott, 2000), this chapter will take into consideration the works of Aristotle in explaining the dramatic structure. Aristotle believed that humans are born with the instinct of imitation and that we from early childhood learn by imitating the actions of other living creatures (Aristotle, 350 B.C.). This is the reason why we as humans are so intrigued by movies and other acts, which essentially 24 Group 08ml582 Reactive movie playback 3. Analysis Narrative structures imitates human behavior. Even though many visual and auditory effects are used in obtaining this imitation of real life, the most important part is still the actions of the story (the Mythos), according to Aristotle, which closely relates to the definition of Bordwell & Thompson, as described in chapter 2.2.1 The Narrative. In Aristotle’s terms, there could be no tragedy (and thus no other form of narrative) without any Mythos. Since the presentation of the tragedy by Aristotle, many have discussed, praised and criticized the structure which he proposed. In 1863, the German philosopher Gustav Freytag published his book Die Technik Des Dramas (Freytag, 1876), in which he structured Aristotle’s theories in a 5 acts system and this structure is very much similar to what we know today as the Hollywood model. Illustration 3.1: Gustav Freytag’s dramatic structure evolves such that the excitement of the story rises until a climax is reached, where after the action is falling, until the catastrophe is reached, which in dramatic terms means the final resolution of the problem. Illustration 3.2: The Hollywood model in many ways resembles the model of Freytag, as it also reaches its highest excitement level at a climax and then fades out, leaving the audience at the same state as when the movie started. As it can be seen in Illustration 3.1 and Illustration 3.2, there are many similarities between Freytag’s model and the “modern” Hollywood model. The idea behind both models is that the 25 Group 08ml582 Reactive movie playback 3. Analysis Narrative structures characters and the conflict are introduced in the beginning of the story. The characters, setting and environment is described to set the scene of the story. After the introduction a conflict or a change occurs, which somehow influence the characters. The introduction of the characters and environment are crucial, because the rest of the story builds upon the first impressions of the character. 
Freytag also states that the introduction is perhaps the most essential part of the five acts and the introduction should be weighted out carefully against the rest of the narrative. It should also have a clear connection to the catastrophe, in which the problem of the story is resolved (Price, 1892, s. 76-94). This is also why some refer to it as Freytag’s triangle, as a line could be drawn directly from the end (the catastrophe) back to the beginning (the introduction), where a new story would start. The conflict of the narrative is further complicated in the next act of Freytag’s structure and in the Hollywood model, this happens through an elaboration and an escalation. At the highest point of excitement, the climax is reached, illustrated by a solution to the problem, followed by a fast decline in excitement, until the narrative ends (Price, 1892, s. 94-111). 3.2.2 Other narrative structures Having discussed the most common narrative structure, the dramatic structure, other narrative structures can be discussed. As mentioned in the introduction of this chapter, the movie clips that will be produced in this project will be very short clips. It is difficult to go through the entire development of the dramatic structure in such a short time and thus other alternatives should be explored. The episodic structure is one alternative, in which the action is based on small episodes. This structure is used in TV series where there might or might not be a connection between one episode and the next, but where each episode should have some sort of dramatic excitement, to keep the viewers interested. Most often though, the dramatic structure is applied to each episode, which would again require each of the clips produced in this project to have a structure similar to that of the dramatic structure. Another approach could be the star shape structure. The idea of this structure is to have an origin, to which the story returns periodically. 26 Group 08ml582 Reactive movie playback 3. Analysis Narrative structures Illustration 3.3: The star shape structure has its point of origin in the center and the story evolves according to – for instance – the main character’s immediate state of mind. Using this structure fits well with having small clips, as the development of the narrative would essentially develop according to whether the viewer is smiling or not. The viewer’s smile (or lack of so) will be the cause of whatever action will happen, where the story goes from the center and out to one of the events (1-8 in Illustration 3.3) - and the effect will be that the main character acts according to the cause. For further development of the idea, the star shape structure could be implemented in an overall dramatic structure, such that the narrative develops in a more classic way, but still having the character return to a relatively similar point of entry between all events of the narrative. The structure of each of the small clips could also have a short, simplified dramatic structure. The introduction to the character need only happen once, since it is the same character that experiences all the events in the narrative. An example of a scene could be that the character is faced with the problem of a locked chest. First he examines the chest, noticing that it is locked. As he tries to unlock the chest the excitement rises until suddenly, the chest pops open, essentially representing the climax. 
Whatever is inside the chest could amuse the character, and the fade-out could be the character playing with his newfound toy. The content of the scenes will be described in more detail when analyzing the different kinds of humor and how they should be presented to the viewer. However, the overall narrative structure of the clips should fit inside the star shape structure, such that all effects are caused by the user. The small individual clips should have some dramatic development applied, in order to keep the user interested, even if it is not as detailed as in the full dramatic structure.

3.3 Lighting, sound and acting

Having explored some aspects of narration and how to use them to tell a story suited for this project, it is necessary to look at the elements that will influence the perception of the movie clips: lighting, sound and acting. When utilized correctly, each of these elements can be very effective in e.g. creating suspense, supporting a mood, or helping and guiding the viewer. This chapter will introduce the terms and theories relevant for utilizing lighting, sound and acting in the movie clips and describe how they can be used in the project.

3.3.1 Lighting

The elements that help make sense of a scene's space (the entire area of the scene itself) are highlights and shadows, where shadows have two further components: shading and cast shadows. There are four features of film lighting (Bordwell & Thompson, 2008):

• Quality – the relative intensity of illumination.
  o Hard light
  o Soft light
• Direction – the path of light from its source(s) to the object lit.
  o Frontal lighting – eliminates shadows.
  o Side-lighting
  o Back-lighting – creates silhouettes.
  o Under-lighting – distorts features; used to create dramatic or horror effects.
  o Top lighting
• Source
  o Key light – the primary light source.
  o Fill light – secondary, softens the shadows cast by the key light.
  o Back light – secondary.
  o High-key lighting – used for comedies, adventure films and dramas; low contrast and soft-light quality.
  o Low-key lighting – used for mysterious effects; creates high contrast and sharper, darker shadows; the quality is hard light, and the amount of fill light is small or none at all.
• Color
  o The standard way is to work with as white a light as possible, which can then be manipulated either with filters on the set or in software later in the process.

As can be seen in the list above, there are numerous ways to use different kinds of light. However, it is already possible to narrow down the options at this point, since the storyboards will only include humor and comedy. This means that elements like under-lighting and low-key lighting are unlikely to be used, since their primary use, according to the list, lies within the horror genre. The lighting needs of this project are rather simple: the only specific requirement is that movements, gestures and facial expressions are clearly visible. This is crucial since there will be no dialog present.

3.3.2 Acting

The performance of a character consists of visual elements and sound elements. Appearance, gestures and facial expressions belong to the visual elements, while e.g. music and off-screen sound belong to the sound elements (Bordwell & Thompson, 2008).
The project character will not be able to speak, and the visual elements and sound effects must therefore fill this void, meaning that his movements and facial expressions will largely be in charge of telling the story. To fulfill this task, the movements and facial expressions will draw on the exaggeration aspect of Disney's 12 Principles of Animation (elaborated in chapter 4.4.2 Disney's 12 Principles of Animation). A situation where the character is surprised could e.g. be shown by making him jump a large distance upwards or by having his eyes grow and pop out of his head. Furthermore, his gestures must be easily recognizable and understandable: it is important that a person watching is able to follow and understand what he sees without the help of dialog.

3.3.3 Sound

Sound is a powerful element in films, and using it the right way provides several interesting possibilities. By using sound, the director can alter the way a viewer sees and understands the images. In an example from Bordwell & Thompson, the viewer is shown a series of images accompanied by three different soundtracks; each one leaves the viewer with a very different understanding than he would have gotten had he watched either of the two others. The only changing element in this example is the sound. It should be noted that a narrator is also heard along with the soundtracks in this example, which has a major effect on the different ways they are understood (Bordwell & Thompson, 2008, s. 266).

Another important feature of sound is its ability to direct attention to a specific area, object etc. A very simple example is the narrator mentioning an object while that object appears on the screen; the viewer's gaze will almost certainly turn towards the mentioned object. Sound can also create anticipation along with direction of attention. The sound of a squeaking door will make the viewer think of a door, and if a door appears in the following shot, that will be the focus of the viewer's attention, along with wondering who might enter. If the door remains closed, however, the interpretation process begins: one might think that it was not the door but something else. This is one example of how sound can be used to make people feel more immersed in films. Thus sound can clarify events, contradict them, and also help the viewer form expectations (Bordwell & Thompson, 2008, s. 265).

These ways of utilizing sound can definitely be incorporated in the project. Some of the actions of the character could be clarified by the use of sound; e.g. a squeaking sound from the floor boards would help clarify the action and set the mood in a situation where the character attempts to be quiet and therefore walks in a sneaky manner. Sound effects are very likely to be used for other clarification purposes as well. Popular examples from the cartoon arena are the long whistle when something heavy is falling towards the ground, or the windy "woof" when something disappears rapidly (Jones & Monroe, 1980).

Another technique is the use of off-screen sound, which can make the viewer more interested and immersed, in the sense that he cannot see what is going on, only hear it.
Off- screen sound can create the illusion of a larger space than the viewer will ever see, and it can shape the expectations about how the scene will develop (Bordwell & Thompson, 2008, s. 279). With regards to the creating of the storyboards, off-screen sound could prove very useful. Firstly because of its’ ability to make the viewer more immersed, and secondly - and more practically - because it can reduce animation time considerably, since the event causing the sound does not have to be animated, but can be imagined by the viewer alone. A different tool to use revolves around whether to use diegetic or nondiegetic sound: Diegetic sound is sound that has a source in the story world. Spoken words by characters, sounds coming from objects or music which comes from an instrument inside the story world are all diegetic sounds. Nondiegetic sound is defined as coming from a source outside the story world. The most common examples of this is music that is added in order to enhance the films action, or the omniscient narrator whose voice in most cases does not belong to any of the characters (Bordwell & Thompson, 2008, s. 279). There are also nondiegetic sound effects. Bordwell and Thompson mention an example where a group of people are chasing each other, but instead of hearing the sounds they would naturally produce, the viewer hears various sounds from a football game; crowd cheering, referee’s whistle etc. The result of this is an enhanced comical effect (Clair, 1931). Diegetic sounds are much likely to be used in the animations with regards to footsteps, interaction with objects etc; however nondiegetic sounds could also be included either to enhance important sequences or to achieve another effect. If one thinks back to those Sunday mornings watching cartoons, one might remember the vast amount of nondiegetic sounds being used in these, such as the many Warner Brothers Looney Tunes cartoons. The primary purpose of nondiegetic sound in these cartoons is to support and enhance the comical effect, but according to Bordwell and Thompson, it is also possible to blur the boundaries between diegetic and nondiegetic, which can then have a surprising, puzzling or humoristic effect on the audience. 31 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles 3.4 Camera movements and angles With the narrative structure in place, the lighting set up correctly, the correct sounds recorded and the acting determined, all these elements has to combined in a actual movie and for this purpose, a camera is needed. However, the movements of this camera also have to be planned, in order to emphasize the structure and mood that is aimed at. When you want to make a film, the camera should not merely be randomly positioned before filming: Planning is the key. Everything visible in the shot is there for a reason, be it for setting the mood, the environment etc. Random objects are not put in the scene for no reason. The camera is the eyes of the audience: What it does not show, the audience cannot see and therefore one must be aware of the techniques and methods to use and what results they provide. 3.4.1 The image Knowledge about how to setup a scene and the choices of what to show of this scene is important, since the location of objects and characters can drastically change how the scene is perceived. 
For instance in a scene where an object is highly important, it would most likely be placed in the center of the image and probably also be in the foreground compared to actors or other objects in the scene. This information is useful when creating storyboards to guide the users’ attention to specific parts of a scene, ensuring that they notice the plot points in a story. 32 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles It is important to maintain a balance in the scene. If for example there is a single character present in the shot, he should be more or less at the center of the image and if there are more characters, they could be placed so a balance is kept and this can be seen in Illustration 3.4. The reason for this is an attempt to distribute elements of interest evenly around the image, thus guiding the viewer’s attention (Bordwell & Thompson, 2008, s. 142). The viewers tend to concentrate their gazes more at the upper half of the image, since it is where the faces of the characters are usually to be found. Illustration 3.4: Balancing the image, either with on ore several characters in the frame. However creating an unbalance can produce interesting effects as well. As it can be seen in Illustration 3.5, there is a significant overweight in the right side of the image. Illustration 3.5: Overweight in one side of the image This creates the effect of making the father seem superior, since there besides him are more people and weight on his side. On the other side is the son, which is perceived as smaller and more vulnerable because he is such an ineffective counterweight. 33 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles These examples shows how much Information it is possible to present to the user, by the placement of the camera. In this case the viewer is presented with character characteristics from a single shot (Bordwell & Thompson, 2008, s. 143). Illustration 3.6: The character in a slightly unbalanced image. Illustration 3.7: As she lowers her arm, the door comes more into focus and the viewer begins to form expectations. Another aspect of balance in the shot is when the director wants to play with the expectations of the viewers. The actions taking place in Illustration 3.6 and Illustration 3.7 show how such expectations are formed. In Illustration 3.6 the focus is on the actress in a slightly unbalanced image but in Illustration 3.7 she has moved further to one side of the screen, and the door comes into focus. Now with the enhanced unbalance in the scene, the viewer has begun to form expectations regarding the door and who might enter. Working with this unbalance is also known as preparing the viewer for new narrative developments (Bordwell & Thompson, 2008, s. 143). 3.4.2 Angle and Distance The purpose of this chapter is to provide an overview of the features and purposes of the various shots in the movie clips. For the project this will become very useful when creating 34 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles the storyboards. These techniques are mere guidelines, as there is no universal measure of camera angle and distance. It will be of great help to know how to create the various effects to be used in the movie clips, e.g. triggering the expectations of the viewers or emphasizing an important gesture of the character. What the camera can see, the viewer can see. 
The viewer can see a frame, or a “window” so to speak, in a space and where this frame is placed has great importance as to how the film is experienced and perceived. The number of possible camera angles is infinite, since the camera can be placed anywhere. However there are three general categories between which filmmakers usually distinguish. • The straight-on angle. This is the most common and it portrays the scene laterally, as if the viewer was standing in the room. This angle can be seen in Illustration 3.8. The effect of using this angle is a very neutral shot with little or no psychological effect on the viewer. Illustration 3.8: A shot from the straight-on angle • The high angle. As the name implies, the viewer is above the scene and looking down at it. This is also known as birds-eye perspective, and can be seen in Illustration 3.9. A possible effect of this angle is that the viewer perceives the subject as being small, weak or inferior. 35 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles Illustration 3.9: A shot from a high angle • The low angle. The viewer is looking up at the scene. This is also known as frog perspective. See Illustration 3.10. A psychological effect of using this angle is the viewer perceiving the subject as powerful, threatening or superior. Illustration 3.10: A shot from a low angle The framing of the image does not just position the viewer at a certain angle from which to see the scene, but also controls the distance between the camera and the scene. This camera distance is what provides the feeling of being either near or far from the scene. The following examples are to be interpreted not as an absolute rule, which applies for all films, but should rather be thought of as guidelines which can be helpful in the pre-production process, e.g. for storyboards. Camera distances are presented as follows, (with the human body as measure). • Extreme long shot The human figure is barely visible, since this framing is normally intended for landscapes and birds-eye views of cities. 36 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles Illustration 3.11: Extreme long shot • Long shot Human figures are visible, but still it is the background that dominates. Illustration 3.12: Long shot. • Medium long shot In these shots, the human figure is usually framed from the knees and up. This is a frequently used shot, because it provides a pleasant balance between illustration and surroundings. Illustration 3.13: Medium long shot. • Medium shot This frames the human frame from the waist and up. Gestures and expressions are now becoming more visible. 37 Group 08ml582 Reactive movie playback 3. Analysis Camera movements and angles Illustration 3.14: Medium shot. • Medium close-up This frames the body from the chest and up. Again more focus on gestures and expressions. Illustration 3.15: Medium close-up. • Close-up Traditionally this shot shows only the head, hands, feet or a small object. Facial expressions, the details of a gesture or of an object are emphasized. Illustration 3.16: Close-up • Extreme close-up This shot singles out a portion of the face, which would often be the eyes or the lips. Another purpose is to magnify or isolate an object (Bordwell & Thompson, 2008, s. 191). 38 Group 08ml582 Reactive movie playback 3. Analysis Humor Illustration 3.17: Extreme close-up As seen in this chapter, the camera can have a major influence over aspects of a movie such as balancing an image. 
If an image turns out being unbalanced, it is important to do this deliberately to obtain a certain effect in the image, such as suggesting status of a character vs. another character in a frame. Camera angles can also be used to great effect in playing with the expectations of the viewer, e.g. by making a character in the frame directly interact with something or someone outside the frame, making the user form expectations about this outside influence on the character. The camera can assume different angles of the viewing the scene, such as a birds-eye view and thereby completely change how the scene is viewed. Finally the camera can - by using various degrees of zooming - focus on various parts of a scene, such using a close-up to focus on facial expressions or using an extreme close-up to magnify or isolate an object. 3.5 Humor In chapter 2.1 Narrowing down emotions it was decided to utilize smiles as the emotion that will influence the movie. Based on this decision, humor will be a necessary part of the movie clips. Using humor should provide many opportunities for having the user smile and thereby control the movie clip, providing measurable data to analyze. Also, without using humor in the movie clips, it becomes very difficult to provide a controllable opportunity for the user to smile, making the inclusion of humor in these movie clips futile when testing their reaction to a smile. Therefore, this chapter will focus on different ways of communicating humor, with the goal of finding the best types of humor to implement in this project. In chapter 3.1 Delimitation of the character it has been chosen not to have a mouth on the character, which exclude any form for humor relying on dialogue. In Paul Wells Understanding Animation (Wells, 1998), there is a comprehensive list of ways to start laughing in animation. In this list, the only humor type without dialogue is slapstick. For that reason this chapter will focus on that particular humor type, based on two well-known cartoons. 39 Group 08ml582 Reactive movie playback 3. Analysis Humor When excluding dialog a natural approach would be applying humor to facial gestures and body language and support it with sounds in the different scenes. By using anthropomorphic characteristics for a main character, as done with a character like Jiminy Cricket in Disney’s Pinocchio who can be seen in Illustration 3.18, the audience is able to understand the character on human terms, even though he is indeed not human. Illustration 3.18: The anthropomorphic characteristics of Jiminy Cricket make him appear human like, even though he is an insect. When creating a character it is important to give it some kind of personality, in order to relate to the comic aspect of the character. As mentioned earlier in chapter 3.1 Delimitation of the character, the character does not speak so the humor must be centered on the behavior of the character and the movie clips. Two key aspects that influence the personality are (Wells, 1998, s. 129): • • Facial gestures which clearly identify particular thought processes, emotions and reactions experienced by the character. Physical traits and behavioral mannerisms common to, and recognized by, the audience, which remain largely consistent to the character. Even though the main character has no mouth, it is still possible to make facial gestures to express thoughts, emotion and different types of reaction using only the eyes of the character. 
To quote Richard Williams, director of animation on the film Who Framed Roger Rabbit (IMDB.com, 2008): "Our eyes are supremely expressive and we can easily communicate with the eyes alone. We can often tell the story just with the eyes." (Williams, 2001, s. 326).

Looking at famous personalities from the Disney universe like Mickey Mouse, Donald Duck and Pluto, it is not the jokes we remember, but the behavior and reactions of each character, e.g. Donald as the hot-tempered duck who swears and curses. Frank Thomas and Ollie Johnson, both veteran animators at the Disney Studios, suggest that Walt Disney understood the fundamental principle of comedy to be that "The personality of the victim of a gag determines just how funny the whole incident will be." (Wells, 1998, s. 130). That is why it is funnier when Donald slips on a banana peel than when one of his nephews slips on it: Donald's hot-tempered personality will burst out in swearing and cursing. Thomas and Johnson also say about Walt Disney that, rather than thinking of cartoon material as entertaining, he thought of it as captivating. It is all about impressing the audience, making them forget the real world and lose themselves in the cartoon universe. Walt Disney had to find funny actions in everyday life, still connected to something well known and based on everyone's experience (Wells, 1998, s. 131). Examples of such a connection can be seen in Illustration 3.19.

Illustration 3.19: Walt Disney's Donald Duck in two typical unlucky situations.

3.5.1 Tex Avery's approach

A man who went in a different direction than Disney was Tex Avery. Avery was a famous animator and cartoonist who did much work for Warner Bros, such as A Wild Hare (Avery, 1940) and Porky's Duck Hunt (Avery, 1937). Avery rejected the cuteness often used in Disney animation and instead went for the madness of the cartoon universe. Avery is behind characters such as Daffy Duck (IMDB.com, 2008) and Bugs Bunny. Tex Avery realized that physical slapstick comedy would satisfy children, while adults would require more mature themes. These include (Wells, 1998, s. 140):

• Status and power, and specifically the role of the underdog.
• Irrational fears, principally expressed through paranoia, obsession, and the re-emergence of previously repressed feelings.
• The instinct to survive at any cost.
• A direct engagement with sexual feelings and sexual identity.

The term slapstick originally referred to a prop used to produce a sound when one actor e.g. hit another actor on stage. A slapstick consists of a pair of sticks that make a loud sound when struck; the actor on the receiving end reacts comically to the impact in order to make it more amusing. Alan S. Dale refers to M. Wilson Disher, who claimed that there are only six kinds of jokes: falls, blows, surprise, knavery, mimicry and stupidity (Dale, 2000, s. 3). But for a comedy to register as slapstick, the fall and the blow are the only types needed. The fall occurs e.g. when a guy slips on a banana peel. The blow is used in scenes where a guy gets hit by e.g. a pie in the face, accompanied by a loud sound. It is arguable that the soul of a slapstick joke is the physical assault on, or failure of, the character. Avery took cartoons to the extreme with Screwball Squirrel (Avery, IMDB, 1944).
Screwball Squirrel is an entirely unlikeable smart guy with a very aggressive and amoral personality. His only mission in life is to make his opponents suffer, typically by exposing them to extreme pain. The opening scene of Screwball Squirrel seems a bit "Disney-like": we see a cute squirrel happily walking and dancing through the forest until he stops by Screwball Squirrel. Screwball asks the cute squirrel what kind of cartoon this is going to be, and the cute squirrel starts blabbering about how it is going to be about him and how cute he is. Screwball hates it, and a second later we see him beating up the cute squirrel. The narrative context in Avery's films is often disrupted by various small gags and unexpected black humor, such as the cartoon character talking to the audience. The entire cartoon consists of many shorter clips, some more extreme than others, where Screwball continuously makes his opponent – in this case a dog – suffer in the craziest ways. In Avery's films, the disruptions became the narrative itself.

Illustration 3.20: The opening scene of Screwball Squirrel, where he beats up the cute Disney-like character.

3.5.2 Chuck Jones' approach

Chuck Jones is the father of the cartoon Road Runner. Chuck Jones' cartoons are similar to Avery's, but what characterizes Jones' work is his interest in limiting the logic of a situation instead of over-extending it. Jones had some rules for his famous Coyote and Road Runner, and they became a sort of comic model. The rules are as follows (Wells, 1998, s. 150-151):

1. The Road Runner cannot harm the Coyote except by going "Beep-Beep".
2. No outside force can harm the Coyote – only his own ineptitude or the failure of the ACME products (ACME being a company supplying unusual inventions).
3. The Coyote could stop at any time – if he was not a fanatic ("A fanatic is one who redoubles his effort when he has forgotten his aim" – George Santayana).
4. No dialogue ever, except "Beep-Beep".
5. The Road Runner must stay on the road – otherwise, logically, he would not be called a Road Runner.
6. All action must be confined to the natural environment of the two characters – the south-west American desert.
7. All materials, tools, weapons or mechanical conveniences must be obtained from the ACME Corporation.
8. Whenever possible, make gravity the Coyote's greatest enemy.
9. The Coyote is always more humiliated than harmed by his failures.

The audience of Road Runner always knows that the Coyote will fail in his attempt to catch the Road Runner, but they love to see him try again and again. The gags appear when something happens differently from how it was expected to happen: in the Coyote's reaction to the bizarre failure of his ACME products, or in how the environment always conspires against him. Jones believed this is as important as the Coyote getting physically injured. Jones also had a skill for creating comic suspense, where the audience recognizes the gag or the seed of a building joke. In one episode the Coyote builds an ACME trapdoor in the middle of the road, which will swing up at the push of a remote. The Coyote pushes the remote when the Road Runner speeds around the corner, hoping to bring the Road Runner to an end. Nothing happens, of course, so the Coyote walks out to the trapdoor in the middle of the road to check the mechanism and pushes the remote one more time.
Here the audience expects the door to swing up and harm the Coyote in some way, but nothing happens and the scene ends. A new scene begins with another chase between the Coyote and the Road Runner, and when the audience has almost forgotten the earlier failure, the trapdoor swings up in the face of the Coyote. It is the delayed outcome and the surprise that make it funny.

Illustration 3.21: The Coyote is always trying to catch the Road Runner, sometimes with the craziest inventions, like this rocket-bike with handlebars and a cross-hair.

3.6 A reacting narrative

Having explored how to create the movie clips, the next step is to determine how to make them react to user input in a proper manner, such that the movie playback becomes structured and meaningful. The narrative can be controlled through a tree structure, where the user chooses between different paths through the story. One could imagine the timeline as a tree where every split is a choice, as seen on the left side of Illustration 3.22.

Illustration 3.22: To the left: tree structure of choices in an interactive movie. To the right: star structure, with the neutral state of the character at the center and the individual movie clips as branches.

When looking at the different narrative structures described in chapter 3.2 Narrative structures, the sequence of the story is fixed and should not be changed if the structures are to be utilized effectively. The choices should therefore not change the structure, but alter the different elements within the structure. For example, when using the Hollywood model, the presentation could change from introducing a boy to introducing a girl. The star structure is looser because it does not have a timeline. A movie clip is played, starting with the character in the neutral state and going to one of the possible branches. After this movie clip is finished, the user makes a choice, and based on this another movie clip is played, with the character once more starting at the neutral state and going to a new branch, which reflects the choice made by the user. When choosing the appropriate structure for reactive control of the narrative, the form of the movie should be evaluated. If the movie contains one storyline with several different parts, the tree structure would be appropriate. If the movie instead consists of multiple small stories, it fits better into a star structure.

This sub-chapter indicates that the star structure is suitable for solving the project problem, since each of the branches could represent a different type of humor, allowing the program to change between these types based on the choices made by the user. This concludes the first part of the analysis, which is concerned with the creation of the movie clips with regard to narrative structures and elements of Mise-en-Scène. It was discovered how these elements can aid the design of the movie clips, and different interpretations of slapstick humor were described. The next step will be to analyze the program that is needed to connect the users' facial expressions to the playback of the movie clips.
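As a rough illustration of how the star structure could drive the playback, the following C++ sketch keeps one rating per humor branch and always plays the next clip from the best-rated branch. The HumorBranch type, the branch names, the rating scheme and the playClip() stub are assumptions made for this sketch only, not the implemented design.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One branch of the star structure: a type of humor and its remaining clips.
struct HumorBranch
{
    std::string name;               // e.g. "fall" or "blow" – hypothetical branch names
    std::vector<std::string> clips; // clips belonging to this branch, in playback order
    int rating;                     // how well the user has responded to this branch so far
};

// Stub for this sketch; the real program would hand the file to a media player.
static void playClip(const std::string& file)
{
    std::printf("Playing %s\n", file.c_str());
}

// Updates the rating of the branch that was just shown, then plays the next clip
// from the currently best-rated branch and returns that branch's index. Every clip
// starts and ends with the character in its neutral state, so any branch can follow.
int playNext(std::vector<HumorBranch>& branches, int lastBranch, bool userSmiled)
{
    if (lastBranch >= 0)
        branches[lastBranch].rating += userSmiled ? 1 : -1;

    int best = 0;
    for (size_t i = 1; i < branches.size(); ++i)
        if (branches[i].rating > branches[best].rating)
            best = (int)i;

    if (!branches[best].clips.empty())
    {
        playClip(branches[best].clips.front());
        branches[best].clips.erase(branches[best].clips.begin());
    }
    return best;
}
```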
3.7 Programming languages

Having been through the topics relating to the creation of the movie clips, the rest of this analysis will be concerned with the theory behind the smile detection, which is needed for the product to be successful. This subchapter looks at what programming languages are available for creating the smile detection and playing back video. Many of these offer the same features and are combinations of the same few programs and programming languages. Keeping in mind the limited time for this project, learning a completely new language for developing this tool would be too time-consuming, and as such the choices of possible programming languages could be narrowed down to those known by the group. These languages were C++ and Max/MSP with the Jitter extension.

3.7.1 C++

One method of creating this program would be to use a mixture of C++ with OpenCV (Bradski & Kaehler, 2008) and Adobe Flash. C++ is a textual programming language, and Flash is a graphical animation tool with support for minor programming through a scripting language called ActionScript. OpenCV is a collection of libraries for C++ containing functions for computer vision work. OpenCV includes feature matching using Haar cascades, an algorithm for finding objects in a digital image as described in chapter 3.8.1.1 Haar Cascades, and would therefore be a close to premade tool ready for use. This method would work by having OpenCV load the data from a webcam stream into C++ – converting the feed into a stream of images – manipulating the images there with the full freedom provided by C++, and afterwards passing the detection data to Flash (through file input/output), which would then serve as the graphical interface used for playing back the video and sound of the program. One drawback of this method is that videos in Flash have a tendency to lose synchronization between audio and video, which defeats the purpose of playing a video this way. The biggest problem, however, is the way data has to be passed from C++ to Flash: the data and instructions from C++ are saved to a text file every frame, and Flash then loads these data at the same speed. This approach caused slowdowns, and in the worst cases Flash attempted to read the instruction file while C++ was writing to it, effectively crashing the program.

It could also be done using C++ with only OpenCV. The main drawback of C++ is that it is time-consuming, as the entire program has to be created from scratch. This drawback especially shows itself in OpenCV's inability to play back a video stream alongside an audio stream. There are two ways to get past this obstacle: either programming a media player in C++, or performing a system call, which uses the command prompt to execute an already existing media player for the playback. The system call is made in C++ by using the system command as follows:

system("start calc");

This code would start the Windows Calculator, and calling other Windows applications would be done in a similar manner. An example of such a media player could be Windows Media Player, which exists on all personal computers using the Windows operating system.
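Following the same idea, a short sketch of how a pre-rendered clip could be handed to an external player via a system call is shown below. The clip path is a hypothetical example; the Windows "start" command simply opens the file with whatever player is associated with its file type.

```cpp
#include <cstdlib>  // std::system
#include <string>

// Launches a pre-rendered clip in an external Windows media player.
// "start" with an empty window title hands the file to the associated player.
void playClipExternal(const std::string& clipPath)
{
    std::string command = "start \"\" \"" + clipPath + "\"";
    std::system(command.c_str());  // returns immediately; the player runs as a separate process
}

// Example usage (hypothetical file name):
// playClipExternal("clips/slapstick_01.avi");
```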
3.7.2 Max/MSP with Jitter Max/MSP is a graphical programming interface that allows programming to be done at a faster speed than a textual programming language such as C++. Its main force is that it is easy to work with and it is possible to create simple solutions and prototypes in short amount of time. However its’ drawback is that it offers less freedom compared to languages such as C++. Max/MSP focuses mainly on sound manipulation, but has extensions increasing the possibilities within the program regarding visual applications.. One such extension is the Jitter, which provides Max/MSP with the ability to display and manipulate graphics such as videos, webcam feeds and even 3D objects. Jitter includes several basic tracking forms such as blob detection and facial recognition. The facial recognition, seen in Illustration 3.23 could then be customized into only tracking the smiles instead of the entire face. 48 Group 08ml582 Reactive movie playback 3. Analysis Smile detection . Illustration 3.23: Facial tracking being done in Max/MSP Jitter, used to manipulate the view of a 3D scene. From looking at these possibilities, the two methods that would serve this project the best would be either Max/MSP due to the fact that it is less time consuming to program in, and would allow a prototype to be up and running within a minimal amount of time, or C++ using only OpenCV, based on the degree of freedom it offers, as well the possibility of using already existing media players for the playback of video and audio. Both of these choices also contain a feature detection method, and in both cases this can be modified for use in detecting smiles. As such the required work to be done is close to being the same regardless of which of the two programs are being worked with. The group has more experience working with C++ and OpenCV, giving the group knowledge of the fact that OpenCV provided many opportunities for creating a suitable program, along with OpenCV being far more flexible than Max/MSP, as opposed to needing to adjust to the working environment in Max/MSP and examining how this can be used to solve the problem. Based on this, the chosen programming method was decided to be C++ with the OpenCV libraries. 3.8 Smile detection Smile detection is a branch of face detection, which determines the presence of a smiling mouth in a digital image. The reason to explore the technology of smile detection in relation to this project is to be able to create a smile detection program, which can track the users of the 49 Group 08ml582 Reactive movie playback 3. Analysis Smile detection final product and determine if they are smiling. As described in chapter 2.1 Narrowing down emotions, smile will be an appropriate way to determine if the user is finding a movie clip funny and therefore, smile detection is vital for the success of the product connected to this project. Smile detection is, as mentioned in chapter 1.2.2 Facial Expression Detection, used in compact cameras and is often called “Smile Shutter” (Sony New Zealand Limited, 2007) and, as the term implies, the shutter is triggered when the camera detects a smile. The algorithm for the cameras “Smile Shutter” is often proprietary code and is therefore not accessible to the public. The function is often combined with face detection so that the camera sets focus on faces, which is often the most important part of amateur pictures. 
Furthermore, the Sony DSC-T300 supports recognition of both children's and adults' smiles (Sony Corp., 2007), hence it can differentiate between adults and children using face detection.

To analyze smile detection, it is necessary to divide the topic into subtopics and investigate each part thoroughly. The first step of smile detection is to find a face using face detection. After finding a face, the next step is to find the mouth of the person. The smile detection on the mouth picture is the final part. These steps could be the workflow of the final smile detection. The main challenge is to get a sufficient amount of pictures to train the smile detection, as described in the next section, chapter 3.8.1 Training, and to make the smile detection use the training material.

3.8.1 Training

Smile detection training is done by going through many pictures of smiling faces and finding common characteristics. An often used method is to reuse a pre-made database. Table 3.1 gives an overview of some of the large database collections of face pictures.

Table 3.1: Overview of existing databases of faces (Skelley, 2005). The table compares the thesis database, Cohn-Kanade, CMU-PIE, FERET and JAFFE on number of images, sequences, subjects and expression types (played or natural), along with remarks such as whether the database is FACS coded, covers large variations in pose and illumination, is designed for identification, or contains only Japanese women.

The databases can e.g. be sorted by facial expression and include different illuminations in order to make the face detection brightness-independent. The CMU-PIE database contains more than 40,000 tagged pictures (Skelley, 2005). The tags describe e.g. the facial expressions, marking them as happy, bored etc., which is particularly useful when training for smile detection. A major drawback of the existing databases is that they are mostly not accessible to the public and require licenses. The database can therefore also be built from the bottom up if the pre-made databases are not sufficient (e.g. not including smiling pictures, or only pictures of too low quality), if they are too large, or if they are inaccessible. The training can be very time-consuming, and the amount of pictures plays an important role time-wise. However, it is important to have as much training data as possible to make the detection less sensitive to different face shapes, beards, skin color and the like.

3.8.1.1 Haar Cascades

There are different training algorithms that can be used. OpenCV uses so-called Haar cascades (OpenCVWiki). Haar cascade training requires a massive amount of calculations and is therefore very time-expensive. According to Ph.D. student Naotoshi Seo at the University of Maryland, training using the CMU-PIE database mentioned in Table 3.1 can take up to three weeks (Seo, 2008). When using Haar cascades to detect objects, the classifiers made by the Haar training are loaded and compared with the live feed. The Haar cascade detection uses different shapes to compare the classifier with the live feed from the webcam. The rectangular shapes used for matching are called Haar features.

Illustration 3.24: Haar-Cascade patterns.

The orange/white shapes in Illustration 3.24 are each mapped to a different facial feature of either a smile or a non-smile. The shapes make the Haar cascade comparison more reliable than simpler methods (described in chapter 3.8.1.2 Simple method), because it compares on subdivisions of the image. A Haar cascade is a series of object comparison operations encoded in an XML file featuring a simple tree structure. The different features are compared, and the algorithm only detects a match if every feature in the check is a positive match.
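As a reference for how the premade face detection could be invoked, the following is a minimal sketch using OpenCV's C interface as it existed around this version of the library. The cascade file is the frontal-face cascade shipped with OpenCV; the exact path and the use of the lower part of the face rectangle as the mouth region are assumptions for this sketch.

```cpp
#include <cv.h>
#include <highgui.h>

int main()
{
    // Load the premade frontal-face Haar cascade (path is an assumption).
    CvHaarClassifierCascade* cascade =
        (CvHaarClassifierCascade*)cvLoad("haarcascade_frontalface_alt.xml", 0, 0, 0);
    CvMemStorage* storage = cvCreateMemStorage(0);

    CvCapture* capture = cvCaptureFromCAM(0);   // webcam live feed
    IplImage*  frame   = cvQueryFrame(capture); // one frame from the stream

    // Scan the frame at several scales and return bounding boxes of detected faces.
    CvSeq* faces = cvHaarDetectObjects(frame, cascade, storage,
                                       1.1,  // scale factor between search scales
                                       3,    // minimum neighbouring detections required
                                       CV_HAAR_DO_CANNY_PRUNING,
                                       cvSize(40, 40));

    if (faces && faces->total > 0)
    {
        CvRect* face = (CvRect*)cvGetSeqElem(faces, 0);
        // One option: restrict further processing to the face rectangle, and later
        // crop its lower part as the mouth region for the smile comparison below.
        cvSetImageROI(frame, *face);
    }

    cvReleaseMemStorage(&storage);
    cvReleaseCapture(&capture);
    return 0;
}
```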
3.8.1.2 Simple method

Another approach to image classification is to assume that the smile and non-smile images build clusters in a multidimensional space, as shown in 2D in Illustration 3.25. The images are converted into vectors by storing each pixel of the image in an indexed vector, i.e. a vector having a number for each position.

Illustration 3.25: Clusters of pictures in multidimensional space. Pn is the non-smile cluster of the training data, Ps the smile cluster of the training data. The purple line is the picture currently being tested. The black line is the threshold.

The mean of the images is calculated using the formula below, and the converted pictures form clusters in the multidimensional space. For each index of the image vector:

$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$, where $n$ is the number of pictures and $p_i$ is the pixel value at that index in picture $i$.

Equation 3.1: Mean calculation

The mean approach is simpler and much faster, but it does not compare the images in the same way as the Haar training does. The smile detection training can be done in the same way, but should be focused on the mouth and the facial features that reproduce a smile. Haar cascade training is much more time-consuming than the simple approach, and even though OpenCV contains compiled training programs, these programs are not well documented. OpenCV uses Haar cascades to detect faces, and Haar cascades therefore provide an easy way to realize face detection, being almost fully implemented in the OpenCV library. However, since the analysis found no Haar cascades for smile detection, this will have to be implemented manually by the group. Since the algorithm for training Haar cascades is too complicated for this project, the simple method will most likely be used to create the smile detection.

3.8.2 Detection

This subchapter suggests two methods for comparing the mean training image with another image.

3.8.2.1 Method 1 – Absolute value difference

For every pixel, the difference in pixel value is found and the differences are summed:

$\sum_{i=1}^{k} |p_i - q_i|$, where $k$ is the number of pixels, $p_i$ is the pixel value of the training (mean) image and $q_i$ is the pixel value of the live image (the image to test for occurrences).

Equation 3.2: Method 1 – Absolute difference

To illustrate how the equation is used, a mean image and a live image, both with the dimensions 1x3 pixels, can be compared. In the program there would be two different mean images to check against, but to make the idea clear, Illustration 3.26 only shows the calculation for one mean image.

Illustration 3.26: Enlarged mean training image to the left and live image to the right

The first pixel is 100 in the mean training image and 185 in the live image, hence the difference is 85.
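Before completing the worked example, the following is a minimal C++ sketch of the mean image (Equation 3.1) and the absolute-difference comparison (Equation 3.2). It assumes that every image has already been cropped to the mouth region and converted to a vector of grey-level pixel values; the function and type names are illustrative only.

```cpp
#include <cstdlib>  // std::abs
#include <vector>

typedef std::vector<int> PixelVector;  // one grey-level value per pixel

// Equation 3.1: per-index mean of a set of training images.
PixelVector meanImage(const std::vector<PixelVector>& training)
{
    PixelVector mean(training[0].size(), 0);
    for (size_t i = 0; i < training.size(); ++i)
        for (size_t j = 0; j < mean.size(); ++j)
            mean[j] += training[i][j];
    for (size_t j = 0; j < mean.size(); ++j)
        mean[j] /= (int)training.size();
    return mean;
}

// Equation 3.2: sum of absolute pixel differences between a live image and a mean image.
int absoluteDifference(const PixelVector& mean, const PixelVector& live)
{
    int sum = 0;
    for (size_t j = 0; j < mean.size(); ++j)
        sum += std::abs(mean[j] - live[j]);
    return sum;
}

// The live image is classified as a smile when it lies closer to the smile mean
// than to the non-smile mean; the threshold t can bias the decision boundary.
bool isSmile(const PixelVector& smileMean, const PixelVector& nonSmileMean,
             const PixelVector& live, int t = 0)
{
    return absoluteDifference(smileMean, live) + t < absoluteDifference(nonSmileMean, live);
}
```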
For the example in Illustration 3.26, the full calculation is as follows:

$|100 - 185| + |30 - 40| + |230 - 255| + t = 120$, where $t$ is the threshold.

Equation 3.3: Calculation of the sum of differences with threshold $t = 0$

The computation in Equation 3.3 shows that the total distance from the live image to the mean image is 120. Say this is the total distance from the live image to the mean smile image. The program would then do the same comparison for the mean image of the non-smiles, and if that comparison produces a value above 120, the live image would be detected as smiling. The threshold can bias the results towards smiling or neutral, in order to compensate for faces that are not similar to the training data. Illustration 3.25 shows how the threshold can move the decision boundary towards either cluster. The threshold will be set after evaluating the equation without a threshold.

Using the mean comparison, it is possible to set a region of interest in the picture. The differences for the pixels within the region of interest can be multiplied by a scalar to make those pixels more important (scalar > 1) or less important (scalar < 1). This could be used in the smile detection, where the middle of the mouth does not change as much as each side of the lips. To make the middle pixel of the test picture a region of interest, its term is multiplied by 3, and to make the right pixel less important, its term is multiplied by 0.5:

$|100 - 185| + |30 - 40| \cdot 3 + |230 - 255| \cdot 0.5 + t = 127.5$, where $t$ is the threshold.

Equation 3.4: Method 1 – Absolute difference with weighted pixels

The threshold should be adjusted when the equation is changed in this way.

3.8.2.2 Method 2 – Distance of pixels

Another approach to comparing images is to calculate the distance from the live image vector to the mean image vector:

$\sqrt{\sum_{i=1}^{k} (p_i - q_i)^2}$, where $k$ is the number of pixels, $p_i$ is the pixel value of the training (mean) image and $q_i$ is the pixel value of the live image.

Equation 3.5: Method 2 – Distance of pixels

To show how to utilize the formula, the example from Illustration 3.26 is used again:

$\sqrt{(100 - 185)^2 + (30 - 40)^2 + (230 - 255)^2} + t \approx 89.2$, where $t$ is the threshold.

Equation 3.6: Method 2 – Distance of pixels with threshold

Compared to the previous method, the result is smaller, and the threshold should therefore also be smaller. Both the pixel difference sum and the distance method are tested in chapter 6.1 Cross Validation Test.

3.9 Analysis Conclusion

This chapter went through the aspects of creating a movie, establishing the star-shaped narrative structure as the most prudent for this project. The important factors of sound, lighting, acting and cameras were explored, and it was noted how sound is a vital part of setting the correct mood of a movie, how light can be used to enhance details of a scene, how acting can be used to emphasize the emotions of the character, and how important it is to balance the scenes by placing the camera correctly. The overall humor type of the movie clips was chosen to be slapstick, since this type of humor is easy to incorporate in short movie clips, which is necessary to make the users smile within the short duration of the clips.
3.9 Analysis Conclusion

This chapter went through the aspects of creating a movie, establishing the star-shaped narrative structure as the most prudent for this project. The important factors of sound, lighting, acting and cameras were explored, and it was noted how sound is a vital part of setting the correct mood of a movie, how light can be used to enhance details of the scene, how acting can be used to emphasize the emotions of the character and how important it is to balance the scenes by placing the camera correctly. The overall humor type of the movie clips was chosen to be slapstick humor, since it is easy to incorporate this type of humor in short movie clips, which is necessary to make the users smile in the short duration of the clips.

Finally, C++ was chosen as the programming language, with the support of the OpenCV library, as this library has a built-in, well-functioning face detection system – based on the Haar cascade training method – and C++ gives the necessary flexibility to develop the programs needed.

3.9.1 Solution requirements

Using the different parts of the analysis, it is possible to formulate the following demands that the design of the solution should comply with. The requirements will be elaborated further in the different sub-chapters of chapter 4. Design, in order to make the implementation as consistent as possible. The first part of the analysis concerns storytelling, cinematic elements and humor. The requirements for this part of the analysis are formulated as:

• The character is without a mouth
• The movie clips should make the viewer smile
• Clear narrative structure
• Consistent use of slapstick humor
• Use cinematic elements to enhance storytelling

The rest of the analysis covers interaction and programming. The solution requirements for the programming part are as follows:

• C++ as programming language
• OpenCV library for face detection
• Smile detection is done by mean image comparison
• Detection rate for smile detection must be at least 75%; 80% is desired

These solution requirements conclude the analysis. It has provided the project with a very clear foundation and a precise direction to begin the design phase of a final product that will fulfill the final problem formulation.
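Since the requirements settle on OpenCV's built-in Haar cascade face detector, a minimal sketch of how this detection step might look is shown here. It uses OpenCV's C++ interface and one of the frontal-face cascade files shipped with the library; the function name detectFace and the parameter values (scale factor 1.1, minimum face size 60x60) are illustrative assumptions rather than the project's actual code.

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/objdetect/objdetect.hpp>
#include <cstddef>
#include <vector>

// Detect the largest face in a camera frame with a pre-trained Haar cascade.
// The returned rectangle can afterwards be used to crop the mouth region
// that the mean-image smile comparison operates on.
bool detectFace(const cv::Mat& frame, cv::CascadeClassifier& cascade, cv::Rect& face)
{
    cv::Mat gray;
    cv::cvtColor(frame, gray, CV_BGR2GRAY); // the cascade works on grayscale
    cv::equalizeHist(gray, gray);           // reduce the influence of lighting

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(60, 60));
    if (faces.empty())
        return false;

    face = faces[0];
    for (std::size_t i = 1; i < faces.size(); ++i) // keep the largest detection
        if (faces[i].area() > face.area())
            face = faces[i];
    return true;
}

int main()
{
    cv::CascadeClassifier cascade;
    cascade.load("haarcascade_frontalface_alt.xml"); // cascade file shipped with OpenCV

    cv::VideoCapture camera(0);
    cv::Mat frame;
    cv::Rect face;
    while (camera.read(frame))
    {
        if (detectFace(frame, cascade, face))
            cv::rectangle(frame, face, cv::Scalar(0, 255, 0)); // mark the detected face
        cv::imshow("Face detection", frame);
        if (cv::waitKey(10) == 27) // Esc stops the test loop
            break;
    }
    return 0;
}

In the actual program the face rectangle would not be drawn but passed on to the smile detection, which compares the cropped mouth region against the mean smile and non-smile images as described in chapter 3.8.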
4. Design

This chapter will look into the design of the product to be produced in the project. The first step will be to design the storyboards for the movie clips, in order to show how the type of humor is to be realized in a short movie clip. Furthermore, the various techniques regarding cinematic elements such as lighting and sound will be utilized to aid the movie clips. The character will be examined, with all the aspects that need to be taken into consideration when designing a character. The character should be suitable for the purpose of the project and be doable within the given time limit regarding the implementation of the character in 3D. When the character is in place, it has to be animated and put into a movie. The techniques of animating will be explained in this chapter, and the choice of these will be directly based on the design of the character. The overall concept of the order of the movies will also be discussed. The design of the program will be decisive as to how the movies will be played, and this has to follow the structure of the narrative that was chosen earlier.

4.1 Character Design

It must now be decided how the project character is going to look. The character should fit into the theme of the project and be able to mediate the desired effect to the user (trying to make them smile). With this in mind, this chapter will go through the various steps: deciding whether to make a complex or a simple character; making a realistic or a cartoonish character; drawing inspiration for the shape of the character in order to make it likable, and where this inspiration was found; and how the character will be further detailed down to its final form and how this helps achieve the desired effect for the project, along with thoughts on what was not chosen and why.

4.1.1 Ideas behind the character

When deciding the overall look of the main character, it was designed with the subjects of "fun" and "being likable" in mind. Based on the pre-analysis, where comedy and the smile were chosen, and the analysis, with its requirement of humor in the animations, it was important that the character had the potential to make people smile and laugh when they observe it.

What immediately comes to mind when thinking of a character that mediates "fun" and "being likable" to an audience is a cartoon-like character such as the various Looney Tunes characters (see them all at the website www.Nonstick.com (Nonstick.com, 2003)). There are many other sources of character inspiration, such as Pixar or DreamWorks, but the reason to look at Looney Tunes is both that these characters have been around for decades and are widely recognized, and that there is no inherent bias toward any of these characters in the minds of the audience – they can all be the hero of the story and create jokes in their own way, making each or all of them a good source of inspiration. These characters are very simple, in that they are defined largely by their outer shape. A character like Bugs Bunny has close to no muscle definition, wrinkles in his fur or similar details within the shape of the character, as shown in Illustration 4.1. Another character, such as Daffy Duck, shares similar characteristics. For example, any indication that he is clad in feathers is given by a few feathers sticking out of his head and shoulders. Otherwise, he is also kept very simple, as shown in Illustration 4.1. And yet, despite their obvious lack of detail, they are capable of displaying a great range of personality and emotion, and they generally encompass a great deal of character.

Illustration 4.1: Bugs Bunny and Daffy Duck illustrate that simple characters can portray both emotion and personality.

With this in mind, the main character of this project will also be kept simple, since there are many possibilities to let the animation breathe life into the character, be it simple or not.

This simplistic nature of the character allows more freedom with the character design. If the object had been to create a realistic human, various elements, like the eyes, ears or nose, would have to have very specific sizes, placements etc. Straying from realism and going more towards a cartoon-like approach allows the design to truly emphasize what is important about the character and downplay or completely remove unimportant features. An example could be to make the eyes fill much of the head while minimizing the mouth and nose, thereby emphasizing eye movements and the shape of the eyes. This ensures that the character design will deliver the necessary messages and cut away unwanted elements. Even if the character is a traditional biped – walking on two feet, having two hands, the head on top etc. – it still leaves a great deal of freedom when aiming for a cartoon-like design. The hands could have only two fingers instead of five; the feet could be directly attached to the body without legs; the head could be shaped like a sphere or a cube etc. Thus, the character will also assume a cartoon-like look. This choice will also help us emphasize key movements and events that take place around the character.
First off, it gives great freedom concerning animation-techniques such as the weight in the animation, which will be explained in chapter 4.4.4 Weight. A realistic human walk would most likely contain – possibly with slight variations – the basic elements of a classic walk cycle, such as the one explained in chapter 4.4.3 The walk cycle, with the two contact positions, the straight passing position and the up and down. But having a cartoon character, it will not be expected of it to act and move exactly like a real human would, especially if it is designed not to resemble a human being. The passing position could maybe have the character crouched down, rather than straight up, the back foot could be delayed way beyond the normal passing position and zoom past the other foot in a frame or two etc. If the character is thrown at a wall, a normal human would react to the impact and then fall to the ground quickly thereafter. But a cartoon character can emphasize this impact in a much greater way. When hitting the wall, the character could deform and flatten itself against the wall. Maybe it would stick to the wall for several seconds and then fall down to the ground, maybe sliding along the wall down to the ground like a drop of water, or maybe even not fall at all, requiring someone else to pull it off the wall again, like a sticker. 60 Group 08ml582 Reactive movie playback 4. Design Character Design Having a cartoon-like character allows for the use of the 12 principles of animation (these principles will be examined in greater detail in chapter 4.4.2 Disney’s 12 Principles of Animation) with the aim of further emphasizing what the character is thinking or doing. E.g. the 10th principle of Exaggeration can be used to push the shape of the eyes far beyond what would look appropriate for a realistic human character or motion, such as a fall or a punch, could be deliberately over-done to emphasize the motion. In relation to this, the 1st principle of Squash and Stretch can help the exaggeration by e.g. overly squashing a character hitting the ground or stretching a character reaching for an object. In short, aiming for a simple and cartoon-like character allows for great emphasis, both with the character itself, but also with regards to what happens to the character. Furthermore it more easily enables the additional use of other animations tools or guidelines such as the 12 Principles to aid in letting the character getting its’ message across. 4.1.2 Constructing the character The next step is to examine how to construct the character and look at how to do this. When the user interacts with our character, it will typically be within the time-frame of a few minutes. He will not have the duration of a feature film or even 10 minutes to become acquainted with the character. He should not take a look at the character and then spend time figuring out how it is going to move, if it can move at all etc. It would be more advantageous if the user could gain an immediate assumption of how the character moves, so he can focus on what happens with the character, rather than trying to figure out the character first. As such, the aim is for a biped-character: A character, walking upright, on two feet, having two arms and a head on top. This familiar character gives the user an immediate understanding of the character and provides a wide range of possibilities regarding movements and interaction. 
Walking, running, jumping, picking up objects, looking at objects, punching and many, many other types of motions are available to this type of character. But the beauty of giving the character a cartoon design is that it is possible to tweak and put a new spin on every single motion, thereby making it unique and different from other characters that already exist. 4.1.3 The basic shape The next step is to define the shape of the character. When composing a simple cartoon character, it is prudent to construct it from simple shapes. However, constructing a character 61 Group 08ml582 Reactive movie playback 4. Design Character Design from simple shapes is not limited to simple characters. It is also used on e.g. feature Disney films, such as Aladdin, as shown on Illustration 4.2. Illustration 4.2: Basic shape diagram for Aladdin characters When the basic shape of the character has been determined, it can be further refined by adding in necessary detail such as arms, the head, clothing etc. And even though the character of this project will not be anywhere near as detailed as the characters from Aladdin, this guideline for basic character creation has proven to be an efficient starting point, so it will also be used for the character of this project. 4.1.4 A likable character shape Determining which shape to use, the main goal is to make the character friendly and likable. Disney’s strong line-up of characters from various feature films will be used for inspiration for the basic shape of the character of this project. The reason for not using e.g. Looney Tunes characters again is that, when looking for shapes not to use, there is little clarity to be found regarding shapes defining a non-likable character. Practically every Looney Tunes character is fun and likeable to watch and there are no real villains amongst them. However, in Disney features, there are always identifiable good and likable characters along with the obvious bad and non-likable villains, so there is a much more visible pattern regarding shape and the nature of the character. 62 Group 08ml582 Reactive movie playback 4. Design Character Design There are too many Disney features to examine them all in depth, so this will be limited to two features; Sleeping Beauty (Geronimi, 1959) and Aladdin (Clements & Musker, 1992), since these features - one released in 1959, the other in 1992 – demonstrates that the method for defining a character by its’ basic shape has been in used for decades, which makes it a viable source for inspiration. When looking at shapes within the character, it becomes clear that Disney films make use of different shapes to create contrast between different characters. Regarding Sleeping Beauty, let us look at the three good fairies – Flora, Fauna and Merryweather – in contrast to the evil witch – Malefacent. These characters represent the good and evil side of magic – the fantastic element of the story – and this stark contrast is illustrated in various ways, including the basic shape of the characters. Let us compare the characters side-by-side to get an overview on Illustration 4.3. Illustration 4.3: The good and evil side of magic in Sleeping Beauty There are many differences in these characters, but the most basic difference is round edges and soft shapes vs. sharp edges and pointy shapes. The three good fairies all share the rounded heads and cheeks, the rounded bosoms, rounded lower bodies, the small and somewhat chubby hands – they are largely build up from round, circular and full shapes. 
In stark contrast we see Malefacent, who consists mainly of points, spikes, sharp edges and thin shapes. Her face contains a pointy nose, chin, thin and pointy fingers, she has horns and the collar of her coats spikes out in many directions. While the details are many, the distinction is clear: Round and full for good – sharp and thin for evil. 63 Group 08ml582 Reactive movie playback 4. Design Character Design We see the same type of shape-contrast when we examine characters from Aladdin. On one side we have the good Sultan and on the other side we have the evil Jafar – these characters representing a good and evil version of what the protagonist Aladdin must overcome in order to win his princess – the laws that the Sultan upholds and the main antagonist that Jafar is. Let us compare the characters side-by-side on Illustration 4.4. Illustration 4.4: The good and evil obstacles in Aladdin The good Sultan is shape-wise round and full. The huge, almost Santa Claus-like belly, his chubby face and the puffy clothing all make him a very rounded and soft character. Contrary to the evil Jafar, who – like Malefacent - is very thin, has thin, pointy fingers, a sharp face and even though he wears the same style of clothes as the Sultan, his outfit is far sharper. And the overall idea is the same as in Sleeping Beauty: Round and full for good – thin and sharp for evil. This was just examples from two films, but there are numerous others (Shrek (Adamson & Jenson, 2001), Sid from Ice Age (Wedge, 2002) or even Santa Claus) and they all encompass this idea of what constitutes a good and positive shape. Thus, using rounded a full shapes can go a long way towards making the character good and likeable, which is desirable for the character of this project. 4.1.5 Initial character suggestions Thus it has been shown, that using round and full shapes to build your character can go a long way towards portraying it as friendly and likeable – one that the audience will enjoy watching. 64 Group 08ml582 Reactive movie playback 4. Design Character Design Inspired by this observation, the character of this project will therefore be built from round and full shapes. And which shape is more round and full than the sphere? Having no corners, being the prime example of the Principle of Squash and Stretch (look up this principle and you are very likely to see the bouncing ball) and being used to create comical characters time and again, this shape fits our character very well. With these design-ideas and inspirations, we can start to sketch out some initial suggestions for characters. Following what has been discussed in this chapter so far, they will be simple, cartoonish and consist mainly of full, round shapes. Some of the initial character suggestions are shown in Illustration 4.5 but every initial suggestion can be found in chapter 12. Appendix. Illustration 4.5: Some initial character suggestions While these all follow the ideas discussed so far, they still vary somewhat. Some have arms and legs – some do not. The feet and hands of some of the characters are more detailed than others, as are their eyes. These characters illustrate some of the range to work with within the design ideas they follow. However, while our character can become based on one of these suggestions, it could just as easily become a mixture of various features of our initial characters, or maybe only borrow a few features and then creating the rest from scratch. 
4.1.6 Detailing the character Knowing the basic shape of the character and having examined the initial ideas for variety and range, the final step is to decide exactly what the character should contain. In the words of the famous cartoon and CG-film (Computer Graphics) director Brad Bird, currently employed by Pixar Animation Studios (Pixar Animation Studios, 2008): 65 Group 08ml582 Reactive movie playback 4. Design Character Design “The reason to do animation is caricature and good caricature picks out the elements that are the essence of the statement and removes everything else.” (Milton, 2004) With this in mind, the character can now become more focused. It is a biped, so in order to move around, it should have two feet. Furthermore, it should have two hands, both due to the fact that this is a part of the biped figure, but also for the character to be able to pick up and interact with objects. The character should have a head, in order for him to portray emotions and thoughts. While expressions can be supplied by body movement, based on chapter 2.1 Narrowing down emotions, it has been shown that emotions such as joy, fear, disgust etc. can be clearly shown in the face, so the character needs to have this. But apart from this, the necessities cease. A normal biped has arms, legs and a neck to connect the hands, feet and head to the body, but when designing a simple cartoon character, are arms, legs and a neck really necessary? Apart from connecting hands, feet and the head to the body, they are not vital to the character. Function-wise, much of what arms, legs and the neck can accomplish – bouncing a ball on the knee, using the elbows as a pillow for the head - can just as easily be accomplished by the hands and feet and head. Furthermore, characters without arms, legs a neck have been used with great success many times. A character like Rayman from Ubisoft (Ubisoft, 1995) has starred in many computer games over the years and he works very nicely without legs, arms and a neck. He can be seen in Illustration 4.6. Illustration 4.6: Even without legs, arms and a neck, Rayman appears like a connected character. 66 Group 08ml582 Reactive movie playback 4. Design Character Design The game-review-series known as Zero punctuation (Croshaw, 2008) also employs a character without arms, legs and a neck and that character also works just fine and appear complete, just as he would with arms, legs and a neck and he can be seen in Illustration 4.7. Illustration 4.7: Zero Punctuation character - like Rayman - has no need for arms and legs to be a complete character. So, both as a still picture and as an animated character, it is perfectly fine to exclude these limbs and still have a working character. Especially when working with an animated character, Rayman has shown that when the character moves, it can be done in such a way, it comes very close to being as natural as if the character had arms, legs and a neck. However, when excluding arms, legs and the neck, certain considerations become necessary regarding hands, feet and the head. If the hands had the shape of a plain sphere, the arms would still ensure that the outer shape of the character resembled a biped. But when working without arms, a bit of detail is needed to enforce the impression of a hand, which is why the character of this project is equipped with a big, full thumb on each hand. Illustration 4.8 The hand of the character, being made simple and from full shapes. The palm will also be deformed to resemble an actual palm slightly. 
But apart from this, the hand will still be simplified, and the four other fingers will be combined into one big, full finger. This will enable the character to pick up and manipulate many objects, and therefore having four separate fingers becomes unnecessary. As a last point, the hands will keep to the idea of being round and full.

The feet will be shaped like a big slipper, which fits the character nicely, a slipper normally being thought of as a soft shoe.

Illustration 4.9: The exaggerated feet also fit well within the design ideas for the character.

Toes are not needed to portray the illusion of either a foot or a shoe, so they are not included. Going back to the characters from Disney features, such as those shown in Illustration 4.3, many of them do have small feet. But in addition, they often have big or full legs to go along with the feet. The character of this project has no legs, so the size of the legs is transferred somewhat into the feet instead. While they will not be as big as clown feet, they will still keep to the idea of round and full shapes by slightly exaggerating their size.

The head will be entirely like a sphere.

Illustration 4.10: The head of the character conveys emotions through its exaggerated eyes and eyebrow alone.

Details like a jaw, cheekbones, ears etc. are not important to give the impression of a head. In fact, as long as features such as the eyes are present, that is sufficient. The character is not going to speak, since the humor used in our project does not include telling jokes or similar vocal comedy. And since speaking is not important, the inclusion of a mouth must be discussed. A mouth can be very useful in conveying various emotions in the face of a character. But a character can portray emotions without it. The Zero Punctuation character is a good example of this. The online animation school known as Animation Mentor is another good example: during the education, the students are provided with a character that is only equipped with eyes in the head – otherwise, the head is just a regular sphere. But it can still convey emotion by the eyes alone (AnimationMentor.com, 2008). This serves as an important source of inspiration for the head of our character, both shape-wise, but also regarding the eyes and mouth. However, the main deciding factor in excluding a mouth is production time, and this decision has been elaborated in chapter 3.1 Delimitation of the character. The Animation Mentor character also shows that when using a simplistic and cartoon-like representation of the eyes, elements such as pupils lose importance. This boils down to the head of this project's character being simplified to a sphere with two eyes, not containing pupils. However, the eyes will be scaled much larger than normal eyes. This is due to the fact that the eyes are the main source of emotion from the character, and in order to emphasize these emotions, one way to do it is to make the eyes big. Lastly, an eyebrow will be included, in order to help emphasize emotions in cases where the eyes of the character are closed. Without eyebrows it can become troublesome to determine if he is concentrating, in pain, listening, sleeping etc., and eyebrows – even if they are simply just a line above the eyes – can help greatly to clarify the emotion.
Lastly, the body of the character will also be a sphere, but deformed somewhat to resemble the shape of a regular body. 69 Group 08ml582 Reactive movie playback 4. Design Character Design Illustration 4.11: While starting from a basic sphere, the body still resembles a regular torso somewhat to give the characters body a sense of direction. It will be wider near the head than near the feet. If the body had been a completely round sphere there could be a chance that the user would lose track of what is up and down in the projects character during fast and somewhat erratic motion, such as air acrobatics, tumbling down a flight of stairs etc. So this deformation gives a sense of up and down in the character. Lastly, since the character has no arms, the body will be deformed to almost have an edge near the head. This edge gives a sense of the body having shoulders, which helps illustrate how the hands move in connection with the rest of the body. And that concludes the construction of the character. It should now be able to come across as a connected character due to the body shape and the inclusion of a head, hands and feet, move around due to the feet, interact with objects due to the hands and display emotions due to the eyes. But in addition, it keeps to the idea of being both simple and cartoonish, with much of the character – hands, feet etc. – being simplified in relation to a real human, but then also being scaled up, making them round and full. And as such the character has been designed to fit the requirements of the analysis and inspired by proved techniques and idea from other characters used in the same way as this projects characters is going to be: To be likable. The final version of the character can be found in Illustration 4.12: 70 Group 08ml582 Reactive movie playback 4. Design Humor types Illustration 4.12: A front and side-view of the final project character 4.2 Humor types A combination of Avery’s Screwball Squirrel and Chuck Jones’ Road Runner humor will be used in the project. Both are of the slapstick humor type and can be executed in short timeframes. Combining the extreme craziness and the type of black humor from Avery with the funny “accidents-are-bound-to-happen” and surprising style from Chuck Jones would be the keyword in the humor of the movies clips of the project. 4.2.1 Ball play All 3 gags are inspired by the slapstick humor and are similar by having at least one ball in the scene. In the scene the character will wonder about finding a ball. The ball itself is harmless; however it becomes destructive for the character after interacting with it. The character will be drawn to kick the ball and the ball will always respond by hurting the character in extreme and unpredictable ways. This should happen either rapidly to exaggerate the craziness, or in slow-motion in order to make the scenes funnier in facial and body appearance. 71 Group 08ml582 Reactive movie playback 4. Design Humor types 4.2.2 Black humor While still being a variant of the slapstick humor genre, the black humor type is greatly inspired by the group of comedians known as Monty Python, who is well known for their irrelevant and often surreal sketch comedy (Fogg, 1996). A recurring element in the sketches of Monty Python is that an event completely unrelated to the story suddenly happen, leaving the viewer quite puzzled and surprised but still amused by the absurdness of the whole situation. 
Furthermore the expression “And now for something completely different” (IMDB, 1971) is commonly associated with Monty Python, while also describing their type of humor very fittingly. This is what the black humor type will imitate; to have the viewer smiling because of a completely unrelated event, which however still adheres to the slapstick humor, meaning that this event will inflict some damage to the character. The thought is to build up a story with the character interacting with a prop, and then after establishing a series of logical chronological actions, an event will occur which can in no way be expected by the viewer to happen. And as it is the case with Monty Python, it is the absurdness of this unexpected event that will cause the viewer to find it amusing. 4.2.3 Falling apart This type of humor is based on exaggeration in relation to the character, taking advantage of how he is perceived by the viewer and how he is actually constructed. Due to the form of the body, resembling a torso and the elements that make up the character, he will be perceived as a whole and connected character. But in actuality, the character is not connected: his body, hands and head are all floating in the air and only his feet stand on the ground. This type of humor will revolve around the contrast between the perception of the character and the actuality of the character, since impacts between objects and the character can be uniquely interpreted in the movie clips contrary to a character that actually had arms and legs: When the character is hit by an object, he falls apart. 72 Group 08ml582 Reactive movie playback 4. Design Drafty Storyboards This action of actually taking the construction of the character very literally is surprising to the viewer, since he perceives the character as being a connected character and as thus does not expect him to fall apart, even though it is logical for him to fall apart due to gravitational pulls. The viewer might expect him to stretch unnaturally, but will most likely expect him to return to the normal form again. Furthermore, aftermath of the character falling can also be used to comical effect, since the viewer will have no real-world reference to how a character like the one in this project reattaches separated limbs to his body. The character might push limbs back onto the body so hard, the limbs on the other side of the body will fall off due to the force, the limb might not want to re-attach at all etc. 4.3 Drafty Storyboards This chapter will apply theories and methods from chapter 3.3 Lighting, sound and acting and chapter 3.4 Camera movements and angles to the drafty storyboards (from here on referred to merely as storyboards), such as camera angles, movements, acting of the character and sound, as well as explain the reasons and effects of these applications. The storyboards are useful in planning the movie clips, before going into the process of implementing them. A storyboard is a simplified way of visualizing a scene, without having to animate and model the parts of the scene. The storyboard will show the animators and modelers what camera angles to be used, which objects and props to implement and how these should be animated. Therefore, using storyboards for each of the movie clips in this project will optimize the implementation process and minimize production time. 4.3.1 Use of the camera As mentioned in chapter 3.4.2 Angle and Distance about the various shots, the human figures are visible, but the background dominates in the long shot. 
This can be used to introduce the viewer to the scene, as this will give an overview of what is happening. The long shot has been used in several of the group’s storyboards for the first picture. This can be seen on e.g. Storyboard “Trampoline” on page 191 in Appendix and on Illustration 4.13. 73 Group 08ml582 Reactive movie playback 4. Design Drafty Storyboards Illustration 4.13: The long shot used in humor type 1, 2 and 3. Illustration 4.28 shows the first picture of a storyboard from each type of humor. This approach, using a long shot, has been used to introduce the viewer to the character and the scene. Whenever the character displays important facial expressions, other shots like the medium shot and the medium close-up can be used in order to emphasize these expressions. Medium shots especially have been used as can be seen in Illustration 4.14. Illustration 4.14: Medium shots showing the character from the waist and up. The closest the viewer gets to the character in the storyboards is a medium shot. The effect of the medium shot is that the important parts of the character are still visible using this closer framing, and furthermore that it becomes easier to see the facial expressions and gestures. In chapter 3.3.3 Sound, off-screen sound can be used to make the viewer form expectations of what caused the sound. However, for certain scenes, such as the Music Box scene which can be seen on page 185 in Appendix, the framing of the camera will be used in correlation with off- screen sound to alter the effect of off-screen sound slightly. This use can be seen in Illustration 4.15. 74 Group 08ml582 Reactive movie playback 4. Design Drafty Storyboards Illustration 4.15: Using cameras in correlation with off-screen sound alters the effect of the sound slightly. What is achieved by this use of framing is that the audience will know perfectly well what makes the off-screen sound. However, the result of the event creating the sound is still unknown. The viewer will know that the character has been hit by the boxing glove, but will not know what this will result in, mainly because it is a cartoon-like character and it does not have to behave like a real human being when getting injured. This use of framing will create a comical effect, since the viewer is trying to imagine how the character could get hurt by the boxing glove, before actually seeing the result. Moments later, the result becomes visible and proves to be a much exaggerated result – following the humor type used in the storyboard of having the character fall apart on impact with objects – which is unexpected and takes the viewer by surprise. 4.3.2 Acting The eyes and hands of the character are very important with regards to portraying his mood and actions. Therefore they have become accentuated in the storyboards. Since the character cannot speak, his intentions and moods must be presented visually through his body language, i.e. through his hands and eyes. 75 Group 08ml582 Reactive movie playback 4. Design Drafty Storyboards Illustration 4.16: Hand and eye movement visualize the mood of the character. Keeping his eyes and eyebrows clearly in focus helps to visualize these moods and what actions he intends to perform. The left part clearly shows that the character is confused about something, while he, in the right part, uses his hands to emphasize that he is laughing. 4.3.3 Props It would be difficult to continuously have the viewer interested in what he sees if the character was the only thing present in the movies. 
Therefore various props have been included in the movie clips, so the character has something to interact with, and thereby the interest of the viewer can be maintained. The props included are the following:

• A music box
• A boxing glove
• A ball
• Several smaller balls
• A trampoline
• A piano

The reason for the music box is that the scene needed a prop from which music could be played, but also that this prop should not look like a typical music-playing device, e.g. a ghetto blaster, since the character should not immediately be familiar with the prop and should therefore be somewhat cautious when interacting with it. Furthermore, the prop should be tangible, since the character still needs to interact with it. So although these requirements can be fulfilled by many shapes, the final choice came to be a box with part of a sphere protruding from one of the sides, since this fulfills the requirements and furthermore because the production time is fairly short. While still being a somewhat unfamiliar prop, its outer shape does not immediately reveal what the function of this prop is, so the prop could do anything.

The boxing glove has been included because a prop was needed for the purpose of an impact with the character. Again, many props can fulfill this requirement of impacting with the character, but since a boxing glove clearly indicates an impact, due to its real-world reference, it was deemed suitable.

The ball and the smaller balls are related in the sense that the single ball moves out into off-screen space and returns as several smaller balls. The requirement for the prop in this case was that it was to be kicked, and to this end, a ball is a very suitable choice.

An important factor in the story using the black type of humor is the use of off-screen space, i.e. that something happens to the character outside the viewing range. As such, a prop is needed to help move the character into off-screen space, rather than just having him walk out of the scene. A trampoline has therefore been chosen for this interaction, since it is capable of launching the character up in the air, thereby having him disappear upwards. Lastly, something should hit the character while he is in off-screen space, and after he has fallen to the ground, this prop should fall right on top of him. The point of the black humor story is that suddenly something completely unrelated to the story happens, and as such this prop could be almost anything. A piano is a good choice because it firstly is not the typical object to encounter high up in the air, and secondly because it can continue the very unusual event (falling on top of the character) by beginning to play music, hereby proceeding down the black humor path. Furthermore, the gag of a piano falling on a character has been used before in an episode of the show Family Guy called "Tales of a Third Grade Nothing" (Langford, 2008). The fact that it has been used in such a popular show indicates the quality of the gag.

4.3.4 Sound

Sound has been visualized a few times in some of the storyboards. In these cases the function of off-screen sound has been used, creating the effect of the viewer becoming more immersed, as he cannot see what happens and therefore has to imagine it.

Illustration 4.17: The use of off-screen sound in the storyboards.
Using off-screen sound also reduces animation time considerably and the fact that you cannot see the cause of the sound makes it possible that the sound could represent almost anything. In the Illustration 4.17, the “bang”-noise might come from the ball hitting a door, or it could just as well come from the ball hitting a spaceship. The unknown of the off-screen space intensifies the immersion of the viewer. Sound used for directing the viewers attention will be used in movie clips, to provoke an expectation in the user and attempt to make the guess what will happen next. An example of this use of sound can be seen in Illustration 4.18. Illustration 4.18: Showing an example of sound used to direct the viewers’ attention. What happens is that the box being held by the character has abruptly stopped playing music and the character has picked it up to examine the reason why. After shaking it for a moment, the music abruptly starts again and finishes playing. However, there are differences in the way it finished playing and how it started playing. At first, the music was played at a normal volume, with the viewer hearing it from the center of the scene. But when it abruptly starts up again, the volume has been significantly increased and the music is now heard to the right of the scene. This change of both volume and position of the music source, along with a reaction 78 Group 08ml582 Reactive movie playback 4. Design Drafty Storyboards from the character, will aid in making the user anticipate a forthcoming event from the right of the screen. Another use for sound in efforts of achieving the effect of involving the viewer more in the movie clips, is when diegetic and nondiegetic sounds are mixed, in the sense that the viewer does not know whether a sound is diegetic or nondiegetic. This happens as shown in Illustration 4.19. Illustration 4.19: The music playing can easily be considered to be nondiegetic, but is later revealed to be diegetic. The music starts playing before the first frame is shown. The movie clip then fades in while the music is playing, so the music should be perceived as regular background-music, since music with no discernible source in the scene should be perceived nondiegetic music. However, when it abruptly stops and the character even reacts to this, the viewer becomes aware, that maybe the music was played from within the scene and the source of the music thereby becomes an element of the scene, which the viewer will be interested in finding out more about. This way, the scene “plays” with both the viewer’s expectations about the music used in the scene, but it also draws the viewers’ attention to the box, before it is even shown. Apart from these instances of using sound to achieve an effect in either information displayed in the scene or to make the viewer form expectations about what will happen, the remaining sound used in the movie clips are functional, in that they support the visuals; the sound of footsteps will be heard when the character walks around, sounds of crashes will be heard when objects, such as the boxing glove or the piano, collides with the character etc. To gain an overview of the sounds used, a movie clip could be complemented by a sound map, such as the one proposed by David Sonnenschein, an accomplished sound designer on feature films such as Dreams Awake and I’d Rather be Dancing (IMDB.com, 2008), which can be seen in Table 4.1: 79 Group 08ml582 Reactive movie playback 4. 
Table 4.1: Showing how various sounds in a scene of a movie can be illustrated in a sound map. Source: https://internal.media.aau.dk/semester/wp-content/uploads/2006/10/sonnenschein.pdf page 19

However, duplicating the categories of this sound map would be inappropriate for these movie clips, since e.g. voice is not a factor at all. Therefore, the following categories will be included in the sound maps for the movie clips of this project:

- Functional sounds: These are the sounds that support and enhance the visuals, such as the sounds of footsteps, on-screen collisions of objects etc.
- Effecting sounds: These are the sounds that have a purpose beyond just supporting a visual object. These sounds work to direct the attention of the viewer, to make the viewer form expectations of what will happen etc.
- Character sounds: These are the sounds that the character makes.

An example of a sound map for a movie clip of this project can be found in Table 4.2.

Table 4.2: Organizing the various sounds in such a sound map makes it easy to get an overview of which sounds are part of the scene and how.

4.3.5 Light

None of the storyboards contain drawn information regarding light. The primary goal has been to develop the stories, and even though light can help doing this, the lighting requirements for the stories are limited. Basically, the viewer needs to have a clear image of the character throughout the animation, since the movements, gestures and facial expressions are important and need to be seen in order to better understand the film. And since everything in the scenes will be clearly visible in the storyboards, there is no need to draw light information on them.

In chapter 3.3 Lighting, sound and acting, four features of film lighting were listed: quality, direction, source and color. The following is a general description of how light will be used in the animations. The quality will be neither specifically hard nor soft light, but rather something in between, although it will lean more towards hard light because it creates sharp edges and clearly defined shadows. The direction of the light in the scene will be kept downwards. It is important to ensure that cast shadows from props do not conceal important details. E.g. when the piano falls on top of the character, as seen in storyboard "Trampoline" on page 191 in Appendix, the cast shadow of the piano cannot be allowed to conceal the hand of the character. Keeping the direction of the light downwards makes it easier to control the length of the shadows and thereby determine what they might conceal. The key light is the only type of light used, providing the dominant illumination. This is what will maintain a neutral illumination of the scene.

With all the decisions made about how to craft a storyboard, an example of a complete storyboard can be seen in Illustration 4.20.

Illustration 4.20: The drafty storyboards developed in this project consist only of drawn thumbnails, representing key moments in the scenes.

4.4 Animation

The next step of the design phase is to examine various techniques within animation that can become useful for animating the project character.
Since the project character is going to react to the user through various animated scenes, it is very important to be able to craft this animation with a sense of purpose and a direction of making the user smile, rather than just moving the character around and hoping for the best. This chapter will explore various elements of animation technique, such as Disney's 12 Principles of Animation, the walk cycle, weight in animation etc., along with exploring how these can be utilized in this project. In the end the chapter will provide a thorough overview of how the animated scenes can be realized and how the project character can come alive from the drawing board and start to make people smile.

4.4.1 Full vs. Limited Animation

Before starting to delve into which animation techniques to use, a brief look is taken at what type of animation – either full or limited – to make use of. This will be done with the remaining production time in mind, since the project period is no longer than 3 months, along with the fact that animation can quickly become a time-consuming element of the project. Replicating movement so it appears realistic or natural is not an easy feat, and going through a specific animation sequence, tweaking each frame until a satisfactory result is finally accomplished, can take a significant amount of time. As such, techniques or methods to shorten the process of actually animating are very appropriate for a project such as this, and that is the main reason for looking at full vs. limited animation. The differences between full and limited animation are listed here (Furniss, 1998):

Full animation:
- Many in-betweens
- Every drawing is used only once, filming on "ones" or "twos"
- Many animated frames per second
- Movement in depth

Limited animation:
- Few in-betweens
- Reuse of animations
- Few animated frames per second
- Cycles
- "Dead backgrounds"
- More dialogue driven
- Many camera moves

In order to fully understand these differences, a brief explanation of the terms "in-between" and "ones/twos" is needed.

- Detailed information on the in-between can be found at the website The Inbetweener (Bluth, 2004), but a short description is that in-betweens are one or several drawings/positions between two extreme drawings/poses in an animation. For example, an animation of a person opening a door would typically have the person grabbing the handle as one extreme and the door fully open as another, with the poses in between these extremes being the in-betweens.
- Detailed information on ones/twos can be found in the book The Animator's Survival Kit (Williams, 2001, pp. 47-68), but a short description is that the terms cover whether to use the same frame of animation once or twice in a row. One second of animation running at 24 frames per second (from this point referred to as FPS) can either consist of 24 different frames if shot on ones, or only 12 different frames if shot on twos.

Having fully understood the differences between full and limited animation, we can start to see that the type of animation for this project will end up being a mix between the two. First the elements from limited animation will be chosen, along with why they are chosen:

4.4.1.1 Elements from Limited Animation

"Reuse of animations": The most prominent element to be used from limited animation is the reuse of animations.
Movements that are similar in all of the scenes to be animated – such as the character walking - will be reused due to the fact, that the scenes feature the same character and unless something drastic will happen to the character, he will be walking the same way, thus eliminating the need to re-animating the walking for each scene. “Cycles”: Making use of cycles within the animation can be time efficient. Referring to the character walking again, if each and every step was to be manually animated, it would slow down the animation process to a great degree. Instead, it is possible to create a so-called walk cycle, which will be explained in more detail in chapter 4.4.3 The walk cycle: Rather than animating every step, it is only necessary to animate two steps and then these can be repeated to create a walk for as long as needed. To avoid these cycles becoming to mechanic, minor tweaks can be applied to them, such as the wobble of the head, the shaking of the arm, but the cycles are largely identical and time-saving. 4.4.1.2 Elements from Full Animation “Many in-betweens/Many animated frames per second”: While it might seem odd to deliberately include many in-betweens when aiming at saving time, it becomes clear why doing so is beneficial when working with 3D animation. While classical 2D animation only produced motion based on every drawing produced, 3D animation takes any number of poses 84 Group 08ml582 Reactive movie playback 4. Design Animation of the character and interpolates motion between these poses based on how long the animation is. Interpolating means that the computer tries to calculate the smoothest transition from one pose to the next over the given period of time (e.g. 10 frames between each pose). While this interpolation often requires the animator to tweak them manually before they look natural, it always ensures that there are many in-betweens present in the animation. If it runs at 24 FPS, there are always enough in-betweens to have 24 frames each second – they do not have to be manually created. Thereby, to achieve the effect of only have maybe 8 in-betweens will require extra work effort of duplicating frames to deliberately make the animation less smooth. This actually means that fewer in-betweens can easily include more work than many in-betweens. The type of animation used in the project has been decided upon – the next step is to explore various techniques and methods of animation to be used to create the scenes with which the users will interact. 4.4.2 Disney’s 12 Principles of Animation Created back around 1930 by the Walt Disney Studios, the twelve principles of animation were used in many of the early Disney features, such as Snow White (Hand, 1937), Pinocchio (Luske & Sharpsteen, 1940) and Bambi (Hand, 1942). Furthermore, pretty much every animator is familiar with these principles, since they are an important part of the skill-set of a competent animator. A detailed description of each of these principles has been written by Ollie Johnston and Frank Thomas in the book The Illusion of Life (Johnston & Thomas, 1997). They were directing animators on many Disney feature films (Thomas & Johnston, 2002) and are both part of Disney’s Nine Old Men, a collection of early Disney employees who are all considered some of the most influential animators of the 20th century (von Riedemann, 2008). 
- Squash and stretch
- Anticipation
- Staging
- Straight ahead and pose-to-pose
- Follow-through and overlapping action
- Slow in and slow out
- Arcs
- Secondary action
- Timing
- Exaggeration
- Solid drawing
- Appeal

Each of these principles has its use and purpose (otherwise they would not exist), but it is important to determine which principles fit this project, how they can possibly be used and which can be skipped (if any).

Squash and stretch – seen on Illustration 4.21 – is often used to exaggerate a physical impact on a shape or object, but it can also reveal information about the physical nature of an object, e.g. a flexible rubber ball vs. a rigid bowling ball.

Illustration 4.21: The squash and stretch principle on a bouncy ball

It can be used to achieve a comic or cartoony effect, making it a fitting principle for this project. This principle is usually accompanied by the example of the bouncing ball that squashes on impact with the ground and stretches when bouncing back up in the air. And since e.g. the head of the project character is basically a ball, this principle definitely has potential in this project. Furthermore, the character is cartoonish and not realistic, so squashing and stretching him to some extent to emphasize motion or events in the story becomes a viable option, such as when the character falls to the ground from great heights. It can be used to great effect to emphasize the impact of one object against another if – in the very last frame before the impact frame – the moving object is greatly stretched out to the point where it touches the non-moving object, as seen in Storyboard "Trampoline" on page 191 in Appendix. When this happens in only one frame, it is not very obvious, but it has a great effect on the overall motion of the impact between a moving object and a non-moving object.

Anticipation – seen on Illustration 4.22 – has to do with cueing the audience on what is about to happen, and using anticipation can often greatly enhance the impact and result of a motion.

Illustration 4.22: To the left: The anticipation of a punch, before it occurs to the right

E.g. if a character wanted to push an object, the motion of the push itself would be much more emphasized if he pulled his arms back before pushing forward, rather than just pushing forward. One of the main thoughts behind designing the project character cartoonish was to be able to exaggerate what happens to it. Anticipation is another method of achieving emphasis on the character and what happens to it, so this principle also has its uses in this project, e.g. when the character prepares to launch itself into the air from a trampoline, as seen on Storyboard "Trampoline" on page 191 in Appendix. When charging up the jump, he will curl far down, the arms will go far back behind the head and the eyes will pinch down hard to build up energy, which will then be released into the jump.

Staging – seen on Illustration 4.23 – refers to making it clear to the audience what is happening in the scene.

Illustration 4.23: To the left: Very bad staging, since most of the details of the scene are concealed. Much better staging occurs to the right, where the details of the scene are clearly visible.

Choosing the correct poses and lines of motion of a character, for example, can assist greatly in helping the audience understand the story better.
It is an important principle in relation to 87 Group 08ml582 Reactive movie playback 4. Design Animation this project, since information about the scene and the story can become entirely lost if staging is not considered. If the character is pulling something heavy towards the camera with his back to the camera, it will be extremely difficult to see the physical strain portrayed in the face of the character, motion of the hands can become concealed by the body and it becomes difficult to see how far the object has been pulled. So, without staging, the scenes can quickly become a visual mess and must be paid heed to when the character picks up objects or stands next to a trampoline. Straight ahead and pose-to-pose is regarding how to animate. It can either be using key poses and then filling in frame between these key poses to create the animation, thereby making it very predictable, but also limited when it comes to freedom with the animation. Or it can simply be beginning from one point and then just animating and creating the motion as you go along, causing the animation to be very free and spontaneous, but it can very easily get out of hand, messing up the timing of the entire scene or straying from what was originally intended with the animation. While it is hard to recognize which method has been used when watching the finished animation, it is worth considering for this project, mainly due to the limited time available, which makes pose-to-pose the obvious choice. However, it will be with a slight mix of straight ahead animation as well, since subtle motion, such as the wiggle of a hand during a walk, kicking in the air during a jump etc. can be incorporated without it necessarily being a part of the key-poses. Follow through and overlapping action – seen on Illustration 4.24 - is when a part of the character or object continues to move, despite the fact that the character itself has stopped moving (follow through) or changed direction (overlapping action). 88 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.24: To the left: Follow through in the fingers and toes - To the right: Overlapping action in the hand to the right, since everything has turned another direction except for this hand. Being somewhat related to dragging action, when part of the character takes a bit longer to move than the rest, this can be hair or clothing moving after a character has stopped moving, but could also be a head turning slower than the body and stopping later than the body. In relation to this project it can be very useful in using little effort to create a more fluent or natural motion and loosen up the motion, keeping it from being stiff and rigid. E.g. a walk cycle can become much more natural if the toes are delayed a small amount when the foot goes down. Arm movement can go from being very stiff and robotic with a straight arm swinging back and forth to being more loose and natural, if e.g. the palm and fingers trail behind the wrist when the arms swings. It can even be used to more extremes with the projects character, since his hands and feet are not directly attached to the body. If he becomes hit by something or starts running fast, it is possible to make the hands, feet or even the head wait before following the body itself to really exaggerate the sudden shift in motion of the character. So whether it is concerning subtle or exaggerated motion, this principle is a great help in achieving more life-like movement for the project character. 
Slow in and slow out – seen on Illustration 4.25 - is yet another principle that is useful for this project. It is used to create snappy action, the changes in tempo, bringing more variation into the animation, rather than having it move at the same pace the whole way through. 89 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.25: The little spikes on the arc to the right represent frames in the motion. The further apart they are, the faster the action goes, thereby being slowest and the start and finish - slowing when coming into the action and when coming out of the action Using this give the character the possibility to emphasize greatly the motion of slapping something, pulling something, when he re-attaches his head to his body, etc. and generally help in making an action particularly powerful, if the anticipation of this action is slowed down, then speeding up the motion in the action itself as seen on Storyboard “Music Box” on page 185 in Appendix. Or, the exact opposite can easily be used to create a bullet-time effect when details must be shown within a normally very fast motion, such as dodging projectiles etc. as seen on Storyboard “Bullet Time” on page 194 in Appendix. Arcs – seen on Illustration 4.26 - are one of the bases for obtaining more natural and life-like action of a character. When walking, the arms do not move in a straight line, but rather swing back and forth in arcs, the up-and-down-motion of the head in a walk occur in arcs and generally a lot of motion occurs along arcs. Illustration 4.26: The red lines represent arcs of the motion from the left picture to the right, when the ball is thrown. Pretty much every motion here takes place along arcs. As can be seen from Illustration 4.26, arcs can also help in creating exaggerated motion and extreme poses for a character, making for much more lively and interesting animation, especially for a cartoonish character like the one present in this project. Say the character 90 Group 08ml582 Reactive movie playback 4. Design Animation wants to open a door, which is stuck. Well, rather than just pulling the body back to try and open the door, the body can really stretch out in a long arc to greatly exaggerate the motion of trying to open a stuck door or when stretching out in mid-air during a trampoline-jump, the exaggeration can be seen on Storyboard “Open Door” on page 188 in Appendix. Or when falling from a great height, the character would not fall straight down, but rather arc as the body would fall first with the dragging of the legs and head following behind. Secondary action – seen on Illustration 4.27 - can bring more character into the animation. It can be referred to as action within the action and it concerns getting as much out of the animation as possible. Illustration 4.27: The top jump is relatively closed and fairly little movement apart from the jump itself. The bottom jump uses secondary action to loosen up the jump a lot more, with the legs and arms moving independently to create more action within the jump itself. Say a character is jumping. Instead of simply pushing off with both feet as once and landing with both feet again, the character could lift off and land with one foot delayed. Furthermore, the legs have many opportunities for secondary action whilst in the air. They could air-run, cycle around, waggle uncontrollably etc. as seen on Storyboard “Trampoline” on page 191 in Appendix. 
So, while a character performs major actions, adding nuances in the form of secondary action can add personality to this motion. This is very useful for this project. When the project character discovers a strange new object to interact with, rather than just looking at it before maybe picking it up, there is great opportunity to twist the head to create more wondering facial expressions or having the character scratch his head while looking at the object. 91 Group 08ml582 Reactive movie playback 4. Design Animation Timing – seen on Illustration 4.28 - is useful for this project solely due to the fact that it uses animation, since timing can cover how long a specific motion takes, such has how long it takes for a certain object – or in the case of this project, the character – to fall a certain distance based on the physical nature of the character as seen on Storyboard “Trampoline” on page 191 in Appendix; a heavy character will fall faster than a character which is light as a feather. But timing can also cover comical timing, known from e.g. Looney Tunes cartoons, where a character steps over the edge of a cliff, but only falls down, when he discovers that he has stepped into thin air and it is more fun to fall down at that point. Illustration 4.28: Here, timing is used to achieve a comic effect, since the character only falls down once he realizes that he stands in mid-air Using timing this way allows comic effect, if e.g. the character looses the his body and the head only falling to the ground when the character realizes its body is missing and it thus becomes more fun for the head to fall down at this specific point. A certain amount of delaying between sounds happening off-screen until events take place on-screen gives us the opportunity to let the user form his own expectations about what will happen next, before actually showing it. Exaggeration - seen on Illustration 4.29 - can be used to emphasize actions or events happening in the scene. 92 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.29: Here we see two events. The top are without exaggeration and the effect is noticeable. However, when they are exaggerated at the bottom, the effect of the impacts is greatly enhanced and emphasized. It has been mentioned in the description of other principles is this chapter, such as Arcs and Follow through and overlapping action, since these principles can be used to realize this principle. Working particularly well for cartoons, since they do not have to abide by realistic motion, exaggeration can be used to great effect in the project. The character anticipating an action can be greatly exaggerated, e.g. making it arc far back before punching something. Or something hitting the character can be greatly exaggerated to emphasize this impact by simply making it cause the character to fall apart, flatten completely out etc. as seen on Storyboard “Music Box” on page 185 in Appendix. Solid drawing and Appeal mainly concerns the design of the character such as making it more life-like and likeable, which has been described in detail in chapter 4.1 Character Design. The original idea of Solid drawing was to use the appropriate depth, weight and balance to give the drawings a fitting style. Kerlow suggests that the principle is renamed to “Solid Modeling and Rigging” to meet up to the 3D modeling and rigging techniques of today (Kerlow, 2004). 
The principle can be shortly described as the art of modeling and rigging a character, so that the style and expression is as desired. It should also convey a sense of weight and balance in the character along with simplifying rigging the model. Appeal of the character is aimed at providing a character with a specific appeal such as being goofy or evil. The appeal could for example be expressed using different walk cycles, but could also be realized in the form on scarring a characters face, making it hump-back etc.. Thus it appears, that each of these 12 principles is indeed useful in some way or another to this project, either alone – such as Staging - or in combination with other principles – such as 93 Group 08ml582 Reactive movie playback 4. Design Animation exaggeration. They really help in breathing life into the character when it is being animated, especially for a cartoon character that can use certain principles such as Squash and Stretch or Secondary Action to a greater extent than a realistic character. 4.4.3 The walk cycle Let us look at the project character briefly: It is a biped and as such, it has got two feet. So, how does it move around in the scene? Well, since it is a cartoon character in a fictional universe, no distinct rule applies – he could fly around, he could teleport, he could collapse and roll around. But – as stated in chapter 4.1 Character Design, it is important for the audience to be able to quickly identify how the character should move and form expectations about it, since the sequences are rather short. As such, this character will use his feet and walk around. But, even though walking is completely natural to us in the real world, creating a walk in animation is a difficult process. To quote Ken Harris, an animator working on Warner Brothers cartoons, the Pink Panther and various Hanna Barbera cartoons (IMDB.com, 2008): “A walk is the first thing to learn. Learn all kinds of walks, ‘cause walks are about the toughest thing to get right.” (Williams, 2001, s. 102) Therefore it becomes very important to examine and break down the process of walking to see how to create a walk that appears fluent and natural. Only when knowing about how to create a walk can it be twisted and adapted to the needs of the animation in which it much partake. The following section will examine the basic steps towards creating successful walk, how to give personality to a walk and how to adapt the walk to the project character, along with how to adapt the walk into an actual walk cycle. While a walk consists of several different positions, two very important positions are known as the step or the contact positions and can be seen in Illustration 4.30. 94 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.30: The two contact positions of taking a single step in a walk Each contact position is simultaneously what ends one step and begins the next. The timing of the walk can be roughed in with these positions, such as considering whether or not one step should happen in the course of 12 frames, 25 frames etc. in order to characterize the nature of the walk. While a walk can easily be adapted to fit many different characters, there is a general scheme of beats for a single step, to which a walk can be timed and it can be seen on Illustration 4.31. Illustration 4.31: Different people walk with different timing, as this scheme suggests When the position and timing of a step has been decided upon, the rest of step could be inbetweened to finish the walk. 
But there are many problems that can arise from simply in- betweening the walk without considering subjects such as weight in the walk, how the feet move in relation to the legs, the hands in relation to the arms etc. Therefore, it is important to create a few more positions within the step to ensure the walk has whatever weight, feet- movement etc. it needs to function for the character. In order to determine the nature of the walk, the so-called passing position can be added. This is the middle position of the step when one foot passes the other, so for a step taking up 13 frames, the passing position would occur on frame 7. Adding a passing position to the two contact positions in Illustration 4.32, the walk starts to become much more defined. 95 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.32: The inclusion of the passing position, the walk begins to take shape The step has - with only 3 pictures – begun to illustrate what kind of mood the character is in, the weight of the character, the pace of the walk etc. The example in Illustration 4.32 shows a normal and neutral paced walk, with little hurry, neither relaxed nor tense, neither proud nor ashamed etc. But playing a bit with this passing position will quickly start to change the walk quite radically as seen in Illustration 4.33. Illustration 4.33: Just by altering the passing position, we can easily create the basis for four very different walks The walk is now starting to obtain a great deal of character and the mood starts to come out more. But a few more positions should still be defined, which will help the walk convey additional useful information – namely the up position and the down position and we will start by looking at the down-position. The down position is where the front leg bends down and all the weight of the body is shifted onto this leg – this position can be seen in Illustration 4.34. 96 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.34: The passing position - drawn in blue - defines the walk even further This is where we can really start to play around with the weight of the character. This position determines most of how the character shifts its body weight when walking. Depending on the shape and mood of the character, the weight can be shifted in a multitude of ways when a character walks. When carrying a heavy load on its’ back, the down position could be made very extreme, threatening to make the legs give way to the weight; a plump character might need to lean to the side to swing the leg around; a joyful character might only touch lightly down on the toe before skipping on to the next step; an angry character would maybe stomp into the ground on the down position. Lastly, we look at the up position, which can be seen in Illustration 4.35. Illustration 4.35: The up position - drawn in green - is the last key position in the basic step before only the inbetweens remain At this point in the step, the character pushes off and maintains momentum in the walk. The pace of the walk is often determined at this point, essentially making it the position that “keeps the walk going”. As seen in Illustration 4.35 this is often the highest point in the step and it is another great place to convey a sense of weight in the character. A sad character might only lift itself very slightly off the ground, opposite of a cheerful character that might almost lift off from the ground entirely at this point in the step. 97 Group 08ml582 Reactive movie playback 4. 
Design Animation Now that we have seen the five key positions of a basic step, we begin to see how they are all connected and how to play with the walk per say. First of all, we see a clear use of the 7th Disney principle of Arcs when following the path of the head, the hips or the heel of the moving foot. Furthermore, we get a sense of which parts of the body are actually leading the movement in the walk and how to exploit this to tweak the walk, even in this most basic form. Let us look at the hands and feet. According to Richard Williams, regarding foot-movement in a walk: “The heel is the lead part. The foot is secondary and follows along.” (Williams, 2001, s. 136) He illustrates this point as seen in Illustration 4.36. Illustration 4.36: Demonstrating how the heel leads the movement, while toes, the ball of the foot etc. follow this motion. Knowing this gives many possibilities to spice up the motion of the foot, when taking a step. Rather than having the foot touch down flat on the ground, doing as in Illustration 4.36 – having the heel touch the ground and then having the ball of the foot and the toes follow behind - gives the motion greater flexibility (although stepping on a flat foot might be more appropriate for a tired walk), just as maybe curling or wiggling the toes can do. The same is true for hand/arm motion during a step. In relation to hand and arm movement, Williams notes: “The wrist leads the arc.” (Williams, 2001, s. 148) And he illustrates this point, as seen in Illustration 4.37. 98 Group 08ml582 Reactive movie playback 4. Design Animation Illustration 4.37: The motion of the wrist creates an arc of motion, with the hand following as secondary motion. In Illustration 4.37 we can begin to see, how the motion of the hands can be used to add more nuance and life to the walk. The 7th and 8th Disney Principles (Arcs and Secondary Action) are in play here and playing around with the arcs or finger motion makes it possible to convey e.g. very feminine qualities in the hand motion alone, by making them very flowing and dragging far behind the wrist, or maybe making the walk very lazy by making the arcs of the wrist very small and possibly eliminating wrist and arm motion entirely during the walk. With all the basic elements of the walk completed, we can now create the walk for the project character and the five positions – the contacts, the passing and the up and down positions – can be seen in Illustration 4.38, going from the right to the left. Illustration 4.38: The five basic positions of a step for the project character, with the red lines showing some of the arcs of motion 99 Group 08ml582 Reactive movie playback 4. Design Animation Many things can be read from this walk. Referring back to the down position we can see, that it is a slight down-motion, illustrating that the character is somewhat light and is able to walk without stepping down too heavily. Also, this walk is of moderate speed. The distance between the back foot and the body is fairly minimal in the up-position, so that the character does not push off from the ground with very much force. Worth noting is also the subtle handmotion: As the left hand is in front of the body, the fingers are bent slightly, but as the wrist swings back and the hand follows, the fingers drag behind, causing them to stretch out, before curling slightly again. A similar drag happens in the right hand, where the fingers curl more, as the wrist and hand swing forward, with the fingers dragging behind. 
So even though this is a fairly neutral and moderate walk, there are still opportunities to use some of Disney’s principles to loosen up the walk and make it less rigid. And to illustrate exactly how much can be done with a walk to loosen it up and make it livelier, Williams has crafted somewhat of a recipe for getting vitality into a walk as seen in Illustration 4.39. Illustration 4.39: There are many, many different ways to make a walk come alive as seen from this recipe. What is great about having these basic elements of the walk and having made them specific to the character, now it is possible to re-use them in order to make the character walk for as long 100 Group 08ml582 Reactive movie playback 4. Design Animation as needed. Even though there might be very slight variations in these positions for each step, an entire walk can be based on a single step. Since each contact position is both the end of one step and the beginning of the next step, it is possible to use the 2nd contact position from the previous step as the 1st contact position for the next step, ending this step with the 1st contact position from the previous step. Since the timing has already been determined from the previous step, placing the passing position is similar to the previous step, as is the up and down positions. And when making the next step, the positions from the very first step can be duplicated and maybe tweaked slightly and this process can now be repeated for as long as the character needs to walk. This process of recycling previous poses of individual steps to create an entire walk is known as a walk cycle and making use of this ensures a cheap and easy alternative to posing the character anew for each step it must take. 4.4.4 Weight As the final part of this chapter, weight in animation will be discussed, since weight is very important in order to achieve a connection between the character and the world. Referring back to chapter 4.4.3 The walk cycle, if the up and down position were not included in the walk, making the character walk in a completely straight line with no up or down motion, the character would have been floating around in the world, rather than walking and connecting with the world. But apart from connecting the character to the world, weight is also a big part of conveying the physical properties of characters or objects to the viewer. A heavy object, like a piano, will fall quicker and straighter than a light object, such as a feather. But it is also important to think about subjects such as reactions to a fall, namely the bounce off the ground. While conveying information about the weight of the object, a very bouncy ball will also be more rubbery or elastic than something like a bowling ball, which will have little bounce. A clear difference weight-wise can be seen in the scenes, when the character falls from trampoline jump and hits the ground, which causes him to bounce off the ground, before coming to a rest. Moments later, when the much heavier piano drops onto the character, this piano does not bounce at all, clearly illustrating the vast difference in weight on the impact. Often times it becomes necessary to convey a sense of weight in an object by how a character is interacting with it, rather than just how quickly the object falls. What if there is a big 101 Group 08ml582 Reactive movie playback 4. Design Animation boulder that needs to be pushed, a door that must be opened or merely an object that needs to be picked up? 
The sense of weight of the object must also be portrayed in these actions. What can be done is to use the character to portray the sense of weight, even before he interacts with the object itself. In the words of Williams:

“One way we can show how heavy an object is, is by the way we prepare to pick it up.” (Williams, 2001, s. 256)

While this refers to picking up an object, it also holds true for pushing or pulling objects. But since it holds true for many different interactions, let us only discuss it in relation to picking up objects, since the character will be doing this several times in the scenes of the project, as seen on Storyboard “Music Box” on page 185 in Appendix. Making heavy use of the 2nd Disney principle of Anticipation, how the character anticipates picking up the object is a big part of conveying the sense of weight of the object and whether or not the character is familiar with the object. If the character is about to pick up an unknown object, he will not immediately know how to do this and will most likely consider the action before carrying it out: observing the object for a while before beginning to interact with it; maybe breathing in, in a way, to gather momentum for bending down and beginning the lift; and, when the character has bent down and gotten a grip on the object, bending down slightly more to gain extra momentum for the actual lift, if the object is a heavy one. There can be many ways to make the character perform extra preparations for the lift of an unknown object. In the case of this project, such preparations are present when the character is about to pick up and examine the musical box. This looks as seen in Illustration 4.40.

Illustration 4.40: The lift of the music box and 6 important steps in using it to convey the weight of the box.

- At 1 the character thinks about the action at hand. It is important to note that at this point in the scene, the box has abruptly stopped playing, and so the character wonders what to do with it, making this consideration about more than just how to lift the box itself. Nonetheless, the approach to lifting the box is still being considered.
- At 2 the character anticipates the downwards motion to gain extra momentum for the bend, really preparing himself to lift the box, no matter how heavy it might be. The character also takes a small step closer to the box, in order to get it more under control, should he need to use body weight to support the lift.
- At 3 the character bends down to the box, grips it and starts preparing to lift it.
- At 4 the character bends down even more to allow himself to push up again, using body weight to a greater effect than he could achieve without this slight anticipation for the lift. The eyes also squint to prepare for the upwards motion again.
- At 5 the character lifts the box. The box is just a bit too heavy to be lifted by the hands alone, so backwards motion of the body is also put to use to aid the lift, moving backwards and even taking a small step back as well.
- At 6 the lift is complete, with the final action of the character widening the space between his feet, since the added weight of the box demands a bit more steady balance than when the character did not carry anything.

On the other hand, if the character is familiar with the object, little to no preparation will be required, since the character knows what to expect when trying to lift the object.
The entire action of lifting such a familiar object will therefore be much more straightforward, and far less anticipation will be present, such as when the character picks up his head and attaches it backwards, after having had it knocked off by the boxing glove, as seen on Storyboard “Music Box” on page 185 in Appendix. In the actual scene, this will look as seen in Illustration 4.41.

Illustration 4.41: Since the character picks up its own head, it is now much more familiar with this lift, and it becomes much simpler and easier.

- At 1 the character has positioned himself in order to be able to pick up the head. In the scenes, this happens almost as soon as the character knows where the head lies on the ground, and there is no consideration regarding the lift before attempting it.
- At 2 the character has bent straight down and grabbed the head. There is no anticipation or gain in momentum before bending, since the character knows perfectly well what to expect when lifting the head from the ground.
- At 3 the character effortlessly lifts the head off the ground and prepares to attach it to the body again. The character handles the head somewhat more gently than the box, holding it more by the fingertips than the palm of the hand. It should be noted that even though the character has taken a step backwards, this is not done to use body weight to aid the lift itself, but rather to assume a stable and secure stance for when the head becomes re-attached, which happens in a swift motion that will cause the character to sway back and forth, requiring balance.

The first example, seen in Illustration 4.40, made use of far more anticipation poses and stages of preparation for the lift, while the latter, seen in Illustration 4.41, consisted mainly of straightforward motion, since this lift is well known to the character. But both examples used both the character alone, in getting ready to lift the object, and a direct interaction between the character and the object to convey a sense of weight in the object. This shows that there are many ways to think about and illustrate weight in the scene in order to connect characters and objects to the world and to each other, thereby giving the viewer the impression that they are in fact part of the world, rather than floating arbitrarily around in space.

4.4.5 Sum-up

Now several techniques to help enhance the animation of the various scenes have been examined. A combination of full and limited animation has been established to be used for the scenes, along with determining which elements from both types are appropriate and why. Disney’s 12 principles have been examined, along with how every single principle can in fact be used to great effect in the project in various ways. A look has been taken at the steps required to produce a walk that can convey a sense of weight in the character when walking, making the walk come across as connected to the world rather than floating around, and also at how this walk can be recycled to create the walk cycle, making it easy to make the character walk as far as needed without reproducing each step manually.
And lastly, the chapter examined how to achieve a sense of weight in various objects, both when the objects move by themselves, such as in a fall, and when the character interacts with the objects, and how to use the character alone to convey the weight of the object with which the character is going to interact. This does not cover every part of the animation, however. Motions such as the jump on the trampoline or the bullet-time movements are not described in detail in this chapter, but describing each and every motion in every scene in detail would be far beyond what is necessary. Many movements are very individual to a story. Having a character such as the one in this project does not immediately suggest that he must dodge projectiles in slow motion, or that he must jump on a trampoline. However, it is very likely that the character will walk around the scene, that weight becomes an important factor in achieving believable interaction with objects, or that Disney’s 12 principles will be put into use. So, what has been examined in this chapter is animation that is widely used, making it prudent to examine techniques for achieving a believable version of it. The rest is up to the creative freedom of the animator.

The design of the project so far covers the character itself and how it has been created to fit the aim of the project. The storyboards have been created with 3 different types of humor, as described in chapter 4.2 Humor types, in order to have variations in the test of the program. When the program then reacts to the smile of the user, it can provide him with a different type of humor based on what he likes the most. It has also been decided how to use the various cinematic elements, such as light and sound, in order to further enhance the message of the storyboards and bring it out from the screen to the user. The relevant animation techniques and theories have been examined and chosen in order to help the character come to life in a natural and believable way, so that it can better connect with the viewer. With this design in place, the next phase of the design is to create the program that ensures that the smiles of the users are registered and recorded, such that the test of the product is actually able to establish when the user is smiling and when he is not, in order to determine what type of humor the user liked best and react to this.

4.5 Smile Detection Program

In order to be able to make our animated movie clips react to the user’s smile, it is necessary to produce a program which can track the user’s mouth and decide whether this mouth is smiling or not. It is important that the user does not need to have any tracking devices placed on his face or to interact in any other way, apart from smiling. This chapter will cover the design of this program and its requirements, such as determining the success criteria for the smile detection and creating a flowchart for the overall program functionality.

4.5.1 Requirements

In order to know what to aim for and what was important for the program, a list of requirements was formed. First of all, the program should be able to take in a webcam feed as input and be able to detect a human face in that feed. Having detected the face, the program should determine which area of the face is the mouth and then be able to decide whether the mouth is smiling or not.
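As a rough illustration of these first requirements – grabbing a webcam feed, finding a face and isolating the mouth area – a detection loop could be structured along the following lines. The sketch uses OpenCV’s C++ interface with its stock frontal-face Haar cascade and simply assumes that the mouth lies in the lower third of the detected face rectangle; it is only meant to illustrate the requirements and is not the detection method developed for this project, which is covered in chapter 5 Implementation.

// Minimal sketch (not the project's final code) of the first requirements:
// take a webcam feed as input, detect a face in it and isolate the mouth region.
#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::VideoCapture webcam(0);                       // requirement: webcam feed as input
    cv::CascadeClassifier faceCascade;
    if (!webcam.isOpened() ||
        !faceCascade.load("haarcascade_frontalface_default.xml"))
        return 1;

    cv::Mat frame, gray;
    while (webcam.read(frame))
    {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);                 // reduce sensitivity to lighting

        std::vector<cv::Rect> faces;                  // requirement: detect a human face
        faceCascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));

        if (!faces.empty())
        {
            cv::Rect face = faces[0];
            // Assumption for this sketch: the mouth occupies roughly the lower
            // third of the detected face rectangle.
            cv::Rect mouth(face.x, face.y + 2 * face.height / 3,
                           face.width, face.height / 3);
            cv::Mat mouthRegion = gray(mouth);
            // ... the smile/no-smile decision would be made on mouthRegion here.
        }
        if (cv::waitKey(30) == 27) break;             // ESC stops the loop
    }
    return 0;
}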
With this information, the program should be able to select and play an *.avi movie clip, based on the smile detection. Given that the program should ideally be able to work in all conditions – light intensity, number of persons present in the webcam feed etc. – there are several things to take into consideration when designing the program. However, with the limited time available for this project, and considering the complexity of a program which would truly work in all conditions, the design of this application will be limited. First of all, the program should ideally be able to work in all lighting conditions, for instance in bright daylight shining through a window or in the dim light of a dark living room, when watching a movie in privacy. Given that light will be a very decisive factor in making the smile detection work, the design will be limited to work in normal daylight, and the requirement for the application will therefore be that it works in this specific lighting condition, or in a similar lighting condition that we – as test conductors – can set up artificially. Even in the most optimal lighting conditions, there should be some requirement for the success rate of the smile tracking. In chapter 2.5 Testing the product, it was mentioned that a success rate of at least 75% should be achieved. The higher the success rate of the program, the higher the chances of a successful test.

4.5.2 Movie playback

The playback of the animation clips will be ordered by type of humor. Two suggestions for the playback will be proposed, and this chapter will explain and discuss the two ways of playing the movies. For the first proposal, the users will always be shown the same initial movie, and based on their reaction to this movie clip, the program will choose either a movie of a new type of humor or of the same type. The mapping of the movie clips according to the users’ reactions is illustrated in Illustration 4.42. The structure is a tree structure, which is described in chapter 3.6 A reacting narrative.

Illustration 4.42: The playlist for playing the movie clips in this method is based on the users’ response to the previous clips.

Based on the smile detection performed by the program, the program will choose according to whether or not the user smiled enough to pass the threshold of “liking” the movie. With this method, the user can see between 3 and 5 movie clips. If the user likes the first type of humor presented, he/she will continue to watch this type, and if he/she continues to like this type, the playback will end after having shown the three videos of type 1. If the user enjoys the first movie, but then does not enjoy the next movie in type 1, the program will choose the second movie in type 2. Again, the program will detect smiles throughout the movie clip and determine if the scene was to the user’s “liking”. The program can follow any path through the tree, indicated by arrows. The benefit of this method is that the program checks the user’s opinion towards every movie clip, but the drawback is that a certain value has to be found for deciding when a user is actually enjoying the current type of humor, which can be difficult to determine exactly. Another drawback is that if the user does not like the current movie clip, it is uncertain which type of humor to choose instead.
Smile or not smile is a binary choice and having three options to a binary choice is an inherent problem. The second method shows one of each type of humor at the beginning and then measures how much the user is smiling during each movie clip. The one type of humor the user is smiling most at is then the type which is chosen. Then for the next two movie clips, the user’s smiles will be ignored and no change of type will happen. The method can be seen in Illustration 4.43. Illustration 4.43: The playlist for playing the movie clips in this method is based only on the users’ response to the first three clips. The advantage about this method is that there is no need to find a threshold to decide whether or not the user liked the previous movie clip. It is simply a matter of comparing the three initial movie clips and see which of them the user smiled of the most. The disadvantage about it is that once a type of humor is chosen, the program will no longer react to the user smiling or not. For the purpose of this project, the second method will be chosen, in order to avoid having to find a threshold, defining when a movie clip is likeable. Even if such a threshold was found through a number of tests, there will still be a chance that the movie was not to the users taste, even if the program detected so. Furthermore, there is the risk that the user will constantly be shown movie clips he does not like, since, if the user does not like the previous 108 Group 08ml582 Reactive movie playback 4. Design Smile Detection Program clip, he will merely be shown a new option – this option will be based on what the user did not like and not what the user actually did like. With the second method, the program will always choose the type of humor that the user smiled the most at, and this should in theory also be the movie clip that the user likes the most. The advantage of this method is also, that at minimum one point, the user will be shown a new movie clip based on what he liked, rather than a guess based on what he did not like. Both methods can use the star shape structure, which were introduced in chapter 3.6 A reacting narrative, since they are both formed by small individual clips, which all starts at a neutral state for the character. The second method does not risk taking the user jumping from one type of humor to another all the time, but stick to one type, after an initialization period. The decisive process where the user’s smile is determining the chosen type could be repeated at dramatic height points of an entire narrative, as it was seen in the Kinoautomat system described in chapter 1.2 State of the art. However, for the purpose of testing the product in this project, the movie clips will be created such that the character always returns to the same starting point at each movie clip, no matter what has happened to him in the last movie clip. 4.5.3 Flowchart In order to understand the progress of the program, a flowchart will be presented. This will explain the step-by-step process of the program and how it reacts in different situations. The flowchart in Illustration 4.44 only covers the flow of the smile detection program. 109 Group 08ml582 Reactive movie playback 4. Design Smile Detection Program Illustration 4.44 The flowchart covering the smile detection program and playback of the relevant movie clips. 110 Group 08ml582 Reactive movie playback 4. 
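Before the flowchart is walked through step by step, the chosen playback logic from chapter 4.5.2 can be sketched in code. The sketch below is a simplified illustration rather than the project’s actual implementation: the clip file names and the two helper functions are placeholders, and the per-frame smile counting they stand in for corresponds to the movieMood counting described in the walkthrough of the flowchart below.

// Sketch of the chosen playback method (the second method from chapter 4.5.2):
// count the frames in which the user smiles during each of the three initial
// clips, pick the type of humor with the highest count and play the remaining
// clips of that type without reacting to further smiles.
#include <algorithm>
#include <string>
#include <vector>

// Stand-in: in the real program this plays the clip and, for every webcam
// frame, adds 1 to a movieMood-style counter when a smile is detected and 0
// when it is not, returning the total for the clip.
int playClipWhileCounting(const std::string& clip) { (void)clip; return 0; }

// Stand-in: plays a clip without running the smile detection.
void playClip(const std::string& clip) { (void)clip; }

int main()
{
    // One playlist per type of humor; the first clip of each type is shown first.
    std::vector<std::vector<std::string>> playlist = {
        { "type1_clip1.avi", "type1_clip2.avi", "type1_clip3.avi" },
        { "type2_clip1.avi", "type2_clip2.avi", "type2_clip3.avi" },
        { "type3_clip1.avi", "type3_clip2.avi", "type3_clip3.avi" } };

    // Initialization period: show one clip of each type and store its score.
    std::vector<int> score;
    for (const std::vector<std::string>& type : playlist)
        score.push_back(playClipWhileCounting(type[0]));

    // The type of humor the user smiled the most at wins.
    std::size_t chosenType =
        std::max_element(score.begin(), score.end()) - score.begin();

    // The remaining two clips of the chosen type are played without reacting
    // to the user's smiles.
    playClip(playlist[chosenType][1]);
    playClip(playlist[chosenType][2]);
    return 0;
}

Note that this decision is made only once; as discussed above, this avoids having to define a “liking” threshold, at the cost of ignoring the user’s reactions after the initialization period.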
This flowchart is based on the decision made about the movie playback, which is to show the users three initial clips and then compare the user’s reaction to each of these three clips. The program will start by establishing a connection to the webcam and start the first movie clip in the playlist. As the movie clip is playing, the program will try to find a face in the given webcam feed. For the application of the product in relation to this project, there will always be a person present in the webcam feed, so no error handling is done in case the program fails to find a face. Should the program be unable to track the user for a period of time – if the user e.g. scratches his nose or looks away for a moment – the program will keep trying to find a face until the user reappears. If a face is detected, the program will determine the position of the mouth and then detect if the mouth is smiling. If the user is smiling, a value of 1 will be added to a vector called movieMood, and if the user is not smiling, a zero will be added. At the end of the movie clip, the total value of movieMood for that single movie clip is stored. When the last of the three initial movie clips has been played, the program will compare the “score” for each of the three movie clips, and the clip that scored the highest value will be decisive for which movie clips will be played next. If, for example, the first movie clip scored the highest, the program will show clips from the playlist of this type of humor.

4.6 Design conclusion

The design of the product is now complete. The character has been designed to fit the project and to convey the message of “fun”, taking inspiration from sources such as Disney and Warner Brothers cartoons. The storyboards on which to base the movie clips have been designed such that there are several clips containing each of the chosen types of humor, so that there are variations in the test at which the user can smile and make the program react. These storyboards have been greatly enhanced using theories of various cinematic elements, such as how to properly apply sound and different camera angles to either enhance comical effect, involve the user in speculating about what happens at certain points during the movie clips, or simply cut down production time while still conveying the same message to the user. It has been decided which animation techniques to use when animating the character, such as Disney’s 12 principles or applying weight to an animation, and why these techniques would aid the character in coming more to life.

On a technical level, the movie playback method was decided. To ease the testing of the product, it was decided to play the movies in such a way that the program only has to compare the user’s reaction to three initial clips and choose the following clips according to this comparison. Requirements for the success rate and working conditions of the program were introduced. The success rate of the smile detection was determined to reach at least 75% in an environment with a light intensity similar to that of daylight.

5. Implementation

This chapter will explain the implementation process of the project and the work that was done in this process. First the implementation of the storyboards will be discussed.
This process was done using Maya, and the steps of going from the 2D storyboards to a 3D animation will be covered in this chapter. The overall techniques used for modeling the character, and examples of how to apply them, will be described. Afterwards, the techniques used for rigging the character in order to prepare it for animation will likewise be shown, along with examples of where to use them. And, as the final part of the section concerning Maya, the techniques used for animating the character and making it come alive on the screen will be examined, as well as where they were used. For the programming part of the implementation, it will be described how the code was created and what was used in order to make the program work the way it does. A total of three different programs were made to get the final result: a picture capture program – for capturing the training images and cropping them to the right dimensions – a training program – for teaching the program what is a smile and what is not – and the actual smile detection. All programs are produced in C++, using the OpenCV library developed by Intel (OpenCV, 2008).

5.1 Modeling

The first step of implementing the character and the objects that it will interact with, such as the trampoline or the boxing glove, is to model them in 3D. Concept art of the character is shown in Illustration 4.6 in chapter 4.1.6 Detailing the character. One approach could have been to import the concept art into Maya and then model precisely after it, but the project character was simple enough to model to a satisfactory level without following the concept art meticulously. When modeling the character, there were various approaches to follow. One could be using NURBS curves – Non-Uniform Rational B-Splines – a technique which involves drawing a few curves along the outer shape of the character and then lofting them, which means stretching a surface between them. This produces a very smooth result, but it can be difficult to add small details if needed, and using this for more detailed areas such as the hands would pose problems when creating the thumb. Also, the curves which are to be lofted into an object would have to consist of an equal number of control vertices (the points that the NURBS curves are formed according to, shown as purple dots in Illustration 5.1), which can be difficult to ensure if the curves are used to create asymmetrical objects, and in general it can be difficult to be sufficiently precise with the structure and placement of the curves to ensure a proper end result. To illustrate how this works, Illustration 5.1 shows how the body of the character could have been made using NURBS curves.

Illustration 5.1: To the left: 3 NURBS curves to form the 1st half of the body. In the middle: Lofting the 3 curves. To the right: Mirroring the 1st half to form the full body

Another available method could have been edge-extrusion, which involves starting from a single polygon surface and extruding new surfaces out from the edges of this one polygon. This process can be compared somewhat to creating papier-mâché, where you have an object and thin pieces of paper. You then fold and stretch the paper around the object until you have a closed paper shape, based on the object.
Edge extrusions work in a similar fashion of folding thin surfaces around to form a closed shape, except that there is no form to shape directly around – the end result relies largely on how good the modeler is at imagining the correct volume of the model. Replicating the entire body of the character just for the sake of illustrating the process is too time-consuming, so a general example of edge extrusion will instead be shown Illustration 5.2. 114 Group 08ml582 Reactive movie playback 5. Implementation Modeling Illustration 5.2: Starting from the orange surface, the following model has been produced by extruding various edges a random number of times A third method of modeling, which was also the method that was used for the project character, is called box-modeling. As the name implies, this method involves starting with a polygon primitive – often a box, but it can also be a sphere, a cone etc. - and then modifying this primitive until the final model is obtained. This is done largely by extruding the faces of the box, which will pull the selected face out from the box and create new faces between the chosen face and the neighboring faces of the box. However, there are many possibilities of adding the necessary details and number of faces to extrude. Faces can be divided in many ways, either by splitting them, merging more faces into fewer, beveling edges – a process that takes an edge, splits it in two and creates a face between these two new edges – and many more. Using these options for extruding and creating new faces enables a modeler to mold a very primitive object into something much more detailed. To demonstrate how the body was modeled, Illustration 5.3 shows the starting polygon primitive and how the final result would look. 115 Group 08ml582 Reactive movie playback 5. Implementation Modeling Illustration 5.3: To the left: Starting with a simple cone primitive. To the right: Having extruded the primitive, the shape of the body has been made. The hands, feet, eyes and the eyebrow were made using the same modeling technique of extruding faces from a polygon primitive. The only different part of the character is the head. As per the character description of having the head be a sphere, modeling the head involved nothing more than creating a polygon sphere and scaling it to fit the size. Illustration 5.4 shows how the character looks when completely modeled from a front, side and perspective view. Illustration 5.4: The character from a front, perspective and side view, along with a wireframe on top to illustrate the number of polygon faces that makes up the character 116 Group 08ml582 Reactive movie playback 5. Implementation Rigging 5.2 Rigging Now the project character has been built and can be used in the scene. However, before animation can begin, the character must be rigged - a setup-process that involves the construction of a skeleton, constraints for the skeleton etc. Without rigging a character, moving it around would become a very tiresome process of moving each vertex manually, effectively lengthening the animation process too much for it ever being possible to complete in time. The character of this project has roughly 4.500 vertices. Just the thought of having to move this many vertices around for maybe about 240 frames of animation (roughly 10 seconds) renders the process of animating too heavy and cumbersome to even consider including. This is where rigging has its uses, since a rigged character becomes very easy to move around. 
Outfitting the character with a skeleton of joints allows the character to be controlled according to the design of the skeleton. E.g. a biped character like the one in this project can be rigged to fit a basic humanoid skeleton and thus achieve easy hand and feet movement as well as general humanlike motion. This chapter will describe the various elements of constructing a working rig for the character used in this project. It will cover when to use each technique and an example of how it was actually used in the character. And while each and every step of creating the finished rig will not be covered, the chapter will – at the end – have explained the various techniques such that it is possible to understand how the rig was created. 5.2.1 Joints and skeleton The first thing to construct when rigging a character is the joints of the skeleton. A joint is exactly what it sounds like – a connection between two bones and as such their functionality and purpose need only little explanation: Any point where the mesh of the character is intended to bend in any way should be connected to a joint. On the project character, this includes places like the toes or fingers or in the torso. On Illustration 5.5 it can be seen how the joints for the leg of the project character has been constructed along with how the final skeleton looks, in relation to how they fit inside the mesh of the character. Only these two parts are shown, since going from a leg all the way to the final skeleton only involves creating the two legs and arms and parenting these to the spine. 117 Group 08ml582 Reactive movie playback 5. Implementation Rigging Illustration 5.5: To the left, the details of one part of the skeleton is shown, while an overview of the entire skeleton is shown to the right. Using joints, the entire skeleton for the character can be created. It is important to keep in mind, that when creating joints for e.g. the leg, all these joints will be connected in a joint chain and this chain has an internal parent-child relationship. This means, that the joint highest up in the chain – the first joint that was placed, the parent-joint of the leg is the Hip- joint – will influence the other bones in the chain. E.g. when the top joint is rotated, all the other joints below are also rotated, but when the lowest joint in the chain is rotated, nothing happens to the joints above. Think of this as moving your knee; your ankle and foot follows this motion, but if you move your toes, the knee will not be moved along with it. Understanding this parent-child relationships shows in what order the various joints of the skeleton should be created in order to create a logically functioning skeleton, e.g. the shoulder should be made before the wrist and the knuckle should be made before the finger tip. It does pose a slight problem in that it is not possible to create a skeleton consisting of two legs, a spine, two arms and a neck in a single chain. However, this problem is solved by simply parenting the various joint-chains to each other afterwards, such as parenting the top joint of each leg to a joint near the bottom of the torso to act as the hip. 5.2.2 IK/FK IKs and FKs are terms used to describe methods of translating joints around. FK means Forward Kinematics and IK means Inverse Kinematics. 118 Group 08ml582 Reactive movie playback 5. Implementation Rigging One use of an IK chain in the project character can be seen in Illustration 5.6. 
Also note, that a chain of joints is essentially an FK chain until an IK is added, and an FK chain can also be seen on Illustration 5.6. Illustration 5.6: On the left is an IK-chain in the leg between the brown-colored joints while the rest of the joints in the foot are FK chains. On the right is an overview of all IK handles in the character enclosed in white circles. The main difference between IKs and FKs is that IKs uses simple movement of a single point of a joint chain to translate the joints in the scene, while FKs make use of rotations of joints in a chain to place the joints where they are needed. Deciding whether to use IKs or FKs is really depending upon the action or motion that must be animated. Say for example, that an arm is bent and must be stretched out. - With IKs this can be accomplished by creating an IK-chain between the top joint in the arm joint chain (the shoulder joint) and the lowest joint (the wrist), grabbing the end of the IK chain (the wrist) and dragging this single joint out from the body, until the arm joint chain is stretched out. This is possible since the IK system will ensure, that the remaining joints in the chain will automatically adjust themselves according to the - position of the wrist joint. With FKs, it is more cumbersome. Using FKs requires, that each joint in the arm be rotated individually to obtain the final stretched-out pose of the arm, which in most cases would involve rotating the shoulder joint first and then rotating the elbow joint second. 119 Group 08ml582 Reactive movie playback 5. Implementation Rigging For such movements, IKs are superior to FKs, which is also the case with e.g. posing of the feet in a walk cycle, since creating an IK chain between the hip and the ankle joints and then only moving the ankle is easier than rotating the hips joint and then the knee joint. However, FKs have their own advantages over IKs. In such actions as a walk cycle, when the arms often swings back and forth, the swinging motion is easy to obtain, by simply rotating the shoulder joint back and forth between each step, while obtaining the swing by moving the wrist up down, left and right with IKs, making sure that the elbow also looks right is much more time-consuming. Since many types of joint chains can benefit from both an IK and an FK chain, it would be prudent to be able to switch between these two methods and luckily Maya provides an IK- handle with the “IK Blend” option that allows for this IK-FK switch - a control than can enable or disable an IK chain. 5.2.3 Control objects and constraints A control object is used to gain easier access to certain joints that can be difficult to access directly. In fact, it is generally not a good idea to rig a character such that any big movements are performed by joints that must be directly selected: In order to keep the joints from clustering up too much, their size can be scaled down to any size that fits the animator. But if direct selection of joints is required to animate the character, a small joint size causes problems, if it requires the animator to constantly zoom in on the character to select a joint and then back out to animate. A close-up of the control object for the wrist and an overview of every control object in the character can be seen in Illustration 5.7. 120 Group 08ml582 Reactive movie playback 5. Implementation Rigging Illustration 5.7: To the left, the wrist control object is shown. To the right, every control object in the character is displayed. 
A control object in itself is any primitive that can be created, be it a polygon, a NURBS surface, a curve etc., but is normally a curve, since curves are not rendered and do not conflict with the scene this way. The control objects used with the project character are just curves shaped to fit various positions around the character, such as a box around the hips and top of the torso or a circle around the shoulders and elbows. It can now be made as big as needed by the animator for easy selection and be made to control any number of joints or IK handles. In order to make the control object actually control anything in the rig, the various constraints inside Maya can be put to use. There are many different types of constraints, such as the point constraint, which handles normal movement much like an IK chain, or the orient constraint, which handles rotation much like an FK system, so when creating e.g. a control-object for the wrist, the object is created and positioned at the position of the wrist joint. The joint is then both point and orient constrained to the control object (to make it work for both IKs and FKs) and now the rig includes a working control object. 5.2.4 Driven Keys Apart from making the various joints and IK handles more easy to control, control objects also have many possibilities to make it easy to organize controls for certain join motions, which are cumbersome to animate by either IKs or FKs, such as forming a fist with the hand or the foot movement required to simulate pushing off from the ground in a step. Control objects 121 Group 08ml582 Reactive movie playback 5. Implementation Rigging offer natural places to create these extra utilities from the rig. However, the extra control itself comes from using Driven Keys. Using Driven Keys is somewhat like setting a normal animation key (this will be covered in chapter 5.3.1 Key frames). However, rather than keying transformation, rotate or scale of an object, a joint etc. to a certain time, it can be keyed to a control or a slider instead. Even though it is possible to set a Driven Key to already existing controls, such as Scale X or Rotate Y, in order to make a proper control for a certain transformation, a new control should be created. The project character has such a custom control for e.g. making each hand form a fist and to understand Driven Keys fully, we will look at this control. All the controls for the right wrist can be seen in Illustration 5.8. Illustration 5.8: Many custom controls have been made for each wrist of the character, such as "Fist", Thumb Bend" etc. When having created the custom Fist control and set its minimum and maximum value (in this case from 0 to 10), it is possible to start setting Driven Keys. The joints that form the fist are every joint in the four fingers. When Fist control is at 0, no fist should be formed, so a Driven Key is set between the joints and the Fist control for each of the finger joints in their neutral position now. A Fist value of 10 is then set. Now the fist should be fully formed, so each joint in all the fingers are rotated to form a fist and now a Driven Key between the joints and the Fist control is set. When this has been done, the Fist control will - when varying the value between 0 and 10 – cause the joints in the fingers to form a fist or go back to neutral position. 
Using Driven Keys in this way thereby ensures that when wanting a fist in the animation, it is no longer necessary to manually move or rotate each joint in the finger – it can be done by use of a single custom-made control. 122 Group 08ml582 Reactive movie playback 5. Implementation Rigging The Fist control in action can be seen in Illustration 5.9. Illustration 5.9: To the left: The hand in the neutral position. To the right: A fist has been formed by setting "Fist" to 10. 5.2.5 Clusters They are not used very much in the rig, so only a brief look will be taken at them. A cluster is a deformer, which controls any number of joints, vertices, control points etc., with varying influence over what it controls. Illustration 5.10: To the left, the spine is bent by use of clusters. To the right, no clusters are used In the character, clusters are used in the spine. In order to achieve a more fluent bending of the spine, it is important to ensure, that not only a single part of the spine bends at any time, since this is not how a real spine works and would make the character look choppy and weird. 123 Group 08ml582 Reactive movie playback 5. Implementation Rigging Instead, by using a cluster for several points of the spine, when one of these clusters are moved, it will also have influence over other parts of the spine, causing them to move as well, although to a lesser extend. This creates a smooth curve in the spine, rather than a sharp bend. The result of using clusters in the spine can be seen in Illustration 5.10. 5.2.6 Painting the weights With the skeleton rigged and all the controls set up, it is time to look at attaching the skeleton properly to the mesh and that is done via the process of skinning, which involves binding each vertex of the mesh to the various joints in the skeleton, based on which bones are closest to the vertices (the number of bones allowed to influence each vertex can be adjusted). But that is the basics of skinning and this specific process will not be covered further, since skinning a character just involves pressing one button. Due to the highly automated nature of skinning, problems will often arise with bones affecting too many vertices and making the mesh of the character deform incorrectly when moving the joints. Illustration 5.11 shows incorrectly influenced vertices around the head of our character (the shoulder vertices are influenced incorrectly when tilting the head) as well as how it looks when these influences have been fixed. Illustration 5.11: To the left: Incorrect influence. To the right: Correct influence The way to fix problems such as this is by manually painting weights of each joint. This is a method of assigning influence from a joint to vertices of a mesh, thereby controlling how the mesh deforms when certain joints are moved. When painting influences of vertices to a joint, all currently influenced vertices will show as white and the less influence the joint has on any vertex, the more black the vertices be, allowing for easy view for how to e.g. fade out influence 124 Group 08ml582 Reactive movie playback 5. Implementation Rigging between two joints. It also makes it easy to correct, if vertices are incorrectly influenced by a joint; paint those vertices black and it is done. Illustration 5.11 shows that the neck joint (enclosed in the red circle) has been rotated. To the left, the vertices (enclosed in the blue circle) are incorrectly influenced by this joint, shown by them being not black, but grayish. 
This problem has been corrected to the right, where all the influence in the vertices in the blue circle has been removed, causing them to appear as black instead and ensuring that the neck joint does not have any influence anymore. 5.2.7 Influence objects Using influence objects is another method of ensuring that the mesh deforms like intended, when moving or rotating joints. An influence object by itself is normally just a primitive object, such as a sphere that is inserted into the mesh. Illustration 5.12: A top, side and perspective view of the influence objects in the right hand of the character Where influence objects are useful, is when bulges in the mesh are intended when rotating a joint or when ensuring that parts of the mesh, such as the outside of the elbow, the muscle in the arm or the palm of the hand, do not collapse into the mesh. When inserting an influence object, this ensures, that the mesh cannot deform through it, thereby allowing for the mesh to retain a desired amount of volume when moving joints and bending the mesh. For the project character, influence objects are used to keep the volume of the hand fairly similar, no matter how the joints in the hand are rotated. After testing the various hand controls, such as “Fist”, three places were found where the use of Influence objects would help the mesh deform correctly and they are shown in Illustration 5.12. 125 Group 08ml582 Reactive movie playback 5. Implementation Rigging And, as Illustration 5.13 shows, when making the hand into a fist, the influence objects really does go a long way towards retaining the volume of the hand, such as the area of the palm near the thumb. Illustration 5.13: To the left is the fist with influence objects maintaining volume. To the right there are no influence objects and the fist is not nearly as closed. 5.2.8 The reverse foot control One specific part of the rig that warrants a more detailed look is the reverse foot control (referred to as RFC from this point forward), which can be seen in Illustration 5.14. Illustration 5.14: This reverse foot control, will be extremely helpful in e.g. the walk cycle Illustration 5.14 shows the mesh of the foot as purple, the joints of the foot in green and the RFC in blue and the control object for the entire foot in the blue circle going around the foot. 126 Group 08ml582 Reactive movie playback 5. Implementation Rigging The RFC is made up of four joints, marked by the red arrows. The reason for adding such a control is to ease the process of posing the foot during a walk. So, this control can simulate the motion of the foot pushing off from the ground, having the green joints move independently of each other to make the motion – all with a single custom created control, similar to the fistcontrol seen chapter 5.2.4 Driven Keys The first step is to parent the green joints to joint 4, 3 and 2 (the red numbers) in the RFC, so that when using the foot control to move the foot around, the foot will move around with it. But the nature of a joint chain means, that if one of the green joints were parented to a joint in the RFC, the green joint chain would be broken, which would essentially break the foot entirely. This problem can be solved with inserting an IK chain between each of the green joints, which creates an IK handle at each joint. Now these IK handles can be parented to the blue joints without breaking the green joint chain – IK handle 1 to RFC joint 4, IK handle 2 to RFC joint 3 and IK handle 2 to RFC joint 3. 
The custom foot control – named Foot Roll - is now created within the control object for the foot and given a minimum-value of -10 and a maximum-value of 10 and set to 0 as default. This will control the motion of the foot pushing off, going from 0 to 10, but will also control to entire foot bending backwards and up, as when to prepare to set down the foot on the ground again when taking a step, going from 0 to -10. Creating the functionality of the foot motion requires the use of Driven Keys like shown in chapter 5.2.4 Driven Keys. A driven key is set for all the joints as they are shown in Illustration 5.14, when Foot Roll is at 0, since this is the resting position for the foot. Using this value is an easy way to ensure that the foot will not bend in any direction, but rather be flat against the ground. Creating the motion of the foot lifting off from the ground is done in two steps. - First, Foot Roll is set to 5. Then, RFC joint 3 is rotated a certain distance. This is done to simulate the position of the foot pushing off, when the heel has lifted from the ground, but the toes remain on the ground and a Driven Key is set for this position and value of Foot Roll and it can be seen in Illustration 5.15. 127 Group 08ml582 Reactive movie playback 5. Implementation Rigging Illustration 5.15: Half-way through the step, at a Foot Roll value of 5 - Second, Foot Roll is set to the maximum of 10. RFC joint 3 is kept at the same rotation, but now, RFC joint 2 is rotated a certain distance to simulate the remaining motion of the toes also lifting from the ground, when the weight of the body has shifted to the other foot and this foot is in the air. A Driven Key is set for this position and value of Foot Roll and can be seen in Illustration 5.16. Illustration 5.16: All the way through the step, at a Foot Roll value of 10. Creating the backwards foot motion involves setting Foot Roll to -10 and rotating RFC joint 1 a certain value and then setting a Driven Key for this position and value of Foot Roll and this can be seen in Illustration 5.17. 128 Group 08ml582 Reactive movie playback 5. Implementation Animation tools Illustration 5.17: Preparing the foot to land on the ground again. This reverse foot control now reveals itself to greatly simplify the process of animating the foot through taking a step. When the foot pushes off from the ground, Foot Roll can merely be set to 10; when the foot is in the air, Foot Roll goes back to 0; when the foot goes down to the ground again, Foot Roll is set to -10; and at the down position of the step (see chapter 4.4.3 The walk cycle) Foot Roll goes back to 0 again. It should also be noted, that through all the various steps of using Foot Roll to pose the foot through taking a step, the control object for the foot (the blue circle going around the foot in Illustration 5.17) is plane, making it easy to align the foot correctly to the ground, ensuring that it is flat on the ground and generally position the foot, regardless of the value of Foot Roll. 5.3 Animation tools This section describes how the process of animating the character has been carried out. It will examine the various tools used for this process, such as what Key Frames are, how to interpret and take advantage of the Graph Editor and how real-life video footage has been used to aid in achieving natural timing in the animation. 
When animating in 3D, the animator will not manually pose every single frame, but rather create the key poses (with Key Frames) and let the program automatically create the remaining frames. However, the program often creates the frames in unintended ways and it therefore becomes important for the animator to be able to control this automation using tools suited for this task (such as the Graph Editor). 129 Group 08ml582 Reactive movie playback 5. Implementation Animation tools 5.3.1 Key frames Key Frames help the user to control the animation. When a frame is to become a Key Frame, a key is set for all transform attributes of an object. This means that the translation, scaling or rotating performed with the object in that specific frame is saved. In the time slider, which will be in chapter 5.3.2 Time slider and Range slider, a red marker appears at the frame that becomes keyed, and this can be seen in Illustration 5.18. If another Key Frame is made in a later frame, and the object is rotated, MAYA interpolates the transformation of the object between the Key Frames. Illustration 5.18: The red markers appear several times along the time slider, thereby making it easy to see where you have placed your Key Frames. Key frames have been heavily used by the group with regards to the animation of the character and the objects he interacts with. Key frames can be directly related to key positions or extremes of classical 2D animation. The key positions tell the important parts of the story, while the remaining frames are the in-betweens, which make the animation appear smooth. Key frames in 3D are similarly the key positions in the animation and created by the animator, while the computer creates the in-betweens to make the animation smooth by interpolating between the transformations of the animated objects between each Key Frame. The following illustrations show the various positions of the character at some of the Key Frames in Illustration 5.19. Key frame 1 130 Key frame 10 Group 08ml582 Reactive movie playback 5. Implementation Animation tools Key frame 30 Key frame 35 Key frame 40 Illustration 5.19: Five different Key Frames. For classical 3D animation, these would be five key poses and the other frames of the animation would be in-betweens. 5.3.2 Time slider and Range slider The time slider and range slider can be seen in Illustration 5.20. Illustration 5.20: The time slider and range slider The time and range sliders allows control of either play back of or scrolling through a given animation, which is very useful if e.g. it is desired to play the animation inside the program before rendering it. The time slider specifically displays the playback range and keys, if such are made, and the range slider controls the range of frames that will be played if you click the play button. The reason for the range slider to have two values on each side (1.00 and 1.00 to the left and 24.00 and 24.00 to the right in Illustration 5.20) is that the range slider controls both how many frames are currently visible in the time slider, but also how many frames are totally available, while maybe not being visible. The inner-most values on each side of the range slider – the values closest to the range slider itself – controls the currently visible frames in the time slider, while the outer-most values control the total number of frames 131 Group 08ml582 Reactive movie playback 5. Implementation Animation tools available. 
These tools are important to know in order to maintain an overview of the animation, which can easily reach a size of e.g. 720 frames, which is only 30 seconds. In Illustration 5.20, the time slider currently shows 24 frames, from frame 1 to frame 24, and the currently active frame is frame 1, with no keys being set, since there are no red markers to indicate any Key Frames. The range slider allows for a maximum of 24 frames, while also currently displaying 24 frames. These sliders have been used in a variety of ways on the implementation of animation. One use has been to obtain an overview of every Key Frame in the entire animation, regardless of there being 100 or 1000 frames currently keyed. Another use has been to make detailed adjustments to the placements of the key-frames. The time slider allows for manipulation of the placement of key-frames and thereby the timing of the animation, by moving the keys from one frame to maybe a few frames further down the time slider (e.g. a key from frame 4 to frame 7). Viewing 700 frames in the time slider makes it very difficult to move frames like this, so adjusting the range slider to see fewer frames allows for more detailed manipulation of frame positions. 5.3.3 Graph editor The motions and transformations of every animated object in a scene can be graphically represented as curves in the graph editor. These curves are called animation curves and there is one for each keyed attribute of the object. I.e. there is one for the translation of the object in the y-axis, one for the scaling in the z-axis etc. Each curve shows how an attribute is changing during the animation. In Illustration 5.21 the animation curve for translation in the y-axis is shown. 132 Group 08ml582 Reactive movie playback 5. Implementation Animation tools Illustration 5.21: The animation curve for translation in the y-axis. The steep rise and fall of the curve in Illustration 5.21 indicates that the character moves rapidly up and down in the scene, i.e. when he jumps. The following wave-like shape of the curve (from frame 225 and onwards) is when the character has landed on the trampoline and the elastic fabric gives in and eventually settles. The transformation displayed as a graph in Illustration 5.21 is translation in the y-axis (the green color indicates change - be it rotation, transformation or scaling - in the y-axis). The way to interpret these curves is that the x-axis of curves in the graph editor represents time in the animation or frames if you will (Illustration 5.21 shows animation from frame 150 to 294). The y-axis represents changes in value for the current transformation that is being animated. This is what makes transformations in any y-axis the easiest to understand in the graph editor. When values of transformation in the y-axis are increased, the object moves along the y-axis in the scene (normally up on the screen). Therefore, the motion of the object in the scene would be very similar to the curves displayed in the graph editor. However, transformation in e.g. the x-axis would be more difficult to interpret, since when the curve goes up, the object might move to the right in the scene, and move to the left, when the curve moves down. In this case, the object would be aligned in the scene such that an increase 133 Group 08ml582 Reactive movie playback 5. Implementation Animation tools of motion along the x-axis would cause the object to move to the right in the scene and left following a decrease in motion along the x-axis. 
And since the y-axis in the graph editor represents a change in value, an increase of motion in the x-axis in the scene would be displayed as the curve going up in the graph editor, while the curve would go down, if there were a decrease of motion along the x-axis in the scene. The curves work similarly for rotating and scaling. If an object was rotated or scaled in the positive direction in the scene, the animation curve in the graph editor would go up, while a negative scale or rotation of the object would cause the animation curve to go down. A clear example of this correlation between motion and the graph editor can be seen in Illustration 5.22. Illustration 5.22: Showing how translation in the x-axis corresponds to the curve in the graph editor In Illustration 5.22 the box in the lower right corner has been moved 20 units along the x-axis and the lower left box represents the starting position for the box to the right. In the scene – the lower part of Illustration 5.22 - motion along the x-axis is motion to the right on the screen. But when looking at the animation curve in the graph editor (the upper part of Illustration 5.22), the graph moves up. This moving of 20 units has happened over 24 frames, which can be seen along the x-axis in the graph editor, where the numbers 0, 2, 4, 6…, 18, 20, 22 and 24 represent the number of frames in the animation. And looking at the y-axis reveals that the 134 Group 08ml582 Reactive movie playback 5. Implementation Animation tools graph starts at frame 0 and a value of transformation along the x-axis of 0. But, as more frames goes by, this transformation values also increases (along the y-axis) until it reaches a value of 20, when the numbers of frames gone by reaches 24. A final clarification can be found in Illustration 5.23, which shows how the same amount of transformation (20 units along the x-axis) would look if it lasted 18 frames, 12 frames and 6 frames. The amount of motion remains the same – only the amount of frames gone by changes. Illustration 5.23: When the amount of transformation remains the same, the height of an animation curve also remains the same, since these two elements are directly connected This translation of every possible transformation of an object in the scene onto a twodimensional graph can take some time getting used to, but once the animator understands this system, many animation problems can be solved simply by tweaking the animation curves in the graph editor. In Illustration 5.21 the graph editor has been used to ensure that the jump of the character had a realistic feel to it, bearing in mind that elements from cartoons are also present, like an unnaturally long time for the character to take-off from the trampoline. 5.3.4 Blend shapes The function of blend shape deformers is to change the shape of one object into the shapes of other objects and is typically used when creating facial animation of character. 135 Group 08ml582 Reactive movie playback 5. Implementation Animation tools In this project, blend shapes has been used to create the various facial expressions of the character, which can be seen in Illustration 5.24. Illustration 5.24: Blend shapes were used to make the various facial expressions of the character. Through the use of the blend shapes, it is possible to make the character seem happy, sad, angry etc. This is a very important aspect of the overall impression that the user gets from watching the character. 
When these facial blend shapes were created, the first thing to do was to make the eyes to be used on the character, which would also be the ones that change shape between the blend shapes. Next step is to copy these eyes one time for each blend shape that must be made. So in this project, the original eyes were copied 12 times in order to make all the different blend shapes. The next step is to modify the shape of each of the eye-copies into the various facial expressions, such as the sad eyes, the angry eyes etc. until the result shown in Illustration 5.24 was achieved. The last step is to make blend shapes out of these new facial expressions and tie them to the original eyes. By use of Maya’s Blend Shape Editor, the animator can now switch between the facial expressions and key them to the necessary frames of the animation. It should be noted, that the reason that the blend shapes can be placed anywhere in the scene, while the original eyes will change shape, while still being stuck to the head of the character is, that Maya can be setup to ignore any differences in position, rotation and scale of the blend 136 Group 08ml582 Reactive movie playback 5. Implementation Animation tools shapes and only take into account changes in position of the vertices relative to the center of the eyes. 5.3.5 Reference video In order to obtain more realistic movements from the character, a recording of movements of the group members were made so it could work as a reference of movements in the animating process. This can be of great help in such aspects as timing of e.g. a jump or a walk, or just as a study of exactly how the various limbs rotate, bend and move. Also it becomes easier to exaggerate movements when you have a clear reference to how they normally look. Illustration 5.25 shows some key positions of the character as well as the real world reference. Illustration 5.25: Key positions in the animation and in real world reference video. When thinking in terms of one second lasting 24 individual frames, it can be somewhat abstract to visualize how many frames e.g. a jump would last. The timing of the jump and the key poses, such as the anticipation, the pushing off from the ground, the character in mid-air and the landing can require an immense amount of trial-and-error to get to look right. But when filming a jump of a real person and then being able to play back this jump at a speed of 24 frames per second to match the animation, while also being able to slow down the video 137 Group 08ml582 Reactive movie playback 5. Implementation Smile Picture Capture and view each individual frame, the process of timing the key poses of the jump and make it look natural becomes much easier. This way, the animator can see exactly how many frames it takes to go from anticipation to lift-off or from lift-off to landing. Therefore, the use of reference video is an extremely helpful aid and can speed up the production time tremendously. It has now been seen which techniques have been used to model the project character, such as box-modeling and examples of where they were applied. The process of rigging the character and the functionalities of techniques used have been described such as joints or control objects, along with where to utilize them has likewise been covered. And then, the implementation of animation was described, which tools were used, such as Key Frames or the graph editor, while also describing where they were used. 
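As a simple illustration of the in-betweening principle behind Key Frames described in chapter 5.3.1, linear interpolation between two keyed values can be sketched in a few lines of C++. This is a generic sketch only, not how Maya evaluates its animation curves, which use spline interpolation with adjustable tangents as seen in the graph editor:

// Generic linear in-betweening between two keyed values of a single attribute.
// Purely illustrative; Maya's animation curves use splines with editable tangents.
double inBetween(double keyValueA, int keyFrameA,
                 double keyValueB, int keyFrameB, int frame)
{
    if (frame <= keyFrameA) return keyValueA;       // before the first key
    if (frame >= keyFrameB) return keyValueB;       // after the second key
    double t = double(frame - keyFrameA) / double(keyFrameB - keyFrameA);
    return keyValueA + t * (keyValueB - keyValueA); // interpolated in-between value
}

Evaluating, for instance, inBetween(0.0, 0, 20.0, 24, f) for f = 0 to 24 would produce a straight-line version of the 20-unit translation discussed in connection with Illustration 5.22.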
The next part of this chapter will detail how the smile detection was implemented, which techniques were used and examples of how to use them.

5.4 Smile Picture Capture

The learning data for the program is created by a stand-alone program called "Smile Picture Capture". The program uses the same Haar cascade as the Smile Detection program to locate the face, and the mouth is also located using the same algorithm as in the detection program. "Smile Picture Capture" makes it easy to capture new learning material and import it into the learning program. By default, the program is set to 25 pictures of smiles and 25 pictures of neutral faces for each test participant. The code is summarized in pseudo code:

1. Establish connection to webcam.
2. Create directory C:\Pictures\RunXXX\
3. Create index files C:\Pictures\RunXXX\SmilePics.txt and …\NeutrPics.txt
4. While not more than 25 smile and 25 neutral images have been captured:
   1. If a face is found, do:
   2. Grayscale the image.
   3. Crop the image so only the mouth is visible (See XXXXXX for more information).
   4. Scale the image to 30x15 pixels.
   5. Return image.
5. If keypress "1", do:
   1. If fewer than 25 images of smiles have been saved, do:
   2. Save image to RUNTIMEDIRECTORY\SmilePictureX.jpg (X in SmilePicture is replaced with A-Z).
   3. Save path to SmilePics.txt for later collection import.
6. If keypress "2", do:
   1. If fewer than 25 images of neutral faces have been saved, do:
   2. Save image to RUNTIMEDIRECTORY\NeutrPictureX.jpg (X in NeutrPicture is replaced with A-Z).
   3. Save path to NeutrPics.txt for later collection import.
7. Copy all files from RUNTIMEDIRECTORY with Smile* in the filename to: C:\Pictures\RunXXX\Smiles\
8. Copy all files from RUNTIMEDIRECTORY with Neutr* in the filename to: C:\Pictures\RunXXX\Neutrals\
9. Close windows, files and terminate program.

If button 1 or 2 is pressed on the keyboard, the program will capture either a smile or a neutral picture. The program gathers all pictures in the C:\Pictures\RunXXX\ directory, which can easily be renamed to for example "Batch nr. 3". The Smile Detection Training program can, with slight changes, import the produced index files into a vector of pictures, which is called a collection.

5.4.1 Photo Session

The capture of the images to be used in the smile detection was done in a lab with the same camera that will be used in the testing. Light conditions were partly controlled by fluorescent tubes, but also included sunlight shining on the left side of the faces.

Illustration 5.26: A series of 5 smile pictures from the same participant.

Seven participants were provoked into laughing and making neutral faces. The group of participants included four Caucasians (males – two with a beard), one East European (female), one African (male) and one Asian (male). The broad composition of the participant group was chosen in order to make the smile detection less sensitive to skin color, facial hair, gender and heritage. The Smile Picture Capture program was used to easily capture the smiles and save them into collection indexes. 25 smile and 25 neutral faces were captured per participant, making a total of 350 pictures. 26 pictures were sorted out due to wrong facial expressions, inappropriate lighting and general mouth detection failures. A total of 165 smile pictures and 173 neutral pictures were sent to the learning program.
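To give an impression of how the capture loop in the pseudo code above could look with the OpenCV C API, a minimal sketch is shown below. It is not the project's actual Smile Picture Capture code; file names are illustrative, and the Haar-based face and mouth detection step as well as the index files are left out for brevity.

#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <cstdio>

// Minimal capture-on-keypress loop (illustrative, not the project code).
// Grabs webcam frames, converts them to grayscale and saves a frame when
// key "1" (smile) or "2" (neutral) is pressed, up to 25 of each.
int main()
{
    CvCapture* capture = cvCaptureFromCAM(0);      // 1. connect to webcam
    int smiles = 0, neutrals = 0;

    while (smiles < 25 || neutrals < 25)           // 4. until 25 of each is captured
    {
        IplImage* frame = cvQueryFrame(capture);
        if (!frame) break;

        IplImage* gray = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
        cvCvtColor(frame, gray, CV_BGR2GRAY);      // grayscale the image
        // Here the face would be located with the Haar cascade and the mouth
        // region cropped and scaled to 30x15 pixels, as described above.

        int key = cvWaitKey(10);
        char filename[64];
        if (key == '1' && smiles < 25)             // 5. save a smile picture
        {
            std::sprintf(filename, "SmilePicture%c.jpg", 'A' + smiles);
            cvSaveImage(filename, gray);
            smiles++;
        }
        else if (key == '2' && neutrals < 25)      // 6. save a neutral picture
        {
            std::sprintf(filename, "NeutrPicture%c.jpg", 'A' + neutrals);
            cvSaveImage(filename, gray);
            neutrals++;
        }
        cvReleaseImage(&gray);
    }
    cvReleaseCapture(&capture);                    // 9. close and terminate
    return 0;
}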
5.5 Training program For the training program, first a collection of photos on which to train is needed. This is done by the function loadCollection, which takes as input a string and an integer value. The string is the path to a txt-file, which contains the path for all images that is to be trained on. The integer value is used to specify the maximum size of the collection, if it is needed to limit the collection size. The loadCollection function implementation is explained, with original code and comments: 140 Group 08ml582 Reactive movie playback 5. Implementation Training program vector <IplImage*> loadCollection(string file, int maxCollectionSize) /* Function takes in two variables: The path to the index file of pictures, and number of how big the collection */ { vector <IplImage*> collection; // Creates a vector of image pointers const char *filename = file.c_str();// Converts the string file to a const char ifstream myfile (filename); // Opens the file for reading string line; // Creates a string named line if (myfile.is_open()) // Run through the file { while (! myfile.eof() && collection.size() < maxCollectionSize ) { getline(myfile,line); const char *charline = line.c_str(); IplImage *image = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer for the image IplImage *imageFlip = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer for the flipped image image= cvLoadImage(charline,0); cvEqualizeHist(image,image); collection.push_back(image); cvFlip(image,imageFlip,1); collection.push_back(imageFlip); // // // // // Load the image Equalize the histogram Save image into vector Flip image and save into imageFlip Save imageFlip into vector } } return collection; } The flipping of the image was done since some problems about light direction occurred. Since the training images were taken with the light coming from approximately the same direction in all the images, the smile detection program had difficulties tracking the smiles correctly, if the light was coming opposite of the light in the training images. Instead of wasting time on taking many new training images, it was decided to simply flip each image, to simulate light coming from the opposite direction. With the collection loaded, the mean of all the images has to be calculated. The function getMean takes care of calculating the mean values for each pixel, given a number of images of equal dimensions. The function takes only one input, which is the vector of images defined by the loadCollection function. 141 Group 08ml582 Reactive movie playback 5. Implementation Training program Illustration 5.27: Histogram equalization process. T - Transformation All pictures are histogram equalized since without, the neutral template turns out brighter than the smile template, because a smile creates more attached shadows on the face (shadows casted by the face itself). By equalizing, the histogram is “stretched”, such that the lowest valued pixel present is set to 0 and the highest valued pixel is set to 255. All pixels in between are scaled to fit into the new scale of the histogram. In this way, both templates are guaranteed to span from entirely black to entirely white. The function in OpenCV, which equalizes the histogram is called cvEqualizeHist(input, output) to normalize brightness and increase contrast. The function first calculates the histogram, normalizes the histogram, finds the integral of the histogram and applies the altered histogram to the image. 
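For reference, the mapping applied by histogram equalization can be written down explicitly. For a gray level $v$ in an image with $N$ pixels and gray-level histogram $h$, the equalized level is approximately

$v' = \operatorname{round}\!\left(\frac{255}{N}\sum_{u \le v} h(u)\right)$

i.e. each gray level is replaced by a scaled version of the cumulative histogram up to that level, which both stretches and flattens the gray-level distribution. This is the standard textbook formulation, included here for clarity; it is not quoted from the OpenCV documentation.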
The implementation of the function can be found in cvhistogram.cpp in the OpenCV source folder. The getMean function goes through the following steps:

/* Loads a collection of images into the function */
IplImage* getMean(vector <IplImage*> collection)
{
    /* Creates two scalars, which each contain a 1D array with RGB and alpha
       values (an 8-bit picture) */
    CvScalar s, t;

    /* Creates an image with the same width and height as the training images */
    IplImage* meanImg = cvCreateImage(cvSize(collection[0]->width, collection[0]->height), IPL_DEPTH_8U, 1);

    int temp = 0; // (unused in the final implementation)

    /* Creates a vector to temporarily save pixel values */
    vector <int> coordinate((collection[0]->width)*(collection[0]->height));

    /* Goes through every picture in collection */
    for (int i = 0; i < (int)collection.size(); i++)
    {
        int coordinateCounter = 0;
        for (int y = 0; y < collection[i]->height; y++)      // For Y values
        {
            for (int x = 0; x < collection[i]->width; x++)   // For X values
            {
                s = cvGet2D(collection[i], y, x); // Get pixel value for image in X,Y
                /* Add the pixel value for the current image into the coordinate vector */
                coordinate[coordinateCounter] += s.val[0];
                coordinateCounter++;
            }
        }
    }

    /* Go through the added pixel values and divide with the amount of pictures */
    for (int j = 0; j < (int)coordinate.size(); j++)
    {
        coordinate[j] = coordinate[j] / (int)collection.size();
    }

    int pixelCounter = 0;
    /* For loop that converts the coordinate vector into an image (meanImg) */
    for (int h = 0; h < meanImg->height; h++)
    {
        for (int w = 0; w < meanImg->width; w++)
        {
            for (int scalar = 0; scalar < 4; scalar++)
            {
                t.val[scalar] = (double)coordinate[pixelCounter];
            }
            cvSet2D(meanImg, h, w, t);
            pixelCounter++;
        }
    }
    return meanImg;
}

The OpenCV variable CvScalar (MyDNS.jp, 2008) is actually an array with a size of four, holding the value of the red, green, blue and alpha channel at a specific pixel. However, as the program uses only grayscale images, the value of red, green and blue is the same for each single pixel, so the first value (position 0 in the array) is used for every channel. As the alpha channel is not used at all, it gets the same value as the rest of the channels, for simplicity only.

The main function of the smile training program calls the loadCollection and getMean functions and then saves the mean image for both the neutral face expressions and the smiles. The saved pictures are then used as templates in the smile detection program. This completes the training program, which is capable of loading a series of pictures, calculating the mean and outputting the result to an image file that can be used in the Smile Detection Program.

5.6 Smile detection

For the smile detection program, which was described in chapter 4.5 Smile Detection Program, the Haar cascades are used. However, since the code used for finding a face in the webcam feed (which is where the Haar cascades are used) is developed by and published with the OpenCV library, it will not be described in this report.
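Although that sample code is not reproduced here, the central OpenCV call it builds on is cvHaarDetectObjects. For orientation, a minimal, illustrative use of it could look as follows; the cascade file name and parameter values are assumptions for the sketch, not the project's settings:

#include <opencv/cv.h>
#include <opencv/highgui.h>

// Minimal sketch: load a face cascade once and run it on one grayscale frame.
// Cascade file name and detection parameters are illustrative only.
CvRect detectFirstFace(IplImage* gray)
{
    static CvHaarClassifierCascade* cascade = (CvHaarClassifierCascade*)
        cvLoad("haarcascade_frontalface_alt.xml", 0, 0, 0);
    static CvMemStorage* storage = cvCreateMemStorage(0);

    cvClearMemStorage(storage);
    CvSeq* faces = cvHaarDetectObjects(gray, cascade, storage,
                                       1.1,  // scale factor between pyramid levels
                                       3,    // minimum neighbouring detections
                                       CV_HAAR_DO_CANNY_PRUNING,
                                       cvSize(40, 40)); // smallest face considered

    if (faces && faces->total > 0)
        return *(CvRect*)cvGetSeqElem(faces, 0); // first detected face
    return cvRect(0, 0, 0, 0);                   // no face found
}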
The function used to load a Haar cascade is:

(CvHaarClassifierCascade*)cvLoad("haarcascadefile.xml", 0, 0, 0 );

For the part of the smile detection program that was developed in this project, two different methods were described in chapter 4.5.2 Movie playback, but in the conclusion of the design it was decided only to implement the second method of movie playback, which compares three initial clips and then decides on a humor type based on the user's reaction to these movies. Therefore, the implementation of the main program developed in this project looks as follows:

1. Load the movie playlist from playlist.txt.
2. Establish connection to webcam.
3. Start first video clip.
4. Try to detect a mouth in the webcam feed, using the mouthDetection function.
5. If a mouth is found, do:
   1. Check if the mouth is smiling, using the isSmiling function.
   2. If the movie clip has ended, do:
      1. In case of the first movie, do:
         1. Set score for the first movie clip to amount of smiling frames/time of clip.
         2. Start next movie clip.
      2. In case of the second movie, do:
         1. Set score for the second movie clip to amount of smiling frames/time of clip.
         2. Start next movie clip.
      3. In case of the third movie, do:
         1. Set score for the movie clip to amount of smiling frames/time of clip.
         2. Compare the three movie clips.
         3. Start playing the next movie clip in the playlist of the style with the highest score.
6. If the program is terminated, save a txt-file containing the times of when the user started and stopped smiling.
7. Else, go back to step 4.

5.6.1 Mouth Detection

The first necessary function is mouthDetection. A big part of this function is also developed by the creators of OpenCV and will not be explained here. Using the Haar cascades, the function can find the face. What has been implemented in addition to this Haar cascade detection is the detection of the mouth. However, this is actually just a predefined area in the face. The area is defined as a rectangle with the following coordinates:

x_1 = c_x − (1/2)·r ,   y_1 = c_y + (1/3)·r
Equation 5.1: Left, top coordinate calculations

x_2 = x_1 + r ,   y_2 = y_1 + (1/2)·r
Equation 5.2: Right, bottom coordinate calculations

where x_1, x_2, y_1, y_2 are the rectangle coordinates, (c_x, c_y) is the face center and r is the radius.

OpenCV draws rectangles from the top left corner to the bottom right corner. This is specified in the code and calculated by using the information about the face center and the radius of the circle surrounding the face. Again, it is important to note for these calculations that the origin of an OpenCV image is specified as the top left corner. Illustration 5.28 shows the parameters in Equation 5.1 and Equation 5.2 in a graphical manner.

Illustration 5.28: The mouth detection function specifies an area around the mouth by starting at the center of the face (center of the black circle) and calculating the top left and bottom right corner of the green rectangle, using the radius of the black circle.

When the mouth is defined, the pixel data for the area is saved into an IplImage* and returned. If no face is found, the function returns a grey image. The next step is to resize the image to a 30x15 pixel image. This step is done purely to optimize the process and make the program run faster.
With only 450 pixels to compare, instead of maybe 3000-5000 pixels, the program will make the comparison much faster. Furthermore, the image should have exactly the same size to use the mean image detection method. 5.6.2 Comparison With the mean images loaded and an image of the mouth in the current frame, the actual smile detection can be made. This is done with the isSmiling function, which performs the following steps: Compare current image with smile template: 1. For every pixel in current image, do { 1. Get CvScalar value of current pixel in current image. 2. Get CvScalar value of current pixel in smileTemplate. 3. Increase diff by the absolute difference between the two images. 4. Store diff. } { 5. Get CvScalar value of current pixel in current image. 6. Get CvScalar value of current pixel in neutralTemplate. 7. Increase diff by square of difference between the two pixels. 8. Store diff. } 2. If squareroot from current image to smileTemplate < distance to neutralTemplate + threshold, return true, else, return false. The threshold value is used to bias the results towards smiling, if the program does not track smiles accurately enough. What was discovered during the implementation phase was that the distance from the image to the smileTemplate was often longer than the distance to the 147 Group 08ml582 Reactive movie playback 5. Implementation Smile detection neutralTemplate, resulting in the image being detected as not smiling, even if it really was. By adding the threshold value, as described in subchapter 3.8.1.2 Simple method, to the distance from the current image to the neutralTemplate, the distance from the current image to the smileTemplate will more often be smaller than the distance to the neutralTemplate, resulting in the image being detected as a smile. 5.6.3 Movie Playback The movie playback is performed by sending a command through the system function, which is part of the C++ library called stdlib.h. The system function executes commands through the command prompt and as such, any process can be started with this command. This functionality is used to start an instance of Windows Media Player, with the movie name included in the command call. An example of a command could be: “start wmplayer.exe C:\\Movies\\movie0.wmv /fullscreen” This command starts Windows Media Player and plays the file movie0.wmv, which can be found in the folder C:\Movies. /fullscreen at the end of the command will ensure that Windows Media Player is running in full screen. A playlist is composed and written as a txt-file, containing the commands for start of all needed video files and the smile detector program is then loading in this text-file into an array, from which the commands can be called. The function called playMovie takes as input an integer and an array of const char*. Calling the function, it will play the movie from the array, with the integer value specified. playMovie(2,file) will play the 3rd file (due to array index, where 0 is the first element of the array) in the array called “file”. In order to get statistics from the test, the program detects every time a user starts smiling and stops smiling. When a user starts smiling, the amount of milliseconds since program start is stored in a vector called smileTimer and no other value is stored until the user stops smiling again. This continues throughout the entire program and as the program terminates (when last movie is played), the content of the vector is saved in a txt-file for use afterwards. 
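To make the comparison step from chapter 5.6.2 concrete, a possible shape of the Euclidean-distance comparison is sketched below. This is an illustrative sketch rather than the project's exact isSmiling implementation; the function and variable names are chosen for the example, and the images are assumed to be the 30x15 grayscale mouth crops described above.

#include <cmath>
#include <opencv/cv.h>

// Compares the current mouth image against the two mean templates and returns
// true if it is closer (in Euclidean distance) to the smile template than to
// the neutral template plus a bias threshold, as described in chapter 5.6.2.
bool isSmilingSketch(IplImage* mouth, IplImage* smileTemplate,
                     IplImage* neutralTemplate, double threshold)
{
    double smileDist = 0.0, neutralDist = 0.0;
    for (int y = 0; y < mouth->height; y++)
    {
        for (int x = 0; x < mouth->width; x++)
        {
            double p = cvGet2D(mouth, y, x).val[0];
            double s = cvGet2D(smileTemplate, y, x).val[0];
            double n = cvGet2D(neutralTemplate, y, x).val[0];
            smileDist   += (p - s) * (p - s);   // squared difference per pixel
            neutralDist += (p - n) * (p - n);
        }
    }
    // The threshold biases the decision towards "smiling", since the distance
    // to the smile template tends to come out larger in practice.
    return std::sqrt(smileDist) < std::sqrt(neutralDist) + threshold;
}

In the actual program, the threshold is the value investigated in the cross validation test in chapter 6.1.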
This concludes the implementation of Smile Detection Program, which is capable of loading the image files from the Training program, utilizing these as templates for smile detection. The Smile Detection Program will detect the face of a user and determine whether the user is 148 Group 08ml582 Reactive movie playback 5. Implementation Implementation Conclusion smiling or not. Depending on the results of the smile detection, the program will choose to play a movie clip according to these results. 5.7 Implementation Conclusion In this chapter it has been shown how the entire product was created, based on the designs made in chapter 4. Design. First of all, the actual implementation of the character within Maya, including techniques concerning the process of modeling the character was covered, followed by the techniques and methods utilized when rigging the character and setting it up for animation. And then, the animation process and the tools used for this were covered. Three programs where created for the smile detection, the first of these was the Smile Picture Capture which was used to create a series of images of people smiling and non-smiling people, and categorize them accordingly. The second program was the Training Program which then took all of the pictures that were taken in the Smile Picture Capture program, processed them and calculated the mean values of each pixel coordinate from the pictures from each category. The calculated means where then saved into two files “smileTemplate” and “neutralTemplate”. The third program was the Smile Detection program, which is the program actually used to detect the smiles. This was done by using a webcam feed, locating the mouth and then comparing each pixel coordinate in that region to that of the mean from the training data provided by the second program. The data itself is also used by the program in order to determine which of the animated movies should be played. With this, a working prototype has been created consisting of animated movie clips, and a program that detects smiles and then chooses between the movies. 149 Group 08ml582 Reactive movie playback 6. Testing Cross Validation Test 6. Testing This chapter will cover the setup and conduction of the test that has been performed in the project. First, the cross validation will be covered as it was described in chapter 2.5 Testing the product and it will be checked if the detection rate fulfills the criteria set (75% is required 80% is desired). Next, the tests involving users will be introduced, using the DECIDE framework and finally the results of the initial and final test will be presented, analyzed and evaluated. The conclusion of this chapter will determine if the solution proposed in this project, is a valid solution for the problem formulation. 6.1 Cross Validation Test As mentioned in chapter 2.5 Testing the product, there are some requirements to the successfulness of the smile detection program, in order for the final test to be useful. It was established that a success rate of at least 75% was needed, and a success rate of at least 80% was desired. The method of doing cross validation test was discussed in chapter 2.5 Testing the product, so this chapter will be concerned about the implementation of the test and the results produced. For conducting the test, another small application was developed. 
However, the functions of this new application were already implemented in the actual smile detection program, so for doing the cross validation test, only some minor changes were needed. Instead of testing on all of the training data, it was necessary to subtract 10% of the training data and use it as test data. This was done by running through a loop in the code, each time subtracting a new 10% from the training data and then specifying the same 10% as the test data, until all images have been through the test. In the case of the cross validation for this program, it means that 16 images were taken out of each class of the training data to be used as test data for each iteration of the test. This leaves 298 images as smile training data and 314 as neutral training data, because all images have been flipped and copied. The program will then produce a text file with the results for the neutral images and the smiling images, stating the success rate. Furthermore, a threshold was also implemented, so the cross validation should take different threshold values into account.

Two methods were tested in the cross validation, both of them described in chapter 3.8 Smile detection, and what will be described here are the results of both tests, run at different threshold values, as described in chapter 5.6 Smile detection. The first method uses no length calculation, but calculates the absolute pixel difference between two pictures. The second method calculates the distance between the images, represented as vectors in Euclidean space, and uses this for comparison.

6.1.1.1 Method 1

Referring back to chapter 2.5 Testing the product, the way to represent the cross validation is to make a table showing the detection rate for both smiles and non-smiles. For example, at a threshold of 0, the results of the cross validation for method 1 could be illustrated as in Table 6.1.

t = threshold        Detected as smile    Detected as non-smile
Smiles (t=0)         76.67%               23.33%
Non-smiles (t=0)     7.5%                 92.5%

Table 6.1: The results of the cross validation of the first method, at threshold 0, show a good detection rate for non-smiles, but a poor detection rate for smiles.

This shows that 76.67% of the smiles were actually detected as smiles, whereas 92.5% of the non-smiles were detected as such. In Table 6.2, a few selected thresholds have been picked out, to show the results at different threshold values.

t = threshold        Detected as smile    Detected as non-smile
Smiles (t=2500)      83.33%               16.67%
Non-smiles (t=2500)  11.67%               88.33%
Smiles (t=5000)      91.67%               8.33%
Non-smiles (t=5000)  17.5%                82.5%
Smiles (t=7500)      97.5%                2.5%
Non-smiles (t=7500)  24.17%               75.83%
Smiles (t=10000)     100%                 0%
Non-smiles (t=10000) 35%                  65%

Table 6.2: The results of the cross validation test run on the first method, at a few selected threshold values.

As can be seen from Table 6.2, a threshold higher than 7500 will be too high, as the non-smiles success rate is getting too low at that point. In order to determine the optimal threshold value, more comparisons have to be made.
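Before the results of the second method are presented, the leave-out loop described at the start of this chapter can be sketched as follows. This is an illustrative sketch of the partitioning idea only, not the project's actual test application, and all names in it are hypothetical:

#include <vector>
#include <opencv/cv.h>

// Illustrative sketch of the leave-10%-out loop: in each iteration one block of
// images is held out as test data and the rest is used to train the templates.
void crossValidateSketch(const std::vector<IplImage*>& images, int imagesPerFold)
{
    int folds = (int)images.size() / imagesPerFold;
    for (int fold = 0; fold < folds; fold++)
    {
        std::vector<IplImage*> training, test;
        for (int i = 0; i < (int)images.size(); i++)
        {
            bool heldOut = i >= fold * imagesPerFold && i < (fold + 1) * imagesPerFold;
            if (heldOut) test.push_back(images[i]);
            else         training.push_back(images[i]);
        }
        // Here the mean templates would be recomputed from "training" and the
        // detection rate measured on "test", for each threshold value of interest.
    }
}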
6.1.1.2 Method 2

t = threshold        Detected as smile    Detected as non-smile
Smiles (t=0)         74.79%               25.21%
Non-smiles (t=0)     7.56%                92.44%
Smiles (t=125)       84.03%               15.97%
Non-smiles (t=125)   9.24%                90.76%
Smiles (t=250)       93.28%               6.72%
Non-smiles (t=250)   14.29%               85.71%
Smiles (t=375)       97.48%               2.52%
Non-smiles (t=375)   17.65%               82.35%
Smiles (t=500)       100%                 0%
Non-smiles (t=500)   24.42%               70.58%

Table 6.3: The results of the cross validation test run on the second method, at a few selected threshold values.

As can be seen in Table 6.3, the threshold values are different for this test, which is due to the way the difference between the images is computed. In the second method a square root computation is involved in calculating the difference between the images, which was not the case in the first method, in which the total difference was computed. Therefore, the values in the second method are smaller than in the first method and the threshold has to be scaled accordingly. In order to compare the two methods, the average of the detection rates for smiles and non-smiles is computed for both methods. Illustration 6.1 and Illustration 6.2 show the average detection rate for smiles and non-smiles in both methods.

[Graph: average detection rate in % plotted against threshold values from 0 to 9000 for the first method.]
Illustration 6.1: The first method peaks at just above 89% average detection rate, at a threshold of 6600-6700.

[Graph: average detection rate in % plotted against threshold values from 0 to 450 for the second method.]
Illustration 6.2: The second method peaks at almost 90% average detection rate, at a threshold of 215-220.

In the first method, the highest average detection rate is 89.08%, which is reached at a threshold of 6600-6700. In the second method the average detection rate peaks at a threshold of 315-320, where it reaches 89.92%. With an average detection rate almost one percentage point higher for the second method, this proved to be the most efficient way of doing the smile detection. Furthermore, the detection rates of both smiles and non-smiles in the second method, at the optimal average detection rate, are above the desired detection rate of 80%, as the detection rate for smiles is 96.64% at a threshold of 315-320 and the detection rate for non-smiles is 83.19%. For the first method, both values also pass the 80% detection rate, but the detection rate for non-smiles is only 80.67%, while the detection rate for smiles is higher, at 89.08%. Referring back to chapter 2.5 Testing the product, the results of the cross validation of both methods exceeded our expectations, but the second method is chosen because of its better result.

6.2 DECIDE framework

As mentioned in chapter 2.5 Testing the product, the DECIDE framework will be used to set up and evaluate the test to be performed in this project. The DECIDE framework offers a six-step guide to performing an evaluation, which is (Sharp, Rogers, & Preece, 2006):

1. Determine the goals.
2. Explore the questions.
3. Choose the evaluation approach and methods.
4. Identify the practical issues.
5. Decide how to deal with the ethical issues.
6. Evaluate, analyze, interpret, and present the data.

Following this framework ensures that many aspects in the evaluation context are covered.
The DECIDE framework is driven by goals, which assist in clarifying the scope of the test. Once the goals have been established, the next step is to begin investigating which questions to ask, and later which specific methods to use. Thus, whether one chooses to employ e.g. a usability test or a focus group interview depends on what the goals and questions are. Although the DECIDE framework is a list, it does not necessarily mean that the work should be done strictly step by step. The items in the list may be worked with iteratively, going backwards as well as forwards in the list.

6.2.1 Determine the goals

Based on the problem statement of the project, the overall goal of this test will be to determine whether or not the program created is successful in choosing the movie clip that the users actually thought was the funniest. If this is the case, the users should get the impression that the reactive playback of movies is funnier than just playing back random movie clips. The goal will be to obtain quantitative data about which movie clip the users thought was the funniest in their own opinion and which movie clip the users thought was the funniest according to the program.

6.2.2 Explore the questions

The questions asked should lead to achieving the goal by getting the relevant data from the users. In this test, the program itself will give many of the answers to the questions that need to be answered, e.g. when and for how long the users smiled. This data can be used to determine if the users are smiling more during the reactive version compared to the non-reactive version. However, some questions still need to be answered by the users. First of all, the users should be presented with a still shot from each of the three initial movies and be asked to choose the one they thought was the funniest. As the program should choose a type of humor corresponding to the one the user thought was the funniest, a question should also be asked about whether or not the user could see this correlation. However, the program might not choose the type of humor that the user identified as the funniest clip in the previous question, so this question should explore which of the three initial types the last two movie clips were most similar to. This sums up to the following questions:

• Did the program succeed in detecting the right humor type for the user?
• Which movie clip did the user like the most, according to himself?
• Did the user see the connection in humor type from the last two movies to the corresponding initial movie?
• Did the users smile more at the reactive version than at the non-reactive version?

6.2.3 Choose the evaluation approach and methods

The users will fill out a questionnaire, producing quantitative results for the questions asked. By comparing the data given by the users with the data produced by the program, it is possible to get results about the success of the product. The questions in the questionnaire produce quantitative data and so does the program, so the data is easily comparable. The test will be conducted by placing the user in front of a screen with an integrated webcam. The users will watch the movie clips on the screen, while the webcam is tracking their face.
155 Group 08ml582 Reactive movie playback 6.
Testing DECIDE framework The program will continuously save information about the user’s facial expression (smiling or not smiling), but during the test, the user is not supposed to do anything but to watch the movies. Afterwards, the user will be asked to answer the questions given on the questionnaire. Opposite of the user, one test observer will be watching the progress on another screen. This test observer will be able to see whether or not the program is tracking the user correctly, as he can see a symbol representing whether or not the program is tracking a smile or not, as seen in Illustration 6.3. Illustration 6.3: As the test is running, the test observer watching the process of the test is seeing these two symbols to identify whether or not the user is smiling. If the program is detecting that the user is not smiling, the program shows the left picture and vice versa. This allows the observer to change the smile threshold, if the program is tracking the current user’s smile poorly. Another observer is standing by to answer questions from the user before, under or after the test. This observer will also be in charge of handing out the consent form to be signed before the test starts and the questionnaire that is to be filled out after the test. The test persons will be divided into two groups: Half the test persons will test a reactive version, with the program reacting to their smiles, and the other half will test a non-reactive version, not considering the test persons’ smiles in the movie playback. 6.2.4 Identify the practical issues The practical issues of this test are mostly concerned with the test environment. The test persons will be fulfilling the target group description discussed in chapter 2.4 Target group, so there are no legal issues about age to be taken into concern. 156 Group 08ml582 Reactive movie playback 6. Testing DECIDE framework The test will be conducted at Aalborg University Copenhagen and as students at this institution, we can use the rooms freely, which rules out the need of taking into concern economical issues. At a later test of the product, it might be prudent to test it in the environment and conditions that it is supposed to be used in, but for this prototype test, the conditions within the university will be sufficient. The equipment needed is limited to a computer, with a webcam running the program which has been developed. It is not necessary to have any external cameras running, to record the movement of the users as the webcam will be sufficient to track the things needed. However the users will need to sign an agreement that they are going to be recorded. The test of each user will be only around 10-15 minutes, including answering the questionnaire. This means that it should be possible to get a sufficient number of participants tested during a day of testing. Since this test will only produce indications to the successfulness of the product and indications about the whether or not the goal is achieved, a test group of roughly 30 persons will be sufficient, 15 for each of the two test methods. 6.2.5 Decide how to deal with the ethical issues The ethical aspects of the test cover the rights of the users we are going to test on. The users have some basic rights that are to be thought of when conducting a test. First of all, the subjects will have to give some personal information, in order to assure that they fit the age requirements for the target group, so the only necessary information is about age. 
Before the test starts, the participants will have to be informed about what they are going to be exposed to during the test and what the goals of this test are. The subjects have the right to know what the goals of this project are and what their role is in relation to the project. This means that the users will have to sign a consent form, where they give their consent for us to use any data that the test will give. When the test is running, the subjects have the right to leave at any time, which they should also be informed about. The subjects are also allowed to leave even before the test starts, should they not accept the terms of the test. Since the participants of this test will be recorded using a camera, they will need to accept this recording and its use in the project. The test could be performed without recording the participants, but in order to verify the results afterwards and have some material to analyze, each participant will be recorded, unless he explicitly decides not to be recorded. All these ethical rights have to be written down, such that each participant has given their consent to the terms and has signed the consent form. This consent form can be seen in Appendix X.

6.3 Analyzing the results
In this sub-chapter the results acquired from both the initial and the final test will be analyzed. Selected results will be displayed graphically and the results will be discussed. Furthermore, the test method will be evaluated and suggestions for an extensive test will be presented.

6.3.1 Initial test
The first test that was conducted was a test to see whether or not the users agreed with the choices the program made. The program is, as explained in chapter 5. Implementation, designed to measure the amount of time the user is smiling during one specific clip and then compare the three initial clips to determine which of the movie clips was liked the most. This test took advantage of this feature and measured the user during a test of approximately 2-4 minutes. The users were placed in front of a laptop screen with a built-in webcam, which was used for tracking. One test observer was sitting behind another screen, watching the progress of the program and making sure the program was tracking as supposed to. Two other test observers were present to assist the test subject and answer questions. Before the test started, the users signed a consent form about their participation in the test and their agreement to the terms of the test. Then they were instructed to just sit back and watch the movie, and afterwards they were asked to fill out a questionnaire about the movie they saw. Since this was an initial test, a total of only 10 persons participated. The small number of participants is justified because the test only served as an indication of the validity of the test method. The participants were asked to answer the following questions:
• On a scale from 1 to 5, where 1 is no change and 5 is a higher number of changes, how often did you find the type of humor to change during the test?
• On a scale from 1 to 5, where 1 is no control and 5 is full control, how much did you feel you had control over the choice of movie clips?
• If you felt that you were in control, how did you feel you were controlling the movie clips?
• Which of the following clips did you think was the funniest? (The user was presented with three screenshots, one from each of the three initial movies)
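The comparison the program makes between the three initial clips can be summarised as picking the clip with the longest accumulated smile time. The sketch below shows one way this rule could look; the handling of draws and of users who never smile is an assumption added for the example, since the implemented program initially failed on exactly those cases, as described below.

// Sketch: choose the preferred clip as the one with the longest accumulated
// smile time. Returns -1 for results that cannot be decided (no smiles at all,
// or a draw between the highest-scoring clips).
#include <vector>

int choosePreferredClip(const std::vector<double>& smileMs)
{
    int best = -1;
    double bestMs = 0.0;
    bool draw = false;

    for (int i = 0; i < (int)smileMs.size(); ++i)
    {
        if (smileMs[i] > bestMs)      { best = i; bestMs = smileMs[i]; draw = false; }
        else if (smileMs[i] == bestMs && bestMs > 0.0) { draw = true; }
    }
    return (best == -1 || draw) ? -1 : best;   // -1 marks an invalid test
}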
Illustration 6.4: From the right: 1 is the first type of humor (ball play), 2 is the second type of humor (black humor) and 3 is the third type of humor (falling apart)

The program itself returned data about what movie clip the user smiled at the most, for comparison after the test. In Appendix 12.4 Test Results, the full results for the test can be seen, but in this chapter, only the findings about the choice of movie clips will be discussed in depth. Looking at the results of what the users chose as the funniest clip, Illustration 6.5 shows the distribution of answers.

[Bar chart: Preferred clip (User), number of users choosing ball play, black humor and falling apart]
Illustration 6.5: The distribution of answers to the question of which movie clip the user found the funniest. 1 is the first type of humor (ball play), 2 is the second type of humor (black humor), 3 is the third type of humor (falling apart)

As can be seen in Illustration 6.5, there is an uneven distribution of preference towards the movies. 5 people thought that the black type of humor was the funniest, while 4 users thought the falling apart type of humor, in which the character is losing limbs, was the funniest. Only one person preferred the ball play humor. However, the results produced by the program, measuring how long the users were smiling at each clip, produced somewhat different results, as seen in Illustration 6.6.

[Bar chart: Preferred clip (Program), number of users detected to smile the most at each clip]
Illustration 6.6: The program detected a somewhat different distribution of what the users liked. Here, the 1st and 3rd types of humor each received 3 votes, while the second type received only two. Two tests turned out invalid.

First, it has to be noted that two of the tests turned out invalid. In one test, the user smiled so much that it ended in a "draw" between two of the movies. In another case, the user was not smiling at all (or at least not enough to make the program react). This meant that none of the movies received any score and thus, the program could not decide which clip the user liked the most. These problems in the program were something to be solved before the final test, preventing a premature termination of the program. As can be seen from Illustration 6.5 and Illustration 6.6, there were differences between the users' answers and what the program detected. However, to get a clear view of the result, it is necessary to look at each test subject and their answers, compared to the results of the program. This comparison can be seen in Table 6.4:

Test subject   User choice   Detected funniest   Correct choice?
Person 1       2             N/A                 False
Person 2       3             2                   False
Person 3       2             N/A                 False
Person 4       2             3                   False
Person 5       3             1                   False
Person 6       1             1                   True
Person 7       3             3                   True
Person 8       2             2                   True
Person 9       2             3                   False
Person 10      3             1                   False

Table 6.4: Distribution of answers for each test person and the corresponding result of the program reading.

As can be seen, only 3 of 10 test persons actually chose the funniest clip to be the clip that the program detected them to like the most (that is, smiled the most at).
The reasons for this result can be many, one of them being the program developed during this project. There were instances where the program lost track of the user's face because the user moved in his chair, ending up outside the camera's view. In another instance, the user started scratching his nose during the test, which also prevented the camera from tracking him correctly. Another problem is that the user might not actually think that the funniest clip is the one that he/she smiles at the longest. What the program lacks is the ability to measure how much the user is smiling, rather than only measuring the amount of time he smiled. If the user finds that one movie clip has many small funny situations, he might actually smile for a long period during that clip, but find that another clip is funnier for a short period of time and prefer that clip.

6.3.2 Final test
The final test was in many ways conducted in the same way as the initial test. The general test setup was the same, but instead of only testing on the reactive program, every second test person was tested on a program that did not react according to the user's smile. The non-reactive program simply chose two movie clips without taking the amount of smiling during each clip into concern. Furthermore, the questionnaire was changed to reflect what was described in chapter 6.2 DECIDE framework, since the only important information needed from the users was what movie they preferred and whether they could recognize the humor type from the initial to the ending movie clips. Therefore, the users only had to choose which of the first three movie clips was the funniest and which of the three first movie clips was most similar to the two last animatics. This was done to determine if the user could recognize a correlation between a type of humor from the animated clips and the type of humor in the storyboards. For this test it was not important whether or not the users felt they were in control or if they understood the interaction method, as the comparison could be made between the reactive and the non-reactive version. The final test was conducted on a total of 30 test persons, 15 testing the reactive version and 15 testing the non-reactive version. The full data set can be seen in Appendix 12.4 Test Results, but in this chapter, only the key elements of the test will be discussed. First of all, the program's ability to detect what movie clip the user liked the most was tested. This test was performed on all 30 test persons, even though the information was only used in half of the cases. With the information saved by the program and the answers that the users gave, the results shown in Illustration 6.7 were obtained.

[Bar charts: Preferred clip (User) and Preferred clip (Program), number of users per clip: falling apart, black humor, ball play]
Illustration 6.7: The left illustration shows the distribution of answers given by the users as to what movie clip they liked the most. The right illustration shows what the program detected that they liked the most. Despite the difference between what the users chose and what the program chose, the program did choose the type of humor that the users preferred in half of the cases.
As seen, the users had a strong preference toward the third movie clip (ball play humor), but the program only detected the users to be smiling the most at this clip in 11 cases. The first movie clip (falling apart humor), however, was only preferred by 3 test persons, but the program detected that the users were smiling the most at this clip in 11 cases too, just as for the third movie clip. Looking back at chapter 6.3.1 Initial test (note that the first and third types of humor have been interchanged, such that the ball play humor is now the third type of humor, compared to being the first in the initial test), the results in the initial test differ from the results of the final test in what movie clip the users preferred. The animatics produced apparently did not reflect the story in the scenes as well as the animated counterparts. When it comes to the users connecting the last two movie clips to one of the three initial clips, the users had difficulties in recognizing them. In total, 11 of the users related the type of the last two movie clips to the correct initial one, while 19 did not. However, 8 out of the 11 correct choices were found in the reactive version. This shows that the users were in general not able to relate one of the first three animated movie clips to the last animatics with the same humor type.

As explained in the DECIDE model, the reactive version is expected to entertain the user more than the non-reactive version. This was tested by measuring for how long the users smiled during the test. Illustration 6.8 shows for how long the users smiled on average during the two different test methods.

[Bar chart: Average Smile Time in milliseconds, reactive vs. non-reactive]
Illustration 6.8: As seen in the above illustration, the users were on average smiling more at the reactive version of the video playback than at the non-reactive version.

This test cannot prove that the users liked the reactive method more than the non-reactive one, just because the users smiled more at one kind of playback than the other. However, it is an indication that the users actually did smile more when they were shown the type that they initially preferred (at least by the program's definition). With the information gained from this test, there is an indication that the reactive version of the product is actually more entertaining than just watching the movie clips alone. However, this is based on the assumption that the users are always smiling for the longest period at the movie they thought was the funniest. This is, as already described in this chapter, not always the case, since the users in half the cases chose another movie clip as the funniest than the one the program had detected. One movie clip might seem very funny to the user for a short period of time, while another movie clip might seem less funny, but for a longer period of time. In this case, the program would detect that the user liked the second movie clip better, simply because the user smiled for a longer period in this movie clip, but in reality the user might think the movie with a short, but very funny situation was the funniest. A more accurate test result would require that the program could also detect the degree of smiling. There were also indications that the users had problems connecting the humor type from the initial movie clips to the following movie clips.
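One way to take the degree of smiling into account would be to weight each frame's contribution by an intensity value instead of counting time alone. The sketch below is purely illustrative; the intensity measure (assumed here to be a normalized value between 0 and 1) does not exist in the implemented detector, which only reports a binary smiling/not smiling state.

// Sketch: intensity-weighted smile score as an alternative to plain smile duration.
// 'intensity' is an assumed per-frame measure in [0,1]; the implemented program
// effectively only has the binary case (0 or 1).
#include <vector>

struct FrameSample {
    int clip;          // which of the three clips (0-2) was playing
    double elapsedMs;  // time covered by this frame
    double intensity;  // assumed smile intensity, 0.0-1.0
};

std::vector<double> scoreClips(const std::vector<FrameSample>& samples)
{
    std::vector<double> score(3, 0.0);
    for (size_t i = 0; i < samples.size(); ++i)
    {
        const FrameSample& s = samples[i];
        score[s.clip] += s.elapsedMs * s.intensity;   // short but intense smiles now count more
    }
    return score;
}

With such a score, a short but strong laugh could outweigh a long, faint smile, which is exactly the behaviour the duration-only measurement cannot capture.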
Regarding the users' difficulties in connecting the humor types, only animated video clips were available for the three initial movies, so the significant change in look of the movie clips (full animation vs. animatics consisting of drafty storyboards) might have confused the users so much that they were not able to recognize the change in humor type. Furthermore, the different humor types might have been too close to each other to clearly identify one type from another.

The data produced by the program during the test also made it possible to establish at approximately what times the users smiled at each movie clip. There was some delay between the program and the video player showing the videos to the user, but an exact duration of this delay could not be established, and since the program returned times of smiling in milliseconds after program start, the values produced by the program might not correspond exactly to the correct times during the movie clips. It will however give an indication as to whether or not there were humoristic peaks in the movie clips that were coherent with what was expected to be funny. What should ideally be seen in the graphs is that few or no people are smiling during the sequences of the movie clips that were not designed to be funny, and that a high number of people are smiling when the scenes go into a funny sequence. What can be seen in Illustration 6.9 is how the users reacted to the first movie clip, with the falling apart type of humor. The graph is made by rounding all data about smiles to whole seconds and then plotting how many users were smiling at each individual second during the movie clip. Note that the data is gained only from people testing the reactive version.

[Line graph: Smile Development, falling apart; number of users smiling per second over roughly 33 seconds]
Illustration 6.9: At certain times during the movie clip with the falling apart type of humor, there were peaks where more than half of the users were smiling at the same time.

Comparing the data in Illustration 6.9 with the actual video and what was supposed to be funny, there are similarities. This was the first video to be played in the test, so this was the very first time any of the users saw the main character. The video started with a short period of only music playing, so the first peak (the green dot) is actually where the users first see the character. Even though this was not supposed to be a funny element of the movie clip itself, the character was designed to be perceived as likeable, so the reaction of the users smiling is an indication that they perceived the character as intended. The next peak (the yellow dot) is around the time where the character finds the mysterious music box, lifts it and shakes it, but the funniest element of this movie was designed to be the part where the boxing glove knocks the head of the character. In this period, however, there were actually fewer users smiling, which might have been due to the implementation of the movie. The boxing glove appears for less than one second in the movie, with no previous introduction, and the users might have had a hard time perceiving what actually happened to the character in this short period. In the end, the curve peaks again (the red dots), which is approximately around the time where the character turns to walk out of the scene, his head on backwards, and crashes into something off-screen.
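The per-second counts behind these smile development graphs can be produced by a simple binning of the logged smile times. A minimal sketch follows; the assumed input is, for each user, the set of whole seconds at which that user was registered as smiling, which is a simplification of the actual log format.

// Sketch: count how many users were smiling at each whole second of a clip.
// Each user's data is assumed to be the set of seconds during which a smile
// was registered for that user.
#include <set>
#include <vector>

std::vector<int> smilesPerSecond(const std::vector<std::set<int> >& userSmileSeconds,
                                 int clipLengthSeconds)
{
    std::vector<int> counts(clipLengthSeconds, 0);
    for (size_t u = 0; u < userSmileSeconds.size(); ++u)
        for (int sec = 0; sec < clipLengthSeconds; ++sec)
            if (userSmileSeconds[u].count(sec) > 0)   // user u was smiling during this second
                ++counts[sec];
    return counts;
}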
This crash was a final twist to the movie clip, and the users seemed to have found the surprising end funny, or at least to have liked it.

The other movie clip that, according to the program, was the funniest to watch was the clip with the ball play type of humor. In this movie clip, the humoristic peak was designed to be around the point where the character goes into slow motion while dodging the small balls coming towards him. This actually did seem to make the users smile, and again taking into consideration the delay of the time values, the highest peak in Illustration 6.10 (the yellow dots) matches the period of time in the video where the character starts to go into slow motion and the camera starts moving around him. However, it seems the users got bored of the effect quite quickly, or that they did not understand what was happening when all the balls came flying back, since there was a strong decrease in the amount of smiles in the last part of the slow-motion sequence.

[Line graph: Smile Development, ball play; number of users smiling per second over roughly 43 seconds]
Illustration 6.10: The clip with the ball play type of humor had a high peak around the time of the slow-motion sequence.

There was a small peak towards the end of the timeline (the red dots), when the character thinks he has dodged all the balls, but then a single, delayed ball comes flying towards the character and knocks him over. This was the "joke" that this movie clip was based around, and even though there was a rise in the number of people smiling, more people enjoyed the slow-motion sequence than this last humoristic development. For reference, the development of smiles during the third movie clip is shown in Illustration 6.11, but there were no real indications of what was preferred and what was not during this movie clip. The actual fun part of the movie clip (the character dropping to the ground and having a piano drop on top of him) made fewer people smile than the introduction and the jumping on the trampoline. However, this clip was also the clip that made the fewest people smile in total, and therefore there was not as much data to work with when making this graph.

[Line graph: Smile Development, black humor; number of users smiling per second over roughly 35 seconds]
Illustration 6.11: The clips of the black humor showed no real indications as to what was preferred or not by the users and what made them smile the most. There are no real peaks in the graph.

Even though the data about when the users smiled gave indications, it is difficult to prove that the users did smile at the actual sequences they were planned to smile at. The peaks on the graphs were not clearly separated from the theoretically non-humoristic parts, where the graph should have been at a lower level. Furthermore, at no point did more than eight of 15 users smile at the same moment in time, and even if there were delays in the data produced, the peaks should still be clearly separated, even if they were shifted a little in either direction on the timeline.

6.3.3 Extensive testing
For future development and testing, a more extensive test could be conducted, compared to the final test conducted in this project.
Even if the method used for the final test in this project was performed without major flaws, there are more things to be tested and taken into consideration for a future test. Even though the final test was conducted on 30 persons, there was a chance that the users tested on the reactive version were simply more expressive when seeing something they found amusing than the users testing the non-reactive version. For further development of the product, it would be prudent to introduce a focus group in the development stage. By testing the product at different stages of the development, it would be possible to map the users of the focus group to the different types of humor and to how expressive each of the users is when watching the movie clips. With this information, it would be possible to do a more balanced test to compare a non-reactive and a reactive movie playback. The final test in this project placed the users in an unfamiliar situation, because they were placed together with two test observers that they did not know beforehand. Had the users been placed in an environment they were familiar with and around people they knew, it might have been easier to influence them and make them smile. If a focus group were used, the users would have a chance to become familiar with this situation and feel more relaxed in the test situation.

6.4 Conclusion of the test
The cross validation proved that both the required and the desired requirements were fulfilled. With the method chosen for the smile detection, an average success rate of 89.92% was found at the optimal threshold setting, which proved that the technical aspect of the smile detection was useful for the final test involving users. The initial test involving users proved that the test setup and method were useful for further testing. Minor adjustments had to be made to the detection program before the final test, in order to prevent invalid test results. The final test showed that the smile detection program detected the type of humor that the users actually thought was the funniest in half the cases. This result might be due to the structure of the movie clips and the users' personal preferences, since the success rate of the smile detection was already proven to be sufficient. The individual movie clips would have to be funny for an equally long period of time in order for the program to make a fair comparison between the different clips. The final thing that was investigated in the analysis of the final test was the coherence between the times at which the users smiled at the animated movie clips and what was expected of the movie clips when they were implemented. For the falling apart type and the ball play type of humor, there were indications that, in general, more people smiled during the periods of the movie clips that were designed to be the funniest. However, there was no clear proof in the results, and for the black humor it was impossible to even get usable indications, partly due to lack of data, since the users were not smiling as much during this clip compared to the two others. What the testing of this product indicated is that it is indeed possible to make an adaptive program, the smile detector, which controls the type of humor in a series of movie clips according to a user's reaction to previously played clips.
However, the movie clips would have to be highly adjusted for this purpose, or the program would have to be able to detect smiles in a different way than just by the amount of time the user was smiling. The program did react to user input, but the general interpretation of the data obtained from the users would have to be altered to make the program choose the correct type of humor in every case.

7. Discussion
There are a number of things that could have been improved about the product of this project, in order to prove the final problem formulation even more extensively. First of all, it would be prudent to have more animated clips and make them seem connected through one long story. The test showed that users had problems connecting the type of humor in the animatics with the type of humor in the animated movie clips. It would first of all be necessary to have everything fully animated and to make sure that every clip is connected through a narrative structure to both the previous and the next movie clip. Furthermore, it also proved hard to distinguish between the various types of humor. Without the use of dialogue, the types of humor available were limited to slapstick humor only. And even though the movie clips are variations of slapstick humor, it can be difficult to immediately distinguish between the three humor types. All three animations involve the character getting injured in various ways, and even though these ways are what actually distinguish the humor types from each other, the viewer might simply view them as three instances of the character getting hurt. Therefore, adding different types of dialogue-based humor, making a clear difference between speaking and non-speaking movie clips, could greatly help to distinguish between the humor types. In order to enable the program to read more subtle smiles, it would be sensible to include the ability to detect more than just a smile. It was established during the test that users did not necessarily like the movie clip they smiled at the most. This was due to the fact that a user might like one short joke more than a long joke or a lot of small jokes. As the program could only detect how long the users were smiling, it would determine the clip that the user smiled at the longest as being the one that the user thought was the funniest. In order to take care of this problem, the program would have to be adjusted to be able to read the degree of a smile, to determine what the user actually thought was the funniest. The last thing to improve is the amount of training data for the program. With more training data, the detection would be even more accurate and would also to a higher degree be able to read people with e.g. a beard. After this discussion of what can be improved about the product of this project, a conclusion which gives an overview of the entire project is presented.

8. Conclusion
By reading the preface along with this conclusion, the reader should be able to get an overview of the project; a short summary of the contents and purpose of each chapter in the report is given in this conclusion. The motivation of the entire project was to challenge the nature of communication between a movie and a viewer. Traditionally, this communication is entirely one-way, in that only the movie can influence the viewer.
This has made watching a movie a rather passive experience, and the effects a movie can have on a viewer never go beyond reaching the viewer. This project aimed at seizing these effects experienced by the viewer and utilizing them with the goal of changing the movie itself. This should make watching a movie a much more personal and specific experience, based on the mood of the viewer when watching it. In order to formulate a specific problem from this motivation, the problem inherent in the motivation was researched, investigating the other work done in this area of having a viewer's mood influence the movie being watched. Different attempts at creating interactive movies were examined, along with ways of detecting a person's facial expression, since doing so was decided upon as the method of registering a viewer's mood. From this it was concluded that no previous work had been done to achieve a solution which could satisfy the motivation. Thus an initial problem statement for the project could be formed. This was the first step in specifying the motivation into a problem to work with in the rest of the project. In order to narrow down this statement into a concrete final problem statement, it was analyzed in its various areas. Firstly, the specific mood to focus on in the project was determined to be "happy". The next step was researching storytelling in a movie in order to understand how to evoke a specific mood in a viewer. On a more technical side, it was examined whether to use real-time or pre-rendering to obtain the best way of presenting the movie to the viewer, and pre-rendering was found to be the optimal choice. A target group for the product was determined as primarily being people above the age of 18. And finally it was chosen to test the product through cross validation and the DECIDE framework. This preliminary analysis enabled the group to decide upon a specific area of interest, from which the final problem formulation was derived. At this point, the motivation had been specified into a precise problem, upon which the remainder of the project work was based.

As with the initial problem statement, this final statement was broken down into the areas that needed to be researched in order for the final statement to be analyzed and fulfilled. This research was divided into two main areas. Firstly, relevant research was done to gain sufficient knowledge about how to use the medium of a movie, such as the deliberate use of sound or the camera, to mediate an effect to the viewer. Secondly, research concerning the creation of a program to capture the facial expression of the viewer, such as which programming language to use and which method of detecting a smile to use, was done. The analysis of the final problem formulation became the foundation of the design of the product, a product that aims at fulfilling the motivation and solving the final problem formulation. The project character was designed taking inspiration from e.g. Warner Brothers and Walt Disney. Next, the different types of humor to be used in the movie clips were chosen as variants of slapstick humor. Nine drafty storyboards were created, involving the character being exposed to each humor type, with three storyboards per type, using knowledge gained from analyzing the medium of movies, such as the use of acting.
In order to bring the character to life, several animation techniques, such as Disney's 12 Principles, were examined and chosen to be used in correlation with the drafty storyboards. In the last part of the design phase, it was chosen how to implement the smile detection, using Haar cascades for face detection and a simpler method for detecting smiles. The decisions made in this design chapter served as a template for the realization of the actual product of the project. In the implementation of the movies, the techniques used for modeling the character, such as box modeling, were explained. Next, the process of rigging the character, thereby preparing him for animation, was explained. Lastly, the techniques used for animating the character, e.g. using Key Frames, were explained. During the implementation of the program, an application to obtain training data for smile detection, a program for training the smile detection and the actual smile detection program were developed. Each of the programs was developed in C++, using the OpenCV library for the image processing. This implementation resulted in the creation of a product to be tested in order to fulfill the motivation and prove the final problem statement.

With both program and movie clips implemented, the combined product was tested. A cross validation of the smile detection proved that the success rate of the program fulfilled the requirements set, with an average detection rate of more than 89%. The user tests proved that the program was able to detect the users smiling, but only in half the cases was the detection of the program coherent with the users' own opinion. The test indicated that the users were smiling more when they watched the reactive version of the program, which chose clips of their preferred type of humor, than if they watched a program that did not take the smiling into consideration. This test indicated that the final problem formulation was indeed solved by this project and that the product is in fact one possible solution to creating an adaptive program reacting to the smile of a user.

9. Future perspective
When looking at the final product of the project, the initial idea of making the user passively control the narrative has been delimited into selecting small narratives with the same starting point. A natural successor of the product could be one long-lasting, continuous narrative, which reacts to the user's facial expressions. By having only one long story and not a lot of different short stories, the user experience would be less sporadic due to the continuous story, but also because of the non-interrupting nature of the passive interaction form. During the development of the product, it has become clear to the group that the area of animation and interactive controls has many facets. The passive interaction form suggested by this report provides possibilities for further exploration. The passive control of a medium, through facial expressions and moods, offers a wide variety of usages, not only within the area of the implementation used in this project, but also within many daily routines. The adjustment of a product's functionality according to its user's mood can be applied in many home appliances, such as lamps, windows, beds etc. Imagine the color of a lamp changing according to the user's mood.
In connection with a PC, the smile detection specifically can be used in connection with many online applications, communities etc. One example could be the video portal YouTube, which already has the functionality of recommending video clips to the users, according to what the individual user has previously watched. This could be further developed by implementing smile detection on the site and make YouTube detect whether or not the user actually liked the clips he was shown. The technology could also be applied in connection to music, where an online radio station could create a personalized playlist, based on the users’ reaction to different types of music. Reading all kind of reactions would also require the technology to be able to detect other facial characteristics than just a smile, e.g. eye movement, blushing or other facial expression, such as frown. This shows that, even outside of the main concept of altering the narrative through smile detection, there are a large number of possibilities to enhance every day entertainment and appliances using passive interaction. 175 Group 08ml582 Reactive movie playback 10. Bibliography 10. Bibliography Adamson, A., & Jenson, V. (Directors). (2001). Shrek [Motion Picture]. American Association for the Advancement of Science. (2004, 12 16). Finding fear in the whites of the eyes. Retrieved 10 02, 2008, from American Association for the Advancement of Science - EurekAlert: http://www.eurekalert.org/features/kids/2004-12/aaft-ffi020805.php American Heritage® Dictionary. (2008). The American Heritage® Dictionary of the English Language: Fourth Edition. Houghton Mifflin Company. Animation Mentor. (2008). Requirements to Attend Animation Mentor. Retrieved October 6, 2008, from Animationmenter.com: http://www.animationmentor.com/school/requirements.html AnimationMentor.com (Director). (2008). Student Showcase Summer 2008 [Motion Picture]. Aristotle. (350 B.C.). Poetics. Avery, T. (Director). (1940). A Wild Hare [Motion Picture]. Avery, T. (1944). IMDB. Retrieved from The Internet Movie Database: http://www.imdb.com/title/tt0037251/ Avery, T. (Director). (1937). Porky's Duck Hunt [Motion Picture]. Bio-Medicine.org. (2007, 11 11). Nervous system. Retrieved 10 02, 2008, from Bio-medicine.org: http://www.bio- medicine.org/biology-definition/Nervous_system/) Biordi, B. (2008). Retrieved December 6, 2008, from Dragon's Lair Fans: http://www.dragonslairfans.com/ Bioware. (2007). Mass Effect Community. Retrieved 10 15, 2008, from Bioware: http://masseffect.bioware.com/ Bluth, D. (2004). The Inbetweener. Retrieved 11 23, 2008, from Don Bluth's Animaiton Academy: http://www.donbluth.com/inbetweener.html Bordwell, D., & Thompson, K. (2008). Film Art. New York: McGraw-Hill. Bradski, G., & Kaelhler, A. (2008). Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc.: 1st edition. Bros., W. (1930-1969). Looney Tunes. Retrieved 12 12, 2008, from Looney Tunes: http://looneytunes.kidswb.com/ Cincerova, A. (2007, June 14). Groundbreaking Czechoslovak interactive film system revived 40 years later. (I. Willoughby, Interviewer) 176 Group 08ml582 Reactive movie playback 10. Bibliography Clair, R. (1931). Le Million. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0022150/ Clark, B. H. (1918). European Theories of the Drama: An Anthology of Dramatic Theory and Criticism from Aristotle to the Present Day. Stewart & Kidd. Clements, R., & Musker, J. (Directors). (1992). Aladdin [Motion Picture]. Conrad, T., & Kroopnick, S. (2001). 
The scariest place on Earth. Retrieved 12 08, 2008, from http://www.imdb.com/title/tt0280312/ Crane, D., & Kaufman, M. (1994). Friends. Retrieved 12 08, 2008, from http://www.imdb.com/title/tt0108778/ Croshaw, B. ". (2008, 11 06). Zero Punctuation. Retrieved 11 06, 2008, from the escapist: http://www.escapistmagazine.com/videos/view/zero-punctuation Dale, A. S. (2000). Comedy is a Man in Trouble: Slapstick in American Movies. U of Minnesota Press. Drawingcoach. (2008). Retrieved 12 10, 2008, from www.drawingcoach.com: http://images.google.dk/imgres?imgurl=http://www.drawingcoach.com/image- files/cartoon_eyes_female.gif&imgrefurl=http://www.drawingcoach.com/cartoon- eyes.html&h=600&w=135&sz=11&hl=da&start=67&sig2=YBdhnc6G5qDk9ofV9aOiHA&um=1&usg=__w65giv6Q dud78QIdvtzZT5-lq Experimentarium. (2008). Statistik. Retrieved October 3, 2008, from Experimentarium: http://www.experimentarium.dk/presse_corporate/tal_fakta/statistik/ Expo 67. (2007). Montreal Universal and international exhibition 1967. Retrieved December 6, 2008, from expo67: http://expo67.morenciel.com/an_expo67/ Fincher, D. (1995). Se7en. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0114369/ Fincher, D. (1997). The Game. Retrieved 12 12, 2008, from http://www.imdb.com/title/tt0119174/: http://www.imdb.com/title/tt0119174/ Fogg, A. (1996, 02 18). Monty Pythons's completely useless website. Retrieved December 16, 2008, from Monty Pythons's completely useless website: http://www.intriguing.com/mp/ Freytag, G. (1876). Die Technik des Dramas. Leipzig, S. Hirzel. Furniss, M. (1998). Art in motion: Animation aesthetics. Sydney: John Libbey. Geronimi, C. (Director). (1959). Sleeping Beauty [Motion Picture]. Gilliam, T., & Jones, T. (1975). Monty Python and the Holy Grail. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0071853/ 177 Group 08ml582 Reactive movie playback 10. Bibliography Hager, J. C. (2003, 01 17). Emotion and Facial Expression. Retrieved 10 02, 2008, from A Human Face: http://www.face-and-emotion.com/dataface/emotion/expression.jsp Hand, D. (Director). (1942). Bambi [Motion Picture]. Hand, D. (Director). (1937). Snow White and the Seven Dwarves [Motion Picture]. Hitchcock. (1960). Psycho. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0054215/ IMDB. (1971). And Now for Something Completely Different. Retrieved December 16, 20008, from IMDB.com: http://www.imdb.com/title/tt0066765/ IMDB.com. (2008). David Sonnenschein. Retrieved 12 11, 2008, from IMDB.com: http://www.imdb.com/name/nm0814408/ IMDB.com. (2008). Ken Harris. Retrieved 12 06, 2008, from IMDB.com: http://www.us.imdb.com/name/nm0364938/ IMDB.com. (2008). Ollie Johnston. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/name/nm0426508/ IMDB.com. (2008). Porky's Duck Hunt. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0029426/ IMDB.com. (2008). Richard Williams. Retrieved 12 06, 2008, from IMDB.com: http://www.imdb.com/name/nm0931530/ Johnston, O., & Thomas, F. (1997). The Illusion of Life. Hyperion; 1st Hyperion Ed edition. Jones, C., & Monroe, P. (1980). Soup of Sonic. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0081540/ Kerlow, I. V. (2004). Applying the Twelve Principles to 3D Computer Animation. In I. V. Kerlow, The Art of 3D Computer Animation and Effects (pp. 278-283). Hoboken, New Jersey: John Wiley & Sons, Inc. Kinoautomat. (2007). About Kinoautomat. Retrieved December 6, 2008, from Kinoautomat: http://www.kinoautomat.cz/index.php?intro=ok Konami. 
(1998). Metal Gear Series. Retrieved 10 15, 2008, from Kojima Productions: http://www.konami.jp/kojima_pro/english/index.html Kubrick, S. (1980). The Shining. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0081505/ Langford, J. (Director). (2008). Tales of a Third Grade Nothing [Motion Picture]. 178 Group 08ml582 Reactive movie playback 10. Bibliography Larry, D., & Seinfeld, J. (1990). Seinfeld. Retrieved 12 08, 2008, from http://www.imdb.com/title/tt0098904/ Luske, H., & Sharpsteen, B. (Directors). (1940). Pinocchio [Motion Picture]. Microsoft. (2008, August). Download details: DirectX End-User Runtime. Retrieved October 2, 2008, from Microsoft Download Center: http://www.microsoft.com/downloads/details.aspx?familyid=2da43d38-db71- 4c1b-bc6a-9b6652cd92a3&displaylang=en#Requirements Milton, E. (Director). (2004). The Incredibles - Behind the Scenes [Motion Picture]. Moore, A. W. (2005, October 11). Cross-validation for detecting and preventing overfitting. Pittsburgh, USA. MyDNS.jp. (2008). Class: OpenCV::CvScalar. Retrieved November 24, 2008, from mydns.jp: http://doc.blueruby.mydns.jp/opencv/classes/OpenCV/CvScalar.html NationMaster. (2005). Encyclopedia > Interactive movie. Retrieved December 6, 2008, from NationMaster: http://www.nationmaster.com/encyclopedia/Interactive-movie Nonstick.com. (2003). www.nonstick.com. Retrieved 11 05, 2008, from Looney Tunes Character List: http://www.nonstick.com/wdocs/charac.html Ohrt, K., & Kjeldsen, K. P. (2008). Årsrapport 2007 for Statens Museum for Kunst. Copenhagen: Statens Museum for Kunst. OpenCV. (2008, November 19). Welcome. Retrieved November 23, 2008, from OpenCV Wiki: http://opencv.willowgarage.com/wiki/ OpenCVWiki. (n.d.). Face Detection using OpenCV. Retrieved November 19, 2008, from willowgarage.com: http://opencv.willowgarage.com/wiki/FaceDetection OpenGL.org. (2008). OpenGL Platform & OS Implementations. Retrieved October 2, 2008, from OpenGL.org: http://www.opengl.org/documentation/implementations/ OpenGL.org. (2008). Using and Licensing OpenGL. Retrieved October 2, 2008, from OpenGL.org: http://www.opengl.org/about/licensing/ Owen, S. (1999, June 2). Perspective Viewing Projection. Retrieved October 11, 2008, from ACM SIGGRAPH: http://www.siggraph.org/education/materials/HyperGraph/viewing/view3d/perspect.htm Peters, S., & Pryce, C. (1991). Are you affraid of the dark? Retrieved 12 08, 2008, from http://www.imdb.com/title/tt0103352/ Pisarevsky, V. (2004, June 10). intel.com. Retrieved November 3, 2008, from OpenCV Object Detection: Theory and Practice: http://fsa.ia.ac.cn/files/OpenCV_FaceDetection_June10.pdf 179 Group 08ml582 Reactive movie playback 10. Bibliography Pixar Animation Studios. (2008). Pixar Animation Studios. Retrieved 12 12, 2008, from Pixar.com: http://www.pixar.com/companyinfo/history/1984.html Price, W. T. (1892). The Technique of the Drama. New York: Brentano's. Procedural Arts. (2006). Façade Vision and Motivation. Retrieved September 14, 2008, from InteractiveStory.net: http://www.interactivestory.net/vision/ Redifer, J. (2008). Interfilm. Retrieved September 14, 2008, from Joe Redifer.com: http://www.joeredifer.com/site/interfilm/interfilm.html Rosenthal, P. (1996). Everybody loves Raymond. Retrieved 12 08, 2008, from http://www.imdb.com/title/tt0115167/ Roth, E. (2005). Hostel. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0450278/ Scott, R. (2000). Gladiator. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0172495/ Scratchapixel. (2008, July 10). 
Lesson 1: How does it work? Retrieved September 28, 2008, from www.scratchapixel.com: http://www.scratchapixel.com/joomla/lang-en/basic-lessons/lesson1.html Seldess, Z. (n.d.). Retrieved November 3, 2008, from Zachary Seldess: http://www.zacharyseldess.com/sampleVids/Seldess_faceTrackGLnav.mov Seo, N. (2008, October 16). Tutorial: OpenCV haartraining. Retrieved December 15, 2008, from Naotoshi Seo: http://note.sonots.com/SciSoftware/haartraining.html#j1d5e509 Shadyac, T. (1994). Ace Ventura: Pet Detective. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0109040/ Sharp, H., Rogers, Y., & Preece, J. (2006). Interaction Design. West Sussex: John Wiley & Sons Ltd. Singer, B. (1994). The Usual Suspects. Retrieved 12 12, 2008, from IMDB.com: http://www.imdb.com/title/tt0114814/ Skelley, J. P. (2005). Experiments in Expression Recogntion. Massachusetts: M.I.T - Department of Electrical Engineering and Computer Science. Sony Corp. (2007). Cyber-Shot handbook - Sony DSC-T300. Sony New Zealand Limited. (2007, August 24). Scoop Independent News - Sci Tech. Retrieved November 12, 2008, from Cyber-shot introduces smile detection: http://www.scoop.co.nz/stories/SC0708/S00064.htm Spencer, S. (1993). Radiosity overview, part 1. Retrieved 12 12, 2008, from SIGGRAPH.org: http://www.siggraph.org/education/materials/HyperGraph/radiosity/overview_1.htm 180 Group 08ml582 Reactive movie playback 10. Bibliography Stern, A., & Mateas, M. (2006, July 14). Behind Façade: An Interview with Andrew Stern and Michael Mateas. (B. B. Harger, Interviewer) The Onion. (2007, May). Hallmark Scientists Identify 3 New Human Emotions. Retrieved 10 02, 2008, from The Onion: http://www.theonion.com/content/news/hallmark_scientists_identify_3_new Thomas, F., & Johnston, O. (2002). Our Work. Retrieved 12 12, 2008, from Frank & Ollie's official site: http://www.frankandollie.com/Film_Features.html Ubisoft. (1995). Rayman Zone. Retrieved 12 12, 2008, from www.ubisoft.com: http://raymanzone.uk.ubi.com/ von Riedemann, D. (2008, June 02). Walt Disney's Nine Old Men. Retrieved December 12, 2008, from suite101.com: http://vintage-animated-films.suite101.com/article.cfm/walt_disneys_nine_old_men Wedge, C. (Director). (2002). Ice Age [Motion Picture]. Wells, P. (1998). Understanding animation. Routledge. Williams, R. (2001). The Animator's Survival Kit. New York: Faber and Faber Inc. 181 Group 08ml582 Reactive movie playback 11. Illustration List 11. Illustration List Illustration 1.1: Retrieved: http://news.filefront.com/wp-content/uploads/2008/04/mass-effect11.jpg Illustration 2.1: Retrieved: http://img2.timeinc.net/ew/dynamic/imgs/071029/horrormovies/psycho_l.jpg Illustration 2.2: Retrieved: http://www.myaffiliatetips.com/Images/amazed-affiliate-woman.jpg Illustration 2.3: Own creation: Heino Jørgensen Illustration 2.4: Retrieved: http://pages.cpsc.ucalgary.ca/~apu/raytrace1.jpg Illustration 3.1 - Illustration 3.3: Own creation: Heino Jørgensen Illustration 3.4: Retrieved: (Bordwell & Thompson, 2008, s. 142) Illustration 3.5: Retrieved: (Bordwell & Thompson, 2008, s. 143) Illustration 3.6: Retrieved: (Bordwell & Thompson, 2008, s. 143) Illustration 3.7: Retrieved: (Bordwell & Thompson, 2008, s. 143) Illustration 3.8 - Illustration 3.10: Retrieved: (Bordwell & Thompson, 2008, s. 190) Illustration 3.11 - Illustration 3.17: Retrieved: (Bordwell & Thompson, 2008, s. 
191) Illustration 3.18: Retrieved: http://i3.photobucket.com/albums/y90/pinkfloyd1973/myspace/jiminycricket.png Illustration 3.19: Retrieved: http://www.coverbrowser.com/image/donald-duck/31-1.jpg and http://www.coverbrowser.com/image/donald-duck/47-1.jpg Illustration 3.20: Retrieved: http://classiccartoons.blogspot.com/2007/05/screwball-squirrel.html Illustration 3.21: Retrieved: http://justyouraveragejoggler.files.wordpress.com/2006/11/111806-roadrunner.jpg Illustration 3.22: Own Creation: Mikkel Berentsen Jensen Illustration 3.23: Retrieved: http://www.zacharyseldess.com/sampleVids/Seldess_faceTrackGLnav.mov Illustration 3.24: Retrieved: http://fsa.ia.ac.cn/files/OpenCV_FaceDetection_June10.pdf Illustration 3.25: Own creation: Heino Jørgensen Illustration 3.26: Own Creation: Mikkel Berentsen Jensen Illustration 4.1: Retrieved: http://realitymarbles.files.wordpress.com/2007/08/bugsbunny.png and http://www.funbumperstickers.com/images/Daffy_Duck_1.gif Illustration 4.2: Retrieved: http://animationarchive.net/Feature%20Films/Aladdin/Model%20Sheets/AladdinModelSheet1.jpg Illustration 4.3: Retrieved: http://www.acmeanimation.com/FAIRIES.JPG and http://www.quizilla.com/user_images/N/Nightshadow/1034485810_EWorkPicsMalifecant.JPG 182 Group 08ml582 Reactive movie playback 11. Illustration List Illustration 4.4: Retrieved: http://disneyheaven.com/images/DisneyStorybook/Aladdin/Sultan.gif and http://arkansastonight.com/uploaded_images/jafar-719207.jpg Illustration 4.5: Own Creation: Kim Etzerodt Illustration 4.6: Retrieved: http://pressthebuttons.typepad.com/photos/uncategorized/rayman.jpg Illustration 4.7: http://img512.imageshack.us/img512/373/yathzeewallbx1.jpg Illustration 4.8: Own Creation: Kim Etzerodt Illustration 4.9 - Illustration 4.12: Own creation: Kim Etzerodt Illustration 4.13: Own creation: Kim Etzerodt, Mikkel Lykkegaard Jensen and Sune Bagge Illustration 4.14: Own creation: Mikkel Lykkegaard Jensen. Illustration 4.15: Own creation: Kim Etzerodt. Illustration 4.16: Own creation: Mikkel Lykkegaard Jensen and Sune Bagge. Illustration 4.17: Own creation:: Kim Etzerodt and Sune Bagge. Illustration 4.18 - Illustration 4.20: Own creation: Kim Etzerodt. Illustration 4.21: Retrieved: http://www.coe.tamu.edu/~lcifuent/edtc656/Unit_08/reading_files/image001.gif Illustration 4.22 - Illustration 4.29: Own creation: Kim Etzerodt. Illustration 4.30: (Williams, 2001, s. 107) Illustration 4.31: (Williams, 2001, s. 107) Illustration 4.32: (Williams, 2001, s. 107) Illustration 4.33: (Williams, 2001, s. 112-113) Illustration 4.34: (Williams, 2001, s. 107) Illustration 4.35: (Williams, 2001, s. 108) Illustration 4.36: (Williams, 2001, s. 136) Illustration 4.37: (Williams, 2001, s. 148) Illustration 4.38: Own creation: Kim Etzerodt. Illustration 4.39: (Williams, 2001, s. 163) Illustration 4.40: Own creation: Kim Etzerodt. Illustration 4.41: Own creation: Kim Etzerodt. Illustration 4.42: Own creation: Heino Jørgensen. Illustration 4.43: Own creation: Heino Jørgensen. 183 Group 08ml582 Reactive movie playback Illustration 4.44: Own creation: Heino Jørgensen. Illustration 5.1 - Illustration 5.17: Own creation: Kim Etzerodt. Illustration 5.18 - Illustration 5.21: Own creation: Mikkel Lykkegaard Jensen. 
Illustration 5.22: Own creation: Kim Etzerodt Illustration 5.23: Own creation: Kim Etzerodt Illustration 5.24: Own creation: Mikkel Lykkegaard Jensen Illustration 5.25: Own creation: Mikkel Lykkegaard Jensen Illustration 5.26: Own creation: Mikkel Berentsen Jensen Illustration 5.27: Retrieved: http://upload.wikimedia.org/wikipedia/commons/7/71/Histogrammspreizung.png Illustration 5.28 - Illustration 6.8: Own creation: Heino Jørgensen 184 11. Illustration List Group 08ml582 Reactive movie playback 12. Appendix 12. Appendix 12.1 Storyboards Music Box (Humor Type ”Falling apart”– Story board for 3D Animation) 185 Group 08ml582 Reactive movie playback 12. Appendix Cheated 1 of 2 (Humor Type "Falling apart") 186 Group 08ml582 Reactive movie playback 12. Appendix Cheated 2 of 2 (Humor Type "Falling apart") 187 Group 08ml582 Reactive movie playback 12. Appendix Open Door 1 of 3 (Humor Type "Falling apart") 188 Group 08ml582 Reactive movie playback 12. Appendix Open Door 2 of 3 (Humor Type "Falling apart") 189 Group 08ml582 Reactive movie playback 12. Appendix Open Door 3 of 3 (Humor Type "Falling apart") 190 Group 08ml582 Reactive movie playback 12. Appendix Trampoline (Humor Type "Black humor"– Story board for 3D Animation) 191 Group 08ml582 Reactive movie playback 12. Appendix Stickman (Humor Type "Black humor") 192 Group 08ml582 Reactive movie playback 12. Appendix Kick Box (Humor Type "Black humor") 193 Group 08ml582 Reactive movie playback 12. Appendix Bullet Time (Humor Type "Ball play" – Story board for 3D Animation) 194 Group 08ml582 Reactive movie playback 12. Appendix Lethal Ball (Humor Type "Ball play") 195 Group 08ml582 Reactive movie playback 12. Appendix Hard Ball (Humor Type "Ball play") 196 Group 08ml582 Reactive movie playback 12. Appendix 12.2 Implementation Code Smile Picture Capture Code #define VERSION "Smile Picture Capture v. 0.004 Alpha" #include "cv.h" #include "highgui.h" #include <stdio.h> #include <stdlib.h> #include <string.h> #include <assert.h> #include <math.h> #include <float.h> #include <limits.h> #include <time.h> #include <ctype.h> #include <iostream> #include <fstream> using namespace std; static CvMemStorage* storage = 0; static CvHaarClassifierCascade* cascade = 0; IplImage* mouthDetection( IplImage* image ); IplImage* resizeMouth(IplImage* src); const char* cascade_name = "haarcascade_frontalface_alt2.xml"; // Choice of haarcascade (here frontal face) int main( int argc, char** argv ) { char smileFile[] = "SmilePictureX.jpg"; // Filename template for smile pictures char neutralFile[] = "NeutrPictureX.jpg";// Filename template for neutral pictures int smilePicCount = 65; // Changing Ascii value which are replacing the "X" in the filenames above int neutralPicCount = 65; // Changing Ascii value which are replacing the "X" in the filenames above // Ascii 65 = A CvCapture* capture = 0; IplImage *frame, *frame_copy = 0; const char* input_name; input_name = argc > 1 ? argv[1] : 0; cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 ); if( !cascade ) // If haar cascade is not found { fprintf( stderr, "ERROR: Could not load classifier cascade\n" ); system("pause"); return -1; } storage = cvCreateMemStorage(0); if( !input_name || (isdigit(input_name[0]) && input_name[1] == '\0') ) // If camera is found capture feed capture = cvCaptureFromCAM( !input_name ? 0 : input_name[0] - '0' ); else // Else capture from movie (film.avi) capture = cvCaptureFromAVI( "film.avi" ); 197 Group 08ml582 Reactive movie playback 12. 
Appendix
cvNamedWindow( "Mouth", 1 ); // Open window
cvNamedWindow( "Original Picture", 1 ); // Open window
#ifdef WIN32 // This is only for Windows 32 bit
system("mkdir C:\\Pictures\\RunXXX\\"); // Tells the system to create the C:\Pictures\RunXXX directory
#endif
fstream file_smile("C:\\Pictures\\RunXXX\\SmilePics.txt",ios::out); // Opens a txt file for write access
fstream file_neutr("C:\\Pictures\\RunXXX\\NeutrPics.txt",ios::out); // Opens a txt file for write access
if( capture )
{
// While smiles or neutrals are still needed, keep running
for(;smilePicCount < 90 || neutralPicCount < 90;)
{
if( !cvGrabFrame( capture ))
break;
frame = cvRetrieveFrame( capture );
if( !frame )
break;
if( !frame_copy )
frame_copy = cvCreateImage( cvSize(frame->width,frame->height), IPL_DEPTH_8U, frame->nChannels );
if( frame->origin == IPL_ORIGIN_TL )
cvCopy( frame, frame_copy, 0 );
else
cvFlip( frame, frame_copy, 0 );
IplImage* mouthImage; // Creates a new picture for the image of the mouth
mouthImage = mouthDetection( frame_copy ); // Finds the mouth in the picture frame_copy
if (mouthImage->width != 0) // If the mouth image exists
{
mouthImage = resizeMouth(mouthImage); // Resize
cvShowImage( "Mouth", mouthImage ); // Show mouth
cvShowImage( "Original Picture", frame_copy ); // Show webcam feed
// Added this to save image:
int key=cvWaitKey(10);
if(key=='1') // If "1" is pressed
{
if(smilePicCount != 90) // If fewer than 25 smile pictures have been saved
{
smileFile[12] = smilePicCount; // Change the 12th character in the filename to the ASCII code from smilePicCount
std::cout << "Smile saved as: " << smileFile << std::endl;
cvSaveImage(smileFile,mouthImage); // Save mouth image
file_smile << "C:\\\\Pictures\\\\RunXXX\\\\Smiles\\\\" << smileFile << endl; // Add path to the SmilePics text file
smilePicCount++; // Increment of character
}
else
{ // Enough smiles
std::cout << "No more smiles needed! " << std::endl;
}
};
if(key=='2') // If "2" is pressed
{
if(neutralPicCount != 90) // If fewer than 25 neutral pictures have been saved
{
neutralFile[12] = neutralPicCount; // Change the 12th character in the filename to the ASCII code from neutralPicCount
std::cout << "Neutral saved as: " << neutralFile << std::endl;
cvSaveImage(neutralFile,mouthImage); // Save mouth image
file_neutr << "C:\\\\Pictures\\\\RunXXX\\\\Neutrals\\\\" << neutralFile << endl;
neutralPicCount++;
}
else
{
std::cout << "No more neutral expressions needed! " << std::endl;
}
};
}
else // If no mouth has been found
{
cvShowImage( "Mouth", frame_copy ); // Show webcam feed instead
cvShowImage( "Original Picture", frame_copy ); // Show webcam feed instead
}
cvReleaseImage( &mouthImage );
} // Finished gathering smiles and neutrals
#ifdef WIN32 // This is only for Windows 32 bit
system("xcopy *Smile*.jpg C:\\Pictures\\RunXXX\\Smiles\\"); // Copy all files with filename *Smile* to the specific path
system("xcopy *Neut*.jpg C:\\Pictures\\RunXXX\\Neutrals\\"); // Copy all files with filename *Neut* to the specific path
#endif
cvReleaseImage( &frame_copy );
cvReleaseCapture( &capture );
}
cvDestroyWindow(VERSION);
return 0;
}
IplImage* mouthDetection( IplImage* img ) // Mouth detection is identical with the mouthDetection function in the Smile Detection Program
{
IplImage* mouthPixels;
bool detectedFace = 0;
static CvScalar colors[] = { {{0,0,255}}, {{0,128,255}}, {{0,255,255}}, {{0,255,0}}, {{255,128,0}}, {{255,255,0}}, {{255,0,0}}, {{255,0,255}} };
double scale = 1.3;
IplImage* gray = cvCreateImage( cvSize(img->width,img->height), 8, 1 );
IplImage* small_img = cvCreateImage( cvSize( cvRound(img->width/scale), cvRound(img->height/scale)), 8, 1 );
int i;
cvCvtColor( img, gray, CV_BGR2GRAY );
cvResize( gray, small_img, CV_INTER_LINEAR );
cvEqualizeHist( small_img, small_img );
cvClearMemStorage( storage );
if( cascade )
{
double t = (double)cvGetTickCount();
CvSeq* faces = cvHaarDetectObjects( small_img, cascade, storage, 1.1, 2, 0/*CV_HAAR_DO_CANNY_PRUNING*/, cvSize(30, 30) );
t = (double)cvGetTickCount() - t;
for( i = 0; i < (faces ? faces->total : 0); i++ )
{
CvRect* r = (CvRect*)cvGetSeqElem( faces, i );
CvPoint facecenter;
int radius;
facecenter.x = cvRound((r->x + r->width*0.5)*scale);
facecenter.y = cvRound((r->y + r->height*0.5)*scale);
radius = cvRound((r->width + r->height)*0.25*scale);
// Mouth detection
CvPoint mouthUpLeft;
CvPoint mouthDownRight;
mouthUpLeft.x = facecenter.x - 0.5*radius;
mouthUpLeft.y = facecenter.y + 0.3*radius;
mouthDownRight.x = cvRound(mouthUpLeft.x + radius);
mouthDownRight.y = cvRound(mouthUpLeft.y + radius * 0.5);
detectedFace = true;
cvRectangle( img, mouthUpLeft, mouthDownRight, colors[3%8], 3, 8, 0); // Pixels we need for smile :D
int step = gray->widthStep/sizeof(uchar);
//std::cout << "Step: " << step << " Width: " << gray->width << std::endl;
uchar* data = (uchar *)gray->imageData;
mouthPixels = cvCreateImage( cvSize(mouthDownRight.y - mouthUpLeft.y, mouthDownRight.x - mouthUpLeft.x), IPL_DEPTH_8U, 1 );
mouthPixels->height = mouthDownRight.y - mouthUpLeft.y;
mouthPixels->width = mouthDownRight.x - mouthUpLeft.x;
mouthPixels->widthStep = mouthPixels->width/sizeof(uchar);
mouthPixels->nChannels = 1;
uchar* data2 = (uchar *)mouthPixels->imageData;
int data2Location = 0;
for(int a = mouthUpLeft.y; a < mouthDownRight.y; a++)
{
for(int b = mouthUpLeft.x; b < mouthDownRight.x; b++)
{
data2[data2Location] = data[b+a*step]; // Copy the mouth rectangle from the grayscale frame pixel by pixel
data2Location++;
}
}
}
}
if(detectedFace == true)
{
// cvShowImage( VERSION, mouthPixels );
cvReleaseImage( &gray );
cvReleaseImage( &small_img );
return mouthPixels; // If a mouth was found, return the image of the mouth region
}
else
{
cvShowImage( VERSION, gray);
gray->width = 0;
cvReleaseImage( &small_img );
return gray; // If no mouth was found, return the gray image with width 0 as a marker
}
};
IplImage* resizeMouth(IplImage* src)
{
IplImage* resizedMouth = cvCreateImage(cvSize(30,15), IPL_DEPTH_8U, 1);
cvResize(src, resizedMouth, 1);
return resizedMouth;
};
201 Group 08ml582 Reactive movie playback 12.
Appendix Smile Training Program Code #define VERSION "Smile Training v. 0.001 Alpha" #include <cv.h> #include <highgui.h> #include #include #include #include #include #include #include #include #include #include #include #include #include #include <stdio.h> <stdlib.h> <string.h> <assert.h> <math.h> <float.h> <limits.h> <time.h> <ctype.h> <iostream> <string> <cstring> <fstream> <vector> using namespace std; #ifdef _EiC #define WIN32 #endif vector <IplImage*> loadCollection(string file, int maxCollectionSize) /* Function takes in two variables: The path to the index file of pictures, and number of how big the collection */ { vector <IplImage*> collection; // Creates a vector of image pointers const char *filename = file.c_str();// Converts the string file to a const char ifstream myfile (filename); // Opens the file for reading string line; // Creates a string named line if (myfile.is_open()) // Run through the file { while (! myfile.eof() && collection.size() < maxCollectionSize ) { getline(myfile,line); const char *charline = line.c_str(); IplImage *image = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer for the image IplImage *imageFlip = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer for the flipped image image= cvLoadImage(charline,0); cvEqualizeHist(image,image); collection.push_back(image); cvFlip(image,imageFlip,1); collection.push_back(imageFlip); // // // // // Load the image Equalize the histogram Save image into vector Flip image and save into imageFlip Save imageFlip into vector } } return collection; } /* Loads a collection of images into the function */ IplImage* getMean(vector <IplImage*> collection) { /* Creates two scalars, which contains an 1D array with RGB and Alpha values (a 8 bit picture) */ CvScalar s, t; /* Creates an image with the same width and height as the training images */ IplImage* meanImg = cvCreateImage(cvSize(collection[0]->width,collection[0]>height),IPL_DEPTH_8U,1); int temp = 0; 202 Group 08ml582 Reactive movie playback 12. 
Appendix /* Creates a vector to temporarily save pixel values vector <int> coordinate((collection[0]->width)*(collection[0]->height)); /* Goes through every picture in collection */ for( int i = 0; i < collection.size() ; i++ ) { int coordinateCounter = 0; for (int y=0; y<collection[i]->height; y++) // For Y values { for (int x=0; x<collection[i]->width; x++) // For X values { s = cvGet2D(collection[i],y,x); // Get pixel value for image in X,Y /* Add the pixel value for the current image into the coordinate vector */ coordinate[coordinateCounter] += s.val[0]; coordinateCounter++; } } } /* Go through the added pixel values and divide with the amount of pictures */ for (int j = 0; j<coordinate.size(); j++) { coordinate[j] = coordinate[j]/collection.size(); } int pixelCounter = 0; /* For loop that converts the coordinate vector into an image (meanImg) */ for (int h = 0; h < meanImg->height; h++) { for (int w = 0; w < meanImg->width; w++) { for (int scalar = 0; scalar < 4; scalar++) { t.val[scalar] = (double)coordinate[pixelCounter]; } cvSet2D(meanImg, h, w, t); pixelCounter++; } } return meanImg; } int main( int argc, char** argv ) { vector <IplImage*> smileCollection; vector <IplImage*> neutralCollection; IplImage* imageS; IplImage* imageN; smileCollection = loadCollection("SmilePics.txt", 330); neutralCollection = loadCollection("NeutrPics.txt", 346); imageS = getMean(smileCollection); imageN = getMean(neutralCollection); CvScalar s; s = cvGet2D(imageS,0,0); cvNamedWindow("Picture", 1); // Create a window and name it: Picture cvShowImage("Picture", imageS); // display it cout << "Current pixel value at pixel (0,0): " << s.val[0] << endl; cvWaitKey(); // Wait for a KeyPress cvSaveImage("smile.jpg",imageS); cvSaveImage("neutral.jpg",imageN); cvDestroyWindow("Picture"); cvReleaseImage(&imageS); cvReleaseImage(&imageN); return 0; } 203 Group 08ml582 Reactive movie playback 12. Appendix Cross Validation Program Code #define VERSION "Smile Detector v. 
0.003 Alpha" #include <cv.h> #include <highgui.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <assert.h> #include <math.h> #include <float.h> #include <limits.h> #include <time.h> #include <ctype.h> #include <iostream> #include <vector> #include <fstream> #include <string> #include <cstring> using namespace std; #ifdef _EiC #define WIN32 #endif bool isSmiling(IplImage* mouthPicture, IplImage* smileTemplate, IplImage* neutralTemplate, int st); double pixelDifference(IplImage* pic1, IplImage* pic2); double pixelDifferenceSqr(IplImage* pic1, IplImage* pic2); vector <IplImage*> loadLearningCollection(string file, int maxCollectionSize, int skipStart, int trainingDataSize) vector <IplImage*> loadTestingCollection(string file, int maxCollectionSize, int offset); IplImage* getMean(vector <IplImage*> collection); int main( int argc, char** argv ) { vector <IplImage*> smileCollection;// Creates vector of images to hold all smile pictures vector <IplImage*> neutralCollection;// Creates vector of images to hold all neutral pictures vector <IplImage*> smileTestData;// Creates vector of images to hold the smile pictures to test on vector <IplImage*> neutralTestData; // Creates vector of images to hold all neutral pictures to test on IplImage* smileTemplate = cvLoadImage("smile.jpg"); //cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); IplImage* neutralTemplate = cvLoadImage("neutral.jpg"); //cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); int smile = 0, neutral = 0; int n = 1, trainingDataSize = 16; ofstream sResults ("cvSResult.csv");// Outputs the results in a comma seperated value file ofstream nResults ("cvNResult.csv");Outputs the results in a comma seperated value file //cvNamedWindow( "Current picture", 1); for (int st = 0; st<10000; st+=100) // Checks different thresholds using steps of 1000 { smile = 0; // Counts the smiles neutral = 0;// Counts the neutrals n = 0; for (int i=0; i<10; i++) { 204 Group 08ml582 Reactive movie playback 12. Appendix smileCollection = loadLearningCollection("SmilePics.txt", 297, i*trainingDataSize, trainingDataSize); // Load the collection of smiles, excluding the pictures that is to be used as test data neutralCollection = loadLearningCollection("NeutrPics.txt", 313, i*trainingDataSize, trainingDataSize); // Load the collection of non-smiles, excluding the pictures that is to be used as test data smileTemplate = getMean(smileCollection); // Get the mean image of the smile collection neutralTemplate = getMean(neutralCollection); // Get the mean image of the smile collection smileTestData = loadTestingCollection("SmilePics.txt", trainingDataSize, i*trainingDataSize); // Load the smile training collection neutralTestData = loadTestingCollection("NeutrPics.txt", trainingDataSize, i*trainingDataSize); // Load the non-smile training collection for (int j=0; j<12; j++) { /*cvShowImage("Current picture", neutralTestData[j]); cvWaitKey();*/ if (isSmiling(smileTestData[j],smileTemplate,neutralTemplate,st)) // Simply tests if the image contains smiles (See SmileDetection for more information) { //sResults << "Picture " << n << " is smiling" << endl; smile++; } else { //sResults << "Picture " << n << " is not smiling" << endl; } if (isSmiling(neutralTestData[j],smileTemplate,neutralTemplate,st)) Simply tests if the image contains smiles (See SmileDetection for more information) { //nResults << "Picture " << n << " is smiling" << endl; } else { //nResults << "Picture " << n << " is not smiling" << endl; neutral++; } n++; } } cout << "Picture " << st << " done." 
<< endl; sResults << ((float)smile/(n-1))*100 << ";"; // Calculates the percent of smiles detected as smiles nResults << ((float)neutral/(n-1))*100 << ";"; // Calculates the percent of neutrals detected as neutrals } cout << "Cross validation complete." << endl; cvReleaseImage(&smileTemplate); cvReleaseImage(&neutralTemplate); //cvDestroyWindow("Current picture"); sResults.close(); // Close file nResults.close(); // Close file // return 0; } bool isSmiling(IplImage* mouthPicture, IplImage* smileTemplate, IplImage* neutralTemplate, int st) // Function that checks if user is smiling { int smileDist = 0; int neutralDist = 0; 205 Group 08ml582 Reactive movie playback 12. Appendix int width = mouthPicture->width; if (pixelDifferenceSqr(mouthPicture,smileTemplate) < pixelDifference(mouthPicture, neutralTemplate)+st) // Comparision between difference between (live image and smile image) and (live image and neutral image) { return true; } else { return false; } }; double pixelDifference(IplImage* pic1, IplImage* pic2) // Smile comparision using method 1 { // Please see "SmileDetection" program for more information CvScalar s, t; // "SmileDetection" contains a more streamlined version of this function int width = pic1->width; double diff = 0; for (int y = 0; y < pic1->height; y++) { for (int x = 0; x < pic1->width; x++) { if (s.val[0]-t.val[0] < 0) diff += t.val[0]-s.val[0]; else diff += s.val[0]-t.val[0]; } } return diff; }; double pixelDifferenceSqr(IplImage* pic1, IplImage* pic2) // Smile comparision using method 2 { // Please see "SmileDetection" program for more information CvScalar s, t; // "SmileDetection" contains a more streamlined version of this function int width = pic1->width; double diff = 0; for (int y = 0; y < pic1->height; y++) { for (int x = 0; x < pic1->width; x++) { s = cvGet2D(pic1,y,x); t = cvGet2D(pic2,y,x); diff += (s.val[0]-t.val[0])*(s.val[0]-t.val[0]); } } return sqrt(diff); }; vector <IplImage*> loadLearningCollection(string file, int skipStart, int trainingDataSize) // Loads collection of "SmileLearner" { vector <IplImage*> collection; const char *filename = file.c_str(); ifstream myfile (filename); string line; int lineNum = 0; if (myfile.is_open()) 206 maxCollectionSize, int images, descriped in Group 08ml582 Reactive movie playback 12. Appendix { while (! myfile.eof() && collection.size() < maxCollectionSize ) { if (lineNum >= skipStart && lineNum < skipStart+trainingDataSize) // Slightly changed to skip every 12'th picture { getline(myfile,line); lineNum++; } else { getline(myfile,line); const char *charline = line.c_str(); IplImage *image = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer IplImage *imageFlip = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer image= cvLoadImage(charline,0); // load the image cvEqualizeHist(image,image); collection.push_back(image); cvFlip(image,imageFlip,1); collection.push_back(imageFlip); lineNum++; } } } return collection; }; vector <IplImage*> loadTestingCollection(string file, int maxCollectionSize, int offset) { vector <IplImage*> collection; const char *filename = file.c_str(); ifstream myfile (filename); string line; int lineNum = 0; if (myfile.is_open()) { while (! 
myfile.eof() && collection.size() < maxCollectionSize ) { if (lineNum < offset) // Slightly changed to use only every 12'th picture { getline(myfile,line); lineNum++; } else { getline(myfile,line); const char *charline = line.c_str(); IplImage *image = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U,1); // This is image pointer image= cvLoadImage(charline,0); // load the image cvEqualizeHist(image,image); collection.push_back(image); lineNum++; } } } return collection; }; IplImage* getMean(vector <IplImage*> collection) // Described in "SmileLearner" { CvScalar s, t; 207 Group 08ml582 Reactive movie playback 12. Appendix IplImage* meanImg = cvCreateImage(cvSize(collection[0]->width,collection[0]>height),IPL_DEPTH_8U,1); int temp = 0; vector <int> coordinate((collection[0]->width)*(collection[0]->height)); for( int i = 0; i < collection.size() ; i++ ) { int coordinateCounter = 0; for (int y=0; y<collection[i]->height; y++) { for (int x=0; x<collection[i]->width; x++) { s = cvGet2D(collection[i],y,x); coordinate[coordinateCounter] += s.val[0]; coordinateCounter++; } } } //cout << "coordinate[150] before averaging= " << coordinate[150] << endl; for (int j = 0; j<coordinate.size(); j++) { coordinate[j] = coordinate[j]/collection.size(); } //cout << "coordinate[150] after averaging= " << coordinate[150] << endl; int pixelCounter = 0; for (int h = 0; h < meanImg->height; h++) { for (int w = 0; w < meanImg->width; w++) { for (int scalar = 0; scalar < 4; scalar++) { t.val[scalar] = (double)coordinate[pixelCounter]; } cvSet2D(meanImg, h, w, t); pixelCounter++; } } return meanImg; }; 208 Group 08ml582 Reactive movie playback 12. Appendix Smile Detection Program Code #define VERSION "Smile Detector v. 0.7 Release Candidate" #include <cv.h> #include <highgui.h> #include <stdio.h> #include <stdlib.h> #include <string> #include <assert.h> #include <math.h> #include <float.h> #include <limits.h> #include <time.h> #include "timer.h" #include <ctype.h> #include <iostream> #include <vector> #include <fstream> #include <cstring> using namespace std; #ifdef _EiC #define WIN32 #endif // OpenCV variable decleration static CvMemStorage* storage = 0; static CvHaarClassifierCascade* cascade = 0; // Function prototypes for the functions created by Group 08ml582 IplImage* mouthDetection( IplImage* image, unsigned char colorChange ); IplImage* resizeMouth(IplImage* src); bool isSmiling(IplImage* mouthPicture, IplImage* smileTemplate, IplImage* neutralTemplate, int st); double pixelDifference(IplImage* pic1, IplImage* pic2); void sendToFlash(int smileCount, int neutralCount); void playMovie(int newMovie, const char *file[]); // Load the haar cascade used for face detection const char* cascade_name = "haarcascade_frontalface_alt2.xml"; int main( int argc, char** argv ) { // Initialization of variables IplImage* smileTemplate = cvLoadImage("smile.jpg"); IplImage* neutralTemplate = cvLoadImage("neutral.jpg"); IplImage* smiley = cvLoadImage("smiley.jpg"); IplImage* saddy = cvLoadImage("saddy.jpg"); CvCapture* capture = 0; IplImage *frame, *frame_copy = 0; const char* input_name; vector <bool> frames,movieMood; vector <int> smileTimer,movieOrder; Timer timer; bool smiling = false; int smileCount = 0, neutralCount = 0, st = 0, lineNum = 0, movieNr = 0, init = 0, fa = 0, cd = 0, clipEnd = 0; float duration = 0.f, clipDuration[9], hStyle[3]; unsigned char colorChange = 0; // Load playlist string fileArray[9] = {"0","0","0","0","0","0","0","0","0"}; ifstream playList ("playlist.txt"); string line; if (playList.is_open()) { 209 Group 
08ml582 Reactive movie playback 12. Appendix while (! playList.eof() && lineNum < 18) { if(lineNum%2==0) //For every linenumber dividable by two (0,2,4,8...) { getline(playList,line); //Get the current line as a string (command for starting a movie clip) fileArray[fa] = line; //Save current line in fileArray at position fa fa++; } else //For every odd linenumber { playList >> duration; //Find and store a float (duration of movie found in previous line) getline(playList,line); //This line is needed in order to make the getline work properly in above if statement! clipDuration[cd] = duration; //Save current duration in clipDuration at position cd cd++; } lineNum++; } } // Create a const char* array and save all values of fileArray as c strings in this array (needed for reading later) const char *file[9] = {fileArray[0].c_str(),fileArray[1].c_str(),fileArray[2].c_str(),fileArray[3].c_str(), fileArray[4].c_str(),fileArray[5].c_str(),fileArray[6].c_str(),fileArray[7].c_str(), fileArray[8].c_str()}; // Initialization of haar cascade input_name = argc > 1 ? argv[1] : 0; cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 ); if( !cascade ) { fprintf( stderr, "ERROR: Could not load classifier cascade\n" ); system("pause"); return -1; } storage = cvCreateMemStorage(0); // Test if a webcam is connected. If no webcam is connected, use a movie as input if( !input_name || (isdigit(input_name[0]) && input_name[1] == '\0') ) capture = cvCaptureFromCAM( !input_name ? 0 : input_name[0] - '0' ); else capture = cvCaptureFromAVI( "film.avi" ); // Set up windows for viewing //cvNamedWindow( VERSION, 1 ); //Window for viewing the webcam feed (only for debugging) cvNamedWindow( "Happy?", 1); //Window for viewing the image showing if the user is smiling cvCreateTrackbar("Threshold","Happy?",&st,500,NULL); //Create a trackbar in the "Happy?" window, for adjusting the threshold playMovie(0,file); //Start playing movie 0 from array file clipEnd = clipDuration[0]; //Set clipEnd to the duration of movie 0 int done = 0; //done is set to 0, to make the smile detection start if( capture ) { while (!done) //As long as done is 0... { // Create image from webcam feed to be detected on if( !cvGrabFrame( capture )) break; frame = cvRetrieveFrame( capture ); if( !frame ) break; 210 Group 08ml582 Reactive movie playback 12. Appendix if( !frame_copy ) frame_copy = cvCreateImage( cvSize(frame->width,frame->height), IPL_DEPTH_8U, frame->nChannels ); if( frame->origin == IPL_ORIGIN_TL ) cvCopy( frame, frame_copy, 0 ); else cvFlip( frame, frame_copy, 0 ); // Create IplImage* mouthImage and save the result of mouthDetection in this variable IplImage* mouthImage; mouthImage = mouthDetection( frame_copy, colorChange ); if (mouthImage->width != 0) //If mouthImage has a width bigger than 0 meaning that a mouth is found { mouthImage = resizeMouth(mouthImage); //Resize the mouthImage to the desired size //cvShowImage( VERSION, mouthImage ); //Show the mouthImage in the VERSION window //cvShowImage( VERSION, frame_copy ); //Show the current frame in the VERSION window /*************************************************************** ** Method 2: ** ** The code below is used for the method checking for smiles ** ** only on the three intial movie clips. Depending on what ** ** movie clip the user smiled most at, an according new set ** ** of clips is chosen. 
** ***************************************************************/ if (isSmiling(mouthImage, smileTemplate, neutralTemplate, st)) // If user is smiling { frames.push_back(1);// Save to vector that the user is smiling if (frames.size() > 5) // Keeps the vector size to max 5 (frames) { frames.erase(frames.begin(),frames.begin()+1); } } else { frames.push_back(0); if (frames.size() > 5) { frames.erase(frames.begin(),frames.begin()+1); } } smileCount = 0; neutralCount = 0; for (int i = 0; i<frames.size(); i++) { if (frames[i] == 1) {smileCount++;} else {neutralCount++;} } if (smileCount > neutralCount) { cvShowImage( "Happy?", smiley); colorChange = 3; movieMood.push_back(1); if (smiling == false) { smileTimer.push_back(clock()); } smiling = true; } 211 Group 08ml582 Reactive movie playback 12. Appendix if (smileCount < neutralCount) { cvShowImage( "Happy?", saddy); colorChange = 0; movieMood.push_back(0); if (smiling == true) { smileTimer.push_back(clock()); } smiling = false; } if (timer.elapsed(clipEnd+2000)) {//If current movie clip has ended int mood = 0; switch(init) { //switch statement makes sure that the program plays movie 0, followed by movie 3, followed by movie 6, followed //by two movies of the same style as the movie which scored the highest case 0: //If the ended movie was movie 0 movieOrder.push_back(init); //Store 0 in movieOrder's last place. MovieOrder keeps track //of the what movies the user saw and in which order. Used for testing. mood = 0; for (int i = 0; i<movieMood.size(); i++) //Calculate the sum of elements in movieMood. //As 1 is smiling and 0 is not, this will give //the total amount of smiling frames during the last video { mood += movieMood[i]; } movieMood.clear(); //Store mood/duration of ended clip as the score for this clip hStyle[0] = (float)mood/clipDuration[0]; init = 3; playMovie(init, file); //play init from file (init = 3) clipEnd = clipDuration[init]; //Set clipEnd to duration of clip init break; case 1: movieOrder.push_back(init); //Remember that this video is seen init = 2; playMovie(init, file); clipEnd = clipDuration[init]; break; case 2: movieOrder.push_back(init); done = 1; //Terminate program break; case 3: //Same procedure as in case 0 movieOrder.push_back(init); mood = 0; for (int i = 0; i<movieMood.size(); i++) { mood += movieMood[i]; } movieMood.clear(); hStyle[1] = (float)mood/clipDuration[3]; init = 6; playMovie(init, file); clipEnd = clipDuration[6]; break; case 4: movieOrder.push_back(init); init = 5; 212 Group 08ml582 Reactive movie playback 12. Appendix playMovie(5, file); clipEnd = clipDuration[5]; break; case 5: movieOrder.push_back(init); done = 1; break; case 6: movieOrder.push_back(init); mood = 0; for (int i = 0; i<movieMood.size(); i++) { mood += movieMood[i]; } movieMood.clear(); hStyle[2] = (float)mood/clipDuration[6]; cout << "Style 1 scores: " << hStyle[0] << endl << "Style 2 scores: " << hStyle[1] << endl << "Style 3 scores: " << hStyle[2] << endl; //Outputs the calculated scores for each style if (hStyle[0] >= hStyle[1] && hStyle[0] >= hStyle[2]) //If the score for style 1 (hStyle[0]) beats the score for the two other styles... { init = 1; playMovie(init, file); //Play the next movie in style 1 clipEnd = clipDuration[1]; } if (hStyle[1] > hStyle[0] && hStyle[1] >= hStyle[2]) //If the score for style 2 (hStyle[1]) beats the score for the two other styles... 
{ init = 4; playMovie(init, file); //Play the next movie in style 2 clipEnd = clipDuration[4]; } if (hStyle[2] > hStyle[1] && hStyle[2] > hStyle[0]) //If the score for style 3 (hStyle[2]) beats the score for the two other styles... { init = 7; playMovie(init, file); //Play the next movie in style 3 clipEnd = clipDuration[7]; } break; case 7: movieOrder.push_back(init); init = 8; playMovie(8, file); clipEnd = clipDuration[8]; break; case 8: movieOrder.push_back(init); done = 1; break; } } /*************************************************************** ** End of method 2 ** ***************************************************************/ } else { cout << "No face detected!" << clock() << endl; //Output a text followed by a time in ms, if the user's face is not detected cvShowImage( VERSION, frame_copy ); } cvReleaseImage( &mouthImage );//Release image (delete it from memory) 213 Group 08ml582 Reactive movie playback 12. Appendix if( cvWaitKey( 10 ) >= 0 ) //Wait for a keypress. If a key is pressed, the program is terminated done = 1; } cvReleaseImage( &frame_copy ); cvReleaseCapture( &capture ); } //cvDestroyWindow(VERSION); cvDestroyWindow("Happy?"); //The following code writes to the log file. In stead of writing cout we write log, but the rest is pretty self-explanatory ofstream log ("log.txt"); for (int j=0; j<smileTimer.size(); j++) { if (j%2==0) { log << "User started smiling at " << smileTimer[j] << "ms" << endl; } else { log << "User stopped smiling at " << smileTimer[j] << "ms" << endl; } } log << endl << "The user had the following \"Smile Score\" for each of the three humor types:" << endl << "Style 1: " << hStyle[0] << "\nStyle 2: " << hStyle[1] << "\nStyle 3: " << hStyle[2] << endl; log << endl << "User watched the movies in the following order:"; for(int k=0; k<movieOrder.size();k++) { log << " " << movieOrder[k]; } log << "." << endl; log << endl << "User smiled a total of " << smileTimer.size()/2 << " times." << endl; return 0; //End of main } IplImage* mouthDetection( IplImage* img, unsigned char colorChange ) { //Most of this is not developed by Group 08ml582, but is standard OpenCV code IplImage* mouthPixels; bool detectedFace = 0; static CvScalar colors[] = { {{0,0,255}}, {{0,128,255}}, {{0,255,255}}, {{0,255,0}}, {{255,128,0}}, {{255,255,0}}, {{255,0,0}}, {{255,0,255}}}; double scale = 2.2; IplImage* gray = cvCreateImage( cvSize(img->width,img->height), 8, 1 ); IplImage* small_img = cvCreateImage( cvSize( cvRound (img->width/scale),cvRound (img>height/scale)),8, 1 ); int i; cvCvtColor( img, gray, CV_BGR2GRAY ); cvResize( gray, small_img, CV_INTER_LINEAR ); cvEqualizeHist( small_img, small_img ); 214 Group 08ml582 Reactive movie playback 12. Appendix cvClearMemStorage( storage ); if( cascade ) { double t = (double)cvGetTickCount(); CvSeq* faces = cvHaarDetectObjects( small_img, cascade, storage, 1.1, 2, 0/*CV_HAAR_DO_CANNY_PRUNING*/, cvSize(30, 30) ); t = (double)cvGetTickCount() - t; for( i = 0; i < (faces ? 
faces->total : 0); i++ ) { CvRect* r = (CvRect*)cvGetSeqElem( faces, i ); CvPoint facecenter; int radius; facecenter.x = cvRound((r->x + r->width*0.5)*scale); facecenter.y = cvRound((r->y + r->height*0.5)*scale); radius = cvRound((r->width + r->height)*0.25*scale); // Mouth detection CvPoint mouthUpLeft; CvPoint mouthDownRight; //Define the top left corner of the rectangle spanning the mouth //Check report for further explanation mouthUpLeft.x = facecenter.x - 0.5*radius; mouthUpLeft.y = facecenter.y + 0.3*radius; //Define the bottom right corner of the rectangle spanning the mouth mouthDownRight.x = mouthUpLeft.x + radius; mouthDownRight.y = mouthUpLeft.y + radius * 0.5; detectedFace = true; cvRectangle( img, mouthUpLeft, mouthDownRight, colors[colorChange], 3, 8, 0); //Create an rectangle as specified above. This is the mouth! int step = gray->widthStep/sizeof(uchar); uchar* data = (uchar *)gray->imageData; //Set mouthPixels' different attributes mouthPixels = cvCreateImage( cvSize(mouthDownRight.y mouthUpLeft.y,mouthDownRight.x - mouthUpLeft.x),IPL_DEPTH_8U, 1 ); mouthPixels->height = mouthDownRight.y - mouthUpLeft.y; mouthPixels->width = mouthDownRight.x - mouthUpLeft.x; mouthPixels->widthStep = mouthPixels->width/sizeof(uchar); mouthPixels->nChannels = 1; uchar* data2 = (uchar *)mouthPixels->imageData; //Set data2 to the imageData of mouthPixels int data2Location = 0; for(int a = mouthUpLeft.y; a < mouthDownRight.y; a++) { for(int b = mouthUpLeft.x; b < mouthDownRight.x; b++) { data2[data2Location] = data[b+a*step]; //For the rectangle that makes up the mouth, save the image data from the webcam feed data2Location++; }}}} //std::cout << detectedFace << std::endl; - specified if(detectedFace == true) { // cvShowImage( VERSION, mouthPixels ); } else { cvShowImage( VERSION, gray);}; cvReleaseImage( &gray ); cvReleaseImage( &small_img ); cvEqualizeHist(mouthPixels,mouthPixels); return mouthPixels; //If a mouth was found, return the image that specifies the mouth } 215 Group 08ml582 Reactive movie playback 12. Appendix else { gray->width = 0; cvReleaseImage( &small_img ); return gray; //If no mouth was found, return a gray image } }; IplImage* resizeMouth(IplImage* src) { IplImage* resizedMouth = cvCreateImage(cvSize(30,15),IPL_DEPTH_8U, 1); //Create an IplImage* and set its size to 30*15 cvResize(src, resizedMouth, 1); //resize input image to fit inside resizedMouth and store the image in resizedMouth return resizedMouth; //return resizedMouth }; bool isSmiling(IplImage* mouthPicture, IplImage* smileTemplate, IplImage* neutralTemplate, int st) { int smileDist = 0; int neutralDist = 0; int width = mouthPicture->width; //Call pixelDifference to calculate difference between mouthPicture and smileTemplate and compare the value to the difference between mouthPicture and neutralTemplate + threshold. 
if (pixelDifference(mouthPicture,smileTemplate) < pixelDifference(mouthPicture, neutralTemplate)+st)
{
return true; // If the difference was smaller between the mouthPicture and the smileTemplate, the frame is smiling, so we return true
}
else
{
return false; // If the difference was smaller between the mouthPicture and the neutralTemplate, the frame is not smiling, so we return false
}
};
double pixelDifference(IplImage* pic1, IplImage* pic2)
{
CvScalar s, t;
int width = pic1->width;
double diff = 0;
for (int y = 0; y < pic1->height; y++)
{
for (int x = 0; x < pic1->width; x++)
{
s = cvGet2D(pic1,y,x); // Save the current pixel value of the first input image in CvScalar s
t = cvGet2D(pic2,y,x); // Save the current pixel value of the second input image in CvScalar t
diff += (s.val[0]-t.val[0])*(s.val[0]-t.val[0]); // Increase diff by the squared difference between s and t
}
}
return sqrt(diff); // Return the square root of the summed squared differences, i.e. the total difference between the two input images
};
void playMovie(int newMovie, const char *file[])
{
system(file[newMovie]); // Calls the command stored in place newMovie of array file
};
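Note on playlist.txt: the Smile Detection program reads this plain-text file in main() as pairs of lines, where an even-numbered line holds the command string that playMovie() later passes to system(), and the following odd-numbered line holds the clip's duration used for clipEnd (presumably in milliseconds, given the +2000 offset against the timer). The actual file shipped with the product is not reproduced in this appendix, so the lines below are only a hypothetical sketch of the expected format; the clip file names and durations are placeholders:

start clip0_musicbox.avi
64000
start clip3_trampoline.avi
58000
start clip6_bullettime.avi
61000
(... one command line and one duration line for each of the nine clips, 18 lines in total, matching the lineNum < 18 guard in main())

Because the program keeps detecting smiles while a clip is playing, the command in the playlist presumably returns immediately (for example via the Windows start command, as sketched above) instead of blocking inside system() until the player exits.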
12.3 Forms
Consent form
Test of animated movie
Age: 18-24 25-31 32+
If you would like us to contact you when further tests of this product are to be conducted, please fill out the following form too:
Name: _______________________________________________________________
Address: _______________________________________________________________
Zip-code & city: _______________________________________________________________
Phone: _______________________________________________________________
E-mail: _______________________________________________________________
This test is performed in order for the study group 08ml582 of Aalborg University Copenhagen to collect data about the product you are about to test. The test will last approximately 10 minutes and you are free to leave the test at any time, if you feel you do not want to continue. You will be sitting in a room together with three observers. All three persons are from the previously mentioned study group. One person will be in charge of technical issues, one person will be an observer and one person will be there to help you in case of any doubts about the test. The goal of the project is to determine whether it is possible to establish a new way for viewers to watch interactive movies. In this test, your reactions to certain animated clips will be tested and afterwards, you will be asked to fill out a questionnaire about what you have been through. Your personal information will only be used by the group internally and none of your personal information will be given to third parties. If you agree to these terms, please state so below.
I understand the terms above and agree to participate in the test on these terms: Yes No
Date _____________ Signature _____________________________________________________
Questionnaire for test of animatics
Age: 18-24 25-31 32+
On a scale from 1 to 5, where 1 is no change and 5 is a lot of change, how often did you find the humor style to change during the test? 1 2 3 4 5
On a scale from 1 to 5, where 1 is no control and 5 is full control, how much did you feel you had control over the choice of movie clips? 1 2 3 4 5
If you felt that you were in control, how did you feel you were controlling the movie clips?
____________________________________________________________________________________________________________
____________________________________________________________________________________________________________
____________________________________________________________________________________________________________
____________________________________________________________________________________________________________
Which of the following clips did you think were the funniest?
Questionnaire for final test
Which of the following clips did you think were the funniest?
Which of the following clips did you think the last two clips (the black-and-white ones) were most similar to?
12.4 Test Results
Initial test
For the initial test, the users were asked whether they sensed the style of humor change during the test. They were to answer this on a scale from 1 to 5. Users were also asked, on a scale from 1 to 5, whether they thought they were in control of the change in humor style. Choosing between three screenshots, they had to pick the movie clip they found the funniest, while the program detected which clip they smiled the most at.

Person | Age | Style change | Control | User choice | Detected funniest clip | Correct choice
Person 1 | 25-31 | 3 | 2 | 2 | N/A | False
Person 2 | 18-24 | 3 | 3 | 3 | 2 | False
Person 3 | 18-24 | 4 | 1 | 2 | N/A | False
Person 4 | 18-24 | 3 | 2 | 2 | 3 | False
Person 5 | 18-24 | 4 | 1 | 3 | 1 | False
Person 6 | 18-24 | 1 | 1 | 1 | 1 | True
Person 7 | 25-31 | 2 | 1 | 3 | 3 | True
Person 8 | 25-31 | 1 | 1 | 2 | 2 | True
Person 9 | 25-31 | 4 | 2 | 2 | 3 | False
Person 10 | 25-31 | 3 | 1 | 3 | 1 | False

[Charts: distributions of the "Style change" and "Control" ratings, on the scale 0 = no change to 5 = a lot of change]

Final Test
The final test was in many ways conducted in the same way as the initial test. The general test setup was the same, but instead of only testing on the reactive program, every second test person was tested on a program that did not react according to the user's smile.
The reactive test:
Person | Age | User funniest clip | Detected funniest clip | Fitting style | Total smile time (ms)
Person 1 | 18-24 | 3 | 3 | 1 | 95499
Person 3 | 18-24 | 3 | 1 | 2 | 26284
Person 5 | 18-24 | 3 | 3 | 3 | 26662
Person 7 | 18-24 | 2 | 1 | 1 | 445
Person 9 | 18-24 | 1 | 1 | 3 | 71847
Person 11 | 18-24 | 2 | 1 | 2 | 6733
Person 13 | 18-24 | 2 | 1 | 1 | 70333
Person 15 | 18-24 | 3 | 3 | 1 | 21374
Person 17 | 18-24 | 2 | 2 | 1 | 3865
Person 19 | 18-24 | 3 | 1 | 1 | 18699
Person 21 | 18-24 | 3 | 1 | 1 | 104697
Person 23 | 18-24 | 3 | 2 | 3 | 128658
Person 25 | 25-31 | 3 | 3 | 3 | 1787
Person 27 | 18-24 | 3 | 3 | 3 | 47395
Person 29 | 18-24 | 3 | 3 | 3 | 52642
All users' total smile time: 676920 ms
Average user smile time: 22564 ms

[Pie charts "Preferred clip": user choice (Style 1 6%, Style 2 27%, Style 3 67%) vs. program detection (Style 1 47%, Style 2 13%, Style 3 40%)]

The non-reactive test:
Person | Age | User funniest clip | Detected funniest clip | Fitting style | Total smile time (ms)
Person 2 | 18-24 | 1 | 1 | 1 | 24499
Person 4 | 18-24 | 3 | 0 | 1 | 4441
Person 6 | 18-24 | 3 | 2 | 3 | 82507
Person 8 | 18-24 | 3 | 3 | 1 | 25047
Person 10 | 18-24 | 3 | 2 | 1 | 16073
Person 12 | 25-31 | 3 | 3 | 1 | 67742
Person 14 | 18-24 | 3 | 1 | 1 | 24220
Person 16 | 18-24 | 3 | 2 | 1 | 55104
Person 18 | 18-24 | 3 | 3 | 1 | 4814
Person 20 | 18-24 | 3 | 1 | 3 | 31198
Person 22 | 18-24 | 1 | 2 | 3 | 1473
Person 24 | 18-24 | 2 | 2 | 1 | 61533
Person 26 | 18-24 | 3 | 3 | 1 | 6466
Person 28 | 18-24 | 3 | 1 | 1 | 2120
Person 30 | 18-24 | 3 | 3 | 1 | 9030
All users' total smile time: 416267 ms
Average user smile time: 13875.57 ms
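The percentages discussed in the Testing chapter are derived from the raw rows above. As a quick sanity check of the reactive-test table, the following is a minimal standalone C++ sketch (not part of the delivered programs; the three arrays simply repeat the table values) that counts how often the detected funniest clip agrees with the user's own choice and sums the logged smile times:

#include <iostream>
int main()
{
    // Values copied row by row from "The reactive test" table above
    int userChoice[15]     = {3,3,3,2,1,2,2,3,2,3,3,3,3,3,3};
    int detectedChoice[15] = {3,1,3,1,1,1,1,3,2,1,1,2,3,3,3};
    long smileTime[15]     = {95499,26284,26662,445,71847,6733,70333,21374,3865,18699,104697,128658,1787,47395,52642};
    int matches = 0;
    long totalSmileTime = 0;
    for (int i = 0; i < 15; i++)
    {
        if (userChoice[i] == detectedChoice[i]) matches++; // Agreement between user and program
        totalSmileTime += smileTime[i];
    }
    std::cout << "Detected funniest clip matched the user's choice in " << matches << " of 15 tests" << std::endl;
    std::cout << "Total smile time: " << totalSmileTime << " ms" << std::endl;
    return 0;
}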