FINAL YEAR PROJECT THESIS
Project Members
Syed Waqas Burney 2004185
[mail@waqasburney.com]
Mutahira Ikram Khan 2004136
[mutahirakhan@gmail.com]
Project Supervisors
Mr. Badre Munir
FCSE
[badr@giki.edu.pk]
Mr. Umar Shafique
FES
[shafique@giki.edu.pk]
Faculty of Computer Science & Engineering
GIK Institute, Pakistan
[May, 2008]
Smart Control of Domestic Appliances
using a Computer Vision-based Approach
Codename: imaGInation KIeve
CERTIFICATE OF APPROVAL
It is certified that the work contained in this thesis, entitled “vSmart - Smart Control
of Domestic Appliances using a Computer Vision-based Approach”, was carried out
by Syed Waqas Ali Burney and Mutahira Ikram Khan under the supervision of Mr.
Badre Munir, FCSE, and Mr. Umar Shafique, FES, in partial fulfillment of the
requirements of the degree of Bachelor of Science in Computer System Engineering.
Project Advisors
[May 10, 2009]
Mr. Badre Munir
Mr. Umar Shafique
ACKNOWLEDGEMENTS
First and foremost, we would like to thank Almighty Allah for Blessing us with all the
energy, enthusiasm, knowledge, wisdom, courage and much more, to help us
achieve our goals and complete, very successfully, our Final Year Project.
We are thankful to our parents for providing us with their invaluable love and
support, and instilling in us the very confidence and morale that set us out in the
search for success. We would also like to express our earnest gratitude to our
advisors, Mr. Badre Munir and Mr. Umar Shafique, for their priceless guidance and
encouragement throughout the course of the project, and for providing constructive
support at every step.
Last, but definitely not the least, we would like to thank our many student mentors
and friends without whom this project would not have been possible. The deepest
and most gratifying words of thanks go to our friends Mohammad Yousuf (GIKI, Class
of 2007) and Hafiz Faheem Raza (GIKI, Class of 2007) for their endless support,
heartfelt efforts and for making this all possible. Thank you so much. Due
acknowledgements also go to our mentors at Intel, Dr. Edwin Chung and Edwin Lee,
and to the many others: Rana Mohammad Bilal (GIKI, Class of 2009), Iqbal Talat Bhatti (GIKI,
Class of 2006), Mr. Murtaza Shabbir Safri (GIKI, Class of 2006), Mr. Junaid Shahid
(GIKI, Class of 2006) and Mr. Mohammad Nasrullah (GIKI, Class of 2004).
Thank you all!
Dedicated to our parents...
TABLE OF CONTENTS
PROJECT TITLE ...........................................................................................................1
CERTIFICATE OF APPROVAL ......................................................................................2
ACKNOWLEDGEMENTS .............................................................................................3
EXECUTIVE SUMMARY ..............................................................................................7
1. INTRODUCTION .....................................................................................................8
1.1 BACKGROUND ..............................................................................................8
1.2 OBJECTIVE ....................................................................................................8
2. PROJECT DESIGN ....................................................................................................9
2.1 ARCHITECTURE OVERVIEW ...........................................................................9
2.2 MODULARIZATION .....................................................................................10
2.3 MODEL DESIGN ...........................................................................................11
2.4 HARDWARE DESIGN....................................................................................13
2.4.1. SERIAL INTERFACING ..............................................................................13
2.4.2. MICROCONTROLLER INTERFACING .............................................................14
2.4.3. INTERFACE CIRCUITRY .............................................................................16
2.5 SOFTWARE DESIGN .....................................................................................18
2.5.1. DATABASE DESIGN ................................................................................18
3. PROJECT FUNCTIONALITY ....................................................................................21
3.1 SYSTEM DEPLOYMENT AND CONFIGURATION ...........................................21
3.2 HUMAN PRESENCE DETECTION AND ACTIVITY MONITORING ...................24
3.2.1. MOTION DETECTION FOR CONTINUOUS VIDEO STREAMS ................................24
3.2.2. THE POINT-IN-POLYGON ALGORITHM .........................................................30
3.3 THE DECISION-TAKING SOFTWARE ENGINE ................................................34
3.4 THE APPLIANCE CONTROL MECHANISM .....................................................35
4. PROJECT LIMITATIONS .........................................................................................36
5. CONCLUSION .......................................................................................................37
REFERENCES.............................................................................................................38
TABLE OF FIGURES
FIGURE 01 - THE VSMART ARCHITECTURE ............................................................... 9
FIGURE 02 - MODEL SKETCH UP .............................................................................. 11
FIGURE 03 - INITIAL MODEL-BUILD SNAPSHOTS ..................................................... 11
FIGURE 04 - POST-APPLIANCE SETUP SNAPSHOTS ................................................. 12
FIGURE 05 - A4TECH PK 5 WEB CAMERA ................................................................ 12
FIGURE 06 - THE RS232 CONNECTOR ..................................................................... 13
FIGURE 07 - THE MICROCONTROLLER SNAPSHOT .................................................. 14
FIGURE 08 - MICROCONTROLLER CONNECTIVITY TO THE SERIAL PORT ................ 15
FIGURE 09 - THE INTERFACE CIRCUITRY .................................................................. 16
FIGURE 10 - THE CIRCUIT DIAGRAM ....................................................................... 17
FIGURE 11 - THE DATABASE DIAGRAM .................................................................. 18
FIGURE 12 - CAMERA REGISTRATION .................................................................... 21
FIGURE 13 - DEPLOYED CAMERA-VIEW SNAPSHOTS ............................................. 21
FIGURE 14 - DEFINING HOTSPOTS FOR THE SYSTEM .............................................. 22
FIGURE 15 - INITIAL SYSTEM CONFIGURATIONS .................................................... 23
FIGURE 16 - MOTION DETECTION ALGORITHMS .................................................... 29
FIGURE 17 - MER FORMATION ON A MOVING OCCUPANT .................................... 30
FIGURE 18 - THE POINT-IN-POLYGON FIGURES ....................................................... 31
FIGURE 19 - THE SYSTEM APPLICATION ................................................................. 33
FIGURE 20 - THE APPLIANCE CONTROL MECHANISM ............................................. 35
EXECUTIVE SUMMARY
vSmart (codenamed imaGInation KIeve) realizes, at a basic yet truly foundational
level, a flexible and user-customizable system that uses low-cost cameras to
intelligently control basic domestic appliances, such as room lights, fans and lamps,
in local environments such as homes and offices.
As a first prototype still evolving from its ideation phase, it serves to mark a niche
as an innovative approach to home automation using the fast-evolving scientific
discipline of Computer Vision. We firmly believe that further research and
experimentation with this lateral approach, and its fusion with common technologies
such as RFID and infra-red sensors, can evolve these basic yet solid foundational
results considerably, taking the vision of smart houses from the realm of imagination
to actual practice, achieving multi-dimensional user convenience and using modern
technology to foster a sustainable environment in our everyday lives.
1. INTRODUCTION
1.1 Background
The dream of an intelligent home that automatically controls the living environment
and responds to individual preferences has been around, amidst much research, for
more than three decades. However, high costs, limited capability, difficulty of use
and reliability issues have imposed major constraints on the market, keeping
home automation more in the realm of imagination than practice.
Domotics is the field of study of specific automation requirements for homes, and
the application of innovative techniques for more comfort and convenience in and
around the home. At the time of our initial literature reading and survey, what
surprised us as a group was that much of the work in this growing field has been
carried out using various electronic devices, microcontrollers and sensor-based
approaches. The reasons for this are primarily attributable to cost control, which,
however, comes at the expense of flexibility and general practical use.
1.2 Objective
As computer science and engineering majors, the team, after much brainstorming
and ideation, decided to take up a unique challenge: to experiment with a fusion of
Domotics and the fast-evolving scientific discipline of Computer Vision. Computer
Vision is a branch of applied computing concerned with the computer processing of
images from the real world to extract useful information.
The aim of the project was to prototype, at a very foundational level, a flexible and
user-customizable system which would use low-cost cameras to intelligently control
basic domestic appliances, such as room lights, fans and lamps in local environments,
such as homes and offices.
2. PROJECT DESIGN
2.1 Architecture Overview
vSmart, once configured as per the needs of a user for a given environment,
maintains a knowledge base for the 'smart' behavior of various electronic appliances.
The deployed prototype uses low-cost, low-resolution web cameras to detect
motion activity within these local environments and feeds the detected activity to
the software engine. The detected motion signifies the presence and movements of
user(s) within the environment. The expert system then makes the decisions and, by
spawning multiple threads, concurrently controls the various serially interfaced
domestic hardware appliances set up on the prototype model house.
Figure 1 - The vSmart architecture
2.2 Modularization
The vSmart architecture, as shown in Figure 1, can be modularized as follows:
 System Deployment and Configuration
   Camera Deployments
   Camera Registration
   Defining HotSpots
   System Configuration
 Detecting Human Presence and Activity Monitoring
   Motion Detection
   MER Formation
   Threading
 The Decision-taking Software Engine
   Action Triggering
     HotSpot Overlap Calculations
     Season
     Time
   Reaction Generation
     Trigger-Time Wait
     Trigger Firing
 The Appliance-control Mechanism
   Serial Transmission
   Microcontroller
   Appliance Control
2.3 Model Design
vSmart was prototyped upon a realistic scaled-down model house. The model,
3.5’x3.5’x1.5’ in dimensions, was a double-walled structure for internal wiring, and
depicted a furnished two-bedroom house, with each room 1.5’x1.5’x1.5’ in
dimensions. Moreover, the elevated model had a thin cardboard base so that
magnets could be used to move magnetic toy-men, 1.5”x1”x2” in dimensions, to
depict natural human movements within the house. All furniture also had cuts in its
sides so that the toy-men could “move into” the furniture space.
Figure 2 – Model sketch-up: outer slide-up-and-removable walls, normal inner walls,
and wiring concealed in between by the removable outer wall
Figure 3 – Initial model-build snapshots
Figure 4 – Post-appliance setup snapshots (mounted cameras)
Low-cost A4Tech PK-5 web cameras were used for the vision-based system, both on
the actual model itself and during demo-testing in an actual student hostel room.
These 1/4" CMOS cameras, running at a frame rate of 30 fps at a 320x240 picture
resolution, had a view angle of 54° and were aligned for the maximum required
coverage within the given environment. The host system was a regular home
computer: a Pentium 4 (2.4 GHz) with 512 MB DDR2 RAM and a built-in VGA card.
Camera Specifications
Image Sensor: 1/4" CMOS, 640x480 pixels
Frame Rate: 30 fps @ 640x480, 800x600, 320x240, 160x120
Lens: F = 2.4, f = 4.9 mm
View Angle: 54 degrees
Focus Range: Automatic, 30 cm to infinity
Exposure Control: Automatic
White Balance: Automatic
Still Image Capture Resolution: 1280x960, 800x600, 640x480, 352x288, 320x240
Figure 5 – The A4Tech PK-5 web camera
2.4 Hardware Design
The electronic interfacing circuitry is divided into three parts, which are as follows:
 Serial interfacing
 Microcontroller interfacing
 Main circuitry
2.4.1 Serial interfacing:
The RS232 serial port is used to transmit data between the PC and the
microcontroller. One of the major functions of the serial port is to put data into a
serial format so that it can be transmitted, for example, via a modem. One good
feature is that RS232 needs only three wires between the PC and the
microcontroller: one line for data transmit, one line for data receive, and a common
ground between the two devices.
The drawback of using RS232 is that it uses negative logic, where a ‘1’ is -3V to -12V,
a ‘0’ is +3V to +12V, and the region from -3V to +3V is undefined. The
microcontroller uses standard TTL logic, so the RS232 signal has to be passed
through another device to convert the negative logic back to TTL. This adds
hardware to the system, which adds difficulty to production. A MAX233 chip can be
used for this purpose; it converts the negative logic to TTL and keeps the data in a
serial format.
Figure 6 – The RS232 Connector
IBM PC/Compatible computers based on x86 microprocessors normally have two
COM ports. Both COM ports have RS-232 type connectors. Many PCs use one each of
the DB-25 and DB-9 RS232 connectors. The COM ports are designated as COM 1 and
COM 2. At the present time COM 1 is used for the mouse and COM 2 is available for
devices such as a modem. The 89C51’s serial port can be connected to the COM 2
port of a PC for serial communication experiments.
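As an illustration only, below is a minimal sketch of how the PC-side application might open the COM port and transmit a single control byte using the .NET SerialPort class. The port name, baud rate and the meaning of the command byte shown here are assumptions for illustration, not the exact values used in the prototype.

using System;
using System.IO.Ports;

class SerialLink
{
    static void Main()
    {
        // Assumed settings: COM2 at 9600 baud, 8 data bits, no parity, 1 stop bit.
        // The actual prototype settings may differ.
        using (SerialPort port = new SerialPort("COM2", 9600, Parity.None, 8, StopBits.One))
        {
            port.Open();

            // A single command byte; here, hypothetically, the high nibble selects
            // the microcontroller pin and the low nibble carries the new state.
            byte command = 0x31;
            port.Write(new byte[] { command }, 0, 1);
        }
    }
}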
2.4.2 Microcontroller Interfacing
An ATMEL AT89C51 microcontroller is used in the project. The AT89C51 is a
low-power, high-performance CMOS 8-bit microcomputer with 4K bytes of Flash
programmable and erasable read-only memory (PEROM). The device is
manufactured using Atmel’s high-density nonvolatile memory technology. The
on-chip Flash allows the program memory to be reprogrammed in-system or by a
conventional nonvolatile memory programmer. By combining a versatile 8-bit CPU
with Flash on a monolithic chip, the Atmel AT89C51 is a powerful microcomputer
which provides a highly flexible and cost-effective solution to many embedded
control applications.
Figure 7 – The microcontroller snapshot
There are more advanced generations of microcontrollers available these days, but
as this prototype doesn’t require real-time data, the 89C51 fulfills the project
requirement. The 8051 has two pins that are used specifically for transmitting and
receiving data serially. These two pins are called TxD and RxD and are part of the
Port 3 group (P3.0 and P3.1). Pin 11 of the 89C51 is assigned to TxD and pin 10 is
designated as RxD. These pins are TTL compatible.
Since RS232 is not compatible with today’s microcontrollers, we need a line driver to
convert the RS232 signal to TTL voltage levels that will be acceptable to the TxD and
RxD pins. Examples of such converters are the MAX232 and MAX233. The MAX233
converts from RS232 voltage levels to TTL voltage levels, and vice versa. One
advantage of the MAX233 chip is that it uses a +5 V power source, which is the same
source as that of the microcontroller. The MAX233 has two sets of line drivers for
transmitting and receiving data. The line drivers used for TxD are called T1 and T2,
while the line drivers for RxD are designated R1 and R2. In this project only one of
each is used.
The signal sent to the microcontroller by the software carries the information
regarding which device is to be controlled and to which pin of the controller it is
connected. The microcontroller is programmed to interpret the signal so that it
identifies which electronic device, in which part of the house, it has to turn on or
off. Port 1 and Port 2 are used to connect 16 devices. The microcontroller has 4
ports, so one controller can be used to connect around 30 devices. To further
extend the number of devices to be controlled, latches can be connected to the
controller ports and then devices to the latches.
Figure 8 – Microcontroller connectivity to the serial port circuit diagram
2.4.3 Interface Circuitry:
The microcontroller is attached to the devices through the main interface circuitry,
which consists of the following components:
 2N2222 transistors
 Diodes
 Relays
 Capacitors
 Resistors
Figure 9 – Interface Circuitry
Transistors:
The microcontroller does not provide sufficient current to drive the relays.
Transistors are used to supply the current needed to drive the relays as well as to
protect the microcontroller from burning out. An electrical signal can be amplified by using a
device that allows a small current or voltage to control the flow of a much larger
current. Transistors are the basic devices providing control of this kind.
Modern transistors are divided into two main categories: bipolar junction transistors
(BJTs) and field effect transistors (FETs). Application of current in BJTs and voltage in
FETs between the input and common terminals increases the conductivity between
the common and output terminals, thereby controlling current flow between them.
Diodes:
Diodes allow electricity to flow in only one direction. The arrow of the circuit symbol
shows the direction in which the current can flow. Diodes are the electrical version
of a valve and early diodes were actually called valves.
Relays:
A relay is an electrical switch that opens and closes under the control of another
electrical circuit. Relays are connected to the devices and switch them off/on.
When a current flows through the coil, the resulting magnetic field attracts an
armature that is mechanically linked to a moving contact. The movement either
makes or breaks a connection with a fixed contact. When the current to the coil is
switched off, the armature is returned by a force approximately half as strong as the
magnetic force to its relaxed position. Most relays are manufactured to operate
quickly. In a low voltage application, this is to reduce noise. In a high voltage or high
current application, this is to reduce arcing.
The contacts in the relay are described as "normally open" (NO) or "normally closed"
(NC). This simply describes what the "at rest" state is. For a relay, that means if no
power is applied to the coil/trigger wire. In the typical case where something is to be
turned on, "normally open" set of contacts is used so that when power applied to
the relay, the contacts close, and power is sent to the desired device. In the case of
wanting to turn something off, "normally closed" set of contacts are used so that
when power is applied to the relay, the contacts open and the power is no longer
sent to the desired device.
Resistor:
A resistor is connected in the circuit for safety. If no resistor is added, then, since
VCC and ground are connected in series through the coil, there will be a heavy flow
of current, which can damage or burn the coil of the relay.
Figure 10 – The complete circuit diagram
2.5 Software Design
2.5.1 Database Design
The database diagram as seen in the SQL Server diagram view:
Figure 11 – The database diagram
The database, simple and small as it is, incorporates 5 tables:
 Cams
 Polygons
 Coords
 Appliances
 Reactions
Cams
This table contains the information about all the cameras plugged into the system
and their respective configurations.
id – The primary key of the table; each camera is assigned an ID
name – The moniker name of the camera
roomName – The room where a particular camera is deployed
autoDeploy – Whether the camera should be started automatically or manually
MERminHeight – The minimum height limit of the MER
MERmaxHeight – The maximum height limit of the MER
MERminWidth – The minimum width limit of the MER
MERmaxWidth – The maximum width limit of the MER
MERdiffThreshold – The value of the threshold filter for the camera
MERframesPerUpdate – The “speed” with which the background frame catches up with the current frame
MERframeSkipOption – Whether frames should be allowed to be dropped
MERframeSkip – The number of frames to drop each time
Polygons
This table contains basic information about the HotSpots marked for every camera.
camId – A foreign key representing the associated camera
id – The primary key of the table; each polygon is assigned an ID
name – The name given to the HotSpot polygon
Coords
This table contains the coordinate information for all HotSpots.
polyid – A foreign key representing the HotSpot polygon
CoodOrder – (Important) Used to return coordinates in the order they were saved
xCood – The x-coordinate marked for the various polygons
yCood – The y-coordinate marked for the various polygons
Appliances
This table contains the information about the appliances attached to the system for
a particular environment.
pinNo – The primary key of the table; also the pin of the hardware device
name – The name of the appliance
camID – A foreign key representing the associated camera
currentState – The present state (on/off) of the appliance
Reactions
This table alone forms the knowledge base of the mini expert system.
id – The primary key of the table; each reaction is assigned an ID
roomID – A foreign key representing the associated room
polygonID – A foreign key representing the associated HotSpot
appliancePinNo – A foreign key representing the associated appliance
action – Has Entered, Staying, or Leaving
isDay – Boolean differentiating between day and night
isSummer – Boolean differentiating between summer and winter
applianceFinalState – The final state the appliance is to be set to
applianceTriggerTime – The time to wait (ms) before the reaction is fired
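As an illustration only, the following is a minimal sketch of how the engine might load the reactions defined for one HotSpot using ADO.NET against the tables above. The connection string (including the assumed database name vSmart) and the method shape are assumptions for illustration, not the production code.

using System;
using System.Data.SqlClient;

// Minimal sketch: load and print all reactions defined for one HotSpot polygon.
public static void PrintReactionsForPolygon(int polygonId)
{
    // Assumed local SQL Server Express instance and database name.
    string connectionString = @"Data Source=.\SQLEXPRESS;Initial Catalog=vSmart;Integrated Security=True";

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand(
        "SELECT appliancePinNo, [action], isDay, isSummer, applianceFinalState, applianceTriggerTime " +
        "FROM Reactions WHERE polygonID = @polygonId", connection))
    {
        command.Parameters.AddWithValue("@polygonId", polygonId);
        connection.Open();

        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                Console.WriteLine("Pin {0}: on '{1}' set state {2} after {3} ms",
                    reader["appliancePinNo"], reader["action"],
                    reader["applianceFinalState"], reader["applianceTriggerTime"]);
            }
        }
    }
}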
3. PROJECT FUNCTIONALITY
The vSmart modules, as outlined in the system architecture above, integrate to
achieve complete functionality in the following steps:
3.1 System Deployment and Configuration
The system, before being put to use, is to be carefully deployed at least once.
System deployment involves mounting the cameras in the rooms at best-coverage
locations, registering them within the application (plug-and-play detection) and
then optimally configuring them for the particular environment in which they have
been deployed. It may be noted that, if needed, multiple cameras may be deployed
and registered with the system within the given locality.
Figure 12 – Camera registration
Figure 13 – Deployed-camera snapshots
Marking “HotSpots”:
Once this is done, the “HotSpots” are marked for the given environment. HotSpots
are enclosed regions (sets of coordinate points) marked on a “clean” camera-instanced
“base image”, which thereby define for the system the region of a particular
location, e.g. the bed, in that camera-captured environment. The marking is carried
out using a simple drawable mouse pointer.
Figure 14 – Defining HotSpots for the system
The system, hence, now “knows” where, for example, the bed, study table, sofa,
etc. are located in a particular room. It must be noted that these deployment
settings remain valid only as long as the furniture arrangement and the camera
positioning have not been changed within the environment for that particular
system-deployed camera.
The final step of system deployment is configuring, as per user requirements, the
decision-taking, multi-threaded software engine – the mini expert system. The
relational-database-based expert system contains a large number of standard,
pre-defined facts. An example of a built-in fact is that if, in the summer season and
during the afternoon, a person enters the room, then only the ceiling fan should
switch on by itself, whereas the lamps and lights need not. These rules are provided
as a standard; however, as mentioned above, they can easily be customized from
the application GUI as per the requirements of the users in the environment.
Moreover, based on changing seasons, the number of users and other conditions,
system notifications may also be generated to recommend to, or inquire from, the
users their preferences for appliance behavior in their environment.
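To make the summer-afternoon fan rule above concrete, here is a hedged illustration of how such a fact could be represented as one row of the Reactions table; the specific IDs, pin number and the 3000 ms trigger time are assumptions for illustration only.

// One illustrative Reactions row (values assumed): in summer, during the day,
// when a person Has Entered the room HotSpot, switch the ceiling fan on after 3 s.
var exampleRule = new
{
    roomID = 1,                 // assumed room ID
    polygonID = 4,              // assumed HotSpot ID for the room-entrance region
    appliancePinNo = 2,         // assumed microcontroller pin driving the ceiling fan
    action = "Has Entered",
    isDay = true,
    isSummer = true,
    applianceFinalState = true, // true = on
    applianceTriggerTime = 3000 // milliseconds
};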
Figure 15 – Initial system configurations
3.2 Human Presence Detection and Activity Monitoring
vSmart, following upon a vision-based approach, uses motion detection to judge the
presence of human(s) and the activity in any given environment.
The motion detection application is based on the AForge.NET framework.
AForge.NET is a C# framework designed for developers and researchers in the fields
of Computer Vision and Artificial Intelligence - image processing, neural networks,
genetic algorithms, machine learning, etc.
At this point, the framework comprises five main libraries and some additional ones:
 AForge.Imaging – a library of image processing routines and filters;
 AForge.Neuro – a neural networks computation library;
 AForge.Genetic – an evolutionary programming library;
 AForge.Vision – a computer vision library;
 AForge.MachineLearning – a machine learning library.
Work on improving the framework is in constant progress, which means that new
features and namespaces are added regularly.
For this project, we have used the framework’s Imaging and Vision libraries.
3.2.1 Motion Detection for Continuous Video Streams
There are many approaches to motion detection in a continuous video stream. All of
them are based on comparing the current video frame with one of the previous
frames, or with something that we'll call the background. Below, we describe some
of the most common approaches:
One of the most common approaches is to compare the current frame with the
previous one. It's useful in video compression when you need to estimate changes
and to write only the changes, not the whole frame. But it is not the best one for
motion detection applications. Describing the idea more closely:
Assume that we have an original 24 bpp RGB image called the current frame (image),
a grayscale copy of it (currentFrame) and the previous video frame, also grayscaled
(backgroundFrame). First of all, let's find the regions where these two frames differ.
For this purpose we can use the Difference and Threshold filters.
// create filters
Difference differenceFilter = new Difference( );
IFilter thresholdFilter = new Threshold( 15 );
// set background frame as an overlay for the difference filter
differenceFilter.OverlayImage = backgroundFrame;
// apply the filters
Bitmap tmp1 = differenceFilter.Apply( currentFrame );
Bitmap tmp2 = thresholdFilter.Apply( tmp1 );
At this step, we'll get an image with white pixels in the places where the current
frame differs from the previous frame by more than the specified threshold value.
It's already possible to count these pixels, and if their number is greater than a
predefined alarm level, we can signal a motion event.
But most cameras produce a noisy image, so we'll get motion detected in places
where there is no motion at all. To remove random noisy pixels, we can use an
Erosion filter, for example. Then we'll get mostly only the regions where the actual
motion was.
// create filter
IFilter erosionFilter = new Erosion( );
// apply the filter
Bitmap tmp3 = erosionFilter.Apply( tmp2 );
The simplest motion detector is ready! We can highlight the motion regions if
needed.
// extract red channel from the original image
IFilter extrachChannel = new ExtractChannel( RGB.R );
Bitmap redChannel = extrachChannel.Apply( image );
// merge red channel with motion regions
Merge mergeFilter = new Merge( );
mergeFilter.OverlayImage = tmp3;
Bitmap tmp4 = mergeFilter.Apply( redChannel );
// replace red channel in the original image
ReplaceChannel replaceChannel = new ReplaceChannel( RGB.R );
replaceChannel.ChannelImage = tmp4;
Bitmap tmp5 = replaceChannel.Apply( image );
Here is the result of it:
From the above picture we can see the disadvantages of this approach. If the object
is moving smoothly, we'll receive only small changes from frame to frame, so it's
impossible to capture the whole moving object. Things become worse when the
object is moving so slowly that the algorithm gives no result at all.
There is another approach: it's possible to compare the current frame not with the
previous one but with the first frame in the video sequence. If there were no
objects in the initial frame, comparing the current frame with the first one gives us
the whole moving object independently of its motion speed. But the approach has a
big disadvantage: what happens if there was, for example, a car in the first frame
which is later gone? We'll always have motion detected in the place where the car
was. Of course, we can renew the initial frame occasionally, but this still will not give
good results in cases where we cannot guarantee that the first frame contains only
static background. The inverse situation can also occur: if we hang a picture on the
wall in the room, we'll get motion detected until the initial frame is renewed.
The most efficient algorithms are based on building the so-called background of the
scene and comparing each current frame with that background. There are many
approaches to building the background, but most of them are too complex. We'll
describe Andrew Kirillov’s approach here for building the background; it's rather
simple and can be realized very quickly.
As in the previous case, let's assume that we have an original 24 bpp RGB image
called the current frame (image), a grayscale copy of it (currentFrame) and a
background frame, also grayscaled (backgroundFrame). At the beginning, we take the
first frame of the video sequence as the background frame, and then we always
compare the current frame with the background one. But on its own this gives the
result described above, which we obviously don't want. Our approach is instead to
"move" the background frame towards the current frame by a specified amount
(e.g. 1 level per frame): we change the colors of pixels in the background frame by
one level per frame in the direction of the current frame.
// create filter
MoveTowards moveTowardsFilter = new MoveTowards( );
// move background towards current frame
moveTowardsFilter.OverlayImage = currentFrame;
Bitmap tmp = moveTowardsFilter.Apply( backgroundFrame );
// dispose old background
backgroundFrame.Dispose( );
backgroundFrame = tmp;
Let P1 be the value of a pixel in the first image (the current frame) and P2 be the
value of the corresponding pixel in the second image (the background). We then
move the value of P2 towards the value of P1, thereby minimizing the difference
between P1 and P2. Hence, the formula is:
P2 += min( level, |P2 – P1| ) * sgn( P1 – P2 )
where level is the "speed" of moving the background frame towards the current
frame, and sgn(x) = 1 if x >= 0, sgn(x) = –1 if x < 0.
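To make the formula concrete, here is a minimal sketch of the per-pixel update for two grayscale intensity values. It is a simplification written only as an illustration of what the MoveTowards filter does internally; the method name is our own.

// Illustrative sketch only: applies the background-update formula to one
// grayscale pixel value. The AForge MoveTowards filter performs the
// equivalent update over every pixel of the background image.
static byte MoveTowardsValue(byte background, byte current, int level)
{
    int diff = current - background;              // P1 - P2
    int step = Math.Min(level, Math.Abs(diff));   // min(level, |P1 - P2|)
    int sign = (diff >= 0) ? 1 : -1;              // sgn(P1 - P2)
    return (byte)(background + step * sign);      // P2 += step * sgn(P1 - P2)
}

// Example: background pixel 100, current pixel 180, level 1  ->  returns 101.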
Now we can use the same approach as above, but let us extend it slightly to get a
more interesting result:
// create processing filters sequence
FiltersSequence processingFilter = new FiltersSequence( );
processingFilter.Add( new Difference( backgroundFrame ) );
processingFilter.Add( new Threshold( 15 ) );
processingFilter.Add( new Opening( ) );
processingFilter.Add( new Edges( ) );
// apply the filter
Bitmap tmp1 = processingFilter.Apply( currentFrame );
// extract red channel from the original image
IFilter extrachChannel = new ExtractChannel( RGB.R );
Bitmap redChannel = extrachChannel.Apply( image );
// merge red channel with moving object borders
Merge mergeFilter = new Merge( );
mergeFilter.OverlayImage = tmp1;
Bitmap tmp2 = mergeFilter.Apply( redChannel );
// replace red channel in the original image
ReplaceChannel replaceChannel = new ReplaceChannel( RGB.R );
replaceChannel.ChannelImage = tmp2;
Bitmap tmp3 = replaceChannel.Apply( image );
Now it looks much better!
There is another approach based on the same idea. As in the previous cases, we
have an original frame and grayscaled versions of it and of the background frame.
But let's apply a Pixellate filter to the current frame and to the background before
further processing.
// create filter
IFilter pixellateFilter = new Pixellate( );
// apply the filter
Bitmap newImage = pixellateFilter.Apply( image );
So, we have pixellated versions of the current and background frames. Now, we
need to move the background frame towards the current frame as we were doing
before. The only other change is in the main processing step:
// create processing filters sequence
FiltersSequence processingFilter = new FiltersSequence( );
processingFilter.Add( new Difference( backgroundFrame ) );
processingFilter.Add( new Threshold( 15 ) );
processingFilter.Add( new Dilatation( ) );
processingFilter.Add( new Edges( ) );
// apply the filter
Bitmap tmp1 = processingFilter.Apply( currentFrame );
After merging tmp1 image with the red channel of the original image, we'll get the
following image:
Maybe it does not look as good as the previous one, but the approach offers great
potential for performance optimization.
Looking at the previous picture, we can see that objects are highlighted with a curve
representing the moving object's boundary. But sometimes it is more desirable to
get a bounding rectangle of the object. And what should be done if we want not
just to highlight the objects, but also to get their count, position, width and height?
It can be done using the BlobCounter class from the AForge imaging library. Using
BlobCounter we can get the number of objects, their positions and their dimensions
in a binary image. So, let's apply it to the binary image containing the moving
objects, i.e. the result of the Threshold filter:
BlobCounter blobCounter = new BlobCounter( );
...
// get object rectangles
blobCounter.ProcessImage( thresholdedImage );
Rectangle[] rects = blobCounter.GetObjectRectangles( );
// create graphics object from initial image
Graphics g = Graphics.FromImage( image );
// draw each rectangle
using ( Pen pen = new Pen( Color.Red, 1 ) )
{
foreach ( Rectangle rc in rects )
{
g.DrawRectangle( pen, rc );
if ( ( rc.Width > 15 ) && ( rc.Height > 15 ) )
{
// here we can highlight large objects with something else
}
}
}
g.Dispose( );
Here is the result of this small piece of code:
Figure 16 – Motion detection algorithm
This in turn gives us efficient results and thereby proves to be relatively “easy” on
machine processing. The detected motion is grouped as per defined (alterable)
parameters, and eventually a MER – the Minimum Enclosed Rectangle – is formed.
Multiple people within a room result in multiple MERs being formed.
Figure 17 - MER Formation on an occupant (moving) within the room
Now, to answer the core question – “How can we know where in the room a person
is?” – the formed MER is checked for “overlap” with the outlined HotSpots.
At the time of initial deployment, once all the HotSpots have been marked, all the
coordinates which lie inside each HotSpot are determined once. The HotSpots,
however, generally being marked irregularly on the camera-instanced perspective
image, are saved as irregular closed polygons, which makes this process not so
straightforward. A rectangle is drawn around every HotSpot such that it completely
encloses the irregular polygon, and a grid of points is then drawn inside the
rectangle. Every point in that grid is checked to see whether it lies within the closed
complex polygon using the Point-in-Polygon algorithm.
3.2.2 The Point-In-Polygon Algorithm
The Point-in-Polygon algorithm compares each side of the polygon to the Y (vertical)
coordinate of the test point and compiles a list of nodes, where each node is a point
at which one side crosses the Y threshold of the test point. If there is an odd number
of nodes on each side of the test point, then it is inside the polygon; if there is an
even number of nodes on each side of the test point, then it is outside the polygon.
Figure 18.1 demonstrates a typical case of a severely concave polygon with 14 sides.
The red dot is a point which needs to be tested, to determine if it lies inside the
polygon.
Figure 18.2 shows what happens if the polygon crosses itself. In this example, a
ten-sided polygon has lines which cross each other. The effect is much like “exclusive
or,” or XOR as it is known to assembly-language programmers: the portions of the
polygon which overlap cancel each other out. So the test point is outside the
polygon, as indicated by the even number of nodes (two and two) on either side of it.
In Figure 18.3, the six-sided polygon does not overlap itself, but it does have lines
that cross. This is not a problem; the algorithm still works fine.
Figure 18.4 demonstrates the problem that results when a vertex of the polygon falls
directly on the Y threshold. Since sides a and b both touch the threshold, should they
both generate a node? No, because then there would be two nodes on each side of
the test point and so the test would say it was outside of the polygon, when it clearly
is not! The solution to this situation is simple: points which are exactly on the Y
threshold must be considered to belong to one side of the threshold. Let’s say we
arbitrarily decide that points on the Y threshold belong to the “above” side of the
threshold. Then side a generates a node, since it has one endpoint below the
threshold and its other endpoint on-or-above the threshold, while side b does not
generate a node, because both of its endpoints are on-or-above the threshold, so it
is not considered to be a threshold crossing.
Figure 18.5 shows the case of a polygon in which one of its sides lies entirely on the
threshold. Side c generates a node, because it has one endpoint below the threshold
and its other endpoint on-or-above the threshold. Side d does not generate a node,
because it has both endpoints on-or-above the threshold. And side e also does not
generate a node, because it has both endpoints on-or-above the threshold.
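A minimal C# sketch of this test, following Darel Rex Finley's formulation, is shown below; the method and parameter names are our own illustration, not the exact ones used in the system.

// Illustrative sketch of the Point-in-Polygon test. polyX/polyY hold the
// polygon's vertex coordinates; (x, y) is the test point. Returns true when
// an odd number of edge crossings lies to the left of the test point.
static bool PointInPolygon(float[] polyX, float[] polyY, float x, float y)
{
    bool inside = false;
    int j = polyX.Length - 1;

    for (int i = 0; i < polyX.Length; i++)
    {
        // Does the edge (i, j) cross the horizontal threshold through y?
        // Vertices exactly on the threshold count as "above" (>=), as described.
        if ((polyY[i] < y && polyY[j] >= y) || (polyY[j] < y && polyY[i] >= y))
        {
            // X coordinate of the crossing; count it if it is left of the test point.
            if (polyX[i] + (y - polyY[i]) / (polyY[j] - polyY[i]) * (polyX[j] - polyX[i]) < x)
            {
                inside = !inside;
            }
        }
        j = i;
    }
    return inside;
}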
All points in the grid which lie within the particular HotSpot polygon are then saved
in a special data structure. This cumbersome one-time process, upon completion,
gives approximately all the points which make up the respective HotSpots/regions of
particular locations on the perspective image.
To then check what percentage of a defined location the human user overlaps, all
these coordinates are checked to see if the rectangular MER contains them. The
percentage overlap of the HotSpot is the first judgment for recognizing human
activity, e.g. a person has entered the study-table region, a person has entered the
bed region, etc. This is the “Has Entered” type of HotSpot-based calculation.
It is also fundamental for us to know whether a person who has come and sat at the
study table is now “Leaving” the study table or is “Staying” there, since the behavior
of the appliances may be very different in the three cases. This is more obvious in
the example of the bed: upon ‘Has Entered’, perhaps the bed-side lamp should simply
switch on without disturbing any of the other appliances in the room; upon
‘Leaving’, the bed-side lamp should simply switch off, indicating that the person
perhaps just sat down on the bed for a short while and has now left; and finally,
upon ‘Staying’, the bed-side lamp stays lit, whereas after respective due time periods
the study-table lamp and then the room lights should switch off, indicating that the
person was initially lying on the bed and has now fallen asleep!
The algorithm differentiates between the three types of HotSpot calculation as
follows. ‘Has Entered’ is set simply when the percentage overlap crosses the pre-set
threshold; once ‘Has Entered’ is set, a timer is started. ‘Staying’ is set if the current
overlap value has decreased compared to the previous frame's overlap and,
additionally, if the MER has stayed inside the HotSpot for a specified duration of
time; once ‘Staying’ is set, the timer is reset. To ensure that ‘Leaving’ is only
triggered by a MER inside a particular HotSpot, the distance between the MER’s
centre and the HotSpot’s centre is first calculated and compared to a pre-set
minimum threshold; this ensures that the MER actually is within the HotSpot.
Thereafter, the overlap value falling below a low-set overlap threshold, less than the
earlier ‘Has Entered’ value (rather than plainly reaching 0, as motion may disappear
suddenly and reduce the overlap to 0 anyway), ensures that ‘Leaving’ can now be
safely set.
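A simplified sketch of this state logic in C# is given below; the class shape, member names and the threshold and timing values are our own illustration of the description above, not the production code.

// Illustrative sketch of the Has Entered / Staying / Leaving decision for one
// HotSpot. Thresholds, timings and member names are assumptions for illustration.
class HotSpotTracker
{
    const double EnterThreshold = 0.40;   // overlap fraction that sets 'Has Entered'
    const double LeaveThreshold = 0.10;   // low overlap fraction that allows 'Leaving'
    const double CentreDistanceMax = 40;  // max MER-to-HotSpot centre distance (pixels)
    const int StayingMillis = 5000;       // time the MER must remain inside the HotSpot

    bool hasEntered;
    DateTime enteredAt;
    double previousOverlap;

    public string Update(double overlap, double centreDistance)
    {
        string state = null;

        if (!hasEntered && overlap >= EnterThreshold)
        {
            hasEntered = true;
            enteredAt = DateTime.Now;          // start the timer
            state = "Has Entered";
        }
        else if (hasEntered && overlap < previousOverlap &&
                 (DateTime.Now - enteredAt).TotalMilliseconds >= StayingMillis)
        {
            enteredAt = DateTime.Now;          // reset the timer
            state = "Staying";
        }
        else if (hasEntered && centreDistance <= CentreDistanceMax &&
                 overlap <= LeaveThreshold)
        {
            hasEntered = false;
            state = "Leaving";
        }

        previousOverlap = overlap;
        return state;                          // null means no state change this frame
    }
}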
Figure 19 – The system application
3.3 The Decision-taking Software Engine
Other than the HotSpot overlap calculations made from the video-stream inputs,
there are two other input parameters which are vital for judging appliance behavior
in the system; these system-attained parameters are the Season and the Time of
Day. For example, for a person “Leaving” the room on a summer night, the room
lights and fans should not switch off immediately, as this may well leave the person
irritated in the dark. However, had it been summer but daytime, it would perhaps
have been alright to switch off the lights immediately, but the fans only after a little
while.
Together, the three parameters, namely the HotSpot overlap calculations, the
Season and the Time of Day, are input into the software engine. The engine, using
the relational-database-based expert system, checks all defined outputs (appliance
“reactions”) against the given inputs. For any given input set, all appliance states
may be affected, or none at all. A reaction generated from the triggered actions is
again a set of three parameters: the affected appliance, the new state it is to be set
to, and the triggering time. The last parameter, the triggering time, is what controls
the “irritation factor” in the system. As exemplified earlier, it is not desirable for
lights and fans to switch “on” and “off” immediately, and at the slightest of
movements, as this may become highly irritating for people. The triggering time
therefore sets a time limit for each appliance, generated from the knowledge base,
after which it may switch its state.
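A condensed sketch of this matching step follows; the field names mirror the Reactions table described earlier, while the Reaction class, the FireReaction method and the timer-based wait are our own illustration of the behaviour described above.

// Illustrative sketch: select the reactions matching the current inputs and
// fire each one only after its trigger time has elapsed.
void GenerateReactions(int polygonId, string action, bool isDay, bool isSummer,
                       Reaction[] knowledgeBase)
{
    foreach (Reaction r in knowledgeBase)
    {
        if (r.PolygonId == polygonId && r.Action == action &&
            r.IsDay == isDay && r.IsSummer == isSummer)
        {
            Reaction reaction = r;  // capture for the timer callback

            // Wait for the trigger time (the "irritation factor") before firing.
            System.Threading.Timer timer = null;
            timer = new System.Threading.Timer(_ =>
            {
                FireReaction(reaction.AppliancePinNo, reaction.ApplianceFinalState);
                timer.Dispose();
            }, null, reaction.ApplianceTriggerTime, System.Threading.Timeout.Infinite);
        }
    }
}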
3.4 The Appliance-control Mechanism
A reaction, after having waited for its respective trigger time, is then “fired”. A signal
is serially transmitted, via the RS232 port, to the electronic circuitry on the
breadboard. The signal, consisting of the microcontroller port information and a
“byte of data”, carries the information of the appliance to be controlled and the new
state it is to take. This is interpreted by a programmed ATMEL AT89C51
microcontroller on the electronic board, which thereby identifies the precise pin to
which that particular electronic device is attached and the bit signal (0/1) to be sent
to that pin. However, as the microcontroller does not provide sufficient current to
drive the relays controlling the electronic appliances (lights, fans, etc.), transistors
are used in between. Hence, upon the microcontroller firing the final bit signal, the
electronic devices are successfully switched on and off.
Figure 20 – The appliance control mechanism
Lastly, in a real-life practical scenario, there may be not a single but multiple people
within a room, engaged in different activities. For example, two people may enter a
room, where one sits down at the study table to study and the other sits on the sofa
to watch television. Clearly, there are then multiple MER formations within the
environment and, owing to the different activities, multiple HotSpots will be
overlapped at the same time. This means that different appliances may have to
behave differently, and yet in parallel. In the above example, the study-table lamp
should switch on for one of the persons, and so should the sofa-side lamp at the
same time for the other.
To cater for such parallel/simultaneous appliance control, vSmart employs the
concept of multi-threading. For every MER formation that takes place, a new thread
is spawned. Each child thread is then responsible for all tasks, starting from the
HotSpot overlap calculations through to the eventual reactions being fired off.
Thereafter, the thread is simply killed.
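A minimal sketch of this per-MER threading scheme is given below; the Mer type and the ProcessMer method are assumptions standing in for the actual per-occupant pipeline (overlap calculation through reaction firing).

using System.Threading;

// Illustrative sketch: spawn one worker thread per detected MER. The thread
// runs the processing pipeline for that occupant, then simply ends.
void OnMersDetected(Mer[] mers)
{
    foreach (Mer mer in mers)
    {
        Mer current = mer;  // capture the loop variable for the thread delegate
        Thread worker = new Thread(() => ProcessMer(current));
        worker.IsBackground = true;  // do not keep the application alive
        worker.Start();              // thread exits (is "killed") when ProcessMer returns
    }
}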
4. PROJECT LIMITATIONS
It is worth mentioning that the project’s computer-vision-based approach by no
means serves as a 100% complete solution for the home automation industry. In
fact, the project, still evolving from its ideation phase as a first prototype, has
significant limitations as well. Some of them are outlined below:
There is no way of identifying the different choices of different people within an
environment; the system lacks user identification and prioritization. For example,
Mr. X may want the study-table lamp to be switched on when he enters the room,
whereas Ms. Y, on the contrary, would find that of no sensible use and would desire
otherwise. How should the system then behave? With whom should it comply?
The system does not take into account the common “unusual” behavioral
aspects of people. For example, Mr. Z may have a habit of watching television at
night, whilst lying down in bed rather than seating himself on the sofa. Upon doing
so, the system would eventually detect Mr. Z to be “Staying” in bed, and assuming
that he is sleeping (still as he is perhaps), turn off all the room lights and appliances!
The low-cost CMOS cameras fail to function in complete darkness. For example, if at
night the system detects that a person is “Staying” in bed, it turns off the room
lights. If the person then wakes up later at night for a glass of water, the system
would not respond, since the normal cameras cannot see in complete darkness.
The very genuine privacy concerns of people. Having to use cameras to monitor user
activity in personal environments, such as bedrooms, raises genuine privacy concerns
and a feeling of uneasiness among people.
5. CONCLUSION
vSmart realizes, at a basic yet truly foundational level, a flexible and user-customizable
system that uses low-cost cameras to intelligently control basic domestic appliances,
such as room lights, fans and lamps, in local environments such as homes and offices.
As a first prototype still evolving from its ideation phase, it serves to mark a niche as
an innovative approach to home automation using the fast-evolving scientific
discipline of Computer Vision.
We firmly believe that further research and experimentation with this lateral
approach, and its fusion with common technologies such as RFID and infra-red
sensors, can evolve these basic yet solid foundational results considerably, taking the
vision of smart houses from the realm of imagination to actual practice, achieving
multi-dimensional user convenience and using modern technology to foster a
sustainable environment in our everyday lives.
REFERENCES
 Richard A. Quinnell, “Where is Home Automation Going?”, http://www.smarthouse.com.au/Automation/Industry/R7X7C6F8?page=1
 Andrew Kirillov, AForge.NET framework, http://code.google.com/p/aforge
 Andrew Kirillov, “Motion Detection Algorithms”, http://www.codeproject.com/KB/audio-video/Motion_Detection.aspx
 Darel Rex Finley, “Point-In-Polygon Algorithm”, http://www.alienryderflex.com/polygon
 Atmel AT89C51 Microcontroller, http://www.atmel.com/dyn/products/product_card.asp?part_id=1930
 Robert L. Boylestad and Louis Nashelsky, Electronic Devices and Circuit Theory, 9th Edition
THE TEAM
Team Members
Syed Waqas Ali Burney (2004185)
+ 92 300 4211047
mail@waqasburney.com
Mutahira Ikram Khan (2004136)
+ 92 300 3858722
mutahirakhan@gmail.com
Team Advisors
Mr. Badre Munir, FCSE
+ 92 345 9491428
badre.munir@gmail.com
Mr. Umar Shafique, FES
+ 92 321 5374563
umar.shafique@gmail.com