FINAL YEAR PROJECT THESIS

Project Members
Syed Waqas Burney, 2004185 [mail@waqasburney.com]
Mutahira Ikram Khan, 2004136 [mutahirakhan@gmail.com]

Project Supervisors
Mr. Badre Munir, FCSE [badr@giki.edu.pk]
Mr. Umar Shafique, FES [shafique@giki.edu.pk]

Faculty of Computer Science & Engineering
GIK Institute, Pakistan
[May, 2008]

Smart Control of Domestic Appliances using a Computer Vision-based Approach
Codename: imaGInation KIeve

CERTIFICATE OF APPROVAL

It is certified that the work contained in this thesis, entitled "vSmart - Smart Control of Domestic Appliances using a Computer Vision-based Approach", was carried out by Syed Waqas Ali Burney and Mutahira Ikram Khan under the supervision of Mr. Badre Munir, FCSE and Mr. Umar Shafique, FES, in partial fulfillment of the degree requirements of the Bachelor of Science in Computer System Engineering.

Project Advisors [May 10, 2009]
Mr. Badre Munir
Mr. Umar Shafique

ACKNOWLEDGEMENTS

First and foremost, we would like to thank Almighty Allah for Blessing us with all the energy, enthusiasm, knowledge, wisdom, courage and much more, to help us achieve our goals and complete our Final Year Project very successfully. We are thankful to our parents for providing us with their invaluable love and support, and for instilling in us the confidence and morale that set us out in the search for success.

We would also like to express our earnest gratitude to our advisors, Mr. Badre Munir and Mr. Umar Shafique, for their priceless guidance and encouragement throughout the course of the project, and for providing us constructive support at every step.

Last, but definitely not the least, we would like to thank our many student mentors and friends without whom this project would not have been possible. The deepest and most gratifying words of thanks go to our friends Mohammad Yousuf (GIKI, Class of 2007) and Hafiz Faheem Raza (GIKI, Class of 2007) for their endless support, heartfelt efforts, and for making this all possible. Thank you so much.

Due acknowledgements also to our mentors at Intel, Dr. Edwin Chung and Edwin Lee, and to the many others: Rana Mohammad Bilal (GIKI, Class of 2009), Iqbal Talat Bhatti (GIKI, Class of 2006), Mr. Murtaza Shabbir Safri (GIKI, Class of 2006), Mr. Junaid Shahid (GIKI, Class of 2006) and Mr. Mohammad Nasrullah (GIKI, Class of 2004). Thank you all!

Dedicated to our parents...

TABLE OF CONTENTS

PROJECT TITLE
CERTIFICATE OF APPROVAL
ACKNOWLEDGEMENTS
EXECUTIVE SUMMARY
1. INTRODUCTION
   1.1 BACKGROUND
   1.2 OBJECTIVE
2. PROJECT DESIGN
   2.1 ARCHITECTURE OVERVIEW
   2.2 MODULARIZATION
   2.3 MODEL DESIGN
   2.4 HARDWARE DESIGN
      2.4.1 SERIAL INTERFACING
      2.4.2 MICROCONTROLLER INTERFACING
      2.4.3 INTERFACE CIRCUITRY
   2.5 SOFTWARE DESIGN
      2.5.1 DATABASE DESIGN
3. PROJECT FUNCTIONALITY
   3.1 SYSTEM DEPLOYMENT AND CONFIGURATION
   3.2 HUMAN PRESENCE DETECTION AND ACTIVITY MONITORING
      3.2.1 MOTION DETECTION FOR CONTINUOUS VIDEO STREAMS
      3.2.2 THE POINT-IN-POLYGON ALGORITHM
   3.3 THE DECISION-TAKING SOFTWARE ENGINE
   3.4 THE APPLIANCE CONTROL MECHANISM
4. PROJECT LIMITATIONS
5. CONCLUSION
REFERENCES

TABLE OF FIGURES

FIGURE 01 - THE VSMART ARCHITECTURE
FIGURE 02 - MODEL SKETCH-UP
FIGURE 03 - INITIAL MODEL-BUILD SNAPSHOTS
FIGURE 04 - POST-APPLIANCE SETUP SNAPSHOTS
FIGURE 05 - A4TECH PK-5 WEB CAMERA
FIGURE 06 - THE RS232 CONNECTOR
FIGURE 07 - THE MICROCONTROLLER SNAPSHOT
FIGURE 08 - MICROCONTROLLER CONNECTIVITY TO THE SERIAL PORT
FIGURE 09 - THE INTERFACE CIRCUITRY
FIGURE 10 - THE CIRCUIT DIAGRAM
FIGURE 11 - THE DATABASE DIAGRAM
FIGURE 12 - CAMERA REGISTRATION
FIGURE 13 - DEPLOYED CAMERA-VIEW SNAPSHOTS
FIGURE 14 - DEFINING HOTSPOTS FOR THE SYSTEM
FIGURE 15 - INITIAL SYSTEM CONFIGURATIONS
FIGURE 16 - MOTION DETECTION ALGORITHMS
FIGURE 17 - MER FORMATION ON A MOVING OCCUPANT
FIGURE 18 - THE POINT-IN-POLYGON FIGURES
FIGURE 19 - THE SYSTEM APPLICATION
FIGURE 20 - THE APPLIANCE CONTROL MECHANISM

EXECUTIVE SUMMARY

vSmart (codenamed imaGInation KIeve) achieves, at a basic yet truly foundational level, a flexible and user-customizable system that uses low-cost cameras to intelligently control basic domestic appliances, such as room lights, fans and lamps, in local environments such as homes and offices. Still evolving from its ideation phase, this first prototype serves to mark a niche as an innovative approach to the concept of home automation using the evolving scientific discipline of Computer Vision. We firmly believe that further research and experimentation with this lateral approach, and its fusion with common technologies such as RFID and infra-red sensors, can evolve these basic yet solid and foundational results considerably; taking the vision of Smart Houses from the realm of imagination to actual practice, achieving multidimensional user convenience, and using modern technology to support a sustainable environment in our everyday lives.

1. INTRODUCTION

1.1 Background

The dream of an intelligent home that automatically controls the living environment and responds to individual preferences has been around, amidst much research, for more than three decades. However, high costs, limited capability, difficulty-of-use factors and reliability issues have imposed major constraints on the market, keeping home automation more in the realm of imagination than practice.

Domotics is the study of the specific automation requirements of homes, and the application of innovative techniques for more comfort and convenience in and around the home. At the time of our initial literature reading and survey, what surprised us as a group was that much of the work in this growing field has been carried out using various electronic devices, microcontrollers and sensor-based approaches. The reason for this is primarily cost control, which, however, compromises flexibility and general practical use.

1.2 Objective

As computer science and engineering majors, the team, after much brainstorming and ideation, decided to take up a unique challenge: to experiment with a fusion of Domotics and the evolving scientific discipline of Computer Vision. Computer Vision is a branch of applied computing concerned with the computer processing of images from the real world for the extraction of useful information. The aim of the project was to prototype, at a foundational level, a flexible and user-customizable system that would use low-cost cameras to intelligently control basic domestic appliances, such as room lights, fans and lamps, in local environments such as homes and offices.

2. PROJECT DESIGN

2.1 Architecture Overview

vSmart, once configured as per the needs of a user for a given environment, maintains a knowledge base for the 'smart' behavior of various electronic appliances.
Upon deployment, the prototype uses low-cost, low-resolution web cameras to detect motion activity within these local environments, and feeds the detected activity to the software engine. The motion detected signifies the presence and movements of user(s) within the environment. The expert system then makes the decisions and, by spawning multiple threads, concurrently controls the various serially-interfaced domestic hardware appliances set up on the prototype model house.

Figure 1 - The vSmart architecture

2.2 Modularization

The vSmart architecture, as shown in Figure 1, can be modularized as follows:

System Deployment and Configuration: Camera Deployment, Camera Registration, Defining HotSpots, System Configuration
Detecting Human Presence and Activity Monitoring: Motion Detection, MER Formation, Threading
The Decision-taking Software Engine: Action Triggering (HotSpot Overlap Calculations, Season, Time), Reaction Generation (Trigger Times, Wait, Trigger Firing)
The Appliance-control Mechanism: Serial Transmission, Microcontroller, Appliance Control

2.3 Model Design

vSmart was prototyped upon a realistic scaled-down model house. The model, 3.5'x3.5'x1.5' in dimensions, was a double-walled structure to allow internal wiring, and depicted a furnished two-bedroom house with rooms of 1.5'x1.5'x1.5' each. Moreover, the elevated model had a thin cardboard base so that magnets could be used to move magnetic toy-men, 1.5"x1"x2" in dimensions, and thereby depict natural human movement within the house. All furniture also had cuts drawn on its sides for the toy-men to "move into" the furniture space.

Figure 2 - Model sketch-up (outer slide-up-and-removable walls, with the wiring concealed in between by the removable outer wall)

Figure 3 - Initial model-build snapshots

Figure 4 - Post-appliance setup snapshots

Cameras

Low-cost A4tech PK-5 web cameras were used for the vision-based system, both on the actual model itself and during demo-testing in an actual student hostel room. These 1/4" CMOS cameras, having a frame rate of 30fps at a 320x240 picture resolution and a view angle of 54 degrees, were aligned for maximum required coverage within the given environment. The system used was a regular home computer: a Pentium 4 (2.4GHz) with 512MB DDR2 RAM and an in-built VGA card.

Camera Specifications
   Image Sensor: 1/4" CMOS, 640x480 pixels
   Frame Rate: 30fps @ 640x480, @ 600x800, @ 320x240, @ 160x120
   Lens: F=2.4, f=4.9mm
   View Angle: 54 degrees
   Focus Range: automatic focus, 30cm to infinity
   Exposure Control: automatic
   White Balance: automatic
   Still Image Capture Res.: 1280x960, 600x800, 640x480, 352x288, 320x240

Figure 5 - A4tech PK-5 web camera

2.4 Hardware Design

The electronic interfacing circuitry is divided into three parts:

Serial interfacing
Microcontroller interfacing
Main circuitry

2.4.1 Serial Interfacing

The RS232 serial port is used to transmit data between the PC and the microcontroller. One of the major functions of the serial port is to put data into a serial format so that it can be transmitted, for example via modem. One good feature is that RS232 needs only three wires between the PC and the microcontroller: one line for data transmit, one line for data receive, and a common ground between the two devices. The drawback of RS232 is that it uses negative logic, where a '1' is -3V to -12V and a '0' is +3V to +12V, and the region from -3V to +3V is undefined.
The microcontroller uses standard TTL logic, so the RS232 signal has to be passed through another device to convert the negative logic back to TTL. This adds hardware to the system, which adds difficulty to production. A MAX233 chip can be used, which converts the negative logic to TTL and keeps the data in a serial format.

Figure 6 - The RS232 connector

IBM PC/compatible computers based on x86 microprocessors normally have two COM ports, both with RS232-type connectors. Many PCs use one each of the DB-25 and DB-9 RS232 connectors. The COM ports are designated COM 1 and COM 2. At the present time COM 1 is typically used for the mouse, and COM 2 is available for devices such as a modem. The 89C51 serial port can be connected to the COM 2 port of a PC for serial communications experiments.

2.4.2 Microcontroller Interfacing

The ATMEL AT89C51 microcontroller is used in the project. The AT89C51 is a low-power, high-performance CMOS 8-bit microcomputer with 4K bytes of Flash programmable and erasable read-only memory (PEROM). The device is manufactured using Atmel's high-density nonvolatile memory technology. The on-chip Flash allows the program memory to be reprogrammed in-system or by a conventional nonvolatile memory programmer. By combining a versatile 8-bit CPU with Flash on a monolithic chip, the Atmel AT89C51 is a powerful microcomputer which provides a highly flexible and cost-effective solution for many embedded control applications. More advanced generations of microcontrollers are available these days, but as this prototype does not require real-time data, the 89C51 fulfills the project requirements.

Figure 7 - The microcontroller snapshot

The 8051 has two pins that are used specifically for transferring and receiving data serially. These two pins, called TxD and RxD, are part of the Port 3 group (P3.0 and P3.1). Pin 11 of the 89C51 is assigned to TxD and pin 10 is designated as RxD. These pins are TTL compatible. Since RS232 is not compatible with today's microcontrollers, we need a line driver to convert the RS232 signal to TTL voltage levels acceptable to the TxD and RxD pins. Examples of such converters are the MAX232 and MAX233. The MAX233 converts from RS232 voltage levels to TTL voltage levels, and vice versa. One advantage of the MAX233 chip is that it uses a +5V power source, the same source as that of the microcontroller. The MAX233 has two sets of line drivers for transferring and receiving data: the line drivers used for TxD are called T1 and T2, while the line drivers for RxD are designated R1 and R2. In this project only one of each is used.

The signal sent to the microcontroller by the software carries information about which device is to be controlled and to which pin of the controller it is connected. The microcontroller is programmed to interpret the signal so that it identifies which electronic device, in which part of the house, it has to turn on or off. Ports 1 and 2 are used to connect 16 devices. The microcontroller has four ports, and one controller can be used to connect 30 devices. To further extend the number of devices to be controlled, latches can be connected to the controller ports and the devices to the latches.

Figure 8 - Microcontroller connectivity to the serial port (circuit diagram)
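For illustration, the following is a minimal PC-side sketch, in C#, of how a single command could be sent over the serial port using the standard System.IO.Ports classes. The COM port, baud rate and byte encoding shown are assumptions made only for this sketch; the thesis describes the vSmart framing only as port information plus a byte of data, so this is not the actual vSmart protocol.

// Minimal PC-side sketch of sending one command byte to the 89C51 over RS232.
// Port name, baud rate and byte layout are illustrative assumptions only.
using System;
using System.IO.Ports;

class ApplianceLink
{
    static void Main()
    {
        // Assumed settings: COM2, 9600 baud, 8-N-1 (a typical setup for 8051 UART experiments).
        using (SerialPort port = new SerialPort("COM2", 9600, Parity.None, 8, StopBits.One))
        {
            port.Open();

            // Assumed encoding: upper nibble = controller pin number, lowest bit = new state.
            byte pinNo = 3;          // pin the appliance is wired to (cf. the Appliances table)
            byte newState = 1;       // 1 = switch on, 0 = switch off
            byte command = (byte)((pinNo << 4) | newState);

            port.Write(new byte[] { command }, 0, 1);   // transmit the single command byte
        }
    }
}

On the far side, the microcontroller firmware would decode the same byte and drive the corresponding pin, as described in Section 3.4.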
2.4.3 Interface Circuitry

The microcontroller is attached to the devices through the main interface circuitry, which consists of the following components:

2N2222 transistors
Diodes
Relays
Capacitors
Resistors

Figure 9 - The interface circuitry

Transistors: The microcontroller does not provide sufficient current to drive the relays. Transistors are used to provide sufficient current to drive the relays, as well as to protect the microcontroller from burning out. An electrical signal can be amplified by using a device that allows a small current or voltage to control the flow of a much larger current; transistors are the basic devices providing control of this kind. Modern transistors are divided into two main categories: bipolar junction transistors (BJTs) and field-effect transistors (FETs). Applying a current (in BJTs) or a voltage (in FETs) between the input and common terminals increases the conductivity between the common and output terminals, thereby controlling the current flow between them.

Diodes: Diodes allow electricity to flow in only one direction. The arrow of the circuit symbol shows the direction in which the current can flow. Diodes are the electrical version of a valve, and early diodes were actually called valves.

Relays: A relay is an electrical switch that opens and closes under the control of another electrical circuit. Relays are connected to the devices and switch them on/off. When a current flows through the coil, the resulting magnetic field attracts an armature that is mechanically linked to a moving contact. The movement either makes or breaks a connection with a fixed contact. When the current to the coil is switched off, the armature is returned, by a force approximately half as strong as the magnetic force, to its relaxed position. Most relays are manufactured to operate quickly: in a low-voltage application this is to reduce noise, and in a high-voltage or high-current application it is to reduce arcing. The contacts in the relay are described as "normally open" (NO) or "normally closed" (NC). This simply describes the "at rest" state, i.e. when no power is applied to the coil/trigger wire. In the typical case where something is to be turned on, a "normally open" set of contacts is used, so that when power is applied to the relay the contacts close and power is sent to the desired device. When something is to be turned off, a "normally closed" set of contacts is used, so that when power is applied to the relay the contacts open and power is no longer sent to the desired device.

Resistors: A resistor is connected in the circuit for safety. If no resistor were added, with VCC and ground connected in series through the relay coil, a heavy current would flow, which could damage or burn the coil of the relay.
Figure 10 - The complete circuit diagram

2.5 Software Design

2.5.1 Database Design

The database diagram, as seen in the SQL Server diagram view:

Figure 11 - The database diagram

The database, simple and small as it is, incorporates five tables:

Cams
Polygons
Coords
Appliances
Reactions

Cams
The table contains information about all the cameras plugged into the system and their respective configurations.
   id - The primary key of the table; each camera is assigned an ID
   name - The moniker name of the camera
   roomName - The room where a particular camera is deployed
   autoDeploy - Sets whether the camera should be started automatically or manually
   MERminHeight - The minimum height limit of the MER
   MERmaxHeight - The maximum height limit of the MER
   MERminWidth - The minimum width limit of the MER
   MERmaxWidth - The maximum width limit of the MER
   MERdiffThreshold - The value of the threshold filter for the camera
   MERframesPerUpdate - The "speed" with which the background frame catches up with the current frame
   MERframeSkipOption - Sets whether frames should be allowed to be dropped
   MERframeSkip - The number of frames to drop every time

Polygons
The table contains basic information about the HotSpots marked for every camera.
   camId - A foreign key representing the associated camera
   Id - The primary key of the table; each polygon is assigned an ID
   name - The name of the HotSpot polygon

Coords
The table contains the coordinate information for all HotSpots.
   polyid - A foreign key representing the HotSpot polygon
   CoodOrder - (Important) Returns the coordinates in the order they were saved
   xCood - The x-coordinate marked for the various polygons
   yCood - The y-coordinate marked for the various polygons

Appliances
The table contains information about the appliances attached to the system for a particular environment.
   pinNo - The primary key of the table; also the pin of the hardware device
   name - The name of the appliance
   camID - A foreign key representing the associated camera
   currentState - The present state (on/off) of the appliance

Reactions
This table alone forms the knowledge base of the mini expert system.
   id - The primary key of the table; each reaction is assigned an ID
   roomID - A foreign key representing the associated room
   polygonID - A foreign key representing the associated HotSpot
   appliancePinNo - A foreign key representing the associated appliance
   action - The action type: Has Entered, Staying, or Leaving
   isDay - The boolean differentiating between day and night
   isSummer - The boolean differentiating between summer and winter
   applianceFinalState - The final state the appliance is to be set to
   applianceTriggerTime - The time-of-wait (ms) before the reaction is fired

3. PROJECT FUNCTIONALITY

The vSmart modules, as outlined in the system architecture above, integrate to achieve complete functionality in the following steps:

3.1 System Deployment and Configuration

The system, before being put to use, must be carefully deployed at least once. System deployment involves mounting the cameras in the rooms at best-coverage locations, registering them within the application (plug-and-play detection), and then optimally configuring them for the particular environment in which they have been deployed. It may be noted that, if needed, multiple cameras may be deployed and registered with the system within the given locality. A minimal sketch of how such camera registration might look in code is given below.

Figure 12 - Camera registration

Figure 13 - Deployed camera-view snapshots
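The following sketch illustrates plug-and-play camera enumeration and registration, assuming the AForge.Video.DirectShow classes (FilterInfoCollection, VideoCaptureDevice); the moniker string it prints is the kind of value stored in the name attribute of the Cams table. The database write and the frame handler are only placeholders, not the actual vSmart implementation.

// A minimal sketch of camera enumeration and registration, assuming the
// AForge.Video.DirectShow classes; storing the moniker into the Cams table is omitted.
using System;
using System.Drawing;
using AForge.Video;
using AForge.Video.DirectShow;

class CameraRegistration
{
    static void Main()
    {
        // Enumerate all video input devices currently plugged into the system.
        FilterInfoCollection devices = new FilterInfoCollection(FilterCategory.VideoInputDevice);

        foreach (FilterInfo device in devices)
        {
            // The moniker string uniquely identifies the camera; this is what
            // the Cams table's "name" attribute would hold.
            Console.WriteLine("Found camera: " + device.Name + " [" + device.MonikerString + "]");
        }

        if (devices.Count > 0)
        {
            // Open the first camera and subscribe to its frame events.
            VideoCaptureDevice videoSource = new VideoCaptureDevice(devices[0].MonikerString);
            videoSource.NewFrame += new NewFrameEventHandler(OnNewFrame);
            videoSource.Start();

            Console.ReadLine();          // run until the user presses Enter
            videoSource.SignalToStop();  // ask the capture thread to stop
        }
    }

    // Placeholder handler: in vSmart this is where the motion-detection
    // pipeline described in Section 3.2 would receive each frame.
    static void OnNewFrame(object sender, NewFrameEventArgs eventArgs)
    {
        Bitmap frame = eventArgs.Frame;
        // ... process the frame ...
    }
}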
Marking "HotSpots": Once this is done, the "HotSpots" are marked for the given environment. HotSpots are enclosed regions (sets of coordinate points) marked on a "clean" camera-instanced "base image", which thereby define for the system the region of a particular location, e.g. the bed, in that camera-captured environment. The marking is carried out by simply drawing with the mouse pointer.

Figure 14 - Defining HotSpots for the system

The system hence now "knows" where, for example, the bed, study table and sofa are located in a particular room. It must be noted that these deployment settings remain valid only as long as neither the furniture arrangement nor the camera positioning is changed within the environment for that particular system-deployed camera.

The final step of system deployment is configuring, as per user requirements, the decision-taking, multi-threaded software engine: the mini expert system. The relational database-based expert system contains a large number of standard, predefined facts. An example of a built-in fact is that if a person enters the room in the summer season during afternoon day-time, then only the ceiling fan should switch on by itself, whereas the lamps and lights need not. These rules are provided as a standard; however, as mentioned above, they can easily be customized from the application GUI, as per the requirements of the users in the environment. Moreover, based on changing seasons, the number of users and other conditions, system notifications may also be generated to recommend to the users, or inquire from them, their preferences for appliance behavior in their environment.

Figure 15 - Initial system configurations

3.2 Human Presence Detection and Activity Monitoring

vSmart, following a vision-based approach, uses motion detection to judge the presence of human(s) and their activity in any given environment. The motion detection application is based on the AForge.NET framework. AForge.NET is a C# framework designed for developers and researchers in the fields of Computer Vision and Artificial Intelligence: image processing, neural networks, genetic algorithms, machine learning, etc. At this point the framework comprises five main libraries and some additional ones:

AForge.Imaging - a library for image processing routines and filters
AForge.Neuro - neural networks computation library
AForge.Genetic - evolutionary programming library
AForge.Vision - computer vision library
AForge.MachineLearning - machine learning library

Work on improving the framework is in constant progress, which means that new features and namespaces are added regularly. For this project, we have used the framework's Imaging and Vision libraries.

3.2.1 Motion Detection for Continuous Video Streams

There are many approaches to motion detection in a continuous video stream. All of them are based on comparing the current video frame with one of the previous frames, or with something that we will call the background. This section describes some of the most common approaches.

One of the most common approaches is to compare the current frame with the previous one. It is useful in video compression, where you need to estimate changes and write only the changes, not the whole frame; but it is not the best approach for motion detection applications. Describing the idea more closely: assume that we have an original 24 bpp RGB image called the current frame (image), a grayscale copy of it (currentFrame), and the previous video frame, also grayscaled (backgroundFrame).
First of all, let us find the regions where these two frames differ. For this purpose we can use the Difference and Threshold filters:

// create filters
Difference differenceFilter = new Difference( );
IFilter thresholdFilter = new Threshold( 15 );
// set background frame as an overlay for the difference filter
differenceFilter.OverlayImage = backgroundFrame;
// apply the filters
Bitmap tmp1 = differenceFilter.Apply( currentFrame );
Bitmap tmp2 = thresholdFilter.Apply( tmp1 );

At this step we get an image with white pixels in the places where the current frame differs from the previous frame by more than the specified threshold value. It is already possible to count these pixels and, if their number is greater than a predefined alarm level, signal a motion event. But most cameras produce a noisy image, so we would detect motion in places where there is no motion at all. To remove random noisy pixels, we can use an Erosion filter, for example. We then get mostly only the regions where the actual motion was:

// create filter
IFilter erosionFilter = new Erosion( );
// apply the filter
Bitmap tmp3 = erosionFilter.Apply( tmp2 );

The simplest motion detector is ready! We can highlight the motion regions if needed:

// extract red channel from the original image
IFilter extractChannel = new ExtractChannel( RGB.R );
Bitmap redChannel = extractChannel.Apply( image );
// merge red channel with motion regions
Merge mergeFilter = new Merge( );
mergeFilter.OverlayImage = tmp3;
Bitmap tmp4 = mergeFilter.Apply( redChannel );
// replace red channel in the original image
ReplaceChannel replaceChannel = new ReplaceChannel( RGB.R );
replaceChannel.ChannelImage = tmp4;
Bitmap tmp5 = replaceChannel.Apply( image );

From the resulting picture we can see the disadvantages of the approach. If the object is moving smoothly, we receive only small changes from frame to frame, so it is impossible to get the whole moving object. Things become worse when the object is moving so slowly that the algorithm gives no result at all.

There is another approach: it is possible to compare the current frame not with the previous one, but with the first frame in the video sequence. If there were no objects in the initial frame, comparing the current frame with the first one gives us the whole moving object independently of its motion speed. But this approach has a big disadvantage: what happens if there was, for example, a car in the first frame which is later gone? Motion will always be detected in the place where the car was. Of course, we can renew the initial frame occasionally, but this still does not give good results in cases where we cannot guarantee that the first frame contains only static background. There can also be the inverse situation: if we hang a new picture on the wall of the room, motion will be detected until the initial frame is renewed.

The most efficient algorithms are based on building the so-called background of the scene and comparing each current frame with that background. There are many approaches to building the background, but most of them are too complex. We describe Andrew Kirillov's approach here for building the background; it is rather simple and can be realized very quickly. As in the previous case, assume that we have an original 24 bpp RGB image called the current frame (image), a grayscale copy of it (currentFrame), and a background frame, also grayscaled (backgroundFrame).
At the beginning, we take the first frame of the video sequence as the background frame, and then we always compare the current frame with the background one. But this would give us the result described above, which we obviously do not want. Our approach is instead to "move" the background frame towards the current frame by a specified amount (e.g. 1 level per frame): we change the colors of pixels in the background frame by one level per frame.

// create filter
MoveTowards moveTowardsFilter = new MoveTowards( );
// move background towards current frame
moveTowardsFilter.OverlayImage = currentFrame;
Bitmap tmp = moveTowardsFilter.Apply( backgroundFrame );
// dispose old background
backgroundFrame.Dispose( );
backgroundFrame = tmp;

Let P1 be the value of a pixel in the first image (current frame) and P2 the value in the second image (background). We then move the value of P2 towards the value of P1, thereby minimizing the difference between P1 and P2. Hence, the formula is:

P2 += min( level, |P2 - P1| ) * sgn( P1 - P2 )

where level is the "speed" of moving the background frame towards the current frame, and sgn(x) = 1 if x >= 0, sgn(x) = -1 if x < 0.

Now we can use the same approach as above, but let us extend it slightly to get a more interesting result:

// create processing filters sequence
FiltersSequence processingFilter = new FiltersSequence( );
processingFilter.Add( new Difference( backgroundFrame ) );
processingFilter.Add( new Threshold( 15 ) );
processingFilter.Add( new Opening( ) );
processingFilter.Add( new Edges( ) );
// apply the filter
Bitmap tmp1 = processingFilter.Apply( currentFrame );
// extract red channel from the original image
IFilter extractChannel = new ExtractChannel( RGB.R );
Bitmap redChannel = extractChannel.Apply( image );
// merge red channel with moving object borders
Merge mergeFilter = new Merge( );
mergeFilter.OverlayImage = tmp1;
Bitmap tmp2 = mergeFilter.Apply( redChannel );
// replace red channel in the original image
ReplaceChannel replaceChannel = new ReplaceChannel( RGB.R );
replaceChannel.ChannelImage = tmp2;
Bitmap tmp3 = replaceChannel.Apply( image );

Now it looks much better!

There is another approach based on the same idea. As in the previous cases, we have an original frame, a grayscaled version of it and a grayscaled background frame, but this time we apply a Pixellate filter to the current frame and to the background before further processing.

// create filter
IFilter pixellateFilter = new Pixellate( );
// apply the filter
Bitmap newImage = pixellateFilter.Apply( image );

So, we have pixellated versions of the current and background frames. Now we need to move the background frame towards the current frame, as we were doing before. The only other change is the main processing step:

// create processing filters sequence
FiltersSequence processingFilter = new FiltersSequence( );
processingFilter.Add( new Difference( backgroundFrame ) );
processingFilter.Add( new Threshold( 15 ) );
processingFilter.Add( new Dilatation( ) );
processingFilter.Add( new Edges( ) );
// apply the filter
Bitmap tmp1 = processingFilter.Apply( currentFrame );

After merging the tmp1 image with the red channel of the original image, we get the resulting image. It may not look as clean as the previous one, but the approach offers great potential for performance optimization.
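Before moving on, it is worth making the background-update formula above concrete. The following is a minimal per-pixel sketch, assuming 8-bit grayscale frames held in plain byte arrays; the AForge MoveTowards filter performs the equivalent operation directly on Bitmap data, so this code is illustrative only.

// A minimal per-pixel sketch of the background update rule, assuming the
// grayscale frames are held as byte arrays of equal length.
static void MoveBackgroundTowardsCurrent(byte[] background, byte[] current, int level)
{
    for (int i = 0; i < background.Length; i++)
    {
        int p1 = current[i];     // pixel value in the current frame
        int p2 = background[i];  // pixel value in the background frame

        int diff = p1 - p2;
        int step = Math.Min(level, Math.Abs(diff));   // never move more than 'level' per frame

        // P2 += min(level, |P2 - P1|) * sgn(P1 - P2)
        background[i] = (byte)(p2 + (diff >= 0 ? step : -step));
    }
}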
In the resulting pictures, objects are highlighted with a curve which represents the moving object's boundary. Sometimes, however, it is preferable to get a rectangle around the object. Moreover, what should be done if we want not just to highlight the objects, but also to get their count, position, width and height? This can be done using the BlobCounter class from the AForge imaging library. Using BlobCounter we can get the number of objects, their positions and their dimensions on a binary image. So, let us apply it to the binary image containing the moving objects, i.e. the result of the Threshold filter:

BlobCounter blobCounter = new BlobCounter( );
...
// get object rectangles
blobCounter.ProcessImage( thresholdedImage );
Rectangle[] rects = blobCounter.GetObjectRectangles( );
// create graphics object from initial image
Graphics g = Graphics.FromImage( image );
// draw each rectangle
using ( Pen pen = new Pen( Color.Red, 1 ) )
{
    foreach ( Rectangle rc in rects )
    {
        g.DrawRectangle( pen, rc );
        if ( ( rc.Width > 15 ) && ( rc.Height > 15 ) )
        {
            // here we can highlight large objects with something else
        }
    }
}
g.Dispose( );

Here is the result of this small piece of code:

Figure 16 - Motion detection algorithms

This in turn gives us efficient results and thereby proves to be relatively "easy" on machine processing. The motion detected is grouped as per defined (alterable) parameters, and eventually a MER, the Minimum Enclosed Rectangle, is formed. Multiple people within a room result in multiple MERs being formed.

Figure 17 - MER formation on a moving occupant within the room

Now, to answer the core question, "How can we know where in the room a person is?", the MER formed is checked for "overlap" with the outlined HotSpots. At the time of initial deployment, upon marking all the HotSpots, all the coordinates which lie inside their respective HotSpots are determined once. These HotSpots, however, generally being marked irregularly on the camera-instanced perspective image, are saved as irregular closed polygons, which makes this process not so straightforward. A rectangle is drawn around every HotSpot such that it completely encloses the irregular polygon, and a point grid is then drawn inside the rectangle. Every point in that grid is then checked to see whether it lies within the closed complex polygon, using the Point-In-Polygon algorithm.

3.2.2 The Point-In-Polygon Algorithm

The Point-In-Polygon algorithm compares each side of the polygon to the Y (vertical) coordinate of the test point, and compiles a list of nodes, where each node is a point at which a side crosses the Y threshold of the test point. If there is an odd number of nodes on each side of the test point, the point is inside the polygon; if there is an even number of nodes on each side of the test point, the point is outside the polygon. A minimal sketch of this test is given below, followed by the special cases it has to handle.
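The following sketch of the crossing test is adapted from the even-odd idea credited to Darel Rex Finley in the references; the function and variable names are illustrative and not taken from the vSmart source.

// A minimal C# sketch of the crossing test described above. Each edge is checked
// against the horizontal line through the test point; crossings to the left of
// the point toggle the inside/outside flag.
static bool PointInPolygon(float[] polyX, float[] polyY, float x, float y)
{
    bool inside = false;
    int n = polyX.Length;
    int j = n - 1;  // index of the previous vertex

    for (int i = 0; i < n; i++)
    {
        // Does the edge (j, i) cross the Y threshold of the test point?
        // The mixed "< / >=" comparison implements the convention that points
        // exactly on the threshold belong to the "above" side (Figures 18.4, 18.5).
        if ((polyY[i] < y && polyY[j] >= y) || (polyY[j] < y && polyY[i] >= y))
        {
            // Compute the X coordinate of the crossing and count it only if it
            // lies to the left of the test point.
            if (polyX[i] + (y - polyY[i]) / (polyY[j] - polyY[i]) * (polyX[j] - polyX[i]) < x)
            {
                inside = !inside;   // an odd number of crossings so far means "inside"
            }
        }
        j = i;
    }
    return inside;
}

In vSmart terms, each point of the grid drawn inside a HotSpot's bounding rectangle would be passed through such a test once, at deployment time, and the points found to be inside are saved.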
Figure 18.1 demonstrates a typical case of a severely concave polygon with 14 sides. The red dot is a point which needs to be tested, to determine whether it lies inside the polygon.

Figure 18.2 shows what happens if the polygon crosses itself. In this example, a ten-sided polygon has lines which cross each other. The effect is much like "exclusive or", or XOR as it is known to assembly-language programmers: the portions of the polygon which overlap cancel each other out. So the test point is outside the polygon, as indicated by the even number of nodes (two and two) on either side of it.

In Figure 18.3, the six-sided polygon does not overlap itself, but it does have lines that cross. This is not a problem; the algorithm still works fine.

Figure 18.4 demonstrates the problem that results when a vertex of the polygon falls directly on the Y threshold. Since sides a and b both touch the threshold, should they both generate a node? No, because then there would be two nodes on each side of the test point, and the test would say the point was outside of the polygon when it clearly is not. The solution to this situation is simple: points which are exactly on the Y threshold must be considered to belong to one side of the threshold. Let us arbitrarily decide that points on the Y threshold belong to the "above" side of the threshold. Then side a generates a node, since it has one endpoint below the threshold and its other endpoint on-or-above the threshold; side b does not generate a node, because both of its endpoints are on-or-above the threshold, so it is not considered a threshold crossing.

Figure 18.5 shows the case of a polygon in which one of its sides lies entirely on the threshold. Side c generates a node, because it has one endpoint below the threshold and its other endpoint on-or-above the threshold. Side d does not generate a node, because both of its endpoints are on-or-above the threshold; and side e likewise does not generate a node, because both of its endpoints are on-or-above the threshold.

All points in the grid which lie within a particular HotSpot polygon are thereby saved in a special data structure. This cumbersome one-time process, upon completion, gives approximately all the points which make up the respective HotSpots, i.e. the regions of particular locations on the perspective image. To then check the percentage of a defined location covered by the human user, all these coordinates are checked to see whether the rectangular MER contains them. The percentage overlap of the HotSpot is the first judgment of human activity, e.g. a person has entered the study-table region, a person has entered the bed region, and so on. This is the "Has Entered" type of HotSpot-based calculation.

It is also fundamental for us to know whether a person who has come and sat at the study table is now "Leaving" the study table, or is "Staying" there, since the behavior of the appliances may be very different in the three cases. This is more obvious in the example of the bed: upon 'Has Entered', perhaps the bed-side lamp should simply switch on without disturbing any of the other appliances in the room; upon 'Leaving', the bed-side lamp should simply switch off, indicating that perhaps the person just sat down on the bed for a short while and has now left; and finally upon 'Staying', the bed-side lamp stays lit, whereas after respective due time periods the study-table lamp and then the room lights should switch off, indicating that the person was initially lying on the bed and has now fallen asleep!

Differentiating between the three types of HotSpot calculations in the algorithm: 'Has Entered' is set simply when the percentage overlap crosses the preset threshold. Once 'Has Entered' is set, a timer is triggered. 'Staying' is set if the current overlap value has decreased compared to the previous frame's overlap and, thereafter, if the MER has stayed inside the HotSpot for a specified duration of time. Once 'Staying' is set, the timer is reset. To ensure that 'Leaving' is only triggered by a MER inside a particular HotSpot, the distance between the MER's centre and the HotSpot's centre is first calculated and compared to a pre-set minimum threshold; this ensures that the MER actually is within the HotSpot. Thereafter, the overlap value falling below the earlier 'Has Entered' value, down to a low-set overlap threshold (rather than plainly to 0, since motion may disappear suddenly and reduce the overlap to 0 in any case), ensures that 'Leaving' can now be safely set. A simplified sketch of these state transitions follows.
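The sketch below is a simplified, self-contained model of the per-HotSpot transitions described above; the class name, thresholds and timings are assumptions made for illustration and are not the actual vSmart implementation.

// Illustrative only: a simplified model of the Has Entered / Staying / Leaving
// transitions. Thresholds, timings and names are assumptions, not vSmart code.
using System;

enum HotSpotState { Outside, HasEntered, Staying }

class HotSpotTracker
{
    const double EnterThreshold = 0.40;   // overlap fraction that signals "Has Entered" (assumed)
    const double LeaveThreshold = 0.10;   // low-set overlap value used for "Leaving" (assumed)
    static readonly TimeSpan StayDuration = TimeSpan.FromSeconds(30);   // assumed stay duration

    HotSpotState state = HotSpotState.Outside;
    double previousOverlap;
    DateTime enteredAt;

    // Called once per processed frame with the current MER/HotSpot overlap (0..1)
    // and a flag telling whether the MER centre is close enough to the HotSpot centre.
    public void Update(double overlap, bool merCentreNearHotSpot)
    {
        switch (state)
        {
            case HotSpotState.Outside:
                if (overlap >= EnterThreshold)
                {
                    state = HotSpotState.HasEntered;   // "Has Entered" reactions fire here
                    enteredAt = DateTime.Now;          // start the stay timer
                }
                break;

            case HotSpotState.HasEntered:
                if (overlap < previousOverlap && DateTime.Now - enteredAt >= StayDuration)
                {
                    state = HotSpotState.Staying;      // "Staying" reactions fire here; timer resets
                }
                else if (merCentreNearHotSpot && overlap <= LeaveThreshold)
                {
                    state = HotSpotState.Outside;      // "Leaving" reactions fire here
                }
                break;

            case HotSpotState.Staying:
                if (merCentreNearHotSpot && overlap <= LeaveThreshold)
                {
                    state = HotSpotState.Outside;      // "Leaving" reactions fire here
                }
                break;
        }
        previousOverlap = overlap;
    }
}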
Figure 19 - The system application

3.3 The Decision-taking Software Engine

Other than the HotSpot overlap calculations made from the video-stream inputs, there are two other input parameters vital to judging appliance behavior in the system; these system-attained parameters are the Season and the Time of Day. To exemplify, for a person "Leaving" the room on a summer night, the room lights and fans should not switch off immediately, as that may well leave the person irritated in the dark. However, had it been summer but daytime, it would perhaps be alright to switch off the lights immediately, and the fans a little while later.

Together, the three parameters, namely the HotSpot overlap calculations, the Season and the Time of Day, are input into the software engine. The engine, using the relational database-based expert system, checks all defined outputs (appliance "reactions") against the given inputs. For any given input set, all appliance states may be affected, or none at all. A reaction generated from the actions triggered is again a set of three parameters: the affected appliance, the new state it is to be set to, and the triggering time. The last parameter, the triggering time, is what controls the "irritation factor" in the system. As exemplified earlier, it is not desirable for lights and fans to go on and off immediately, and that too at the slightest of movements, as it may become highly irritating for people. The triggering time therefore sets time limits upon appliances after which they may switch their states, once a reaction has been generated from the knowledge base.

3.4 The Appliance-control Mechanism

A reaction, after having waited for its respective trigger time, is then "fired". A signal is serially transmitted, via the RS232 port, to the electronic circuitry on the breadboard. The signal, consisting of the microcontroller port information and a "byte of data", carries the identity of the appliance to be controlled and its new state. This is then interpreted by the programmed ATMEL AT89C51 microcontroller on the electronic board, which thereby identifies the precise pin to which that particular electronic device is attached and the bit-signal (0/1) to be sent to that pin. However, as the microcontroller does not provide sufficient current to drive the relays controlling the electronic appliances (lights, fans, etc.), transistors are used in between. Hence, upon the microcontroller firing the final bit-signal, the electronic devices are successfully switched on and off.

Figure 20 - The appliance control mechanism

Lastly, in a real-life practical setting there may not be a single person but multiple people within a room, engaged in different activities. For example, two people may enter a room, where one sits down at the study table to study and the other sits on the sofa to watch television. Clearly, then, there are multiple MER formations within the environment and, owing to the different activities, multiple HotSpots will be overlapped at the same time. This means that different appliances may have to behave differently, and yet in parallel: in the above example, the study-table lamp should switch on for one of the persons, and so should the sofa-side lamp at the same time for the other. vSmart, to cater for appliance control in parallel, employs the concept of multi-threading. For every MER formation that takes place, a new thread is spawned. Each child thread is then responsible for all tasks, from the HotSpot overlap calculations to the eventual reactions being fired off. Thereafter, the thread is simply killed. A minimal sketch of this per-MER threading is given below.
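The sketch assumes MERs are reported as rectangles and shows only the thread-per-MER policy; the type names and processing steps are illustrative, not the actual vSmart code.

// Illustrative only: spawning one worker thread per detected MER, as described above.
using System.Drawing;
using System.Threading;

class MerDispatcher
{
    // Called whenever the motion detector reports the MERs found in a frame.
    public void OnMersDetected(Rectangle[] mers)
    {
        foreach (Rectangle mer in mers)
        {
            Rectangle current = mer;                          // capture the loop variable for the thread
            Thread worker = new Thread(() => ProcessMer(current));
            worker.IsBackground = true;                       // do not keep the application alive
            worker.Start();                                   // the thread ends when ProcessMer returns
        }
    }

    void ProcessMer(Rectangle mer)
    {
        // 1. Compute the overlap of this MER with every HotSpot polygon.
        // 2. Derive Has Entered / Staying / Leaving for the affected HotSpots.
        // 3. Look up matching reactions in the knowledge base and, after their
        //    trigger times, fire the serial commands to the microcontroller.
    }
}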
4. PROJECT LIMITATIONS

It is worth mentioning that the project's computer vision-based approach by no means serves as a 100% complete solution for the home automation industry. In fact, the project, still evolving from its ideation phase as a first prototype, has its significant limitations as well. Some of them are outlined below:

There is no way of identifying the different preferences of different people within an environment; the system lacks user identification and prioritization. For example, Mr. X may want the study-table lamp to be switched on when he enters the room, whereas Ms. Y would find that of no sensible use and would desire otherwise. How should the system then behave? With whom should it comply?

The system does not take into account the common "unusual" behavioral aspects of people. For example, Mr. Z may have a habit of watching television at night whilst lying in bed rather than seating himself on the sofa. Upon doing so, the system would eventually detect Mr. Z to be "Staying" in bed and, assuming that he is sleeping (still as he perhaps is), turn off all the room lights and appliances!

The low-cost CMOS cameras fail to function in complete darkness. For example, if at night the system detects that a person is "Staying" in bed, it turns off the room lights. If the person then wakes up later at night for a glass of water, the system would not respond, since normal cameras do not function in complete darkness.

There are the very genuine privacy concerns of people. Having to use cameras to monitor user activity in personal environments, such as bedrooms, raises genuine privacy concerns and a feeling of uneasiness among people.

5. CONCLUSION

vSmart achieves, at a basic yet truly foundational level, a flexible and user-customizable system that uses low-cost cameras to intelligently control basic domestic appliances, such as room lights, fans and lamps, in local environments such as homes and offices. Still evolving from its ideation phase, this first prototype serves to mark a niche as an innovative approach to the concept of home automation using the evolving scientific discipline of Computer Vision.
We firmly believe that further research and experimentation with this lateral approach, and its fusion with common technologies such as RFID and infra-red sensors, can evolve these basic yet solid and foundational results considerably; taking the vision of Smart Houses from the realm of imagination to actual practice, achieving multidimensional user convenience, and using modern technology to support a sustainable environment in our everyday lives.

REFERENCES

Richard A. Quinnell, "Where is Home Automation Going?", http://www.smarthouse.com.au/Automation/Industry/R7X7C6F8?page=1
Andrew Kirillov, AForge.NET, http://code.google.com/p/aforge
Andrew Kirillov, "Motion Detection Algorithms", http://www.codeproject.com/KB/audio-video/Motion_Detection.aspx
Darel Rex Finley, "Point-In-Polygon Algorithm", http://www.alienryderflex.com/polygon
Atmel, AT89C51 Microcontroller, http://www.atmel.com/dyn/products/product_card.asp?part_id=1930
Robert L. Boylestad and Louis Nashelsky, Electronic Devices and Circuit Theory, 9th Edition

THE TEAM

Team Members
Syed Waqas Ali Burney (2004185), +92 300 4211047, mail@waqasburney.com
Mutahira Ikram Khan (2004136), +92 300 3858722, mutahirakhan@gmail.com

Team Advisors
Mr. Badre Munir, FCSE, +92 345 9491428, badre.munir@gmail.com
Mr. Umar Shafique, FES, +92 321 5374563, umar.shafique@gmail.com