SINGLE HAND GESTURE RECOGNITION BASED ON DWT AND DCT FEATURE EXTRACTION AND NEURO-FUZZY CLASSIFIER

Kavitha Jaganathan
Faculty of Creative Industries, UTAR University
kjmmu13@gmail.com

Dr. Lili Nur Liyana, Dr. Razali Yakob
School of Computer Science, University Putra Malaysia
liyana@fsktm.upm.edu.my

Dr. M. Jaganathan
Faculty of Communication, Taylor's University
jaganathan46@gmail.com

ABSTRACT
Hand gestures in Bharatanatyam dance carry valuable information. People who want to become experts in this field need to learn the meaning of each hand gesture and to mimic and practice it as accurately as possible. In this paper, a combined feature extraction method based on the Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT) is proposed. A two-level DWT decomposition is applied to each 128 × 128 image. A two-dimensional DCT is then applied to the selected part of the DWT output, and the DCT coefficients are converted into a vector. Finally, a neuro-fuzzy classifier assigns the images to the given classes. A suitable number of well-illuminated images has been created for different applications. Image processing operations such as rotation, scaling and translation can also be applied to the original database to provide more options for further study. The experimental results show that the proposed method performs well on most single hand gestures. The dataset of single hand gestures in Bharatanatyam dance has been successfully created and could also serve as a benchmark dataset. The proposed system recognizes single hand gestures with an accuracy of 93%: 56 out of 60 images of single hand gestures are correctly classified. This result is attributed to the chosen parameters, which yield the 70 best features for classification and recognition.
E-Proceeding of the International Conference on Social Science Research, ICSSR 2015 (e-ISBN 978-967-0792-04-0). 8 & 9 June 2015, Meliá Hotel Kuala Lumpur, Malaysia. Organized by http://WorldConferences.net

Keywords: Discrete Wavelet Transform, Discrete Cosine Transform, neuro fuzzy classifier, scaling coefficient, joint spatial, Adaptive neuro fuzzy inference system (ANFIS).

1. Introduction
The culture of India is rich and diverse. This paper applies a fusion of image processing techniques so that a computer can validate hand gestures and thereby assess the accuracy of hand movements in Bharatanatyam dance. These hand gestures can be classified into two categories: i) Asamyukta Hasta (single hand gestures); and ii) Samyukta Hasta (double hand gestures). There are 28 Asamyukta Hasta and 24 Samyukta Hasta. The single hand gestures have remained relatively unchanged over the years (Verma, 2009). Most previous dance gesture studies focus on the motion of other body parts or on the skeleton structure for dance gesture recognition (Dong et al., 2006; Heryadi et al., 2012; Saha et al., 2013(a)). Earlier works attempt to recognize gestures with classifiers that operate directly on a single low-level feature such as color or texture. We therefore propose a combined feature extraction method to fill this research gap. The main objective is to build a recognition system based on contemporary feature extraction and pattern classification techniques. The sub-objectives are as follows:
a. To propose the combination of the discrete wavelet transform and the discrete cosine transform in the feature extraction phase.
b. To compare feature space descriptions and identify which is suitable for single hand gesture recognition in Bharatanatyam dance.
c. To increase the performance of single hand gesture recognition in Bharatanatyam dance.
This paper also highlights that combining the discrete wavelet transform and the discrete cosine transform in the feature extraction phase increases the performance of the overall recognition system. Section 2 reviews related work in this area. Section 3 describes the proposed system and its architecture. Section 4 presents the experimental results and discussion. Section 5 concludes with future work.

2. Related Work
Computer vision provides innovative solutions to many computer-aided digital image processing applications, and human gesture recognition is one of its significant research areas. Human gesture recognition benefits the many institutions and individuals that employ it. This section presents related works that deal with image processing techniques and classification models.

Table 2.1. Analysis of surveys and reviews of hand gesture recognition
Author(s) | Year | Description
Rautaray and Agrawal | 2012 | An analysis of hand gesture recognition focusing on its main phases, framework and software platforms
Shangeetha R.K., Valliammai V., Padmavathi S. | 2012 | An implementation of the distance transform for both hands is in progress; its robustness varies when the hands overlap
Corera and Krishnarajah | 2011 | An article on challenges in hand gesture recognition and its related applications
Wachs et al. | 2011 | A discussion of soft-computing-based methods for hand gesture recognition
Chaudhary et al. | 2011 | A review of facial movement and hand gesture recognition

Table 2.2. Summary of static single hand gesture recognition approaches
Authors Year Feature Extraction Technique Application Recognition Rate % Saha et al. 2013 Hand gesture recognition for Bharatanatyam dance 85.1% Mozarkar & Warnekar Feng and Yuan 2013 Boundary extraction using Sobel Hybrid saliency technique HoG features extraction algorithm Hand gesture recognition for Bharatanatyam dance Random hand gesture recognition 85.29%. Vieriu et al. 2013 Contour extraction Shangeetha et al. 2012 Distance transform Random hand gestures recognition Indian Sign Language recognition Yun et al. 2012 Hariharan et al. 2011 Multi-feature fusion Orientation filter Random hand gestures recognition Hand gesture recognition for Bharatanatyam dance Localized contour sequence Human-computer interaction application Feature point extraction method Indian Sign Language recognition 2013 Ghosh and Ari 2011 Rajam and Balakrishnan 2010 High recognition rate in both bright and dark environments 93.3%. Recognition accuracy reduce when fingers are bent 91% The algorithm is stable and has small error in experimental 99.6 % 98.125%

3. Methodology
3.1 Proposed Framework
Figure 3.1. Framework of the proposed method
Figure 3.2. Thisula collection of dataset
Figure 3.1 shows the proposed framework, which is divided into three major phases: preprocessing, feature extraction and classification.

3.2 Dataset Collection
The images in Figure 3.2 were captured from ten different performers aged between 20 and 33 years, from the Temple of Fine Arts Academy and UTAR. The dataset consists of 28 classes, with 20 single hand gesture images per class. The background of the images is standardized to a white concrete wall, and all images were taken with an iPhone 4 and its 5-megapixel iSight camera.

3.3 Skin Color Detection
The steps of the skin color detection used in this paper are shown in Figure 3.3.
Figure 3.3. Skin Color Detection

3.4 Feature Extraction
When the input data to an algorithm is too large to be processed and is suspected to be redundant (much data, but not much information), it is transformed into a reduced representation: a set of features, or feature vector (Russel, 2013). The Discrete Wavelet Transform (DWT) and the Discrete Cosine Transform (DCT) are applied in the feature extraction process. The extracted features are binary object features. To extract them, a separate binary image is created in which pixels in the region of interest are assigned 1 and all other pixels 0. The projections are computed by summing the pixels along the rows and columns of this binary image. For a binary image $B_i$ with $N$ rows and $M$ columns, the horizontal projection $h_i(r)$ and the vertical projection $v_i(c)$ are defined in Equations 3.1 and 3.2, respectively:

$$h_i(r) = \sum_{c=1}^{M} B_i(r,c) \qquad (3.1)$$

$$v_i(c) = \sum_{r=1}^{N} B_i(r,c) \qquad (3.2)$$

The two-dimensional Discrete Wavelet Transform (2D-DWT) produces as many coefficients as the original image has pixels, so the two-dimensional Discrete Cosine Transform (2D-DCT) is applied in the next step to compact them. The extracted coefficients are used to represent the image for classification.

3.4.1 Discrete Wavelet Transform (DWT)
The DWT is a high-level feature extraction technique. Its basic idea is to provide a time-frequency representation by transforming one representation of a function into another. It represents an image simultaneously at different resolution levels, which is known as multi-resolution analysis. The 2D-DWT represents an image in terms of a set of shifted and dilated wavelet functions and scaling functions that form an orthonormal basis for $L^2(\mathbb{R}^2)$. Given a J-scale DWT, an $N \times N$ image $x(s,t)$ is decomposed as in Equation 3.3.
$$x(s,t) = \sum_{k,l=0}^{N_J-1} u_{J,k,l}\,\Phi_{LL_J,k,l}(s,t) + \sum_{B \in \mathcal{B}} \sum_{j=1}^{J} \sum_{k,l=0}^{N_j-1} w^{B}_{j,k,l}\,\Psi^{B}_{j,k,l}(s,t) \qquad (3.3)$$

where $N_j = N/2^j$, $\mathcal{B} = \{LH, HL, HH\}$ is the set of detail sub-bands, $u_{J,k,l}$ are the scaling (approximation) coefficients and $w^{B}_{j,k,l}$ are the wavelet coefficients. The basis functions are defined in Equation 3.4:

$$\Phi_{LL_J,k,l}(s,t) \equiv 2^{-J}\,\Phi(2^{-J}s - k,\; 2^{-J}t - l), \qquad \Psi^{B}_{j,k,l}(s,t) \equiv 2^{-j}\,\Psi^{B}(2^{-j}s - k,\; 2^{-j}t - l), \quad B \in \mathcal{B} \qquad (3.4)$$

Figure 3.4. Joint spatial and frequency representation of three levels of 2D DWT (sub-bands LL3, HL3, LH3, HH3, HL2, LH2, HH2, HL1, LH1, HH1)
Figure 3.5. Output of DWT with two decompositions

L and H represent the low- and high-frequency bands, respectively, and the labels 1, 2 and 3 denote the decomposition level. LL, the upper-left quadrant, consists of all coefficients that were low-pass filtered along both the rows and the columns. HL and LH are the upper-right and lower-left bands, respectively, obtained by filtering the rows and columns accordingly. HH, the lower-right quadrant, is derived analogously to the upper-left quadrant but with the analysis high-pass filter of the given wavelet. The image is thus transformed into coefficients that separate the vertical, horizontal and diagonal sub-bands.

The original image is first filtered with a low-pass filter (LPF) and a high-pass filter (HPF) along each row. The LPF and HPF outputs are denoted L1 and H1, respectively, and are combined into A1 = [L1, H1]. A1 is then downsampled by 2 and passed through the LPF and HPF along each column, giving L2 and H2, which are combined into A2 = [L2, H2]. Finally, A2 is downsampled by 2 to obtain the compressed image. This compressed image corresponds to one level of decomposition; to obtain a higher compression ratio, the steps above are repeated for as many decomposition levels as required. Figure 3.5 shows the result of the DWT with two decompositions.
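The row-then-column procedure above can be illustrated with a short NumPy sketch. The paper does not name the wavelet, so this sketch assumes the Haar wavelet, whose low-pass and high-pass analysis filters reduce to scaled sums and differences of neighbouring pixels. For Haar, filtering and then downsampling by 2 is equivalent to taking those sums and differences over non-overlapping pairs, which is what the code does. The function names `haar_dwt2_level` and `haar_dwt2` are hypothetical, and the band labels follow the convention that the first letter refers to the filter applied along the rows.

```python
import numpy as np


def haar_dwt2_level(x):
    """One level of an orthonormal 2D Haar DWT (assumes even dimensions).

    Rows are filtered first (low-pass = scaled pair sum, high-pass = scaled
    pair difference), then columns; using non-overlapping pairs performs the
    downsampling by 2 in the same step.
    """
    x = np.asarray(x, dtype=float)
    s = 1.0 / np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) * s    # L1: low-pass along each row
    hi = (x[:, 0::2] - x[:, 1::2]) * s    # H1: high-pass along each row
    ll = (lo[0::2, :] + lo[1::2, :]) * s  # low-pass along the columns of L1
    lh = (lo[0::2, :] - lo[1::2, :]) * s  # high-pass along the columns of L1
    hl = (hi[0::2, :] + hi[1::2, :]) * s  # low-pass along the columns of H1
    hh = (hi[0::2, :] - hi[1::2, :]) * s  # high-pass along the columns of H1
    return ll, hl, lh, hh


def haar_dwt2(image, levels=2):
    """Multi-level DWT: each level re-decomposes the previous LL band."""
    ll = np.asarray(image, dtype=float)
    details = []
    for _ in range(levels):
        ll, hl, lh, hh = haar_dwt2_level(ll)
        details.append((hl, lh, hh))  # level-1 detail bands come first
    return ll, details


image = np.random.default_rng(0).random((128, 128))  # stand-in 128 x 128 image
ll2, details = haar_dwt2(image, levels=2)
print(ll2.shape)  # (32, 32)
```

Because the Haar filters are orthonormal, the energy of all sub-bands together equals the energy of the input image, consistent with the orthonormal basis in Equation 3.3. A 128 × 128 input yields a 32 × 32 LL band after two levels.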
3.4.2 Discrete Cosine Transform (DCT)
In this paper, the two-dimensional DCT (2D-DCT) is applied to separate the image into parts, or sub-bands, of differing importance. It is used for data reduction, or compression, which reduces the amount of data that must be stored before it is sent to the classifier. The DCT can pack the most information into the fewest coefficients. The general equation for the 2D-DCT is given in Equation 3.5:

$$F(u,v) = \left(\frac{2}{N}\right)^{1/2} \left(\frac{2}{M}\right)^{1/2} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \Lambda(u)\,\Lambda(v)\, \cos\!\left[\frac{\pi u}{2N}(2i+1)\right] \cos\!\left[\frac{\pi v}{2M}(2j+1)\right] f(i,j) \qquad (3.5)$$

where the input image is $N \times M$, $f(i,j)$ is the intensity of the pixel in row $i$ and column $j$, $F(u,v)$ is the DCT coefficient in row $u$ and column $v$ of the DCT matrix, and $\Lambda(\xi) = 1/\sqrt{2}$ for $\xi = 0$ and $\Lambda(\xi) = 1$ otherwise. For most images, much of the signal energy lies at low frequencies, which appear in the upper-left corner of the DCT. The middle- and high-frequency coefficients are often small enough to be ignored, so a large number of coefficients can be discarded. Compression is thus achieved, and the low-frequency DCT coefficients are selected as features.

Figure 3.6. Example of wavelet coefficients representation
Figure 3.6(b) shows that compression is achieved after the DCT is applied: the middle and high frequencies are ignored, which eliminates some of the wavelet coefficients shown in Figure 3.6(a).

3.4.3 Combination of DWT and DCT Features
Thirty-five features are chosen from the final-level DWT approximation coefficients of the input image, and the 35 highest-energy DCT coefficients are extracted and selected as features. For each image, the selected DWT and DCT features are combined, so 70 features per image are used for training in the next phase. The combination of DWT and DCT is illustrated in Figure 3.7.
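The DCT of Equation 3.5, the zigzag scan mentioned in Section 3.5 and the 35 + 35 feature combination can be sketched together as follows. Several details are assumptions because the paper leaves them open: the Haar wavelet, applying the DCT to the level-2 LL band as the "chosen part", taking the 35 DWT features as the first 35 LL coefficients in zigzag order, and approximating the 35 highest-energy DCT coefficients by the first 35 in zigzag order (low frequencies first). The orthonormal DCT matrix reproduces the scaling of Equation 3.5. All function names are hypothetical.

```python
import numpy as np


def haar_ll(image, levels=2):
    """Final-level Haar approximation (LL) band; assumes even dimensions."""
    ll = np.asarray(image, dtype=float)
    for _ in range(levels):
        lo = (ll[:, 0::2] + ll[:, 1::2]) / np.sqrt(2.0)  # low-pass along rows
        ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)  # low-pass along columns
    return ll


def dct_matrix(n):
    """Orthonormal DCT-II basis; row u holds the u-th cosine of Equation 3.5."""
    u = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # spatial index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * u * (2 * i + 1) / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # Lambda(0) = 1/sqrt(2) folded into the scale
    return c


def dct2(x):
    """2D-DCT of an N x M array: F = C_N @ f @ C_M^T."""
    x = np.asarray(x, dtype=float)
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T


def zigzag_indices(rows, cols):
    """(row, col) pairs in JPEG-style zigzag order, low frequencies first."""
    order = []
    for s in range(rows + cols - 1):  # anti-diagonal where row + col == s
        diag = [(i, s - i) for i in range(max(0, s - cols + 1), min(s, rows - 1) + 1)]
        if s % 2 == 0:
            diag.reverse()            # even diagonals run upwards
        order.extend(diag)
    return order


def dwt_dct_features(image, n_dwt=35, n_dct=35, levels=2):
    """Hypothetical 35 DWT + 35 DCT feature vector (70 values by default)."""
    ll = haar_ll(image, levels)
    order = zigzag_indices(*ll.shape)
    dwt_part = np.array([ll[i, j] for i, j in order[:n_dwt]])
    coeffs = dct2(ll)                 # DCT of the chosen sub-band
    dct_part = np.array([coeffs[i, j] for i, j in order[:n_dct]])
    return np.concatenate([dwt_part, dct_part])


image = np.random.default_rng(1).random((128, 128))  # stand-in hand image
features = dwt_dct_features(image)
print(features.shape)  # (70,)
```

The matrix form keeps the sketch dependent on NumPy alone; `scipy.fft.dctn(x, norm="ortho")` computes the same orthonormal type-II transform.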
Figure 3.7. Combination of DWT and DCT coefficients

3.5 Neuro Fuzzy Classifier
The final step of the proposed methodology is classification, in which each observation is analyzed into a set of quantifiable properties (Ying, 2013). A neuro-fuzzy classifier combines an artificial neural network with a fuzzy logic system. The neuro-fuzzy classifier of Sun and Jang (1993) is illustrated in Figure 3.8.
Figure 3.8. An illustration of the neuro-fuzzy classifier
Figure 3.8 shows the neuro-fuzzy classifier framework with two input variables, x1 and x2, and training data categorized into three classes, C1, C2 and C3. In this paper, an alternative adaptive neuro-fuzzy classifier is proposed in which both the rule weights and the classifier parameters are optimized. The k-means algorithm is used to initialize the fuzzy rules. Gaussian membership functions are used for the fuzzy set descriptions because of their simple derivative expressions. The rule weights are adapted according to the number of samples of each rule. The scaled conjugate gradient (SCG) algorithm is used because it is faster than steepest descent and some second-order derivative methods and is suitable for large-scale problems (Cetişli & Barkana, 2010). After the DWT and DCT are applied, the coefficient matrix is converted into a vector using zigzag order, and the first coefficients of the vector are selected. The feature vectors of all images are then divided into training and testing sets for the neuro-fuzzy classifier.

4. Results and Discussion
4.1 Evaluation of Proposed Single Hand Gesture Recognition System
To validate the proposed method, a set of images consisting of 6 classes is used.
It includes the hand gestures Bramharam, Chaturam, Hamsapakshakam, Kapitham, Trupathakam and Trisolam. Table 4.0 describes the hand gesture classes chosen for the evaluation.

4.2 Experiment Results
The output of the proposed approach is the image of the predicted hand gesture, the predicted class and the meaning of the predicted hand gesture. Figure 4.1 shows that both image 1 and image 21 are correctly classified. Figure 4.2(a) shows that image 2, which belongs to Class 1, is incorrectly classified into Class 2, and Figure 4.2(b) shows image 19 misclassified into Class 5. The overall classification result is plotted in Figure 4.3.

Table 4.0. Description of hand gesture classes for system evaluation
Class | Name | Meaning
1 | Bramharam | An auspicious occasion or festival
2 | Chaturam | Breaking into pieces
3 | Hamsapakshakam | Breaking into pieces
4 | Kapitham | Dispersing water of the river
5 | Trupathakam | Trident / Knot
6 | Trisolam | Milking Cows / Grasping the end of the robes

Figure 4.1. Example of correctly classified images: a) image 1 in Class 1; b) image 21 in Class 3.
Figure 4.2. Example of misclassified images: a) image 2 in Class 2; b) image 19 in Class 5.
Figure 4.3. Overall result of classification task

Figure 4.3 shows the overall result of the classification task performed by the system. Image 2, which should be classified into Class 1, is misclassified into Class 2. Similarly, images 19 and 20 of Class 2 are classified into Class 5, and image 39 of Class 4 is wrongly classified into Class 2. Meanwhile, all images in Classes 3, 5 and 6 are correctly classified. The accuracy of the proposed approach is calculated using the precision formula, tp / (tp + fp), where tp is the number of correctly classified images and fp is the number of misclassified images.
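The reported accuracy can be checked from the misclassifications listed above. This sketch assumes that the 60 test images are numbered 1 to 60 with 10 consecutive images per class; the paper does not state this, but it matches image 21 being in Class 3 and image 39 in Class 4. Each misclassified image counts once as a false positive for the class it was wrongly assigned to, so the micro-averaged precision tp / (tp + fp) equals the overall accuracy.

```python
# Hypothetical reconstruction of the 60-image test set: images assumed to be
# numbered 1-60 with 10 consecutive images per class.
y_true = [label for label in range(1, 7) for _ in range(10)]
y_pred = list(y_true)

# Misclassifications reported in Section 4.2: image number -> predicted class.
reported_errors = {2: 2, 19: 5, 20: 5, 39: 2}
for image_no, predicted in reported_errors.items():
    y_pred[image_no - 1] = predicted

tp = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified images
fp = len(y_pred) - tp                             # one false positive per wrong label
precision = tp / (tp + fp)
print(tp, fp, f"{precision:.1%}")  # 56 4 93.3%
```

Per-class precision differs from this overall figure: for example, two of the ten images assigned to Class 2 (images 2 and 39) belong to other classes, so Class 2's own precision is 8/10.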
Table 4.1 presents the result analysis in tabular form: the number of sample images in the testing set and the numbers of correctly classified and misclassified images.

Table 4.1. Tabular form of result analysis of the proposed system
No. of sample images | No. of correctly classified images (tp) | No. of misclassified images (fp)
60 | 56 | 4

From this analysis, the proposed system recognizes single hand gestures in Bharatanatyam dance with an accuracy of 93% (56/60 ≈ 93.3%): 56 of the 60 tested samples are classified correctly. The misclassifications may be due to several factors, such as poor separation of background and foreground, and lighting conditions.

Table 4.2. Comparison of methods applied in the proposed system
Figure 4.4. Comparison of performance

Table 4.2 and Figure 4.4 show that the proposed method, which applies the hand gesture alone, DWT, DCT and 70 features, gives the highest recognition accuracy, 93%. When the binary image is used instead, the accuracy drops to 75%. The second-highest accuracy, 88%, is obtained with the combination of DWT, DCT and 70 features. Increasing the number of features to 80 and 400 decreases the accuracy to 83% and 73%, respectively. The accuracy is only 55% when PCA is applied directly, and the combination of DWT, DCT, 70 features and PCA gives the least accurate result, 45%. The comparison indicates that 70 is the best of the feature counts tested. A higher number of features reduces the accuracy of the recognition system, and PCA does not bring any significant improvement. A large number of features causes redundancy and has a negative effect on the learning model.
This significantly decreases the performance of the system.

5. Conclusion
Hand gesture recognition has improved over the past few years, driven largely by its usefulness in applications that use computer vision technology. This study proposed image processing algorithms to recognize and classify single hand gestures in Bharatanatyam dance. The developed system fills the identified research gap, and the performance of the proposed method is encouraging.

5.1 Limitation
The main limitation of this research concerns the robustness of the proposed system. In this study, the image acquisition environment is controlled and the background color is standardized to white. The accuracy of the system might change if the images exhibit other variations such as color noise, uneven illumination, shadows or occlusion.

5.2 Future Works
The following recommendations can be taken into consideration for future work:
a. An experiment should be carried out to test the robustness of the system on sample sets containing images with complex backgrounds and other variations.
b. This study uses a relatively small set of only 60 images; applying the method to a larger dataset is suggested.
c. Other well-known classifiers such as ANN, KNN and SVM should be implemented, as they might offer a higher recognition rate and greater robustness.

Acknowledgement
This paper is supported by a scholarship from Universiti Tunku Abdul Rahman (UTAR).
References
Chaudhary, A., Raheja, J. L., Das, K., & Raheja, S. (2011). Intelligent approaches to interact with machines using hand gesture recognition in natural way: A survey. Int J Comput Sci Eng Survey (IJCSES), 2(1), 122-133.
Corera, S., & Krishnarajah, N. (2011). Capturing hand gesture movement: A survey on tools, techniques and logical considerations. Proceedings of CHI Sparks.
Feng, K.-p., & Yuan, F. (2013). Static hand gesture recognition based on HOG characters and support vector machines. 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA), 2013.
Ghosh, D. K., & Ari, S. (2011). A static hand gesture recognition algorithm using k-mean based radial basis function neural network. 8th International Conference on Information, Communications and Signal Processing (ICICS), 2011.
Hariharan, D., Acharya, T., & Mitra, S. (2011). Recognizing hand gestures of a dancer. In Pattern Recognition and Machine Intelligence (pp. 186-192). Springer.
Mozarkar, S., & Warnekar, C. (2013). Recognizing Bharatnatyam Mudra Using Principles of Gesture Recognition. International Journal of Computer Science and Network, 2(4), 7.
Priyal, S. P., & Bora, P. K. (2010). A study on static hand gesture recognition using moments. International Conference on Signal Processing and Communications (SPCOM), 2010.
Rajam, P. S., & Balakrishnan, G. (2010). Indian sign language recognition system to aid deaf-dumb people. International Conference on Computing Communication and Networking Technologies (ICCCNT), 2010.
Rautaray, S. S., & Agrawal, A. (2012). Vision based hand gesture recognition for human computer interaction: A survey. Artificial Intelligence Review, 1-54.
Saha, S., Ghosh, L., Konar, A., & Janarthanan, R. (2013(b)). Fuzzy L Membership Function Based Hand Gesture Recognition for Bharatanatyam Dance. 5th International Conference on Computational Intelligence and Communication Networks (CICN), 2013.
Saha, S., Ghosh, S., Konar, A., & Nagar, A. K. (2013(a)). Gesture Recognition from Indian Classical Dance Using Kinect Sensor. Fifth International Conference on Computational Intelligence, Communication Systems and Networks (CICSyN), 2013.
Shangeetha, R. K., Valliammai, V., & Padmavathi, S. (2012). Computer vision based approach for Indian Sign Language character recognition. International Conference on Machine Vision and Image Processing (MVIP), 14-15 December 2012.
Vieriu, R.-L., Mironica, I., & Goras, B.-T. (2013). Background invariant static hand gesture recognition based on Hidden Markov Models. International Symposium on Signals, Circuits and Systems (ISSCS), 2013.
Wachs, J. P., Kölsch, M., Stern, H., & Edan, Y. (2011). Vision-based hand-gesture applications. Communications of the ACM, 54(2), 60-71.
Yun, L., Lifeng, Z., & Shujun, Z. (2012). A Hand Gesture Recognition Method Based on Multi-Feature Fusion and Template Matching. Procedia Engineering, 29, 1678-1684.