aMMAI
Introduction – Advanced Topics in Multimedia Analysis and Indexing (aMMAI)
Winston H. Hsu
National Taiwan University, Taipei
February 18, 2009
Office: R512, CSIE Building
Communication and Multimedia Lab (通訊與多媒體實驗室)
http://www.csie.ntu.edu.tw/~winston

Outline
Introduction
Lecture style
Assessments
Logistical issues
aMMAI, Spring 2009 – Winston Hsu

The Multimedia Analysis and Search Problem
[Figure: a (multi-modal) query is run against a media repository (produced video corpora, personal media, online media shares, digital life records) to return results; example query: “Topic177 - Find shots of a daytime demonstration or protest with at least part of one building visible”]
Explosive growth of image and video content on the Web, in broadcasts, personal media,…
  – 5B images on the Web
  – 31M hours of TV per year
  – 200K YouTube uploads/day (~73M video uploads/year)
The growing deluge requires more effective solutions for searching and browsing digital media
  – An exciting opportunity!

Video (photo) search can (or needs to) be improved by integrating content, context, and semantic representations!!
Enormous (but noisy) multimedia data are publicly available and promising for many applications!!

RECAP – Multimedia Analysis and Indexing (MMAI), Fall 2007/2008
A preliminary understanding of the design and implementation of search, classification, and clustering for multimedia (including images, videos, and music)
Understanding basic statistical tools for high-dimensional and large-scale (multimedia) data analysis
Evaluating the performance of multimedia systems
Identifying current research problems in multimedia analysis and retrieval

RECAP – Multimedia Analysis and Indexing (MMAI), Fall 2007/2008 (cont.)
[Figure (partially recoverable): data (images, videos, music) connected through Signal Processing, Machine Learning, and Information Retrieval to content-based analysis, mining/filtering, summarization, retrieval, semantic indexing, and event/concept detection, leading to applications and advanced research]

Course Goals for aMMAI
Extending the breadth and depth of essential technical components of MMAI in feature representations and learning
Gaining practical experience through assignments and experiments
Practicing paper critiques, summarization, and presentations

Expected Audience
Students enthusiastic about research topics in multimedia analysis and indexing
  – Weekly participation required
  – Critiquing and summarizing two papers weekly
Students with a preliminary understanding of related disciplines such as MMAI, machine learning, pattern recognition, computer vision, etc.

Example – Selecting among Rich Learning Methods
[Figure: learning methods organized by context (context insensitive; spatial-temporal; multi-modal; multi-concept) and by degree of supervision (unsupervised; unsupervised w. side info.; semi-supervised; supervised learning; manual supervision via gaming, browsing, tagging). Items shown include temporal mining, window-based, discriminative (CRF,…), generative (HMM, …), ranking, multi-view semi-supervised, joint text-image semi-supervised, multi-instance, clustering, ZoneTag, multimodal fusion, image annotation, object recognition, multi-concept modeling, brain signal, speech, cross-domain, active learning, and standard models (generative, discrim.)]

Course Style
Overviews of image representation, feature extraction, and search methods (1 week)
Student participation
  – Collectively review, critique, and experiment with a set of selected papers
  – Each student will be assigned one (or two) paper(s) and will summarize the technical content as well as related developments in the field (2-3 students/week)
Image/video data sets, features, and associated metadata (such as transcripts) will be provided for the (3-4) assignments in this class

Assessment
10% Class participation (interactions)
  – Passive and active Q&A
30% Paper critiques and summaries (weekly)
  – Graded by TA, lecturer, and students
30% Oral presentations
  – Presentation
  – Slides
  – Sample code/data sets (strongly recommended)
30% Assignments
Bonus (final reports or projects, optional)

Assessment (cont.)
Paper critiques & summarization
  – Create a blog or web page
  – Post paper readings before 12pm on the lecture day, including
    • Novelties, contributions, assumptions
    • Questions and promising applications
    • Technical summaries
  – The lecturer & TA will grade the posts
  – See others’ paper interpretations as well
  – At the end, each student marks the top 5 students; overall critique scores are rated by the “PageRank” algorithm.
  – Examples of paper critiques will be given on the course web page.
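The Assessment (cont.) slide says overall critique scores are rated by the “PageRank” algorithm over the students’ top-5 marks, but does not say how. The sketch below is one plausible reading, assuming each mark is a directed edge from the voter to the chosen student, every pick carries equal weight, and the damping factor is 0.85 (the usual PageRank default, not something the course specifies). The `pagerank` function and the `votes` input format are hypothetical. The effect is that a mark from a highly rated student counts for more than a mark from a student nobody picked.

```python
from typing import Dict, List


def pagerank(votes: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-12) -> Dict[str, float]:
    """Rank students by PageRank over a "who marked whom" vote graph.

    votes maps each voter to the students they marked (e.g., their top 5).
    This input format is a hypothetical assumption for illustration.
    """
    # Every student who voted or received a vote becomes a node.
    nodes = set(votes)
    for picks in votes.values():
        nodes.update(picks)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(iterations):
        # Teleportation: each node receives (1 - damping) / n.
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            # Ignore self-votes and duplicate picks.
            picks = {u for u in votes.get(v, []) if u != v}
            if picks:
                # A voter splits its current score evenly among its picks.
                share = damping * rank[v] / len(picks)
                for u in picks:
                    new[u] += share
            else:
                # Non-voters spread their score uniformly over all students.
                dangling += damping * rank[v]
        for v in nodes:
            new[v] += dangling / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy votes (fewer than 5 picks each, for brevity).
    votes = {
        "alice": ["bob", "carol"],
        "bob": ["carol"],
        "carol": ["alice"],
        "dave": ["carol", "bob"],
    }
    scores = pagerank(votes)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(f"{name}: {scores[name]:.3f}")
```

In the toy run, carol ranks first because three students mark her, and dave ranks last because nobody marks him; alice still ranks above bob, since her single mark comes from the top-rated carol. Scores always sum to 1, so they can be rescaled directly into a grade.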
(Tentative) Assignments
3 or 4 practical implementations based on MATLAB or C (C++)
Benchmark data provided
Essential techniques for advanced research
Cornerstones to promote research capabilities
Topics
  – Feature reduction
  – PCA (Eigenface)
  – Probabilistic Latent Semantic Analysis (pLSA)/LDA
  – AdaBoost
  – Graph Cut
  – Segmentation

Topics Covering Techniques for MMAI
Week | Date     | Planning
1    | 02/18/09 | Introduction
2    | 02/25/09 | MMAI recap
3    | 03/04/09 | Interest points and local descriptors
4    | 03/11/09 | Dimension reduction + manifold methods
5    | 03/18/09 | Shape and texton representations
6    | 03/25/09 | Latent semantic analysis
7    | 04/01/09 | Efficient indexing methods
8    | 04/08/09 | Annotations for photos and persons
9    | 04/15/09 | Mid-term exam (break)
10   | 04/22/09 | Boosting methods (Jiebo Luo's talk)
11   | 04/29/09 | Multiple instance and semi-supervised learning
12   | 05/06/09 | Graphical models
13   | 05/13/09 | Variational inference
14   | 05/20/09 | Spectral clustering
15   | 05/27/09 | Frequent itemsets and association
16   | 06/03/09 | Network analysis
17   | 06/10/09 | Ranking methods
18   | 06/17/09 | Final exam (break)

Conventional Content-Based Image Retrieval [before 1999]
[Figure: query image → feature extraction → feature (vector) space (indexing) built from the image database → distance metric → retrieved images]

Semantic and Content-Based Image Retrieval [after 1999]
[Figure: the same pipeline (query image, feature extraction, image database, feature (vector) space (indexing), distance metric, retrieved images), annotated with the course topics applied to it:]
  * Graphical Models
  * MRF
  * Variational Inference
  * Ranking
  * Annotations/Classifications
  * Boosting
  * Multiple instance and Semi-supervised Learning
  * Efficient Indexing
  * Network analysis
  * Interest points and local descriptors
  * Shape representations and matching
  * Dimension Reduction and manifold methods
  * Language Models and Latent Semantic Analysis
  * Frequent Itemsets and association
  * Spectral Clustering

How to Read a Paper
Keshav, “How to Read a Paper,” ACM SIGCOMM Computer Communication Review (CCR), 2007
3 phases

How to Write a Paper
Henning Schulzrinne, “Writing Technical Articles,” http://www.cs.columbia.edu/~hgs/etc/writing-style.html
And other rich information on his page

Presentation
These technical papers are fun and useful but require much more time than you imagine!!
It is OK to use others’ materials, but acknowledge the sources
Slides, provided examples, and code will be collected for other students’ reference
Tips for presentations (see Henning’s page)

Logistical Issues
Rule – “deliver quality work on time with integrity!!”
TA
  – TBA
Course information
  – Readings, homework, slides, etc.
  – http://www.csie.ntu.edu.tw/~winston/courses/ammai/
  – Mailing list: https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/ammai or google “ammai, cmlab”

Next Week
MMAI recap by TA
  – Video feature representations, shot segmentation
  – Image feature representations, content-based image retrieval
  – Basic mathematical tools
    • Probability 101, entropy, mutual information, etc.
Paper critiques and summaries (due next week)
  – How to read technical papers
  – How to deliver research presentations
  – “Image Retrieval: Ideas, Influences, and Trends of the New Age,” Datta, 2008 (comprehensive and long)

:: backup slides ::

Semantic Gap
[Figure: layers of increasing semantic richness, from the raw media we have to work with up to the end goal:]
  End goal: “Photo with cheering crowds, taken on July 29, 2006, during the Hot Air Balloon Festival in New Jersey, USA”
  Content descriptors: sky, hot air balloon, crowds
  Object segmentation, regions
  Image-level descriptors: pixel intensity, texture, color histogram, date, etc.
  Raw media (to work with)
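The conventional CBIR pipeline in these slides (query image → feature extraction → feature (vector) space (indexing) → distance metric → retrieved images) and the image-level descriptors at the bottom of the semantic-gap pyramid can be illustrated with a minimal sketch. It assumes images arrive as flat lists of RGB tuples and uses a 4×4×4 quantized color histogram with L1 distance; both are illustrative choices, not the course’s prescribed method, and `color_histogram`, `build_index`, and `retrieve` are hypothetical names.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Image-level descriptor: a normalized bins x bins x bins RGB histogram."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantize each channel into `bins` levels (0 .. bins-1).
        qr, qg, qb = (c * bins // 256 for c in (r, g, b))
        hist[(qr * bins + qg) * bins + qb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for empty input
    return [h / total for h in hist]


def l1_distance(h1: List[float], h2: List[float]) -> float:
    """Distance metric in the feature (vector) space."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def build_index(database: Dict[str, List[Pixel]]) -> Dict[str, List[float]]:
    """Offline step: extract one feature vector per database image."""
    return {name: color_histogram(px) for name, px in database.items()}


def retrieve(query: List[Pixel], index: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Online step: rank indexed images by distance to the query's feature."""
    q = color_histogram(query)
    ranked = sorted(index, key=lambda name: (l1_distance(q, index[name]), name))
    return ranked[:k]


if __name__ == "__main__":
    # Tiny synthetic "images" given as flat lists of RGB pixels.
    database = {
        "sky": [(30, 90, 230)] * 100,
        "sunset": [(240, 120, 30)] * 60 + [(30, 90, 230)] * 40,
        "grass": [(40, 200, 60)] * 100,
    }
    query = [(35, 95, 225)] * 90 + [(240, 120, 30)] * 10
    print(retrieve(query, build_index(database)))
```

Run as a script, this prints `['sky', 'sunset', 'grass']`. Because it compares only color statistics, any mostly blue image would rank near a sky query no matter what it depicts. That is the semantic gap described in the backup slide, which the post-1999 techniques listed earlier aim to narrow.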