Wen Gao is a Professor of Computer Science at Peking University. He has served as Vice President of the NSFC (National Natural Science Foundation of China) since 2013 and as President of the CCF (China Computer Federation) since 2016. Before joining Peking University in 2006, he was a Professor at the Institute of Computing Technology, Chinese Academy of Sciences (1996-2005) and a Professor at Harbin Institute of Technology (1991-1995). Wen Gao received his PhD in Electronic Engineering from the University of Tokyo in 1991.
Prof. Wen Gao works in the areas of multimedia and computer vision, including video coding, video analysis, multimedia retrieval, face recognition, multimodal interfaces, and virtual reality. He has published six books and over 700 technical articles in refereed journals and conference proceedings in these areas. He has earned many awards, including six National Awards in Science and Technology Achievements, and was featured by IEEE Spectrum in June 2005 as one of the "Ten To Watch" among China's leading technologists. He is a Fellow of the IEEE, a Fellow of the ACM, and a member of the Chinese Academy of Engineering.
He is a professor in the Department of Computer Science and Technology at Peking University, Beijing, China, and the founding director of NELVT (National Engineering Laboratory on Video Technology) at Peking University. He has also been the Chief Scientist of the National Basic Research Program of China (973 Program) on Video Coding Technology since 2009, and the Vice President of the National Natural Science Foundation of China since 2013.
He works in the areas of multimedia and computer vision, including video coding, video analysis, multimedia retrieval, face recognition, and multimodal interfaces. He has published six books and over 700 technical articles in refereed journals and conference proceedings in these areas, and his publications have been cited over 21,000 times according to Google Scholar.
He has served, or is serving, on the editorial boards of several journals, such as IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, IEEE Transactions on Autonomous Mental Development, EURASIP Journal of Image Communications, and Journal of Visual Communication and Image Representation. He has chaired a number of prestigious international conferences on multimedia and video signal processing, such as IEEE ICME 2007, ACM Multimedia 2009, and IEEE ISCAS 2013, and has also served on the advisory and technical committees of numerous professional organizations. He has earned many awards, including one second-class award in technology invention and six second-class awards in science and technology achievement from the State Council.
Current content search on mobile devices is mainly based on keywords or text, with limited capability to describe real-world objects. In many cases, a query image taken with a smartphone is simply more descriptive. Mobile devices have shown great potential for visual search, with emerging applications including landmark search, dress product search, book search, location recognition, and scene retrieval. The challenge for mobile visual search is how to reduce the cost of the task while keeping search fast. In addition, a practical issue remains open: how to make visual search applications compatible across a broad range of devices and platforms. In this talk, I will discuss the key research efforts on CDVS (compact descriptors for visual search) by many teams from academia and industry, and review the standardization activity undertaken by the ISO/IEC MPEG working group. A competitive and collaborative platform for evaluating state-of-the-art visual search techniques and solutions will be presented as well, where learning techniques have been shown to be the most promising approach to improving the performance and efficiency of mobile visual search.
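To make the idea of compact-descriptor visual search concrete, the following is a minimal sketch, not the MPEG CDVS reference pipeline: it extracts a capped number of compact binary local descriptors (ORB, via OpenCV) from a query photo and matches them by Hamming distance against a small set of reference images. The file names, descriptor budget, and distance threshold are illustrative assumptions only.

import cv2

# Cap the number of descriptors so the query stays small enough to transmit from a phone.
orb = cv2.ORB_create(nfeatures=300)

def describe(path):
    """Return ORB keypoints and 32-byte binary descriptors for one image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return orb.detectAndCompute(img, None)

_, query_desc = describe("query_phone_photo.jpg")          # hypothetical query image
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

best_name, best_score = None, -1
for name in ["landmark_a.jpg", "landmark_b.jpg"]:           # hypothetical reference set
    _, ref_desc = describe(name)
    matches = matcher.match(query_desc, ref_desc)
    # Count matches under a Hamming-distance threshold as a crude relevance score.
    score = sum(1 for m in matches if m.distance < 40)
    if score > best_score:
        best_name, best_score = name, score

print("best match:", best_name, "score:", best_score)

A production system would additionally compress and aggregate the descriptors (for example into a global signature) and verify candidates geometrically, which is where the standardization and learning efforts mentioned above come in.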
Alex Hauptmann is a Principal Systems Scientist in the Computer Science Department at Carnegie Mellon University and a faculty member of CMU's Language Technologies Institute. His research interests have led him to pursue and combine several different areas: man-machine communication, natural language processing, speech understanding and synthesis, and machine learning. He worked on speech and machine translation at CMU from 1984 to 1994, when he joined the Informedia project, where he developed the News-on-Demand application. Since then he has conducted research on video analysis and retrieval for broadcast news as well as observational video, with success documented by outstanding performance in many video analysis challenges. His current research centers on robust analysis of large-scale internet-style and surveillance video.
Alexander G. Hauptmann is an American systems scientist in the School of Computer Science at Carnegie Mellon University. He has been the leader of the Informedia Digital Library project, which has made seminal strides in multimedia information retrieval and won best paper awards at major conferences. He was also a founder of the international advisory committee for TRECVID. His research interests are in speech recognition, speech synthesis, speech interfaces, and language in general. According to Hauptmann (2008), "Over the years his research interests have led him to pursue and combine several different areas of research: man-machine communication, natural language processing and speech understanding".
In the area of man-machine communication, according to Hauptmann (2008), "he is interested in the tradeoffs between different modalities, including gestures and speech, and in the intuitiveness of interaction protocols. In natural language processing, his desire is to break through the bottlenecks that are currently preventing larger scale natural language applications. The latter theme was also the focus of my thesis, which investigated the use of machine learning on large text samples to acquire the knowledge needed for semantic natural language understanding".
Even though the accuracy of content-based video search (CBVS) systems has improved drastically, high-accuracy systems tend to be too inefficient for interactive search. Therefore, to achieve real-time CBVS over millions of videos, we perform a comprehensive study of the different components in a CBVS system to understand the accuracy and speed tradeoffs of each component. Directions investigated include exploring different low-level and semantics-based features, testing different compression factors and approximations during video search, and understanding the time vs. accuracy trade-off. Semantic search in video is a novel and challenging problem in information and multimedia retrieval. Existing solutions are mainly limited to text matching, in which the query words are matched against the textual metadata generated by users. This talk will contrast approaches for content search both with example videos and without, using only text queries. The system relies on substantial video content analysis and allows for both low-level and semantic search over a large collection of videos. We share our observations and lessons in building such a system.
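The time vs. accuracy trade-off mentioned above can be illustrated with a small sketch; this is not the speakers' system, just an assumed setup in which exact dot-product search over full video feature vectors is compared with search over randomly projected (compressed) vectors. The collection size, feature dimensions, and random data are arbitrary assumptions.

import time
import numpy as np

rng = np.random.default_rng(0)
n_videos, dim, dim_small = 20_000, 1024, 128

# Synthetic stand-in for per-video feature vectors and a query vector.
database = rng.standard_normal((n_videos, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

# Exact search: full dot products against every video in the collection.
t0 = time.time()
exact_top = np.argsort(database @ query)[-10:][::-1]
t_exact = time.time() - t0

# Approximate search: project everything to a much smaller dimension first
# (the database projection would be done once, offline, when building the index).
proj = rng.standard_normal((dim, dim_small)).astype(np.float32) / np.sqrt(dim_small)
db_small, q_small = database @ proj, query @ proj
t0 = time.time()
approx_top = np.argsort(db_small @ q_small)[-10:][::-1]
t_approx = time.time() - t0

overlap = len(set(exact_top) & set(approx_top))
print(f"exact {t_exact:.4f}s, approx {t_approx:.4f}s, top-10 overlap {overlap}/10")

The compressed index answers queries far faster but recovers only part of the exact top-10 list, which is exactly the kind of component-level trade-off the study above measures across features, compression factors, and approximations.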