Computer Vision Research Projects
We present below several representative research projects in the Computer Vision Laboratory.
Smart Monitoring of Complex Public Scenes
The recent interest in surveillance in public, military and commercial applications is increasing the need to create and deploy intelligent, semi-automated visual surveillance systems. The overall objective of this project is to develop a system that allows for robust and efficient coordination among robots, vision sensors and human guards in order to enhance surveillance in sensitive environments such as airports, federal buildings, railway stations or other public places.
The system is structured hierarchically: a central control node forms the root; the monitored space is subdivided into regions, each with its own local processing node; and at the bottom of the hierarchy are conventional surveillance processing nodes (intelligent sensors) and mobile processing nodes represented by human personnel and robotic platforms.
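As a sketch, the hierarchy described above can be modeled as a simple tree in which events detected at the leaves escalate through regional nodes to the central control node. The node names below are purely illustrative:

```python
class Node:
    """One node in the surveillance hierarchy (names are illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def escalate(self, event):
        """Forward a detected event up toward the central control node,
        returning the chain of nodes it passes through."""
        chain = [self.name]
        if self.parent is not None:
            chain += self.parent.escalate(event)
        return chain


root = Node("central-control")
region = Node("region-A", parent=root)
camera = Node("camera-A1", parent=region)   # fixed intelligent sensor
robot = Node("robot-A2", parent=region)     # mobile processing node

chain = camera.escalate("intrusion")
# chain == ["camera-A1", "region-A", "central-control"]
```

A mobile node such as a robot or a guard's portable device would attach to a region node in the same way as a fixed sensor.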
The technical goals of this project relate to developing:
- Algorithms for activity recognition using multiple cameras with potentially overlapping views
- Techniques for object recognition and tracking
- An image and video retrieval system based on event-queries
- A multi-robot system provided with intelligent sensors (cameras and image understanding techniques), able to obtain high-resolution and high-level information regarding events occurring in the environment
- An effective human-robot interaction system that allows guards to coordinate their actions with the multi-robot system through portable devices
The problem of detecting and responding to threats through surveillance techniques is particularly well suited to a solution consisting of a team of multiple robots and human guards. For large environments, the distributed nature of such a team provides robustness and increased performance of the surveillance system. Including human interaction in all components of the system can significantly enhance the accuracy of the coordination and vision-based monitoring, while dramatically decreasing the workload of the human operators involved in surveillance applications.
- Christopher King, Maria Valera, Raphael Grech, Robert Mullen, Paolo Remagnino, Luca Iocchi, Luca Marchetti, Daniele Nardi, Dorothy Monekosso, Mircea Nicolescu, "Multi-Robot and Multi-Camera Patrolling", Handbook on Soft Computing for Video Surveillance, Sankar Pal, Alfredo Petrosino, Lucia Maddalena (editors), Taylor & Francis, pages 255-286, January 2012.
- Luca Iocchi, Dorothy Monekosso, Daniele Nardi, Mircea Nicolescu, Paolo Remagnino, Maria Valera Espina, "Smart Monitoring of Complex Public Scenes - collaboration between human guards, security network and robotic platforms", Proceedings of the AAAI Fall Symposium "Robot-Human Teamwork in Dynamic Adverse Environment", pages 14-19, Arlington, Virginia, November 2011.
- Maria Valera Espina, Raphael Grech, Deon De Jager, Paolo Remagnino, Luca Iocchi, Luca Marchetti, Daniele Nardi, Dorothy Monekosso, Mircea Nicolescu, Christopher King, "Multi-Robot Teams for Environmental Monitoring", Innovations in Defence Support Systems - Intelligent Paradigms in Security, Springer-Verlag, pages 183-209, March 2011.
A Visual Traffic Surveillance Framework: Vehicle Classification to Event Detection
Visual traffic surveillance using computer vision techniques can be noninvasive, automated, and cost effective. Traffic surveillance systems with the ability to detect, count, and classify vehicles can be employed in gathering traffic statistics and achieving better traffic control in intelligent transportation systems. However, vehicle classification poses a difficult problem as vehicles have high intraclass variation and relatively low interclass variation.
Five different object recognition techniques are investigated in this work, and adapted to the problem of vehicle classification. Three of the techniques that performed well were incorporated into a unified traffic surveillance system for online classification of vehicles, which uses tracking results to improve the classification accuracy.
To evaluate the accuracy of the system, 31 minutes of traffic video containing a multilane traffic intersection was processed. It was possible to achieve classification accuracy as high as 90.49% while classifying correctly tracked vehicles into four classes: cars, SUVs/vans, pickup trucks, and buses/semis. While processing a video, the system also recorded important traffic parameters such as the appearance, speed, and trajectory of each vehicle.
This information is subsequently used in an attribute-based search assistant tool in order to find relevant traffic information in a large video, such as "all instances of white pickup trucks traveling north to south."
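The attribute-based search described above amounts to filtering the recorded per-vehicle parameters. A minimal sketch, with record fields invented for illustration:

```python
# Illustrative per-vehicle records as a traffic surveillance system might
# log them (field names and values are invented for this sketch).
records = [
    {"type": "pickup", "color": "white", "direction": "N-S", "time": 12.4},
    {"type": "car",    "color": "red",   "direction": "S-N", "time": 13.1},
    {"type": "pickup", "color": "white", "direction": "S-N", "time": 15.0},
]


def search(records, **attrs):
    """Return every recorded vehicle matching all requested attributes."""
    return [r for r in records if all(r.get(k) == v for k, v in attrs.items())]


# "All instances of white pickup trucks traveling north to south."
hits = search(records, type="pickup", color="white", direction="N-S")
```

A real system would index such records by time and camera so queries scale to large videos, but the query semantics stay the same.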
- Amol Ambardekar, Mircea Nicolescu, George Bebis, Monica Nicolescu, "A Visual Traffic Surveillance Framework: Classification to Event Detection", Journal of Electronic Imaging - Special Issue on Video Surveillance and Transportation Imaging Applications, vol. 22, no. 4, pages 1-17, October-December 2013.
- Amol Ambardekar, Mircea Nicolescu, George Bebis, "Efficient Vehicle Tracking and Classification for an Automated Traffic Surveillance System", Proceedings of the International Conference on Signal and Image Processing, pages 1-6, Kailua-Kona, Hawaii, August 2008.
Intent Understanding from Video Sequences
Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among multiple agents or with robotic systems, or detection of situations that can pose a particular threat.
For surveillance or military applications, it is highly important to understand the intent of relevant agents in the environment, from their current actions, before any attack strategies are finalized. The approach relies on a novel formulation which allows a robot to understand the intent of other agents by virtually assuming their place and detecting their potential intentions based on learned models of activities. This allows the system to recognize the intent of observed actions before they have been completed, thus enabling preemptive actions for defense.
The system's capability to observe and analyze the current scene also employs novel vision-based techniques for target detection and tracking.
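One way to realize early intent recognition with learned activity models, in the spirit of the hidden Markov formulation used in this project, is to score a partial observation sequence under a per-intent HMM with the forward algorithm and pick the most likely intent before the action completes. The toy models and observation alphabet below are invented for illustration:

```python
import numpy as np


def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a (possibly partial) observation sequence
    under an HMM with initial probs pi, transitions A, emissions B."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha = alpha / alpha.sum()
    return ll


# Two invented two-state intent models over the observation alphabet
# {0: "move closer", 1: "move away"}.
approach = (np.array([1.0, 0.0]),
            np.array([[0.9, 0.1], [0.1, 0.9]]),
            np.array([[0.9, 0.1], [0.2, 0.8]]))
retreat = (np.array([1.0, 0.0]),
           np.array([[0.9, 0.1], [0.1, 0.9]]),
           np.array([[0.1, 0.9], [0.8, 0.2]]))

partial = [0, 0, 0]   # the agent keeps moving closer; the action is unfinished
scores = {name: forward_loglik(partial, *model)
          for name, model in [("approach", approach), ("retreat", retreat)]}
best = max(scores, key=scores.get)   # "approach"
```

Because the forward algorithm scores prefixes of a sequence, the decision can be revised after every new observation, which is what enables preemptive responses.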
- Richard Kelley, Alireza Tavakkoli, Christopher King, Amol Ambardekar, Monica Nicolescu, Mircea Nicolescu, "Context-Based Bayesian Intent Recognition", IEEE Transactions on Autonomous Mental Development - Special Issue on Biologically-Inspired Human-Robot Interactions, vol. 4, no. 3, pages 215-225, September 2012.
- Alireza Tavakkoli, Richard Kelley, Christopher King, Mircea Nicolescu, Monica Nicolescu, George Bebis, "A Visual Tracking Framework for Intent Recognition in Videos", Proceedings of the International Symposium on Visual Computing, pages 450-459, Las Vegas, Nevada, December 2008.
- Richard Kelley, Christopher King, Alireza Tavakkoli, Mircea Nicolescu, Monica Nicolescu, George Bebis, "An Architecture for Understanding Intent Using a Novel Hidden Markov Formulation", International Journal of Humanoid Robotics - Special Issue on Cognitive Humanoid Robots, vol. 5, no. 2, pages 203-224, June 2008.
- Richard Kelley, Alireza Tavakkoli, Christopher King, Monica Nicolescu, Mircea Nicolescu, George Bebis, "Understanding Human Intentions via Hidden Markov Models in Autonomous Mobile Robots", Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, pages 367-374, Amsterdam, Netherlands, March 2008.
- Alireza Tavakkoli, Richard Kelley, Christopher King, Mircea Nicolescu, Monica Nicolescu, George Bebis, "A Vision-Based Architecture for Intent Recognition", Proceedings of the International Symposium on Visual Computing, pages 173-182, Lake Tahoe, Nevada, November 2007.
Segmentation for Videos with Quasi-Stationary Backgrounds
Video segmentation is one of the most important tasks in high-level video processing applications. Background modeling is the key to detecting foreground regions (such as moving objects, e.g., people or cars) in videos where the camera is assumed to be stationary.
With this assumption, foreground objects can be detected by finding pixels that significantly differ from the static background. However, possible changes in the background of the scene, such as waving tree branches, fluctuating monitors, or water surfaces in motion, make it difficult to detect objects of interest in the scene according to this strategy.
Due to the diverse nature of video applications, a main concern for researchers has been to design a general, scene-independent system. In this project, several approaches are proposed to address these challenges. The performance of each of the proposed methods is studied, and the scenarios in which each of them leads to better performance are investigated.
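As an illustration of per-pixel background modeling, the sketch below maintains a running Gaussian for each pixel, a much simpler stand-in for the non-parametric and SVDD-based models developed in this project:

```python
import numpy as np


def update_and_detect(frame, mean, var, alpha=0.05, k=2.5):
    """Running per-pixel Gaussian background model: flag pixels far from
    the model as foreground, then adapt the model where the scene still
    looks like background, so quasi-stationary motion (e.g., waving
    branches) is gradually absorbed into the model."""
    fg = np.abs(frame - mean) > k * np.sqrt(var)
    bg = ~fg
    mean[bg] += alpha * (frame[bg] - mean[bg])
    var[bg] += alpha * ((frame[bg] - mean[bg]) ** 2 - var[bg])
    return fg


rng = np.random.default_rng(0)
mean = np.full((4, 4), 50.0)       # learned background intensity
var = np.full((4, 4), 4.0)         # learned per-pixel variance
frame = 50 + rng.normal(0.0, 1.0, (4, 4))   # background with sensor noise
frame[0, 0] = 200.0                # a pixel covered by a moving object
fg = update_and_detect(frame, mean, var)
```

A single Gaussian per pixel fails on multi-modal backgrounds such as water surfaces, which is precisely the motivation for the richer models investigated in this project.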
- Alireza Tavakkoli, Mircea Nicolescu, George Bebis, Monica Nicolescu, "Non-Parametric Statistical Background Modeling for Efficient Foreground Region Detection", Machine Vision and Applications, Springer-Verlag, vol. 20, no. 6, pages 395-409, October 2009.
- Alireza Tavakkoli, Mircea Nicolescu, Monica Nicolescu, George Bebis, "Efficient Background Modeling through Incremental Support Vector Data Description", Proceedings of the International Conference on Pattern Recognition, pages 1-4, Tampa, Florida, December 2008.
- Alireza Tavakkoli, Mircea Nicolescu, George Bebis, Monica Nicolescu, "A Support Vector Data Description Approach for Background Modeling in Videos with Quasi-Stationary Backgrounds", International Journal on Artificial Intelligence Tools, vol. 17, no. 4, pages 635-658, August 2008.
- Alireza Tavakkoli, Mircea Nicolescu, Monica Nicolescu, George Bebis, "Incremental SVDD Training: Improving Efficiency of Background Modeling in Videos", Proceedings of the International Conference on Signal and Image Processing, pages 1-6, Kailua-Kona, Hawaii, August 2008.
- Alireza Tavakkoli, Mircea Nicolescu, George Bebis, "A Novelty Detection Approach for Foreground Region Detection in Videos with Quasi-stationary Backgrounds", Proceedings of the International Symposium on Visual Computing, pages 40-49, Lake Tahoe, Nevada, November 2006.
- Alireza Tavakkoli, Mircea Nicolescu, George Bebis, "Robust Recursive Learning for Foreground Region Detection in Videos with Quasi-Stationary Backgrounds", Proceedings of the International Conference on Pattern Recognition, pages 315-318, Hong Kong, August 2006.
Visual Awareness and Long-Term Autonomy for Robotic Assistants
A major challenge in deploying robots into the real world is the design of an architectural framework which can provide long-term, natural and effective interactions with people. Within this framework, key issues that need to be solved relate to the robots' ability to engage in interactions in a natural way, to deal with multiple users, and to be constantly aware of their surroundings.
We propose a control architecture that addresses these issues. First, we endow our robot with a visual awareness mechanism, which allows it to detect when people are requesting its attention and try to engage it in interaction. Second, we provide the robot with flexibility in dealing with multiple users, such as to accommodate multiple user requests and task interruptions, over extended periods of time.
In support of our robot awareness mechanism, we develop visual capabilities that allow the robot to identify multiple users, with multiple postures, in real-time, in dynamic environments in which both the robot and human users are moving. To enable long-term interaction, we design a control architecture which enables the representation of complex, sequential and hierarchical robot tasks.
- Christopher King, Xavier Palathingal, Monica Nicolescu, Mircea Nicolescu, "A Flexible Control Architecture for Extended Autonomy of Robotic Assistants", Journal of Physical Agents, vol. 3, no. 2, pages 59-69, May 2009.
- Christopher King, Xavier Palathingal, Monica Nicolescu, Mircea Nicolescu, "A Control Architecture for Long-Term Autonomy of Robotic Assistants", Proceedings of the International Symposium on Visual Computing, pages 375-384, Lake Tahoe, Nevada, November 2007.
- Christopher King, Xavier Palathingal, Monica Nicolescu, Mircea Nicolescu, "A Vision-Based Architecture for Long-Term Human-Robot Interaction", Proceedings of the International Conference on Human-Computer Interaction, pages 1-6, Chamonix, France, March 2007.
An Automatic Framework for Figure-Ground Segmentation in Cluttered Backgrounds
This project addresses the problem of segmenting an image into coherent regions, in the presence of a potentially cluttered background. Grouping processes, which "organize" given data by eliminating irrelevant items and sorting the rest into groups, each corresponding to a particular object, can provide reliable pre-processed information to higher level vision functions, such as object detection and recognition.
Specifically, here we consider the problem of grouping oriented line segments in highly cluttered images. We developed a general scheme which has been shown to improve segmentation results considerably. The representation used for segments allows them to communicate with each other through a voting scheme that incorporates principles of perceptual grouping, so that salient groups of segments that "agree" in orientation will emerge.
This process results in better quality segmentations, especially under severe background clutter. Remarkably, our experiments reveal that using this approach as a post-processing step to boundary detection methods evaluated on a standard dataset improves the results in 84% of the grayscale test images from that benchmark.
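The voting idea can be illustrated with a toy saliency computation in which each segment accumulates votes from nearby segments of similar orientation; this is only a crude stand-in for the iterative multi-scale tensor voting actually used:

```python
import numpy as np


def saliency(points, thetas, sigma=2.0):
    """Each segment accumulates votes from the others; a vote is strong
    when the voter is nearby (Gaussian distance decay) and similarly
    oriented (cos^2 of the orientation difference)."""
    n = len(points)
    s = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = np.linalg.norm(points[i] - points[j])
            align = np.cos(thetas[i] - thetas[j]) ** 2
            s[i] += np.exp(-d ** 2 / sigma ** 2) * align
    return s


# Four collinear, identically oriented segments plus one misaligned
# "clutter" segment; the aligned group emerges with high saliency.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
                   [1.5, 0.5]])
thetas = np.array([0.0, 0.0, 0.0, 0.0, np.pi / 2])
s = saliency(points, thetas)
```

Thresholding such a saliency score separates segments that "agree" in orientation from isolated clutter, which is the figure-ground intuition behind the full scheme.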
- Leandro Loss, George Bebis, Mircea Nicolescu, Alexei Skurikhin, "An Iterative Multi-Scale Tensor Voting Scheme for Perceptual Grouping of Natural Shapes in Cluttered Backgrounds", Computer Vision and Image Understanding, Elsevier, vol. 113, no. 1, pages 126-149, January 2009.
- Leandro Loss, George Bebis, Mircea Nicolescu, Alexei Skurikhin, "Investigating How and When Perceptual Organization Cues Improve Boundary Detection in Natural Images", Proceedings of the IEEE Computer Society Workshop on Perceptual Organization in Computer Vision (in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition), pages 1-8, Anchorage, Alaska, June 2008.
- Leandro Loss, George Bebis, Mircea Nicolescu, Alexei Skurikhin, "An Automatic Framework for Figure-Ground Segmentation in Cluttered Backgrounds", Proceedings of the British Machine Vision Conference, pages 1-10, University of Warwick, UK, September 2007.
- Leandro Loss, George Bebis, Mircea Nicolescu, Alexei Skourikhine, "Perceptual Grouping Based on Iterative Multi-scale Tensor Voting", Proceedings of the International Symposium on Visual Computing, pages 870-881, Lake Tahoe, Nevada, November 2006.
Voting-Based Computational Framework for Motion Analysis
This research addresses the problem of visual motion analysis: given an image sequence that may contain unrestricted camera motion and/or object motion, the goal is to compute the optical flow (image motion for each pixel) and to segment images into regions that exhibit coherent motion. This is useful for applications that require the detection of various objects of interest in general situations where no assumptions can be made about the motion in the scene (e.g., that the camera is static) or about the appearance of the objects to be detected (e.g., that we are looking for a blue car).
In the proposed approach, the problem is addressed by:
- Representing the position and potential image velocity of each pixel as a point in a 4D space
- Letting all these points communicate their information in a neighborhood through voting
Consequently, points that "agree" through the voting process can be grouped, and coherently moving regions can be extracted as smooth, salient layers from this 4D space. As a key feature, this approach allows for the inference of a dense representation in terms of accurate velocities, motion boundaries and regions, without any prior knowledge of the motion model, based on the smoothness of motion only.
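A toy illustration of the 4D representation: each pixel becomes a point (x, y, vx, vy), and points are grouped into layers by velocity coherence. The greedy grouping below is a deliberate simplification of the voting-based extraction of smooth, salient layers:

```python
import numpy as np

# Each pixel contributes a point (x, y, vx, vy) in the 4D
# position-velocity space.
pts = np.array([
    [0.0, 0.0, 1.0, 0.0],    # region moving right
    [1.0, 0.0, 1.0, 0.1],
    [0.0, 1.0, 0.9, 0.0],
    [5.0, 5.0, 0.0, -1.0],   # region moving down
    [6.0, 5.0, 0.1, -1.0],
])


def motion_layers(pts, v_tol=0.5):
    """Greedy grouping by velocity coherence: a point joins the first
    layer whose mean velocity is within v_tol; otherwise it starts a
    new layer."""
    labels = -np.ones(len(pts), dtype=int)
    for i, p in enumerate(pts):
        for lbl in range(labels.max() + 1):
            mean_v = pts[labels == lbl][:, 2:].mean(axis=0)
            if np.linalg.norm(p[2:] - mean_v) < v_tol:
                labels[i] = lbl
                break
        if labels[i] == -1:
            labels[i] = labels.max() + 1
    return labels


labels = motion_layers(pts)   # two coherent motion layers emerge
```

The actual approach votes in the full 4D space rather than clustering velocities alone, which is what yields accurate motion boundaries in addition to the layers.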
- Mircea Nicolescu, Gerard Medioni, "A Voting-Based Computational Framework for Visual Motion Analysis and Interpretation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pages 739-752, May 2005.
- Mircea Nicolescu, Gerard Medioni, "Layered 4D Representation and Voting for Grouping from Motion", IEEE Transactions on Pattern Analysis and Machine Intelligence - Special Issue on Perceptual Organization in Computer Vision, vol. 25, no. 4, pages 492-501, April 2003.
- Mircea Nicolescu, Gerard Medioni, "Motion Segmentation with Accurate Boundaries - A Tensor Voting Approach", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. I, pages 382-389, Madison, Wisconsin, June 2003.
- Mircea Nicolescu, Gerard Medioni, "4-D Voting for Matching, Densification and Segmentation into Motion Layers", Proceedings of the International Conference on Pattern Recognition, vol. III, pages 303-308, Quebec City, Canada, August 2002. (Best Student Paper Award)
- Mircea Nicolescu, Gerard Medioni, "Perceptual Grouping from Motion Cues Using Tensor Voting in 4-D", Proceedings of the European Conference on Computer Vision, vol. III, pages 423-437, Copenhagen, Denmark, May 2002.
GlobeAll: Panoramic Video for an Intelligent Room
This project is focused on a real-time modular system for vision-based intelligent environments. We designed and developed GlobeAll, a modular prototype based on an electronic pan-tilt-zoom camera array.
The visual input is acquired by a multiple-camera system, which generates a composite view of the scene with a wide field of view (as a planar mosaic) and a view of the desired region of interest (as an electronically-controlled virtual camera). By maintaining an adaptive background model in mosaic space, the system segments the foreground objects as planar layers. Among them, targets are selected and tracked by redirecting the virtual camera.
An interpretation module analyzes the generated models (segmented objects, trajectories), allowing for the detection of simple events.
Compared to other solutions, the key features of our system are:
- Acquisition of a large field of view, while also capturing enough resolution for focusing on a certain region of interest
- Ability to perform pan-tilt-zoom operations electronically rather than mechanically
- Better precision and response time in redirecting the region of interest
- Low cost and high robustness, since it involves a digital solution, instead of using expensive and fragile mechanical or optical components
- Mircea Nicolescu, Gerard Medioni, "GlobeAll: Panoramic Video for an Intelligent Room", Proceedings of the International Conference on Pattern Recognition, vol. I, pages 823-826, Barcelona, Spain, September 2000.
- Mircea Nicolescu, Gerard Medioni, "Electronic Pan-Tilt-Zoom: A Solution for Intelligent Room Systems", Proceedings of the International Conference on Multimedia and Expo, pages 1581-1584, New York, NY, July 2000.
- Mircea Nicolescu, Gerard Medioni, Mi-Suen Lee, "Segmentation, Tracking and Interpretation Using Panoramic Video", Proceedings of the IEEE Workshop on Omnidirectional Vision (in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition), pages 169-174, Hilton Head Island, SC, June 2000.
Hand-Based Verification and Identification Using Palm-Finger Segmentation and Fusion
Hand-based verification and identification are key biometric technologies with a wide range of potential applications in both industry and government. Traditionally, hand-based verification and identification systems exploit information from the whole hand for authentication or recognition purposes. To account for hand and finger motion, guidance pegs are used to fix the position and orientation of the hand.
In our lab, we have developed a component-based approach to hand-based verification and identification which improves accuracy and robustness, as well as ease of use by avoiding pegs. Our approach accounts for hand and finger motion by decomposing the hand silhouette into different regions corresponding to the back of the palm and the fingers. To improve accuracy and robustness, verification/recognition is performed by fusing information from different parts of the hand.
Our approach operates on 2D images acquired by placing the hand on a flat lighting table and does not require guidance pegs or the extraction of any landmark points on the hand. To decompose the silhouette of the hand into different regions, we have devised a robust methodology based on an iterative morphological filtering scheme. To avoid touching fingers and simplify segmentation, subjects are required to stretch their hand prior to placing it on the lighting table. No other restrictions are imposed on the subjects.
To capture the geometry of the back of the palm and the fingers, we employ region descriptors based on high-order Zernike moments. Comparisons with alternative approaches using the whole hand or individual parts of the hand illustrate the superiority of the proposed approach in terms of both speed and accuracy. Also, qualitative comparisons with systems reported in the literature indicate that our system performs comparably or better.
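The palm-finger decomposition idea can be sketched with standard morphological operators: repeated erosion removes thin structures (fingers), and dilating the surviving core back approximates the palm. This is only the underlying intuition, not the project's actual iterative filtering scheme:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion


def split_palm_fingers(silhouette, iters=3):
    """Thin fingers vanish under repeated erosion; dilating the surviving
    core back (clipped to the silhouette) approximates the palm, and the
    remainder is taken as the fingers."""
    struct = np.ones((3, 3), dtype=bool)
    core = binary_erosion(silhouette, struct, iterations=iters)
    palm = binary_dilation(core, struct, iterations=iters) & silhouette
    fingers = silhouette & ~palm
    return palm, fingers


# Synthetic silhouette: a 10x10 "palm" with a 2-pixel-wide "finger" on top.
sil = np.zeros((20, 20), dtype=bool)
sil[5:15, 5:15] = True    # palm
sil[0:5, 9:11] = True     # finger
palm, fingers = split_palm_fingers(sil)
```

In practice the number of erosion iterations must be tied to the expected finger width, which is one reason the project uses an iterative, data-driven scheme rather than a fixed structuring element.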
- G. Amayeh, G. Bebis, A. Erol, and M. Nicolescu, "Hand-Based Verification and Identification Using Palm-Finger Segmentation and Fusion", Computer Vision and Image Understanding (CVIU), vol. 113, pp. 477-501, 2009.
- G. Amayeh, G. Bebis, and M. Hussain, "A Comparative Study of Hand Recognition Systems", Workshop on Emerging Techniques and Challenges for Hand-based Biometrics (in conjunction with ICPR 2010), Istanbul, Turkey, August 22, 2010.
- G. Amayeh, G. Bebis and M. Nicolescu, "Gender Classification from Hand Shape", IEEE Workshop on Biometrics (in conjunction with CVPR08), 2008.
- G. Amayeh, G. Bebis, A. Erol and M. Nicolescu, "A component-based approach to Hand Verification", IEEE Workshop on Biometrics (in conjunction with CVPR07), Minneapolis, MN, June 2007.
Face Recognition by Fusing Thermal Infrared and Visible Imagery
Facial recognition technology has a wide range of potential applications related to security and safety including surveillance, information security, access control, and identity fraud. Despite the variety of approaches and tools studied, however, face recognition is not accurate or robust enough to be deployed in uncontrolled environments.
Several factors affect face recognition performance, including pose variations, facial expression changes, occlusions, and, most importantly, illumination changes. Thermal infrared (IR) imagery offers a promising alternative to visible imagery for face recognition due to its relative insensitivity to variations in face appearance caused by illumination changes. Despite its advantages, however, thermal IR has several limitations, including the fact that it cannot image through glass.
In our lab, we have studied the sensitivity of thermal IR imagery to facial occlusions caused by eyeglasses. Specifically, we have found that recognition performance in the IR spectrum degrades seriously when eyeglasses are present in the probe image but not in the gallery image and vice versa.
To address this serious limitation of IR, we have developed a methodology for fusing IR with visible imagery. Since IR and visible imagery capture intrinsically different characteristics of the observed faces, intuitively, a better face description could be found by exploiting the complementary information present in the two spectra.
We have tested two different fusion schemes, one performing pixel-based fusion and the other performing feature-based fusion. In both cases, we have employed Genetic Algorithms (GAs) to find an optimum fusion strategy. Our results illustrate significant performance improvements in recognition.
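A minimal illustration of pixel-based fusion with an optimized selection mask: here a boolean "chromosome" decides, per pixel, whether the visible or the IR value is used, and an exhaustive search over the tiny mask space stands in for the GA. The images and fitness function are toy examples:

```python
import itertools

import numpy as np


def fuse(vis, ir, mask):
    """Pixel-level fusion: the boolean mask selects, per pixel,
    whether the IR or the visible value is used."""
    return np.where(mask, ir, vis)


def fitness(mask, vis, ir, target):
    # Toy fitness: closeness of the fused image to an "ideal" target.
    # The project's real fitness is recognition performance.
    return -np.abs(fuse(vis, ir, mask) - target).sum()


vis = np.array([[1.0, 0.0],
                [0.0, 1.0]])
ir = np.array([[0.0, 1.0],
               [1.0, 0.0]])
target = np.ones((2, 2))   # pretend the perfectly fused image looks like this

# Exhaustive search over all 2x2 masks stands in for the GA here; a GA
# becomes necessary once the chromosome covers a full-resolution image.
masks = [np.array(bits, dtype=bool).reshape(2, 2)
         for bits in itertools.product([0, 1], repeat=4)]
best = max(masks, key=lambda m: fitness(m, vis, ir, target))
```

Feature-based fusion works the same way, except the chromosome selects among extracted features (e.g., eigenfeatures) rather than raw pixels.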
- G. Bebis, A. Gyaourova, S. Singh, and I. Pavlidis, "Face Recognition by Fusing Thermal Infrared and Visible Imagery", Image and Vision Computing, vol. 24, no. 7, pp. 727-742, 2006.
- S. Singh, A. Gyaourova, G. Bebis, and I. Pavlidis, "Infrared and Visible Image Fusion for Face Recognition", SPIE Defense and Security Symposium (Biometric Technology for Human Identification), Orlando, 12-16 April, 2004.
- A. Gyaourova, G. Bebis, and I. Pavlidis, "Fusion of Infrared and Visible Images for Face Recognition", European Conference on Computer Vision (ECCV04), Prague, May 11-14, 2004.
Minutiae-Based Template Synthesis and Matching for Fingerprint Authentication
Fingerprint matching is among the most widely used biometric technologies with a broad range of both government and civilian applications such as driver's licenses, social security, passport control, ATM/credit card, medical records management, and laptop and cell phone access control.
The key challenge in fingerprint matching is reliably deciding whether a pair of impressions comes from the same finger despite various within-class variations. These variations can be caused by several factors, such as non-linear geometric distortions due to skin elasticity, inconsistent finger placement and contact pressure, small sensing areas, environmental conditions, and sensor noise. As a result, impressions of the same finger may be quite different from each other, making matching very difficult.
An effective approach to account for within-class variations is to capture multiple enrollment impressions. The most common strategy involves matching a given impression (i.e., the query) against each of the enrollment impressions; the final matching result is obtained by fusing the individual matching results. This approach has been shown to increase accuracy; however, it also increases storage and time requirements.
In our lab, we have developed a new approach which involves merging the enrollment impressions into a "super-fingerprint" by registering the enrollment impressions together and matching the query against the super-fingerprint. This approach is less space and time consuming.
Registering the enrollment impressions accurately is a challenging issue. Our approach employs a novel hierarchical matching strategy to combine a number of enrollment minutiae feature sets into a super-template. Minutiae in the super-template are assigned a weight based on the frequency of their occurrence in the enrollment feature sets. These weights serve as a quality measure of the minutiae.
To merge an enrollment feature set with the super-template, we search for minutiae correspondences between the enrollment template and the super-template using a hierarchical matching algorithm based on the Delaunay triangulation.
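The weighting scheme can be sketched as follows: minutiae from successive enrollment sets (assumed already registered to a common frame) are merged, and a minutia's weight counts how many impressions it re-occurred in. The correspondence search below is a naive distance test, not the hierarchical Delaunay matching actually used:

```python
def merge_templates(templates, tol=1.0):
    """Merge registered enrollment minutiae sets into a super-template of
    [x, y, weight] entries, where the weight counts how many impressions
    each minutia occurred in (a quality measure for that minutia)."""
    super_t = []
    for minutiae in templates:
        for x, y in minutiae:
            for m in super_t:
                if abs(m[0] - x) <= tol and abs(m[1] - y) <= tol:
                    m[2] += 1        # re-observed minutia: raise its weight
                    break
            else:
                super_t.append([x, y, 1])
    return super_t


t1 = [(10, 10), (20, 5)]
t2 = [(10.4, 9.8), (30, 30)]   # the first minutia re-occurs, slightly displaced
st = merge_templates([t1, t2])
# st == [[10, 10, 2], [20, 5, 1], [30, 30, 1]]
```

At query time, correspondences with high-weight minutiae can be trusted more than matches against minutiae seen in only one impression.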
- T. Uz, G. Bebis, A. Erol, and S. Prabhakar, "Minutiae-Based Template Synthesis and Matching for Fingerprint Authentication", Computer Vision and Image Understanding (CVIU), vol. 113, pp. 979-992, 2009.
- T. Uz, G. Bebis, A. Erol, and S. Prabhakar, "Minutia-Based Template Synthesis and Matching Using Hierarchical Delaunay Triangulations", IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS07), September 27-29, 2007, Washington DC.
- G. Bebis, T. Deaconu, and M. Georgiopoulos, "Fingerprint Identification Using Delaunay Triangulation", IEEE International Conference on Intelligence, Information and Systems (ICIIS'99), pp. 452-459, Maryland, 1999.