Video Scene Modeling And Foreground Detection
Abstract:
This work presents a framework for robust foreground detection that works under difficult conditions such as a dynamic background and a nominally moving camera. The proposed method includes two main components: coarse scene representation as the union of pixel layers, and foreground detection in video by propagating these layers using a maximum-likelihood assignment. Instead of modelling each pixel in the scene separately, we first cluster together pixels that share similar statistics. These pixels/samples are then used to create a non-parametric adaptive model of the cluster, or layer. The entire scene is coarsely modelled as the union of such non-parametric layer-models. A pixel is then detected as foreground if it does not adhere to these adaptive models of the background. A principled way of computing detection thresholds is used to achieve robust detection performance with a pre-specified number of false alarms. Correlation between pixels in the spatial vicinity is exploited to deal with camera motion without precise registration. The proposed technique adapts to changes in the scene, and allows us to automatically convert persistent foreground objects to background and re-convert them to foreground when they become interesting. This simple framework addresses the important problem of robust foreground and unusual region detection, and runs at about 10 frames per second on a standard laptop computer. The presentation of the proposed approach is complemented by results on challenging real data and comparisons with other standard techniques.
Preprint [PDF] (revised, under review)
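The detection rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes gray-level pixel values, a Gaussian kernel density estimate as the non-parametric layer model, and a threshold taken as a low quantile of each layer's own likelihoods to approximate a chosen false-alarm rate. All function names, the `bandwidth` value, and the `rate` parameter are hypothetical, and the spatial-vicinity search and model adaptation mentioned in the abstract are left out.

```python
import numpy as np


def kde_likelihood(values, samples, bandwidth):
    """Gaussian kernel density estimate of each value under one layer's samples."""
    values = np.asarray(values, dtype=float)[:, None]
    samples = np.asarray(samples, dtype=float)[None, :]
    z = (values - samples) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)


def layer_threshold(samples, bandwidth, rate):
    """Likelihood below which roughly a fraction `rate` of the layer's own samples fall.

    Assumption: this quantile stands in for the principled threshold
    computation mentioned in the abstract, whose details are not given here.
    """
    scores = kde_likelihood(samples, samples, bandwidth)
    return np.quantile(scores, rate)


def detect_foreground(pixels, layers, bandwidth=5.0, rate=0.01):
    """Assign each pixel to its maximum-likelihood layer and flag outliers.

    Returns (best_layer_index, is_foreground) for every pixel. A pixel is
    foreground when even its best-matching layer scores it below that
    layer's threshold.
    """
    pixels = np.asarray(pixels, dtype=float)
    likelihoods = np.stack([kde_likelihood(pixels, s, bandwidth) for s in layers])
    best = likelihoods.argmax(axis=0)
    thresholds = np.array([layer_threshold(s, bandwidth, rate) for s in layers])
    foreground = likelihoods.max(axis=0) < thresholds[best]
    return best, foreground
```

With two background layers, say a dark one sampled around gray level 50 and a bright one around 200, a pixel of value 52 is assigned to the dark layer and kept as background, while a pixel of value 125 fits neither layer and is flagged as foreground.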
Examples (Image Layering): Original image on the left with corresponding layers shown on the right.
Examples (Video-modeling):
Please note that we do not perform any registration and do not use any optical flow information for any of the examples shown below. Also, the detection results are provided "as is", without any morphological enhancements.
The video on the left shows a person moving in the foreground against a dynamic background composed of subtle illumination variations in the sky and ripples in the water. The scene also has significant camera shake, as it was captured with a hand-held camera. The video on the right shows the detected foreground in red.
The original video (left) contains a large amount of camera panning along with movement of trees in the wind. The proposed technique robustly detects the moving person in the foreground, shown in red on the right.
The dynamic background in this scene (far left) is composed of ripples in the water and moving reflections of the surrounding structures. The reflection of a moving person forms the foreground, which is detected robustly as shown in red. The video on the far right shows the different layers in various shades of gray, which are maintained correctly throughout the sequence.
This scene was captured in gray-scale (left) and contains camera shake due to support vibration. Robust detection of the foreground (shown in red on the right) is very difficult in such videos due to the lack of color information.
This video illustrates the way we handle temporally persistent foreground objects. A persistent foreground object is converted to a background layer (shown in green in the video), but is converted back to foreground when it becomes interesting again.
Preliminary results towards future applications:
Non-conformist detections: This is a preliminary result toward inter-scene modeling applications of the proposed framework. The layer-models learnt from scene A (left) are used to make detections in scene B (right). This allows us to detect pixels in scene B that do not conform to the layer-models of scene A; these pixels are indicated in red in the center frame.
Layer transfer: Layer-models learnt from one scene (left) are used to index/search for similar layers in another scene (right).
Multi-camera person matching: The layered representation of a foreground person is used to generate an appearance model for detecting that person across cameras.
The above material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Last updated on: Monday, February 05, 2007