CS-MUVI


Compressive sensing (CS)-based spatial-multiplexing cameras (SMCs) sample a scene through a series of coded projections using a spatial light modulator and a few optical sensor elements. SMC architectures are particularly useful when imaging at wavelengths for which full-frame sensors are too cumbersome or expensive. While existing recovery algorithms for SMCs perform well for static images, they typically fail for time-varying scenes (videos). We propose a novel CS multi-scale video (CS-MUVI) sensing and recovery framework for SMCs. Our framework features a co-designed video CS sensing matrix and recovery algorithm that provide an efficiently computable low-resolution video preview. We estimate the scene's optical flow from the video preview and feed it into a convex-optimization algorithm to recover the high-resolution video. We demonstrate the performance and capabilities of the CS-MUVI framework for different scenes.


Outline of the CS-MUVI framework for sensing videos. The key challenge in sensing videos with cameras such as the single pixel camera (SPC) is that the scene changes with every compressive measurement obtained. Traditional l1-recovery methods fail in the presence of fast motion. We circumvent this problem by designing special measurement matrices that enable a two-step recovery process: the first step estimates the motion in the scene, and the second step recovers the scene at full spatial and temporal resolution.

people

Richard G. Baraniuk
Kevin Kelly
Aswin C. Sankaranarayanan
Christoph Studer
Lina Xu

papers

CS-MUVI: Video Compressive Sensing for Spatial-Multiplexing Cameras
Aswin C. Sankaranarayanan, Christoph Studer, and Richard G. Baraniuk
IEEE Intl. Conf. Computational Photography, 2012

key points


Fig: Random matrices and time-varying scenes
The single pixel camera (SPC) acquires ONE compressive measurement at each time instant. We can obtain multiple compressive measurements by sampling over a duration of time. For static scenes, this poses no problem --- as soon as we obtain enough measurements, we can recover an estimate of the scene using various recovery algorithms. However, for time-varying scenes, each compressive measurement is of a slightly different scene. So how do conventional l1-recovery methods fare when applied to time-varying scenes?

Consider the following thought experiment. We obtain compressive measurements using an SPC observing a time-varying scene (a static Lena image with a cross moving left-to-right). What happens if we blindly attempt to recover a static scene, even though the scene is in reality non-static? Shown are results for (a) l1-recovery methods and (b) least-squares (LS) methods for various object speeds as well as different numbers of compressive measurements.

Random matrices and l1-recovery methods are affected significantly by motion blur. For fast-moving objects, taking very few measurements keeps motion blur minimal, but reconstruction quality suffers because we have too few measurements; taking many measurements lets the error due to motion blur overwhelm the recovered image. A key part of the problem lies in the use of random measurement matrices.
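This failure mode is easy to reproduce in miniature. Below is an illustrative sketch (not the code behind the figure): an 8x8 scene with a moving bright block stands in for the Lena-plus-cross sequence, each time instant yields one random +/-1 projection, and we blindly solve least squares for a static scene. The scene model and sizes are our own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                        # 8x8 scene, N = 64 pixels
N = n * n
M = N                        # one compressive measurement per time instant

def scene(t, moving):
    """A bright 2x2 block that moves left-to-right when `moving` is True."""
    x = np.zeros((n, n))
    col = (t // 16) % (n - 2) if moving else 0
    x[3:5, col:col + 2] = 1.0
    return x.ravel()

Phi = rng.choice([-1.0, 1.0], size=(M, N))   # random +/-1 measurement rows

def measure(moving):
    # row t of Phi observes the scene as it exists at time t
    return np.array([Phi[t] @ scene(t, moving) for t in range(M)])

def ls_recover(y):
    # blindly assume the scene was static and solve least squares
    return np.linalg.lstsq(Phi, y, rcond=None)[0]

err_static = np.linalg.norm(ls_recover(measure(False)) - scene(0, False))
err_moving = np.linalg.norm(ls_recover(measure(True)) - scene(0, True))
print(err_static, err_moving)   # motion inflates the recovery error
```

With a static scene the 64 measurements pin down all 64 pixels almost exactly; once the block moves, the same solver mixes different scenes into one estimate and the error jumps.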

Fig: Designing novel measurement matrices
We design measurement matrices that simultaneously satisfy two properties: (i) they contain high spatial frequencies so as to recover videos at full spatial resolution; and (ii) they have close-to-optimal l2-recovery properties when downsampled. Given that Hadamard matrices are optimal for l2-recovery among the space of +/- 1 matrices [Harwit and Sloane, 1967], we design our dual-scale sensing (DSS) matrices by upsampling low-resolution Hadamard matrices and adding random sign flips. This ensures that the DSS matrices satisfy both required properties.

Figure (to the left) shows the construction of DSS matrices. (a) Outline of the process of generating a single row of the DSS measurement matrix. (b) In practice, we permute the low-resolution Hadamard matrix for better incoherence with wavelet bases. In addition, we introduce spatial structure into the sign flips to make the underlying computations faster.
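The construction can be sketched in a few lines of numpy. This is a toy version under our own assumptions --- unstructured random flips, a fixed number of flips per block, and no Hadamard permutation --- chosen so that block-summing every row provably returns a scaled Hadamard matrix:

```python
import numpy as np

def hadamard(m):
    # Sylvester construction; m must be a power of two
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

def dss_matrix(n_low, up, flips, seed=0):
    """Toy DSS-style matrix. Each row is a low-resolution Hadamard
    pattern upsampled by `up`, with `flips` random sign flips inside
    every up-by-up block. Using the same number of flips per block
    keeps each block sum proportional to the Hadamard entry, so
    block-summing the rows returns a scaled Hadamard matrix."""
    rng = np.random.default_rng(seed)
    H = hadamard(n_low * n_low)
    Phi = np.empty((n_low * n_low, (n_low * up) ** 2))
    for r in range(H.shape[0]):
        patt = np.kron(H[r].reshape(n_low, n_low), np.ones((up, up)))
        for i in range(n_low):
            for j in range(n_low):
                blk = patt[i*up:(i+1)*up, j*up:(j+1)*up].copy().ravel()
                blk[rng.choice(up * up, size=flips, replace=False)] *= -1
                patt[i*up:(i+1)*up, j*up:(j+1)*up] = blk.reshape(up, up)
        Phi[r] = patt.ravel()
    return Phi, H

Phi, H = dss_matrix(n_low=4, up=4, flips=4)
# block-sum each 16x16 row down to 4x4: recovers (16 - 2*4) * Hadamard
down = Phi.reshape(16, 4, 4, 4, 4).sum(axis=(2, 4)).reshape(16, 16)
print(np.allclose(down, 8 * H), np.all(np.abs(Phi) == 1))
```

Every entry of Phi stays +/- 1 (the DMD-friendly alphabet), while the flips inject the high spatial frequencies needed for full-resolution recovery.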

Fig: DSS matrices and time-varying scenes
L2-recovery using our DSS matrices works extremely well. The key here is that the recovered result is at a lower spatial resolution, which has two main advantages: (i) lower resolution implies a lower-dimensional signal to estimate and, hence, fewer measurements; and (ii) in essence, we trade spatial blur (or downsampling) against motion blur. This, coupled with least-squares recovery (no use of sparse approximation) and Hadamard matrices (which provide optimal linear recovery guarantees), gives us these high-quality initial estimates.

We refer to these initial estimates as the preview. The preview is extremely fast to compute; all it requires is a matrix multiplication, with the added advantage that the matrix enjoys a fast transform. In particular, the preview provides insight into the scene and its temporal evolution. We use it to trigger a motion estimation/compensation algorithm. However, it can also be used as a digital viewfinder (for sensing beyond the visible spectrum) and for adaptive sensing applications (choosing regions of interest efficiently).
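Because the downsampled DSS matrix is a (scaled) Hadamard matrix, the least-squares preview reduces to a single Hadamard transform of the measurement vector, computable in O(N log N). The sketch below is the textbook fast Walsh-Hadamard recursion, not the paper's implementation:

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform, O(N log N); equivalent to
    multiplying by the (symmetric) Sylvester Hadamard matrix."""
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def hadamard(m):
    # explicit Sylvester Hadamard matrix, for checking only
    H = np.array([[1.0]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

y = np.random.default_rng(3).random(64)     # stand-in measurement vector
print(np.allclose(fwht(y), hadamard(64) @ y))
# the least-squares preview is then fwht(y) scaled by a known constant
```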

Fig: Optical flow-based recovery
Given the preview of a video, we use optical flow to estimate the motion field between preview images (upsampled to full resolution). Optical-flow estimates can be written as a linear relationship between frames (using a bilinear interpolation model). We can then solve an l1-recovery problem with (i) compressive-measurement constraints and (ii) optical-flow constraints between frames.
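The linearity of the flow constraint is easy to make concrete: under a bilinear interpolation model, warping one frame along a flow field is multiplication by a sparse matrix, so "frames agree up to the flow" becomes a linear constraint that drops directly into a convex program. The sketch below is our own illustration (border handling by clamping is an assumption):

```python
import numpy as np
from scipy.sparse import lil_matrix

def flow_warp_matrix(u, v):
    """Sparse linear operator W encoding the optical-flow constraint
    with bilinear interpolation: if (u, v) is the flow from frame
    `prev` to frame `nxt` (sampled on the `nxt` grid), then
    W @ prev.ravel() approximates nxt.ravel()."""
    h, w = u.shape
    W = lil_matrix((h * w, h * w))
    for y in range(h):
        for x in range(w):
            sx, sy = x - u[y, x], y - v[y, x]   # source pixel in `prev`
            x0, y0 = int(np.floor(sx)), int(np.floor(sy))
            ax, ay = sx - x0, sy - y0
            for dy, dx, wgt in ((0, 0, (1-ax)*(1-ay)), (0, 1, ax*(1-ay)),
                                (1, 0, (1-ax)*ay),     (1, 1, ax*ay)):
                yy = min(max(y0 + dy, 0), h - 1)     # clamp at borders
                xx = min(max(x0 + dx, 0), w - 1)
                W[y * w + x, yy * w + xx] += wgt
    return W.tocsr()

# sanity check: a constant flow of one pixel to the right
rng = np.random.default_rng(2)
prev = rng.random((8, 8))
W = flow_warp_matrix(np.ones((8, 8)), np.zeros((8, 8)))
warped = (W @ prev.ravel()).reshape(8, 8)
print(np.allclose(warped[:, 1:], prev[:, :-1]))
```

Stacking constraints of the form W x_t - x_{t+1} = 0 alongside the compressive-measurement equations keeps the whole recovery problem convex.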

The recovered video has both high spatial and high temporal resolution.

results

simulation results on high-speed videos


real data results (coming soon)

talk slides

[ICCP'12 talk slides]

code and dataset

(code)
(card+monster high-speed video) (two cars high-speed video)

Real data and results coming very soon