An Improved Observation Model for Super-Resolution under Affine Motion
Abstract
Super-resolution (SR) techniques make use of subpixel shifts between frames in an image sequence to yield higher-resolution images. We propose an original observation model devoted to the case of non-isometric interframe motion as required, for instance, in the context of airborne imaging sensors. First, we describe how the main observation models used in the SR literature deal with motion, and we explain why they are not suited for non-isometric motion. Then, we propose an extension of the observation model by Elad and Feuer adapted to affine motion. This model is based on a decomposition of affine transforms into successive shear transforms, each one efficiently implemented by row-by-row or column-by-column 1D affine transforms.
We demonstrate on synthetic and real sequences that our observation model, incorporated into an SR reconstruction technique, leads to better results in the case of variable-scale motions, while providing equivalent results in the case of isometric motions.
Super-resolution, affine motion, multi-pass interpolation, b-spline, approximation, projection, inverse problems, convex regularization.
I Introduction
Super-resolution (SR) techniques aim at estimating a high-resolution image with reduced aliasing, from a sequence of low-resolution (LR) frames. The literature on the subject is abundant, see [1, 2, 3, 4, 5, 6] and [7] for a recent review.
Our contribution deals with the class of “Reconstruction Based” SR techniques [8], which can be split into three steps: (1) estimation of interframe motion; (2) computation of a linear observation model including motion; (3) regularized inversion of the linear system.
We are interested in aerial imaging applications, which often imply non-isometric motion, as in the case of an airborne imager getting closer to the observed scene, see Sec. VI-C. Such non-isometric motion fields can be estimated using various registration algorithms [9, 10]. Hence, step (1) is not the main issue in this context. Concerning step (2), the SR literature is rather allusive: most published methods implicitly assume translational motion [1, 8, 11, 12, 13, 14, 4, 6, 15, 16, 17, 18, 19]. To the best of our knowledge, while some former contributions apply to affine [20, 21] or even homographic [9, 22] warps, none of them explicitly deals with variable distance from scene to imager in step (2) (it is addressed formally in [3], but neither implemented nor demonstrated). So, we focus on the construction of a proper observation model for affine motions with consistent scale changes.
Section II presents a survey of the SR literature with respect to the observation model. It is shown that published methods are not adapted to the considered context: its main difficulty is to account for non-translational motion in a tractable discrete model.
Section III is devoted to the proposed new observation model that extends the popular one introduced by Elad and Feuer [5] by replacing traditional pointwise interpolation by techniques based on approximation on a shifted b-spline basis [23]. We show that our model leads to a more precise prediction of LR frame pixel values, in the case of a combined zoom and rotation motion.
Further comparisons are performed on SR reconstruction results. Section IV briefly introduces the convex regularization framework that we use for SR reconstruction. Such techniques are customary in various inverse problems, including restoration and SR [24, 2, 5].
We use the resulting SR reconstruction technique to compare various observation models on synthetic (Section V) and real (Section VI) datasets. These experiments consistently show that our model is more accurate and reliable for sequences combining rotation and large scale changes, at the expense of a moderate increase in computational load.
II Analysis of previous works
This section describes several published observation models differing by the way they account for motion through numerical approximations.
II-A Notations
Uppercase letters (resp. boldface letters) refer to matrices (resp. vectors). and denote discrete positions of LR and SR pixels and denotes real positions on the image plane. An image can be described by a continuous field , by a sequence of discrete coefficients, or as a lexicographically ordered vector.
II-B General observation model
Let be the input irradiance field and be the observed LR image. is a sampled version of the convolution of with an optical point spread function (PSF) integrated by a box function corresponding to the collecting surface of the detector:
with . is the set of discrete detector positions on a grid with step . Let us denote the number of LR pixels in frame .
It is customary to define a joint optics-plus-detector PSF so that .
SR methods rely on the usual “brightness constancy” assumption which is the basis of many motion estimation techniques, in particular intensity-based techniques [10]. In this framework, SR methods assume that temporally neighboring frames originate from a unique input up to a warp modeling relative sensor/scene motion.
Let () denote a neighboring frame of , then (i) derives from an irradiance field through sensor : and (ii) there is a warp , such that . Combination of both equalities yields
(1) 
The next step is discretization of for the sake of numerical computations. The irradiance field is decomposed on a shifted kernel basis:
(2) 
is the SR grid, with step and is the number of SR pixels. The ratio defines the practical magnification factor (PMF) of the SR process: it is usually greater than two. Note that it does not imply that the actual resolution improvement is as high as the PMF.
may be any classical interpolation kernel (box function, bilinear, …). In the sequel, we use b-spline bases, which encompass most classical interpolation schemes [25, 26, 27]. Then the kernel is a separable b-spline of order : , where the 1D b-spline of order n is the (n+1)-fold convolution of a box function.
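For concreteness, a minimal sketch (not taken from the paper) of the closed-form 1D b-spline kernels of orders 0, 1 and 3, together with the separable 2D kernel built as a product of two 1D ones:

```python
def bspline0(x):
    # order-0 b-spline: box function, indicator of [-1/2, 1/2)
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def bspline1(x):
    # order-1 b-spline: triangle function = box convolved with itself
    return max(0.0, 1.0 - abs(x))

def bspline3(x):
    # cubic b-spline = 4-fold convolution of the box function
    x = abs(x)
    if x < 1.0:
        return 2.0 / 3.0 - x * x + 0.5 * x ** 3
    if x < 2.0:
        return (2.0 - x) ** 3 / 6.0
    return 0.0

def bspline2d(kernel, x, y):
    # separable 2D kernel: product of two identical 1D kernels
    return kernel(x) * kernel(y)
```

The kernels satisfy the usual partition-of-unity property, so a constant image is reproduced exactly by the expansion (2).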
Let us rewrite (1) as:
(3) 
Injecting (2) yields:
(4) 
Using the lexicographically ordered vector representation of images, the model can be written in matrix form:
The whole matrix is huge with dimensions , . For instance, a sequence of frames, with dimensions and a PMF leads to about billion elements. Of course, it is a sparse matrix with a band structure, as a practical PSF spreads over two or three LR pixels at most and the kernel is a separable b-spline whose support is only a few pixels wide. However, the cost of computing all nonzero elements remains formidable for general warps.
In the following, we review landmark SR papers with respect to the way they compute . We discuss three main approaches:

Exact computation for special cases of and

Convolve-then-Warp approximation

Warp-then-Convolve approximation
II-C Exact computation
Exact computation is tractable only in two special cases:

motion is a global translation;

and are both box functions and motion is affine.
II-C1 Global translation
II-C2 and are box functions
When and are box functions [3, 2, 29], each coefficient in (4) reduces to the common area between one detector and one warped SR pixel (see Fig. 1).
Such an observation model has been proposed by Stark and Oskoui for rotational warps [29]. No indication is provided in their paper about the numerical computation of the relevant intersections.
Assuming affine motion, each warped SR pixel is a convex polygon, and the intersection of two convex polygons can be computed by a “clipping” algorithm such as [30]. However, this technique is not suitable for SR purposes due to its high computational burden.
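As an illustration of such an exact-area computation, the following sketch clips one polygon against another with the standard Sutherland-Hodgman algorithm and measures the intersection area with the shoelace formula. This is an assumed implementation for illustration, not the specific algorithm of [30]:

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman clipping of polygon `subject` against the
    CONVEX polygon `clip` (vertices in counter-clockwise order)."""
    def inside(p, a, b):
        # p lies on the interior side (left) of directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0

    def intersection(p, q, a, b):
        # intersection point of line (p, q) with line (a, b)
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, out = out, []
        if not inp:
            break
        s = inp[-1]
        for p in inp:
            if inside(p, a, b):
                if not inside(s, a, b):
                    out.append(intersection(s, p, a, b))
                out.append(p)
            elif inside(s, a, b):
                out.append(intersection(s, p, a, b))
            s = p
    return out

def polygon_area(poly):
    # shoelace formula
    return 0.5 * abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                         - poly[(i + 1) % len(poly)][0] * poly[i][1]
                         for i in range(len(poly))))
```

Even with an efficient clipping step, one such intersection is needed for every (detector, SR pixel) pair with overlapping support, which explains the computational burden mentioned above.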
II-D Convolve-then-Warp
Let us start back from (3). In practice, the PSF hardly spreads over more than two or three LR pixels, thus the integral in (3) extends over a small neighborhood around . Let us assume that the warp can be locally approximated by a translation:
Then (3) can be approximated by a convolution:
(5) 
Such an approximation is depicted in Fig. 2. The center of each detector is well positioned, but the integration area is a rough approximation. Such an approximation leads to errors in the integration step for large rotations and scale variations.
The discretization of this model is much easier than that of the general model (1), because it reduces to an irregular sampling of a convolution. The simple model of Schultz and Stevenson [2] is a special case of this approach, when both kernels are box functions and the detector center positions are rounded to integer multiples of the SR step. Then, the components are binary: a coefficient equals one if the corresponding SR pixel lies inside the detector area, approximated as in Fig. 2(b). A refined version of this model is used in [31].
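The Convolve-then-Warp approximation can be sketched in a few lines: blur the SR image once with the fixed joint PSF, then sample the blurred image at the warped detector centers. The code below is a hypothetical illustration with a naive direct convolution and hand-written bilinear sampling:

```python
import numpy as np

def convolve2d_same(img, ker):
    # naive 'same'-size direct 2D convolution (for illustration only)
    kh, kw = ker.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * ker[::-1, ::-1])
    return out

def bilinear_sample(img, x, y):
    # bilinear interpolation of img at real-valued position (col=x, row=y)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    h, w = img.shape
    def at(r, c):
        return img[r, c] if 0 <= r < h and 0 <= c < w else 0.0
    return ((1 - dx) * (1 - dy) * at(y0, x0) + dx * (1 - dy) * at(y0, x0 + 1)
            + (1 - dx) * dy * at(y0 + 1, x0) + dx * dy * at(y0 + 1, x0 + 1))

def convolve_then_warp(sr, psf, centers):
    # 1) convolve the SR image once with the fixed PSF
    # 2) sample the result at the warped detector centers (cx, cy)
    blurred = convolve2d_same(sr, psf)
    return np.array([bilinear_sample(blurred, cx, cy) for cx, cy in centers])
```

Note how the PSF is applied once, in the fixed detector geometry: this is exactly why the integration area cannot follow a rotation or scale change, as discussed next.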
As a conclusion, this model appears computationally attractive but is clearly unable to correctly account for non-translational warps, because of the fixed detector geometry (see Fig. 2).
II-E Warp-then-Convolve
This approach consists in using the convolution relationship (1) between the data and the warped SR image . If a discretized version of over the shifted basis functions is available, (1) can easily be discretized as:
where is a downsampling matrix, and is the convolution matrix associated with the optics-plus-detector response.
Now the main problem is to construct using the discretized SR image coefficients defined by (2). A first approach may be to enforce equality on the grid nodes:
If is a b-spline of order 0 or 1, it satisfies the interpolation condition, and we get:
(6) 
In other words, the discrete coefficient is the interpolation of at point . If is a box function (), (6) reduces to nearest-neighbor interpolation, and if is a triangle function (), (6) is bilinear interpolation.
Interpolation (6) leads to the definition of a warping matrix , which summarizes all motion information. The complete image formation model is then:
(7) 
This is exactly the formulation proposed by Elad and Feuer [5, 21], referred to as the “E&F” model in the following.
Fig. 3 summarizes this method: starting from the sought SR image Fig. 3(a), an intermediate highresolution image Fig. 3(b) is constructed with a pixel grid aligned with the detector grid using either bilinear or nearest neighbor interpolation. Integration and subsampling are then straightforward.
Compared to the previous approach, the E&F model seems much more precise for rotational warps. However, one can foresee aliasing problems in the case of scale changes, due to the pointwise interpolation step (6).
III Proposed observation model
This section introduces an original observation model extending the E&F model, by replacing the pointwise interpolation (6) by a technique based on function approximation.
Dealing with variable scale using approximation techniques is not easy in 2D. In this context, Catmull and Smith [32] introduced an efficient decomposition of 2D affine transforms into separable 1D transforms.
First, we introduce this decomposition into our observation model. Next, we focus on the 1D operations, in order to achieve a least-squares approximation on a b-spline basis. Finally, we compare observation models and point out the improvements provided by the proposed model.
III-A Warping decomposition
Thevenaz and Unser showed that 2D invertible affine transforms can be handled by two-shear or three-shear decompositions [23]. Each shear is a vertical or horizontal coordinate transform such as:
(8) 
(9) 
Both of them are one-dimensional affine transforms, separably applied row-by-row or column-by-column. As an example, Fig. 4 provides the intermediate images resulting from each shear of the following affine motion and decomposition:
(10) 
This decomposition is not unique, and the choice of one particular decomposition impacts the quality of the transformed image. Catmull and Smith [32] mentioned the bottleneck problem: a downscaling in one pass followed by an upscaling in the next pass results in a loss of resolution.
Many approaches have been proposed to minimize image degradation, depending on the considered transform. For instance, Paeth [33] proposed a three-shear decomposition well-suited for rotation. Other authors resort to multi-pass decompositions [34].
Multi-pass interpolation techniques and their limitations are outside the scope of this article; the reader can refer to [34] for deeper insight. In the sequel, we consider only two-shear decompositions. In this case, there are two possibilities, and one selects the decomposition which reduces the involved scale variations [23, 35, 36].
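The algebra of one such two-shear factorization can be sketched as follows. Here we assume one particular ordering, vertical shear after horizontal shear, for the 2x2 linear part of the affine transform; as noted above, the other ordering exists as well, and one would pick whichever limits scale variation:

```python
import numpy as np

def two_shear(A):
    """Decompose the 2x2 linear part A of an affine transform as
    A = V @ H, with
      H = [[h1, h2], [0, 1]]   (horizontal shear: row-by-row 1D affine)
      V = [[1, 0], [v2, v1]]   (vertical shear: column-by-column 1D affine)
    Valid whenever A[0, 0] != 0; otherwise the other ordering is used."""
    a, b = A[0]
    c, d = A[1]
    if abs(a) < 1e-12:
        raise ValueError("use the other two-shear ordering")
    H = np.array([[a, b], [0.0, 1.0]])
    V = np.array([[1.0, 0.0], [c / a, (a * d - b * c) / a]])
    return V, H
```

A quick check on a scaled rotation (zoom 1.2, angle 0.3 rad) confirms that the product of the two shears reproduces the original matrix, each factor acting along a single image axis.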
III-B 1D affine transform approximation
Let us consider a 1D affine transform with parameters : . With this notation, yields a signal reduction and yields a signal magnification. It is clear that signal reduction may result in significant discretization errors (as naive subsampling suffers from frequency aliasing).
Following Thevenaz et al. [23], let us decompose on the 1D shifted b-spline basis:
(11) 
where denotes the set of discrete samples (for instance the set of pixels of a row of the image). We search for coefficients , such that , defined by
(12) 
achieves the best approximation of the warped signal in the least-squares sense. This approximation is the orthogonal projection, and the optimal coefficients satisfy the orthogonality equations
(13) 
for . Replacing (11) and (12) in (13) yields:
with and . The so-called bikernel encodes the geometric transform of a sample to a different scale space [36], and actually provides an optimal anti-aliasing filter [37]. If , it is not a b-spline kernel, but remains a piecewise polynomial. A closed-form expression is provided in [35].
Finally, the sought coefficients write:
(14) 
and the inverse filter can be efficiently implemented through recursive filtering [27].
To sum up the process, given a sequence of signal samples and the 1D affine transform parameters, the approximation goes through four steps:

compute the b-spline coefficients ;

compute the bikernel function ;

compute with (14) and

post-filter the coefficients to get the sample values .
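For the order-0 (box) basis, these four steps collapse into a particularly simple closed form: the projection coefficient for output sample k is the exact mean of the piecewise-constant input signal over the source interval [a(k-1/2)+b, a(k+1/2)+b]. The sketch below (our own illustration, assuming a scale factor a > 0) makes the built-in anti-aliasing behavior explicit, in contrast with pointwise sampling:

```python
import math

def project_affine_box(c, a, b, n_out):
    """L2 projection onto an order-0 (box) b-spline basis of the 1D
    affinely warped signal g(x) = f(a*x + b), where f is piecewise
    constant with coefficients c (f = c[i] on [i-1/2, i+1/2), 0 outside).
    Each output coefficient is the exact mean of f over the source
    interval [a*(k-1/2)+b, a*(k+1/2)+b]; assumes a > 0."""
    def mean_f(lo, hi):
        total = 0.0
        for i in range(math.floor(lo + 0.5), math.ceil(hi + 0.5)):
            left, right = max(lo, i - 0.5), min(hi, i + 0.5)
            if right > left and 0 <= i < len(c):
                total += c[i] * (right - left)
        return total / (hi - lo)
    return [mean_f(a * (k - 0.5) + b, a * (k + 0.5) + b) for k in range(n_out)]
```

With a = 1 and b = 0 the projection is the identity on the samples, consistent with Remarks 1 and 2 below; with a reduction (a > 1) each output value averages over a source interval of length a instead of picking a single value, which is precisely the anti-aliasing effect of the orthogonal projection.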
Remark 1
— The first and last steps are not required when the b-spline representation order is 0 or 1. Indeed, for these particular orders, the b-spline coefficients are identical to the image samples.
Remark 2
— In the translation case (), . The approximation then turns into a mere b-spline interpolation with a higher-order kernel.
III-C A two-shear observation model
In the proposed model, each observed frame (in vector notation) is given by:
where and are shear operators. Each operator is a 1D row-by-row (or column-by-column) affine transform, implemented as described in the previous section. In the sequel, we use an order-0 b-spline kernel. Thus, as a consequence of Remark 2, our model is identical to that of Elad and Feuer with bilinear interpolation in the case of translational motion. The resulting model is denoted TS0, for Two-Shear model with order-0 b-spline basis.
III-D Comparing observation models
In this section we illustrate the quality of each observation model compared to exact computation, in the special case of affine motion with both kernels chosen as box functions, see Sec. II-C.
We represent the components of the observation matrix for a single LR pixel in the form of an image patch. This patch displays the weighting coefficients actually applied to the SR image pixels when computing one LR detector output. The first rows of the following three arrays of patches show the exact components for rotation angles of degrees, scale variations of (Fig. 5(a)), (Fig. 5(b)) and (Fig. 5(c)), and a PMF of .
The remaining patches show the approximated components obtained using the Elad and Feuer models with nearest-neighbor interpolation (E&F0) or with bilinear interpolation (E&F1), and the proposed model (TS0).
The Convolve-then-Warp model is not presented, but it would lead to the same image patch, made of a fixed-size square pattern, whatever the rotation and zoom factor.
Fig. 5 shows that E&F0 is always incorrect, even with limited rotations and/or scale variations. It is noticeable that in Fig. 5(a), some coefficient values reach two: some SR pixels (white colored) contribute twice to the detector. Such a behavior has been previously observed for the “Convolve-then-Warp” approach, see Fig. 2(b). At the same time, several SR pixels do not contribute at all to the detector.
E&F1 provides a better approximation. Still, the contributions of SR pixels are not uniform inside the detector footprint. This is already observed in Fig. 5(a) with rotations, and becomes more pronounced in Fig. 5(b) and Fig. 5(c) with scale factor and rotations. As the E&F1 contributions appear as a smoothed version of the E&F0 ones, one may wonder whether a bicubic interpolation (E&F3) would give correct contributions. This is not the case, as shown by Fig. 6. Moreover, as bicubic interpolation does not preserve positivity, the E&F3 model exhibits negative contributions.
Whatever the interpolation method, the Elad and Feuer models become inaccurate for rotations as low as and zooming factors as low as .
In contrast, the TS0 observation model ensures that the contributions of SR pixels are uniform inside the detector footprint, whatever rotation and/or scale factor is applied. The remaining differences between the exact contributions and the TS0 ones are located on the detector boundaries: TS0 contributions spread slightly beyond the true ones.
IV Regularization framework
The inversion step is tackled within a classical convex regularization framework [24], as in many other SR methods [2, 5]. The estimated SR image is the (possibly constrained) minimizer of a regularized criterion based on the observation model and a convex edge-preserving penalty:
(15) 
The first term of criterion (15) is a least-squares discrepancy between the data and the model output: stands for the observation model which is to be inverted, and derives either from the Elad and Feuer approach or from the proposed model of Sec. III. The second term is a convex penalization term [24]. is the set of cliques: it consists of all subsets of three adjacent pixels, either horizontal, vertical or diagonal. denotes a second-order difference operator within clique . The regularization parameter balances the trade-off between the two terms of the criterion. The potential is chosen as a hyperbolic function:
The parameter sets the threshold between the quadratic behavior (), which allows smoothing of small pixel differences, and the linear behavior (), aimed at preserving edges. The latter produces a lower penalization of large differences compared to a pure quadratic function, and has the same qualitative behaviour as the Huber function of [2].
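One common form of such a hyperbolic potential (the paper's exact normalization may differ) can be sketched as:

```python
import math

def phi(t, delta):
    # hyperbolic edge-preserving potential:
    #   |t| << delta  ->  phi(t) ~ t**2 / (2 * delta)   (quadratic: smoothing)
    #   |t| >> delta  ->  phi(t) ~ |t| - delta          (linear: edge-preserving)
    return math.sqrt(delta ** 2 + t ** 2) - delta
```

The two asymptotic regimes are easy to verify numerically: small differences are penalized quadratically, while large differences (edges) grow only linearly, hence the lower penalization of edges compared to a pure quadratic potential.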
Finally, for a given observation model, four solutions are computed, based on:

quadratic penalty

quadratic penalty and positivity constraint

hyperbolic penalty

hyperbolic penalty and positivity constraint.
The criterion is convex by construction and has a unique global minimizer. The optimization can be achieved by iterative gradient-like techniques [38], and we resort to a limited-memory BFGS algorithm (the implementation, named VMLMB, has been provided by Éric Thiébaut). It belongs to the class of quasi-Newton algorithms, which only require evaluation of the criterion and its gradient (no second-order derivative is explicitly needed), and it is known to have better convergence properties than gradient algorithms.
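As an illustrative stand-in for VMLMB (which we do not reproduce here), SciPy's L-BFGS-B handles the positivity constraint through simple bound constraints; the toy problem below minimizes a quadratic criterion of the same form as (15) with a quadratic penalty, subject to nonnegativity:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: minimize ||A x - y||^2 + lam * ||x||^2  subject to x >= 0.
# A, y and lam are arbitrary illustrative values, not taken from the paper.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([-1.0, 2.0, 1.0])
lam = 1e-3

def criterion(x):
    r = A @ x - y
    return float(r @ r + lam * x @ x)

def gradient(x):
    # analytic gradient: 2 A^T (A x - y) + 2 lam x
    return 2.0 * (A.T @ (A @ x - y)) + 2.0 * lam * x

res = minimize(criterion, x0=np.zeros(2), jac=gradient,
               method="L-BFGS-B", bounds=[(0.0, None)] * 2)
x_hat = res.x  # nonnegative minimizer
```

The unconstrained minimizer of this toy problem has a negative first component; the bound constraint clamps it to zero while the second component adjusts accordingly, mimicking the positivity-constrained solutions listed above.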
V Experiments with synthetic sequences
This section presents the experiments conducted on synthetic sequences. Using synthetic sequences has two main advantages:

Sequences are built from a reference HR image, which is later used to assess the reconstructed SR images;

We control all imaging parameters: noise level, PSF and image size. Motion is exactly known too.
V-A Synthetic data
To generate a sequence of LR frames, the observation matrices are computed exactly, under the assumptions of Sec. II-C that both kernels are box functions. As previously said, such a technique is very time-consuming.
We simulate a smooth motion with a rotation up to degrees and a zoom factor up to . Each frame is built from a HR reference image. In Fig. 7(a), we show the first, middle and last frames generated from the reference HR image Lena.


We also generate another sequence from a bitonal calibration pattern named Mire. The first, middle and last image of the sequence are shown in Fig. 7(b).
V-B Results
Four regularized solutions and three observation models (E&F0, E&F1 and TS0) are then available. Hence, we finally compare the performances of 12 SR settings with respect to the reference HR image, by means of the PSNR (Peak Signal-to-Noise Ratio, computed from the mean square error). For each setting, the presented result is obtained with the best regularization parameter (i.e., selected to reach the highest PSNR).
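The PSNR figure of merit used throughout this section can be computed as follows (a standard definition; the peak value of 255 assumes 8-bit images):

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    # Peak Signal-to-Noise Ratio in dB, from the mean square error
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

A uniform error of one gray level over an 8-bit image, for instance, yields a PSNR of 20 log10(255), roughly 48 dB.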
Let us first deal with the “Lena” sequence of Fig. 7(a). Fig. 8(a) sums up the performance levels which have been achieved. First note that, on these relatively smooth images, the various regularization settings lead to similar performances, and unconstrained quadratic regularization suffices to obtain good results. But we observe strong differences between observation models. On average, there is an improvement from dB (noisy case) up to dB (no noise) between the E&F0 and E&F1 models. Moreover, there is also a gain of to dB between the E&F1 model and the TS0 model.
Fig. 10 illustrates the differences between reconstructed SR images, using regularization and a positivity constraint, depending on the chosen observation model. Once again, the reconstructed images shown in the first row of Fig. 10 have been obtained with the best regularization parameters. The E&F reconstructions are slightly more blurred than the SR image obtained with the proposed TS0 model. This is confirmed by the lower row, which shows the image error with respect to the reference HR image: the TS0 observation model yields a better reconstruction in high-frequency areas, like the feather on the hat or the eyes.

We have also measured CPU time on a Pentium 4 at 2.66 GHz. For this particular sequence, one iteration takes respectively and seconds for the E&F0 and E&F1 methods. Our model requires seconds per iteration. All methods converge in roughly the same number of iterations. Hence our method is more time-consuming than E&F1.
We now consider the bitonal “Mire” sequence shown in Fig. 7(b). Results are reported in Fig. 9 in terms of PSNR. As expected, this highfrequency sequence leads to much stronger differences between regularization terms and constraints.
As previously, strong differences are observed between observation models. On average, there is a gain from dB (noisy case) up to dB (no noise) between the E&F1 model and TS0. Such an improvement is due to the high contrast of the Mire image. Indeed, we know from Sec. III-D that our observation model does not induce non-homogeneous contributions in the case of variable-scale motion. The induced errors in the reconstructions are more visible in high-contrast areas, see Fig. 11 compared to Fig. 10.
We also note that, in the noiseless case, hyperbolic regularization does not improve the performance of the E&F methods, whereas we notice a gain of up to dB on average with the TS0 model.
The E&F reconstructions are much noisier than the one obtained with the TS0 model. Let us recall that these reconstructions are obtained with a regularization parameter adjusted to reach the best PSNR w.r.t. the reference HR image. The selected regularization parameter is lower () with the TS0 model than with the E&F models (). This might indicate that the more precise the model, the less regularization is necessary. In other words, regularization compensates for model errors, which are lower with the proposed TS0 model.
Using synthetic sequences with rotational and variable-scale motion, we have shown that the TS0 observation model leads to better reconstructed SR images than the E&F methods, whatever the regularization involved.
As a general comment, it should be emphasized that performance is much more sensitive to a change of observation model than to a change of regularization. In other words, a good choice of the observation model brings a much higher improvement than a change of the regularization term, at least in the context of rotation and scale variation explored here.
VI Experiments on real sequences
In this section, we compare observation models on real sequences. We first discuss prior assumptions on the sequences, with an emphasis on motion modeling and estimation; then we present the results obtained on two real datasets.
VI-A General assumptions and motion estimation
SR requires knowledge of the sensor response and of the motion field between frames. We use the common box function model for the PSF. Note that all the tested observation models can accommodate a more general PSF.
We restrict our experiments to affine motion between frames, since the proposed TS0 model is limited to such motion fields. An affine model accurately describes the motion of a planar scene under orthographic projection [39]. Such assumptions are usually not valid over the whole field of view (except in special-purpose experiments, see Sec. VI-B); nevertheless, the affine motion model is often a good local approximation of complex motion fields [9], valid in a restricted part of the image support (see an example in the aerial sequence of Sec. VI-C).
We focus on sequences which exhibit large affine motions, with a total zoom factor greater than and rotations higher than degrees (with interframe zoom up to and rotation up to degrees). Note that such experimental settings are not considered in previous papers on SR, even those which address the non-translational context [9, 22].
The first problem is to register each image of the sequence with respect to the reference image (usually the most resolved one). In this context, direct intensity-based methods, which minimize a DFD (displaced frame difference) criterion, are subject to false local minima, even using a multiresolution approach. This is due to the sensitivity of the DFD criterion to large rotational and scale changes. Hence, we use a two-step approach:

compute a rough affine motion from scale-invariant keypoint matching;

refine the affine model using multiresolution DFD minimization on a restricted part of the image.
The first step uses the Scale-Invariant Feature Transform (SIFT) keypoints of D. Lowe [40]. We match hundreds of keypoints between the considered frame and the reference one by SIFT descriptor correlation, then we robustly fit an affine model on the selected matches using a crude rejection threshold. The second step is essentially an in-house version of the pyramidal image registration method of Thevenaz et al. [10].
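A crude rejection scheme of this kind might look like the following sketch: a plain least-squares affine fit on all matches, followed by a few rounds of discarding matches whose residual exceeds a pixel threshold (the threshold value and the iteration count are hypothetical choices, not taken from the paper):

```python
import numpy as np

def fit_affine(src, dst):
    # least-squares affine fit: dst ~ [src | 1] @ P, with P of shape (3, 2)
    X = np.hstack([src, np.ones((len(src), 1))])
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return P

def robust_fit_affine(src, dst, thresh=20.0, iters=3):
    """Fit an affine model on keypoint matches, iteratively discarding
    matches whose residual exceeds `thresh` pixels."""
    X = np.hstack([src, np.ones((len(src), 1))])
    keep = np.ones(len(src), dtype=bool)
    for _ in range(iters):
        P = fit_affine(src[keep], dst[keep])
        residual = np.linalg.norm(X @ P - dst, axis=1)
        keep = residual < thresh
    return P, keep
```

On clean matches corrupted by a single gross outlier, the outlier is rejected after the first round and the affine parameters are then recovered exactly; the refined estimate is what seeds the subsequent DFD minimization.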
VI-B Lab tests
We made several SR experiments using sequences of a bitonal resolution chart printed on an A4 paper sheet, observed with an AVT046B SVGA Marlin B/W camera. We acquired image sequences with variable interframe translation, rotation and zoom factor: some examples are shown in Fig. 12. Each frame of a sequence is registered with respect to the reference frame as explained in the previous section. We ran SR reconstructions with the three competing observation models and quadratic or hyperbolic regularization, subject to a positivity constraint. For each setting, several values of the regularization parameter have been tried. Indeed, most of the time there is a certain range of (low) values of the parameter where differences between methods can easily be observed, whereas above some regularization strength, all methods become equivalent and yield an oversmoothed result.
As a first example, we process a purely translational sequence, using frames with a PMF and quadratic regularization: a comparison on a small () region is shown in Fig. 13, for a low value of the regularization parameter. As expected, in this case E&F1 and TS0 lead to quasi-identical results (PSNR = 68 dB) whatever the parameter, while E&F0 shows some instability for low values.
Fig. 14 and Fig. 15 compare SR results on 7 frames of a sequence with both rotation (up to degrees) and zoom (there is a factor between the reference image and the farthest view). We use either quadratic regularization (upper part of the figures) or hyperbolic regularization with a threshold parameter (lower part).
For a low value of the regularization parameter ( with the quadratic term and with hyperbolic regularization), see Fig. 14, E&F0 and E&F1 suffer from artifacts in the form of a pseudo-periodic texture, of high amplitude in E&F0 and weaker, but still visible, in E&F1. Not surprisingly, this phenomenon is amplified by the hyperbolic regularization. For the same regularization parameter, TS0 does not suffer from such instabilities, but exhibits ripples which are typical of an under-regularized quadratic solution and appear amplified by the hyperbolic edge-preserving potential.
For a more balanced value of the regularization parameter, see Fig. 15, E&F0 is still clearly degraded by instabilities. E&F1 and TS0 are now very close, but a careful examination of both solutions reveals that small-amplitude artifacts remain in the E&F1 reconstruction.
VI-C Aerial sequence
Fig. 16 displays the first and the last frames of an infrared sequence captured by an array sensor mounted on an airborne platform. As the plane gets closer to the scene, the last frame is the most resolved one and is chosen as the reference frame.
The scene is a harbour with the sea and a waterfront in the foreground, a building with a vertical antenna in the middle, and a series of cans lined up in the background. Two ships are present in the lower right part of the last frame. Because of perspective effects – the lowest part of the frame is closer to the sensor than the upper part – the apparent motion is closer to a homography than to an affinity. From the first frame to the reference one, the lower part (resp. upper part) of the field of view is magnified by a factor of about (resp. ). Therefore our method can only be applied to small regions of the frames.
Two regions are considered in the sequel: (i) in the upper part of the scene, the lined-up cans that remain unresolved in the reference frame (see Fig. 17), and (ii) in the lower right part of the scene, the waterfront and the ships, see Fig. 20.
We considered five frames of the sequence; Fig. 16 displays two of them. As already described, motion is estimated using SIFT on the whole sequence, then the intensity-based method of [10] is used to refine the SIFT estimate in each region.
SR reconstruction is performed with the algorithms of Sec. V-A, with quadratic regularization () and a positivity constraint. The same PMF is used along both image axes.
VI-D Upper region
The observation models are compared through the SR reconstructions in Fig. 18.
The image quality in Fig. 18 gradually increases from the top image (E&F0) to the bottom image (TS0 model). Even if the latter is still not a high-quality image, the improvement in resolution enables counting the cans of the right block in the bottom image, whereas it is less obvious in the middle image and even impossible in the top image. The results of Fig. 18 look somewhat oversmoothed, so a lower regularization parameter has been tested; results are displayed in Fig. 19.
Fig. 19 reveals that E&F0 and E&F1 are severely affected by the decrease of the regularization parameter, whereas our model appears more robust: artifacts appear in the top right part of the scene, but the cans can still be counted.
VI-E Right lower region
Fig. 21 presents similar results for the ships in the lower right part of the scene. The ships appear in bright contrast. A bicubic interpolation of the last observed frame is provided in Fig. 20.
The top image (E&F0 model) in Fig. 21 shows many localized high-frequency artifacts, some of which are absent from the middle image (E&F1 model). These artifacts are not present in the bottom image (proposed TS0 model). At the same time, comparison of the SR results with Fig. 20 shows that resolution has indeed been increased.
VII Conclusion
This paper deals with SR techniques in the field of aerial imagery. The proposed work focuses on the observation model in the case of affine motion, whereas most of the SR literature deals with the inversion process or motion estimation.
We analyzed the existing observation models used in SR reconstruction and emphasized their underlying assumptions, so as to clarify their limitations. As a result, it is shown that these observation models fall into three categories:

exact computation

convolve-then-warp

warp-then-convolve
Exact computation is not tractable for general motions. The convolve-then-warp approach is numerically efficient but unable to capture large rotations and scale variations. So only the third approach, due to Elad and Feuer, is relevant in our framework. However, we have observed inaccuracies even for moderate rotation angles and zooming factors. We succeeded in extending the E&F model to cover a wider range of affine transforms with high accuracy, at a moderate extra computation cost. The pointwise interpolation stage of the E&F method has been replaced by functional approximation techniques: the proposed technique combines a two-shear decomposition of the affine transform with a 1D projection on a shifted B-spline basis.
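As an illustration of the two-pass principle, a 2x2 affine matrix with a nonzero leading entry can be factored into a row-only pass followed by a column-only pass. The sketch below is a generic Catmull-Smith-style factorization [32], not the exact decomposition used by the proposed model (which follows [23]); the function and variable names are ours.

```python
import numpy as np

def two_shear_factors(A):
    """Factor a 2x2 affine matrix A into A = V @ H, where H only mixes
    coordinates within rows (a row-by-row 1-D affine transform) and V
    only within columns (a column-by-column pass).

    Requires A[0, 0] != 0; in practice a pre-rotation or transposition
    handles the degenerate case (not shown here).
    """
    a, b = A[0]
    c, d = A[1]
    if abs(a) < 1e-12:
        raise ValueError("A[0,0] ~ 0: transpose or pre-rotate the image first")
    H = np.array([[a, b],
                  [0.0, 1.0]])                     # horizontal (row) pass
    V = np.array([[1.0, 0.0],
                  [c / a, (a * d - b * c) / a]])   # vertical (column) pass
    return H, V
```

Each factor acts along a single image axis, so the warp reduces to a sequence of 1-D affine transforms, which is what makes the B-spline projection of the proposed model efficient.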
The proposed model has been compared with various E&F-like models, each combined with several regularization settings, for SR reconstruction on synthetic and real image sequences.
These tests have stressed the importance of the observation model in SR reconstruction when dealing with large zoom and rotation effects. In particular, choosing bilinear interpolation instead of nearest-neighbor interpolation within an Elad and Feuer setting dramatically improves the reconstructions. Moreover, the proposed model consistently achieves even better results.
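The gap between the two interpolation choices can be made concrete by looking at the weights each scheme contributes to one row of the warp part of the observation matrix. The following 1-D sketch is purely illustrative (a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def interp_weights_1d(x, n, kind="linear"):
    """Weights w such that f(x) ~ sum_i w[i] * f(i) on the grid 0..n-1.

    Such weights form one row of the warp part of an observation matrix:
    nearest-neighbor puts all the mass on one sample, while linear
    interpolation spreads it over the two neighbors of x.
    """
    w = np.zeros(n)
    i = int(np.floor(x))
    t = x - i
    if kind == "nearest":
        w[int(round(x)) % n] = 1.0
    else:  # linear
        w[i % n] = 1.0 - t
        w[(i + 1) % n] = t
    return w
```

For x = 2.25, nearest-neighbor yields the weight vector [0, 0, 1, 0, 0] while linear interpolation yields [0, 0, 0.75, 0.25, 0]; the quarter-pixel offset that the former discards is exactly the subpixel information SR relies on.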
Further research should be conducted to accurately handle homographic or piecewise parametric motion, which would open SR techniques to a wider range of applications.
Acknowledgment
The authors would like to thank Éric Thiébaut for providing an implementation of the VMLMB algorithm (see Sec. IV), used to optimize the constrained regularized criteria.
References
 [1] R. Tsai and T. Huang, “Multiframe image restoration and registration,” in Advances in Computer Vision and Image Processing, vol. 1. JAI Press, 1984, pp. 317–339.
 [2] R. Schultz and R. Stevenson, “Extraction of high-resolution frames from video sequences,” IEEE Transactions on Image Processing, vol. 5, no. 6, pp. 996–1011, June 1996.
 [3] A. Patti, M. Sezan, and A. Murat Tekalp, “Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time,” IEEE Transactions on Image Processing, vol. 6, no. 8, pp. 1064–1076, August 1997.
 [4] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint MAP registration and high-resolution image estimation using a sequence of undersampled images,” IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1621–1633, December 1997.
 [5] M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1646–1658, December 1997.
 [6] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe superresolution,” IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1327–1343, October 2004.
 [7] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: A technical overview,” IEEE Signal Processing Magazine, vol. 20, no. 3, pp. 21–36, May 2003.
 [8] S. Baker and T. Kanade, “Limits on super-resolution and how to break them,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1167–1183, September 2002.
 [9] S. Mann and R. W. Picard, “Virtual bellows: Constructing high quality stills from video,” in IEEE Int. Conf. on Image Processing, Austin, TX, 1994, pp. 363–367.
 [10] P. Thévenaz, U. Ruttimann, and M. Unser, “A pyramid approach to subpixel registration based on intensity,” IEEE Transactions on Image Processing, vol. 7, no. 1, pp. 27–41, January 1998.
 [11] S. Kim, N. Bose, and H. Valenzuela, “Recursive reconstruction of high resolution image from noisy undersampled multiframes,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 6, pp. 1013–1027, June 1990.
 [12] S. Kim and W. Su, “Recursive high-resolution reconstruction of blurred multiframe images,” IEEE Transactions on Image Processing, vol. 2, no. 4, pp. 534–539, October 1993.
 [13] H. Ur and D. Gross, “Improved resolution from subpixel shifted pictures,” CVGIP: Graphical Models and Image Processing, vol. 54, no. 2, pp. 181–186, March 1992.
 [14] M. Elad and Y. Hel-Or, “A fast super-resolution reconstruction algorithm for pure translational motion and common space-invariant blur,” IEEE Transactions on Image Processing, vol. 10, no. 8, pp. 1187–1193, October 2001.
 [15] B. C. Tom and A. K. Katsaggelos, “Reconstruction of a high-resolution image by simultaneous registration, restoration, and interpolation of low-resolution images,” in Proceedings of the International Conference on Image Processing, Washington, D.C., 1995, pp. 2539–2542.
 [16] A. Tekalp, M. Ozkan, and M. Sezan, “High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 3, pp. 169–172, March 1992.
 [17] E. Lee and M. Kang, “Regularized adaptive high-resolution image reconstruction considering inaccurate subpixel registration,” IEEE Transactions on Image Processing, vol. 12, no. 7, pp. 826–837, July 2003.
 [18] A. J. Patti and Y. Altunbasak, “Artifact reduction for set theoretic super resolution image reconstruction with edge adaptive constraints and higher-order interpolants,” IEEE Transactions on Image Processing, vol. 10, no. 1, pp. 179–186, January 2001.
 [19] N. Woods, N. Galatsanos, and A. Katsaggelos, “EM-based simultaneous registration, restoration, and interpolation of super-resolved images,” in IEEE Int. Conf. on Image Processing, Barcelona, Spain, 2003.
 [20] R. C. Hardie, K. J. Barnard, J. G. Bognar, E. E. Armstrong, and E. A. Watson, “High-resolution image reconstruction from a sequence of rotated and translated frames and its application to an infrared imaging system,” Optical Engineering, vol. 37, no. 1, pp. 247–260, January 1998.
 [21] M. Elad and A. Feuer, “Superresolution restoration of an image sequence: Adaptive filtering approach,” IEEE Transactions on Image Processing, vol. 8, no. 3, pp. 387–395, March 1999.
 [22] S. Lertrattanapanich and N. K. Bose, “High resolution image formation from low resolution frames using Delaunay triangulation,” IEEE Transactions on Image Processing, vol. 11, no. 12, pp. 1427–1441, December 2002.
 [23] P. Thévenaz and M. Unser, “Separable least-squares decomposition of affine transformations,” in Proceedings of the 1997 IEEE International Conference on Image Processing (ICIP’97), Santa Barbara, CA, USA, October 1997.
 [24] J. Idier, “Convex halfquadratic criteria and interacting auxiliary variables for image restoration,” IEEE Transactions on Image Processing, vol. 10, pp. 1001–1009, July 2001.
 [25] H. Curry and I. Schoenberg, “On spline distributions and their limits: The Pólya distribution functions,” Bull. Amer. Math. Soc., vol. 53, p. 1114, 1947.
 [26] M. Unser, A. Aldroubi, and M. Eden, “B-spline signal processing: Part I—Theory,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 821–833, February 1993.
 [27] ——, “B-spline signal processing: Part II—Efficient design and applications,” IEEE Transactions on Signal Processing, vol. 41, no. 2, pp. 834–848, February 1993.
 [28] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1977.
 [29] H. Stark and P. Oskoui, “High-resolution image recovery from image-plane arrays, using convex projections,” JOSA A, vol. 6, no. 11, pp. 1715–1726, November 1989.
 [30] I. Sutherland and G. Hodgman, “Reentrant polygon clipping,” Communications of the ACM, vol. 17, pp. 32–42, 1974.
 [31] M. Irani and S. Peleg, “Improving resolution by image registration,” CVGIP: Graphical Models and Image Processing, vol. 53, no. 3, pp. 231–239, May 1991.
 [32] E. Catmull and A. Smith, “3-D transformations of images in scanline order,” Computer Graphics (SIGGRAPH ’80 Proceedings), vol. 14, no. 3, pp. 279–285, July 1980.
 [33] A. Paeth, “A fast algorithm for general raster rotation,” Proc. Graphics Interface, pp. 77–81, 1986.
 [34] D. Fraser and R. Schowengerdt, “Avoidance of additional aliasing in multipass image rotations,” IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 721–735, November 1994.
 [35] S. Horbelt, “Splines and wavelets for image warping and projection,” Ph.D. dissertation, Swiss Federal Institute of Technology Lausanne (EPFL), May 2001, EPFL Thesis no. 2397 (2001), 131 p.
 [36] A. Muñoz Barrutia, T. Blu, and M. Unser, “Least-squares image resizing using finite differences,” IEEE Transactions on Image Processing, vol. 10, no. 9, pp. 1365–1378, September 2001.
 [37] M. Unser, A. Aldroubi, and M. Eden, “Enlargement or reduction of digital images with minimum loss of information,” IEEE Transactions on Image Processing, vol. 4, no. 3, pp. 247–258, March 1995.
 [38] J. Nocedal and S. J. Wright, Numerical Optimization. New York: Springer-Verlag, 1999.
 [39] J. L. Mundy and A. Zisserman, Eds., Geometric Invariants in Computer Vision. Cambridge (MA), USA: MIT Press, 1992.
 [40] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
Gilles Rochefort was born in Issy-les-Moulineaux, France, in 1977. He graduated from the École Supérieure de Mécanique et d’Électricité in 2000. He received the Doctorat degree in signal processing from the Université Paris-Sud, Orsay, France, in 2005.
He is presently a research engineer at RealEyes3d, a young company involved in the design of camera-phone applications. He is interested in inverse problems in image processing, and more specifically in improving the image quality of low-cost cameras.
Frédéric Champagnat was born in Dakar, Senegal, in 1966. He graduated from the École Nationale Supérieure de Techniques Avancées in 1989 and received the Ph.D. degree in physics from the Université de Paris-Sud, Orsay, France, in 1993. In 1994–95 he held a postdoctoral position at the Biomedical Engineering Institute of the École Polytechnique, Montreal, Canada. Since 1998, he has been with the Office National d’Études et Recherches Aérospatiales, Châtillon, France.
His main interests are in the field of spatiotemporal processing for space or aerial image sequences, in particular registration, motion estimation, super-resolution and detection.
Guy Le Besnerais was born in Paris, France, in 1967. He graduated from the École Nationale Supérieure de Techniques Avancées in 1989 and received the Ph.D. degree in physics from the Université de Paris-Sud, Orsay, France, in 1993. Since 1994, he has been with the Office National d’Études et Recherches Aérospatiales, Châtillon, France.
His main interests are in the fields of image reconstruction, structure-from-motion and spatiotemporal processing for space and aerial image sequences.
Jean-François Giovannelli was born in Béziers, France, in 1966. He graduated from the École Nationale Supérieure de l’Électronique et de ses Applications in 1990. He received the Doctorat degree in physics from the Université Paris-Sud, Orsay, France, in 1995.
He is presently an assistant professor in the Département de Physique at Université Paris-Sud and a researcher with the Laboratoire des Signaux et Systèmes (CNRS–Supélec–UPS). He is interested in regularization and Bayesian methods for inverse problems in signal and image processing. Application fields essentially concern astronomical, medical and geophysical imaging.