1 Introduction
Wireless capsule endoscopy (WCE) is a noninvasive medical technology devised for the in vivo, painless inspection of the interior of the gastrointestinal (GI) tract. It is particularly important for the examination of the small intestine, since this organ is not easily reached by conventional endoscopic techniques. The first capsule was developed by Given Imaging (Yoqneam, Israel) in 2000 [12] and, after its approval in Europe and the United States in 2001, it has been widely used by the medical community as a means of investigating small bowel diseases, namely GI bleeding and obscure GI bleeding (bleeding of unknown origin that persists or recurs) [1, 7, 20]. This first capsule, designed for small bowel examination, is a very small device with the size and shape of a vitamin pill. It consists of a miniaturized camera, a light source and a wireless circuit for the acquisition and transmission of signals [18]. In a WCE exam, the patient ingests the capsule, and as it moves through the GI tract, propelled by peristalsis (a contraction of the small intestine muscles that pushes the intestinal content forward), images are transmitted to a data recorder worn on a belt outside the body. After about 8 hours (the WCE battery lifetime), the stored images, approximately 50,000 images of the inside of the GI wall, are transferred to a computer workstation for offline viewing.
Despite the important medical benefits of wireless capsule endoscopy, one of the biggest drawbacks of this technology is the impossibility of knowing the precise WCE location when an abnormality is detected in the WCE video. For instance, for an abnormality in the small bowel, the principal medical goal is to know how far the abnormality is from a reference point, such as the pylorus (the opening from the stomach into the duodenum) or the ileocecal valve (the valve that separates the small from the large intestine), in order to plan a surgical intervention if necessary.
Therefore, an accurate estimate of the WCE speed, together with the location of one of these reference points (pylorus or ileocecal valve), would be extremely useful medically, since it would make it possible to measure the distance from the reference point to the capsule and, consequently, the distance from the reference point to the region imaged by the capsule.
Recently, there have been many efforts to develop accurate localization methods for WCE; we refer to [27] for an extended review of this topic. Generally, WCE localization techniques can be divided into three major categories: radio frequency (RF) signal based [3, 8, 13, 19, 21, 28], magnetic field based [5, 9, 10, 15, 16, 22, 23], and image-based computer vision methods [2, 3, 4, 6, 11, 14, 15, 16, 25, 26, 29, 24]. The first two typically require extra sensors installed outside the body. The monitoring of the RF waves emitted by the capsule antenna is a technique that has received considerable attention in the literature. Some of the strengths of this approach are that there is no need to redesign the capsule, since RF antennas are already present in all capsules, and the potentially high accuracy of the method. For instance, in [28], using a three-dimensional human body model, the authors suggest that millimeter-level average localization errors can be obtained in the digestive organs, with an even lower error in the small intestine. In particular, the technique presented is based on measuring the RF signal strength with receiving sensors placed on the surface of the human body model. Alternatively, RF localization can also be based on the analysis of time-of-arrival (TOA) and direction-of-arrival (DOA) measurements [8, 13, 19]. However, a number of difficulties remain to be resolved. First, the accuracy of these methods depends strongly on a relatively high number of external sensors. This external equipment can be very uncomfortable for the patient, and some of these techniques require the patient to be confined to a medical facility. These restrictions eliminate some of the advantages that WCE has to offer. Moreover, the real human body is an extremely complex medium with many nonhomogeneous and nonisotropic parts that interfere with the RF signal. Therefore, in practice, the existing RF localization systems still suffer from high tracking errors.
The magnetic localization technique is similar in principle to RF signal techniques. The idea is to insert a permanent magnet or a coil into the WCE and measure the resulting magnetic field with sensors placed outside the body. The permanent magnet method, unlike the coil-based method, has the advantage that no external excitation current is needed. On the other hand, the latter is less sensitive to ambient electromagnetic noise. Magnetic-based methods benefit from the fact that the human body has a very small influence on the magnetic field. Theoretically, the accuracy of these methods can be very high, e.g., millimeter-level average position errors were reported in [9]. The main drawbacks of this technology are similar to those pointed out for RF methods: the need for a high number of external sensors and the restricted mobility of the patient. The modification of the capsule design may also be problematic. We also point out that magnetic localization systems are limited to 2D orientation estimation, since one rotation angle (around the magnet's axis of symmetry) is missing.
One alternative technique that avoids any burden for the patient is based on computer-vision methods. Here only information extracted from the WCE images is used to estimate the displacement and orientation of the capsule. Generally, these methods involve, as a first step, an image registration procedure between consecutive video frames. The registration is carried out through the minimization of a global similarity measure, e.g., mutual information [29], or the matching of local features, where algorithms like RANSAC and SIFT are the usual choices [11, 25]. The next step involves the estimation of the relative displacement and rotation of the wireless capsule. Several different approaches have been proposed to achieve this goal. One such approach, and the one also followed here, is to relate the scale and rotation parameters resulting from the registration scheme to the capsule rotation and displacement, using a projective transformation and the pinhole camera model [25]. Another, more complex, approach is the model of deformable rings [26]. Orientation estimation resorting to a homography transformation [24] or to epipolar geometry [15] has also been explored.
The main challenges for the computer-vision-based methods are the abrupt changes in image content between consecutive frames and in the capsule motion, caused by peristalsis and the accompanying large deformation of the small intestine. However, a common simplification used in image-based WCE tracking is to neglect the nonrigid deformations of the elastic intestine walls.
In this paper we develop a multiscale elastic image registration strategy that takes this effect into account and overcomes the limitations of multiscale parametric image registration (the latter captures only rigid-like movements of the intestine walls in successive frames). By way of illustration, Figure 1 shows two consecutive frames of a WCE video exhibiting elastic deformations, demonstrating that an affine transformation composed of a planar rotation, scaling and translation is not enough to match (i.e., register) the left frame with the right one.
In fact, as observed in [14], because the WCE is propelled by peristalsis, the motion of the small intestine walls between consecutive frames results from a combination of two types of movement: the WCE movement, which is rigid-like, and the nonrigid movement of the small intestine (because of peristalsis, the small intestine, an elastic organ, bends and deforms). Therefore, in this paper we propose a multiscale elastic image registration procedure for measuring the motion of the small intestine walls between consecutive frames that accounts for the combination of these two movements. First, a parametric pre-registration is performed at a coarse scale; it yields the motion/deformation corresponding to an affine alignment of the two images at that scale, matching the most prominent and large features and correcting the main distortions originated by the WCE movement. In the second step, based on the result of the first, a multiscale elastic registration is carried out, correcting the fine, local misalignments generated by the nonrigid movement of the gastrointestinal tract. The motion obtained with this multiscale elastic image registration, between two consecutive video frames, is the final deformation resulting from these two successive deformations. Moreover, we further enhance the quality of this approach by iterating it twice.
To the best of our knowledge, this is the first time that a multiscale elastic image registration (with an affine pre-registration) is proposed for WCE imaging motion. Moreover, we show that under the proposed approach qualitative information about the WCE speed can be obtained, as well as the WCE location and orientation, by using projective geometry and following the aforementioned arguments of [25] (that is, by relating the scale and rotation parameters resulting from the registration scheme to the capsule orientation and displacement, using projective geometry and the pinhole model). Furthermore, the results of the tests and experiments show a better performance of the multiscale elastic image registration, compared to the multiscale parametric one, when elastic deformations are involved (which is the realistic scenario, since the capsule motion is driven by peristalsis).
After this introduction, the rest of the paper is organized in three sections. In Section 2 we describe the proposed multiscale image registration approach (elastic with affine pre-registration) as well as the fully parametric one. In Section 3 we evaluate the proposed procedure on real (and artificial) WCE video frames and compare it with multiscale parametric image registration, in terms of the qualitative WCE speed information, the dissimilarity measure used to evaluate the registration, and the WCE location and orientation obtained following [25]. We give an account of all the numerical tests performed and the corresponding results. Finally, a section with conclusions and future work closes the paper.
2 Image Registration Approach
Let $(R, T)$ be a pair of images, one called the reference $R$ (which is kept unchanged) and the other called the template $T$, represented by the functions $R, T : \Omega \to \mathbb{R}$, where $\Omega \subset \mathbb{R}^2$ stands for the pixel domain and $x$ is the notation for an arbitrary pixel in $\Omega$. The goal of image registration is to find a geometric transformation $\varphi : \mathbb{R}^2 \to \mathbb{R}^2$ such that the transformed template image, denoted by $T \circ \varphi$, becomes similar to the reference image $R$, or equivalently, to solve an optimization problem whose objective is to find a transformation $\varphi$ that minimizes the distance between $T \circ \varphi$ and $R$, represented by a distance measure $\mathcal{D}$.
In this paper we always use the greyscale version of the WCE video frames to perform the registration, and the selected distance measure $\mathcal{D}$, which quantifies the similarity (or alignment) of the reference and transformed template images under the transformation $\varphi$, is the sum of squared differences (SSD), which directly compares the grey values of the reference and template images. This distance is defined by
$$\mathcal{D}(R,\, T \circ \varphi) = \frac{1}{2} \int_{\Omega} \big( T(\varphi(x)) - R(x) \big)^2 \, dx, \qquad (1)$$
where we assume $R,\, T \circ \varphi \in L^2(\Omega)$, the space of square-integrable functions in $\Omega$.
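To make the distance concrete, the SSD of (1) can be discretized with a midpoint rule on the pixel grid. The sketch below is in Python/NumPy rather than the MATLAB/FAIR code actually used in the paper, and assumes, for illustration, that the domain $\Omega$ is the unit square:

```python
import numpy as np

def ssd(reference, warped_template):
    """Discretized SSD distance of Eq. (1): (1/2) * sum of squared
    grey-value differences, scaled by the pixel area (midpoint rule,
    unit-square domain assumed)."""
    assert reference.shape == warped_template.shape
    h1, h2 = 1.0 / reference.shape[0], 1.0 / reference.shape[1]
    diff = warped_template.astype(float) - reference.astype(float)
    return 0.5 * h1 * h2 * float(np.sum(diff ** 2))
```

Note that the pixel-area factor makes the value consistent under grid refinement, approximating the integral in (1).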
In this section we describe the proposed image registration approach, a multiscale elastic image registration with an affine pre-registration, hereafter denoted by MEIR. It relies on a multiscale representation of the image data (see Figure 2) that originates a sequence of image registration problems (which are optimization problems). This multiscale strategy attempts to diminish or eliminate possible local minima and to lead to better-behaved optimization problems.
2.1 Multiscale elastic image registration with affine pre-registration (MEIR)
Let $\theta_0 > \theta_1 > \dots > \theta_K$, with $K$ a positive integer, denote a decreasing sequence of scale parameters associated with a spline interpolation procedure [17]. Starting with the large initial $\theta_0$, which corresponds to the coarsest scale, we denote by $R_{\theta_0}$ and $T_{\theta_0}$ the corresponding interpolated reference and template images. These retain only the most prominent features (small details in these images disappear, as exemplified in Figure 2c). Then we perform a parametric pre-registration, that is, we search for a particular type of affine transformation $\varphi_w$, a rigid-like one, defined as a composition of scaling, rotation and translation,
$$\varphi_w(x) = s\, Q(\beta)\, x + (t_1, t_2)^\top, \qquad Q(\beta) = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix}, \qquad (2)$$
and such that the parameter vector $w$ is the solution of the optimization problem
$$\min_{w}\; \mathcal{D}\big(R_{\theta_0},\, T_{\theta_0} \circ \varphi_w\big). \qquad (3)$$
In (2), $w = (s, \beta, t_1, t_2)$ is the vector with the 4 parameters characterizing the rigid-like transformation $\varphi_w$: $s$ represents the scale, $\beta$ is the rotation angle and, finally, $t_1$ and $t_2$ denote the translations along the two coordinate axes, respectively. We observe that a general affine transformation is characterized not by four parameters, as in (2), but by six. However, we have restricted the search to transformations of the type (2) because, in this initial pre-registration at the coarse scale $\theta_0$, the objective is to partially recover the rigid-like motion of the small intestine walls in a pair of consecutive frames, due to the WCE movement, which roughly induces a two-dimensional rigid-like apparent motion of the form (2) in the frames.
Afterwards, the idea is to improve this rigid-like motion by complementing it with the nonrigid deformations of the small intestine walls (recall that the WCE motion itself is caused by the intestine movement).
Thus the goal is to loop over the scales $\theta_k$, $k = 1, \dots, K$, carrying out the multiscale elastic registration and using the solution at scale $\theta_k$ as the starting point for the elastic image registration at the next finer scale $\theta_{k+1}$, aiming at speeding up the total optimization procedure and avoiding possible local minima. To be precise, for each scale $\theta_k$, with $k = 1, \dots, K$, let $R_{\theta_k}$ and $T_{\theta_k}$ be the corresponding interpolated reference and template images. Figure 2 displays, for a WCE video frame, the multiscale representation of its greyscale version using four scales. The objective is to find a particular transformation $\varphi$ (i.e., an elastic deformation), which for convenience is split into the trivial identity part and the deformation or displacement part $u$ (that is, $\varphi = \mathrm{Id} + u$, with $\mathrm{Id}(x) = x$), such that at scale $\theta_k$ the transformed interpolated template image $T_{\theta_k} \circ (\mathrm{Id} + u)$ becomes similar to the interpolated reference image $R_{\theta_k}$. The elastic registration problem to be solved at scale $\theta_k$ is the following optimization problem
$$\min_{u}\; \mathcal{D}\big(R_{\theta_k},\, T_{\theta_k} \circ (\mathrm{Id} + u)\big) + \alpha\, \mathcal{S}(u), \qquad (4)$$
whose solution we denote by $u_{\theta_k}$. Here $\mathcal{S}$ is the elastic regularization term (which makes the optimization problem well-posed and restricts the minimizer to the group of linear elastic transformations), defined by
$$\mathcal{S}(u) = \frac{1}{2} \int_{\Omega} \mu\, \|\nabla u\|^2 + (\mu + \lambda)\, (\operatorname{div} u)^2 \, dx, \qquad (5)$$
with $\nabla$ and $\operatorname{div}$ denoting, respectively, the gradient and divergence operators,
$$\nabla u = \begin{pmatrix} \partial_{x_1} u_1 & \partial_{x_2} u_1 \\ \partial_{x_1} u_2 & \partial_{x_2} u_2 \end{pmatrix}, \qquad \operatorname{div} u = \partial_{x_1} u_1 + \partial_{x_2} u_2, \qquad (6)$$
$\|\cdot\|$ is the notation for the Euclidean (Frobenius) norm, and the parameters $\mu$ and $\lambda$ are the Lamé constants characterizing the elastic material. The constant $\alpha$ is a regularization parameter that balances the influence of the similarity and regularity terms in the cost functional of the optimization problem (4).
In general, an analytical solution to (4) does not exist; the optimization problem (4) is therefore discretized, giving rise to a finite-dimensional problem. The numerical scheme used in this paper to solve the discretized version of (4) is a Gauss–Newton-like method (with Armijo's line search), whose starting guess at scale $\theta_k$ is the solution of the registration problem at the previous coarser scale (that is, the solution of (4) for $\theta_{k-1}$), and, at the first elastic scale $\theta_1$, the solution of the affine pre-registration (3).
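The paper does not spell out the line search; a standard Armijo backtracking step, as typically used inside Gauss–Newton-type iterations, can be sketched as follows (a hedged sketch; the actual FAIR implementation differs in its details):

```python
import numpy as np

def armijo(J, x, dx, grad, shrink=0.5, c=1e-4, max_iter=30):
    """Backtracking line search with the Armijo sufficient-decrease
    condition: shrink the step length t until
    J(x + t*dx) <= J(x) + c * t * <grad, dx>."""
    t = 1.0
    Jx, slope = J(x), c * float(np.dot(grad, dx))
    for _ in range(max_iter):
        if J(x + t * dx) <= Jx + t * slope:
            return t       # first admissible step length
        t *= shrink
    return 0.0             # no admissible step found
```

The search direction `dx` is the Gauss–Newton direction; the accepted step guarantees a decrease of the cost functional at every outer iteration.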
In summary, the MEIR approach consists in performing first (3), the affine registration at a coarse scale, and then the multiscale elastic registration, solving (4) for each scale (and using the solution at each scale as the input for the next one).
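The coarse-to-fine loop just summarized can be sketched as a simple driver; here `smooth` and `register_at_scale` are placeholders (our naming) for the spline-based multiscale representation of [17] and for the solver of problem (4), respectively:

```python
def multiscale_registration(R, T, scales, smooth, register_at_scale, u0):
    """Coarse-to-fine driver: for each scale in the decreasing
    sequence, build the coarse image pair and solve the registration
    problem, warm-starting from the previous (coarser) solution."""
    u = u0
    for theta in scales:                   # theta_1 > theta_2 > ... > theta_K
        R_s, T_s = smooth(R, theta), smooth(T, theta)
        u = register_at_scale(R_s, T_s, u)
    return u
```

With `u0` set to the displacement induced by the affine pre-registration (3), this skeleton reproduces the warm-start structure of MEIR.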
We note that if in (4) we drop the regularization term (i.e., set $\alpha = 0$) and search for an affine transformation of the form (2) at each scale, the proposed MEIR approach becomes a multiscale parametric (affine) image registration approach, hereafter denoted by MPIR.
We remark that in all the experiments described in Section 3 we further enrich the MEIR approach by iterating it twice, using the registered image as the input template for the second iterate. This means that the following two steps are performed.

Step 1 – Registration of the pair $(R, T)$ with MEIR.

Step 2 – Registration of the pair $(R, T \circ \varphi_1)$ with MEIR, where $\varphi_1$ is the solution of Step 1.

The transformation that solves the previous Step 2, hereafter denoted by $\varphi_{\mathrm{final}}$, is the final result of the iterated MEIR.
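In coordinate terms, Steps 1-2 compose two registration passes. A minimal sketch, with `register` a placeholder returning the warped template and a coordinate map, is:

```python
def iterated_registration(register, R, T):
    """Two-pass (iterated) scheme: Step 1 registers (R, T) giving
    (T1, phi1); Step 2 registers (R, T1) giving (T2, phi2).
    The overall coordinate map is the composition x -> phi1(phi2(x)),
    since T1 = T o phi1 and T2 = T1 o phi2 = T o (phi1 o phi2)."""
    T1, phi1 = register(R, T)
    T2, phi2 = register(R, T1)
    return T2, (lambda x: phi1(phi2(x)))
```

The composition order matters: the second, finer correction is applied first in coordinate space, then mapped through the first pass.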
Figures 3, 4 and 5 illustrate the results obtained with MEIR and MPIR for different pairs of images $(R, T)$, where $R$ is the reference and $T$ the template. We can visually compare the two registration approaches in Figures 3 and 4. In Figure 3, $T$ is a simulated version of $R$, obtained by applying a rotation and an elastic deformation to $R$, and the result of MEIR, displayed in the second row, is clearly better than the MPIR result, shown in the third row. In Figure 4, $R$ and $T$ are two consecutive frames of a WCE video: $T$ is the frame after $R$ in the video, and we can perceive an elastic deformation and a rotation in $T$. In this case too, MEIR gives a better result than MPIR (compare the second and third rows). In Figure 5, $T$ is a rotated and scaled version of $R$, and the performances of the two registration approaches are visually very similar; for this reason we only show the results obtained with MEIR and omit the MPIR results. Moreover, in these three figures the displayed grids correspond to one MEIR iteration; the grid obtained in the second MEIR iteration only corrects minor differences.
We can also quantitatively compare the results obtained with MEIR and MPIR, displayed in Figures 3 and 5, where the template image $T$ is a simulated version of the reference image $R$, by computing the following normalized dissimilarity measure (NDM)
$$\mathrm{NDM} = \frac{\big\| T \circ \varphi_{\mathrm{final}} - R \big\|_{L^2(\Omega)}}{\| R \|_{L^2(\Omega)}}. \qquad (7)$$
This measure evaluates the accuracy of the registration approach. Here $\varphi_{\mathrm{final}}$ denotes the final numerical solution of the registration process (of the form (2) for MPIR, and the iterated elastic solution for MEIR), and $L^2(\Omega)$ denotes the space of square-integrable functions in $\Omega$. The NDM quantifies the dissimilarity between the reference and transformed template images in the norm of $L^2(\Omega)$, normalized by the norm of the reference image. Clearly, for Figures 3 and 5, where $T$ is a simulated version of $R$, the smaller the NDM, the more accurate the registration approach. In Figure 3 the NDM obtained with MEIR is smaller than the one obtained with MPIR, while in Figure 5 the two NDM values are close. So in Figure 3 MEIR performs better than MPIR, and in Figure 5 the results of both approaches closely resemble each other.
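In discrete form, (7) is a ratio of Frobenius norms (the grid spacing cancels in the ratio). A minimal NumPy version, ours and for illustration only, is:

```python
import numpy as np

def ndm(reference, warped_template):
    """Normalized dissimilarity measure of Eq. (7): L2 distance between
    the reference and the transformed template, normalized by the L2
    norm of the reference."""
    diff = warped_template.astype(float) - reference.astype(float)
    return float(np.linalg.norm(diff) / np.linalg.norm(reference))
```

A perfect registration gives NDM = 0; an all-zero warped template gives NDM = 1.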
3 Experiments, Results and Analysis
We have evaluated the two multiscale registration approaches on 39 WCE videos recorded at the Department of Gastroenterology of Coimbra Hospital (CHUC – Centro Hospitalar e Universitário de Coimbra, Portugal). The videos were acquired with the PillCam SB capsule, a WCE for the small bowel manufactured by Given Imaging (Yoqneam, Israel). Each video clip has a duration of 20 seconds and 100 frames, and all the frames have the same resolution. The 39 videos belong to 9 different patients.
All the experiments were implemented in MATLAB® R2013b (The MathWorks, Inc.), and we have also used the FAIR software [17], an image registration package written in MATLAB that can be freely downloaded from www.siam.org/books/fa06.
We have performed two types of experiments. First, we use real consecutive images of WCE videos to show the potential of the proposed MEIR approach. Second, since it is difficult at the moment to validate the approach in human bodies, we consider artificially scaled, rotated and elastically transformed video frames, to demonstrate the efficacy of the proposed MEIR approach and to evidence its superiority with respect to the MPIR approach when elastic deformations are involved.
In the numerical tests, for both MEIR and MPIR, we identify the image domain $\Omega$ with a fixed rectangular set and discretize it with a regular grid of points, for both the template and reference images, at each scale. We also consider four scales. Moreover, in MEIR fixed values are used for the regularization parameter $\alpha$ and for the elasticity (Lamé) parameters $\mu$ and $\lambda$.
We also note, as can be seen for example in Figures 3 and 5 (first row), that for generating the synthetic frames, before applying the (scaling, rotation or elastic) transformation, the original greyscale frame is padded with zeros so that its artificial version remains inside the domain $\Omega$. In addition, for all the tests the NDM is always computed on the whole domain $\Omega$ and not on a subregion.
3.1 Experiments with real successive frames
In this section we describe the results obtained in the experiments performed with real successive frames, namely the results in terms of the normalized dissimilarity measure NDM, used to compute an estimate of the WCE speed.
Figure 6 shows (in the middle) the plot of the NDM curve for the MEIR approach, for a WCE video clip with 100 frames and a duration of 20 seconds. As in [4], this curve can be understood as qualitative capsule speed information based on the similarity between consecutive frames. We remark as well that each video frame carries its acquisition time, so there is a direct correspondence between the frame number, in the interval $[1, 100]$, and its acquisition time, in the interval $[0, 20]$ seconds. Low NDM values indicate similarity between frames (for example, the pair of frames 12 and 13, displayed on the left of Figure 6, corresponds to a low value in the curve), so the capsule is almost still or rotates/moves slowly, while high NDM values indicate abrupt changes/dissimilarities between the corresponding consecutive frames (for instance, the pair of frames 51 and 52, shown on the right of Figure 6, corresponds to a peak in the curve), revealing that the capsule is moving fast. From the medical point of view, the parts of a video with sudden changes of image content are of special interest. Therefore, the NDM can help clinicians to quickly identify these changes (corresponding to the peak values) as well as the parts with slow motion (corresponding to low values).
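As a simple illustration of how such a curve could be screened in practice (a sketch of ours, not part of the paper's method), peak transitions can be flagged with a mean-plus-$k$-standard-deviations threshold:

```python
import numpy as np

def flag_abrupt_transitions(ndm_curve, k=2.0):
    """Return the indices of frame transitions whose NDM value exceeds
    mean + k*std of the curve: candidate abrupt content changes for
    the clinician to review first."""
    curve = np.asarray(ndm_curve, dtype=float)
    threshold = curve.mean() + k * curve.std()
    return np.flatnonzero(curve > threshold)
```

More elaborate peak detectors could of course be used; the point is only that the NDM curve localizes the transitions of interest.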
Figure 7 displays the NDM curves for the two approaches (MEIR and MPIR), for the same video considered in Figure 6, when the registration is done in the forward direction (from frame 1 to frame 100).
We also note that MEIR (and also MPIR) is a technique for matching consecutive video frames, so it is particularly effective when these frames have common regions, but less so when the frames are totally dissimilar. The corresponding NDM curve gives valuable WCE speed information in regions where the WCE movement is continuous. When there are abrupt changes between consecutive frames, the registration approaches lead to peaks in the NDM curves that accurately identify the pairs of consecutive frames where these changes occur; however, the MEIR (or MPIR) registration itself is not very informative in these cases.
A comparison between the NDM curves obtained with MEIR and MPIR reveals a bigger gap between similar and dissimilar frames (low and high NDM values, respectively) in the curve generated with MEIR than in the one generated with MPIR. This result evidences a better separation between similar (or quite similar) and different consecutive frames, and thus a better performance of the MEIR registration approach. This was somewhat expected: the small intestine is an elastic organ in motion due to peristalsis, so an elastic registration approach is better suited than an affine one. We also refer to Figure 12 for a comparison, for a single frame, between the NDM curves obtained with MEIR and MPIR as the amount of elastic deformation increases.
Figure 8 exhibits 3 different pairs of consecutive frames in WCE videos. For each pair we can perceive an elastic deformation and/or a rotation and/or a change in scale when passing from one frame $R$ to the following one $T$. Figure 9 shows the results obtained with MEIR for each pair in Figure 8. The grids correspond to the transformations obtained with one MEIR iteration. Clearly, the transformed templates displayed in the first row of Figure 9 demonstrate the elastic matching of these three pairs of consecutive video frames.
Finally, we note that in order to improve the efficiency of the MEIR approach, the affine pre-registration problem (3) can be solved with a multilevel strategy, using downsampled images. With a two-level approach for solving (3), first on a coarse grid and then on the full grid, for both the template and reference images, we have observed a noticeable reduction of the overall MEIR computation time.
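The two-level strategy only requires a cheap downsampling operator. A block-averaging version (a sketch, not the FAIR multilevel routine) is:

```python
import numpy as np

def downsample(img, factor):
    """Block-average downsampling: each factor-by-factor block is
    replaced by its mean, giving the coarse grid on which the affine
    pre-registration (3) can be solved first."""
    h, w = img.shape
    img = img[:h - h % factor, :w - w % factor]   # crop to a multiple of factor
    return img.reshape(h // factor, factor,
                       w // factor, factor).mean(axis=(1, 3))
```

The coarse-grid solution of (3) then serves as the starting guess for the full-resolution problem.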
3.2 Experiments with artificial frames
To evaluate the performance of the proposed multiscale approach (elastic with affine pre-registration, MEIR), and to compare it with the multiscale fully parametric registration approach, MPIR (which is similar to many other existing approaches that rely only on affine correspondences between frames), we start by simulating transformations of video frames. We then register the originals and the corresponding simulated frames with the proposed MEIR and MPIR registration procedures, and finally we compare the results. More specifically, we proceed in the following way:

For each small bowel video, 20 frames are selected by sampling the video every second, giving a total of 780 frames.

We register each original video frame and the corresponding modified version of it, using the two multiscale approaches, MEIR and MPIR.

We use the normalized dissimilarity measure NDM introduced in (7) to assess and compare the accuracy of the registration approaches MEIR and MPIR, for all the tests.

We further assess and compare the performance of MEIR and MPIR for tracking the capsule within the body, using the idea described in [25] for estimating the displacement and orientation of the WCE. In [25], the scale and rotation parameters resulting from an affine registration scheme (involving the algorithms SURF and RANSAC) are identified with the capsule displacement and orientation using a projective transformation and the pinhole camera model. Here we use the scale and rotation parameters resulting from the MEIR and MPIR approaches to infer the displacement and orientation of the WCE as in [25].
The solution of MPIR corresponds to an affine transformation of the type (2) and immediately gives the scale and rotation needed for WCE localization and orientation, following [25]. When the MEIR approach is used, we need to consider the affine transformation of the form (2) closest, in the least-squares sense, to the solution of the MEIR approach (iterated twice), to deduce the WCE localization and orientation as in [25].
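The least-squares projection onto transformations of the form (2) admits a closed form: a 2-D Procrustes-type fit. The sketch below (our formulation, applied to a point mapping sampled from the elastic solution) recovers the scale, rotation and translation:

```python
import numpy as np

def closest_rigid_like(X, Y):
    """Best transformation of the form (2), Y ~ s*Q(beta)*X + t, in the
    least-squares sense, for points stored one per row in X and Y.
    Returns (s, beta, t); closed-form 2-D Procrustes-style solution."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    H = Xc.T @ Yc                                    # 2x2 cross-covariance
    beta = np.arctan2(H[0, 1] - H[1, 0], H[0, 0] + H[1, 1])
    c, si = np.cos(beta), np.sin(beta)
    Q = np.array([[c, -si], [si, c]])
    s = float(np.trace(Q @ H) / np.sum(Xc ** 2))     # optimal isotropic scale
    t = Y.mean(axis=0) - s * (Q @ X.mean(axis=0))
    return s, beta, t
```

Applied to the grid points and their images under the final MEIR map, this yields the scale and rotation parameters fed into the localization scheme of [25].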
Figure 8: Three columns showing three different pairs of consecutive frames in WCE videos (original frames). The first row shows the reference images $R$ and the bottom row the template images $T$. Image $T$ follows $R$ in the video.

Figure 9: Results obtained with MEIR for the three pairs of Figure 8. Each column shows (from top to bottom): the transformed template image, to compare with $R$; the difference between the reference and the transformed template images; and, finally, the deformed mesh corresponding to the solution of the MEIR approach.

Finally, for all the tests involving the synthetically generated frames, we estimate the scale and/or rotation errors for MEIR and MPIR by comparing the obtained scale and rotation parameters with the a priori known scale and rotation values used to build the synthetically scaled and/or rotated frames.
3.2.1 Tests with elastic deformations
We now describe the results of the tests performed with synthetic elastic deformations. We have generated the elastic deformation for a frame in the following way: a) first we define a 128 by 128 random matrix, whose components are pseudorandom values drawn from the standard uniform distribution on the open interval (0, 1), and smooth this matrix with a Gaussian filter; b) then we create a perturbed grid by adding the previous matrix to the regular grid of the image domain $\Omega$; c) finally, the elastically deformed version of the image is obtained by interpolating the image on this perturbed grid. This procedure is repeated for all the 780 images of the dataset; therefore, a unique elastic deformation is associated with each image. Figure 11 depicts several greyscale original frames and the corresponding elastically deformed versions obtained with this procedure.

The result of the first experiment is shown in Figure 12. It displays a comparison, for a single frame, between the NDM curves obtained with MEIR and MPIR as the amount of elastic deformation (induced artificially) increases. The graph corresponds to the registration results for a single frame (displayed on the top right) whose greyscale version (displayed on the bottom left) is always the reference image $R$. The different templates are deformed versions of the reference image $R$, generated by increasing the amount of elastic deformation (and also by applying a rotation and a change of scale). The vertical axis represents the NDM values and the horizontal axis the intensity of the elastic deformations, in increasing order. The amounts of elastic deformation applied to generate the top and bottom frames shown in the third column are indicated by the left and right vertical dashed lines, respectively, in the middle graph; the intersections of these vertical lines with the curves are the NDM results for MEIR and MPIR. This graph clearly reinforces the advantage of the MEIR approach over the MPIR approach when elastic deformations are involved.
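Steps a)-c) above can be sketched as follows, in Python/SciPy instead of the paper's MATLAB; the amplitude parameter and the filter width are our illustrative choices, not the values used in the experiments:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(img, amplitude, sigma=4.0, seed=0):
    """a) draw uniform(0, 1) random fields and smooth them with a
    Gaussian filter; b) add them (zero-centred, scaled by `amplitude`)
    to the regular grid; c) resample the image on the perturbed grid
    (bilinear interpolation)."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    dy = amplitude * gaussian_filter(rng.uniform(size=(h, w)) - 0.5, sigma)
    dx = amplitude * gaussian_filter(rng.uniform(size=(h, w)) - 0.5, sigma)
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(img.astype(float), [yy + dy, xx + dx],
                           order=1, mode="nearest")
```

Setting `amplitude = 0` recovers the original image, and increasing it produces progressively stronger elastic deformations, as in the horizontal axis of Figure 12.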
Figure 13 illustrates the MPIR and MEIR results for the reference image and the two template images (a weak and a strong elastic deformation, respectively) shown in Figure 12. These results clearly demonstrate the superiority of MEIR over MPIR as the amount of elastic deformation increases.
After this first experiment, four types of synthetic frames were generated, applying to each of the 780 frames: Case i) an elastic deformation only, at the original scale and orientation; Case ii) a rotation and an elastic deformation, at the original scale; Case iii) a scale factor and an elastic deformation, at the original orientation; Case iv) a rotation, a scale factor and an elastic deformation.
The results of the tests for cases i) to iii) are displayed in Tables 1, 2 and 3, respectively; the results for case iv) are given in Table 4, where the rotation angle is kept fixed, and in Table 5, where the scale factor is kept fixed (the errors listed in the tables are always mean absolute value errors).
NDM  Mean Scale Error  Mean Rotation Error
MEIR  MPIR  MEIR  MPIR  MEIR  MPIR
0.077865  0.305940  0.046420  0.050328  4.111000  4.821000 
As shown in these tables, the normalized dissimilarity measure NDM is always better (lower) for MEIR than for MPIR. A similar result holds for the mean (absolute value) errors, both for the scale and for the rotation angle; that is, the performance of MEIR is always superior to that of MPIR. This conclusion was somewhat expected, since the MEIR approach is better suited than MPIR when elastic deformations are involved.
Table 2 (Case ii: rotation plus elastic deformation):

Rotation       NDM                   Mean Scale Error        Mean Rotation Error
            MEIR        MPIR        MEIR        MPIR        MEIR        MPIR
    5       0.077690    0.299270    0.041627    0.044040    4.285200    4.982500
   10       0.081106    0.303860    0.044879    0.047900    4.001400    4.710200
   15       0.080859    0.304740    0.044368    0.048056    4.319300    5.013800
   20       0.086114    0.304760    0.045060    0.048703    3.853000    4.521800
   25       0.090683    0.306830    0.044147    0.046658    4.439000    5.129100
   30       0.095251    0.306230    0.045137    0.048883    4.748100    5.269300
Table 3 (Case iii: scale change plus elastic deformation):

Scale          NDM                   Mean Scale Error        Mean Rotation Error
            MEIR        MPIR        MEIR        MPIR        MEIR        MPIR
  0.4       0.119440    0.287370    0.018118    0.019389    4.436500    5.153900
  0.6       0.109320    0.292800    0.026897    0.028673    4.198000    4.926600
  0.8       0.091956    0.298420    0.035632    0.038818    4.360100    5.141100
  1.2       0.118630    0.304190    0.055755    0.058641    4.764600    5.172400
  1.4       0.172400    0.295720    0.066795    0.069305    4.494900    4.714600
We remark that an elastic deformation always embodies a change in scale and generates a rotation, as illustrated in the examples depicted in Figure 11: for two of the frames there is an evident rotation associated with the elastic deformation, and for one frame a change of scale is also obvious. This is why in Table 1 we have measured the scale and rotation errors for MEIR and MPIR, despite the fact that no scale factor or rotation angle was explicitly applied to generate the synthetic frames, only the elastic deformation. This comment also applies to Tables 2 to 5. In fact, the changes in scale and orientation are inherent to the elastic deformation procedure (i.e. they are implicit changes), and interestingly the errors shown in Tables 2 to 5 confirm this, because the magnitude of the scale and orientation errors displayed in these tables is similar to that of Table 1. This means that these errors are essentially related to the changes in scale and orientation produced by the elastic deformation, and that the additional, explicitly induced change in scale or orientation does not increase the errors.
Table 4 (Case iv: rotation fixed, scale varied, plus elastic deformation):

Scale          NDM                   Mean Scale Error        Mean Rotation Error
            MEIR        MPIR        MEIR        MPIR        MEIR        MPIR
  0.4       0.117770    0.282990    0.016975    0.018131    4.397200    4.978300
  0.6       0.112350    0.294420    0.026690    0.028613    4.572500    5.258800
  0.8       0.093531    0.299500    0.035684    0.038745    4.291500    5.021300
  1.2       0.137110    0.309090    0.055324    0.057912    4.517900    4.931800
  1.4       0.202470    0.311510    0.064034    0.066069    4.830400    5.129400
  1.6       0.237270    0.298080    0.074932    0.076590    4.782800    4.840000
Table 5 (Case iv: scale fixed, rotation varied, plus elastic deformation):

Rotation       NDM                   Mean Scale Error        Mean Rotation Error
            MEIR        MPIR        MEIR        MPIR        MEIR        MPIR
    5       0.222160    0.317380    0.070345    0.071575    4.604600    4.883100
   10       0.219680    0.312970    0.066740    0.069023    4.364000    4.505900
   15       0.206470    0.307480    0.066462    0.067773    4.562800    4.792400
   20       0.199300    0.306580    0.066079    0.068393    4.868400    5.045800
   25       0.190730    0.301070    0.064825    0.066755    5.100400    5.264700
   30       0.187460    0.302130    0.065554    0.067711    5.299400    5.510100
3.2.2 Comments and extra tests
The tests described in Section 3.2.1, with artificial (elastically deformed) frames, clearly show the advantage of MEIR over MPIR for the real objective of WCE localization and orientation, when elastic deformations are involved. These tests demonstrate that the scale and rotation errors for MEIR are smaller than for MPIR. This is also connected with the exhibited NDM values. In fact, the NDM evaluates the quality of the registration approach (more precisely, the similarity between reference and template images), and as Tables 1 to 5 show, the NDM is always smaller for MEIR than for MPIR. So, based on these results and those displayed in Figure 7 (for a video with real successive frames, where the NDM is clearly smaller for MEIR than for MPIR), we expect the scale and rotation errors to be smaller for MEIR on real consecutive WCE frames, and thus a better accuracy can be achieved in WCE localization with the MEIR approach.
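For concreteness, a minimal stand-in for such a normalized dissimilarity can be written down. The paper's actual NDM definition is not reproduced in this section, so the normalized sum of squared differences below is purely an assumption meant to illustrate how the measure scores a registration:

```python
import numpy as np

def ndm(reference, template, registered):
    """Hypothetical normalized dissimilarity: SSD between reference and
    registered template, divided by the SSD before registration.
    Values near 0 indicate a good match; values near 1 indicate that the
    registration did not improve on the raw template. This is a stand-in
    for the paper's NDM, whose exact definition is not given here."""
    ssd_after = np.sum((reference - registered) ** 2)
    ssd_before = np.sum((reference - template) ** 2)
    return ssd_after / max(ssd_before, np.finfo(float).eps)
```

Under this convention, comparing the value obtained by MEIR with the one obtained by MPIR on the same frame pair reproduces the kind of comparison reported in Tables 1 to 5.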
We remark that in many existing approaches dealing with capsule endoscope localization, as for instance [15, 25], the evaluation of the methods is done using artificially scaled and rotated video frames, but synthetic elastic deformations are never considered. This is an unrealistic procedure, because the movement of the WCE is caused precisely by the (elastic) deformation of the intestine. Therefore, the movement between two consecutive video frames with overlapping areas is always intrinsically associated with a nonrigid movement, which is much more complex than one originated just by the combination of a rotation and a change of scale.
However, for comparison with the experiments and results reported in the literature and obtained by other methods, we have also performed experimental tests with frames that are only artificially rotated and scaled, whose results we briefly describe herein.
Obviously, for these particular tests where the frames are only synthetically rotated and scaled, MPIR is a better approach than MEIR. In fact, for these tests the obtained results show that the scale and orientation errors are lower for MPIR than for MEIR, while the values of the normalized dissimilarity measure are comparable in both approaches. This is an evident and expected result, due to the definition of MPIR, which searches exactly for an affine transformation, whereas in MEIR the main goal is to find an elastic deformation; therefore we need to consider the affine transformation of the form (2) closest to the solution of the MEIR approach (iterated twice) to deduce the WCE localization and orientation. This procedure clearly induces some approximation errors, which cause the slightly worse performance of MEIR compared to MPIR in these particular tests.
However, we emphasize that when there are elastic deformations involved, the results from the numerous tests on the artificial frames (see Tables 1 to 5) show that the NDM values for MEIR are significantly lower than those for MPIR. Therefore, assuming the unrealistic scenario that some WCE movements might be strictly rigid-like (in which case the NDM values of the two approaches are comparable, as mentioned above), a possible procedure to adopt is the following:

For a pair of consecutive frames, apply both MPIR and MEIR.

Compute the NDM for MPIR and for MEIR.

If the two NDM values are comparable, adopt the MPIR approach. If the NDM for MEIR is significantly lower than the NDM for MPIR (which means that elastic deformations are present), adopt the MEIR approach for this pair of frames.
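The selection rule above can be condensed into a few lines. The paper does not quantify when two NDM values count as "comparable", so the ratio threshold below is an illustrative assumption:

```python
def choose_registration(ndm_mpir, ndm_meir, ratio=0.5):
    """Return which approach to adopt for a pair of consecutive frames.

    If the MEIR dissimilarity is significantly lower than the MPIR one
    (elastic deformations present), adopt MEIR; otherwise the two NDM
    values are considered comparable and MPIR is adopted. The ratio
    threshold is a hypothetical choice, not taken from the paper."""
    if ndm_meir < ratio * ndm_mpir:
        return "MEIR"
    return "MPIR"
```

In practice the threshold would be tuned on frame pairs whose motion type (rigid-like vs elastic) is known.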
Hence, in the sequel we restrict ourselves to the description of the results obtained with MPIR for these particular tests (where the frames are only synthetically rotated and scaled), which have proven to be better than those reported in the literature with other methods.
In a first test we created rotated versions of the 780 frames, using nine rotation angles with a step of 5 degrees, at the original scale, and then proceeded with the image registration of the original frames and their rotated versions using MPIR. The obtained mean (absolute value) orientation errors are all of the same small order of magnitude, except for one angle, where the error has a different order. These are better results than those reported in [15, 25] with other methods, where very large orientation errors occur when the rotation angle increases.
Then, in a second test, we generated scaled versions of the 780 frames using nine different scale factors and performed the registration with the originals using MPIR. The mean (absolute value) scale errors stay in the same order of magnitude, while in [25] the mean (absolute value) scale error is extremely large for small scales.
In addition, we have also registered with MPIR each original grayscale image and a synthetically transformed version of it, generated by simultaneously applying a rotation and a scale factor. More specifically, in a third test we fixed the scale factor and varied the rotation angle, and in a fourth test we fixed the rotation angle and varied the scale factor. Again, for MPIR the mean absolute value errors, for scale and orientation, stay in the same order of magnitude: in the third test the mean rotation error increased moderately with the angle, and in the fourth test the mean scale error varied within a narrow range. We did not obtain large errors at small scales or at large rotation angles, as reported in [25].
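These rotation/scale tests can be reproduced in outline: synthetic pairs are generated by rotating and rescaling each frame, and the parameter estimates returned by a registration routine are scored with the mean absolute value error used in the tables. This is a sketch under assumed tooling (scipy.ndimage resampling; the paper's exact interpolation scheme is not specified), and the function names are illustrative:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def make_synthetic_pairs(frames, angle_deg=0.0, scale=1.0):
    """Generate (original, transformed) pairs as in the rotation/scale
    tests: each frame is rotated by angle_deg and rescaled by scale.
    Illustrative sketch; not the paper's exact resampling."""
    pairs = []
    for f in frames:
        g = rotate(f, angle_deg, reshape=False, order=1)
        if scale != 1.0:
            g = zoom(g, scale, order=1)
        pairs.append((f, g))
    return pairs

def mean_abs_error(estimates, truth):
    """Mean absolute value error over all frames, as in Tables 1-5."""
    return float(np.mean(np.abs(np.asarray(estimates) - truth)))
```

Running a registration method over all pairs and feeding its scale/angle estimates to `mean_abs_error` yields one table cell per (method, parameter) combination.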
4 Conclusions
In this paper a multiscale elastic image registration has been proposed as a tool for tracking the movement of the walls of the small intestine in WCE video frames, and subsequently for tracking the WCE motion. The proposed procedure, which involves an affine preregistration, takes into account the rigid-like and nonrigid movements to which the WCE is subjected within the small intestine as a consequence of peristalsis.
The qualitative WCE speed information provided by this approach, through the dissimilarity measure, is medically practical and useful, and facilitates the video interpretation. The tests also evidence the relevance of this measure relative to MEIR, since from artificial data we conclude that a smaller NDM leads to smaller errors in WCE location and orientation. In addition, the experiments with real frames, described in Section 3.1, demonstrate the accuracy of the WCE velocity estimation. However, peak speed points, which correspond to sudden changes of the image content in consecutive frames, should be further studied.
The proposed approach is also compared with a multiscale parametric image registration, which is similar to other existing approaches that essentially rely on affine correspondences between consecutive frames and are consequently only capable of capturing rigid-like movements. The comparison is done in terms of the qualitative WCE speed information, the dissimilarity measure for evaluating the registration, and the WCE location and orientation, following [25] (for the latter, the scale and rotation parameters resulting from the affine transformation closest to the solution of the proposed approach are computed and then identified with the capsule displacement and orientation, using a projective transformation and the pinhole camera model). The overall results indicate a better performance of the multiscale elastic image registration than of the multiscale parametric image registration when elastic deformations are involved, which is the realistic situation in WCE images.
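The extraction of a scale factor and rotation angle from the closest affine transformation can be illustrated with a small helper. The decomposition below (determinant for scale, atan2 for angle) is a standard sketch and not necessarily the exact computation used in the paper:

```python
import numpy as np

def scale_rotation_from_affine(A):
    """Extract a scale factor and a rotation angle (in degrees) from a
    2x2 affine matrix, such as the affine transformation closest to the
    elastic registration solution. Assumes A is approximately a scaled
    rotation s*R(theta); a sketch, not the paper's exact decomposition."""
    A = np.asarray(A, dtype=float)
    scale = np.sqrt(abs(np.linalg.det(A)))          # s*R has det s^2
    angle = np.degrees(np.arctan2(A[1, 0], A[0, 0]))  # angle of first column
    return scale, angle
```

These two parameters are the quantities that, following [25], are then identified with the capsule displacement and orientation via the pinhole camera model.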
Finally, we note that the multiscale elastic image registration herein proposed is an image-based motion procedure that could also be integrated, or used as a complement, in other more complex existing approaches for WCE localization involving extra sensors, in order to improve their accuracy.
Acknowledgment
This work was partially supported by the project PTDC/MAT-NAN/0593/2012 funded by FCT (the Portuguese national funding agency for science, research and technology), and also by CMUC (Center for Mathematics, University of Coimbra) and FCT, through the European program COMPETE/FEDER and project PEst-C/MAT/UI0324/2013. Richard Tsai is partially supported by National Science Foundation Grant DMS-1217203.
References
 [1] D. G. Adler and C. J. Gostout. Wireless capsule endoscopy. Hospital Physician, 39(5):14–22, 2003.
 [2] G. Bao, L. Mi, Y. Geng, and K. Pahlavan. A computer vision based speed estimation technique for localizing the wireless capsule endoscope inside small intestine. In 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Chicago, 2014.
 [3] G. Bao, L. Mi, and K. Pahlavan. A video aided RF localization technique for the wireless capsule endoscope (WCE) inside small intestine. In 8th International Conference on Body Area Networks, Boston, 2013.
 [4] G. Bao, Y. Ye, U. Khan, X. Zheng, and K. Pahlavan. Modeling of the movement of the endoscopy capsule inside GI tract based on the captured endoscopic images. In International Conference on Modeling, Simulation and Visualization Methods, Las Vegas, 2012.
 [5] G. Ciuti, A. Menciassi, and P. Dario. Capsule endoscopy: from current achievements to open challenges. Biomedical Engineering, IEEE Reviews in, 4:59–72, 2011.
 [6] J.P.S. Cunha, M. Coimbra, P. Campos, and J.M. Soares. Automated topographic segmentation and transit time estimation in endoscopic capsule exams. IEEE Transactions on Medical Imaging, 27(1):19–27, 2008.
 [7] R. Eliakim. Video capsule colonoscopy: where will we be in 2015? Gastroenterology, 139(5):1468–1471, 2010.

 [8] S. T. Goh and S. A. Zekavat. DOA-based endoscopy capsule localization and orientation estimation via unscented Kalman filter. IEEE Sensors Journal, 14(11):3819–3829, 2014.
 [9] C. Hu, M. Q.-H. Meng, and M. Mandal. The calibration of 3-axis magnetic sensor array system for tracking wireless capsule endoscope. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, 2006.
 [10] C. Hu, W. Yang, D. Chen, M. Q.-H. Meng, and H. Dai. An improved magnetic localization and orientation algorithm for wireless capsule endoscope. In 30th Annual International IEEE/EMBS Conference, Vancouver, 2008.
 [11] D. K. Iakovidis, E. Spyrou, D. Diamantis, and I. Tsiompanidis. Capsule endoscope localization based on visual features. In IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), Chania, 2013.
 [12] G. Idan, G. Meron, A. Glukhovsky, and P. Swain. Wireless capsule endoscopy. Nature, 405:417–417, 2000.
 [13] M. Kawasaki and R. Kohno. A TOA-based positioning technique of medical implanted devices. In Third International Symposium on Medical Information & Communication Technology (ISMICT), Montreal, 2009.
 [14] H. Liu, N. Pan, H. Lu, E. Song, Q. Wang, and C.C. Hung. Wireless capsule endoscopy video reduction based on camera motion estimation. Journal of digital imaging, 26(2):287–301, 2013.
 [15] L. Liu, C. Hu, W. Cai, and M. Q.-H. Meng. Capsule endoscope localization based on computer vision technique. In Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE, pages 3711–3714. IEEE, 2009.
 [16] L. Liu, W. Liu, C. Hu, and M. Q.-H. Meng. Hybrid magnetic and vision localization technique of capsule endoscope for 3D recovery of pathological tissues. In Intelligent Control and Automation (WCICA), 2011 9th World Congress on, pages 1019–1023. IEEE, 2011.
 [17] J. Modersitzki. FAIR: flexible algorithms for image registration, volume 6. SIAM, 2009.
 [18] A. Moglia, A. Menciassi, and P. Dario. Recent patents on wireless capsule endoscopy. Recent Patents on Biomedical Engineering, 1(1):24–33, 2008.
 [19] A. R. Nafchi, S. T. Goh, and S. A. Zekavat. High performance DOA/TOA-based endoscopy capsule localization and tracking via 2D circular arrays and inertial measurement unit. In IEEE International Conference on Wireless for Space and Extreme Environments (WiSEE), Baltimore, 2013.
 [20] T. Nakamura and A. Terano. Capsule endoscopy: past, present, and future. Journal of gastroenterology, 43(2):93–99, 2008.
 [21] K. Pahlavan, G. Bao, Y. Ye, S. Makarov, U. Khan, P. Swar, D. Cave, A. Karellas, P. Krishnamurthy, and K. Sayrafian. RF localization for wireless video capsule endoscopy. International Journal of Wireless Information Networks, 19(4):326–340, 2012.
 [22] M. Salerno, G. Ciuti, G. Lucarini, R. Rizzo, P. Valdastri, A. Menciassi, A. Landi, and P. Dario. A discrete-time localization method for capsule endoscopy based on on-board magnetic sensing. Measurement Science and Technology, 23(1):015701, 2012.
 [23] S. Song, C. Hu, M. Li, W. Yang, and M. Q.-H. Meng. Two-magnet-based 6D localization and orientation for wireless capsule endoscope. In Proceedings of the 2009 IEEE International Conference on Robotics and Biomimetics, Guilin, 2009.
 [24] E. Spyrou and D. K. Iakovidis. Homography-based orientation estimation for capsule endoscope tracking. In IEEE International Conference on Imaging Systems and Techniques (IST), Manchester, 2012.
 [25] E. Spyrou and D. K. Iakovidis. Video-based measurements for wireless capsule endoscope tracking. Measurement Science and Technology, 25(1):015002, 2014.
 [26] P. M. Szczypiński, R. D. Sriram, P. V. J. Sriram, and D. N. Reddy. A model of deformable rings for interpretation of wireless capsule endoscopic videos. Medical Image Analysis, 13(2):312–324, 2009.
 [27] T. D. Than, G. Alici, H. Zhou, and W. Li. A review of localization systems for robotic endoscopic capsules. IEEE Transactions on Biomedical Engineering, 59(9):2387–2399, 2012.
 [28] Y. Ye, P. Swar, K. Pahlavan, and K. Ghaboosi. Accuracy of RSSbased RF localization in multicapsule endoscopy. International Journal of Wireless Information Networks, 19(3):229–238, 2012.
 [29] M. Zhou, G. Bao, and K. Pahlavan. Measurement of motion detection of wireless capsule endoscope inside large intestine. In Engineering in Medicine and Biology Society (EMBC), 2014 36th Annual International Conference of the IEEE, pages 5591–5594. IEEE, 2014.