Appearance-based localization is a robot self-navigation technique that integrates visual appearance and kinematic information. To analyze the visual appearance, we need to build a regression model based on extracted visual features from raw images as predictors to estimate the robot's location in two-dimensional (2D) coordinates. Given the training data, our first problem is to find the optimal subset of the features that maximize the localization performance. To achieve appearance-based localization of a mobile robot, we propose an integrated localization model that consists of two main components: the group least absolute shrinkage and selection operator (LASSO) regression and sequential Bayesian filtering. We project the output of the LASSO regression onto the kinematics of the mobile robot via sequential Bayesian filtering. In particular, we examine two candidates for the Bayesian estimator: the extended Kalman filter (EKF) and particle filter (PF). Our method is implemented in both indoor mobile robot and outdoor vehicle equipped with an omnidirectional camera. The results validate the effectiveness of our proposed approach.

## Introduction

Localization of a robot relative to its environment using visual information has received extensive attention over past few decades from the robotic and computer vision communities [1]. Autonomous unmanned aerial vehicles with vision-based navigation and localization [2,3] to cope with global positioning system (GPS)-denied environments play major roles in tasks such as search and rescue, environment monitoring, and surveillance [4,5]. Self-driving cars could benefit from GPS-denied geo-localization using visual odometry [6]. The appearance-based localization could be useful in indoor navigation [7], as well as in minimizing levels of location uncertainties in mobile sensor networks for regression of environmental fields [8,9].

The appearance-based method is based on a supervised learning framework that utilizes collections of location coordinates and visual observations. Models are learned from a large number of images at the training step, then newly collected images are recognized [10]. In related studies, gradient orientation histograms [11] and low dimensional representation of vision data [12] are used to localize mobile robots. In Ref. [13], a visual potential is used for navigating a mobile robot. A method for recognizing scene categories by comparing the histograms of local features is presented in Ref. [14]. Additionally, the speeded up robust feature (SURF) has been often used for classification type problems such as topological localization (i.e., image matching) [15]. In Ref. [16], the authors developed an X-ray vision system to build a map of a totally unseen territory by utilizing the Wi-Fi signal. While the careful feature engineering for such applications determines the ultimate performance, optimal feature selection for appearance-based localization has not been fully investigated to date. Bellotto et al. proposed an appearance-based localization using digital zoom and visual compass in order to integrate the visual appearance with odometry information [7].

In most of the cases, utilizing the full set of features or manual feature engineering is unlikely to yield optimal outcomes. Therefore, selecting the most effective features is critical in two ways: (1) to maximize the performance of the tasks and (2) to eliminate redundant features. For example, in localization tasks, the selected features must be robust to local noise and sensitive to different locations over the spatial field. One example of widely used feature selection technique is the principal component analysis, which is utilized to find the mapping from image features to the robot locations [17]. There are a number of methods to determine the significant variables in principal component analysis. However, since those methods are unsupervised learning, the selection process might not be optimized for the localization performance. Thus, we focus our investigation on optimal feature selection in a supervised learning fashion.

Appearance-based localization shall take into account kinematics of the mobile robots. Sequential Bayesian filtering approaches such as the Kalman filter and the particle filter (PF) have been applied extensively in vehicle localization. The Kalman filter [18] is based on the assumption that the system is linear, and the noise processes are distributed as Gaussian distributions. Thus, tracking the state distribution of a vehicle can be computed efficiently by propagating only the first and second moments of the distribution. For nonlinear systems, an approximation is made by an extended Kalman filter (EKF) in which the nonlinear system is linearized at every time-step such that the Kalman filtering can be implemented. On the other hand, the PF [19] approximates the real distribution via a Monte Carlo (MC) approximation using particles. Recently, these methods have been widely applied in GPS-based localization applications [6] or in cases where there are explicit observation models, for instance, mapping from Wi-Fi signal to positions [20]. However, the visual observation is high-dimensional and nonlinear, which makes directly applying filtering approaches impossible. Thus, a common solution is to extract low-dimensional features from the raw images then fit these features to the position of the vehicle [21]. Hence, in this paper, we plan to utilize the least absolute shrinkage and selection operator (LASSO) technique to select localization features from raw images in order to project onto the kinematics of mobile robots. The LASSO regression is chosen to perform both variable selection and regularization in order to enhance the prediction quality [22].

The recent research efforts that are closely related to our problem are summarized as follows. The location for a set of image features from new observations is inferred by comparing new features with the calculated map [23,24]. There exists effort on automatically finding the transformation that maximizes the mutual information between two random variables [25]. Using Gaussian process regression, Brooks et al. [23] and Schairer et al. [26] present effective approaches to build a map from a sparse set of noisy feature observations taken from known locations using an omnidirectional camera to infer the location. Building on Brook's approach [23], the work in Ref. [21] expands it more to the purpose of feature extraction and selection in order to improve the quality of localization. The localization is then based on a maximum likelihood estimation with a feature map from optimally selected visual features using Gaussian process regression [21].

In this paper, we first unwrap the omnidirectional images into panoramic ones and extract three types of visual features—fast Fourier transform (FFT), color histogram, and SURF. We concatenate all different types of features together and normalize the feature vectors to eliminate dominancy in values among different features. We then treat the robot's coordinates as multivariate targets and train the group LASSO [22] regression model using the training data set. Once the model is trained, we feed the newly captured image into the model to obtain an estimated position. Next, we treat the estimated position as a noisy observation and apply sequential Bayesian filtering, e.g., EKF or PF, in order to project onto vehicle's kinematics and to filter out the noise from the group LASSO regression outcomes. Preliminary results for indoor and outdoor experiments were reported in Refs. [27] and [28], respectively.

The contribution of our paper is as follows. In contrast to approaches used in Refs. [21] and [23], we focus on building a direct mapping from features space to the coordinates instead of solving for locations from the feature map via maximum likelihood estimation. Therefore, the test phase is executed in a short period of time since our method does not require any image matching or optimization steps. Second, our method is robust to visual noise and does not need high resolution images, which allows it to be applicable in low-cost embedded systems. For instance, we use relatively low quality and noise-prone images captured from a regular webcam (Logitech C270, Logitech, Newark, CA), yet the system yields reasonable results. Furthermore, in contrast to Ref. [29], our proposed method does not require the depth measurements of the SURF points.

The overall structure of this paper is as follows: We introduce the visual features that are used in this study in Sec. 2. We then discuss the LASSO regression in Sec. 3 and the group LASSO regression in Sec. 4. In Sec. 5, we show how to integrate the EKF into the LASSO-based localization performance. Experiment results are discussed in Sec. 6. For notations, we use $A[i,j]$ for the entry of row i, column j, $A[i,:]$ for the ith row of the matrix A, and $N(0,σ2)$ as the normal distribution with zero mean and variance $σ2$.

## Image Features

Inspired by a biological observation (e.g., insects and arthropods), recently omnidirectional cameras have been studied for navigation [30]. In contrast to conventional cameras with restricted fields of view, panoramic cameras can provide the entire surrounding environmental view in a single image [7]. In order to extract useful features from an omnidirectional image, we first unwrap it as follows. The raw image captured by the omnidirectional camera is a nonlinear mapping of a $360 deg$ panoramic view onto a doughnut shape. We recover the panoramic view by applying a reverse mapping of the pixels from the wrapped view onto a panoramic view [31]. Figures 1(a) and 1(b) show the wrapped and unwrapped images, respectively.

From the panoramic images, we extract three types of visual features. We will use the notation $xi$ generally for the collection of all types of the image features that are extracted from image i. In particular, we use the FFT coefficients, the histogram, and SURF as the image features [14]. These feature types and their properties are briefly explained below:

Fast Fourier transform: The two-dimensional (2D) fast Fourier transform of a square image of the size n × n is given by
$F[i](ρ,l)=∑a=0n−1∑b=0n−1f[i](a,b)e−j2π(ρan+lbn)$
(1)

where $f[i]$ is the ith two-dimensional realized image and j is the imaginary unit. To use the FFT, we convert the panoramic color images to gray scale 128 × 128 pixel images, i.e., $f[i](a,b)$. After applying the FFT on the resized spatial image, we obtain the same-sized image in the frequency domain, i.e., $F[i](ρ,l)$. Each complex point on the frequency domain is characterized by its magnitude and phase. However, it has been shown that the magnitude of the Fourier transformed image captures most of the information of the repeated geometric patterns in the spatial domain image [32]. Furthermore, the magnitude is robust to the rotation of the vehicle in the yaw angle. Finally, it was shown that the first 15 components of the FFT frequency spectrum carry enough information to correctly match a pair of images [31]. Based on the above knowledge, we specify the magnitude of the first 16 FFT components of the x axis, i.e., $xi={|F[i](1,0)|,…,|F[i](16,0)|}$ to be our FFT features. Figure 1(c) shows the two-dimensional FFT magnitude plot of an image.

Histogram: The image histogram [33] is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. The number of pixels in each tonal bin of the image histogram is used as an image feature. A histogram provides a global descriptor for an image and is also robust to rotation in yaw angle of the robot.

Speeded-up robust features: SURF [34] is a powerful scale- and rotation-invariant descriptor that is obtained from Haar wavelet responses. It is widely used for computer vision tasks such as object recognition, image registration, and classification. For a point of interest (namely, a SURF point), the feature is a 64-dimensional vector that is calculated locally based on the neighborhood region around the point. To form a feature vector, a patch of 16 × 16 pixels is divided into groups of pixels (4 × 4 grid). For each element of the 4 × 4 cell, the sum of gradients and their absolute values are computed for each direction, i.e., ∑dx, $∑|dx|$, ∑dy, and $∑|dx|$. Thus, for each SURF point, there is a feature vector of dimension $4×4×4=64$.

However, since the number of SURF points vary across images, we apply an additional step to achieve the SURF feature for each image as follows. The SURF points from the whole data set are accumulated and put into the k-means clustering [35]. Each centroid is defined as a codeword and the collection of centroids is defined as the codebook. Each SURF point is mapped into the index of the nearest centroid in the codebook. Therefore, we obtain a histogram of codewords for each image that indicates the appearance frequency of all codewords in the image. The number of bins of the histogram is the size of visual features for the image. In this paper, we use a 100 bins histogram.

For the indoor environment, we utilize the FFT and histogram, since they are more robust to the dynamic change of the surrounding compared to the SURF. For the outdoor experiment, since a large portion of the image is covered by uniform tonal pixels of the sky, the FFT and histogram features are not sensitive to the change in surrounding scenes. The SURF features capture the local properties of points of interest in an image other than the global properties like the other two. Therefore, we use the SURF for the outdoor experiment. Features from different types are concatenated and normalized.

## The Least Absolute Shrinkage and Selection Operator

Suppose that one image is captured per sampling time i. Thus, we have the data $(Xi,yi),i∈{1,…,N}$, where $Xi=[xi[1],…,xi[p]]T$ and yi are the input variable vector and the target, respectively. Define $X∈ℝN×p$ and $y∈ℝN×1$ as the collections of N observations. We assume that the input vector and the target are standardized. Let $b̂=[b̂1,…,b̂p]T$ be the linear fitting estimate vector that provides the estimated target vector $ŷ$, i.e., $ŷ=Xb̂$. The vector $b̂$ can be computed by the least squares minimization as
$b̂=argminb‖y−Xb‖22$
(2)
where $‖.‖2$ is the $ℓ2$ norm. The LASSO regression introduces a penalty term into Eq. (2) as follows:
$b̂=argminb(12N‖y−Xb‖22+λ∑i=1p|bi|)$
(3)

where $λ≥0$ is a penalty parameter. The level of shrinkage applied to the estimated $b$, i.e., the number of zero entries of $b$, is determined by λ. For instance in Fig. 2(a), the lines show the evolution of elements of the vector $b̂$ as we change λ from 0 (no penalty applied) to a final value (when all entries are zeros). Increasing λ makes $b̂$ sparser as more of its elements are driven toward 0 as shown in Fig. 2(a). The collection of vectors $b̂$'s that correspond to different values of λ's is obtained from the training data set. It is then applied to the validation data set to form the validation error curve. An example of such an error curve obtained for a range of λ is shown in Fig. 2(b). Note that the curve has a minimum near λ = 27, which is the optimal value of λ. This value achieves a good balance between the goodness of fit from the least squares estimation and the sparseness of $b̂$ for model selection. In this study, we use the root-mean-squared error (RMSE) to compare two trajectories so the optimal λ is chosen at the minimum of the RMSE curve.

## The Group Least Absolute Shrinkage and Selection Operator

In Sec. 3, the estimated target yi is a scalar quantity. However, since the two coordinates of the robot's location need to be estimated simultaneously, we introduce the group LASSO regression [22] to estimate a multivariate target.

Let $Y∈ℝN×k$ be the collection of N target vectors of k dimensions $yi$ and $B̂∈ℝp×k$ be the matrix of estimated coefficients
$Ŷ=XB̂$
(4)
Note that k = 2 in this study, which is corresponding to two spatial coordinates. We define the $ℓ1/ℓ2$ norm as
$‖B‖ℓ1/ℓ2=∑i=1p(∑j=1kBij2)12=∑i=1p‖B[i,:]‖2$
where $B[i,:]$ is the ith row of the matrix $B$. The estimate matrix $B̂$ can be calculated as follows:
$B̂=argminB(12N‖Y−XB‖F2+λ‖B‖ℓ1/ℓ2)$
(5)
where $‖.‖F$ denotes the Frobenius norm.2 According to Obozinski et al. [22], Eq. (5) is equivalent to the second-order cone program
$B̂=argminB,β∈ℝp×1(12N‖Y−XB‖F2+λ∑i=1pβi)subject to‖B[i,:]‖2≤βi$
(6)

Define $vec(.)$ operator as $vec(B)=[B[:,1]T B[:,2]T ⋯ B[:,k]T]T$.

Applying the $vec(.)$ operator to both sides of Eq. (6) yields
$vec(B̂)=argminvec(B),β(12N‖vec(Y)−(I⊗X)vec(B)‖22+λ∑i=1pβi)subject to‖Wivec(B)‖2≤βi$
(7)

where ⊗ is the Kronecker product and $Wi∈ℝk×kp$ is the weight matrix,3 i.e., $W1=[Ik⋯Okp−k]$, $W2=[OkIk⋯Okp−2k],…$.

Note that the $ℓ1/ℓ2$ regularization penalizes the norm of the rows of estimate matrix $B$ in Eq. (5), which are corresponding to the two coordinates. Thus, features that are not significant for the localization of both coordinates are more likely to be eliminated.

## The Extended Kalman Filter

In this section, we show how to implement the EKF on the group LASSO-based localization as a postprocessing step. We describe the bicycle kinematics of a front wheel-driven vehicle for the outdoor experiment. For the indoor experiment, we utilize the unicycle kinematics, and the rest of the EKF shall stay unchanged.

We utilize the bicycle kinematics of a front wheel-driven vehicle. Let ηi, ui, and L be the front wheel angle, the rear wheel linear speed at iteration time i, and the wheelbase (distance between the two wheel axes) of the vehicle, respectively. Define $xi∈ℝ3$ as the state vector that includes the position and the heading angle of the robot, i.e., $xi=[qi[1]qi[2]ψi]T$. Therefore, the state transition equation of the vehicle can be described as follows:
$xi+1=xi+Δi[ui cos ψiui sin ψi tan(ηi)Lui]+ωi=f(xi,ui,ηi)+ωi$
(8)
where $ωi∼N(0,Σωi)$ is the white noise on the location and orientation. Let $x̂i+1−,Pi+1−$ be the estimated state and the covariance matrix of the robot before taking the position measurement, respectively. Assume that localization errors for the x, y axes and the orientation ψ are independent, therefore
$Pi=diag(σ1,i2,σ2,i2,σψi2)$
(9)
We make the predictions as
$x̂i+1−=f(xi,ui,ηi)Pi+1−=∇fPi∇fT+Σωi$
(10)
where $∇f$ is the Jacobian matrix of the function f with respect to the state vector $xi$, e.g.,
$∇fxi=[10−Δiui sin ψi01Δiui cos ψi001]$
(11)
where $Δi$ is the sampling period. Since we do not take the measurement of the vehicle's heading angle, the observation has the form
$q̃i=Mxi+ei$
(12)
where $M=[100010]$ and $ei∼N(0,σe2I)$. Note that $q̃i=ŷi$ from Eq. (4) since we take the estimation of group LASSO-based localization as the noisy measurement for the EKF, and the observation noise level σe is set to be the RMSE of the group LASSO-based localization. The measurement update can be computed as follows:
$x̂i+1=x̂i+1−+Ki+1(q̃i+1−Mx̂i+1−)Pi+1=(I−Ki+1M)Pi+1−$
(13)
where the Kalman gain $Ki+1$ is
$Ki+1=Pi+1−MT(MPi+1−MT+σe2I)−1$
(14)

The prediction and update measurement computations, i.e., Eqs. (10) and (13), are recursively executed in the localization process.

Note that for the two wheels mobile robot, Eq. (8) becomes
$xi+1=xi+Δi[ui cos ψiui sin ψiηi]+ωi,=f(xi,ui,ηi)+ωi$

where ηi is the robot's orientation.

The initial value for the covariance matrices in the EKF is determined as follows. For the observation covariance matrix, we set it to be $σobs2I$, where σobs is obtained from the RMSE of the group LASSO regression and I is a 2 × 2 identity matrix. This shows that the output of the group LASSO regression is used as the input to the filter. Similarly, we set the initial value for the process covariance matrix to be $σprocess2I$, where σprocess is obtained from the RMSE of the open-loop-based localization.

### The Particle Filter.

The PF uses a sampling approach, with a set of particles to compute the posterior distribution of a stochastic process given noisy observations without any assumptions of linearity or Gaussianity. We construct a separate study using the PF to replace EKF step using the outcomes of group LASSO regression based localization.

Unlike in the EKF where the posterior distribution at instance i + 1 is characterized as $p(xi+1|x1:i,q̃1:i+1)∼N(x̂i+1,Pi+1)$ with $x̂i+1$ and $Pi+1$ computed in Eq. (13), in PF the posterior distribution is approximated by utilizing the concept of importance sampling [19] as follows:
$p(x1:i+1|q̃1:i+1)≈∑k=1Nswi+1kδ(x1:i+1−x¯1:i+1k)$
(15)
where ${x¯i+1k|k=1⋯Ns}$ is the set of support points, ${wi+1k|k=1⋯Ns}$ is the set of corresponding weights, Ns is the number of particles, and $δ(.)$ is the delta function. Note that $∑k=1Nswik=1$. The support points are sampled from an importance density $q(x1:i+1|x1:i,q̃1:i+1)$ and the weights are computed as follows:
$wi+1k∝p(x¯1:i+1k|q̃1:i+1)q(x¯1:i+1k|q̃1:i+1)$
(16)
We can design the importance density such that
$q(x¯1:i+1k|q̃1:i+1)=q(x¯i+1k|x¯1:ik,q̃i+1)q(x¯1:ik|q̃1:i)$
(17)
It is shown in Ref. [19] that
$p(x¯1:i+1k|x¯1:ik,q̃1:i+1)∝p(q̃i+1|x¯i+1k)p(x¯i+1k|x¯ik)p(x¯1:ik|q̃1:i)$
(18)
Plugging Eqs. (17) and (18) into Eq. (16), we have an update rule for the weights as follows:
$wi+1k∝wikp(q̃i+1|x¯i+1k)p(x¯i+1k|x¯ik)q(x¯i+1k|x¯1:ik,q̃1:i+1)$
(19)
In practice, it is common to choose the importance density for the new state $q(x¯i+1k|x¯1:ik,q̃1:i+1)$ to be equal to the prior $p(x¯i+1k|x¯ik)$. By doing this, we have implicitly assumed that the evolution of the state at instance i + 1 is dependent only on the one at instance i. Hence, Eq. (19) is simplified to
$wi+1k∝wikp(q̃i+1|x¯i+1k)$
(20)
We use the same state transition and observation models as in the EKF case in our paper. In particular, adopting the state transition model in Eq. (8), we have
$x¯i+1k|x¯ik∼N(f(x¯ik,ui,ηi),Σω)$

In addition, the observation density $p(q̃i+1|x¯i+1k)$ can be computed from Eq. (12) as follows: $q̃i+1|x¯i+1k∼N(Mx¯i+1k,σe2I).$ Note that even we started with an attempt to find the state's posterior distribution for the whole time horizon, i.e., $p(x1:i+1|q̃1:i+1)$, Eq. (20) suggests that information about $x¯1:i−1k$ and $q̃1:i$ is “absorbed” in the weight $wik$ and there is no need to store them. In other words, we only need to propagate $x¯ik$ to iteration i + 1.

Furthermore, once we obtain $wi+1k$, we are more interested in the marginal distribution of the current state, i.e., $p(xi+1|q̃1:i+1)$ rather than the joint distribution in Eq. (15). Thus, the marginal posterior distribution can be approximated as
$p(xi+1|q̃1:i+1)≈∑k=1Nswi+1kδ(xi+1−x¯i+1k)$
(21)
with $wi+1k$ computed in Eq. (20).

## Experiment Study

### Indoor Experiment.

In this section, we show how we implement our proposed method in an indoor experimental setup. The mobile robot and the testing environment are shown in Fig. 3(a).

We control the robot remotely via wireless communication units, while the panoramic vision of the surrounding scene is recorded. The average speed of the robot is $0.2 m/s$. In order to track the true positions of the robot, we utilize an overhead camera (Logitech HD webcam C910, Logitech, Newark, CA) that is mounted on the ceiling. Note that the positions obtained from the ceiling camera are only used for evaluating the method's performance. In summary, we collect the following three data sets all sampled at 0.5 Hz: the command inputs, turn rates, and collapsed time between samplings ${ui,ηi,Δi}$; the surrounding scene recorded in the panoramic vision; and the positions of the mobile robot (ground truth).

We compare the localization results performed by the following three techniques:

• Open-loop based localization: We naively apply the recorded command inputs to the robot's kinematics, which is described in Eq. (8) to obtain the robot's position.

• Group LASSO-based localization: We randomly sample without replacement of the whole data set and categorize them into three labels: training, validation, and testing by corresponding percentages: 50%, 25%, and 25%. Then, we apply the group LASSO regression to the labeled data and compare the performance on the test data with other techniques.

• Group LASSO-based and EKF localization: We apply the EKF as a postprocessing to the test result of the group LASSO-based localization as described in Sec. 5.

• Group LASSO-based and PF localization: We apply the PF [36] as a postprocessing to the test result of the group LASSO-based localization.

For the indoor experiment, the test data are the collection of locations that are randomly chosen from the original trajectory. Therefore, when we run the EKF and the PF to estimate the trajectory of the robot, we assume that the robot does not take measurements at the nontest locations, i.e., estimation with missing observations. If an observation is made, the EKF executes the update step by Eq. (13). Otherwise, it predicts the state vector by open-loop kinematics and propagates the covariance matrix by Eq. (10). Similarly, for the PF, the weights of the filter density are updated once an observation is made. Otherwise, a uniform probability is assigned for all particles [36].

To illustrate the estimation with missing observation by the EKF, Fig. 4 shows the evolutions of the first and second entries of the diagonal of matrix Pi ($P[1,1]$ and $P[2,2]$) that represent the uncertainties in the x and y axes. Notice that the variance of position coordinates drops whenever an observation, which is indicated by a vertical dashed line in Fig. 4, is made. This simulates a practical situation where the localization observations are mostly missing and very sparse due to line-of-sight or GPS-denied environments.

Table 1 compares the localization results performed by open-loop localization, group LASSO-based localization, group LASSO-based with EKF localization, and group LASSO-based with PF localization which are denoted as “Open-loop,” “Group LASSO,” “Group LASSO + EKF,” and “Group LASSO + PF,” respectively.

Table 1 shows the RMSEs of the four methods. The open-loop prediction yields the worst performance, since there is no feedback in the prediction, the error is accumulated. The group LASSO-based localization reduces 55% features and yields 29% lower RMSE compared to the open-loop method. By applying the EKF, the group LASSO-based and EKF localization results in the lowest RMSE with 55% reduction compared to the open-loop case. Figure 5 shows the true trajectory (solid line), group LASSO-based localization (square dots), group LASSO and EKF localization (red dotted dashed line), and group LASSO and PF localization (dotted line). Table 1 has indicated the excellent performance of our proposed method.

### Outdoor Experiment.

In this section, we demonstrate the effectiveness of our proposed method using an outdoor experimental study.

Figure 6 shows the data acquisition circuit and the surveillance vehicle. We drive a vehicle (Acura RSX 2003) in Michigan State University campus and record the panoramic scene via a omnidirectional camera installed on the top of the vehicle. The average speed of the vehicle is $20 km/h$. We collect three data sets, which are sampled at 1 Hz: (1) the estimated control inputs and sampling times ${ηi,ui,Δi}$, (2) the scene recorded by the omnidirectional camera that is stored in the Raspberry Pi (Raspberry Pi model B+, Raspberry Pi Foundation, Cambridge, UK), and (3) the positions of the vehicle that are tracked by the Xsens GPS (MTi-G-700, Xsens Technologies B.V., Enschede, The Netherlands, position accuracy 2.5 m).

We compare the localization performances based on the following three techniques:

• Open-loop based localization: We naively apply the open-loop prediction by using the kinematics described in Eq. (8) and the command inputs.

• Group LASSO-based localization: We record two separated trajectories for training and testing, as shown in Fig. 7. The original data are divided $50:50$ for training and validation. The group LASSO regression uses the training data to train the estimate matrix $B$, the validation data to compute the optimal matrix $B̂$, and then applies $B̂$ to estimate the test data $Ŷ$.

• Group LASSO and EKF localization: We apply the EKF as a postprocessing to the test result of the group LASSO-based localization as described in Sec. 5.

• Group LASSO and PF localization: We apply the PF as a postprocessing to the test result of the group LASSO-based localization.

For this outdoor experiment, we also use the RMSE to evaluate the performance of the three techniques described earlier.

Table 1 shows the RMSEs of the three methods (column 2) and the number of used features over the initial total (column 3). The group LASSO-based localization reduces 75.5% of the whole features. By applying the EKF, the group LASSO-based localization with the EKF results in the best RMSE with 30.6% reduction compared to the open-loop method. Figure 8 shows the true trajectory (solid line), group LASSO-based localization (square dots), group LASSO and EKF localization (dotted dashed line), and group LASSO and PF localization (dotted line).

The estimation step of the group LASSO-based localization shown in Eq. (4) is linear regression. Thus, its estimation error has a smaller bias compared to the open-loop prediction, which gradually deviates from the true trajectory, but exhibits high variance. Therefore, we utilize the EKF (as well as PF) as a low-pass filter to smooth the estimated trajectory by projecting the noisy outputs of the group LASSO-based localization onto the vehicle kinematics. Figure 9(a) shows the evolution of the group LASSO regression estimate matrix $B$. Each pair of features that is corresponding to one location (two coordinates) is plotted in the same color. The optimal $B$ is plotted in Fig. 9(b). Note that the matrix is sparse with a large number of entries having values of zero.

For comparison, we also implement the PF [19] that takes the group LASSO output as measurements. Since the PF involves random generation of the particles, by adapting the evaluation criteria from Ref. [19], we evaluate the performance of the PF by the MC simulation with 100 runs. The statistics of the MC are reported in Table 1 with the mean and standard variation denoted by $(μ,σ)$. The estimated path by the group LASSO and PF localization with lowest RMSE among the MC simulations is plotted in dotted line in Fig. 8. A snapshot of the PF at sampling time t = 50 is shown in Fig. 10. The weight densities of 800 particles are plotted in gray-scale colored dots. Note that the sum of weight densities over all particles equals to 1. The PF estimated location is computed by averaging the locations of all particles with the corresponding weight densities.

The computational time is reported in Table 1 in seconds. Since the filtering processes (EKF and PF) are deployed as the postprocessing after the group LASSO, we report their test phase computational times as an additional part to the group LASSO regression, i.e., after the “+” sign in the last column of Table 1. Generally, the train phase requires 30 min for both indoor and outdoor data, while the test phase is significantly shorter. The prediction process of the group LASSO includes one matrix multiplication in Eq. (4) and the visual features extraction. In general for one data point, extracting the SURF, FFT, and histogram features take 0.68, 0.25, and 0.17 s, respectively. Since the SURF feature involves clustering, it requires more time to extract compared to the FFT and histogram. Therefore, the test phase computational time in the outdoor experiment is longer than in the indoor case.

A visual summary of the RMSEs of the two experiments is shown in Fig. 11. In the indoor experiment, the open-loop localization RMSE level is higher than that of the vision-based localization. The group LASSO and Kalman filtering obtained the best RMSE result. On the other hand, in the outdoor experiment, the open-loop localization RMSE level is quite small. This is due to the fact that (in contrast to the indoor mobile robot with known control inputs) we estimated the control inputs by fitting the vehicle model to the collected trajectory data optimizing the goodness-of-the fit between the model and real trajectories. In this case, the group LASSO and Kalman filtering also outperforms others. The RMSE of the group LASSO in the indoor case outperforms that of the outdoor case. This may be due to the fact that the sky takes a large portion of the outdoor scene providing not much dependable features for localization.

## Conclusion

The paper presented a novel appearance-based approach based on group LASSO regression. We have shown the effectiveness of our method in feature selection and localization enhancement by combining the group LASSO regression and the sequential Bayesian filters. The experiment study shows the significant reduction (75.5%) in the features while improving the localization performance. The performance of the system can be further improved by adapting a better vision camera as well as larger training data. Additionally, it would be possible to improve the performance if we use a Bayesian filter considering the vehicle dynamics for the real car outdoor experiments.

The linearity assumption used in group LASSO regression could restrict the localization performance due to the nonlinear nature of the map from visual features to the robot positions. Therefore, a potential future study is to apply variable selection with a nonlinear regression technique, such as Gaussian process regression, to relax the linearity condition.

Our method could be used potentially as another locational sensor in a sensor fusion part for lightly equipped unmanned vehicles that are often facing GPS-denied environments.

## Funding Data

• National Research Foundation of Korea, Ministry of Science and ICT (2016R1A2B4008237).

• Ministry of Trade, Industry and Energy, MOTIE, South Korea (Technology Innovation Program 10073129).

• National Science Foundation (CAREER Award CMMI-0846547).

• Vietnam Education Foundation Fellowship.

• Yonsei University (New Faculty Research and Facility Grant).

2

The Frobenius norm of a matrix A is defined by $‖A‖F:=∑i,jAi,j2$.

3

$Ik$ and $Ok$ are the identity and zero matrices of size k × k, respectively.

## References

References
1.
Bonin-Font
,
F.
,
Ortiz
,
A.
, and
Oliver
,
G.
,
2008
, “
Visual Navigation for Mobile Robots: A Survey
,”
J. Intell. Rob. Syst.
,
53
(
3
), pp.
263
296
.
2.
Weiss
,
S.
,
Scaramuzza
,
D.
, and
Siegwart
,
R.
,
2011
, “
Monocular-Slam–Based Navigation for Autonomous Micro Helicopters in GPS-Denied Environments
,”
J. Field Rob.
,
28
(
6
), pp.
854
874
.
3.
Pestana
,
J.
,
Sanchez-Lopez
,
J. L.
,
Saripalli
,
S.
, and
Campoy
,
P.
,
2014
, “
Computer Vision Based General Object Following for GPS-Denied Multirotor Unmanned Vehicles
,”
American Control Conference
(
ACC
), Portland, OR, June 4–6, pp.
1886
1891
.
4.
Xu
,
Y.
,
Choi
,
J.
,
Dass
,
S.
, and
Maiti
,
T.
,
2016
,
Bayesian Prediction and Adaptive Sampling Algorithms for Mobile Sensor Networks: Online Environmental Field Reconstruction in Space and Time
,
Springer
, New York.
5.
Xu
,
Y.
,
Choi
,
J.
,
Dass
,
S.
, and
Maiti
,
T.
,
2013
, “
Efficient Bayesian Spatial Prediction With Mobile Sensor Networks Using Gaussian Markov Random Fields
,”
Automatica
,
49
(
12
), pp.
3520
3530
.
6.
Gupta
,
A.
,
Chang
,
H.
, and
Yilmaz
,
A.
,
2016
, “
GPS-Denied Geo-Localisation Using Visual Odometry
,”
ISPRS Annual Photogrammetry, Remote Sensing Spatial Information Science
, pp.
263
270
.
7.
Bellotto
,
N.
,
Burn
,
K.
,
Fletcher
,
E.
, and
Wermter
,
S.
,
2008
, “
Appearance-Based Localization for Mobile Robots Using Digital Zoom and Visual Compass
,”
Rob. Auton. Syst.
,
56
(
2
), pp.
143
156
.
8.
Choi
,
S.
,
,
M.
,
Choi
,
J.
, and
Oh
,
S.
,
2015
, “
Distributed Gaussian Process Regression Under Localization Uncertainty
,”
ASME J. Dyn. Syst. Meas. Control
,
137
(
3
), p.
031007
.
9.
,
M.
,
Xu
,
Y.
,
Choi
,
J.
,
Johnson
,
N. S.
, and
Li
,
W.
,
2013
, “
Gaussian Process Regression for Sensor Networks Under Localization Uncertainty
,”
IEEE Trans. Signal Process.
,
61
(
2
), pp.
223
237
.
10.
Hayet
,
J.
,
Lerasle
,
F.
, and
Devy
,
M.
,
2003
, “
Visual Landmarks Detection and Recognition for Mobile Robot Navigation
,”
IEEE
Computer Society Conference on Computer Vision and Pattern Recognition
, Madison, WI, June 18–20, pp.
II-313
II-318
.
11.
Kosecka
,
J.
,
Zhou
,
L.
,
Barber
,
P.
, and
Duric
,
Z.
,
2003
, “
Qualitative Image Based Localization in Indoors Environments
,”
IEEE
Computer Society Conference on Computer Vision and Pattern Recognition
, Madison, WI, June 18–20, pp.
II-3
II-8
.
12.
Torralba
,
A.
,
Murphy
,
K.
,
Freeman
,
W.
, and
Rubin
,
M.
,
2003
, “
Context-Based Vision System for Place and Object Recognition
,”
IEEE
International Conference on Computer Vision
, Nice, France, Oct. 13–16, pp.
273
280
.
13.
Ohnishi
,
N.
, and
Imiya
,
A.
,
2013
, “
Appearance-Based Navigation and Homing for Autonomous Mobile Robot
,”
Image Vision Comput.
,
31
(
6–7
), pp.
511
532
.
14.
Lazebnik
,
S.
,
Schmid
,
C.
, and
Ponce
,
J.
,
2006
, “
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
,”
IEEE
Computer Society Conference on Computer Vision and Pattern Recognition
, New York, June 17–22, pp.
2169
2178
.
15.
Valgren
,
C.
, and
Lilienthal
,
A.
,
2008
, “
Incremental Spectral Clustering and Seasons: Appearance-Based Localization in Outdoor Environments
,”
IEEE International Conference on Robotics and Automation
(
ICRA
), Pasadena, CA, May 19–23, pp.
1856
1861
.
16.
Depatla
,
S.
,
Buckland
,
L.
, and
Mostofi
,
Y.
,
2015
, “
X-Ray Vision With Only WiFi Power Measurements Using Rytov Wave Models
,”
IEEE Trans. Veh. Technol.
,
64
(
4
), pp.
1376
1387
.
17.
Se
,
S.
,
Lowe
,
D.
, and
Little
,
J.
,
2005
, “
Vision-Based Global Localization and Mapping for Mobile Robots
,”
IEEE Trans. Rob.
,
21
(
3
), pp.
364
375
.
18.
Ribeiro
,
M. I.
,
2004
, “
Kalman and Extended Kalman Filters: Concept, Derivation and Properties
,”
Inst. Syst. Rob.
,
43
, p.
43
.
19.
Arulampalam
,
M. S.
,
,
S.
,
Gordon
,
N.
, and
Clapp
,
T.
,
2002
, “
A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking
,”
IEEE Trans. Signal Process.
,
50
(
2
), pp.
174
188
.
20.
Duvallet
,
F.
, and
Tews
,
A. D.
,
2008
, “
WiFi Position Estimation in Industrial Environments Using Gaussian Processes
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
(
IROS
2008), Nice, France, Sept. 22–26, pp.
2216
2221
.
21.
Do
,
H. N.
,
,
M.
,
Choi
,
J.
, and
Lim
,
C. Y.
,
2015
, “
Feature Selection for Position Estimation Using an Omnidirectional Camera
,”
Image Vision Comput.
,
39
, pp.
1
9
.
22.
Obozinski
,
G.
,
Wainwright
,
M. J.
, and
Jordan
,
M. I.
,
2011
, “
Support Union Recovery in High-Dimensional Multivariate Regression
,”
Ann. Stat.
,
39
(
1
), pp.
1
47
.
23.
Brooks
,
A.
,
Makarenko
,
A.
, and
Upcroft
,
B.
,
2008
, “
Gaussian Process Models for Indoor and Outdoor Sensor-Centric Robot Localization
,”
IEEE Trans. Rob.
,
24
(
6
), pp.
1341
1351
.
24.
Menegatti
,
E.
,
Zoccarato
,
M.
,
Pagello
,
E.
, and
Ishiguro
,
H.
,
2004
, “
Image-Based Monte Carlo Localisation With Omnidirectional Images
,”
Rob. Auton. Syst.
,
48
(
1
), pp.
17
30
.
25.
Vlassis
,
N.
,
Bunschoten
,
R.
, and
Krose
,
B.
,
2001
, “
Learning Task-Relevant Features From Robot Data
,”
IEEE International Conference on Robotics and Automation
(
ICRA
), Seoul, South Korea, May 21–26, pp.
499
504
.
26.
Schairer
,
T.
,
Huhle
,
B.
,
Vorst
,
P.
,
Schilling
,
A.
, and
Straser
,
W.
,
2011
, “
Visual Mapping With Uncertainty for Correspondence-Free Localization Using Gaussian Process Regression
,”
IEEE/RSJ International Conference on Intelligent Robots and Systems
(
IROS
), San Francisco, CA, Sept. 25–30, pp.
4229
4235
.
27.
Do
,
H. N.
,
Choi
,
J.
,
Lim
,
C. Y.
, and
Maiti
,
T.
,
2015
, “
Appearance-Based Localization Using Group LASSO Regression With an Indoor Experiment
,”
IEEE International Conference on Advanced Intelligent Mechatronics
(
AIM
), Busan, South Korea, July 7–11, pp.
984
989
.
28.
Do
,
H. N.
,
Choi
,
J.
,
Lim
,
C. Y.
, and
Maiti
,
T.
,
2015
, “
Appearance-Based Outdoor Localization Using Group LASSO Regression
,”
ASME
Paper No. DSCC2015-9865.
29.
Se
,
S.
,
Lowe
,
D.
, and
Little
,
J.
,
2002
, “
Mobile Robot Localization and Mapping With Uncertainty Using Scale-Invariant Visual Landmarks
,”
Int. J. Rob. Res.
,
21
(
8
), pp.
735
758
.
30.
Winters
,
N.
,
Gaspar
,
J.
,
Lacey
,
G.
, and
Santos-Victor
,
J.
,
2000
, “
,”
IEEE
Workshop on Omnidirectional Vision,
Hilton Head Island, SC, June 12, pp.
21
28
.
31.
Menegatti
,
E.
,
Maeda
,
T.
, and
Ishiguro
,
H.
,
2004
, “
Image-Based Memory for Robot Navigation Using Properties of Omnidirectional Images
,”
Rob. Auton. Syst.
,
47
(
4
), pp.
251
267
.
32.
Nixon
,
M.
, and
,
A. S.
,
2008
,
Feature Extraction and Image Processing
,
, Cambridge, MA.
33.
,
E.
,
Grossberg
,
M. D.
, and
Nayar
,
S. K.
,
2004
, “
Multiresolution Histograms and Their Use for Recognition
,”
IEEE Trans. Pattern Anal. Mach. Intell.
,
26
(
7
), pp.
831
847
.
34.
Bay
,
H.
,
Andreas Ess
,
T. T.
, and
Gool
,
L. V.
,
2008
, “
Speeded-Up Robust Features (SURF)
,”
Comput. Vision Image Understanding
,
110
(
3
), pp.
346
359
.
35.
Hartigan
,
J. A.
, and
Wong
,
M. A.
,
1979
, “
Algorithm as 136: A k-Means Clustering Algorithm
,”
Appl. Stat.
, (
1
), pp.
100
108
.
36.
Gopaluni
,
R.
,
2008
, “
A Particle Filter Approach to Identification of Nonlinear Processes Under Missing Observations
,”
Can. J. Chem. Eng.
,
86
(
6
), pp.
1081
1092
.