Abstract

Connected and autonomous vehicles have the potential to minimize energy consumption by optimizing the vehicle velocity and powertrain dynamics with Vehicle-to-Everything (V2X) information received en route. Existing deterministic and stochastic methods for solving the eco-driving problem generally suffer from high computational and memory requirements, which makes online implementation challenging. This work proposes a hierarchical multi-horizon optimization framework implemented via a neural network. The neural network learns a full-route value function to account for the variability in route information and is then used to approximate the terminal cost in a receding horizon optimization. Simulations over real-world routes demonstrate that the proposed approach achieves comparable performance to a stochastic optimization solution obtained via reinforcement learning, while requiring no sophisticated training paradigm and negligible on-board memory.

1 Introduction

The introduction of connected and automated vehicles (CAVs) has drawn significant attention due to their potential to improve efficiency by reducing energy consumption. The increased amount of information the vehicle receives from vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and global positioning system (GPS) technology affords CAVs the ability to make context-aware decisions en route for higher travel and fuel efficiency [1].

The eco-driving problem for CAVs can be formulated as finding the vehicle speed trajectory that minimizes a given cost functional over a route. Recent work in this field explores the opportunity of leveraging information on surrounding traffic and signalized intersections, allowing for the harmonization of traffic speed [2]. Recent studies aim to minimize energy consumption by either sequentially optimizing [3], or co-optimizing the speed and powertrain dynamics [4]. Similarly, the potential of leveraging information from signalized intersections to improve CAV energy efficiency has been shown in Ref. [5], while V2V opportunities for efficiency improvements in large-scale traffic scenarios are explored in Ref. [6].

Different solution methods have been investigated in the literature to solve the eco-driving problem, most based on Pontryagin's minimum principle (PMP) [7] and dynamic programming (DP) [8]. Although PMP and DP provide globally optimal solutions for a given route itinerary, solving the problem online is difficult due to the high computational requirements. Alternatively, the problem can be solved hierarchically using DP over multiple horizons [9]. This solution method, referred to as the rollout algorithm [4], first solves a long-term optimization under nominal conditions, then solves a short-term optimization to account for variability en route. The long-term optimization serves as a base-heuristic, whose value function approximates the terminal cost for the short-horizon optimization.

Different methods may be explored to generate the base-heuristic. For example, a full-route deterministic optimization can be performed using DP and stored as multi-dimensional maps [10]. Alternatively, approaches from the field of machine learning (ML), which leverage data sets obtained via field experiments or simulation, may be applied. For instance, a Safe Model-based Off-policy Reinforcement Learning (SMORL) method was demonstrated to learn the terminal cost offline [1]. In addition, several approaches optimize the base-heuristic online during simulation using Q-learning [11] and actor-critic networks [12].

Computing a full-route DP solution requires no online training but fails to account for unknown route variability, such as signal phase and timing (SPaT), which refers to the duration and order of a traffic light's states (i.e., red, green, and yellow). Moreover, storing the precomputed deterministic value function is memory intensive, and hence prohibitive for in-vehicle deployment. Conversely, using ML for offline or online training requires less memory relative to the DP solution, but the extensive training and the feedback process intrinsic to reinforcement learning make this approach computationally expensive. More specifically, model-based reinforcement learning is computationally expensive due to its complexity, while model-free reinforcement learning reduces that expense but introduces inaccuracy by relying on a reward function that does not account for the system characteristics or physics. This work proposes a neural network (NN) based methodology for extracting the base-heuristic from a precomputed full-route solution derived from a co-optimization of powertrain dynamics and velocity [4]. Given a terminal state as an input, the NN approximates the terminal cost for the remainder of the driving mission. Training is performed using the value functions computed on different routes and SPaT combinations. The presented approach is illustrated in Fig. 1 and offers three advantages over previous methods. First, the offline training of the terminal cost saves significant computation time relative to the SMORL method. Second, the NN trains on a set of optimal DP solutions based on an accurate physical model, which improves on the accuracy achieved by online reinforcement learning methods. Third, the NN is implemented as a function approximator rather than a static map, significantly decreasing the memory requirements.

Fig. 1  Hierarchical multi-horizon optimization framework

2 Vehicle Dynamics and Powertrain Model

A forward-looking model of the longitudinal vehicle dynamics and a 48 V mild-hybrid powertrain is used in this work for computing energy use [8]. The powertrain consists of a Belted Starter Generator (BSG) connected to a 1.8 L turbocharged gasoline engine. The battery is modeled as a zeroth-order equivalent circuit to determine the State-of-Charge (SoC). Quasi-static models predict the engine fuel consumption, as well as the BSG, torque converter, and transmission efficiencies. The model was validated with test data collected on a chassis dynamometer for regulatory drive cycles [13].
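As an illustration of the battery submodel, the sketch below advances the SoC one step using a zeroth-order equivalent circuit; the open-circuit voltage, internal resistance, and capacity values are illustrative placeholders, not the calibrated parameters of the 48 V pack used in the paper.

```python
import math

def soc_step(soc, p_batt, dt, v_oc=48.0, r_0=0.01, c_nom=28800.0):
    """Advance SoC one step with a zeroth-order equivalent circuit (sketch).

    Solves p_batt = v_oc*i - r_0*i**2 for the branch current i (discharge
    positive), then Coulomb-counts the charge. c_nom is the nominal capacity
    in ampere-seconds (here 8 Ah); all parameter values are placeholders.
    """
    i_batt = (v_oc - math.sqrt(v_oc**2 - 4.0 * r_0 * p_batt)) / (2.0 * r_0)
    return soc - i_batt * dt / c_nom
```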

3 Problem Formulation

3.1 Full Route Optimization.

Let s ∈ {0, 1, …, N} denote the discrete distance step, xs = [vs, ξs, ts]ᵀ ∈ X ⊆ ℝⁿ the state vector comprising the vehicle velocity vs, battery SoC ξs, and travel time ts, and us = [Teng,s, Tbsg,s]ᵀ ∈ U ⊆ ℝᵐ the control input comprising the engine torque Teng,s and BSG torque Tbsg,s, respectively. The discretized state dynamics are:

$$x_{s+1} = f(x_s, u_s), \qquad s = 0, \ldots, N-1 \tag{1}$$

where x0 is the known initial condition for vehicle speed and SoC, and v̄s denotes the average velocity over a distance step. The equations describing the discrete state dynamics f(xs, us) at distance step s are:
$$v_{s+1} = \sqrt{v_s^2 + \frac{2\Delta s}{M}\left(F_{tr,s} - F_{road,s}\right)} \tag{2a}$$

$$\xi_{s+1} = \xi_s - \frac{\bar{I}_{batt,s}\,\Delta t_s}{C_{nom}} \tag{2b}$$

$$t_{s+1} = \begin{cases} t_s + \dfrac{\Delta s}{\bar{v}_s} + t_{RG,s}, & s \in D_{TL} \text{ and the light is red} \\ t_s + \dfrac{\Delta s}{\bar{v}_s}, & \text{otherwise} \end{cases} \tag{2c}$$
where Ftr,s is the tractive force produced by the powertrain [13], Froad,s is the road load resistive force, M is the total vehicle mass, Ībatt,s is the battery current evaluated over a distance step, Cnom is the nominal battery capacity, and ts is the travel time at position s. Here, tGR,s and tRG,s represent the time remaining in the green and red phase, respectively. It is assumed that the positions of all the traffic lights along the route are known a priori from a navigation system and contained in the set DTL.
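A minimal sketch of this distance-domain update is given below, assuming the kinetic-energy form of Eq. (2a) and a hypothetical helper red_wait(s, t) that returns the residual red time tRG,s (zero when the light is green).

```python
import math

def step_dynamics(s, x, u, ds, f_tr, f_road, m_veh, i_batt, c_nom, d_tl, red_wait):
    """One distance step of the dynamics in Eqs. (2a)-(2c) (a sketch).

    s: distance step index; x = (v, soc, t); ds: step length [m].
    f_tr(x, u), f_road(x): tractive and road-load forces [N].
    i_batt(x, u): step-averaged battery current [A]; c_nom: capacity [As].
    d_tl: set of traffic-light step indices; red_wait(s, t): hypothetical
    helper returning the residual red time at step s (0 if green).
    """
    v, soc, t = x
    # Eq. (2a): kinetic-energy balance over the distance step
    v_next = math.sqrt(max(v**2 + (2.0 * ds / m_veh) * (f_tr(x, u) - f_road(x)), 0.0))
    v_avg = max(0.5 * (v + v_next), 1e-3)    # average speed over the step
    dt = ds / v_avg                          # travel time over the step
    # Eq. (2b): Coulomb counting with the step-averaged current
    soc_next = soc - i_batt(x, u) * dt / c_nom
    # Eq. (2c): accumulate travel time, plus any residual red phase
    t_next = t + dt + (red_wait(s, t) if s in d_tl else 0.0)
    return (v_next, soc_next, t_next)
```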
An admissible control map at distance s is denoted by μs : X → U, which satisfies the constraint h(xs, μs(xs)) ≤ 0 for all xs ∈ X, where h : X × U → ℝᵖ:

$$v_{min} \le v_s \le v_{max} \tag{3a}$$

$$\xi_{min} \le \xi_s \le \xi_{max} \tag{3b}$$

$$a_{min} \le a_s \le a_{max} \tag{3c}$$

$$T_{eng,min} \le T_{eng,s} \le T_{eng,max} \tag{3d}$$

$$T_{bsg,min} \le T_{bsg,s} \le T_{bsg,max} \tag{3e}$$

$$t_s \in T_{G,k} \quad \forall s \in D_{TL} \tag{3f}$$
where vmin, vmax, ξmin, ξmax, amin, amax, Teng,min, Teng,max, and Tbsg,min, Tbsg,max refer to the minimum and maximum limits on speed, SoC, acceleration, engine torque, and BSG torque, respectively. TG,k refers to the green window of the k-th traffic light. The sequence of admissible control maps, denoted by μ := (μ0, …, μN−1), is referred to as the policy of the controller. The set of admissible policies is denoted by Γ. A running cost (or stage cost) function c : X × U → ℝ is introduced, accounting for the trade-off between fuel consumption and travel time:
$$c(x_k, \mu_k(x_k)) = \gamma\,\dot{m}_{f,k}(x_k, \mu_k(x_k))\,\Delta t_k + (1-\gamma)\,\Delta t_k \tag{4}$$
where ṁf,k(xk, μk(xk)) is the rate of fuel consumption, Δtk is the travel time over a given distance step, and γ is the relative weight between fuel and travel time. A comprehensive study on the effect of the selection of γ on the optimal solution was performed in Ref. [4].
The Optimal Control Problem (OCP) is formulated over the full route of N steps as:

$$\min_{\mu \in \Gamma}\; c_N(x_N) + \sum_{s=0}^{N-1} c(x_s, \mu_s(x_s)) \tag{5}$$
where cN(xN) accounts for costs related to the terminal state of the system.
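For reference, the value function of Eq. (5) can be computed by a standard backward recursion over a discretized state grid. The sketch below assumes an index-based grid representation, where f returns the grid index of the successor state; it is not the authors' actual implementation, which is GPU-based [10].

```python
import numpy as np

def full_route_value_function(N, n_states, controls, f, c, c_N, feasible):
    """Backward DP for the full-route OCP in Eq. (5) (a sketch).

    f(s, ix, u) -> successor state grid index; c(s, ix, u) -> stage cost;
    c_N(ix) -> terminal cost; feasible(s, ix, u) encodes Eq. (3).
    Returns V with V[s, ix] = optimal cost-to-go from state ix at step s.
    """
    V = np.full((N + 1, n_states), np.inf)
    V[N, :] = [c_N(ix) for ix in range(n_states)]
    for s in range(N - 1, -1, -1):                  # backward in distance
        for ix in range(n_states):
            for u in controls:
                if not feasible(s, ix, u):
                    continue
                cost = c(s, ix, u) + V[s + 1, f(s, ix, u)]
                V[s, ix] = min(V[s, ix], cost)
    return V
```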

3.2 Receding Horizon Optimization.

To allow for real-time implementation and account for sources of variability along the route, the optimization is reformulated as a receding horizon OCP (RHOCP) and solved using a rollout algorithm, an online suboptimal control technique based on approximation in value space [4]. Here, the optimal cost-to-go function obtained from solving the full-route optimization is improved upon by solving a one-step look-ahead optimization. Considering the same discrete dynamics in Eq. (1) and constraints in Eq. (3), at distance s the policy μ̄ is evaluated by solving the constrained RHOCP:

$$\bar{\mu} = \arg\min_{\mu}\; c_T(x_{s+N_H}) + \sum_{k=s}^{s+N_H-1} c(x_k, \mu_k(x_k)) \tag{6}$$
where cT(xs+NH) is the terminal cost, approximated as the base-heuristic from the value function of the full-route optimization in Eq. (5).
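A schematic of one receding-horizon solve is shown below. It enumerates discretized control sequences for clarity, whereas in practice the short-horizon problem is solved with DP; terminal_cost is the base-heuristic, i.e., the full-route value function or the NN developed in Sec. 3.3.

```python
from itertools import product

def rollout_policy(s, x, n_h, controls, f, c, feasible, terminal_cost):
    """Solve the RHOCP of Eq. (6) at distance step s (a sketch).

    Scores every control sequence over the horizon n_h with the stage
    costs plus the terminal cost c_T(x_{s+N_H}), and returns the first
    control of the best sequence (receding-horizon fashion).
    """
    best_seq, best_cost = None, float("inf")
    for seq in product(controls, repeat=n_h):
        xk, cost, ok = x, 0.0, True
        for k, u in enumerate(seq):
            if not feasible(s + k, xk, u):
                ok = False
                break
            cost += c(s + k, xk, u)
            xk = f(s + k, xk, u)
        if ok:
            cost += terminal_cost(xk)        # base-heuristic approximation
            if cost < best_cost:
                best_seq, best_cost = seq, cost
    return None if best_seq is None else best_seq[0]
```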
Note that the full-route pre-optimization assumes a fixed SPaT sequence, neglecting variability that can significantly impact the energy consumption within the RHOCP. To overcome this limitation, a stochastic OCP is defined to minimize the expectation of the cost function over all possible realizations of SPaT [1]:

$$\min_{\mu \in \Gamma}\; \mathbb{E}_{\mathrm{SPaT}}\!\left[\sum_{t=0}^{\infty} \mathbb{1}\left(s_t < s_{total}\right)\, c(x_t, \mu_t(x_t))\right] \tag{7}$$
where 𝟙(·) is the indicator function, st is the distance travelled at time t, and stotal is the route distance. The constraint set over the NH horizon remains the same as in Eq. (3).

The OCP formulated in Eq. (7) is solved using SMORL [1]. The algorithm performs a data-driven approximation of the terminal cost, accounting for the variations in SPaT using a simulator of the vehicle interacting with the environment. The simulated ego-vehicle is assumed to be equipped with GPS and a Dedicated Short-Range Communication (DSRC) sensor, providing SPaT data within the communication range.

When compared against the deterministic rollout solution, which is taken as the baseline strategy, SMORL showed 11.0% less fuel consumption [1]. While the stochastic OCP solved using SMORL allows the incorporation of a wide range of scenarios, the algorithm requires a large number of online simulations for training. In fact, the main limitation of this method is its reliance on an actor-critic network pair and a perturbation network, supported by both a safe set and a replay buffer [1]. As a direct consequence, this method requires a very large memory allocation, which is prohibitive in realistic online implementations.

3.3 Neural Network–Based Rollout Algorithm.

Instead of learning the value function using SMORL, a novel rollout algorithm is developed in this paper. This approach requires the full-route optimization in Eq. (5) to be run offline for a wide range of simulated routes and SPaT combinations, storing the resulting value functions. A fully connected feed-forward NN is then trained to approximate the cost-to-go from the terminal cost matrices as a function of an extended state space. The inputs to the NN extend the state representation of Ref. [1] and are summarized in Table 1.

Table 1  Input vector form

Variable                          Description
SoC ∈ ℝ                           Battery SoC
Vveh ∈ ℝ                          Vehicle velocity
Vrlim ∈ ℝ                         Difference of vehicle velocity and speed limit at the current road segment
V′rlim ∈ ℝ                        Difference of vehicle velocity and upcoming speed limit
dtfc ∈ ℝ                          Distance to upcoming traffic light
dlim ∈ ℝ                          Distance to road segment at which the speed limit changes
drem ∈ ℝ                          Remaining distance of the trip
xtfc ∈ {x ∈ ℝ | −1 ≤ x ≤ 1}⁶      Sampled status of the upcoming traffic light, encoded as six digits

Together, these entries form the NN input vector X̄.

The neural networks are trained offline with supervised learning, using the processed DP solution as ground truth. This benefits the short-term rollout solution because the network is trained inexpensively through offline data generation and an offline training process. In addition, the derived network accounts for SPaT, allowing it to function with stochastic elements while requiring significantly less storage space. A summary of the process for producing the neural network is given in Fig. 2.

Fig. 2  Neural network training flowchart

3.3.1 Data Set Creation and Augmentation.

The three-state vector is augmented with speed limit and SPaT information, under the assumption that each route provides the locations and values of its speed limits, from which the relative difference to the vehicle velocity is defined. Since the route length is known, the remaining distance is included as an additional state. SPaT is then included as the distance to the next traffic light, its phase, and a time sample of the respective phase. The final NN input vector for training takes the form X̄ shown in Table 1.
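A sketch of the assembly of X̄ is given below; the six-entry traffic-light encoder follows Ref. [1] and is treated here as a given input rather than reproduced.

```python
import numpy as np

def build_input(soc, v_veh, v_lim_cur, v_lim_next, d_tfc, d_lim, d_rem, tl_code):
    """Assemble the augmented NN input vector X̄ of Table 1 (a sketch).

    tl_code: six-digit sampled status of the upcoming traffic light,
    each entry already scaled to [-1, 1].
    """
    features = np.array([
        soc,                  # battery SoC
        v_veh,                # vehicle velocity
        v_veh - v_lim_cur,    # offset from the current speed limit
        v_veh - v_lim_next,   # offset from the upcoming speed limit
        d_tfc,                # distance to the upcoming traffic light
        d_lim,                # distance to the next speed-limit change
        d_rem,                # remaining trip distance
    ])
    return np.concatenate([features, tl_code])   # 13-dimensional input
```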

The cost-to-go values obtained from DP include high-cost regions that correspond to infeasible operation of the vehicle. Because the objective is to use supervised learning to approximate the value function, these infeasibilities were addressed in preprocessing. Specifically, the infeasibilities due to violation of the state limits are shown in Figs. 3–5. Infeasibilities imposing the recharge constraint (Fig. 4) were removed because that constraint is enforced during rollout. Infeasibilities for excessively low velocity (Fig. 5) were removed because they are not applicable in an online simulation scenario. Infeasibilities due to traffic lights were instead truncated, so that the neural network can still account for the obstacle, as in Fig. 5. Finally, infeasibilities from exceeding the speed limit (Fig. 3) were removed because that constraint is also applied during rollout.
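A sketch of this preprocessing is shown below, assuming each DP sample carries a hypothetical label identifying which constraint (if any) it violates; the truncation cap is likewise an assumption.

```python
import numpy as np

# Hypothetical labels attached to each DP sample during post-processing
FEASIBLE, SPEED_LIMIT, RECHARGE, MIN_SPEED, TRAFFIC_LIGHT = range(5)

def preprocess_cost_to_go(states, costs, labels, tl_cap=1e3):
    """Prune or truncate infeasible DP samples before training (a sketch).

    Violations re-imposed online by the rollout (speed limit, recharge,
    minimum speed) are removed outright; traffic-light violations are
    kept but truncated to a finite cap so the NN still sees the obstacle.
    """
    drop = np.isin(labels, [SPEED_LIMIT, RECHARGE, MIN_SPEED])
    states, costs, labels = states[~drop], costs[~drop], labels[~drop]
    costs = np.where(labels == TRAFFIC_LIGHT, np.minimum(costs, tl_cap), costs)
    return states, costs
```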

Fig. 3  Cost-to-Go graphs as a function of velocity
Fig. 4  Cost-to-Go graphs as a function of SoC
Fig. 5  Cost-to-Go graphs as a function of time

3.3.2 Training Process.

To produce the raw data, full-route DP optimization was performed on a set of routes. Once post-processed, these data were used to train the NN, whose input layer matches the input vector X̄ and whose output layer is the predicted value function. The network is optimized with an offline supervised training process: the data were normalized, shuffled, and applied as mini-batches, and Adam was chosen as the optimization algorithm. Using a learning rate of 0.001 and a dropout rate of 30% on a two-layer network with 500 neurons per layer, an optimal set of NN parameters was derived using the procedure summarized in Algorithm 1; a code sketch of this process follows the algorithm. Training was terminated by early stopping once the changes in training and test error became negligible.

Algorithm 1  Neural network supervised learning

 1: Initialize neural network, encoder, Adam optimizer, loss function, and output file for training and test loss
 2: for n_iter in N epochs do
 3:   Initialize loss values
 4:   Randomly sample 100,000 points uniformly from the processed DP solution
 5:   for the j-th mini-batch of 500 in the data set do
 6:     Encode traffic light data
 7:     Perform a forward pass for the input data
 8:     Update the model weights using back-propagation on the prediction error
 9:     Add the training loss to a running sum
10:   end for
11:   Average the running loss sum to obtain the average training loss
12:   Perform a forward pass and calculate the loss on the test set using the current network
13: end for
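A minimal PyTorch sketch of Algorithm 1 follows. The activation function, epoch budget, stopping tolerance, and the sample_epoch helper (returning 100,000 normalized, shuffled training pairs per epoch) are assumptions not specified in the paper.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Two hidden layers of 500 neurons with 30% dropout (Sec. 3.3.2)."""
    def __init__(self, n_in=13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 500), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(500, 500), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(500, 1),                 # predicted cost-to-go
        )

    def forward(self, x):
        return self.net(x)

def train(model, sample_epoch, test_x, test_y, n_epochs=200, tol=1e-5):
    """Supervised training loop following Algorithm 1 (a sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    prev_test = float("inf")
    for _ in range(n_epochs):
        model.train()
        xs, ys = sample_epoch()                # 100,000 encoded samples
        running, n_batches = 0.0, 0
        for i in range(0, len(xs), 500):       # mini-batches of 500
            xb, yb = xs[i:i + 500], ys[i:i + 500]
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()                    # back-propagation
            opt.step()
            running, n_batches = running + loss.item(), n_batches + 1
        train_loss = running / n_batches       # average training loss
        model.eval()
        with torch.no_grad():                  # loss on the test set
            test_loss = loss_fn(model(test_x), test_y).item()
        if abs(prev_test - test_loss) < tol:   # early stopping
            break
        prev_test = test_loss
    return model
```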

3.3.3 Neural Network.

The converged NN approximates the terminal cost as a function of the augmented vehicle state vector X̄. Because it is trained offline and requires far less memory to store and evaluate than the fully deterministic solution, this formulation differs from SMORL by accounting for stochastic variation in a more computationally efficient manner.

4 Simulation Results

The trained NN is used as the base-heuristic in the RHOCP of Eq. (6) and compared against SMORL. To demonstrate the ability of the NN to predict the terminal cost for a given set of states, the network was first deliberately overfit to a single training data set, i.e., trained to reproduce only its training data accurately. The chosen route was an 8.22 km mixed-urban route with two traffic lights, a representative example from the 100-route data set.

The trained NN was then integrated into the rollout scheme shown in Fig. 1 with a horizon length of 1 km. Figure 6 shows the comparison between the solution of the RHOCP in Eq. (6) using the deterministic value function and the developed NN as the terminal cost. The velocity trajectories obtained with the two methods closely match, which indirectly confirms that the trained NN correctly predicts the terminal cost of the RHOCP.

Fig. 6  Result of neural network overfit to a sample route

After this initial confirmation, the NN was retrained to generalize across several possible routes. A set of 20 real-world routes was chosen, each with a SPaT profile generated in SUMO (Simulation of Urban Mobility) [14]. Each route contributes around 2 × 10^8 data points, together composing a set of approximately 4 × 10^9 points. For this training process, the data were split into 16 training routes and 4 test routes. The training process detailed in Sec. 3.3.2 was followed. Figure 7 shows the average loss per epoch over the test set, demonstrating convergence during learning.

Fig. 7  Average loss per epoch on the test set

The resulting NN is integrated with the rollout using a 200 m horizon and simulated over five representative routes selected from a set of 100 routes. The routes include three mixed-urban routes and two urban routes, summarized in Table 2.

Table 2  Route characteristics

Route                         Length (km)   Average speed limit (m/s)   Traffic lights   Stop signs
Overfit                       8.22          25.32                       2                2
Urban Route 1 (UR1)           10.01         18.27                       7                2
Urban Route 2 (UR2)           8.41          19.22                       3                2
Mixed-Urban Route 1 (MUR1)    10.83         23.45                       5                2
Mixed-Urban Route 2 (MUR2)    7.40          22.44                       6                2
Mixed-Urban Route 3 (MUR3)    7.17          23.23                       7                2

The cumulative performance of the NN-rollout framework is compared against SMORL over the five selected routes, as summarized in Table 3. The results indicate that the NN-rollout method reduces the cumulative cost compared to SMORL on all routes considered, because the NN learned the value function from globally optimal DP solutions. Further, the NN-rollout method improves the fuel economy over SMORL on four of the five selected routes, with minimal effect on the travel time.

Table 3  Summary statistics of simulated routes

        Fuel economy (mpg)           Time (s)               Cumulative cost (–)         Final SoC (–)
Route   SMORL   NN-rollout           SMORL   NN-rollout     SMORL    NN-rollout         SMORL    NN-rollout
UR1     44.22   43.72 (−1.1%)        752     748 (−0.53%)   594.74   594.44 (−0.050%)   0.5006   0.5689
MUR1    42.87   45.68 (+6.4%)        702     712 (+1.4%)    583.96   578.43 (−0.95%)    0.5025   0.5448
UR2     43.38   46.22 (+6.3%)        583     596 (+2.2%)    473.29   472.11 (−0.25%)    0.5009   0.5679
MUR2    40.08   42.52 (+5.9%)        575     588 (+2.2%)    462.10   461.80 (−0.065%)   0.5008   0.5648
MUR3    39.79   41.54 (+4.3%)        532     532 (+0%)      434.31   428.91 (−1.3%)     0.5002   0.5639

Figures 8 and 9 show the optimal state and control input trajectories, along with the time-space plots, for UR1 and UR2, respectively. The state and control input trajectories show that all constraints on speed limits, battery SoC, and torque are met, which is ensured by the deterministic solution of the short-term RHOCP. A correct prediction of the terminal cost becomes important when approaching intersections, where errors could lead to infeasibilities. To this extent, the results indicate that the vehicle does not violate any red lights or stop signs, passing through intersections only when the light is green.

Fig. 8  Urban Route 1 (UR1) trajectory: (a) Distance versus velocity trajectory, (b) Distance versus SoC trajectory, (c) Torque versus time trajectory, and (d) Distance versus time trajectory
Fig. 9  Urban Route 2 (UR2) trajectory: (a) Distance versus velocity trajectory, (b) Distance versus SoC trajectory, (c) Torque versus time trajectory, and (d) Distance versus time trajectory

The velocity trajectories exhibit some noise resulting from the generalized NN fit, but also demonstrate the ability to extrapolate from the training set. For example, the vehicle accelerates and then coasts through the speed limit change, as seen on UR1 between 8000 m and 10,000 m in Fig. 8(a) and on UR2 between 6000 m and 7000 m in Fig. 9(a). Constant velocities are held on unobstructed sections of road, as shown on UR1 between 7000 m and 8000 m in Fig. 8(a) and on UR2 between 5000 m and 6000 m in Fig. 9(a).

The slight change in the velocity trajectory between the NN-based method and SMORL can result in the vehicle experiencing a different SPaT sequence along the route, as evident from the time-space plots in Figs. 8 and 9. For the UR2 case, shown in Fig. 9(a), the speed profile of the NN-based approach is almost identical to SMORL, except that the NN commands a lower constant velocity between 2000 m and 3000 m. This is necessary to avoid a red-light stop at 5000 m, as shown in Fig. 9(d), where SMORL encounters a red phase at the same intersection.

Comparison of the SoC trajectories demonstrates close to charge-sustaining behavior in both cases, with only a slight deviation from the 50% SoC target at the end of the route. This is due to pruning the infeasible raw data shown in Fig. 4, which discouraged low SoC values at the end of the route, creating a preference for slight overshoots.

Overall, the NN-rollout method can outperform the SMORL approach in approximating the terminal cost. Due to its offline training, the NN-rollout is not only significantly less computationally expensive, but also results in faster simulation times. In addition, compared to the deterministic DP solution, the NN-rollout method provides a full approximation of the mapping between terminal state and terminal cost without storing the complete value function. For the routes considered in this paper, Table 4 shows that the NN requires approximately 270 kB of memory, compared with 2.6–5.1 GB for the deterministic approach. Therefore, the NN-rollout approach outperforms the existing methods of approximating the terminal cost in a computationally and memory-efficient manner.

Table 4  Memory requirement of each approach

Route                  UR1     MUR1    UR2     MUR2    MUR3
Value function (GB)    5.10    5.08    3.05    2.92    2.63
Neural network (kB)    269.5   269.5   269.5   269.5   269.5

5 Conclusion

A neural network approximation of the terminal cost in a receding horizon optimal control problem is proposed in this paper. The method is compared against a stochastic approach that captures the variability in SPaT. Simulations over five real-world routes show that the proposed NN-rollout outperforms the stochastic method while reducing the computational and memory requirements. The NN-rollout provides an exhaustive mapping of the terminal cost similar to a full-route DP solution, but without the significant memory requirement. Additionally, the NN approximation shows robustness to variation similar to the reinforcement learning method, while relying on a completely offline and computationally more efficient training process.

Current work focuses on integrating the proposed NN as the terminal cost approximator in the rollout algorithm for eco-driving presented in Ref. [4]. This will enable in-vehicle integration and experimental verification of the proposed strategy. Future work will extend the NN framework to include uncertainties due to variations in traffic density.

Acknowledgment

The authors acknowledge the support from the United States Department of Energy, Advanced Research Projects Agency—Energy (ARPA-E) NEXTCAR project (Award No. DE-AR0000794).

Conflict of Interest

There are no conflicts of interest.

Data Availability Statement

The data sets generated and supporting the findings of this article are available from the corresponding author upon reasonable request.

Footnote

1. Paper presented at the 2023 Modeling, Estimation, and Control Conference (MECC 2023), Lake Tahoe, NV, October 2–5. Paper No. MECC2023-46.

References

1. Zhu, Z., Pivaro, N., Gupta, S., Gupta, A., and Canova, M., 2022, "Safe Model-Based Off-Policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles," IEEE Trans. Intell. Veh., 7(2), pp. 387–398.
2. Tajalli, M., Mehrabipour, M., and Hajbabaie, A., 2020, "Network-Level Coordinated Speed Optimization and Traffic Light Control for Connected and Automated Vehicles," IEEE Trans. Intell. Transp. Syst., 22(11), pp. 6748–6759.
3. Amini, M. R., Gong, X., Feng, Y., Wang, H., Kolmanovsky, I., and Sun, J., 2019, "Sequential Optimization of Speed, Thermal Load, and Power Split in Connected HEVs," 2019 American Control Conference (ACC), Philadelphia, PA, July 10–12, IEEE, pp. 4614–4620.
4. Deshpande, S. R., Gupta, S., Gupta, A., and Canova, M., 2022, "Real-Time Ecodriving Control in Electrified Connected and Autonomous Vehicles Using Approximate Dynamic Programing," ASME J. Dyn. Syst. Meas. Control, 144(1), p. 011111.
5. Han, J., Shen, D., Jeong, J., Di Russo, M., Kim, N., Grave, J. J., Karbowski, D., Rousseau, A., and Stutenberg, K. M., 2023, "Energy Impact of Connecting Multiple Signalized Intersections to Energy-Efficient Driving: Simulation and Experimental Results," IEEE Control Syst. Lett., 7, pp. 1297–1302.
6. Hyeon, E., Han, J., Shen, D., Karbowski, D., Kim, N., and Rousseau, A., 2022, "Potential Energy Saving of V2V-Connected Vehicles in Large-Scale Traffic," IFAC-PapersOnLine, 55(24), pp. 78–83.
7. Uebel, S., Murgovski, N., Tempelhahn, C., and Baker, B., 2017, "Optimal Energy Management and Velocity Control of Hybrid Electric Vehicles," IEEE Trans. Veh. Technol., 67(1), pp. 327–337.
8. Gupta, S., 2019, "Look-Ahead Optimization of a Connected and Automated 48V Mild-Hybrid Electric Vehicle," Master's thesis, The Ohio State University, Columbus, OH.
9. Borek, J., Groelke, B., Earnhardt, C., and Vermillion, C., 2019, "Economic Optimal Control for Minimizing Fuel Consumption of Heavy-Duty Trucks in a Highway Environment," IEEE Trans. Control Syst. Technol., 28(5), pp. 1652–1664.
10. Zhu, Z., Gupta, S., Pivaro, N., Deshpande, S. R., and Canova, M., 2021, "A GPU Implementation of a Look-Ahead Optimal Controller for Eco-Driving Based on Dynamic Programming," 2021 European Control Conference (ECC), Rotterdam, Netherlands, June 29–July 2, IEEE, pp. 899–904.
11. Lee, H., Kim, N., and Cha, S. W., 2020, "Model-Based Reinforcement Learning for Eco-Driving Control of Electric Vehicles," IEEE Access, 8, pp. 202886–202896.
12. Wegener, M., Koch, L., Eisenbarth, M., and Andert, J., 2021, "Automated Eco-Driving in Urban Scenarios Using Deep Reinforcement Learning," Transp. Res. C Emerg. Technol., 126, p. 102967.
13. Olin, P., Aggoune, K., Tang, L., Confer, K., Kirwan, J., Deshpande, S. R., Gupta, S., Tulpule, P., Canova, M., and Rizzoni, G., 2019, "Reducing Fuel Consumption by Using Information From Connected and Automated Vehicle Modules to Optimize Propulsion System Control," SAE Technical Paper.
14. Krajzewicz, D., Hertkorn, G., Rossel, C., and Wagner, P., 2002, "SUMO (Simulation of Urban Mobility): An Open-Source Traffic Simulation," Proceedings of the 4th Middle East Symposium on Simulation and Modelling (MESM2002), Sharjah, UAE, Oct. 28–30, pp. 183–187.