A PD-Type State-Dependent Riccati Equation With Iterative Learning Augmentation for Mechanical Systems

IEEE/CAA Journal of Automatica Sinica, 2022, Issue 8

Saeed Rafee Nekoo, José Ángel Acosta, Guillermo Heredia, and Anibal Ollero

Abstract—This work proposes a novel proportional-derivative (PD)-type state-dependent Riccati equation (SDRE) approach with iterative learning control (ILC) augmentation. On the one hand, the PD-type control gains can adopt many useful criteria and tools available for conventional PD controllers. On the other hand, the SDRE adds nonlinear and optimality characteristics to the controller, i.e., increasing the stability margins. These advantages, together with the ILC correction part, deliver a precise control law with the capability of error reduction by learning. The SDRE provides a symmetric-positive-definite distributed nonlinear suboptimal gain K(x) for the control input law u = −R⁻¹(x)Bᵀ(x)K(x)x. The sub-blocks of the overall gain R⁻¹(x)Bᵀ(x)K(x) are not necessarily symmetric positive definite. A new design is proposed to transform the optimal gain into two symmetric-positive-definite gains, like PD-type controllers, as u = −K_SP(x)e − K_SD(x)ė. The new form allows us to analytically prove the stability of the proposed learning-based controller for mechanical systems and provides guaranteed uniform boundedness in finite time between learning loops. The symmetric PD-type controller is also developed for the state-dependent differential Riccati equation (SDDRE) to manipulate the final time. The SDDRE is a differential equation with a final boundary condition, which imposes a constraint on time that can be used for finite-time control. The availability of PD-type finite-time control is therefore an asset for enhancing conventional classical linear controllers with this tool. The learning rules benefit from the gradient descent method for both regulation and tracking cases. One of the advantages of this approach is guaranteed stability even from the first loop of learning. A mechanical manipulator, as an illustrative example, was simulated for both regulation and tracking problems. Successful experimental validation shows the capability of the system in practice through the implementation of the proposed method on a variable-pitch rotor benchmark.

I. INTRODUCTION

THE proportional-derivative (PD) control has well-established mathematics and many well-known assessment and tuning tools and methods. It has been accepted by industrial platforms, which prefer the simplicity of the control law and its straightforward analysis. What if one wants a finite-time PD controller? We intend to gain all the benefits of PD control with the additional power of nonlinearity and finite-time regulation. The state-dependent Riccati equation (SDRE) has been defined as an optimal control design for nonlinear systems. Optimality, robustness, design flexibility, and a systematic procedure are some advantages of the SDRE [1]. The differential form of the SDRE uses a final boundary condition to impose an extra penalty on the final states near the end of time; the method is the so-called state-dependent differential Riccati equation (SDDRE) [2]. Both the SDRE and SDDRE have been widely used in aerospace [3], the approximate dynamic programming framework [4], cancer treatment modeling [5], etc. Either the SDRE or the SDDRE method provides a suboptimal, distributed, symmetric-positive-definite gain K(x) for the standard form of the Riccati control law u = −X(x)x, in which X(x) = R⁻¹(x)Bᵀ(x)K(x). This work proposes a new control structure to transform the distributed gain into two symmetric-positive-definite gains acting on the error vector and its derivative, u = −K_SP(x)e − K_SD(x)ė.
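For readers who want a concrete picture of the standard Riccati gain mentioned above, the following is a minimal sketch (not the paper's implementation) that solves the algebraic SDRE pointwise at a frozen state with SciPy and forms the distributed gain X(x). The SDC matrices, weights, and state values are illustrative assumptions, and the paper's novel split into the two symmetric gains K_SP and K_SD is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def sdre_gain(A_x, B_x, Q, R):
    """Solve the algebraic SDRE at the frozen state:
    A^T K + K A - K B R^-1 B^T K + Q = 0."""
    K = solve_continuous_are(A_x, B_x, Q, R)   # symmetric-positive-definite K(x)
    X = np.linalg.solve(R, B_x.T @ K)          # distributed gain X(x) = R^-1 B^T(x) K(x)
    return K, X

# Illustrative 1-DoF mechanical example (hypothetical numbers):
# state x = [q, qdot], SDC form xdot = A(x) x + B(x) u.
A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])       # frozen A(x) at the current state
B = np.array([[0.0],
              [1.0]])
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

K, X = sdre_gain(A, B, Q, R)
x = np.array([0.3, 0.0])           # current state (error coordinates)
u = -X @ x                         # standard Riccati control law u = -R^-1 B^T K x
```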

The proposed structure has many advantages, such as independent control of the system error, a PD shape, and the structure of a finite-time PD controller; however, the main objective of the transformation is to enable a new design for which the stability of a novel SDRE controller, augmented by the iterative learning control (ILC) approach, can be guaranteed analytically. A PD-like control is more practical, widely used, and easier to implement and to derive mathematically. The motivation for using the SDRE controller is its nonlinear optimal structure and finite-time characteristics. Optimality deviates because the symmetric structure changes the original gain K(x); nonetheless, the nonlinearity and finite-time control remain as advantages of the proposed method. Iterative learning control uses previous data to update the next control loop [6]. A proper ILC converges the error of a system towards zero in each control loop [7]–[9]. One of the main advantages of the ILC is the compensation of modeling uncertainty through the learning process [10]–[12]. This is more critical when the dynamic modeling is complex, for example, the behavior of flapping-wing flying robots [13] or piezoelectric actuators with hysteretic nonlinearity [14]. Shen and Xu presented an adaptive ILC for systems with randomly varying iteration lengths [15]. Robotics has been an attractive field for the implementation of ILC. Roveda et al. presented an iterative learning control approach with reinforcement for high-accuracy force tracking in robotics [16]. Oh et al. studied a regulation problem for iterative learning model predictive control [17]. Pan and Yu developed a composite learning robot control with guaranteed parameter convergence [18]. Iterative learning control improves trajectory tracking over several trials; Schöllig and D'Andrea employed learning for tracking systems with state and input constraints [19]. The dynamics of a brush-bot are challenging, especially in terms of actuation; hence, one efficient way is to use learning methods. Barrier-certified adaptive control was implemented for the navigation of a brush-bot as a reinforcement learning method [20]. Vision-based learning was used for racing drones in a tracking problem with moving, uncertain trajectories [21]; the high level of uncertainty was handled by a deep-neural-network learning tool. In reinforcement learning, gradient descent is a powerful approach for training models and can also be modified for computational enhancement [22].

Closed-loop ILCs with PD- or PID-based structures have been reported [23], [24]; however, ILC has usually been employed to bypass the unknown closed-loop dynamics in feedforward methods [25], [26]. It has also been common to consider ILC based on the core hypothesis that the repetitiveness of the task and the model is satisfied; nevertheless, robustness with ILC was shown using a high-order internal model [27]–[29]. In other words, the learning process helps to deliver an ideal controller without tying the design to offline dynamic modeling, using the online data of each step to enhance the performance of the next loop. The open-loop approach exposes the nature of the learning: unexpected behavior in the early loops may result in bad trajectories, collisions, saturations, or destruction of the prototype. So, open-loop learning is preferable for stable systems, e.g., a quadrotor with an internal stable loop [26]. A closed-loop ILC can provide stable early loops from the beginning, which is useful for dynamically unstable systems. Here, the presented learning-based controller is closed-loop and applicable to a dynamically unstable system. The experimental implementation of this work uses a propeller-type inverted pendulum that is naturally unstable at its equilibrium point. So, the PD-type SDDRE generates a stable primary input signal for the first loop, and the ILC updates it and reduces the error in consecutive learning loops.

Fig. 1. Illustration of the ILC operation in error reduction in finite time.

The current work proposes a symmetric-gain SDRE/SDDRE control and combines it with the ILC to form a new finite-time iterative learning approach, using a gradient descent method as the training rule for updating the learning feedforward control law. The stability proof of the control law is presented. The main advantage of this learning system is that it employs a nonlinear (suboptimal) input law alongside the ILC to guarantee stable regulation and tracking even from the first loop of learning. The proposed ILC is also implemented on a variable-pitch (VP) rotor benchmark for experimental validation. The VP platform was controlled by the SDRE in [32], though here, for the first time, the implementation of a learning-based controller is presented.
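As a rough illustration of the learning mechanism described above, the sketch below applies a gradient-descent-style correction to the feedforward input between loops. The learning rate gamma, the error-to-input mapping, and the run_loop routine are hypothetical placeholders and do not reproduce the exact training rule of Section IV.

```python
import numpy as np

def ilc_update(u_ff, error, gamma=0.05):
    """One gradient-descent-style ILC step: correct the feedforward input
    with the recorded error trajectory of the previous loop.

    u_ff  : (N, m) feedforward input samples of loop k
    error : (N, m) recorded error samples of loop k, mapped to input space
    gamma : learning rate (hypothetical value)
    """
    return u_ff + gamma * error    # u_ff^{k+1} = u_ff^k + gamma * e^k

# Usage over several learning loops (run_loop stands for the closed-loop
# PD-type SDRE/SDDRE experiment or simulation returning the error trajectory):
# u_ff = np.zeros((N, m))
# for k in range(n_loops):
#     error = run_loop(u_ff)       # closed-loop PD-type SDRE + feedforward
#     u_ff = ilc_update(u_ff, error)
```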

The main contributions of this work are as follows:

1) Presenting a nonlinear finite-time PD-like controller based on SDDRE with the ILC feedforward compensation to gain more precision.

2) Transforming the suboptimal, distributed, symmetric-positive-definite gain of the SDRE/SDDRE methodology into novel PD-like gains, which are more suitable for mechanical systems.

3) Proving the stability of the nonlinear controller with ILC.

4) Introducing a novel convex objective function for the regulation training rule of the gradient descent method.

5) Guaranteeing uniform boundedness in finite time between learning loops, which allows the approach to be used for unstable mechanical systems.

II. THE SDRE/SDDRE CONTROL DESIGN

An SDRE controller is suitable for cases that need an optimal design when the finishing time is unimportant and the steady-state behavior of the system is of primary interest. An SDDRE controller penalizes the states at a final time that is usually shorter than the conventional time of operation (when the user requires a faster response). The final time is a function of the control effort that the designer invests in the platform. More energy results in a faster response, and the fastest response is limited by the actuators' saturations and limits. The other aspect that defines the final time is the amplitude of the error in regulation (point-to-point) control, and the length and speed of the trajectory in the tracking case. That means that if the error in regulation is large, more time is needed; if the error is small, less time.

How should the final time be determined? Using the SDRE without penalizing the final boundary condition delivers an infinite-time optimal control solution as a function of the error, and the convergence time is limited by the bounds of the actuators and the tuning of the weighting matrices. In that sense, we obtain an approximate estimate of how much time a specific regulation task between two set points requires.

How, then, can the control task be finished faster? By using the SDDRE and penalizing the final boundary condition. In infinite-time controllers, the error at the end of the regulation becomes small and the amplitude of the input signal decreases, showing asymptotic behavior. With the finite-time approach, the regulation must speed up near the end, producing a faster response and a shorter finishing time. Matrix F plays this role and enhances the regulation speed.

It should be noted that in SDDRE closed-loop optimal control there is a residual error at the final time, but the controller equipped with ILC will reduce it, see Fig. 1. Reaching the exact final state with zero error is possible with open-loop two-point boundary-value-problem methods [40], [41], etc.; however, with closed-loop optimal control, an exact final value without error is impossible.

As a result, the signal of the SDDRE control law increases near the end of the time of operation. The solution of the differential Riccati equation, the SDDRE, is a little more complicated than that of the SDRE (the SDRE is an algebraic matrix equation). The more complicated structure has not limited the application of this method to the control of a super-tanker [42], aircraft [43], wind energy conversion systems [44], helicopters [45], etc.
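To make the backward-in-time nature of the SDDRE concrete, here is a minimal sketch (with illustrative, frozen SDC matrices) that integrates a standard differential Riccati equation backward from the final boundary condition K(t_f) = F using SciPy; the paper's exact equation (5) and its state dependence may differ.

```python
import numpy as np
from scipy.integrate import solve_ivp

def dre_rhs(t, k_flat, A, B, Qw, R):
    """Standard differential Riccati equation, stated as
    -dK/dt = A^T K + K A - K B R^-1 B^T K + Q, integrated backward in time."""
    n = A.shape[0]
    K = k_flat.reshape(n, n)
    dK = -(A.T @ K + K @ A - K @ B @ np.linalg.solve(R, B.T) @ K + Qw)
    return dK.ravel()

n = 2
A = np.array([[0.0, 1.0], [-2.0, -0.5]])   # illustrative frozen SDC matrices
B = np.array([[0.0], [1.0]])
Qw = np.diag([10.0, 1.0])
R = np.array([[0.1]])
F = 20.0 * Qw                              # final-state penalty, K(t_f) = F
t_f = 4.0

# Integrate backward from t_f to 0, then evaluate K(t) when running forward.
sol = solve_ivp(dre_rhs, (t_f, 0.0), F.ravel(), args=(A, B, Qw, R), dense_output=True)
K_at = lambda t: sol.sol(t).reshape(n, n)  # K(t) used in u = -R^-1 B^T K(t) x
```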

Lemma 1: The nonlinear system (1) with its performance index (2), under the necessary conditions based on Assumptions 1 and 2, can be stabilized using control law (3), in which K_ss(x(t)) is the positive-definite solution to the SDRE (4) [46].

Lemma 2: The nonlinear system (1) with its cost function (2), under the necessary conditions based on Assumptions 1 and 2, can be stabilized using control law (3), in which K(x(t)) is the positive-definite solution to the SDDRE (5) with the final boundary condition.
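Since equations (1)–(5) referenced by Lemmas 1 and 2 are not reproduced in this excerpt, the block below restates the standard SDRE/SDDRE formulation that such lemmas conventionally refer to; the tags mirror the lemma references, but the paper's exact expressions may differ.

```latex
% Standard SDRE/SDDRE formulation, assumed to correspond to (1)-(5) in the paper.
\begin{align}
& \dot{\mathbf{x}}(t) = A(\mathbf{x})\mathbf{x}(t) + B(\mathbf{x})\mathbf{u}(t), \tag{1}\\
& J = \tfrac{1}{2}\mathbf{x}^{\top}(t_f) F \,\mathbf{x}(t_f)
  + \tfrac{1}{2}\int_{t_0}^{t_f}\big(\mathbf{x}^{\top} Q(\mathbf{x})\mathbf{x}
  + \mathbf{u}^{\top} R(\mathbf{x})\mathbf{u}\big)\,\mathrm{d}t, \tag{2}\\
& \mathbf{u}(t) = -R^{-1}(\mathbf{x})B^{\top}(\mathbf{x})K(\mathbf{x})\,\mathbf{x}(t), \tag{3}\\
& A^{\top}(\mathbf{x})K_{ss} + K_{ss}A(\mathbf{x})
  - K_{ss}B(\mathbf{x})R^{-1}(\mathbf{x})B^{\top}(\mathbf{x})K_{ss}
  + Q(\mathbf{x}) = 0, \tag{4}\\
& -\dot{K} = A^{\top}(\mathbf{x})K + K A(\mathbf{x})
  - K B(\mathbf{x})R^{-1}(\mathbf{x})B^{\top}(\mathbf{x})K + Q(\mathbf{x}),
  \quad K(t_f) = F. \tag{5}
\end{align}
```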

III. SDRE/SDDRE AUGMENTED BY ITERATIVE LEARNING

A. Regulation

B. Tracking

IV. LEARNING RULE

A. Regulation Training Rule

B. Tracking Training Rule

The performance index of the tracking problem is expressed based on the non-homogenous part of (24) as [23]

V. SIMULATIONS

A. Regulation

TABLE I THE PARAMETERS OF THE SPHERICAL MANIPULATOR

Fig. 2. (a) The configuration of the manipulator; (b) The magnified view of the end-effector error reduction.

Fig. 3. End-effector errors in iterations.

TABLE II END-EFFECTOR ERROR FOR VARIOUS VALUES OF ρ2 WITH ρ1 = 1.5

Fig. 4. (a) Forcing value to prove stability; (b) The derivative of Lyapunov function with different ρ2.

As pointed out in Remark 2, the simplest way to define this extra term is simply ρ2 = 0, but in practice a trade-off might be necessary to modulate its size. This simulation aims to show the design of the control law based on the scaling parameter r(t). A series of simulations was also provided to check the capability of the proposed approach under various conditions: Fig. 5(a) shows the forcing value, and Fig. 5(b) shows the derivative of the Lyapunov function.

B. Comparison

Fig. 5. (a) Forcing value; (b) derivative of the Lyapunov function; for a series of different final times.

To clarify the machinery of the PD-type SDDRE learning controller, a comparison was performed. We compare the proposed method with a PD-based ILC to highlight the effect of the finite-time gain; we also compare it with the SDDRE and PD without learning capability to highlight the effect of learning on error reduction. The error of the proposed method with the parameters of Section V-A (30 iterations) was 0.0059 mm. If we use a simple PD + gravity control with gains Kp = I3×3 and Kd = 2×I3×3 and the same ILC support, the error increases to 6.8 mm. Removing the ILC from the control loop, the conventional SDDRE and PD + gravity result in 23.8 mm and 93.2 mm errors, respectively. Considering the 4 s simulation time and the 1500 mm total length of the three links, it is hard for the PD to reach the desired condition, although the ILC improved its performance by 92%. Comparing the results, the role of the ILC in the closed-loop control is clear: the ILC reduces the error at each iteration. The controllers, SDDRE or PD + gravity, could indeed be tuned to operate properly, but the power of learning in error reduction and in feedforward compensation of unpredicted situations is very helpful, especially in experiments.
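For reference, the PD + gravity baseline used in this comparison can be sketched as below, with the gains stated in the text; the gravity vector g(q) is assumed to come from the manipulator's dynamic model, and the sign conventions follow the standard regulation form rather than the paper's exact implementation.

```python
import numpy as np

def pd_gravity_control(q, qdot, q_des, gravity_vector):
    """PD + gravity regulation baseline, u = Kp e + Kd edot + g(q),
    with the gains stated in the comparison (Kp = I, Kd = 2 I)."""
    Kp = np.eye(3)
    Kd = 2.0 * np.eye(3)
    e = q_des - q
    edot = -qdot                    # regulation case: desired velocity is zero
    return Kp @ e + Kd @ edot + gravity_vector
```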

C. Robustness and Uncertainty

Uncertainty reduces the performance of the proposed controller, and the ideal situation for the SDRE is to have a model as close as possible to the real platform. The sources of uncertainty might be friction (which is hard to estimate and model), variation of the load, and lack of precision in modeling. This subsection assesses the performance of the controller under such conditions. We introduce uncertainty by changing the mass of the load placed on the end-effector of the robot. In this scenario, we do not inform the controller about the load, so mp = 0 kg in the controller, while in the system we set mp = 0.1 kg. This increases the error from 0.0059 mm to 1.67 mm. Increasing the load further results in failure, which shows the weakness towards uncertainty (the black dotted line in Fig. 6).

Fig. 6. Comparison of the results in the presence of disturbance and uncertainty with the ideal case.

Remedy: The SDRE can be modified to show robust characteristics, for example by incorporating the bounds of uncertainty in the SDC matrices, adding correction terms such as sliding mode control [48], or using additional features such as neural networks [49]. Here we add robustness through the definition of an upper bound for the load in the controller. The SDC matrices use mp,max, while the dynamics may have another value below the bound, mp ≤ mp,max. Considering mp,max = 1.5 kg and mp = 1.2 kg, and also increasing the gains to Q = 100×diag(1_{1×3}, 2_{1×3}) and F = 20×Q, the end-effector error is 0.0054 mm (the red dashed line in Fig. 6). The same uncertainty in friction could be addressed by incorporating the bounds in the SDC matrices (the green dashed-dotted line in Fig. 6). The higher error in the initial loop of the uncertainty simulation is due to the additional load on the end-effector. We emphasize that the focus of the paper is not on robustness; this topic deserves a more thorough investigation.
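The remedy described above can be summarized by the following sketch: the controller-side SDC matrices are built with the load upper bound, while the plant may carry any load below it; build_sdc_matrices and forward_dynamics are hypothetical placeholders for the manipulator model.

```python
import numpy as np

# Hedged sketch of the remedy: the controller model uses the load upper bound,
# while the true plant load may be anything below that bound.
mp_max = 1.5    # load bound used inside the SDC matrices (controller side), kg
mp_true = 1.2   # actual load on the end-effector (plant side), mp_true <= mp_max, kg

# Hypothetical model functions; the paper's SDC factorization is not reproduced here.
# A_ctrl, B_ctrl = build_sdc_matrices(q, qdot, mp=mp_max)   # controller / Riccati solve
# qddot = forward_dynamics(q, qdot, u, mp=mp_true)          # simulated plant

# Gains enlarged as in the text for the uncertain case:
Q = 100.0 * np.diag(np.concatenate([np.ones(3), 2.0 * np.ones(3)]))  # diag(1_{1x3}, 2_{1x3})
F = 20.0 * Q
```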

Robustness of the proposed SDRE+ILC: Two mechanisms add robustness: learning and dynamic scaling. Under disturbances, the learning finds larger errors at the final time, thus forcing a higher correction in the controller for the next iteration. However, this is only possible if the core controller maintains internal stability during each iteration, and this is the role of the scaling-factor dynamics, which dominates the mismatches and, eventually, the disturbances.

D. Tracking

VI. ExPERIMENT

Fig. 7. Error reduction of the system in trajectory tracking case.

The experimental implementation of the combined ILC and SDDRE is presented to validate the proposed approach. The contribution of this work with respect to [32] is that here the ILC augmented with the SDRE controller has been implemented, whereas in the previous work only the SDRE was tested. For the first series of experiments, a stationary platform is selected, Fig. 8. The setup is a variable-pitch pendulum rotating around the center point of the system, represented by the variable θ(t) (rad). The specifications are given in Table III.

Fig. 8. The variable-pitch pendulum, experimental platform.

TABLE III THE SPECIFICATIONS OF THE VARIABLE-PITCH BENCHMARK [32]

Fig. 9. Time-varying sampling time of the experiment.

The error of the system decreased with the learning iterations and was reduced to zero within 10 loops, Fig. 10. The error was measured directly by the optical encoders installed on the system to provide feedback. The amplitude of the input torque increases with each iteration (causing an increase at the end of the servo signals); the reason is the usual small rise of the SDDRE gain caused by finite-time control. The inputs to the servos are also presented in Fig. 11. One of the challenges in VP control is the asymmetric motion of the blades in opposite directions, which caused the deviation from zero blade angle in Fig. 11. In summary, the proposed ILC with the nonlinear optimal finite-time structure of the SDDRE was implemented on the VP experimental benchmark without simplification. The programming was done in a MATLAB script using the Arduino support package.
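Although the experiment itself ran as a MATLAB script with the Arduino support package, the following generic sketch illustrates how a control loop can measure its own time-varying sampling time (cf. Fig. 9) and pass it to the controller; controller, read_encoder, and send_servo are hypothetical placeholders.

```python
import time

def run_learning_loop(controller, read_encoder, send_servo, t_final=4.0):
    """One experimental loop with non-uniform sampling: the elapsed time of
    every cycle is measured and passed to the controller, instead of assuming
    a fixed sampling period."""
    log_dt, log_theta = [], []
    t_prev = time.monotonic()
    t = 0.0
    while t < t_final:
        now = time.monotonic()
        dt = now - t_prev              # measured, time-varying sampling time
        t_prev = now
        t += dt
        theta = read_encoder()         # feedback from the optical encoder
        u = controller(theta, t, dt)   # PD-type SDDRE + ILC feedforward
        send_servo(u)
        log_dt.append(dt)
        log_theta.append(theta)
    return log_dt, log_theta
```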

Fig. 10. The error of the system, learning in 10 loops.

Fig. 11. Servo input of the blades, 10 iterations of learning.

VII. CONCLUSIONS

This research proposed a combination of iterative learning control with the SDRE/SDDRE controller in a symmetric-gain structure. The new form allows the stability of the controller to be proven analytically for the special case of mechanical systems. The SDRE controller is a suboptimal controller that brings robustness, optimality, a systematic structure, and design flexibility to the combination with the ILC. The ILC, on the other hand, reduces the error in each loop, trained by the gradient descent method. The advantage of this approach is that the learning system is based on a stable optimal controller: the first learning loop does not endanger the control operation, and each consecutive loop enhances performance by reducing the error. The proposed controller has been tested on a real prototype. The HYFLIERS project requires inspection and maintenance in refineries [50]. The safety requirements in refineries are high, and not all learning controllers can be tested on site. So, this type of controller, with a stable input law even from the first learning loop, is preferable for such activities. The benchmark was subjected to the learning loops and showed error reduction from the ILC without simplification. This pilot setup prepares the conditions for implementing the method on a VP drone rotating around a pipe.