Optimal control and policy search in dynamical systems using expectation maximization

Mallick, Prakash

Relation: University of Newcastle Research Higher Degree Thesis
Resource Type: thesis
Date: 2023
Description: Research Doctorate - Doctor of Philosophy (PhD)
Description: Trajectory optimization is a fundamental stochastic optimal control problem. In this type of control problem, it is essential to account for measurement noise, which plays a significant role in dynamical systems undergoing motion or action, especially in uncertain environments. In this thesis, I therefore develop a trajectory optimization approach for unknown dynamical systems subject to measurement noise. I propose an architecture that combines the benefits of a conventional optimal control procedure with the advantages of maximum likelihood approaches, resulting in a novel iterative trajectory optimization paradigm called Stochastic Optimal Control - Expectation Maximization. I explore the advantages of the proposed methodology in a reinforcement learning setting, compared with other widely used baselines. Another class of algorithms, known as Guided Policy Search approaches, has been shown to work with high accuracy not only for controlling complicated dynamical systems but also for learning optimal policies that generalize to unseen instances. Almost all well-known policy search and learning algorithms assume access to the true states. In contrast, I use the stochastic optimal control approach and extend it to learning (optimal) policies when the states are latent. This learning is less noisy because the optimal trajectories have lower variance. Theoretical and empirical results for the optimal policies learnt by the new approach are compared against several well-known baselines on a two-dimensional autonomous system, using widely used performance metrics. Furthermore, I provide extensive empirical results for a dynamical system performing complex three-dimensional tasks.
The trajectory optimization results show that the optimal policy parameters obtained by the maximum likelihood technique perform better: they reduce the cumulative cost-to-go and the stochasticity of the state and action trajectories by efficiently balancing exploration and exploitation, a new direction introduced in this thesis. Additionally, I present several novel theoretical results, arising from the proposed optimization objective function, that bridge the gap between definitions from information theory.
Subject: stochastic optimal control; model-based reinforcement learning; guided policy search; expectation maximization
Identifier: http://hdl.handle.net/1959.13/1477548
Identifier: uon:50000
Language: eng
Full Text


Thesis (ATTACHMENT01): Adobe Acrobat PDF, 7 MB
Abstract (ATTACHMENT02): Adobe Acrobat PDF, 506 KB