[video coming soon]
This project was done as part of a contest in the DD2438 Artificial Intelligence and Multi Agent Systems course at KTH.
We were given a Unity environment with four 3D mazes and “realistic” car and drone models. The task was to create an AI capable of navigating from start to finish of any maze as fast as possible.
We were the only team to address the challenge with a data-driven approach (RL), and we achieved the fastest accumulated time over all test (previously unseen) tracks.
We solved it by combining two main ideas. In a nutshell:
We use curriculum learning to speed up training. To do so, we sequentially generate random mazes of increasing driving difficulty (number of blocks). Once the agent masters a given difficulty, it advances to the next level. The first levels do not have any walls and can be completed simply by driving in a straight line.
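A minimal sketch of this loop (the maze generator, trainer interface, mastery threshold, and level count below are illustrative placeholders, not our actual code):

```python
def train_with_curriculum(train_episode, make_maze, max_blocks=30,
                          episodes_per_eval=100, mastery_rate=0.9):
    """Increase maze difficulty (number of blocks) only once the agent masters it.

    `train_episode(maze)` runs one training episode and returns True on success;
    `make_maze(num_blocks)` generates a random maze. Both stand in for whatever
    environment / trainer interface is actually used.
    """
    num_blocks = 0  # first levels have no walls: just drive straight to the goal
    while num_blocks <= max_blocks:
        maze = make_maze(num_blocks)
        successes = sum(train_episode(maze) for _ in range(episodes_per_eval))
        if successes / episodes_per_eval >= mastery_rate:
            num_blocks += 1  # level mastered, move on to harder mazes
    return num_blocks
```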
The algorithm we used to train the policy is PPO (Proximal Policy Optimization), a policy-gradient algorithm that is relatively simple to implement and tune. More on it in this video.
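For reference, the core of PPO is its clipped surrogate objective. Here is a minimal PyTorch sketch of that loss (illustrative only, not the code we trained with):

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss from the PPO paper (Schulman et al., 2017)."""
    # Probability ratio between the updated policy and the one that collected the data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping keeps the update close to the old policy, which is what makes PPO stable.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the objective, so we return the negative mean as a loss.
    return -torch.min(unclipped, clipped).mean()
```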
Once the control is learned, it would be interesting to also learn the path planning. Following the curriculum learning approach, we could start by stretching the distance between key points until we only provide the goal. This could require an RNN architecture so that the agent somehow remembers the traversed maze.
RL is hard. Current RL algorithms are far from “plug and play”. This was our first attempt at a (relatively) complex problem, and we were surprised by how much the choice of state-action space, algorithm, and meta-parameters affects performance.
Work in relative coordinates w.r.t. the agent; otherwise the state space can become too confusing. Including rotations also raises its dimensionality.
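As a minimal illustration of what “relative coordinates” means here (a 2D sketch with an assumed heading convention, not our exact observation code):

```python
import numpy as np

def to_agent_frame(point_world, agent_pos, agent_yaw):
    """Express a world-frame 2D point (e.g., the next waypoint) in the agent's frame.

    agent_yaw is the agent's heading in radians. In the local frame the agent always
    faces the same axis, so the policy never has to reason about absolute position
    or orientation.
    """
    delta = np.asarray(point_world, dtype=float) - np.asarray(agent_pos, dtype=float)
    cos_y, sin_y = np.cos(agent_yaw), np.sin(agent_yaw)
    # Rotate the offset by -yaw to undo the agent's orientation.
    return np.array([cos_y * delta[0] + sin_y * delta[1],
                     -sin_y * delta[0] + cos_y * delta[1]])
```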
Do NOT over-engineer rewards! It’s a huge temptation, but it defeats the whole purpose of RL. At a certain point you might as well just write a heuristic to solve the problem and save yourself the headaches.
Curriculum learning helps a lot, especially with limited computing resources. Attempting to directly drive on complex mazes proved too slow to learn. Start with a very simplified problem and, once it works, evolve from there.
Formal experiment design and tracking is super important (e.g., fractional factorial design). We spent around a week intuitively (or randomly) trying things without getting them to work. It wasn’t until we stopped to think about the best experiments to perform and formalized them in a spreadsheet that we started to consistently see good results. In later projects we automated these searches using Bayesian optimization.
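As an illustration of that kind of automated search (not the exact setup we used), a hyperparameter study with a library such as Optuna, whose default sampler is a form of Bayesian optimization, takes only a few lines; `train_and_evaluate` below is a hypothetical function that trains a policy and returns its mean episode reward:

```python
import optuna

def objective(trial):
    # Hyperparameters to search over; the ranges are illustrative.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    clip_eps = trial.suggest_float("clip_eps", 0.1, 0.3)
    batch_size = trial.suggest_categorical("batch_size", [256, 512, 1024])
    # Hypothetical helper: trains PPO with these settings and returns mean episode reward.
    return train_and_evaluate(learning_rate=learning_rate,
                              clip_eps=clip_eps,
                              batch_size=batch_size)

study = optuna.create_study(direction="maximize")  # maximize mean reward
study.optimize(objective, n_trials=50)
print(study.best_params)
```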
Design choice justifications, experimental results, further detailed explanations, and the drone’s state-action space adaptations can be found in the full report.