The Power of Deep Reinforcement Learning

Maximising the potential of next-generation technologies with a better-than-human AI approach


In 2016, AlphaGo, developed by Google's DeepMind, defeated Lee Sedol, one of the best players of the strategy board game Go. It has also beaten a host of other Go world champions over the years.

Since then, deep reinforcement learning (DRL), the AI approach behind AlphaGo, has brought renewed enthusiasm to the world of AI. This enthusiasm has spurred much research into creating smarter environments to build sustainable cities and improve society’s well-being.

Researchers in both academia and industry have been racing to apply DRL in various areas, including self-driving cars and industrial automation. They aim to make traditional systems smart and the smart ones even smarter, perhaps achieving better-than-human intelligence as demonstrated by AlphaGo.

DRL integrates deep learning, which uses layers of artificial neurons loosely modelled on the structure of the brain, with reinforcement learning. The latter enables a learning agent to autonomously explore candidate actions and exploit the best ones in a dynamic operating environment, whose current condition is called the state.

The agent aims to achieve the highest possible rewards (i.e., make the best decisions) so that system performance improves over time. It builds up its knowledge, which comprises the appropriate actions under different states, on the fly and without supervision.
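
To make the explore-and-exploit loop concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, with an epsilon-greedy rule. The toy "corridor" environment, the function name q_learning, and every parameter value are assumptions chosen for illustration; none of them come from the research described in this article.

```python
import random


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical corridor of n_states cells.

    Action 1 moves right, action 0 moves left. Reaching the last cell
    gives a reward of 1 and ends the episode.
    """
    rng = random.Random(seed)
    # Knowledge: one value per (state, action) pair, learned on the fly.
    q = [[0.0] * n_actions for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                # Explore: try a random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])

            if action == 1:
                next_state = min(state + 1, goal)
            else:
                next_state = max(state - 1, 0)
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Move the estimate towards reward + discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy: the preferred action in each non-goal state.
policy = [row.index(max(row)) for row in q_table[:-1]]
```

In this sketch the reward is the only feedback the agent receives; no one labels the correct action, which is what "without supervision" refers to here.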

Compared to traditional reinforcement learning, which typically stores what it has learnt in a lookup table, DRL uses a deep neural network to represent complex sets of states. DRL has been shown to achieve breakthrough performance with lower computational cost, reduced learning time, and more efficient knowledge storage.
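
The idea of letting a neural network stand in for a table of states can be sketched in a few lines. The example below is a deliberately tiny, one-hidden-layer stand-in written in plain Python; real DRL systems use deeper networks and a dedicated library. The class name TinyQNetwork, the layer sizes, the learning rate, and the sample state vectors are all hypothetical.

```python
import math
import random


class TinyQNetwork:
    """Maps a state vector to one estimated value (Q-value) per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def forward(self, state):
        """Return (hidden activations, Q-values) for a state vector."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        q_values = [sum(w * h for w, h in zip(row, hidden)) + b
                    for row, b in zip(self.w2, self.b2)]
        return hidden, q_values

    def update(self, state, action, target, lr=0.05):
        """One gradient step on 0.5 * (Q(state, action) - target) ** 2.

        Returns the error measured before the step.
        """
        hidden, q_values = self.forward(state)
        error = q_values[action] - target
        # Gradient reaching each hidden unit, taken before w2 is changed.
        grad_hidden = [error * self.w2[action][j] for j in range(len(hidden))]
        for j, h in enumerate(hidden):
            self.w2[action][j] -= lr * error * h
        self.b2[action] -= lr * error
        for j, h in enumerate(hidden):
            grad_pre = grad_hidden[j] * (1.0 - h * h)  # derivative of tanh
            for i, x in enumerate(state):
                self.w1[j][i] -= lr * grad_pre * x
            self.b1[j] -= lr * grad_pre
        return error


net = TinyQNetwork(n_inputs=3, n_hidden=8, n_actions=2)
state = [0.2, -0.1, 0.7]        # hypothetical sensor readings
next_state = [0.3, 0.0, 0.6]
reward, gamma = 1.0, 0.9

# Learning target: reward plus discounted best value of the next state.
# It is computed once and held fixed here to keep the sketch simple.
target = reward + gamma * max(net.forward(next_state)[1])

errors = [abs(net.update(state, action=1, target=target))
          for _ in range(200)]
first_error, last_error = errors[0], errors[-1]
```

Because the network's weights are shared across all inputs, similar states produce similar value estimates, so the agent does not need a separate table entry for every possible state.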

Capitalising on the advantages of DRL, my research is geared towards the use of reinforcement learning and DRL in enhancing smart transportation and communication systems, which are fast-paced, dynamic, heterogeneous, complex, and data-intensive in nature.

Traffic congestion, for example, is inevitable in most urban areas. In Malaysia, unpredictable weather compounds the issue, as heavy rain and wet roads slow traffic, especially during rush hour or at night. Congestion at a single intersection can have a domino effect, and the intersection can become a single point of failure that disrupts traffic on neighbouring roads.

I was interested in intersections that are known traffic bottlenecks despite being controlled by traffic lights. Using DRL, I enabled traffic light controllers at different intersections to collaborate and exchange knowledge when selecting their traffic phases and split phasing. This allows a green wave, in which vehicles pass through a series of green lights without stopping, and mitigates cross blocking and vehicle idling.

I applied this novel approach to the traffic lights in Sunway City, taking into account the irregular traffic caused by heavy rainfall. The results showed shorter queues, shorter vehicle waiting times, and fewer vehicles crossing an intersection.

In terms of communication systems, we are moving towards 5G wireless mobile networks and cognitive radios. Wireless applications are growing, particularly multimedia-based ones and internet services for mobile gadgets and devices.

There is an increasing need for more wireless bandwidth and for the radio spectrum that offers it. This demand has led to spectrum scarcity, which in Malaysia is further complicated because the radio spectrum is shared with neighbouring countries such as Indonesia and Singapore.

Applying reinforcement learning and DRL, I aim to enable mobile gadgets and devices to acquire knowledge and adopt the best possible actions for various network operations.

Both approaches provide intelligence and autonomy to support core operations, from accessing underutilised radio spectrum to routing and enhancing security. For example, a wireless host searches for a multi-hop route to its destination node in a dynamic environment in which network conditions, such as licensed and unlicensed network traffic, change over time.
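
As an illustration of learning a multi-hop route, the sketch below uses Q-routing, a well-known reinforcement learning routing scheme; it is offered as an example of the general idea and not as the method used in this research. The four-node network, its link delays, and the function names q_routing and best_route are hypothetical.

```python
import random


def q_routing(links, source, dest, episodes=300,
              alpha=0.5, epsilon=0.1, seed=0):
    """Learn estimated delivery delays by repeatedly sending packets.

    links maps each node to {neighbour: per-hop delay}. The result
    q[node][neighbour] estimates the total delay from node to dest
    when the packet is forwarded via that neighbour.
    """
    rng = random.Random(seed)
    q = {node: {nbr: 0.0 for nbr in nbrs} for node, nbrs in links.items()}

    for _ in range(episodes):
        node = source
        for _ in range(50):  # hop limit per packet
            if node == dest:
                break
            neighbours = list(links[node])
            if rng.random() < epsilon:
                nxt = rng.choice(neighbours)  # explore another next hop
            else:
                nxt = min(neighbours, key=lambda n: q[node][n])  # exploit
            # The next hop reports its own best estimate to the destination.
            remaining = 0.0 if nxt == dest else min(q[nxt].values())
            observed = links[node][nxt] + remaining
            q[node][nxt] += alpha * (observed - q[node][nxt])
            node = nxt
    return q


def best_route(q, source, dest, max_hops=50):
    """Follow the lowest estimated delay from source towards dest."""
    route = [source]
    while route[-1] != dest and len(route) <= max_hops:
        node = route[-1]
        route.append(min(q[node], key=q[node].get))
    return route


# Hypothetical network: A reaches D via B (fast link) or C (slow link).
links = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 1.0, "D": 5.0},
    "D": {},
}
q = q_routing(links, "A", "D")
route = best_route(q, "A", "D")
```

Because each node keeps adjusting its estimates with every packet it forwards, the same loop can in principle track link delays that change over time, which is the kind of dynamic condition the paragraph above describes.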

My findings have been published in journals and conference proceedings, and have even resulted in a patent. Looking ahead, I hope my research can improve next-generation technologies for smarter and more sustainable development.


Professor Yau Kok Lim
School of Engineering and Technology


This article appeared in Spotlight on Research (Volume 5)