Microsoft has unveiled a new deep learning acceleration platform called Project Brainwave. Project Brainwave is a major step forward in both flexibility and performance for cloud-based serving of deep learning models. Most notably, the system is designed for real-time AI, meaning it processes requests as fast as it receives them, with ultra-low latency. Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether search queries or direct interactions with users.
Brainwave consists of three main layers:
A hardware DNN engine synthesized onto FPGAs
A high-performance, distributed system architecture
A runtime for low-friction deployment of trained models
Project Brainwave leverages Microsoft's massive FPGA infrastructure, which is attached directly to the data center network. A DNN can be mapped to a pool of remote FPGAs and called by a server with no software in the loop. This architecture reduces latency, since the CPU does not need to process incoming requests, and in turn enables very high throughput. Brainwave also uses a "soft" DNN processing unit (DPU) synthesized onto commercially available FPGAs. Many companies are building their own hardened DPUs, which offer high peak performance, but they must choose their operators and data types at design time, which limits their flexibility.
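To see why serving each request as it arrives matters for latency, consider a toy back-of-the-envelope comparison between a server that batches requests and one that processes them one at a time. The numbers and function names below are hypothetical, purely for illustration; they are not Brainwave measurements.

```python
# Toy model (hypothetical numbers, not Brainwave's): latency of batched
# serving vs. real-time, one-request-at-a-time serving.

def batched_worst_latency_ms(arrival_gap_ms, batch_size, batch_compute_ms):
    """Worst-case latency when the server waits to fill a batch:
    the first request must wait for batch_size - 1 later arrivals,
    then for the whole batch to be computed."""
    return (batch_size - 1) * arrival_gap_ms + batch_compute_ms

def streaming_latency_ms(per_request_compute_ms):
    """Latency when each request is processed the moment it arrives."""
    return per_request_compute_ms

# Assume one request per millisecond and batches of 8.
print(batched_worst_latency_ms(1.0, 8, 2.0))  # 9.0 ms worst case
print(streaming_latency_ms(1.0))              # 1.0 ms
```

Batching amortizes compute cost but makes early requests wait; a system fast enough to run each request individually avoids that queuing delay entirely.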
Project Brainwave instead provides a design that can be scaled across a range of data types. It uses both the synthesizable logic and the hard ASIC digital signal processing blocks on the FPGAs to provide a greater and more optimized number of functional units. The result is increased performance without real losses in model accuracy, along with quick incorporation of research innovations into the hardware platform. Microsoft demonstrated Brainwave on Intel's new 14 nm Stratix 10 FPGA.
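Narrow custom data types gain performance by spending fewer bits per value, at a small cost in precision. As a rough sketch of that trade-off (a toy model, not Brainwave's actual number formats), the hypothetical helper below rounds a value's mantissa to a few bits, the way a narrow floating-point format would:

```python
import math

def quantize_mantissa(x, mantissa_bits=2):
    """Round x to a float keeping only `mantissa_bits` fractional mantissa
    bits, simulating a narrow custom floating-point format (toy model)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)  # leading bit plus mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

print(quantize_mantissa(1.0))  # 1.0   (exactly representable)
print(quantize_mantissa(0.3))  # 0.3125 (nearest 2-mantissa-bit value)
```

Each individual value is off by at most a small relative error, which is why, in practice, inference accuracy can survive aggressive narrowing far better than one might expect.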
Project Brainwave also has built-in support for Microsoft Cognitive Toolkit and Google's TensorFlow, and Microsoft has said it will expand framework support over time. Microsoft has also defined a graph-based intermediate representation: models trained in the popular frameworks are converted into this representation and then compiled down to the high-performance infrastructure.
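Microsoft has not published the intermediate representation itself, but the general idea of a graph-based IR can be sketched in a few lines. The toy Python below (all names hypothetical) builds a small operator graph for y = relu(Wx + b) and "lowers" it to a linear instruction list in dependency order, the kind of step a compiler takes before targeting hardware:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    op: str            # e.g. "input", "matmul", "add", "relu"
    inputs: tuple = ()

def lower(output):
    """Walk the graph in dependency order, emitting one
    pseudo-instruction per non-input node."""
    program, seen = [], set()
    def visit(node):
        if node.name in seen:
            return
        for inp in node.inputs:
            visit(inp)
        seen.add(node.name)
        if node.op != "input":
            args = ", ".join(i.name for i in node.inputs)
            program.append(f"{node.name} = {node.op}({args})")
    visit(output)
    return program

# Build the graph for y = relu(matmul(W, x) + b).
W, x, b = Node("W", "input"), Node("x", "input"), Node("b", "input")
mm = Node("mm", "matmul", (W, x))
s = Node("s", "add", (mm, b))
y = Node("y", "relu", (s,))

print(lower(y))  # ['mm = matmul(W, x)', 's = add(mm, b)', 'y = relu(s)']
```

Because the IR is framework-neutral, the same lowering and hardware-mapping machinery can serve models exported from any supported framework.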
Microsoft is also working to bring this real-time AI system to users in Azure, complementing the indirect access already available through services like Bing. This will allow users to run their most complex deep learning models at very high performance.