Lesson 2: Get started with GOAI

H2O.ai, in partnership with Continuum Analytics and MapD Technologies, has announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applications to interchange data seamlessly and efficiently.

Python has become the most widely used programming language in data science, and many data science platforms expose Python APIs. The language lets users efficiently extract, transform, and load (ETL) data for use by other processes, as well as mine it for insights. GOAI’s first project, GPU Data Frame, provides a Python API to an open source GPU data frame that aims to address the current and future needs of digital transformation. The offerings of the individual GOAI partners complement one another: the MapD Core database sends the results of a SQL query into the GPU data frame, which can then be manipulated through Continuum Analytics’ Anaconda NumPy-like Python API or used as input to the H2O suite of machine learning algorithms without additional data manipulation.

GPU Data Frame greatly simplifies the flow of data to and from GPUs by providing an API for efficient data interchange between processes. Once data is loaded into the GPU data frame, the common API can access and modify it without the data leaving the GPU. Users can therefore interact with the data using SQL or Python and then run machine learning algorithms without routing the data back through the CPU. This also benefits developers, who can more easily write CUDA C/C++ code to accelerate targeted parts of their workload.

Currently, GPU Data Frame is a single-GPU data structure with support for multi-GPU model-parallel training, where each GPU gets a duplicate of the data and trains an independent model. In the future, GOAI plans to add support for multi-GPU distributed data frames and data-parallel training, where GPUs work together to train a single model. H2O.ai will lead this effort by providing additional machine learning models, such as gradient boosting machines (GBM), support vector machines (SVM), k-means clustering, and more, with multi-GPU data-parallel and model-parallel training support. Continuum Analytics will continue to add Python functionality and bring the richness of Anaconda to GPUs. Finally, MapD Technologies will expand GPU SQL support and data visualization functionality to build enterprise-ready solutions. Integrating scale-out data warehousing, graph visualization, and graph analytics will give data scientists more tools to analyze even larger and more complex datasets.
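To make the difference between the two training modes concrete, here is a minimal pure-Python sketch. It does not use GPU Data Frame or any GOAI API: the "workers" are ordinary loops standing in for GPUs, the model is a one-weight linear fit, and every function name (`gradient`, `train_model_parallel`, `train_data_parallel`) is made up for this illustration. The terms follow the definitions used in this lesson.

```python
# Toy illustration only: plain Python loops stand in for GPUs.
# All names here are hypothetical and not part of any GOAI library.

def gradient(w, rows):
    """Mean-squared-error gradient for the model y = w * x over (x, y) rows."""
    return sum(2 * (w * x - y) * x for x, y in rows) / len(rows)

def train_model_parallel(rows, learning_rates, steps=200):
    """Each worker holds a full copy of the data and trains its own model."""
    models = []
    for lr in learning_rates:  # one independent model per worker
        w = 0.0
        for _ in range(steps):
            w -= lr * gradient(w, rows)
        models.append(w)
    return models

def train_data_parallel(rows, n_workers, lr=0.05, steps=200):
    """Each worker holds one shard; all workers update one shared model."""
    shards = [rows[i::n_workers] for i in range(n_workers)]
    w = 0.0
    for _ in range(steps):
        # Every worker computes a gradient on its own shard,
        # then the gradients are averaged into a single update.
        grads = [gradient(w, shard) for shard in shards]
        w -= lr * sum(grads) / len(grads)
    return w

# Toy data generated with a true weight of 3.0.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

independent_models = train_model_parallel(data, learning_rates=[0.01, 0.05])
shared_model = train_data_parallel(data, n_workers=2)
```

In the first mode the result is a list of separately trained models (for example, one per hyperparameter setting), each of which saw all of the data. In the second there is a single model, and no worker ever needs the full dataset, which is why that mode depends on the distributed data frame support planned above.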

The initiative has been warmly received by the industry and has already attracted new use cases. BlazingDB, Graphistry, and Gunrock are joining the initiative, which has us super excited for all things GPU.

The links below will help you get started quickly with GPU Data Frame and understand the development in more practical terms:

1. GOAI Homepage

2. NVIDIA's blog explaining GOAI's prowess

3. GitHub repository

We'd love to hear from you! Reach out to us on our Google Stream group with questions, requests, or comments at any time.