Lesson 2: Run a predictive model on a 14.5 GB dataset in 3 minutes

Predictive modeling is used in many business use cases to produce insights that improve efficiency. In the example above, H2O’s data scientist Amy Wang demonstrates predicting potential flight delays using a publicly available airlines dataset. The dataset used in this example is a small sample of more than two decades of flight data, so that the download and import process does not take more than a minute or two. She runs a GLM (generalized linear model) on a 14.5 GB dataset with 152 million rows to calculate the likelihood of a flight being delayed rather than on time.
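
To make the modeling step concrete, here is a minimal sketch of a binomial GLM (logistic regression) in plain Python. It is not H2O's implementation and it does not use the airlines dataset: the two features, the synthetic data, and the helper names `fit_binomial_glm` and `predict_delay_probability` are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_binomial_glm(rows, labels, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    Returns [intercept, w1, w2, ...]. Hypothetical helper, not an H2O API.
    """
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear predictor
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def predict_delay_probability(weights, x):
    # Probability that a flight with features x is delayed.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


# Synthetic stand-in data (assumption): evening departures are delayed more often.
random.seed(0)
rows, labels = [], []
for _ in range(400):
    evening = random.choice([0.0, 1.0])  # 1.0 = evening departure
    distance = random.random()           # distance scaled to [0, 1]
    true_p = sigmoid(-1.0 + 2.0 * evening + 0.5 * distance)
    rows.append([evening, distance])
    labels.append(1 if random.random() < true_p else 0)

weights = fit_binomial_glm(rows, labels)
p_morning = predict_delay_probability(weights, [0.0, 0.5])
p_evening = predict_delay_probability(weights, [1.0, 0.5])
print(f"morning: {p_morning:.2f}, evening: {p_evening:.2f}")
```

The fitted model returns a probability between 0 and 1 for each flight. On this synthetic data, an evening departure should score higher than an otherwise identical morning one.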

There are obvious benefits to predicting potential delays and logistical issues for a business. It helps the user make contingency plans and corrections for outcomes that cannot be avoided. Recommendation engines can forewarn users of potential delays and rank flight options accordingly. The goal is to have a machine take in all the possible factors that affect a flight and return the probability of that flight being delayed.
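
As a rough illustration of how a recommendation engine might rank flight options by predicted delay, the sketch below scores a few flights with an already-fitted logistic model and sorts them from least to most likely to be delayed. The coefficients, feature names, and flight records are hypothetical and are not taken from the airlines dataset or from H2O.

```python
import math

# Hypothetical coefficients of an already-fitted binomial GLM (illustration only).
COEFFICIENTS = {"intercept": -1.2, "evening_departure": 0.9, "connections": 0.6}


def delay_probability(flight):
    # Linear predictor followed by the logistic link.
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["evening_departure"] * flight["evening_departure"]
    z += COEFFICIENTS["connections"] * flight["connections"]
    return 1.0 / (1.0 + math.exp(-z))


def rank_flights(flights):
    # Sort options from lowest to highest predicted delay probability.
    scored = [(delay_probability(f), f["flight"]) for f in flights]
    return sorted(scored)


# Made-up flight options.
options = [
    {"flight": "A100", "evening_departure": 1, "connections": 1},
    {"flight": "B200", "evening_departure": 0, "connections": 0},
    {"flight": "C300", "evening_departure": 0, "connections": 1},
]

ranked = rank_flights(options)
for probability, name in ranked:
    print(f"{name}: {probability:.2f}")
```

Sorting on the predicted probability keeps the ranking logic separate from the model, so a different model could be swapped in without changing how the options are presented.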



  • We'd love to hear from you! Reach out to us on our Google Stream group with questions, requests, or comments any time.