Spark is a tool for doing parallel computation with large datasets, and it integrates well with Python. You'll use the PySpark package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be delayed.
How Spark manages data and how you can read and write tables from Python.
About the pyspark.sql module, which lets you run optimized queries on the data in your Spark session.
PySpark has built-in, cutting-edge machine learning routines, along with utilities to create full machine learning pipelines.
Create a model that predicts which flights will be delayed.