cc59chong/Introduction-to-PySpark

Introduction-to-PySpark

Description

Spark is a tool for parallel computation on large datasets, and it integrates well with Python. You'll use the PySpark package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a complete machine learning pipeline that predicts whether flights will be delayed.

Getting to know PySpark

How Spark manages data, and how you can read and write tables from Python.

Manipulating data

An introduction to the pyspark.sql module, which provides optimized data queries for your Spark session.

Getting started with machine learning pipelines

PySpark has built-in machine learning routines, along with utilities for creating full machine learning pipelines.

Model tuning and selection

Create a model that predicts which flights will be delayed.
