Introduction-to-PySpark

Description

Spark is a tool for parallel computation on large datasets, and it integrates well with Python. You'll use the PySpark package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a complete machine learning pipeline to predict whether flights will be delayed.

Getting to know PySpark

How Spark manages data, and how you can read and write tables from Python.

Manipulating data

About the pyspark.sql module, which lets you run optimized data queries in your Spark session.

Getting started with machine learning pipelines

PySpark has built-in machine learning routines, along with utilities for creating full machine learning pipelines.

Model tuning and selection

Create a model that predicts which flights will be delayed.

Introduction to PySpark.ipynb
README.md
airports.csv
flights_small.csv
planes.csv

cc59chong/Introduction-to-PySpark
