Comprehensive Introduction to Apache Spark, RDDs & Dataframes (Using PySpark)
Original article: Comprehensive Introduction to Apache Spark, RDDs & Dataframes (using PySpark)
Introduction
Industry estimates suggest that we create more than 2.5 quintillion bytes of data every day.
Think about that for a moment: 1 quintillion = 1 billion billion (10^18)! Can you imagine how many hard drives, CDs, or Blu-ray discs it would take to store all of it? This scale of data generation is difficult to picture, even for a data science professional. While the pace is exciting, it has created an entirely new set of challenges and has forced us to find new ways to handle Big Data effectively.

