Our ability to collect and analyze data is growing rapidly. We collect vast quantities of data every second and are only beginning to understand the impact it can have on our businesses. All this data is an ever-expanding mountain of gold, waiting to be mined and transformed into new capabilities that help us get better at predicting the future. Fundamentally, this capability transforms organizations from reactive environments, managed with static and aging data, into automated environments that learn continuously in real time. Today's analytical capabilities don't stop at the physical or virtual boundaries of an organization. Participants across the entire business model and value chain, including customers, suppliers and partners, do and should share real-time data. This allows companies to extend or acquire knowledge and feedback, reducing risk, waste and costs while driving performance and growth. In this course you will learn what real-time data is and how to analyze it for useful insights using Spark and Spark Streaming.
Skills covered
PySpark
Spark
Spark Streaming
Real-time Analytics
Course Syllabus
Spark Basics and Streaming
Introduction to Spark
Spark vs Hadoop
Spark architecture
RDDs
Spark terminology
Hands-on PySpark
Spark MLlib
Moving from RDDs to the DataFrame API
Clustering with PySpark
Music data case studies
Overview of real-time analytics and Spark Streaming
Spark Streaming architecture
Understanding real-time analytics with a Twitter example
Ad tech case study