Spark Structured Streaming - Introduction (1/3)

A brief introduction to Spark Structured Streaming

Pavan Kulkarni

10 minute read

Structured Streaming is a new way of looking at real-time streaming. With abstractions built on DataFrames and Datasets, Structured Streaming provides an alternative to the well-known Spark Streaming. Structured Streaming is built on top of the Spark SQL engine. Some of the main features of Structured Streaming are -

Detailed Guide to Setting up Scalable Apache Spark Infrastructure on Docker - Standalone Cluster With History Server

This post is a complete guide to building a scalable Apache Spark infrastructure using Docker. We will see how to enable the History Server for log persistence.

Pavan Kulkarni

10 minute read

This post is a complete guide to building a scalable Apache Spark infrastructure using Docker. We will see how to enable the History Server for log persistence. Being able to scale up and down is one of the key requirements of today’s distributed infrastructure. By the end of this guide, you should have a fair understanding of setting up Apache Spark on Docker, and we will see how to run a sample program.