Introduction to Kafka
This is a quick cookbook to introduce Apache Kafka.
Apache Kafka is a distributed publish-subscribe messaging platform. Kafka can be scaled horizontally and offers high fault tolerance.
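As a quick taste, here is a minimal sketch of publishing and consuming a message with the kafka-python client; the broker address and the topic name "demo-topic" are assumptions for illustration:

```python
# Minimal publish/consume sketch with kafka-python.
# Assumes a broker at localhost:9092 and a topic named "demo-topic".
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("demo-topic", b"hello kafka")   # messages are plain bytes
producer.flush()                              # make sure the send completes

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",             # read from the start of the topic
)
for message in consumer:
    print(message.value)                      # b'hello kafka'
    break
```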
CSV File to JSON File Real Time Streaming Example
In this post we will see how to build a simple application that streams data from CSV files to JSON files in real time.
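As a preview, here is a minimal sketch of such a file-to-file pipeline using Spark Structured Streaming; the input/output paths and the two-column schema are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("CsvToJsonStream").getOrCreate()

# Streaming file sources require an explicit schema; these columns are assumed.
schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])

# Watch a directory for new CSV files and write each micro-batch out as JSON.
csv_stream = spark.readStream.schema(schema).csv("/tmp/input-csv")

query = (csv_stream.writeStream
         .format("json")
         .option("path", "/tmp/output-json")
         .option("checkpointLocation", "/tmp/checkpoint")
         .start())
query.awaitTermination()
```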
Socket Word Count demo for Spark Structured Streaming
Structured Streaming is a new way of looking at real-time streaming. In this post we will see how to build our very first Structured Streaming app to perform a word count over the network.
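The heart of the demo looks roughly like the sketch below, assuming a local text source such as `nc -lk 9999` feeding the socket:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("SocketWordCount").getOrCreate()

# Read lines from a local socket (e.g. fed by `nc -lk 9999`).
lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Print the full updated counts table to the console after every batch.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```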
A brief introduction to Spark Structured Streaming
Structured Streaming is a new way of looking at real-time streaming. With its DataFrame and Dataset abstractions, Structured Streaming provides an alternative to the well-known Spark Streaming API. It is built on top of the Spark SQL engine. This post covers some of the main features of Structured Streaming.
Processing data from MongoDB in Python
This post will give an insight into processing data from MongoDB in Python.
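For a flavour of what the post covers, here is a minimal pymongo sketch; the connection URI, database, and collection names are assumptions:

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (the URI is an assumption).
client = MongoClient("mongodb://localhost:27017/")
db = client["test_db"]

# Insert a document, then query documents back with a filter.
db.users.insert_one({"name": "alice", "age": 30})
for doc in db.users.find({"age": {"$gt": 25}}):
    print(doc)
```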
Introduction to the mongo shell
This post will introduce the mongo shell and the basic query operations that can be performed in it, with examples.
Processing data from MongoDB in a distributed environment - Apache Spark
We will look into the basics of processing data from MongoDB using Apache Spark.
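As a rough sketch, reading a collection into a DataFrame with the MongoDB Spark Connector looks like the following; the 2.x/3.x-style configuration keys, URI, database, and collection names are all assumptions:

```python
from pyspark.sql import SparkSession

# Assumes the mongo-spark-connector (2.x/3.x) is on the classpath and a local
# MongoDB holds a test_db.users collection; URI and names are assumptions.
spark = (SparkSession.builder
         .appName("MongoSparkDemo")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/test_db.users")
         .getOrCreate())

df = spark.read.format("mongo").load()   # load the collection as a DataFrame
df.printSchema()
df.filter(df["age"] > 25).show()         # filters can be pushed down to MongoDB
```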
This is a step-by-step guide to install MongoDB on Mac
This post is a step-by-step guide to installing MongoDB on a Mac.
Building a scalable Apache Spark setup with Docker
This post is a complete guide to building a scalable Apache Spark setup using Docker, and we will see how to enable the History Server for log persistence. The ability to scale up and down is one of the key requirements of today's distributed infrastructure. By the end of this guide, you should have a pretty fair understanding of setting up Apache Spark on Docker, and we will see how to run a sample program.
This post will guide you through a step-by-step setup to run PySpark jobs in PyCharm
This post walks through how to set up your local system to test PySpark jobs, followed by a demo running the same code using the spark-submit command.
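For reference, a minimal job like the sketch below (the file name and sample data are assumptions) can be run from PyCharm directly or submitted from a terminal with spark-submit:

```python
# word_count_job.py (hypothetical file name)
# Run from PyCharm directly, or from a terminal with:
#   spark-submit word_count_job.py
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PyCharmDemo").getOrCreate()
    df = spark.createDataFrame([("spark",), ("pyspark",), ("spark",)], ["word"])
    df.groupBy("word").count().show()
    spark.stop()
```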