Spark supports zipping RDDs with functions such as zip, zipPartitions, zipWithIndex, and zipWithUniqueId. Let's go through each of…
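As a rough illustration of how zipWithIndex and zipWithUniqueId number their elements, here is a minimal plain-Python sketch. It does not use Spark: an RDD is modeled as a list of partitions (an assumption made for illustration), and the function names `zip_with_index` and `zip_with_unique_id` are hypothetical stand-ins for the RDD methods. The numbering rules follow the documented behavior: zipWithIndex orders by partition and then by position within the partition, while zipWithUniqueId gives items in the k-th of n partitions the ids k, n+k, 2n+k, and so on.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(partitions: List[List[T]]) -> List[Tuple[T, int]]:
    """Pair each item with a consecutive index, ordered by partition
    and then by position inside the partition (models zipWithIndex)."""
    result = []
    index = 0
    for part in partitions:
        for item in part:
            result.append((item, index))
            index += 1
    return result


def zip_with_unique_id(partitions: List[List[T]]) -> List[Tuple[T, int]]:
    """Pair each item with a unique but possibly non-consecutive id:
    the i-th item of partition k gets i * n + k, where n is the number
    of partitions (models zipWithUniqueId)."""
    n = len(partitions)
    result = []
    for k, part in enumerate(partitions):
        for i, item in enumerate(part):
            result.append((item, i * n + k))
    return result


# A toy "RDD" with three partitions (hypothetical data).
partitions = [["a", "b"], ["c"], ["d", "e"]]
indexed = zip_with_index(partitions)
unique = zip_with_unique_id(partitions)
```

The practical difference is that consecutive indices require knowing how many items each earlier partition holds, whereas the unique-id scheme can be computed per partition independently, at the cost of gaps in the numbering.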
Performance issues can be categorized into two parts: 1. Distribution Performance – the program is slow due to scheduling, coordination and…
Batch and Window Sizes – The most common question is what minimum batch size Spark Streaming can use. In general,…
Tuning Spark often simply means changing the Spark application’s runtime configuration. The primary configuration mechanism in Spark is the SparkConf…
Common join – The common join is also called the reduce-side join. It is a basic join in Hive and works…
Local mode – Hadoop can run in standalone, pseudo-distributed, and fully distributed modes. Most of the time, we need to configure…
Hive partitioning is one of the most effective methods to improve query performance on larger tables. The query with…