Big Data

Analytics And More

Category: Data Analytics

Validating Spark DataFrame Schemas

May, 2019 adarsh

In this article I will illustrate how to use schema discovery to validate column names before firing a select…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

Deploying Spring Boot Application in Mesosphere

May, 2019 adarsh

In this article I will illustrate how to deploy a Spring Boot application as a service in Mesosphere. The prerequisite…

Posted in: Data Analytics, mesosphere Filed under: mesosphere, spring boot

Spark Merge Two DataFrames with Different Columns or Schema

adarsh 1 Comment

In this article I will illustrate how to merge two DataFrames with different schemas. Spark supports the following API for the…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

Spark Converting Nested JSON to CSV

May, 2019 adarsh

In this article I will illustrate how to convert nested JSON to CSV in Apache Spark. Spark does not…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

Submit Spark Job Programmatically Using SparkLauncher

March, 2019 adarsh

In this article I will illustrate how to submit a Spark job programmatically using SparkLauncher. Let us take a use…

Posted in: aws, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming

Running Spark in Mesosphere

March, 2019 adarsh

In this article I will illustrate how to install the Spark package and run a Spark application in Mesosphere. I will…

Posted in: mesosphere, Spark

Spark Read Avro Data from S3

January, 2019 adarsh 1 Comment

In this article I will demonstrate how to read and write Avro data in Spark from Amazon S3. We will…

Posted in: aws, Spark Filed under: aws emr, datasets and dataframe, Spark Rdd

Copyright © 2017 Time Pass Techies