Big Data

Analytics And More
Tag: datasets and dataframe

Validating Spark DataFrame Schemas

May, 2019 adarsh

In this article I will illustrate how to use schema discovery to validate column names before firing a select…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark merge two dataframes with different columns or schema

May, 2019 adarsh 1 Comment

In this article I will illustrate how to merge two DataFrames with different schemas. Spark supports the below API for the…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark converting nested json to csv

May, 2019 adarsh

In this article I will illustrate how to convert nested JSON to CSV in Apache Spark. Spark does not…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark read avro data from s3

January, 2019 adarsh 1 Comment

In this article I will demonstrate how to read and write Avro data in Spark from Amazon S3. We will…

Posted in: aws, Spark Filed under: aws emr, datasets and dataframe, Spark Rdd

spark create avro data using dataframe

November, 2018 adarsh

Avro is a language-neutral data serialization system. Its schemas are usually written in JSON, and data is usually encoded…

Posted in: Spark Filed under: datasets and dataframe, Spark Rdd

dataframe adding column with constant value in spark

November, 2018 adarsh

In this article I will demonstrate how to add a column to a DataFrame with a constant or static value…

Posted in: Data Analytics, Hive, Spark Filed under: datasets and dataframe, hive, Spark Rdd

using hive udtf in spark sql

adarsh

In this article I will demonstrate how to build a Hive UDTF and execute it in Apache Spark. In Hive…

Posted in: Data Analytics, Hive, Spark Filed under: datasets and dataframe, hive, Spark Rdd

Copyright © 2017 Time Pass Techies