Apache Spark: Creating an Excel Report with Multiple Sheets

In this article I will illustrate how to use Spark to create an Excel report with multiple sheets. We will be using crealytics/spark-excel, an open-source library for querying and writing Excel files with Apache Spark.

To add the library to an sbt project, use the following artifact:

libraryDependencies += "com.crealytics" %% "spark-excel" % "0.13.5"

If you are using Maven, the dependency is:

<dependency>
  <groupId>com.crealytics</groupId>
  <artifactId>spark-excel_2.12</artifactId>
  <version>0.13.5</version>
</dependency>
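
If you prefer to pull the library at runtime instead of declaring it in the build, Spark's --packages flag can fetch the same Maven coordinates. This is simply an alternative way of wiring in the dependency, shown here for spark-shell:

spark-shell --packages com.crealytics:spark-excel_2.12:0.13.5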

Below is an example in which we create two DataFrames and write them to a single xlsx file as two sheets named sheet-1 and sheet-2.


import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import scala.collection.mutable.ListBuffer

object SparkExcel {

  // Builds the two sample DataFrames that will be written as separate sheets.
  def getDataFrame(): List[DataFrame] = {

    lazy val sparkSession: SparkSession = SparkSession
      .builder()
      .master("local[*]")
      .getOrCreate()
    import sparkSession.implicits._

    val sequenceOfOverview = ListBuffer[(String, String)]()
    sequenceOfOverview += Tuple2("user-1", "mysore")
    sequenceOfOverview += Tuple2("user-2", "bangalore")
    sequenceOfOverview += Tuple2("user-3", "chennai")
    sequenceOfOverview += Tuple2("user-4", "delhi")
    val df1 = sequenceOfOverview.toDF("NAME", "CITY")

    val sequenceOfOverview2 = ListBuffer[(String, String)]()
    sequenceOfOverview2 += Tuple2("user-1", "mysore")
    sequenceOfOverview2 += Tuple2("user-2", "bangalore")
    sequenceOfOverview2 += Tuple2("user-3", "chennai")
    sequenceOfOverview2 += Tuple2("user-4", "delhi")
    val df2 = sequenceOfOverview2.toDF("NAME", "CITY")

    List(df1, df2)
  }

  def main(args: Array[String]): Unit = {

    val outputFileName = "OUTPUT_FILE_PATH" // path of the xlsx file to generate

    getDataFrame().zipWithIndex.foreach { case (df, index) =>
      // Sheets are named sheet-1, sheet-2, ...
      val sheetName = s"sheet-${index + 1}"
      df.write
        .format("com.crealytics.spark.excel")
        // dataAddress selects the target sheet and the cell to start writing at.
        .option("dataAddress", s"'$sheetName'!A1")
        // "header" writes the column names as the first row
        // ("useHeader" is the legacy name of this option in older releases).
        .option("header", "true")
        // Append adds each sheet to the existing file instead of overwriting it.
        .mode(SaveMode.Append)
        .save(outputFileName)
    }
  }
}
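
Once the report has been generated, the same data source can read an individual sheet back into a DataFrame. The snippet below is a minimal sketch based on the example above; the output path is the same placeholder used when writing, and inferSchema is optional.

import org.apache.spark.sql.SparkSession

object SparkExcelReader {

  def main(args: Array[String]): Unit = {

    val sparkSession: SparkSession = SparkSession
      .builder()
      .master("local[*]")
      .getOrCreate()

    // Read back the first sheet written by the example above.
    val df = sparkSession.read
      .format("com.crealytics.spark.excel")
      .option("dataAddress", "'sheet-1'!A1") // sheet and top-left cell of the data
      .option("header", "true")              // first row holds the column names
      .option("inferSchema", "true")         // let spark-excel infer column types
      .load("OUTPUT_FILE_PATH")

    df.show()
  }
}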

That's a quick overview of how we can write an xlsx file with multiple sheets in Spark.