In this article I will illustrate how to use spark to create excel report with multiple sheets . We will be using crealytics/spark-excel an open source project for querying and writing excel files with Apache Spark.
To add the dependent libraries in sbt project we can use the below artifact
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.13.5"
If you are using maven below is the dependency
<dependency> <groupId>com.crealytics</groupId> <artifactId>spark-excel_2.12</artifactId> <version>0.13.5</version> </dependency>
Below is an example, here we are creating two dataframes and we are writing the same as an xlsx file with two sheets with the name sheet-1 and sheet-2.
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ListBuffer
object SparkExcel {
def getDataFrame(): List[DataFrame] = {
lazy val sparkSession: SparkSession = SparkSession
.builder()
.master("local[*]")
.getOrCreate()
import sparkSession.implicits._
var sequenceOfOverview = ListBuffer[(String, String)]()
sequenceOfOverview += Tuple2("user-1", "mysore")
sequenceOfOverview += Tuple2("user-2", "bangalore")
sequenceOfOverview += Tuple2("user-3", "chennai")
sequenceOfOverview += Tuple2("user-4", "delhi")
val df1 = sequenceOfOverview.toDF("NAME", "CITY")
var sequenceOfOverview2 = ListBuffer[(String, String)]()
sequenceOfOverview2 += Tuple2("user-1", "mysore")
sequenceOfOverview2 += Tuple2("user-2", "bangalore")
sequenceOfOverview2 += Tuple2("user-3", "chennai")
sequenceOfOverview2 += Tuple2("user-4", "delhi")
val df2 = sequenceOfOverview2.toDF("NAME", "CITY")
List(df1, df2)
}
def main(args: Array[String]): Unit = {
val outputFileName = "OUTPUT_FILE_PATH"
var index = 1
getDataFrame().foreach(df => {
val sheetName = "sheet-".concat(index.toString)
df.write
.format("com.crealytics.spark.excel")
.option("dataAddress", s"'$sheetName'!A1")
.option("useHeader", "true")
.option("header", "true")
.mode(org.apache.spark.sql.SaveMode.Append)
.save(outputFileName)
index += 1
})
}
}
That`s a quick overview of on how we can write an xlsx file with multiple sheets in Spark.