In this article I will illustrate how to use Spark to create an Excel report with multiple sheets. We will be using crealytics/spark-excel, an open-source project for querying and writing Excel files with Apache Spark.
To add the dependency to an sbt project, use the following artifact:
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.13.5"
If you are using Maven, below is the dependency (the _2.12 suffix in the artifact ID must match your Scala version):
<dependency>
  <groupId>com.crealytics</groupId>
  <artifactId>spark-excel_2.12</artifactId>
  <version>0.13.5</version>
</dependency>
Below is an example: we create two DataFrames and write them to a single xlsx file as two sheets named sheet-1 and sheet-2.
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ListBuffer

object SparkExcel {

  // Build two small DataFrames with sample NAME/CITY rows.
  def getDataFrame(): List[DataFrame] = {
    lazy val sparkSession: SparkSession = SparkSession
      .builder()
      .master("local[*]")
      .getOrCreate()
    import sparkSession.implicits._

    var sequenceOfOverview = ListBuffer[(String, String)]()
    sequenceOfOverview += Tuple2("user-1", "mysore")
    sequenceOfOverview += Tuple2("user-2", "bangalore")
    sequenceOfOverview += Tuple2("user-3", "chennai")
    sequenceOfOverview += Tuple2("user-4", "delhi")
    val df1 = sequenceOfOverview.toDF("NAME", "CITY")

    var sequenceOfOverview2 = ListBuffer[(String, String)]()
    sequenceOfOverview2 += Tuple2("user-1", "mysore")
    sequenceOfOverview2 += Tuple2("user-2", "bangalore")
    sequenceOfOverview2 += Tuple2("user-3", "chennai")
    sequenceOfOverview2 += Tuple2("user-4", "delhi")
    val df2 = sequenceOfOverview2.toDF("NAME", "CITY")

    List(df1, df2)
  }

  def main(args: Array[String]): Unit = {
    val outputFileName = "OUTPUT_FILE_PATH"
    var index = 1
    // Each DataFrame is appended to the same workbook as its own sheet.
    getDataFrame().foreach(df => {
      val sheetName = "sheet-".concat(index.toString)
      df.write
        .format("com.crealytics.spark.excel")
        .option("dataAddress", s"'$sheetName'!A1") // target sheet and start cell
        .option("useHeader", "true")               // legacy option name for the header row
        .option("header", "true")                  // option name used by newer releases
        .mode(org.apache.spark.sql.SaveMode.Append) // Append adds the sheet to the existing file
        .save(outputFileName)
      index += 1
    })
  }
}
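Since the library also supports querying Excel files, here is a minimal sketch of reading one of the sheets back into a DataFrame, again using the dataAddress option to pick the sheet. It assumes the workbook written above exists at the "OUTPUT_FILE_PATH" placeholder; adjust the path and sheet name for your setup.

import org.apache.spark.sql.SparkSession

object SparkExcelReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .getOrCreate()

    // Read sheet-1 of the workbook written above back into a DataFrame.
    val df = spark.read
      .format("com.crealytics.spark.excel")
      .option("dataAddress", "'sheet-1'!A1") // sheet and start cell to read from
      .option("header", "true")              // first row holds the column names
      .option("inferSchema", "true")         // let spark-excel infer column types
      .load("OUTPUT_FILE_PATH")

    df.show()
  }
}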
That's a quick overview of how we can write an xlsx file with multiple sheets in Spark.