In this article, we will see how to test private methods and fields in Scala with Spark support. There are two approaches to consider. The first is the traditional Java approach, which uses reflection to call private methods and set private fields. The second uses the PrivateMethodTester trait from ScalaTest. We will explore both in this article, starting with PrivateMethodTester.
Create a Simple Scala class with a private method and field
This is the class we intend to test. It has a private method and a private field. The private method takes a DataFrame, a column name, and a value as parameters, and returns the DataFrame with that column added and set to the given value on every row.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

class ClassToTestWithPrivateMethod {

  private var privateField: String = "private field"

  // Adds a column named columnToAdd holding the literal `value` on every row.
  private def privateMethod(dataFrame: DataFrame,
                            columnToAdd: String,
                            value: String): DataFrame = {
    dataFrame.withColumn(columnToAdd, lit(value))
  }
}
Create a Simple Scala object with a private method and field
ObjectToTestWithPrivateMethod is the same as the ClassToTestWithPrivateMethod class above, except that it is an object instead of a class and its private field is a val rather than a var.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object ObjectToTestWithPrivateMethod {

  private val privateField: String = "private field"

  // Adds a column named columnToAdd holding the literal `value` on every row.
  private def privateMethod(dataFrame: DataFrame,
                            columnToAdd: String,
                            value: String): DataFrame = {
    dataFrame.withColumn(columnToAdd, lit(value))
  }
}
Create a SparkSampleData class to create a dataFrame
We will pass this DataFrame to the private method, which adds a column to it and returns the updated DataFrame.
package com.timepasstechies.blog.testing

import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ListBuffer

class SparkSampleData {

  def getSampleDataFrame(sparkSession: SparkSession): DataFrame = {
    // Brings toDF into scope for local collections.
    import sparkSession.implicits._

    // The buffer itself is mutable, so the reference can be a val.
    val sequenceOfOverview = ListBuffer[(String, String, String, Integer)]()
    sequenceOfOverview += Tuple4("Western Digital", "006", "20200901", 1)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200901", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200901", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200901", 0)
    sequenceOfOverview += Tuple4("Western Digital", "1", "20200902", 1)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200902", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200902", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200902", 1)
    sequenceOfOverview += Tuple4("Western Digital", "1", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200903", 1)
    sequenceOfOverview += Tuple4("Western Digital", "1", "20200904", 0)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200904", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200904", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200904", 1)

    sequenceOfOverview.toDF("Employee", "Id", "doj", "numAwards")
  }
}
Create PrivateMethodTest Class
This is our unit test class, which tests the private method in both ClassToTestWithPrivateMethod and ObjectToTestWithPrivateMethod. PrivateMethodTester only invokes methods, so the private field is left for the reflection approach covered later.

package com.testing.timepass

import com.timepasstechies.blog.testing.{
  ClassToTestWithPrivateMethod,
  ObjectToTestWithPrivateMethod,
  SparkSampleData,
  SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen, PrivateMethodTester}

class PrivateMethodTest
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin
    with PrivateMethodTester {

  // Handle for the private method; the type parameter is its return type.
  val enrichColumn = PrivateMethod[DataFrame]('privateMethod)
  val sparkDataLoader = new SparkSampleData

  // Wrapping the assertions in scenarios registers them as tests
  // instead of running them while the suite is being constructed.
  feature("Invoking a private method with PrivateMethodTester") {

    scenario("the private method belongs to a class") {
      val classToTest = new ClassToTestWithPrivateMethod
      val dataFrameFromClass = classToTest invokePrivate enrichColumn(
        sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
        "new_column_to_add",
        "value for newly added column"
      )
      assert(dataFrameFromClass.columns.contains("new_column_to_add"))
    }

    scenario("the private method belongs to an object") {
      val dataFrameFromObject = ObjectToTestWithPrivateMethod invokePrivate enrichColumn(
        sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
        "new_column_to_add",
        "value for newly added column"
      )
      assert(dataFrameFromObject.columns.contains("new_column_to_add"))
    }
  }
}
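To see the PrivateMethodTester mechanics on their own, without Spark, here is a minimal sketch. The Greeter class and its greet method are hypothetical names invented for this illustration. It assumes ScalaTest is on the classpath and imports the members of the PrivateMethodTester companion object instead of mixing the trait into a suite.

```scala
import org.scalatest.PrivateMethodTester._

// Hypothetical class used only for this illustration.
class Greeter {
  private def greet(name: String, punctuation: String): String =
    s"Hello, $name$punctuation"
}

object PrivateMethodTesterSketch {
  def main(args: Array[String]): Unit = {
    // The type parameter is the return type of the private method,
    // and the symbol is the method name.
    val greet = PrivateMethod[String](Symbol("greet"))

    val greeter = new Greeter
    // greet(...) bundles the arguments; invokePrivate looks up the method
    // on the target at runtime and calls it.
    val result = greeter.invokePrivate(greet("Spark", "!"))

    assert(result == "Hello, Spark!")
    println(result)
  }
}
```

Because the method is looked up by name at runtime, renaming the private method breaks the test only when it runs, not when it compiles.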
SparkSupportMixin Trait
A trait that we mix into every unit test class that needs a SparkSession.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.SparkSession

trait SparkSupportMixin {

  // Created lazily on first use; getOrCreate reuses an existing session if there is one.
  lazy val sparkSession: SparkSession = SparkSession
    .builder()
    .master("local")
    .appName("Spark Testing App")
    .getOrCreate()
}
Let's now look at the traditional Java approach of using reflection in Scala to access private methods and fields for unit testing.
Create UnitTestToInvokePrivateMethodsClass
UnitTestToInvokePrivateMethodsClass uses the reflection methods getDeclaredField and getDeclaredMethod to access the private field and method of ClassToTestWithPrivateMethod.

package com.testing.timepass

import com.timepasstechies.blog.testing.{
  ClassToTestWithPrivateMethod,
  SparkSampleData,
  SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen}

class UnitTestToInvokePrivateMethodsClass
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin {

  feature("Accessing private members of a class through reflection") {

    scenario("setting the private field") {
      val classToTest = new ClassToTestWithPrivateMethod
      val newValue = "calling private field and setting a new string value"

      val privateField = classToTest.getClass.getDeclaredField("privateField")
      // Lifts the access check for this Field handle.
      privateField.setAccessible(true)
      privateField.set(classToTest, newValue)

      // Read the field back to confirm the new value was stored.
      assert(privateField.get(classToTest) == newValue)
    }

    scenario("invoking the private method") {
      val classToTest = new ClassToTestWithPrivateMethod
      val sparkDataLoader = new SparkSampleData

      // The parameter types must match the method signature exactly.
      val privateMethod = classToTest.getClass.getDeclaredMethod(
        "privateMethod",
        classOf[DataFrame],
        classOf[String],
        classOf[String]
      )
      privateMethod.setAccessible(true)

      val updatedDataframe = privateMethod
        .invoke(
          classToTest,
          sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
          "new_column_to_add",
          "value for newly added column"
        )
        .asInstanceOf[DataFrame]

      assert(updatedDataframe.columns.contains("new_column_to_add"))
    }
  }
}
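The same reflection steps can be tried without Spark. The sketch below uses a hypothetical Account class, invented for this illustration, and relies only on the standard java.lang.reflect API. It sets a private field, reads it back, and then invokes a private method that depends on that field.

```scala
// Hypothetical class used only for this illustration.
class Account {
  private var owner: String = "nobody"
  private def describe(prefix: String): String = s"$prefix $owner"
}

object ReflectionOnClassSketch {
  def main(args: Array[String]): Unit = {
    val account = new Account

    // Look up the private field by name and lift the access check.
    val ownerField = classOf[Account].getDeclaredField("owner")
    ownerField.setAccessible(true)
    ownerField.set(account, "alice")
    assert(ownerField.get(account) == "alice")

    // Look up the private method by name and parameter types.
    val describeMethod =
      classOf[Account].getDeclaredMethod("describe", classOf[String])
    describeMethod.setAccessible(true)

    // invoke returns Object, so the result is cast to the expected type.
    val text = describeMethod.invoke(account, "Owner:").asInstanceOf[String]
    assert(text == "Owner: alice")
    println(text)
  }
}
```

Unlike PrivateMethodTester, the parameter types are spelled out in getDeclaredMethod, and the result has to be cast by hand.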
Create UnitTestToInvokePrivateMethodsObject
UnitTestToInvokePrivateMethodsObject uses the reflection methods getDeclaredField and getDeclaredMethod to access the private field and method of ObjectToTestWithPrivateMethod. It differs slightly from the class case: there is no instance to create, so we call getClass on the object itself and pass the object as the receiver to set and invoke.

package com.testing.timepass

import com.timepasstechies.blog.testing.{
  ObjectToTestWithPrivateMethod,
  SparkSampleData,
  SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen}

class UnitTestToInvokePrivateMethodsObject
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin {

  feature("Accessing private members of an object through reflection") {

    scenario("setting the private field") {
      val privateField =
        ObjectToTestWithPrivateMethod.getClass.getDeclaredField("privateField")
      privateField.setAccessible(true)
      // The object itself is the receiver; there is no separate instance.
      privateField.set(
        ObjectToTestWithPrivateMethod,
        "calling private field and setting a new string value"
      )
    }

    scenario("invoking the private method") {
      val sparkDataLoader = new SparkSampleData

      val privateMethod = ObjectToTestWithPrivateMethod.getClass.getDeclaredMethod(
        "privateMethod",
        classOf[DataFrame],
        classOf[String],
        classOf[String]
      )
      privateMethod.setAccessible(true)

      val updatedDataframe = privateMethod
        .invoke(
          ObjectToTestWithPrivateMethod,
          sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
          "new_column_to_add",
          "value for newly added column"
        )
        .asInstanceOf[DataFrame]

      assert(updatedDataframe.columns.contains("new_column_to_add"))
    }
  }
}
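For a Spark-free view of the object case, here is a small sketch. It uses a hypothetical Registry object, invented for this illustration, and assumes it is declared at the top level of a source file. Calling getClass on the object returns its compiled module class, and the object itself serves as the receiver for invoke.

```scala
// Hypothetical object used only for this illustration.
object Registry {
  private def normalize(key: String): String = key.trim.toLowerCase
}

object ReflectionOnObjectSketch {
  def main(args: Array[String]): Unit = {
    // getClass on an object gives the module class, whose name ends in "$".
    val moduleClass = Registry.getClass
    assert(moduleClass.getName.endsWith("$"))

    val normalizeMethod =
      moduleClass.getDeclaredMethod("normalize", classOf[String])
    normalizeMethod.setAccessible(true)

    // The singleton instance is passed where a class instance would go.
    val result = normalizeMethod.invoke(Registry, "  Spark ").asInstanceOf[String]
    assert(result == "spark")
    println(result)
  }
}
```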
Summary
Congratulations! You have learned how to build a simple unit test with Spark support for Scala classes and objects that have private methods and fields. PrivateMethodTester from ScalaTest is the cleaner approach if we only want to test private methods, whereas the traditional reflection approach is more flexible and can handle both private methods and fields.