Testing Private Methods and Fields in Scala with Spark Support

In this article, we will see how to test private methods and fields in Scala with Spark support. There are two approaches to consider. The first is the traditional Java approach, which uses reflection to invoke private methods and set values on private fields. The second uses the PrivateMethodTester trait from ScalaTest, which covers private methods. We will explore both approaches, starting with PrivateMethodTester.

Create a Simple Scala class with a private method and field

This is the class we intend to test. It has a private method and a private field. The private method takes a DataFrame, a column name, and a value as parameters and returns the DataFrame with that column added.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

class ClassToTestWithPrivateMethod {

  private var privateField: String = "private field"

  private def privateMethod(dataFrame: DataFrame,
                            columnToAdd: String,
                            value: String) = {
    dataFrame.withColumn(columnToAdd, lit(value))
  }

}

Create a Simple Scala object with a private method and field

ObjectToTestWithPrivateMethod is the same as ClassToTestWithPrivateMethod above, except that it is an object instead of a class and its private field is a val rather than a var.

package com.timepasstechies.blog.testing
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object ObjectToTestWithPrivateMethod {

  private val privateField: String = "private field"

  private def privateMethod(dataFrame: DataFrame,
                            columnToAdd: String,
                            value: String) = {
    dataFrame.withColumn(columnToAdd, lit(value))
  }

}

Create a SparkSampleData class to create a dataFrame

We will pass this DataFrame to the private method, which adds a column to it and returns the updated DataFrame.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.{DataFrame, SparkSession}

import scala.collection.mutable.ListBuffer

class SparkSampleData {

  def getSampleDataFrame(sparkSession: SparkSession): DataFrame = {

    // Brings toDF into scope for local collections.
    import sparkSession.implicits._

    // The buffer is mutated in place, so the reference itself can be a val.
    val sequenceOfOverview = ListBuffer[(String, String, String, Integer)]()
    sequenceOfOverview += Tuple4("Western Digital", "006", "20200901", 1)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200901", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200901", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200901", 0)

    sequenceOfOverview += Tuple4("Western Digital", "1", "20200902", 1)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200902", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200902", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200902", 1)

    sequenceOfOverview += Tuple4("Western Digital", "1", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200903", 0)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200903", 1)

    sequenceOfOverview += Tuple4("Western Digital", "1", "20200904", 0)
    sequenceOfOverview += Tuple4("Western Digital", "2", "20200904", 0)
    sequenceOfOverview += Tuple4("Western Digital", "3", "20200904", 1)
    sequenceOfOverview += Tuple4("Western Digital", "4", "20200904", 1)

    sequenceOfOverview.toDF("Employee", "Id", "doj", "numAwards")
  }

}

Create PrivateMethodTest Class

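This is our unit test class. It uses PrivateMethodTester to invoke the private method in both ClassToTestWithPrivateMethod and ObjectToTestWithPrivateMethod. Note that PrivateMethodTester only invokes methods; the private field is covered by the reflection approach later in this article.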

package com.testing.timepass

import com.timepasstechies.blog.testing.{
ClassToTestWithPrivateMethod,
ObjectToTestWithPrivateMethod,
SparkSampleData,
SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen, PrivateMethodTester}

class PrivateMethodTest
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin
    with PrivateMethodTester {

  // Handle to the private method: the type parameter is its result type,
  // the symbol is its name. Symbol("...") avoids the deprecated 'name literal.
  val enrichColumn = PrivateMethod[DataFrame](Symbol("privateMethod"))

  val classToTest = new ClassToTestWithPrivateMethod
  val sparkDataLoader = new SparkSampleData

  val dataFrameFromClass = classToTest invokePrivate enrichColumn(
    sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
    "new_column_to_add",
    "value for newly added column"
  )

  // invokePrivate must stay on the same line as its target; a line break
  // before it would end the val definition at the object name.
  val dataFrameFromObject = ObjectToTestWithPrivateMethod invokePrivate enrichColumn(
    sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
    "new_column_to_add",
    "value for newly added column"
  )

  assert(dataFrameFromClass.columns.contains("new_column_to_add"))
  assert(dataFrameFromObject.columns.contains("new_column_to_add"))

}
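
The test above runs its assertions directly in the class body, so they execute when the suite is constructed. As a minimal sketch, the same invokePrivate call can also be placed inside a feature/scenario block so that it is reported as a named test. The Greeter class and GreeterObject object below are hypothetical stand-ins, chosen so the example does not need Spark, and the code assumes ScalaTest 3.0.x, matching the org.scalatest.FeatureSpec import used in this article.

```scala
import org.scalatest.{FeatureSpec, GivenWhenThen, PrivateMethodTester}

// Hypothetical class and object, used only for illustration.
class Greeter {
  private def greet(name: String): String = s"Hello, $name"
}

object GreeterObject {
  private def greet(name: String): String = s"Hello, $name"
}

class GreeterPrivateMethodTest
    extends FeatureSpec
    with GivenWhenThen
    with PrivateMethodTester {

  // One handle can be reused for any target that declares a matching method.
  private val greet = PrivateMethod[String](Symbol("greet"))

  feature("Invoking private methods") {

    scenario("private method on a class instance") {
      Given("an instance of the class")
      val greeter = new Greeter

      When("the private method is invoked")
      val result = greeter invokePrivate greet("Spark")

      Then("the result is returned to the test")
      assert(result == "Hello, Spark")
    }

    scenario("private method on an object") {
      val result = GreeterObject invokePrivate greet("Spark")
      assert(result == "Hello, Spark")
    }
  }
}
```

Keeping each invocation in its own scenario means a failure is reported against that scenario rather than as an error while constructing the suite.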

SparkSupportMixin Trait

A trait that we mix into our unit test classes so that every test case has access to a SparkSession.

package com.timepasstechies.blog.testing

import org.apache.spark.sql.SparkSession

trait SparkSupportMixin {

  lazy val sparkSession: SparkSession = SparkSession
    .builder()
    .master("local")
    .appName("Spark Testing App")
    .getOrCreate()

}

Let's now look at the traditional Java approach of using reflection in Scala to access a private method and a private field for unit testing.

Create UnitTestToInvokePrivateMethodsClass

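UnitTestToInvokePrivateMethodsClass uses the reflection methods getDeclaredField and getDeclaredMethod to access the private field and private method of ClassToTestWithPrivateMethod. Calling setAccessible(true) on each of them is what lets the test bypass the private access check.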

package com.testing.timepass

import com.timepasstechies.blog.testing.{
ClassToTestWithPrivateMethod,
SparkSampleData,
SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen}

class UnitTestToInvokePrivateMethodsClass
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin {

  val classToTest = new ClassToTestWithPrivateMethod

  // Look up the private field by name and make it writable from the test.
  val privateField = classToTest.getClass.getDeclaredField("privateField")
  privateField.setAccessible(true)
  privateField.set(
    classToTest,
    "calling private field and setting a new string value"
  )

  // Look up the private method by name and parameter types.
  val privateMethod = classToTest.getClass
    .getDeclaredMethod(
      "privateMethod",
      classOf[DataFrame],
      classOf[String],
      classOf[String]
    )
  privateMethod.setAccessible(true)

  val sparkDataLoader = new SparkSampleData

  // invoke returns Object, so the result is cast back to DataFrame.
  val updatedDataframe = privateMethod
    .invoke(
      classToTest,
      sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
      "new_column_to_add",
      "value for newly added column"
    )
    .asInstanceOf[DataFrame]

  assert(updatedDataframe.columns.contains("new_column_to_add"))

}
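
The test above sets the private field but never reads it back. As a rough sketch of how the write could be verified, the example below sets a private field through reflection, reads it back with Field.get, and then invokes a private method that uses the field. The Account class and ReflectionOnClassSketch object are hypothetical and exist only for this illustration; only the Scala standard library and java.lang.reflect are used, so no Spark session is needed.

```scala
// Hypothetical class with a private field and a private method.
class Account {
  private var owner: String = "nobody"
  private def describe(prefix: String): String = s"$prefix: $owner"
}

object ReflectionOnClassSketch {

  // Returns the value read back from the field and the private method's result.
  def run(): (String, String) = {
    val account = new Account

    // Write to the private field, then read it back to confirm the change.
    val field = classOf[Account].getDeclaredField("owner")
    field.setAccessible(true)
    field.set(account, "alice")
    val readBack = field.get(account).asInstanceOf[String]

    // Invoke the private method on the same instance.
    val method = classOf[Account].getDeclaredMethod("describe", classOf[String])
    method.setAccessible(true)
    val described = method.invoke(account, "owner").asInstanceOf[String]

    (readBack, described)
  }
}
```

Calling ReflectionOnClassSketch.run() returns ("alice", "owner: alice"), which shows that the private method observes the value written through reflection.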

Create UnitTestToInvokePrivateMethodsObject

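UnitTestToInvokePrivateMethodsObject uses the same reflection methods, getDeclaredMethod and getDeclaredField, to access the private method and field of ObjectToTestWithPrivateMethod. There are a few differences from the class case: there is no instance to create, the members are looked up on ObjectToTestWithPrivateMethod.getClass, and the object itself is passed as the target of set and invoke.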

package com.testing.timepass

import com.timepasstechies.blog.testing.{
ObjectToTestWithPrivateMethod,
SparkSampleData,
SparkSupportMixin
}
import org.apache.spark.sql.DataFrame
import org.scalatest.{FeatureSpec, GivenWhenThen}

class UnitTestToInvokePrivateMethodsObject
    extends FeatureSpec
    with GivenWhenThen
    with SparkSupportMixin {

  // For an object, getClass returns the class that backs the singleton.
  val privateField =
    ObjectToTestWithPrivateMethod.getClass.getDeclaredField("privateField")
  privateField.setAccessible(true)
  privateField.set(
    ObjectToTestWithPrivateMethod,
    "calling private field and setting a new string value"
  )

  val privateMethod =
    ObjectToTestWithPrivateMethod.getClass
      .getDeclaredMethod(
        "privateMethod",
        classOf[DataFrame],
        classOf[String],
        classOf[String]
      )
  privateMethod.setAccessible(true)

  val sparkDataLoader = new SparkSampleData

  // The object itself is the target of the invocation.
  val updatedDataframe = privateMethod
    .invoke(
      ObjectToTestWithPrivateMethod,
      sparkDataLoader.getSampleDataFrame(sparkSession = sparkSession),
      "new_column_to_add",
      "value for newly added column"
    )
    .asInstanceOf[DataFrame]

  assert(updatedDataframe.columns.contains("new_column_to_add"))

}
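
To make the difference between the class and object cases concrete, here is a small sketch that needs no Spark. The Registry object and ReflectionOnObjectSketch are hypothetical names introduced for this example. It shows that getClass on a Scala object returns the class backing the singleton (its JVM name ends in a dollar sign) and that the object itself is passed as the target of invoke.

```scala
// Hypothetical object with a private method.
object Registry {
  private def describe(name: String): String = s"registered: $name"
}

object ReflectionOnObjectSketch {

  // Returns the JVM class name of the object and the private method's result.
  def run(): (String, String) = {
    // The class that backs the singleton, not a companion class.
    val moduleClass = Registry.getClass

    val method = moduleClass.getDeclaredMethod("describe", classOf[String])
    method.setAccessible(true)

    // The singleton instance is the receiver of the call.
    val result = method.invoke(Registry, "spark").asInstanceOf[String]

    (moduleClass.getName, result)
  }
}
```

Calling ReflectionOnObjectSketch.run() returns a class name ending in Registry$ together with the string "registered: spark".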

Summary

Congratulations! You have learned how to build simple unit tests with Spark support for Scala classes and objects that have private methods and fields. PrivateMethodTester from ScalaTest is the cleaner approach if we only need to test private methods, whereas the traditional reflection approach is more flexible because it handles both private methods and private fields.