In this article I will illustrate how to use the when clause with a Spark DataFrame. Let's consider the SQL query below, which finds the age-group distribution in each city:
SELECT city,
       CASE WHEN age < 30 THEN 'young-age'
            WHEN age < 50 THEN 'middle-age'
            ELSE 'old-age' END AS "age-group",
       COUNT(*) AS "count"
FROM "population-data"
GROUP BY city,
         CASE WHEN age < 30 THEN 'young-age'
              WHEN age < 50 THEN 'middle-age'
              ELSE 'old-age' END
The query produces output like the following sample:
city      age-group   count
New York  young-age   32323
New York  middle-age  54545
New York  old-age     65656
Let's implement the same in Spark using the when clause:
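import org.apache.spark.sql.functions.{col, when}

// Load the population data; sparkSession is an existing SparkSession
val dataframe = sparkSession.read
  .parquet("INPUT_FILE_PATH")

// Tag each row with an age group, then count rows per city and group.
// Branches are checked top to bottom and the first match wins.
val dfWhen = dataframe
  .withColumn("age-group",
    when(col("age").lt(30), "young-age")      // age < 30
      .when(col("age").lt(50), "middle-age")  // 30 <= age < 50
      .otherwise("old-age"))                  // age >= 50, and also null ages
  .groupBy("city", "age-group")
  .count()

dfWhen.show()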
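To try the when chain without a Parquet file, here is a minimal, self-contained sketch. It assumes a local SparkSession (master local[*]) and a handful of made-up rows; the object name AgeGroupExample and the helper withAgeGroup are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical wrapper object, not part of Spark
object AgeGroupExample {

  // Adds the "age-group" column; branches are evaluated top to bottom
  def withAgeGroup(df: DataFrame): DataFrame =
    df.withColumn("age-group",
      when(col("age").lt(30), "young-age")
        .when(col("age").lt(50), "middle-age")
        .otherwise("old-age"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demo
    val spark = SparkSession.builder()
      .appName("age-group-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows, not real population data
    val people = Seq(
      ("New York", 25),
      ("New York", 29),
      ("New York", 42),
      ("New York", 67)
    ).toDF("city", "age")

    withAgeGroup(people)
      .groupBy("city", "age-group")
      .count()
      .show()

    spark.stop()
  }
}
```

With these four rows, ages 25 and 29 land in young-age, 42 in middle-age and 67 in old-age. The row order printed by show() is not guaranteed.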
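Alternatively, we can write the same condition as a SQL CASE expression and pass it to expr. Note that this snippet only adds the age-group column; chain the same groupBy("city", "age-group") and count() to get the distribution.
import org.apache.spark.sql.functions.expr

val dfCase = dataframe.withColumn("age-group", expr(
  "case when age < 30 then 'young-age' " +
  "when age < 50 then 'middle-age' " +
  "else 'old-age' end"))

dfCase.show()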
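A third option is to run the grouped SQL query itself through spark.sql after registering the DataFrame as a temporary view. The following is a sketch under stated assumptions: the view name population_data, the object name AgeGroupSqlExample and the helper countByAgeGroup are hypothetical, the rows are made up, and the session runs locally.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, not part of Spark
object AgeGroupSqlExample {

  // Registers the input as a temp view and counts rows per city and age group
  def countByAgeGroup(people: DataFrame): DataFrame = {
    // Assumed view name; underscores avoid quoting the identifier
    people.createOrReplaceTempView("population_data")
    people.sparkSession.sql(
      """SELECT city,
        |       CASE WHEN age < 30 THEN 'young-age'
        |            WHEN age < 50 THEN 'middle-age'
        |            ELSE 'old-age' END AS `age-group`,
        |       COUNT(*) AS `count`
        |FROM population_data
        |GROUP BY city,
        |         CASE WHEN age < 30 THEN 'young-age'
        |              WHEN age < 50 THEN 'middle-age'
        |              ELSE 'old-age' END
        |""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("age-group-sql-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows, not real population data
    val people = Seq(
      ("New York", 25),
      ("New York", 42),
      ("New York", 67)
    ).toDF("city", "age")

    countByAgeGroup(people).show()
    spark.stop()
  }
}
```

Keeping the query in SQL can be convenient when it already exists elsewhere, while the when and expr forms compose more easily with other DataFrame transformations.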
That's a quick overview of how we can use the when clause in Spark.