Using the when/otherwise clause in Spark

In this article I will illustrate how to use the when clause on a Spark DataFrame. Let's consider the SQL query below, which finds the age group distribution in each city:

SELECT city,
case when age < 30 then 'young-age' when age < 50 then 'middle-age' else 'old-age' end AS "age-group",
COUNT(*) AS "count"
FROM "population-data"
GROUP BY city,
case when age < 30 then 'young-age' when age < 50 then 'middle-age' else 'old-age' end

The above query produces output like the sample below:

city      age-group   count
New York  young-age   32323
New York  middle-age  54545
New York  old-age     65656

Let's implement the same in Spark using the when clause:

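import org.apache.spark.sql.functions.{col, when}

// sparkSession is an existing SparkSession; replace INPUT_FILE_PATH with the Parquet location
val dataframe = sparkSession.read
  .parquet("INPUT_FILE_PATH")

// Conditions are checked in order, so age 35 fails the first test and matches the second
val dfWhen = dataframe
  .withColumn("age-group",
    when(col("age").lt(30), "young-age")
      .when(col("age").lt(50), "middle-age")
      .otherwise("old-age"))
  .groupBy("city", "age-group")
  .count()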

dfWhen.show
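
To try this without a Parquet file, here is a minimal self-contained sketch. The sample rows, the object name WhenOtherwiseSketch, the helper ageGroupCounts and the local[*] master are all assumptions made for illustration; they are not part of the original job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WhenOtherwiseSketch {

  // Adds the age-group column and counts rows per (city, age-group).
  // Branches are evaluated top to bottom; the first matching one wins.
  def ageGroupCounts(df: DataFrame): DataFrame =
    df.withColumn("age-group",
        when(col("age") < 30, "young-age")
          .when(col("age") < 50, "middle-age")
          .otherwise("old-age"))
      .groupBy("city", "age-group")
      .count()

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption, not the article's setup).
    val spark = SparkSession.builder()
      .appName("when-otherwise-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows standing in for the Parquet input.
    val people = Seq(
      ("New York", 25),
      ("New York", 30),
      ("New York", 49),
      ("New York", 50),
      ("New York", 72)
    ).toDF("city", "age")

    ageGroupCounts(people).show()
    spark.stop()
  }
}
```

With these sample rows, ages 30 and 49 land in middle-age and ages 50 and 72 in old-age, because lt (or <) is a strict comparison. If otherwise is left out, rows that match no branch get null in the new column.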

Alternatively, we can write the same logic as a SQL case expression and pass it to expr:

import org.apache.spark.sql.functions.expr

val dfCase = dataframe
  .withColumn("age-group",
    expr("case when age < 30 then 'young-age' " +
      "when age < 50 then 'middle-age' " +
      "else 'old-age' end"))
  .groupBy("city", "age-group")
  .count()

dfCase.show
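
A third option is to run the whole query through Spark SQL. The sketch below is one way to do that, assuming the DataFrame has city and age columns; the object name CaseWhenSqlSketch, the helper ageGroupCountsSql and the view name population_data are hypothetical names chosen for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CaseWhenSqlSketch {

  // Registers the DataFrame as a temporary view (hypothetical name) and
  // runs the CASE query through Spark SQL.
  def ageGroupCountsSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("population_data")
    spark.sql(
      """SELECT city,
        |       CASE WHEN age < 30 THEN 'young-age'
        |            WHEN age < 50 THEN 'middle-age'
        |            ELSE 'old-age' END AS `age-group`,
        |       COUNT(*) AS `count`
        |FROM population_data
        |GROUP BY city,
        |         CASE WHEN age < 30 THEN 'young-age'
        |              WHEN age < 50 THEN 'middle-age'
        |              ELSE 'old-age' END""".stripMargin)
  }
}
```

Spark SQL quotes identifiers with backticks, which is why the hyphenated alias is written as `age-group` here. The view name uses an underscore so that it needs no quoting.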

That's a quick overview of how we can use the when clause in Spark.
