In this article, I will illustrate how to use the when clause in a Spark DataFrame. Let's consider the following SQL query, which finds the age-group distribution in each city:
SELECT city,
       CASE WHEN age < 30 THEN 'young-age'
            WHEN age < 50 THEN 'middle-age'
            ELSE 'old-age'
       END AS "age-group",
       COUNT(*) AS "count"
FROM "population-data"
GROUP BY city, "age-group"
This query would produce sample output like the following:
city       age-group    count
New York   young-age    32323
New York   middle-age   54545
New York   old-age      65656
Let's implement the same logic in Spark using the when function:
import org.apache.spark.sql.functions.{col, when}

// Read the input data, derive an age-group column from the age column,
// then count rows per (city, age-group) pair.
val dataframe = sparkSession.read
  .parquet("INPUT_FILE_PATH")

val dfWhen = dataframe
  .withColumn("age-group",
    when(col("age").lt(30), "young-age")
      .when(col("age").lt(50), "middle-age")
      .otherwise("old-age"))
  .groupBy("city", "age-group")
  .count()

dfWhen.show()
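One behavior worth knowing: if the chain of when calls has no otherwise, rows that match none of the conditions get null in the new column. Here is a minimal sketch of that, using a hypothetical toy dataset of my own (the names people and sparkSession are assumptions, not from the original article):

import org.apache.spark.sql.functions.{col, when}

// toDF needs the SparkSession implicits in scope.
import sparkSession.implicits._

// Hypothetical toy data: one person per age band.
val people = Seq(("New York", 25), ("New York", 45), ("New York", 70))
  .toDF("city", "age")

// Without .otherwise(...), rows matching no condition (age >= 50 here)
// end up with null in the age-group column.
people
  .withColumn("age-group",
    when(col("age").lt(30), "young-age")
      .when(col("age").lt(50), "middle-age"))
  .show()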
Alternatively, we can use a SQL case expression via the expr function:
import org.apache.spark.sql.functions.expr

// The same derivation expressed as a SQL case expression.
val dfCase = dataframe
  .withColumn("age-group",
    expr("case when age < 30 then 'young-age' " +
         "when age < 50 then 'middle-age' " +
         "else 'old-age' end"))

dfCase.show()
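For completeness, the SQL query from the start of the article can also be run as-is by registering the DataFrame as a temporary view. A minimal sketch, assuming the sparkSession and dataframe values from above; the view name population_data is my own stand-in, since hyphenated identifiers would need backtick quoting in Spark SQL, and grouping by the select-list alias relies on Spark's default spark.sql.groupByAliases setting:

dataframe.createOrReplaceTempView("population_data")

val dfSql = sparkSession.sql(
  """SELECT city,
    |       CASE WHEN age < 30 THEN 'young-age'
    |            WHEN age < 50 THEN 'middle-age'
    |            ELSE 'old-age'
    |       END AS `age-group`,
    |       COUNT(*) AS `count`
    |FROM population_data
    |GROUP BY city, `age-group`""".stripMargin)

dfSql.show()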
That's a quick overview of how we can use the when clause in Spark.