In this article, we will explore how to convert a JSON string into a Spark DataFrame. In the code below, employeeSystemStruct is a JSON string that we want to convert into a DataFrame.
Below is the code:
import org.apache.spark.sql.SparkSession

object ConvertJsonStringToDataFrame extends App {

  val spark = SparkSession.builder()
    .appName("ConvertJsonStringToDataFrame")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // Triple quotes are required here because the JSON itself contains double quotes
  val employeeSystemStruct =
    """[
      {"id": "343434", "name": "Mark", "dob": "2019-01-01", "company": "microsoft"},
      {"id": "56565", "name": "Steve", "dob": "2019-01-01", "company": "google"},
      {"id": "787878", "name": "Cummins", "dob": "2019-01-01", "company": "microsoft"},
      {"id": "7872323", "name": "nassir", "dob": "2019-01-01", "company": "microsoft"}
    ]"""

  // spark.read.json accepts a Dataset[String] directly (Spark 2.2+)
  val ds = Seq(employeeSystemStruct).toDS()
  val jsondf = spark.read.json(ds)

  // display() is Databricks-specific; show() works in any Spark environment
  jsondf.show()
}
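Spark infers the schema from the JSON string itself; a quick way to inspect what was inferred is printSchema. Note that spark.read.json emits the fields in alphabetical order, and since every value in our JSON is quoted, each field is inferred as a string:

```scala
jsondf.printSchema()
// root
//  |-- company: string (nullable = true)
//  |-- dob: string (nullable = true)
//  |-- id: string (nullable = true)
//  |-- name: string (nullable = true)
```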
We can also convert the resulting rows into a list of custom case class instances using the code below:
import org.apache.spark.sql.SparkSession

object ConvertJsonStringToDataFrame extends App {

  // Note: the case class needs a company field, not a duplicate name field
  case class EmployeeStruct(id: String, name: String, dob: String, company: String)

  val spark = SparkSession.builder()
    .appName("ConvertJsonStringToDataFrame")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val employeeSystemStruct =
    """[
      {"id": "343434", "name": "Mark", "dob": "2019-01-01", "company": "microsoft"},
      {"id": "56565", "name": "Steve", "dob": "2019-01-01", "company": "google"},
      {"id": "787878", "name": "Cummins", "dob": "2019-01-01", "company": "microsoft"},
      {"id": "7872323", "name": "nassir", "dob": "2019-01-01", "company": "microsoft"}
    ]"""

  val ds = Seq(employeeSystemStruct).toDS()
  val jsondf = spark.read.json(ds)

  // Collect the rows to the driver and map each one into a case class instance
  val employeeList = jsondf
    .collect()
    .map(row => EmployeeStruct(
      row.getAs[String]("id"),
      row.getAs[String]("name"),
      row.getAs[String]("dob"),
      row.getAs[String]("company")))

  employeeList.foreach(employee => println(employee))
}
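As an alternative to mapping rows by hand, Spark can decode the DataFrame directly into a typed Dataset with as[T], which matches columns to case class fields by name. For the implicit Encoder to be derived, the case class must be defined at the top level, outside the App object. A minimal sketch (the object name ConvertJsonToTypedDataset is just an illustrative choice, and the JSON is shortened to one record):

```scala
import org.apache.spark.sql.SparkSession

// Must be top-level so Spark can derive an Encoder for it
case class EmployeeStruct(id: String, name: String, dob: String, company: String)

object ConvertJsonToTypedDataset extends App {
  val spark = SparkSession.builder()
    .appName("ConvertJsonToTypedDataset")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  val json = """[{"id": "343434", "name": "Mark", "dob": "2019-01-01", "company": "microsoft"}]"""

  // as[EmployeeStruct] binds the DataFrame columns to the case class fields by name
  val employees = spark.read.json(Seq(json).toDS()).as[EmployeeStruct]
  employees.collect().foreach(println)
}
```

This keeps the conversion declarative and avoids the manual getAs calls entirely.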