Running Spark in Mesosphere

In this article I will show how to install the Spark package and run a Spark application in Mesosphere. I will use the DC/OS CLI throughout; the link below explains how to install and configure it.

https://docs.mesosphere.com/1.12/cli/install/

Let's start with a simple Spark program that we will deploy in Mesosphere. Below is the code.


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class Test {

    public static void main(String[] args) {

        // The master is not hard-coded here, so the value supplied at submit time is used.
        // For a local test run, pass -Dspark.master=local as a JVM option.
        SparkSession spark = SparkSession.builder().appName("mesosphere_example").getOrCreate();
        spark.sparkContext().setLogLevel("ERROR");

        // Build a list of single-column rows
        List<Row> list = new ArrayList<Row>();
        list.add(RowFactory.create("one"));
        list.add(RowFactory.create("two"));
        list.add(RowFactory.create("three"));
        list.add(RowFactory.create("four"));

        // Describe the schema explicitly: one nullable string column named "test"
        List<StructField> listOfStructField = new ArrayList<StructField>();
        listOfStructField.add(DataTypes.createStructField("test", DataTypes.StringType, true));
        StructType structType = DataTypes.createStructType(listOfStructField);
        Dataset<Row> data = spark.createDataFrame(list, structType);
        data.show();

        // Let's create the Dataset of Row from a list of Movie beans using Arrays.asList
        Dataset<Row> test = spark.createDataFrame(
                Arrays.asList(new Movie("movie1", 2323d, "1212"), new Movie("movie2", 2323d, "1212"),
                        new Movie("movie3", 2323d, "1212"), new Movie("movie4", 2323d, "1212")),
                Movie.class);

        test.show();

        spark.stop();
    }
}

Below is the Movie class used in the above code.


import java.io.Serializable;

public class Movie implements Serializable {

    private String name;
    private Double rating;
    private String timestamp;

    public Movie(String name, Double rating, String timestamp) {
        super();
        this.name = name;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    public Movie() {

    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Double getRating() {
        return rating;
    }

    public void setRating(Double rating) {
        this.rating = rating;
    }

    public String getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(String timestamp) {
        this.timestamp = timestamp;
    }

}
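
When a bean class such as Movie is passed to createDataFrame, Spark derives the column names from the bean's readable properties, that is, from its getters. The sketch below uses only the JDK's java.beans.Introspector to list those properties, as a rough illustration of what Spark sees. It is not Spark's own code: the class name BeanColumns and the helper readableProperties are hypothetical, and the nested Movie is a trimmed copy of the class above so that the example is self-contained.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BeanColumns {

    // Trimmed copy of the article's Movie bean (getters only), for illustration.
    public static class Movie implements Serializable {
        private String name;
        private Double rating;
        private String timestamp;

        public String getName() { return name; }
        public Double getRating() { return rating; }
        public String getTimestamp() { return timestamp; }
    }

    // Hypothetical helper: returns the names of all properties that have a getter.
    static List<String> readableProperties(Class<?> beanClass) {
        try {
            // Object.class is the stop class, so the inherited "class" property is skipped.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    names.add(pd.getName());
                }
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Prints the property names that a bean-based schema would be built from.
        System.out.println(readableProperties(Movie.class));
    }
}
```

A field without a getter would not appear in this list, which is why the Movie class exposes a getter for every field that should become a column.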

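Below is the pom file that can be used to build the jar file. The shade plugin is bound to the package phase, so running mvn package produces the uber jar in the target directory.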


<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.timepass</groupId>
    <artifactId>samples</artifactId>
    <version>1.0.0</version>

    <!-- Java source and target level used by the compiler plug-in -->
    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.1.1</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <!-- Maven shade plug-in that creates uber JARs -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>

                        <configuration>
                            <!-- Drop signature files so the merged jar is not rejected as tampered -->
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <!-- Additional configuration. -->
                        </configuration>

                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Let's deploy the jar file created above to Mesosphere. First, log in to the cluster using the command below (dcos.exe is the Windows binary; on Linux and macOS the command is dcos).


dcos.exe cluster setup <cluster url> --username=<username> --password=<password>

If we have the cluster's SSL certificates, we can configure the CLI to use them; otherwise we can disable SSL verification using the command below.


dcos.exe config set core.ssl_verify false

Below is the command to install the Spark package in Mesosphere.


dcos package install spark --options <file location of json to install the spark package>

Below is an example of a JSON options file that can be used.


{
    "service": {
        "name": "/sandbox/username/spark",
        "cpus": 1,
        "mem": 1024,
        "role": "*",
        "service_account": "",
        "service_account_secret": "",
        "user": "root",
        "docker-image": "",
        "log-level": "INFO",
        "virtual_network_enabled": false,
        "virtual_network_name": "dcos",
        "virtual_network_plugin_labels": [],
        "UCR_containerizer": true,
        "docker_user": "root",
        "use_bootstrap_for_IP_detect": false
    },
    "security": {
        "kerberos": {
            "enabled": false,
            "kdc": {},
            "krb5conf": ""
        }
    },
    "hdfs": {}
}

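Once the Spark service is running, we can submit a Spark job to it using the command below. The value of --class must be the fully qualified name of the main class (the example assumes that Test lives in the package com.spark.driver.example), the URL must point to the jar built above at a location the cluster can reach, and --name must match the service name from the JSON options file.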


dcos spark run --submit-args="--class com.spark.driver.example.Test https://<artifactory-ip>/spark-example.jar" --name=/sandbox/username/spark
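
If the job is submitted from a script or a build tool rather than typed by hand, it helps to see how the pieces of the command fit together. The following is a minimal sketch that only assembles the argument list shown above; the class name DcosSparkSubmit and the helper buildCommand are hypothetical, and the service name, main class, and jar URL are the placeholder values from this article.

```java
import java.util.Arrays;
import java.util.List;

public class DcosSparkSubmit {

    // Hypothetical helper: assembles the "dcos spark run" argument list.
    // Everything inside --submit-args is passed as one single argument.
    static List<String> buildCommand(String serviceName, String mainClass, String jarUrl) {
        String submitArgs = "--class " + mainClass + " " + jarUrl;
        return Arrays.asList(
                "dcos", "spark", "run",
                "--submit-args=" + submitArgs,
                "--name=" + serviceName);
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "/sandbox/username/spark",
                "com.spark.driver.example.Test",
                "https://<artifactory-ip>/spark-example.jar");

        // Print the command for inspection. Note that a shell would need quotes
        // around the --submit-args value, because it contains spaces.
        System.out.println(String.join(" ", command));

        // To actually run it, the list could be handed to ProcessBuilder, e.g.
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Keeping the submit arguments in one list element mirrors the quoting in the command above, where the class name and the jar URL are both part of the single --submit-args value.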

To verify that the Spark job is running as expected, log in to the Mesosphere UI, navigate to the Spark service, and check the logs of the submitted job there.