Configure and Start a Spark Standalone Cluster
The official Spark documentation explains how to start a standalone cluster.
In Docklet, Apache Spark is installed in
robin has a Workspace named note. There are three vnodes in the vcluster: host-0, host-1, and host-2.
robin opens the Web Terminal, enters the Spark home directory, and starts the Spark master:
root@host-0:~# cd /home/spark
root@host-0:/home/spark# ./sbin/start-master.sh
robin can look in the logs directory to check the logs. The Spark master prints its URL in the form spark://HOST:PORT, which Spark slaves use to connect to it. Here it is likely to be spark://host-0:7077. The master Web UI is
also printed, probably
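Instead of reading the log by eye, the master URL can be pulled out with a small filter. The following is only a sketch: `extract_master_url` is a hypothetical helper (not part of Spark or Docklet), and it matches any text shaped like spark://HOST:PORT rather than relying on the exact wording of the log line.

```shell
# Hypothetical helper, not part of Spark or Docklet.
# Reads text on stdin and prints the first spark://HOST:PORT URL it finds.
# Prints nothing if no such URL is present.
extract_master_url() {
    grep -o 'spark://[A-Za-z0-9._-]*:[0-9][0-9]*' | head -n 1
}
```

Assuming the master log files live under logs in the Spark home directory, it could be used as `cat logs/* | extract_master_url` from /home/spark.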
Then start the Spark slaves on host-1 and host-2 over ssh:
root@host-0:/home/spark# ssh root@host-1 /home/spark/sbin/start-slave.sh spark://host-0:7077
root@host-0:/home/spark# ssh root@host-2 /home/spark/sbin/start-slave.sh spark://host-0:7077
Done, the cluster is ready.
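With more than a couple of vnodes, the per-host ssh commands can be wrapped in a loop. The sketch below assumes Spark is installed in /home/spark on every vnode and that root can ssh between them, as in the commands above. `start_slaves` and the `DRY_RUN` variable are hypothetical names introduced here for illustration; this is not how any Docklet script is implemented.

```shell
# Hypothetical helper, not part of Spark or Docklet.
# Usage: start_slaves MASTER_URL HOST...
# With DRY_RUN=1 it only prints the ssh commands it would run.
start_slaves() {
    master_url=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Show the command without contacting the host.
            echo "ssh root@$host /home/spark/sbin/start-slave.sh $master_url"
        else
            # Start a slave on the remote vnode and register it with the master.
            ssh "root@$host" /home/spark/sbin/start-slave.sh "$master_url"
        fi
    done
}
```

For the vcluster in this example, the call would be `start_slaves spark://host-0:7077 host-1 host-2`. Setting `DRY_RUN=1` first is a cheap way to review the commands before they touch any vnode.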
Docklet provides two scripts in sbin for quickly starting and stopping Spark clusters. dl_start_spark.sh automatically starts a Spark cluster in which host-0 is the master and all vnodes are slaves. dl_stop_spark.sh stops the Spark cluster started by
Now run an example:
root@host-0:/home/spark# ./bin/spark-submit --master spark://host-0:7077 examples/src/main/python/pi.py 10
A Docklet vcluster sits in a private network and cannot be reached from the public Internet. Docklet provides a proxy service for visiting a web server inside the vcluster from outside.
In the Docklet portal, click Config to set up access to the Spark Web UI.
Assume the IP address of host-0 is 172.16.0.99, and fill in:
ip : 172.16.0.99, port : 8080
Click Enable. Now the Spark Web UI can be accessed through the URL of
From the Web UI, robin can check information about the Spark cluster, including its workers and jobs.