Mesosphere DC/OS SMACK Stack Hands-On Tutorial For DC/OS 1.10
While this tutorial does not require any previous DC/OS or SMACK Stack
experience, it would be helpful to have knowledge of how clustered servers
work together (master nodes and worker nodes) and experience using the Linux
operating system and BASH shell.
While working with DC/OS and the SMACK Stack components, you will be using
Mesosphere’s DC/OS Dashboard, the DC/OS Command Line Interface (CLI) and
occasionally, plain Linux shell commands.
The environment you will use in this tutorial should be staged in advance,
including a DC/OS cluster running on AWS, Azure, Google Cloud Platform or
on-premises. To run the SMACK Stack, you should have at least 10 private agent
nodes with enough CPU, memory and disk to support all of the tasks to be
deployed on the cluster. Contact your Mesosphere sales representative to get
help installing a DC/OS cluster.
Kubernetes on DC/OS will allow operators to easily install, scale and upgrade
multiple production-grade Kubernetes clusters on Mesosphere DC/OS.
Infrastructure owners will be able to offer application developers Kubernetes
for Docker container orchestration alongside other data services or legacy
applications, all on shared DC/OS infrastructure while maintaining high
availability and isolation. All of these services running on DC/OS benefit from
complete hybrid cloud portability on an open platform.
The open source version of DC/OS supports the OAuth authentication method
using an OpenID authentication server. To log into your open source DC/OS
Dashboard, you can authenticate with your Google, GitHub or Microsoft
account. Point your Web browser to your master node URL to see the sign in
prompt.
Enterprise DC/OS
Enterprise DC/OS can link to your AD/LDAP directory service or integrate with
your SAML 2.0 and OAuth2 identity providers. In this tutorial, however, you
will use a local DC/OS user.
When successfully logged in, you will be presented with the main DC/OS
Dashboard screen. The Dashboard shows the menu options down the left side
and the resource allocations and service health on the right side. Since this is a
newly launched DC/OS cluster, there are no resources allocated at this time.
From the DC/OS CLI, you can list the services that are currently running on the cluster:
$ dcos service
Apache Cassandra
You can find the Apache Cassandra documentation here:
http://cassandra.apache.org/doc/latest
Features
• Easy installation
• Simple horizontal scaling of Cassandra nodes
• Straightforward backup and restore of data out of the box
• Multi-datacenter replication support
See the Mesosphere DC/OS Cassandra documentation here:
https://docs.mesosphere.com/service-docs/cassandra
In this section of the tutorial, you will deploy the Apache Cassandra distributed
database on the DC/OS agent nodes.
In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Cassandra package.
Click on the service category and keep the name of the Cassandra service as
cassandra.
Then click on the nodes category and keep the number of nodes at 3. This will
cause the Cassandra service to start three nodes on three different agent nodes.
Next, deploy the Cassandra service on DC/OS by clicking the REVIEW & RUN button.
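If you prefer to script this step, the same result can be achieved from the
DC/OS CLI; this is a minimal sketch using the package defaults:

# Install the Cassandra package with its default options (3 nodes)
$ dcos package install cassandra --yes

# Confirm that the cassandra framework has registered with DC/OS
$ dcos service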
After the Cassandra service starts up and passes its health check, you will see
the tasks running on the DC/OS Mesos cluster. Click on the Services menu
option on the left and then click on the cassandra service name. You will see the
Cassandra nodes running on three different DC/OS agent nodes, and you will see
the Cassandra Mesos framework running as well.
Apache Kafka
You can find the Apache Kafka documentation here:
http://kafka.apache.org/documentation.html
DC/OS Kafka gives you direct access to the Kafka API so that existing producers
and consumers can interoperate. You can configure and install DC/OS Kafka in
moments. Multiple Kafka clusters can be installed on DC/OS and managed
independently, so you can offer Kafka as a managed service to your
organization. See the Mesosphere DC/OS Kafka documentation here:
https://docs.mesosphere.com/service-docs/kafka
Benefits
DC/OS Kafka offers the following benefits of a semi-managed service:
• Easy installation
• Multiple Kafka clusters
• Elastic scaling of brokers
• Replication for high availability
• Kafka cluster and broker monitoring
Features
DC/OS Kafka provides the following features:
In this section of the tutorial, you will deploy the Apache Kafka messaging
environment on the DC/OS agent nodes.
In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Kafka package.
The DC/OS Kafka package configuration screens allow you to modify the default
configuration; in this tutorial you will use those defaults.
Then, click on the brokers category and keep the number of brokers to deploy
as 3.
Next, deploy the Kafka service on DC/OS by clicking the REVIEW & RUN button.
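If you would rather script this step, the same default configuration (3
brokers) can be installed from the DC/OS CLI; a minimal sketch:

# Install the Kafka package with its default options
$ dcos package install kafka --yes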
After the Kafka service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the kafka service name and you will see the three
Kafka brokers running on three different DC/OS agent nodes. You will also see
the Kafka Mesos framework running. This is the task that coordinates the
launching of the other Kafka tasks.
Apache Hadoop HDFS
You can find the Apache Hadoop HDFS documentation here:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
And you can find the Mesosphere DC/OS HDFS documentation here:
https://docs.mesosphere.com/service-docs/hdfs/
In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the HDFS
package.
The DC/OS HDFS package configuration screens allow you to modify the default
configuration; in this tutorial you will modify the virtual networking option
and the number of HDFS data nodes to deploy.
Next, click on the data_node category and keep the data_node count at 3. This
will start three data node tasks on three different DC/OS agent nodes. Then,
deploy the HDFS service on DC/OS by clicking the REVIEW & RUN button.
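The CLI equivalent is sketched below; an options file passed with --options
would mirror the choices made in the configuration screens, and it can be
omitted to accept the package defaults:

# Install the HDFS package (hdfs-options.json is a placeholder for your settings)
$ dcos package install hdfs --options=hdfs-options.json --yes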
After the HDFS service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the hdfs service name and you will see the two name
nodes, three journal nodes and three data nodes running on various DC/OS
agent nodes, and you will see the HDFS Mesos framework running as well.
Next, launch an HDFS client shell session and run some Hadoop commands.
# NOTE: You may have to use the ssh-add command to get your private ssh key
# to automatically offer the key to the remote ssh server. Use these commands:
$ eval "$(ssh-agent)"
$ ssh-add ~/.ssh/my-private-key.key
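One way to open an HDFS client session is to SSH into the cluster and start a
container that has the Hadoop client tools installed. The commands below are a
sketch: the container image, its Hadoop directory layout and the endpoint paths
are assumptions, so adjust them to whatever HDFS client image you have available.

# From your workstation, SSH to the leading DC/OS master node
$ dcos node ssh --master-proxy --leader

# Start an interactive container that bundles the Hadoop client tools
# (the image name and tag are examples only)
$ docker run -it mesosphere/hdfs-client:2.6.4 bash

# Inside the container, fetch the client configuration from your HDFS
# service endpoints and place it on the Hadoop configuration path
$ cd hadoop-2.6.4
$ curl -o etc/hadoop/core-site.xml http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/core-site.xml
$ curl -o etc/hadoop/hdfs-site.xml http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints/hdfs-site.xml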
$ bin/hadoop fs -ls /
Copy some test data to an HDFS file. First, create a directory in HDFS to hold
your data file. Use these commands:
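The directory and file names below are examples only; the sketch creates an
HDFS directory, generates a 1000-line test file locally and uploads it:

# Create a directory in HDFS to hold the test data
$ bin/hadoop fs -mkdir -p /test-data

# Generate a local file containing 1000 lines of test data
$ for i in $(seq 1 1000); do echo "test data line $i"; done > test-data.txt

# Copy the local file into HDFS
$ bin/hadoop fs -put test-data.txt /test-data/test-data.txt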
$ bin/hadoop fs -ls /
Extract the data from HDFS and check the size of the new file using the
commands:
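A sketch, assuming the example paths used above:

# Copy the file back out of HDFS under a new local name
$ bin/hadoop fs -get /test-data/test-data.txt ./test-data-2.txt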
$ ls -alh ./test-data-2.txt
$ exit
At this point in the tutorial, you have configured and deployed three data
services on the DC/OS cluster. In the DC/OS Dashboard, you can click on the
Services option on the left menu to see the services running.
Spark History Server
From your DC/OS CLI session, create the JSON options file for the Spark History
Server and install the package using these commands:
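The sketch below shows one way this could look. The spark-history package name
comes from the DC/OS Catalog, but the option keys are an assumption, so verify
them against the package's configuration schema before installing; the HDFS
endpoint URL is the same one used elsewhere in this tutorial.

# Write an options file that points the history server at the HDFS service
$ cat > spark-history-options.json <<EOF
{
  "service": {
    "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  }
}
EOF

# Install the Spark History Server package with those options
$ dcos package install spark-history --options=spark-history-options.json --yes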
Once the Spark History Server starts up, you can view the history server console
by clicking on the console launch icon on the DC/OS Dashboard.
From the DC/OS Dashboard’s Services panel, view the Spark History Server
running. Place your mouse cursor just to the right of the spark-history service
name and you will see an arrow icon appear. Click on that icon to launch the
Spark History Server console.
Next you will configure and deploy the Spark service on the DC/OS cluster.
You can find the Apache Spark documentation here:
http://spark.apache.org/documentation.html
DC/OS Apache Spark consists of Apache Spark with a few custom commits. See:
https://github.com/mesosphere/spark
https://github.com/mesosphere/spark-build
Benefits
Features
• Multiversion support
• Run multiple Spark dispatchers
• Run against multiple HDFS clusters
• Backports of scheduling improvements
• Simple installation of all Spark components, including the dispatcher and
the history server
• Integration of the dispatcher and history server
• Zeppelin integration
• Kerberos and SSL support
See the Mesosphere DC/OS Spark documentation here:
https://docs.mesosphere.com/service-docs/spark/
In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Spark package.
Click on the service category and keep the name of the Spark service as spark.
Also in the service category, enter the URL to the Spark History Service that you
deployed previously. To get this URL, click on the Dashboard panel in your
DC/OS Web console. Copy that Web address into your paste buffer, but only
include up to the main hostname or IP address. Do not include the remainder of
the Web address.
Next, set the HDFS configuration URL for the Spark service so that Spark jobs
can locate the HDFS service running on the cluster:
http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints
After the Spark service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the spark service name and you will see the Spark
Mesos Dispatcher task running on one of the DC/OS agent nodes.
You can view the Spark dispatcher console by clicking on the arrow icon just to
the right of the spark service name.
Now that the Spark Service and the Spark History Service are running on the
DC/OS cluster, you can submit your first Spark job.
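A minimal sketch of a first submission using the DC/OS Spark CLI is shown
below; it runs the stock SparkPi example class, and the jar URL is an
assumption (any publicly reachable Spark examples jar will work):

# Install the Spark CLI subcommand if it is not already present
$ dcos package install spark --cli --yes

# Submit the SparkPi example through the Spark dispatcher
$ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 100"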
Once your Spark job is submitted successfully, you can see the Spark Driver
program that was launched by the Spark Dispatcher. Click on the Web browser
tab that contains the Spark Dispatcher console that you opened previously.
Your new job shows up in the Launched Drivers list.
While your Spark job is running, the Spark Driver program will launch tasks that
will use CPU and memory resource offers from the Mesos scheduler. Open the
DC/OS Dashboard to watch these tasks start, run and complete.
Once your Spark job is completed, it will be shown in the Finished Drivers list.
Previously, you created a test data file with 1000 lines of data and uploaded it to
the HDFS service running on your DC/OS cluster. In this section, you will submit
a Spark job that reads the contents of that file in HDFS and counts the number
of lines. From your DC/OS command line, run these commands:
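One way to do this is sketched below: a small PySpark script (the name
count-lines.py, its hosting URL and the HDFS path are all assumptions based on
the example names used earlier) is hosted at a location the cluster can reach
and submitted through the Spark dispatcher:

# Contents of count-lines.py, hosted at a URL reachable from the cluster:
#
#   from pyspark import SparkContext
#   sc = SparkContext(appName="CountLines")
#   count = sc.textFile("hdfs:///test-data/test-data.txt").count()
#   print("Line count: %d" % count)

# Submit the script through the Spark dispatcher
$ dcos spark run --submit-args="http://example.com/scripts/count-lines.py"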
Just like before, you can view the progress of the Spark job by viewing the tasks
on the DC/OS Dashboard, or the Spark Dispatcher Web console, or the Spark
History Server Web console. In the Spark Service task list, you can click on
the driver task to view its log output.
Kafka Revisited
In this section of the tutorial, you will use the Kafka messaging environment to
show how producers can put data into Kafka topics in a reliable and redundant
fashion and how consumers can retrieve that data from the topics.
Show the current list of Kafka brokers and topics with these commands:
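A sketch using the DC/OS Kafka CLI subcommand (install it first if you have not
already); the exact flag names can vary slightly between package versions:

# Install the Kafka CLI subcommand
$ dcos package install kafka --cli --yes

# List the Kafka brokers in the cluster
$ dcos kafka broker list

# List the existing topics
$ dcos kafka topic list

# Create the topic used in the rest of this section
$ dcos kafka topic create my-topic --partitions 3 --replication 3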
Run a containerized application to read from the new Kafka topic. From the
DC/OS Dashboard, click on the Services menu option. Then click on the plus
sign to create a new service manually.
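In the form that appears, switch to JSON mode and paste a service definition
along the lines of the sketch below. The service name, the container image and
the consumer command are assumptions (any image that ships the Kafka client
tools will do); the broker address is the standard DC/OS Kafka VIP.

{
  "id": "/kafka-consumer",
  "cpus": 0.5,
  "mem": 256,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mesosphere/kafka-client" }
  },
  "cmd": "kafka-console-consumer.sh --bootstrap-server broker.kafka.l4lb.thisdcos.directory:9092 --topic my-topic --from-beginning"
}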
Then click the RUN SERVICE button to run this new service.
When your service completes the startup process, it will show up in the list of
services running in the application group.
Click on the new service and then on its running task; you will see the STDERR
and STDOUT log files for this service. Click on the
STDOUT button to see the current standard output.
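Next, produce some messages for the consumer to pick up. One approach, sketched
below, is to run a console producer from a Kafka client container; the image
name and the loop that generates the data are assumptions:

# SSH to the cluster and start a Kafka client container
$ dcos node ssh --master-proxy --leader
$ docker run -it mesosphere/kafka-client bash

# Inside the container, write 100 numbered test messages to my-topic
$ for i in $(seq 1 100); do echo "test message $i"; done | \
    kafka-console-producer.sh --broker-list broker.kafka.l4lb.thisdcos.directory:9092 --topic my-topic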
This will generate some test data and place entries into the my-topic message
queue in Kafka.
Then go back to your STDOUT console for the kafka-consumer service and view
the Kafka messages.
Summary
This tutorial guided you through the process of deploying the components that
make up the SMACK Stack and also showed you how to run a Spark job that
reads from the HDFS service and from the Kafka service. Additionally, this
tutorial showed you how to test the Kafka service with consumers and
producers.
If you would like to quickly deploy these components, you can use the pre-built
startup script named start-smackstack.sh, found in the scripts directory here:
https://github.com/gregpalmr/smack-stack-tutorial
If you would like to review the Mesosphere Advanced SMACK Stack tutorial, you
can find that here:
https://github.com/gregpalmr/smack-stack-advanced-tutorial