
Mesosphere DC/OS

SMACK Stack Hands-on Tutorial


for DC/OS 1.10

Mesosphere DC/OS - SMACK Stack Hands-on Tutorial - DRAFT v0.7 1


Introduction
Welcome to Mesosphere’s SMACK Stack hands-on tutorial for DC/OS. This
tutorial is designed to guide you through the process of deploying the SMACK
Stack components, including Spark, Cassandra, and Kafka, on a DC/OS Mesos
cluster. Additionally, you will be guided through the process of deploying
Apache Hadoop HDFS and a few other services to complement the SMACK Stack
components.

While this tutorial does not require any previous DC/OS or SMACK Stack
experience, it would be helpful to have knowledge of how clustered servers
work together (master nodes and worker nodes) and experience using the Linux
operating system and BASH shell.

While working with DC/OS and the SMACK Stack components, you will be using
Mesosphere’s DC/OS Dashboard, the DC/OS Command Line Interface (CLI) and
occasionally, plain Linux shell commands.

If you would like to review documentation on Mesosphere’s DC/OS and the
Apache Mesos Project, refer to these links:

• Mesosphere’s Enterprise DC/OS: http://mesosphere.io

• Open Source DC/OS: http://dcos.io

• Apache Mesos Project: http://mesos.apache.org

The environment you will use in this tutorial should be staged in advance,
including a DC/OS cluster running on AWS, Azure, Google Cloud Platform or
on-premises. To run the SMACK Stack, you should have at least 10 private
agent nodes with enough CPU, memory and disk to support all of the tasks to
be deployed on the cluster. Contact your Mesosphere sales representative to
get help installing an Enterprise DC/OS cluster, or if you are not a customer
yet, deploy an Open Source DC/OS cluster.

Enterprise DC/OS and Data Services


Apache Mesos is the open-source distributed systems kernel at the heart of
Mesosphere DC/OS. It abstracts the entire datacenter into a single pool of
computing resources, simplifying running distributed systems at scale.

A key design criterion of Apache Mesos is its two-level, application-aware
scheduler architecture, making it easier to operate, scale and extend.



Enterprise DC/OS is the most flexible platform for containerized,
data-intensive applications.

Extending the Mesosphere philosophy of emphasizing “freedom of choice” on
DC/OS, Marathon and Kubernetes are both available for container
orchestration. Development teams can now choose container orchestrators on
our platform as easily as they choose data services, CI/CD, or networking tools.
Kubernetes on DC/OS brings a public cloud-like “Containers-as-a-Service”
experience to any infrastructure, and allows you to run Kubernetes applications
alongside big data services with a common set of security, maintenance, and
management tools.

Kubernetes on DC/OS will allow operators to easily install, scale and upgrade
multiple production-grade Kubernetes clusters on Mesosphere DC/OS.
Infrastructure owners will be able to offer application developers Kubernetes
for Docker container orchestration alongside other data services or legacy
applications, all on shared DC/OS infrastructure while maintaining high
availability and isolation. All of these services running on DC/OS benefit from
complete hybrid cloud portability on an open platform.

Many IT organizations are developing and deploying a new generation of highly
integrated, data-intensive applications that process data on a real-time or
semi-real-time basis. These new applications require running containerized
applications in the same environment as their analytics and data storage
applications. Mesosphere’s DC/OS is supremely suited for supporting these
types of mixed-workload requirements.

By allowing the SMACK Stack to run in the same deployment environment,
DC/OS allows custom containerized applications, often implemented as
microservices, to run right next to stateful services like Kafka for messaging,
Spark for analytics and Cassandra for highly scalable storage.



DC/OS Dashboard
In this section of the tutorial you will log into the DC/OS Web-based Dashboard
and create an environment for deploying the SMACK Stack components.

Open Source DC/OS

The open source version of DC/OS supports OAuth-based authentication
using an OpenID provider. To log into your open source DC/OS
Dashboard, you can authenticate with your Google, GitHub or Microsoft
account. Point your Web browser to your master node URL to see the sign-in
prompt.

Click on the service that you would like to use to authenticate.

Enterprise DC/OS

Enterprise DC/OS has the ability to link to your AD/LDAP directory service or
integrate with your SAML 2.0 and OAuth2 servers. But in this tutorial you will be
using a local DC/OS user.



At this time, log in using the DC/OS administrator user and the password
provided by your system administrator. Click on the LOG IN button.

When successfully logged in, you will be presented with the main DC/OS
Dashboard screen. The Dashboard shows the menu options down the left side
and the resource allocations and service health on the right side. Since this is a
newly launched DC/OS cluster, there are no resources allocated at this time.



DC/OS Command Line Interface (CLI)
Several steps in this tutorial require the use of the DC/OS Command Line
Interface or CLI. The CLI is available on Windows, Mac OS X and Linux operating
systems. Install the CLI using the commands provided from the Dashboard’s
pull down menu in the upper left hand corner of the Dashboard page. Click on
the cluster name in the upper left corner to view the Install CLI link.



Click on the Install CLI link to view the detailed instructions for installing the CLI
in your OS environment. Copy and paste those commands and run them on
your laptop or other client computer.



Follow the prompts and you will be successfully logged into the cluster via the
CLI. Test the CLI with a command to list the running services:

$ dcos service



Begin the Tutorial
In this tutorial you will be configuring and deploying the SMACK Stack and other
packages from the DC/OS service catalog. Mesosphere has created the service
catalog as a way to quickly deploy complex services that require multiple tasks
to be launched in a specific order and on various agent nodes in the cluster.
Click on the Catalog menu option on the left to see the packages available. If
you scroll down, you will see over 100 packages available from the community,
including databases, analytical tools, microservice and container tools, and
more.



Apache Cassandra
DC/OS Apache Cassandra is an automated service that makes it easy to deploy
and manage Apache Cassandra on DC/OS. Apache Cassandra is a distributed
NoSQL database offering high availability, fault tolerance and scalability across
data centers.

For more information on Apache Cassandra, see the Apache Cassandra
documentation at:

http://cassandra.apache.org/doc/latest

Features

• Easy installation
• Simple horizontal scaling of Cassandra nodes
• Straightforward backup and restore of data out of the box
• Multi-datacenter replication support

See the Mesosphere DC/OS Cassandra documentation at:

https://docs.mesosphere.com/service-docs/cassandra

In this section of the tutorial, you will deploy the Apache Cassandra distributed
database on the DC/OS agent nodes.

Configure and Deploy Cassandra on DC/OS

In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Cassandra package.



You will see some details about the Cassandra service on DC/OS. Click on the
REVIEW & RUN button.

Then click on the EDIT button to modify the configuration.



The DC/OS Cassandra package configuration screens allow you to modify the
default configuration. In this tutorial you will be using the default
configuration settings for deployment.

Click on the service category and keep the name of the Cassandra service as
cassandra.

Then click on the nodes category and keep the number of nodes at 3. This will
cause the Cassandra service to start three nodes on three different agent nodes.



Now that you have completed the changes needed to deploy the Cassandra
service on DC/OS, click the REVIEW & RUN button.

Then click the RUN SERVICE button.

After the Cassandra service starts up and passes its health check, you will see
the tasks running on the DC/OS Mesos cluster. Click on the Services menu
option on the left and then click on the cassandra service name. You will see the
Cassandra node managers running on three different DC/OS agent nodes and
you will see the Cassandra Mesos framework running as well.
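With three Cassandra nodes deployed, each row of data is placed by hashing its partition key onto a token ring and walking the ring to pick replica nodes. The sketch below is only a simplified illustration of that placement logic, not Cassandra's actual implementation (Cassandra uses the Murmur3 partitioner; the node names and the MD5 hash here are stand-ins):

```python
import hashlib

# Three nodes, as deployed above, each owning one token on a 2**127-wide ring.
RING = sorted([(0, "node-0"), (2**127 // 3, "node-1"), (2 * 2**127 // 3, "node-2")])

def token(partition_key: str) -> int:
    """Hash the partition key onto the ring (simplified; Cassandra uses Murmur3)."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 2**127

def replicas(partition_key: str, rf: int = 3):
    """Walk the ring clockwise from the key's token, collecting rf distinct nodes."""
    t = token(partition_key)
    start = next((i for i, (tok, _) in enumerate(RING) if tok >= t), 0)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

print(replicas("user:42"))
```

With a replication factor of 3 on a 3-node ring, every key ends up replicated to all three nodes, which is why the service can tolerate the loss of an agent node.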



Apache Kafka
DC/OS Apache Kafka is an automated service that makes it easy to deploy and
manage Apache Kafka on Mesosphere DC/OS, eliminating nearly all of the
complexity traditionally associated with managing a Kafka cluster. Apache
Kafka is a distributed high-throughput publish-subscribe messaging system
with strong ordering guarantees. Kafka clusters are highly available, fault
tolerant, and very durable. See the Apache Kafka documentation here:

http://kafka.apache.org/documentation.html

DC/OS Kafka gives you direct access to the Kafka API so that existing producers
and consumers can interoperate. You can configure and install DC/OS Kafka in
moments. Multiple Kafka clusters can be installed on DC/OS and managed
independently, so you can offer Kafka as a managed service to your
organization. See the Mesosphere DC/OS Kafka documentation here:

https://docs.mesosphere.com/service-docs/kafka

Benefits
DC/OS Kafka offers the following benefits of a semi-managed service:

• Easy installation
• Multiple Kafka clusters
• Elastic scaling of brokers
• Replication for high availability
• Kafka cluster and broker monitoring

Features
DC/OS Kafka provides the following features:

• Single-command installation for rapid provisioning



• Multiple clusters for multiple tenancy with DC/OS
• High availability runtime configuration and software updates
• Storage volumes for enhanced data durability, known as Mesos Dynamic
Reservations and Persistent Volumes
• Integration with syslog-compatible logging services for diagnostics and
troubleshooting
• Integration with statsd-compatible metrics services for capacity and
performance monitoring

In this section of the tutorial, you will deploy the Apache Kafka messaging
environment on the DC/OS agent nodes.

Configure and Deploy Kafka on DC/OS

In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Kafka package.



You will see some details about the Kafka service on DC/OS. Click on the REVIEW
& RUN button.

Then click on the EDIT button to modify the configuration.

The DC/OS Kafka package configuration screens allow you to modify the default
configuration. In this tutorial you will be using those defaults.



Click on the service category and keep the name of the Kafka service as kafka.

Then, click on the brokers category and keep the number of brokers to deploy
as 3.

Next, deploy the Kafka service on DC/OS by clicking the REVIEW & RUN button.



Then click the RUN SERVICE button.

After the Kafka service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the kafka service name and you will see the three
Kafka brokers running on three different DC/OS agent nodes. You will also see
the Kafka Mesos framework running. This is the task that coordinates the
launching of the other Kafka tasks.



Apache Hadoop HDFS
DC/OS Apache HDFS is a managed service that makes it easy to deploy and
manage an HA Apache HDFS cluster on Mesosphere DC/OS. The Apache Hadoop
Distributed File System (HDFS) is an open source distributed file system based
on Google’s Google File System (GFS) paper. It is a replicated and distributed
file system interface for use with “big data” and “fast data” applications.

You can find the Apache Hadoop documentation here:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

And you can find the Mesosphere DC/OS HDFS documentation here:

https://docs.mesosphere.com/service-docs/hdfs/

Configure and Deploy HDFS on DC/OS

In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the HDFS
package.



You will see some details about the HDFS service on DC/OS. Click on the REVIEW
& RUN button.

Then click on the EDIT button to modify the configuration.

The DC/OS HDFS package configuration screens allow you to modify the default
configuration. In this tutorial you will modify the virtual networking option
and review the number of HDFS data nodes to deploy.



Click on the service category and keep the name of the HDFS service as hdfs.
Also, click on the check box next to the VIRTUAL_NETWORK_ENABLED option.
This will allow applications running on the cluster to access the HDFS service
without knowing on which DC/OS agent nodes the various HDFS components
are running.

Next, click on the data_node category and keep the data_node count as 3. This
will start three data node tasks on three different DC/OS agent nodes.



Now that you have completed the changes needed to deploy the HDFS service
on DC/OS, click the REVIEW & RUN button.

Then click the RUN SERVICE button.

After the HDFS service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the hdfs service name and you will see the three name
nodes, three journal nodes and three data nodes running on various DC/OS
agent nodes and you will see the HDFS Mesos framework running as well.



Using the HDFS Service

Next, launch an HDFS client shell session and run some Hadoop commands.

First, issue the command to launch an hdfs-client Docker container on a node in
the cluster. Here are the commands to use:

# NOTE: You may have to use the ssh-add command to get your private ssh key
# to automatically offer the key to the remote ssh server. Use these commands:
$ eval "$(ssh-agent)"
$ ssh-add ~/.ssh/my-private-key.key

$ dcos node ssh --master-proxy --leader "docker run -it mesosphere/hdfs-client:1.0.0-2.6.0 bash"



Create an HDFS directory for the Spark History Server with the Hadoop
command:

$ bin/hadoop fs -mkdir -p /history

$ bin/hadoop fs -ls /

Copy some test data to an HDFS file. First, create a directory in HDFS to hold
your data file. Use these commands:

$ bin/hadoop fs -mkdir /test-data

$ bin/hadoop fs -ls /



Then create a 10 MB test data file and upload it to the HDFS directory.
Use these commands:

$ dd if=/dev/urandom of=test-data.txt bs=1048576 count=10

$ bin/hadoop fs -put test-data.txt hdfs:///test-data/test-data.txt

$ bin/hadoop fs -ls /test-data

Extract the data from HDFS and check the size of the new file using the
commands:

$ bin/hadoop fs -get hdfs:///test-data/test-data.txt ./test-data-2.txt

$ ls -alh ./test-data-2.txt
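For reference, the dd command above produces 10 MiB of random bytes (10 blocks of 1 MiB), not line-oriented text. A local Python equivalent, handy for checking that the file retrieved with hadoop fs -get matches the original size (the file and directory names here are illustrative):

```python
import os
import tempfile

BLOCK = 1048576      # 1 MiB, matching dd's bs=1048576
COUNT = 10           # matching dd's count=10

# Write 10 MiB of random bytes, like: dd if=/dev/urandom of=test-data.txt bs=1048576 count=10
path = os.path.join(tempfile.mkdtemp(), "test-data.txt")
with open(path, "wb") as f:
    for _ in range(COUNT):
        f.write(os.urandom(BLOCK))

# After the HDFS round trip, the retrieved copy should be exactly this size.
print(os.path.getsize(path))   # 10485760 bytes
```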



Exit out of the HDFS client and return to your DC/OS CLI session. Use this
command:

$ exit

Cassandra, Kafka, and HDFS running on DC/OS

At this point in the tutorial, you have configured and deployed three data
services on the DC/OS cluster. In the DC/OS Dashboard, you can click on the
Services option on the left menu to see the services running.

Click on the Service menu option:



Notice that each of the data services frameworks has its own CPU, MEM and
DISK resource allocations. If you click on the Kafka service, you will see that
three Kafka brokers have been started on three different DC/OS agent nodes.
Later, you can experiment with modifying the configuration of Kafka, Cassandra
and HDFS and add brokers, Cassandra nodes and data nodes to the services.



Spark History Server
The Spark History Server can be used to track the progress and history of the
Spark jobs you submit on the DC/OS cluster and the History Server stores its
data in the HDFS directory you created above (hdfs:///history). You will not be
using a Spark History Server package in the DC/OS Catalog for this part of the
tutorial, instead, you will start the Spark History Server using Marathon and an
application configuration file in JSON format.

From your DC/OS CLI session, create the JSON file using these commands:

$ cat > spark-history-options.json <<EOF
{
    "name": "spark-history",
    "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
}
EOF

$ dcos package install spark-history --options=spark-history-options.json --yes
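If the install fails, one quick check is that the options file is well-formed JSON with the two keys the spark-history package reads. A minimal sketch, embedding the same document the heredoc writes:

```python
import json

# The same options document written to spark-history-options.json above.
options = json.loads("""
{
  "name": "spark-history",
  "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
}
""")

# The service name and the HDFS endpoints URL are the two settings we supplied.
assert options["name"] == "spark-history"
assert options["hdfs-config-url"].endswith("/v1/endpoints")
print("spark-history-options.json looks well-formed")
```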

Once the Spark History Server starts up, you can view the history server console
by clicking on the console launch icon on the DC/OS Dashboard.

From the DC/OS Dashboard’s Services panel, view the Spark History Server
running. Place your mouse cursor just to the right of the spark-history service
name and you will see an arrow icon appear. Click on that icon to launch the
Spark History Server console.



Because you have not yet launched the Spark service on the DC/OS cluster and
you have not yet submitted any Spark jobs, you will not see any job history at
this time.

Next you will configure and deploy the Spark service on the DC/OS cluster.



Apache Spark
Apache Spark is a fast and general-purpose cluster computing system for big
data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
engine that supports general computation graphs for data analysis. It also
supports a rich set of higher-level tools including Spark SQL for SQL and
DataFrames, MLlib for machine learning, GraphX for graph processing, and
Spark Streaming for stream processing. For more information, see the Apache
Spark documentation at:

http://spark.apache.org/documentation.html

DC/OS Apache Spark consists of Apache Spark with a few custom commits. See:

https://github.com/mesosphere/spark

It also has some DC/OS-specific packaging. See:

https://github.com/mesosphere/spark-build

DC/OS Apache Spark includes:

• Mesos Cluster Dispatcher
• Spark History Server
• DC/OS Apache Spark CLI
• Interactive Spark shell

Benefits

• Utilization: DC/OS Apache Spark leverages Mesos to run Spark on the same
cluster as other DC/OS services
• Improved efficiency
• Simple Management



• Multi-team support
• Interactive analytics through notebooks
• UI integration
• Security, including file- and environment-based secrets

Features

• Multiversion support
• Run multiple Spark dispatchers
• Run against multiple HDFS clusters
• Backports of scheduling improvements
• Simple installation of all Spark components, including the dispatcher and
the history server
• Integration of the dispatcher and history server
• Zeppelin integration
• Kerberos and SSL support

You can review the Mesosphere DC/OS Spark documentation here:

https://docs.mesosphere.com/service-docs/spark/

Configure and Deploy Spark on DC/OS

In the DC/OS Dashboard, click on the Catalog menu option on the left and
display the data services packages in the DC/OS Catalog. Then click on the
Spark package.



You will see some details about the Spark service on DC/OS. Click on the REVIEW
& RUN button.

Then click on the EDIT button to modify the configuration.



The DC/OS Spark package configuration screens allow you to modify the default
configuration. In this tutorial you will be modifying the URLs for the Spark
History Server and the HDFS service.

Click on the service category and keep the name of the Spark service as spark.

Also in the service category, enter the URL to the Spark History Service that you
deployed previously. To get this URL, click on the Dashboard panel in your
DC/OS Web console. Copy that Web address into your paste buffer, but only
include up to the main hostname or IP address. Do not include the remainder of
the Web address. See below:



In the SPARK-HISTORY-SERVER-URL field, paste the contents of your paste
buffer and add the rest of the specification for the Spark History Server
like this:

<pasted Web address>/service/dev/smackstack/spark-history



Finally, change the URL to your HDFS service so that the Spark service can
download the core-site.xml and hdfs-site.xml configuration files. Enter this
value in the CONFIG-URL field:

http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints



Now that you have completed the changes needed to deploy the Spark service
on DC/OS, click the REVIEW & RUN button.

Then click the RUN SERVICE button.

After the Spark service starts up and passes its health check, you will see the
tasks running on the DC/OS Mesos cluster. Click on the Services menu option on
the left and then click on the spark service name and you will see the Spark
Mesos Dispatcher task running on one of the DC/OS agent nodes.

You can view the Spark dispatcher console by clicking on the arrow icon just to
the right of the spark service name.



The Spark Mesos Dispatcher console will display in a new Web browser tab.
Because you have not yet submitted a Spark job, no Spark drivers will be
shown.

Submit Your First Spark Job

Now that the Spark Service and the Spark History Service are running on the
DC/OS cluster, you can submit your first Spark job.



$ dcos package install spark --cli --yes

$ dcos spark run --name 'spark' --submit-args='--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history --conf spark.mesos.coarse=true --conf spark.cores.max=4 --conf spark.executor.memory=1g --driver-cores 1 --driver-memory 1g --class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar 50'

Once your Spark job is submitted successfully, you can see the Spark Driver
program that was launched by the Spark Dispatcher. Click on the Web browser
tab that contains the Spark Dispatcher console that you opened previously.
Your new job shows up in the Launched Drivers list.

While your Spark job is running, the Spark Driver program will launch tasks that
will use CPU and memory resource offers from the Mesos scheduler. Open the
DC/OS Dashboard’s Nodes panel and you will see more CPU and memory being
allocated on the Mesos cluster.

Once your Spark job is completed, it will be shown in the Finished Drivers list.
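The SparkPi example estimates π by Monte Carlo sampling: it scatters random points over the unit square and multiplies the fraction that land inside the quarter circle by 4. Stripped of Spark, the computation the executors parallelize looks roughly like this:

```python
import random

def estimate_pi(samples: int, seed: int = 42) -> float:
    """Monte Carlo pi: fraction of random points in the unit quarter-circle, times 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(200_000))   # close to 3.14159
```

The trailing 50 in the submit-args is the number of slices SparkPi spreads this sampling loop across; the per-sample arithmetic is the same.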



Submit a Spark Job that Uses HDFS

Previously, you created a 10 MB test data file and uploaded it to the HDFS
service running on your DC/OS cluster. In this section, you will submit
a Spark job that reads the contents of that file in HDFS and counts the number
of lines. From your DC/OS command line, run these commands:

$ dcos package install --cli spark

$ dcos spark run --name 'spark' --submit-args='--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history --conf spark.mesos.coarse=true --conf spark.cores.max=4 --conf spark.executor.memory=1g --driver-cores 1 --driver-memory 1g --class HDFSWordCount http://infinity-artifacts.s3.amazonaws.com/spark/sparkjob-assembly-1.0.jar hdfs:///test-data/test-data.txt'

Just like before, you can view the progress of the Spark job by viewing the tasks
on the DC/OS Dashboard, or the Spark Dispatcher Web console, or the Spark
History Server Web console. In the Spark Service task list, you can click on the
“logs” icon for the HDFSWordCount driver task and then click on the STDOUT
tab to view the results of the Spark job reading the file from HDFS.

Kafka Revisited

In this section of the tutorial, you will use the Kafka messaging environment to
show how producers can put data into Kafka topics in a reliable and redundant
fashion and how consumers can retrieve that data from the topics.

Show the current list of Kafka brokers and topics with these commands:

$ dcos package install --cli kafka --yes

$ dcos kafka broker list

$ dcos kafka topic list

Create a Kafka topic called my-topic using this command:

$ dcos kafka topic create my-topic --partitions=3 --replication=3
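The --partitions=3 flag splits the topic's message stream across three partitions; a keyed producer picks the partition deterministically from the key, which is what gives Kafka its per-key ordering guarantee. A simplified sketch of that routing (Kafka's default partitioner hashes the key with Murmur2; the CRC32 here is only illustrative):

```python
import zlib

PARTITIONS = 3   # matching --partitions=3 above

def partition_for(key: bytes) -> int:
    """Map a message key to one of the topic's partitions (illustrative hash)."""
    return zlib.crc32(key) % PARTITIONS

# Every message with the same key routes to the same partition,
# so messages for one key are consumed in the order they were produced.
for key in [b"sensor-1", b"sensor-2", b"sensor-1"]:
    print(key.decode(), "->", partition_for(key))
```

The --replication=3 flag is independent of this routing: each of the three partitions is additionally copied to three brokers for fault tolerance.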

Run a containerized application to read from the new Kafka topic. From the
DC/OS Dashboard, click on the Services menu option. Then click on the plus
sign to create a new service manually.



Click on the Single Container button to display the Run a Service page.

Fill in the following configuration settings:

SERVICE ID: kafka-consumer

CONTAINER IMAGE: mesosphere/kafka-client

CMD: echo "#### KAFKA CONSUMER ####" && ./kafka-console-consumer.sh --zookeeper master.mesos:2181/dcos-service-kafka --from-beginning --topic my-topic



Click the REVIEW & RUN button:

Then click the RUN SERVICE button to run this new service:

When your service completes the startup process, it will show up in the list of
services running in the application group.



Let’s view the output of the service you just started. Click on the service name,
kafka-consumer, and then click on the logs icon (the page icon) on the right of
the service name.

You will see the STDERR and STDOUT log files for this service. Click on the
STDOUT button to see the current standard output.



Produce some messages in the new topic using the command:

$ dcos kafka topic producer_test my-topic 100

This will generate some test data and place entries into the my-topic message
queue in Kafka.

Then go back to your STDOUT console for the kafka-consumer service and view
the Kafka messages.



As a final exercise, produce another batch of test messages and then submit a
Spark Streaming job that consumes from the my-topic queue and counts words:

$ dcos kafka topic producer_test my-topic 100

$ dcos spark run --submit-args='--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history --conf spark.mesos.coarse=true --conf spark.cores.max=4 --conf spark.executor.memory=1g --driver-cores 1 --driver-memory 1g --class org.apache.spark.examples.streaming.KafkaWordCount https://downloads.mesosphere.com/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar mesos://leader.mesos:5050 zk-1.zk,zk-2.zk,zk-3.zk my-consumer-group my-topic 1'
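The KafkaWordCount example consumes the topic in micro-batches and maintains a running count per word. Setting aside Spark Streaming and Kafka themselves, the core fold is just a counter updated per batch, sketched here with made-up messages:

```python
from collections import Counter

def word_count(batches):
    """Fold each incoming batch of messages into a running per-word count."""
    totals = Counter()
    for batch in batches:
        for message in batch:
            totals.update(message.split())
    return totals

# Simulated micro-batches, standing in for those KafkaWordCount reads from my-topic.
batches = [["hello kafka", "hello spark"], ["kafka streams data"]]
print(word_count(batches))   # Counter({'hello': 2, 'kafka': 2, ...})
```

In the real job, Spark Streaming performs this fold across executors on each batch interval and the counts survive batch boundaries via stateful updates.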

Summary

This tutorial guided you through the process of deploying the components that
make up the SMACK Stack and also showed you how to run a Spark job that
reads from the HDFS service and from the Kafka service. Additionally, this
tutorial showed you how to test the Kafka service with consumers and
producers.

If you would like to quickly deploy these components you can use the pre-built
startup script named start-smackstack.sh found here in the scripts directory:

https://github.com/gregpalmr/smack-stack-tutorial

If you would like to review the Mesosphere Advanced SMACK Stack tutorial, you
can find that here:

https://github.com/gregpalmr/smack-stack-advanced-tutorial
