Getting Started with SANSA-Stack

This document collects the instructions that first-time users need to obtain and use SANSA-Stack.

Set up SANSA

To help you get started quickly, SANSA provides project templates for two build tools: Maven and SBT.

Maven

Spark

Use this Maven template to generate a SANSA project using Apache Spark.

git clone https://github.com/SANSA-Stack/SANSA-Template-Maven-Spark.git
cd SANSA-Template-Maven-Spark

mvn clean package

The subsequent steps depend on your IDE. Generally, just import this repository as a Maven project and start using SANSA / Spark.

Flink

Use this Maven template to generate a SANSA project using Apache Flink.

git clone https://github.com/SANSA-Stack/SANSA-Template-Maven-Flink.git
cd SANSA-Template-Maven-Flink

mvn clean package

The subsequent steps depend on your IDE. Generally, just import this repository as a Maven project and start using SANSA / Flink.

SBT

Spark

Use this SBT template to generate a SANSA project using Apache Spark.

git clone https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark.git
cd SANSA-Template-SBT-Spark

sbt clean package

The subsequent steps depend on your IDE. Generally, just import this repository as an SBT project and start using SANSA / Spark.

Flink

Use this SBT template to generate a SANSA project using Apache Flink.

git clone https://github.com/SANSA-Stack/SANSA-Template-SBT-Flink.git
cd SANSA-Template-SBT-Flink

sbt clean package

The subsequent steps depend on your IDE. Generally, just import this repository as an SBT project and start using SANSA / Flink.

These templates set up the project structure and create the initial build files for you. Enjoy! 🙂
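
The four template repositories listed above share one naming scheme, so a script can derive the clone URL from the build tool and the engine. The following is a minimal sketch under that assumption; the function name `sansa_template_url` is hypothetical and not part of SANSA.

```shell
#!/bin/sh
# Hypothetical helper (not part of SANSA): print the Git URL of a SANSA
# project template for a build tool (Maven or SBT) and an engine (Spark or Flink).
# The URL scheme is inferred from the four template URLs listed in this guide.
sansa_template_url() {
  case "$1" in
    Maven|SBT) ;;
    *) echo "unknown build tool: $1" >&2; return 1 ;;
  esac
  case "$2" in
    Spark|Flink) ;;
    *) echo "unknown engine: $2" >&2; return 1 ;;
  esac
  printf 'https://github.com/SANSA-Stack/SANSA-Template-%s-%s.git\n' "$1" "$2"
}

# Example: print the clone command for the SBT / Spark template.
echo "git clone $(sansa_template_url SBT Spark)"
```

Rejecting unknown arguments keeps a typo from producing a URL that does not correspond to any of the templates named here.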

IDE Setup

Eclipse / Scala-IDE

1. Make sure that you have Java 8 or higher installed.
2. Install the Eclipse m2e plugin for Maven support, “m2e-egit” for Git support, and m2eclipse-scala (if they are not installed already).
3. Go to File → New Project → “Checkout Maven Projects from SCM”.
4. Set the SCM URL type to “git” and enter the URL of your repository (e.g. for https://github.com/SANSA-Stack/SANSA-RDF it is https://github.com/SANSA-Stack/SANSA-RDF.git).
5. Click “OK” and wait a while.

IntelliJ IDEA

1. Go to File → New → Project from version control → GitHub.
2. Log in to GitHub.
3. Choose github.com/SANSA-Stack/SANSA-Query.git (for example).
4. Click Clone.
5. A “Non-managed pom file found” prompt appears in the lower right.
6. Choose to add it as a Maven project.
7. Be patient while IntelliJ is “Resolving dependencies” (shown in the status bar).
8. Done.

For developers using SANSA:

Eclipse / Scala-IDE & SBT

1. To generate Eclipse project files from the sbt project, install the sbteclipse plugin and run sbt eclipse in the root of the project.
2. Once you have installed the plug-in and generated the Eclipse project files, start Eclipse.
3. Go to File → Import → General/Existing Project into Workspace.
4. Select the directory containing your project as the root directory (e.g. your clone of https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark), select the project and hit Finish.

IntelliJ IDEA & SBT

1. Go to File → New → Project from Existing Sources.
2. Select the project that you want to import (e.g. your clone of https://github.com/SANSA-Stack/SANSA-Template-SBT-Spark) and click OK.
3. Select the “Import project from external model” option and choose SBT project from the list. Click Next.
4. Select the SBT options and click Finish.

SANSA-Notebooks

Interactive Spark Notebooks can run SANSA-Examples and are easy to deploy with docker-compose. The deployment stack includes Hadoop for HDFS, Spark for running the SANSA examples, and Hue for browsing HDFS and copying files to it. The notebooks are created and run using Apache Zeppelin.

Clone the SANSA-Notebooks git repository:

git clone https://github.com/SANSA-Stack/SANSA-Notebooks
cd SANSA-Notebooks

Get the SANSA Examples jar file (requires wget):

make

Start the cluster (this downloads the BDE Docker images and will take a while):

make up

When start-up is done you will be able to access the following interfaces:

http://localhost:8080/ (Spark Master)
http://localhost:8088/home (Hue HDFS Filebrowser)
http://localhost/ (Zeppelin)

To load the data to your cluster, simply run:

make load-data

Then open Zeppelin, choose any available notebook, and try to execute it.

For more information, refer to the SANSA-Notebooks GitHub repository. If you have questions or have found a bug, feel free to open an issue on GitHub.

Configuring the Computing Frameworks

SANSA Version   Spark Version   Flink Version   Scala Version
0.8.0           3.0.x           -               2.12
0.7.1           2.4.x           -               2.11
0.6.0           2.4.x           1.8.x           2.11
0.5.0           2.4.x           1.7.x           2.11
0.4.0           2.3.x           1.5.x           2.11
0.3.0           2.2.x           1.4.x           2.11
0.2.0           2.1.x           1.3.x           2.11
0.1.0           2.0.x           1.1.x           2.11
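
A build or setup script may need to pick the framework versions that match a given SANSA release. The following is a minimal sketch that encodes the table above as a lookup; the function name `sansa_compat` and its output format are assumptions made for illustration and are not part of SANSA.

```shell
#!/bin/sh
# Hypothetical helper (not part of SANSA): print the Spark, Flink and Scala
# versions that the compatibility table lists for a given SANSA release.
# "flink=-" means the table lists no Flink version for that release.
sansa_compat() {
  case "$1" in
    0.8.0) echo "spark=3.0.x flink=- scala=2.12" ;;
    0.7.1) echo "spark=2.4.x flink=- scala=2.11" ;;
    0.6.0) echo "spark=2.4.x flink=1.8.x scala=2.11" ;;
    0.5.0) echo "spark=2.4.x flink=1.7.x scala=2.11" ;;
    0.4.0) echo "spark=2.3.x flink=1.5.x scala=2.11" ;;
    0.3.0) echo "spark=2.2.x flink=1.4.x scala=2.11" ;;
    0.2.0) echo "spark=2.1.x flink=1.3.x scala=2.11" ;;
    0.1.0) echo "spark=2.0.x flink=1.1.x scala=2.11" ;;
    *) echo "unknown SANSA version: $1" >&2; return 1 ;;
  esac
}

# Example: look up the versions for SANSA 0.8.0.
sansa_compat 0.8.0
```

The function returns a non-zero status for releases that are not in the table, so a calling script can stop instead of guessing a version.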

Using SANSA in Maven Projects

To import the full SANSA Stack for Apache Spark, add the following Maven dependency to your project's POM file:

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-stack-spark_2.12</artifactId>
  <version>$LATEST_RELEASE_VERSION$</version>
</dependency>

To use only a particular layer of the stack, note that the Maven artifact name always follows the pattern “sansa-LAYER_NAME-spark_SCALA_VERSION”, so the entry in your POM file looks as follows:

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-$LAYER_NAME$-spark_$SCALA_VERSION$</artifactId>
  <version>$LATEST_RELEASE_VERSION$</version>
</dependency>
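
The naming pattern above is easy to apply mechanically. The following is a minimal sketch that fills it in and prints the resulting dependency element; the function names `sansa_artifact_id` and `sansa_dependency` are hypothetical and not part of SANSA, and the version passed in is whatever release you intend to use.

```shell
#!/bin/sh
# Hypothetical helper (not part of SANSA): build the Maven artifact ID of a
# single SANSA layer on Spark, following the pattern
# sansa-LAYER_NAME-spark_SCALA_VERSION.
sansa_artifact_id() {
  # $1 = layer name (e.g. rdf), $2 = Scala version (e.g. 2.12)
  printf 'sansa-%s-spark_%s\n' "$1" "$2"
}

# Print a <dependency> element for a layer, a Scala version and a SANSA version.
sansa_dependency() {
  printf '<dependency>\n'
  printf '  <groupId>net.sansa-stack</groupId>\n'
  printf '  <artifactId>%s</artifactId>\n' "$(sansa_artifact_id "$1" "$2")"
  printf '  <version>%s</version>\n' "$3"
  printf '</dependency>\n'
}

# Example: the RDF layer for Scala 2.12 in SANSA 0.8.0.
sansa_dependency rdf 2.12 0.8.0
```

The sketch does not check that the layer name or version exists; it only formats the values it is given.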

For example, to use only the RDF layer in version 0.8.0 with Scala 2.12 in your project, add:

<dependency>
  <groupId>net.sansa-stack</groupId>
  <artifactId>sansa-rdf-spark_2.12</artifactId>
  <version>0.8.0</version>
</dependency>