
Deploying Jupyter Notebooks For Students and Researchers

The document discusses deploying Jupyter Notebooks for students and researchers using JupyterHub. It covers what Jupyter Notebooks and servers are, how to install JupyterHub including using Docker containers, configuring authentication with GitHub OAuth, customizing spawners, and reference deployment options like using Docker Compose. Optimizations discussed include always using SSL, a Postgres database, and pruning idle servers. When and when not to use JupyterHub in different scenarios is also covered.

Deploying Jupyter Notebooks for Students and Researchers


https://github.com/minrk/jupyterhub-pydata-2016

Min Ragan-Kelley*, Kyle Kelley, Thomas Kluyver


PyData London, 2016
git clone https://github.com/minrk/jupyterhub-pydata-2016 /srv/jupyterhub
What is a Notebook?

• Document

• Environment

• Web app

https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
What is a Notebook Server?
• Manages authentication

• Spawns single-user servers on demand

• Each user gets a complete notebook server
• Initial request is handled by Hub

• User authenticates via form / OAuth

• Spawner starts single-user server

• Hub notifies Proxy

• Redirects user to /user/[name]

• Single-user Server verifies auth with Hub
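
The request flow above can be sketched as a toy model. This is not JupyterHub's real code: the `Hub`, `Proxy`, and `Spawner` classes, the password table, and the port numbers are all hypothetical stand-ins, and the final step (the single-user server verifying auth with the Hub) is not modeled.

```python
# Toy model of the login flow; these are NOT JupyterHub's real classes.
class Proxy:
    def __init__(self):
        self.routes = {}  # URL prefix -> backend address

    def add_route(self, prefix, target):
        self.routes[prefix] = target


class Spawner:
    def __init__(self):
        self.next_port = 8000  # hypothetical starting port

    def start(self, name):
        # A real spawner would launch a process or container here.
        port = self.next_port
        self.next_port += 1
        return f"http://127.0.0.1:{port}"


class Hub:
    def __init__(self, passwords):
        self.passwords = passwords  # stand-in for form login or OAuth
        self.proxy = Proxy()
        self.spawner = Spawner()

    def login(self, name, password):
        # 1. The Hub handles the initial request and authenticates the user.
        if self.passwords.get(name) != password:
            return None
        prefix = f"/user/{name}"
        # 2. The Spawner starts a single-user server if none is running.
        if prefix not in self.proxy.routes:
            target = self.spawner.start(name)
            # 3. The Hub tells the Proxy where to send this user's traffic.
            self.proxy.add_route(prefix, target)
        # 4. The browser is redirected to the user's own server.
        return prefix


hub = Hub({"minrk": "secret"})
print(hub.login("minrk", "secret"))  # /user/minrk
print(hub.login("minrk", "wrong"))   # None
```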
Installation (as admin)
conda:
conda install -c conda-forge jupyterhub
conda install notebook

pip, npm:
python3 -m pip install jupyterhub
npm install -g configurable-http-proxy

test:
jupyterhub -h
configurable-http-proxy -h
Installation (this repo)

conda env create -f environment.yml


source activate jupyterhub-tutorial
Installation: Caveats

JupyterHub installation must be readable and executable by all users*

This is often not the case for envs, so be careful

*when using local users
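
One rough way to spot this problem is to look at the permission bits for "other" users on the environment. The sketch below is only a first check under simple assumptions: it inspects the mode bits of the path it is given and ignores parent directories, ACLs, and group membership. The helper name `world_accessible` is made up for illustration.

```python
import os
import stat
import sys


def world_accessible(path):
    """Return True if 'other' users may read and execute (or traverse) path."""
    mode = os.stat(path).st_mode
    needed = stat.S_IROTH | stat.S_IXOTH
    return mode & needed == needed


if __name__ == "__main__":
    # sys.prefix is the root of the environment running this script.
    for path in (sys.prefix, sys.executable):
        status = "ok" if world_accessible(path) else "NOT readable+executable by all"
        print(f"{path}: {status}")
```

Run it with the Python interpreter from the environment that holds JupyterHub, so that `sys.prefix` points at the installation being checked.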


Plug: conda-forge

Community-managed conda packages.

https://conda-forge.github.io

conda config --add channels conda-forge


Installation

https://docs.docker.com/engine/installation

pip install dockerspawner


docker pull jupyterhub/singleuser
JupyterHub Defaults

• Authentication: PAM (local users, passwords)

• Spawning: Local users

• Hub must run as root


Aside: SSL
• JupyterHub is an authenticated service: users log in.
That should never happen over plain HTTP.

• For testing, we can generate self-signed certificates:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout jupyterhub.key -out jupyterhub.crt

Note: Safari will not connect websockets to untrusted (self-signed) certs


Aside: Let's Encrypt
• https://letsencrypt.org/getting-started/

• Free SSL for any domain


git clone https://github.com/letsencrypt/letsencrypt
cd letsencrypt
./letsencrypt-auto certonly --standalone -d mydomain.tld

key: /etc/letsencrypt/live/mydomain.tld/privkey.pem
cert: /etc/letsencrypt/live/mydomain.tld/fullchain.pem
Start configuring JupyterHub
jupyterhub --generate-config

c.JupyterHub.ssl_key = 'jupyterhub.key'
c.JupyterHub.ssl_cert = 'jupyterhub.crt'
c.JupyterHub.port = 443
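
Before starting the Hub, it can be worth confirming that the key and certificate actually load as a pair. The following is a minimal sketch using Python's standard `ssl` module; the function name `cert_pair_loads` is hypothetical, and the file names are the ones used in the configuration above.

```python
import ssl


def cert_pair_loads(certfile, keyfile):
    """Return True if the certificate and private key load as a matching pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    except OSError as exc:  # covers missing files and ssl.SSLError
        print(f"could not load {certfile} / {keyfile}: {exc}")
        return False
    return True


if __name__ == "__main__":
    print(cert_pair_loads("jupyterhub.crt", "jupyterhub.key"))
```

This only checks that the files exist and belong together. It says nothing about whether browsers will trust the certificate.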
Installing kernels for all users

conda create -n py2 python=2 ipykernel


conda run -n py2 -- ipython kernel install

jupyter kernelspec list


Using GitHub OAuth
https://github.com/settings/applications/new
Using GitHub OAuth

In ./env:

export GITHUB_CLIENT_ID=from_github
export GITHUB_CLIENT_SECRET=from_github
export OAUTH_CALLBACK_URL=https://YOURDOMAIN/hub/oauth_callback

source ./env
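
A forgotten `source ./env` is an easy mistake, so a small pre-flight check may help. This sketch only verifies that the three variables listed above are set and non-empty; the helper name `missing_oauth_vars` is made up for illustration.

```python
import os

REQUIRED = ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET", "OAUTH_CALLBACK_URL")


def missing_oauth_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_oauth_vars()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("GitHub OAuth environment looks complete")
```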
Using GitHub OAuth
We need OAuthenticator:

python3 -m pip install oauthenticator

In jupyterhub_config.py:

from oauthenticator.github import LocalGitHubOAuthenticator


c.JupyterHub.authenticator_class = LocalGitHubOAuthenticator
c.LocalGitHubOAuthenticator.create_system_users = True
Specifying users
By default, any user that successfully authenticates is allowed to use the Hub.

This is appropriate for shared workstations with PAM Auth, but probably not GitHub:
# set of users allowed to use the Hub
c.Authenticator.whitelist = {'minrk', 'takluyver'}

# set of users who can administer the Hub itself
c.Authenticator.admin_users = {'minrk'}
Custom Authenticators
Using DockerSpawner
We need DockerSpawner:

python3 -m pip install dockerspawner netifaces


docker pull jupyterhub/singleuser

In jupyterhub_config.py:

from oauthenticator.github import GitHubOAuthenticator


c.JupyterHub.authenticator_class = GitHubOAuthenticator

from dockerspawner import DockerSpawner


c.JupyterHub.spawner_class = DockerSpawner
Using DockerSpawner
from dockerspawner import DockerSpawner
c.JupyterHub.spawner_class = DockerSpawner

# The Hub's API listens on localhost by default,
# but docker containers can't see that.
# Tell the Hub to listen on its docker network:
import netifaces
docker0 = netifaces.ifaddresses('docker0')
docker0_ipv4 = docker0[netifaces.AF_INET][0]
c.JupyterHub.hub_ip = docker0_ipv4['addr']
Using DockerSpawner

• There is *loads* to configure with Docker

• Networking configuration

• Data volumes
• DockerSpawner.container_image = 'jupyterhub/singleuser'
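
As an illustration of the networking and data-volume settings, the sketch below sets a handful of DockerSpawner options on the config object. Treat the details as assumptions: the helper `apply_docker_settings` is made up, the `/home/jovyan/work` path assumes the home layout of the single-user image, and option names can differ between DockerSpawner versions.

```python
def apply_docker_settings(c, network_name="jupyterhub-network"):
    """Set a few DockerSpawner options on a JupyterHub config object `c`."""
    c.DockerSpawner.container_image = "jupyterhub/singleuser"
    # Networking: attach user containers to a named Docker network
    # and reach them by their address on that network.
    c.DockerSpawner.network_name = network_name
    c.DockerSpawner.use_internal_ip = True
    # Data volumes: one named volume per user, mounted where notebooks live.
    notebook_dir = "/home/jovyan/work"  # assumed layout of the image
    c.DockerSpawner.notebook_dir = notebook_dir
    c.DockerSpawner.volumes = {"jupyterhub-user-{username}": notebook_dir}
```

In jupyterhub_config.py this would be called as `apply_docker_settings(c)`, where `c` is the config object JupyterHub provides.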
Customizing Spawners
JupyterHub with supervisor
apt-get install supervisor

# /etc/supervisor/conf.d/jupyterhub.conf
[program:jupyterhub]
command=bash launch.sh
directory=/srv/jupyterhub
autostart=true
autorestart=true
startretries=3
exitcodes=0,2
stopsignal=TERM
redirect_stderr=true
stdout_logfile=/var/log/jupyterhub.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=10
user=root

#!/usr/bin/env bash
# /srv/jupyterhub/launch.sh
set -e
source env
exec jupyterhub $@
Reference Deployments
https://github.com/jupyterhub/jupyterhub-deploy-docker
docker-compose, DockerSpawner, Hub in Docker

https://github.com/jupyterhub/jupyterhub-deploy-teaching
ansible, no docker, nbgrader
Docker Deployment
• Docker Compose: https://docs.docker.com/compose/install/

• git clone https://github.com/jupyterhub/jupyterhub-deploy-docker

• Create a network:
docker network create jupyterhub-network

• Create a volume for secrets:


docker volume create --name jupyterhub-secrets

• Create a data volume:


docker volume create --name jupyterhub-data
Docker Deployment

• mkdir secrets

• Copy SSL key, cert to:

• secrets/jupyterhub.cer (cert)

• secrets/jupyterhub.key (key)
Docker Deployment

Make userlist:

minrk admin
takluyver
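
A userlist in this format can be turned into the two sets used earlier for `c.Authenticator.whitelist` and `c.Authenticator.admin_users`. The parser below is a minimal sketch that assumes one username per line with an optional `admin` marker. The function name `parse_userlist` is made up for illustration, and the reference deployment may read the file differently.

```python
def parse_userlist(text):
    """Split userlist text into (all users, admins)."""
    users, admins = set(), set()
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        users.add(name)
        if len(parts) > 1 and parts[1] == "admin":
            admins.add(name)
    return users, admins


users, admins = parse_userlist("minrk admin\ntakluyver\n")
print(sorted(users), sorted(admins))  # ['minrk', 'takluyver'] ['minrk']
```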
Docker Deployment

Launch: 🚀

docker-compose up
Optimizations
and best practices
• Always use SSL!

• Use postgres for the Hub database

• Put nginx in front of the proxy

• Run cull-idle-servers service to prune resources

• Global configuration in /etc/jupyter and /etc/ipython

• Back up your user data!!!
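
The core decision behind pruning idle servers can be sketched in a few lines. This is not the cull-idle-servers script itself: the record shape (`name`, `server`, `last_activity`), the one-hour timeout, and the `student3` user are assumptions made for illustration, and a real service would get this information from the Hub and then ask it to stop the servers.

```python
from datetime import datetime, timedelta, timezone


def idle_users(users, timeout, now=None):
    """Return names of users whose running server has been idle longer than timeout."""
    now = now or datetime.now(timezone.utc)
    idle = []
    for user in users:
        if not user.get("server"):
            continue  # nothing running, nothing to stop
        if now - user["last_activity"] > timeout:
            idle.append(user["name"])
    return idle


now = datetime(2016, 5, 7, 12, 0, tzinfo=timezone.utc)
records = [
    {"name": "minrk", "server": "/user/minrk", "last_activity": now - timedelta(hours=2)},
    {"name": "takluyver", "server": "/user/takluyver", "last_activity": now - timedelta(minutes=5)},
    {"name": "student3", "server": None, "last_activity": now - timedelta(days=3)},
]
print(idle_users(records, timedelta(hours=1), now=now))  # ['minrk']
```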


When to use JupyterHub
• A class where students can do homework
(nbgrader)

• A short-lived workshop, especially if installation is hard

• A research group with a shared workstation or small cluster

• On-site computing resources for researchers and analysts at an institution
When not to use JupyterHub

• JupyterHub is Authenticated and Persistent

• tmpnb: anonymous, ephemeral notebooks

• binder: tmpnb + GitHub repos

• SageMathCloud is hosted and provides realtime collaboration
API
