Centralizing Kubernetes and Container Operations
Oleg Chunikhin | CTO, Kublr
Introductions
Oleg Chunikhin
CTO, Kublr
• 20 years in software architecture & development
• Working w/ Kubernetes since its release in 2015
• Software architect behind Kublr, an enterprise-ready
container management platform
• Twitter @olgch
History
• Custom software development company
• Dozens of projects per year
• Varying target environments: clouds, on-prem,
hybrid
• Recurring need for unified application delivery
and ops platform w/ monitoring, logs, security,
multiple env, ...
@olgch; @kublr
Docker and Kubernetes to the Rescue
• Docker is great, but local
• Kubernetes is great... when it is up and running
• Who sets up and operates K8S clusters?
• Who takes care of operational aspects at scale?
• How do you provide governance and ensure
compliance?
Enterprise Kubernetes Needs
Developers
• Self-service
• Compatible
• Conformant
• Configurable
• Open & Flexible
SRE/Ops/DevOps/SecOps
• Org multi-tenancy
• Single pane of glass
• Operations
• Monitoring
• Log collection
• Image management
• Security
• Identity management
• Reliability
• Performance
• Portability
Kubernetes Management Platform Wanted
• Portability – clouds, on-prem, hybrid, air-gapped, different OSes
• Centralized multi-cluster operations save resources – many
environments (dev, prod, QA, ...), teams, applications
• Self-service and governance for Kubernetes operations
• Reliability – cluster self-healing, self-reliance
• Limited management profile – cloud and K8S API
• Architecture – flexible, open, pluggable, compatible
• Sturdy – secure, scalable, modular, HA, DR etc.
[Diagram: platform capability map]
Group headings: Operations; Security & Governance
Labels: Automation, Infrastructure, RBAC, IAM, Ingress, Storage, Networking,
Container Registry, CI/CD, App Mgmt, Logging, Monitoring, Observability,
Air Gap, TLS Certificate Rotation, Audit, Custom Clusters, Container Runtime,
Kubernetes, Usage Reporting, API
Central Control Plane: Operations
[Diagram: central control plane and the clusters it manages]
• Control plane: API, UI
• Operations: log collection, monitoring
• K8S API
• Authn and authz, SSO, federation
• Audit, image repo, backup & DR
• Infrastructure management
• K8S clusters (Prod, PoC, Dev) in a data center and in cloud(s)
Cluster: Self-Sufficiency
[Diagram: cluster architecture]
• Central control plane
• Infrastructure automation
• Orchestration store: secrets, discovery
• KUBLR: simple orchestration and configuration agent, runs on every master and node
• MASTER: kubelet, KUBLR, Docker; overlay network, discovery, connectivity;
K8s master components: etcd, scheduler, API, controller
• NODE: kubelet, KUBLR, Docker; overlay network, discovery, connectivity;
infrastructure and application containers
Cluster: Portability
• (Almost) everything runs in containers
• Simple (single-binary) management agent
• Minimal store requirements
  • Shared, eventually consistent
  • Secure: RW files for masters, RO for nodes
  • Thus the store can be anything: S3, SA, NFS, rsynced dir, provided files, ...
• Minimal infra automation requirements
  • Configure and run configuration agent
  • Enable access to the store
  • Can be AWS CF, Azure ARM, BOSH, Ansible, ...
• Load balancer is not required for multi-master;
each agent can independently fail over to a healthy master
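The last point, agent-side failover without a load balancer, can be sketched roughly as follows. This is a minimal illustration, not Kublr's actual agent logic: the function names, the TCP-connect health probe, and the example addresses are all assumptions made for the sketch.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port) of a master API server


def is_reachable(endpoint: Endpoint, timeout: float = 1.0) -> bool:
    """Assumed health probe: True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def pick_master(
    masters: Iterable[Endpoint],
    current: Optional[Endpoint] = None,
    probe: Callable[[Endpoint], bool] = is_reachable,
) -> Optional[Endpoint]:
    """Keep the current master while it is healthy; otherwise fail over.

    Returns the first healthy master from the list, or None if none responds.
    """
    if current is not None and probe(current):
        return current
    for master in masters:
        if master != current and probe(master):
            return master
    return None
```

An agent could call pick_master periodically with the master list it discovered from the store and point the local kubelet at the result. A real probe would more likely check the API server's health endpoint over TLS than a bare TCP connect.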
Cluster: Reliability
• Rely on underlying platform as much as possible
  • ASG on AWS
  • IAM on AWS for store access
  • SA on Azure, S3 on AWS
  • ARM on Azure, CF on AWS
• Minimal infrastructure SLA: tolerate temporary failures
• Multi-master API failover on nodes
• Resource management, memory requests and limits for OS and k8s
components
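Tolerating temporary infrastructure failures usually comes down to retrying with backoff rather than failing hard. The sketch below shows one common pattern (exponential backoff with full jitter); the function name, defaults, and the choice to retry only on OSError are assumptions for illustration, not something the slides specify.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on OSError with exponential backoff and jitter.

    The delay cap doubles on every failed attempt up to max_delay, and the
    actual wait is drawn uniformly from [0, cap]. The last error is re-raised
    once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

An agent reading its configuration from the shared store could wrap that read in retry_with_backoff so that a short S3 or NFS outage delays a node instead of breaking it.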
Central Control Plane: Logs and Metrics
[Diagram: central control plane as shown earlier; this part of the talk covers
its log collection and monitoring components]
Centralized Monitoring and Log Collection.
Why Bother?
• Prometheus and ELK are heavy and not easy to operate;
each needs attention and at least 4-8 GB of RAM per cluster
• Cloud/SaaS monitoring is not always permitted or available
• Existing monitoring is often not container-aware
• No aggregated view and analysis
• No alerting governance
K8S Monitoring with Prometheus
• Discover nodes, services, pods via K8S API
• Query metrics from discovered endpoints
• Endpoints are accessed directly via internal cluster addresses
[Diagram: inside the Kubernetes cluster, Prometheus discovers targets through
the K8S API, collects metrics from nodes, pods, and services, and feeds Grafana]
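To make the discovery step concrete, here is a small sketch of turning a Kubernetes Endpoints object (as returned by the K8S API) into a list of scrape targets on internal cluster addresses. Prometheus does this itself through its built-in Kubernetes service discovery; the function below only illustrates the idea, and its name and the "metrics" port name are assumptions.

```python
from typing import Any, Dict, List


def scrape_targets(endpoints: Dict[str, Any], port_name: str = "metrics") -> List[str]:
    """Build "ip:port" targets from a Kubernetes Endpoints object.

    Only ready addresses (subsets[].addresses) are used, and only ports whose
    name matches port_name. The addresses are internal pod IPs, so the result
    is only reachable from inside the cluster network.
    """
    targets: List[str] = []
    for subset in endpoints.get("subsets") or []:
        ports = [
            p["port"]
            for p in subset.get("ports") or []
            if p.get("name") == port_name
        ]
        for address in subset.get("addresses") or []:
            for port in ports:
                targets.append(f"{address['ip']}:{port}")
    return targets
```

Because these targets are internal addresses, a Prometheus instance outside the cluster cannot scrape them directly, which is what motivates the centralized setup that follows.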
Centralized Monitoring
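[Diagram: centralized monitoring]
• Control plane: cluster registry, configurator, Prometheus config,
central Prometheus, Grafana; data can be shipped externally
• Kubernetes cluster: Prometheus (collector) scraping nodes, pods, and
service endpoints; data can be shipped externally
• Prometheus data flows from the cluster collector to the central Prometheus
through the K8S API proxy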
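One way a configurator could derive the central Prometheus configuration from a cluster registry is to emit one federation scrape job per registered cluster, reaching each in-cluster Prometheus through the Kubernetes API service proxy. The sketch below is an assumption-laden illustration, not Kublr's implementation: the registry record fields (name, api_host, token_file), the monitoring/prometheus service location, and the catch-all match[] selector are hypothetical. The scrape_config keys themselves are standard Prometheus settings.

```python
import json
from typing import Any, Dict, List


def federation_scrape_configs(
    clusters: List[Dict[str, str]],
    namespace: str = "monitoring",   # assumed namespace of in-cluster Prometheus
    service: str = "prometheus",     # assumed service name
    port: int = 9090,
) -> List[Dict[str, Any]]:
    """Build one Prometheus federation scrape job per registered cluster."""
    # Kubernetes API path that proxies a request to a service inside the cluster.
    proxy_path = (
        f"/api/v1/namespaces/{namespace}/services/{service}:{port}/proxy/federate"
    )
    configs = []
    for cluster in clusters:
        configs.append({
            "job_name": f"federate-{cluster['name']}",
            "scheme": "https",
            "metrics_path": proxy_path,
            "honor_labels": True,  # keep the labels set by the collector
            "params": {"match[]": ['{job=~".+"}']},
            "authorization": {"credentials_file": cluster["token_file"]},
            "static_configs": [{
                "targets": [cluster["api_host"]],
                # Label every series with its source cluster.
                "labels": {"cluster": cluster["name"]},
            }],
        })
    return configs


if __name__ == "__main__":
    registry = [{
        "name": "prod",
        "api_host": "api.prod.example.com:6443",
        "token_file": "/etc/tokens/prod",
    }]
    # JSON is valid YAML, so this can be pasted under scrape_configs.
    print(json.dumps(federation_scrape_configs(registry), indent=2))
```

A real configuration would also need a tls_config section for each API server's CA, and would usually narrow the match[] selector to limit the load on the API server and on the central Prometheus.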
Centralized Monitoring: Considerations
• Prometheus resource usage tuning
• Long-term storage (e.g. M3)
• Configuration file growth with many clusters
• Metrics labeling
• Additional load on API server
K8S Logging with Elasticsearch
• Fluentd runs on nodes
• OS, K8S, and container logs collected and shipped to Elasticsearch
• Kibana for visualization
[Diagram: Kubernetes cluster with pods, logs, Elasticsearch, and Kibana]
Centralized Log Collection
[Diagram: centralized log collection]
• Control plane: cluster registry, configurator, messaging config, RabbitMQ,
Shovel, port forwarding, filter, Logstash (analyze), Elasticsearch;
logs can be shipped externally
• Kubernetes cluster: K8S Proxy API, Fluentd, filter, MQTT, Forwarder,
RabbitMQ (collector), Prometheus; logs can be shipped externally
Centralized Log Collection: Considerations
• Tune Elasticsearch resource usage
• Take into account additional load on API server
• Log index structure normalization
Nested document:
{
  "data": {
    "elasticsearch": {
      "version": "6.x"
    }
  }
}
Normalized for indexing:
{
  "flatData": [
    {
      "key": "elasticsearch.version",
      "type": "string",
      "key_type": "elasticsearch.version.string",
      "value_string": "6.x"
    },
    ...
  ]
}
Source: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
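The normalization shown on this slide can be sketched as a small recursive flattening function. The string case follows the slide's example exactly; the other type names (boolean, long, double) and the fallback of converting anything else to a string are assumptions added for the sketch.

```python
from typing import Any, Dict, List, Tuple


def _typed(value: Any) -> Tuple[str, Any]:
    """Map a Python value to an (assumed) index type name and stored value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean", value
    if isinstance(value, int):
        return "long", value
    if isinstance(value, float):
        return "double", value
    return "string", str(value)


def flatten(data: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested dicts into key/type/value entries with dotted keys."""
    entries: List[Dict[str, Any]] = []
    for name, value in data.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            entries.extend(flatten(value, key))
        else:
            type_name, stored = _typed(value)
            entries.append({
                "key": key,
                "type": type_name,
                "key_type": f"{key}.{type_name}",
                f"value_{type_name}": stored,
            })
    return entries


document = {"data": {"elasticsearch": {"version": "6.x"}}}
normalized = {"flatData": flatten(document["data"])}
```

With a fixed set of fields (key, type, key_type, value_*), logs from many applications can share one index mapping instead of each arbitrary JSON structure adding new fields to it.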
The Rest: Considerations
• Identity management
Use an identity broker (e.g. Keycloak): users, authn, authz, SSO, RBAC,
federation, ...
• Backup and disaster recovery
K8s metadata + app data/volumes: full cluster recovery or copy
• Docker image management
Docker image registry (e.g. Nexus, Artifactory, Docker Hub);
image scanning;
air-gapped or isolated environments: image registry proxying and caching,
“system” images
Q&A
Take Kublr for a test drive!
kublr.com/deploy
Free non-production license.
Oleg Chunikhin
Chief Technology Officer
[email protected]
@olgch
Stay in touch! Sign up for our newsletter at kublr.com
Kublr | kublr.com | @kublr