OpenShift Container Platform 3.11 Day Two Operations Guide
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons
Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is
available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you
distribute this document or an adaptation of it, you must
provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert,
Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift,
Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States
and other countries.
Linux ® is the registered trademark of Linus Torvalds in the United States and other countries.
XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States
and/or other countries.
MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and
other countries.
Node.js ® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the
official Joyent Node.js open source or commercial project.
The OpenStack ® Word Mark and OpenStack logo are either registered trademarks/service marks
or trademarks/service marks of the OpenStack Foundation, in the United States and other
countries and are used with the OpenStack Foundation's permission. We are not affiliated with,
endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
Abstract
While the OpenShift Container Platform Cluster administration guide is focused more on
configuration, this guide will describe an overview of common daily maintenance tasks.
Table of Contents
CHAPTER 1. OVERVIEW
CHAPTER 2. RUN-ONCE TASKS
2.1. NTP SYNCHRONIZATION
2.2. ENTROPY
2.3. CHECKING THE DEFAULT STORAGE CLASS
CHAPTER 3. ENVIRONMENT HEALTH CHECKS
3.1. CHECKING COMPLETE ENVIRONMENT HEALTH
Procedure
3.2. CREATING ALERTS USING PROMETHEUS
3.3. HOST HEALTH
3.4. ROUTER AND REGISTRY HEALTH
3.5. NETWORK CONNECTIVITY
3.5.1. Connectivity on master hosts
3.5.2. Connectivity on node instances
Procedure
3.6. STORAGE
3.7. DOCKER STORAGE
3.8. API SERVICE STATUS
3.9. CONTROLLER ROLE VERIFICATION
3.10. VERIFYING CORRECT MAXIMUM TRANSMISSION UNIT (MTU) SIZE
Prerequisites
CHAPTER 4. CREATING AN ENVIRONMENT-WIDE BACKUP
4.1. CREATING A MASTER HOST BACKUP
Procedure
4.2. CREATING A NODE HOST BACKUP
Procedure
4.3. BACKING UP REGISTRY CERTIFICATES
Procedure
4.4. BACKING UP OTHER INSTALLATION FILES
Procedure
4.5. BACKING UP APPLICATION DATA
Procedure
4.6. ETCD BACKUP
4.6.1. Backing up etcd
4.6.1.1. Backing up etcd configuration files
Procedure
4.6.1.2. Backing up etcd data
Prerequisites
Procedure
4.7. BACKING UP A PROJECT
Procedure
4.8. BACKING UP PERSISTENT VOLUME CLAIMS
Procedure
CHAPTER 5. HOST-LEVEL TASKS
5.1. ADDING A HOST TO THE CLUSTER
5.2. MASTER HOST TASKS
5.2.1. Deprecating a master host
5.2.1.1. Creating a master host backup
Procedure
5.2.1.2. Backing up etcd
5.2.1.2.1. Backing up etcd configuration files
Procedure
5.2.1.2.2. Backing up etcd data
Prerequisites
Procedure
5.2.1.3. Deprecating a master host
Procedure
5.2.1.4. Removing an etcd host
Procedure
Procedure
5.2.2. Creating a master host backup
Procedure
5.2.3. Restoring a master host backup
Procedure
5.3. NODE HOST TASKS
5.3.1. Deprecating a node host
Prerequisites
Procedure
5.3.1.1. Replacing a node host
5.3.2. Creating a node host backup
Procedure
5.3.3. Restoring a node host backup
Procedure
5.3.4. Node maintenance and next steps
5.4. ETCD TASKS
5.4.1. etcd backup
5.4.1.1. Backing up etcd
5.4.1.1.1. Backing up etcd configuration files
Procedure
5.4.1.1.2. Backing up etcd data
Prerequisites
Procedure
5.4.2. Restoring etcd
5.4.2.1. Restoring the etcd configuration file
5.4.2.2. Restoring etcd data
5.4.3. Replacing an etcd host
5.4.4. Scaling etcd
Prerequisites
5.4.4.1. Adding a new etcd host using Ansible
Procedure
5.4.4.2. Manually adding a new etcd host
Procedure
Modify the current etcd cluster
Modify the new etcd host
Modify each OpenShift Container Platform master
5.4.5. Removing an etcd host
Procedure
Procedure
CHAPTER 6. PROJECT-LEVEL TASKS
6.1. BACKING UP A PROJECT
Procedure
6.2. RESTORING A PROJECT
Procedure
6.3. BACKING UP PERSISTENT VOLUME CLAIMS
Procedure
6.4. RESTORING PERSISTENT VOLUME CLAIMS
6.4.1. Restoring files to an existing PVC
Procedure
6.4.2. Restoring data to a new PVC
Procedure
6.5. PRUNING IMAGES AND CONTAINERS
CHAPTER 7. DOCKER TASKS
7.1. INCREASING CONTAINER STORAGE
7.1.1. Evacuating the node
7.1.2. Increasing storage
Prerequisites
Procedure
7.1.3. Changing the storage backend
7.1.3.1. Evacuating the node
7.2. MANAGING CONTAINER REGISTRY CERTIFICATES
7.2.1. Installing a certificate authority certificate for external registries
Procedure
7.2.2. Docker certificates backup
Procedure
7.2.3. Docker certificates restore
7.3. MANAGING CONTAINER REGISTRIES
7.3.1. Docker search external registries
Procedure
7.3.2. Docker external registries whitelist and blacklist
Procedure
7.3.3. Secure registries
7.3.4. Insecure registries
Procedure
7.3.5. Authenticated registries
Procedure
7.3.6. ImagePolicy admission plug-in
Procedure
7.3.7. Import images from external registries
Procedure
7.3.8. OpenShift Container Platform registry integration
7.3.8.1. Connect the registry project with the cluster
Procedure
CHAPTER 8. MANAGING CERTIFICATES
8.1. CHANGING AN APPLICATION’S SELF-SIGNED CERTIFICATE TO CA-SIGNED CERTIFICATE
CHAPTER 1. OVERVIEW
This section is built for OpenShift Container Platform administrators with a fresh installation.
While the OpenShift Container Platform Cluster administration guide is focused more on configuration,
this guide describes an overview of common daily maintenance tasks.
CHAPTER 2. RUN-ONCE TASKS
While these are classified as run-once tasks, you can perform any of these at any time if any
circumstances change.
2.1. NTP SYNCHRONIZATION
NOTE
The OpenShift Container Platform installation playbooks install, enable, and configure
the ntp package to provide NTP service by default. To disable this behavior, set
openshift_clock_enabled=false in the inventory file. If a host has the chrony package
already installed, it is configured to provide NTP service instead of using the ntp package.
Depending on your instance, NTP might not be enabled by default. To verify that a host is configured to
use NTP:
$ timedatectl
Local time: Thu 2017-12-21 14:58:34 UTC
Universal time: Thu 2017-12-21 14:58:34 UTC
RTC time: Thu 2017-12-21 14:58:34
Time zone: Etc/UTC (UTC, +0000)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: n/a
If both NTP enabled and NTP synchronized are yes, then NTP synchronization is active.
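The same check can be scripted across hosts. The following is a minimal sketch, not part of the product tooling: the function name ntp_status is hypothetical, and it assumes the timedatectl field labels shown above (other systemd versions label these fields differently).

```shell
# Print "active" only if both NTP fields in timedatectl output read "yes".
# The output is read from stdin so the function can be exercised offline.
ntp_status() {
    awk -F': *' '
        /NTP enabled/      { enabled = $2 }
        /NTP synchronized/ { synced  = $2 }
        END { print ((enabled == "yes" && synced == "yes") ? "active" : "inactive") }
    '
}

# Usage on a host (assumption: timedatectl is in PATH):
#   timedatectl | ntp_status
```

Reading from stdin keeps the parsing separate from the command itself, so the same function can be run against output collected from several hosts.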
IMPORTANT
Time synchronization should be enabled on all hosts in the cluster, whether using NTP or
any other method.
For more information about the timedatectl command, timezones, and clock configuration, see
Configuring the date and time and UTC, Timezones, and DST.
2.2. ENTROPY
OpenShift Container Platform uses entropy to generate random numbers for objects such as IDs or SSL
traffic. These operations wait until there is enough entropy to complete the task. Without enough
entropy, the kernel is not able to generate these random numbers with sufficient speed, which can lead
to timeouts and the refusal of secure connections.
$ cat /proc/sys/kernel/random/entropy_avail
2683
The available entropy should be verified on all node hosts in the cluster. Ideally, this value should be
above 1000.
NOTE
Red Hat recommends monitoring this value and issuing an alert if the value is under 800.
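As a sketch of the monitoring rule described above, the following hypothetical helper classifies a reading from entropy_avail. The thresholds come from the text; the function name and the three labels are assumptions chosen for illustration.

```shell
# Classify an entropy_avail reading:
#   below 800      -> "alert" (the recommended alerting threshold)
#   800 to 1000    -> "low"   (not above the ideal level of 1000)
#   above 1000     -> "ok"
entropy_level() {
    local avail=$1
    if [ "$avail" -lt 800 ]; then
        echo "alert"
    elif [ "$avail" -le 1000 ]; then
        echo "low"
    else
        echo "ok"
    fi
}

# Usage on a node host:
#   entropy_level "$(cat /proc/sys/kernel/random/entropy_avail)"
```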
Alternatively, you can use the rngtest command to check not only the available entropy but also whether your
system can feed enough entropy:
If the above takes around 30 seconds to complete, then there is not enough entropy available.
Depending on your environment, entropy can be increased in multiple ways. For more information, see
the following blog post: https://developers.redhat.com/blog/2017/10/05/entropy-rhel-based-cloud-instances/.
Generally, you can increase entropy by installing the rng-tools package and enabling the rngd service:
Once the rngd service has started, entropy should increase to a sufficient level.
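A minimal sketch of installing rng-tools and enabling rngd follows, assuming a yum-based host and root privileges. The wrapper function enable_rngd and the RUN dry-run variable are hypothetical conveniences, and the -y flag is added here only to make the install non-interactive.

```shell
# Install rng-tools, then enable rngd at boot and start it immediately.
# Set RUN=echo to print the commands instead of executing them (dry run).
enable_rngd() {
    ${RUN:-} yum install -y rng-tools &&
    ${RUN:-} systemctl enable --now rngd
}

# Usage as root:  enable_rngd
# Dry run:        RUN=echo enable_rngd
```

After the service starts, re-read /proc/sys/kernel/random/entropy_avail to confirm that the value has risen.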
2.3. CHECKING THE DEFAULT STORAGE CLASS
$ oc get storageclass
NAME TYPE
ssd kubernetes.io/gce-pd
The above output is taken from an OpenShift Container Platform instance running on GCP, where two
kinds of persistent storage are available: standard (HDD) and SSD. Notice the standard storage class is
configured as the default. If there is no storage class defined, or none is set as a default, see the
Dynamic Provisioning and Creating Storage Classes section for instructions on how to set up a storage
class as suggested.
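To script the check for a default storage class, a sketch such as the following might be used. It assumes that the tabular oc get storageclass output marks the default class with (default) after its name; the function name is hypothetical.

```shell
# Print the name of the storage class marked "(default)" in the tabular
# output of `oc get storageclass`, read from stdin. Prints nothing if no
# class is marked as the default.
default_storage_class() {
    awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage on a host with cluster access:
#   oc get storageclass | default_storage_class
```

An empty result suggests that no default is set, which is the case the paragraph above directs you to fix.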
CHAPTER 3. ENVIRONMENT HEALTH CHECKS
Knowing the verification process for the various components is the first step to troubleshooting issues.
If experiencing issues, you can use the checks provided in this section to diagnose any problems.
3.1. CHECKING COMPLETE ENVIRONMENT HEALTH
Procedure
1. Create a new project named validate, as well as an example application from the
cakephp-mysql-example template:
$ oc new-project validate
$ oc new-app cakephp-mysql-example
$ oc logs -f bc/cakephp-mysql-example
2. Once the build is complete, two pods should be running: a database and an application:
$ oc get pods
NAME READY STATUS RESTARTS AGE
cakephp-mysql-example-1-build 0/1 Completed 0 1m
cakephp-mysql-example-2-247xm 1/1 Running 0 39s
mysql-1-hbk46 1/1 Running 0 1m
3. Visit the application URL. The CakePHP framework welcome page should be visible. The URL
should have the following format: cakephp-mysql-example-validate.<app_domain>.
4. Once the functionality has been verified, the validate project can be deleted:
3.3. HOST HEALTH
$ oc get nodes
NAME STATUS AGE VERSION
ocp-infra-node-1clj Ready 1h v1.6.1+5115d708d7
ocp-infra-node-86qr Ready 1h v1.6.1+5115d708d7
ocp-infra-node-g8qw Ready 1h v1.6.1+5115d708d7
ocp-master-94zd Ready 1h v1.6.1+5115d708d7
ocp-master-gjkm Ready 1h v1.6.1+5115d708d7
ocp-master-wc8w Ready 1h v1.6.1+5115d708d7
ocp-node-c5dg Ready 1h v1.6.1+5115d708d7
ocp-node-ghxn Ready 1h v1.6.1+5115d708d7
ocp-node-w135 Ready 1h v1.6.1+5115d708d7
The above cluster example consists of three master hosts, three infrastructure node hosts, and three
node hosts. All of them are running. All hosts in the cluster should be visible in this output.
The Ready status means that master hosts can communicate with node hosts and that the nodes are
ready to run pods (excluding the nodes in which scheduling is disabled).
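When a cluster has many hosts, it can help to list only those that are not Ready. The following sketch assumes the column layout shown above, with STATUS in the second column; the function name is hypothetical.

```shell
# Print the names of nodes whose STATUS does not start with "Ready".
# Expects `oc get nodes` tabular output on stdin. A status such as
# "Ready,SchedulingDisabled" still counts as ready.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage on a host with cluster access:
#   oc get nodes | not_ready_nodes
```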
# source /etc/etcd/etcd.conf
You can check the basic etcd health status from any master instance with the etcdctl command:
However, to get more information about etcd hosts, including the associated master host:
All etcd hosts should contain the master host name if the etcd cluster is co-located with master
services, or all etcd instances should be visible if etcd is running separately.
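One possible form of these etcd checks is sketched below. It assumes the etcd v2 etcdctl flags, that sourcing /etc/etcd/etcd.conf defines ETCD_PEER_CERT_FILE, ETCD_PEER_KEY_FILE, and ETCD_LISTEN_CLIENT_URLS, and that the CA certificate is at /etc/etcd/ca.crt; verify each against your installation. The wrapper function and the RUN dry-run variable are hypothetical.

```shell
# Run an etcdctl (v2 API) subcommand with the TLS settings taken from
# the environment populated by `source /etc/etcd/etcd.conf`.
# Assumption: the CA certificate lives at /etc/etcd/ca.crt.
# Set RUN=echo to print the command instead of executing it (dry run).
etcd_v2() {
    ${RUN:-} etcdctl \
        --cert-file="$ETCD_PEER_CERT_FILE" \
        --key-file="$ETCD_PEER_KEY_FILE" \
        --ca-file=/etc/etcd/ca.crt \
        --endpoints="$ETCD_LISTEN_CLIENT_URLS" \
        "$@"
}

# Usage as root on a master host, after sourcing /etc/etcd/etcd.conf:
#   etcd_v2 cluster-health   # basic health status
#   etcd_v2 member list      # member details, including host names
```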
NOTE
etcdctl2 is an alias for the etcdctl tool that contains the proper flags to query the etcd
cluster in the v2 data model. Likewise, etcdctl3 is the alias for the v3 data model.
3.4. ROUTER AND REGISTRY HEALTH
The values in the DESIRED and CURRENT columns should match the number of node hosts.
NOTE
Multiple running instances of the container image registry require backend storage
supporting writes by multiple processes. If the chosen infrastructure provider does not
contain this ability, running a single instance of a container image registry is acceptable.
NOTE
If OpenShift Container Platform is using an external container image registry, the internal
registry service does not need to be running.
3.5. NETWORK CONNECTIVITY
NOTE
Due to the complexity of networking, not all verification scenarios are covered in this
section.
3.5.1. Connectivity on master hosts
Master services keep their state synchronized using the etcd key-value store. Communication between
master and etcd services is important, whether those etcd services are collocated on master hosts, or
running on hosts designated only for the etcd service. This communication happens on TCP ports 2379
and 2380. See the Host health section for methods to check this communication.
SkyDNS
SkyDNS provides name resolution of local services running in OpenShift Container Platform. This
service uses TCP and UDP port 8053.
If the answer matches the output of the following, SkyDNS service is working correctly:
Both the API service and web console share the same port, usually TCP 8443 or 443, depending on the
setup. This port needs to be available within the cluster and to everyone who needs to work with the
deployed environment. The URLs under which this port is reachable may differ for the internal cluster
and for external clients.
$ curl -k https://internal-master.example.com:443/version
{
"major": "1",
"minor": "6",
"gitVersion": "v1.6.1+5115d708d7",
"gitCommit": "fff65cf",
"gitTreeState": "clean",
"buildDate": "2017-10-11T22:44:25Z",
"goVersion": "go1.7.6",
"compiler": "gc",
"platform": "linux/amd64"
}
$ curl -k https://master.example.com:443/healthz
ok
The following commands can be used to determine the URLs that are used by the internal cluster and
external clients. Example URLs are found in the previous example.
3.5.2. Connectivity on node instances
To verify node host functionality, create a new application. The following example ensures the node
reaches the container image registry, which is running on an infrastructure node:
Procedure
$ oc new-project sdn-test
$ oc new-app centos/httpd-24-centos7~https://github.com/sclorg/httpd-ex
$ oc get pods
NAME READY STATUS RESTARTS AGE
httpd-ex-1-205hz 1/1 Running 0 34s
httpd-ex-1-build 0/1 Completed 0 1m
$ oc rsh po/<pod-name>
For example:
$ oc rsh po/httpd-ex-1-205hz
sh-4.2$ exit
6. The node host is listening on TCP port 10250. This port needs to be reachable by all master
hosts on any node, and if monitoring is deployed in the cluster, the infrastructure nodes must
have access to this port on all instances as well. Broken communication on this port can be
detected with the following command:
$ oc get nodes
NAME STATUS AGE VERSION
ocp-infra-node-1clj Ready 4d v1.6.1+5115d708d7
ocp-infra-node-86qr Ready 4d v1.6.1+5115d708d7
ocp-infra-node-g8qw Ready 4d v1.6.1+5115d708d7
ocp-master-94zd Ready,SchedulingDisabled 4d v1.6.1+5115d708d7
ocp-master-gjkm Ready,SchedulingDisabled 4d v1.6.1+5115d708d7
ocp-master-wc8w Ready,SchedulingDisabled 4d v1.6.1+5115d708d7
ocp-node-c5dg Ready 4d v1.6.1+5115d708d7
ocp-node-ghxn Ready 4d v1.6.1+5115d708d7
ocp-node-w135 NotReady 4d v1.6.1+5115d708d7
In the output above, the node service on the ocp-node-w135 node is not reachable by the
master services, which is represented by its NotReady status.
7. The last service is the router, which is responsible for routing connections to the correct services
running in the OpenShift Container Platform cluster. Routers listen on TCP ports 80 and 443 on
infrastructure nodes for ingress traffic. Before routers can start working, DNS must be
configured:
$ dig *.apps.example.com
;; OPT PSEUDOSECTION:
;; ANSWER SECTION:
*.apps.example.com. 3571 IN CNAME apps.example.com.
apps.example.com. 3561 IN A 35.xx.xx.92
The IP address, in this case 35.xx.xx.92, should be pointing to the load balancer distributing
ingress traffic to all infrastructure nodes. To verify the functionality of the routers, check the
registry service once more, but this time from outside the cluster:
3.6. STORAGE
Master instances need at least 40 GB of hard disk space for the /var directory. Check the disk usage of
a master host using the df command:
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 xfs 45G 2.8G 43G 7% /
devtmpfs devtmpfs 3.6G 0 3.6G 0% /dev
tmpfs tmpfs 3.6G 0 3.6G 0% /dev/shm
tmpfs tmpfs 3.6G 63M 3.6G 2% /run
tmpfs tmpfs 3.6G 0 3.6G 0% /sys/fs/cgroup
tmpfs tmpfs 732M 0 732M 0% /run/user/1000
tmpfs tmpfs 732M 0 732M 0% /run/user/0
Node instances need at least 15 GB space for the /var directory, and at least another 15 GB for Docker
storage (/var/lib/docker in this case). Depending on the size of the cluster and the amount of
ephemeral storage desired for pods, a separate partition should be created for
/var/lib/origin/openshift.local.volumes on the nodes.
$ df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 xfs 25G 2.4G 23G 10% /
devtmpfs devtmpfs 3.6G 0 3.6G 0% /dev
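The minimum sizes above can be checked with a small script. The following sketch uses hypothetical function names and reads the POSIX df -Pk format, in which the second column is the file system size in 1024-byte blocks; it reports whole GiB, which is slightly stricter than decimal GB.

```shell
# Print the size, in whole GiB, of the file system described by
# `df -Pk <path>` output read from stdin (header line plus one data line).
fs_size_gib() {
    awk 'NR == 2 { printf "%d\n", $2 / 1048576 }'
}

# Succeed if the file system holding a path is at least min_gib in size.
has_min_size() {
    local path=$1 min_gib=$2
    [ "$(df -Pk "$path" | fs_size_gib)" -ge "$min_gib" ]
}

# Usage on a master host (40 is the minimum from the text):
#   has_min_size /var 40 || echo "/var is smaller than 40 GB"
# Usage on a node host:
#   has_min_size /var 15 && has_min_size /var/lib/docker 15
```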
Persistent storage for pods should be handled outside of the instances running the OpenShift
Container Platform cluster. Persistent volumes for pods can be provisioned by the infrastructure
provider, or with the use of container native storage or container ready storage.
3.7. DOCKER STORAGE
The Docker storage disk is mounted as /var/lib/docker and formatted with the xfs file system. Docker
storage is configured to use the overlay2 file system:
$ cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS='--storage-driver overlay2'
# docker info
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 4
Server Version: 1.12.6
Storage Driver: overlay2
Backing Filesystem: xfs
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: overlay host bridge null
Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Security Options: seccomp selinux
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: Employee SKU
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 2
Total Memory: 7.147 GiB
Name: ocp-infra-node-1clj
ID: T7T6:IQTG:WTUX:7BRU:5FI4:XUL5:PAAM:4SLW:NWKL:WU2V:NQOW:JPHC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
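To confirm the storage driver on many hosts without reading the full docker info output, a sketch such as the following can help. The function name is hypothetical, and the Storage Driver label is assumed to match the output above.

```shell
# Print the value of the "Storage Driver" field from `docker info`
# output read on stdin.
docker_storage_driver() {
    awk -F': *' '/^ *Storage Driver:/ { print $2; exit }'
}

# Usage as root on a node host:
#   docker info 2>/dev/null | docker_storage_driver
```

On a host configured as described in this section, the expected value is overlay2.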
3.8. API SERVICE STATUS
The API service exposes a health check, which can be queried externally using the API host name. Both
the API service and web console share the same port, usually TCP 8443 or 443, depending on the
setup. This port needs to be available within the cluster and to everyone who needs to work with the
deployed environment:
$ curl -k https://myserver.com:443/healthz 1
ok
1 This must be reachable from the client’s network. The web console port in this example is 443.
Specify the value set for openshift_master_console_port in the host inventory file prior to
OpenShift Container Platform deployment. If openshift_master_console_port is not included in
the inventory file, port 8443 is set by default.
3.9. CONTROLLER ROLE VERIFICATION
The OpenShift Container Platform controllers execute a procedure to choose which host runs the
service. The current running value is stored in an annotation in a special configmap stored in the kube-
system project.
Verify the master host running the controller service as a cluster-admin user:
10.19.115.212-dnwrtcl4","leaseDurationSeconds":15,"acquireTime":"2018-02-
17T18:16:54Z","renewTime":"2018-02-19T13:50:33Z","leaderTransitions":16}'
creationTimestamp: 2018-02-02T10:30:04Z
name: openshift-master-controllers
namespace: kube-system
resourceVersion: "17349662"
selfLink: /api/v1/namespaces/kube-system/configmaps/openshift-master-controllers
uid: 08636843-0804-11e8-8580-fa163eb934f0
master-<hostname>-<ip>-<8_random_characters>
Find the hostname of the master host by filtering the output using the following:
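The filtering step can be sketched as follows, assuming the holderIdentity format shown above. The function name and the example host name are hypothetical, and extracting holderIdentity from the configmap annotation is left out of the sketch.

```shell
# Extract the host name from a holderIdentity string of the form
#   master-<hostname>-<ip>-<8_random_characters>
# The string is passed as the first argument.
controller_host() {
    printf '%s\n' "$1" | sed -E 's/^master-(.*)-[0-9.]+-[0-9a-z]{8}$/\1/'
}

# Example with a hypothetical host name:
#   controller_host "master-ocp-master-0.example.com-10.19.115.212-dnwrtcl4"
```

The pattern anchors on the trailing IP address and the eight-character suffix, so host names that themselves contain hyphens are preserved.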
3.10. VERIFYING CORRECT MAXIMUM TRANSMISSION UNIT (MTU) SIZE
When a packet that is larger than the MTU size is transmitted over HTTP, the physical network router is
able to break the packet into multiple packets to transmit the data. However, when a packet that is larger
than the MTU size is transmitted over HTTPS, the router is forced to drop the packet.
Installation produces certificates that provide secure connections to multiple components that include:
master hosts
node hosts
infrastructure nodes
registry
router
These certificates can be found within the /etc/origin/master directory for the master nodes and
/etc/origin/node directory for the infra and app nodes.
Prerequisites
2. Append /healthz to the value given above and use it to check all hosts (master, infrastructure,
node):
$ curl -v https://docker-registry.default.svc:5000/healthz
* About to connect() to docker-registry.default.svc port 5000 (#0)
* Trying 172.30.11.171...
* Connected to docker-registry.default.svc (172.30.11.171) port 5000 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=172.30.11.171
* start date: Oct 18 05:30:10 2017 GMT
* expire date: Oct 18 05:30:11 2019 GMT
* common name: 172.30.11.171
* issuer: CN=openshift-signer@1508303629
> GET /healthz HTTP/1.1
> User-Agent: curl/7.29.0
> Host: docker-registry.default.svc:5000
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Date: Tue, 24 Oct 2017 19:42:35 GMT
< Content-Length: 0
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host docker-registry.default.svc left intact
The example output above shows that the MTU size in use allows the SSL connection to complete.
The connection attempt succeeds, connectivity is established, NSS is initialized with the
certpath, and all of the server certificate information for the docker-registry is returned.
$ curl -v https://docker-registry.default.svc:5000/healthz
* About to connect() to docker-registry.default.svc port 5000 (#0)
* Trying 172.30.11.171...
* Connected to docker-registry.default.svc (172.30.11.171) port 5000 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
The above example shows that the connection is established, but NSS cannot finish initializing
with the certpath. The cause is an improper MTU size set within the appropriate node
configuration map.
To fix this issue, adjust the MTU size within the node configuration map to 50 bytes smaller than
the MTU size that the OpenShift SDN Ethernet device uses.
3. View the MTU size of the desired Ethernet device (for example, eth0):
4. To change the MTU size, modify the appropriate node configuration map and set a value that is
50 bytes smaller than the output provided by the ip command.
For example, if the MTU size is set to 1500, adjust the MTU size to 1450 within the node
configuration map:
networkConfig:
mtu: 1450
NOTE
You must change the MTU size on all masters and nodes that are part of the
OpenShift Container Platform SDN. Also, the MTU size of the tun0 interface
must be the same across all nodes that are part of the cluster.
6. Once the node is back online, confirm the issue no longer exists by re-running the original curl
command.
$ curl -v https://docker-registry.default.svc:5000/healthz
If the timeout persists, continue to adjust the MTU size in increments of 50 bytes and repeat the
process.
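The arithmetic in this procedure can be sketched as follows. The function names are hypothetical, and the sketch assumes that the ip link show output contains the keyword mtu followed by the value, as it does on Red Hat Enterprise Linux 7.

```shell
# Print the MTU value found in `ip link show <device>` output read on stdin.
link_mtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "mtu") { print $(i + 1); exit } }'
}

# Print the SDN MTU for a given device MTU: 50 bytes smaller, per the text.
sdn_mtu() {
    echo $(( $1 - 50 ))
}

# Usage on a node host (eth0 is an example device name):
#   sdn_mtu "$(ip link show eth0 | link_mtu)"
```

The printed value is the one to set as mtu under networkConfig in the node configuration map.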
CHAPTER 4. CREATING AN ENVIRONMENT-WIDE BACKUP
In OpenShift Container Platform, you can back up at the cluster level by saving state to separate storage.
The full state of an environment backup includes:
API objects
Registry storage
Volume storage
IMPORTANT
The following process describes a generic way of backing up applications and the
OpenShift Container Platform cluster. It cannot take into account custom requirements.
Use these steps as a foundation for a full backup and restoration procedure for your
cluster. You must take all necessary precautions to prevent data loss.
Backup and restore is not guaranteed. You are responsible for backing up your own data.
4.1. CREATING A MASTER HOST BACKUP
The master instances run important services, such as the API and controllers. The /etc/origin/master
directory stores many important files:
Keys and other authentication files, such as htpasswd if you use htpasswd
And more
You can customize OpenShift Container Platform services, such as increasing the log level or using
proxies. The configuration files are stored in the /etc/sysconfig directory.
Because the masters are also nodes, back up the entire /etc/origin directory.
Procedure
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/ ${MYBACKUPDIR}/etc/sysconfig/
3. Other important files that need to be considered when planning a backup include:
File Description
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
4. If a package is accidentally removed or you need to restore a file that is included in an RPM
package, having a list of RHEL packages installed on the system can be useful.
NOTE
If you use Red Hat Satellite features, such as content views or the facts store,
provide a proper mechanism to reinstall the missing packages and historical
data on the packages installed on the systems.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
5. If you used the previous steps, the following files are present in the backup directory:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/master/ca.crt
etc/origin/master/ca.key
etc/origin/master/ca.serial.txt
etc/origin/master/ca-bundle.crt
etc/origin/master/master.proxy-client.crt
etc/origin/master/master.proxy-client.key
etc/origin/master/service-signer.crt
etc/origin/master/service-signer.key
etc/origin/master/serviceaccounts.private.key
etc/origin/master/serviceaccounts.public.key
etc/origin/master/openshift-master.crt
etc/origin/master/openshift-master.key
etc/origin/master/openshift-master.kubeconfig
etc/origin/master/master.server.crt
etc/origin/master/master.server.key
etc/origin/master/master.kubelet-client.crt
etc/origin/master/master.kubelet-client.key
etc/origin/master/admin.crt
etc/origin/master/admin.key
etc/origin/master/admin.kubeconfig
etc/origin/master/etcd.server.crt
etc/origin/master/etcd.server.key
etc/origin/master/master.etcd-client.key
etc/origin/master/master.etcd-client.csr
etc/origin/master/master.etcd-client.crt
etc/origin/master/master.etcd-ca.crt
etc/origin/master/policy.json
etc/origin/master/scheduler.json
etc/origin/master/htpasswd
etc/origin/master/session-secrets.yaml
etc/origin/master/openshift-router.crt
etc/origin/master/openshift-router.key
etc/origin/master/registry.crt
etc/origin/master/registry.key
etc/origin/master/master-config.yaml
etc/origin/generated-configs/master-master-1.example.com/master.server.crt
...[OUTPUT OMITTED]...
etc/origin/cloudprovider/openstack.conf
etc/origin/node/system:node:master-0.example.com.crt
etc/origin/node/system:node:master-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:master-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
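The archive step can also be written so that paths inside the archive are relative and the archive is verified before the source directory is removed. This is a sketch under those assumptions, not the documented procedure; the function name archive_backup is hypothetical.

```shell
# Archive a backup directory with paths relative to its parent directory,
# then confirm that the archive can be listed. Returns non-zero if either
# step fails, so the caller can decide whether to delete the source.
archive_backup() {
    local src=$1 dest=$2
    tar -zcf "$dest" -C "$(dirname "$src")" "$(basename "$src")" &&
    tar -tzf "$dest" >/dev/null
}

# Usage (run as root; paths mirror the example above):
#   archive_backup "$MYBACKUPDIR" "/backup/$(hostname)-$(date +%Y%m%d).tar.gz" \
#       && rm -Rf "$MYBACKUPDIR"
```

Chaining the removal with && means the backup directory is deleted only after the archive has been written and read back successfully.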
To create any of these files from scratch, the openshift-ansible-contrib repository contains the
backup_master_node.sh script, which performs the previous steps. The script creates a directory on
the host where you run the script and copies all the files previously mentioned.
NOTE
The openshift-ansible-contrib script is not supported by Red Hat, but the reference
architecture team performs testing to ensure the code operates as defined and is secure.
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
Perform the backup process before any change to the infrastructure, such as a system
update, upgrade, or any other significant modification. Back up data regularly
to ensure that the most recent data is available if a failure occurs.
Node instances run applications in the form of pods, which are based on containers. The /etc/origin/ and
/etc/origin/node directories house important files, such as:
The OpenShift Container Platform services can be customized to increase the log level, use proxies,
and more, and the configuration files are stored in the /etc/sysconfig directory.
Procedure
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/atomic-openshift-node ${MYBACKUPDIR}/etc/sysconfig/
2. OpenShift Container Platform uses specific files that must be taken into account when planning
the backup policy, including:
File Description
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
NOTE
If using Red Hat Satellite features, such as content views or the facts store,
provide a proper mechanism to reinstall the missing packages and a historical
record of the packages installed on the systems.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/atomic-openshift-node
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/node/system:node:app-node-0.example.com.crt
etc/origin/node/system:node:app-node-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:app-node-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/origin/cloudprovider/openstack.conf
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the
backup_master_node.sh script, which performs the previous steps. The script creates a directory on
the host running the script and copies all the files previously mentioned.
NOTE
The openshift-ansible-contrib script is not supported by Red Hat, but the reference
architecture team performs testing to ensure the code operates as defined and is secure.
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
IMPORTANT
Procedure
# cd /etc/docker/certs.d/
# tar cf /tmp/docker-registry-certs-$(hostname).tar *
NOTE
When working with one or more external secured registries, any host that pulls or pushes
images must trust the registry certificates to run pods.
Procedure
1. Because the restoration procedure involves a complete reinstallation, save all the files used in
the initial installation. These files might include:
2. Back up the procedures for post-installation steps. Some installations might involve steps that
are not included in the installer. These steps might include changes to the services outside of
the control of OpenShift Container Platform or the installation of extra services like monitoring
agents. Additional configuration that is not yet supported by the advanced installer might also
be affected, such as using multiple authentication providers.
WARNING
This is a generic backup of application data and does not take into account
application-specific backup procedures, for example, special export and import
procedures for database systems.
Other means of backup might exist depending on the type of the persistent volume you use, for
example, Cinder, NFS, or Gluster.
The paths to back up are also application specific. You can determine what path to back up by looking at
the mountPath for volumes in the deploymentconfig.
NOTE
You can perform this type of application data backup only while the application pod is
running.
Procedure
OpenShift Container Platform versions prior to 3.5 use etcd version 2 (v2), while 3.5 and later use
version 3 (v3). The data model between the two versions of etcd is different. etcd v3 can use both the
v2 and v3 data models, whereas etcd v2 can only use the v2 data model. In an etcd v3 server, the v2 and
v3 data stores exist in parallel and are independent.
For both v2 and v3 operations, you can use the ETCDCTL_API environment variable to use the correct
API:
$ etcdctl -v
etcdctl version: 3.2.28
API version: 2
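As a minimal sketch, assuming bash and an etcdctl binary on the PATH, the API version can be pinned per invocation instead of being exported for the whole session. The wrapper names etcd_v2 and etcd_v3 are hypothetical and are separate from any aliases the installer creates.

```shell
# Hypothetical wrappers: set ETCDCTL_API only for the single etcdctl call.
etcd_v2() { ETCDCTL_API=2 etcdctl "$@"; }
etcd_v3() { ETCDCTL_API=3 etcdctl "$@"; }
```

For example, etcd_v2 -v prints the version banner shown above, and etcd_v3 version is the corresponding v3 subcommand.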
See the Migrating etcd Data (v2 to v3) section in the OpenShift Container Platform 3.7 documentation for
information about how to migrate to v3.
In OpenShift Container Platform version 3.10 and later, you can either install etcd on separate hosts or
run it as a static pod on your master hosts. If you do not specify separate etcd hosts, etcd runs as a
static pod on master hosts. Because of this difference, the backup process is different if you use static
pods.
You can perform the data backup process on any host that has connectivity to the etcd cluster, where
the proper certificates are provided, and where the etcdctl tool is installed.
NOTE
The backup files must be copied to an external system, ideally outside the OpenShift
Container Platform environment, and then encrypted.
Note that the etcd backup still has all the references to current storage volumes. When you restore etcd,
OpenShift Container Platform starts launching the previous pods on nodes and reattaching the same
storage. This process is the same as removing a node from the cluster and
adding a new one in its place. Anything attached to that node is reattached to the pods on whatever
nodes they are rescheduled to.
The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances
where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required
certificates for cluster communication. All those files are generated at installation time by the Ansible
installer.
Procedure
For each etcd member of the cluster, back up the etcd configuration.
$ ssh master-0 1
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/
NOTE
The certificates and configuration files on each etcd cluster member are unique.
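Because each member's files are unique, it can help to keep the host name in the backup path. The following is a minimal sketch under that assumption. The helper name backup_etcd_config and the directory layout are hypothetical. It uses cp -a rather than cp -R so that ownership, modes, and timestamps of certificates and keys are preserved.

```shell
# Hypothetical helper: copy an etcd configuration directory into a dated,
# per-host backup directory and print the directory that was created.
backup_etcd_config() {
    local src="${1:-/etc/etcd}" dest_root="${2:-/backup}"
    # uname -n prints the node name of the host running the script.
    local dest="${dest_root}/etcd-config-$(uname -n)-$(date +%Y%m%d)"
    mkdir -p "$dest" || return 1
    # -a preserves ownership, modes, and timestamps (ownership needs root).
    cp -a "$src" "$dest/" || return 1
    echo "$dest"
}
```

Run as root on each etcd member, calling backup_etcd_config with no arguments copies /etc/etcd into a directory under /backup.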
Prerequisites
NOTE
The OpenShift Container Platform installer creates aliases so that you do not have to type all the
flags: etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks.
However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl
command, so you must specify the --endpoints option and list all the endpoints.
etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container
must be available.
# etcdctl3 --cert="/etc/etcd/peer.crt" \
--key=/etc/etcd/peer.key \
--cacert="/etc/etcd/ca.crt" \
--endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379" \
endpoint health
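Typing the full endpoint list by hand is error-prone. As a minimal sketch, assuming bash and the client port 2379 used in this guide, the value for --endpoints can be built from a list of host names. The helper name build_endpoints is hypothetical.

```shell
# Hypothetical helper: join host names into the comma-separated value
# expected by the --endpoints option.
build_endpoints() {
    local port=2379 out="" host
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        out="${out:+${out},}https://${host}:${port}"
    done
    printf '%s\n' "$out"
}
```

The result can then be passed as --endpoints="$(build_endpoints master-0.example.com master-1.example.com master-2.example.com)".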
Example Output
Example Output
Procedure
NOTE
While the etcdctl backup command is used to perform the backup, etcd v3 has no
concept of a backup. Instead, you either take a snapshot from a live member with the
etcdctl snapshot save command or copy the member/snap/db file from an etcd data
directory.
The etcdctl backup command rewrites some of the metadata contained in the backup,
specifically, the node ID and cluster ID, which means that in the backup, the node loses its
former identity. To recreate a cluster from the backup, you create a new, single-node
cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to
prevent the new node from joining an existing cluster.
IMPORTANT
1. Obtain the etcd endpoint IP address from the static pod manifest:
$ export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
2. Log in as an administrator:
$ oc login -u system:admin
$ oc project kube-system
5. Take a snapshot of the etcd data in the pod and store it locally:
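As a minimal sketch of steps 1 and 5, the endpoint can be read from the manifest and passed to a snapshot command. This assumes that the static pod manifest contains an https:// client URL for the local etcd member. The helper name etcd_endpoint_from_manifest is hypothetical, and the snapshot invocation in the comments is an illustration, not the exact command for a static pod.

```shell
# Hypothetical helper: print host:port of the first https URL in a manifest.
etcd_endpoint_from_manifest() {
    local manifest="$1"
    # Field 3 of "https://host:port/..." split on "/" is "host:port".
    grep -o 'https://[^/[:space:]"]*' "$manifest" | head -n 1 | cut -d/ -f3
}

# Illustrative use (assumes etcdctl and the certificate paths used earlier
# in this guide; the output path is an assumption):
#   ETCD_EP=$(etcd_endpoint_from_manifest "$ETCD_POD_MANIFEST")
#   ETCDCTL_API=3 etcdctl --cert=/etc/etcd/peer.crt --key=/etc/etcd/peer.key \
#     --cacert=/etc/etcd/ca.crt --endpoints="https://${ETCD_EP}" \
#     snapshot save /var/lib/etcd/snapshot.db
```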
IMPORTANT
Because the oc get all command returns only certain project resources, you must
separately back up other resources, including PVCs and Secrets, as shown in the
following steps.
Procedure
$ oc get all
Example Output
3. Export other objects in your project, such as role bindings, secrets, service accounts, and
persistent volume claims.
You can export all namespaced objects in your project using the following command:
Note that some resources cannot be exported, and a MethodNotAllowed error is displayed.
4. Some exported objects can rely on specific metadata or references to unique IDs in the project.
This is a limitation on the usability of the recreated objects.
When using imagestreams, the image parameter of a deploymentconfig can point to a
specific sha checksum of an image in the internal registry that would not exist in a restored
environment. For instance, running the sample "ruby-ex" as
oc new-app centos/ruby-22-centos7~https://github.com/sclorg/ruby-ex.git creates an imagestream ruby-ex using the
internal registry to host the image:
If you import the deploymentconfig exactly as it is exported with oc get --export, the import fails if the image
does not exist.
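The export of namespaced objects described in step 3 can be sketched as a loop over resource types. This is a minimal sketch, assuming bash and an oc client logged in to the project. The helper name export_objects is hypothetical, and the caller chooses the list of types. A type that cannot be exported is reported and skipped.

```shell
# Hypothetical helper: write one YAML file per resource type for the
# current project into the given directory.
export_objects() {
    local out_dir="$1"; shift
    local object
    mkdir -p "$out_dir" || return 1
    for object in "$@"; do
        # Some types cannot be exported; report the failure and continue.
        if ! oc get -o yaml --export "$object" > "${out_dir}/${object}.yaml"; then
            echo "could not export ${object}" >&2
            rm -f "${out_dir}/${object}.yaml"
        fi
    done
}
```

For example: export_objects ./project-backup rolebindings serviceaccounts secrets pvc.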
IMPORTANT
Consult any product documentation for the correct backup procedures of specific applications. For
example, copying the mysql data directory itself does not create a usable backup. Instead, run the
specific backup procedures of the associated application and then synchronize any data. This includes
using snapshot solutions provided by the OpenShift Container Platform hosting platform.
Procedure
$ oc get pods
NAME READY STATUS RESTARTS AGE
demo-1-build 0/1 Completed 0 2h
demo-2-fxx6d 1/1 Running 0 1h
2. Describe the desired pod to find the volumes that are currently used by a persistent volume:
This output shows that the persistent data is in the /opt/app-root/src/uploaded directory.
The ocp_sop.txt file is downloaded to the local system to be backed up by backup software or
another backup mechanism.
NOTE
You can also use the previous steps if a pod starts without needing to use a PVC,
but you later decide that a PVC is necessary. You can preserve the data and then
use the restore process to populate the new storage.
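The download of persistent data from a running pod can be sketched with oc rsync. This is a minimal sketch, assuming bash, an oc client logged in to the project, and rsync or tar available in the container image. The helper name backup_pod_dir is hypothetical.

```shell
# Hypothetical helper: copy a directory out of a running pod to a local
# directory, creating the local directory first.
backup_pod_dir() {
    local pod="$1" remote_dir="$2" local_dir="$3"
    mkdir -p "$local_dir" || return 1
    oc rsync "${pod}:${remote_dir}" "$local_dir"
}
```

With the pod and path from the example above, the call might be backup_pod_dir demo-2-fxx6d /opt/app-root/src/uploaded ./demo-app.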
CHAPTER 5. HOST-LEVEL TASKS
The reasons to deprecate or scale down master hosts include hardware re-sizing or replacing the
underlying infrastructure.
Highly available OpenShift Container Platform environments require at least three master hosts and
three etcd nodes. Usually, the master hosts are colocated with the etcd services. If you deprecate a
master host, you also remove the etcd static pods from that host.
IMPORTANT
Ensure that the master and etcd services are always deployed in odd numbers due to the
voting mechanisms that take place among those services.
Perform this backup process before any change to the OpenShift Container Platform infrastructure,
such as a system update, upgrade, or any other significant modification. Back up data regularly to
ensure that recent data is available if a failure occurs.
The master instances run important services, such as the API and controllers. The /etc/origin/master
directory stores many important files:
Keys and other authentication files, such as htpasswd if you use htpasswd
And more
You can customize OpenShift Container Platform services, such as increasing the log level or using
proxies. The configuration files are stored in the /etc/sysconfig directory.
Because the masters are also nodes, back up the entire /etc/origin directory.
Procedure
IMPORTANT
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/ ${MYBACKUPDIR}/etc/sysconfig/
NOTE
WARNING
IMPORTANT
3. Other important files that need to be considered when planning a backup include:
File Description
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
4. If a package is accidentally removed or you need to restore a file that is included in an RPM
package, having a list of the RHEL packages installed on the system can be useful.
NOTE
If you use Red Hat Satellite features, such as content views or the facts store,
provide a proper mechanism to reinstall the missing packages and a historical
record of the packages installed on the systems.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
5. If you used the previous steps, the following files are present in the backup directory:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/master/ca.crt
etc/origin/master/ca.key
etc/origin/master/ca.serial.txt
etc/origin/master/ca-bundle.crt
etc/origin/master/master.proxy-client.crt
etc/origin/master/master.proxy-client.key
etc/origin/master/service-signer.crt
etc/origin/master/service-signer.key
etc/origin/master/serviceaccounts.private.key
etc/origin/master/serviceaccounts.public.key
etc/origin/master/openshift-master.crt
etc/origin/master/openshift-master.key
etc/origin/master/openshift-master.kubeconfig
etc/origin/master/master.server.crt
etc/origin/master/master.server.key
etc/origin/master/master.kubelet-client.crt
etc/origin/master/master.kubelet-client.key
etc/origin/master/admin.crt
etc/origin/master/admin.key
etc/origin/master/admin.kubeconfig
etc/origin/master/etcd.server.crt
etc/origin/master/etcd.server.key
etc/origin/master/master.etcd-client.key
etc/origin/master/master.etcd-client.csr
etc/origin/master/master.etcd-client.crt
etc/origin/master/master.etcd-ca.crt
etc/origin/master/policy.json
etc/origin/master/scheduler.json
etc/origin/master/htpasswd
etc/origin/master/session-secrets.yaml
etc/origin/master/openshift-router.crt
etc/origin/master/openshift-router.key
etc/origin/master/registry.crt
etc/origin/master/registry.key
etc/origin/master/master-config.yaml
etc/origin/generated-configs/master-master-1.example.com/master.server.crt
...[OUTPUT OMITTED]...
etc/origin/cloudprovider/openstack.conf
etc/origin/node/system:node:master-0.example.com.crt
etc/origin/node/system:node:master-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:master-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the
backup_master_node.sh script, which performs the previous steps. The script creates a directory on
the host where you run the script and copies all the files previously mentioned.
NOTE
The openshift-ansible-contrib script is not supported by Red Hat, but the reference
architecture team performs testing to ensure the code operates as defined and is secure.
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
When you back up etcd, you must back up both the etcd configuration files and the etcd data.
The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances
where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required
certificates for cluster communication. All those files are generated at installation time by the Ansible
installer.
Procedure
For each etcd member of the cluster, back up the etcd configuration.
$ ssh master-0 1
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/
NOTE
The certificates and configuration files on each etcd cluster member are unique.
Prerequisites
NOTE
The OpenShift Container Platform installer creates aliases so that you do not have to type all the
flags: etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks.
However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl
command, so you must specify the --endpoints option and list all the endpoints.
etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container
must be available.
Procedure
NOTE
While the etcdctl backup command is used to perform the backup, etcd v3 has no
concept of a backup. Instead, you either take a snapshot from a live member with the
etcdctl snapshot save command or copy the member/snap/db file from an etcd data
directory.
The etcdctl backup command rewrites some of the metadata contained in the backup,
specifically, the node ID and cluster ID, which means that in the backup, the node loses its
former identity. To recreate a cluster from the backup, you create a new, single-node
cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to
prevent the new node from joining an existing cluster.
IMPORTANT
1. Obtain the etcd endpoint IP address from the static pod manifest:
$ export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
2. Log in as an administrator:
$ oc login -u system:admin
$ oc project kube-system
5. Take a snapshot of the etcd data in the pod and store it locally:
Master hosts run important services, such as the OpenShift Container Platform API and controllers
services. In order to deprecate a master host, these services must be stopped.
The OpenShift Container Platform API service is an active/active service, so stopping the service does
not affect the environment as long as the requests are sent to a separate master server. However, the
OpenShift Container Platform controllers service is an active/passive service, where the services use
etcd to decide the active master.
Deprecating a master host in a multi-master architecture includes removing the master from the load
balancer pool to avoid new connections attempting to use that master. This process depends heavily on
the load balancer used. The steps below show the details of removing the master from haproxy. If
OpenShift Container Platform is running on a cloud provider or uses an F5 appliance, see the
specific product documents to remove the master from rotation.
Procedure
1. Remove the backend section in the /etc/haproxy/haproxy.cfg configuration file. For example,
if deprecating a master named master-0.example.com using haproxy, ensure the host name is
removed from the following:
backend mgmt8443
balance source
mode tcp
# MASTERS 8443
server master-1.example.com 192.168.55.12:8443 check
server master-2.example.com 192.168.55.13:8443 check
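As a minimal sketch, assuming bash and awk, the server line for the deprecated master can be removed without hand-editing. The helper name remove_haproxy_server is hypothetical. It keeps a .bak copy of the original file and matches the host name exactly, so a name such as master-1 does not also match master-10.

```shell
# Hypothetical helper: drop every "server <name> ..." line for one host
# from an haproxy configuration, keeping a .bak copy of the original.
remove_haproxy_server() {
    local cfg="$1" host="$2"
    cp -p "$cfg" "${cfg}.bak" || return 1
    # Exact comparison of the second field avoids regex pitfalls with dots.
    awk -v h="$host" '!($1 == "server" && $2 == h)' "${cfg}.bak" > "$cfg"
}
```

For example: remove_haproxy_server /etc/haproxy/haproxy.cfg master-0.example.com. The edited file can be checked with haproxy -c -f /etc/haproxy/haproxy.cfg before the service is restarted.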
3. Once the master is removed from the load balancer, disable the API and controller services by
moving the definition files out of the static pods directory, /etc/origin/node/pods:
# mkdir -p /etc/origin/node/pods/disabled
# mv /etc/origin/node/pods/controller.yaml /etc/origin/node/pods/disabled/
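The move can be generalized to several manifests. The following is a minimal sketch, assuming bash. The helper name disable_static_pods is hypothetical, and so is the apiserver.yaml file name in the usage note, because the commands above name only controller.yaml.

```shell
# Hypothetical helper: move the named static pod manifests into a
# "disabled" subdirectory of the static pods directory.
disable_static_pods() {
    local pods_dir="$1"; shift
    local manifest
    mkdir -p "${pods_dir}/disabled" || return 1
    for manifest in "$@"; do
        if [ -f "${pods_dir}/${manifest}" ]; then
            mv "${pods_dir}/${manifest}" "${pods_dir}/disabled/"
        else
            echo "skipping ${manifest}: not found in ${pods_dir}" >&2
        fi
    done
}
```

For example: disable_static_pods /etc/origin/node/pods controller.yaml apiserver.yaml. Check the directory listing first to confirm the manifest names on your hosts.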
4. Because the master host is a schedulable OpenShift Container Platform node, follow the steps
in the Deprecating a node host section.
5. Remove the master host from the [masters] and [nodes] groups in the /etc/ansible/hosts
Ansible inventory file to avoid issues if running any Ansible tasks using that inventory file.
WARNING
Deprecating the first master host listed in the Ansible inventory file requires
extra precautions.
IMPORTANT
6. The kubernetes service includes the master host IPs as endpoints. To verify that the master has
been properly deprecated, review the kubernetes service output and see if the deprecated
master has been removed:
Endpoints: 192.168.55.12:8053,192.168.55.13:8053
Session Affinity: ClientIP
Events: <none>
After the master has been successfully deprecated, the host where the master was previously
running can be safely deleted.
Procedure
1. Remove each other etcd host from the etcd cluster. Run the following command for each etcd
node:
Procedure
# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
1 The remove command requires the etcd ID, not the hostname.
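Because the remove command takes the member ID, the ID can be read from the cluster-health output instead of being copied by hand. This is a minimal sketch, assuming bash, awk, and the output format shown above. The helper name unreachable_members is hypothetical.

```shell
# Hypothetical helper: read "cluster-health" output on stdin and print the
# ID of every member that is reported as unreachable.
unreachable_members() {
    awk '$1 == "member" && $3 == "is" && $4 ~ /^unreachable/ { print $2 }'
}

# Illustrative use from an interactive shell where the etcdctl2 alias exists:
#   etcdctl2 cluster-health | unreachable_members
#   etcdctl2 member remove <id printed above>
```

Review the printed IDs before removing anything, because a member can also be unreachable for a temporary reason.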
2. To ensure the etcd configuration does not use the failed host when the etcd service is restarted,
modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host in the
value for the ETCD_INITIAL_CLUSTER variable:
# vi /etc/etcd/etcd.conf
For example:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380
becomes:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380
NOTE
Restarting the etcd services is not required, because the failed host is removed
using etcdctl.
3. Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues
when re-running a playbook:
[OSEv3:children]
masters
nodes
etcd
[etcd]
master-0.example.com
master-1.example.com
4. If you are using Flannel, modify the flanneld service configuration located at
/etc/sysconfig/flanneld on every host and remove the etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379
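Steps 2 and 4 both remove one entry from a comma-separated value in a KEY=value file. As a minimal sketch, assuming bash, awk, and unquoted values, that edit can be scripted. The helper name remove_list_entry is hypothetical. It keeps a .bak copy and removes every entry that contains the given string, so pass the full host name.

```shell
# Hypothetical helper: rewrite the comma-separated value of one variable
# in a KEY=value file without the entries that contain a given string.
remove_list_entry() {
    local file="$1" var="$2" needle="$3"
    cp -p "$file" "${file}.bak" || return 1
    awk -v var="$var" -v needle="$needle" '
        index($0, var "=") == 1 {
            # Split everything after "VAR=" on commas and keep the rest.
            n = split(substr($0, length(var) + 2), items, ",")
            out = ""
            for (i = 1; i <= n; i++)
                if (index(items[i], needle) == 0)
                    out = out (out == "" ? "" : ",") items[i]
            print var "=" out
            next
        }
        { print }
    ' "${file}.bak" > "$file"
}
```

For example: remove_list_entry /etc/etcd/etcd.conf ETCD_INITIAL_CLUSTER master-2.example.com, or remove_list_entry /etc/sysconfig/flanneld FLANNEL_ETCD_ENDPOINTS master-2.example.com.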
The master instances run important services, such as the API and controllers. The /etc/origin/master
directory stores many important files:
Keys and other authentication files, such as htpasswd if you use htpasswd
And more
You can customize OpenShift Container Platform services, such as increasing the log level or using
proxies. The configuration files are stored in the /etc/sysconfig directory.
Because the masters are also nodes, back up the entire /etc/origin directory.
Procedure
IMPORTANT
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/ ${MYBACKUPDIR}/etc/sysconfig/
NOTE
WARNING
IMPORTANT
3. Other important files that need to be considered when planning a backup include:
File Description
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
4. If a package is accidentally removed or you need to restore a file that is included in an RPM
package, having a list of the RHEL packages installed on the system can be useful.
NOTE
If you use Red Hat Satellite features, such as content views or the facts store,
provide a proper mechanism to reinstall the missing packages and a historical
record of the packages installed on the systems.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
5. If you used the previous steps, the following files are present in the backup directory:
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/master/ca.crt
etc/origin/master/ca.key
etc/origin/master/ca.serial.txt
etc/origin/master/ca-bundle.crt
etc/origin/master/master.proxy-client.crt
etc/origin/master/master.proxy-client.key
etc/origin/master/service-signer.crt
etc/origin/master/service-signer.key
etc/origin/master/serviceaccounts.private.key
etc/origin/master/serviceaccounts.public.key
etc/origin/master/openshift-master.crt
etc/origin/master/openshift-master.key
etc/origin/master/openshift-master.kubeconfig
etc/origin/master/master.server.crt
etc/origin/master/master.server.key
etc/origin/master/master.kubelet-client.crt
etc/origin/master/master.kubelet-client.key
etc/origin/master/admin.crt
etc/origin/master/admin.key
etc/origin/master/admin.kubeconfig
etc/origin/master/etcd.server.crt
etc/origin/master/etcd.server.key
etc/origin/master/master.etcd-client.key
etc/origin/master/master.etcd-client.csr
etc/origin/master/master.etcd-client.crt
etc/origin/master/master.etcd-ca.crt
etc/origin/master/policy.json
etc/origin/master/scheduler.json
etc/origin/master/htpasswd
etc/origin/master/session-secrets.yaml
etc/origin/master/openshift-router.crt
etc/origin/master/openshift-router.key
etc/origin/master/registry.crt
etc/origin/master/registry.key
etc/origin/master/master-config.yaml
etc/origin/generated-configs/master-master-1.example.com/master.server.crt
...[OUTPUT OMITTED]...
etc/origin/cloudprovider/openstack.conf
etc/origin/node/system:node:master-0.example.com.crt
etc/origin/node/system:node:master-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:master-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the
backup_master_node.sh script, which performs the previous steps. The script creates a directory on
the host where you run the script and copies all the files previously mentioned.
NOTE
The openshift-ansible-contrib script is not supported by Red Hat, but the reference
architecture team performs testing to ensure the code operates as defined and is secure.
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
Procedure
# MYBACKUPDIR=*/backup/$(hostname)/$(date +%Y%m%d)*
# cp /etc/origin/master/master-config.yaml /etc/origin/master/master-config.yaml.old
# cp /backup/$(hostname)/$(date +%Y%m%d)/etc/origin/master/master-config.yaml \
/etc/origin/master/master-config.yaml
# master-restart api
# master-restart controllers
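The restore commands rebuild the backup path from the current date, which matches only when the backup was taken the same day. A minimal sketch, assuming the /backup/<hostname>/<YYYYMMDD> layout used in this guide, selects the most recent dated directory instead. The latest_backup function is hypothetical, and the demo uses a temporary directory with made-up dates:

```shell
#!/bin/bash
# Hypothetical helper: print the newest YYYYMMDD directory under a
# backup root, or return non-zero if none exists.
latest_backup() {
    local root="$1" newest
    # Names are zero-padded dates, so lexical order equals date order.
    newest=$(find "$root" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' \
        | sort | tail -n 1)
    [ -n "$newest" ] && echo "$newest"
}

# Demo with a temporary backup root (assumption: sample dates).
root=$(mktemp -d)
mkdir -p "$root/20240101" "$root/20240215" "$root/20231130"
MYBACKUPDIR=$(latest_backup "$root")
echo "restoring from ${MYBACKUPDIR}"
```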
WARNING
Restarting the master services can lead to downtime. However, you can
remove the master host from the highly available load balancer pool, then
perform the restore operation. Once the service has been properly
restored, you can add the master host back to the load balancer pool.
NOTE
2. If you cannot restart OpenShift Container Platform because packages are missing, reinstall the
packages.
> ansible-2.4.0.0-5.el7.noarch
1 Replace <packages> with the packages that are different between the package lists.
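The packages that differ between the lists can be found by comparing the saved packages.txt with a fresh listing. The following is a sketch only: the missing_packages function and the sample package lists are hypothetical, and both inputs are assumed to be sorted the same way, as rpm -qa | sort produces them.

```shell
#!/bin/bash
# Hypothetical helper: print packages present in the backup list but
# absent from the current list. Both inputs must be sorted.
missing_packages() {
    local backup_list="$1" current_list="$2"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$backup_list" "$current_list"
}

# Demo with made-up package lists (assumption: sample data only).
work=$(mktemp -d)
printf '%s\n' ansible-2.4.0.0-5.el7.noarch bash-4.2.46-28.el7.x86_64 > "$work/backup.txt"
printf '%s\n' bash-4.2.46-28.el7.x86_64 > "$work/current.txt"
missing_packages "$work/backup.txt" "$work/current.txt"
```

On a live host, the current list would come from rpm -qa | sort written to a temporary file.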
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/<certificate> \
/etc/pki/ca-trust/source/anchors/ 1
$ sudo update-ca-trust
1 Replace <certificate> with the file name of the system certificate to restore.
NOTE
Always ensure the user ID and group ID are restored when the files are copied
back, as well as the SELinux context.
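One way to follow this advice, sketched under the assumption that GNU cp is in use, is to copy with cp -a, which preserves mode, ownership (when run as root), and timestamps, and then reset the SELinux label with restorecon where that tool is installed. The restore_file function is hypothetical and the demo uses temporary files:

```shell
#!/bin/bash
# Hypothetical helper: copy a file back from the backup while keeping
# mode, ownership, and timestamps, then reset the SELinux label if the
# restorecon tool is present.
restore_file() {
    local src="$1" dest="$2"
    # -a implies --preserve=all; ownership is kept only when run as root.
    cp -a "$src" "$dest" || return 1
    if command -v restorecon > /dev/null 2>&1; then
        # Reapply the default SELinux context for the destination path.
        restorecon "$dest" 2> /dev/null || true
    fi
}

# Demo on temporary files (assumption: stand-ins for real files).
work=$(mktemp -d)
echo "sample" > "$work/backup.crt"
chmod 600 "$work/backup.crt"
restore_file "$work/backup.crt" "$work/restored.crt"
stat -c '%a %U:%G %n' "$work/restored.crt"
```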
Prerequisites
Ensure enough capacity is available to migrate the existing pods from the node set to be removed.
Removing an infrastructure node is advised only when at least two more nodes will stay online after the
infrastructure node is removed.
Procedure
$ oc get nodes
NAME STATUS AGE VERSION
ocp-infra-node-b7pl Ready 23h v1.6.1+5115d708d7
ocp-infra-node-p5zj Ready 23h v1.6.1+5115d708d7
ocp-infra-node-rghb Ready 23h v1.6.1+5115d708d7
ocp-master-dgf8 Ready,SchedulingDisabled 23h v1.6.1+5115d708d7
ocp-master-q1v2 Ready,SchedulingDisabled 23h v1.6.1+5115d708d7
ocp-master-vq70 Ready,SchedulingDisabled 23h v1.6.1+5115d708d7
ocp-node-020m Ready 23h v1.6.1+5115d708d7
ocp-node-7t5p Ready 23h v1.6.1+5115d708d7
ocp-node-n0dd Ready 23h v1.6.1+5115d708d7
Conditions:
...
Addresses: 10.156.0.11,ocp-infra-node-b7pl
Capacity:
cpu: 2
memory: 7494480Ki
pods: 20
Allocatable:
cpu: 2
memory: 7392080Ki
pods: 20
System Info:
Machine ID: bc95ccf67d047f2ae42c67862c202e44
System UUID: 9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
Boot ID: ca8bf088-905d-4ec0-beec-8f89f4527ce4
Kernel Version: 3.10.0-693.5.2.el7.x86_64
OS Image: Employee SKU
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+5115d708d7
Kube-Proxy Version: v1.6.1+5115d708d7
ExternalID: 437740049672994824
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default docker-registry-1-5szjs 100m (5%) 0 (0%) 256Mi (3%) 0 (0%)
default router-1-vzlzq 100m (5%) 0 (0%) 256Mi (3%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
200m (10%) 0 (0%) 512Mi (7%) 0 (0%)
Events: <none>
The output above shows that the node is running two pods: router-1-vzlzq and
docker-registry-1-5szjs. Two more infrastructure nodes are available to migrate these two pods.
NOTE
The cluster described above is highly available, which means that both the
router and docker-registry services are running on all infrastructure nodes.
If the pod has attached local storage (for example, EmptyDir), the --delete-local-data option
must be provided. Generally, pods running in production should use the local storage only for
temporary or cache files, but not for anything important or persistent. For regular storage,
applications should use object storage or persistent volumes. In this case, the docker-registry
pod’s local storage is empty, because the object storage is used instead to store the container
images.
NOTE
The above operation deletes existing pods running on the node. Then, new pods
are created according to the replication controller.
oc adm drain will not delete any bare pods (pods that are neither mirror pods
nor managed by ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or
a job). To do so, the --force option is required. Be aware that the bare pods will
not be recreated on other nodes and data may be lost during this operation.
The example below shows the output of the replication controller of the registry:
$ oc describe rc/docker-registry-1
Name: docker-registry-1
Namespace: default
Selector: deployment=docker-registry-1,deploymentconfig=docker-registry,docker-registry=default
Labels: docker-registry=default
openshift.io/deployment-config.name=docker-registry
Annotations: ...
Replicas: 3 current / 3 desired
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: deployment=docker-registry-1
deploymentconfig=docker-registry
docker-registry=default
Annotations: openshift.io/deployment-config.latest-version=1
openshift.io/deployment-config.name=docker-registry
openshift.io/deployment.name=docker-registry-1
Service Account: registry
Containers:
registry:
Image: openshift3/ose-docker-registry:v3.6.173.0.49
Port: 5000/TCP
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get https://:5000/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get https://:5000/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
Environment:
REGISTRY_HTTP_ADDR: :5000
REGISTRY_HTTP_NET: tcp
REGISTRY_HTTP_SECRET: tyGEnDZmc8dQfioP3WkNd5z+Xbdfy/JVXf/NLo3s/zE=
REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA: false
REGISTRY_HTTP_TLS_KEY: /etc/secrets/registry.key
OPENSHIFT_DEFAULT_REGISTRY: docker-registry.default.svc:5000
REGISTRY_CONFIGURATION_PATH: /etc/registry/config.yml
REGISTRY_HTTP_TLS_CERTIFICATE: /etc/secrets/registry.crt
Mounts:
/etc/registry from docker-config (rw)
/etc/secrets from registry-certificates (rw)
/registry from registry-storage (rw)
Volumes:
registry-storage:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
registry-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: registry-certificates
Optional: false
docker-config:
Type: Secret (a volume populated by a Secret)
SecretName: registry-config
Optional: false
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
49m 49m 1 replication-controller Normal SuccessfulCreate Created pod: docker-registry-1-dprp5
The event at the bottom of the output displays information about new pod creation. So, when
listing all pods:
$ oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-dprp5 1/1 Running 0 52m
docker-registry-1-kr8jq 1/1 Running 0 1d
docker-registry-1-ncpl2 1/1 Running 0 1d
registry-console-1-g4nqg 1/1 Running 0 1d
router-1-2gshr 0/1 Pending 0 52m
router-1-85qm4 1/1 Running 0 1d
router-1-q5sr8 1/1 Running 0 1d
4. The docker-registry-1-5szjs and router-1-vzlzq pods that were running on the node being
deprecated are no longer available. Instead, two new pods have been created:
docker-registry-1-dprp5 and router-1-2gshr. As shown above, the new router pod,
router-1-2gshr, is in the Pending state. This is because each node can run only one router
pod, which binds to ports 80 and 443 of the host.
5. When observing the newly created registry pod, the example below shows that the pod has
been created on the ocp-infra-node-rghb node, which is different from the deprecating node:
The only difference between deprecating the infrastructure and the application node is that
once the infrastructure node is evacuated, and if there is no plan to replace that node, the
services running on infrastructure nodes can be scaled down:
6. Now, every infrastructure node is running only one pod of each kind:
$ oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-kr8jq 1/1 Running 0 1d
docker-registry-1-ncpl2 1/1 Running 0 1d
registry-console-1-g4nqg 1/1 Running 0 1d
router-1-85qm4 1/1 Running 0 1d
router-1-q5sr8 1/1 Running 0 1d
NOTE
$ oc get nodes
NAME STATUS AGE VERSION
ocp-infra-node-b7pl Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-infra-node-p5zj Ready 1d v1.6.1+5115d708d7
ocp-infra-node-rghb Ready 1d v1.6.1+5115d708d7
ocp-master-dgf8 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-master-q1v2 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-master-vq70 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-node-020m Ready 1d v1.6.1+5115d708d7
ocp-node-7t5p Ready 1d v1.6.1+5115d708d7
ocp-node-n0dd Ready 1d v1.6.1+5115d708d7
failure-domain.beta.kubernetes.io/zone=europe-west3-c
kubernetes.io/hostname=ocp-infra-node-b7pl
role=infra
Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Wed, 22 Nov 2017 09:36:36 -0500
Phase:
Conditions:
...
Addresses: 10.156.0.11,ocp-infra-node-b7pl
Capacity:
cpu: 2
memory: 7494480Ki
pods: 20
Allocatable:
cpu: 2
memory: 7392080Ki
pods: 20
System Info:
Machine ID: bc95ccf67d047f2ae42c67862c202e44
System UUID: 9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
Boot ID: ca8bf088-905d-4ec0-beec-8f89f4527ce4
Kernel Version: 3.10.0-693.5.2.el7.x86_64
OS Image: Employee SKU
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.6.1+5115d708d7
Kube-Proxy Version: v1.6.1+5115d708d7
ExternalID: 437740049672994824
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
0 (0%) 0 (0%) 0 (0%) 0 (0%)
Events: <none>
8. Remove the infrastructure instance from the backend section in the /etc/haproxy/haproxy.cfg
configuration file:
backend router80
balance source
mode tcp
server infra-1.example.com 192.168.55.12:80 check
server infra-2.example.com 192.168.55.13:80 check
backend router443
balance source
mode tcp
server infra-1.example.com 192.168.55.12:443 check
server infra-2.example.com 192.168.55.13:443 check
10. Remove the node from the cluster after all pods are evicted with command:
$ oc get nodes
NAME STATUS AGE VERSION
ocp-infra-node-p5zj Ready 1d v1.6.1+5115d708d7
ocp-infra-node-rghb Ready 1d v1.6.1+5115d708d7
ocp-master-dgf8 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-master-q1v2 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-master-vq70 Ready,SchedulingDisabled 1d v1.6.1+5115d708d7
ocp-node-020m Ready 1d v1.6.1+5115d708d7
ocp-node-7t5p Ready 1d v1.6.1+5115d708d7
ocp-node-n0dd Ready 1d v1.6.1+5115d708d7
NOTE
For more information on evacuating and draining pods or nodes, see the Node maintenance
section.
In the event that a node would need to be added in place of the deprecated node, follow the Adding
hosts to an existing cluster section.
The backup process is to be performed before any change to the infrastructure, such as a system
update, upgrade, or any other significant modification. Backups should be performed on a regular basis
to ensure the most recent data is available if a failure occurs.
Node instances run applications in the form of pods, which are based on containers. The /etc/origin/ and
/etc/origin/node directories house important files, such as:
The OpenShift Container Platform services can be customized to increase the log level, use proxies,
and more, and the configuration files are stored in the /etc/sysconfig directory.
Procedure
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo cp -aR /etc/origin ${MYBACKUPDIR}/etc
$ sudo cp -aR /etc/sysconfig/atomic-openshift-node ${MYBACKUPDIR}/etc/sysconfig/
2. OpenShift Container Platform uses specific files that must be taken into account when planning
the backup policy, including:
File Description
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}/etc/sysconfig
$ sudo mkdir -p ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors
$ sudo cp -aR /etc/sysconfig/{iptables,docker-*,flanneld} \
${MYBACKUPDIR}/etc/sysconfig/
$ sudo cp -aR /etc/dnsmasq* /etc/cni ${MYBACKUPDIR}/etc/
$ sudo cp -aR /etc/pki/ca-trust/source/anchors/* \
${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/
NOTE
If using Red Hat Satellite features, such as content views or the facts store,
provide a proper mechanism to reinstall the missing packages and a historical
record of the packages installed on the systems.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo mkdir -p ${MYBACKUPDIR}
$ rpm -qa | sort | sudo tee $MYBACKUPDIR/packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo find ${MYBACKUPDIR} -mindepth 1 -type f -printf '%P\n'
etc/sysconfig/atomic-openshift-node
etc/sysconfig/flanneld
etc/sysconfig/iptables
etc/sysconfig/docker-network
etc/sysconfig/docker-storage
etc/sysconfig/docker-storage-setup
etc/sysconfig/docker-storage-setup.rpmnew
etc/origin/node/system:node:app-node-0.example.com.crt
etc/origin/node/system:node:app-node-0.example.com.key
etc/origin/node/ca.crt
etc/origin/node/system:node:app-node-0.example.com.kubeconfig
etc/origin/node/server.crt
etc/origin/node/server.key
etc/origin/node/node-dnsmasq.conf
etc/origin/node/resolv.conf
etc/origin/node/node-config.yaml
etc/origin/node/flannel.etcd-client.key
etc/origin/node/flannel.etcd-client.csr
etc/origin/node/flannel.etcd-client.crt
etc/origin/node/flannel.etcd-ca.crt
etc/origin/cloudprovider/openstack.conf
etc/pki/ca-trust/source/anchors/openshift-ca.crt
etc/pki/ca-trust/source/anchors/registry-ca.crt
etc/dnsmasq.conf
etc/dnsmasq.d/origin-dns.conf
etc/dnsmasq.d/origin-upstream-dns.conf
etc/dnsmasq.d/node-dnsmasq.conf
packages.txt
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo tar -zcvf /backup/$(hostname)-$(date +%Y%m%d).tar.gz $MYBACKUPDIR
$ sudo rm -Rf ${MYBACKUPDIR}
To create any of these files from scratch, the openshift-ansible-contrib repository contains the
backup_master_node.sh script, which performs the previous steps. The script creates a directory on
the host running the script and copies all the files previously mentioned.
NOTE
The openshift-ansible-contrib script is not supported by Red Hat, but the reference
architecture team performs testing to ensure the code operates as defined and is secure.
$ mkdir ~/git
$ cd ~/git
$ git clone https://github.com/openshift/openshift-ansible-contrib.git
$ cd openshift-ansible-contrib/reference-architecture/day2ops/scripts
$ ./backup_master_node.sh -h
Procedure
# MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
# cp /etc/origin/node/node-config.yaml /etc/origin/node/node-config.yaml.old
# cp /backup/$(hostname)/$(date +%Y%m%d)/etc/origin/node/node-config.yaml \
/etc/origin/node/node-config.yaml
# reboot
WARNING
Restarting the services can lead to downtime. See Node maintenance for tips on
how to ease the process.
NOTE
Perform a full reboot of the affected instance to restore the iptables configuration.
1. If you cannot restart OpenShift Container Platform because packages are missing, reinstall the
packages.
> ansible-2.4.0.0-5.el7.noarch
1 Replace <packages> with the packages that are different between the package lists.
$ MYBACKUPDIR=/backup/$(hostname)/$(date +%Y%m%d)
$ sudo cp ${MYBACKUPDIR}/etc/pki/ca-trust/source/anchors/<certificate> \
/etc/pki/ca-trust/source/anchors/
$ sudo update-ca-trust
Replace <certificate> with the file name of the system certificate to restore.
NOTE
Always ensure the proper user ID and group ID are restored when the files are copied
back, as well as the SELinux context.
A node can reserve a portion of its resources to be used by specific components. These include the
kubelet, kube-proxy, Docker, or other remaining system components such as sshd and
NetworkManager. See the Allocating node resources section in the Cluster Administrator guide for
more information.
OpenShift Container Platform versions prior to 3.5 use etcd version 2 (v2), while 3.5 and later use
version 3 (v3). The data model between the two versions of etcd is different. etcd v3 can use both the
v2 and v3 data models, whereas etcd v2 can only use the v2 data model. In an etcd v3 server, the v2 and
v3 data stores exist in parallel and are independent.
For both v2 and v3 operations, you can use the ETCDCTL_API environment variable to use the correct
API:
$ etcdctl -v
etcdctl version: 3.2.28
API version: 2
See the Migrating etcd Data (v2 to v3) section in the OpenShift Container Platform 3.7 documentation for
information about how to migrate to v3.
In OpenShift Container Platform version 3.10 and later, you can either install etcd on separate hosts or
run it as a static pod on your master hosts. If you do not specify separate etcd hosts, etcd runs as a
static pod on master hosts. Because of this difference, the backup process is different if you use static
pods.
You can perform the data backup process on any host that has connectivity to the etcd cluster, where
the proper certificates are provided, and where the etcdctl tool is installed.
NOTE
The backup files must be copied to an external system, ideally outside the OpenShift
Container Platform environment, and then encrypted.
Note that the etcd backup still has all the references to current storage volumes. When you restore etcd,
OpenShift Container Platform starts launching the previous pods on nodes and reattaching the same
storage. This process is the same as when you remove a node from the cluster and
add a new one back in its place. Anything attached to that node is reattached to the pods on whatever
nodes they are rescheduled to.
When you back up etcd, you must back up both the etcd configuration files and the etcd data.
The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances
where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required
certificates for cluster communication. All those files are generated at installation time by the Ansible
installer.
Procedure
For each etcd member of the cluster, back up the etcd configuration.
$ ssh master-0 1
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/
NOTE
The certificates and configuration files on each etcd cluster member are unique.
Prerequisites
NOTE
The OpenShift Container Platform installer creates aliases so that you do not have to type
all the flags: etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks.
However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl
command, so you must specify the --endpoints option and list all the endpoints.
etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container
must be available.
Procedure
NOTE
While the etcdctl backup command is used to perform the backup, etcd v3 has no
concept of a backup. Instead, you either take a snapshot from a live member with the
etcdctl snapshot save command or copy the member/snap/db file from an etcd data
directory.
The etcdctl backup command rewrites some of the metadata contained in the backup,
specifically, the node ID and cluster ID, which means that in the backup, the node loses its
former identity. To recreate a cluster from the backup, you create a new, single-node
cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to
prevent the new node from joining an existing cluster.
IMPORTANT
1. Obtain the etcd endpoint IP address from the static pod manifest:
$ export ETCD_POD_MANIFEST="/etc/origin/node/pods/etcd.yaml"
2. Log in as an administrator:
$ oc login -u system:admin
$ oc project kube-system
5. Take a snapshot of the etcd data in the pod and store it locally:
If an etcd host has become corrupted and the /etc/etcd/etcd.conf file is lost, restore it using the
following procedure:
$ ssh master-0 1
# cp /backup/etcd-config-<timestamp>/etcd/etcd.conf /etc/etcd/etcd.conf
After the etcd configuration file is restored, you must restart the static pod. This is done after you
restore the etcd data.
etcdctl binaries must be available or, in containerized installations, the rhel7/etcd container
must be available.
You can install the etcdctl binary with the etcd package by running the following commands:
The package also installs the systemd service. Disable and mask the service so that it does not
run as a systemd service when etcd runs in a static pod. By disabling and masking the service, you
ensure that you do not accidentally start it and prevent it from automatically restarting when you
reboot the system.
1. If the pod is running, stop the etcd pod by moving the pod manifest YAML file to another
directory:
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/etcd.yaml /etc/origin/node/pods-stopped
# mv /var/lib/etcd /var/lib/etcd.old
You use etcdctl to recreate the data on the node where you restore the pod.
3. Restore the etcd snapshot to the mount path for the etcd pod:
# export ETCDCTL_API=3
--initial-cluster-token "etcd-cluster-1" \
--initial-advertise-peer-urls https://172.18.3.48:2380 \
--skip-hash-check=true
Obtain the appropriate values for your cluster from your backup etcd.conf file.
5. Restart the etcd pod by moving the pod manifest YAML file to the required directory:
# mv /etc/origin/node/pods-stopped/etcd.yaml /etc/origin/node/pods/
WARNING
The etcd cluster must maintain a quorum during the replacement operation. This
means that a majority of the etcd members must be in operation at all times.
If the host replacement operation occurs while the etcd cluster maintains a quorum,
cluster operations are usually not affected. If a large amount of etcd data must
replicate, some operations might slow down.
NOTE
Before you start any procedure involving the etcd cluster, you must have a backup of the
etcd data and configuration files so that you can restore the cluster if the procedure fails.
NOTE
Due to the voting system etcd uses, the cluster must always contain an odd number of
members.
Having a cluster with an odd number of etcd hosts can account for fault tolerance. Growing an
even-sized cluster to the next odd size does not change the number needed for a quorum but
increases the tolerance for failure. For example, with a cluster of three members, quorum
is two, which leaves a failure tolerance of one. This ensures the cluster continues to
operate if two of the members are healthy.
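The arithmetic behind quorum and failure tolerance can be sketched as follows. The function names are hypothetical; the formula is the usual strict majority, floor(n / 2) + 1:

```shell
#!/bin/bash
# Quorum is a strict majority of the members: floor(n / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Failure tolerance is how many members can fail while quorum still holds.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for members in 3 4 5; do
    echo "members=$members quorum=$(quorum "$members") tolerance=$(failure_tolerance "$members")"
done
```

Comparing four and five members shows why odd sizes are preferred: both need a quorum of three, but only the five-member cluster survives two failures.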
The new host requires a fresh Red Hat Enterprise Linux version 7 dedicated host. The etcd storage
should be located on an SSD disk to achieve maximum performance and on a dedicated disk mounted in
/var/lib/etcd.
Prerequisites
1. Before you add a new etcd host, perform a backup of both etcd configuration and data to
prevent data loss.
2. Check the current etcd cluster status to avoid adding new hosts to an unhealthy cluster. Run this
command:
3. Before running the scaleup playbook, ensure the new host is registered to the proper Red Hat
software channels:
# subscription-manager register \
--username=<username> --password=<password>
# subscription-manager attach --pool=<poolid>
# subscription-manager repos --disable="*"
# subscription-manager repos \
--enable=rhel-7-server-rpms \
--enable=rhel-7-server-extras-rpms
4. Make sure all unused etcd members are removed from the etcd cluster. This must be completed
before running the scaleup playbook.
7. If the new etcd members will also be OpenShift Container Platform nodes, add the desired
number of hosts to the cluster.
8. The rest of this procedure assumes you added one host, but if you add multiple hosts, perform
all steps on each host.
Procedure
1. In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then,
add the new_etcd group as a child of the [OSEv3] group:
[OSEv3:children]
masters
nodes
etcd
new_etcd 1
[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
[new_etcd] 2
etcd0.example.com 3
NOTE
Replace the old etcd host entry with the new etcd host entry in the inventory
file. While replacing the older etcd host, you must create a copy of /etc/etcd/ca/
directory. Alternatively, you can redeploy etcd ca and certs before scaling up the
etcd hosts.
2. From the host that installed OpenShift Container Platform and hosts the Ansible inventory file,
change to the playbook directory and run the etcd scaleup playbook:
$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook playbooks/openshift-etcd/scaleup.yml
3. After the playbook runs, modify the inventory file to reflect the current status by moving the
new etcd host from the [new_etcd] group to the [etcd] group:
[OSEv3:children]
masters
nodes
etcd
new_etcd
[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
etcd0.example.com
4. If you use Flannel, modify the flanneld service configuration on every OpenShift Container
Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
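Because the endpoint list must be edited on every host, it can help to generate it from the host names rather than type it by hand. This is a minimal sketch; the etcd_endpoints function is hypothetical, and it assumes every member serves clients over HTTPS on port 2379, as in the example above:

```shell
#!/bin/bash
# Hypothetical helper: join etcd host names into the comma-separated
# https://<host>:2379 list used by FLANNEL_ETCD_ENDPOINTS.
etcd_endpoints() {
    local list="" host
    for host in "$@"; do
        # Add a comma separator only when the list already has an entry.
        list="${list:+$list,}https://${host}:2379"
    done
    echo "$list"
}

echo "FLANNEL_ETCD_ENDPOINTS=$(etcd_endpoints \
    master-0.example.com master-1.example.com \
    master-2.example.com etcd0.example.com)"
```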
If you do not run etcd as static pods on master nodes, you might need to add another etcd host.
Procedure
Modify the current etcd cluster
To create the etcd certificates, run the openssl command, replacing the values with those from your
environment.
export NEW_ETCD_HOSTNAME="etcd0.example.com"
export NEW_ETCD_IP="192.168.55.21"
export CN=$NEW_ETCD_HOSTNAME
export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"
NOTE
# mkdir -p ${PREFIX}
3. Create the server certificate request and sign it: (server.csr and server.crt)
4. Create the peer certificate request and sign it: (peer.csr and peer.crt)
5. Copy the current etcd configuration and ca.crt files from the current node as examples to
modify later:
# cp /etc/etcd/etcd.conf ${PREFIX}
# cp /etc/etcd/ca.crt ${PREFIX}
6. While still on the surviving etcd host, add the new host to the cluster. To add additional etcd
members to the cluster, you must first adjust the default localhost peer in the peerURLs value
for the first member:
a. Get the member ID for the first member using the member list command:
# etcdctl --cert-file=/etc/etcd/peer.crt \
--key-file=/etc/etcd/peer.key \
--ca-file=/etc/etcd/ca.crt \
--peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \ 1
member list
1 Ensure that you specify the URLs of only active etcd members in the --peers
parameter value.
c. Update the value of peerURLs using the etcdctl member update command by passing the
member ID and IP address obtained from the previous steps:
# etcdctl --cert-file=/etc/etcd/peer.crt \
--key-file=/etc/etcd/peer.key \
--ca-file=/etc/etcd/ca.crt \
--peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
member update 511b7fb6cc0001 https://172.18.1.18:2380
d. Re-run the member list command and ensure the peer URLs no longer include localhost.
7. Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status
stays as unstarted until you configure the new host.
WARNING
You must add each member and bring it online one at a time. When you add
each additional member to the cluster, you must adjust the peerURLs list
for the current peers. The peerURLs list grows by one for each member
added. The etcdctl member add command outputs the values that you
must set in the etcd.conf file as you add each member, as described in the
following instructions.
# etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
--ca-file=/etc/etcd/ca.crt \
--cert-file=/etc/etcd/peer.crt \
--key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} \
https://${NEW_ETCD_IP}:2380 1
ETCD_NAME="<NEW_ETCD_HOSTNAME>"
ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,
<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,
<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,
<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
1 In this line, ${NEW_ETCD_HOSTNAME} is a label for the etcd member. You can specify the host name, IP
address, or a simple name.
a. Replace the following values with the values generated in the previous step:
ETCD_NAME
ETCD_INITIAL_CLUSTER
ETCD_INITIAL_CLUSTER_STATE
b. Modify the following variables with the new host IP from the output of the previous step.
You can use ${NEW_ETCD_IP} as the value.
ETCD_LISTEN_PEER_URLS
ETCD_LISTEN_CLIENT_URLS
ETCD_INITIAL_ADVERTISE_PEER_URLS
ETCD_ADVERTISE_CLIENT_URLS
c. If you previously used the member system as an etcd node, you must overwrite the current
values in the /etc/etcd/etcd.conf file.
d. Check the file for syntax errors or missing IP addresses; otherwise, the etcd service might
fail:
# vi ${PREFIX}/etcd.conf
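Part of this manual check can be automated. The following sketch reports which of the variables named in the previous steps are missing or empty in a given etcd.conf file. The check_etcd_conf function is hypothetical and only tests for the presence of a value, not its correctness; the demo file contains sample values.

```shell
#!/bin/bash
# Hypothetical helper: report variables that are missing or empty in an
# etcd.conf file. Returns non-zero if any problem is found.
check_etcd_conf() {
    local conf="$1" var status=0
    for var in ETCD_NAME ETCD_INITIAL_CLUSTER ETCD_INITIAL_CLUSTER_STATE \
               ETCD_LISTEN_PEER_URLS ETCD_LISTEN_CLIENT_URLS \
               ETCD_INITIAL_ADVERTISE_PEER_URLS ETCD_ADVERTISE_CLIENT_URLS; do
        # Require an uncommented NAME=value line with a non-empty value.
        if ! grep -Eq "^${var}=\"?[^\"[:space:]]" "$conf"; then
            echo "missing or empty: ${var}"
            status=1
        fi
    done
    return $status
}

# Demo file with one variable left empty (assumption: sample values).
conf=$(mktemp)
cat > "$conf" <<'EOF'
ETCD_NAME=etcd0.example.com
ETCD_INITIAL_CLUSTER="etcd0.example.com=https://192.168.55.21:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_LISTEN_PEER_URLS=https://192.168.55.21:2380
ETCD_LISTEN_CLIENT_URLS=
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.55.21:2380
ETCD_ADVERTISE_CLIENT_URLS=https://192.168.55.21:2379
EOF
check_etcd_conf "$conf" || echo "fix the file before starting etcd"
```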
9. On the node that hosts the installation files, update the [etcd] hosts group in the
/etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.
10. Create a tgz file that contains the certificates, the sample configuration file, and the ca and
copy it to the new host:
1. Install iptables-services to provide iptables utilities to open the required ports for etcd:
NOTE
WARNING
3. Install etcd:
4. Ensure the etcd service is not running by removing the etcd pod definition:
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
# rm -Rf /etc/etcd/*
# rm -Rf /var/lib/etcd/*
8. Verify that the host is part of the cluster and the current cluster health:
# etcdctl --cert-file=/etc/etcd/peer.crt \
--key-file=/etc/etcd/peer.key \
--ca-file=/etc/etcd/ca.crt \
--peers="https://master-0.example.com:2379,\
https://master-1.example.com:2379,\
https://master-2.example.com:2379,\
https://etcd0.example.com:2379" \
cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member 8b8904727bf526a5 is healthy: got healthy result from https://192.168.55.21:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
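When the cluster has many members, the health output can be filtered for members that need attention. This is a sketch that assumes the line format shown in this guide; the unhealthy_members function is hypothetical, and the sample marks one member as unreachable for illustration:

```shell
#!/bin/bash
# Hypothetical helper: read cluster-health output on stdin and print
# the IDs of members that are not reported as healthy.
unhealthy_members() {
    awk '$1 == "member" && $3 == "is" && $4 != "healthy:" { print $2 }'
}

# Sample output (assumption: illustrative data in the guide's format).
sample='member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379'

echo "$sample" | unhealthy_members
```

In practice the function would read the output of the cluster-health command through a pipe.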
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379
On every master:
# master-restart api
# master-restart controllers
WARNING
The number of etcd nodes must be odd, so you must add at least two
hosts.
3. If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld
on every OpenShift Container Platform host to include the new etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
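Because the endpoint list must be identical on every host, it can help to generate it from the list of etcd host names instead of editing it by hand. The following is a minimal sketch, assuming every etcd host serves clients over https on port 2379 as in the example above; the function name flannel_endpoints is hypothetical:

```shell
# Build the FLANNEL_ETCD_ENDPOINTS line from a list of etcd host names.
# Assumption: each host serves etcd clients over https on port 2379.
flannel_endpoints() {
    out=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        out="${out:+$out,}https://${host}:2379"
    done
    printf 'FLANNEL_ETCD_ENDPOINTS=%s\n' "$out"
}
```

For example, flannel_endpoints master-0.example.com etcd0.example.com prints a line that can be compared against the value in /etc/sysconfig/flanneld.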
Procedure
1. Remove all of the other etcd hosts from the etcd cluster. Run the following command for each etcd
node:
Procedure
# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy
1 The remove command requires the etcd ID, not the hostname.
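The member ID that the remove command needs can be read from the cluster-health output. The following is a minimal sketch, assuming the output format shown in the example above; the function name unreachable_member_ids is hypothetical and is not part of etcdctl:

```shell
# Print the ID of every member that cluster-health reports as unreachable.
# Reads the command output on stdin and matches lines of the form
# "member <id> is unreachable: ...".
unreachable_member_ids() {
    awk '$1 == "member" && $3 == "is" && $4 ~ /^unreachable/ { print $2 }'
}
```

For example, etcdctl2 cluster-health | unreachable_member_ids prints one ID per line, and each ID can then be passed to the remove command.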
2. To ensure the etcd configuration does not use the failed host when the etcd service is restarted,
modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host in the
value for the ETCD_INITIAL_CLUSTER variable:
# vi /etc/etcd/etcd.conf
For example:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380
becomes:
ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380
NOTE
Restarting the etcd services is not required, because the failed host is removed
using etcdctl.
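The edit to ETCD_INITIAL_CLUSTER in step 2 amounts to dropping one name=url pair from a comma-separated list. The following is a minimal sketch of that transformation, assuming the value has the name=url,name=url form shown above; the function name remove_cluster_entry is hypothetical, and the result still has to be written to /etc/etcd/etcd.conf on each remaining host:

```shell
# Remove the entry whose member name is "$2" from the comma-separated
# name=url list "$1" and print the resulting list.
remove_cluster_entry() {
    printf '%s\n' "$1" | awk -v name="$2" -F, '{
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")            # kv[1] is the member name
            if (kv[1] != name) { out = out sep $i; sep = "," }
        }
        print out
    }'
}
```

For example, passing the three-member value above together with master-2.example.com prints the two-member value that the file should contain afterwards.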
3. Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues
when re-running a playbook:
[OSEv3:children]
masters
nodes
etcd
[etcd]
master-0.example.com
master-1.example.com
4. If you are using Flannel, modify the flanneld service configuration located at
/etc/sysconfig/flanneld on every host and remove the etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379
CHAPTER 6. PROJECT-LEVEL TASKS
IMPORTANT
Because the oc get all command returns only certain project resources, you must
separately back up other resources, including PVCs and Secrets, as shown in the
following steps.
Procedure
$ oc get all
Example Output
3. Export other objects in your project, such as role bindings, secrets, service accounts, and
persistent volume claims.
You can export all namespaced objects in your project using the following command:
Note that some resources cannot be exported, and a MethodNotAllowed error is displayed.
4. Some exported objects can rely on specific metadata or references to unique IDs in the project.
This is a limitation on the usability of the recreated objects.
When using imagestreams, the image parameter of a deploymentconfig can point to a
specific sha checksum of an image in the internal registry that would not exist in a restored
environment. For instance, running the sample "ruby-ex" as oc new-app
centos/ruby-22-centos7~https://github.com/sclorg/ruby-ex.git creates an imagestream ruby-ex using the
internal registry to host the image:
If the deploymentconfig is imported exactly as it was exported with oc get --export, the import fails if
the image does not exist.
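The export of all namespaced objects described in step 3 can be sketched as a loop over the resource types that the API server reports. This is a minimal sketch, assuming the oc client is logged in and the target project is selected; the function name export_project_objects and the one-file-per-type layout are illustrative choices, not a prescribed format:

```shell
# Export every namespaced resource type in the current project to
# <type>.yaml inside directory "$1" (default: current directory).
export_project_objects() {
    dir="${1:-.}"
    for object in $(oc api-resources --namespaced=true -o name); do
        # Some resource types cannot be exported (for example, a
        # MethodNotAllowed error); report them and continue. A failed
        # export can leave an empty file behind.
        oc get -o yaml --export "$object" > "$dir/$object.yaml" \
            || echo "could not export $object" >&2
    done
}
```

Running export_project_objects ./backup after creating the ./backup directory leaves one YAML file per resource type, which can be reviewed before it is stored with the rest of the project backup.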
Procedure
$ oc new-project <project_name> 1
1 This <project_name> value must match the name of the project that was backed up.
$ oc create -f project.yaml
3. Import any other resources that you exported when backing up the project, such as role
bindings, secrets, service accounts, and persistent volume claims:
$ oc create -f <object>.yaml
Some resources might fail to import if they require another object to exist. If this occurs, review
the error message to identify which resources must be imported first.
WARNING
Some resources, such as pods and default service accounts, can fail to be created.
IMPORTANT
Consult any product documentation for the correct backup procedures of specific applications. For
example, copying the mysql data directory itself does not create a usable backup. Instead, run the
specific backup procedures of the associated application and then synchronize any data. This includes
using snapshot solutions provided by the OpenShift Container Platform hosting platform.
Procedure
$ oc get pods
NAME READY STATUS RESTARTS AGE
demo-1-build 0/1 Completed 0 2h
demo-2-fxx6d 1/1 Running 0 1h
2. Describe the desired pod to find the volumes that are currently used by a persistent volume:
Image: docker-registry.default.svc:5000/test/demo@sha256:0a9f2487a0d95d51511e49d20dc9ff6f350436f935968b0c83fcb98a7a8c381a
Image ID: docker-pullable://docker-registry.default.svc:5000/test/demo@sha256:0a9f2487a0d95d51511e49d20dc9ff6f350436f935968b0c83fcb98a7a8c381a
Port: 8080/TCP
State: Running
Started: Tue, 05 Dec 2017 12:54:52 -0500
Ready: True
Restart Count: 0
Volume Mounts:
/opt/app-root/src/uploaded from persistent-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8mmrk (ro)
Environment Variables: <none>
...omitted...
This output shows that the persistent data is in the /opt/app-root/src/uploaded directory.
The ocp_sop.txt file is downloaded to the local system to be backed up by backup software or
another backup mechanism.
NOTE
You can also use the previous steps if a pod starts without needing to use a PVC,
but you later decide that a PVC is necessary. You can preserve the data and then
use the restore process to populate the new storage.
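The download of the persistent directory described above can be done with oc rsync. The following is a minimal sketch, assuming the pod name and directory from the example output; the function name backup_pod_dir and the local directory name are hypothetical:

```shell
# Copy directory "$2" from pod "$1" into the local directory "$3".
# Assumption: the oc client is logged in and the pod's project is selected.
backup_pod_dir() {
    pod="$1"; src="$2"; dest="$3"
    mkdir -p "$dest" && oc rsync "$pod:$src" "$dest"
}
```

For example, backup_pod_dir demo-2-fxx6d /opt/app-root/src/uploaded ./demo-app copies the uploaded directory, including ocp_sop.txt, to the local system where backup software can pick it up.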
Consult any product documentation for the correct restoration procedures for specific applications.
$ oc rsh demo-2-fxx6d
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt
sh-4.2$ rm -rf /opt/app-root/src/uploaded/ocp_sop.txt
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found
2. Replace the file from the server that contains the rsync backup of the files that were in the pvc:
3. Validate that the file is back on the pod by using oc rsh to connect to the pod and view the
contents of the directory:
$ oc rsh demo-2-fxx6d
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt
Procedure
$ oc describe dc/demo
Name: demo
Namespace: test
Created: 3 hours ago
Labels: app=demo
Annotations: openshift.io/generated-by=OpenShiftNewApp
Latest Version: 3
Selector: app=demo,deploymentconfig=demo
Replicas: 1
Triggers: Config, Image(demo@latest, auto=true)
Strategy: Rolling
Template:
Labels: app=demo
deploymentconfig=demo
Annotations: openshift.io/container.demo.image.entrypoint=["container-entrypoint","/bin/sh","-c","$STI_SCRIPTS_PATH/usage"]
openshift.io/generated-by=OpenShiftNewApp
Containers:
demo:
Image: docker-registry.default.svc:5000/test/demo@sha256:0a9f2487a0d95d51511e49d20dc9ff6f350436f935968b0c83fcb98a7a8c381a
Port: 8080/TCP
Volume Mounts:
/opt/app-root/src/uploaded from persistent-volume (rw)
Environment Variables: <none>
Volumes:
persistent-volume:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: filestore
ReadOnly: false
...omitted...
3. Now that the deployment configuration uses the new pvc, run oc rsync to place the files onto
the new pvc:
4. Validate that the file is back on the pod by using oc rsh to connect to the pod and view the
contents of the directory:
$ oc rsh demo-3-2b8gs
sh-4.2$ ls /opt/app-root/src/uploaded/
lost+found ocp_sop.txt
CHAPTER 7. DOCKER TASKS
As a cluster administrator, you sometimes need to apply extra configuration to the container engine so
that it can efficiently run elements of the OpenShift Container Platform installation.
1. From a master instance, or as a cluster administrator, allow the evacuation of any pod from the
node and disable scheduling of other pods on that node:
$ NODE=ose-app-node01.example.com
$ oc adm manage-node ${NODE} --schedulable=false
NAME STATUS AGE VERSION
ose-app-node01.example.com Ready,SchedulingDisabled 20m v1.6.1+5115d708d7
NOTE
If there are containers running with local volumes that will not migrate, run the
following command: oc adm drain ${NODE} --ignore-daemonsets --delete-local-data.
2. List the pods on the node to verify that they have been removed:
For more information on evacuating and draining pods or nodes, see Node maintenance.
Prerequisites
A new disk must be available to the existing instance that requires more storage. In the following
steps, the original disk is labeled /dev/xvdb, and the new disk is labeled /dev/xvdd, as shown in
the /etc/sysconfig/docker-storage-setup file:
# vi /etc/sysconfig/docker-storage-setup
DEVS="/dev/xvdb /dev/xvdd"
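Before running docker-storage-setup, it can be worth confirming that the new disk really is listed in the DEVS variable. The following is a minimal sketch, assuming DEVS is set on a single line as in the example above; the function name device_in_devs is hypothetical:

```shell
# Succeed if device "$1" is listed in the DEVS variable of file "$2".
# Assumption: DEVS is defined on one line, optionally quoted.
device_in_devs() {
    devs="$(sed -n 's/^DEVS=//p' "$2" | tr -d "\"'")"
    for d in $devs; do
        [ "$d" = "$1" ] && return 0
    done
    return 1
}
```

For example, device_in_devs /dev/xvdd /etc/sysconfig/docker-storage-setup returns success only when /dev/xvdd appears in the list.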
NOTE
Procedure
2. Stop the node service by removing the pod definition and rebooting the host:
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
3. Run the docker-storage-setup command to extend the volume groups and logical volumes
associated with container storage:
# docker-storage-setup
a. For thin pool setups, you should see the following output and can proceed to the next step:
b. For XFS setups that use the Overlay2 file system, the increase shown in the previous
output will not be visible.
You must perform the following steps to extend and grow the XFS storage:
i. Run the lvextend command to grow the logical volume to use all of the available space
in the volume group:
NOTE
ii. Run the xfs_growfs command to grow the file system to use the available space:
# xfs_growfs /dev/mapper/docker_vol-dockerlv
NOTE
# docker-storage-setup
You should now see the extended volume groups and logical volumes in the output.
INFO: Device /dev/vdb is already partitioned and is part of volume group docker_vg
INFO: Found an already configured thin pool /dev/mapper/docker_vg-docker--pool in
/etc/sysconfig/docker-storage
INFO: Device node /dev/mapper/docker_vg-docker--pool exists.
Logical volume docker_vg/docker-pool changed.
# mv /etc/origin/node/pods-stopped/* /etc/origin/node/pods/
7. A benefit in adding a disk compared to creating a new volume group and re-running docker-
storage-setup is that the images that were used on the system still exist after the new storage
has been added:
# docker images
REPOSITORY                                   TAG             IMAGE ID       CREATED          SIZE
docker-registry.default.svc:5000/tet/perl    latest          8b0b0106fb5e   13 minutes ago   627.4 MB
registry.redhat.io/rhscl/perl-524-rhel7      <none>          912b01ac7570   6 days ago       559.5 MB
registry.redhat.io/openshift3/ose-deployer   v3.6.173.0.21   89fd398a337d   5 weeks ago      970.2 MB
registry.redhat.io/openshift3/ose-pod        v3.6.173.0.21   63accd48a0d7   5 weeks ago      208.6 MB
8. With the increase in storage capacity, enable the node to be schedulable in order to accept new
incoming pods.
As a cluster administrator, run the following from a master instance:
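The scheduling toggle used at the start and end of this procedure can be wrapped in a small helper so that the node name and flag are always passed consistently. This is a minimal sketch built on the oc adm manage-node command shown earlier in this chapter; the function name set_schedulable is hypothetical:

```shell
# Enable or disable pod scheduling on node "$1"; "$2" must be true or false.
set_schedulable() {
    case "$2" in
        true|false) oc adm manage-node "$1" --schedulable="$2" ;;
        *) echo "usage: set_schedulable <node> true|false" >&2; return 2 ;;
    esac
}
```

For example, set_schedulable ose-app-node01.example.com true allows the node to accept new pods again after the storage change.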
# mkdir -p /etc/origin/node/pods-stopped
# mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/
4. Resize the existing disk as desired. This can depend on your environment:
# lvremove /dev/docker_vg/docker/lv
# vgremove docker_vg
# pvremove /dev/<my_previous_disk_device>
If you are using a cloud provider, you can detach the disk, destroy the disk, then create a new
bigger disk, and attach it to the instance.
For a non-cloud environment, the disk and file system can be resized. See the following
solution for more information:
https://access.redhat.com/solutions/199573
5. Verify that the /etc/sysconfig/docker-storage-setup file is correctly configured for the new
disk by checking the device name, size, etc.
# docker-storage-setup
INFO: Volume group backing root filesystem could not be determined
INFO: Device /dev/xvdb is already partitioned and is part of volume group docker_vol
INFO: Device node /dev/xvdd1 exists.
Physical volume "/dev/xvdd1" successfully created.
Volume group "docker_vol" successfully extended
# mv /etc/origin/node/pods-stopped/* /etc/origin/node/pods/
1. From a master instance, or as a cluster administrator, allow the evacuation of any pod from the
node and disable scheduling of other pods on that node:
$ NODE=ose-app-node01.example.com
$ oc adm manage-node ${NODE} --schedulable=false
NAME STATUS AGE VERSION
ose-app-node01.example.com Ready,SchedulingDisabled 20m v1.6.1+5115d708d7
NOTE
If there are containers running with local volumes that will not migrate, run the
following command: oc adm drain ${NODE} --ignore-daemonsets --delete-local-data
2. List the pods on the node to verify that they have been removed:
For more information on evacuating and draining pods or nodes, see Node maintenance.
3. With no containers currently running on the instance, stop the docker service:
5. Verify the name of the volume group, logical volume name, and physical volume name:
# vgs
VG #PV #LV #SN Attr VSize VFree
docker_vol 1 1 0 wz--n- <25.00g 15.00g
# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
dockerlv docker_vol -wi-ao---- <10.00g
# lvremove /dev/docker_vol/docker-pool -y
# vgremove docker_vol -y
# pvs
PV VG Fmt Attr PSize PFree
/dev/xvdb1 docker_vol lvm2 a-- <25.00g 15.00g
# pvremove /dev/xvdb1 -y
# rm -Rf /var/lib/docker/*
# rm -f /etc/sysconfig/docker-storage
NOTE
When a system is upgraded from Red Hat Enterprise Linux version 7.3 to 7.4, the
docker service attempts to use /var with the STORAGE_DRIVER of extfs. The
use of extfs as the STORAGE_DRIVER causes errors. See the following bug for
more info regarding the error:
DEVS=/dev/xvdb
VG=docker_vol
DATA_SIZE=95%VG
STORAGE_DRIVER=overlay2
CONTAINER_ROOT_LV_NAME=dockerlv
CONTAINER_ROOT_LV_MOUNT_PATH=/var/lib/docker
CONTAINER_ROOT_LV_SIZE=100%FREE
# docker-storage-setup
10. With the storage modified to use overlay2, enable the node to be schedulable in order to
accept new incoming pods.
From a master instance, or as a cluster administrator:
WARNING
Docker interprets .crt files as CA certificates and .cert files as client certificates.
Any CA extensions must be .crt.
NOTE
Depending on the Docker version, the process to trust a container image registry varies.
In the latest versions of Docker, the root certificate authorities are merged with the system
defaults. Prior to docker version 1.13, the system default certificate is used only when no
other custom root certificates exist.
Procedure
2. Extract and add the CA certificate to the list of trusted certificate authorities:
4. Once the certificate is in place and the trust is updated, restart the docker service to ensure the
new certificates are properly set:
For Docker versions prior to 1.13, perform the following additional steps for trusting certificate
authorities:
1. On every node create a new directory in /etc/docker/certs.d where the name of the directory is
the host name of the container image registry:
NOTE
The port number is not required unless the container image registry cannot be
accessed without a port number. Addressing the port to the original Docker
registry is as follows: myregistry.example.com:port
2. Accessing the container image registry via IP address requires the creation of a new directory
within /etc/docker/certs.d on every node where the name of the directory is the IP of the
container image registry:
3. Copy the CA certificate to the newly created Docker directories from the previous steps:
$ sudo cp myregistry.example.com.crt \
/etc/docker/certs.d/myregistry.example.com/ca.crt
4. Once the certificates have been copied, restart the docker service to ensure the new
certificates are used:
Procedure
1. If using /etc/docker/certs.d, copy all the certificates included in the directory and store the files:
2. If using a system trust, store the certificates prior to adding them within the system trust. Once
the store is complete, extract the certificate for restoration using the trust command. Identify
the system trust CAs and note the pkcs11 ID:
$ trust list
...[OUTPUT OMITTED]...
pkcs11:id=%a5%b3%e1%2b%2b%49%b6%d7%73%a1%aa%94%f5%01%e7%73%65%4c%ac%50;type=cert
type: certificate
label: MyCA
trust: anchor
category: authority
...[OUTPUT OMITTED]...
3. Extract the certificate in PEM format and give it a name, for example, myca.crt.
5. Repeat the procedure for all the required certificates and store the files in a remote location.
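For the /etc/docker/certs.d case in step 1, copying the certificates can be as simple as archiving the directory. The following is a minimal sketch, assuming tar with gzip support is available on the host; the function name backup_certs_dir and the archive path are hypothetical:

```shell
# Archive certificate directory "$1" into the gzip-compressed tar file "$2".
# The archive stores paths relative to the parent of "$1", so it can be
# unpacked under any directory when restoring.
backup_certs_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For example, backup_certs_dir /etc/docker/certs.d /root/docker-registry-certs.tar.gz creates an archive that can then be moved to the remote location.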
You can configure OpenShift Container Platform to use external docker registries to pull images.
However, you can use configuration files to allow or deny certain images or registries.
If the external registry is exposed using certificates for the network traffic, it is considered a secure
registry. Otherwise, traffic between the registry and host is plain text and not encrypted, meaning it is an
insecure registry.
NOTE
The ability to search images from the Red Hat Registry registry.redhat.io exists by
default in the Red Hat Enterprise Linux docker package.
Procedure
1. To allow users to search for images using docker search with other registries, add those
registries to the /etc/containers/registries.conf file under the registries parameter:
registries:
- registry.redhat.io
- my.registry.example.com
openshift_docker_additional_registries=registry.redhat.io,my.registry.example.com
Procedure
1. Add the allowed registries to the /etc/containers/registries.conf file with the registries flag:
registries:
- registry.redhat.io
- my.registry.example.com
NOTE
block_registries:
- all
BLOCK_REGISTRY='--block-registry=all'
5. In this example, the docker.io registry has been blacklisted, so any operation regarding that
registry fails:
Add docker.io back to the registries variable by modifying the file again and restarting the
service.
registries:
- registry.redhat.io
- my.registry.example.com
- docker.io
block_registries:
- all
or
ADD_REGISTRY="--add-registry=registry.redhat.io --add-registry=my.registry.example.com --add-registry=docker.io"
BLOCK_REGISTRY='--block-registry=all'
8. If an external registry is required, modify the docker daemon configuration file on all the
node hosts that need to use that registry, and create a blacklist on those nodes to
prevent malicious containers from being executed.
Using the Ansible installer, this can be configured using the
openshift_docker_additional_registries and openshift_docker_blocked_registries
variables in the Ansible hosts file:
openshift_docker_additional_registries=registry.redhat.io,my.registry.example.com
openshift_docker_blocked_registries=all
In order to do so, see the Installing a Certificate Authority Certificate for External Registries section.
If using a whitelist, the external registries should be added to the registries variable, as explained above.
However, any insecure registries should be added using the --insecure-registry option to allow for the
docker daemon to pull images from the repository. This is the same as the --add-registry option, but
the docker operation is not verified.
The registry should be added using both options to enable search, and, if there is a blacklist, to perform
other operations, such as pulling images.
For testing purposes, an example is shown on how to add a localhost insecure registry.
Procedure
[registries.search]
registries = ['registry.redhat.io', 'my.registry.example.com', 'docker.io', 'localhost:5000' ]
[registries.insecure]
registries = ['localhost:5000']
[registries.block]
registries = ['all']
NOTE
4. Pull an image:
openshift_docker_additional_registries=registry.redhat.io,my.registry.example.com,localhost:5000
openshift_docker_insecure_registries=localhost:5000
openshift_docker_blocked_registries=all
NOTE
If an external docker registry requires authentication, create a special secret in the project that uses
that registry and then use that secret to perform the docker operations.
Procedure
1. Create a dockercfg secret in the project where the user is going to log in to the docker registry:
$ oc project <my_project>
$ oc create secret docker-registry <my_registry> \
    --docker-server=<my.registry.example.com> \
    --docker-username=<username> \
    --docker-password=<my_password> \
    --docker-email=<email>
4. Use the dockercfg secret to pull images from the authenticated registry by linking the secret to
the service account performing the pull operations. The default service account to pull images is
named default:
5. For pushing images using the S2I feature, the dockercfg secret is mounted in the S2I pod, so it
needs to be linked to the proper service account that performs the build. The default service
account used to build images is named builder.
6. In the buildconfig, the secret should be specified for push or pull operations:
"type": "Source",
"sourceStrategy": {
    "from": {
        "kind": "DockerImage",
        "name": "my.registry.example.com/myproject/myimage:stable"
    },
    "pullSecret": {
        "name": "mydockerregistry"
    },
...[OUTPUT ABBREVIATED]...
"output": {
    "to": {
        "kind": "DockerImage",
        "name": "my.registry.example.com/myproject/myimage:latest"
    },
    "pushSecret": {
        "name": "mydockerregistry"
    },
...[OUTPUT ABBREVIATED]...
7. If the external registry delegates authentication to external services, create both dockercfg
secrets: the registry one using the registry URL and the external authentication system using its
own URL. Both secrets should be added to the service accounts.
$ oc project <my_project>
$ oc create secret docker-registry <my_registry> \
    --docker-server=<my.registry.example.com> \
    --docker-username=<username> \
    --docker-password=<my_password> \
    --docker-email=<email>
$ oc create secret docker-registry <my_docker_registry_ext_auth> \
    --docker-server=<my.authsystem.example.com> \
    --docker-username=<username> \
    --docker-password=<my_password> \
    --docker-email=<email>
$ oc secrets link default <my_registry> --for=pull
$ oc secrets link default <my_docker_registry_ext_auth> --for=pull
$ oc secrets link builder <my_registry>
$ oc secrets link builder <my_docker_registry_ext_auth>
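When several secrets have to be linked, the two link commands can be grouped so that each secret is always attached to both service accounts. This is a minimal sketch that only wraps the oc secrets link commands shown above; the function name link_registry_secret is hypothetical:

```shell
# Link secret "$1" to the default service account for pulls and to the
# builder service account for builds.
link_registry_secret() {
    oc secrets link default "$1" --for=pull &&
    oc secrets link builder "$1"
}
```

For example, calling link_registry_secret once for the registry secret and once for the external authentication secret reproduces the four commands above.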
Image resolution: force pods to run with immutable digests to ensure the image does not
change due to a re-tag
Container image label restrictions: force an image to have or not have particular labels
Image annotation restrictions: force an image in the integrated container registry to have or not
have particular annotations
WARNING
Procedure
1. If the ImagePolicy plug-in is enabled, it needs to be modified to allow the external registries to
be used by modifying the /etc/origin/master/master-config.yaml file on every master node:
admissionConfig:
  pluginConfig:
    openshift.io/ImagePolicy:
      configuration:
        kind: ImagePolicyConfig
        apiVersion: v1
        executionRules:
        - name: allow-images-from-other-registries
          onResources:
          - resource: pods
          - resource: builds
          matchRegistries:
          - docker.io
          - <my.registry.example.com>
          - registry.redhat.io
NOTE
openshift_master_admission_plugin_config={"openshift.io/ImagePolicy":{"configuration":{"kind":"ImagePolicyConfig","apiVersion":"v1","executionRules":[{"name":"allow-images-from-other-registries","onResources":[{"resource":"pods"},{"resource":"builds"}],"matchRegistries":["docker.io","my.registry.example.com","registry.redhat.io"]}]}}}
Procedure
1. To configure the allowed registries where users can import images, add the following to the
/etc/origin/master/master-config.yaml file:
imagePolicyConfig:
  allowedRegistriesForImport:
  - domainName: docker.io
  - domainName: '*.docker.io'
  - domainName: '*.redhat.com'
  - domainName: 'my.registry.example.com'
2. To import images from an external authenticated registry, create a secret within the desired
project.
3. Although it is not recommended, if the external authenticated registry is insecure or the certificates
cannot be trusted, the oc import-image command can be used with the --insecure=true
option.
If the external authenticated registry is secure, the registry certificate should be trusted on the
master hosts, because they run the registry import controller:
$ sudo update-ca-trust
6. The certificate for the external registry should be trusted in the OpenShift Container Platform
registry:
WARNING
This workaround creates configmaps with all the trusted certificates from
the system running those commands, so the recommendation is to run it
from a clean system where just the required certificates are trusted.
7. Alternatively, modify the registry image so that it trusts the proper certificates by rebuilding the
image using a Dockerfile such as:
FROM registry.redhat.io/openshift3/ose-docker-registry:v3.6
ADD <my.registry.example.com.crt> /etc/pki/ca-trust/source/anchors/
USER 0
RUN update-ca-trust extract
USER 1001
8. Rebuild the image, push it to a docker registry, and use that image as
spec.template.spec.containers["name":"registry"].image in the registry deploymentconfig:
NOTE
openshift_master_image_policy_config={"imagePolicyConfig":{"allowedRegistriesForImport":[{"domainName":"docker.io"},{"domainName":"*.docker.io"},{"domainName":"*.redhat.com"},{"domainName":"my.registry.example.com"}]}}
For more information about the ImagePolicy, see the ImagePolicy admission plug-in section.
For more information about the OpenShift Container Platform registry, see Installing a Stand-alone
Deployment of OpenShift Container Registry.
To integrate the OpenShift Container Platform registry, all previous sections apply. From the OpenShift
Container Platform point of view, it is treated as an external registry. However, some extra tasks are
required because it is a multi-tenant registry and the OpenShift Container Platform authorization model
applies: the registry is independent, so when a new project is created in OpenShift Container Platform,
the registry does not create a matching project within its own environment.
As the registry is a full OpenShift Container Platform environment with a registry pod and a web
interface, the process to create a new project in the registry is performed using the oc new-project or
oc create command line or via the web interface.
Once the project has been created, the usual service accounts (builder, default, and deployer) are
created automatically, and the project administrator user is granted permissions. Different users,
including "anonymous" users, can be authorized to push and pull images.
There can be several use cases, such as allowing all the users to pull images from this new project within
the registry, but if you want to have a 1:1 project relationship between OpenShift Container Platform and
the registry, where the users can push and pull images from that specific project, some steps are
required.
WARNING
The registry web console shows a token to be used for pull/push operations, but
the token shown there is a session token, so it expires. Creating a service account
with specific permissions allows the administrator to limit the permissions for the
service account so that, for example, different service accounts can be used to
push or pull images. A user then does not have to deal with token expiration,
secret recreation, and other tasks, because the service account tokens do not expire.
Procedure
$ oc new-project <my_project>
$ oc new-project <registry_project>
4. Give permissions to push and pull images using the registry-editor role:
If only pull permissions are required, the registry-viewer role can be used.
7. Use the dockercfg secret to pull images from the registry by linking the secret to the service
account performing the pull operations. The default service account to pull images is named
default:
8. For pushing images using the S2I feature, the dockercfg secret is mounted in the S2I pod, so it
needs to be linked to the proper service account that performs the build. The default service
account used to build images is named builder:
9. In the buildconfig, the secret should be specified for push or pull operations:
"type": "Source",
"sourceStrategy": {
    "from": {
        "kind": "DockerImage",
        "name": "<myregistry.example.com/registry_project/my_image:stable>"
    },
    "pullSecret": {
        "name": "<my_registry>"
    },
...[OUTPUT ABBREVIATED]...
"output": {
    "to": {
        "kind": "DockerImage",
        "name": "<myregistry.example.com/registry_project/my_image:latest>"
    },
    "pushSecret": {
        "name": "<my_registry>"
    },
...[OUTPUT ABBREVIATED]...
CHAPTER 8. MANAGING CERTIFICATES
See Redeploying certificates for information on viewing certificate expirations and redeploying
certificates.
These self-signed certificates are not recognized by browsers. To mitigate this issue, use a publicly
signed certificate, then configure it to re-encrypt traffic with the self-signed certificate.
With the route deleted, the certificates that will be used in the new route with the re-encrypt
strategy must be assembled from the existing wildcard and self-signed certificates created by
the metrics deployer. The following certificates must be available:
Wildcard CA certificate
Wildcard certificate
Hawkular CA certificate
Each certificate must be available as a file on the file system for the new route.
You can retrieve the Hawkular CA and store it in a file by executing the following command:
2. Locate the wildcard private key, certificate, and CA certificate. Place each into a separate file,
such as wildcard.key, wildcard.crt, and wildcard.ca.
--ca-cert wildcard.ca \
--service hawkular-metrics \
--dest-ca-cert hawkular-internal-ca.crt