1_Introduction
Slide #1
Topics
1. What is system administration?
2. What do sysadmins do?
3. Principles and First Steps
4. Organizations and Certifications
5. Maturity and Complexity
6. Ethics
Slide #2
What is a system?
System: An organized collection of computers
interacting with a group of users.
Diagram: Services run on Servers and PCs, linked by a Network; Services help Users accomplish work.
Slide #3
System State
System policy: specification of a system’s
configuration and its acceptable usage.
System state S(t): the current configuration (files,
kernel, memory or CPU usage) of a system.
Ideal states S*(t): states of the system that match the
system policy. Over time, the system state shifts
away from the ideal state.
System administration: modifying the system to
bring it closer to S*(t).
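These definitions can be made concrete with a small model. This sketch assumes that a state is a mapping from configuration items to values and that a policy is the set of items it constrains. `drift` and `converge` are hypothetical helper names, and the settings are illustrative. `drift` lists where S(t) departs from the policy, and `converge` makes the minimal changes that return the system to an ideal state.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]  # configuration item -> value

def drift(state: State, policy: State) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {item: (current, required)} for every item that violates the policy."""
    return {
        key: (state.get(key), required)
        for key, required in policy.items()
        if state.get(key) != required
    }

def converge(state: State, policy: State) -> State:
    """Return a copy of the state with every drifted item set to its required value.

    Items the policy does not mention are left alone, so many different
    states are ideal: any state that satisfies every policy constraint.
    """
    fixed = dict(state)
    for key, (_current, required) in drift(state, policy).items():
        fixed[key] = required
    return fixed

# Hypothetical policy and a host whose state S(t) has drifted from it.
policy = {"ntp_server": "ntp.example.com", "ssh_root_login": "no"}
s_t = {"ntp_server": "ntp.example.com", "ssh_root_login": "yes", "motd": "welcome"}

print(drift(s_t, policy))      # {'ssh_root_login': ('yes', 'no')}
s_next = converge(s_t, policy)
print(drift(s_next, policy))   # {}
```

Because `converge` only touches drifted items, running it on an already compliant state changes nothing. That idempotence is the property that convergent configuration management tools such as cfengine rely on.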
Slide #4
What do sysadmins do?
Small org: sysadmin can be entire IT staff
– Phone support
– Order and install software and hardware
– Fix anything that breaks from phones to servers
– Develop software
Large org: sysadmin is one of many IT staff
– Specialists instead of “jack of all trades”
– Database admin, Network admin, Fileserver admin, Help
desk worker, Programmers, Logistics
Slide #5
Common Activities
1. Add and remove users.
2. Add and remove hardware.
3. Perform backups.
4. Install new software systems.
5. Troubleshooting.
6. System monitoring.
7. Auditing security.
8. Help users.
9. Communicate.
Slide #6
User Management
Creating user accounts
– Consistency requires automation
– Startup (dot) files
Namespace management
– Usernames and UIDs
– Multiple namespaces or SSI?
Removing user accounts
– Consistency requires automation
– Many accounts across different systems
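Automating account creation against one shared namespace keeps usernames and UIDs consistent across hosts. This sketch assumes a hypothetical central registry (the `Namespace` class) that never reuses a UID, so files left behind by a removed account cannot silently become owned by a new one. The rendered entry follows the seven-field /etc/passwd layout. The GID, home directory, and shell shown are assumptions. On Linux, `useradd -m` similarly copies default dot files from /etc/skel into the new home directory.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Namespace:
    """Hypothetical central registry shared by every host (one namespace)."""
    next_uid: int = 1000                      # assumed start of the regular-user UID range
    users: Dict[str, int] = field(default_factory=dict)

    def add_user(self, name: str) -> int:
        if name in self.users:
            raise ValueError(f"username {name!r} is already taken")
        uid = self.next_uid
        self.next_uid += 1                    # UIDs are never handed out twice
        self.users[name] = uid
        return uid

    def remove_user(self, name: str) -> None:
        # Removing the name does not free its UID for reuse.
        del self.users[name]

    def passwd_line(self, name: str) -> str:
        """Render an /etc/passwd-style entry (GID, home and shell are assumptions)."""
        uid = self.users[name]
        return f"{name}:x:{uid}:{uid}::/home/{name}:/bin/bash"

ns = Namespace()
ns.add_user("alice")
ns.add_user("bob")
ns.remove_user("alice")
print(ns.add_user("carol"))     # 1002, not alice's old 1000
print(ns.passwd_line("bob"))    # bob:x:1001:1001::/home/bob:/bin/bash
```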
Slide #7
Hardware Management
Adding and removing hardware
– Configuration, cabling, etc.
Purchase
– Evaluate and purchase servers + other hardware
Capacity planning
– How many servers? How much bandwidth, storage?
Data Center management
– Power, racks, environment (cooling, fire alarm)
Virtualization
– When can virtual servers be used vs. physical?
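A first-cut answer to "How many servers?" is simple arithmetic. This sketch assumes that load and per-server capacity are measured in the same unit (for example, requests per second). It adds an assumed 25% headroom for growth and one spare server for N+1 redundancy. All numbers are hypothetical.

```python
import math

def servers_needed(peak_load: float, per_server_capacity: float,
                   headroom: float = 0.25, spares: int = 1) -> int:
    """Estimate how many servers a service needs.

    peak_load and per_server_capacity must use the same unit (e.g. requests/s).
    headroom: extra fraction reserved for growth and bursts (assumed 25%).
    spares:   additional servers so one can fail without overload (N+1).
    """
    if per_server_capacity <= 0:
        raise ValueError("per-server capacity must be positive")
    required = peak_load * (1 + headroom)
    return math.ceil(required / per_server_capacity) + spares

# Hypothetical: 1800 req/s at peak, each server handles 500 req/s.
print(servers_needed(1800, 500))   # 2250 / 500 = 4.5 -> 5, plus 1 spare = 6
```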
Slide #8
Backups
Backup strategy and policies
– Scheduling: when and how often?
– Capacity planning
– Location: on-site vs. off-site.
Monitoring backups
– Checking logs
– Verifying media
Performing restores when requested
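One way to verify a backup is to compare checksums of the source files with checksums of the backed-up copy. This sketch assumes that the backup is a plain directory tree readable from the same host. `verify_backup` is a hypothetical helper, and `shutil.copytree` merely stands in for a real backup run. A checksum comparison does not replace periodically performing an actual test restore.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify_backup(source: Path, backup: Path) -> List[str]:
    """Return the files that are missing from the backup or differ from the source."""
    expected, actual = manifest(source), manifest(backup)
    return sorted(name for name, digest in expected.items()
                  if actual.get(name) != digest)

# Demonstration in a throwaway directory; copytree stands in for a backup run.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    src.mkdir()
    (src / "notes.txt").write_text("quarterly report\n")
    (src / "db.dump").write_bytes(b"\x00\x01\x02")
    dst = Path(tmp, "backup")
    shutil.copytree(src, dst)
    clean = verify_backup(src, dst)
    (dst / "notes.txt").write_text("bit rot\n")   # simulate a damaged copy
    (dst / "db.dump").unlink()                    # simulate a missing file
    damaged = verify_backup(src, dst)

print(clean)     # []
print(damaged)   # ['db.dump', 'notes.txt']
```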
Slide #9
Software Installation
Automated consistent OS installs
– Desktop vs. server OS image needs.
Installation of software
– Purchase, find, or build custom software.
Managing software installations
– Distributing software to multiple hosts.
– Managing multiple versions of a software pkg.
Patching and updating software
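When a security advisory names the first fixed version, a software inventory can show which hosts still need patching. This sketch assumes plain dotted numeric versions and a hypothetical inventory that maps each host to its installed version. Real package managers use richer schemes (epochs, release suffixes). Comparing version strings directly would give wrong answers, because "2.4.9" sorts after "2.4.10" as text.

```python
from typing import Dict, List, Tuple

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.4.10' into (2, 4, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def hosts_needing_patch(inventory: Dict[str, str], fixed_in: str) -> List[str]:
    """Hosts whose installed version is older than the first fixed release."""
    minimum = parse_version(fixed_in)
    return sorted(host for host, version in inventory.items()
                  if parse_version(version) < minimum)

# Hypothetical inventory: host -> installed version of one package.
inventory = {"web1": "2.4.9", "web2": "2.4.10", "db1": "2.2.34"}
print(hosts_needing_patch(inventory, "2.4.10"))   # ['db1', 'web1']
print("2.4.9" < "2.4.10")                         # False: string comparison misleads
```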
Slide #10
Troubleshooting
Problem identification
– By user notification
– By log files or monitoring programs
Tracking and visibility
– Ensure users know you’re working on problem
– Provide an ETA if possible
Finding the root cause of problems
– Provide temporary solution if necessary
– Solve the root problem to eliminate it permanently
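Scanning log files is one way to identify problems before users report them. This sketch assumes traditional syslog lines of the form `Mon DD HH:MM:SS host program[pid]: message`. It treats any message containing "error" or "fail" as a problem. Both the line format and the keyword pattern are assumptions that you should adapt to your own logs. The sample lines are invented.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed traditional syslog layout: "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+"
    r"(?P<prog>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

def problem_counts(lines: Iterable[str],
                   keywords: str = r"\b(error|fail(ed|ure)?)\b") -> Counter:
    """Count lines whose message matches `keywords`, grouped by (host, program)."""
    wanted = re.compile(keywords, re.IGNORECASE)
    counts: Counter = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and wanted.search(m.group("msg")):
            counts[(m.group("host"), m.group("prog"))] += 1
    return counts

sample = [
    "Jan 12 10:00:01 web1 sshd[812]: Failed password for root from 203.0.113.5",
    "Jan 12 10:00:03 web1 sshd[812]: Failed password for root from 203.0.113.5",
    "Jan 12 10:01:00 db1 kernel: EXT4-fs error (device sda1): bad block",
    "Jan 12 10:02:00 web1 cron[901]: (root) CMD (run-parts /etc/cron.hourly)",
]
for (host, prog), n in problem_counts(sample).most_common():
    print(host, prog, n)   # web1 sshd 2, then db1 kernel 1
```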
Slide #11
System Monitoring
Automatically monitor systems for
– Problems (disk full, error logs, security)
– Performance (CPU, mem, disk, network)
Provides data for capacity planning
– Determine need for resources
– Establish case to bring to management
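Monitoring data becomes capacity-planning evidence once it is projected forward. This sketch fits a least-squares line to daily disk-usage samples and estimates how many days remain until the disk is full. It assumes roughly linear growth and one sample per day. The numbers are hypothetical.

```python
from typing import List, Optional

def days_until_full(daily_used_gb: List[float], capacity_gb: float) -> Optional[float]:
    """Fit a least-squares line to daily usage and extrapolate to capacity.

    Returns None when usage is flat or shrinking (no projected exhaustion).
    """
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)                       # day index of each sample
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                   # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day at which the fitted line reaches capacity, counted from the last sample.
    return (capacity_gb - intercept) / slope - (n - 1)

usage = [400, 410, 420, 430, 440]       # hypothetical daily samples in GB
print(days_until_full(usage, capacity_gb=500))   # 6.0
```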
Slide #12
Helping Users
Request tracking system
– Ensures that you don’t forget problems.
– Ensures users know you’re working on their
problem; reduces interruptions, status queries.
– Lets management know what you’ve done.
User documentation and training
– Policies and procedures
Schedule and communicate downtimes
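At its core, a request tracking system keeps one record per request with a status history, plus a query for everything still unresolved. This sketch assumes a minimal workflow (new, open, resolved, with reopening allowed). The class and method names are hypothetical, and a real tracker would also notify the requester on every change.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Status(Enum):
    NEW = "new"
    OPEN = "open"
    RESOLVED = "resolved"

# Assumed minimal workflow; any other transition is rejected.
ALLOWED = {
    Status.NEW: {Status.OPEN},
    Status.OPEN: {Status.RESOLVED},
    Status.RESOLVED: {Status.OPEN},        # reopen if the fix did not hold
}

@dataclass
class Ticket:
    ticket_id: int
    requester: str
    summary: str
    status: Status = Status.NEW
    history: List[str] = field(default_factory=list)

    def move_to(self, new: Status, note: str) -> None:
        if new not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.status = new
        # Every change is logged; a real tracker would also notify the requester.
        self.history.append(f"{new.value}: {note}")

class Tracker:
    def __init__(self) -> None:
        self.tickets: List[Ticket] = []

    def create(self, requester: str, summary: str) -> Ticket:
        ticket = Ticket(len(self.tickets) + 1, requester, summary)
        self.tickets.append(ticket)
        return ticket

    def unresolved(self) -> List[Ticket]:
        """Everything not yet resolved, so no request is forgotten."""
        return [t for t in self.tickets if t.status is not Status.RESOLVED]

rt = Tracker()
printer = rt.create("alice", "printer offline")
quota = rt.create("bob", "needs a larger disk quota")
printer.move_to(Status.OPEN, "working on it, ETA 2 hours")
printer.move_to(Status.RESOLVED, "replaced fuser unit")
print([t.ticket_id for t in rt.unresolved()])   # [2]
print(printer.history)
```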
Slide #13
Communicate
Customers
– Keep customers apprised of progress.
• When you start working on a request, with an ETA.
• When you make progress, need feedback.
• When you’re finished.
– Communicate system status.
• Uptime, scheduled downtimes, failures.
– Meet regularly with customer managers.
Managers
– Meet regularly with your manager.
– Write weekly status reports.
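Reporting uptime usually means stating availability as a percentage of a period. Whether scheduled downtime counts against that figure is a policy decision, so this sketch can compute it either way. The downtime figures and the 30-day month are hypothetical.

```python
def availability(period_h: float, unplanned_down_h: float,
                 scheduled_down_h: float = 0.0, count_scheduled: bool = False) -> float:
    """Percentage of the period during which the service was up.

    By default (an assumed policy) announced maintenance windows are removed
    from the measurement period instead of being counted as downtime.
    """
    if count_scheduled:
        period, down = period_h, unplanned_down_h + scheduled_down_h
    else:
        period, down = period_h - scheduled_down_h, unplanned_down_h
    return 100.0 * (period - down) / period

month_h = 30 * 24   # hypothetical 30-day month = 720 hours
print(round(availability(month_h, 1.5, 4.0), 3))                        # 99.791
print(round(availability(month_h, 1.5, 4.0, count_scheduled=True), 3))  # 99.236
```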
Slide #14
Specialized Skills
Heterogeneous Environments
Integrating multiple OSes, hardware types, network
protocols, or distributed sites.
Databases
SQL RDBMS
Networking
Complex routing, high speed networks, voice.
Security
Firewalls, authentication, NIDS, cryptography.
Storage
NAS, SANs, cloud storage.
Virtualization and Cloud Computing
VMware, cloud architectures.
Slide #15
Qualities of a Successful Sysadmin
Customer oriented
– Ability to deal with interrupts, time pressure
– Communication skills
– Service provider, not system police
Technical knowledge
– Hardware, network, and software knowledge
– Debugging and troubleshooting skills
Time management
– Automate everything possible.
– Ability to prioritize tasks: urgency and importance.
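Prioritizing by urgency and importance can be expressed as a sort key. This sketch assumes an Eisenhower-style ordering in which important but non-urgent work (such as automation) ranks above urgent but unimportant requests. That ordering is one reasonable policy, not the only one. The task list is invented for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    name: str
    urgent: bool
    important: bool

def prioritize(tasks: List[Task]) -> List[Task]:
    """Order: urgent+important, important only, urgent only, then everything else."""
    def rank(t: Task) -> int:
        if t.urgent and t.important:
            return 0
        if t.important:
            return 1
        if t.urgent:
            return 2
        return 3
    return sorted(tasks, key=rank)   # sorted() is stable: ties keep arrival order

queue = [
    Task("reorganize the wiki", urgent=False, important=False),
    Task("mail server down", urgent=True, important=True),
    Task("user wants a new mouse today", urgent=True, important=False),
    Task("automate account creation", urgent=False, important=True),
]
print([t.name for t in prioritize(queue)])
```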
Slide #16
First Steps to Better SA
Use a request system.
– Customers know what you’re doing.
– You know what you’re doing.
Manage quick requests right
– Handle emergencies quickly.
– Use request system to avoid interruptions.
Policies
– How do people get help?
– What is the scope of responsibility for SA team?
– What is our definition of emergency?
Start every host in a known state.
Slide #17
Principles of SA
Simplicity
– Choose the simplest solution that solves the entire problem.
– Work towards a predictable system.
Clarity
– Choose a straightforward solution that’s easy to change, maintain,
debug, and explain to other SAs.
Generality
– Choose reusable solutions that scale up; use open protocols.
Automation
– Use software to replace human effort.
Communication
– Be sure that you’re solving the right problems and that people know
what you’re doing.
Basics First
– Solve basic infrastructure problems before advanced ones.
Slide #18
Organizations
USENIX: Advanced Computing Systems
Association
LISA: Large Installation System
Administration
SAGE: System Administrators Guild
LOPSA: League of Professional System
Administrators
Slide #19
Types of Sites
Small
2-10 computers, 1 OS, 2-20 users.
Small staff size requires outsourcing to obtain most
specialized skills.
Midsized
11-100 computers, 1-3 OSes, 21-100 users.
Large
100+ computers, multiple OSes, 100+ users.
Outsources to reduce costs, some specializations.
Slide #20
Certifications
• CCNA, CCNP, CCIE (Cisco)
• cSAGE (SAGE)
• MCSA (Microsoft)
• RHCE (Red Hat)
• SCSA (Sun)
• VCP (VMware)
Slide #21
SAGE Job Descriptions
Novice
OS familiarity, help desk skills
Junior
Can use OS system administration tools (370)
Intermediate
Understanding of distributed computing and common servers;
can automate small tasks and act independently
Senior
Understanding of scaling issues, including capacity
planning; solves problems by addressing the root cause;
higher-level programming abilities; writes purchasing
proposals; handles data center planning, etc.
Slide #22
SA Maturity Model (SAMM)
1. Ad Hoc
Ad-hoc non-repeatable solutions, firefighting.
2. Repeatable
Some repeatable processes.
3. Defined
Documented standard processes
4. Managed
Process effectiveness measured, adapted.
5. Optimized
Slide #23
Maturity and Complexity
Slide #24
Tool Maturity Levels
1. Ad Hoc
OS GUI, CLI, or web administration interfaces.
2. Repeatable
Version control (RCS, SVN, Git), request tracker
3. Defined
Automatic monitoring (Nagios, monit, god)
4. Managed
Configuration management (AutomateIt, cfengine)
5. Optimized
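Monitoring tools such as Nagios run small check programs ("plugins") that report status through their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. Each plugin also prints a one-line message on standard output. This sketch is a minimal disk-usage check in that style. The 80% and 90% thresholds are assumptions, and a deployed plugin would end with `sys.exit(main(path))` so that Nagios receives the code.

```python
import shutil
from typing import Tuple

# Exit codes from the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

def evaluate(used: int, total: int,
             warn_pct: float = 80.0, crit_pct: float = 90.0) -> Tuple[int, str]:
    """Map disk usage to a plugin status code and a one-line status message."""
    if total <= 0:
        return UNKNOWN, "DISK UNKNOWN - total size is zero"
    pct = 100.0 * used / total
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    return code, f"DISK {LABELS[code]} - {pct:.1f}% used"

def main(path: str = "/") -> int:
    """Check one filesystem, print the status line, and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(usage.used, usage.total)
    print(message)
    return code

print(evaluate(50, 100))   # (0, 'DISK OK - 50.0% used')
print(evaluate(92, 100))   # (2, 'DISK CRITICAL - 92.0% used')
```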
Slide #25
SAGE Code of Ethics
• Professionalism
• Personal Integrity
• Privacy
• Laws and Policies
• Communication
• System Integrity
• Education
• Social Responsibility
Slide #26
Terry Childs Case
Network administrator for San Francisco
– CCIE who built city’s FiberWAN network
Terry was the only person with the router passwords
– The IT department acknowledged knowing this
– He was on-call 24x7x365 to resolve issues
Terry refused to give the passwords to his boss
– Cited fears that they would be misused by
management or outside contractors.
What was the right thing for Terry to do?
Slide #27
Key Points
Definitions
– System, system state, ideal state, administration
Principles of System Administration
– Simplicity
– Clarity
– Generality
– Automation
– Communication
– Basics First
System Administration Maturity Model
– Maturity and complexity, tools
Slide #28