DevOps Practices for Cloud Native-
Yao Dong
HUAWEI CLOUD
DevCloud
[Figure: DevCloud release timeline. Dates shown: April 2015, July 2015, September 2016, December 2016, January 2017. Milestones shown: Start; DevCloud OBT (ProjectMan, CodeHub); HUAWEI CONNECT 2016; DevCloud official release (ProjectMan, CodeHub); DevCloud 1.0 release (December 2016); CloudIDE release and CloudPipeline release (January 2017). Other services shown: CloudBuild, CodeCheck, CloudDeploy, CloudRelease.]
Background — Continuous Improvement
• 2015: R&D, testing, and O&M worked in silos. Release took 3–6 hours over 2 weeks. DevCloud initial release took weeks.
• 2016: Automated deployment and testing, decoupled systems. 5 releases a day, 30 min to 1 hour per release.
• 2017: Pipeline-based, independent release. 10–20 releases per day, < 30 min per release.
• 2018: Microservice-based, independent release.
Cloud Native — New Requirements of Traditional and Internet Applications
Traditional applications:
• Constant requirements
• Seen as a project that only requires O&M once finished
• Predictable and constant user requests
• Concurrent access for 10,000+ users
• Allowing for temporary interruption of offline services for events such as midnight downtime and system O&M

Internet applications:
• Changing requirements
• Seen as a product that continuously develops
• Unpredictable and constantly growing user requests
• Concurrent access for millions of users
• Requires 24/7 online services with no excuse

Cloud-native applications:
• Vague requirements
• Seen as a service in constant operation
• Requires service agility
• Continuous delivery (CD)
• Massive-scale concurrency
• Service continuity, dark launch, rollback, and online upgrades
• Microservices framework
• Cloud-native application architecture
Cloud Native — Architecture Evolution
[Figure: architecture evolution. Labels: ERP, OA, HR, Customer service, ESB, HIP.]
Cloud Native — A Triangle of Value
Cloud Native is a new paradigm for building, running, and managing software in the cloud environment by using the cloud infrastructure and
platform services. It has the following architectural features: (micro)service-based construction, auto scaling, distribution, high availability,
multi-tenancy, and automation. The best practices for supporting the Cloud Native architecture include setting up cross-functional teams, building
organizations in which full-stack engineers collaborate closely, and using DevOps and automation tools to implement microservice CD.
[Figure: triangle of value]
• Architecture: microservices architecture
• Organization: cross-functional teams, full-stack engineers
• Project delivery: lean and agile, CD
• Value: speed, scale, reliability, flexibility, efficiency
• Service/Microservices architecture–based decoupling
• Self-service, agile cloud infrastructure
• Sharing underlying capabilities through APIs
[Figure: application architectures. Labels: Web UI, DB (Table 1, Table 2), MQ, API Gateway, Cache, Details page, Order placement, Legacy system, Shopping cart, Order, Inventory, Price, DB.]
Architecture — Self-Service DevOps Units
Services can be discovered, obtained, used, measured, and managed by other applications and developers.
[Figure: a highly autonomous service surrounded by five actions: Discover, Obtain, Use, Measure, Manage. Access is through a user-friendly web UI, CLI, SDK, etc.]
Architecture — Ensuring Agility with Architecture Decoupling and MVPs
Architecture Evolution
Stage 1: Traditional architecture with the waterfall model >> A release takes 1 month.
Stage 2: Servitized cloud product delivery >> A release takes 1–2 weeks.
Stage 3: Independent microservice releases + E2E delivery pipeline >> Multiple daily releases
Architecture: Applying the 12 Factors in Microservices Management
I. Codebase Fast delivery: Set clear boundaries; maintain healthy software lifecycle
IV. Backing services Elasticity/Agility: Use circuit breakers and relaxed binding
VI. Processes Cloud compatibility: Hand over status management to backend services
Other factors shown: 2. Dependencies; 5. Build, release, run; 8. Auto-scaling; 10. Dev/Prod parity
[Figure: microservices management. Labels: Microservices governance; Quick feedback and CD (DevOps); Cloud services/middleware; Distributed transaction; Distributed task scheduling; Metrics; Communication; Testing; Image; Application operations; Development; Test; Container management services.]
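The circuit breakers mentioned under factor IV can be illustrated with a minimal Python sketch. It is not the DevCloud implementation: the class name, the failure threshold, the reset timeout, and the injectable clock are all assumptions chosen for illustration. The idea is that after repeated failures, calls to a backing service are rejected immediately for a while instead of piling up.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker with closed, open, and half-open states."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures    # hypothetical threshold
        self.reset_timeout = reset_timeout  # hypothetical cool-down in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # trial calls are allowed again
        return "open"

    def call(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("backing service call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self._failures >= self.max_failures or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_inventory, item_id)`, where `fetch_inventory` is a hypothetical client function, and fall back to a default response when `CircuitOpenError` is raised.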
Organization — Cloud Native Microservices Architecture Embraces Agile and DevOps
Moving towards cross-functional agile/DevOps teams based on cloud service/microservice-oriented architecture
A team is responsible for the full lifecycle of a specific feature, component, or service: planning, requirements definition, design, development,
and testing, through to independent deployment, delivery, and operation.
[Figure: from functional silos to service teams.
Functional groups: Business (marketing, requirement mgmt., agile and project mgmt...), covering Business Plan, Requirements, Cases, Features, Planning, Go-to-market, and Customer support; R&D (Dev) (managers, R&D engineers, test engineers...), covering Design, Development, Reconstruction, Unit testing, Bug fix, and Deployment; Operations (platform and infrastructure O&M).
DevOps service teams: Service 1, Service 2, Service 3, ... Service N, each with a Product Manager, a Technical Manager, Development engineers, and Operations, covering Business Plan, Requirements, Planning, Development, Integration, Testing, Deployment, Supply, and Monitoring.]
Organization — Transforming into Autonomous Cross-Functional Teams
Architecture Evolution
[Figure: three organization charts, with the transitions between them labeled 2015–2016 and 2017–2018. Roles and units shown: PM, AM, SE, I&V, PL, TL, PD, UX, SDE, TE; PM/PO, core team, AM/SEG, product managers, O&M, SDE/SL, SRE, product service team, microservices teams 1 to 3; product services 1 to 3, security & reliability, FSD/SL, atomic services 1 to 3.]
• PM is responsible for results, reports to the supervisor, and makes decisions directly based on business insights.
• The core team reports to its supervisor based on business operations results for decision-making support. Service teams run autonomously.
• A microservices team makes its own decisions and determines the future of the organization based on digital operations results.
Organization — Redefining Roles for Quick, Autonomous Decision-Making
Organization — Enabling DevOps with Layered Software and Professional Platforms
Leverage IaaS and PaaS services to dynamically allocate resources.
1. Rely on dynamic resource orchestration and scheduling
2. Use existing services by invoking APIs rather than internally developing new services
[Figure: layered platform. aPaaS; resource orchestration and scheduling, distributed cache, distributed MQ, database services; image, security, monitoring, ...; elastic computing (ECS, AS, and ELB), elastic network (VPC), elastic storage (EVS, OBS...).]
Tools — DevOps Toolchain
E2E software R&D support covering requirement delivery, code submission and compilation, testing, deployment, and O&M
[Figure: DevOps toolchain. Labels: Scrum/Kanban; Requirement delivery; Requirement status pushing; Development; Application code; Build; Create slave; Pipeline; Deployment; Deploy; Software repository; Image repository; Configuration service; Images or war; Testing; Environments (ECS, RDS, CCE); Application Performance Management (APM).]
Tools — DevOps Toolchain and Environments Supporting Services and Microservices
Concurrent development and testing of different microservices; verification in the Gamma environment;
dark launch and fast rollout; continuous feedback and evolution
Process: Requirements; Design, development, and testing; Release; Production environment
Tools — Automated O&M for Data-Based Comprehensive Fault Monitoring
Automation of system deployment, upgrades, scaling, monitoring, alarms, fault demarcation and location, and self-healing.
CD — Essential Practices
Deployment:
• Automatic deployment with installation and upgrade packages within 15 min
• Script-based process; automatic environment configuration and seamless reusability
• SDV, SIT, and customer environment verification supported by automatic deployment

Testing:
• Code coverage > 70%; Automation > 90%
• Hierarchical case decoupling (component-level verification: 2 h; version-level verification: 4 h)
• Component layer automated test cases ready within a day; Automated I&V cases ready within the iteration

Environments:
• Standardized environments on all the four levels
• Automatic networking for service environments; automatic environment set-up
• Efficient support for R&D, testing, and verification; environments available at anytime
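The two thresholds quoted in this slide (code coverage above 70%, automation above 90%) could be enforced as a simple check. The sketch below is only an assumption about how such a gate might look: the function name, the line-based coverage ratio, and the case-based automation ratio are illustrative and not taken from the source.

```python
from typing import List


def quality_gate_violations(covered_lines: int, total_lines: int,
                            automated_cases: int, total_cases: int,
                            min_coverage: float = 0.70,
                            min_automation: float = 0.90) -> List[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    coverage = covered_lines / total_lines if total_lines else 0.0
    automation = automated_cases / total_cases if total_cases else 0.0
    violations = []
    # The slide states strict thresholds ("> 70%", "> 90%"), so equality fails.
    if not coverage > min_coverage:
        violations.append(
            f"code coverage {coverage:.0%} is not above {min_coverage:.0%}")
    if not automation > min_automation:
        violations.append(
            f"test automation {automation:.0%} is not above {min_automation:.0%}")
    return violations
```

For example, `quality_gate_violations(800, 1000, 95, 100)` returns an empty list, while exactly 70% coverage is reported as a violation.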
CD — Full-Lifecycle Testing, as Early as Possible
Software quality management (SQM) aims to ensure that the software meets the
customer quality standards and any necessary regulatory requirements.
Test management:
• Test requirements
• Tasks and progress
• Test cases
• Defects
• Measurement

Automated testing:
• Unit
• API
• Performance
• Web automation
• Continuous automatic testing…

Test procedure:
• Planning
• Preparations
• Execution
• Report

Testing pyramid (top to bottom): GUI, Service, Unit
[Figure: relationship between requirements, R&D, and testing. Labels: Requirements, Test basis, Testing, Fulfill, R&D, Test object, results.]

Relationship between requirement determination, development, and testing
[Figure: workflow over 1–2 iterations through four phases: Requirement explanation, Test preparations, Test execution, Test report. Roles: microservices developers, microservices testers, dedicated testers. Activities: align developers and testers with requirements, organize release review, fulfill requirements, design test cases, implement test cases, review test cases, prepare test environments, self-testing by developers, case testing, fault rectification, regression testing, confirm test conclusion.]
• Self-testing by developers; special tests by testers
• Manual, API, web automation, and online testing
• Performance testing, reliability testing, security testing, and online inspection by dedicated testers
• Alpha: API testing; Beta: API testing, web automation testing, and security testing; Gamma: API testing and web automation testing; Production: online testing, performance testing, reliability testing, and security testing
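The per-environment test assignment for Alpha, Beta, Gamma, and Production listed above lends itself to a small lookup table. The following Python sketch is illustrative only: the dictionary keys, the test-type identifiers, and the helper names are assumptions, and only the mapping itself comes from the slide.

```python
from typing import Dict, List, Set

# Test types per environment, as listed on the slide.
TEST_MATRIX: Dict[str, List[str]] = {
    "alpha": ["api"],
    "beta": ["api", "web_automation", "security"],
    "gamma": ["api", "web_automation"],
    "production": ["online", "performance", "reliability", "security"],
}


def missing_tests(environment: str, passed: Set[str]) -> List[str]:
    """Return the test types still required in the given environment."""
    return [test for test in TEST_MATRIX[environment] if test not in passed]


def is_verified(environment: str, passed: Set[str]) -> bool:
    """True when every test type required in the environment has passed."""
    return not missing_tests(environment, passed)
```

For instance, `missing_tests("beta", {"api"})` returns `["web_automation", "security"]`, which a pipeline could report before allowing a build to leave the Beta environment.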
CD — Key Points for E2E CD Pipelines
[Figure: E2E pipeline. Labels: Static code check; Microservices testing and gate; A/B testing; No service interruption. Test gate: proceed to the next stage only if quality is acceptable.]
Keys:
1. Smoke test: Verify new and modified code in the pipeline.
2. Andon Cord: Pipeline is stopped in case of a failure at any stage, such as a compilation alarm, coding style error, test lag, test failure,
or violation of architecture principles.
3. Consistency: Build only once with consistent deployment mode and environment.
4. Layering: Clarify requirements at all checkpoints.
5. Quick feedback: Automate measurement of efficiency, quality, and cycle time, and visualize the pipeline.
6. R&D efficiency: The heartbeat of R&D, which reveals the actual progress and quality.
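The Andon Cord and test gate rules above (stop the pipeline at the first failing stage, proceed only if quality is acceptable) can be sketched in a few lines of Python. The stage names and the `run_pipeline` helper are hypothetical; this is a minimal illustration, not the DevCloud pipeline engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class StageResult:
    name: str
    passed: bool
    detail: str = ""


# A stage is a name plus a check that returns (passed, detail).
Stage = Tuple[str, Callable[[], Tuple[bool, str]]]


def run_pipeline(stages: Sequence[Stage]) -> List[StageResult]:
    """Run stages in order and stop at the first failure (the Andon Cord)."""
    results: List[StageResult] = []
    for name, check in stages:
        try:
            passed, detail = check()
        except Exception as exc:  # an unexpected error also fails the gate
            passed, detail = False, f"{type(exc).__name__}: {exc}"
        results.append(StageResult(name, passed, detail))
        if not passed:
            break  # later stages never run on an unacceptable build
    return results
```

A pipeline definition might then be a list such as `[("static_check", run_lint), ("unit_tests", run_tests), ("deploy", deploy)]`, where the three functions are placeholders supplied by the team.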
CD — Daily DevCloud CD Pipeline
Key actions:
1. 11:00 pm: Obtain code from each service's release branch for static checking, code download, compilation, build, archive, and release.
2. 12:00 am: Trigger automatic deployment of version packages.
3. 3:00 am: Trigger automated Robot Framework (RF) interface testing.
[Figure: daily pipeline flow]
• Daily online compilation and build using the release branch
• Code check, with emailed check result notifications to ensure that all issues are resolved
• Security scanning (vulnerability scanning, open-source scanning), with the scanning result sent by email notification
• After compilation and build, archive and release components using the automated release service.
• Automated deployment of version packages (CI01 environment)
• Automated testing using Robot Framework APIs (RF automated cases), with results notified by email
Other labels shown:
• Version control
• Automated environment preparation
• Immutable servers
• Integration testing
• Performance testing
• Automatic build, deployment, and ...
Dark launch strategy
1. Fast rollback
2. Online acceptance testing
3. A/B testing
4. Key features first tested through private preview
[Figure: SLB in front of Lv1 dark launch, Lv2 dark launch, and Lv3 dark launch]
Deployment:
1 Lv1 offline
2 Lv1 deploy
3 Lv1 online
4 Lv2 offline
5 Lv2 deploy
6 Lv2 online
7 Lv3-1 offline
8 Lv3-1 deploy
9 Lv3-1 online
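The nine numbered deployment steps (offline, deploy, online, repeated per level) can be generated and executed by a short routine. The Python sketch below is an assumption about how this might be automated: the function names, the `verify` callback, and the use of a set to stand in for the levels that the SLB currently routes to are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

ACTIONS = ("offline", "deploy", "online")


def plan_rollout(levels: List[str]) -> List[Tuple[int, str, str]]:
    """Expand levels into numbered (step, level, action) tuples."""
    plan = []
    step = 1
    for level in levels:
        for action in ACTIONS:
            plan.append((step, level, action))
            step += 1
    return plan


def execute_rollout(levels: List[str], new_version: str,
                    versions: Dict[str, str], in_service: Set[str],
                    verify: Callable[[str], bool]) -> bool:
    """Upgrade one level at a time; roll back and stop if verification fails."""
    for _, level, action in plan_rollout(levels):
        if action == "offline":
            in_service.discard(level)      # stop routing traffic to the level
        elif action == "deploy":
            previous = versions[level]
            versions[level] = new_version
            if not verify(level):
                versions[level] = previous  # fast rollback of this level
                in_service.add(level)       # restore traffic on the old version
                return False                # later levels stay untouched
        else:
            in_service.add(level)          # bring the upgraded level online
    return True
```

Calling `plan_rollout(["Lv1", "Lv2", "Lv3-1"])` reproduces the nine steps in the same order as the slide. Because later levels are only touched after earlier ones verify, a failed release affects at most one level.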
[Figure: release phases]
• Service development and verification
• Preview and testing by select users
• Open beta test or free trial for end tenants
• Officially available for users at market price
Durations shown: 3–6 months, 3–6 months
Continuous Feedback — Shifting from Manual to Automatic
[Figure: automated O&M platform. Labels: Job management, Job platform, Configuration, CMDB, ZEUS, Monitoring, Monitoring analysis, Assisted locating.]
O&M teams do everything possible to avoid manual operations, because over 60% of incidents are
caused by operator error.
Continuous Feedback — Data Analysis, Dynamic Adjustment, and Rectification Driven by VoC
[Figure: data collection feeding data analysis. Data sources shown:
• Product: API calling, instance type, instance quantity, ...
• Website: PV, UV, heat map, ...
• Resource: total resources, utilization, regional usage, growth rate, ...
• O&M: SLA, fault, ...
• Customer service: consulting, fault reporting, complaints, ...
• User: paid users, conversion, churn rate, ...
• Marketing: events, discounts, and traffic diversion]
1. User profile system: precise user research
2. User behavior analysis system (high-frequency user operations and scenarios): PV/UV, intelligent routing, and growth hacking
3. aPaaS: core service data and a North Star Metric system
4. VoC system: user requirement feedback and analysis
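Several of the metrics named on this slide (PV, UV, conversion, churn rate) are simple ratios over collected events. The Python sketch below uses common textbook definitions, which are assumptions here since the slide does not define them: PV counts every page view, UV counts distinct users, conversion is the share of visitors who became paid users, and churn is the share of previously active users who are no longer active.

```python
from typing import Dict, Iterable, Set, Tuple


def pv_uv(page_views: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count page views (PV) and unique visitors (UV) from (user, page) pairs."""
    views = list(page_views)
    return {"pv": len(views), "uv": len({user for user, _ in views})}


def conversion_rate(visitors: Set[str], paid_users: Set[str]) -> float:
    """Share of visitors who are also paid users."""
    if not visitors:
        return 0.0
    return len(visitors & paid_users) / len(visitors)


def churn_rate(active_before: Set[str], active_now: Set[str]) -> float:
    """Share of previously active users who are no longer active."""
    if not active_before:
        return 0.0
    return len(active_before - active_now) / len(active_before)
```

With the views `[("u1", "/"), ("u1", "/pricing"), ("u2", "/")]`, `pv_uv` reports 3 page views from 2 unique visitors.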
Continuous Feedback — Layered Capability Maturity
[Figure: layered capability maturity]
• Capability maturity levels: Individual, Team, Product, Enterprise
• Measures: Lead time, Delivery quality, R&D efficiency, Delivery stability
• Practices: Short and frequent iterations, Automatic and visualized pipeline, Automatic continuous deployment, Reduced time consumption, Cloud infrastructure
Summary — Cloud Native DevOps Practices
1. Agile Management
1. Joint agile delivery or crowd innovation with customers
2. Cross-functional, two-pizza teams
3. Service- or microservices-based, autonomous teams
4. Product management: product definition, competitor analysis, and prioritization of requirements
5. Epic-Feature-User Story: from strategic initiatives to implementation
6. Realizing customer value: independent stories available for delivery with traceable requirements
7. Scrum: stand-ups, review, Kanban, showcase acceptance
8. Dogfooding

2. CD
1. Service/Microservices architecture and decoupling
2. Reserve architecture optimization and technology improvement channels
3. Code branching policy, fewer conflicts, and fast merging
4. CI and automation pipelines
5. Chaos Monkey (resilience testing)
6. Built-in security
7. Alpha/Beta/Staging environments
8. Automated deployment

3. Continuous Feedback
1. Monitoring, O&M, logging, and application performance analysis
2. VoC management and response
3. Key customer engagement
4. Dark launching, private preview, public preview, and GA
5. Operations: warm-up, homepage promotions, and traffic diversion
6. Quick data-driven corrections and dynamic planning adjustment
7. Continuous learning and improvement: proactive suggestions and encouragement for improvement...
Summary — CD Implementation Framework
Improve delivery granularity, speed, and quality by focusing on seven dimensions
Release frequency scale: 1 release every 100 days; 1 release every 10 days; 1 release per day; 10 releases per day; 100 releases per day
Stages per dimension, left to right as on the slide:

Test model:
• Function testing and automation testing by dedicated testing departments
• Function and automation testing by a team of both developers and testers
• Automated pipeline unit, function, and performance testing

Deployment model:
• Strictly controlled release times and deployment windows
• Push code to the production environment based on demands
• Code allowed to go live directly in dark launch, A/B testing, and feature switches

Infrastructure:
• Strict infrastructure control by dedicated O&M
• Automatic creation of production environment–like infrastructure
• Auto-scaling infrastructure as code service backed by PaaS or IaaS

Service-oriented organization:
• Responsible for customer experience operations
• Cross-functional team of a product manager, developers, testers, O&M, and operations personnel
• E2E cross-functional, fully authorized team with integrated Dev and Ops
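The deployment model's final stage relies on feature switches to let code go live under dark launch or A/B testing. A minimal Python sketch of a percentage-based switch follows. The function names, the hash-based bucketing scheme, and the allow list for private-preview users are assumptions for illustration, not something described in the source.

```python
import hashlib
from typing import FrozenSet


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int,
               allow_list: FrozenSet[str] = frozenset()) -> bool:
    """Enable the flag for allow-listed users and a fixed share of the rest."""
    if user_id in allow_list:
        return True  # e.g. hypothetical private-preview users
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket depends only on the flag name and the user ID, a user keeps the same experience across requests, and raising `rollout_percent` from 0 to 100 gradually widens the audience without redeploying.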