Document Name Capacity Management Policy
Classification Internal Use Only
Document Management Information
Document Title: Capacity Management Policy
Document Number: ORGANISATION-CAP-MNM-POL
Document Classification: Internal Use Only
Document Status: Approved
Issue Details
Release Date: DD-MM-YYYY
Revision Details
Version No. | Revision Date | Particulars | Approved by
1.0 | DD-MM-YYYY | <Provide details of changes made on policy here> | <Provide name of Approver here>
Document Contact Details
Role | Name | Designation
Author | <Provide name of author here> | <Provide designation of author here>
Reviewer/Custodian | <Provide name of reviewer here> | <Provide designation of reviewer here>
Owner | <Provide name of owner here> | <Provide designation of owner here>
Distribution List
Name
Need Based Circulation Only
CONTENTS
1. PURPOSE
2. SCOPE
3. TERMS AND DEFINITIONS
4. ROLES AND RESPONSIBILITIES
5. CAPACITY MANAGEMENT
6. CLOUD AND THIRD-PARTY CAPACITY CONSIDERATIONS
7. CAPACITY TESTING AND VALIDATION
8. INTEGRATION WITH CHANGE AND INCIDENT MANAGEMENT
9. DOCUMENTATION AND RECORDKEEPING
10. MONITORING, METRICS, AND REPORTING
11. POLICY EXCEPTIONS
12. COMPLIANCE AND ENFORCEMENT
13. DOCUMENT CONTROL
1. PURPOSE
The purpose of this Capacity Management Policy is to ensure that [ORG NAME]
maintains adequate and reliable computing, network, storage, and application
resources to support business operations, service levels, and information security
requirements at all times.
This policy is intended to:
● Ensure proactive planning, monitoring, and optimization of IT and cloud
infrastructure resources to prevent performance degradation, service
disruptions, or capacity-related incidents.
● Align capacity planning activities with the organization’s information security,
availability, and business continuity objectives.
● Support compliance with:
○ ISO/IEC 27001:2022, specifically control A.8.6 (Capacity Management)
and related availability controls
○ SOC 2 Type 2, particularly the Availability and System Operations Trust
Services Criteria
● Enable effective forecasting, scaling, and cost control through visibility into
resource utilization and workload demand.
By implementing this policy, [ORG NAME] ensures that capacity is managed in a
systematic, secure, and scalable manner to support current and future business needs.
2. SCOPE
This policy applies to all capacity planning, monitoring, and management activities
necessary to ensure the uninterrupted operation of [ORG NAME]’s business-critical
systems and services.
2.1 Covered Environments
This policy covers all environments where [ORG NAME] operates its technology and
business functions, including:
● On-premises data centres and offices
● Public, private, or hybrid cloud platforms
● Co-location and hosted environments
● Remote or mobile workforces
2.2 Covered Assets and Resources
The following categories of assets are included within the scope of this policy:
● Digital Infrastructure & Cloud Services
○ Compute, memory, storage, and bandwidth resources
○ Application scaling, licensing, and throughput
○ Monitoring of cloud-native services and APIs
● Workforce Capacity
○ Staffing levels across IT, security, and support teams
○ Skill availability for key processes and technologies
○ Onboarding/offboarding coordination with HR
● Physical and Network Infrastructure
○ Laptops, desktops, workstations, servers
○ Network equipment (routers, switches, firewalls, wireless access points)
○ VPN concentrators and remote access endpoints
● Utilities and Facility Resources
○ Power supply and UPS systems
○ HVAC systems and cooling capacity
○ Internet service provider (ISP) bandwidth and redundancy
○ Backup generators and fuel supply
○ Fire detection and suppression systems (e.g., extinguishers, alarms)
2.3 Organizational Applicability
This policy applies to:
● All business units, functions, and departments responsible for service delivery,
IT operations, cybersecurity, facilities, and human resources
● All teams involved in infrastructure procurement, planning, monitoring, and
scaling
● Third-party service providers and vendors supporting any of the above
resources
3. TERMS AND DEFINITIONS
Term | Definition
Capacity Management | The process of ensuring that adequate resources (human, technical, physical, and environmental) are available to meet current and anticipated demand for IT services and business operations.
Utilization Threshold | A predefined limit (typically in %) that indicates when a resource (e.g., CPU, memory, bandwidth, team availability) is approaching overuse and requires scaling or rebalancing.
Scalability | The ability of a system, application, or infrastructure to handle increased workload or demand by adding resources without affecting performance.
Elasticity | The ability of cloud or virtual systems to automatically adjust resources (up or down) in response to workload changes.
Workforce Capacity | The availability of skilled human resources needed to perform critical functions, support services, or respond to incidents.
Utility Systems | Infrastructure components such as uninterruptible power supplies (UPS), HVAC systems, internet connections, and fire suppression that support physical site operations.
Availability | The ability of a system or resource to be accessible and usable as required by business operations.
Resource Forecasting | The process of predicting future demand for capacity (compute, staff, power, etc.) based on trends, project plans, or growth metrics.
Capacity Baseline | A reference point representing normal resource usage under typical workloads, used for comparison and forecasting.
Redundancy | Deployment of duplicate components (e.g., ISPs, power sources, personnel) to prevent single points of failure and ensure availability.
4. ROLES AND RESPONSIBILITIES
Role | Responsibilities
Chief Information Officer (CIO) / Head of IT | - Oversee organization-wide capacity planning initiatives.
 | - Ensure alignment of capacity management with business growth, availability goals, and regulatory requirements.
 | - Approve budget for scaling and new infrastructure.
Infrastructure / Cloud Operations Team | - Monitor and manage compute, storage, and network resource utilization.
 | - Define and maintain thresholds, auto-scaling configurations, and alerting rules.
 | - Conduct regular trend analysis and forecasting for IT resources.
Facilities / Admin / Real Estate Team | - Monitor physical utilities and infrastructure (e.g., power, HVAC, UPS, fire extinguishers, generators).
 | - Plan for facility expansion, upgrades, or redundancy based on occupancy and equipment needs.
 | - Coordinate with vendors for inspection, refuelling, and maintenance cycles.
Information Security Team | - Ensure capacity-related risks (e.g., resource exhaustion, degraded controls) are tracked in the risk register.
 | - Review critical control dependencies on shared resources (e.g., VPN, SIEM, firewall logs).
 | - Participate in scalability and availability planning for security tools.
HR and Workforce Planning Team | - Track staffing levels, forecast headcount requirements, and plan hiring against expected workload demand (e.g., new projects, SOC coverage, support hours).
 | - Maintain skill inventory and assist in resource gap identification.
Application Owners / DevOps Teams | - Monitor application throughput and performance.
 | - Forecast peak usage trends (e.g., seasonal loads, new features).
 | - Coordinate with Infra team for load testing and horizontal/vertical scaling.
Compliance / Risk Management | - Ensure that capacity management practices meet regulatory, contractual, and audit requirements (e.g., ISO 27001, SOC 2).
 | - Review capacity plans during change management and annual risk assessments.
Third-Party Vendors / MSPs | - Provide transparency into resource usage, bandwidth capacity, and failover capabilities.
 | - Notify [ORG NAME] of any capacity constraints or maintenance schedules that may impact availability.
5. CAPACITY MANAGEMENT
[ORG NAME] shall implement a structured, cross-functional, and proactive capacity
management framework covering all critical assets, systems, personnel, utilities, and
third-party services to ensure optimal performance, cost efficiency, and business
continuity.
5.1 Capacity Planning Governance
Capacity management shall be embedded within [ORG NAME]’s IT strategy, ISMS,
risk management framework, and business continuity planning.
Capacity planning shall be performed for all major service components including:
● Infrastructure (on-premises and cloud)
● Applications and platforms
● Physical facilities and utilities
● Human resources
● Security, compliance, and monitoring systems
Capacity shall be considered from both business-as-usual and disaster recovery
perspectives.
5.2 Performance Baselines and Thresholds
All critical systems shall have documented performance baselines, measured under
normal load conditions.
Thresholds shall be established for:
● System resources (e.g., 70% CPU, 80% memory)
● Support capacity (e.g., ticket volumes per engineer)
● Utility tolerance (e.g., HVAC cooling capacity vs rack heat output)
Threshold breaches shall trigger alerts, investigation, and rebalancing actions.
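As an illustration only, the threshold check described in this section could be sketched as follows. The 70% CPU and 80% memory limits are the examples given above; the names `Threshold` and `find_breaches` are hypothetical and not tied to any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    resource: str      # e.g. "cpu" or "memory"
    limit_pct: float   # utilization percentage that counts as a breach


# Example limits from Section 5.2; real values are set per system.
EXAMPLE_THRESHOLDS = [Threshold("cpu", 70.0), Threshold("memory", 80.0)]


def find_breaches(utilization: dict[str, float],
                  thresholds: list[Threshold] = EXAMPLE_THRESHOLDS) -> list[str]:
    """Return the resources whose utilization meets or exceeds their limit."""
    return [t.resource for t in thresholds
            if utilization.get(t.resource, 0.0) >= t.limit_pct]


# A host at 83.5% CPU and 61% memory breaches only the CPU threshold.
print(find_breaches({"cpu": 83.5, "memory": 61.0}))
```

Each returned resource name would then feed the alerting, investigation, and rebalancing steps required above.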
5.3 Continuous Monitoring and Real-Time Visibility
Monitoring tools shall be deployed across infrastructure, networks, and applications
to:
● Track real-time utilization
● Analyze performance degradation
● Predict capacity saturation events
Dashboards shall be reviewed by IT Ops, DevOps, and Risk Teams at the cadence defined in
Section 10.3.
Alerts from capacity monitoring systems must be integrated into SIEM, NOC, or
incident workflows for timely response.
5.4 Forecasting and Trend Analysis
Resource consumption data shall be analyzed using trend reports and forecasting
models to:
● Predict future growth across infrastructure and support functions
● Plan procurement, budget allocations, and hiring roadmaps
● Support strategic planning (e.g., regional expansion, new product launches)
Forecasts shall cover a horizon of at least 6 months, and preferably 12 months, and
shall be updated quarterly.
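One simple way to produce such a forecast is a straight-line trend fitted to monthly utilization figures. The sketch below assumes that approach purely for illustration; the policy does not mandate a particular model, and the function name `linear_forecast` and the sample figures are hypothetical.

```python
def linear_forecast(history: list[float], months_ahead: int) -> float:
    """Extrapolate a least-squares straight line fitted to monthly samples.

    history[0] is the oldest month and history[-1] the most recent one.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Project from the last observed month (index n - 1) into the future.
    return intercept + slope * (n - 1 + months_ahead)


# Storage utilization (%) for the last four months, growing 2 points per month.
usage = [40.0, 42.0, 44.0, 46.0]
print(linear_forecast(usage, months_ahead=6))
```

A linear fit ignores seasonality and step changes such as product launches, so project plans and growth metrics should still adjust the result.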
5.5 Infrastructure and Asset Scalability
Infrastructure provisioning shall support scaling up and out (e.g., via cloud elasticity
or modular hardware deployment).
Resource provisioning shall include buffer capacity (e.g., 20–30%) for:
● Growth surges
● Incident-related loads
● DR/BCP cutover scenarios
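The buffer arithmetic can be sketched as below, assuming the buffer is expressed as a fraction added on top of forecast peak demand. That interpretation, the function name, and the 400 vCPU figure are illustrative assumptions rather than policy requirements.

```python
def provisioned_capacity(peak_demand: float, buffer_fraction: float) -> float:
    """Capacity to provision so that peak demand plus the buffer is covered."""
    if peak_demand < 0 or buffer_fraction < 0:
        raise ValueError("peak_demand and buffer_fraction must be non-negative")
    return peak_demand * (1 + buffer_fraction)


# A forecast peak of 400 vCPUs with the 20% and 30% buffers cited above.
for buffer_fraction in (0.20, 0.30):
    print(buffer_fraction, provisioned_capacity(400, buffer_fraction))
```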
Capacity for network devices, WAFs, VPN, and firewalls shall be tested under
simulated peak conditions.
5.6 Utilities and Facility Resource Planning
Power, cooling, ISP bandwidth, fire suppression, and physical security systems must
be:
● Adequately sized for current and forecasted usage
● Supported by redundant systems (e.g., dual UPS, multiple ISPs)
● Included in DR test scenarios and maintenance schedules
Utility health (e.g., UPS load, generator fuel levels) must be monitored, documented,
and tested periodically.
5.7 Workforce and Support Team Capacity
HR and department leads shall perform periodic workforce capacity reviews based
on:
● Ticket load, project volumes, 24/7 coverage expectations
● Skills mapping and resource availability
● Absence, attrition, and surge support planning
Workforce shortfalls shall trigger hiring, reskilling, or outsourcing options with lead
time built into BCP plans.
5.8 Change and Deployment Alignment
All significant deployments or infrastructure changes shall include a capacity impact
review as part of the:
● Change Request or CAB checklist
● Go-live readiness assessment
● Pre-deployment load testing or smoke testing
Post-deployment monitoring shall confirm performance against projected usage.
5.9 Security Control Dependencies
Security-related systems (e.g., logging, endpoint protection, SIEM, WAFs) must have
capacity to:
● Sustain high log throughput during incident spikes
● Retain logs and alerts as per regulatory requirements
● Scale with the number of endpoints and events per second (EPS)
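A rough sizing estimate for log retention can be derived from events per second (EPS), average event size, and the retention period. The sketch below is a simplified calculation under stated assumptions: it ignores compression, indexing overhead, and replication, and the 500-byte event size and 90-day retention are hypothetical inputs, not values set by this policy.

```python
SECONDS_PER_DAY = 86_400


def log_storage_gb(eps: float, avg_event_bytes: float, retention_days: int) -> float:
    """Raw storage (in GB, 10**9 bytes) needed to retain logs for the period."""
    total_bytes = eps * avg_event_bytes * SECONDS_PER_DAY * retention_days
    return total_bytes / 1e9


# Sustained 2,000 EPS at roughly 500 bytes per event, kept for 90 days.
print(log_storage_gb(eps=2000, avg_event_bytes=500, retention_days=90))
```

Sizing for incident spikes would use the peak EPS observed or expected during such events rather than the sustained average.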
5.10 Business Continuity and Availability Planning
All critical resources shall be mapped to their availability class (e.g., Tier 1, 2, 3) and
must:
● Include failover, backup, and disaster recovery capacity
● Be validated through BCP and DR drills
● Be aligned with Recovery Time Objective (RTO) and Recovery Point
Objective (RPO) thresholds
6. CLOUD AND THIRD-PARTY CAPACITY CONSIDERATIONS
[ORG NAME] shall ensure that all cloud services, SaaS platforms, and third-party
infrastructure providers supporting critical operations are included in the
organization’s capacity planning and availability strategy. This is essential to ensure
scalability, resilience, and service continuity across hybrid and outsourced
environments.
6.1 Cloud Capacity Management Framework
All workloads hosted on cloud platforms (e.g., AWS, Azure, GCP) shall follow a defined
capacity management framework that includes:
● Baseline definition:
○ Establish expected usage profiles and minimum/maximum resource
levels (e.g., CPU cores, storage, DB connections).
○ Document initial sizing parameters for autoscaling groups, serverless
functions, and container clusters.
● Auto-scaling and elasticity:
○ Configure autoscaling rules (horizontal/vertical) for compute, databases,
and managed services based on thresholds (e.g., CPU > 70%, memory >
75%).
○ Validate elasticity under production-like load through stress testing in
staging.
● Monitoring and alerts:
○ Implement real-time monitoring for:
■ Compute saturation (EC2, VMs)
■ API rate limits (Lambda, Azure Functions)
■ Network ingress/egress limits
■ Storage capacity (EBS, S3, Blob)
■ Billing or quota breaches
○ Integrate alerts into centralized dashboards (e.g., Grafana, DataDog,
CloudWatch) with escalation paths.
● Cloud service quota tracking:
○ Maintain a register of cloud resource quotas and soft limits (e.g., VPCs per
region, function concurrency, IAM policies per account).
○ Request limit increases proactively before deployment peaks or client
onboarding.
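A quota register of the kind described under "Cloud service quota tracking" could be modelled as below. This is a minimal sketch: the `QuotaEntry` structure, the 80% warning level, and the sample figures are assumptions for illustration, and in practice the numbers would come from the cloud provider's quota reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaEntry:
    name: str    # quota being tracked, e.g. "vpcs_per_region"
    limit: int   # current provider-imposed limit
    in_use: int  # current consumption

    @property
    def used_pct(self) -> float:
        return 100.0 * self.in_use / self.limit


def quotas_needing_increase(register: list[QuotaEntry],
                            warn_pct: float = 80.0) -> list[str]:
    """Return quotas at or above the warning level, so increases can be requested early."""
    return [q.name for q in register if q.used_pct >= warn_pct]


register = [
    QuotaEntry("vpcs_per_region", limit=5, in_use=4),
    QuotaEntry("function_concurrency", limit=1000, in_use=350),
]
print(quotas_needing_increase(register))
```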
6.2 SaaS and Platform Services
● SaaS platforms supporting core business functions (e.g., CRM, ticketing, SIEM,
MDM, collaboration, HRMS) shall be evaluated for:
○ Concurrency limits (e.g., maximum number of active sessions/users)
○ Storage or mailbox quotas
○ API or data export limits
○ Rate-limiting or throttling behaviour under load
○ Impact of license plan changes on performance or scale
● Usage metrics must be:
○ Reviewed monthly by the Application Owner or IT Ops Team
○ Documented with dashboards and integrated into performance review
meetings
○ Used to plan license upgrades or platform transitions as needed
● Admins must monitor for approaching SaaS thresholds and flag risks that
may lead to user disruptions or compliance breaches (e.g., data retention cap
reached, log archival delays).
6.3 Third-Party Hosting and Infrastructure Providers
For managed service providers (MSPs), hosting partners, or co-location facilities, [ORG
NAME] shall:
● Review and document the provider’s:
○ Capacity provisioning model (shared/dedicated resources)
○ Peak usage thresholds (e.g., per tenant, per service)
○ Backup bandwidth and storage guarantees
○ Network segmentation and oversubscription policies
○ Multi-tenant performance isolation mechanisms
● Validate the provider’s ability to:
○ Scale infrastructure during workload spikes
○ Provide logs and reports on performance bottlenecks
○ Manage upgrades, failovers, and patching without degrading capacity
● Monitor the vendor’s adherence to SLAs and uptime thresholds, with
structured monthly or quarterly review cadence.
6.4 Contractual Safeguards and SLA Capacity Guarantees
All cloud and third-party service agreements must include capacity-related
contractual clauses such as:
Requirement | Example Clauses
Availability SLAs | Minimum 99.9% uptime per month for Tier 1 services
Scalability Guarantees | Commitment to provision additional resources within 2 hours of request
Burst Capacity | Buffer resource access during seasonal or critical peaks
API / Throughput Caps | Maximum concurrent calls, query limits, or requests per minute
Maintenance Notifications | 7-day advance notice for upgrades impacting resource availability
Performance Reporting | Monthly reporting on usage, saturation, and capacity incidents
All contracts must be reviewed by Legal, InfoSec, and Compliance teams before
execution.
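When negotiating availability clauses, it can help to translate an uptime percentage into a downtime budget. The sketch below does this for the 99.9% monthly example in Section 6.4, assuming a 30-day month; the function name is hypothetical, and actual SLA measurement windows and exclusions are defined by each contract.

```python
def allowed_downtime_minutes(uptime_pct: float, days_in_period: int = 30) -> float:
    """Maximum downtime (minutes) that still meets the uptime target for the period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = days_in_period * 24 * 60
    return total_minutes * (100.0 - uptime_pct) / 100.0


# 99.9% over a 30-day month leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))
```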
6.5 Shared Responsibility and Operational Transparency
● A Capacity Responsibility Matrix shall be maintained for all cloud and vendor-
hosted services, indicating:
○ Which party is responsible for provisioning, scaling, and reporting
○ Escalation contacts for resource bottlenecks or failures
○ Shared dependency management (e.g., DNS, authentication, CDNs)
● Cloud usage, vendor metrics, and SLA compliance shall be included in:
○ Monthly service review meetings
○ ISMS Steering Committee discussions
○ Internal audit and SOC 2 control testing as applicable
6.6 Risk Mitigation for Cloud and Vendor Capacity
[ORG NAME] shall identify and mitigate capacity risks across the cloud and third-party
supply chain, including:
● Vendor lock-in scenarios due to scaling limitations or rigid licensing
● Cloud region capacity shortages, especially during global outages or
geopolitical disruptions
● Unplanned usage surges caused by marketing events, cyberattacks (e.g.,
DDoS), or integrations
● Rate-limiting or function throttling affecting user experience or downstream
processes
Mitigation actions may include:
● Multi-region or multi-cloud deployment design
● Quota increase requests in advance of launches
● Load testing across cloud-native services
● Contracts with alternate providers (cold standby SaaS or secondary ISP)
7. CAPACITY TESTING AND VALIDATION
[ORG NAME] shall validate the effectiveness of its capacity planning efforts through
periodic testing, simulations, and performance validation exercises. These activities
ensure that systems, applications, infrastructure, and workforce can withstand
expected and unexpected surges in demand without compromising availability,
performance, or compliance.
7.1 Types of Capacity Testing
The following types of capacity tests shall be conducted based on system criticality,
regulatory scope, and business impact:
Test Type | Purpose | Examples
Load Testing | Validate system behavior under expected workload | Simulate 1,000 concurrent users on customer portal
Stress Testing | Determine system stability under extreme conditions | Push application beyond max capacity to identify failure points
Scalability Testing | Assess ability to scale up or out under increasing load | Trigger autoscaling rules in cloud environment
Failover Testing | Confirm availability during component or site failure | Switch from primary to DR data center / cloud region
Saturation Testing | Simulate resource exhaustion to observe alerting and recovery | Fill disk space on SIEM or endpoint log collector
Workforce Simulation | Validate human resource readiness for peak or incident load | Simulate 24x7 SOC coverage for extended period or sudden incident spike
7.2 Testing Frequency and Triggers
Capacity testing shall be performed under the following conditions:
● Annually for all Tier 1 systems (as per BIA or asset classification)
● Before go-live of any major application or infrastructure deployment
● After significant changes in system architecture, workload patterns, or cloud
configurations
● During BCP/DR drills, simulating real-world resource stress
● In response to SLA violations, high utilization alerts, or audit findings
7.3 Documentation and Evidence
All capacity testing must be documented and retained for audit and compliance.
Records shall include:
● Test plan and scope
● Tools and scripts used (e.g., JMeter, Locust, AWS Fault Injection Simulator)
● Input parameters (load volume, duration, concurrent sessions, etc.)
● Results and observations
● Performance thresholds and breach points
● Issues encountered and mitigation applied
● Approvals and sign-offs
Test reports must be reviewed by the CISO, IT Ops, and Change Advisory Board (CAB)
before production-impacting changes are finalized.
7.4 Continuous Validation via Observability
● Systems with high variability in usage (e.g., customer-facing apps, APIs) must
be equipped with:
○ Observability tooling (e.g., Prometheus, OpenTelemetry, Grafana)
○ Anomaly detection for unusual usage or saturation trends
○ Dynamic alert thresholds that adjust based on time of day or
seasonality
● Capacity-related incidents (e.g., resource exhaustion, degradation under load)
must be:
○ Investigated via root cause analysis
○ Mapped to gaps in previous testing or forecasting
○ Used to update baseline assumptions and recovery plans
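Dynamic alert thresholds that vary by time of day can be derived from historical samples. The sketch below assumes one common approach, the mean plus a multiple of the standard deviation for each hour; this is an illustrative choice and not a method prescribed by this policy, and the function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_thresholds(samples: list[tuple[int, float]], k: float = 3.0) -> dict[int, float]:
    """Build a per-hour alert threshold of mean + k * standard deviation.

    samples holds (hour_of_day, utilization) pairs from historical monitoring data.
    """
    by_hour: dict[int, list[float]] = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: mean(values) + k * pstdev(values)
            for hour, values in by_hour.items()}


def is_anomalous(hour: int, value: float, thresholds: dict[int, float]) -> bool:
    """True when the value exceeds the threshold learned for that hour."""
    limit = thresholds.get(hour)
    return limit is not None and value > limit


history = [(9, 60.0), (9, 70.0), (2, 10.0), (2, 20.0)]
limits = hourly_thresholds(history, k=2.0)
# 40% utilization is normal at 09:00 but unusual at 02:00.
print(is_anomalous(9, 40.0, limits), is_anomalous(2, 40.0, limits))
```

Real deployments would need far more history per hour than this toy sample, and seasonality could be handled the same way by keying on weekday or month.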
8. INTEGRATION WITH CHANGE AND INCIDENT MANAGEMENT
To ensure capacity-related risks are identified and mitigated before disruptions occur,
[ORG NAME] shall integrate capacity planning checkpoints into its Change
Management and Incident Management processes. This ensures operational
readiness, service availability, and continual improvement of capacity planning
decisions.
8.1 Capacity Checks During Change Management
All significant changes—whether infrastructure upgrades, new deployments, or
migrations—shall undergo a capacity impact assessment as part of the change
lifecycle.
● The Change Advisory Board (CAB) shall validate whether:
○ The new system or change introduces additional workload on existing
resources
○ There is sufficient buffer (compute, memory, bandwidth, licenses) to
absorb the change
○ Scaling rules or resource pools have been reviewed and updated
○ Dependencies on cloud quotas or third-party throughput limits have
been addressed
● Changes requiring capacity scaling shall:
○ Be logged in the Capacity Planning Register
○ Include a rollback strategy in case of failure due to saturation
○ Include testing outcomes, when applicable (see Section 7)
8.2 Capacity-Linked Incident Handling
Capacity-related incidents shall be logged, categorized, and analyzed to improve the
overall capacity framework.
Examples of capacity-linked incidents include:
● High latency or system unavailability due to CPU, memory, or bandwidth
exhaustion
● Throttling or timeouts from SaaS platforms or cloud services
● Delayed log ingestion or alerting due to SIEM overload
● ISP outage exceeding redundant failover capacity
● Understaffed support desks during high-volume events
All such incidents shall trigger:
● Root Cause Analysis (RCA), specifically identifying:
○ Forecasting gaps
○ Threshold misconfiguration
○ Unexpected usage patterns
○ Vendor-side saturation
● Corrective Action Plans (CAPs) that may include:
○ Scaling up/down infrastructure
○ Updating alert thresholds or autoscaling policies
○ Reallocating resources (e.g., moving workloads across regions)
○ Revisiting workforce coverage models
8.3 Feedback Loop to Capacity Planning
● Lessons learned from incident and change reviews shall be fed back into:
○ Performance baselines (Section 5.2)
○ Forecasting models (Section 5.4)
○ Procurement and hiring plans
○ Monitoring dashboards and alert rules
● High-impact incidents or repeated saturation events shall be reviewed at the
ISMS Steering Committee or Operational Risk Council for executive-level
visibility and funding support if needed.
9. DOCUMENTATION AND RECORDKEEPING
[ORG NAME] shall maintain comprehensive records of all capacity-related planning,
monitoring, testing, and incident resolution activities to support operational continuity,
audit readiness, and regulatory compliance.
9.1 Capacity Planning Documentation
The following documents must be maintained and reviewed periodically:
● Capacity Planning Register:
○ Contains forecasted usage, buffer levels, and scaling plans for compute,
network, storage, workforce, utilities, and critical third-party services.
● Utilization Dashboards and Threshold Reports:
○ Real-time and historical metrics for CPU, memory, storage, bandwidth,
API usage, SaaS license consumption, etc.
● Forecasting Reports:
○ Predictive models and historical trends used to inform procurement,
scaling, or hiring.
● Workforce Planning Sheets:
○ Headcount vs. workload mapping for critical teams (e.g., SOC, support,
DevOps, cloud).
● Cloud Quota and Resource Limits Tracker:
○ Active quota usage, vendor-imposed thresholds, limit increase requests,
and expiry reminders.
● Third-Party SLA and Capacity Declarations:
○ Vendor-side commitments for performance, scalability, and buffer
capacities (as part of due diligence or contract annexes).
9.2 Capacity Testing and Validation Records
Records shall be maintained for each capacity test conducted, including:
● Test scope, goals, and system(s) tested
● Scripts, simulators, or tools used
● Test logs and screenshots
● Results and bottleneck analysis
● Sign-offs by owners and change approvers
These documents shall be stored in a secure, access-controlled repository and mapped
to the Change or DR test register.
9.3 Incident and Change Logs (Capacity-Relevant)
● All capacity-related incidents (e.g., outages, throttling, DR failovers) shall be
tagged in the Incident Management System with a capacity linkage.
● Change records involving scale, configuration, migration, or optimization shall
reference associated capacity planning or impact assessments.
9.4 Record Retention
All capacity-related records shall be retained for a minimum of 5 years, or longer if
required by:
● ISO/IEC 27001 or SOC 2 audit cycles
● Regulatory obligations (e.g., DPDP, HIPAA)
● Client or contractual commitments
Retention timelines shall be reviewed annually by the Compliance, Risk, or ISMS team.
10. MONITORING, METRICS, AND REPORTING
To ensure timely action and strategic decision-making, [ORG NAME] shall implement
a structured monitoring and reporting framework for all capacity-related metrics
across infrastructure, applications, workforce, and third-party services.
10.1 Capacity Monitoring Requirements
All critical systems and resources must be continuously or periodically monitored using
automated tools and dashboards.
Monitoring shall include:
● System Utilization Metrics:
○ CPU, memory, disk, IOPS, and bandwidth for servers and cloud instances
○ Database query volumes and connection saturation
○ Log and event ingestion volumes for SIEM and observability stacks
● Network and Utility Monitoring:
○ ISP bandwidth usage and failover link status
○ Power consumption, UPS load, and cooling system efficiency
○ Generator runtime and fuel levels
● Cloud Quota Monitoring:
○ Instance limits, storage tiers, API gateway limits, concurrency caps
○ Autoscaling performance and scaling lag analysis
● Workforce Monitoring:
○ Ticket volumes per team member
○ On-call rotation coverage and fatigue indicators
○ Hiring pipeline progress vs. projected workload
10.2 Key Capacity Metrics (KPIs)
Each function shall define capacity KPIs that are reviewed monthly or quarterly.
Category | Sample KPI
Infrastructure | Avg. CPU utilization % across production nodes
Cloud Services | % of quota used vs. threshold (e.g., Lambda concurrency)
Network | Peak bandwidth usage as % of available ISP capacity
Workforce | Support tickets per engineer per week
SaaS Licenses | % of license consumption vs. purchased capacity
Response Time | % of time systems meet SLA response time under load
Alert Effectiveness | % of capacity alerts resolved before threshold breach
10.3 Reporting and Review Cadence
● Weekly Dashboards:
○ Auto-generated reports reviewed by Infra, CloudOps, and SOC teams
○ Focus on active alerts, thresholds breached, and upcoming risks
● Monthly Reports:
○ Sent to department heads and ISMS/Risk team
○ Include trend charts, projected growth, and action items
● Quarterly Capacity Review:
○ Conducted as part of IT/BCP/ISMS review meetings
○ Covers infrastructure, workforce, and vendor-side capacity risks
○ Inputs used for budget, hiring, and procurement decisions
10.4 Threshold Breach Handling
● Alert thresholds must be:
○ Defined based on criticality and historical behavior
○ Tuned periodically to avoid noise or false positives
● All threshold breaches must be:
○ Logged in monitoring systems
○ Investigated and resolved with corrective actions
○ Escalated if they indicate systemic risk or recurring saturation
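Recurring saturation can be detected by counting logged breaches per resource over a rolling window. The sketch below assumes a 30-day window and a minimum of three breaches as the escalation trigger; both values, the function name, and the host names are hypothetical and would be set by the owning team.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_breaches(events: list[tuple[str, datetime]], now: datetime,
                       window_days: int = 30, min_count: int = 3) -> list[str]:
    """Return resources with at least min_count breaches inside the window.

    events holds (resource_name, breach_timestamp) pairs from the monitoring log.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(resource for resource, ts in events if ts >= cutoff)
    return sorted(resource for resource, count in counts.items()
                  if count >= min_count)


events = [
    ("db-01", datetime(2024, 5, 5)),
    ("db-01", datetime(2024, 5, 12)),
    ("db-01", datetime(2024, 5, 20)),
    ("web-01", datetime(2024, 5, 15)),
    ("web-01", datetime(2024, 1, 3)),   # outside the window, not counted
]
print(recurring_breaches(events, now=datetime(2024, 5, 31)))
```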
11. POLICY EXCEPTIONS
While this Capacity Management Policy is intended to apply universally across systems,
processes, and teams, [ORG NAME] recognizes that legitimate exceptions may
occasionally be required due to unique business, technical, or operational
circumstances.
11.1 Acceptable Exception Scenarios
Exceptions may be considered in situations such as:
● Temporary resource overutilization due to emergency projects or migrations
● Vendor-imposed restrictions or licensing models that limit scalability
● Unavailability of hardware, cloud quotas, or personnel during crisis
● Controlled deviations for innovation labs, PoCs, or sandbox environments
● Legacy systems pending decommissioning with limited scaling options
11.2 Exception Request Process
● The owner of the system/process seeking an exception must submit a formal
Exception Request, including:
○ Description of the deviation
○ Justification and business impact
○ Risks involved (e.g., saturation, SLA breach, compliance failure)
○ Compensating controls in place (e.g., monitoring, backups)
○ Timeframe for resolution or return to compliance
● Requests must be logged in the Policy Exception Register and assigned a
unique reference ID.
11.3 Review and Approval Workflow
Risk Level | Approval Required
Low impact or temporary | Function Head or Infra Lead
Medium impact or repeated | CISO or ISMS Manager
High risk / SLA or compliance impact | Executive Management / Risk Committee
All approved exceptions must have an expiration date, after which:
● The exception must be resolved and closed, or
● Revalidated with updated risk assessment and approvals
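The approval routing and expiry rule in this section could be tracked in an exception register along the following lines. This is only a sketch: the approver labels are copied from the table in Section 11.3, while the short risk-level keys, the `PolicyException` structure, and the sample reference IDs are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Approvers as listed in Section 11.3, keyed by a hypothetical short risk label.
APPROVER_BY_RISK = {
    "low": "Function Head or Infra Lead",
    "medium": "CISO or ISMS Manager",
    "high": "Executive Management / Risk Committee",
}


@dataclass(frozen=True)
class PolicyException:
    ref_id: str
    risk_level: str   # "low", "medium", or "high"
    expires_on: date


def required_approver(exception: PolicyException) -> str:
    """Look up who must approve an exception of this risk level."""
    return APPROVER_BY_RISK[exception.risk_level]


def expired(register: list[PolicyException], today: date) -> list[str]:
    """Return reference IDs whose expiration date has passed."""
    return [e.ref_id for e in register if e.expires_on < today]


register = [
    PolicyException("EXC-001", "low", date(2024, 3, 31)),
    PolicyException("EXC-002", "high", date(2024, 12, 31)),
]
print(expired(register, today=date(2024, 6, 1)))
print(required_approver(register[1]))
```

Expired entries would then be either closed or revalidated with an updated risk assessment, as required above.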
11.4 Monitoring and Reporting of Exceptions
● All active exceptions must be reviewed monthly by the Risk or ISMS team
● Exception status shall be reported to the:
○ ISMS Steering Committee
○ Internal Audit team (if capacity-related control is impacted)
○ Management Review (quarterly or annually)
Persistent or high-risk exceptions may trigger:
● Corrective action plans
● Project reprioritization
● Vendor escalation or infrastructure upgrades
12. COMPLIANCE AND ENFORCEMENT
All teams, departments, and third-party service providers involved in the design,
operation, monitoring, or management of resources within [ORG NAME] are expected
to comply with this Capacity Management Policy. Non-compliance may result in
capacity-related incidents, SLA breaches, or regulatory exposure.
12.1 Internal Compliance Expectations
All employees and stakeholders shall:
● Monitor, plan, and scale capacity proactively for systems under their ownership
● Collaborate with Infra, Cloud, HR, and Admin teams to manage utilization
thresholds
● Participate in forecasting, testing, and BCP drills related to capacity planning
● Report anticipated spikes or bottlenecks ahead of major initiatives or business
events
12.2 Roles of Control Owners and Approvers
● Infra, Cloud, and DevOps teams must ensure:
○ Systems are auto-scaled or manually scaled when thresholds are
crossed
○ Alerts are tuned and responded to in a timely manner
○ Capacity is factored into change requests and DR planning
● HR, SOC, and Admin teams must ensure:
○ Workforce, facilities, and utilities have buffer and continuity plans
○ Shifts, on-call coverage, and support staffing are maintained
● ISMS, Risk, and Compliance teams must:
○ Validate that controls linked to ISO 27001:2022 A.8.6 (Capacity management) and
SOC 2 Availability are in place
○ Ensure regular reviews and audits are conducted on resource health and
trends
○ Track open exceptions, overdue upgrades, or scaling delays
12.3 Non-Compliance Consequences
Violation Type | Examples | Consequences
Negligence | Ignoring threshold alerts, failing to scale workloads | Performance issues, security alerts, or SLA breaches
Bypass | Going live without a capacity review or testing | Incident escalation, change rollback
Repeated Inaction | Failing to resolve known capacity issues | Formal warning, process audit
Control Gaps | Failure to plan for SOC 2 or ISO control coverage | Audit findings, client escalation
12.4 Disciplinary Measures
Non-compliance may lead to:
● Warnings or escalation to department heads
● Restrictions on change approvals or platform access
● Inclusion in internal audit reports
● Referral to HR for disciplinary action in severe cases
12.5 Whistleblower Protection
Any employee may confidentially report violations, misuse, or unmanaged risks related
to capacity planning to:
● CISO
● ISMS Manager
● Whistleblower channel or Ethics Committee
[ORG NAME] prohibits retaliation against employees who report capacity or
risk-related concerns in good faith.
13. DOCUMENT CONTROL
This section defines the ownership, review cycle, and versioning requirements for the
Capacity Management Policy to ensure it remains current, effective, and aligned with
regulatory and operational needs.
13.1 Ownership and Responsibility
Role | Responsibility
Policy Owner | Chief Information Officer (CIO) / CISO
Custodian | Infrastructure or Cloud Operations Lead
Approving Authority | ISMS Steering Committee / Executive Management
The Policy Owner is accountable for the policy’s alignment with security, compliance,
and availability goals. The Custodian is responsible for implementing controls,
conducting reviews, and maintaining documentation.
13.2 Review and Update Cycle
● This policy shall be reviewed at least once annually, or more frequently if:
○ There are major infrastructure or cloud architecture changes
○ Business growth requires capacity model adjustments
○ There are audit findings or incidents linked to capacity failures
○ Changes occur in ISO 27001:2022, SOC 2, or other applicable frameworks
● Reviews shall be recorded in the document history with version control.
13.3 Version Control and Change Log
Each policy version must include:
● Version number and date
● Summary of changes
● Reviewer(s) and approver(s)
● Reference to impacted systems or audits (if applicable)
Previous versions shall be retained for at least 5 years in a secure repository.
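The 5-year minimum can be checked mechanically before any archived version is disposed of. The sketch below assumes the retention period is counted from the date a version was superseded; the policy does not state the starting point, so that choice and the function names are hypothetical.

```python
from datetime import date

RETENTION_YEARS = 5  # minimum retention stated in 13.3


def earliest_disposal_date(superseded_on: date,
                           years: int = RETENTION_YEARS) -> date:
    """Earliest date on which a superseded version may be disposed of."""
    try:
        return superseded_on.replace(year=superseded_on.year + years)
    except ValueError:
        # 29 February with a non-leap target year: round up to 1 March
        # so the retention period is never shorter than required.
        return date(superseded_on.year + years, 3, 1)


def may_dispose(superseded_on: date, today: date) -> bool:
    """True only once the full retention period has elapsed."""
    return today >= earliest_disposal_date(superseded_on)
```

Because the requirement is "at least" 5 years, the check only indicates when disposal becomes permissible; it does not require disposal on that date.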
13.4 Policy Distribution
● The approved policy shall be:
○ Published on [ORG NAME]’s internal policy portal or GRC platform
○ Communicated to all infrastructure, cloud, admin, HR, DevOps, and
security teams
○ Included in onboarding packs for Infra/Cloud/SOC teams
○ Referenced in ISMS internal audits and certification preparation
Access rights to edit the policy shall be restricted to the Owner and Custodian.
View-only access shall be extended to relevant stakeholders.