What is the minimum level of ‘availability’ that can be expected of a server?
How is the severity and impact (and therefore priority) of an issue represented?
Before explaining what is meant by ‘availability’, it is important to understand how the priority of an issue is defined from its severity and impact, because the definition of availability relates only to issues of certain priorities.
When a Service Provider raises a request or issue with the Support Team, a priority is assigned to it based on its perceived urgency which, in the case of issues, is related to its severity and its impact. The definitions used when determining the priority of a request or issue are as follows:
- Priority 1 (P1) All of the customer’s users are unable to access (a) one or more ‘critical software components’ on a server or (b) one or more IaaS VMs on that server.
Priority 1 issues exclude situations in which a disaster (as defined later in this document) has occurred; a different set of service level commitments applies to disasters.
- Priority 2 (P2) Any users are unable to access a critical software component or one or more IaaS VMs are not available. In practice, P2 issues should occur infrequently. A P2 issue must have its underlying cause within the server, and a failure within the server’s software or hardware generally affects all users, not just one or some of them.
Priority 2 issues exclude any issues which Service Providers should be able to resolve themselves using the various consoles provided with each server, including but not limited to user account management requests and issues such as password resets.
- Priority 3 (P3) Any users are experiencing an issue with one or more critical software components but the issue does not materially impact their ability to work; that is, the issue is an inconvenience (and possibly a major inconvenience) but the users can work around the issue for a short period.
- Priority 4 (P4) A request for a configuration change to the server’s software platform that only we can make (because of its complexity or side-effects) or a request for advice/support on usage best practice.
The ‘critical software components’ on a server are those which immediately affect any user’s ability to access the services and applications running on the server whose management is our responsibility. They include the Active Directory Domain Controller, Network Gateway, Security Gateway and Fileserver but they exclude, for example, the Local Backup Software and Cloud Backup Software, neither of which immediately prevents any user from working effectively. The critical software components also include any Managed Applications running on the server as well as any Infrastructure-as-a-Service Virtual Machines on the server, but they exclude any third-party applications installed by the Service Provider or Customer in those Custom Virtual Machines, as the responsibility for these lies with the Service Provider or Customer.
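The priority definitions above can be read as a simple decision rule. The following is a minimal sketch only, not our actual triage tooling: the `Report` fields and the `classify` function are hypothetical names introduced for illustration, and the sketch simplifies the definitions by treating an unavailable IaaS VM the same as users being blocked.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


@dataclass
class Report:
    """Hypothetical description of a request or issue (illustrative fields only)."""
    is_request: bool = False       # configuration change or best-practice advice
    disaster: bool = False         # disasters follow separate commitments
    self_service: bool = False     # resolvable via the provided consoles (e.g. password reset)
    users_blocked: int = 0         # users unable to access a critical component or IaaS VM
    total_users: int = 0
    users_inconvenienced: int = 0  # users affected but able to work around the issue


def classify(report: Report) -> Optional[Priority]:
    """Return a Priority, or None when the priority definitions do not apply."""
    if report.disaster:
        return None  # handled under the disaster recovery commitments
    if report.is_request:
        return Priority.P4
    if report.total_users > 0 and report.users_blocked == report.total_users:
        return Priority.P1
    if report.users_blocked > 0:
        # P2 excludes issues the Service Provider can resolve themselves
        return None if report.self_service else Priority.P2
    if report.users_inconvenienced > 0:
        return Priority.P3
    return None


if __name__ == "__main__":
    print(classify(Report(users_blocked=10, total_users=10)).name)
    print(classify(Report(users_blocked=3, total_users=10)).name)
```

In this sketch a report in which every user is blocked maps to P1, and one in which only some users are blocked maps to P2 unless it is a self-service matter.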
What do we mean by ‘Availability’ and ‘Service Loss’?
A server is deemed to be ‘available’ if it has no unresolved Priority 1 issue. A Service Loss is any period of time, at any time of the day or night on weekdays or at weekends (that is, 24 x 7), during which a server is not ‘available’, except for periods of time resulting from the following:
- Scheduled patches and upgrades that are periodically applied to all servers. These can cause periods of Service Loss outside the normal working day in the time zone in which the server is located. The schedule for these is published in advance to Service Providers and Customers.
- Emergency security patches that very occasionally need to be distributed to servers at short notice. The decision to apply these and the timing of their application are at our sole discretion, and we will only apply such patches if we believe that the availability of the server or the integrity or privacy of the Customer’s data is at risk.
- Factors beyond our reasonable control including issues caused by (a) the Customer, (b) other technology in the Customer’s infrastructure that interacts with the server, (c) third parties not contracted to us such as utility and dependent service providers that fail to provide continuous service (e.g. power, connectivity) or (d) natural disasters and force majeure.
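The relationship between availability, Service Loss and the exclusions listed above can be sketched as follows. This is an illustration under stated assumptions: the `Cause` categories and function names are hypothetical labels for the exclusions in the list, not part of any real monitoring system.

```python
from enum import Enum, auto


class Cause(Enum):
    """Hypothetical labels for why a server is unavailable."""
    SERVER_FAULT = auto()
    SCHEDULED_MAINTENANCE = auto()      # scheduled patches and upgrades
    EMERGENCY_SECURITY_PATCH = auto()
    BEYOND_REASONABLE_CONTROL = auto()  # customer, third parties, force majeure


# Causes that do not count towards Service Loss, per the list above.
EXCLUDED_CAUSES = {
    Cause.SCHEDULED_MAINTENANCE,
    Cause.EMERGENCY_SECURITY_PATCH,
    Cause.BEYOND_REASONABLE_CONTROL,
}


def is_available(open_p1_issues: int) -> bool:
    """A server is 'available' when it has no unresolved Priority 1 issue."""
    return open_p1_issues == 0


def counts_as_service_loss(open_p1_issues: int, cause: Cause) -> bool:
    """A period counts as Service Loss only if unavailable and not excluded."""
    return not is_available(open_p1_issues) and cause not in EXCLUDED_CAUSES


if __name__ == "__main__":
    print(counts_as_service_loss(1, Cause.SERVER_FAULT))
    print(counts_as_service_loss(1, Cause.SCHEDULED_MAINTENANCE))
```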
How is the duration of a ‘Service Loss’ measured?
A Service Loss is deemed to have commenced at the earlier of (a) the Service Provider reporting the issue to our Support Team and (b) the time at which the Service Loss was detected by our automated monitoring capabilities. A Service Loss ends when all the critical software components have been restored to their correct working state and the Service Provider has been notified of this restoration of service.
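As a worked illustration of this measurement, the sketch below takes the earlier of the report and detection times as the start, and the later of the restoration and notification times as the end (since both conditions must be met). The function and parameter names are hypothetical, and timestamps are assumed to share one time zone.

```python
from datetime import datetime, timedelta
from typing import Optional


def service_loss_duration(
    reported_at: Optional[datetime],
    detected_at: Optional[datetime],
    restored_at: datetime,
    notified_at: datetime,
) -> timedelta:
    """Duration from the earlier start event to the later end event."""
    starts = [t for t in (reported_at, detected_at) if t is not None]
    if not starts:
        raise ValueError("at least one of reported_at or detected_at is required")
    start = min(starts)                    # earlier of report and detection
    end = max(restored_at, notified_at)    # restored AND Service Provider notified
    return end - start


if __name__ == "__main__":
    duration = service_loss_duration(
        reported_at=datetime(2024, 3, 5, 9, 10),
        detected_at=datetime(2024, 3, 5, 9, 2),
        restored_at=datetime(2024, 3, 5, 10, 30),
        notified_at=datetime(2024, 3, 5, 10, 45),
    )
    print(duration)  # 1:43:00
```

In the example, monitoring detected the loss at 09:02, before the 09:10 report, and notification at 10:45 followed restoration at 10:30, so the measured duration runs from 09:02 to 10:45.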
What is the Support Service?
The Support Service is a collection of service-based commitments that we make in relation to a server. It defines the hours during which support and maintenance services will be available to a Service Provider and the speed with which those services can be expected to be delivered.
What are the service commitments associated with the Support Service?
The Support Service and the service commitments it comprises are as shown in the table below:
SUPPORT SERVICE SUMMARY
1. Hardware component failure response and resolution time depends on the HPE hardware support service purchased with the server
2. As they are service-affecting, P1 and P2 issues must always be reported by phone even if reported by email or through the Support Portal as well. This is to ensure prompt action in the event of any IT or communications issues that could delay the receipt of email or Support Portal requests.
3. P3 and P4 issues can be reported to the Support Team 24 x 7 by email, through the Support Portal or by phone. The Support Team will start to look into them the following working day and the resolution time targets mentioned above commence at 8am on the following day.
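The start of the resolution clock for P3 and P4 issues can be sketched as below. This is an illustration only and rests on assumptions not stated in this document: that ‘the following day’ means the following working day, that working days are Monday to Friday with no public holidays, and that times are local to the server. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta


def resolution_clock_start(reported_at: datetime) -> datetime:
    """Return 8am on the working day after a P3/P4 issue is reported.

    Assumes Monday-Friday working days and ignores public holidays.
    """
    day = reported_at.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, time(8, 0))


if __name__ == "__main__":
    # Reported on Friday 1 March 2024 at 17:30
    print(resolution_clock_start(datetime(2024, 3, 1, 17, 30)))
```

Under these assumptions, an issue reported on a Friday evening starts its resolution clock at 8am on the following Monday.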
DISASTER RECOVERY SUMMARY
- Achievement of the target recovery point service commitment will depend on the rate of change of data during the days immediately preceding the disaster event and on the available upload bandwidth to the Internet.
- Achievement of the Cloud Recovery Time target will depend on the volume of customer data (files, databases, etc) on the server at the time of the last Cloud backup that was completed prior to the disaster event and on the number of distinct custom Virtual Machines that are present on the server at this time.
Support Team Response Time
How quickly can Service Providers expect the Support Team to respond?
The Support Team response time is a measure of the speed with which we will endeavour to (a) understand the nature of any request or issue raised by a Service Provider, (b) allocate a priority level to it based on severity and impact (as described in the priority definitions set out previously) and (c) start to action the request or investigate the issue. The target response time varies based on the severity and impact of the issue and on the Support Service.
Hardware Component Failure Resolution Time
How quickly can Service Providers expect hardware failures to be resolved?
Hardware fault resolution is the process of restoring a server to its normal operating state following a hardware component failure. The service commitments associated with the process of hardware fault resolution apply only if the hardware has been supplied as an integral part of the subscription; that is, the hardware was not purchased separately.
Hardware component failures are resolved using the HPE Care Pack that is associated with the Support Service. The terms and conditions associated with hardware failure resolution are those published by HPE for its Care Packs.
The hardware component failure resolution time is measured from the earlier of (a) the time that the Service Provider reported the fault to us, and (b) the time that we detected the fault using our automated monitoring tools. The fault is deemed to have been resolved when the server becomes ‘available’ for use again.
Data Recovery Point and Recovery Time
What do we mean by a ‘disaster’?
A disaster is an event which destroys or damages beyond repair a server or renders it inaccessible indefinitely. Examples of disasters include theft, flood or fire.
What do we mean by ‘disaster recovery’?
Disaster recovery (or DR) is the process of restoring a server to its normal operating state after a disaster, with the software and data resident on the server restored to the state they were in at the time that the most recent Cloud Backup of that software and data commenced.
What are the key steps in the ‘disaster recovery’ process?
The first step in the recovery process is the rapid restoration of the Customer’s software and data into the Cloud so that it is accessible to the Customer there. We perform this restoration if requested by the Service Provider and endeavour to complete it within the specified Cloud Recovery Time for the Support Service.
The second step in the recovery process is the full restoration of the Customer’s software and data on a new server on the Customer’s premises (which may differ from the original premises if those were also damaged during the disaster event).
- If the server hardware was supplied by us, we will restore a server of identical specification on the Customer’s chosen premises, pre-loaded with all the Customer’s software and data that were on the server at the time the most recently completed Cloud Backup commenced. We will endeavour to complete this within the specified On-Premises Recovery Time based on the Support Service that is applicable.
- If the server hardware was not supplied by us as an integral part of the subscription but was purchased by the Service Provider or Customer, the hardware owner is responsible for sourcing identical new hardware following the disaster from the party from whom the original hardware was purchased. We will restore the Customer’s software and data to the new hardware based on their state at the time that the most recently completed Cloud Backup commenced. No commitment to the On-Premises Recovery Time is offered in this case, as the speed of restoration will depend on how quickly the new server hardware is sourced and on the Customer’s available download bandwidth. By contrast, when we have supplied the hardware, we can perform the restoration remotely with much higher available internet download bandwidth, which leads to a much faster restoration time.
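The recovery point referred to in both cases above is the start time of the most recently completed Cloud Backup, not its completion time. A minimal sketch of that selection follows; the `CloudBackup` record and `recovery_point` function are hypothetical names used only for illustration and do not describe the actual backup software.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class CloudBackup:
    """Hypothetical record of one Cloud Backup run."""
    started_at: datetime
    completed_at: Optional[datetime]  # None if the backup never completed


def recovery_point(backups: Sequence[CloudBackup], disaster_at: datetime) -> Optional[datetime]:
    """Return the start time of the latest backup completed before the disaster."""
    completed = [
        b for b in backups
        if b.completed_at is not None and b.completed_at <= disaster_at
    ]
    if not completed:
        return None  # no completed backup to restore from
    latest = max(completed, key=lambda b: b.completed_at)
    return latest.started_at


if __name__ == "__main__":
    backups = [
        CloudBackup(datetime(2024, 3, 3, 22, 0), datetime(2024, 3, 4, 1, 0)),
        CloudBackup(datetime(2024, 3, 4, 22, 0), None),  # interrupted by the disaster
    ]
    print(recovery_point(backups, datetime(2024, 3, 4, 23, 0)))
```

In the example, the backup that was still running when the disaster occurred is ignored, so data is restored to its state at 22:00 on 3 March, when the last completed backup began.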