On 22 December 2021, the Hong Kong Monetary Authority (HKMA) released a Supervisory Policy Manual module on Operational Resilience. It gives Authorized Institutions (AIs) guidance on the general principles they are expected to consider when developing their operational resilience framework.
When is an AI said to be Operationally Resilient?
An AI is said to be operationally resilient if it can do the following:
- Identify and mitigate risks that may threaten the delivery of critical operations.
- Continue to deliver critical operations when disruptions occur, including under severe but plausible scenarios, without exceeding its tolerance for disruption.
- Resume normal operations promptly after disruptions occur.
- Learn from disruptions or near misses to continually improve.
What is a critical operation, and what is its tolerance for disruption?
A critical operation refers to the activities, processes and services performed by the AI, as well as the supporting assets (including people, technology, information and facilities) necessary to deliver them, which, if disrupted, could pose material risks to the viability of the AI itself or impact the AI's role within Hong Kong’s financial system.
A tolerance for disruption is the maximum level of disruption to a critical operation that an AI can accept. In practice, it is the point after which further disruption would threaten the viability of the AI or impact its role within the Hong Kong financial system. Severe but plausible scenarios are situations that would result in significant disruption and that, while unlikely to occur, remain possible.
Components of Operational Resilience Framework
It is important to note that developing operational resilience is an ongoing process. The process will not always be linear. An AI should actively apply what it learns from its implementation of the framework and the management of actual incidents to continuously improve the effectiveness of the framework.
The following components constitute an operational resilience framework:
- Active participation of Board and senior management
- Identify critical operations
- Identify and set tolerances for disruption of critical operations
- Identify severe but plausible scenarios which could cause disruption to critical operations
- Map the interdependencies and interconnections needed to deliver the critical operations
- Prepare for and manage the risks to the delivery of critical operations
- Scenario testing to assess whether an AI is able to deliver critical operations through disruption, including under severe but plausible scenarios
- Incident management programme to effectively respond to and manage disruptions to critical operations delivery
- Active participation of Board and senior management
The board should take an active role in establishing a broad understanding of the AI's operational resilience framework. It should clearly communicate the objectives of the framework to all stakeholders, including staff, intra-group entities and third parties. Regular training on the AI’s operational resilience framework should be provided to these parties to reinforce their understanding.
- Identify Critical Operations
Criteria to consider when identifying critical operations include:
- Impact on customers and personnel
- Impact on financial reputation
- Legal and regulatory implications
- Role played by the AI in the financial system
- Identify and set tolerance levels
A tolerance for disruption should be set for every critical operation. It must include at least one time-based metric, and can also include a combination of other quantitative metrics (e.g. volume or value of transactions) and qualitative metrics (e.g. reputational or legal implications).
AIs must be aware that their operational capabilities may vary across business cycles or as a result of seasonal factors, and should take this variation into account when setting tolerances.
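To make the "at least one time-based metric" requirement concrete, here is a minimal Python sketch of how a tolerance could be recorded and checked. The class name `Tolerance`, its fields, and the 4-hour and 1,000-transaction figures are hypothetical illustrations, not values or structures taken from the HKMA module.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Tolerance:
    """Hypothetical record of a tolerance for disruption for one critical operation."""
    operation: str
    max_downtime: timedelta                         # mandatory time-based metric
    max_failed_transactions: Optional[int] = None   # optional quantitative metric

    def __post_init__(self) -> None:
        # The time-based metric is a required field and must be a positive duration.
        if self.max_downtime <= timedelta(0):
            raise ValueError(f"{self.operation}: max_downtime must be positive")

    def is_breached(self, downtime: timedelta, failed_transactions: int = 0) -> bool:
        """Return True if either the time-based or the volume-based limit is exceeded."""
        if downtime > self.max_downtime:
            return True
        return (
            self.max_failed_transactions is not None
            and failed_transactions > self.max_failed_transactions
        )


# Illustrative figures only (assumptions, not regulatory values).
payments = Tolerance("retail payments", timedelta(hours=4), max_failed_transactions=1000)
print(payments.is_breached(timedelta(hours=2), failed_transactions=200))  # False
print(payments.is_breached(timedelta(hours=5)))                           # True
```

Qualitative metrics (reputational or legal implications) do not reduce to a comparison like this and would need to be assessed by people; the sketch only covers the quantitative part.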
- Identify severe but plausible scenarios
AIs should identify a range of scenarios of different natures, severity and duration relevant to their business and risk profile. Examples of scenarios AIs may consider include, but are not limited to, pandemics, natural disasters, and failures or disruptions at a third party or within the third party’s supply chain.
When identifying the scenarios, AIs should refer to previous incidents or near misses within the institution or across financial sectors, as well as in other sectors or jurisdictions, or any situations that could result in significant disruptions given the changing operational landscape.
- Mapping interconnectedness and interdependencies
(a) The appropriate functions within an AI should identify and document:
- The people, processes, technology, information and facilities; and
- The interconnections and interdependencies among these factors that are necessary for the AI to deliver its critical operations.
When considering these, an AI should also include those interconnections and interdependencies that depend on third parties and intragroup arrangements.
(b) The approach and level of granularity of mapping should be sufficient to enable the AI to identify vulnerabilities and facilitate the testing of the AI’s ability to deliver critical operations through disruptions.
(c) The mapping documentation should be prepared so that all relevant parties can use it in the event of disruptions.
(d) AIs are expected to update their mapping documentation regularly: at least annually, and following any material changes to their operations.
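As a rough illustration of how a mapping can expose vulnerabilities, the Python sketch below inverts a dependency map to find resources that several critical operations share. The operations, resource names and the `DEPENDENCIES` structure are all hypothetical, and the map is deliberately flat (it ignores dependencies of dependencies).

```python
from collections import defaultdict

# Hypothetical mapping: critical operation -> resources needed to deliver it.
# Names are illustrative; "cloud provider X" stands in for a third party.
DEPENDENCIES = {
    "retail payments": ["core banking system", "payments team", "cloud provider X"],
    "deposit taking": ["core banking system", "branch network"],
    "trade settlement": ["settlement platform", "cloud provider X"],
}


def operations_by_resource(dependencies):
    """Invert the mapping: resource -> set of critical operations relying on it."""
    index = defaultdict(set)
    for operation, resources in dependencies.items():
        for resource in resources:
            index[resource].add(operation)
    return index


def shared_resources(dependencies):
    """Resources supporting more than one critical operation (concentration points)."""
    index = operations_by_resource(dependencies)
    return {res: sorted(ops) for res, ops in index.items() if len(ops) > 1}


for resource, operations in shared_resources(DEPENDENCIES).items():
    print(f"{resource}: {', '.join(operations)}")
```

A resource that appears against several critical operations is a natural candidate for closer scrutiny and for inclusion in scenario testing.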
- Preparation for and managing risks
(a) AIs should be prepared to manage all risks with potential to affect critical operations delivery. As a given critical operation may face a number of risks, AIs should leverage different risk management frameworks, as appropriate, to offer holistic and comprehensive support to the critical operation.
(b) The HKMA expects AIs to take into consideration, at a minimum, the following risk management components:
- Operational Risk Management
- Business Continuity planning and testing
- Third Party dependency management
- An ICT (Information & Communication Technology) policy including cyber security
- Scenario Testing
(a) AIs should conduct regular testing of their operational resilience framework to ensure that they are able to continue delivering their critical operations through disruptions, including under severe but plausible scenarios.
(b) When considering the testing requirement, AIs should take into account the following:
- The testing exercises should include realistic assumptions, and should encompass the AI’s interconnections and interdependencies, including those through relationships with intragroup entities and third parties.
- The frequency of testing should be determined based on a variety of factors, including the potential impact of a disruption, how many critical operations an AI has, and whether the operating environment has materially changed.
- Different types of testing (e.g. paper-based, simulations or live-systems testing) serve different purposes and AIs should deploy the most appropriate type of testing based on the nature or needs of the specific testing exercise.
- AIs should deploy staff with appropriate expertise to conduct the testing. The testing approach should dictate the type of staff involved, including their seniority, qualifications and function.
- AIs should consider how they may leverage the testing exercises to enhance their staff’s operational resilience awareness and readiness to operate during disruptions, thereby improving their ability to effectively adapt and respond to different types of disruptive events.
(c) Where practicable, AIs may leverage existing testing arrangements, including those devised for business continuity planning purposes, to fulfil the testing requirement relating to operational resilience. An AI should be able to demonstrate how an existing testing exercise enables it to achieve the specific objectives of scenario testing for operational resilience purposes.
(d) After each testing exercise, an AI should prepare a formal testing report to record any gaps or weaknesses identified, as well as document the remedial actions planned. The reports should be reviewed by the AI’s senior management.
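The idea of a paper-based scenario test can be sketched in a few lines of Python: apply assumed outages to the resources a critical operation depends on, then compare the resulting downtime with its tolerance. Everything here is a simplifying assumption for illustration, including the data, the function names `run_scenario` and `gaps`, and the rule that an operation is down for as long as its longest-affected dependency (no workarounds or substitution).

```python
from datetime import timedelta

# Hypothetical inputs: dependencies and time-based tolerances per critical operation.
DEPENDENCIES = {
    "retail payments": ["core banking system", "cloud provider X"],
    "deposit taking": ["core banking system", "branch network"],
}
TOLERANCES = {
    "retail payments": timedelta(hours=4),
    "deposit taking": timedelta(hours=24),
}


def run_scenario(outages, dependencies, tolerances):
    """Estimate downtime per operation and whether it stays within tolerance.

    Assumption: an operation is unavailable for as long as its
    longest-affected dependency is unavailable.
    """
    results = {}
    for operation, resources in dependencies.items():
        downtime = max(
            (outages.get(resource, timedelta(0)) for resource in resources),
            default=timedelta(0),
        )
        results[operation] = {
            "downtime": downtime,
            "within_tolerance": downtime <= tolerances[operation],
        }
    return results


def gaps(results):
    """Operations that breached tolerance, as candidate entries for the testing report."""
    return sorted(op for op, r in results.items() if not r["within_tolerance"])


# Scenario (illustrative): a 6-hour failure at a third-party cloud provider.
scenario = {"cloud provider X": timedelta(hours=6)}
results = run_scenario(scenario, DEPENDENCIES, TOLERANCES)
print(gaps(results))  # ['retail payments']
```

A real exercise would use richer assumptions and human judgement; the point is only that each identified gap maps to an item in the formal testing report, together with its planned remedial action.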
- Incident Management Programme
The incident management programme should capture the full life-cycle of any incidents and involve:
- Classification of the severity of an incident according to predefined criteria. This should enable the AI to prioritise and allocate resources to respond to an incident.
- Incident response and recovery procedures. These should be reviewed, tested and updated on a regular basis. Their connection to the AI’s business continuity, disaster recovery and other associated management plans and procedures should also be clearly documented.
- Communication plans for reporting incidents to both internal and external stakeholders. Communication should take place during the incident (e.g. to provide performance metrics), and after, including to convey analysis of lessons learned.
- Root cause analysis of incidents to help with the prevention or minimisation of recurrence.
The incident management programme should be supported by an inventory of internal and third party resources to enable prompt incident response and recovery.
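Severity classification against predefined criteria can be illustrated with a small Python sketch. The criteria below (whether a critical operation is affected, the share of tolerance already consumed, and a 1,000-customer threshold) are hypothetical; each AI would define its own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    description: str
    affects_critical_operation: bool
    customers_affected: int
    tolerance_used: float  # elapsed downtime as a fraction of the tolerance (0.0 to 1.0+)


def classify(incident: Incident) -> Severity:
    """Apply hypothetical predefined criteria; the thresholds are illustrative only."""
    if incident.affects_critical_operation and incident.tolerance_used >= 0.5:
        return Severity.HIGH
    if incident.affects_critical_operation or incident.customers_affected >= 1000:
        return Severity.MEDIUM
    return Severity.LOW


def prioritise(incidents):
    """Order incidents so that the most severe are handled first."""
    return sorted(incidents, key=classify, reverse=True)


incidents = [
    Incident("Intranet wiki unavailable", False, 0, 0.0),
    Incident("Payments gateway degraded", True, 5000, 0.6),
    Incident("Mobile app login errors", False, 2500, 0.0),
]
for incident in prioritise(incidents):
    print(classify(incident).name, "-", incident.description)
```

Keeping the criteria in one function makes them easy to review and update alongside the response and recovery procedures.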
Implementation of Operational Resilience requirements
Application
The requirements contained in this module apply to all AIs. Locally incorporated AIs should endeavour to implement the guidance in this module with respect to their subsidiaries and overseas operations, and overseas-incorporated AIs with respect to their operations in Hong Kong.
Timeline
Within one year of the date on which the final module is issued, the HKMA expects an AI to have:
(a) Developed its operational resilience framework; and
(b) Determined the timeline by which it will implement the operational resilience framework and become operationally resilient.
The HKMA has decided to allow AIs up to 2 years to become operationally resilient. After this point in time, an AI will be expected to have fully implemented its operational resilience framework, including having conducted scenario testing, and to be able to satisfy the requirements.