Requirements for Business Contingency and Continuity Plans
By: Amy Wees
April 21, 2013
Abstract: Technology plays a vital role in business and threats to technology are constantly evolving. Businesses must be ready to react to a multitude of situations from a computer virus to a hurricane. The only way to react successfully is to have a well-written, well-tested contingency and continuity plan. The steps to planning include identifying threats through Business Impact Analysis (BIA), planning for mitigation of risks or reduction of impact to the business through contingency plan development, and setting up recovery options such as backup sites. Finally, the plan must remain actionable and up-to-date, and the best way to ensure this is through training personnel and testing the plan on a regular basis.
On April 17, 2013, a giant explosion ripped through the small town of West, Texas, after the West Fertilizer Company plant caught fire. The cause of the fire is still unknown, but many people were killed in an attempt to extinguish the massive blaze, air traffic over the area was halted due to the dangerous chemicals released, and structures for miles around the plant were damaged and evacuated (Eilperin & Fears, 2013). Many are probably wondering how this happened and whether the explosion could have been prevented. The Environmental Protection Agency (EPA) reported that the fertilizer plant was fined in 2006 for a deficient risk management plan that failed to address safety hazards, employee training, and maintenance procedures. Furthermore, the owner does not know how he will recover from this disaster (Eilperin & Fears, 2013). Even if West Fertilizer has insurance to cover the damage to the building and company assets, the costs during disaster recovery could be far more than West can afford. Insurance may not cover the medical expenses and deaths of the citizens harmed by the explosion. How will displaced employees be paid? Will there be lawsuits? Was pertinent company data needed to continue operations or file damage claims lost in the fire?
Although the fire may not have been preventable, a contingency and continuity plan would help West Fertilizer Company pick up the pieces and continue operations. West Fertilizer is not alone in its lack of business continuity and disaster recovery planning. A survey conducted by OpenSky Research in 2006 showed that almost half the businesses in America had no business continuity plan in place. Of the companies that did have plans, the survey reported that the greatest motivation was the reputation of the business and customer satisfaction, followed by compliance with regulations and past experiences with operational hiccups. Businesses reported that network operations, malware, and data corruption were considered highly threatening, along with disasters such as fires and blackouts. Businesses without a plan cited budgetary and resource constraints as the primary reasons (On Windows, 2006).
Businesses should be concerned with contingency and continuity planning because it is only a matter of when, not if, something happens that can shut the business down. Today more than ever, businesses depend on technology such as computers, networks, mobile devices, and the Internet to operate. Protecting these assets from cyber security threats and service disruptions is paramount to the bottom line and customer satisfaction. However, to convince management that business continuity planning is a worthwhile investment, planners must show the return on that investment and design a plan that weighs the benefits of cyber security, maintenance, and safety protocols against the costs of implementing them. The argument for a plan must present a Return on Investment (ROI) so that forecasted returns on money spent can be estimated. In calculating an ROI, the purchase price of the proposed solutions, the cost of employee training, and the cost of paying the staff who will manage the solutions should be included. Together these costs make up the Total Cost of Ownership (TCO) of the investment. If costs are not projected accurately, management may reject the proposal or restrict the budget (UMUC, 2011).
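As a rough illustration of the calculation described above, the following Python sketch adds up the three TCO components named in the text and compares them against the losses the plan is expected to avoid. The function names and all dollar figures are hypothetical, and the formula ROI = (benefit - cost) / cost is the conventional definition, assumed here rather than taken from the cited sources.

```python
def total_cost_of_ownership(purchase, training, staffing_per_year, years):
    """TCO = purchase price + employee training + staff cost over the horizon."""
    return purchase + training + staffing_per_year * years


def return_on_investment(expected_loss_avoided, tco):
    """ROI as a fraction of cost: (benefit - cost) / cost."""
    if tco <= 0:
        raise ValueError("TCO must be positive")
    return (expected_loss_avoided - tco) / tco


# Hypothetical figures, for illustration only.
tco = total_cost_of_ownership(purchase=40_000, training=5_000,
                              staffing_per_year=15_000, years=3)
roi = return_on_investment(expected_loss_avoided=135_000, tco=tco)
print(f"TCO: ${tco:,}  ROI: {roi:.0%}")
```

A positive result suggests the projected savings exceed the full cost of the solution; leaving out training or staffing would understate the TCO and overstate the ROI.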
This paper will cover the steps for identifying threats and risks to a business, creating and maintaining business contingency and continuity plans, options for recovery of data and business operations, and recommendations for putting the plan into practice by conducting business continuity testing over a twenty-four-month testing cycle.
Developing Business Contingency Plans
According to the National Institute of Standards and Technology’s (NIST) contingency planning guide for federal information systems, there are seven key steps to developing a plan: 1) Construct the contingency planning policy; 2) Complete a business impact analysis (BIA); 3) Pinpoint preventive measures; 4) Produce contingency approaches; 5) Create an information system contingency plan; 6) Conduct testing, training, and exercises; and 7) Ensure the plan is maintained (Swanson, Bowen, Phillips, Gallup & Lynes, 2010). Although these steps are written specifically for federal systems, they can be used by any business as an overall framework to develop a contingency and continuity plan. For the purpose of this paper, the seven steps are simplified to three broader areas: 1) Identify threats to the business; 2) Create a plan to alleviate or lessen the impact of the threats; 3) Train personnel and test the plan to ensure accuracy (Cerullo & Cerullo, 2004). Authors should keep in mind during plan development that all steps should be documented, actionable, and most importantly, kept up to date (Balaouras, 2009).
Identify Threats to the Business
The first thing a company must consider when creating contingency and continuity plans is the set of potential threats to the business. Some threats will differ depending on the type of business. For example, an Internet-based company may be more concerned with cyber threats such as malware and viruses than a small retail store with little to no web presence. The retail store, on the other hand, may be more concerned with protecting databases containing customer credit card information. A defense contractor may see a competitor accessing its intellectual property as the largest threat to the business. There are also threats that affect every business, such as natural disasters, electrical outages, and fires, which must be taken into consideration.
Business Impact Analysis
No business is exempt from harm or disruption; however, threats may not always be easy to identify or quantify. For this reason, a Business Impact Analysis (BIA) can assist in identifying the primary areas affected by a disaster or contingency. A BIA will distinguish the services and functions most critical to the business’ bottom line, and classify those services and functions according to their effect on the business, level of risk, and likelihood of occurrence. For each risk, the BIA recommends whether to avoid, mitigate, or absorb it, and how to do so. Management may also choose to delve further into the identified risks by conducting risk assessments (Cerullo & Cerullo, 2004).
The first step when conducting a BIA is to identify the primary business processes and supporting systems and the criticality of recovering each of them. The impacts of a system outage are then determined, including the maximum downtime that can be tolerated while still allowing the business to maintain operations. Possible work-around options should also be listed. Management and process owners should work together to create a comprehensive list of processes, process descriptions, and systems directly related to these processes (Swanson, Bowen, Phillips, Gallup & Lynes, 2010).
The next step in the BIA is to identify resources required to continue primary processes and any interrelated or dependent systems/assets. Considerations for a thorough resource listing are facilities, staff, hardware, software, electronic files, system elements, and critical records (Swanson, Bowen, Phillips, Gallup & Lynes, 2010). Some companies may have a configuration manager or other information systems manager who maintains this information. Because technology changes constantly, this list must be updated on a regular basis. An example asset listing follows:
Table 1: Company ABC Critical Resources
| System | Platform/Version | Primary User | Critical Process | Dependencies |
|---|---|---|---|---|
| Exchange Server | Windows Server 2008 | All users internal and external | Ensures mail sent/received | Domain Controllers, Active Directory Servers |
The final step in the BIA is to set priorities for recovery of the various systems linked to the critical processes identified in step one. Systems should be recovered in order of their criticality to the business, taking into account the alternate options available (Swanson, Bowen, Phillips, Gallup & Lynes, 2010). For example, if the previously mentioned small retail business loses its point of sale (POS) system, cashiers may be able to add up the cost of various items and collect cash from customers for a short period of time, but there will be a maximum amount of time before the business starts to lose customers. Therefore, the POS may be the most critical asset to recover on its list. Secondary to the POS may be the inventory system. Many retailers depend on an automated inventory system to track incoming deliveries and sales, order new supplies, and pay suppliers for items received. These systems are immensely complex, and keeping track of inventory on paper and later having to update the recovered system could be costly in man-hours and mistakes. Third for the retailer may be the store security system. Although employees could be posted at the door to check receipts against purchases, the amount of theft may increase, and the store could lose valuable evidence related to a crime or incident that occurs.
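The prioritization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CriticalSystem` class, the `recovery_order` function, and the tolerable-downtime hours are hypothetical, and ranking purely by maximum tolerable downtime is one simple way to express "order of criticality", not a rule taken from the NIST guide.

```python
from dataclasses import dataclass


@dataclass
class CriticalSystem:
    name: str
    max_tolerable_downtime_hours: float  # identified in BIA step one (hypothetical values)
    workaround: str                      # manual alternative while the system is down


def recovery_order(systems):
    """Recover the systems the business can least afford to lose first."""
    return sorted(systems, key=lambda s: s.max_tolerable_downtime_hours)


systems = [
    CriticalSystem("Store security system", 72, "Employees check receipts at the door"),
    CriticalSystem("Point of sale (POS)", 4, "Cashiers total items by hand, cash only"),
    CriticalSystem("Inventory system", 24, "Track inventory on paper"),
]

for rank, system in enumerate(recovery_order(systems), start=1):
    print(f"{rank}. {system.name} (tolerates {system.max_tolerable_downtime_hours} h; "
          f"workaround: {system.workaround})")
```

With these assumed values the output matches the retail example: POS first, inventory second, security third.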
Create a Plan to Alleviate or Lessen the Impact of the Threats
Now that the BIA is complete, the business can work on a plan to mitigate the identified risks. According to Swanson, Bowen, Phillips, Gallup and Lynes (2010), a contingency plan has three phases and includes supporting documentation such as the BIA, personnel contact information, and write-ups of procedures. The three phases are: Activation and Notification; Recovery; and Reconstitution.
Activation and Notification Phase
When a contingency or event occurs that affects a crucial business process, the first step is to put the plan into action and notify the personnel responsible for and affected by the response. This means the plan must identify primary and alternate team members’ roles and responsibilities. Procedures should include instructions for notifying staff and customers, including contact information and primary duties of personnel internal and external to the organization, locations of alternate work sites, and checklists to follow in order to complete alternate processes while primary means are restored (Cerullo & Cerullo, 2004). Procedures should be easy to follow and not overly complicated.
Recovery Phase
After personnel are deployed and active in alternate processes to keep the business afloat, it is time to start recovery of the assets affected by the contingency, in the order of priority previously identified during the BIA. The recovery phase will take up the greatest portion of the contingency plan as there are many options to consider, and the costs are high. At a minimum, system back-ups should be created on a regular basis and stored at an off-site location or in a cloud environment to minimize system recovery time and allow for reconstitution from another location. Procedures for system back-up and recovery should be included in the business continuity plan (UMUC, 2011). The backup should cover the entire environment, including software, executables, databases, training information, and all systems needed to run the operation, because the ability to get back to business depends on the quality of the backups (Barry, 2012).
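One simple way to check that a backup covers the whole environment is to compare the list of required items against what the backup actually contains. The sketch below assumes a hypothetical manifest represented as a set of category names; a real backup tool would report its contents in its own format.

```python
# Categories the text says the backup should cover.
REQUIRED_ITEMS = {"software", "executables", "databases", "training information"}


def missing_from_backup(required, manifest):
    """Return the required items that do not appear in the backup manifest."""
    return sorted(set(required) - set(manifest))


# Hypothetical manifest from the most recent off-site backup.
latest_manifest = {"software", "databases"}

gaps = missing_from_backup(REQUIRED_ITEMS, latest_manifest)
if gaps:
    print("Backup is incomplete, missing:", ", ".join(gaps))
else:
    print("Backup covers every required item")
```

Running a check like this after each backup cycle would surface gaps before a contingency does.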
According to a 2002 report by the Disaster Recovery Institute International, costs of downtime ranged from three to seven percent of the information systems budget. Some examples of downtime costs for company websites cited by Cerullo and Cerullo (2004) were $8,000 an hour for leading Internet players, $1,400 per minute on average, and a medium-sized business downtime cost of $78,000 per hour on average with an annual cost of over $1 million due to downtime. Although these costs were estimated for businesses which depend heavily on the Internet, it is pertinent for any business to consider the cost of downtime when looking at options for timely recovery of assets.
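To put these figures on a common footing, the short Python sketch below converts the per-minute average to an hourly rate and works out how many hours of downtime at $78,000 per hour would add up to the $1 million annual figure. The rates come from the paragraph above; the function name and the outage length in the last line are hypothetical.

```python
def downtime_cost(rate_per_hour, hours):
    """Cost of an outage at a flat hourly rate."""
    return rate_per_hour * hours


# Rates quoted in the text (Cerullo & Cerullo, 2004).
average_rate_per_minute = 1_400
medium_business_rate_per_hour = 78_000

hourly_equivalent = average_rate_per_minute * 60
hours_to_reach_one_million = 1_000_000 / medium_business_rate_per_hour

print(f"$1,400 per minute is ${hourly_equivalent:,} per hour")
print(f"$1 million is reached after about {hours_to_reach_one_million:.1f} hours of downtime")

# Hypothetical example: a six-hour outage at the medium-sized business rate.
print(f"Six-hour outage: ${downtime_cost(medium_business_rate_per_hour, 6):,}")
```

At the medium-sized business rate, roughly thirteen hours of downtime spread over a year is enough to pass $1 million.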
There are three options for recovery sites: hot, cold, or warm. Businesses should consult with service providers and software vendors when making a decision about what type of site to use, or whether to outsource this service. A hot site allows for immediate recovery as it should contain all hardware and software necessary for operations and can be loaded with current operational and back-up data (Barry, 2012). The hot site can also serve as the location to store off-site back-ups. The greatest consideration for a hot site is the considerable cost of creating and maintaining such a site. A business should consider a hot site when the cost of the loss of systems is greater than the cost of the site (i.e., there is an ROI) and other site options such as cold or warm do not meet the need.
A cold site provides only a facility to operate from, without the hardware infrastructure of a hot site. While the cost of a cold site is lower, hardware will need to be acquired along with backups to return to regular operations. Even with robust planning and well-trained personnel, a cold site could take weeks or longer for recovery.
A warm site is the happy medium between hot and cold. Warm sites contain some hardware and can contain backup and recovery data, depending on the setup. Unlike a hot site, a warm site does not have the latest configurations loaded, but it requires less work to recover than a cold site. Outsourcing is also an option as there are multiple companies that offer a wide range of services (Barry, 2012). Anytime outsourcing is considered, Service Level Agreements (SLA) should be made, adhered to, and updated on a regular basis to cover the changing requirements of the business and the responsibilities of the service provider. As previously mentioned, the quantity and type of systems needing backups, the location of backups, and the steps to recovery depending on the contingency should be thoroughly documented in the contingency plan.
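The trade-off among hot, warm, and cold sites can be sketched as a small comparison. Everything numeric here is hypothetical: the site costs, the recovery times, and the assumption of one outage per year are placeholders a business would replace with figures from its own BIA and vendor quotes. The selection rule (the cheapest option, counting both site cost and downtime losses, among those that recover within the tolerated window) is one reasonable reading of the ROI guidance above, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class SiteOption:
    kind: str
    annual_cost: float     # hypothetical yearly cost of keeping the site
    recovery_hours: float  # hypothetical time needed to resume operations


def expected_annual_cost(site, downtime_cost_per_hour, outages_per_year=1):
    """Site cost plus the downtime losses incurred while recovering."""
    return site.annual_cost + site.recovery_hours * downtime_cost_per_hour * outages_per_year


def choose_site(options, max_tolerable_downtime_hours, downtime_cost_per_hour):
    """Cheapest option overall among those that recover within the tolerated window."""
    viable = [o for o in options if o.recovery_hours <= max_tolerable_downtime_hours]
    if not viable:
        return None
    return min(viable, key=lambda o: expected_annual_cost(o, downtime_cost_per_hour))


options = [
    SiteOption("hot", annual_cost=200_000, recovery_hours=1),
    SiteOption("warm", annual_cost=60_000, recovery_hours=48),
    SiteOption("cold", annual_cost=15_000, recovery_hours=336),  # roughly two weeks
]

choice = choose_site(options, max_tolerable_downtime_hours=72, downtime_cost_per_hour=1_000)
print(choice.kind if choice else "no option meets the recovery window")
```

Under these assumptions a business losing $1,000 per hour would pick the warm site, while raising the hourly loss to $78,000 tips the same comparison toward the hot site.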
Reconstitution Phase
During the reconstitution phase, the system should be validated to confirm that it has the capability and functionality needed for the business to return to normal operations. If the original facility is beyond repair, the reconstitution activities can also be helpful in testing and prepping a new location for future use. At this point, deactivation of the plan can occur, and lessons learned can be documented along with updates to the plan.
Train Personnel and Test the Plan
The final step in contingency planning is to train personnel to carry out the plan and test the plan for accuracy. Perhaps the toughest part of contingency planning is not creating an actionable plan, but finding time during normal operations to test it. This is where the buy-in of management is so critical. If management does not push the importance of testing, employees will not feel they are stakeholders in the plan or that it is worth their time to test or train for. There are several options for training personnel and testing the plan, from hosting plan reviews or tabletop exercises all the way to complete backup and recovery testing cycles, or a combination thereof. Individual checklists included in the contingency plan could also be given out to key personnel or individual work centers to run through during duty hours and check for accuracy and updates. It can be difficult for system administrators to test system checklists as live systems are critical to operations and cannot be taken down for such purposes. This is where virtual machines can be helpful: copies of servers can be created from virtual templates using very little system resources, allowing for testing and training on systems with no effect on current operations.
Costs for training personnel and testing the plan should be considered and included in the contingency planning and continuity of operations budget. Potential costs include training and testing man-hours not billable to direct operating costs, purchases of additional technology (such as virtual machines and servers) utilized for testing, and cost of other additional resources necessary for testing and training such as office supplies, use of external facilities, or outsourced vendor training.
24-Month Cycle Business Continuity Testing Plan
Below is a sample testing plan based on a 24-month cycle.
Months 1-2: Plan Accuracy
Plan appendixes are distributed to key personnel in work centers where they will run through their checklists and action items and check for accuracy. Key personnel will train alternates on procedures. Alternates will run checklists to ensure they are repeatable.
Months 3-4: Notification Procedures
Management will choose a tabletop scenario based on the probability of various threats identified in the BIA. Work centers will practice notification procedures by running through call lists based on the scenario. On-duty and off-duty emergency contact information will be tested and updated as necessary.
Months 5-6: Activation Procedures
Management will choose another scenario based on the BIA and make note of systems affected. Key personnel will be notified to test their activation procedures based on that scenario. Operations personnel will conduct business processes using alternate procedures, systems administrators will recover backups to alternate hardware (or virtual machines), and operations personnel will then attempt processes on the recovered systems. This practice will identify gaps in the checklists, data that was not backed up or could not be recovered, and system configurations that are necessary after recovery. Checklists and procedures will be updated based on this exercise.
Months 7-8: Reconstitution Testing
Reconstitution is the process of ensuring that a system is fully operational and configured for use. In order to validate a system, users must identify the data needed on the system and procedures for working with that data. This is not covered in the BIA but should be covered in a continuity book for the duty position. Continuity books are created to ensure that someone with limited knowledge of a position can perform basic tasks when key personnel are not available. During this testing phase, personnel will be given an alternate duty position for a specified period of time and attempt to perform routine tasks using the continuity book as their guide. Often in an emergency situation, the person who knows an essential business process best may not be available, and it will be paramount for other personnel to be able to fill in where necessary.
Months 9-10: Updating Continuity Procedures
Based on the last test of continuity books, personnel will utilize months 9-10 to update their continuity documentation and prepare for a disaster preparedness drill in months 11-12.
Months 11-12: Contingency Recovery Drill
In this test, all phases will be tested. Management will choose a scenario from the BIA that would require a move to an alternate facility, in this case a hot site, where employees will ultimately reconstitute operations. First, notification procedures will be tested; employees will be informed ahead of time that this is a test of the system. External agencies and customers will also be notified ahead of time that the business is running this test so as not to affect operations. Employees will start their checklists using alternate procedures for regular operations depending on the scenario until information technology (IT) personnel notify them to move to the hot site. Employees will then move to the hot site and continue operations, identify shortfalls, and update the plan based on the lessons learned during this testing. This type of drill is not recommended for businesses without a hot site as there would be too much risk to operations. However, a tabletop contingency drill modeled on this exercise would still be helpful for testing employees’ awareness of what to do in various scenarios.
Months 13-24: Repeat months 1-12
During the second year, the business will repeat the testing done in the first year and adjust timelines and procedures as necessary to fine-tune the process. Different scenarios can be given, or the same scenarios if management feels employees need more practice. Repetition allows employees to gain confidence in plan execution and creates a mindset of contingency planning as part of day-to-day operations.
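The sample cycle above can be laid out programmatically, which may help when turning it into calendar entries. The sketch below uses the six phase names from this section; the `build_schedule` function and its parameters are hypothetical conveniences, not part of any cited framework.

```python
# Phase names taken from the sample plan above, in order.
PHASES = [
    "Plan Accuracy",
    "Notification Procedures",
    "Activation Procedures",
    "Reconstitution Testing",
    "Updating Continuity Procedures",
    "Contingency Recovery Drill",
]


def build_schedule(total_months=24, months_per_phase=2):
    """Return (start_month, end_month, phase) tuples, repeating phases each year."""
    schedule = []
    for start in range(1, total_months + 1, months_per_phase):
        block_index = (start - 1) // months_per_phase
        phase = PHASES[block_index % len(PHASES)]
        schedule.append((start, start + months_per_phase - 1, phase))
    return schedule


for start, end, phase in build_schedule():
    print(f"Months {start}-{end}: {phase}")
```

Months 13-24 come out as a repeat of months 1-12, matching the second-year repetition described above.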
Conclusion
Technology plays a vital role in business and threats to technology are constantly evolving. Businesses must be ready to react to a multitude of situations from a computer virus to a hurricane. The only way to react successfully is to have a well-written, well-tested contingency and continuity plan. The steps to planning include identifying threats through BIA, planning for mitigation of risks or reduction of impact to the business through contingency plan development, and setting up recovery options such as backup sites. Finally, the plan must remain actionable and up-to-date, and the best way to ensure this is through training personnel and testing the plan on a regular basis.
References
Baker, N. (2012). Enterprisewide Business Continuity. (Cover story). Internal Auditor, 69(3), 36-40.
Balaouras, S. (2009). Businesses take BC planning more seriously. For Security & Risk Professionals.
Barry, C. (2012). Backup plans. Multichannel Merchant, 8(5), 36-38.
Cerullo, V., & Cerullo, M. J. (2004). Business continuity planning: a comprehensive approach. Information Systems Management, 21(3), 70-78.
Eilperin, J., & Fears, D. (2013, April 18). Fertilizer facility explosion injures at least 160 in central Texas; 5 to 15 feared dead. The Washington Post. Retrieved from http://www.washingtonpost.com/world/national-security/fertilizer-plant-explosion-leaves-more-than-100-wounded-in-central-texas/2013/04/18/14fa7cb2-a7ef-11e2-a8e2-5b98cb59187f_story_2.html
Geer, D. (2012). Are You Really Ready for Disaster? Three exercises for testing your business continuity plans. CSO Magazine, 11(8), 16-18.
Karim, A. (2011). Business Disaster Preparedness: An Empirical Study for measuring the Factors of Business Continuity to face Business Disaster. International Journal of Business & Social Science, 2(18), 183-192.
Kirvan, P. (2009, July). Using a business impact analysis (BIA) template: A free BIA template and guide. TechTarget: SearchDisasterRecovery. Retrieved November 4, 2011, from http://searchdisasterrecovery.techtarget.com/feature/Using-a-business-impact-analysis-BIA-template-A-free-BIA-template-and-guide.
Lam, W. (2002). Ensuring business continuity. IT professional, 4(3), 19-25.
On Windows. (2006, March 23). Half of US businesses lack continuity plan. On Windows Magazine. Retrieved from http://www.onwindows.com/Articles/Half-of-US-businesses-lack-continuity-plan/2063/Default.aspx
Rawlings, P. (2013). SEC’s Aguilar Pushes Continuity Plan Testing. Compliance Reporter, 25.
Rucks, A., Ginter, P., Duncan, W., & Lesinger, C. (2011). A Continuity of Operations Planning Template: Translating Public Policy into an Effective Plan. Journal of Homeland Security and Emergency Management, 8(1).
Slater, D. (2012, December 13). Business continuity and disaster recovery planning: The basics. Retrieved from http://www.csoonline.com/article/204450/business-continuity-and-disaster-recovery-planning-the-basics?page=1
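Swanson, M., Bowen, P., Phillips, A., Gallup, D., & Lynes, D. (2010, November 11). Contingency planning guide for federal information systems (NIST Special Publication 800-34 Rev. 1). Retrieved from website: http://csrc.nist.gov/publications/nistpubs/800-34-rev1/sp800-34-rev1_errata-Nov11-2010.pdf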
Totty, P. (2009). Business Continuity: Test and Verify. Credit Union Magazine, 75(12), 46.
UMUC. (2011). Module 11: Service Restoration and Business Continuity. Retrieved from http://tychousa.umuc.edu/
Whitworth, P. M. (2006). Continuity of Operations Plans: Maintaining Essential Agency Functions When Disaster Strikes. Journal of Park & Recreation Administration, 24(4), 40-63.
Wold, G. H. (2006). Disaster recovery planning process. Disaster Recovery Journal, 5(1).