by David Vose
Schedule risk analysis offers many benefits. It is easy to perform and the extra knowledge required can be learned in just a few hours. This paper describes the techniques that are used and why they are important.
If you are interested in learning more, Vose Software offers a four-hour online schedule risk modeling training course. The cost is very modest and you will be taught all the techniques necessary to perform a good quality, easily understood and genuinely useful schedule risk model by a seasoned risk analyst. Better still, the only software tools you will need are Excel and the Standard edition of ModelRisk.
We constantly hear in the news about projects being delayed. I am sure you will be able to think of plenty of examples in your own experience of projects that have suffered unexpectedly large delays. It can be very costly if an important project takes longer than planned:
- As a contractor, there can be heavy penalties for exceeding a deadline;
- As an investor, a large delay before the system you invest in becomes operational can create a huge cashflow problem. Many otherwise promising companies go out of business because of cashflow problems due to delivery and payment delays;
- In some settings, even a small delay can be disastrous. For example, in our consulting work we have looked at a shopping center that might fail to be ready in time for the Christmas rush, a potential failure to refit an offshore rig within a weather window, a self-assessment tax system that may not be ready in time for the changeover date announced by the government, and the building of storage tanks that may not be ready for the opening of a new pipeline – delaying use of the entire system;
- Sometimes it is necessary to put more resources on a project at great cost, to rush the project with lots of resultant quality issues, and/or reduce a project’s scope to be able to meet a deadline.
With so much at stake, you might wonder why the time to delivery of mission-critical projects is so frequently under-estimated. The usual project planning approach is to start off with a rough sketch of the project plan based on the project’s work breakdown structure (a hierarchical list of everything that needs to be done). Project approval is given based on a review of the detailed cost estimate and perhaps a detailed schedule analysis. Both cost and delivery time estimates tend to be fixed values, i.e. the uncertainty has not been properly investigated yet. A software tool like Primavera allows one to build a highly detailed model of all the tasks that need to be completed and the movement of human and machinery resources that will be needed for each task, and to build out a plan of the order in which the tasks will be performed, leading to milestone and final delivery estimates. These models can be very useful for the project manager to keep track of who is doing what, and to prepare for upcoming tasks. The level of detail, and the feeling that one has thought about pretty much everything, can give a false confidence in the estimated delivery time. The main value of such models lies in the detailed management of a project that has been approved, but by then the deadlines have been set.
Unfortunately, it is often only once a project gets through several approval stages that people start paying close attention to the delivery date uncertainty, and by then there is a strong expectation (even a commitment) of when the project will be delivered. The very large and detailed project plan one might have created in a tool like Primavera is not the appropriate starting point for a risk analysis, because it requires the planner to get uncertainty estimates for too many tasks.
A better approach is to produce a high-level schedule risk analysis model at the early stages of the approval process for the project. A model of some 30-100 inter-connected tasks can be quite sufficient. Its sole purpose is to obtain a realistic risk-based estimate of the final delivery date and perhaps some intermediary milestones. The model can be built in Excel with a Monte Carlo add-in like ModelRisk very quickly and with only a basic knowledge of risk analysis modeling. Reducing a complex project down to so few tasks can seem like an excessive simplification, but the focus of this model is on the uncertainty of the timeline, not the detail of the actual activities. A ‘risked’ estimate of the duration of each task is used instead of the ‘best guess’, and potential risk events that could delay the project are included. The results are usually very surprising. Almost without exception, a schedule risk analysis model will demonstrate that a delivery date based on ‘most likely’ estimates of duration is extremely unlikely to be achieved. The rest of this short paper explains why.
Reason #1: The most likely estimate of the duration of a task is optimistic
One most commonly uses the most likely estimate for each task’s duration, for example: “we expect it will take 15 days to demolish the old building”. Maybe things will go really well and it will take a few days less, but if things go badly (asbestos was found, a pipe was broken, the steel was harder than expected, the machinery was broken, there was flooding, the foreman was sick – the list is endless) it could take many days more. In almost everything we do there is more opportunity for a task to take longer than expected than there is for it to take a shorter time. Represented as a probability distribution, this means that the duration has a longer tail to the right than to the left:
In Figure 1, the most likely value (also known as the mode) is located at the red line, which is where the probability curve is at its peak. This has a value of 15. To the right of that, where the green line lies, is the 50:50 mark, known as the median. This is the value for which there is a 50% probability of being below (and therefore above). The relative positions of the mode and median imply that there is a greater than 50% probability that the task will take longer than the best guess estimate of 15 days. To the right of the median is the average value (the mean), located by the blue line.
There are two useful lessons to get from this plot. First of all, the relative positions (mode, then median, then mean) of the three plotted points are almost always the same if the distribution has a single peak and a longer right tail. Since the distribution of most task durations will have a longer right tail, this demonstrates that using the ‘most likely’ value in a schedule plan will systematically underestimate a task’s duration.
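The ordering of mode, median and mean can be checked with a minimal sketch. The shape of Figure 1 is not reproduced here, so the sketch assumes a hypothetical Triangular(10, 15, 30) distribution, chosen only because it has a single peak at 15 and a longer right tail. The function name `triangular_summary` is illustrative and is not part of ModelRisk.

```python
import math

def triangular_summary(low, mode, high):
    """Return (mode, median, mean) of a Triangular(low, mode, high) distribution."""
    span = high - low
    mean = (low + mode + high) / 3.0
    # Probability mass lying to the left of the mode.
    p_left = (mode - low) / span
    if p_left >= 0.5:
        # Median sits on the rising side of the triangle.
        median = low + math.sqrt(0.5 * span * (mode - low))
    else:
        # Median sits on the falling side (the usual case for a long right tail).
        median = high - math.sqrt(0.5 * span * (high - mode))
    return mode, median, mean

# Hypothetical task duration in days: assumed values, not taken from Figure 1.
mode, median, mean = triangular_summary(10.0, 15.0, 30.0)
print(f"mode={mode:.2f} median={median:.2f} mean={mean:.2f}")
# prints: mode=15.00 median=17.75 mean=18.33
```

With these assumed parameters only 25% of the probability lies below the mode, so a plan built on the mode would be too short roughly three times out of four.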
Another probability rule, known as the Central Limit Theorem, says that the sum of a large number of independent random variables will tend to follow a Normal distribution with a mean equal to the sum of the means of the random variables. To illustrate the importance of this rule, consider the following set of tasks which have to be completed one after the other (Figure 2).
Using the same shape as Figure 1 to keep things simple, and changing the scale only, we can come up with a set of mode, median and mean values to use:
A simulation model run with ModelRisk gives the distribution for the total project duration shown in Figure 3. Using the most likely (mode) estimates, the total project duration would work out to be 174 weeks. Figure 3 shows that the probability of finishing the project within 174 weeks is exceedingly small, so this estimate is very optimistic indeed. Recognizing that the mode is quite optimistic, one might consider using the median (50:50) values, which would give a total project duration estimate of 197.6 weeks. There is still less than 30% probability of finishing the project before this time. People often mistakenly believe that adding together the medians will give a total project duration estimate at the 50:50 point.
The Central Limit Theorem rule described above says that the sum should follow a roughly Normal distribution with a mean of 208.8 weeks. A Normal distribution is symmetric about its mean, which implies that there is roughly a 50% probability of being below the 208.8 weeks estimate. Figure 3 shows that the median and mean of the project duration’s distribution almost exactly match at around 207 weeks, i.e. adding up the task mean values gives a total that has only about a 50% probability of being achieved.
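The same comparison can be sketched outside ModelRisk with a small Monte Carlo simulation in plain Python. The task list below is hypothetical: it is not the task set of Figure 2, and each task is assumed to follow a Triangular(10, 15, 30) shape rescaled to a different length. The totals it produces will therefore differ from the 174, 197.6 and 208.8 weeks quoted above, but the pattern should be similar.

```python
import math
import random

def triangular_median(low, mode, high):
    """Median of a Triangular(low, mode, high) distribution."""
    span = high - low
    if (mode - low) / span >= 0.5:
        return low + math.sqrt(0.5 * span * (mode - low))
    return high - math.sqrt(0.5 * span * (high - mode))

# Hypothetical serial tasks (low, mode, high) in weeks: one assumed shape, rescaled.
SCALES = [1.0, 2.0, 0.8, 1.5, 1.2, 0.6, 2.5, 1.0, 1.4, 0.9]
TASKS = [(10.0 * s, 15.0 * s, 30.0 * s) for s in SCALES]

def simulate_totals(tasks, n_iter=20000, seed=1):
    """Simulate total duration of tasks performed one after the other."""
    rng = random.Random(seed)
    # Note the stdlib argument order: triangular(low, high, mode).
    return [sum(rng.triangular(lo, hi, mo) for lo, mo, hi in tasks)
            for _ in range(n_iter)]

def prob_within(totals, deadline):
    """Fraction of simulated scenarios that finish by the deadline."""
    return sum(t <= deadline for t in totals) / len(totals)

totals = simulate_totals(TASKS)
sum_modes = sum(mo for _, mo, _ in TASKS)
sum_medians = sum(triangular_median(*task) for task in TASKS)
sum_means = sum(sum(task) / 3.0 for task in TASKS)

for label, deadline in [("modes", sum_modes),
                        ("medians", sum_medians),
                        ("means", sum_means)]:
    print(f"sum of {label}: {deadline:.1f} weeks, "
          f"P(finish in time) = {prob_within(totals, deadline):.3f}")
```

Under these assumptions the sum of the modes is met only rarely, the sum of the medians well under half the time, and only the sum of the means lands near the 50:50 point.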
Some attempts were made to recognize this problem. The Project Evaluation and Review Technique (PERT) involved getting three estimates (low L, mode M and high H) and then using a duration equal to:
(L + 4M + H)/6
In other words, the PERT estimate is an average of all three values with four times the weighting for the mode. The choice of the 4 value was somewhat subjective and studies showed it should be varied between different types of project. This goes some way to removing the bias discussed so far, but doesn’t help with Reason #2 below.
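As a small worked example of the formula above, the following sketch computes the PERT duration for a hypothetical task with L = 10, M = 15 and H = 30 days. The function name and the input values are illustrative assumptions.

```python
def pert_estimate(low, mode, high):
    """PERT duration estimate: (L + 4M + H) / 6."""
    if not low <= mode <= high:
        raise ValueError("expected low <= mode <= high")
    return (low + 4.0 * mode + high) / 6.0

# Hypothetical three-point estimate in days.
print(f"{pert_estimate(10, 15, 30):.2f}")  # prints: 16.67
```

Because the high value lies further from the mode than the low value does, the result sits above the most likely value of 15 days, which is how the formula offsets part of the optimism of the mode.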
Reason #2: Tasks done in parallel take longer than you think
Project plans are designed to minimize the amount of time necessary to complete the project. The plan will usually involve running tasks in parallel as much as possible to compress the delivery time. However, the more this is done, the more uncertain the delivery time becomes as illustrated in this example. Consider the segment of a project schedule shown in Figure 4.
Two tasks, A and B, need to be performed in parallel before Task C can start. The time at which C can start is calculated as startC = MAX(endA, endB). If Tasks A and B each have the same duration distribution as in Figure 1, this formula gives the start time distribution for Task C shown in Figure 5. The median and mean values are marked, together with the 15 week estimate we would have if a static model had been used with the most likely values for Tasks A and B. Individually, each of Tasks A and B has a 26% chance of taking no longer than the most likely value (=15), but when the tasks are run in parallel and their durations are independent, the probability that both finish within that time is much lower (26% * 26% ≈ 7%). So once again, using the most likely estimates for each task duration leads to a highly optimistic estimate of project duration.
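The MAX logic can be illustrated with a short simulation. This is a sketch under stated assumptions: both tasks are given a hypothetical Triangular(10, 15, 30) duration and are treated as independent. With that shape the chance of a single task finishing by its mode is 25% rather than the 26% of Figure 1, so the joint probability comes out near 6% instead of 7%.

```python
import random

# Hypothetical duration parameters, assumed for illustration only.
LOW, MODE, HIGH = 10.0, 15.0, 30.0

def simulate_start_c(n_iter=20000, seed=1):
    """Simulate the start of Task C as the later finish of parallel Tasks A and B."""
    rng = random.Random(seed)
    starts = []
    for _ in range(n_iter):
        end_a = rng.triangular(LOW, HIGH, MODE)  # stdlib order: low, high, mode
        end_b = rng.triangular(LOW, HIGH, MODE)
        starts.append(max(end_a, end_b))
    return starts

starts = simulate_start_c()
# Analytic probability that one task finishes by its most likely value.
p_single = (MODE - LOW) / (HIGH - LOW)
# Simulated probability that both tasks finish by that value.
p_both_sim = sum(s <= MODE for s in starts) / len(starts)

print(f"P(one task <= mode)   = {p_single:.3f}")
print(f"P(both tasks <= mode) = {p_both_sim:.3f} (product rule: {p_single ** 2:.4f})")
```

Adding a third or fourth parallel branch multiplies in further factors, which is why heavily compressed schedules lose so much of their chance of starting the next task on the planned date.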
This example clearly illustrates how schedules that have tight parallel tasks can be very vulnerable to overrun, and the extent of that overrun risk can only be properly appreciated if one performs a risk analysis. The more compressed a project plan becomes, the more parallel tasks there will be and the closer they will all lie to the critical path. This in turn makes the uncertainty about the delivery date progressively larger. The project sponsor needs to balance the risks of an ambitious schedule against the costs of an overrun.
Reason #3: Task uncertainties are correlated
There are often factors in common between several tasks that affect the duration of each task. For example, the same team may be working on several tasks, and if they are under-staffed, poorly managed, unmotivated or unskilled this will increase the duration of all the tasks they work on. The more critical the team’s contribution, the longer a task may take. In a risk analysis model, this type of effect can either be modeled explicitly with logic that describes the effect of the performance of the team, or more simply (and abstractly) using a copula (a mathematical tool for modeling correlation) to represent the correlation relationship.
Modeling correlation can be critical for a good quality schedule risk analysis. Failure to recognize and model correlation almost always leads to a significant underestimate of the uncertainty of a project’s delivery date.
A copula allows one to simulate task durations so that, for example, when one task takes a long time, a second task does so too (positive correlation). The scatter plots of Figure 6 show the effect of using a copula – each dot represents the duration of the two tasks that arise in a simulated scenario.
There is no relationship between the task durations when they are uncorrelated (left pane). An example of positive correlation (the durations go up and down together) is shown in the central pane. This example models a situation where there is very tight correlation at the minimum values of the Task durations, but a much looser correlation at the maximum values. The right pane shows an example of negative correlation (if one task takes a longer time to complete, the other task takes a shorter time) where the tightest correlation is at the maximum of Task 1 and the minimum of Task 2. Negative correlations occur far less often than positive correlations and tend to decrease the delivery time uncertainty.
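The effect of correlation on the spread of a total can be sketched with a Gaussian copula written in plain Python. This is a swapped-in, simpler copula than the ones shown in Figure 6 (it does not reproduce the tighter correlation at one end of the range), and it is not ModelRisk's implementation. The Triangular(10, 15, 30) duration shape and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, pstdev

def triangular_ppf(u, low, mode, high):
    """Inverse CDF of a Triangular(low, mode, high) distribution."""
    span = high - low
    if u <= (mode - low) / span:
        return low + math.sqrt(u * span * (mode - low))
    return high - math.sqrt((1.0 - u) * span * (high - mode))

def correlated_durations(rho, n_iter=20000, seed=1, params=(10.0, 15.0, 30.0)):
    """Simulate two task durations linked by a Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n_iter):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to correlated uniforms, then to the task duration distribution.
        u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
        pairs.append((triangular_ppf(u1, *params), triangular_ppf(u2, *params)))
    return pairs

def spread_of_sum(pairs):
    """Standard deviation of the combined duration of the two tasks."""
    return pstdev([a + b for a, b in pairs])

for rho in (-0.8, 0.0, 0.8):
    print(f"rho={rho:+.1f}: sd of total = {spread_of_sum(correlated_durations(rho)):.2f}")
```

Each task keeps exactly the same individual distribution in all three runs; only the dependence between them changes. Positive correlation widens the spread of the total and negative correlation narrows it.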
Modeling correlation is quite simple to do in ModelRisk, and the range of correlation structures (copulas) available in ModelRisk allows users to express precisely the type of relationship they envisage.
Reason #4: Estimates of task duration uncertainty are too narrow
It is a recognized psychological phenomenon that we humans tend to be over-confident about how much we know, or how well we can predict. At Vose Software, we perform a simple exercise in our risk analysis training classes to illustrate this. Each participant is asked to estimate eight known quantities (like the mass of a ping pong ball in grams, or the population of France – the questions are varied to suit the audience). They are instructed to give a three point estimate: low, best guess, and high, with the understanding that they should be around 90% confident that the true value lies somewhere between low and high. When we analyze the results we find that the true value lies between low and high for only 35%-40% of the responses. In other words, the participants did not provide anywhere near a sufficiently broad range.
Now, apply this to risk analysis modeling and we can have a problem. If the estimates of task duration uncertainty in a schedule risk analysis are systematically too narrow, we will end up with an unrealistically narrow uncertainty about the delivery date. That is perhaps worse than doing no risk analysis at all since one has a false confidence in the delivery date – at least when a single estimate was used everyone knew it was a ‘guess’.
Fortunately, there are a number of elicitation techniques for getting good quality uncertainty estimates, and for monitoring and calibrating them to maintain and improve their quality.
Reason #5: Not including risk events
Developing a risk register is standard practice in project management these days. The risk register is a (usually very large) database of events that may occur with a description of the scenario, the probability of occurrence, and the impact they may have on cost, time to delivery and quality of the deliverable. There will also often be details about a mitigation plan and some analysis of the most cost effective risk management plan.
Risks described in a risk register are often not modeled within the project schedule. Sometimes this is because it is too difficult (usually because the project plan is too detailed, or the risk register changes very frequently); often it is simply not considered useful. In general, it is not very useful to include the effect of very low probability (say less than 1 in 1000) risk events, unless there are many of them, because they have no significant impact on the metrics people typically use for delivery uncertainty, such as the histogram or cumulative plots above, or the P10 and P90 (the dates before which we estimate a 10% and a 90% chance of delivering, respectively).
The effect a risk event has on a project schedule is highly dependent on which task(s) are delayed. If it only has a moderate impact on a task that is not on the critical path with a lot of slack, then the delay effect may be negligible, and vice versa. Thus, the delay and cost impacts (resulting from delays) for each risk need to be kept up-to-date with changes in the project schedule, but rarely is this done systematically, and thus the risk register will fail to provide the correct guidance for prioritizing risk management.
The best approach is to keep a manageable list of the most important risks impacting a schedule and incorporate these into the simple project plan. There are a number of analytical tools available within ModelRisk’s Results Viewer that help determine the net impact of each risk and thus guide the manager towards an appropriate way of mitigating each risk. The key weakness in incorporating risk events is the estimation of probability. For example, we might estimate a risk event (like a chemical explosion) to have an impact of 60-150 days and be fairly sure that this is about right, but could not say whether 1% or 5% would be a more realistic estimate of probability, yet there is a five-fold difference between them. We often see analysts try to incorporate some uncertainty about the probability estimate, for example by using Uniform(1%,5%) in the above scenario. However, in a Monte Carlo simulation this is the equivalent of just using the mid-point value of 3%, so we haven’t added any extra uncertainty.
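The point about Uniform(1%,5%) can be checked with a short simulation. In each Monte Carlo iteration the risk event either occurs or it does not, so drawing the probability first and then the event gives an overall occurrence rate equal to the average of the probability, which is 3% here. The sketch below is illustrative only; the function names are hypothetical.

```python
import random

def occurrence_rate_fixed(p, n_iter=100000, seed=1):
    """Fraction of iterations in which the event occurs, with a fixed probability p."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n_iter)) / n_iter

def occurrence_rate_uncertain(p_low, p_high, n_iter=100000, seed=2):
    """Same, but the probability is redrawn from Uniform(p_low, p_high) each iteration."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        p = rng.uniform(p_low, p_high)  # "uncertain" probability for this iteration
        if rng.random() < p:            # does the risk event occur?
            hits += 1
    return hits / n_iter

rate_fixed = occurrence_rate_fixed(0.03)
rate_uncertain = occurrence_rate_uncertain(0.01, 0.05)
print(f"fixed 3%:         event occurs in {rate_fixed:.4f} of iterations")
print(f"Uniform(1%, 5%):  event occurs in {rate_uncertain:.4f} of iterations")
```

Both runs should show the event occurring in about 3% of iterations, so the simulated delivery date distribution is the same in either case, apart from sampling noise.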
There are some techniques for estimating the probability of risk events, which involve decomposing the event into a sequence of individually more likely events that are easier to estimate.