Agile is Dead! The Rise of High-Performing Teams: 10 Lessons from Fighter Aviation

Software and hardware industry leaders are leveraging lessons from fighter aviation to help their businesses navigate the speed of change and thrive in today’s complex and hostile environment. The emergence of the Observe-Orient-Decide-Act (OODA) Loop—an empathy-based decision cycle created by fighter pilot John Boyd—in today’s business lexicon suggests that executives, academia, and the Agile community recognize that fighter pilots know something about agility.

For example, Eric Ries, author of The Lean Startup and entrepreneur, attributes the idea of the Build-Measure-Learn feedback loop to John Boyd’s OODA Loop [1]. At the core of Steve Blank’s Customer Development model and Pivot, found in his book The Four Steps to the Epiphany, is once again OODA [2]. In his book Scrum: The Art of Doing Twice the Work in Half the Time, Dr. Jeff Sutherland, a former fighter pilot and the co-creator of Scrum, connects the origins of Scrum to hardware manufacturing and fighter aviation (John Boyd’s OODA Loop) [3]. A quick Google Books search on “Cyber Security OODA” returns over 760 results.

This fighter pilot “mindset” behind today’s agile innovation frameworks and cyber security approaches is being delivered to organizations by coaches and consultants who may have watched Top Gun once or twice but more than likely have never been part of a high-performing team [4].

So What?

According to Laszlo Bock, “Having practitioners teaching is far more effective than listening to academics, professional trainers, or consultants. Academics and professional trainers tend to have theoretical knowledge. They know how things ought to work, but haven’t lived them [5].” Unfortunately, most agile consultants’ toolboxes contain more processes and tools than human interaction know-how. Why? They have not lived what they coach. And this is what is killing Agile.

Teaming Lessons from Fighter Aviation

To survive and thrive in their complex environment, fighter pilots learn to operate as a network of teams using the cognitive and social skills designed by industrial-organizational psychologists—there is real science behind building effective teams. It is the combination of inspect-and-adapt frameworks with human interaction skills developed out of the science of teamwork that ultimately builds a high-performance culture and moves organizational structures from traditional, functional models toward interconnected, flexible teams.

10 Reasons Why Your Next Agile High-Performance Teaming Coach Should Have a Fighter Aviation Background

OODA (Observe-Orient-Decide-Act). According to Jeff Sutherland, “Fighter pilots have John Boyd’s OODA Loop burned into muscle memory. They know what agility really means and can teach it uncompromisingly to others.”

Empathy. A 1v1 dogfight is an exercise in empathy, according to Geoff Colvin, the award-winning thinker, author, broadcaster, and speaker on today’s most significant trends in business. In his 2015 book, Humans Are Underrated: What High Achievers Know that Brilliant Machines Never Will, Colvin pens, “Even a fighter jet dogfight, in which neither pilot would ever speak to or even see the other, was above all a human interaction. Few people would call it an exercise in empathy, but that’s what it was—discerning what was in the mind of someone else and responding appropriately. Winning required getting really good at it [6].” Interestingly, empathy is baked into Boyd’s OODA Loop.

Debriefing (Retrospective). The most important ceremony in any continuous improvement process is the retrospective (debrief). Your average fleet fighter pilot has more than 1,000 debriefs under their belt before leaving their first tour at the five-year mark of service. In Agile iteration years, that is equal to roughly 19 years of experience [7]. Moreover, when compared to other retrospective or debriefing techniques, “Debriefing with fighter pilot techniques offer more ‘bang for the buck’ in terms of learning value [8].” Why is this? There are no games in fighter pilot debriefs, no happy or sad faces to put up on the whiteboard – just real human interactions, face-to-face conversations that focus on what’s right, not who’s right. Fighter pilots learn early that the key to an effective retrospective is establishing a psychologically safe environment.

Psychological Safety. Psychological safety “describes a climate in which people feel free to express relevant thoughts and feelings [9].” Fighter pilots learn to master this leadership skill the day they step into their first debrief, where they observe their flight instructor stand up in front of the team, admit her own shortcomings (display fallibility), ask questions, and use direct language. Interestingly, according to Google’s Project Aristotle, the most important characteristic of a high-performing team is psychological safety [10]. Great job, Google!

Teaming (Mindset and Practice of Teamwork) [11]. Although not ideal, fighter pilots often find themselves in “pickup games” where they find a wingman of opportunity from another squadron, service, or country—even during combat operations. Knowing how to coordinate and collaborate without the benefit of operating as a stable team is a skill fighter pilots develop by building nontechnical, known stable interfaces. These stable interfaces include a common language; shared mental models of planning, briefing, and debriefing; and alignment to shared and common goals. Yes, you do not need stable teams, and they do not need to be co-located, if you have known stable interfaces of human interaction.

Empirical Process. The engine of agility is the empirical process, and in tactical aviation we use a simple plan-brief-execute-debrief cycle that, when coupled with proven human interaction skills, builds a resilient, learning culture. The inspect-and-adapt execution rhythm is the same for every mission: whether it was a cross-country flight or a 40-plane strike into enemy territory, we always planned, briefed, executed the mission, and held a debrief. There is no room for skipping steps—no exceptions.

Adaptability/Flexibility. The ability to alter a course of action based on new information, maintain constructive behavior under pressure, and adapt to internal and external environmental changes is what fighter pilots call adaptability or flexibility. Every tactical aviator who has strapped on a $50M aircraft knows that flexibility is the key to airpower. Not every flight goes according to plan, and sometimes the enemy gets a vote – disrupting the plan to the point where the mission looks like a pick-up game.

Agility. Agility is adaptability with a timescale.

Practical Servant Leadership Experience. Fighter pilots have practical experience operating in complex environments and are recognized as servant leaders. But don’t take my word for it; watch this video by Simon Sinek to learn more.

Fun. Agility is about having fun. Two of my favorite sayings from my time in the cockpit are “You cannot plan fun” and “If you are not having fun, you are not doing it right.” If your organization is truly Agile, then you should be having fun.

So, who’s coaching your teams?

Brian “Ponch” Rivera is a recovering naval aviator, co-founder of AGLX Consulting, LLC, and co-creator of High-Performance Teaming™, an evidence-based approach to rapidly build and develop high-performing teams.

[1] “The idea of the Build-Measure-Learn feedback loop owes a lot to ideas from maneuver warfare, especially John Boyd’s OODA (Observe-Orient-Decide-Act) Loop.” Ries, E. The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. (Crown Publishing, 2011).

[2] “…[the] Customer Development model with its iterative loops/pivots may sound like a new idea for entrepreneurs, [but] it shares many features with the U.S. warfighting strategy known as the ‘OODA Loop’ articulated by John Boyd.” Blank, S. The Four Steps to the Epiphany: Successful Strategies for Products That Win. (2013).

[3] “In the book I talk about the origins of Scrum in the Toyota Production Systems and the OODA loop of combat aviation.” Sutherland, J. Scrum: The Art of Doing Twice the Work in Half the Time. New York: Crown Business (2014).

[4] I do not recommend the movie Top Gun as an Agile Training Resource.

[5] Bock, L. Work Rules! Insights from Inside Google That Will Transform How You Live and Lead. (Hachette Book Group, 2015).

[6] Colvin, G. Humans Are Underrated: What High Achievers Know That Brilliant Machines Never Will, 96. (Portfolio/Penguin, 2015).

[7] Assuming two teams with an iteration length of two weeks and 100% retrospective execution: two teams each running 26 two-week sprints per year gives 52 retrospectives per year, so 1,000 debriefs ÷ 52 ≈ 19 years.

[8] McGreevy, J. M., MD, FACS, & Otten, T. D., BS. Briefing and Debriefing in the Operating Room Using Fighter Pilot Crew Resource Management. (2007, July).

[9] Edmondson, A.C. Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy. (Wiley, 2012).

[10] Duhigg, C. Smarter Faster Better: The Secrets to Being Productive in Life and Business. (Random House, 2016).

[11] Edmondson, A.C. Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy. (Wiley, 2012).


17 Ways to Stop Your Organization’s Agile Transformation

In 1944, the Office of Strategic Services (OSS), the forerunner of today’s Central Intelligence Agency (CIA), published the Simple Sabotage Field Manual, which provides organizational saboteurs—let’s call them managers and employees who are on the wrong bus—a guide on how to interfere with organizational development and transformation.

As an Agile and High-Performance Teaming™ Coach, I have observed the following 17 tactics found in the Simple Sabotage Field Manual skillfully employed by managers and employees who clearly do not want their organizations to survive and thrive in today’s knowledge economy:

  1. When training new workers, give incomplete or misleading instructions.
  2. To lower morale and with it, productivity, be pleasant to inefficient workers; give them undeserved promotions. Discriminate against efficient workers; complain unjustly about their work.
  3. Hold [meetings] when there is more critical work to be done.
  4. Demand [documentation].
  5. “Misunderstand” [documentation]. Ask endless questions or engage in long correspondence about such [documents]. Quibble over them when you can.
  6. Make “Speeches.” Talk as frequently as possible and at great lengths.
  7. Bring up irrelevant issues as frequently as possible.
  8. Insist on doing everything through “channels” [and email].
  9. When possible, refer all matters to committees, for “further study and consideration.” Attempt to make the committees as large as possible–never less than five.
  10. Spread inside rumors that sound like inside dope.
  11. Contrive as many interruptions to your work [and team] as you can.
  12. Do your work poorly and blame it on bad tools, machinery, or equipment.
  13. Never pass on your skills and experience to anyone.
  14. If possible, join or help organize a group for presenting employee problems to the management. See that the procedures adopted are as inconvenient as possible for the management, involving the presence of a large number of employees at each presentation, entailing more than one meeting for each grievance, bringing up problems which are largely imaginary, and so on.
  15. Give lengthy and incomprehensible explanations when questioned.
  16. Act stupid.
  17. Be as irritable and quarrelsome as possible without getting yourself into trouble.

Brian “Ponch” Rivera is a recovering naval aviator, co-founder of AGLX Consulting, LLC, and co-creator of High-Performance Teaming™, an evidence-based approach to rapidly build and develop high-performing teams.


Risk Management and Error Trapping in Software and Hardware Development, Part 3

This is part 3 of a 3-part piece on risk management and error trapping in software and hardware development. The first post is located here (and should be read first to provide context on the content below), and part 2 is located here.

Root Cause Analysis and Process Improvement

Once a bug has been discovered and risk analysis / decision-making has been completed (see below), a retrospective-style analysis on the circumstances surrounding the engineering practices which failed to effectively trap the bug completes the cycle.

The purpose of the retrospective is not to assign blame or find fault, but rather to understand the cause of the failure to trap the bug, inspect the layers of the system, and determine if any additional layers, procedures, or process changes could effectively improve collective engineering surety and help to prevent future bugs emerging from similar causes.

Methodology

  1. Review sequence of events that led to the anomaly / bug.
  2. Determine root cause.
  3. Map the root cause to our defense-in-depth (Swiss cheese) model.
  4. Decide if there are remediation efforts or improvements which would be effective in supporting or restructuring the system to increase its effectiveness at error trapping.
  5. Implement any changes identified, sharing them publicly to ensure everyone understands the changes and the reasoning behind them.
  6. Monitor the changes, adjusting as necessary.

Review sequence of events

With appropriate representatives from engineering teams, certification, hardware, operations, customer success, etc., review the discovery path which led to finding the bug. The point is to understand the processes used, which ones worked, and which let the bug pass through.

Determine root cause and analyze the optimum layers for improvement

What caused the bug? There are many enablers and contributing factors, but typically only one or two root causes. The root cause is one or a possible combination of Organization, Communication, Knowledge, Experience, Discipline, Teamwork, or Leadership.

  • Organization – typically latent, organizational root causes include things like existing processes, tools, practices, habits, customs, etc., which the company or organization as a whole employs in carrying out its work.
  • Communication – a failure to convey necessary, important, or vital information to or among an individual or team who required it for the successful accomplishment of their work.
  • Knowledge – an individual, team, or organization did not possess the knowledge necessary to succeed. This is the root cause for knowledge-based errors.
  • Experience – an individual, team, or organization did not possess the experience necessary to successfully accomplish a task (as opposed to the knowledge about what to do). Experience is often a root cause in skill-based errors of omission.
  • Discipline – an individual, team, or organization did not possess the discipline necessary to apply their knowledge and experience to solving a problem. Discipline is often a root cause in skill-based errors of commission.
  • Teamwork – individuals, possibly at multiple levels, failed to work together as a team, support one another, and check one another against errors. Additional root causes may be knowledge, experience, communication, or discipline.
  • Leadership – less often seen at smaller organizations, a Leadership failure is typically a root cause when a leader and/or manager has not effectively communicated expectations or empowered execution regarding those expectations.

Map the root cause to the layer(s) which should have trapped the error

Given the root cause analysis, determine where in the system (which layer or layers) the bug should have been trapped. Often there will be multiple locations at which the bug should or could have been trapped; however, the best location to identify is the one which most closely corresponds to the root cause of the bug. Consideration should also be given to timeliness: the earlier an error can be caught or prevented (trapped), the less costly it is in terms of both time (to find, fix, and eliminate the bug) and effort (a bug in production requires more effort from more people than a developer discovering a bug while checking their own unit test).

While we should seek to apply fixes at the locations best suited for them, the earliest point at which a bug could have been caught and prevented will often be the optimum place to improve the system.

For example, if a bug was traced back to a team’s discipline in writing and using tests (root cause: discipline and experience), then it would map to layers dealing with testing practices (TDD/ATDD), pair programming, acceptance criteria, definition of “Done,” etc. Those layers to which the team can most readily apply improvements and which will trap the error sooner rather than later should be the focus for improvement efforts.
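As a sketch only (in TypeScript, with illustrative cause and layer names, not a prescribed taxonomy), a team could encode this root-cause-to-layer mapping directly:

// Root causes from the list above; the layer names are placeholders for your own system.
type RootCause =
  | "Organization" | "Communication" | "Knowledge"
  | "Experience" | "Discipline" | "Teamwork" | "Leadership";

const layersByRootCause: Record<RootCause, string[]> = {
  Organization: ["Development framework", "Release process"],
  Communication: ["Standups", "Information radiators", "Design documents"],
  Knowledge: ["Onboarding", "Code reviews", "Pair programming"],
  Experience: ["Pair programming", "TDD/ATDD", "Code reviews"],
  Discipline: ["TDD/ATDD", "Definition of Done", "Acceptance criteria"],
  Teamwork: ["Pair programming", "Code reviews"],
  Leadership: ["Expectation setting", "Empowered execution"],
};

// Given a bug's root causes, list the candidate layers to improve (deduplicated).
function candidateLayers(causes: RootCause[]): string[] {
  return [...new Set(causes.flatMap((cause) => layersByRootCause[cause]))];
}

candidateLayers(["Discipline", "Experience"]);
// => ["TDD/ATDD", "Definition of Done", "Acceptance criteria", "Pair programming", "Code reviews"]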

Decide on improvements to increase system effectiveness

Based on the knowledge gained through analyzing and mapping the root cause, decisions are made on how to improve the effectiveness of the system at the layers identified. Using the testing example above, a team could decide that they need to adjust their definition of Done to include listing which tests a story has been tested against and their pass/fail conditions.

Implement the changes identified, and monitor them for effectiveness.

Risk Analysis

Should our preventative measures fail to stop a bug from escaping into a production environment, an analysis of the level of risk needs to be explicitly completed. (This is often done, but in an implicit way.) The analysis of the level of risk derives from two areas.

Risk Severity – the degree of impact the bug can be expected to have on the data, operations, or functionality of affected parties (the company, vendors, customers, etc.).

  • Blocker – A bug so bad, or a feature so important, that we would not ship the next release until it is fixed/completed. Could also signify a bug that is currently impacting a customer’s operations, or one that is blocking development.
  • Critical – A bug that needs to be resolved ASAP, but for which we wouldn’t stop everything. Bugs in this category are not impacting operations (a customer’s, or ours), but they are significant enough to warrant attention.
  • Major – Best judgment should be used to determine how this stacks against other work. The bug is serious enough that it needs to be resolved, but the value of other work and timing should be considered. If a bug sits in Major for too long, its categorization should be reviewed and either upgraded or downgraded.
  • Minor – A bug that is known, but which we have explicitly de-prioritized. Such a bug will be fixed as time allows.
  • Trivial – Strongly consider closing this level of bug. At best these should be put into the “Long Tail” for tracking.

Risk Probability – the likelihood, expressed as a percentage, that those potentially affected by the bug will actually experience it (e.g., always; only if they have a power outage; or only if the sun aligns with Jupiter during the slackwater phase of a diurnal tide in the northeastern hemisphere between 44 and 45 degrees latitude).

  • Definite (100%) – the issue will occur in every case
  • Probable (60-99%) – the issue will occur in most cases
  • Possible (30-60%) – a coin flip; the issue may or may not occur
  • Unlikely (2-30%) – the issue will occur in a minority of cases
  • Won’t (1%) – occurrence of the issue will be exceptionally rare

Given Risk Severity and Probability, the risk can be assessed according to the following matrix and assigned a Risk Assessment Code (RAC).

Risk Assessment Matrix (Probability across, Severity down):

             Definite   Probable   Possible   Unlikely   Won't
Blocker         1          1          1          2         3
Critical        1          1          2          2         3
Major           2          2          2          3         4
Minor           3          3          3          4         5
Trivial         3          4          4          5         5

Risk Assessment Codes
1 – Strategic     2 – Significant     3 – Moderate     4 – Low     5 – Negligible

The Risk Assessment Codes are a significant factor in Risk decision-making.

  1. Strategic – the risk to the business or customers is significant enough that its realization could threaten operations, basic functioning, and/or professional reputation to the point that the basic survival of the business could be in jeopardy. As Arnold said in Predator: “We make a stand now, or there will be nobody left to go to the chopper!”
  2. Significant – the risk poses considerable, but not life-threatening, challenges for the business or its customers. If left unchecked, these risks may elevate to strategic levels.
  3. Moderate – the risk to business operations, continuity, and/or reputation is significant enough to warrant consideration against other business priorities and issues, but not significant enough to trigger higher responses.
  4. Low – the risk to the business is not significant enough to warrant special consideration of the risk against other priorities. Issues should be dealt with in routine, predictable, and business-as-usual ways.
  5. Negligible – the risk to the business is not significant enough to warrant further consideration except in exceptional circumstances (i.e., we literally have nothing better to do).
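For teams that want this lookup in their tooling, here is a minimal TypeScript sketch of the matrix above; the type and function names are hypothetical, not a standard API:

type Severity = "Blocker" | "Critical" | "Major" | "Minor" | "Trivial";
type Probability = "Definite" | "Probable" | "Possible" | "Unlikely" | "Won't";

// Rows and columns mirror the Risk Assessment Matrix above.
const racMatrix: Record<Severity, Record<Probability, number>> = {
  Blocker:  { Definite: 1, Probable: 1, Possible: 1, Unlikely: 2, "Won't": 3 },
  Critical: { Definite: 1, Probable: 1, Possible: 2, Unlikely: 2, "Won't": 3 },
  Major:    { Definite: 2, Probable: 2, Possible: 2, Unlikely: 3, "Won't": 4 },
  Minor:    { Definite: 3, Probable: 3, Possible: 3, Unlikely: 4, "Won't": 5 },
  Trivial:  { Definite: 3, Probable: 4, Possible: 4, Unlikely: 5, "Won't": 5 },
};

// Look up the Risk Assessment Code (1 = Strategic ... 5 = Negligible).
function rac(severity: Severity, probability: Probability): number {
  return racMatrix[severity][probability];
}

rac("Major", "Probable"); // => 2 (Significant)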

Risk Decision

The risk decision is the point at which a decision is made about the risk. Typically, risk decisions take the form of:

  • Accept – accept the risk as it is and do not mitigate or take additional steps.
  • Delay – for less critical issues or dependencies, a decision about whether to accept or mitigate a risk may be delayed until additional information, research, or steps are completed.
  • Mitigate – establish a mitigation strategy and deal with the risk.

For risk mitigation, feasible Courses of Action (CoAs) should be developed to assist in making the mitigation plan. These potential actions comprise the mitigation and/or reaction plan. Specifically, given a bug’s risk severity, probability, and resulting RAC, the courses of action are the possible mitigation solutions for the risk. Examples include:

— Pre-release —

  • Apply software fix / patch
  • Code refactor
  • Code rewrite
  • Release without the code integrated (re-build)
  • Hold the release and await code fix
  • Cancel the release

— In production —

  • Add to normal backlog and prioritize with normal workflow
  • Pull / create a team to triage and fix
  • Swarm / mob multiple teams on fix
  • Pull back / recall release
  • Release an additional fix as a micro-upgrade

All risk decisions should be recorded, and those which remain active need to be tracked. There are many methods available for logging and tracking risk decisions, from spreadsheets to documentation to support tickets. There are entire software platforms expressly designed to track and monitor risk status and record decisions taken (or not) about risks.

Decisions to delay risk mitigations are the most important to track: they require action, and at the speed most businesses move today, a real risk exists of losing track of risk delay decisions. Therefore a Risk Log or Review should be used to routinely review the status of pending risk decisions and reevaluate them. Risk changes constantly, and risks may significantly change in severity and probability overnight. By reviewing risk decisions regularly, leadership can ensure both that emerging risks are mitigated and that effort is not wasted unnecessarily (as when effort is put against a risk which has significantly declined in impact due to changes external to the business).
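As one illustration (the field names are hypothetical; any tracker with equivalent fields serves the same purpose), a minimal TypeScript sketch of a risk log entry and a routine review query:

type Decision = "Accept" | "Delay" | "Mitigate";

interface RiskLogEntry {
  id: string;
  description: string;
  severity: "Blocker" | "Critical" | "Major" | "Minor" | "Trivial";
  probability: "Definite" | "Probable" | "Possible" | "Unlikely" | "Won't";
  rac: 1 | 2 | 3 | 4 | 5;   // Risk Assessment Code from the matrix above
  decision: Decision;
  decidedOn: Date;
  reviewBy: Date;           // delayed decisions especially need a review date
  active: boolean;
}

// Surface active entries whose scheduled review date has arrived.
function dueForReview(log: RiskLogEntry[], today: Date): RiskLogEntry[] {
  return log.filter((entry) => entry.active && entry.reviewBy.getTime() <= today.getTime());
}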

Conclusion

I hope you’ve enjoyed this 3-part series. Risk management and error trapping is a complicated and – at times – complex topic. There are many ways to approach these types of systems and many variations on the defense-in-depth model.

The specific implementation your business or organization chooses to adopt should reflect the reality and environment in which you operate, but the basic framework has proven useful across many domains and industries, and it is directly adapted from Operational Risk Management as I used to practice and teach it in the military.

Understanding the root cause of your errors, where they slipped through your system, and how to improve your system’s resiliency and robustness are critical skills you need to develop if they are not already in place. A mindful, purposeful approach to risk decision-making throughout your organization is also critical to your business operations.

Good luck!


Chris Alexander is a former U.S. Naval Officer who was an F-14 Tomcat flight officer and instructor. He is Co-Founder and Executive Team Member of AGLX Consulting, creators of the High-Performance Teaming™ model, a Scrum Trainer, Scrum Master, and Agile Coach.


Risk Management and Error Trapping in Software and Hardware Development, Part 2

This is part 2 of a 3-part piece on risk management and error trapping in software and hardware development. The first post is located here (and should be read first to provide context on the content below).

Error Causality, Detection & Prevention

Errors occurring during software and hardware development (resulting in bugs) can be classified into two broad categories: (1) skill-based errors, and (2) knowledge-based errors.

Skill-based errors

Skill-based errors are those errors which emerge through the application of knowledge and experience. They are differentiated from knowledge-based errors in that they arise not from a lack of knowing what to do, but instead from either misapplication or failure to apply what is known. The two types of skill-based errors are errors of commission, and errors of omission.

Errors of commission are the misapplication of a previously learned behavior or knowledge. To use a rock-climbing metaphor, if I tied my climbing rope to my harness with the wrong type of knot, I would be committing an error of commission. I know I need a knot, I know which knot to use, and I know how to tie the correct knot – I simply did not do it correctly. In software development, one example of an error of commission might be an engineer providing the wrong variable to a function call, as in:

var x = 1;        // variable to call
var y = false;    // variable not to call
function callVariable(x) {
  return x;
}
callVariable(y);  // should have provided "x" but gave "y" instead

Errors of omission, by contrast, are the failure to apply knowledge or experience (previously learned behaviors) to the given problem. In my climbing example, not tying the climbing rope to my harness (at all) before beginning to climb is an error of omission. (Don’t laugh – this actually happens.) In software development, an example of an error of omission would be an engineer forgetting to provide a variable to a function call (or forgetting to add the function call at all), as in:

var x = 1;              // variable to call
var y = false;          // variable not to call
function callVariable(x) {
  return x;
}
callVariable();   // should have provided "x" but left the call empty
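As an aside, and purely as an illustration beyond the original example: a strongly typed language acts as a latent layer that traps both error types at compile time, before any test runs. A sketch in TypeScript:

const x = 1;       // variable to call
const y = false;   // variable not to call

function callVariable(value: number): number {
  return value;
}

callVariable(x);    // compiles: the correct variable is provided
// callVariable(y); // would not compile: 'boolean' is not assignable to 'number' (commission trapped)
// callVariable();  // would not compile: expected 1 argument, but got 0 (omission trapped)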

Knowledge-based errors

Knowledge-based errors, in contrast to skill-based errors, arise from the failure to know the correct behavior to apply (if any). An example of a knowledge-based error would be a developer checking in code without any unit, integration, or system tests. If the developer is new and has never been indoctrinated to the requirements for code check-in as including having written and run a suite of automated unit, integration, and system tests, this is an error caused by a lack of knowledge (as opposed to omission, where the developer had been informed of the need to write and run the tests but failed to do so).

Defense-in-depth, the Swiss cheese model, bug prevention and detection

Prevention comprises the systems and processes employed to trap bugs and stop them from getting through development environments and into certification and/or production environments (depending on your software / hardware release process). In envisioning our Swiss cheese model, we need to understand that the layers include both latent and active types of error traps, and are designed to mitigate against certain types of errors.

The following are intended to aid in preventing bugs.

Tools & methods to mitigate against Skill-based errors in bug prevention:

  • Code base and architecture [latent]
  • Automated test coverage [active]
  • Manual test coverage [active]
  • Unit, feature, integration, system, and story tests [active]
  • TDD / ATDD / BDD / FDD practices [active]
  • Code reviews [active]
  • Pair Programming [active]
  • Performance testing [active]
  • Software development framework / methodology (e.g., Scrum, Kanban, DevOps) [latent]

Tools & methods to mitigate against Knowledge-based errors in bug prevention:

  • Education & background [latent]
  • Recruiting and hiring practices [active]
  • New-hire Onboarding [active]
  • Performance feedback & professional development [active]
  • Design documents [active]
  • Definition of Done [active]
  • User Story Acceptance Criteria [active]
  • Code reviews [active]
  • Pair Programming [active]
  • Information Radiators [latent]

Detection is the term for the ways in which we find bugs, hopefully in the development environment, though this phase also includes certification if your organization has a certification / QA phase. The primary focus of detection methods is to ensure no bugs escape into production. As such, the entire software certification system itself may be considered one large, active layer of error trapping. In fact, in many enterprise companies, the certification or QA team (if you have one) is actually the last line of defense.

The following are intended to aid in detecting bugs:

Tools & methods to mitigate against Skill-based errors in detecting bugs:

  • Automated test coverage [active]
  • Manual test coverage [active]
  • Unit, feature, integration, system, and story tests [active]
  • TDD / ATDD / BDD / FDD practices [active]
  • Release certification testing [active]
  • Performance testing [active]
  • User Story Acceptance Criteria [active]
  • User Story “Done” Criteria [active]
  • Bug tracking software [active]
  • Triage reports [active]

Tools & methods to mitigate against Knowledge-based errors in detecting bugs:

  • Education & background [latent]
  • Professional development (individual / organizational) [latent / active]
  • Code reviews [active]
  • Automated & manual test coverage [active]
  • Unit, feature, integration, system, story tests [active]

When bugs “escape” the preventative measures of your Defense-in-depth system and are discovered in either the development or production environment, a root cause analysis should be conducted on your system based on the nature of the bug and how it could have been prevented and/or detected earlier. Based upon the findings of your root cause analysis, your system can be improved in specific, meaningful ways to increase both its robustness and resilience.

How an organization should, specifically, conduct root cause analysis, analyze risk and make purposeful decisions about risk, and how they should improve their system is the subject of part 3 in this series, available here.


Chris Alexander is a former U.S. Naval Officer who was an F-14 Tomcat flight officer and instructor. He is Co-Founder and Executive Team Member of AGLX Consulting, creators of the High-Performance Teaming™ model, a Scrum Trainer, Scrum Master, and Agile Coach.


Risk Management and Error Trapping in Software and Hardware Development, Part 1

The way in which we conceptualize and analyze risk and error management in technology projects has never received the same degree of scrutiny as business process frameworks and methodologies such as Scrum, Lean, or Traditional Project Management. Yet risk is inherent in everything we do, every day, regardless of our industry, sector, work domain, or process.

We actually practice risk management in our everyday lives, often without consciously realizing that what we are doing is designed to manage levels of risk against degrees of potential reward, and to either prevent errors from occurring or minimize their impact when they do.

For example, I recently took a trip to the San Juan Islands with my wife and parents. I woke up early, made coffee, roused the troops, and checked the weather. I’d filled up the gas tank the day before and booked our ferry tickets online. Based on the weather, I recommended we each take an extra layer. We departed the house a bit earlier than really necessary, but ended up encountering a detour along the way due to a traffic accident on the Interstate. Nevertheless, we made it to the ferry terminal with about 10 minutes to spare, and just in time to drive onto the ferry and depart for Friday Harbor.

My personal example is relatively simple but, with a little analysis, demonstrates how intuitively we assess and manage risk:

  • Wake up early: mitigates risk of oversleeping and departing late (which could result further in forgetting important things, leaving coffee pot/equipment on, etc.), waiting on others in the bathroom, and not being able to prepare and enjoy some morning coffee (serious risk).
  • Check the weather: understanding the environment we are entering into is critical to mitigating environment-related risks, in this case real environmental concerns such as temperature, weather, wind, and precipitation, enabling us to mitigate potentially negative effects and capitalize on positives. Bad weather may even result in our changing our travel plans entirely – a clear form of risk mitigation in which we determine that our chance for a successful journey is low compared against the value we would derive from undertaking the journey in the first place, and decide the goal is not sufficient to accept present risk levels.
  • Book ferry tickets online: a mitigation against the risk of arriving late and having to wait in line to purchase tickets, which could result in us missing the ferry due to either running out of time or the ferry already being completely booked.
  • Departing earlier than necessary: a mitigation against unforeseen and unknowable specific risk, in this case the generic risk of en route delays, which we did encounter on this occasion.

As you can see, as a story my preparations for our trip seem rather routine and unremarkable, but when viewed through the lens of risk mitigation and error management, each action and decision can be seen as specifically targeted to mitigate one or more specific risks or minimize the potential effects of an error. Unfortunately, our everyday intuitive actions and mental processes seldom translate into our work environments in such direct and meaningful ways.

Risk and Error Management in Software and Hardware Development – Defense-in-Depth and the Swiss Cheese Model

Any risk management system can be seen as a series of layers designed to employ a variety of means to mitigate risk and prevent errors from progressing further through the system. We call this “trapping errors.” Additionally, each of these layers is often just one part of a larger system. A system constructed with these layers is referred to as having “defense-in-depth.”

Defense-in-depth reflects the simple idea that instead of employing one single, catch-all solution for eliminating risk and trapping errors, a layered approach which employs both latent and active controls in different areas throughout the system will be far more effective in both detecting and preventing errors from escaping.

These layers are often envisioned as slices of Swiss cheese, with each slice representing a different part of the larger system. As a potential risk or error progresses through holes in the system’s layers, it should eventually be trapped in one of the layers.

Risk and errors are then only able to impact the system when all the holes in the system’s Swiss cheese layers “line up.”
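To make the metaphor concrete, here is a toy TypeScript sketch (the layers and checks are placeholders, not a real release pipeline) in which an error reaches production only when it slips through the hole in every layer:

// A layer traps an error by returning true; returning false is a "hole."
type Layer = { name: string; traps: (error: string) => boolean };

const layers: Layer[] = [
  { name: "Unit tests",  traps: (e) => e.includes("logic") },
  { name: "Code review", traps: (e) => e.includes("style") },
  { name: "QA / cert",   traps: (e) => e.includes("regression") },
];

// An error escapes only when no layer traps it (all the holes line up).
function escapes(error: string): boolean {
  return layers.every((layer) => !layer.traps(error));
}

escapes("logic bug");   // => false: trapped by the unit-test layer
escapes("unknown bug"); // => true: slipped through every layer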

Latent and Active Layers

There are two basic types of layers (or traps) in any system: latent and active. In your day-to-day life, latent traps are things such as the tires on your car or the surface of the road. Active traps are things such as checking the weather, putting on safety gear, wearing a helmet, or deciding not to go out into the weather.

Latent layers in software or hardware development may be things such as the original (legacy) code base, development language(s) used, system architecture & design, hardware (types of disk drives, manufacturer), and so forth. It may even include educational requirements for hiring, hiring practices, and company values.

Active layers in software and hardware development may include release processes, User Story writing and acceptance criteria, and development practices like TDD/ATDD, test automation, code reviews, and pair programming.

Separation of Risk and Error Management Concerns

To better focus on dealing with the most appropriate work at the appropriate time in responding to error detection, triage, and risk mitigation, we can separate our risk and error analysis into the following areas:

During development: focus on trapping errors

  • Prevention – the practices, procedures, and techniques we undertake in engineering disciplines to help ensure we do not release bugs or errors into our code base or hardware products.
  • Detection – the methods available to us as individual engineers, teams, and the organization as a whole to find and respond to errors in our code base or hardware products (which includes reporting and tracking).

Risk mitigation: steps for errors that have escaped into certification or production environments

  • Risk Analysis – the steps required to analyze the severity and impact of an error.
  • Risk Decision-making – the process of ensuring decisions about risk avoidance, acceptance, or mitigation are made at appropriate levels with full transparency.

Continuous Improvement in every case

Improvement – the process of improving workflows and practices through shared knowledge and experience in order to improve engineering practices and further harden our release cycles. This step uses root cause analysis to help close the holes we find in the layers of our Swiss cheese model.

Here is one conceptualization of what a Defense-in-depth Risk Management model might look like. Bear in mind that this is simply one way to conceive of layers at a more macro level, and each layer could easily itself be broken down into a set of layers, or you could conceive of it as one very large model.

[Image: defense-in-depth Swiss cheese model]

Given our model and our new ability to conceive of Risk and Error Management in this more meaningful and purposeful way, our next step is to understand error causality and what we can do to apply our causal analysis to strengthening our software and hardware risk management and error trapping system.

Continue reading in part 2 of this 3-part series.


Chris Alexander is a former U.S. Naval Officer who was an F-14 Tomcat flight officer and instructor. He is Co-Founder and Executive Team Member of AGLX Consulting, creators of the High-Performance Teaming™ model, a Scrum Trainer, Scrum Master, and Agile Coach.


Why Your Next Agile Coach Should be a Fighter Pilot

As technological adoption and innovation accelerate through Mach 3, more business leaders will turn to fighter pilots to help their businesses survive and thrive in today’s VUCA world. For example, the cognitive and social skills naval aviators developed in the cockpit are in high demand in industries where teamwork is essential and team failures costly (e.g., healthcare, oil and gas, mining, energy, and commercial aviation). As more companies adopt a team-based approach to product delivery, and as product and company lifecycles shorten, the demand for proven team performance training and coaching will accelerate.

The cognitive and social skills (nontechnical skills) fighter pilots learn are rooted in what is considered to be one of the success stories of modern psychology and cognitive engineering: Crew Resource Management (CRM). CRM training, affectionately known as “Charm School,” covers crucial aspects of resilience including the topics of situational awareness, mission planning, team dynamics, workload management, effective communication, and leadership. CRM was developed in response to the realization that the kinds of errors that cause plane crashes are invariably errors of teamwork and communication (nontechnical skills).

A 50,000-foot view of CRM

  • CRM is the foundation of a human-systems approach, Threat and Error Management (TEM), designed by Human Factors engineers to help us understand and direct human performance within complex operating systems.
  • Human Factors is the applied science of how humans relate effectively and productively with one another in highly technological settings.
  • Crew Resource Management (CRM) is defined as the use of all available resources—information, equipment, and people—to achieve safe and efficient flight operations.

Fighter Pilots as Agile Coaches?

In the 1950s, John Boyd, a fighter pilot and military strategist, developed a decision cycle that changed “The Art of War.” The decision cycle Boyd developed is known as the OODA Loop and refers to Observe-Orient-Decide-Act. In business, the speed at which the OODA Loop is executed allows a company to get “inside the decision cycle” of its competitors or valued customers. The OODA Loop is an exercise in empathy.

Eric Ries, author of The Lean Startup and entrepreneur, attributes the idea of the Build-Measure-Learn feedback loop to John Boyd’s OODA Loop. At the core of Steve Blank’s Customer Development model and Pivot found in his book, The Four Steps to the Epiphany, is once again OODA. In his new book, Scrum: The Art of Doing Twice the Work in Half the Time, Dr. Jeff Sutherland, a former fighter pilot and the co-creator of Scrum, mentions that the origins of Scrum are Boyd’s OODA and the Toyota Production System.

Scrum is based on my experience flying F-4 Phantoms over North Vietnam… Fighter pilots have John Boyd’s OODA Loop burned into muscle memory. They know what Agility means and can teach it uncompromisingly to others.

-Jeff Sutherland, co-creator of Scrum

What’s missing from today’s Agile coaching toolkit is the proven human-interaction (nontechnical) skills developed for technical teams who operate in complex environments: CRM. The Agile community is making the same assumption fighter and commercial pilots made pre-CRM: that effective teams can be built without any formal guidance or instruction. “Leaving it up to the team” is a recipe for failure.

Fighter pilots, unlike some agilists, are horrible marketers. Crew Resource Management is not your DaD’s Scaled Agile Framework (SAFe) nor does it tell you how to do more with LeSS; CRM is a proven approach to building agile at scale. CRM does not replace Scrum but provides the tools an enterprise needs to transition command-and-control managers into servant leaders and build effective and efficient teams. CRM is the “Science of Teamwork.”

9 Cognitive and Social Skills Fighter Pilots Bring to the Agile Fight

1. Adaptability. The ability to alter a course of action based on new information, maintain constructive behavior under pressure, and adapt to internal and external environmental changes. The success of a mission depends upon the team’s ability to alter behavior and dynamically manage team resources to meet situational demands.

2. Empathy. Empathy? Fighter pilots and empathy? Yes. John Boyd’s OODA is really about empathy. According to Geoff Colvin, empathy is “discerning what some other person is thinking and feeling, and responding in some appropriate way [1].” OODA is an exercise in empathy. Moreover, Colvin argues, empathy is “the foundation of all other abilities that increasingly make people valuable as technology advances [1].”

3. Assertiveness. One’s willingness to actively participate, and to state and maintain a position until convinced by the facts that other options are better.

4. Decision Making. The ability to choose a course of action using logical and sound judgment based on available information.

5. Leadership. The ability to direct and coordinate the activities of other team members or wingmen and to encourage the team to work together.

6. Mission Analysis. The ability to develop short-term, long-term and contingency plans and to coordinate, allocate and monitor team resources. Effective planning leads to execution that removes uncertainty and increases mission effectiveness.

7. Situational Awareness. The degree of accuracy by which one’s perception of the current environment mirrors reality. Maintaining a high level of situational awareness will better prepare teams to respond to unexpected situations.

8. Communication. The ability to clearly and accurately send and acknowledge information, instructions, or commands and provide useful feedback. Effective communication is vital to ensuring that all team members understand mission status.

9. Workload Management. The implementation of a strategy to balance the amount of work with the appropriate time and resources available. It includes making sure people are alert and vigilant (preventing fatigue); figuring out who does what (delegation); teaching people how to manage interruptions (and limit interruptions at critical moments); prioritizing tasks and avoiding task oversaturation; and avoiding pitfalls such as continuing a project, flight, or activity even when it is becoming clearer and clearer that doing so is dangerous. Workload management in high-reliability industries also means doing all of the above under stress.

While putting this list together I came up with more than 50 examples of why a fighter pilot should be your next Agile Coach. Please feel free to add more or comment on my choices for this article.

Brian “Ponch” Rivera is a recovering naval aviator and co-founder of AGLX, LLC, a Seattle-based Agile consultancy that melds the proven principles of High Reliability Organizations with today’s Agile practices.

[1] Colvin, G. Humans Are Underrated: What High Achievers Know That Brilliant Machines Never Will. (Portfolio/Penguin, 2015).


Leading Agile Organizations: Lessons from the Flight Deck

Have you ever been part of an organization where the product owner, manager, or CEO failed to accept input from a junior or rookie team member and the project or initiative failed? Imagine being part of a team where either failing to share information or not acting on critical information is found to be the root cause behind flying a project, or an actual aircraft, into the ground.

In the cockpits of today’s commercial airliners and military aircraft, open communication and the ability to respectfully question authority (perceived or explicit) are essential cognitive and interpersonal skills every crew member must learn so that as a team, make that a high-performing team, they can mitigate the unforgiving risks inherent in their complex environment. However, thirty years ago cockpit culture was the poster child for management 1.0 (control) — the very problem impeding agile transformations around the globe.

In the 1970s, a rash of commercial airline accidents led NASA and the National Transportation Safety Board (NTSB) to investigate how to fix the complex aviation system. The tipping point in commercial aviation came on December 28, 1978, when United Airlines Flight 173 (UA-173) crashed in a Portland suburb, killing 10 of the people on board.

UA-173, a DC-8 with 181 passengers on board, circled near the Portland, Oregon airport for an hour as the crew tried to troubleshoot a landing gear problem. The flight engineer, the crew member responsible for monitoring the aircraft systems, unsuccessfully warned the captain (the flying pilot) of the rapidly diminishing fuel supply. The captain—later described by one investigator as “an arrogant S.O.B.”—waited too long to begin his final approach, and as a result UA-173 ran out of fuel and crashed.

[Image: United Airlines Flight 173]

Following the UA-173 investigation, NASA discovered that 60-80% of airline accidents were caused by human error. Digging deeper, and not just settling on human error as a singular cause, NASA identified failures in alignment, leadership, interpersonal communication, and decisiveness as root causes behind several commercial airline disasters including UA-173.

The hierarchical, command-and-control culture commonly found in the front offices (flight decks) of 1970s-era commercial airliners, including that of UA-173, was no different from the cultures found in many of today’s legacy companies. Following the crash of UA-173, the aviation industry, with the help of NASA, realized through deep retrospection that the top-down predict-and-control paradigm of managing in complex environments needed to change.

The hierarchical “chain of command” system in place at the time of the accident [UA-173] did not always provide for effective flight crew resource management. Additional training was identified as being needed to ensure the flight crews recognized the value and merits of flight crews breaking down the more formal hierarchical structure. This approach leads to a more participative and collaborative process to ensure that all members of the flight crew feel free to speak up and provide the captain with all relevant information relating to the safety of the aircraft [1].

This change came in the form of Crew Resource Management (CRM), a leadership training system that “encompasses a wide range of knowledge, skills and attitudes including communications, situation awareness, problem solving, decision making, and teamwork,” according to CrewResourceManagement.net. CRM liberated the cockpit environment so every member, regardless of rank, position, skills set, age, time with the company, etc., was empowered to truly collaborate around a shared objective or explicit purpose.

In today’s turbulent markets business leaders are desperately trying to become more “Agile” yet most fail because traditional company cultures—hierarchical, command and control bureaucracies—do not provide knowledge workers the support needed to innovate, adapt, and ultimately delight customers with valued, rapid product releases. Applying aviation lessons learned to Agile transformational challenges is not new; for those of you who have read my posts you are aware that commercial and military aviation have profoundly influenced Agile, Scrum, The Lean Start Up, and elements of design thinking.

Adding leadership patterns from CRM to your Agile toolbox will help transform managers into Agile leaders and coaches, rather than the perceived or actual impediments to organizational agility. In part 2 of this post, I will share more CRM information and provide ideas on how it can help remove the “S.O.B.” from your cockpit before a project, or your company, crashes into the ground.

Warning: You may be the S.O.B.

Brian “Ponch” Rivera is a recovering Naval Aviator and current Enterprise Agile Coach and Executive Consultant based in Seattle, WA.

[1] FAA Website http://lessonslearned.faa.gov/ll_main.cfm?TabID=1&LLID=42&LLTypeID=7


500% Productivity Increase in One Day: Lessons from a Stand-down

Last month, seven software development teams (35+ members) stepped away from their sprint for one day and participated in a Sprint Stand-down. The problems the teams were trying to solve during the Stand-down were technical—the teams recognized they had a collective knowledge gap and needed to slow down to speed up.

During the Stand-down retrospective, we discovered the teams had increased productivity by over 500% in one day—an unexpected and welcome outcome. The retrospective provided us an opportunity to examine the how and why behind the hyper-productivity realized in this unfamiliar, one-day training event.

The lessons we learned were not revolutionary; instead, they reinforced the values and practices found in the Agile Manifesto, Scrum, Extreme Programming, CrossLead, Flawless Execution, Crew Resource Management, and Threat and Error Management. In one day, a Sprint Stand-down provided undeniable evidence to developers, product owners, managers, directors, VPs, and the CIO that empowered execution trumps the traditional command-and-control approach to product delivery.

The transferable lessons learned from the Stand-down fall into familiar categories:

  • Shared Purpose/Objective
  • Workload Management/Limit Work in Progress (WIP)
  • Leadership/Teamwork
  • Execution Rhythm or Cadence
  • Communication

Before going deeper into the lessons learned, I want to share a little bit about the origins, concept and our approach to a Sprint Stand-down.


Sprint Stand-down

You will not find a Sprint Stand-down in the Scrum Guide. A Stand-down is not found in the Project Management Institute’s (PMI) vernacular, nor is it part of any Agile or currently trending management methodology. A Stand-down is a training evolution commonly used by elite military units, commercial aviation, and other high-reliability organizations (HRO) to accelerate team performance.

The purpose of any Stand-down is to promote knowledge-based training along with personal discipline and responsibility as essential elements of professionalism. It is designed to empower and inspire a community of professionals to continuously seek knowledge, integrate new information in everyday practice, and share new findings with others within the company and industry.

Stand-down Planning

The event was a self-organized undertaking in which a small team of eight people was accountable for event execution. Planning for the event followed a rapid planning process inspired by Crew Resource Management (CRM) and Threat and Error Management (TEM). The objectives of this Sprint Stand-down were to inform, inspire, educate, and motivate the teams—admittedly weak objectives, as they lacked clarity and measurability.

With a shared understanding of the Stand-down objective(s), the planning team used a liberating structure to capture anticipated threats and the resources needed to overcome those threats, and reviewed lessons learned from previous events that were similar to a Stand-down. A Stand-down plan was formed in less than 35 minutes where each planning team member knew who would have to do what by when to ensure flawless Stand-down execution.

Stand-down Execution

The Stand-down included in-house subject matter experts and one external trainer, with 35+ team members in one room for 6.5 hours. Team members treated the Stand-down as an offsite, declining all meetings and turning on their Outlook out-of-office replies. Team members were randomly assigned to one of two Stand-down teams as determined by the type of gift card they received when they entered the Stand-down room. Two additional gift cards were given to all participants for the purpose of regifting—team members were encouraged to give away their gift cards to other team members for any reason. Team members were warned that over lunch (provided by the company) they might be called upon to share with everyone to whom they gave a gift card and why. The CIO provided an impromptu leadership moment which included the distribution of additional gift cards to team members who were nominated by their peers.

500%

An outcome of the event was an increase in productivity of 200% to 700%, depending on the metric used (e.g., story points, stories done, stories in progress and stories done, etc.). However, based on stories “done” during the Stand-down (26) versus the average number of stories completed during a normal sprint day (5), the increase in productivity was most likely about 500%. In one day.

For argument’s sake, let’s just say the productivity outcome for this one-day event was 20%, a palatable number for those who have not embraced the power of Scrum or empowered execution. What if we could take the lessons learned from this event and apply them to how we work during our normal workdays to get a productivity increase of 5% in the next two weeks?


Sprint Stand-down Lessons Learned

Shared Purpose/Objective

  • A team needs a shared purpose or common objective. Objectives should be clear, measurable, achievable, and aligned to a focus area, strategic line of effort, company vision, etc.
  • A shared purpose builds unity of effort. Teams were observed self-organizing throughout the day and reported a reduction in duplication of work and an increase in cross-team knowledge-sharing.

Workload Management/Limit WIP

  • Limit WIP. Individuals reported being happier as they felt part of a team of teams working toward one goal.
  • Context Switching is bad. Most team members reported that they did not check their email during the six hours. Team members reported that the internal Stand-down disruptions (we played music during frequent shout-outs) slowed them down and were absolutely disruptive.
  • Protect the teams from out-of-band work. Team members reported that they had no out-of-band work during the day.
  • Empower team members to push back on work that is not aligned to the objective.
  • Pairing works. Teams paired all day. Some mobbed.

Leadership

  • Say “Thank You.” Team members should recognize and acknowledge the importance of others in task performance.
  • Leaders need to be visible but not intrusive. Checking-in to say “thank you” to individuals carries more weight than email.
  • An invisible leader is a visible problem. Team members noticed those leaders who failed to stop by to see how the day was progressing.
  • Unscripted leadership is the best kind. The CIO’s visit was received as genuine.
  • Recognition from leaders is great, but peer recognition of important contributions is even better.

Execution Rhythm or Cadence

  • Stand-down tempo is not sustainable, but the practice is sound when a knowledge gap exists.
  • Stand-downs should not exceed six hours.
  • Schedule Stand-downs as required. No more than once a month.

Communication

  • Face-to-face communication remains the gold standard.
  • Keep work visible. The teams shared one electronic backlog.
  • Co-locate teams to maximize the value of osmotic communication.
  • Cross-team pollination builds trust.

Brian “Ponch” Rivera is a recovering Naval Aviator and Commander in the U.S. Navy Reserve. He is the co-founder of AGLX, LLC, a Seattle-based Agile Leadership Consulting Team, and a ScrumTotal Advisory Board Member.


What Agile Teams Can Learn from Flight Crews

Small, cross-functional teams working together with devices, focused on a shared objective, surrounded by complexity and frequently changing conditions. Welcome to the world of software development. And commercial aviation. Think the similarities between software development and aviation end here? Think again.

Aviation continues to have a profound influence on software development, organizational agility, cyber security, and the transformation of managers into leaders. For example, Scrum, the complexity-busting framework technology companies use to build complex software, comes from fighter aviation and Lean manufacturing. The Lean Startup, a popular business-model framework used by today’s hottest Silicon Valley startups, is based on John Boyd’s OODA Loop, an empathy-driven decision cycle that captures how fighter pilots “get inside” their opponent’s decision cycle to gain a competitive advantage. Similarly, OODA (Observe, Orient, Decide, Act) is used to rapidly design products and underpins much of the burgeoning business of cyber security. On the management front, aviation is reported to be the inspiration behind the Holacracy movement, a social system in which authority and decision-making are distributed throughout self-organizing teams. But you already knew all of this, right?

Next Time You Fly on a Commercial Carrier…

Commercial aviation flight deck and cabin crews follow the empirical process of plan, communicate, execute, and assess on each leg of their assigned trip (mission). Similarly, software developers around the globe follow the same empirical process found in Scrum—Sprint Planning (plan), Standups (communicate), Sprint Execution (execute), Review and Retrospective (assess). A sprint or iteration is a time-boxed mission (one to four weeks long) where potentially shippable software is delivered. With empowered team members and solid execution, Scrum builds a culture of continuous learning and innovation.
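
To make that mapping concrete, here is a minimal Python sketch of the shared plan-communicate-execute-assess loop. It is illustrative only; the names (Mission, plan, communicate, execute, assess) are mine, not official Scrum or CRM terminology.

```python
# Illustrative only: the plan -> communicate -> execute -> assess cycle
# shared by flight crews (per leg) and Scrum teams (per sprint).
from dataclasses import dataclass, field

@dataclass
class Mission:
    """One time-boxed mission: a flight leg or a one-to-four-week sprint."""
    goal: str
    backlog: list = field(default_factory=list)   # planned work items
    done: list = field(default_factory=list)      # completed work items
    lessons: list = field(default_factory=list)   # debrief output

def plan(m, items):
    """Sprint Planning / pre-flight brief: commit to the work."""
    m.backlog.extend(items)

def communicate(m):
    """Standup / crew callout: make remaining work visible to everyone."""
    print(f"{m.goal}: {len(m.backlog)} item(s) remaining")

def execute(m):
    """Sprint execution / flying the leg: do the next piece of work."""
    m.done.append(m.backlog.pop(0))

def assess(m):
    """Review and Retrospective / post-flight debrief: capture lessons."""
    m.lessons.append(f"completed {len(m.done)} of {len(m.done) + len(m.backlog)} items")

mission = Mission(goal="Sprint 12")
plan(mission, ["story A", "story B"])
while mission.backlog:
    communicate(mission)
    execute(mission)
assess(mission)
print(mission.lessons)
```

The point is the shape of the loop: every mission, whether a flight leg or a sprint, cycles through the same four steps.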

There’s more?

The human interaction skills needed on the flight deck and on software development and business teams are exactly the same; these cognitive and social skills include empathy, collaboration, discipline, communication, leadership, situation awareness, and teamwork. Moreover, the silent killer found in the cockpit is also the top threat among software development and business teams.

Slow and insidious, poor Workload Management is the silent killer. Software developers and Lean experts know Workload Management by another name: Work in Progress (WIP). When business and software teams try to do too much (too much WIP), or do not have a shared purpose or objective, rapid value delivery (effective productivity) and quality both decrease—detriments to business survival.
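
To illustrate the idea (this is my own sketch, not a prescribed Scrum or CRM practice), a team can make its WIP limit explicit so the board itself pushes back on new work:

```python
# Illustrative sketch: a board that enforces a Work-in-Progress limit,
# refusing new work until something already in progress is finished.

class WipLimitedBoard:
    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.in_progress = []
        self.done = []

    def start(self, item):
        """Pull new work only if the team has capacity for it."""
        if len(self.in_progress) >= self.wip_limit:
            return False  # push back: finish something first
        self.in_progress.append(item)
        return True

    def finish(self, item):
        self.in_progress.remove(item)
        self.done.append(item)

board = WipLimitedBoard(wip_limit=2)
assert board.start("story A")
assert board.start("story B")
assert not board.start("story C")  # over the limit: rejected
board.finish("story A")
assert board.start("story C")      # capacity freed, now accepted
```

The design choice worth noting is that start() refuses work rather than silently queuing it, forcing the finish-before-start conversation into the open.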

Prioritization of work in and out of the cockpit is an imperative, but flight deck and cabin crews have a marked advantage over software and business teams: flight crews are trained on the effective use of all available resources needed to complete a safe and efficient flight; software and business teams are not. The non-technical skills training flight crews receive is called Crew Resource Management (CRM) and Threat and Error Management (TEM).

CRM, affectionately known as “Charm” school, teaches the cognitive and social skills individuals need to be part of high-performing teams in complex, rapidly changing environments. TEM is a human-system approach to building habits and skills team members need to manage threats and errors within complex operating environments.

What if technology teams applied the cognitive and social lessons learned from CRM and TEM to the world of software development?

Instead of “Scaling Agile,” what is needed is a Crew Resource Management- and Threat and Error Management-influenced Agile Operating System: a system that builds leaders and empowers teams and individuals at every level. This operating system should enhance Scrum through a simple, repeatable, proven, and scalable set of interconnected and interdependent planning, communication, execution, and assessment processes that drive innovation, create leaders, and build a continuous learning culture. Think of this human operating system as the non-technical skills teams need to overcome complexity, the skills flight crews have burned into muscle memory.

Brian “Ponch” Rivera is a recovering Naval Aviator and Commander in the U.S. Navy Reserve. He is the co-founder of AGLX, LLC, a Seattle-based Agile Leadership Consulting Team, and a ScrumTotal Advisory Board Member.



What the Agile Community Should Learn from Two Little Girls and Their Weather Balloon

As reported by GeekWire, over the weekend two Seattle sisters, Kimberly (8) and Rebecca (10) Yeung, launched a small weather balloon to the edge of space (roughly 78,000 feet). They have the GoPro video from two cameras to prove it.

This is certainly an impressive, if not amazing, feat for two young girls to have accomplished (despite some parental assistance), but what is perhaps most impressive (at least to me) is the debrief (or retrospective) they held after the mission. I was not fortunate enough to witness it personally, but the photo of their debrief sheet (as posted in the GeekWire article) shows it was amazingly productive and far surpasses most of the agile retrospectives (debriefs) I’ve witnessed.

*Photo copied from the article on GeekWire.

Apart from the lesson about their Project Plan (“We were successful because we followed a Project Plan & Project Binder”), this sheet is astonishingly solid. Even though I think it is a misconception to attribute success to having had a project plan, for an 8- and 10-year-old this is awesome work!

My friend and fellow coach Brian Rivera and I have often discussed the dire lack of quality, understanding, and usefulness in most agile retrospectives. I might even go so far as to call the current state of agile retrospectives “abhorrent” or “pathetic,” even “disgraceful.” Yes, I might just use one of those adjectives.

For teams using agile methodologies and frameworks focused on continuous improvement (hint: everything in agile is about enabling continuous improvement), the retrospective is the “how” which underlies the “what” of continuous improvement.

Supporting the concrete actions of how to improve within the retrospective are the lessons learned. Drawing out lessons learned during an iteration isn’t magic, and it isn’t happenstance; it requires focused thought, discussion, and analysis. For high-performing teams who have become expert at this through practice, distilling lessons learned and improving their work may occur at an almost unconscious level, but that describes maybe 1% (5% if I’m optimistic) of all agile teams.

So what does a team need to understand to conduct a thorough and detailed analysis during their retrospective? Actually, only a few things:

  1. What were they trying to do? (Goals)
  2. How did they plan to do it? (Planning / strategy)
  3. What did they actually do? (Execution – what actually occurred)
  4. What were their outcomes? (Results of their work)
  5. What did they learn from analyzing their results against the plan they made to achieve their goals? (Lessons learned)

A simple example:

  1. I want to bake peach scones which are light, fluffy, and taste good. (Goal + acceptance criteria)
  2. I plan to wake up early Saturday morning and follow a peach scone recipe that I found online, that is highly rated, and that comes from a source I trust. It should take 30 minutes. (Planning – who / what / when / where / how)
  3. I wake up early Saturday morning and follow the recipe, except for the baking powder: it can leave a metallic taste behind, so I leave it out. (Execution)
  4. It took almost an hour to make the scones, and they did not rise. They tasted alright, but were far, far too dense and under-cooked internally, partially due to being flat. (Outcomes)
  5. I didn’t allocate enough time based on the fact that it was my first attempt at baking scones and I was trying to modify a known good recipe (reinventing the wheel, root causes: experience). Although I wanted light, fluffy scones, I didn’t get them because I deliberately left out a key ingredient necessary to help the dough rise (good intention – bad judgment, root causes: knowledge / discipline). (Lessons learned)
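
For software teams that want to make this five-part structure explicit and repeatable, here is a minimal Python sketch (my own illustration, not a standard tool) that encodes the five questions as a record, filled in with the scone example above:

```python
# My own illustration: the five retrospective questions as an explicit
# record, filled in with the scone example above.
from dataclasses import dataclass, field

@dataclass
class Retrospective:
    goal: str       # 1. What were we trying to do?
    plan: str       # 2. How did we plan to do it?
    execution: str  # 3. What did we actually do?
    outcomes: str   # 4. What were our outcomes?
    lessons: list = field(default_factory=list)  # 5. What did we learn?

retro = Retrospective(
    goal="Bake peach scones that are light, fluffy, and taste good.",
    plan="Follow a trusted, highly rated online recipe; 30 minutes.",
    execution="Followed the recipe but deliberately left out the baking powder.",
    outcomes="Took an hour; the scones did not rise; dense and under-cooked.",
)
retro.lessons.append("Allocate more time for a first attempt (root cause: experience).")
retro.lessons.append("Don't modify a known good recipe (root cause: knowledge / discipline).")
print(retro.lessons)
```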

Perhaps a bit overly simplistic, but this is exactly the type of concrete, detailed analysis into which most teams simply never delve. Instead, retrospectives for most agile teams have devolved into a tragic litany of games, complaining sessions, and “I liked this / I didn’t like that” reviews with no real outcomes, takeaways, or practical concepts for how to actually improve anything. Their coaches leave them with simple statements such as “we need to improve.” Great. Thanks.

Taking what we know from Kimberly and Rebecca’s plan to send a weather balloon to the edge of space, let’s do a little analysis of their retrospective. I can tell you already that it is not only solid but will ensure they improve the technical design itself as well as their team’s “meta”: the ways they work, their collaboration, their teamwork, their research, everything that enables them to continually improve and produce powerful results.

  • Bigger balloon – create more lift – ensure faster rate of ascent (Technical / work-related, but important. They have learned through iterating.)
  • Remember to weigh payload with extra – more accurate calculations – correct amount of helium (Technical but also process-related, this draws root causes arising from both knowledge and experience, enabling them to adapt both their work itself and their meta – how they work.)
  • Don’t stop trying – you will never know if you don’t ask. Eg GoPro (Almost purely meta, reflecting a great lesson which builds not only a team mindset but also reflects a core value, perseverance!)
  • Washington Geography – Map research on launch locations taught us a lot of geography (This is both technical and meta, addressing their research data and inputs/outputs but also learning about how to learn and the value of research itself!)
  • Always be optimistic – We thought everything went wrong but every thing went right. Eg. SPOT Trace max altitude mislead [sic] our expectations. Eg. We thought weather cloudy but it was sun after launch. Eg. Weight. Thought payload too heavy for high altitude. (Are you kidding me?! Awesome! Lessons about situational awareness and current operational picture, data inconsistencies, planned versus actual events, planning data and metrics, and the importance of outlook/attitude! #goldmine!)
  • Be willing to reconstruct – If you find out there is a problem, do not be afraid to take it apart and start all over again. (Invaluable lesson – learning to embrace failure when it occurs and recover from it, realizing that the most important thing is not to build the product right, but to build the right product!)
  • Have a redundant system – Worry less. (Needs no explanation.)
  • SPOT Trace technology awesome – Very precise (This is a fantastic example of a positive lesson learned – something that is equally important to acknowledge and capture to ensure it gets carried forward and turned into a standard practice / use.)
  • Live FB updates – add to fun + excitement (Yes yes yes!! To quote an old motto, “If you’re not having fun, you’re not doing it right!” This stuff should be fun!!)
  • Speculation – Don’t guess. Rely on data. (Fantastic emphasis on the importance of data-oriented decisions and reflects another potential team core value!)
  • Project Plan – We were sucessful [sic] because we followed a Project Plan + Project Binder. (The only lesson I disagree with. I would advocate a good 5 Whys session on this one. My suspicion is that the project was successful because they as a team worked both hard and well together [high-performing], had fun, and iterated well [based on the lesson about not being afraid to reconstruct / start over]. I have serious doubts that their mission was a success because they had and followed a project plan. Regardless, this is far too small a point to detract from the overall impressiveness of their work!)

Take a few lessons from two girls who have demonstrated concrete learning in ways most adults fail miserably to even conceptually grasp. If you are on a team struggling to get productive results from your retrospectives, stop accepting less than solid, meaningful analysis coupled with clear, actionable results. The power is in your hands (and head).

If you are one of those agile coaches who thinks retrospectives are just for fun and celebration, who plays games instead of enabling concrete analysis, and who wonders why their teams just cannot seem to make any marked improvements, get some education and coaching yourself and stop being part of the problem!

(Written with the sincerest of thanks to Kimberly and Rebecca Yeung, and the Yeung family for their outstanding work, and to GeekWire for publishing it!)

* Chris Alexander is an agile coach, thinker, ScrumMaster, recovering developer, and co-founder of AGLX Consulting, who spends too little time rock climbing.
