Tag Archives: Software Development

A Shallow Dive Into Chaos: Containing Chaos to Improve Agile Story Pointing

In May 1968 the U.S.S. Scorpion (SSN-589), a Skipjack-class nuclear submarine with 99 crewmembers aboard, mysteriously disappeared en route to Norfolk, VA from its North Atlantic patrol. Several months later, the U.S. Navy found its submarine in pieces on the Atlantic seabed. Although there are multiple theories as to what caused the crippling damage to the submarine, the U.S. Navy calls the loss of the Scorpion and her 99 crew an “unexplained catastrophic” event [1].

The initial search area stretched across 2,500 NM of Atlantic Ocean from the Scorpion’s last known position off the Azores to its homeport in Norfolk, Virginia. Recordings from a vast array of underwater microphones reduced the search area to 300 NM. Although technology played an important role in finding the U.S.S. Scorpion, it was the collective estimate of a group that eventually led to the discovery of the destroyed submarine. The U.S.S. Scorpion was found 400 nautical miles southwest of the Azores at a depth of 9,800 ft., a mere 220 yards from the collective estimate of the group [2].

The group of experts included submarine crew members and specialists, salvage experts, and mathematicians. Instead of having the group of experts consult with one another, Dr. John Craven, Chief Scientist of the U.S. Navy’s Special Projects Office, interviewed each expert separately and put the experts’ answers together. What’s interesting about the collective estimate is that none of the experts’ own estimates coincided with the group’s estimate—in other words, none of the individual experts picked the spot where the U.S.S. Scorpion was found.

A Quick Lesson in Chaos

According to Dave Snowden, Chaos is completely random, but if you can contain it, you get innovation. You do this by separating the participants and preventing any connections within the system. When done properly, you can trust the results. Skunk Works projects and the Wisdom of Crowds approach made popular by James Surowiecki are great examples of how to contain Chaos [3].

Dr. Craven’s approach to finding the U.S.S. Scorpion was a controlled dive into Chaos: by preventing any connections within the group, he protected the collective estimate against misplaced biases. Moreover, by bringing in a diverse group of experts, Dr. Craven ensured different expert perspectives were represented in the collective estimate.

To contain Chaos, three conditions must be satisfied [4]:

1. Group members should have tacit knowledge—they should have some level of expertise

2. Group members must NOT know what the other members answered

3. Group members must NOT have a personal stake

Story Point Estimates: Taking a Shallow Dive into Chaos

Agile software development teams frequently estimate the effort and complexity of user stories found in their product and iteration backlogs. Individual team members “size” a story by assigning it a Fibonacci number based on their own experience and understanding of the story. A point consensus is not the aim but, unfortunately, is frequently coached and practiced.

To reduce cognitive biases, contain Chaos, and accelerate the story pointing process, AGLX trains and coaches clients’ software development teams to ask the product owner questions using various Red Teaming techniques, including Liberating Structures. Once all team members are ready to assign points to the story, they place their selected Fibonacci card or chip face down on the table.

On the “Flip” in “Ready…Flip,” team members turn their cards over and the ScrumMaster rapidly records the individual points. When all points are registered, the ScrumMaster takes the average of the points scored and assigns that number to the story (rounding to the nearest integer, if desired). No need to waste time re-pointing or trying to come to a consensus.

Example. A six-person software development team assigns the following individual points to a story.

[Image: the six Fibonacci cards played by the team]

The average is 6.5 (7 if rounding). In this example, none of the individual estimates match the group’s estimate. And the group’s estimate is not a Fibonacci number.
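For illustration, here is a minimal sketch of the scoring arithmetic in JavaScript. The six card values are assumptions chosen to match the example (including the 3 and the 13 discussed below), since the original card image is not reproduced here.

// Illustrative card values only; they are not from the original image.
var points = [3, 5, 5, 5, 8, 13];

var sum = points.reduce(function (total, p) { return total + p; }, 0);
var average = sum / points.length;

console.log(average);             // 6.5 -- the story's point value
console.log(Math.round(average)); // 7, if rounding to the nearest integer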

In some High-Performing Organizations where psychological safety is well established, some development teams will have the team members who pointed the story with a 3 and a 13 (using the example above) present their reasoning using a complex facilitation technique—time-boxed, of course. The point behind this ritual is not to re-point the story but to have team members listen to the outliers or mavericks in order to identify possible insights. Caution: This is an advanced technique.

Innovative and Resilient Organizations

Containing Chaos requires expert facilitation and will not happen overnight. However, simplifying your story pointing approach by not allowing consensus or team consultation (Condition 2) is a small step toward becoming an innovative and resilient organization—if that is what the organization desires.

Although the loss of the U.S.S. Scorpion and her 99 crew was a tragedy, the story of how the collective estimate of a diverse group of experts located the submarine on the seabed is a great example of the power of cognitive diversity and of containing Chaos.

Brian “Ponch” Rivera is a recovering naval aviator, co-founder of AGLX Consulting, LLC, and co-creator of High-Performance Teaming™ – an evidence-based, human systems solution to rapidly build and develop networks of high-performing teams. Contact Brian at brian@aglx.consulting.

[1] Sontag, Sherry; Drew, Christopher (2000). Blind Man’s Bluff: The Untold Story of American Submarine Espionage. New York:

[2] Surowiecki, James (2005). The Wisdom of Crowds. Anchor Books. p. xv. ISBN 0-385-72170-6.

[3] Snowden, D. KM World 2016 Keynote. http://cognitive-edge.com/resources/slides-and-podcasts/

[4] Ibid.


Agile is Dead! The Rise of High-Performing Teams: 10 Lessons from Fighter Aviation

Software and hardware industry leaders are leveraging the lessons from fighter aviation to help their businesses navigate the speed of change and thrive in today’s complex and hostile environment. The emergence of the Observe-Orient-Decide-Act (OODA) Loop—an empathy-based decision cycle created by John Boyd (fighter pilot)—in today’s business lexicon suggests that executives, academia, and the Agile community recognize that fighter pilots know something about agility.

For example, Eric Ries, author of The Lean Startup and entrepreneur, attributes the idea of the Build-Measure-Learn feedback loop to John Boyd’s OODA Loop [1]. At the core of Steve Blank’s Customer Development model and Pivot, found in his book The Four Steps to the Epiphany, is once again OODA [2]. In his new book, Scrum: The Art of Doing Twice the Work in Half the Time, Dr. Jeff Sutherland, a former fighter pilot and the co-creator of Scrum, connects the origins of Scrum to hardware manufacturing and fighter aviation (John Boyd’s OODA Loop) [3]. Conduct a quick Google book search on “Cyber Security OODA” and you will find over 760 results.

This fighter pilot “mindset” behind today’s agile innovation frameworks and cyber security approaches is being delivered to organizations by coaches and consultants who may have watched Top Gun once or twice but more than likely have never been part of a high-performing team [4].

So What?

According to Laszlo Bock, “Having practitioners teaching is far more effective than listening to academics, professional trainers, or consultants. Academics and professional trainers tend to have theoretical knowledge. They know how things ought to work, but haven’t lived them [5].” Unfortunately, most agile consultants’ toolboxes contain more processes and tools than human interaction know-how. Why? They have not lived what they coach. And this is what is killing Agile.

Teaming Lessons from Fighter Aviation

To survive and thrive in their complex environment, fighter pilots learn to operate as a network of teams using the cognitive and social skills designed by industrial-organizational psychologists—there is real science behind building effective teams. It is the combination of inspect-and-adapt frameworks with human interaction skills developed out of the science of teamwork that ultimately builds a high-performance culture and moves organizational structures from traditional, functional models toward interconnected, flexible teams.

10 Reasons Why Your Next Agile High-Performance Teaming Coach Should Have a Fighter Aviation Background

OODA (Observe-Orient-Decide-Act). According to Jeff Sutherland, “Fighter pilots have John Boyd’s OODA Loop burned into muscle memory. They know what agility really means and can teach it uncompromisingly to others.”

Empathy. A 1 v 1 dogfight is an exercise in empathy, according to Geoff Colvin, the award-winning thinker, author, broadcaster, and speaker on today’s most significant trends in business. In his 2015 book, Humans Are Underrated: What High Achievers Know that Brilliant Machines Never Will, Colvin pens, “Even a fighter jet dogfight, in which neither pilot would ever speak to or even see the other, was above all a human interaction. Few people would call it an exercise in empathy, but that’s what it was—discerning what was in the mind of someone else and responding appropriately. Winning required getting really good at it [6].” Interestingly, empathy is baked into Boyd’s OODA Loop.

Debriefing (Retrospective). The most important ceremony in any continuous improvement process is the retrospective (debrief). Your average fleet fighter pilot has more than 1,000 debriefs under their belt before they leave their first tour at the five-year mark of service. In Agile iteration terms, that is equal to 19 years of experience [7]. Moreover, when compared to other retrospective or debriefing techniques, “Debriefing with fighter pilot techniques offer more ‘bang for the buck’ in terms of learning value [8].” Why is this? There are no games in fighter pilot debriefs, no happy or sad faces to put up on the white board – just real human interactions, face-to-face conversations that focus on what’s right, not who’s right. Fighter pilots learn early that the key to an effective retrospective is establishing a psychologically safe environment.

Psychological Safety. Psychological safety “describes a climate in which people feel free to express relevant thoughts and feelings [9].” Fighter pilots learn to master this leadership skill the day they step into their first debrief, where they observe their flight instructor stand up in front of the team, admit her own shortcomings (display fallibility), ask questions, and use direct language. Interestingly, according to Google’s Project Aristotle, the most important characteristic in building a high-performing team is psychological safety [10]. Great job, Google!

Teaming (Mindset and Practice of Teamwork) [11]. Although not ideal, fighter pilots often find themselves in “pickup games” where they find a wingman of opportunity from another squadron, service, or country—even during combat operations. Knowing how to coordinate and collaborate without the benefit of operating as a stable team is a skill fighter pilots develop by building nontechnical, known stable interfaces. These stable interfaces include a common language; shared mental models of planning, briefing, and debriefing; and alignment to shared and common goals. Yes, you do not need stable teams, and teams do not need to be co-located, if you have known stable interfaces for human interaction.

Empirical Process. The engine of agility is the empirical process, and in tactical aviation we use a simple plan-brief-execute-debrief cycle that, when coupled with proven human interaction skills, builds a resilient and learning culture. The inspect-and-adapt execution rhythm is the same for every mission: whether it was a cross-country flight or a 40-plane strike into enemy territory, we always planned, briefed, executed the mission, and held a debrief. There is no room for skipping steps—no exceptions.

Adaptability/Flexibility. The ability to alter a course of action based on new information, maintain constructive behavior under pressure, and adapt to internal and external environmental changes is what fighter pilots call adaptability or flexibility. Every tactical aviator who strapped on a $50M aircraft knows that flexibility is the key to airpower. Every flight does not go according to plan, and sometimes the enemy gets a vote – disrupting the plan to the point where the mission looks like a pick-up game.

Agility. Agility is adaptability with a timescale.

Practical Servant Leadership Experience. Fighter pilots have practical experience operating in complex environments and are recognized as servant leaders. But don’t take my word for it; watch this video by Simon Sinek to learn more.

Fun. Agility is about having fun. Two of my favorite sayings from my time in the cockpit are “You cannot plan fun” and “If you are not having fun, you are not doing it right.” If your organization is truly Agile, then you should be having fun.

So, who’s coaching your teams?

Brian “Ponch” Rivera is a recovering naval aviator, co-founder of AGLX Consulting, LLC, and co-creator of High-Performance Teaming™, an evidence-based approach to rapidly build and develop high-performing teams.

[1] “The idea of the Build-Measure-Learn feedback loop owes a lot to ideas from maneuver warfare, especially John Boyd’s OODA (Observe-Orient-Decide-Act) Loop.” Ries, E. The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. (Crown Publishing, 2011)

[2] “…Customer Development model with its iterative loops/pivots may sound like a new idea for entrepreneurs, it shares many features with U.S. warfighting strategy known as the “OODA Loop” articulated by John Boyd.” Blank, S. The Four Steps to the Epiphany: Successful Strategies for Products that Win. (2013)

[3] “In the book I talk about the origins of Scrum in the Toyota Production Systems and the OODA loop of combat aviation.” Sutherland, J. Scrum: The Art of Doing Twice the Work in Half the Time. New York: Crown Business (2014).

[4] I do not recommend the movie Top Gun as an Agile Training Resource.

[5] Bock, L. Work Rules! Insights from Inside Google That Will Transform How You Live and Lead. (Hachette Book Group, 2015).

[6] Geoff Colvin. Humans are Underrated: What high achievers know that brilliant machines never will, 96, (Portfolio/Penguin, 2015).

[7] Assuming two teams with a two-week iteration length and 100% retrospective execution: two teams × 26 iterations per year = 52 retrospectives per year, and 1,000 ÷ 52 ≈ 19 years.

[8] McGreevy, J. M., MD, FACS, & Otten, T. D., BS. Briefing and Debriefing in the Operating Room Using Fighter Pilot Crew Resource Management. (July 2007).

[9] Edmondson, A. C. Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy. Wiley (2012).

[10] Duhigg, C. Smarter Faster Better: The Secrets to Being Productive in Life and Business. Random House. (2016).

[11] Edmondson, A. C. Teaming: How Organizations Learn, Innovate, and Compete in the Knowledge Economy. Wiley (2012).


Risk Management and Error Trapping in Software and Hardware Development, Part 3

This is part 3 of a 3-part piece on risk management and error trapping in software and hardware development. The first post is located here (and should be read first to provide context on the content below), and part 2 is located here.

Root Cause Analysis and Process Improvement

Once a bug has been discovered and risk analysis / decision-making has been completed (see below), a retrospective-style analysis of the circumstances surrounding the engineering practices which failed to trap the bug completes the cycle.

The purpose of the retrospective is not to assign blame or find fault, but rather to understand the cause of the failure to trap the bug, inspect the layers of the system, and determine if any additional layers, procedures, or process changes could effectively improve collective engineering surety and help to prevent future bugs emerging from similar causes.

Methodology

  1. Review sequence of events that led to the anomaly / bug.
  2. Determine root cause.
  3. Map the root cause to our defense-in-depth (Swiss cheese) model.
  4. Decide if there are remediation efforts or improvements which would be effective in supporting or restructuring the system to increase its effectiveness at error trapping.
  5. Implement any changes identified, sharing them publicly to ensure everyone understands the changes and the reasoning behind them.
  6. Monitor the changes, adjusting as necessary.

Review sequence of events

With appropriate representatives from engineering teams, certification, hardware, operations, customer success, etc., review the discovery path which led to finding the bug. The point is to understand the processes used, which ones worked, and which let the bug pass through.

Determine root cause and analyze the optimum layers for improvement

What caused the bug? There are many enablers and contributing factors, but typically only one or two root causes. The root cause is one or a possible combination of Organization, Communication, Knowledge, Experience, Discipline, Teamwork, or Leadership.

  • Organization – typically latent, organizational root causes include things like existing processes, tools, practices, habits, customs, etc., which the company or organization as a whole employs in carrying out its work.
  • Communication – a failure to convey necessary, important, or vital information to or among an individual or team who required it for the successful accomplishment of their work.
  • Knowledge – an individual, team, or organization did not possess the knowledge necessary to succeed. This is the root cause for knowledge-based errors.
  • Experience – an individual, team, or organization did not possess the experience necessary to successfully accomplish a task (as opposed to the knowledge about what to do). Experience is often a root cause in skill-based errors of omission.
  • Discipline – an individual, team, or organization did not possess the discipline necessary to apply their knowledge and experience to solving a problem. Discipline is often a root cause in skill-based errors of commission.
  • Teamwork – individuals, possibly at multiple levels, failed to work together as a team, support one another, and check one another against errors. Additional root causes may be knowledge, experience, communication, or discipline.
  • Leadership – less often seen at smaller organizations, a Leadership failure is typically a root cause when a leader and/or manager has not effectively communicated expectations or empowered execution regarding those expectations.

Map the root cause to the layer(s) which should have trapped the error

Given the root cause analysis, determine where in the system (which layer or layers) the bug should have been trapped. Often there will be multiple locations at which the bug should or could have been trapped; however, the best location to identify is the one which most closely corresponds to the root cause of the bug. Consideration should also be given to timeliness. The earlier an error can be caught or prevented (trapped), the less costly it is in terms of both time (to find, fix, and eliminate the bug) and effort (a bug in production requires more effort from more people than a developer discovering a bug while checking their own unit test).

While we should seek to apply fixes at the locations best suited for them, the earliest point at which a bug could have been caught and prevented will often be the optimum place to improve the system.

For example, if a bug was traced back to a team’s discipline in writing and using tests (root cause: discipline and experience), then it would map to layers dealing with testing practices (TDD/ATDD), pair programming, acceptance criteria, definition of “Done,” etc. Those layers to which the team can most readily apply improvements and which will trap the error sooner rather than later should be the focus for improvement efforts.

Decide on improvements to increase system effectiveness

Based on the knowledge gained through analyzing and mapping the root cause, decisions are made on how to improve the effectiveness of the system at the layers identified. Using the testing example above, a team could decide that they need to adjust their definition of Done to include listing which tests a story has been tested against and their pass/fail conditions.
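As a hypothetical illustration of that adjustment, a team might capture the new definition of Done as simple structured data attached to each story. Every field name below is an assumption for the sketch, not a prescribed format.

// Illustrative "Done" record for a story; all field names are assumed.
var doneRecord = {
  story: "STORY-1234",
  tests: [
    { name: "unit: order total calculation", result: "pass" },
    { name: "integration: payment API round-trip", result: "pass" },
    { name: "system: end-to-end checkout", result: "fail" } // blocks "Done"
  ]
};

// The story is "Done" only when every listed test passes.
var done = doneRecord.tests.every(function (t) { return t.result === "pass"; });
console.log(done); // false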

Implement the changes identified, and monitor them for effectiveness.

Risk Analysis

Should our preventative measures fail to stop a bug from escaping into a production environment, an analysis of the level of risk needs to be explicitly completed. (This is often done, but in an implicit way.) The analysis of the level of risk derives from two areas.

Risk Severity – the degree of impact the bug can be expected to have on the data, operations, or functionality of affected parties (the company, vendors, customers, etc.).

  • Blocker – A bug so bad, or a feature so important, that we would not ship the next release until it is fixed/completed. This can also signify a bug that is currently impacting a customer’s operations, or one that is blocking development.
  • Critical – A bug that needs to be resolved ASAP, but for which we wouldn’t stop everything. Bugs in this category are not impacting operations (a customer’s, or ours), but are challenging enough to warrant attention.
  • Major – Best judgment should be used to determine how this stacks against other work. The bug is serious enough that it needs to be resolved, but the value of other work and timing should be considered. If a bug sits in Major for too long, its categorization should be reviewed and either upgraded or downgraded.
  • Minor – A bug that is known, but which we have explicitly de-prioritized. Such a bug will be fixed as time allows.
  • Trivial – Strongly consider closing bugs at this level. At best, they should be put into the “Long Tail” for tracking.

Risk Probability – the likelihood, expressed as a percentage, that those potentially affected by the bug will actually experience it (e.g., always, only if they have a power outage, or only if the sun aligns with Jupiter during the slackwater phase of a diurnal tide in the northeastern hemisphere between 44 and 45 degrees latitude). A code sketch of these buckets follows the list.

Definite 100% – issue will occur in every case
Probable 60-99% – issue will occur in most cases
Possible 30-60% – coin-flip; issue may or may not occur
Unlikely 2-30% – issue will occur in less than 30% of cases
Won’t 1% – occurrence of the issue will be exceptionally rare
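Here is that sketch: a minimal, illustrative bucketing function, assuming a tool reports likelihood as a number between 0 and 1. The thresholds mirror the list above; the function name is an assumption.

// Illustrative bucketing of a numeric likelihood into the categories above.
function probabilityCategory(p) {
  if (p >= 1.0)  return "Definite"; // 100%
  if (p >= 0.6)  return "Probable"; // 60-99%
  if (p >= 0.3)  return "Possible"; // 30-60%
  if (p >= 0.02) return "Unlikely"; // 2-30%
  return "Won't";                   // ~1% or less
}

console.log(probabilityCategory(0.45)); // "Possible"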

Given Risk Severity and Probability, the risk can be assessed according to the following matrix and assigned a Risk Assessment Code (RAC).

Risk Assessment Matrix

Severity \ Probability   Definite   Probable   Possible   Unlikely   Won't
Blocker                     1          1          1          2         3
Critical                    1          1          2          2         3
Major                       2          2          2          3         4
Minor                       3          3          3          4         5
Trivial                     3          4          4          5         5

Risk Assessment Codes
1 – Strategic     2 – Significant     3 – Moderate     4 – Low     5 – Negligible

The Risk Assessment Codes are a significant factor in risk decision-making; a code sketch of the matrix lookup follows the list below.

  1. Strategic – the risk to the business or customers is significant enough that its realization could threaten operations, basic functioning, and/or professional reputation to the point that the basic survival of the business could be in jeopardy. As Arnold said in Predator: “We make a stand now, or there will be nobody left to go to the chopper!”
  2. Significant – the risk poses considerable, but not life-threatening, challenges for the business or its customers. If left unchecked, these risks may elevate to strategic levels.
  3. Moderate – the risk to business operations, continuity, and/or reputation is significant enough to warrant consideration against other business priorities and issues, but not significant enough to trigger higher responses.
  4. Low – the risk to the business is not significant enough to warrant special consideration of the risk against other priorities. Issues should be dealt with in routine, predictable, and business-as-usual ways.
  5. Negligible – the risk to the business is not significant enough to warrant further consideration except in exceptional circumstances (i.e., we literally have nothing better to do).
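Here is that sketch: a minimal, illustrative JavaScript encoding of the matrix above. The names and structure are assumptions for tooling purposes, not a prescribed implementation.

// Illustrative encoding of the Risk Assessment Matrix above.
// Rows are severity; columns are probability; cell values are RACs (1-5).
var PROBABILITIES = ["Definite", "Probable", "Possible", "Unlikely", "Won't"];

var RAC_MATRIX = {
  Blocker:  [1, 1, 1, 2, 3],
  Critical: [1, 1, 2, 2, 3],
  Major:    [2, 2, 2, 3, 4],
  Minor:    [3, 3, 3, 4, 5],
  Trivial:  [3, 4, 4, 5, 5]
};

function riskAssessmentCode(severity, probability) {
  var row = RAC_MATRIX[severity];
  var col = PROBABILITIES.indexOf(probability);
  if (!row || col === -1) {
    throw new Error("Unknown severity or probability");
  }
  return row[col];
}

console.log(riskAssessmentCode("Critical", "Possible")); // 2 -- Significant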

Risk Decision

The risk decision is the point at which a decision is made about the risk. Typically, risk decisions take the form of:

  • Accept – accept the risk as it is and do not mitigate or take additional steps.
  • Delay – for less critical issues or dependencies, a decision about whether to accept or mitigate a risk may be delayed until additional information, research, or steps are completed.
  • Mitigate – establish a mitigation strategy and deal with the risk.

For risk mitigation, feasible Courses of Action (CoAs) should be developed to assist in making the mitigation plan. These potential actions comprise the mitigation and/or reaction plan. Specifically, given a specific bug’s risk severity, probability, and resulting RAC, the courses of action are the possible mitigation solutions for the risk. Examples include:

— Pre-release —

  • Apply software fix / patch
  • Code refactor
  • Code rewrite
  • Release without the code integrated (re-build)
  • Hold the release and await code fix
  • Cancel the release

— In production —

  • Add to normal backlog and prioritize with normal workflow
  • Pull / create a team to triage and fix
  • Swarm / mob multiple teams on fix
  • Pull back / recall release
  • Release an additional fix as a micro-upgrade

All risk decisions should be recorded, and those which remain active need to be tracked. There are many methods available for logging and tracking risk decisions, from spreadsheets to documentation to support tickets. There are entire software platforms expressly designed to track and monitor risk status and record decisions taken (or not) about risks.
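For teams starting with something lighter than a dedicated platform, here is a hypothetical sketch of what a single risk log entry might capture. Every field name and value is an assumption for illustration.

// Illustrative risk log entry; all field names and values are assumed.
var riskLogEntry = {
  id: "RISK-042",
  description: "Intermittent data loss on power failure during write",
  severity: "Critical",
  probability: "Unlikely",
  rac: 2,                     // from the Risk Assessment Matrix above
  decision: "Delay",          // Accept | Delay | Mitigate
  decisionDate: "2017-03-01",
  reviewBy: "2017-03-15",     // delayed decisions especially need a review date
  owner: "jane.doe"
};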

Decisions to delay risk mitigations are the most important to track because they require future action, and at the speed most businesses move today, a real risk exists of losing track of risk delay decisions. Therefore, a Risk Log or Risk Review should be used to routinely review the status of pending risk decisions and reevaluate them. Risk changes constantly, and risks may significantly change in severity and probability overnight. By reviewing risk decisions regularly, leadership is able to simultaneously ensure both that emerging risks are mitigated and that effort is not wasted unnecessarily (as when effort is put against a risk which has significantly declined in impact due to changes external to the business).

Conclusion

I hope you’ve enjoyed this 3-part series. Risk management and error trapping is a complicated and – at times – complex topic. There are many ways to approach these types of systems and many variations on the defense-in-depth model.

The specific implementation your business or organization chooses to adopt should reflect the reality and environment in which you operate, but the basic framework has proven useful across many domains and industries, and is directly adapted from Operational Risk Management as I used to practice and teach it in the military.

Understanding the root cause of your errors, where they slipped through your system, and how to improve your system’s resiliency and robustness are critical skills which you need to develop if they are not already functional. A mindful, purposeful approach to risk decision-making throughout your organization is also critical to your business operations.

Good luck!

 

Chris Alexander is a former U.S. Naval Officer who was an F-14 Tomcat flight officer and instructor. He is Co-Founder and Executive Team Member of AGLX Consulting, creators of the High-Performance Teaming™ model, a Scrum Trainer, Scrum Master, and Agile Coach.


Risk Management and Error Trapping in Software and Hardware Development, Part 2

This is part 2 of a 3-part piece on risk management and error trapping in software and hardware development. The first post is located here (and should be read first to provide context on the content below).

Error Causality, Detection & Prevention

Errors occurring during software and hardware development (resulting in bugs) can be classified into two broad categories: (1) skill-based errors, and (2) knowledge-based errors.

Skill-based errors

Skill-based errors are those errors which emerge through the application of knowledge and experience. They are differentiated from knowledge-based errors in that they arise not from a lack of knowing what to do, but from either misapplication or failure to apply what is known. The two types of skill-based errors are errors of commission and errors of omission.

Errors of commission are the misapplication of previously learned behavior or knowledge. To use a rock-climbing metaphor, if I tied my climbing rope to my harness with the wrong type of knot, I would be committing an error of commission. I know I need a knot, I know which knot to use, and I know how to tie the correct knot – I simply did not do it correctly. In software development, one example of an error of commission might be an engineer providing the wrong variable to a function call, as in:

var x = 1;        // variable to call
var y = false;    // variable not to call

function callVariable(x) {
  return x;
}

callVariable(y); // should have provided "x" but gave "y" instead

Errors of omission, by contrast, are the failure to apply knowledge or experience (previously learned behaviors) to the given problem. In my climbing example, not tying the climbing rope to my harness (at all) before beginning to climb is an error of omission. (Don’t laugh – this actually happens.) In software development, an example of an error of omission would be an engineer forgetting to provide a variable to a function call (or forgetting to add the function call at all), as in:

var x = 1;        // variable to call
var y = false;    // variable not to call

function callVariable(x) {
  return x;
}

callVariable();   // should have provided "x" but left empty

Knowledge-based errors

Knowledge-based errors, in contrast to skill-based errors, arise from the failure to know the correct behavior to apply (if any). An example of a knowledge-based error would be a developer checking in code without any unit, integration, or system tests. If the developer is new and has never been taught that the requirements for code check-in include having written and run a suite of automated unit, integration, and system tests, this is an error caused by a lack of knowledge (as opposed to an error of omission, where the developer had been informed of the need to write and run the tests but failed to do so).

Defense-in-depth, the Swiss cheese model, bug prevention and detection

Prevention comprises the systems and processes employed to trap bugs and stop them from getting through development environments and into certification and/or production environments (depending on your software / hardware release process). In envisioning our Swiss cheese model, we need to understand that the layers include both latent and active types of error traps, and are designed to mitigate against certain types of errors.

The following are intended to aid in preventing bugs.

Tools & methods to mitigate against Skill-based errors in bug prevention (an illustrative test sketch follows the list):

  • Code base and architecture [latent]
  • Automated test coverage [active]
  • Manual test coverage [active]
  • Unit, feature, integration, system, and story tests [active]
  • TDD / ATDD / BDD / FDD practices [active]
  • Code reviews [active]
  • Pair Programming [active]
  • Performance testing [active]
  • Software development framework / methodology (e.g., Scrum, Kanban, DevOps) [latent]
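As an illustrative sketch of one active layer, the automated test below would trap the error of commission from the earlier example before it left a development environment. Node's built-in assert module stands in for a full test framework here; the setup is an assumption for the sketch.

// A unit test as an active error trap, using Node's built-in assert module.
var assert = require("assert");

function callVariable(x) {
  return x;
}

var x = 1;        // variable to call
var y = false;    // variable not to call

assert.strictEqual(callVariable(x), 1); // passes
assert.strictEqual(callVariable(y), 1); // throws -- the bug is trapped here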

Tools & methods to mitigate against Knowledge-based errors in bug prevention:

  • Education & background [latent]
  • Recruiting and hiring practices [active]
  • New-hire Onboarding [active]
  • Performance feedback & professional development [active]
  • Design documents [active]
  • Definition of Done [active]
  • User Story Acceptance Criteria [active]
  • Code reviews [active]
  • Pair Programming [active]
  • Information Radiators [latent]

Detection is the term for the ways in which we find bugs, hopefully in the development environment, though this phase also includes certification if your organization has a certification / QA phase. The primary focus of detection methods is to ensure no bugs escape into production. As such, the entire software certification system itself may be considered one large, active layer of error trapping. In fact, in many enterprise companies, the certification or QA team (if you have one) is actually the last line of defense.

The following are intended to aid in detecting bugs:

Tools & methods to mitigate against Skill-based errors in detecting bugs:

  • Automated test coverage [active]
  • Manual test coverage [active]
  • Unit, feature, integration, system, and story tests [active]
  • TDD / ATDD / BDD / FDD practices [active]
  • Release certification testing [active]
  • Performance testing [active]
  • User Story Acceptance Criteria [active]
  • User Story “Done” Criteria [active]
  • Bug tracking software [active]
  • Triage reports [active]

Tools & methods to mitigate against Knowledge-based errors in detecting bugs:

  • Education & background [latent]
  • Professional development (individual / organizational) [latent / active]
  • Code reviews [active]
  • Automated & manual test coverage [active]
  • Unit, feature, integration, system, story tests [active]

When bugs “escape” the preventative measures of your Defense-in-depth system and are discovered in either the development or production environment, a root cause analysis should be conducted on your system based on the nature of the bug and how it could have been prevented and/or detected earlier. Based upon the findings of your root cause analysis, your system can be improved in specific, meaningful ways to increase both its robustness and resilience.

How an organization should, specifically, conduct root cause analysis, analyze risk and make purposeful decisions about risk, and how they should improve their system is the subject of part 3 in this series, available here.

 

Chris Alexander is a former U.S. Naval Officer who was an F-14 Tomcat flight officer and instructor. He is Co-Founder and Executive Team Member of AGLX Consulting, creators of the High-Performance Teaming™ model, a Scrum Trainer, Scrum Master, and Agile Coach.


High-Performing Teams: Writing Code is Not Your Problem

Regardless of the software or hardware development processes used in your business domain, chances are if you are worried about your teams’ performance levels, their ability to write code or build hardware solutions is not your concern.

How do you build teams which are truly high-performing?

Teams which are able to work together at levels of truly high performance remain relatively elusive and rare in most industries. Regardless of which frameworks, methodologies, and tools teams adopt and adapt, their productivity remains relatively average. This hurts the bottom line of the business, which has often agreed to accept certain restrictions on current productivity on the promise of significantly increased productivity once the new methodology or framework is in place and humming.

Sound familiar? This is a situation in which the application of multiple solutions entirely fails to address the actual problem.

Teams do not form around processes, methodologies, and frameworks; they form around the members of the team. Or, more specifically, they form around the social, non-technical interactions of the individuals within the team. When a team fails to effectively bond together, several problems are typically the root:

  1. The level of empathy at the team level is relatively low
  2. The number, type, and quality of social interactions is low
  3. There is low to no feedback within the group

Despite what you may believe, social skills are highly trainable. Teams can build their social, non-technical skills in order to team together more effectively and achieve those levels of high performance.

Moreover, leadership can directly enable these teaming activities by learning how high-performing teams function and what leaders can do to enable those teams to coalesce and perform. The secret to leading high-performing teams is that it actually isn’t that hard – but it does take a level of discipline and rigor which many leaders find exceptionally challenging.

If you want to learn about High-Performance Teaming™ and what you or your organization can do to get to those levels of high-performance, reach out to us at AGLX Consulting today.

Chris Alexander is a former Navy Lieutenant Commander, F-14 Tomcat RIO, software developer, Agile Coach, and Executive Team Member at AGLX Consulting, LLC.


500% Productivity Increase in One Day: Lessons from a Stand-down

Last month, seven software development teams (35+ members) stepped away from their sprint for one day and participated in a Sprint Stand-down. The problems the teams were trying to solve during the Stand-down were technical—the teams recognized they had a collective knowledge gap and needed to slow down to speed up.

During the Stand-down retrospective, we discovered the teams increased productivity by over 500% in one day—an unexpected and welcome outcome. The retrospective provided us an opportunity to examine the how and why behind the hyper-productivity realized in this unfamiliar, one-day training event.

The lessons we learned were not revolutionary; instead, they reinforced the values and practices found in the Agile Manifesto, Scrum, Extreme Programming, CrossLead, Flawless Execution, Crew Resource Management, and Threat and Error Management. In one day, a Sprint Stand-down provided undeniable evidence to developers, product owners, managers, directors, VPs, and the CIO that empowered execution trumps the traditional command-and-control approach to product delivery.

The transferable lessons learned from the Stand-down fall into familiar categories:

  • Shared Purpose/Objective
  • Workload Management/Limit Work in Progress (WIP)
  • Leadership/Teamwork
  • Execution Rhythm or Cadence
  • Communication

Before going deeper into the lessons learned, I want to share a little bit about the origins and concept of a Sprint Stand-down and our approach to it.


Sprint Stand-down

You will not find a Sprint Stand-down in the Scrum Guide. A Stand-down is not found in the Project Management Institute’s (PMI) vernacular, nor is it part of any Agile or currently trending management methodology. A Stand-down is a training evolution commonly used by elite military units, commercial aviation, and other high-reliability organizations (HROs) to accelerate team performance.

The purpose of any Stand-down is to promote knowledge-based training along with personal discipline and responsibility as essential elements of professionalism. It is designed to empower and inspire a community of professionals to continuously seek knowledge, integrate new information in everyday practice, and share new findings with others within the company and industry.

Stand-down Planning

The event was a self-organized undertaking in which a small team of eight people was accountable for event execution. Planning for the event followed a rapid planning process inspired by Crew Resource Management (CRM) and Threat and Error Management (TEM). The objectives of this Sprint Stand-down were to inform, inspire, educate, and motivate the teams—admittedly weak objectives, as they lacked clarity and measurability.

With a shared understanding of the Stand-down objectives, the planning team used a liberating structure to capture anticipated threats and the resources needed to overcome them, and reviewed lessons learned from previous, similar events. A Stand-down plan was formed in less than 35 minutes, with each planning team member knowing who would have to do what by when to ensure flawless Stand-down execution.

Stand-down Execution

The Stand-down included in-house subject matter experts and one external trainer with 35+ team members in one room for 6.5 hours. Team members treated the Stand-down as an offsite, declining all meetings and turning on their Outlook out-of-office replies. Team members were randomly assigned to one of two Stand-down teams as determined by the type of gift card they received when they entered the Stand-down room. Two additional gift cards were given to all participants for the purpose of regifting—team members were encouraged to give away their gift cards to other team members for any reason. Team members were warned that over lunch (provided by the company) they might be called upon to share with everyone to whom they gave a gift card and why. The CIO provided an impromptu leadership moment which included the distribution of additional gift cards to team members who were nominated by their peers.

500%

An outcome of the event was an increase in productivity of 200% to 700%, depending on the metric used (e.g., story points, stories done, stories in progress and stories done, etc.). However, based on stories “done” during the Stand-down (26) versus the average number of stories completed during a normal sprint day (5), the increase in productivity was likely 500%. In one day.
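For transparency, here is the arithmetic behind that figure, using the values from the retrospective above.

// Values reported in the Stand-down retrospective.
var standDownStoriesDone = 26; // stories "done" during the one-day Stand-down
var normalDayStoriesDone = 5;  // average stories completed in a normal sprint day

var multiple = standDownStoriesDone / normalDayStoriesDone;
console.log(multiple); // 5.2 -- over five times the normal daily rate, i.e., roughly 500%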

For argument’s sake, let’s just say the productivity outcome for this one-day event was 20%, a palatable number for those who have not embraced the power of Scrum or empowered execution. What if we could take the lessons learned from this event and apply them to how we work during our normal workdays to get a productivity increase of 5% in the next two weeks?


Sprint Stand-down Lessons Learned

Shared Purpose/Objective

  • A team needs a shared purpose or common objective. Objectives should be clear, measurable, achievable, and aligned to a focus area, strategic line of effort, company vision, etc.
  • A shared purpose builds unity of effort. Teams were observed self-organizing throughout the day and reported a reduction in duplication of work and an increase in cross-team knowledge-sharing.

Workload Management/Limit Work in Progress (WIP)

  • Limit WIP. Individuals reported being happier as they felt part of a team of teams working toward one goal.
  • Context Switching is bad. Most team members reported that they did not check their email during the six hours. Team members reported that the internal Stand-down disruptions (we played music during frequent shout-outs) slowed them down and were absolutely disruptive.
  • Protect the teams from out-of-band work. Team members reported that they had no out-of band work during the day.
  • Empower team members to push back on work that is not aligned to the objective.
  • Pairing works. Teams paired all day. Some mobbed.

Leadership

  • Say “Thank You.” Team members should recognize and acknowledge the importance of others in task performance.
  • Leaders need to be visible but not intrusive. Checking-in to say “thank you” to individuals carries more weight than email.
  • An invisible leader is a visible problem. Team members noticed those leaders who failed to stop by to see how the day was progressing.
  • Unscripted leadership is the best kind. The CIO’s visit was received as genuine.
  • Recognition from leaders is great, but peer recognition of important contributions is even better.

Execution Rhythm or Cadence

  • Stand-down tempo is not sustainable but the practice is sound when a knowledge gap exists.
  • Stand-downs should not exceed six hours.
  • Schedule Stand-downs as required. No more than once a month.

Communication

  • Face-to-face communication remains the gold standard.
  • Keep work visible. The teams shared one electronic backlog.
  • Co-locate teams to maximize the value of osmotic communication.
  • Cross-team pollination builds trust.

Brian “Ponch” Rivera is a recovering Naval Aviator and Commander in the U.S. Navy Reserve. He is the co-founder of AGLX, LLC, a Seattle-based Agile Leadership Consulting Team, and a ScrumTotal Advisory Board Member.
