I guess I picked a bad day to go to Target. I kind of knew it was a bad day to go to Target because it was a Saturday and the day before Father’s Day. I didn’t know it would be a really, really bad day to go to Target - an all-systems-are-down sort of day. But it was.
Target’s inability (or near inability by some accounts) to process customer transactions for about two hours last Saturday represented a mild-to-moderate inconvenience for me, requiring two trips that took way longer than they should have to purchase my husband’s Father’s Day present. (It’s a bike. He loves it.) Of course, for the employees of Target, the inconvenience was way more than moderate.
One can only speculate about how much revenue, not to mention goodwill, was lost during the outage. As it does, the Twitterverse had a field day making fun of the situation. Rivals Walmart and Amazon no doubt got a boost, and I suspect there was a spike in shrinkage. (Retail-speak for theft.) And although the company was quick to state that the problem was not a hack or data breach, Target’s already marred reputation for data integrity certainly was not helped.
The company tweeted:
“June 15, 2019. Target’s registers are fully back online, and guests are able to purchase their merchandise again in all stores. The temporary outage earlier today was the result of an internal technology issue that lasted for approximately two hours. Our technology team worked quickly to identify and fix the issue, and we apologize for the inconvenience and frustration this caused for our guests. After an initial but thorough review, we can confirm that this was not a data breach or security-related issue, and no guest information was compromised at any time. We appreciate all of our store team members who worked quickly to assist guests and thank everyone involved for their patience.”
So, what now?
What does an organization do when something so damaging – and so public – happens? It can simply fix the technical glitch and treat it as a one-off problem, or it can learn something from high-reliability organizations like air traffic control and nuclear power plants, which lose more than revenue and PR points when something goes wrong. If Target chooses that path, it can learn a lot from this situation.
Technology – What Set of Conditions Allowed This to Happen?
At this point, I have no idea what went wrong, but as someone who’s spent a long time in the software industry, I’ve seen what can happen in IT teams when something like this goes down. The first question folks are tempted to ask is, “Who screwed this up?” The simplest thing to do and the quickest way to achieve a false sense of security is to blame someone for the problem. Maybe you fire them. Problem solved … Not.
High-reliability organizations practice systems thinking and would see an incident like this as something that needs both corrective and preventative action. No one person, whether they be a bad programmer or a bad actor, caused this issue. There existed a set of conditions that made it possible. Now that the immediate cause of the glitch has been identified and fixed, the next focus should be on determining the root cause and the conditions that failed to prevent the problem.
Operations – Was there a plan?
If the company looks at this situation as only a technical problem, it is missing many more opportunities for improvement. As a customer in the store when it happened, I can offer a few suggestions. I’d bet the farm that employees can as well. I hope Target asks and listens. One hopes that something like this doesn’t happen again, but a high-reliability organization (HRO) would never make that assumption. HROs always plan for resiliency. At the store level, Target could have had a better plan. I hope they look for opportunities to improve:
Consistency – I have no direct knowledge of how or whether Target store managers are trained to handle this sort of thing, but from my experience and many online accounts, the responses were not consistent. Some stores handed out coupons and Starbucks samples. Some found workarounds like using the phone app to ring up purchases. Some shut their doors while others remained open. It’s hard to know what to do when technology goes down because it may come back up at any minute, but developing a consistent and logical response to such an event seems like a good exercise for such a large retailer.
Communication – While I was in the store, there was no announcement about the problem. There was no sign. After walking in, my daughter and I simply noticed very crowded checkout lines, which was odd considering that I got a remarkable parking space. After just a minute of observation, she said, “Something is wrong.” Once we determined that the registers were down, we figured we’d just go to the other Target in town. I wonder how many people actually did that. Luckily my husband saw the global issue on Twitter and sent us a text. Why did I have to hear about the problem from my husband when I was in the store? A customer communications review would make a great Kaizen event.
Containment – I mentioned that some stores closed their doors. Mine didn’t. I really wish they had. Again, it’s tricky because you don’t know when the system will come back online, but if you have x number of people in the store who want to buy something from you, but can’t, why would you let it grow to x+y? Allowing more people in just increased the net amount of frustration and chaos. I don’t know the right answer, but there is a tipping point. Do you close after 20 minutes? An hour? This should definitely be part of the postmortem.
No organization is immune to problems like this one; it just gets a lot more attention when it happens to one of America’s favorite brick-and-mortar retailers. As a software professional, I feel for the Target technology team. It was a bad day. But something good can come out of it if the leaders at Target do what high-reliability organizations do and place expertise above authority.
The people who were on the ground in the stores weren’t closest to the technology problem, but they were closest to the impact, and they know what would have helped them provide the best possible experience for guests given the situation. I hope Target keeps that in mind.