Incident Response: Who does what?
Who does what when something goes wrong?
Who owns the actual service and operation of a product: Product, Engineering, or someone else? Who is supposed to keep customers informed when there’s a service disruption? Who is supposed to, you know, fix it?
Generally, I’m a fan of Engineering owning the technical side of the service, with Product or Customer Success (CS) owning the customer relationship. That’s not because Engineering can’t communicate, but rather because they have other things to do.
In practice, that looks like:
Engineering owns the service, and keeps Product and CS in the loop on updates, status, risks, and of course actual incidents.
Have an incident response process (look to ITIL for a model). While Product can own incident response, and in fact I’ve run these in the past, it’s really best if Engineering takes ownership since they’re closer to the issue. Product or CS owns communicating with customers, and that communication will look different depending on the issue, service plan, and other factors.
Always do a root cause analysis (RCA), which follows the same roles: Engineering runs the RCA and keeps Product/CS in the loop, while Product/CS runs communications. And of course, if the RCA uncovers a deeper issue, or the incident was resolved with a quick fix, Product may work a more permanent fix into the backlog.
Finally, all of this may change from company to company. Some orgs will lean more heavily on Product for customer communications, others on Customer Success. Some will have Product more heavily involved in the technical investigation and fix, others will want Product completely out of that work. It all comes down to the organization, the product, and the people. But this is a good place to start if you’re looking to tighten up incident response.