Five Essential Non-technical Skills for SRE Success
8 June 2022
One of the first questions I often hear asked in SRE and/or DevOps communities is “What technologies should I be learning?” in the context of career fulfillment or trying to land a new position. Keeping up with new tools and technologies like Kubernetes, Terraform, and AWS can help with those questions; however, I find that non-technical skills (or “soft skills”) are sometimes forgotten when these questions are asked. These skills are equally important, and in this post I will share five essential non-technical skills (in no particular order) that I believe are necessary for SRE success. The skill of “naming things appropriately” did not make the list because, let’s face it, that skill is quite impossible to master.
In SRE, good communication is very important for providing clarity and accurate context no matter the situation. To practice good communication, you need to mean what you say and say what you mean, and overcommunication is always better than no communication. Being open about exactly what is happening during incidents and continually asking questions is far better than making assumptions and shutting yourself off from dialogue with your teammates and/or clients. There are many ways to facilitate good communication in the era of remote working:
- Slack, Discord, or Microsoft Teams channels for internal and cross-team collaboration
- Ticketing systems such as JIRA to share requirement specifications
- Zoom, Google Meet, or Slack Huddles for audio/video calls
- PagerDuty and Amazon SNS for incident and alerting notifications
- StatusPage for consistent team messaging around incident response
Communication via text-based mediums has become even more important in the last few years. It can be difficult to convey certain emotions via text; be aware that the tone of your messages can be taken the wrong way, so always try to provide clarity via an audio or video call where necessary.
All teams and individuals would benefit from having more empathy with each other, and SRE is no exception. Being able to “put yourself in the requester’s shoes” is a fantastic skill, as you can set aside any bias or ego and help get to the root cause of issues more efficiently. Empathy should be thought of as a two-way street in SRE though, and I wouldn’t expect anyone to be completely empathetic or compassionate with someone who is being purposefully frustrating or unwilling to help or learn. However, setting the tone and being the role model who demonstrates “good support/troubleshooting” can help both the wider SRE team viewing those support interactions and the requester when interacting with the SRE team in future situations.
The adage “if you want to go fast, go alone. If you want to go far, go together” rings true for SRE. Nobody wants to be the one and only person who can resolve all issues for a service, and if you think “yes, I do”, trust me, you do not. Spreading the workload provides significant improvements in SRE team efficiency, helps reduce knowledge hoarding, and levels up all team members’ skills as well. Further, being a good team member and helping complete a colleague’s task or working together to figure out a problem’s solution helps build trust between members of the SRE team, creating great team dynamics. Being a “team player” goes beyond simply doing your assigned tasks for yourself; it’s truly about “being there” for each other. Examples of being a good team player include:
- Pairing and screen-sharing to debug or troubleshoot issues
- Being present and engaged when fellow team members are presenting during meetings
- Presenting and demoing solutions/documentation to your team to build understanding
- Being open to covering or swapping pager/on-call shifts for those in need
- Offering to assist others with troubleshooting
For me, motivation can come in two forms: motivation to learn anything and motivation to fix anything. Sure, there are times when an SRE team is given what can only be described as a “garbage” service (or a “dumpster-fire”, as previous SRE teams I have been on have called it) to support, but having a keen sense of adventure will certainly push both the team and you individually towards growth. Members of SRE teams typically need to have a breadth of knowledge, and having the motivation to learn will help enhance the team’s reputation for skill and will help with your own future employment opportunities as well. Having the motivation to want to fix services is also very important. Fixes can take the form of automation, feature development (including establishing roadmaps and demoing these changes), patching services, and constantly trying to improve your services and processes. Wanting to make a service better will take you and the SRE team a long way.
Good documentation does not mean showing a fellow team member a process once and thinking “mission accomplished.” Good documentation means providing enough context for a teammate to be able to understand the problem space and product/service to support. This can come in the form of flow charts and diagrams, video demos and recordings, and documentation artifacts in a README or Confluence-like tool.
💡 An incredibly important part of documentation is writing good-quality tickets. If you don’t have time to capture the entire context during ticket creation because of, say, an incident or some other scenario, that’s perfectly fine. However, do your best to get **back** to the ticket as soon as you can. There’s nothing worse for a fellow SRE teammate than reading an empty ticket with a subject of “fix service A”. Further, it is also frustrating for the ticket author to wonder “what exactly did I want to be fixed?” when reading back the same ticket, so always opt for more information over no information. Your SRE team could specify templates for tickets that include:
- Issue description/details
- How to reproduce (when dealing with bugs or misconfigurations)
- Acceptance criteria
These can take your tickets from average to great and certainly help provide a shared context and working agreement for the entire SRE team!
There will always be a plethora of tools, languages, and technologies required to be successful in an SRE team or SRE role. These are forever changing, as evidenced by what was popular 5, 10, 15, and 20 years ago. However, it is truly in your best interest to place importance on sharpening these timeless non-technical skills as well. Often, these can be the differentiator between good and great SRE teams and SRE team members.