Final Assignment – Nikhil

What is SRE? – 3-4 Lines

  • Site Reliability Engineering (SRE) keeps the platform or site reliable and stable.
  • Bridges the gap between development and operations.
  • Manages risk.
  • Leverages automation to reduce toil.
  • Applies an engineering approach to operations.

5 Action items to perform to transform from Ops to SRE.

  • Work closely with product/development team.
  • Break down the silos between product and ops.
  • Automate as much as possible to reduce toil.
  • Apply an engineering approach to operations.
  • Be proactive rather than reactive.
  • Set SLIs, SLOs, and SLAs.
  • Set error budgets.
  • Establish an incident management process.
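
To make the SLI, SLO, and error budget items more concrete, here is a minimal Python sketch of how they relate arithmetically. It assumes a request-based availability SLI (good requests divided by total requests). The function names, the 99.9% target, and the request counts are hypothetical values chosen for illustration, not figures from this assignment.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing has failed
    return good_requests / total_requests


def error_budget_remaining(slo_target: float, good_requests: int,
                           total_requests: int) -> float:
    """Fraction of the error budget still unspent.

    The error budget is the share of requests the SLO allows to fail,
    i.e. (1 - slo_target). A negative result means the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic, so no budget has been consumed
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 of them failed, SLO of 99.9%.
    sli = availability_sli(999_750, 1_000_000)
    remaining = error_budget_remaining(0.999, 999_750, 1_000_000)
    print(f"SLI: {sli:.5f}, error budget remaining: {remaining:.1%}")
```

One common way to use a number like this: while budget remains, the team can keep shipping changes; once it drops below zero, effort shifts toward reliability work until the SLI recovers.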

Top 10 SRE tools.

  • Datadog
  • Prometheus
  • Grafana
  • Kubernetes
  • Docker Swarm
  • GitLab CI/CD
  • Docker
  • Jenkins
  • Chef
  • Confluence
  • Jira
  • Ansible
  • Terraform
  • Rundeck
  • ELK Stack (Elasticsearch, Logstash, Kibana)