What is SRE?

What is SRE, and how is it different from Ops?

What is the difference between SRE and DevOps?

What are the functions of an SRE team?

What are the best practices for toil management?

Please help me understand service levels in SRE.

What are the phases of working on an incident?

What items should a postmortem include?

What is observability?

Mariano Gazzola
6 months ago

What is SRE, and how is it different from Ops?
SRE seeks continuous improvement by applying a methodology for reducing toil, preventing problems from recurring, and being proactive. Ops is just fixing incidents.
What is the difference between SRE and DevOps?
DevOps is about implementing a toolchain for continuous delivery. SRE is also about tooling, but it integrates operations and methodology as well; it's a philosophy for managing IT.
What are the functions of an SRE team?
Eliminate toil to focus on enhancement, manage risk derived from changes, handle failures and prevent recurrence.
What are the best practices for toil management?
50% of time on operations, 50% on identifying toil and automating it.
Please help me understand service levels in SRE.
SLI, SLO, SLA
What are the phases of working on an incident?
Triage, examine, diagnose, test, cure
What items should a postmortem include?
Timeline, evidence, analysis, root cause, lessons learned, corrective actions, education
What is observability?
Monitoring the target environment by reviewing events, tracking metrics, finding patterns in logs, and checking the health metrics of the infrastructure and applications.

John Ossa
6 months ago

What is SRE, and how is it different from Ops?
  SRE is an approach to achieving service reliability through some principal tenets, such as eliminating toil, working with service levels, and managing failure.

What is the difference between SRE and DevOps?
  The DevOps team is involved in the app coding phase.

What are the functions of an SRE team?
  Reduce toil, manage incidents, improve observability.

What are the best practices for toil management?
  Understanding toil, limiting toil, and eliminating it.

Please help me understand service levels in SRE.
  They are a way to measure reliability.

What are the phases of working on an incident?
  Examine, diagnose, remediate.

What items should a postmortem include?
  Problem summary, timeline, lessons learned.

What is observability?
  It's more than monitoring: it's gaining visibility into many aspects of a system, such as SLOs, SLIs, SLAs, alerts, and more.

shardendu
6 months ago

Q1 What is SRE, and how is it different from Ops?

A cultural approach that combines aspects of software engineering and applies them to infrastructure and operations problems. Unlike traditional Ops, SRE uses code, automation, and teamwork to boost reliability, not just manual tasks.

Q2 What is the difference between SRE and DevOps?

SRE focuses on reliability using software and automation. DevOps is about merging development and operations teams to streamline software delivery. SRE emphasizes reliability; DevOps emphasizes collaboration.

Q3 What are the functions of an SRE team?

   Reliability
   Automation
   Monitoring
   Collaboration

Q4 What are the best practices for toil management?

   Automation
   Documentation
   Analysis
   Task Reduction

Q5 Please help me understand service levels in SRE.

   Service Level Objectives
   Service Level Indicators
   Service Level Agreements

Q6 What are the phases of working on an incident?

Monitoring alert
Response
Mitigate
Root Cause Analysis
Document

Q7 What items should a postmortem include?

Incident details, timeline, actions taken, lessons learned, blameless postmortem.

Q8 What is observability?

Having the tools and systems to understand how software and services work in real time, enabling quick problem detection, diagnosis, and resolution.

Sammi
6 months ago

#What is SRE, and how is it different from Ops?
SRE focuses more on site reliability and availability. Ops (operations) just means the work of maintaining the system.

#What is the difference between SRE and DevOps?
DevOps includes development and operations, and SRE only includes operations.

#What are the functions of an SRE team?
– Eliminating toil
– Managing risk
– Handling failure

#What are the best practices for toil management?
1. Identify and define toil
2. Quantify toil
3. Prioritize automation
4. Automate

#Please help me understand service levels in SRE.
1. SLA – service level agreement
2. SLO – service level objective (the goal)
3. SLI – service level indicator (key metrics)

#What are the phases of working on an incident?
1. Detection
2. Logging
3. Notification
4. Response
5. Resolution
6. Documentation
7. Postmortem

#What items should a postmortem include?
1. Incident info
2. Root cause analysis
3. Timeline of actions
4. Document
5. Follow-up

#What is observability?
1. Metrics
2. Events
3. Logs
4. Traces

Milan Schild
6 months ago

What is SRE, and how is it different from Ops?
SRE is the way all the teams and components should work together.
Ops is only the team that performs the tasks, but SRE also includes all the other operations around them.

What is the difference between SRE and DevOps?
DevOps is more a way of developing; SRE is a wider principle for the functioning of the whole product or site.

What are the functions of an SRE team?
Joining all the other parts together, communicating, automating, and removing toil as much as possible.

What are the best practices for toil management?
Keep toil to a maximum of 50% of time by automating everything possible 🙂
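
To make the 50% cap concrete, here is a minimal Python sketch of how a team might check it from a time log. The category names, the sample hours, and the helper names `toil_share` and `over_toil_cap` are all hypothetical and only for illustration; the 0.5 default is the cap mentioned in the answer above.

```python
def toil_share(hours_by_category: dict, toil_categories: set) -> float:
    """Return the fraction of logged hours spent on toil (0.0 to 1.0)."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    toil = sum(h for name, h in hours_by_category.items() if name in toil_categories)
    return toil / total


def over_toil_cap(share: float, cap: float = 0.5) -> bool:
    """True when the toil share exceeds the cap (50% by default)."""
    return share > cap


# Hypothetical week for one engineer, in hours (assumed numbers).
week = {"tickets": 12, "manual_deploys": 10, "automation_work": 18}
share = toil_share(week, {"tickets", "manual_deploys"})
print(f"toil share: {share:.0%}, over cap: {over_toil_cap(share)}")
```

Which categories count as toil is a team decision; the sketch only does the arithmetic once that decision has been made.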

Help me understand service levels in SRE?
SLI – parameters of the service
SLO – target parameters for the service over a time window
SLA – contractual parameters for how available the service should be

What are the phases of working on an incident?
Detect, first aid, diagnose, resolution, postmortem

What items should a postmortem include?
Analyze the whole chain of events, record all actions that were taken, RCA, list of all involved people and components, lessons learned

What is observability?
The capability to see all available logs and info, trace events, and visualize them

Ville Kääriä
6 months ago

1. What is SRE. SRE vs Ops
SRE is a person or team focusing on availability, performance, incident management, and monitoring of services. One of the main goals for SRE is to automate manual work and be involved from the start of the SDLC instead of just running and supporting production.

2. SRE vs DevOps
DevOps is a mindset of automating everything; everyone can take part in DevOps. SRE is a specific role focusing on availability, performance, incident management, and monitoring.

3. SRE functions
Automating manual work, managing risk, and handling failure

4. Best practices for toil management
Evaluate and measure how much time and resources each piece of toil takes.
Estimate how much time it would take to automate the toil.
Choose which automation or process change would be most cost-effective or most important.
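
The comparison described in point 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `automation_payoff`, the 12-month horizon, and the sample tasks are hypothetical, and a real decision would also weigh risk and maintenance cost.

```python
def automation_payoff(minutes_per_run, runs_per_month, automation_hours, horizon_months=12):
    """Compare the manual cost of a task with the one-off cost of automating it."""
    monthly_toil_hours = minutes_per_run * runs_per_month / 60
    if monthly_toil_hours == 0:
        break_even_months = float("inf")  # nothing to save, never pays off
    else:
        break_even_months = automation_hours / monthly_toil_hours
    net_hours_saved = monthly_toil_hours * horizon_months - automation_hours
    return {
        "monthly_toil_hours": monthly_toil_hours,
        "break_even_months": break_even_months,
        "net_hours_saved": net_hours_saved,
    }


# Hypothetical candidates: (name, minutes per run, runs per month, hours to automate).
candidates = [
    ("rotate certificates", 30, 20, 40),
    ("restart stuck worker", 5, 12, 16),
]

# Rank by hours saved over the horizon, largest first.
ranked = sorted(
    candidates,
    key=lambda c: automation_payoff(c[1], c[2], c[3])["net_hours_saved"],
    reverse=True,
)
for name, *args in ranked:
    print(name, automation_payoff(*args))
```

A negative `net_hours_saved` means the automation would not pay for itself within the chosen horizon.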

5. Service levels
SLA = service level agreement. The service level agreed with the customer.
SLO = service level objective. Agreed within the organization or SRE team as the acceptable level of service. This level needs to be stricter than the SLA, with a realistic margin, so that the service can be improved before the SLA is breached.
SLI = service level indicator. Shows the actual status of a service through latency, amount of traffic, number of errors, and platform/server saturation.
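
As a rough illustration of how the three terms relate, the Python sketch below computes an availability SLI from request counts and compares it with an SLO and an SLA. The function names, thresholds, and request counts are assumptions made up for the example, not values from the course.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: the measured fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic, nothing failed
    return good_requests / total_requests


def service_level_status(sli: float, slo: float, sla: float) -> str:
    """Classify a measured SLI against the internal SLO and the contractual SLA."""
    if slo < sla:
        raise ValueError("the SLO should be at least as strict as the SLA")
    if sli >= slo:
        return "ok"
    if sli >= sla:
        return "slo_breached"  # inside the safety margin: fix it before the SLA is hit
    return "sla_breached"


# Assumed targets: 99.9% internal SLO, 99.5% SLA agreed with the customer.
sli = availability_sli(good_requests=99_700, total_requests=100_000)
print(sli, service_level_status(sli, slo=0.999, sla=0.995))
```

The gap between the SLO and the SLA is the margin the answer above describes: an `slo_breached` result is an internal warning, while `sla_breached` has consequences with the customer.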

6. Phases for incident
Triage: get services up
Examine: understand the problem
Diagnose: find the root cause of the problem
Test: verify you understand the problem
Cure: fix the problem long term

7. Postmortem
Document the issue
Identify the root cause
Decide what fixes should be made so the problem will not happen again

8. Observability
The ability to observe and monitor the current state of a service based on logs, events, metrics, and traces

Faisal
6 months ago

1.A. SREs possess the technical knowledge, are engaged all the way from architecture to operations, and have complete information about the product. Ops teams are a bridge between the operations teams and the business. They do not have technical knowledge of the product.

2.A. DevOps teams focus on developing the software and delivering it with quality. SRE focuses on software that is easily manageable, scalable, and resilient. SRE also focuses on existing projects, operating them more effectively by enhancing the deliverables using SRE principles such as automation and observability.

3.A. Eliminating toil, managing risks and handling failures

4.A. Toil should not exceed 50% of total workload

5.A. SLI, SLO and SLA

6.A. Triage, examine, diagnose, test and cure

7.A. Identify the root cause and have it well documented. It should be neutral, blame-free and published within the organisation

8.A. Metrics, events, logs and traces

Alison Silva
6 months ago

What is SRE, and how is it different from Ops?
An engineering approach to IT operations. SRE is focused on availability and efficiency, eliminating toil, managing risk, and avoiding failures.

What is the difference between SRE and DevOps?
SRE is concerned with using software engineering to design the operations function. DevOps is about building and running software.

What are the functions of an SRE team?
Work to maintain the highest availability and efficiency. Document problems, share knowledge and solutions.

What are the best practices for toil management?
Identify, measure, prioritize

What are the phases of working on an incident?
Triage, examine, diagnose, test, cure

What items should a postmortem include?
Issues in discussion, action items, assignments, lessons learned

What is observability?
A collection of metrics, logs, and traces that can be used to monitor or control the state of a system

Ankur Malik
6 months ago

What is SRE, and how is it different from Ops?

SRE is an approach to operations that uses software as its primary tool for managing systems. SRE is different from Ops because SRE is more prescriptive and more feasible for internal migrations.

What is the difference between SRE and DevOps?

The major difference is that DevOps teams create software and then refine it, whereas SRE teams work with already-built software to ensure it functions correctly and cooperates with other software and systems.

What are the functions of an SRE team?

The main functions of the SRE team are eliminating toil, working on service levels, and managing failure.

What are the best practices for toil management?

The best practices for toil management are data-driven analysis, a toil reduction backlog, cost-benefit analysis, automation projects, etc.

Please help me understand service levels in SRE.
SLI, SLO and SLA
SLI: SLIs are the metrics used to measure the level of service provided to end users.
SLO: SLOs are the targeted levels of service, measured by SLIs; they are typically expressed as a percentage over a period of time.
SLA: SLAs are the agreements that outline the levels of service end users can expect from service providers, along with remedies such as service credits or subscription extensions.
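
Because an SLO is a percentage over a period of time, it translates directly into an allowed amount of unavailability, often called an error budget. The Python sketch below shows that arithmetic; the 30-day window, the 99.9% target, and the function names are assumptions chosen for the example.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO allows over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Error budget left after the downtime already incurred (negative means the SLO was missed)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


# Assumed example: a 99.9% SLO over 30 days with 12.5 minutes of downtime so far.
budget = error_budget_minutes(99.9)
left = budget_remaining(99.9, downtime_minutes=12.5)
print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

A 99.9% target over 30 days leaves roughly 43 minutes of downtime; a stricter target shrinks the budget quickly, which is why SLOs are usually set with the cost of achieving them in mind.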

What are the phases to work on incidents?
The incident phases are Triage, Examine, Diagnose, Test, and Cure.

What are the items we should have for postmortem?
A postmortem is the record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause, and the follow-up actions to prevent the incident from recurring.

What is Observability?
The ability to monitor your system to discover and diagnose problems as they occur.

jorge Sadao
6 months ago

 Site Reliability Engineering
 DevOps is the development methodology
 SRE is a component of DevOps
 SRE functions: eliminating toil, managing risk, handling failure
 Toil: labor-intensive, repetitive, automatable
 Incidents: past experience, issue complexity
 Postmortem: blameless analysis, continuous improvement
 Observability: end-to-end view of the customer's business process

Kiran Kumar Vuyyuru
6 months ago

  1. SRE is an approach to operations for managing the system. SRE has shared goals and a common platform, and is involved with the development team starting from the design phase.
  2. Shared goals, overlapping skills, common basis
  3. Design, development, testing, support, maintenance, and capacity
  4. Engineer toil out of the system, start small and then improve, increase uniformity, use SLOs to reduce toil.
  5. Service levels: SLA – agreement with the customer about the level of support we provide. SLO – the service level objective we define based on the SLA. SLI – the actual service level indicator.
  6. Preparation and prevention; detection and analysis; containment, eradication, and recovery; and post-incident activity
  7. Pareto charts, Failure Mode and Effects Analysis, 5 Whys.
  8. Understanding the condition of a complex system based only on knowledge of its external outputs.

DannyR
6 months ago

Q1 – What is SRE?
A1 – Site Reliability Engineering is a discipline that incorporates engineering and operations to ensure the reliability, performance, and scalability of systems. SREs use data and automation to monitor, troubleshoot, and improve systems.

Q2 – How does SRE differ from Ops?
A2 – SRE teams are more proactive than traditional Ops teams. They use data and analytics to identify potential problems before they occur. They also use automation to streamline and speed up their work.

Q3 – What is the difference between SRE and DevOps?
A3 – SRE and DevOps are both approaches to software development and operations that emphasize collaboration and automation. However, SRE is more focused on the reliability and performance of systems, while DevOps is more focused on the speed and frequency of software releases.

Q4 – What are the functions of an SRE team?
A4 – SRE responsibilities include:
a) Monitoring systems for performance and reliability issues
b) Troubleshooting and solving incidents
c) Implementing and managing automation
d) Working with developers to improve the reliability of software

Q5 – What are the best practices for toil management?
A5 – Toil is repetitive, manual work that is necessary to keep systems running. It can be a major drain on SRE teams’ time and energy. Here are some best practices for toil management:
a) Identify and automate as much toil as possible
b) Use data and analytics to identify and prioritize toil
c) Delegate toil to less skilled workers
d) Outsource toil to third-party vendors

Q6 – What is service level in SRE?
A6 – A service level objective (SLO) is a set of targets that define the reliability, performance, and availability of a system. SREs use SLOs as a measure of progress and to ensure that they are meeting the needs of their customers.

Q7 – What are the phases to work on incidents?
A7 – The four phases of incident response are:
a) Detection: The incident is detected and reported.
b) Triage: The severity of the incident is assessed and a response plan is developed.
c) Resolution: The incident is solved and the system is restored to normal operation.
d) Postmortem: The incident is analyzed to identify the root cause and prevent similar incidents from happening in the future.

Q8 – What are the items we should have for a postmortem?
A8 – A postmortem includes the following items:
a) A description of the incident with a timeline of the events
b) RCA – root cause analysis (5 Whys, Ishikawa – Fishbone diagram)
c) Recommendations to prevent similar incidents (lessons learned)

Q9 – What is observability?
A9 – Observability is the ability to understand the internal state of a system by observing its outputs. SREs use observability to monitor systems for performance and reliability and to prevent issues.

Victor Palomo
6 months ago

What is SRE, and how is it different from Ops?

– SRE uses software as the primary tool to manage systems

– SRE is engaged with the development team from the beginning of a new project

What is the difference between SRE and DevOps?

What are the functions of an SRE team?

– Think about eliminating toil, managing risk, and handling failure

What are the best practices for toil management?

– Think about the effort needed to automate a task versus the time saved.

Please help me understand service levels in SRE.

– SLI: metrics used to manage the level of service.

– SLO: agreements with customers or other parties that usually don’t result in a penalty; they are used to know the status of the level of service.

– SLA: agreements with customers or other parties that could result in loss of money, credits, or penalties if the metrics don’t meet the agreed results.

What are the phases of working on an incident?

– Analysis
– Engage proper teams.
– Fix the issue
– Monitoring

What items should a postmortem include?

– Time of the incident
– Logs
– Actions that happened during the incident.

What is observability?

– It’s the evolution of monitoring.

swati singh
6 months ago

What is SRE, and how is it different from Ops?
>SRE uses tools and automation for smoother operations and reliability, while the Ops team’s task is to maintain infrastructure.

What is the difference between SRE and DevOps?
>SRE focuses on reliability using software and automation. DevOps is about merging development and operations teams to streamline software delivery. SRE emphasizes reliability; DevOps emphasizes collaboration.

What are the functions of an SRE team?
>Eliminate toil, manage risks & handle failure.

What are the best practices for toil management?
>Automate manual and repetitive work.
>Prioritize issues. Focus on issues that make a difference.
>Implement observability.

Please help me understand service levels in SRE.
>Service Level Indicator (SLI), Service Level Objective (SLO) & Service Level Agreement (SLA).

What are the phases of working on an incident?
>Triage, examine, diagnose, test, & cure

What items should a postmortem include?
>Document the incident and its resolution.
>Identify the root cause & apply a fix.
>Continuous improvement.

What is observability?
>MELT: Metrics, Events, Logs & Traces.

Kostiantyn Konstantinov
6 months ago

What is SRE, and how is it different from Ops?

  • SRE is a set of principles and practices for using software to make the system more stable and reliable.

Difference: SRE uses automation, has stronger engineering skills, and works together with the Dev team to ensure system/service stability.

What is the difference between SRE and DevOps?

  • SRE is mainly focused on the stability and reliability of the service/product, and not on the speed of implementing new features, as DevOps is.

What are the functions of an SRE team?

  • Eliminating toil, managing risk, handling issues

What are the best practices for toil management?

  • Spend no more than 50% of the time on toil.

Please help me understand service levels in SRE.

  • SLA, SLO, SLI

What are the phases of working on an incident?

  • Triage, examine, diagnose, test, cure.

What items should a postmortem include?

  • Incident details, timeline, impact, root cause, lessons learned, preventive actions.

What is observability?

  • Metrics, events, logs, traces.

Dirk Willemans
6 months ago

What is SRE, and how is it different from Ops?

An SRE is tasked with ensuring collaboration between Dev and Ops through automation and the enhancement of processes and tools.

What is the difference between SRE and DevOps?
While DevOps focuses on ensuring rapid release of stable, secure software, SRE focuses more on a set of practices and metrics to improve collaboration and service delivery. SRE is closer to the business and is also a bridge between the business and Dev & Ops.

What are the functions of an SRE team?
The SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.

What are the best practices for toil management?

  1. Identify and measure toil
  2. Get toil out of the system by automation or by finding the root cause
  3. Use feedback to improve

Best practice: start small and then improve.

Please help me understand service levels in SRE.
SLO (range of SLIs) / SLI (indicator) / SLA (agreement)
SRE mostly focuses on SLOs and SLIs. SLAs help to define both.

What are the phases of working on an incident?
Prep
Identify
Containment
Recover
Learn
Re-test

What items should a postmortem include?

Assembling background on the incident:
When did it start
When was it detected
What was the effect
What was observed by the end client
What actions were taken

What is observability?
It sits in between metrics, traces, and logs.
Observability is tooling that enables SREs to debug their systems.

Bert Smets
6 months ago

What is SRE, and how is it different from Ops?

What is the difference between SRE and DevOps?
They have much in common, but SRE focuses more on reliability while DevOps looks more to development.

What are the function of SRE Team?
Eliminate toil, manage risk, handle failures

What are the best practices for Toil Management?
Identify, prioritize, limit time spent on toil

Please me understand service level in SRE?
SLI – The metrics used to monitor the health
SLO – objective you must reach to meet the agreed target
SLA – agreed values with customers/users for a certain metric

What are the phases to work on incidents?
Triage, examine, diagnose, test, cure

What are the items we should have for postmortam?
Problem summary, lessons learned, action items, timeline

What is Obserbability?
A mechanism that helps to understand and explain unexpected system behavior with the help of logs, traces, and metrics.

Arturo Alvarado Esparza
6 months ago

What is SRE, and how is it different from Ops?

SRE – Site Reliability Engineering
SRE works on operations tasks but must identify and automate repetitive tasks, and works on incidents and postmortems. It's an upgraded version of Ops with an end-to-end vision.

What is the difference between SRE and DevOps?
DevOps teams work on developing, creating, and delivering software and then refining it, while SRE works with built software to ensure it functions at the required level, optimizing systems and making sure resources are available.

What are the functions of an SRE team?
Eliminating toil, managing risk, and handling failure

What are the best practices for toil management?
Data analysis to identify toil, prioritizing projects, cost-benefit analysis, automation projects, build costs vs. toil costs

Please help me understand service levels in SRE.
SLI – a quantifiable measure of service reliability
SLO – a goal: a reliability target for an SLI
SLA – consequences

What are the phases of working on an incident?
Triage, examine, diagnose, test, and cure

What items should a postmortem include?
Blameless analysis

What is observability?
Metrics, events, logs, and traces to identify a deviation in a complex system

Francisco Jose Montoya
6 months ago

What is SRE, and how is it different from Ops?
An SRE is in charge of helping to keep an application working as well as it can during the expected time. To do this, the SRE needs to be involved in the planning, design, development, testing, etc. of the application.
The Ops team is only responsible for keeping the applications running, most of the time without knowing how they work, and reacting to events after they have already happened.

What is the difference between SRE and DevOps?
They have different goals. For DevOps teams, the main goal is to create new applications and deliver new functions as fast as they can; they are minimally interested in how those applications will be maintained during their life cycle. SREs are involved in almost the same phases as DevOps teams, but they also focus on keeping the application running, the quality of the service, automation, and instrumentation.

What are the functions of an SRE team?
Eliminating toil, working with service levels, and managing failures

What are the best practices for toil management?
Identify and measure toil
Prioritize toil reduction
Adopt toil-reduction techniques

Please help me understand service levels in SRE.
We have 3 main service levels in SRE:
SLI.- Service Level Indicators. These are the metrics that we use to measure a service, such as availability and latency; they are expressed as percentages.

SLO.- Service Level Objectives. These are the target levels for a system's availability (the operating level that we want to achieve), and they are expressed as percentages.

SLA.- Service Level Agreements. These are the contractual agreements between the service provider and the users about the levels of service provided; failing to meet them usually has consequences for the provider.

What are the phases of working on an incident?
Triage, Examine, Diagnose, Test and Cure

What items should a postmortem include?
Log information, events, capacity graphs.

What is observability?
It is a strategy for keeping the most relevant and important information to measure a system's state, based on the data generated by its components.

Silvia Acosta Oleta
6 months ago

1.     What is SRE, and how is it different from Ops?
It is full-stack systems thinking and coding skills, with a data-driven focus on app service availability.
The difference lies in different goals, skills, and tools. Ops covers only support, maintenance, and capacity.
SRE covers the full cycle: design, development, acceptance, delivery, support, maintenance, and capacity.
2.     What is the difference between SRE and DevOps?
DevOps covers only design, development, testing, acceptance, and delivery, while SRE covers design, development, acceptance, delivery, support, maintenance, and capacity (NOT TESTING).
3.     What is the function of the SRE team?
Agreed delivery, full agency, after the event
4.     What are the best practices for toil management?
Identify and automate, prioritize projects
5.     Please help me understand service levels in SRE.
There are 3 service levels: SLI, SLO, SLA
But SRE is focused on the SLO and SLI.
6.     What are the phases of working on an incident?
Triage, examine, diagnose, test, cure
7.     What items should a postmortem include?
Document the incident and its resolution, identify the root cause, and fix it

8.     What is observability?
It is viewing logs, traces, and graphs to detect performance degradation or problems that cause impact.

_ _
6 months ago

1. What is SRE, and how is it different from Ops?
SRE is a set of practices used in the deployment, testing, and operation of software. Ops (Operations) is a team that is responsible for BAU, i.e. keeping the software running within the agreed SLAs. Ops is integrated into SRE practices as SREs, where they have the extra function of a feedback loop for the DevOps team, allowing better use of resources and improving the overall working of the software.

2. What is the difference between SRE and DevOps?
SREs are responsible for maintenance while DevOps are responsible for testing

3. What are the functions of an SRE team?
Eliminating Toil
Working to Service Levels
Managing Failure

4. What are the best practices for toil management?
->Identify and Measure Toil
->Engineer Toil Out of the System
->Reject the Toil
->Use SLOs to Reduce Toil
->Promote Toil Reduction as a Feature
->Start Small and Then Improve

5. Please help me understand service levels in SRE.
There are 3 definitions in SRE that connect to service level:
SLA SL Agreement – the agreement made with the client
SLO SL Objectives – objectives that the SRE team must hit to meet that agreement
SLI SL Indicators – the real numbers of app performance

6. What are the phases of working on an incident?
-> Triage – get back to “good enough” state
-> Examine – understand problem/identify trigger
-> Diagnose – find the possible cause
-> Test – identify the problem cause
-> Cure – fix the problem/document solution

7. What items should a postmortem include?
-> Date/Authors/Reviewers/Incident Commander
-> Action Items
-> Timeline
-> Executive summary
-> Problem summary
-> Lessons Learned

8. What is observability?
Monitoring + Metrics + Tracing + Logs

Daniel F Hernandez
6 months ago

What is SRE, and how is it different from Ops?
The SRE role is committed to achieving system stability and participates in the end-to-end development and delivery process for new functionality. The difference from Ops is that the SRE team is involved in the SDLC and Ops as one entity, whereas the Ops team only operates and supports the system.

What is the difference between SRE and DevOps?
The main difference between SRE and DevOps is that DevOps is focused on shortening the SDLC and speeding up software delivery, while SRE is involved in the SDLC and Ops as one entity.

What are the functions of an SRE team?
The functions of the SRE team are to provide system reliability, optimization through tools and automation, reduction of toil, and achievement of SLIs, SLOs, and SLAs.

What are the best practices for toil management?
Identify manual, repetitive, automatable, reactive tasks.

Please help me understand service levels in SRE.
Service levels in SRE are divided into three categories: SLA, agreed with clients; SLO, an internal team agreement; and SLI, the performance numbers.

What are the phases of working on an incident?
Preparation, Identification, Containment, Eradication, Recovery, Lessons Learned, Ongoing Improvement.

What items should a postmortem include?
A high-level summary
RCA
Steps taken to diagnose, assess, and resolve
A timeline of significant activity
Learnings and next steps

What is observability?
Observability is a proactive approach to analyzing and optimizing systems.

jpaulo
6 months ago

What is SRE, and how is it different from Ops?
SRE focuses more on site reliability and availability.

What is the difference between SRE and DevOps?
DevOps includes development and operations.

What are the functions of an SRE team?
– Eliminating toil
– Managing risk

What are the best practices for toil management?
. Identify and define toil
. Quantify toil
. Automate

Please help me understand service levels in SRE.
SLA
SLO
SLI

What are the phases of working on an incident?
1. Detection
2. Logging
3. Notification
4. Response
5. Resolution
6. Documentation
7. Postmortem

What items should a postmortem include?
1. Incident info
2. Root cause analysis
3. Timeline of actions
4. Document
5. Follow-up

What is observability?
Monitoring the target environment by reviewing events, tracking metrics, finding patterns in logs, and checking the health metrics of the infrastructure and applications

Zbigniew Zysiak
Zbigniew Zysiak
6 months ago

SRE is a methodology that focuses on ensuring the reliability and scalability of cloud-enabled infrastructure, solutions, and services

SRE is more focused on ensuring the reliability of applications in production environments while DevOps is more focused on building and deploying applications

Building software, supporting and fixing issues, optimizing processes, being on-call, and conducting postmortem reviews.

Identify and quantify toil, automate repetitive tasks, eliminate nontactical/reactive work, set an upper bound on toil, conduct post-incident reviews
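
The "identify and quantify" and "upper bound" steps can be sketched in a few lines. This is only an illustrative sketch: the `Task` record, the sample tasks, and the 50% cap are assumptions chosen for the example, not something defined in the answer above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours_per_week: float
    is_toil: bool  # manual, repetitive, automatable, no lasting value


def toil_fraction(tasks):
    """Share of total weekly hours spent on toil (0.0 if no hours are logged)."""
    total = sum(t.hours_per_week for t in tasks)
    if total == 0:
        return 0.0
    toil = sum(t.hours_per_week for t in tasks if t.is_toil)
    return toil / total


def automation_candidates(tasks):
    """Toil tasks ordered by weekly cost, most expensive first."""
    toil_tasks = [t for t in tasks if t.is_toil]
    return sorted(toil_tasks, key=lambda t: t.hours_per_week, reverse=True)


TOIL_CAP = 0.5  # assumed upper bound on toil, as a fraction of total time

# Hypothetical week of work for one engineer.
week = [
    Task("manual certificate renewal", 6, True),
    Task("restarting stuck jobs", 10, True),
    Task("capacity planning", 8, False),
    Task("building deploy automation", 16, False),
]

fraction = toil_fraction(week)
print(f"toil: {fraction:.0%}, over cap: {fraction > TOIL_CAP}")
print("automate first:", automation_candidates(week)[0].name)
```

Sorting by weekly hours is one simple way to prioritize; a real cost analysis would also weigh how hard each task is to automate.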

Service Level Indicators (SLIs) are the individual indicators to measure; Service Level Objectives (SLOs) are sets of SLIs treated as the internal level of service; Service Level Agreements (SLAs) are SLOs that have been agreed with the customer

Detection, Triage, Triage, Postmortem

Incident information, Find root cause, Timeline of actions, Document, Follow-up

Observability is the ability to measure the internal states of a system by examining its outputs. Observability is key to reducing repetitive, predictable, and manual tasks that are related to maintaining a service

Jacek Szałęga
6 months ago

1 What is SRE, and how is it different from Ops?
Ops' primary concern is to keep the systems running in a steady state, while SRE focuses on improvements in speed, stability, and task automation

2 What is the difference between SRE and DevOps?
DevOps is a culture of work that aims to break down barriers between development and operations teams to facilitate faster and better work. On the other hand, SRE is a specific implementation of DevOps where automation is heavily used to achieve reliability at scale.

3 What are the functions of the SRE team?
– Ensuring the continuous functionality of systems
– Automating ways to keep applications functioning
– Monitoring websites/services to discover errors
– Being proactive in fixing known issues and determining ways to prevent future downtime or hiccups

4 What are the best practices for Toil Management?
 – identify toil
 – perform cost analysis
 – automate

5 Please help me understand service level in SRE?
This one is defined with 3 terms: Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
 – SLIs are a quantitative measure of some aspect of the level of service that is provided.
 – SLOs are the target values or ranges of values for a service level that is measured by an SLI.
 – SLAs are contracts that include the agreed consequences of meeting (or missing) the SLOs they contain.
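
The relationship between the three terms can be shown with a little arithmetic. The sketch below is an illustration under assumed numbers: the request counts, the 99.9% SLO, and the 99.5% SLA threshold are hypothetical, and availability is just one possible SLI.

```python
def availability_sli(good_requests, total_requests):
    """SLI: fraction of requests served successfully (1.0 if there was no traffic)."""
    if total_requests == 0:
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli, slo):
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    budget = 1.0 - slo  # how much unreliability the SLO allows
    spent = 1.0 - sli   # how much unreliability was actually observed
    return (budget - spent) / budget


# Hypothetical numbers for one measurement window.
SLO_TARGET = 0.999  # internal objective
SLA_TARGET = 0.995  # looser level promised to the customer in the contract

sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, SLO_TARGET)

print(f"SLI: {sli:.4%}")
print(f"SLO met: {sli >= SLO_TARGET}, error budget left: {remaining:.0%}")
print(f"SLA breached: {sli < SLA_TARGET}")
```

Keeping the SLA looser than the SLO is a common choice, since it leaves room to react before the contractual consequences apply.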

6 What are the phases to work on incidents?
 – Detecting
 – Communication with engineers
 – Assessing the impact and applying a severity level
 – Communicating with customers
 – Escalating to the right responders
 – Delegating incident response roles
 – Resolving
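
The "assessing the impact and applying a severity level" and "escalating" steps are often written down as simple rules. The following is a minimal sketch; the severity names, the thresholds, and the escalation targets are all hypothetical and would differ per organization.

```python
from enum import Enum


class Severity(Enum):
    SEV1 = 1  # critical: core feature down for most users
    SEV2 = 2  # major: core feature down, or many users affected
    SEV3 = 3  # minor: limited impact


def assess_severity(users_affected_pct, core_feature_down):
    """Map impact to a severity level using assumed thresholds."""
    if core_feature_down and users_affected_pct >= 50:
        return Severity.SEV1
    if core_feature_down or users_affected_pct >= 10:
        return Severity.SEV2
    return Severity.SEV3


# Hypothetical escalation policy: who gets involved at each level.
ESCALATION = {
    Severity.SEV1: "page on-call engineer and incident commander",
    Severity.SEV2: "page on-call engineer",
    Severity.SEV3: "open a ticket for the owning team",
}

severity = assess_severity(users_affected_pct=80, core_feature_down=True)
print(severity.name, "->", ESCALATION[severity])
```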

7 What are the items we should have for a postmortem?
A clear outline of what happened during the incident, including a timeline of events
A detailed analysis of the root cause(s) of the incident
A list of actions taken to mitigate or resolve the incident
A set of follow-up actions to prevent the incident from recurring
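
The four items above map naturally onto a small template that can flag an incomplete postmortem. This is a sketch only: the `Postmortem` class, its field names, and the sample incident are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Postmortem:
    title: str
    timeline: list = field(default_factory=list)            # (time, event) pairs
    root_causes: list = field(default_factory=list)         # analysis results
    mitigation_actions: list = field(default_factory=list)  # what was done during the incident
    follow_ups: list = field(default_factory=list)          # actions to prevent recurrence

    def missing_sections(self):
        """Names of the sections that are still empty."""
        sections = {
            "timeline": self.timeline,
            "root_causes": self.root_causes,
            "mitigation_actions": self.mitigation_actions,
            "follow_ups": self.follow_ups,
        }
        return [name for name, items in sections.items() if not items]


# Hypothetical incident, partially documented.
pm = Postmortem(title="Checkout errors after config push")
pm.timeline.append(("10:02", "error rate alert fired"))
pm.timeline.append(("10:15", "config change rolled back"))
pm.mitigation_actions.append("rolled back the config change")

print("still missing:", pm.missing_sections())
```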

8 What is Observability?
In SRE, observability refers to the ability to infer a system’s internal state(s) by examining its external outputs. It provides actionable insights into when errors occur within a system and why they occur, enabling engineers to take corrective action right away, reducing downtime, and ensuring that systems remain dependable and highly available. Observability requires collecting data from all levels of the system, not just at the application level, which includes logging, tracing, and metrics
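
As a rough illustration of the three signal types named above, the sketch below wraps a request handler so that each call emits a structured log line, carries a trace ID, and updates a metric counter. It uses only the Python standard library; the `handle_request` wrapper, the logger name, and the metric names are hypothetical, and a production system would normally send these signals to dedicated tooling.

```python
import json
import logging
import time
import uuid
from collections import Counter

metrics = Counter()  # in-memory stand-in for a metrics backend
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")  # hypothetical service name


def handle_request(fn, *args):
    """Run fn(*args) and record a log line, a trace ID, and a metric for the call."""
    trace_id = uuid.uuid4().hex  # lets related log lines be correlated
    start = time.perf_counter()
    status = "ok"
    try:
        return fn(*args)
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        metrics[f"requests_{status}"] += 1
        log.info(json.dumps({
            "trace_id": trace_id,
            "status": status,
            "duration_ms": round(duration_ms, 2),
        }))


# One successful call and one failing call.
handle_request(lambda x: x * 2, 21)
try:
    handle_request(lambda: 1 / 0)
except ZeroDivisionError:
    pass

print(dict(metrics))
```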

Adam
6 months ago

What is SRE, and how is it different from Ops?
# Site Reliability Engineering focuses on actually solving problems, by “engineering them away”, rather than just fixing them fast until the next time the same issue pops up.
# It also focuses on eliminating repetitive manual tasks through automation.
# It also makes sure the service is as reliable as needed, but not more.
This is done by using various software technologies and code.

What is the difference between SRE and DevOps?
# DevOps integrates development and operations skills and tasks into one team of people
# The SRE team exists alongside the development team and works closely with it

What are the functions of the SRE team?
# Using automation to get rid of the manual, repetitive work that makes up toil. This way, supporting more services does not need to involve more people.
# Managing the risk of service availability failures according to the agreed levels. Calculating how much work needs to be added to reach a given reliability, but not more.
# Handling incidents that led to a service outage or degradation: learn from them, document them, and make sure they won’t happen again or, if they do, that appropriate actions are at hand, preferably through automation.

What are the best practices for Toil Management?
# Probably the best practice is not to manage the toil, but to eliminate it
# Basically you need to identify what counts as toil: manual, repetitive, does not add value, automatable
# Either you automate the specific task or, better, implement a solution that makes the task go away, so you don’t need to act on it

Please help me understand service level in SRE?
Service Level
# Objective – the availability the service needs to meet
# Indicator – metrics produced by tests against the service, showing whether it is meeting the target SLO
# Agreement – the amount of time during which the SLO needs to be met

What are the phases to work on incidents?
# triage
# examine
# diagnose
# test
# cure

What are the items we should have for a postmortem?
# Documentation about what happened, why, what the solution was, and suggestions on how to use software, code, and automation so the incident does not happen again.

What is Observability?
# Observing service / software behaviour to be able to pinpoint anomalies in case the software does not behave as expected.

Vijay Pathania
6 months ago

What is SRE, and how is it different from Ops?
SRE is site reliability engineering.
It is a practice that uses software engineering to automate tasks like production management, change management, incident management, etc., whereas Ops does many things manually in its day-to-day operations.

What is the difference between SRE and DevOps?
DevOps is all about core development of a product or application.
They are not working against each other.
SRE is in charge of automating everything in the deployment of an application or product.
DevOps is about core development; they deal directly with customer expectations and add new features as asked.
SRE works on the implementation of the core, or we can say they work on the deployment.
They give feedback to DevOps if a product is not working correctly.
If they find issues with the app or product, they come back with feedback to the DevOps team, which in turn modifies the code and releases it again.

What are the functions of the SRE team?
Reduce manual work.
Automate wherever there is an opportunity.
Do proactive monitoring of the running app/product and predict failures before they actually happen.
Try to fix it if this can be done at their end, or involve developers if this has to be modified at the app/product level.

What are the best practices for Toil Management?
Identify what is causing toil.
Automate it
Document it
Then monitor it

Please help me understand service level in SRE?
Service levels are a way to measure the reliability of a service.

What are the phases to work on incidents?
Detect
Log it
Work on it
Resolve it
Notify on resolution
RCA if required
Document it

What are the items we should have for postmortem?
Document the issue
Action items
RCA
Follow-ups

What is Observability?
In SRE practice, it allows us to detect and diagnose issues before they cause much trouble to the customer.
