Digital Bank Reduced IT Troubleshooting Time by 40%
☹️ Little process automation:
When an incident occurred, the duty officer manually worked through step-by-step instructions to diagnose the problem and notify the responsible team.
☹️ Knowledge base updates were hard to track:
Nobody kept track of knowledge base updates, so new employees often made mistakes when following the instructions.
☹️ High human dependency:
Because incident handling depended so heavily on individual people, SLA compliance suffered badly, which made this a critical issue rather than just a pain point.
☹️ Disparate monitoring tools:
The IT infrastructure was fragmented across many technical departments, each with at least one monitoring system of its own. It was hard to stay on top of every notification, and the sheer volume blurred their urgency and importance.
💻 Deployment: Enterprise on-premise version with priority support.
🕒 Period: 2 months.
💪 First, we centralized data from the different monitoring systems on a single screen so that the engineers working with it could see the big picture of what was happening.
💪 Based on this data, a single resource-service model was built, showing the health of all components of the entire complex in real time. Links between configuration items reduced the time to find the root cause of the problem and the type of failure.
💪 Then we configured event-based automation, which reduced incident response time and the number of errors made when following instructions. Problem reports were sent to the responsible specialists.
💪 Interactions with the knowledge base were also automated: engineers received the relevant knowledge base entry automatically. Response scripts for recurring incidents of the same type were automated as well.
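To make the steps above more concrete, here is a minimal Python sketch of how links between configuration items can narrow an alert down to a root cause, and how an event rule can then route a report and a knowledge base entry to the responsible team. It is an illustration under stated assumptions, not the product's actual implementation: the item names, the team names, the KB identifier, and the functions `find_root_causes` and `handle_event` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """A node in a simplified resource-service model (hypothetical structure)."""
    name: str
    healthy: bool = True
    depends_on: list[str] = field(default_factory=list)


def find_root_causes(items: dict[str, ConfigItem], start: str) -> list[str]:
    """Follow dependency links from `start` through unhealthy items.

    An item is reported as a root cause when it is unhealthy and
    none of its own dependencies are unhealthy.
    """
    roots, seen, stack = [], set(), [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        item = items[name]
        failing = [dep for dep in item.depends_on if not items[dep].healthy]
        if not item.healthy and not failing:
            roots.append(name)
        stack.extend(failing)
    return sorted(roots)


# Hypothetical ownership and knowledge base mappings.
OWNERS = {"db-primary": "dba-team"}
KB_ARTICLES = {"db-primary": "KB-0001 (hypothetical): restore the database primary"}


def handle_event(items: dict[str, ConfigItem], alerting_item: str) -> list[dict]:
    """Event rule: find the root cause, then pick the team and KB entry for it."""
    return [
        {
            "root_cause": root,
            "notify": OWNERS.get(root, "duty-officer"),
            "kb_article": KB_ARTICLES.get(root),
        }
        for root in find_root_causes(items, alerting_item)
    ]


# Example model: the banking service fails because the database is down.
items = {
    "online-banking": ConfigItem("online-banking", False, ["api-gateway", "db-primary"]),
    "api-gateway": ConfigItem("api-gateway", False, ["db-primary"]),
    "db-primary": ConfigItem("db-primary", False),
}

reports = handle_event(items, "online-banking")
print(reports)
```

In this example, three unhealthy components collapse into a single report addressed to the database team, which is the kind of effect that lets an engineer act on one meaningful notification instead of several symptomatic ones.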
😊 Configured event automation:
Reduced incident response time. Reduced the number of errors when executing instructions.
😊 Problem reports are sent to the responsible persons automatically:
Freed up employees’ time for more valuable work on resolving problems.
😊 Automation of work with the knowledge base:
The relevant knowledge base article is delivered to the engineer automatically.
😊 Reduced critical incident processing time:
From 25 to 15 minutes.
😊 Reduced the number of alerts per IT Ops engineer:
From 110 to 10 alerts.
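As a quick check of the figures above, the relative reductions can be computed directly. The helper name `percent_reduction` is ours, introduced only for this illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


# Critical incident processing time: 25 -> 15 minutes.
print(f"{percent_reduction(25, 15):.0f}%")   # 40%

# Alerts per IT Ops engineer: 110 -> 10.
print(f"{percent_reduction(110, 10):.0f}%")  # 91%
```

The first figure matches the 40% in the headline; the second shows that the alert load per engineer fell by roughly 91%.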