Supply Company Reconciled Network and VDI Engineers
☹️ Problems with VDI:
The workstation infrastructure, built on Citrix VDI, was unstable every Monday: it slowed down and reset.
☹️ Monitoring didn’t report problems:
The monitoring configured on the Citrix servers gave no information about the source of the problem. All indicators were green, and from the outside everything seemed fine.
☹️ VDI engineers blamed network engineers and vice versa:
Internal IT staff couldn’t locate the source of the issue, so the company brought in an outsourced organization to investigate. The outsourced team also failed to find any anomalies, even though the VDI engineers could see the problem. With no root cause in sight, the two teams began to blame each other.
💻 Deployment: Free cloud version with 1 month of support.
🕒 Period: 1 month.
💪 We started looking for the cause of the VDI failures by connecting Zabbix, which collected data from the VDI, to Acure. Then we connected Acure to VMware vCenter, which was the source of virtualization events.
💪 We built a service infrastructure model based on this data and started looking for correlations between events.
💪 After connecting Zabbix and VMware, we saw that the disk subsystem was running slower than usual. The issue turned out to be with the storage: an adjacent system located on the same disk as the Citrix cluster was running backups at the same time. On Monday mornings the load on the servers increased, and the parallel copying process on the adjacent system caused the VDI to slow down and even reboot.
💪 The backup process was immediately stopped and rescheduled, after which the VDI began to work stably and without freezes.
😊 Problem fixed:
The root cause of the problem was found in an adjacent system that the VDI monitoring did not cover.
😊 Service functionality and stability restored:
Once the issue was fixed, the VDI worked without interruption and met all SLA requirements.
😊 IT staff cooperation improved across departments:
Both teams gained a product that helps them work together to maintain service availability.