ATS Outage

Incident Report for TalosATS

Postmortem

On the evening of Jul 28, 2025, our Talos ATS monitoring systems triggered several alerts, which were immediately escalated as a Priority 1 incident. Our on-call development team began investigating straight away.

The root cause was identified as a Microsoft service outage, which triggered a restart of our APIs. Due to a downstream dependency, several of the APIs did not restart as expected.

The team quickly identified the specific failure and began implementing a fix. All services were fully operational again within 40 minutes, and we continued to monitor them closely until the fix was signed off and everything was confirmed stable.

We are reviewing this incident internally to strengthen our resilience to failures of this dependency.

Posted Jul 29, 2025 - 09:15 UTC

Resolved

This incident has now been resolved.

Posted Jul 28, 2025 - 20:51 UTC

Monitoring

A fix has now been implemented, and we are monitoring this to ensure the site is stable.

Posted Jul 28, 2025 - 20:46 UTC

Identified

We have now identified the issue and are working to implement a solution. Thank you for your patience.

Posted Jul 28, 2025 - 20:45 UTC

Update

We are continuing to investigate the issue currently affecting Talos ATS and Careers Pages. Thank you for your patience.

Posted Jul 28, 2025 - 20:27 UTC

Investigating

We are investigating a live issue affecting Talos ATS and Careers Pages. Our team is working to restore service for all users as soon as possible. Thank you for your patience.

Posted Jul 28, 2025 - 20:18 UTC

This incident affected: Talos360 ATS (ATS Main Site, Personalisation API, Vacancies API, Authorisation API, Interop API, Applicants API, Web API, Integrations HR API).