We understand that this incident disrupted your ability to deploy new agents, and we apologize for the impact on your operations. Between March 25, 2026 at 6:21 am UTC and March 30, 2026 at 6:00 pm UTC—a period of approximately five days—customers in the EU, US, and Asia regions were unable to deploy new agents in certain folders following a transition to the Serverless Machine Runtime.
Existing agents continued to operate normally throughout the incident. If you attempted to deploy a new agent in a folder without a Serverless Machine Runtime provisioned, you encountered errors and were unable to proceed. A workaround was available throughout: you could manually provision a Serverless Machine Runtime at the folder level to immediately unblock new deployments. For organizations where enabling the Serverless Machine Runtime was not viable, our engineering team restored the previous runtime configuration to resume your operations.
A recent change transitioned newly deployed agents to run on the Serverless Machine Runtime instead of the previous runtime. This introduced a requirement for a Serverless Machine Runtime to be provisioned at the folder level before new agents could execute.
This requirement was not intended. The system already contained fallback logic for other workloads—such as API Workflows—that would automatically locate or provision a Serverless Machine Runtime when one was not explicitly assigned at the folder level. However, this fallback logic was not applied to agents during the transition, creating a gap in the expected behavior.
As a result, any folder without an explicitly provisioned Serverless Machine Runtime could not run newly deployed agents. Existing agents and folders with templates already assigned remained unaffected because they continued to use their previously configured runtime.
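To illustrate the divergence described above, here is a minimal sketch of the two resolution paths as they behaved during the incident. All names here (`Tenant`, `Folder`, `resolve_runtime_for_api_workflow`, and so on) are hypothetical illustrations, not the actual platform code.

```python
from dataclasses import dataclass, field
from typing import Optional

class DeploymentError(Exception):
    pass

@dataclass
class Runtime:
    name: str

@dataclass
class Tenant:
    runtimes: list = field(default_factory=list)

    def find_serverless_runtime(self) -> Optional[Runtime]:
        # Return any tenant-level runtime if one exists.
        return self.runtimes[0] if self.runtimes else None

    def provision_serverless_runtime(self) -> Runtime:
        # Automatically provision a new tenant-level runtime.
        runtime = Runtime("auto-provisioned")
        self.runtimes.append(runtime)
        return runtime

@dataclass
class Folder:
    serverless_runtime: Optional[Runtime] = None

def resolve_runtime_for_api_workflow(folder: Folder, tenant: Tenant) -> Runtime:
    # Existing fallback for API Workflows: use the folder-level runtime if
    # present, otherwise locate or provision one at the tenant level.
    if folder.serverless_runtime is not None:
        return folder.serverless_runtime
    return tenant.find_serverless_runtime() or tenant.provision_serverless_runtime()

def resolve_runtime_for_agent(folder: Folder) -> Runtime:
    # Agent path during the incident: no fallback, so deployment fails
    # whenever the folder has no runtime explicitly provisioned.
    if folder.serverless_runtime is None:
        raise DeploymentError("no Serverless Machine Runtime provisioned for this folder")
    return folder.serverless_runtime
```

With the same unprovisioned folder, the API Workflow path succeeds by falling back to the tenant, while the agent path raises an error.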
The issue was first reported by a customer on March 25, 2026 at 6:21 am UTC. Our engineering team was engaged the following day and identified the root cause on March 26, approximately one day after the initial report.
We recognize as an area for improvement the gap between the issue first surfacing on March 25 and the formal incident declaration on March 30, 2026 at approximately 10:00 am UTC, roughly five days later. While the workaround was communicated to affected customers during this period, we should have posted to our status page sooner so that you had the information needed to make decisions about your environment.
Following the initial reports on March 25, our engineering team shared the workaround of manually provisioning a Serverless Machine Runtime at the folder level with affected customers, and restored the previous runtime configuration for those who were unable to provision one.
Our engineering team identified the root cause and developed a fix on March 26, 2026, then validated it in a testing environment by March 27, 2026. On March 30, 2026 at approximately 10:00 am UTC, we declared a formal incident to coordinate the deployment of the fix across all affected regions.
The fix ensures that if no Serverless Machine Runtime is found at the folder level, one is automatically selected from the tenant level or provisioned if none exists. We deployed the fix progressively, validated it at each stage, and provided status updates throughout the day. We achieved full recovery by March 30, 2026 at 6:00 pm UTC, when new agent deployments resumed normal operation across all regions without manual intervention.
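The resolution order the fix introduces can be sketched as follows. The function name and the folder/tenant representation are hypothetical; the three-step order (folder level, then tenant level, then automatic provisioning) is what the fix guarantees.

```python
def resolve_agent_runtime(folder: dict, tenant: dict) -> str:
    # Post-fix resolution order for new agent deployments:
    # 1. use the runtime explicitly provisioned at the folder level;
    # 2. otherwise, select an existing runtime from the tenant level;
    # 3. otherwise, provision a new one automatically.
    if folder.get("serverless_runtime"):
        return folder["serverless_runtime"]
    if tenant["serverless_runtimes"]:
        return tenant["serverless_runtimes"][0]
    new_runtime = "smr-auto-1"  # placeholder id for a newly provisioned runtime
    tenant["serverless_runtimes"].append(new_runtime)
    return new_runtime
```

Because step 3 records the provisioned runtime at the tenant level, later deployments in other folders reuse it via step 2 instead of provisioning again.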
To prevent similar issues in the future, we are expanding our regression testing to validate that new releases properly account for all configuration, deployment, and licensing scenarios—including cases where optional infrastructure components have not been explicitly provisioned at the folder level. We are adding automated test cases that specifically verify fallback behavior across all workload types whenever runtime requirements change, ensuring that agents, API Workflows, and other workloads are validated consistently before any release reaches your environment.
We are also enhancing our monitoring and alerting systems to detect deployment failures of this nature proactively, without relying on customer reports. This includes automated checks that verify new agent deployments succeed across a representative sample of folder configurations immediately after each release, allowing us to identify and resolve issues before they affect you.
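A check of this kind might look like the following sketch: deploy a canary agent into a sample of folder configurations right after a release and surface any failures. The `deploy_agent` callable and folder representation are hypothetical stand-ins for the real deployment API.

```python
import random

def post_release_smoke_check(deploy_agent, sample_folders, sample_size=10):
    # Draw a representative sample of folder configurations and attempt a
    # canary agent deployment in each. `deploy_agent` is a hypothetical
    # callable that returns True when the deployment succeeds.
    sample = random.sample(sample_folders, min(sample_size, len(sample_folders)))
    failures = [folder for folder in sample if not deploy_agent(folder)]
    return failures  # empty list means the release passed the check
```

Running this against folders both with and without an explicitly provisioned runtime would have caught this incident at release time rather than via customer reports.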
Finally, we are reviewing our escalation and communication processes to ensure that when issues are identified, you are notified promptly through our status page. Closing the gap between issue detection and customer communication is a priority so that you have the information you need to take action as quickly as possible. These improvements directly address the gaps this incident exposed—in testing coverage, proactive detection, and customer communication—and will help ensure that future runtime transitions do not disrupt your agent deployments.