Partial Outage on Search
Incident Report for Onshape
Postmortem

I wanted to share some additional information about today’s extended performance problems.

Without our knowledge, an internal system had been running Onshape searches in a loop for several days. We had noticed the increased load on the search service, but because our rate limiting kept that load in check, we saw no operational problems and did not identify the source.

We deployed a new version of Onshape today that had a higher rate limit for search requests to support an upcoming feature. With the higher limit, the looping traffic was enough to overload our search service, which began rejecting requests. Any part of Onshape that uses search was slow. The engineering and operations teams spent over 2 hours adding capacity and examining all of the changes in the new build before discovering the runaway internal system.
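
To illustrate the mechanism, here is a minimal Python sketch of a token-bucket rate limiter. It is not Onshape's actual limiter: the names `TokenBucket` and `admitted` and all of the numbers are hypothetical, chosen only to show how the same looping client gets far more requests through once the limit is raised.

```python
class TokenBucket:
    """Hypothetical token-bucket limiter; not Onshape's implementation."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec        # tokens refilled per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.last = 0.0                 # time of the previous check

    def allow(self, now):
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted(rate_per_sec, attempts_per_sec, seconds):
    """Count requests let through for a client that loops at a fixed pace."""
    bucket = TokenBucket(rate_per_sec, capacity=rate_per_sec)
    count = 0
    for i in range(attempts_per_sec * seconds):
        now = i / attempts_per_sec      # simulated clock, in seconds
        if bucket.allow(now):
            count += 1
    return count


# Same looping client (100 attempts/s for 10 s) under two assumed limits.
low_limit = admitted(rate_per_sec=10, attempts_per_sec=100, seconds=10)
high_limit = admitted(rate_per_sec=50, attempts_per_sec=100, seconds=10)
print(low_limit, high_limit)
```

Under the lower limit, the backend sees only a small part of the loop's traffic, which is why a runaway client can go unnoticed until the limit changes.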

We blocked the errant system and the service immediately returned to normal.

First, I want to apologize for the extended downtime. It took us too long to identify the problem and your productivity suffered. Second, we will be implementing additional monitoring controls that will automatically alert us to unusual traffic. We will also be expanding our overall search capacity.
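
As a rough sketch of what alerting on unusual traffic could look like (an assumption for illustration, not a description of the monitoring we will deploy), the Python below counts requests per source over a window and flags any source far above its expected volume. The function name `unusual_sources`, the source labels, the baseline figures, and the thresholds are all hypothetical.

```python
from collections import Counter


def unusual_sources(request_log, baseline, factor=3.0, min_requests=100):
    """Return {source: count} for sources well above their expected volume.

    request_log: iterable of (source, timestamp) pairs for one time window.
    baseline: expected request count per source for a window of that size.
    """
    counts = Counter(source for source, _ in request_log)
    flagged = {}
    for source, n in counts.items():
        expected = baseline.get(source, 0)  # unknown sources expect zero
        # Ignore low-volume sources, then compare against the baseline.
        if n >= min_requests and n > factor * expected:
            flagged[source] = n
    return flagged


# Hypothetical window: normal UI traffic plus one looping internal job.
log = [("web-ui", t) for t in range(120)]
log += [("internal-batch-job", t) for t in range(500)]
alerts = unusual_sources(log, baseline={"web-ui": 100})
print(alerts)
```

Grouping by source is the point of the design: total load alone showed an increase in this incident, but not where it came from.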

You trust us to protect and manage your design data. We did not earn that trust today. We will work harder to prevent this type of problem from happening again.

John Rousseau

VP, Technical Operations, Onshape

Posted Nov 18, 2020 - 21:56 EST

Resolved
We identified the source of the outage and have disabled it. Search should now be functional.
Posted Nov 18, 2020 - 14:59 EST
Update
We are continuing to investigate this issue.
Posted Nov 18, 2020 - 14:57 EST
Update
The search response times are still not acceptable. We are continuing to work on restoring full functionality.
Posted Nov 18, 2020 - 13:59 EST
Update
We are continuing to see delayed processing times for search queries which impacts multiple parts of the product. Our engineering teams are working on the issue.
Posted Nov 18, 2020 - 12:34 EST
Investigating
We are currently investigating this issue.
Posted Nov 18, 2020 - 11:36 EST
This incident affected: Onshape CAD Service (https://cad.onshape.com) (North America, Western Europe, Southeast Asia, Australia, Northeast Asia).