I wanted to share some additional information about today’s extended performance problems.
Without our knowledge, an internal system had accidentally been performing Onshape searches in a loop for several days. We had been monitoring the increased load on the search service, but because of the rate limiting we have in place, we did not notice any operational problems and did not identify the source.
Today we deployed a new version of Onshape that raised the rate limit for search requests to support an upcoming feature. With the runaway searches still in progress, this was enough to overload our search service, which began rejecting requests. Every part of Onshape that uses search was slow. The engineering and operations teams spent more than two hours adding capacity and examining all of the changes in the new build before discovering the runaway internal system.
We blocked the errant system and the service immediately returned to normal.
First, I want to apologize for the extended downtime. It took us too long to identify the problem, and your productivity suffered. Second, we will be implementing additional monitoring controls that automatically alert us to unusual traffic. We will also be expanding our overall search capacity.
You trust us to protect and manage your design data. We did not earn that trust today. We will work harder to prevent this type of problem from happening again.
VP, Technical Operations, Onshape