Integrated Developer Portal Outage
Incident Report for Apigee
Postmortem

# ISSUE SUMMARY

On November 4, 2019, Apigee Integrated Developer Portals experienced a 75-minute outage. We sincerely apologize, and we are taking immediate steps to improve the platform’s performance and availability.

# DETAILED DESCRIPTION OF IMPACT

On November 4, from 16:09 to 17:24 US/Pacific, 100% of connections to Integrated Developer Portals timed out or were rejected with Gateway Timeout errors. From 17:24 to 18:04 US/Pacific, all Integrated Developer Portals were in read-only mode: the sites were reachable and viewable, but changes to portals were unavailable. Users of Drupal Developer Portals were not affected.

# ROOT CAUSE

An unexpected spike in requests, roughly two orders of magnitude above baseline traffic, bypassed our Denial of Service filtering and exhausted the database thread pool. As a result, portal compute instances marked themselves as DEAD in the monitoring dashboards.
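To illustrate why a spike of this size can exhaust a fixed-size thread pool, the following is a minimal sketch based on Little's law (average in-flight requests = arrival rate × time per request). The pool size, baseline rate, and per-request database time are hypothetical figures chosen for illustration; they are not measurements from this incident.

```python
import math


def required_workers(requests_per_second: int, db_millis_per_request: int) -> int:
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return math.ceil(requests_per_second * db_millis_per_request / 1000)


# Hypothetical figures, chosen only for illustration.
POOL_SIZE = 20                # database threads available
BASELINE_RPS = 10             # normal request rate
DB_MILLIS_PER_REQUEST = 50    # time each request holds a database thread

baseline_need = required_workers(BASELINE_RPS, DB_MILLIS_PER_REQUEST)
# "Two orders of magnitude" above baseline is a factor of 100.
spike_need = required_workers(BASELINE_RPS * 100, DB_MILLIS_PER_REQUEST)

print(f"baseline needs {baseline_need} of {POOL_SIZE} threads")
print(f"100x spike needs {spike_need} of {POOL_SIZE} threads")
```

Under these assumed numbers the pool has ample headroom at baseline, but a 100x spike needs more threads than exist, so requests queue until they time out.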

# REMEDIATION AND PREVENTION

This event lasted much longer than we find acceptable. Although the incident was identified immediately at its onset, we did not formulate a clear mitigation path for over 45 minutes. Our engineers have identified several areas of improvement to prevent a recurrence. We are tightening our DoS protections to prevent another service disruption. We are also changing the way our servers handle database transactions so that we do not exhaust the thread pool in the future. In addition to these technical changes, we’ve improved our internal communications processes, as well as our logging and monitoring, so that we can identify an incident's root cause more quickly.
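As a generic illustration of request-rate limiting, one common form of DoS protection, the following is a minimal token-bucket sketch. It is not Apigee's implementation; the class name, parameters, and limits are hypothetical, and the code is single-threaded for brevity.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate_per_second` requests on average, with bursts up to `burst`.

    Hypothetical illustration only; not thread-safe.
    """

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_second
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage with a fake clock so the behavior is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_second=2.0, burst=3, clock=lambda: fake_now[0])
first_burst = [bucket.allow() for _ in range(5)]  # 5 requests at t = 0
fake_now[0] = 1.0                                 # one second later
after_refill = [bucket.allow() for _ in range(3)]
```

Rejecting excess requests before they reach the database keeps a spike from consuming every database thread, at the cost of returning errors to some callers during the spike.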

Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.

Posted Nov 12, 2019 - 08:28 PST

Resolved
This issue has been resolved. We will post a public Incident Report on status.apigee.com within 7 days.
Posted Nov 04, 2019 - 17:30 PST
Monitoring
A fix has been implemented, writes have been re-enabled, and we are closely monitoring the situation.
Posted Nov 04, 2019 - 16:44 PST
Update
We have isolated what we believe to be the root cause, and are working on making portals writable again.
Posted Nov 04, 2019 - 16:35 PST
Update
Portals are now available again in Read-Only mode while we continue to roll out a more comprehensive fix.
Posted Nov 04, 2019 - 16:24 PST
Update
Efforts are still underway to roll out a fix.
Posted Nov 04, 2019 - 16:21 PST
Identified
The issue has been identified and a fix is being implemented.
Posted Nov 04, 2019 - 16:02 PST
Update
We have a working theory about the root cause, and efforts are underway to roll out the fix.
Posted Nov 04, 2019 - 16:01 PST
Update
Mitigation work is still underway, and all efforts are on making Portals available again.
Posted Nov 04, 2019 - 15:46 PST
Update
All Integrated Developer Portals are currently unreachable. We are working to identify a root cause and mitigate the issue.
Posted Nov 04, 2019 - 15:29 PST
Investigating
We are currently investigating an issue with Integrated Developer Portals. More information will come within 15 minutes.
Posted Nov 04, 2019 - 15:18 PST
This incident affected: Developer Portal (Integrated).