Resolved

An update has been released for the datagateway service to prevent this issue from recurring. We have been monitoring the platform closely since the release, and performance remains stable.

The SyncData method will be re-enabled for targeted accounts over the coming days; we will continue to monitor it closely.

Recovering

We have diagnosed that the issue is caused by a data sync method used by the mobile application. In specific cases (when dealing with a large amount of expired data), this method was consuming excessive resources.

Disabling the API calls for the problematic cases has restored performance to normal levels. These calls will remain disabled while we work on an update to stop them from causing performance spikes.

Further updates will be provided as more information becomes available.

Updated

Further releases and testing have been ongoing today. We believe the root cause has now been identified as an issue with one of the SyncData methods in the Datagateway application.

A configuration change (*) has been applied to mitigate the intermittent performance fluctuations. This will take some time to fully propagate through the system, but we are already starting to see improvements in overall stability.

Further work is required to fully resolve the issue and prevent recurrence; additional updates will be provided as we progress towards a complete fix.

(*) The configuration change involves disabling the specific SyncData requests that are causing the resource utilisation spikes. This change should stop these requests over the next 24 hours.

Updated

A service release for the data gateway was completed overnight with a view to mitigating these resource utilisation spikes. Close monitoring of the affected platform components will continue as we work towards a full resolution.

Updated

Investigations are currently focused on a particular query method used by the data gateway service, as we believe this is where the issue lies. We continue to work towards a full resolution and will provide a further update as soon as more information is available.

Updated

Analysis of the additional logs we have captured is ongoing. A possible cause of these intermittent spikes has been identified and we are working towards a resolution.

Updated

We continue to see intermittent performance spikes and are working to resolve the underlying cause as soon as possible. We have obtained additional debug logs from the impacted applications; analysis of this data is currently underway.

Identified

Our investigations into the root cause continue. We are in the process of adding debug monitoring to our applications to help identify what may be causing these intermittent performance issues.

Performance of the system is currently stable. Additional updates to this incident will be posted as things progress.

Investigating

We are currently experiencing performance issues related to an underlying problem with the application service platform. This is being investigated as a priority and an update will be provided shortly.

Began at:

Affected components
  • Solarvista Platform Services
    • Web Applications