Good morning, Conquer Voice users. Conquer Engineering has identified an issue that can impact use of the Conquer Voice platform for end users. As of writing, the initial issue appears resolved and service is returning to normal, but some users may be stuck. If any users are stuck, please encourage them to submit a bug report and Conquer Support will manually unstick them.
Conquer Engineering is investigating and more details will be provided once they become available.
UPDATE 8:53am Pacific Time: Conquer Engineering has confirmed the issue is resolved, but some users may still be stuck. Any stuck users should contact Conquer Support, and Support will manually unstick them.
If users are unable to submit a bug report, or have submitted one but have not heard back from Support, their ticket creation may be delayed. In this case, they should send an email to email@example.com, which will also create a ticket for them.
UPDATE 9:25am Pacific Time: Users (provided they have been cleared) should be able to connect, place click-to-dial calls, and accept inbound calls normally, but Campaigns are now experiencing a specific issue. Engineering is actively working to resolve it. Users who are stuck should still submit tickets to be manually cleared by Support.
UPDATE 9:45am Pacific Time: Campaign service should be restored. Conquer Support is still clearing stuck users, but the command to clear users is running slowly so it will take longer than usual for Support to complete all requests coming in. Please do continue to encourage any stuck users to submit tickets and Support will continue to clear them as quickly as possible.
UPDATE 9:54am Pacific Time: Conquer Engineering has identified an issue impacting call delivery for some inbound calls. A possible cause is a disruption at a service provider, and Engineering is working to confirm whether this is accurate.
ROOT CAUSE ANALYSIS:
On 4/25/2023 at 7:58am PDT, our cloud provider, Google Cloud Platform (GCP), had a network service disruption that affected communications between services. As a result, at 8:06am PDT, the Conquer Development Team noticed that requests to the Voice Campaigns services were timing out on queries to the database. Due to the network issue, at 8:46am PDT, communication broke between the redundant RabbitMQ nodes (RabbitMQ is Conquer’s primary message bus), causing a split-brain discrepancy between the redundant RabbitMQ instances. This impacted multiple core Conquer services, ultimately resulting in some users experiencing what appeared to be a full outage. RabbitMQ recovered quickly at 8:49am PDT, but full network communication was not restored until 9:55am PDT, when GCP resolved the networking issue, which allowed the Conquer Development Team to restore communication between all the affected services. The core set of Conquer services was operational at 10:15am PDT. Some agents may be in a bad state due to issues during the outage and will need assistance from Conquer Support to correct their state. A trailing issue that prevented the queues dashboard from displaying properly was noticed and fixed at 10:00am PDT. Conquer is actively monitoring for other minor lingering issues.
GCP Service Incident: https://status.cloud.google.com/incidents/gsr6HAk6oCUpNG4CAZ1H#FxgErUaBEZS6pEeX6yiz