PostgresAI / DBLab - Major outage – Incident details

Major outage

Resolved
Started 14 days ago. Lasted about 1 hour.

Affected

Platform/Console

Major outage from 5:03 PM to 6:26 PM

PostgresAI Bot (alpha)

Major outage from 5:03 PM to 6:26 PM

Website

Partial outage from 5:03 PM to 6:26 PM

DBLab Demo Main

Partial outage from 5:03 PM to 6:26 PM

DBLab Demo Branching

Operational from 5:03 PM to 6:26 PM

Third Party: Stripe → Stripe API

Updates
  • Resolved
    This incident has been resolved.
  • Monitoring

    We implemented a fix and are currently monitoring the result.

    The issue had two causes:
    1) A GKE node migration caused pods to be recreated and triggered Postgres failovers.
    2) A manual mistake made on 2024-09-28: while reconfiguring backups, we left an extra .history file in pg_wal. That file corresponded to a timeline that the cluster reached today, which blocked the recovery process. Deleting that history file unblocked recovery.

  • Investigating
    We are currently investigating this incident.
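
A stray timeline history file like the one described in the Monitoring update could in principle be spotted ahead of time. The following is a minimal sketch, not the tooling we used: it assumes history files follow the standard PostgreSQL naming of eight hexadecimal digits plus ".history", and it treats any history file for a timeline higher than the cluster's current one as suspicious. The function name and both parameters are hypothetical, chosen for illustration only.

```python
import re
from pathlib import Path

# Timeline history files are named with an 8-hex-digit timeline ID,
# for example "00000005.history".
HISTORY_RE = re.compile(r"^([0-9A-Fa-f]{8})\.history$")


def future_history_files(pg_wal_dir, current_timeline):
    """Return sorted (timeline_id, filename) pairs for history files
    whose timeline ID is higher than current_timeline.

    Assumption: such files are leftovers worth reviewing by hand.
    This function only reports them; it never deletes anything.
    """
    found = []
    for entry in Path(pg_wal_dir).iterdir():
        match = HISTORY_RE.match(entry.name)
        if match is None:
            continue  # WAL segments, archive_status, and so on
        timeline_id = int(match.group(1), 16)
        if timeline_id > current_timeline:
            found.append((timeline_id, entry.name))
    return sorted(found)
```

On a running primary, the current timeline can be read with `SELECT timeline_id FROM pg_control_checkpoint();`. Anything the sketch reports should be reviewed manually before removal, since a history file can also be legitimate (for example, on a standby that is following a newer timeline).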