weblate/docs/security/incident-response-plan.rst
2026-04-08 11:20:14 +02:00

Incident response plan for Weblate
==================================

Scope and objectives
--------------------

This incident response plan (IRP) covers incidents impacting the
confidentiality, integrity, or availability of Weblate-operated deployments.

.. note::

   This plan is specifically designed for deployments operated by Weblate
   s.r.o. Other deployments need to adapt provider-specific and organizational
   steps to their own environment.

Roles and responsibilities
--------------------------

- **Incident Response Lead (IRL):** Coordinates all phases of the response process.
- **System Administrator:** Executes containment and recovery measures.
- **Security Officer:** Evaluates security impact and regulatory consequences.
- **Data Protection Officer (DPO):** Evaluates whether personal data (PII) was compromised and manages mandatory GDPR notifications.
- **Communications Lead:** Manages notifications to internal stakeholders and external parties if required.

Communication logistics
-----------------------

- **Internal Communication:**

  - Primary channel is **Signal** for human-to-human coordination.
  - Technical alerts remain outside of Signal to avoid noise.

- **External Communication:**

  - **E-mail** is used to reach customers.
  - Customer contact lists are maintained in several locations to ensure access during service outages.

- **Public Disclosure:**

  - If a security vulnerability is discovered, follow :doc:`/security/issues`.

Incident categories and severity
--------------------------------

Incident activation
^^^^^^^^^^^^^^^^^^^

- Declare an incident when an event is confirmed or strongly suspected to
  affect the confidentiality, integrity, or availability of the service beyond
  routine operational noise.
- The **Security Officer** declares the incident, assigns the initial severity,
  and appoints the **Incident Response Lead (IRL)**.
- If the Security Officer is unavailable, any available senior operator may
  declare the incident and hand over ownership as soon as practical.
- Reclassify the incident if the scope or impact changes during investigation.

Incident categories
^^^^^^^^^^^^^^^^^^^

- Category 1: Unauthorized Access
- Category 2: Data Integrity Violation
- Category 3: Service Outage or Degradation
- Category 4: Misconfiguration or Deployment Error

Severity levels and SLAs
^^^^^^^^^^^^^^^^^^^^^^^^

+----------+------------------------------------------------------+---------------------+-----------------------+
| Severity | Definition                                           | Target Acknowledge  | Target Initial Action |
+==========+======================================================+=====================+=======================+
| Critical | Total outage; Admin compromise; Active data breach;  | < 30 Minutes        | < 4 Hours             |
|          | requires immediate containment.                      |                     |                       |
+----------+------------------------------------------------------+---------------------+-----------------------+
| High     | Core feature failure; PII leak of single user.       | < 2 Hours           | 12 Hours              |
+----------+------------------------------------------------------+---------------------+-----------------------+
| Medium   | Performance degradation; Minor security issue.       | 1 Business Day      | 3 Business Days       |
+----------+------------------------------------------------------+---------------------+-----------------------+
| Low      | UI bugs; Staging issues; Non-security errors.        | Best Effort         | Best Effort           |
+----------+------------------------------------------------------+---------------------+-----------------------+
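
As an illustration of how the hour-based targets above translate into concrete
deadlines, the following minimal Python sketch computes the acknowledge and
initial-action deadlines for Critical and High incidents. The names
``SLA_TARGETS`` and ``sla_deadlines`` are hypothetical and not part of Weblate.
Medium (business days) and Low (best effort) are deliberately left out because
they depend on a working calendar that this plan does not define.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping of the hour-based targets from the severity table:
# severity -> (target acknowledge, target initial action).
SLA_TARGETS = {
    "critical": (timedelta(minutes=30), timedelta(hours=4)),
    "high": (timedelta(hours=2), timedelta(hours=12)),
}


def sla_deadlines(severity, declared_at):
    """Return (acknowledge_by, initial_action_by) for an hour-based severity."""
    acknowledge, initial_action = SLA_TARGETS[severity.lower()]
    return declared_at + acknowledge, declared_at + initial_action


# Example: a Critical incident declared at 09:00 UTC.
declared = datetime(2026, 4, 8, 9, 0, tzinfo=timezone.utc)
ack_by, act_by = sla_deadlines("Critical", declared)
print(ack_by.isoformat(), act_by.isoformat())
```

Using timezone-aware timestamps keeps the deadlines unambiguous when responders
work across time zones.
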

Incident response lifecycle
---------------------------

Preparation
^^^^^^^^^^^

- Ensure daily backups of the PostgreSQL database and the data directory using Weblate's built-in backup with rotation, see :ref:`backup`.
- Run Weblate behind a properly configured reverse proxy (e.g., NGINX) with HTTPS (TLS 1.2+).
- Enable 2FA for all admin-level accounts.
- Keep the Weblate instance and its dependencies (Python, Django, Celery, database, etc.) up to date.
- Integrate with SIEM systems using the GELF protocol for audit and application log forwarding.

Identification
^^^^^^^^^^^^^^

- Monitor system and application logs (``journalctl``, reverse proxy logs, Weblate application and audit logs).
- Analyze login events, webhook executions, and push/pull failures.
- Configure alerting (via Prometheus, Zabbix, or SIEM) for multiple login failures, unexpected restarts, or irregular VCS actions.
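
The "multiple login failures" condition can be expressed as a sliding-window
count. The sketch below is illustrative only: the function name
``accounts_exceeding_failures``, the threshold of 5 attempts, and the 10-minute
window are assumptions, not values defined by this plan, and the input is a
plain list of ``(timestamp, account)`` pairs rather than any real Weblate or
SIEM log format.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def accounts_exceeding_failures(events, threshold=5, window=timedelta(minutes=10)):
    """Return accounts with at least `threshold` failed logins within `window`.

    `events` is an iterable of (timestamp, account) pairs describing failed
    logins, assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)
    flagged = set()
    for timestamp, account in events:
        attempts = recent[account]
        attempts.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while timestamp - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold:
            flagged.add(account)
    return flagged


# Example: five failures for "alice" within four minutes, two for "bob".
start = datetime(2026, 4, 8, 9, 0)
sample = [(start + timedelta(minutes=i), "alice") for i in range(5)]
sample += [(start + timedelta(minutes=5 + i), "bob") for i in range(2)]
print(accounts_exceeding_failures(sample))
```

In practice the same rule would normally live in the alerting system itself
(Prometheus, Zabbix, or the SIEM); the sketch only shows the logic.
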

Containment
^^^^^^^^^^^

- Create an incident record with a case ID and record timeline updates as
  actions are taken.
- Coordinate human response in **Signal** and keep technical alerting in the
  existing monitoring systems.
- For Category 1 or 2 incidents, create a manual **Hetzner Cloud Snapshot**
  before taking disruptive action when it is safe to do so.

  - Name format: ``IRP-[CaseID]-[YYYYMMDD]-Evidence``.
  - These snapshots are separate from standard rotating backups and must be
    preserved for analysis.

- Isolate the affected host or service as needed (for example by firewall rules
  or service isolation).
- Disable external integrations (Git/webhooks) if they are part of the attack
  vector.
- Suspend affected user accounts immediately.
- Revoke or rotate affected administrative, API, VCS, and webhook credentials
  as applicable.
- Preserve relevant evidence, including system logs, reverse proxy logs,
  Weblate application and audit logs, affected configuration state, and the
  list of impacted credentials or integrations.
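
To keep evidence snapshot names consistent with the
``IRP-[CaseID]-[YYYYMMDD]-Evidence`` format above, a small helper can build
them. This is a sketch: the function name ``evidence_snapshot_name`` is
hypothetical, and the restriction of case IDs to letters and digits is an
assumption, since this plan does not define the case ID format.

```python
import re
from datetime import date


def evidence_snapshot_name(case_id, day):
    """Build a snapshot name following IRP-[CaseID]-[YYYYMMDD]-Evidence."""
    # Assumption: case IDs consist of ASCII letters and digits only.
    if not re.fullmatch(r"[A-Za-z0-9]+", case_id):
        raise ValueError(f"unexpected case ID: {case_id!r}")
    return f"IRP-{case_id}-{day:%Y%m%d}-Evidence"


# Example with a made-up case ID.
print(evidence_snapshot_name("1234", date(2026, 4, 8)))
```

Rejecting unexpected characters avoids names that would be ambiguous to parse
later, because the hyphen is already used as the field separator.
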

Eradication
^^^^^^^^^^^

- Remove any unauthorized code or data.
- Patch known vulnerabilities by upgrading Weblate or server components.
- Validate the integrity of binaries and repositories, for example using SHA-256 checksums and Git logs.
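
A SHA-256 check of a downloaded or restored file can be done with the Python
standard library alone. The helper names ``sha256_of_file`` and
``matches_checksum`` below are hypothetical, and the expected digest is assumed
to come from a trusted source such as a release announcement or a previously
recorded inventory.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex digest from a trusted source."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Reading in chunks keeps memory use flat for large artifacts such as backup
archives.
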

Recovery
^^^^^^^^

- Restore affected services or data from the latest known-good Weblate backups.
- **PII Assessment:** The DPO determines whether the breach requires a GDPR notification within 72 hours.
- Reintroduce services in a phased approach.
- Confirm the root cause has been removed or a compensating control is in
  place before restoring normal traffic.
- Rotate affected credentials and verify integrity of the restored system,
  repositories, and configuration.
- The Security Officer and IRL approve returning to normal operations.
- Monitor logs and system behavior continuously for at least 72 hours post-recovery.

Post-incident review
^^^^^^^^^^^^^^^^^^^^

- **Timeline:** Hold a review meeting within **5 business days** of incident closure.
- Compile a full incident timeline and actions taken.
- Perform Root Cause Analysis (RCA) and document it within **10 business days**.
- Update security policies and IRP documentation based on findings.
- Review the effectiveness of detection and containment mechanisms.
- Verify whether escalation, alerting, and external communication followed
  :doc:`/security/issues` as expected.