Getting Started with Temporal Cleaner: Setup, Tips, and Best Practices
Temporal Cleaner is a tool for managing and maintaining time-based data—archiving, pruning, compacting, and ensuring retention policies are enforced. This guide walks through a straightforward setup, practical tips, and best practices to help you integrate Temporal Cleaner into your workflow quickly and safely.
1. Quick overview
- Purpose: Automate cleanup of time-series records, logs, events, or any temporal dataset to control storage, improve query performance, and enforce retention policies.
- Common uses: Log rotation, metrics retention, event-store pruning, snapshot cleanup, and database partition management.
2. Prerequisites
- Access to the storage or database containing your temporal data (e.g., PostgreSQL, ClickHouse, S3, Elasticsearch).
- Read/write permissions for cleanup operations and configuration deployment.
- Backup strategy: a tested backup or snapshot mechanism before running destructive cleanup tasks.
- Monitoring/alerting in place (Prometheus, CloudWatch, Datadog, etc.) to observe effects.
3. Installation and basic setup
Assuming Temporal Cleaner is distributed as a CLI and/or service:
- Install the binary or container image:
  - Binary: download the latest release and place it on your PATH.
  - Docker: pull the image and run with the necessary mounts and environment variables.
- Create a configuration file (YAML or JSON). Minimal fields:
  - target: connection details for the database or storage.
  - retention: time window to keep (e.g., 90d).
  - mode: dry-run | execute
  - schedule: cron expression or interval for periodic runs
  - filters: optional rules for selective cleanup (by tag, tenant, severity)
Example (YAML-style, adapt to your format):

```yaml
target:
  type: postgres
  host: db.example.local
  port: 5432
  database: events
  user: cleaner
retention: 90d
mode: dry-run
schedule: "0 3 * * *"
filters:
  - tag: analytics
```
- Validate configuration with the built-in validator (if available) or run a dry-run to preview deletions:

```shell
temporal-cleaner validate --config /path/to/config.yml            # validate
temporal-cleaner run --config /path/to/config.yml --mode dry-run  # dry-run
```
- Deploy as a scheduled job:
- Kubernetes CronJob, systemd timer, or hosted cron with appropriate permissions.
- Ensure the job runs in a network environment that can reach your target storage.
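As a sketch of the Kubernetes CronJob option above (the image name, arguments, and ConfigMap name are placeholders to adapt to your registry and config layout):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: temporal-cleaner
spec:
  schedule: "0 3 * * *"        # match the schedule in the cleaner config
  concurrencyPolicy: Forbid    # never let two runs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleaner
              image: registry.example.local/temporal-cleaner:latest  # placeholder image
              args: ["run", "--config", "/etc/cleaner/config.yml"]
              volumeMounts:
                - name: config
                  mountPath: /etc/cleaner
                  readOnly: true
          volumes:
            - name: config
              configMap:
                name: temporal-cleaner-config
```

`concurrencyPolicy: Forbid` gives you a first line of defense against duplicate runs before any application-level locking (see section 6).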
4. Safety-first workflow
- Always start with dry-run to list records that would be removed.
- Run cleanup on a small tenant or test environment first.
- Maintain recent backups for at least one retention cycle beyond the target retention.
- Use role-separated credentials limiting cleanup scope to only necessary tables/paths.
5. Performance considerations
- Batch deletes: prefer batched/partitioned deletes to avoid long-running transactions.
- Use partition drops where possible (e.g., time-partitioned tables) instead of row-level deletes—they’re faster and safer.
- Rate limit cleanup operations to avoid overloading the database during peak hours.
- Monitor query plans and lock contention; prefer non-blocking operations.
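The batched-delete pattern above can be sketched in Python. SQLite stands in for the real backend here, and the table and column names (`events`, `ts`) are illustrative; the point is committing between batches so no single long-running transaction holds locks:

```python
import sqlite3
import time

def delete_in_batches(conn, cutoff, batch_size=500, pause_s=0.0):
    """Delete rows older than `cutoff` in small batches, committing between
    batches to avoid one long-running transaction and its lock footprint."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN "
            "(SELECT rowid FROM events WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_size:  # partial batch: nothing left to delete
            return total
        time.sleep(pause_s)            # crude rate limit between batches

# Demo: 1,500 rows, of which 1,200 fall outside the retention cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, 'x')", [(i,) for i in range(1500)])
deleted = delete_in_batches(conn, cutoff=1200, batch_size=500)
print(deleted)  # 1200, removed across three batches
```

The `pause_s` knob is the rate limit mentioned above; tune batch size and pause to what your database tolerates during its quietest window.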
6. Scheduling and coordination
- Schedule runs during off-peak hours and coordinate across services to avoid simultaneous heavy jobs.
- Stagger cleanup across tenants or shards to smooth resource usage.
- If multiple cleaner instances run, implement leader election or distributed locking to avoid duplicate work.
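One common way to implement the distributed-locking advice above is a lock table with a TTL in a database all instances share. The sketch below uses SQLite purely for illustration, and the table and job names are made up; the insert succeeds for exactly one instance, and stale locks past their TTL are reclaimed:

```python
import sqlite3

def try_acquire_lock(conn, job_name, holder, now, ttl_s=3600):
    """Best-effort lock via a shared table: at most one holder per job.
    `now` is passed in explicitly to keep the sketch deterministic."""
    # Reclaim a lock whose TTL has expired (e.g., a crashed instance).
    conn.execute("DELETE FROM cleaner_locks WHERE job = ? AND expires_at <= ?",
                 (job_name, now))
    cur = conn.execute(
        "INSERT OR IGNORE INTO cleaner_locks (job, holder, expires_at) "
        "VALUES (?, ?, ?)",
        (job_name, holder, now + ttl_s),
    )
    conn.commit()
    return cur.rowcount == 1  # one row inserted means we hold the lock

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaner_locks "
             "(job TEXT PRIMARY KEY, holder TEXT, expires_at INTEGER)")

first = try_acquire_lock(conn, "nightly-cleanup", "instance-a", now=0)
second = try_acquire_lock(conn, "nightly-cleanup", "instance-b", now=10)
print(first, second)  # True False: only one instance wins
```

With PostgreSQL you could use advisory locks instead of a table; the table approach shown here has the advantage of surviving connection drops via the TTL.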
7. Retention strategies
- Fixed window retention: delete anything older than N days—simple and predictable.
- Tiered retention: keep high-resolution recent data (7–30 days), downsample to lower resolution for mid-term (30–365 days), and archive beyond that.
- Per-tenant or per-tag retention for business-critical vs. ephemeral data.
- Legal/compliance exceptions: ensure retention settings respect regulatory requirements.
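The tiered policy above reduces to a simple age-based classification. The thresholds below (30 and 365 days) mirror the ranges in the text and are meant to be adjusted per tenant or tag:

```python
def retention_action(record_age_days, hi_res_days=30, downsample_days=365):
    """Classify a record under a tiered retention policy:
    keep full resolution, downsample, or archive, based on age."""
    if record_age_days <= hi_res_days:
        return "keep"        # recent: full resolution
    if record_age_days <= downsample_days:
        return "downsample"  # mid-term: lower resolution
    return "archive"         # beyond the window: cold storage

print(retention_action(7))    # keep
print(retention_action(120))  # downsample
print(retention_action(400))  # archive
```

Per-tenant retention then becomes a lookup of `hi_res_days`/`downsample_days` per tenant before classifying, with compliance exceptions overriding the defaults.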
8. Monitoring and alerting
- Track key metrics: records deleted, bytes freed, run duration, errors, and rate of deletions.
- Alert on spikes in runtime, failure rate, or unexpected volume changes.
- Log actions with enough metadata to audit what was removed and why.
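An audit entry with enough metadata to reconstruct what was removed and why might look like the following; the field names are illustrative, not a fixed schema:

```python
import json
from datetime import datetime, timezone

def audit_record(run_id, target, filter_desc, rows_deleted, bytes_freed):
    """One structured JSON line per cleanup action, so every deletion can
    be traced back to a run, a target, and the policy that triggered it."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "run_id": run_id,
        "target": target,
        "filter": filter_desc,
        "rows_deleted": rows_deleted,
        "bytes_freed": bytes_freed,
        "action": "delete",
    }, sort_keys=True)

line = audit_record("run-42", "events", "tag=analytics AND age > 90d",
                    1200, 5_242_880)
print(line)
```

Emitting one line per batch (rather than per run) keeps the audit trail useful when a run is interrupted partway through.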
9. Common troubleshooting
- Long-running transactions: switch to partition/epoch-based cleanup or smaller batch sizes.
- Permission errors: confirm service credentials and network access.
- High lock contention: reduce concurrency, lower batch sizes, and align cleanup times with low-traffic windows.
- Unexpected deletions: immediately stop execution, restore from backup if needed, and audit logs to determine cause.
10. Best practices checklist
- Backup: verify backups before enabling destructive runs.
- Dry-run: always validate what will be removed.
- Least privilege: use scoped credentials.
- Partitioning: leverage time-partitioned storage when possible.
- Staggering: avoid running heavy jobs concurrently.
- Monitoring: export metrics and set alerts.
- Documentation: keep retention policies documented and approved.
11. Example workflow (90-day retention)
- Configure cleaner with retention: 90d and mode: dry-run.
- Validate config and run dry-run; review output.
- Schedule weekly dry-run reports for stakeholders.
- After approval, switch to execute mode with conservative batch sizing.
- Monitor first runs closely; verify backup restores in test before relying on automation.
- Move to scheduled runs once stable and observed over multiple cycles.
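The "switch to execute after approval" step can also be guarded programmatically: compare the latest dry-run count against the expected volume before allowing a destructive run. The 25% tolerance below is an arbitrary starting point, not a recommendation:

```python
def safe_to_execute(dry_run_count, expected, tolerance=0.25):
    """Gate the dry-run-to-execute transition: refuse if the dry-run's
    deletion count deviates from the expected volume by more than
    `tolerance` (a sudden spike usually means a config or data problem)."""
    if expected == 0:
        return dry_run_count == 0
    deviation = abs(dry_run_count - expected) / expected
    return deviation <= tolerance

print(safe_to_execute(10_500, expected=10_000))  # True: within 25%
print(safe_to_execute(40_000, expected=10_000))  # False: 4x expected, investigate
```

Wiring this check between the dry-run and execute invocations turns the "unexpected deletions" troubleshooting item in section 9 into a prevented incident instead of a recovered one.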