- Design, deploy, and monitor robust data platform infrastructure to ensure high scalability, reliability, and availability, supporting critical data pipelines and analytics.
- Orchestrated and containerized all data platform services using Docker and Docker Compose, standardizing development environments and reducing setup time by 50%.
- Designed and implemented a comprehensive monitoring system for the data platform using Prometheus, Grafana, and Alertmanager.
- Configured custom alerts for infrastructure and application metrics to trigger real-time notifications via Lark during incidents, achieving 95%+ system uptime.
- Designed, implemented, and maintained CI/CD pipelines using GitHub Actions and self-hosted runners, automating deployments and rollbacks which reduced deployment time by 40% and significantly improved release reliability.
- Implemented and configured OpenMetadata as the centralized data governance and metadata management platform. Established end-to-end data lineage, implemented data quality monitoring, observability dashboards, PII classification, and data security policies, significantly improving data discovery, reliability, and compliance across the organization.