Job Description
As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and performance of our company's digital infrastructure and services. You will work closely with cross-functional teams, including software engineers, system administrators, and DevOps engineers, to design, build, and maintain highly available systems that can handle large-scale traffic and deliver exceptional user experiences. Your primary focus will be on automating operational tasks, optimizing system performance, and monitoring system health.
Responsibilities:
- System Reliability: Monitor and maintain the reliability and availability of the company's digital infrastructure, including servers, networks, databases, and applications.
- Incident Management: Respond to and resolve incidents in a timely manner, ensuring minimal downtime and impact on users. Conduct post-incident analysis and implement preventive measures to avoid future incidents.
- Performance Optimization: Identify system bottlenecks and performance issues, and work with development teams to optimize system performance and scalability.
- Automation: Develop and maintain automation tools and frameworks to streamline operational processes and reduce manual intervention. Automate repetitive tasks and build self-healing systems.
- Continuous Monitoring: Implement monitoring solutions to track system health, performance, and availability. Proactively identify and resolve issues before they impact users.
- Capacity Planning: Collaborate with the infrastructure team to perform capacity planning and ensure that systems have sufficient resources to handle expected growth and traffic spikes.
- Deployment and Release Management: Develop and improve deployment and release processes to ensure smooth and error-free deployments. Implement canary releases and A/B testing strategies.
- Collaboration: Work closely with software engineering and DevOps teams to promote a culture of collaboration and shared responsibility for system reliability and performance.
- Documentation: Create and maintain comprehensive documentation for system configurations, procedures, and troubleshooting guides.
Job Requirements
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
- Strong knowledge of Linux/Unix systems and networking concepts, with at least 6 years of experience.
- Proficiency in at least one programming language (e.g., Python, Go, Java) and experience with scripting languages (e.g., Bash, PowerShell).
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Understanding of agile development methodologies and DevOps principles.
- Strong problem-solving skills and the ability to analyze complex systems.
- Excellent communication and collaboration skills.
Employment Type
Benefits
In 2022, SSI Securities was honored as "Top 1 Financial Services Industry", named for the second consecutive year among the "Top 100 best places to work in Vietnam", and recognized for the 4th time in the "Top 50 Attractive Employer Brands - Vietnamese Enterprises", as announced by Anphabe.
- Highly competitive and negotiable monthly salary
- Attractive package of 13th-month salary, KPI cash bonus, public holiday cash bonus, birthday gift, and Lunar New Year gift
- 12 days of annual leave plus 2 days of paid sick leave
- Premium AON health-care insurance and annual health check
- Luxury team-building trip and varied engagement activities
- Internal leisure clubs: Football, E-Sport, Running, Gym, Yoga
- Fully sponsored career-related training