Principal Site Reliability Engineer (AWS)
Britive delivers a leading cloud-native security solution built for the most demanding cloud-forward enterprises. We were founded by security industry veterans with a track record as successful entrepreneurs. Our platform empowers cloud infrastructure and security teams with a dynamic and intelligent privilege administration technology for multi-cloud environments and helps minimize the risks of cloud security breaches and operational disruptions.
We launched our platform less than a year ago and already count several large and Fortune 500 enterprises as customers, including a global automaker, a top retail brand, a national healthcare provider, and a multinational communications company. Our patent-pending technology has been favorably reviewed by leading analysts at Gartner, Forrester, and TechVision, among others, and we are backed by top-tier VCs and prominent angel investors.
You are a passionate Principal Site Reliability Engineer who wants to develop, scale, and secure our multi-tenant SaaS application on the AWS platform. You have a strong AWS systems infrastructure background as well as infrastructure-as-code development experience. From day one, you must be able to hit the ground running and bring all your experience to the team to make infrastructure management and monitoring much smoother. Most importantly, you have a positive “can do” attitude and a passion for delivering technical solutions in a fast-paced startup environment.
What will you do?
- Responsible for design, implementation and uptime of highly scalable application infrastructure running on AWS
- Responsible for creating and maintaining the IaC framework and managing infrastructure deployment using Terraform
- Configure the infrastructure to generate relevant metrics for uptime monitoring
- Configure tools to handle backups and disaster recovery
- Configure tools to ingest monitoring data and develop alert criteria
- Configure tools to manage application and infrastructure security
- Monitor and fine-tune utilization of AWS resources
- Measure, optimize, and tune system performance to ensure that systems run reliably and are highly available in a 24/7 production environment
- Help debug issues on the platform, such as non-performant queries and failures
What will you need?
- Prior experience in the same role at a SaaS security product company
- Minimum 10 years of industry experience
- Experience building and managing tools that power AWS cloud services
- Experience managing large-scale multi-tenant SaaS applications
- High Availability (HA) and Disaster Recovery (DR) planning and implementation
- Expert knowledge of Terraform
- AWS SysOps Administrator or DevOps Engineer Certification
- Outstanding collaboration and communication skills, including the ability to work effectively with distributed teams
Nice to Have:
- Experience going through compliance audits such as SOC 2 and ISO 2700x