At Chainstack, we are building the most reliable Web3 infrastructure for the next generation of web applications based on blockchain technologies.
Thousands of innovators in blockchain, DeFi, NFT gaming, analytics, and other verticals are empowered by Chainstack's scalable, distributed APIs. We process billions of requests daily and give developers unified, user-friendly access to all prominent Web3 protocols, from Ethereum and Polygon to Solana.
We are looking for a versatile and enthusiastic Site Reliability Engineer: someone who can manage hybrid infrastructure workloads, build robust automation, and troubleshoot blockchain protocol issues of any complexity.
Our Team:
We are building a 24/7 Operations Team using a follow-the-sun approach. The team owns and runs our production services and is responsible for the quality of service our customers receive
Our customers include companies of all types and sizes, as well as individuals, running high-load blockchain-driven systems with strict availability and reliability requirements
We are actively implementing a Well-Architected Framework and SRE practices as part of our customer-first approach
Role and Responsibilities:
Manage complex hybrid infrastructure
Maintain and continuously improve reliability, availability, and scalability objectives
Improve monitoring of our platform and workload distribution
Automate recurring tasks to make the Platform Operations team more efficient
Job Requirements:
Proficiency in containers and orchestration platforms (Kubernetes, HashiCorp Nomad) and infrastructure automation tools (Helm, Terraform, Ansible)
Extensive experience working with public cloud providers (Google Cloud, Amazon Web Services, Microsoft Azure)
Proven track record in DevOps/SRE roles in large production environments
Experience implementing and extending monitoring systems (Grafana, Prometheus, VictoriaMetrics)
Programming skills (Python, Go)
Experience operating bare-metal servers