Site Reliability Engineer (SRE)
SQD offers a suite of development tools and products built on top of a state-of-the-art decentralized data lake. Developers can use custom indexers and data pipelines to serve data to applications across the Decentralized Finance, Metaverse, Social, and NFT verticals.
As we scale, we’re looking for a highly skilled Site Reliability Engineer (SRE) to design, optimize, and maintain our high-availability (HA) blockchain data ingestion pipeline, ensuring sub-second latency and 99.9% uptime. If you’re passionate about blockchain infrastructure, real-time data, and reliability engineering, this is your chance to make a significant impact.
Requirements:
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Strong understanding of HA system design, distributed systems, and low-latency pipelines.
- Experience with monitoring and running blockchain nodes (Ethereum, Solana, etc.), and/or working with node providers as a backup solution.
- Proficiency in Kubernetes, Terraform, Prometheus, Grafana, or equivalent monitoring and orchestration tools.
- Deep knowledge of cloud infrastructure (AWS, GCP, or bare metal setups) and cost optimization strategies.
- Ability to balance performance, reliability, and cost, assessing when to use hosted nodes, subscriptions, or self-hosted setups.
- Willingness to learn the internals of blockchain nodes and EVM/SVM data, and to work with engineers to integrate new chains.
- Experience in defining and implementing metrics, logging, and alerting to ensure pipeline health and prevent incidents.
- Programming skills (Python, Go, Rust, or Bash) for automation and infrastructure tooling.
Responsibilities:
- Design, build, and optimize a high-availability blockchain data ingestion pipeline with sub-second latency and 99.9% uptime.
- Identify and implement the right tools to run and monitor blockchain nodes, with failover solutions using node providers.
- Continuously assess infrastructure trade-offs (hosted nodes, subscriptions, bare metal) to achieve optimal performance, reliability, and cost-efficiency.
- Work closely with engineers integrating new chains, making necessary patches to the ingestion pipeline.
- Define and maintain key SRE metrics, logging, and alerting, proactively identifying and resolving reliability risks.
- Learn from past incidents and stay ahead of new ones by continuously improving observability, automation, and fault tolerance.
- Contribute to incident response, troubleshooting, and on-call rotations.
Benefits:
- Competitive salary + industry-leading token incentives.
- Fully remote work with flexible hours.
- Work on cutting-edge blockchain infrastructure and real-time data challenges.
- Be part of a highly autonomous, engineering-driven team.
- Fast-moving, high-impact startup culture where your work directly shapes the future of on-chain data.
