[Remote] Senior Site Reliability Engineer, Observability
Note: This is a remote position open to candidates in the USA.

Chainlink Labs is the industry-standard oracle platform powering decentralized finance (DeFi). As a Senior Site Reliability Engineer focused on Observability, you will enhance the reliability and performance of the company's observability infrastructure while supporting engineering teams in troubleshooting and deploying new products.

Responsibilities
- Build and orchestrate a modern OTEL-based observability platform
- Support multiple telemetry types, such as metrics, logs, and traces
- Define and support modern governance in observability and problems at scale
- Ensure reliability, security, and performance exceed our defined SLAs
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
- Lead the design and deployment of monitoring/observability services to detect and alert the team of needed action
- Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline
- Oversee the availability, performance, and supportability of our observability infrastructure
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data
- Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release
- Champion reliability and security by taking the time to do your work right the first time

Skills
- 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before
- Ability to develop software outside of the scope of typical infrastructure requirements and configurations
- Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
- Expert knowledge in all aspects of designing, developing, and managing large real-time systems
- Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution like an ELK Stack, Splunk, or Grafana Stack
- Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them
- Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews
- Excitement for blockchain, Web 3.0, and similar decentralized technologies
- Experience running any infrastructure in the blockchain/web3 space
- Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Experience working remotely in a distributed team
- A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil

Company Overview
Chainlink Labs provides open-source blockchain oracle solutions and specializes in the development and integration of Chainlink. It was founded in 2014 and is headquartered in San Francisco, California, USA, with a workforce of 501-1000 employees. Its website is https://chainlinklabs.com/.