[Remote] Senior Site Reliability Engineer
Note: This is a remote position open to candidates in the USA.

Attain Finance is a leading consumer credit lender with over 50 years of expertise in providing credit solutions across the U.S. The company is seeking a Senior Site Reliability Engineer to enhance the reliability and operational excellence of its software delivery systems. The role involves hands-on work across various technologies to ensure the stability and efficiency of its applications in production.

Responsibilities
- Build and operate the delivery platform. Work across AWS, EKS, ArgoCD, Helm, GitHub Actions, Azure DevOps, Terraform, and Python.
- Fix the problems you own. Find root cause across the AWS and Kubernetes stack, fix it, and harden it so it stays fixed.
- Respond to incidents. Help stabilize during outages, drive root-cause analysis, and ship corrective actions for your systems.
- Standardize how we build and ship. Define reproducible container builds and GitOps paths on ArgoCD and Helm that replace manual deployment.
- Help consolidate the CI estate. Standardize pipelines across GitHub Actions and Azure DevOps for your services: remove brittle steps and silent failures, and improve visibility.
- Support platform adoption. Build golden-path templates and tooling, and help teams move services onto the platform.
- Use progressive delivery. Canary and blue-green deploys (Argo Rollouts) and automated rollback for the services you operate.
- Build observability in. Wire golden-signal metrics, logs, and traces (Prometheus/Mimir, Loki, Tempo, OpenTelemetry) into your services, surfaced in Grafana with SLOs for your domain.
- Operate production systems. Troubleshoot failed deployments, respond to alerts, and improve behavior based on real incidents.
- Help meet SLOs and carry on-call. Track reliability metrics for the services you operate and share the rotation.
- Build across environments. Design dev, test, and prod for safe promotion, recovery from failed deployments, and zero-downtime upgrades.
- Help set the standard. Build reference implementations for build, deploy, GitOps, promotion gates, and observability.
- Uphold compliance through the pipeline. Support deployment traceability, approval trails, and segregation of duties for PCI DSS, SOC 2, SOX, and GLBA.
- Cut toil and cost. Automate repetitive ops work and help tune EKS compute, CI runners, and observability cardinality.
- Unblock across teams. Get hands-on with Cloud, Security, Application Engineering, Data, and Product to keep delivery moving.
- Kill knowledge silos. Write docs, runbooks, and incident learnings so engineers can operate independently.

Skills
- Kubernetes, ArgoCD, Helm, Terraform, Python. Deep hands-on production experience.
- Hands-on AWS. Operate and debug EKS, ECS, EC2, ECR, IAM/IRSA, VPC networking, ALB/NLB, CloudWatch, Secrets Manager, and KMS.
- GitHub Actions and/or Azure DevOps. Build and operate CI/CD at scale.
- Grafana and the observability stack. Hands-on with Grafana dashboards and alerting, and the metrics, logs, and traces stack (Prometheus/Mimir, Loki, Tempo, OpenTelemetry).
- Strong scripting. Python and Bash, with the ability to grow into systems-level coding.
- Production troubleshooting. Comfortable getting into a system under load, finding root cause, and fixing it.
- Production ownership. Uptime and reliability accountability.
- Incident response. You respond and help drive postmortems that yield real improvements.
- Standards contribution. You contribute to engineering standards and best practices.
- Compliance awareness. Experience in regulated or high-rigor environments, or implementing audit and access controls in pipelines.
- Mentorship. Through code review, examples, and pairing.
- 5+ years in site reliability, platform, DevOps, or software engineering, with production ownership of systems or pipelines.
- Advanced GitOps. ArgoCD (or Flux), reusable Helm patterns, Argo Rollouts.
- CI consolidation or migration. Moving between CI systems, such as Azure DevOps to GitHub Actions.
- Self-hosted observability at scale. Running Grafana, Mimir, Loki, and Tempo in production.
- Supply chain security. SBOMs, artifact signing (Sigstore/cosign), SLSA provenance.
- Platform migrations. Contributing to modernization with minimal disruption.
- .NET / C#. Enough to containerize and reason about application workloads.
- Low-level Kubernetes. Cilium/eBPF, Karpenter, or self-hosted networking and autoscaling.
- Resilience testing. Chaos/failure injection or disaster recovery drills.
- AI-assisted tooling. Responsible use with output validation.
- Certification. AWS Solutions Architect, AWS DevOps Engineer, or CKA/CKAD.
- Degree in computer science or equivalent practical experience.

Benefits
- Flexible Paid Time Off Program
- Medical
- Dental
- Vision
- Life Insurance
- Disability
- Other voluntary coverages
- 401(k) program with a company match, starting on the first of the month following 30 days of employment

Company Overview
Attain Finance offers consumer credit lending and personal loan services through multiple brands in the U.S. and Canada. It was founded in 1997 and is headquartered in California, Kentucky, USA, with a workforce of 1001-5000 employees. Its website is https://attainfinance.com.