[Remote] Senior Site Reliability Engineer
Note: This is a remote position open to candidates in the USA.

OfficeSpace Software is a leading provider of an AI operating system for the built world, focusing on performance and reliability in workplace environments. The company is seeking a Senior Site Reliability Engineer to enhance the performance, reliability, and cost efficiency of its production platform while transitioning to AI-assisted reliability engineering.

Responsibilities
- Drive measurable improvements in latency, throughput, and availability across a large-scale production environment
- Own system performance, from Linux internals to Kubernetes scheduling, and eliminate bottlenecks before customers feel them
- Define and enforce SLIs, SLOs, and error budgets that balance speed, reliability, and growth
- Partner with application engineers to profile code paths, improve execution efficiency, and harden services under real load
- Lead database performance optimization across queries, indexing, replication, and workload isolation
- Design and oversee AI-assisted load testing, stress testing, and capacity planning workflows
- Guide the migration from monolithic deployments to multi-tenant Kubernetes platforms
- Reduce infrastructure spend through architectural decisions, right-sizing, and intelligent scaling strategies
- Build and supervise automation for infrastructure provisioning, configuration management, and observability
- Set clear operational standards for reliability, performance, and incident response, and raise the bar for how we run production

Skills
- 7+ years operating and evolving large-scale production systems
- Deep Linux systems expertise with hands-on performance tuning across CPU, memory, disk, and networking
- Strong Python skills for automation, tooling, and AI-assisted systems workflows
- Production experience with Ruby/Rails ecosystems, including Puma and Sidekiq
- Proven ability to diagnose and resolve complex database performance issues (MySQL/MariaDB or PostgreSQL)
- Advanced Kubernetes experience: workload sizing, scheduling, and multi-tenant operations
- Infrastructure-as-code mastery using Terraform and Terragrunt
- Experience with configuration management tools such as Puppet or Ansible
- Strong observability instincts across metrics, logs, and traces using tools like Prometheus, Grafana, Datadog, or ELK
- AI fluency: comfortable supervising AI agents for analysis, testing, and reporting, and validating their outputs
- A builder mindset. You move fast, take ownership, and raise standards
- Scaling and refactoring monolithic applications under real production load
- Extracting databases or stateful components from monoliths
- Apache and Nginx tuning at scale
- Redis performance optimization and operational management
- CI/CD systems and GitOps workflows, including ArgoCD
- Cloud cost optimization and FinOps-aligned operational practices

Benefits
- Competitive Benefits and Rewards: comprehensive and competitive benefits packages globally, designed to support our team’s health, well-being, and financial security

Company Overview
OfficeSpace Software is the leading AI-powered workplace management platform that helps organizations plan, connect, and perform at scale. It was founded in 2006 and is headquartered in Alpharetta, Georgia, USA, with a workforce of 201-500 employees. Its website is https://www.officespacesoftware.com.

Company H1B Sponsorship
OfficeSpace Software has a track record of offering H1B sponsorships, with 1 in 2022. Please note that this does not guarantee sponsorship for this specific role.