[Remote] Staff Platform Engineer

Remote, Full-time
Note: This is a remote role open to candidates in the USA.

Rezdy is hiring a Staff DevOps Engineer to join their new product, Manifest, in a dynamic environment. The role involves owning critical infrastructure, improving developer experience, and collaborating closely with product engineers and DevOps leadership.

Responsibilities
- Work on a team with two other platform engineers
- Own and evolve the infrastructure that supports Manifest, including AWS environments, networking, compute, data services, observability, CI/CD, and operational tooling
- Work with Pulumi and TypeScript to define, maintain, and improve infrastructure as code across the platform
- Support and improve our containerized application platform, including deployment pipelines, rollback mechanisms, and runtime configuration
- Help operate and harden our data infrastructure, including connection pooling, backups, disaster recovery, replication, and safe schema-change practices
- Partner with engineers to improve the reliability and safety of releases, including database migrations, deployment workflows, environment management, and production readiness checks
- Improve CI/CD workflows so that builds, tests, infrastructure changes, and deployments are fast, reliable, and easy for engineers to understand
- Lead observability and incident readiness work, including alerting, dashboards, SLOs, runbooks, incident response practices, and post-incident follow-up
- Help ensure the platform is secure, cost-conscious, and maintainable as the product scales
- Mentor engineers on infrastructure, operations, reliability, and production ownership

Skills
- Deep production experience with AWS, especially services such as ECS/Fargate, RDS/Aurora PostgreSQL, VPC networking, load balancing, IAM, KMS, Secrets Manager, CloudFront, WAF, and related managed services
- Experience designing and operating systems that serve a global user base, with seamless multi-region availability and disaster recovery procedures
- Treats reliability, scalability, performance, and observability as first-class design constraints, building these into designs from the start rather than bolting them on later
- Strong infrastructure-as-code experience. Pulumi with TypeScript is ideal, but deep experience with Terraform or another mature IaC approach is also valuable
- Strong operational knowledge of PostgreSQL, including performance investigation, connection pooling, backups, replication, locking, migrations, and safe schema-change patterns
- Experience designing and maintaining CI/CD systems, ideally with GitHub Actions, OIDC-based cloud authentication, container builds, environment promotion, required checks, and deployment gates
- Experience supporting containerized production workloads and improving deployment safety, rollback strategies, and runtime reliability
- Strong observability and incident response experience, including metrics, logs, traces, alerting, dashboards, runbooks, and post-incident learning
- The ability to work effectively in ambiguity, make pragmatic tradeoffs, and communicate clearly with both infrastructure specialists and product engineers
- A track record of raising the engineering bar through reusable patterns, documentation, automation, mentoring, and thoughtful technical leadership

Company Overview
Rezdy is the world's leading online booking and distribution platform powering the experiences industry. It is a sub-organization of Checkfront. It was founded in 2011 and is headquartered in Sydney, New South Wales, Australia, with a workforce of 51-200 employees. Its website is http://rezdy.com.
