Cloud Engineer

Remote, Full-time
Position Overview:

ShyftLabs is seeking a highly skilled Cloud Engineer (Senior, Data Platforms) to join our team and lead the design, implementation, and management of cloud infrastructure for our innovative GenAI applications. This role will be instrumental in building a robust platform that enables rapid experimentation and deployment while maintaining enterprise-grade security and reliability.

ShyftLabs is a growing data product company, founded in early 2020, that works primarily with Fortune 500 companies. We deliver digital solutions built to help accelerate the growth of businesses in various industries by focusing on creating value through innovation.

Job Responsibilities:

Cloud Infrastructure Management
- Design, provision, and maintain cloud resources across AWS (primary), with capabilities to work in Azure and Google Cloud environments
- Manage end-to-end infrastructure for full-stack GenAI applications including:
  - Database systems (Aurora, RDS, DynamoDB, DocumentDB, etc.)
  - Security groups and IAM policies
  - VPC architecture and network design
  - Container orchestration (ECS, EKS, Lambda)
  - Storage solutions (S3, EFS, etc.)
  - CDN configuration (CloudFront)
  - DNS management (Route53)
  - Load balancing and auto-scaling

Data & AI Platforms
- Design feature stores, vector stores, data ingestion frameworks, and lakehouse architectures
- Manage data governance, lineage, masking, and access controls around data products

Serverless Architecture
- Design and implement serverless solutions using AWS Lambda, API Gateway, and EventBridge
- Optimize serverless applications for performance, cost, and scalability
- Implement event-driven architectures and asynchronous processing patterns
- Manage serverless deployment pipelines and monitoring

Disaster Recovery & High Availability
- Architect and implement comprehensive disaster recovery strategies
- Design multi-region failover capabilities with automated recovery procedures
- Implement RTO/RPO requirements through backup strategies and replication
- Build auto-failover mechanisms using Route53 health checks and failover routing
- Create and maintain disaster recovery runbooks and testing procedures
- Ensure data durability through cross-region replication and backup strategies

Platform Development
- Build and maintain a self-service platform enabling rapid experimentation and testing of GenAI applications
- Implement Infrastructure as Code (IaC) using Terraform for consistent and repeatable deployments
- Create streamlined CI/CD pipelines that support local-to-dev-to-prod workflows
- Design systems that minimize deployment time and maximize developer productivity
- Establish quick feedback loops between development and deployment

Monitoring & Operations
- Implement comprehensive monitoring, observability, and alerting solutions
- Set up logging aggregation and analysis tools
- Ensure high availability and disaster recovery capabilities
- Optimize cloud costs while maintaining performance

DevOps Excellence
- Champion DevOps best practices across the organization
- Automate infrastructure provisioning and application deployment
- Implement security best practices and compliance requirements
- Create documentation and runbooks for operational procedures

Basic Qualifications:

Technical Skills
- 5+ years of hands-on experience with AWS services
- 2+ years of hands-on experience with Databricks
- Expert-level knowledge of AWS core services (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS)
- Expert-level knowledge of Databricks capabilities
- Familiarity with SageMaker, Bedrock, or Anthropic/Claude API integration
- Strong proficiency with Terraform for infrastructure automation
- Demonstrated experience with containerization (Docker, Kubernetes)
- Solid understanding of networking concepts (subnets, routing, security groups, VPN)
- Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline)
- Proficiency in scripting languages (Python, Bash, PowerShell)

Serverless & Event-Driven Architecture
- Extensive experience with AWS Lambda, API Gateway, ECS, Step Functions
- Knowledge of serverless frameworks (SAM, Serverless Framework)
- Experience with event-driven patterns using SNS, SQS, EventBridge
- Understanding of serverless best practices and optimization techniques

Disaster Recovery & Business Continuity
- Proven experience designing and implementing DR strategies in AWS
- Expertise in multi-region architectures and data replication
- Experience with AWS backup services and cross-region failover
- Knowledge of RTO/RPO planning and implementation
- Hands-on experience with Route53 health checks and failover routing policies

Cloud Platform Experience
- Primary: AWS (extensive experience required)
- Secondary: Azure and Google Cloud Platform (working knowledge)
- Multi-cloud architecture understanding

Monitoring & Observability
- Experience with monitoring tools (CloudWatch, Datadog, Prometheus, Grafana)
- Log management systems (ELK stack, Splunk, CloudWatch Logs)
- APM tools and distributed tracing

Preferred Qualifications:
- AWS certifications (Solutions Architect, DevOps Engineer)
- Databricks certifications
- Experience with open-source LLMs, embedding models, and RAG-based applications
- Experience with chaos engineering and resilience testing
- Knowledge of security frameworks and compliance (SOC2, HIPAA, PCI)
- Experience implementing complex build systems for monorepo microservices architectures
- Background in building developer platforms or internal tools
- Experience with Infrastructure as Code testing frameworks

Additional Information:

We are proud to offer a competitive salary alongside a strong healthcare insurance and benefits package. The role is preferably hybrid, with 2 days per week spent in the office, and flexibility for client engagement needs. We pride ourselves on the growth of our employees, offering extensive learning and development resources.

ShyftLabs is an equal-opportunity employer committed to creating a safe, diverse, and inclusive environment. We encourage qualified applicants of all backgrounds, including ethnicity, religion, disability status, gender identity, sexual orientation, family status, age, nationality, and education levels, to apply. If you are contacted for an interview and require accommodation during the interviewing process, please let us know.

