[Remote] Senior Engineer - Platform Integration
Note: This is a remote position open to candidates in the USA.

Core42 is a leader in AI-powered cloud and digital infrastructure, driving transformative technology solutions globally. They are seeking a Senior Engineer - Platform Integration to design, build, and operate core GPUaaS control plane services while collaborating with infrastructure teams to manage GPU resources at scale.

Responsibilities
- Design, build, and operate core GPUaaS control plane services
- Develop backend APIs and microservices (Python, Go, or Node.js)
- Integrate deeply with Kubernetes APIs for provisioning, scheduling, and multitenancy
- Build and maintain authentication, authorization, and identity systems (OAuth2, SSO, RBAC, LDAP)
- Design and implement usage tracking and billing systems with strong correctness guarantees
- Design PostgreSQL schemas optimized for scale, auditing, and reliability
- Build CI/CD pipelines and deployment automation for platform services
- Collaborate with infrastructure teams to surface GPU and system telemetry
- Own systems in production, including reliability, failure modes, and performance

Skills
- 4–7 years of software engineering experience in backend, platform, or infrastructure roles
- Strong backend engineering experience in Python (FastAPI), Go, or Node.js
- Hands-on experience with Kubernetes in production environments
- Experience building and operating REST and/or gRPC APIs
- Strong understanding of microservices architecture and cloud-native systems
- Experience with PostgreSQL schema design, performance, and migrations
- Familiarity with authentication/authorization systems (OAuth2, SAML, JWT, RBAC)
- Experience working on systems that require high reliability and correctness under failure conditions
- Ability to operate independently in ambiguous or greenfield environments
- Experience with GPU infrastructure, HPC environments, or AI/ML platforms
- Experience with Kubernetes controllers, operators, Helm, or cluster lifecycle tooling
- Exposure to Slurm or hybrid Kubernetes/HPC scheduling systems
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry)
- Experience building developer platforms or internal infrastructure tools
- Familiarity with MLOps tooling (Kubeflow, MLflow, PyTorch pipelines)
- Experience with GitOps workflows (ArgoCD, Flux, etc.)
- Experience working at cloud providers or infrastructure-heavy SaaS companies
- Exposure to distributed scheduling systems or resource orchestration platforms
- Experience with high-scale multi-tenant systems

Benefits
- Bonus and benefits on top
- Reasonable accommodations for qualified individuals with disabilities throughout the application and employment process

Company Overview
Core42 is a developer of foundation models to empower organizations in different industries. It was founded in 2021 and is headquartered in Abu Dhabi, United Arab Emirates, with a workforce of 1001-5000 employees. Its website is https://www.core42.ai.