[Remote] Network DevOps Engineer, RDMA Fabric Automation - Multiple Openings
Note: This is a remote position open to candidates in the USA.

Vultr is a leading cloud infrastructure company focused on providing high-performance solutions for enterprises and AI innovators. They are seeking a Network DevOps Engineer to automate and operate RDMA-based Ethernet fabrics, ensuring reliable network performance at a global scale.

Responsibilities
- Automate deployment and operations of large-scale RDMA (RoCEv2) Ethernet fabrics across Vultr data centers
- Build Ansible and Python-based frameworks to provision, validate, and remediate underlay and overlay networks
- Integrate network automation with Vultr’s source-of-truth systems (NetBox, OpsMill) for intent-driven configuration and validation
- Develop telemetry ingestion and correlation pipelines (gNMI, Prometheus, Kafka, custom collectors) for real-time network health and performance metrics
- Collaborate with platform, orchestration, and product engineering teams to optimize RDMA performance, PFC/ECN behavior, and path symmetry across fabrics
- Implement CI/CD workflows for network configuration changes: validation, pre-checks, and rollbacks
- Investigate complex network behaviors across layers: flow hashing, congestion domains, ECMP, and overlay interactions
- Contribute to the design of next-generation GPU and AI interconnect fabrics, ensuring seamless integration into Vultr’s global network architecture

Skills
- Solid understanding of modern data center networking: EVPN-VXLAN, BGP, MLAG, QoS, and traffic engineering
- Deep familiarity with RoCEv2, RDMA transport tuning, ECN/PFC, and lossless Ethernet design
- Strong experience with automation frameworks like Ansible, and languages like Python, Golang, Rust, or PHP
- Comfort working with telemetry and monitoring stacks such as Prometheus, Grafana, Loki, ELK, or similar
- Previous experience integrating with NetBox, Nautobot, OpsMill, or similar for topology and configuration source-of-truth
- Familiarity with CI/CD systems (GitHub Actions, Jenkins, ArgoCD) for continuous delivery of network automation
- Strong Linux networking background, including namespaces, netlink, and system-level debugging

Benefits
- 100% company-paid insurance premiums for employee medical, dental, and vision plans
- 401(k) plan that matches 100% up to 4%, with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Commitment matters to Vultr! Increased PTO at 3-year and 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 stipend for remote office setup in first year + $400 each following year
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription

Company Overview
Vultr is an AI cloud infrastructure platform offering latest-generation NVIDIA GPUs and AMD CPUs and GPUs across 32 worldwide regions. It was founded in 2014 and is headquartered in West Palm Beach, Florida, USA, with a workforce of 201-500 employees. Its website is https://www.vultr.com.

Company H1B Sponsorship
Vultr has a track record of offering H1B sponsorships, with 1 in 2024. Please note that this does not guarantee sponsorship for this specific role.