[Remote] Sr Manager, Platform Engineering
Note: This is a remote position open to candidates in the USA.

Flexential is hiring a Platform Engineering leader in the IT organization to plan roadmaps, establish requirements, and develop and operationally manage platform technologies including Observability, DevOps, ITSM, and Integrations. This role involves leading a team of platform engineers and being accountable for platform reliability, security, and delivery timelines, with a focus on building foundational platforms for Flexential's IT services.

Responsibilities
- Lead the design, development, deployment, and operational management of automated, resilient, highly available, self-healing, secure platforms with native AI capabilities for IT needs, serving both internal and customer business capabilities
- Lead, build, and manage the Platform Engineering team: hiring, mentoring, performance management, and technical roadmap ownership
- Plan, build, and operate an OpenTelemetry Observability platform with technologies including Grafana, Mimir, Loki, Tempo, and Alertmanager on Kubernetes/RKE2 using Helm and ArgoCD
- Build an automated federated Observability Edge Stack: Prometheus + OTel collector nodes deployed per site, Zabbix auto-discovery configuration, and a Prometheus scrape profile library for 10+ device classes (Cisco, Juniper, Dell, NetApp, etc.)
- Design, develop, and manage engineering lifecycle platforms for high-velocity secure SDLC using GitLab and similar or related technologies
- Build and operate IaC and CI/CD platforms including GitLab CI/CD, Terraform, Ansible AWX, Helm, and ArgoCD for automated provisioning and application deployment
- Own, enhance, and operate critical IT platform technologies, e.g. Boomi for integrations and AWS for cloud environments, including their hosted infrastructure
- Establish and enforce platform security posture: secrets management via CyberArk/Conjur, RBAC, mTLS, compliance boundary design, and zero inbound telemetry architecture
- Build and integrate ITSM capabilities for various platforms, e.g. automated incident creation, CI enrichment, and CMDB correlation
- Define and implement extensibility patterns including AIOps, e.g. anomaly detection hooks, event correlation pipeline design, and integration with future ML/AI tooling
- Partner with other IT and business teams for App Dev, requirements capture, delivery validation, and integration needs
- Represent platform engineering in cross-functional architecture reviews and executive-level program updates
- Perform other management and technical duties as required and assigned for team and operational resilience, e.g. team building, on-call rotation, etc.
- Travel may be required to team or project events

Skills
- 12+ years of relevant technical experience, with 4+ years in a management (or Principal-level) role leading an engineering team
- DevOps / Platform Engineering: 8+ years; end-to-end ownership of developer/infrastructure platforms; Kubernetes, Helm, ArgoCD, service mesh, containerized workloads
- GitOps / CI-CD: 5+ years; GitLab CI/CD, pipeline authoring, infrastructure-as-code delivery
- 8+ years of expert-level experience with automation frameworks using Python, Terraform, Ansible, etc.
- Infrastructure (Linux/VM): 8+ years; Linux systems administration, VM lifecycle (VMware vCenter/VCF), NetApp storage and compute provisioning
- Working knowledge of Networking: 3+ years; TCP/IP, BGP/OSPF, SNMP
- AI tooling: strong understanding of (or 1+ years of experience with) MCP, agentic workflows, and SRE workflows, e.g. AIOps for anomaly detection, event correlation, and alert noise reduction on the Prometheus and Grafana stack
- Experience with Secrets & Security: 4+ years; CyberArk, Conjur, Vault, or equivalent; RBAC design, compliance boundary architecture
- Engineering Management: 4+ years; hiring, team building, performance management, roadmap ownership for teams of 5+ engineers
- Other training and experience may be substituted for the job requirements at the discretion of the manager
- Hands-on experience or working knowledge of Boomi integration PaaS (iPaaS) technologies
- Experience with design and development of DR test application/automation and process workflows for corporate BCP execution
- Hands-on experience working with AWS products in a Well-Architected Framework and multi-account model to develop various compute, storage, and network IaaS and PaaS services for IT applications
- Hands-on experience working with BAS / BMS systems in a datacenter / OT environment

Benefits
- Discretionary annual bonus, based on personal and company performance.
- Benefits of working at Flexential: Benefits are subject to change at the Company's discretion.
- Flexential participates in the E-Verify program.

Company Overview
Flexential provides IT solutions including integrated colocation, interconnection, cloud, data protection, and professional services. It was founded in 2000 and is headquartered in Charlotte, North Carolina, USA, with a workforce of 501-1000 employees. Its website is https://www.flexential.com/.