[Remote] Senior Software Engineer
Note: This is a remote position open to candidates in the USA.

Microsoft is on a mission to empower every person and organization on the planet and is seeking a Senior Software Engineer to join its HPC/AI team. This role involves designing and building networking infrastructure for large-scale AI training, focusing on high performance, low latency, and minimal jitter for distributed AI workloads.

Responsibilities
- Design, develop, and optimize networking solutions tailored for large-scale AI training infrastructure
- Architect and implement high-performance, low-latency, and low-jitter communication frameworks for distributed systems
- Benchmark, analyze, and enhance the scalability and reliability of networking systems to handle petabyte-scale data transfer
- Debug and resolve complex networking issues in large-scale, high-performance environments
- Drive identification of dependencies and the development of design documents for a product, application, service, or platform
- Create, implement, optimize, debug, refactor, and reuse code to establish and improve performance, maintainability, effectiveness, and return on investment (ROI)
- Act as a Designated Responsible Individual (DRI) and guide other engineers by developing and following the playbook, working on call to monitor the system/product/service for degradation, downtime, or interruptions, alerting stakeholders about status, and initiating actions to restore the system/product/service for simple and complex problems when appropriate
- Proactively seek new knowledge and adapt to new AI trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale

Skills
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter
- Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience
- In-depth understanding of networking protocols (e.g., Ethernet, TCP/IP, RDMA, gRPC) and distributed systems
- Familiarity with network virtualization, software-defined networking (SDN), or network performance tuning
- Hands-on experience with networking technologies in AI-specific hardware (e.g., InfiniBand, RoCE, NVLink)
- Familiarity with AI accelerators such as GPUs (NVIDIA, AMD) or TPUs, and how they interact with networking infrastructure
- Experience with telemetry and observability tools for network monitoring at scale
- Background in building scalable and fault-tolerant systems in large, distributed environments

Benefits
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Company Overview
Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services. It was founded in 1975 and is headquartered in Redmond, Washington, USA, with a workforce of 10,001+ employees. Its website is https://www.microsoft.com.