[Remote] Site Reliability Systems Engineer
Note: This is a remote job open to candidates in the USA. MKS2 Technologies, LLC is an award-winning, high-growth small business that creates innovative technology solutions for government agencies. The Site Reliability Systems Engineer will work with the IST/System Engineering Team to enhance system reliability and incident resolution, utilizing modern monitoring tools and collaborating with various teams to improve service quality for veterans.

Responsibilities
- Utilize your skills in enterprise-level triage and incident resolution while gaining experience in VA system infrastructure
- Use modern system monitoring tools to improve VA enterprise reliability and the quality of services provided to veterans
- Work with system and application owners to obtain existing design and functionality, leverage comprehension of workflow systems and application processes within multiple system environments, and work across technology and development teams to diagnose outages and recommend changes to increase reliability
- Use your hardware and software experience to help strengthen the systems the VA relies on. Your primary focus will be investigation, working with event management, application owners, DevOps teams, and system and network administrators to examine issues across enterprise applications and technology stacks
- Partner with system and application owners to understand their platform designs and how they operate across different environments. This insight will help you diagnose outages, trace workflow issues, and recommend changes that enhance stability
- Collaborate with developers and identity and access teams when deeper technical investigations are needed
- You'll gain hands-on experience with enterprise-level triage and incident analysis, which will deepen your understanding of the VA's infrastructure. Tools like SolarWinds, Dynatrace, and Splunk will be part of your daily workflow, giving you the visibility needed to identify reliability concerns and support improvements to the services delivered to veterans

Skills
- Deep expertise (3+ years) in two or more of the following tools used for troubleshooting application logging in an enterprise environment (Dynatrace, Splunk, SolarWinds, ServiceNow Operator Workspace)
- Extensive experience in one or more Technology Areas (Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, Microsoft or Oracle Database)
- 8+ years of experience working with key indicators for IT system operability, reliability, application performance, and code quality
- 8+ years of experience deploying, maintaining, and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
- 1+ years of experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation
- Experience using Microsoft Office, including Word, Excel, and PowerPoint
- 2+ years independently leading a team to solve difficult technical challenges
- HS diploma or GED and 20+ years of relevant professional experience, or MA or MS degree in computer science, electronics engineering, or other engineering or technical discipline with 10+ years of relevant professional experience
- Experience with test-driven development, distributed systems, microservices, and cloud-native application implementation
- Experience with the following tools: Oracle Enterprise Manager, Riverbed - Aternity, and ServiceNow VTBs
- Possession of excellent written and verbal communication skills
- Possession of strong critical thinking and error assessment capabilities
- Virtual team management
- Public Trust Clearance

Company Overview
MKS2 is a technology business providing services to the federal government and commercial clients.
It was founded in 2008 and is headquartered in Austin, Texas, USA, with a workforce of 201-500 employees. Its website is https://www.mks2.com.