[Remote] Sys/Cloud Admin/Incident Response Engineer

Remote Full-time
Note: This is a remote job open to candidates in the USA. i4DM is a company that provides Federal agencies with access to highly skilled professionals for complex mission challenges. They are seeking an experienced Sys/Cloud Admin/Incident Response Engineer to support enterprise monitoring operations, incident detection, and response activities for a mission-critical platform within the Department of Veterans Affairs environment.

Responsibilities

Administer, monitor, and support cloud and platform services, virtual infrastructure, and hosted applications to maintain system health, availability, and performance
Configure, tune, and maintain monitoring, logging, and alerting solutions to improve visibility across infrastructure, applications, and service dependencies
Validate alert accuracy, reduce noise, and help ensure operational issues are detected proactively through effective observability practices
Perform routine system administration tasks such as environment checks, service restarts, access support, patch coordination, and operational maintenance activities
Monitor incident queues and system alerts, perform initial triage, document impact, and execute defined escalation procedures for incidents affecting mission-critical services
Participate in major incident response activities, including troubleshooting, log review, coordination with engineering teams, and support for service restoration efforts
Follow incident response playbooks, severity models, and communication protocols to support timely resolution and accurate status reporting
Document incident timelines, actions taken, recovery steps, and supporting evidence to enable post-incident review and continuous improvement
Support coordination during operational events by working across infrastructure, application, DevSecOps, SRE, and service management teams
Provide clear, timely updates on incident status, service impact, troubleshooting progress, and recovery actions to internal stakeholders
Escalate issues appropriately based on impact, urgency, and established operational procedures
Maintain accurate operational records in ticketing, incident, and knowledge management systems
Partner with engineers and platform teams to improve dashboards, alerts, runbooks, and operational procedures supporting reliable service delivery
Identify recurring operational issues, alert gaps, and system weaknesses, and recommend practical improvements to reduce incident frequency and response time
Support automation efforts for routine operational tasks, alert correlation, remediation workflows, and incident response activities where applicable
Contribute to post-incident reviews, root cause analysis activities, and implementation of corrective or preventive actions
Help maintain operational reporting on incidents, system health, availability, and response metrics to support service-level objectives and operational reviews
Ensure incident records, escalation paths, standard operating procedures, and response documentation remain current and usable
Support compliance with operational policies, security requirements, and change management practices in cloud and enterprise environments
Participate in on-call or after-hours operational support, as required, in a 24x7 mission-driven environment

Skills

Bachelor's degree in Information Technology, Computer Science, Engineering, Cybersecurity, or a related field; equivalent relevant experience may be considered
3+ years of experience in systems administration, cloud operations, site reliability, network operations, incident response, or enterprise production support roles
Hands-on experience supporting Windows and/or Linux server environments, cloud-hosted infrastructure, and enterprise application platforms
Experience with monitoring, logging, and observability tools used to detect, investigate, and troubleshoot service disruptions
Working knowledge of incident management processes, ticketing workflows, escalation practices, and service restoration procedures in ITIL-aligned environments
Ability to analyze logs, alerts, and system behavior to support troubleshooting and rapid issue resolution
Strong written and verbal communication skills, with the ability to document incidents and coordinate effectively across technical and non-technical stakeholders
Ability to work in a 24x7, SLA-driven environment and participate in operational response activities under time-sensitive conditions
Candidates must be eligible to obtain and maintain a Public Trust clearance
Experience supporting VA or other Federal Government environments, including familiarity with operational reporting, service management, and compliance expectations
Experience with cloud and platform technologies such as AWS, Azure, Kubernetes, container platforms, virtualization, or hybrid infrastructure
Familiarity with enterprise monitoring and observability platforms such as Splunk, Dynatrace, CloudWatch, Azure Monitor, Grafana, or similar tools
Experience using scripting or automation tools such as PowerShell, Python, Bash, or infrastructure automation frameworks to streamline operational tasks
Exposure to DevSecOps, Site Reliability Engineering (SRE), SAFe Agile, or modern incident response and post-incident review practices
Relevant certifications such as AWS Certified SysOps Administrator, Azure Administrator Associate, CompTIA Security+, ITIL Foundation, Splunk, or similar credentials

Company Overview

i4DM provides a full range of information technology consulting services to government and commercial clients. It was founded in 2002 and is headquartered in Millersville, Maryland, USA, with a workforce of 51-200 employees. Its website is https://www.i4dm.com.

Similar Jobs

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote

USPS Office Helper

Remote

Experienced Customer Support Representative – Online Remote Jobs at arenaflex

Remote

Experienced Remote Data Entry Operator – Global E-commerce and Technology Giant

Remote

Sr Solutions Architect, Data Center (NYC, NJ)

Remote

Part-Time Executive Assistant

Remote

Entry Level Data Entry Professional for Remote Work Opportunities at arenaflex – Utilizing Technical Skills for Database Management and Quality Assurance

Remote

Senior Brand and Graphics Designer

Remote

Strategic Philanthropy Director | Upstream USA | $125k-$170k | Remote (United States)

Remote

Experienced Customer Success Manager – Driving Long-Term Success for careerzynith Customers

Remote

[Remote/WFM] 911 Operator - Remote Positions Available

Remote

Entry Level Netflix Data Entry - No Experience - Remote Jobs

Remote