[Remote] IBM Workload Scheduler Administration / Infrastructure Engineer

Remote Full-time
Note: This is a remote job, open to candidates in the USA. Kastech Software Solutions Group is seeking a highly skilled IBM Workload Scheduler Administration / Infrastructure Engineer with 3–5+ years of experience. The role involves managing, maintaining, and optimizing enterprise batch scheduling infrastructure, ensuring high availability and reliable execution of critical business workloads.

Responsibilities

IBM Workload Scheduler Administration
- Administer the Production IBM Workload Scheduler (formerly Tivoli Workload Scheduler) environment:
  - 28,000 unique daily jobs
  - Approximately 350,000 daily job runs
  - 44 servers
  - Three additional change-control environments
- Install, configure, administer, patch, and upgrade IWS components:
  - Master Domain Manager (MDM)
  - Dynamic Agents
  - Dynamic Pools
  - Dynamic Workload Console (DWC)

Change Management & Governance
- Work closely with Product Owners and communicate workstreams through Jira
- Manage job promotions using a Workload Application Template-based process
- Perform safety and stability assessments for all job promotions
- Manage change control across four separate environments
- Enforce change management standards, policies, and governance

Platform Availability & Operations
- Maintain and continuously improve the Production platform uptime target of 99.17% per month
- Follow SOPs, DevOps practices, and disciplined change-control processes
- Coordinate platform-impacting communications to a user community of approximately 500 developers and data engineers
- Support Production infrastructure consisting of:
  - 44 servers
  - MDM, DWC, and Agent environments

Troubleshooting & Support
- Resolve:
  - Complex job failures
  - Performance bottlenecks
  - Agent-related issues
  - Infrastructure-related issues
- Provide guidance on complex job scheduling designs to less experienced team members

Monitoring, Security & Compliance
- Monitor scheduler platform health and performance
- Manage database maintenance activities
- Perform backup, disaster recovery, and monthly failover testing
- Define and maintain:
  - Security policies
  - User authorizations
  - Authentication for Dynamic Workload Console (DWC)
- Respond to:
  - Cybersecurity vulnerability assessments
  - PCI compliance audits
  - Other regulatory audit requests

Automation & DevOps
- Design and implement Ansible-based automation solutions
- Develop self-healing mechanisms to reduce unplanned outages
- Coordinate with offshore teams performing SOP activities during non-business hours
- Develop automation scripts using:
  - Python
  - IWS REST APIs

Skills
- Ability to modernize, implement, install, configure, upgrade, migrate, develop, or design IBM Workload Scheduler (IWS) / IBM Workload Automation (IWA) solutions
- Support migration activities across pre-production and production environments
- Participate in knowledge transfer and documentation to enable team self-sufficiency
- 3–5+ years of dedicated IBM Workload Scheduler administration experience
- Responsible for managing, maintaining, and optimizing enterprise batch scheduling infrastructure
- Primary environment hosted on Red Hat Enterprise Linux (RHEL)
- Strong expertise in IBM Workload Scheduler (IWS), Linux system administration, and scripting and automation
- Focus on ensuring high availability and reliable execution of critical business workloads
- Strong experience with IBM Workload Scheduler architecture, especially Dynamic Workload Broker, V10.1+, and high availability of MDMs managing Fault Tolerant Agent and Dynamic Agent architectures
- Strong conceptual understanding of Master Domain Manager (MDM), Backup MDM (BMDM), Dynamic Workload Console (DWC), Fault Tolerant Agent (FTA), and Dynamic Agent (DA)
- Strong grasp of the conman CLI to monitor and control the production plan and check job, job stream, and resource status
- Strong grasp of the composer CLI to define, modify, and extract scheduling objects
- Strong grasp of the planman CLI to control the pre-production plan and GUI mirroring
- Strong grasp of the lifecycle of the daily production planning process and the phases of JnextPlan/FINAL
- Proficiency in navigating the DWC web-based GUI to monitor workloads, manage user access security, and define scheduling objects
- Experience installing IWS components and applying Fix Packs and Interim Fixes
- Troubleshooting with logs under TWSDATA/stdlist and adjusting trace levels for netman, batchman, writer, mailman, etc.
- Strong experience with IBM WebSphere Liberty
- Strong grasp of reading messages.log, traces.log, and FFDC logs
- Strong grasp of configuring JVM heap sizes
- Strong grasp of configuring tracing scope, tracing levels, and tracing retention
- Strong experience with Red Hat Enterprise Linux 8+
- Deep familiarity with bash/shell commands for text processing (for example, grep, awk, sed), file manipulation, and system navigation
- Ability to manage, start, stop, and troubleshoot systemd services using systemctl and journalctl for IWS agents and MDM
- Managing user accounts, groups, and service accounts, with deep knowledge of Linux file permissions (chmod, chown, ACLs on local filesystems and NFS)
- Ability to monitor system performance using tools like top, htop, vmstat, iostat, and sar to troubleshoot bottlenecks and platform unresponsiveness
- Understanding of Logical Volume Manager (LVM) and filesystem usage
- Checking TCP port availability, firewall rules (firewalld/iptables), and connectivity between MDM and Dynamic Agents using netstat, ss, ping, curl, etc.
- Managing SSL/TLS certificates, private keystores, and public truststores, and working with a Certificate Authority
- Strong experience with scripting (Bash, Python, etc.) for automation
- Understanding of networking principles
- Understanding of basic Oracle database administration, enough to work with DBAs to prove when an issue is in Oracle
- Understanding of basic SQL to query job metadata
- Understanding of checking database connectivity
- Understanding of AWS cloud infrastructure
- Experience using a secrets manager (CyberArk PPM, HashiCorp Vault, or similar)

Company Overview
Kastech Software Solutions Group, incorporated in 2007 and headquartered in Richmond, Texas, is a leading global IT services and consulting company delivering technology-driven solutions to organizations across industries. It was founded in 2008, and is headquartered in Houston, Texas, USA, with a workforce of 1001-5000 employees.
Its website is https://www.kastechssg.com.

Company H1B Sponsorship
Kastech Software Solutions Group has a track record of offering H1B sponsorships, with 13 in 2026, 94 in 2025, 65 in 2024, 101 in 2023, 124 in 2022, 171 in 2021, and 119 in 2020. Please note that this does not guarantee sponsorship for this specific role.
