Senior Applications Support Specialist

Key Responsibilities

Incident & Problem Management
- Lead major incident (MI) bridges and restore service with minimum business impact.
- Handle all L3 escalations; perform deep diagnostics across Java, JVM, middleware, OS, and infrastructure.
- Own technical RCAs; drive long-term and systemic remediation.
- Identify recurring failure patterns and risks.

Reliability Engineering
- Apply SRE principles: SLIs/SLOs, error budgets, resilience patterns.
- Tune JVM parameters, analyze thread/heap dumps, and improve performance.
- Influence application architecture for fault tolerance, scalability, and recoverability.
- Validate DR readiness, failover behavior, and resilience testing outcomes.

Change, Release & Risk
- Provide technical approval and risk assessment for high-risk changes.
- Enforce operational readiness for new apps and major releases.
- Ensure changes meet audit, compliance, and regulatory expectations.

Automation, Monitoring & Observability
- Build advanced automation using Shell/Python/PowerShell.
- Develop frameworks for health validation, automated recovery, and compliance checks.
- Define observability standards; optimize alerts and improve MTTR.

Leadership & Mentorship
- Mentor L1/L2 teams; review and approve runbooks, SOPs, and KB articles.
- Act as a trusted technical advisor to stakeholders and leadership.

Skills & Qualifications

Technical (Mandatory)
- Strong knowledge of application architecture, distributed systems, and middleware.
- Java expertise: JVM internals, GC, memory management, thread/heap dump analysis, performance tuning.
- .NET expertise: CLR internals, garbage collection, memory management, thread/dump analysis, and application performance tuning.
- Strong Unix/Linux, networking basics, and advanced scripting (Shell/Python/PowerShell/VBS).
- Advanced SQL and understanding of databases; Autosys (or equivalent scheduler).
- Hands-on experience with observability tools: Splunk, AppDynamics/Dynatrace, ELK, Grafana, Prometheus.

Reliability & Operations
- Major incident leadership, deep RCA, change/release readiness, DR and resilience engineering.
- Experience in regulated production environments.

Soft Skills
- Strong technical leadership and decision-making.
- Clear communication during high-pressure incidents.
- Ownership mindset and business awareness.

Experience & Education
- 7–12+ years in Application Reliability, Production Support, SRE, or platform operations.
- Bachelor’s degree in Computer Science/Engineering or equivalent.
- ITIL, cloud, or industry certifications (preferred).
- Banking/financial domain experience (preferred).

Working Conditions
- On-call and after-hours support as required.
- Fast-paced environment with multiple priorities.
- Hybrid working model.