Production Support Engineer

Professional Services – Support Practice
Full-time · Remote (India / US / EU)
Experience: 2–5 years
About Lyzr
Lyzr.ai's agentic AI platform powers intelligent, autonomous workflows for enterprise clients. Production Support Engineers are the front line that keeps those workflows healthy — triaging incidents, resolving tickets, digging into logs, and escalating the right issues to the right teams before clients feel the pain.
This role suits someone who thrives in a fast-paced technical environment, takes ownership seriously, and genuinely enjoys the detective work of diagnosing why something broke in production. You will work within a global follow-the-sun support model, reporting to the Production Support Lead.
What you’ll do
Incident response & triage
Monitor production dashboards and alerts; acknowledge, classify (P1–P3), and triage incoming incidents within SLA response windows.

Perform first-level diagnosis using logs, traces, and monitoring tools (Datadog / Grafana / CloudWatch) to isolate the root cause or rule out environmental issues.

Execute approved runbook steps to resolve known issues independently; escalate novel or high-severity issues to the Lead with a clear diagnostic summary.

Maintain accurate, time-stamped ticket updates throughout the incident lifecycle so clients and internal stakeholders always have visibility.

Service request fulfilment
Handle client service requests (configuration changes, access provisioning, agent re-deployments, and data queries) within approved change management guardrails.

Validate and document completed requests, ensuring audit trails are maintained in the ticketing system.

Identify recurring requests that could be automated or self-served, and flag them to the Lead for process improvement.

Monitoring & proactive health checks
Run scheduled health checks on production agent pipelines, API integrations, and data connectors; raise pre-emptive alerts for degradation trends.

Maintain and update monitoring dashboards; propose new alert thresholds based on observed patterns.

Participate in post-mortems and contribute findings to the known-error database and runbooks.

Knowledge & collaboration
Document solutions to new issues in the internal knowledge base; keep existing runbooks accurate and up to date.

Collaborate with Engineering, Platform, and Customer Success teams during handoffs, providing clear reproduction steps and log artefacts.

Participate in the shift-based on-call rotation and remain available for P1 escalations during assigned windows.

What you bring
Experience: 2–5 years in application / production support or a NOC environment
Domain: SaaS or cloud-hosted platform support; AI/ML familiarity a strong plus
Technical: Log analysis, API debugging, SQL queries, basic Python / shell scripting
Monitoring: Datadog, Grafana, CloudWatch, or equivalent observability tools
Ticketing: Jira Service Management, ServiceNow, or Zendesk
Cloud basics: AWS / GCP / Azure fundamentals; Docker / Kubernetes awareness

Additionally, you will have:
A methodical, structured approach to troubleshooting — you document what you tried, not just what worked.

Clear written communication: ticket updates, client-facing messages, and handover notes that leave no ambiguity.

Comfort working across time zones and collaborating asynchronously with distributed teams.

Bonus: exposure to LLM-based or agentic AI systems, prompt engineering, or RAG pipelines in production.

Bonus: ITIL Foundation certification or equivalent incident management training.
