Duties and Responsibilities:
- Implement, configure and manage new monitoring solutions to ensure end-to-end functionality for monitoring and alerting
- Evaluate off-the-shelf and open-source solutions
- Collaborate with Operations teams to ensure full monitoring of all existing infrastructure and ensure that monitoring is automatically enabled when new infrastructure is spun up
- Ensure thorough and complete monitoring of all environments and layers: network, server, storage and applications
- Provide a gap analysis of missing features and implement them across the array of monitoring tools
- Collaborate with Engineering where necessary to enhance application monitoring and set alert thresholds
- Set up monitoring for datacenter deployments as well as deployments in public clouds
- Assist in implementing scalable monitoring, metrics, and logging solutions which leverage internal systems and third-party integrations
- Design monitoring dashboards for KPIs and provide weekly, monthly and quarterly uptime reports based on synthetic monitoring
- Automate deployment of monitoring agents and servers using configuration management tools like Ansible or Puppet
- Serve as a point of escalation for projects and issues
- Be on-call as part of the monitoring team rotation
Knowledge, Skills, and Abilities Required:
- 1+ years administering monitoring solutions like New Relic, Dynatrace, Prometheus or CloudWatch
- 3+ years of experience in Production Operations
- 2+ years of Bash, Perl or Python experience
- 2+ years administering Linux
- Experience with managing and using APM tools like AppDynamics, New Relic or Dynatrace
- Experience with using time-series databases like Graphite and InfluxDB
- Experience with graphing tools like Graphite and Grafana, and the ability to quickly create useful dashboards for the engineering and operations teams
- Experience with managing and using Splunk and ELK (Elasticsearch, Logstash, Kibana) for log aggregation and operational intelligence
- Experience with monitoring of infrastructure services and applications in public clouds like AWS, Azure and GCP
- Experience with and understanding of container technologies and orchestration tools (Terraform, Docker, Kubernetes, etc.)
- Understanding of asynchronous messaging systems (Kafka, Mule, ActiveMQ)
- SolarWinds administration experience is a plus
- Experience working in a SaaS environment is a plus
- Must have excellent organizational, communication and presentation skills
- Ability to handle tight deadlines and drive timely completion of tasks
- Ability to work under pressure in a 24/7 environment
- Ability to quickly prioritize tasks and projects
- Experience implementing continuous integration and build pipelines (Jenkins)
Salary: INR 5,00,000 - 15,00,000 P.A.
Industry: IT-Software / Software Services
Functional Area: IT Software - Application Programming, Maintenance
Role Category: Programming & Design
Role: Software Developer
Employment Type: Permanent Job, Full Time