Davandra Jagtap (Site Reliability Engineer)

Phoenix, AZ

Summary

Seeking a challenging site reliability engineer role with a progressive, results-oriented organization that offers ample opportunity for professional growth, in a rewarding technical environment where I can expand my existing skill base while contributing to a lively team.

Overview

8 years of professional experience

Work History

Site Reliability Engineer

International Game Technology, IGT
07.2022 - Current
  • Managed the entire Big Data ecosystem and performed site reliability engineering tasks
  • Managed 10 Cloudera-based Big Data clusters
  • Worked on Hadoop components like HDFS, Hive, Oozie, ZooKeeper, YARN, HBase, Impala, Kudu, Spark, Cloudera Manager, and Ambari
  • Implemented monitoring tools like ELK and Nagios
  • Configured AWS EMR and Microsoft Azure (HDInsight cluster)
  • Hands-on experience in AWS provisioning and good knowledge of AWS services like EC2, S3, Glacier, ELB, RDS, Redshift, IAM, Route 53, VPC, Auto Scaling, CloudFront, CloudWatch, CloudTrail, CloudFormation, Security Groups, CodeDeploy, and CodePipeline
  • Implemented AWS DevOps and Azure DevOps pipelines for data ingestion
  • Used AWS EMR to move large amounts of data into and out of other AWS data stores and databases, such as S3 and DynamoDB
  • Helped design big data clusters and administered them
  • Performed data transfer from on-premises systems to AWS Cloud
  • Implemented DevOps practices and tools using Jenkins, Terraform, Git, and CI/CD
  • Monitored workload and job performance and performed capacity planning
  • Collaborated with multiple teams for design and implementation of big data clusters in cloud environments
  • Implemented real-time log/event monitoring tools using Datadog, Cloud Logging, Splunk, and Nagios
  • Implemented Infrastructure as Code (IaC) using Terraform and Ansible
  • Performed cluster upgrades
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files
  • Resolved developer issues, deployed code from one environment to another, provided access to new users, delivered quick fixes to reduce impact, and documented resolutions to prevent future issues
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades
  • Analyzed system failures, identified root causes, and recommended courses of action
  • Helped define job flows
  • Implemented monitoring on AWS EMR clusters using CloudWatch
  • Enhanced system security with regular patch updates and vulnerability assessments.
  • Improved system performance by implementing server upgrades and hardware replacements.
  • Reduced downtime for critical applications by proactively addressing potential issues through regular maintenance and updates.
  • Improved code deployment efficiency by automating processes with CI/CD pipelines.
  • Designed and implemented containerization strategies using Docker and Kubernetes, improving resource utilization and management.
  • Automated manual tasks through scripting languages such as Python or Shell, boosting team productivity levels.
  • Reduced data processing time for large-scale projects by streamlining ETL processes and leveraging distributed computing techniques.
  • Conducted cost analysis exercises for identifying opportunities to optimize spending across various AWS resources without compromising performance or functionality.
  • Used configuration management and infrastructure tools like Puppet, Chef, Docker, Ansible, and Kubernetes
  • Troubleshot issues using Jira and ServiceNow

Site Reliability Engineer (Big Data)

Wells Fargo Bank, N.A.
05.2021 - 06.2022
  • Installed, configured, and deployed a MapR Big Data cluster for development and production
  • Continuously monitored and managed the Hadoop cluster through MapR Control System, Spyglass, and Geneos
  • Installed, migrated, and upgraded multiple MapR clusters
  • Responded to and resolved database access and performance issues
  • Served in the on-call rotation on nights and weekends
  • Installed and configured monitoring tools
  • Used Ansible for configuration management
  • Took snapshots and HDFS data backups to maintain backups of cluster data
  • Provided 24/7 on-call support for critical systems, ensuring high availability and rapid issue resolution
  • Implemented Hadoop services like HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
  • Improved operational efficiency by monitoring, troubleshooting, and tuning Hadoop clusters using Cloudera Manager or Hortonworks Data Platform tools.
  • Streamlined infrastructure management through automation using industry-leading tools such as Ansible, Kubernetes, and Terraform.
  • Standardized development environments using containerization technologies like Docker, resulting in consistent deployments across various platforms.
  • Championed a DevOps culture within the organization that emphasized collaboration between development, operations, and QA teams for seamless software delivery.
  • Optimized data processing by implementing efficient ETL pipelines and streamlining database design.

Site Reliability Engineer

Barclays Bank
11.2019 - 05.2021
  • Managed 14 Cloudera clusters across Prod, UAT, DEV, and SIT environments
  • Managed and monitored Hadoop clusters for uninterrupted availability
  • Managed Hadoop services like HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
  • Addressed queries from multiple sets of users on the cluster
  • Prepared capacity management and forecast plans for senior management
  • Managed BCP and DR strategy
  • Deployed Python libraries and PySpark using Anaconda environments
  • Managed incident bridges and root cause analysis for production incidents
  • Provided L1/L2 application support for financial applications
  • Generated usage reports for different applications and users on the cluster
  • Worked with Jupyter Notebooks and JupyterLab extensions
  • Supported financial applications such as LIBOR and EUDA
  • Implemented monitoring and alerting solutions using Prometheus, Grafana, and ELK Stack, enabling proactive issue detection and reducing mean time to resolution
  • Performed incident response and troubleshooting using PagerDuty, Splunk, and New Relic
  • Used a DevOps tech stack of Ansible, AWS, Azure, Docker, ELK, Git, Kubernetes, Linux, and Terraform in day-to-day operations

Site Reliability Engineer

The Capital Group Companies Inc
05.2017 - 10.2019
  • Installed and administered 8 Cloudera-based Hadoop clusters
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
  • Designed and deployed Azure Backup and other Confidential backup solutions for Azure
  • Maintained and provided day-to-day administration of Cloudera Hadoop infrastructure
  • Managed regular patching and upgrades of CDH
  • Used Oracle to write SQL queries that create/alter/delete tables and to extract the necessary data
  • Implemented high availability, backup and recovery, and disaster recovery procedures
  • Microsoft Azure
  • Used NoSQL databases including HBase, MongoDB, and Cassandra
  • Worked closely with the Informatica BDM and AutoSys teams
  • Handled 4 on-premises (CDH 5.13.3) and 4 Microsoft Azure Cloudera clusters (CDH 5.15)
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • Assisted other ETL developers in solving complex scenarios and coordinated with source system owners on day-to-day ETL progress monitoring
  • Provided subject matter expertise in management of Cloudera Hadoop infrastructure

Site Reliability Engineer

Apple Inc
11.2016 - 04.2017
  • Architected Hadoop clusters with Cloudera CDH 5.7 and CDH 5.9
  • Built UAT and production Cloudera clusters on CDH 5.9
  • Commissioned and decommissioned data nodes in the cluster in case of problems
  • Debugged and solved major issues with Cloudera Manager by working with the Cloudera team
  • Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios
  • Gave presentations to teams and managers about new ecosystems to be implemented in the cluster
  • Helped users with production deployments throughout the process
  • Resolved user-submitted tickets and P2/P3 issues, troubleshot errors, and documented resolutions
  • Onboarded new users to the Hadoop cluster (adding user home directories and providing access to datasets)
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability
  • Performed cluster maintenance, adding and removing cluster nodes, cluster monitoring, and troubleshooting
  • Experience with Splunk real-time processing architecture and deployment
  • Implemented Hadoop services like HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
  • Provided reports to management on cluster usage metrics
  • Benchmarked and stress tested a Hadoop cluster with TeraSort, TeraGen, TeraValidate, TestDFSIO & Co
  • Built Kafka clusters

Python Developer

Pro-Tek Consulting
03.2016 - 09.2016
  • Developed web applications in the Django framework using Model View Controller (MVC) architecture
  • Extensively used Django technologies, including forms and templates, to communicate with the database
  • Used Python and Django to interface with jQuery UI and manage the storage and deletion of content
  • Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface
  • Rewrote existing Python/Django modules to deliver data in the required format
  • Created data-rich dashboards using React and Django REST Framework
  • Scraped product information from the web using Scrapy and Selenium


Education

Master of Science - Information Technology and Management

CAMPBELLSVILLE UNIVERSITY
Campbellsville, KY
2020

Master of Science - Electrical Engineering

NORTHWESTERN POLYTECHNIC UNIVERSITY
Fremont, CA
2015

Bachelor of Engineering - Electronics Engineering

UNIVERSITY OF PUNE
Pune, MH
2014

Skills

  • Site Reliability Engineering
  • Linux Admin
  • Big Data
  • Tableau
  • Data Engineering
  • Python
  • CI/CD
  • DevOps
  • Database Administration
  • Microsoft Azure
  • Amazon Web Services
  • Python Programming
  • Network Troubleshooting
  • Google Cloud Platform
