Seeking a challenging site reliability engineer role with a progressive, results-oriented organization that offers ample opportunity to prove, improve, and grow professionally, in a challenging and rewarding technical environment where I can expand my existing skill base while contributing to a lively team
Overview
8
years of professional experience
Work History
Site Reliability Engineer
International Game Technology, IGT
07.2022 - Current
Managed the entire Big Data ecosystem and performed site reliability engineering tasks
Managed 10 Cloudera-based Big Data clusters
Worked on Hadoop components such as HDFS, Hive, Oozie, ZooKeeper, YARN, HBase, Impala, Kudu, Spark, Cloudera Manager, and Ambari
Implemented monitoring tools such as ELK and Nagios
Configured AWS EMR and Microsoft Azure (HDInsight cluster)
Hands-on experience in AWS provisioning and good knowledge of AWS services such as EC2, S3, Glacier, ELB, RDS, Redshift, IAM, Route 53, VPC, Auto Scaling, CloudFront, CloudWatch, CloudTrail, CloudFormation, Security Groups, CodeDeploy, and CodePipeline
Implemented AWS DevOps and Azure DevOps pipelines for data ingestion
Used AWS EMR to move large amounts of data into and out of other AWS data stores and databases, such as S3 and DynamoDB
Helped design big data clusters and administered them
Performed data transfer from on-premises systems to AWS Cloud
Implemented DevOps practices and tools using Jenkins, Terraform, Git, and CI/CD
Monitored workloads and job performance, and performed capacity planning
Collaborated with multiple teams for design and implementation of big data clusters in cloud environments
Implemented real-time log/event monitoring tools using DataDog, Cloud Logging, Splunk, and Nagios
Implemented Infrastructure as Code (IaC) using Terraform and Ansible
Performed cluster upgrade activity
Responsible for cluster maintenance, monitoring, commissioning and decommissioning Data Nodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files
Resolved developer issues, handled deployments that moved code from one environment to another, provided access to new users, delivered quick fixes to reduce impact, and documented solutions to prevent future issues
Collaborated with application teams to install operating system and Hadoop updates, patches, version upgrades
Analyzed system failures, identified root causes, and recommended courses of action
Involved in defining job flows
Implemented monitoring on AWS EMR clusters using CloudWatch
Enhanced system security with regular patch updates and vulnerability assessments.
Improved system performance by implementing server upgrades and hardware replacements.
Reduced downtime for critical applications by proactively addressing potential issues through regular maintenance and updates.
Improved code deployment efficiency by automating processes with CI/CD pipelines.
Designed and implemented containerization strategies using Docker and Kubernetes, improving resource utilization and management.
Automated manual tasks through scripting languages such as Python or Shell, boosting team productivity levels.
Reduced data processing time for large-scale projects by streamlining ETL processes and leveraging distributed computing techniques.
Conducted cost analysis exercises for identifying opportunities to optimize spending across various AWS resources without compromising performance or functionality.
Used configuration management and container tools for infrastructure, such as Puppet, Chef, Docker, Ansible, and Kubernetes
Troubleshot issues using Jira and ServiceNow
Site Reliability Engineer (Big Data)
Wells Fargo Bank, N.A.
05.2021 - 06.2022
Installed, configured, and deployed MapR Big Data clusters for Development and Production
Continuously monitored and managed the Hadoop cluster through MapR Control System, Spyglass, and Geneos
Installed, migrated, and upgraded multiple MapR clusters
Responded to and resolved database access and performance issues
On-call availability for rotation on nights and weekends
Installed and configured monitoring tools
Experience with the configuration management tool Ansible
Took snapshots and HDFS data backups to maintain backups of cluster data
Provided 24/7 on-call support for critical systems, ensuring high availability and rapid issue resolution
Implemented Hadoop services such as HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
Improved operational efficiency by monitoring, troubleshooting, and tuning Hadoop clusters using Cloudera Manager or Hortonworks Data Platform tools.
Streamlined infrastructure management through automation using industry-leading tools such as Ansible, Kubernetes, and Terraform.
Standardized development environments using containerization technologies like Docker, resulting in consistent deployments across various platforms.
Championed a DevOps culture within the organization that emphasized collaboration between development, operations, and QA teams for seamless software delivery.
Optimized data processing by implementing efficient ETL pipelines and streamlining database design.
Site Reliability Engineer
Barclays Bank
11.2019 - 05.2021
Managed 14 Cloudera clusters across Prod, UAT, DEV, and SIT environments
Managed and monitored Hadoop clusters for uninterrupted availability
Managed Hadoop services such as HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
Addressed queries from multiple sets of users on the cluster
Prepared capacity management and forecast plans for senior management
Managed BCP and DR strategy
Experience in deploying Python libraries and PySpark using Anaconda environments
Managed incident bridges and root cause analysis for production incidents
L1/L2 Application support experience working with Financial Applications
Generated usage reports for different applications and users on the cluster
Experience working with Jupyter Notebooks and JupyterLab extensions
Supported financial applications such as LIBOR and EUDA
Implemented monitoring and alerting solutions using Prometheus, Grafana, and ELK Stack, enabling proactive issue detection and reducing mean time to resolution
Performed Incident Response and Troubleshooting using PagerDuty, Splunk, New Relic
Implemented Monitoring and Alerting using Prometheus, Grafana, ELK Stack
Implemented a DevOps tech stack of Ansible, AWS, Azure, Docker, ELK, Git, Kubernetes, Linux, and Terraform in day-to-day operations
Site Reliability Engineer
The Capital Group Companies Inc
05.2017 - 10.2019
Installed and administered 8 Cloudera-based Hadoop clusters
Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
Designed and deployed Azure Backup and other Confidential backup solutions for Azure
Maintained and provided day-to-day administration of Cloudera Hadoop infrastructure
Managed regular patching and upgrades of CDH
Used Oracle to write SQL queries that create/alter/delete tables and to extract the necessary data
Excellent command of implementing high availability, backup and recovery, and disaster recovery procedures
Microsoft Azure
Used NoSQL databases including HBase, MongoDB, and Cassandra
Worked closely with Informatica BDM Team and Autosys Teams
Handled 4 on-premises (CDH 5.13.3) and 4 Microsoft Azure Cloudera clusters (CDH 5.15)
Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it to add topics, partitions, etc.
Assisted other ETL developers in solving complex scenarios and coordinated with source system owners on day-to-day ETL progress monitoring
Provided subject matter expertise in management of Cloudera Hadoop infrastructure
Site Reliability Engineer
Apple Inc
11.2016 - 04.2017
Architected Hadoop clusters with Cloudera CDH 5.7 and CDH 5.9
Built UAT and Production Cloudera clusters on CDH 5.9
Commissioned and decommissioned Data Nodes in the cluster in case of problems
Debugged and solved major issues with Cloudera Manager by working with the Cloudera team
Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios
Gave presentations to teams and managers about new ecosystems to be implemented in the cluster
Helped users with production deployments throughout the process
Resolved tickets submitted by users, including P2 and P3 issues, by troubleshooting, documenting, and resolving errors
Onboarded new users to the Hadoop cluster (adding user home directories and providing access to datasets)
Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability
Performed cluster maintenance, added and removed cluster nodes, and handled cluster monitoring and troubleshooting
Experience with Splunk real-time processing architecture and deployment
Implemented Hadoop services such as HDFS, Hive, HBase, ZooKeeper, Impala, Kafka, Presto, Oozie, YARN, Kudu, and Spark
Provided reports to management on cluster usage metrics
Benchmarked and stress tested a Hadoop cluster with TeraSort, TeraGen, TeraValidate, TestDFSIO & Co
Built Kafka clusters
Python Developer
Pro-Tek Consulting
03.2016 - 09.2016
Developed web applications in the Django framework using a Model-View-Controller (MVC) architecture
Extensively used Django features, including forms and templates, for communication with the database in different forms
Used Python and Django to interface with jQuery UI and manage the storage and deletion of content
Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface
Rewrote existing Python/Django modules to deliver data in a specific format
Created data-rich dashboards using React and Django REST Framework
Scraped product information from the web using Scrapy and Selenium
Scraped product information from the web using Scrapy and Selenium
Education
Master of Science - Information Technology and Management
CAMPBELLSVILLE UNIVERSITY
Campbellsville, KY
2020
Master of Science - Electrical Engineering
NORTHWESTERN POLYTECHNIC UNIVERSITY
Fremont, CA
2015
Bachelor of Engineering - Electronics Engineering
UNIVERSITY OF PUNE
Pune, MH
2014
Skills
Site Reliability Engineering
Linux Admin
Big Data
Tableau
Data Engineering
Python
CI/CD
DevOps
Database Administration
Microsoft Azure
Amazon Web Services
Python Programming
Network Troubleshooting
Google Cloud Platform