Experienced Hadoop Administration Engineer with extensive Information Technology experience in the design and implementation of robust technology systems in Big Data (Hadoop), Linux administration, and data engineering.
Hands-on experience installing, configuring, and supporting Hadoop clusters using HPE/MapR.
Experience installing and configuring Hadoop ecosystem tools such as Pig, Hive, MapR-DB, Spark, Sqoop, Flume, Oozie, Ambari, Ranger, and Grafana.
Experience in managing and reviewing log files and troubleshooting issues with MapReduce/YARN/Spark jobs. Experience in Hadoop cluster installation, upgrades, validation, and configuration.
Used Splunk and Dynatrace extensively for analysis and troubleshooting.
Experienced in writing Shell scripts to automate daily activities.
Experience in setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios, Ganglia, and Icinga2.
Strong command of creating backup, recovery, and disaster recovery procedures.
Involved in benchmarking Hadoop and MapR-DB clusters using various batch jobs and workloads.
Experience in minor and major upgrades and patching of Hadoop and the Hadoop ecosystem.
Familiar with writing Oozie workflows and Job Controllers for job automation.
Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, and network.
Hands-on experience using automation, cloud orchestration, and configuration management tools.
Overview
10 years of professional experience
Work History
Big Data Engineer
American Express, AMEX
08.2022 - Current
Provide platform-level support for user applications running MapR Hadoop on large-scale production clusters, specializing in YARN, Spark, ZooKeeper, Hive, MapR-DB, CLDB, MapR-FS, Oozie, and Pig.
Ensure that critical application issues are addressed quickly and effectively.
Investigate product-related issues for application and use-case teams and identify common trends as they arise.
Study and understand critical system components and large cluster operations.
Work on process enhancements and improvements within the engineering team.
Design and implement scalable data pipelines using modern big data technologies.
Collaborate with data scientists to optimize machine learning model deployment.
Develop and maintain data architecture for high-performance analytics solutions.
Integrate data from diverse sources ensuring data quality and consistency.
Automate data processing workflows to enhance efficiency and reliability.
Lead cross-functional teams in agile projects to deliver data-driven solutions.
Mentor junior engineers on best practices in big data engineering.
Continuously evaluate and adopt emerging technologies in the big data ecosystem.
Ensure data security and compliance with industry standards and regulations.
Analyze complex datasets to extract actionable insights for business stakeholders.
Facilitate remote collaboration using advanced tools for distributed team environments.
Document and perform changes to improve system efficiency and resolve bugs and issues.
Hadoop Administrator
Damian Consulting LLC
10.2018 - 07.2022
Involved in major/minor version upgrades from 4.0 to 5.2 and 6.1 in production MapR clusters
Performed minor and ecosystem upgrades on a regular basis
Upgraded ecosystem components such as Hive, Oozie, Spark, Pig, and Sqoop on a regular basis to keep them current
Applied MapR, OS, and firmware patches on clusters to maintain interoperability
Worked on troubleshooting and resolving issues on both batch and MapR-DB clusters
Actively participated in and assisted the team with P1 issue troubleshooting and problem determination
Worked with developers and architects to troubleshoot and analyze jobs and tuned them for optimal performance
Added and decommissioned nodes, expanding the production cluster with new nodes as they arrived after thorough validation
Created cluster health monitoring and node validation scripts, along with other scripts to automate day-to-day tasks, patches, and upgrades
Worked on tuning YARN configurations for efficient resource utilization in clusters
Worked on monitoring, managing, configuring, and administering batch, MapR-DB, and disaster recovery clusters
Maintained the data mirroring process to remote DR clusters so that backups of data from all clusters are available at any time
Planned and implemented production changes without causing impact or downtime
Documented and prepared change plans for each change and validated change plans submitted by peers
Gained experience in architecture, planning and preparing nodes, data ingestion, disaster recovery, high availability, management, and monitoring
Handled ingestion failures, waiting jobs, and job failures
Performed data backups and data purging based on retention policy
Handled alerts for CPU, memory, network, and storage related processes
Configured the Hive Metastore to use a MySQL database so that multiple users can connect to Hive tables
Integrated Oozie with the rest of the Hadoop stack to support several types of Hadoop jobs (MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs such as Java programs and shell scripts
Database Analyst
Pacific Clinics
01.2018 - 09.2018
Analyzed online data and store data using various analytical tools (SQL Server, Teradata & Tableau) to uncover trends and provide insights that will maximize returns for our online and store initiatives
Collaborated with various departments to design and execute a fact-based continuous improvement and channel growth strategy using online and offline data analysis
Worked on the existing OLTP system and created facts, dimensions, data models, and star schema representations for the data marts of OLTP and OLAP databases
Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems
Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server and Flat files
Hands-on experience writing SSIS Script Task and Script Component code in C# and VBScript for customized data manipulation in complex SSIS packages
Developed many Tabular Reports, Matrix Reports, Drill down Reports and Charts using SQL Server Reporting Services (SSRS)
Experience in creating subreports and ad hoc reports, deploying them to the report server, and fine-tuning reports in SQL Server Reporting Services
Implemented various tasks and transformations for data cleansing and performance tuning of packages by using SSIS
Knowledge in performing data gap analysis in multiple projects using analytical and technical capabilities
Provided technical assistance and mentoring to staff in developing conceptual, logical, and physical database designs
Data Analyst Intern
Aurobindo Pharmaceuticals
10.2014 - 05.2015
Created indexes on tables to improve performance by eliminating full table scans and reducing the complexity of large queries
Wrote SQL and PL/SQL programs to retrieve data using cursors
Involved in updating procedures, functions, triggers, tables, views and other T-SQL code and SQL joins for applications
Analyzed and maintained SQL Server Reporting Services (SSRS) reports for extracting real-time data from SQL and Oracle databases
Involved in requirements gathering, analysis, design, and development