KEZIA JACOB

Scottsdale

Summary

Experienced Hadoop Administration Engineer with 10 years of Information Technology experience, specializing in the design and implementation of robust technology systems across Big Data (Hadoop), Linux administration, and data engineering.


  • Hands-on experience in installing, configuring, and supporting Hadoop clusters using HPE/MapR.
  • Installing and configuring Hadoop ecosystem tools such as Pig, Hive, MapR-DB, Spark, Sqoop, Flume, Oozie, Ambari, Ranger, and Grafana.
  • Experience in managing and reviewing log files and troubleshooting issues with MapReduce/YARN/Spark jobs, as well as Hadoop cluster installation, upgrades, validation, and configuration.
  • Used Splunk and Dynatrace extensively for analysis and troubleshooting.
  • Experienced in writing shell scripts to automate daily activities (a minimal example is sketched after this list).
  • Experience in setting up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios, Ganglia, and Icinga2.
  • Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
  • Excellent command of backup, recovery, and disaster recovery procedures.
  • Involved in benchmarking Hadoop and MapR-DB clusters using various batch jobs and workloads.
  • Experience in minor and major upgrades and patching of Hadoop and the Hadoop ecosystem.
  • Familiar with writing Oozie workflows and Job Controllers for job automation.
  • Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, and network.
  • Hands-on experience using automation, cloud orchestration, and configuration management tools.
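
To illustrate the kind of shell-based automation and health checking described above, the following is a minimal sketch; the ResourceManager address and the disk-usage threshold are hypothetical placeholders rather than values from any actual cluster.

#!/bin/bash
# Minimal daily health-check sketch for a Hadoop cluster node (assumed values).
RM_HOST="resourcemanager.example.com:8088"   # placeholder ResourceManager address
DISK_LIMIT=85                                # placeholder disk-usage threshold (%)

# Flag any local filesystem above the usage threshold.
df -hP | awk -v limit="$DISK_LIMIT" 'NR > 1 { gsub("%", "", $5); if ($5 + 0 > limit) print "WARN: " $6 " at " $5 "% used" }'

# Report free memory on the node.
free -m | awk '/^Mem:/ { print "Memory free (MB): " $4 }'

# Pull cluster-wide metrics (running apps, containers, unhealthy nodes) from the
# YARN ResourceManager REST API; pretty-printing assumes a python interpreter is present.
curl -s "http://${RM_HOST}/ws/v1/cluster/metrics" | python -m json.tool

In practice a script like this would be scheduled from cron and its warnings forwarded to the monitoring stack (Nagios, Ganglia, or Icinga2) named above.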

Overview

10
years of professional experience

Work History

Bigdata Engineer

American Express (AMEX)
08.2022 - Current
  • Provided platform-level support for user applications running on large-scale production MapR Hadoop clusters, specializing in YARN, Spark, ZooKeeper, Hive, MapR-DB, CLDB, MapR-FS, Oozie, and Pig.
  • Ensure that critical application issues are addressed quickly and effectively.
  • Investigate product-related issues both for application/use-case teams and for common trends that may arise (a troubleshooting sketch follows this list).
  • Study and understand critical system components and large cluster operations.
  • Work on enhancements and improvements to processes within the Engineering team.
  • Design and implement scalable data pipelines using modern big data technologies.
  • Collaborate with data scientists to optimize machine learning model deployment.
  • Develop and maintain data architecture for high-performance analytics solutions.
  • Integrate data from diverse sources ensuring data quality and consistency.
  • Automate data processing workflows to enhance efficiency and reliability.
  • Lead cross-functional teams in agile projects to deliver data-driven solutions.
  • Mentor junior engineers on best practices in big data engineering.
  • Continuously evaluate and adopt emerging technologies in the big data ecosystem.
  • Ensure data security and compliance with industry standards and regulations.
  • Analyze complex datasets to extract actionable insights for business stakeholders.
  • Facilitate remote collaboration using advanced tools for distributed team environments.
  • Document and perform changes to improve system efficiency and resolve bugs and issues.
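
As a hedged illustration of this kind of job-level investigation (the application ID below is a placeholder, not a real job), the standard YARN CLI is typically used to locate a failing Spark or MapReduce application and pull its aggregated logs:

#!/bin/bash
# Placeholder application ID; in practice it comes from the user ticket or the ResourceManager UI.
APP_ID="application_1700000000000_0001"

# List applications in FAILED or KILLED state for triage.
yarn application -list -appStates FAILED,KILLED

# Show status details (queue, tracking URL, diagnostics, final state) for one application.
yarn application -status "$APP_ID"

# Aggregate and page through the container logs for the same application.
yarn logs -applicationId "$APP_ID" | less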

Hadoop Administrator

Damian Consulting LLC
10.2018 - 07.2022
  • Involved in major/minor version upgrades from 4.0 to 5.2 and 6.1 in production MapR clusters
  • Performed minor and ecosystem upgrades on a regular basis
  • Upgraded ecosystem components such as Hive, Oozie, Spark, Pig, and Sqoop on a regular basis to keep them current
  • Performed MapR, OS, and firmware patches on the cluster to maintain interoperability
  • Worked on troubleshooting and resolving issues on both batch and MapR-DB clusters
  • Actively participated and assisted team with P1 issue troubleshooting and issue determination
  • Worked with developers and architects in troubleshooting and analyzing jobs and tuned them for optimum performance
  • Added/decommissioned nodes as they arrived to expand the production cluster after thorough validation
  • Created cluster health monitoring and node validation scripts, along with other scripts to automate day-to-day tasks and patches/upgrades
  • Worked on tuning YARN configurations for efficient resource utilization in clusters
  • Worked on monitoring, managing, configuring, and administering batch, MapR-DB, and disaster recovery clusters
  • Maintained the data mirroring process to remote DR clusters so that backups of all cluster data were available at any time
  • Planned and implemented production changes without causing impact or downtime
  • Documented and prepared change plans for each change and validated change plans submitted by peers
  • Gained experience on architecture, planning, and preparing nodes, data ingestion, disaster recovery, high availability, management, and monitoring
  • Handled ingestion failures, job waits, and job failures
  • Performed data backup and data purging based on retention policies
  • Handled alerts for CPU, memory, network, and storage related processes
  • Configured the Hive Metastore to use a MySQL database, enabling multiple user connections to Hive tables (a minimal configuration sketch follows this list)
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs (MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs such as Java programs and shell scripts
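
A minimal sketch of that Hive Metastore configuration follows; the installation path, MySQL host, database name, and credentials are hypothetical placeholders, not values from this engagement.

#!/bin/bash
# Assumed Hive conf location on a MapR node; the actual path depends on the installed Hive version.
HIVE_CONF=/opt/mapr/hive/hive-2.3/conf/hive-site.xml

cat > "$HIVE_CONF" <<'EOF'
<configuration>
  <!-- Point the metastore at MySQL instead of the embedded Derby database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host.example.com:3306/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>CHANGE_ME</value>
  </property>
</configuration>
EOF

# With the MySQL JDBC driver on the Hive classpath and Hive's bin directory on PATH,
# initialize the metastore schema in MySQL, then start the metastore service.
schematool -dbType mysql -initSchema
hive --service metastore &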

Database Analyst

Pacific Clinics
01.2018 - 09.2018
  • Analyzed online and store data using various analytical tools (SQL Server, Teradata, and Tableau) to uncover trends and provide insights that maximize returns for online and store initiatives
  • Collaborated with various departments to design and execute a fact-based continuous improvement / channel growth strategy using online and offline data analysis
  • Worked on the existing OLTP system and created facts, dimensions, data models, and star schema representations for the data marts of the OLTP and OLAP databases
  • Involved in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems
  • Designed and developed SSIS Packages to import and export data from MS Excel, SQL Server and Flat files
  • Hands-on experience writing SSIS Script Task/Script Component code using C# and VBScript for customized data manipulation in complex SSIS packages
  • Developed many Tabular Reports, Matrix Reports, Drill down Reports and Charts using SQL Server Reporting Services (SSRS)
  • Experience in creating sub reports, ad-hoc reports, deploying to report server and fine tuning of reports in SQL Reporting Services
  • Implemented various tasks and transformations for data cleansing and performance tuning of packages by using SSIS
  • Knowledge in performing data gap analysis in multiple projects using analytical and technical capabilities
  • Provided technical assistance and mentoring to staff in developing conceptual, logical, and physical database designs

Data Analyst Intern

Aurobindo Pharmaceuticals
10.2014 - 05.2015
  • Experienced in creating indexes on tables to improve performance by eliminating full table scans and reducing the complexity of large queries
  • Wrote SQL, PL/SQL programs required to retrieve data using cursors
  • Involved in updating procedures, functions, triggers, tables, views and other T-SQL code and SQL joins for applications
  • Analyzed and maintained SQL Server Reporting Services (SSRS) reports for extracting real-time data from SQL and Oracle databases
  • Involved in requirements gathering, analysis, design, and development

Education

Master of Science - Management Information System

University of Houston
05.2017

Bachelor of Science - Computer Science

SRM Institute of Science and Technology
05.2015

Skills

  • Hadoop/Big Data: MapR, HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Flume, Oozie, ZooKeeper, HBase, MapR-DB, Apache Tomcat, Icinga2, Nagios, Ganglia
  • OS: UNIX, Linux, MS Windows, CentOS, macOS
  • Other: C, Core Java, Linux shell scripts, Git, SQL, IntelliJ, MS Office, Stack IQ, Splunk, PL/SQL, MySQL
