
SureshGoud Ediga

Phoenix

Summary

Experienced Data Engineer with over 13 years of expertise in designing and implementing large-scale data solutions, specializing in batch and stream processing pipelines built on Databricks, cloud platforms, and Hadoop. Proven track record of leading data engineering teams and managing complex projects, with a strong focus on optimizing data flows and infrastructure. Proficient in Big Data technologies, including advanced Spark optimizations, to improve processing efficiency and scalability. Developed graph algorithms for connected components and label propagation that deepened business insight and supported data-driven decision making. Detail-oriented and analytical, with extensive experience in data platforms, data processing, and data governance, and a history of delivering robust data solutions that improve the efficiency, accuracy, and accessibility of data across organizations.

Overview

13 years of professional experience
1 Certification

Work History

SR DATA ENGINEER

AMAZON WEB SERVICES, Dallas
07.2024 - Current

  • Architected and maintained enterprise-scale data pipelines processing 200+ billion records daily, ensuring high availability and performance across distributed systems using AWS native services
  • Designed and implemented month-end data processing pipeline handling 600+ billion records with complex merge operations, leveraging SQS for message queuing, Lambda for serverless processing, and Glue with Spark Streaming for real-time data transformation
  • Built high-performance data conversion pipelines transforming massive CSV datasets to optimized Parquet format, improving query performance by 70% and reducing storage costs by 40% for customer-facing analytics
  • Led data platform and data warehouse initiatives as core team member, maintaining petabyte-scale data storage in Amazon S3 with seamless integration to Redshift Spectrum tables for analytical workloads
  • Established comprehensive data catalog management using AWS Glue, providing unified metadata repository for both S3 data lakes and Redshift data warehouse, enhancing data discoverability and governance across the organization
  • Optimized EMR cluster configurations and Spark job performance for processing terabyte-scale datasets, achieving 50% reduction in processing time and 35% cost savings through resource optimization and advanced Spark tuning
  • Collaborated with cross-functional teams to deliver scalable data solutions supporting business intelligence, machine learning, and customer analytics use cases across multiple business units
  • Designed fault-tolerant streaming architectures using Kafka and Spark Streaming to handle real-time data ingestion from multiple sources with sub-second latency requirements
  • Established best practices for data security and compliance, implementing encryption at rest and in transit along with fine-grained access controls for sensitive customer data

SR DATA ENGINEER (MTS)

Ebay
10.2020 - 06.2024
  • Built a Python-based Spark Structured Streaming pipeline that consumes 2 TB of data daily from Kafka and stores it in multiple Delta tables
  • Built PySpark code for data processing and data analytics in a highly scalable AWS data store
  • Led the initiative to process and transform data feeding NuGraph (NoSQL) databases on AWS servers, enabling efficient account linking and data integration
  • Engineered a sophisticated connected components algorithm to drive a product recommendation engine, enhancing customer engagement and sales
  • Collaborated with cross-functional teams to optimize data workflows, ensuring seamless and efficient data movement between various platforms and services
  • Played a key role in the design and execution of scalable and robust data pipelines, catering to high-volume and complex data needs
  • Implemented best practices for Spark optimizations, resulting in more efficient data processing and reduced resource consumption
  • Contributed to the development of a dynamic data infrastructure, supporting both real-time and batch processing needs, aligning with the company's evolving data strategy
  • Regularly reviewed and updated data models and architectures to keep pace with the latest industry trends and technologies, ensuring continued operational excellence and innovation

SOLUTIONS CONSULTANT

Databricks
01.2020 - 09.2020
  • Expertly designed and deployed scalable data solutions using Databricks, integrating with cloud ecosystems (AWS, Azure) for optimized performance and cost efficiency
  • Led client consultations to architect tailor-made Databricks solutions, aligning with specific business needs and facilitating advanced data analytics and machine learning capabilities
  • Implemented best practices for data processing and analytics in Databricks, resulting in significant performance improvements and operational cost reductions for clients
  • Conducted comprehensive training sessions for client teams on Databricks utilization, ensuring long-term success and self-sufficiency in managing and optimizing their data platforms
  • Successfully managed multiple concurrent projects, showcasing exceptional organizational and multitasking abilities
  • Oversaw the development and implementation of Databricks solutions across various client engagements, ensuring each project met tailored business objectives, deadlines, and budget constraints

SR. BIG DATA ENGINEER

HMS
04.2019 - 12.2019
  • Worked on Big Data infrastructure for both batch and real-time processing
  • Built scalable distributed data solutions using Hadoop
  • Developed an automated, mapping-document-driven Spark program that converts the old data model to the new data model
  • Developed a Spark-based profiling report generator that runs when new data lands in the Hadoop ecosystem
  • Implemented Structured Streaming pipelines consuming from Kafka topics
  • Used Hive and Spark to master raw JSON data landing in HDFS
  • Imported RDBMS data into Hadoop and exported Hadoop data to an ODS database using Spark JDBC connections
  • Worked extensively with Spark DataFrames and RDDs to parse Kafka-loaded JSON files and load them into Hive tables
  • Handled large Kafka messages when sending data from Hadoop to Kafka topics

BIG DATA ENGINEER

Ford Motors
01.2016 - 03.2019
  • Ingested data from relational databases into HDFS using Sqoop and Attunity, loaded it into Hive tables, and transformed and analyzed large datasets with Hive queries and Apache Spark
  • Worked extensively with Spark DataFrames and RDDs to parse Kafka-loaded JSON files and load them into Hive tables
  • Developed an automation tool that tracks HDFS file timestamps to surface data-load gaps in Hadoop
  • Worked extensively on importing metadata into Hive and migrated SAS applications to run on Hive and Spark
  • Developed and implemented automated testing scripts that validate the integration and data quality of HDP data
  • Worked with Alteryx and Informatica BDM ETL tools for transformations and analytics
  • Parsed JSON and XML into Hive tables using Spark
  • Worked extensively with QlikView integrated with Spark to create customer dashboards and generate data

HADOOP DEVELOPER

American Express - TCS
11.2013 - 12.2014
  • Designed and created stories for the development and testing of the application
  • Configured and performance-tuned Sqoop jobs importing raw input data from the data warehouse
  • Developed Hive and Impala queries using partitioning, bucketing, and windowing functions
  • Designed and developed the entire pipeline from data ingestion to reporting tables
  • Designed and created Oozie workflows to schedule and manage Hadoop, Java, Pig, and Sqoop jobs
  • Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data
  • Applied RDBMS concepts and constructs, working with data modelers on logical and physical data modeling

HADOOP DEVELOPER

Hitachi - TCS
02.2013 - 10.2013
  • Participated in daily scrum meetings to report work status
  • Managed data coming from different sources and loaded it into HDFS
  • Wrote SQL queries to update MySQL whenever files are uploaded to or deleted from HDFS
  • Extended the application to work with Hive, Pig, Oozie, and Sqoop
  • Worked with Apache Tomcat to deploy and test the application
  • Delivered a couple of POCs using the above application
  • Used different file formats, including text files, SequenceFiles, and Avro

SELENIUM AUTOMATION TESTER

Kaiser - TCS
05.2012 - 01.2013
  • Created both Manual and Automation Test Scripts using Selenium WebDriver and JAVA Technologies (Eclipse IDE)
  • Used HP Quality Center for Test Management for functional Test Automation
  • Provided data validation through SQL queries and UNIX commands to perform back-end testing
  • Participated in Walkthrough and defect report meetings periodically

Education

COMPLETED COURSEWORK TOWARDS BACHELOR'S - EEE

G PullaReddy Engineering College
AP
01.2005

MASTER OF SCIENCE - COMPUTER SCIENCE

University of Central Missouri
01.2001

Skills

  • Python
  • Scala
  • ETL development
  • Data pipeline architecture
  • Real-time data processing
  • Spark performance optimization
  • Kafka streaming
  • Data warehousing
  • Spark development
  • Data modeling
  • Performance tuning
  • Big data handling
  • SQL and databases
  • Big data analytics
  • Cloud architecture

Certification

  • Databricks Certified Data Engineer Associate, 03/01/24
  • Databricks Lakehouse Essentials, 2022
  • Spark & Hadoop Developer Certified (CCA175), 2016
  • Big Data Developer Certified, 2015
