
SureshGoud Ediga

Phoenix

Summary

Experienced Data Engineer with over 13 years of expertise in designing and implementing large-scale data solutions, specializing in batch and stream processing pipelines built on Databricks, cloud platforms, and Hadoop. Proven track record of leading data engineering teams and managing complex projects, with a strong focus on optimizing data flows and infrastructure. Proficient in Big Data technologies, including advanced Spark optimizations, to improve processing efficiency and scalability. Developed graph algorithms for connected components and label propagation that deepened business insight and supported data-driven decision making. Detail-oriented and analytical, with extensive experience in data platforms, data processing, and data governance, and a history of delivering robust data solutions that improve the efficiency, accuracy, and accessibility of data across organizations.

Overview

13 years of professional experience
1 Certification

Work History

SR DATA ENGINEER

AMAZON WEB SERVICES, Dallas
07.2024 - Current

  • Architected and maintained enterprise-scale data pipelines processing 200+ billion records daily, ensuring high availability and performance across distributed systems using AWS native services
  • Designed and implemented month-end data processing pipeline handling 600+ billion records with complex merge operations, leveraging SQS for message queuing, Lambda for serverless processing, and Glue with Spark Streaming for real-time data transformation
  • Built high-performance data conversion pipelines transforming massive CSV datasets to optimized Parquet format, improving query performance by 70% and reducing storage costs by 40% for customer-facing analytics
  • Led data platform and data warehouse initiatives as core team member, maintaining petabyte-scale data storage in Amazon S3 with seamless integration to Redshift Spectrum tables for analytical workloads
  • Established comprehensive data catalog management using AWS Glue, providing unified metadata repository for both S3 data lakes and Redshift data warehouse, enhancing data discoverability and governance across the organization
  • Optimized EMR cluster configurations and Spark job performance for processing terabyte-scale datasets, achieving 50% reduction in processing time and 35% cost savings through resource optimization and advanced Spark tuning
  • Collaborated with cross-functional teams to deliver scalable data solutions supporting business intelligence, machine learning, and customer analytics use cases across multiple business units
  • Designed fault-tolerant streaming architectures using Kafka and Spark Streaming to handle real-time data ingestion from multiple sources with sub-second latency requirements
  • Established best practices for data security and compliance, implementing encryption at rest and in transit along with fine-grained access controls for sensitive customer data

SR DATA ENGINEER (MTS)

Ebay
10.2020 - 06.2024
  • Built a Python-based Spark Structured Streaming pipeline that consumes 2 TB of data daily from Kafka and stores it in multiple Delta tables
  • Built PySpark code for data processing and data analytics in a highly scalable AWS data store
  • Led the initiative to process and transform data feeding NuGraph (NoSQL) databases on AWS servers, enabling efficient account linking and data integration
  • Engineered a sophisticated connected components algorithm to drive a product recommendation engine, enhancing customer engagement and sales
  • Collaborated with cross-functional teams to optimize data workflows, ensuring seamless and efficient data movement between various platforms and services
  • Played a key role in the design and execution of scalable and robust data pipelines, catering to high-volume and complex data needs
  • Implemented best practices for Spark optimizations, resulting in more efficient data processing and reduced resource consumption
  • Contributed to the development of a dynamic data infrastructure, supporting both real-time and batch processing needs, aligning with the company's evolving data strategy
  • Regularly reviewed and updated data models and architectures to keep pace with the latest industry trends and technologies, ensuring continued operational excellence and innovation

SOLUTIONS CONSULTANT

Databricks
01.2020 - 09.2020
  • Expertly designed and deployed scalable data solutions using Databricks, integrating with cloud ecosystems (AWS, Azure) for optimized performance and cost efficiency
  • Led client consultations to architect tailor-made Databricks solutions, aligning with specific business needs and facilitating advanced data analytics and machine learning capabilities
  • Implemented best practices for data processing and analytics in Databricks, resulting in significant performance improvements and operational cost reductions for clients
  • Conducted comprehensive training sessions for client teams on Databricks utilization, ensuring long-term success and self-sufficiency in managing and optimizing their data platforms
  • Successfully managed multiple concurrent projects, showcasing exceptional organizational and multitasking abilities
  • Oversaw the development and implementation of Databricks solutions across various client engagements, ensuring each project met tailored business objectives, deadlines, and budget constraints

SR. BIG DATA ENGINEER

HMS
04.2019 - 12.2019
  • Worked on Big Data infrastructure for both batch and real-time processing
  • Built scalable distributed data solutions using Hadoop
  • Developed an automated, mapping-document-driven Spark program that converts the old data model to the new data model
  • Developed a Spark-based profiling report generator that runs when new data lands in the Hadoop ecosystem
  • Implemented Structured Streaming pipelines consuming from Kafka topics
  • Used Hive and Spark to master raw JSON data landing in HDFS
  • Imported RDBMS data into Hadoop and exported Hadoop data to an ODS database using Spark JDBC connections
  • Worked extensively with Spark DataFrames and RDDs to parse Kafka-loaded JSON files and load them into Hive tables
  • Handled large Kafka messages when sending data from Hadoop to Kafka topics

BIG DATA ENGINEER

Ford Motors
01.2016 - 03.2019
  • Ingested data from relational databases into HDFS using Sqoop and Attunity, loaded it into Hive tables, and transformed and analyzed large datasets with Hive queries and Apache Spark
  • Worked extensively with Spark DataFrames and RDDs to parse Kafka-loaded JSON files and load them into Hive tables
  • Developed an automation tool that tracks HDFS file timestamps to surface data-load gaps in Hadoop
  • Worked extensively on importing metadata into Hive and migrated SAS applications to run on Hive and Spark
  • Developed and implemented automated testing scripts that validate the integration and data quality of HDP data
  • Worked with Alteryx and Informatica BDM ETL tools for transformations and analytics
  • Parsed JSON and XML into Hive tables using Spark
  • Worked extensively with QlikView integrated with Spark to create customer dashboards and generate data

HADOOP DEVELOPER

American Express - TCS
11.2013 - 12.2014
  • Designed and created stories for the development and testing of the application
  • Configured and performance-tuned Sqoop jobs importing raw input data from the data warehouse
  • Developed Hive and Impala queries using partitioning, bucketing, and windowing functions
  • Designed and developed the entire pipeline from data ingestion to reporting tables
  • Designed and created Oozie workflows to schedule and manage Hadoop, Java, Pig, and Sqoop jobs
  • Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data
  • Applied RDBMS concepts and constructs, working with data modelers on logical and physical data modeling

HADOOP DEVELOPER

Hitachi - TCS
02.2013 - 10.2013
  • Participated in daily scrum meetings to report work status
  • Managed data coming from different sources and loaded it into HDFS
  • Wrote SQL queries to update MySQL whenever files are uploaded to or deleted from HDFS
  • Extended the application to work with Hive, Pig, Oozie, and Sqoop
  • Worked with Apache Tomcat to deploy and test the application
  • Delivered a couple of POCs using the above application
  • Used different file formats, including text files, SequenceFiles, and Avro

SELENIUM AUTOMATION TESTER

Kaiser - TCS
05.2012 - 01.2013
  • Created both Manual and Automation Test Scripts using Selenium WebDriver and JAVA Technologies (Eclipse IDE)
  • Used HP Quality Center for Test Management for functional Test Automation
  • Provided data validation through SQL queries and UNIX commands to perform back-end testing
  • Participated in Walkthrough and defect report meetings periodically

Education

COMPLETED COURSEWORK TOWARDS BACHELOR'S - EEE

G PullaReddy Engineering College
AP
01.2005

MASTER OF SCIENCE - COMPUTER SCIENCE

University of Central Missouri
01.2001

Skills

  • Python
  • Scala
  • ETL development
  • Data pipeline architecture
  • Real-time data processing
  • Spark performance optimization
  • Kafka streaming
  • Data warehousing
  • Spark development
  • Data modeling
  • Performance tuning
  • Big data handling
  • SQL and databases
  • Big data analytics
  • Cloud architecture

Certification

  • Databricks Certified Data Engineer Associate, 03/01/24
  • Databricks Lakehouse Essentials, 2022
  • Spark & Hadoop Developer Certified (CCA175), 2016
  • Big Data Developer Certified, 2015
