
Nacer Chalane

Phoenix

Summary

Results-driven IT Professional with 8+ years of experience and mastery in Python, Java, and C++, specializing in data engineering, machine learning, and cloud-native systems. Expert in architecting and managing scalable ETL pipelines using PySpark and Spark SQL for both batch and real-time data workloads, in environments ranging from Cloudera/Hadoop to Google Cloud Platform (GCP). Proficient in ingesting and transforming large-scale datasets, including connected vehicle data (e.g., telematics, geospatial, sensor logs), for business insights, anomaly detection, and ML model development. Experienced in deploying ML pipelines on Vertex AI, orchestrating workflows with Kubeflow Pipelines, and maintaining robust, automated CI/CD processes using Jenkins, Tekton, and Terraform. Skilled in implementing supervised and unsupervised ML algorithms, deep learning (TensorFlow, PyTorch), and LLM-based systems using LangChain, Hugging Face, and Chainlit. Strong background in DevSecOps (SonarQube, FOSSA, JFrog, 42Crunch) and infrastructure provisioning. Proven ability to drive high-quality, maintainable, and secure code at scale.

Overview

13
years of professional experience

Work History

Data engineer

Ford
MI
07.2020 - Current
  • Designed and implemented large-scale ETL pipelines for high-volume, high-velocity data ingestion, including connected vehicle data streams such as telematics, GPS, and sensor signals, using PySpark, Cloud Pub/Sub, and Dataproc, delivering structured insights into driving behavior, fleet operations, and anomaly detection.
  • Built and optimized ETL workflows to extract data from onboard units (OBUs), transform raw time-series data into structured feature sets (e.g., speed, engine load, geospatial paths), and load it into BigQuery and cloud data lakes for downstream analytics and machine learning.
  • Architected and managed a cloud-native data lake using GCS to store raw, refined, and aggregated vehicle data; supported batch and streaming access for data science and real-time analytics teams.
  • Developed asynchronous data extractors using asyncio and multithreading to fetch data from external APIs (e.g., traffic, weather, vehicle OEM APIs) and enrich vehicle datasets during ETL workflows.
  • Applied statistical methods (Chebyshev's inequality, z-score, IQR filters) and ML models (Isolation Forest, DBSCAN, Autoencoders) to detect anomalies such as abnormal driving patterns, sensor failures, or suspicious routing behavior.
  • Orchestrated ETL and data science workflows using Airflow (Composer) and legacy Oozie, while monitoring and debugging Spark jobs through Cloudera Manager and Ambari in hybrid cloud/on-prem environments.
  • Deployed real-time alerting systems using Cloud Functions and Slack integrations to notify on anomalous vehicle behavior, ETL failures, or SLA violations in ingestion pipelines.
  • Developed Spring Boot microservices deployed on Cloud Run to expose RESTful APIs for managing ETL schedules and pipeline metadata; integrated with Pub/Sub for asynchronous orchestration.
  • Automated pipeline provisioning and resource deployment using Terraform, and enforced consistency with CI/CD pipelines via Tekton, including validation of Spark scripts, APIs, and DAGs.
  • RAG & LLM Integration: Integrated Retrieval-Augmented Generation (RAG) with LangChain, Hugging Face, and FAISS to support LLM-based querying of documentation, logs, and vehicle diagnostics history.
  • Built Chainlit-based internal UIs to let analysts and engineers interact with vehicle datasets via natural language, backed by prompt engineering and vector search.
  • MLOps with Vertex AI & Kubeflow: Developed end-to-end ML pipelines for vehicle anomaly detection, route optimization, and predictive maintenance using Kubeflow Pipelines and deployed via Vertex AI.
  • Integrated BigQuery, Dataflow, and Vertex components in hybrid ML workflows, automating retraining and monitoring through Vertex AI Model Registry, Composer, and Tekton CI/CD.
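The IQR-based anomaly filtering mentioned above can be sketched as follows. This is a minimal, self-contained illustration: the speed values and the 1.5× threshold are invented for the example, not taken from the actual Ford pipeline.

```python
def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 100)."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as anomalies."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Illustrative vehicle-speed readings: 140 and 5 fall outside the fences.
speeds = [62, 65, 63, 64, 61, 66, 140, 63, 62, 5]
print(iqr_outliers(speeds))  # [140, 5]
```

The same fence calculation generalizes to any numeric telemetry channel; z-score filtering differs only in using the mean and standard deviation instead of quartiles.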

Data engineer

Wizards of the Coast
Seattle
03.2019 - Current
  • Worked with designers, analysts, developers, and software engineers to write and execute reliable, efficient Python programs and related SQL queries for desktop GUI, enterprise, web, scientific, and numerical applications, following user and business requirements as well as established coding standards.
  • Wrote a program that uses REST API calls to interface with a Veeam backup server and parses Excel output reports in Python to monitor customer backup usage.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Explored Spark to improve the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
  • Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Developed ETL jobs using Spark and Scala to migrate data from Oracle to new Hive tables.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Implemented real-time data ingestion pipelines and managed the supporting clusters using Kafka.
  • Used Pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, and scikit-learn in Python to develop data pipelines and various machine learning algorithms.
  • Cleansed data toward a normal distribution by applying techniques such as missing-value treatment, outlier treatment, and hypothesis testing.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
  • Developed web services using REST APIs to send and receive data from external interfaces in JSON format.
  • Configured EC2 instances and created S3 data pipelines using the Boto API to load data from internal data sources.
  • Followed Agile methodology while building an internal application.
  • Used version-control platforms such as GitHub.
  • Worked closely with Data Scientists to know data requirements for the experiments.
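The duplicate-removal and missing-value imputation steps above can be sketched without any framework. This is an illustrative, order-preserving version using median imputation; the record values are invented for the example.

```python
import statistics

def cleanse(records):
    """Remove exact duplicate (key, value) records, preserving order, and
    impute missing values (None) with the median of the observed ones."""
    seen, deduped = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            deduped.append(rec)
    observed = [v for _, v in deduped if v is not None]
    median = statistics.median(observed)
    return [(k, v if v is not None else median) for k, v in deduped]

# Illustrative rows: one exact duplicate, one missing reading.
rows = [(1, 10.0), (2, None), (2, None), (3, 30.0), (4, 20.0)]
print(cleanse(rows))  # [(1, 10.0), (2, 20.0), (3, 30.0), (4, 20.0)]
```

In practice the same two steps map directly onto Pandas (`drop_duplicates` and `fillna` with a median), which is how pipelines like the ones above typically express them.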

Data Science Analyst (Graduate Research Assistant)

University of the Potomac
Washington
08.2017 - 03.2019
  • Developed an algorithm that predicts the computers most likely to be infected by malware. Achieved a 0.742 AUC score by applying EDA, one-hot encoding, and a neural-network field-aware factorization model.
  • Achieved a 12% RMSLE while predicting product prices for sellers from product descriptions and condition, using NLP techniques (TF-IDF, n-grams, count vectorization, stemming, lemmatization), ridge and lasso regression, LDA, and a gradient-boosting framework.
  • Evaluated Information Management System Database to improve Data Quality issues using DQ Analyzer and other Data preprocessing tools.
  • Developed python scripts to automate Data Analysis.
  • Implemented a classification algorithm that detects offensive comments and reduces toxicity to keep online conversations respectful, built with recurrent neural networks (long short-term memory and gated recurrent units) and NLP techniques.
  • Performed data analysis and visualization on survey data using Tableau Desktop, and compared respondents' demographic data via univariate analysis in Python (Pandas, NumPy, Seaborn, scikit-learn, and Matplotlib).
  • Developed a machine learning model to recommend friends to students based on their similarities.
  • Analyzed university research budget with peer universities budgets in collaboration with the research team, and recommended data standardization and usage to ensure data integrity.
  • Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting live/dynamic and static datasets.
  • Conducted statistical analysis to validate data and interpretations using Python and R, presented research findings and status reports, and collected user feedback to improve processes and tools.
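The RMSLE metric cited in the price-prediction bullet above can be written out directly; the sample targets and predictions below are invented for illustration.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error: RMSE computed on log(1 + x),
    which penalizes relative (rather than absolute) price errors."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

# Perfect predictions score exactly zero.
print(rmsle([100.0, 200.0], [100.0, 200.0]))  # 0.0
# Small relative errors yield a small score.
print(rmsle([100.0, 200.0], [110.0, 180.0]))
```

Because the metric works on log-transformed values, a prediction that is off by 10% costs roughly the same whether the true price is 10 or 10,000, which is why RMSLE is a common choice for price targets.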

Python Developer / Data Engineer

Société Générale
Bejaia
05.2012 - 09.2015
  • Built data import/export jobs to copy data to and from HDFS using Sqoop, and developed Spark code.
  • Analyzed SQL scripts and designed the solutions to implement using PySpark.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
  • Developed data processing tasks in PySpark, including reading data from external sources, merging data, performing enrichment, and loading into target destinations.
  • Used MongoDB to store data in JSON format, and developed and tested dashboard features using Python.
  • Developed a server-based web traffic statistical analysis tool using Flask and Pandas.
  • Designed and developed a RESTful web service API for the company's website.
  • Maintained customer relationship management (CRM) databases (MySQL/PostgreSQL).
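The JSON loading described above reduces to parsing a payload and projecting fields, much as a SerDe exposes columns to a table schema. A minimal sketch with invented field names:

```python
import json

# Illustrative JSON payload; the records and fields are made up for the
# example, but the shape mirrors rows loaded into a Hive table.
raw = '[{"id": 1, "city": "Bejaia"}, {"id": 2, "city": "Algiers"}]'

records = json.loads(raw)          # list of dicts, one per row
cities = [r["city"] for r in records]  # project a single "column"
print(cities)  # ['Bejaia', 'Algiers']
```

Spark SQL's `spark.read.json` performs the same parse-and-project step at scale, inferring the schema from the records instead of relying on the caller to know the field names.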

Education

Ph.D. - Computational Science & Engineering (Focus: Silicon Photonics)

Harrisburg University of Science and Technology
Harrisburg, PA
07.2025

Master of Science - Computer Science - Scientific Computing Concentration

Harrisburg University of Science and Technology
Harrisburg, PA
01.2024

Bachelor of Information Technology

University of the Potomac
Washington, DC
01.2017

BTS - Industrial Electronics

Institut National Spécialisé de la Formation Professionnelle (INSFP)
Béjaïa, Algeria
01.2017

Skills

  • Python
  • Java
  • C
  • Scala
  • SQL
  • Bash
  • PySpark
  • Spark SQL
  • Hive
  • HDFS
  • Sqoop
  • Kafka
  • Oozie
  • Cloudera
  • Ambari
  • Dataproc
  • Composer
  • BigQuery
  • Pub/Sub
  • Google Cloud Platform
  • Vertex AI
  • GCS
  • Cloud Run
  • Cloud Functions
  • AWS
  • Terraform
  • Docker
  • Kubernetes
  • PostgreSQL
  • MySQL
  • SQLite
  • MongoDB
  • Cassandra
  • Couchbase
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • Random Forest
  • SVM
  • KMeans
  • DBSCAN
  • PCA
  • Factor Analysis
  • Z-score
  • Chebyshev
  • IQR filtering
  • Isolation Forests
  • Autoencoders
  • TensorFlow
  • PyTorch
  • Keras
  • LangChain
  • Hugging Face
  • FAISS
  • Chainlit
  • RAG pipelines
  • MCP automation
  • Django
  • Flask
  • FastAPI
  • Spring Boot
  • REST APIs
  • JSON
  • GraphQL
  • Asyncio
  • Aiohttp
  • Threading
  • Multiprocessing
  • Jenkins
  • Tekton
  • GitHub Actions
  • SonarQube
  • FOSSA
  • JFrog Artifactory
  • 42Crunch
  • Pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Seaborn
  • Tableau
  • Excel
  • PyCharm
  • Jupyter
  • VS Code
  • IntelliJ
  • Eclipse
  • Spyder
  • Sublime Text
  • Git
  • GitHub
  • GitLab
  • Pytest
  • Unittest

Timeline

Data engineer

Ford
07.2020 - Current

Data engineer

Wizards of the Coast
03.2019 - Current

Data Science Analyst (Graduate Research Assistant)

University of the Potomac
08.2017 - 03.2019

Python Developer / Data Engineer

Société Générale
05.2012 - 09.2015

Ph.D. - Computational Science & Engineering (Focus: Silicon Photonics)

Harrisburg University of Science and Technology

Master of Science - Computer Science - Scientific Computing Concentration

Harrisburg University of Science and Technology

Bachelor of Information Technology

University of the Potomac

BTS - Industrial Electronics

Institut National Spécialisé de la Formation Professionnelle (INSFP)