
Nacer Chalane

Phoenix

Summary

Results-driven IT Professional with 8+ years of experience and mastery in Python, Java, and C++, specializing in data engineering, machine learning, and cloud-native systems. Expert in architecting and managing scalable ETL pipelines using PySpark and Spark SQL for both batch and real-time data workloads, in environments ranging from Cloudera/Hadoop to Google Cloud Platform (GCP). Proficient in ingesting and transforming large-scale datasets, including connected vehicle data (e.g., telematics, geospatial, sensor logs), for business insights, anomaly detection, and ML model development. Experienced in deploying ML pipelines on Vertex AI, orchestrating workflows with Kubeflow Pipelines, and maintaining robust, automated CI/CD processes using Jenkins, Tekton, and Terraform. Skilled in implementing supervised and unsupervised ML algorithms, deep learning (TensorFlow, PyTorch), and LLM-based systems using LangChain, Hugging Face, and Chainlit. Strong background in DevSecOps (SonarQube, FOSSA, JFrog, 42Crunch) and infrastructure provisioning. Proven ability to drive high-quality, maintainable, and secure code at scale.

Overview

13
years of professional experience

Work History

Data engineer

Ford
MI
07.2020 - Current
  • Designed and implemented large-scale ETL pipelines for high-volume, high-velocity data ingestion, including connected vehicle data streams such as telematics, GPS, and sensor signals, using PySpark, Cloud Pub/Sub, and Dataproc, delivering structured insights into driving behavior, fleet operations, and anomaly detection.
  • Built and optimized ETL workflows to extract data from onboard units (OBUs), transform raw time-series data into structured feature sets (e.g., speed, engine load, geospatial paths), and load it into BigQuery and cloud data lakes for downstream analytics and machine learning.
  • Architected and managed a cloud-native data lake using GCS to store raw, refined, and aggregated vehicle data; supported batch and streaming access for data science and real-time analytics teams.
  • Developed asynchronous data extractors using asyncio and multithreading to fetch data from external APIs (e.g., traffic, weather, vehicle OEM APIs) and enrich vehicle datasets during ETL workflows.
  • Applied statistical methods (Chebyshev's inequality, z-score, IQR filters) and ML models (Isolation Forest, DBSCAN, Autoencoders) to detect anomalies such as abnormal driving patterns, sensor failures, or suspicious routing behavior.
  • Orchestrated ETL and data science workflows using Airflow (Composer) and legacy Oozie, while monitoring and debugging Spark jobs through Cloudera Manager and Ambari in hybrid cloud/on-prem environments.
  • Deployed real-time alerting systems using Cloud Functions and Slack integrations to notify on anomalous vehicle behavior, ETL failures, or SLA violations in ingestion pipelines.
  • Developed Spring Boot microservices deployed on Cloud Run to expose RESTful APIs for managing ETL schedules and pipeline metadata; integrated with Pub/Sub for asynchronous orchestration.
  • Automated pipeline provisioning and resource deployment using Terraform, and enforced consistency with CI/CD pipelines via Tekton, including validation of Spark scripts, APIs, and DAGs.
  • RAG & LLM Integration: Integrated Retrieval-Augmented Generation (RAG) with LangChain, Hugging Face, and FAISS to support LLM-based querying of documentation, logs, and vehicle diagnostics history.
  • Built Chainlit-based internal UIs to let analysts and engineers interact with vehicle datasets via natural language, backed by prompt engineering and vector search.
  • MLOps with Vertex AI & Kubeflow: Developed end-to-end ML pipelines for vehicle anomaly detection, route optimization, and predictive maintenance using Kubeflow Pipelines and deployed via Vertex AI.
  • Integrated BigQuery, Dataflow, and Vertex components in hybrid ML workflows, automating retraining and monitoring through Vertex AI Model Registry, Composer, and Tekton CI/CD.
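The IQR-based anomaly filtering mentioned above can be sketched as follows. This is a minimal, self-contained illustration: the speed values and the 1.5× threshold are invented for the example, not taken from the actual Ford pipeline.

```python
def percentile(values, p):
    """Linear-interpolation percentile (0 <= p <= 100)."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as anomalies."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Illustrative vehicle-speed readings: 140 and 5 fall outside the fences.
speeds = [62, 65, 63, 64, 61, 66, 140, 63, 62, 5]
print(iqr_outliers(speeds))  # [140, 5]
```

The same fence calculation generalizes to any numeric telemetry channel; z-score filtering differs only in using the mean and standard deviation instead of quartiles.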

Data engineer

Wizards of the Coast
Seattle
03.2019 - Current
  • Worked with designers, analysts, developers, and software engineers to write and execute reliable, efficient Python programs and related SQL queries for desktop GUI, enterprise, web, scientific, and numerical applications, following user and business requirements as well as established coding standards.
  • Wrote a program that uses REST API calls to interface with a Veeam backup server and parses Excel output reports in Python to monitor customer backup usage.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Explored Spark to improve the performance of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, Pair RDDs, and Spark on YARN.
  • Developed Spark and Spark SQL/Streaming code for faster testing and processing of data.
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Developed ETL jobs using Spark and Scala to migrate data from Oracle to new Hive tables.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie.
  • Implemented real-time data ingestion pipelines and managed the supporting clusters using Kafka.
  • Used Pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, and scikit-learn in Python to develop data pipelines and various machine learning algorithms.
  • Cleansed data toward a normal distribution by applying techniques such as missing-value treatment, outlier treatment, and hypothesis testing.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies by removing duplicates and imputing missing values.
  • Developed web services using REST APIs to send and receive data from external interfaces in JSON format.
  • Configured EC2 instances and created S3 data pipelines using the Boto API to load data from internal data sources.
  • Followed Agile methodology while building an internal application.
  • Used version-control platforms such as GitHub.
  • Worked closely with Data Scientists to know data requirements for the experiments.
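The duplicate-removal and missing-value imputation steps above can be sketched without any framework. This is an illustrative, order-preserving version using median imputation; the record values are invented for the example.

```python
import statistics

def cleanse(records):
    """Remove exact duplicate (key, value) records, preserving order, and
    impute missing values (None) with the median of the observed ones."""
    seen, deduped = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            deduped.append(rec)
    observed = [v for _, v in deduped if v is not None]
    median = statistics.median(observed)
    return [(k, v if v is not None else median) for k, v in deduped]

# Illustrative rows: one exact duplicate, one missing reading.
rows = [(1, 10.0), (2, None), (2, None), (3, 30.0), (4, 20.0)]
print(cleanse(rows))  # [(1, 10.0), (2, 20.0), (3, 30.0), (4, 20.0)]
```

In practice the same two steps map directly onto Pandas (`drop_duplicates` and `fillna` with a median), which is how pipelines like the ones above typically express them.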

Data Science Analyst (Graduate Research Assistant)

University of the Potomac
Washington
08.2017 - 03.2019
  • Developed an algorithm that predicts the computers most likely to be infected by malware. Achieved a 0.742 AUC score by applying EDA, one-hot encoding, and a neural-network field-aware factorization model.
  • Achieved a 12% RMSLE while predicting product prices for sellers from product descriptions and condition, using NLP techniques (TF-IDF, n-grams, count vectorization, stemming, lemmatization), ridge and lasso regression, LDA, and a gradient-boosting framework.
  • Evaluated Information Management System Database to improve Data Quality issues using DQ Analyzer and other Data preprocessing tools.
  • Developed python scripts to automate Data Analysis.
  • Implemented a classification algorithm that detects offensive comments and reduces toxicity to keep online conversations respectful, built with recurrent neural networks (long short-term memory and gated recurrent units) and NLP techniques.
  • Performed data analysis and visualization on survey data using Tableau Desktop, and compared respondents' demographic data via univariate analysis in Python (Pandas, NumPy, Seaborn, scikit-learn, and Matplotlib).
  • Developed a machine learning model to recommend friends to students based on their similarities.
  • Analyzed university research budget with peer universities budgets in collaboration with the research team, and recommended data standardization and usage to ensure data integrity.
  • Reviewed basic SQL queries and edited inner, left, and right joins in Tableau Desktop by connecting live/dynamic and static datasets.
  • Conducted statistical analysis to validate data and interpretations using Python and R, presented research findings and status reports, and collected user feedback to improve processes and tools.
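The RMSLE metric cited in the price-prediction bullet above can be written out directly; the sample targets and predictions below are invented for illustration.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error: RMSE computed on log(1 + x),
    which penalizes relative (rather than absolute) price errors."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

# Perfect predictions score exactly zero.
print(rmsle([100.0, 200.0], [100.0, 200.0]))  # 0.0
# Small relative errors yield a small score.
print(rmsle([100.0, 200.0], [110.0, 180.0]))
```

Because the metric works on log-transformed values, a prediction that is off by 10% costs roughly the same whether the true price is 10 or 10,000, which is why RMSLE is a common choice for price targets.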

Python Developer / Data Engineer

Société Générale
Bejaia
05.2012 - 09.2015
  • Built data import/export jobs to copy data to and from HDFS using Sqoop, and developed Spark code.
  • Analyzed SQL scripts and designed the solutions to implement using PySpark.
  • Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
  • Used Spark SQL to load JSON data, create schema RDDs, load them into Hive tables, and handle structured data.
  • Developed data processing tasks in PySpark, including reading data from external sources, merging data, performing enrichment, and loading into target destinations.
  • Used MongoDB to store data in JSON format, and developed and tested dashboard features using Python.
  • Developed a server-based web traffic statistical analysis tool using Flask and Pandas.
  • Designed and developed a RESTful web service API for the company's website.
  • Maintained customer relationship management (CRM) databases (MySQL/PostgreSQL).
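The JSON loading described above reduces to parsing a payload and projecting fields, much as a SerDe exposes columns to a table schema. A minimal sketch with invented field names:

```python
import json

# Illustrative JSON payload; the records and fields are made up for the
# example, but the shape mirrors rows loaded into a Hive table.
raw = '[{"id": 1, "city": "Bejaia"}, {"id": 2, "city": "Algiers"}]'

records = json.loads(raw)          # list of dicts, one per row
cities = [r["city"] for r in records]  # project a single "column"
print(cities)  # ['Bejaia', 'Algiers']
```

Spark SQL's `spark.read.json` performs the same parse-and-project step at scale, inferring the schema from the records instead of relying on the caller to know the field names.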

Education

Ph.D. - Computational Science & Engineering (Focus: Silicon Photonics)

Harrisburg University of Science and Technology
Harrisburg, PA
07.2025

Master of Science - Computer Science - Scientific Computing Concentration

Harrisburg University of Science and Technology
Harrisburg, PA
01.2024

Bachelor of Information Technology

University of the Potomac
Washington, DC
01.2017

BTS - Industrial Electronics

Institut National Spécialisé de la Formation Professionnelle (INSFP)
Béjaïa, Algeria
01.2017

Skills

  • Python
  • Java
  • C
  • Scala
  • SQL
  • Bash
  • PySpark
  • Spark SQL
  • Hive
  • HDFS
  • Sqoop
  • Kafka
  • Oozie
  • Cloudera
  • Ambari
  • Dataproc
  • Composer
  • BigQuery
  • Pub/Sub
  • Google Cloud Platform
  • Vertex AI
  • GCS
  • Cloud Run
  • Cloud Functions
  • AWS
  • Terraform
  • Docker
  • Kubernetes
  • PostgreSQL
  • MySQL
  • SQLite
  • MongoDB
  • Cassandra
  • Couchbase
  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • Random Forest
  • SVM
  • KMeans
  • DBSCAN
  • PCA
  • Factor Analysis
  • Z-score
  • Chebyshev
  • IQR filtering
  • Isolation Forests
  • Autoencoders
  • TensorFlow
  • PyTorch
  • Keras
  • LangChain
  • Hugging Face
  • FAISS
  • Chainlit
  • RAG pipelines
  • MCP automation
  • Django
  • Flask
  • FastAPI
  • Spring Boot
  • REST APIs
  • JSON
  • GraphQL
  • Asyncio
  • Aiohttp
  • Threading
  • Multiprocessing
  • Jenkins
  • Tekton
  • GitHub Actions
  • SonarQube
  • FOSSA
  • JFrog Artifactory
  • 42Crunch
  • Pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Seaborn
  • Tableau
  • Excel
  • PyCharm
  • Jupyter
  • VS Code
  • IntelliJ
  • Eclipse
  • Spyder
  • Sublime Text
  • Git
  • GitHub
  • GitLab
  • Pytest
  • Unittest

Timeline

Data engineer

Ford
07.2020 - Current

Data engineer

Wizards of the Coast
03.2019 - Current

Data Science Analyst (Graduate Research Assistant)

University of the Potomac
08.2017 - 03.2019

Python Developer / Data Engineer

Société Générale
05.2012 - 09.2015

Ph.D. - Computational Science & Engineering (Focus: Silicon Photonics)

Harrisburg University of Science and Technology

Master of Science - Computer Science - Scientific Computing Concentration

Harrisburg University of Science and Technology

Bachelor of Information Technology

University of the Potomac

BTS - Industrial Electronics

Institut National Spécialisé de la Formation Professionnelle (INSFP)