
Arash Abedi

Computer Engineer

About Me

I am a Computer Engineer focused on data science and data engineering. In these fields I have honed my skills in programming languages, data analytics, machine learning algorithms, and big data technologies.

Throughout my career, I have worked on a wide range of projects, collaborating with cross-functional teams to support data-driven decision-making and optimize system performance.

Education

Master in Computer Engineering

Data Science and Data Engineering Pathway

University of Bergamo

2023-2025

Bachelor of Science in Robotics Engineering

Faculty of Electrical and Robotic Engineering

Shahrood University of Technology

2009-2013

Master’s Thesis

An Anomaly Detection Pipeline for PELL-IP Public Street Lighting Data

Developed as part of a collaboration between the University of Bergamo and ENEA (the Italian National Agency for New Technologies, Energy and Sustainable Economic Development), this project addresses the challenges of analyzing energy consumption data from public street lighting systems. The full project and source code are available on GitLab.

2025

Objective:

The main goal is to build a scalable, production-ready pipeline of unsupervised machine learning models for anomaly detection. By identifying anomalous Points of Delivery (identified by their PODIDs) based on monthly energy consumption behavior, the project seeks to improve the efficiency and reliability of energy management systems, reducing operational inefficiencies and mitigating the risks associated with unusual consumption patterns.


Approach:

The pipeline follows a modular and systematic methodology, designed to optimize performance and scalability in real-time anomaly detection. Key aspects of the approach include:

  • Data Ingestion & Preprocessing: The pipeline uses Apache Spark, through its Python API PySpark, for distributed data ingestion and transformation, enabling efficient handling of large-scale datasets stored on distributed file systems such as Hadoop HDFS.
  • Feature Engineering: The system aggregates monthly energy consumption features and extracts statistical summaries for each Point of Delivery (PODID), ensuring that only complete and meaningful data is included in the anomaly detection process.
  • Hyperparameter Optimization: To improve model accuracy and performance, a grid search was used to identify the optimal hyperparameters for each unsupervised model. This involved fine-tuning parameters such as the number of estimators and the contamination level for Isolation Forest, and the neighborhood size for Local Outlier Factor (LOF).
  • Anomaly Detection Models: Unsupervised techniques such as PCA, Isolation Forest, and LOF were employed. Each model brought a unique perspective to anomaly detection, and their results were combined via an ensemble approach to enhance robustness and reliability.
  • Production Design: The pipeline was built with scalability in mind, leveraging Spark's distributed processing and integrating with production environments. A modular design keeps the codebase easy to maintain, while Apache Spark provides real-time processing capabilities.
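To illustrate the ensemble idea described above, here is a minimal, self-contained sketch (not the thesis code): it runs PCA reconstruction error, Isolation Forest, and LOF on synthetic monthly-consumption data and combines the three anomaly flags by majority vote. The data, parameter values, and the majority-vote rule are all illustrative assumptions; in the real pipeline the hyperparameters come from the grid search and the data from the PELL-IP feature aggregation.

```python
# Illustrative sketch, not the thesis implementation: three unsupervised
# detectors combined by majority vote over per-model anomaly flags.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
# Hypothetical data: 200 PODIDs x 12 monthly consumption features,
# with the first five rows shifted far from the rest to act as anomalies.
X = rng.normal(loc=100.0, scale=10.0, size=(200, 12))
X[:5] += 80.0

contamination = 0.05  # assumed expected anomaly fraction

# 1) PCA: flag points with the largest reconstruction error.
pca = PCA(n_components=3).fit(X)
recon = pca.inverse_transform(pca.transform(X))
err = ((X - recon) ** 2).sum(axis=1)
pca_flag = err > np.quantile(err, 1 - contamination)

# 2) Isolation Forest (n_estimators and contamination would be
#    grid-searched in the real pipeline).
iso_flag = IsolationForest(
    n_estimators=200, contamination=contamination, random_state=0
).fit_predict(X) == -1

# 3) Local Outlier Factor (n_neighbors likewise grid-searched).
lof_flag = LocalOutlierFactor(
    n_neighbors=20, contamination=contamination
).fit_predict(X) == -1

# Ensemble: a PODID is anomalous if at least 2 of the 3 models agree.
votes = pca_flag.astype(int) + iso_flag.astype(int) + lof_flag.astype(int)
anomalies = np.where(votes >= 2)[0]
print(sorted(anomalies.tolist()))
```

Majority voting is just one way to combine the models; averaging normalized anomaly scores before thresholding is a common alternative when the individual score scales can be calibrated.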

Experience

ICT Consultant - Data Science

ALTEN Italia

05/2023 - Ongoing

I provide consultancy services in the ICT sector, specializing in data science and machine learning. My expertise includes implementing machine learning models in Python and predictive modeling on Google Cloud Platform (GCP). I apply both supervised and unsupervised learning techniques for anomaly detection, leveraging methods such as ensemble learning, TensorFlow, and Principal Component Analysis (PCA). Additionally, I analyze network KPIs, deliver comprehensive data visualizations, and provide data-driven insights to enhance decision-making processes.


Software Developer

PARTNER DATA SRL

09/2022 - 04/2023

Developed and maintained desktop and web applications, ensuring high performance and responsiveness using technologies such as Python and Qt Creator.

Skills

Data Science

Machine Learning: Regression and Classification Models, Dataset Structuring, Regularization and Validation.

Deep Learning: Neural Networks and MLP, CNN, RNN, Transformer, Diffusion and Generative Models.

Python Libraries

NumPy, Pandas, SciPy, Scikit-Learn, TensorFlow, Keras, PyTorch, PySpark, PyQt, Matplotlib, Seaborn, Plotly

Programming Languages

Java, Python, C++

Databases

MySQL, Oracle SQL, Microsoft SQL Server

Cloud Platforms

Databricks, GCP

Web Development & Frameworks

HTML, CSS, JavaScript, Spring Boot, Spring Security, Bootstrap

Version Control

Git, GitLab, GitHub