Usman Ghani Mughal

London, United Kingdom

Summary

Data Engineer with 4+ years of experience specializing in designing, building, and optimizing scalable, robust data pipelines with a strong focus on reliability, performance, and maintainability across diverse industries. Proficient in data ingestion, ETL/ELT workflows, and architecting resilient data solutions using frameworks like the Medallion Architecture. Skilled in Databricks, PySpark, Delta Lake, and building Spark-based pipelines on AWS EMR with Airflow. Experienced in CI/CD automation, OLTP/OLAP data modelling, Power BI dashboarding, and contributing to AI-driven initiatives.

Overview

4 years of professional experience
4 years of post-secondary education
1 certification

Work history

Data Engineer

Cloud Enterprise Business Solutions (CEBS)
Islamabad/Pakistan
12.2023 - 01.2025
  • Company Overview: Client: MATAS (Denmark’s largest health & beauty retailer)
  • Migrated Matas (D365 F&O) to Synapse Link in 2 months, reducing load time by 20%.
  • Built scalable ADF pipelines to ingest 1 TB/day of data into the CDP (ADLS Gen2), cutting latency by 40%.
  • Optimized existing ADF pipelines, reducing runtime by 73% (from 30 minutes to 8 minutes).
  • Built CI/CD in Azure DevOps to deploy 50+ ADF pipelines, ensuring seamless production releases.
  • Developed and maintained a Medallion architecture with optimised PySpark in Databricks, leveraging Auto Loader, Unity Catalog, and Delta Live Tables for real-time and batch processing; orchestrated 100+ workflows using Databricks Asset Bundles, deployed via CI/CD.

Big Data Engineer

Nowasys LTD
Islamabad/Pakistan
02.2023 - 12.2023
  • Company Overview: Client: Anteriad (Anteriad powers B2B with the industry’s leading data)
  • Developed 15 ingestion PySpark pipelines on AWS EMR, ingesting 25–30 terabytes daily into an S3 data lake. Orchestrated workflows via Airflow.
  • Developed and deployed a PySpark-based data quality (DQ) framework, reducing data quality errors by 99%.
  • Optimized Spark code and fine-tuned AWS EMR configurations, improving performance and resource utilization by 50%.
  • Implemented an automated backfill mechanism for batch pipelines, eliminating data loss from missed runs.

Big Data Engineer

the ENTERTAINER
Lahore/Pakistan
05.2022 - 02.2023
  • Company Overview: The ENTERTAINER provides 2-for-1 deals on services from top brands in the Middle East.
  • Optimised Azure Synapse DWH to support cross-functional teams, reducing ad-hoc query time by 5% and dashboard reporting latency by 20%.
  • Defined a standardized data modeling approach (Kimball) for DWH; the approach now serves as a blueprint for 10+ data engineers across the analytics and data team.
  • Built and monitored 50+ ELT pipelines in Azure Data Factory, ingesting data into fact and dimension tables, implementing watermarking for incremental loads.
  • Developed PySpark pipelines on Databricks to process and transform 50M+/day web/app logs, loading into the DWH to improve recommendation system accuracy by 15%.

Data Engineer

Afiniti
Islamabad/Pakistan
04.2021 - 05.2022
  • Company Overview: Afiniti is a leading provider of customer experience (CX) artificial intelligence (AI).
  • Designed and implemented data pipelines for port, vehicle and broadband data serving both US and UK markets, processing 10M+ records daily.
  • Engineered and optimised processes using multiprocessing and multithreading, improving performance by 30% on 100K+ tasks.
  • Built a web scraping engine to gather data from external sources, reducing data acquisition time by 40%.
  • Provided guidance and support to AI teams, helping them understand and use third-party datasets effectively.

Education

MSc - Data and Data Science Technology

Northumbria University
Current

Bachelor of Science - Computer Science

Comsats University Islamabad
01.2017 - 01.2021

Skills

  • Cloud technologies: Databricks, Azure DevOps
  • Data warehouses: Synapse SQL, Redshift, Teradata
  • Data lakes: ADLS Gen2, S3, HDFS
  • Databases: MySQL, SQL Server, MongoDB
  • Data formats: CSV, JSON, Parquet, Delta
  • Distributed computing: Spark
  • Streaming frameworks: Spark Structured Streaming, Kafka
  • ETL tools: Azure Data Factory
  • Programming languages: Python, Java, Scala, C
  • Data manipulation: Excel, Pandas, NumPy
  • Orchestration tools: Apache Airflow, Cron Jobs
  • Dashboards: Power BI, Tableau

Certification

  • Data Engineering (Nanodegree - Udacity)
  • Microsoft Azure Databricks for Data Engineering
  • Introduction to Big Data with Spark and Hadoop (Coursera | IBM)
  • ETL and Data Pipelines with Shell, Airflow and Kafka (Coursera | IBM)
  • Introduction to Bash Shell Scripting
  • Apache Spark Essential Training: Big Data Engineering (LinkedIn)
  • Advanced Python (LinkedIn)
  • Advanced SQL for Query Tuning and Performance Optimization (LinkedIn)
