Jesse Pepple

Northampton, England

Summary

Data Engineer with expertise in ETL, data modelling, and medallion architecture across the Azure, Microsoft Fabric, and Databricks ecosystems. Proficient in batch and streaming processing with Fabric Data Factory, Eventhouse, Synapse, OneLake, Delta Live Tables, and Azure Data Factory. Skilled in Python, PySpark, and SQL. Experienced with Azure DevOps for CI/CD pipelines and Databricks Asset Bundles for repeatable deployments across environments. Committed to optimising data workflows and supporting data-driven decision-making.

Overview

1 year of professional experience

Work history

Part Time Azure Data Engineer

Rovar Technology
London
2024.08 - 2026.01
  • Designed and maintained end-to-end data pipelines with Azure Data Factory, Databricks, and Synapse Analytics.
  • Developed complex data models to support business decision-making.
  • Streamlined flow of information by introducing efficient ETL processes.
  • Implemented Medallion Architecture using Delta Lake for scalable and reliable data processing.
  • Applied Change Data Capture (CDC) for incremental loading, improving pipeline efficiency.
  • Enforced data governance and lineage using Unity Catalog across large datasets in ADLS Gen2.

Education

Bachelor of Science - Business Computing

University of Northampton
United Kingdom
2021.09 - 2024.05

Skills

  • Data Engineering: ETL, Data Modelling, Star Schema, Medallion Architecture, Batch & Streaming
  • Tools & Platforms: Fabric Data Factory, Eventhouse, Fabric Warehouse, OneLake, Azure Data Factory, Azure Data Lake Storage (ADLS Gen2), Azure SQL Database, Synapse Analytics, Databricks, Delta Lake, Delta Live Tables
  • Programming & Scripting: Python, PySpark, SQL
  • DevOps & CI/CD: Git, Azure DevOps, Databricks Asset Bundles

Projects

End-to-End Sales Azure Data Engineering Project (Azure | Databricks | CI/CD)

Built a production-grade Azure pipeline that ingests sales data from REST APIs via Azure Data Factory into ADLS and processes it through the Medallion Architecture in Databricks. Implemented Spark Structured Streaming with Auto Loader for incremental ingestion, Delta Live Tables for curated Gold datasets with SCD Type 2 dimensions, and full CI/CD using GitHub and Databricks Asset Bundles across Dev/Test/Prod environments.

Overall Project Impact

  • End-to-End Automation: 100% of the pipeline automated from ingestion → transformation → delivery, ensuring minimal manual intervention.
  • Version Control & CI/CD: All notebooks and pipelines managed via GitHub and Databricks Asset Bundles, enabling reproducibility and seamless collaboration.
  • Scalability & Maintainability: Architecture supports additional data sources and schema evolution without manual updates, future-proofing the pipeline.
  • Business Value: Accelerated decision-making with timely, reliable sales data; reduced analytics engineering workload by ~6 hours per week.

Link: End-to-End-Sales-Azure-Data-Engineering-Project-With-Databricks-AssetsBundle-CI-CD

Azure Data Engineering Project With CI/CD And Databricks Asset Bundles

Implemented an end-to-end Azure data pipeline using Azure SQL Database as the source, orchestrating incremental ingestion with Azure Data Factory into Azure Data Lake (Bronze layer). Processed and enriched data in Azure Databricks using Spark Structured Streaming, Auto Loader, and schema evolution to build the Silver layer. Created curated Gold layer datasets with Delta Live Tables (DLT), implementing SCD Type 2 for dimensions and SCD Type 1 upserts for facts, following the Medallion Architecture. Integrated CI/CD with Git to automate deployments across environments and delivered analytics-ready datasets using PySpark and SQL to Databricks SQL Warehouse and Azure Synapse Analytics.

Overall Project Impact

  • End-to-End Automation: 100% of the pipeline automated from ingestion → transformation → delivery using Azure Data Factory, Databricks, and Delta Live Tables (DLT), with incremental ingestion and CDC ensuring zero duplication and full traceability.
  • Data Quality & Reliability: Implemented SCD Type 2 for dimension tables and SCD Type 1 upserts for fact tables, with data quality expectations validated on 100% of tables and full audit logging across ingestion and transformations.
  • Scalability & Maintainability: Designed using the Medallion Architecture (Bronze → Silver → Gold) with automated schema evolution and CI/CD deployments via Azure DevOps, GitHub, and Databricks Asset Bundles across Dev → Test → Prod environments.
  • Business Value & Analytics Enablement: Delivered curated datasets to Databricks SQL Warehouse, Synapse Analytics, and Power BI (via Partner Connect), enabling self-service analytics and accelerating decision-making while reducing manual reporting effort by ~2–3 hours per week.

Link: End-to-End-Sales-Azure-Data-Engineering-Project-With-Databricks-AssetsBundle-CI-CD

Flights Azure Databricks Project

Developed an end-to-end data engineering solution built exclusively on Azure Databricks, using Spark Structured Streaming for real-time data ingestion and processing. Used PySpark to perform scalable data transformations, and built Delta Live Tables (DLT) pipelines to automate Slowly Changing Dimensions (SCDs) while enforcing data quality and consistency. Designed and delivered dynamic dimensional models that produced curated, analytics-ready datasets for downstream consumption.

Overall Project Impact

  • End-to-End Automation: Bronze → Silver → Gold pipeline fully automated with streaming, DLT, and UPSERTs
  • Incremental Data Handling: 100% incremental load success → reduced processing time and cost
  • Data Quality & Reliability: All DLT expectations met → no data loss or duplicates
  • Reusability: Dynamic dimensional modelling workflow reusable across multiple datasets
  • Business Value: Faster, reliable access to curated datasets → supports analytics, BI reporting, and decision-making

Link: Flights Azure Databricks End To End Data Engineering Project

Microsoft Fabric Data Engineering Project

Developed an end-to-end Fabric pipeline with parameterized Data Factory ingestion, OneLake storage, and Fabric Notebooks for transformation. Implemented SCD Type 2 for historical tracking and star schema modelling in Fabric Data Warehouse. Delivered Power BI dashboards with email-based monitoring.

Overall Project Impact

  • Delivered a fully automated, end-to-end Microsoft Fabric Lakehouse pipeline, reducing manual data handling and enabling reliable ingestion, transformation, and analytics for Airbnb datasets.
  • Enabled historical tracking and auditability through implementation of SCD Type 2 on dimension tables, improving analytical accuracy for trend and time-based analysis.
  • Improved pipeline reliability and operational visibility by introducing parameterized ingestion, dynamic control tables, and email-based failure monitoring.
  • Produced analytics-ready, star-schema-modelled gold datasets, significantly simplifying BI development and improving query performance for downstream reporting.
  • Accelerated self-service analytics by delivering curated data directly to the Fabric Data Warehouse and Power BI, reducing dependency on data engineering support.
  • Demonstrated cross-platform data engineering expertise, showcasing the ability to design production-grade pipelines in Microsoft Fabric while understanding the trade-offs versus Databricks approaches (DLT, Auto Loader).

Link: Microsoft Fabric Airbnb Data Engineering Project

Azure Databricks End-To-End Project with Azure DevOps

Delivered a scalable, automated data engineering solution using Azure Databricks, Azure Data Factory, and Azure Data Lake Storage, with real-time ingestion implemented through Spark Structured Streaming. Built reliable transformation pipelines incorporating SCD Type 1 (manual) and SCD Type 2 (Delta Live Tables), and applied Star Schema modelling with incremental data loading to support efficient analytics. Curated high-quality datasets across the bronze, silver, and gold layers, and published analytics-ready data to Azure Synapse Analytics and Databricks SQL Warehouse for BI and reporting.

Overall Project Impact

  • End-to-End Automation: Bronze → Silver → Gold pipeline fully automated with incremental loads
  • Historical Tracking: SCD Type 1 and Type 2 pipelines ensured data consistency and historical accuracy
  • Data Quality: Zero dropped or duplicated records; data quality checks passed
  • Business Value: Rapid, reliable access to curated datasets for analytics teams, improving decision-making speed
  • Reusable Components: Parameterized notebooks and dynamic ingestion logic make the pipeline reusable for other datasets

Link: https://www.jesseportfolio.co.uk/post/azure-databricks-end-to-end-dataengineering-project-with-azure-devops

Olympics Data Engineering Project with Azure DevOps

Built an end-to-end Azure and Databricks data pipeline using the Olympics 2024 dataset, designed around the Medallion Architecture (Bronze → Silver → Gold). Orchestrated data ingestion with Azure Data Factory into Azure Data Lake Storage and applied strong data governance using Unity Catalog for secure, centralized access control. Developed transformation pipelines with Delta Live Tables (DLT), implementing CDC and SCD Type 1 to manage incremental updates and ensure data consistency. Delivered curated gold-layer datasets to Databricks SQL Warehouse and Azure Synapse Analytics, enabling high-performance analytics and optimized reporting for tools such as Power BI.

Overall Project Impact

  • End-to-End Automation: Bronze → Silver → Gold with CDC + SCD Type 1 handling
  • Governance: Unity Catalog ensures secure, auditable, and governed data assets
  • Incremental Loading: Automatic detection and ingestion of new data → reduces processing costs and runtime
  • Business Value: Analysts and data scientists can now access clean, curated, and ready-to-use datasets for reporting and advanced analytics
  • Portfolio Highlight: Showcases cloud data engineering skills, CI/CD, and modern ETL design with Delta Lake

Link: jesseportfolio.co.uk/post/olympics-data-engineering-project-with-azure-devops

References

References available upon request.
