Senior Data Engineer with 10+ years of experience designing, building, and optimizing scalable data architectures and pipelines. Proficient in processing 100M+ daily tracking events using Apache Spark, SQL, Databricks, and AWS, with a strong focus on distributed computing, performance optimization, and data governance. Proven ability to improve efficiency and scalability, including reducing query execution time from 12 minutes to 29 seconds. Skilled in developing complex ETL pipelines for real-time and batch data processing, implementing observability frameworks, and ensuring data quality and compliance. Extensive experience with AWS (S3, Glue, EMR, Lambda, Redshift) and GCP (BigQuery, Dataflow, Pub/Sub). A proactive leader and mentor who collaborates with cross-functional teams to drive business-aligned data strategies and innovation.
• Managed 100M+ daily user tracking events by developing high-performance, scalable ETL pipelines using Apache Spark (Scala Spark/SQL) and Databricks.
• Developed a new user session definition pipeline to process both historical and incremental data, ensuring consistency in user journey tracking.
• Optimized data transformation workflows, reducing query execution time from 12 minutes to 29 seconds and improving performance and cost efficiency.
• Designed and implemented data governance and observability frameworks, improving data ownership, quality, and transparency.
• Collaborated with product, analytics, and engineering teams to develop data-driven insights, ensuring data accuracy and reliability.
• Designed and built batch data pipelines in AWS, enabling efficient, scalable data processing.
• Developed a data lakehouse using AWS and Apache Hudi, optimizing data workflows and addressing large-scale data management challenges.
• Introduced Airflow for orchestration, improving the efficiency and reliability of data pipelines.
• Played a key role in shaping the company’s data platform strategy, collaborating with cross-functional teams to deliver impactful data products.
• Led data engineering initiatives for multiple UK clients, designing and developing data pipelines on Google Cloud Platform (GCP).
• Provided technical leadership to a team of engineers, offering mentorship and guidance.
• Designed and implemented scalable data solutions, ensuring optimal data quality, security, and governance.
• Worked onsite with chief architects and stakeholders, shaping the design and architecture of next-generation data platforms.
• Developed and optimized batch data pipelines using Apache Spark, Hadoop, Hive, and Scala for large-scale data processing.
• Designed and implemented ETL workflows to ingest, transform, and store structured and semi-structured data in a centralized data lake.
• Optimized Spark jobs to improve performance, reducing execution time and resource consumption.
• Ensured data quality through deduplication, standardization, and enrichment for downstream analytics.
• Collaborated with cross-functional teams to develop scalable data models and reporting solutions.
Databricks Expertise
Soccer, Cricket
Google Cloud Platform Certified Professional Data Engineer
Google Cloud Platform Certified Professional Cloud Architect
Google Cloud Platform Certified Associate Cloud Engineer