
Rajeev Ramadurai

Leeds, UK

Summary

Senior Data Engineer with 19+ years of experience, including 5 years building AWS cloud-native Lakehouse infrastructure at The Very Group. Specialises in designing and delivering production-grade, event-driven data pipelines at scale — from real-time ingestion and Apache Iceberg table management to historical data recovery and serverless API development. Consistently takes end-to-end ownership: architecture, Terraform infrastructure, Python implementation, deployment, and ongoing optimisation. Collaborative team player with a track record of reducing query times, resolving production incidents, and enabling platform reusability across teams.

Overview

19 years of professional experience
3 years of post-secondary education
1 Certification

Work history

Senior Data Engineer

The Very Group
Speke, Liverpool
2021.06 - 2026.05
  • Part of the core Lakehouse platform team, responsible for designing and building AWS data infrastructure that powers real-time event ingestion, historical data recovery, and customer-facing data services across the business.
  • Real-Time Event Ingestion Platform
  • Designed and delivered a reusable, config-driven ingestion platform — now the standard pattern for all event data flowing into the Lakehouse — processing millions of events daily across multiple business domains.
  • Architected two ingestion paths to suit different source types: a streaming path using EventBridge Pipes with JSONPath-driven column mapping for application events, and an S3-triggered path via Lambda for file-based sources.
  • Reduced ongoing effort for new data streams to a single YAML config file, with no code changes required, by centralising all infrastructure (Firehose, SQS, DLQ, Glue, IAM, CloudWatch) within a single Terraform module.
  • Enabled UPSERT support across pipelines using Iceberg’s merge-on-read capability, ensuring downstream consumers always receive deduplicated, consistent data.
  • Query Performance Optimisation — Iceberg Partition Evolution
  • Identified and resolved a critical query performance bottleneck on a 36M-row Iceberg table, improving average query time by ~50% (3.3s → 1.7s) and reducing files scanned per query from 100% to under 1%.
  • Delivered the improvement through a non-destructive Iceberg partition evolution — no data rewrite, no downtime — changing the partition strategy to a composite day + bucket approach with a sort order on the primary lookup key.
  • Built a fully automated, config-driven pipeline to manage the evolution: YAML config → Terraform → DynamoDB → Lambda → Step Functions → EMR Spark job, with end-to-end idempotency and a 10-step QA verification suite.
  • Unblocked a Jenkins deployment pipeline stalled by a dependency on a decommissioned database cluster, implementing a targeted SSM parameter workaround to restore CI/CD flow without infrastructure regressions.
  • Historical Data Recovery & Backstop Replay Service
  • Built a production replay service enabling Lakehouse tables to be fully rebuilt from historical S3 data — used to recover 8.6M+ records across 447 processing chunks during a major backstop recovery exercise.
  • Designed an intelligent chunking strategy that automatically adjusts processing granularity based on data volume, preventing Lambda timeouts on high-volume days while keeping SQS queue depth manageable for quieter periods.
  • Ensured data integrity throughout by enforcing sequential processing (one chunk at a time) to prevent concurrent write conflicts on Iceberg tables — a key architectural decision that eliminated an entire class of production failures.
  • Led a production incident during a live replay — simultaneous pipeline and replay writes triggered CloudWatch alarms — and coordinated a safe resolution following Firehose buffer flush sequencing and IAM policy corrections.
  • Customer Orders API — Serverless Data Service
  • Designed and delivered a serverless Lambda function that serves 7 years of customer order history to the Customer Care Advisor application, returning results in ~11 seconds with a full validation and error-handling contract (200/204/400/500/503).
  • Implemented a config-driven SQL builder that generates parameterised queries entirely from environment variables, supporting multiple filter types (exact match, date range, partial match, integer) with zero hardcoded logic — new filter fields require no code changes.
  • Integrated securely with the Starburst/Teradata data layer over a private VPC network, with credentials managed through AWS Secrets Manager and a dedicated security group enforcing least-privilege egress.
  • Structured the response contract to match the API Gateway proxy integration spec, ensuring the service can be migrated behind a gateway in future with no application-layer changes.
  • Schema Validation & Data Quality
  • Designed a centralised, reusable schema validation service deployed at organisation level, dynamically enforcing type constraints and nullability rules by fetching schemas from AWS Glue Schema Registry at runtime — eliminating duplicated validation logic across pipelines.
  • Resolved a production data quality incident on the Customer Marketing Preferences pipeline — root-caused a concurrent write conflict in Kinesis Firehose, coordinated an 8.6M-record backstop replay, and defined a snapshot retention policy aligned with existing backup infrastructure.
  • Platform Engineering & Infrastructure
  • Authored and maintained Terraform modules for all Lakehouse infrastructure — Firehose delivery streams, SQS queues, EventBridge rules and pipes, Lambda functions, IAM roles, Glue databases, and CloudWatch alarms — enabling consistent, repeatable deployments across dev, test, and production.
  • Collaborated closely with platform architects and peer engineers on MR reviews, architectural decisions (KMS key strategy, Iceberg retention policies, schema registry design), and cross-team delivery.

Lead Data Engineer

British Gas
Staines-upon-Thames, Surrey
2016.01 - 2021.06
  • Led data engineering and architecture for the Insurance & Pricing Analytics team, owning the full data lake platform on Hortonworks/Hadoop and delivering PySpark pipelines for sales, renewals, claims, and Price Comparison Website (PCW) automation.
  • Renegotiated a vendor contract to deliver a lake-optimised solution in place of a costly lift-and-shift, saving £100k–£200k while meeting the original timeline.
  • Built a reusable PySpark batch framework leveraging Hive, Spark, and Python pipelines, cutting new user onboarding time from weeks to hours.
  • Delivered automated reporting for Sales, Acquisitions, Renewals, and Claims across all Price Comparison Websites (PCWs), removing manual processing and improving data freshness.
  • Served as the primary subject matter expert for services and home insurance data across the organisation, providing guidance on data quality, governance, and best practices.

Data Engineer / ETL Lead

British Gas
Staines-upon-Thames, Surrey
2013.03 - 2015.12
  • Led end-to-end re-development of the Commercial MI data solution as the onshore delivery lead, managing a team of 8 engineers through analysis, design, build, and UAT across a complex SAP migration.
  • Mapped business requirements to new SAP data flows through detailed data analysis and close collaboration with SAP consultants, producing technical design documentation for Business Objects Data Services and Teradata.
  • Acted as primary interface with senior leadership on project progress, risks, and delivery milestones throughout the programme.

BI / Data Engineer

British Gas
Staines-upon-Thames, Surrey
2012.09 - 2013.02
  • Delivered Smart MI compliance reporting and business process improvements as part of an agile scrum team.
  • Created compliance visibility reports for missed customer visits that remained a key enabler for regulatory reporting, and built data quality dashboards supporting Solvency II compliance.
  • Conducted AS-IS business process workshops, documented target-state requirements, and delivered multiple small changes through a systematic factory model — navigating VAT, IPT, and Geo Region changes mid-programme.

ETL Developer

Toyota Motor Sales
2010.11 - 2012.08
  • Delivered an ETL migration project converting Informatica jobs to MSBI/SSIS for the Claims Processing system.
  • Designed SSIS control flows, data flows, and error-detection scripts to industry standards while coordinating deliverables across multiple centres.
  • Produced approach documents, high-level technical specifications, ETL mapping documents, and unit test cases; supported developers in identifying source-to-target system relationships.

ETL / SQL Developer

Cox and Kings
Mumbai
2010.04 - 2010.10
  • Developed SSIS packages and SQL automation for the Cox and Kings intranet data platform.
  • Designed and developed SSIS/DTS packages to extract, transform, and load data across servers from Excel, XML, and flat-file sources; scheduled packages via SQL Server Agent job tasks.

Database Developer

Sun Tours
Chennai
2009.01 - 2010.03
  • Gathered user requirements; designed and implemented SQL queries, views, triggers, and stored procedures on SQL Server 2000/2005; generated Crystal Reports for business users; and managed database backup and maintenance.

Implementation Specialist & Database Programmer

Real Image Media
Chennai
2007.10 - 2008.12
  • Acted as implementation specialist and primary client contact for Ramco ERP, writing complex stored procedures, triggers, and scripts; worked directly with clients to understand business operations and translate them into system requirements.

Education

Master of Computer Applications (MCA)

SRM College of Engineering
Chennai, India
2004.01 - 2007.01

Skills

  • AWS Services: Lambda, Kinesis Firehose, EventBridge (Pipes & Rules), SQS, S3, Glue, EMR on EKS, Step Functions, DynamoDB, Secrets Manager, CloudWatch, IAM, VPC
  • Data Lakehouse: Apache Iceberg (partition evolution, merge-on-read, snapshot management), Glue Data Catalog, Glue Schema Registry, Apache Parquet
  • Languages: Python (boto3, PySpark, trino-python-client), SQL, PL/SQL, Bash, YAML, HCL (Terraform)
  • Infrastructure: Terraform (IaC), Jenkins CI/CD, GitLab, Git, Jira — full dev/test/prod pipeline ownership
  • Databases: Teradata, Starburst / Trino, Amazon Redshift, Oracle, MS SQL Server
  • Big Data: Apache Spark, PySpark, Hive, Hadoop/HDFS, HDP, Sqoop, YARN

Certification

  • Amazon Web Services Data Analytics – Specialty, 2024 – 2027
  • Amazon Web Services Solutions Architect – Associate, 2020 – 2023
  • Amazon Web Services Developer – Associate, 2021 – 2024
  • HashiCorp Terraform Associate (003), 2023 – 2025
