Rajeev Ramadurai

Leeds, UK

Summary

Senior Data Engineer with 19+ years of experience, including 5 years building AWS cloud-native Lakehouse infrastructure at The Very Group. Specialises in designing and delivering production-grade, event-driven data pipelines at scale — from real-time ingestion and Apache Iceberg table management to historical data recovery and serverless API development. Consistently takes end-to-end ownership: architecture, Terraform infrastructure, Python implementation, deployment, and ongoing optimisation. Collaborative team player with a track record of reducing query times, resolving production incidents, and enabling platform reusability across teams.

Work history

Senior Data Engineer

The Very Group

Speke, Liverpool

2021.06 - 2026.05

Part of the core Lakehouse platform team, responsible for designing and building AWS data infrastructure that powers real-time event ingestion, historical data recovery, and customer-facing data services across the business.
Real-Time Event Ingestion Platform
Designed and delivered a reusable, config-driven ingestion platform — now the standard pattern for all event data flowing into the Lakehouse — processing millions of events daily across multiple business domains.
Architected two ingestion paths to suit different source types: a streaming path using EventBridge Pipes with JSONPath-driven column mapping for application events, and an S3-triggered path via Lambda for file-based sources.
Reduced the effort of onboarding a new data stream to a single YAML config file, with no code changes required, by centralising all infrastructure (Firehose, SQS, DLQ, Glue, IAM, CloudWatch) within a single Terraform module.
Enabled UPSERT support across pipelines using Iceberg’s merge-on-read capability, ensuring downstream consumers always receive deduplicated, consistent data.
Query Performance Optimisation — Iceberg Partition Evolution
Identified and resolved a critical query performance bottleneck on a 36M-row Iceberg table, improving average query time by ~50% (3.3s → 1.7s) and reducing files scanned per query from 100% to under 1%.
Delivered the improvement through a non-destructive Iceberg partition evolution — no data rewrite, no downtime — changing the partition strategy to a composite day + bucket approach with a sort order on the primary lookup key.
Built a fully automated, config-driven pipeline to manage the evolution: YAML config → Terraform → DynamoDB → Lambda → Step Functions → EMR Spark job, with end-to-end idempotency and a 10-step QA verification suite.
Unblocked a Jenkins deployment pipeline stalled by a dependency on a decommissioned database cluster, implementing a targeted SSM parameter workaround to restore CI/CD flow without infrastructure regressions.
Historical Data Recovery & Backstop Replay Service
Built a production replay service enabling Lakehouse tables to be fully rebuilt from historical S3 data — used to recover 8.6M+ records across 447 processing chunks during a major backstop recovery exercise.
Designed an intelligent chunking strategy that automatically adjusts processing granularity based on data volume, preventing Lambda timeouts on high-volume days while keeping SQS queue depth manageable for quieter periods.
Ensured data integrity throughout by enforcing sequential processing (one chunk at a time) to prevent concurrent write conflicts on Iceberg tables — a key architectural decision that eliminated an entire class of production failures.
Led the response to a production incident during a live replay, in which simultaneous pipeline and replay writes triggered CloudWatch alarms, and coordinated a safe resolution involving Firehose buffer flush sequencing and IAM policy corrections.
Customer Orders API — Serverless Data Service
Designed and delivered a serverless Lambda function that serves 7 years of customer order history to the Customer Care Advisor application, returning results in ~11 seconds with a full validation and error-handling contract (200/204/400/500/503).
Implemented a config-driven SQL builder that generates parameterised queries entirely from environment variables, supporting multiple filter types (exact match, date range, partial match, integer) with zero hardcoded logic — new filter fields require no code changes.
Integrated securely with the Starburst/Teradata data layer over a private VPC network, with credentials managed through AWS Secrets Manager and a dedicated security group enforcing least-privilege egress.
Structured the response contract to match the API Gateway proxy integration spec, ensuring the service can be migrated behind a gateway in future with no application-layer changes.
Schema Validation & Data Quality
Designed a centralised, reusable schema validation service deployed at organisation level, dynamically enforcing type constraints and nullability rules by fetching schemas from AWS Glue Schema Registry at runtime — eliminating duplicated validation logic across pipelines.
Resolved a production data quality incident on the Customer Marketing Preferences pipeline — root-caused a concurrent write conflict in Kinesis Firehose, coordinated an 8.6M-record backstop replay, and defined a snapshot retention policy aligned with existing backup infrastructure.
Platform Engineering & Infrastructure
Authored and maintained Terraform modules for all Lakehouse infrastructure — Firehose delivery streams, SQS queues, EventBridge rules and pipes, Lambda functions, IAM roles, Glue databases, and CloudWatch alarms — enabling consistent, repeatable deployments across dev, test, and production.
Collaborated closely with platform architects and peer engineers on MR reviews, architectural decisions (KMS key strategy, Iceberg retention policies, schema registry design), and cross-team delivery.

Lead Data Engineer

British Gas

Staines-upon-Thames, Surrey

2016.01 - 2021.06

Led data engineering and architecture for the Insurance & Pricing Analytics team, owning the full data lake platform on Hortonworks/Hadoop and delivering PySpark pipelines for sales, renewals, claims, and PCW automation.
Renegotiated a vendor contract to deliver a lake-optimised solution in place of a costly lift-and-shift, saving £100k–£200k while meeting the original timeline.
Built a reusable PySpark batch framework leveraging Hive, Spark, and Python pipelines, cutting new user onboarding time from weeks to hours.
Delivered automated reporting for Sales, Acquisitions, Renewals, and Claims across all Price Comparison Websites (PCWs), removing manual processing and improving data freshness.
Served as the primary subject matter expert for services and home insurance data across the organisation, providing guidance on data quality, governance, and best practices.

Data Engineer / ETL Lead

British Gas

Staines-upon-Thames, Surrey

2013.03 - 2015.12

Led end-to-end re-development of the Commercial MI data solution as the onshore delivery lead, managing a team of 8 engineers through analysis, design, build, and UAT across a complex SAP migration.
Mapped business requirements to new SAP data flows through detailed data analysis and close collaboration with SAP consultants, producing technical design documentation for Business Objects Data Services and Teradata.
Acted as primary interface with senior leadership on project progress, risks, and delivery milestones throughout the programme.

BI / Data Engineer

British Gas

Staines-upon-Thames, Surrey

2012.09 - 2013.02

Delivered Smart MI compliance reporting and business process improvements as part of an agile scrum team.
Created compliance visibility reports for missed customer visits that remained a key enabler for regulatory reporting, and built data quality dashboards supporting Solvency II compliance.
Conducted AS-IS business process workshops, documented target-state requirements, and delivered multiple small changes through a systematic factory model — navigating VAT, IPT, and Geo Region changes mid-programme.

ETL Developer

Toyota Motor Sales

2010.11 - 2012.08

Delivered an ETL migration project converting Informatica jobs to MSBI/SSIS for the Claims Processing system.
Designed SSIS control flows, data flows, and error-detection scripts to industry standards while coordinating deliverables across multiple centres.
Produced approach documents, high-level technical specifications, ETL mapping documents, and unit test cases; supported developers in identifying source-to-target system relationships.

ETL / SQL Developer

Cox and Kings

Mumbai

2010.04 - 2010.10

Developed SSIS packages and SQL automation for the Cox and Kings intranet data platform.
Designed and developed SSIS/DTS packages to extract, transform, and load data across servers from Excel, XML, and flat-file sources; scheduled packages via SQL Server Agent job tasks.

Database Developer

Sun Tours

Chennai

2009.01 - 2010.03

Gathered user requirements, designed and implemented SQL queries, views, triggers, and stored procedures on SQL Server 2000/2005; generated Crystal Reports for business users and managed database backup and maintenance.

Implementation Specialist & Database Programmer

Real Image Media

Chennai

2007.10 - 2008.12

Acted as implementation specialist and primary client contact for Ramco ERP, writing complex stored procedures, triggers, and scripts; worked directly with clients to understand business operations and translate them into system requirements.

Education

Master of Computer Applications (MCA)

SRM College of Engineering

Chennai, India

2004.01 - 2007.01

Skills

AWS Services: Lambda, Kinesis Firehose, EventBridge (Pipes & Rules), SQS, S3, Glue, EMR on EKS, Step Functions, DynamoDB, Secrets Manager, CloudWatch, IAM, VPC
Data Lakehouse: Apache Iceberg (partition evolution, merge-on-read, snapshot management), Glue Data Catalog, Glue Schema Registry, Apache Parquet
Languages: Python (boto3, PySpark, trino-python-client), SQL, PL/SQL, Bash, YAML, HCL (Terraform)

Infrastructure: Terraform (IaC), Jenkins CI/CD, GitLab, Git, Jira — full dev/test/prod pipeline ownership
Databases: Teradata, Starburst / Trino, Amazon Redshift, Oracle, MS SQL Server
Big Data: Apache Spark, PySpark, Hive, Hadoop/HDFS, HDP, Sqoop, YARN

Websites

linkedin.com/in/rajeev-ramadurai-25a89713a

Certification

Amazon Web Services Data Analytics – Specialty (2024 – 2027)
Amazon Web Services Solutions Architect – Associate (2020 – 2023)
Amazon Web Services Developer – Associate (2021 – 2024)
HashiCorp Terraform – Associate 003 (2023 – 2025)
