ARUN CHATURVEDULA

Bengaluru

Summary

Senior Data Engineer with 7+ years of experience designing scalable big data and cloud solutions using Azure, AWS, Databricks, and Spark. Proven expertise in architecting ETL pipelines, automating workflows, optimizing performance, and enforcing data governance and compliance across industries.

Overview

8 years of professional experience

Work History

Senior Data Consultant

Sage
London
05.2025 - Current
  • Led migration of enterprise data platform from Snowflake + Matillion to AWS Lakehouse architecture, enhancing scalability and reducing costs.
  • Designed and implemented automated ingestion pipelines using AWS Glue and S3 Landing Zone, triggered by Lambda + SQS for real-time data processing.
  • Developed Cleansing and Conform Zones in S3 to standardize multi-source data for analytics consumption.
  • Built transformation models in dbt-spark on EMR, enabling reusable, scalable, and cost-efficient data transformations.
  • Automated end-to-end pipeline orchestration using Dagster integrated with GitHub Actions and schema versioning.
  • Strengthened data governance using AWS Glue Data Catalog; evaluated Collibra for lineage, quality checks, and compliance.
  • Applied AWS security best practices (IAM, KMS, CloudTrail, CloudFormation) for access control, encryption, and auditing.
  • Delivered a PoC validating dbt-spark + EMR feasibility, demonstrating performance and cost savings.
  • Reduced ETL execution time and improved scalability compared to Matillion by leveraging distributed processing on EMR.
  • Enabled Sage product teams to access high-quality, governed data via Lakehouse architecture.

Senior Data Consultant

Department of Education
01.2025 - 03.2025
  • Recommended migration approach for legacy systems (PDB23 Server, MDR Server) and SSIS packages to Azure PaaS.
  • Led a 3-month POC validating Azure Data Factory (ADF), Azure Databricks, and Synapse Analytics for ingestion, transformation, and modeling.
  • Conducted in-depth assessment of 3,000+ lines of SQL stored procedures and SSIS packages to identify dependencies, risks, and performance bottlenecks.
  • Automated data migration using ADF, Databricks, and Python, streamlining processes and reducing manual effort.
  • Designed production-grade Azure PaaS architecture with Data Lake and Synapse Analytics for scalable data processing.
  • Developed error-handling and validation mechanisms ensuring data integrity during migration.
  • Performed risk assessment and proposed mitigation strategies for complex migration scenarios.
  • Created technical documentation covering architecture, pipeline orchestration, and migration strategy.
  • Collaborated with cloud architects, security teams, and DBAs for governance, compliance, and best practices.
  • Planned and documented migration strategy for key education platforms (GIAS, PIMS, UKRL, CRM services).

Senior Data Consultant

TWS (The Workshop)
03.2024 - 01.2025
  • Migrated Player Lifetime Fact view to Holly Data Warehouse, consolidating multi-domain data (Casino, Poker, Sports, Bonus, Payments).
  • Designed real-time pipelines using Kafka Connect and Exasol for high-volume streaming data.
  • Improved data quality with fault-tolerant pipelines, monitored via Prometheus and Grafana.
  • Built end-to-end ETL pipelines using Azure Data Factory and Azure Databricks for analytics and ML workflows.
  • Implemented CI/CD pipelines with Jenkins and Git for automated deployment and testing.
  • Developed Spark pipelines for distributed data processing and incremental data handling using PySpark.
  • Configured Unity Catalog for centralized governance, RBAC, and data lineage.
  • Implemented real-time ingestion with Azure Event Hubs and Stream Analytics.
  • Optimized data lakes in Azure Data Lake Storage for performance and cost efficiency.
  • Designed policies for metadata management, data discovery, and cataloging.

Senior Data Engineer

Legato Health
10.2022 - 09.2023
  • Executed real-time transformations with Spark Streaming, pulling data from the PEGA API into ADLS, Cosmos DB, and Azure SFTP.
  • Preprocessed JSON documents into flat files using Spark DataFrames for downstream analytics.
  • Integrated CI/CD pipelines and automated testing with Azure DevOps Pipelines and Test Plans.
  • Developed Spark Streaming workflows for customer communications with Azure Event Hubs.
  • Implemented ETL framework using ADF for data extraction, transformation, and centralized storage in ADLS Gen2.
  • Optimized Spark jobs on Azure HDInsight ensuring performance and resilience.
  • Developed Python scripts for data validation, cleansing, and serverless automation with Azure Functions.
  • Leveraged Azure Monitor, Log Analytics, and Application Insights for observability across workflows.
  • Configured RBAC and Azure AD policies for secure data access.
  • Integrated Kafka topics with Azure Event Hubs for real-time streaming and analytics.

Associate

Cognizant
Bengaluru
11.2021 - 10.2022
  • Designed and developed real-time Databricks pipelines for large-scale data streams.
  • Implemented ETL workflows with AWS Glue and processed data on AWS EMR.
  • Developed serverless data applications using AWS Lambda and Step Functions.
  • Configured monitoring with CloudWatch and managed AWS DMS for database migration.
  • Implemented event-driven processing using SNS, SQS, and Redshift Spectrum for analytics.
  • Developed Python and Scala scripts for data transformation, validation, and analytics.
  • Managed Hadoop clusters, NoSQL data models in DynamoDB, and query optimization with Athena.

Data Engineer

eMids Technologies
Bengaluru
07.2020 - 11.2021
  • Developed Spark-based pipelines to filter and load consumer response data into Hive tables in HDFS.
  • Performed impact analysis of Jira stories and optimized big data ingestion using Spark memory, partitions, and joins.
  • Migrated Airflow from 1.10.x to 2.x and wrote scheduling scripts in Python.
  • Managed CRUD operations in HBase and version control with Git.
  • Developed Scala UDFs for Spark applications and optimized performance in Spark DataFrames.
  • Participated in Scrum ceremonies and design evaluations.

IT Developer

DXC
Bengaluru
12.2017 - 03.2020
  • Used Sqoop to import data from EDW to HDFS with incremental loads into Hive tables.
  • Developed Pig scripts for raw data transformations and optimized Hive performance with partitioning, bucketing, and joins.
  • Designed Hadoop workflows using Oozie and Falcon for scheduled jobs.
  • Applied Spark and Scala for data transformation and analytics.
  • Ensured data quality and reliability across pipelines.

Education

BE

Velagapudi Ramakrishna Siddhartha Engineering College (VRSEC)
05.2014

Skills

  • Azure (ADF, ADLS, Synapse, Databricks, Functions)
  • AWS (Glue, EMR, Lambda, Redshift, S3, RDS, DynamoDB)
  • Apache Spark
  • Kafka
  • Airflow
  • dbt
  • Scala
  • Python
  • Hive
  • Pig
  • Sqoop
  • Oozie
  • HBase
  • CI/CD (Jenkins, Azure DevOps, GitHub Actions)
  • Git
  • Automated Testing
  • Prometheus
  • Grafana
  • CloudWatch
  • Azure Monitor
  • Azure Application Insights
  • Databricks
  • Power BI
  • SQL
  • JSON
  • Cosmos DB
  • Active Directory
  • Collibra
  • Agile
  • TDD
  • Impact Analysis
  • Performance Optimization

Personal Information

Title: Senior Data Engineer
