
Prem Sharma

London

Summary

Experienced Scala Data Engineer and Streaming Engineer with over 17 years in application development, specialising in functional programming with Scala and libraries such as Cats and ZIO. Demonstrated expertise in real-time data streaming, cloud platforms, and scalable data architectures, with hands-on experience in Kafka, Flink, Kafka Streams, Spark Streaming, and Azure Databricks. Proficient in Test Driven Development (TDD), Behaviour Driven Development (BDD), pair programming, and automation testing with JUnit, Mockito, Cucumber, and Docker. Skilled in using AWS and Azure for secure data pipelines and governance. Notable achievements include designing stateful stream processing with Kafka Streams and RocksDB for high-performance distributed state management, and developing end-to-end data pipelines with Apache Airflow. Career goal: to advance expertise in cloud-native solutions and drive innovative data engineering projects.

Overview

16 years of professional experience
2008 years of post-secondary education
1 Certification

Work history

JVM/Streaming Data Engineer

IBM
London
2025.07 - 2026.03
  • Collaborated with key stakeholders and data providers to define requirements, ensure data accuracy, and deliver actionable insights through advanced data analytics and reporting
  • Designed and implemented high-throughput streaming pipelines using Apache Flink (Scala) with event-time processing, watermarking, CEP patterns, and stateful stream enrichment for fraud-detection and transaction-monitoring workloads
  • Built low-latency microservices using Scala, Akka, integrating with banking systems (customer onboarding, risk scoring, trade flows) with strict SLAs and high resilience.
  • Developed type-safe functional data transformations using Cats / FP patterns, ensuring predictability, immutability, and fault-tolerant processing in distributed environments.
  • Built end-to-end data pipelines processing high-volume financial datasets (transactions, customer risk attributes, positions data, trade events) using Flink streaming + Scala Spark batch
  • Implemented stateful stream processing in Flink (Scala) using keyed streams, custom window assigners, triggers, and managed state (ValueState / MapState) to build scalable enrichment joins and fraud pattern detection pipelines.

Scala Quantexa AWS Data Engineer

Santander Bank
London
2023.12 - 2025.03
  • Collaborated with key stakeholders and data providers to define requirements, ensure data accuracy, and deliver actionable insights through advanced data analytics and reporting
  • Designed and implemented scalable Spark pipelines to process multi-terabyte datasets, ensuring optimal performance and integration with the wider Big Data ecosystem
  • Played a key role in developing an MVP using Quantexa for a major UK bank to solve AML-specific problems in trade finance
  • Utilized Quantexa's dynamic entity resolution and network analytics tools to visualize and understand the intricate relationships between individual customers, businesses, and other relevant entities.
  • Customized Quantexa's platform to better suit the organization's specific KYC and compliance needs
  • Integrated diverse internal and external data sources using Quantexa's platform to create a holistic view of customers.
  • Used the Scala Cats API to build a validation framework implementing the platform's business validations

Quantexa Data Engineer

Synechron (HSBC)
London
2022.12 - 2023.08
  • Worked as a Scala Developer on a greenfield project for HSBC.
  • Created data pipelines for data products using the medallion data architecture in the Anti-Money Laundering (AML) landscape.
  • Developed web applications and services, wrote unit and integration tests, and deployed applications.
  • Tuned Spark applications to set the right level of parallelism and memory usage
  • Collaborated with key stakeholders and data providers to define requirements, ensure data accuracy, and deliver actionable insights through advanced data analytics and reporting
  • Designed and implemented scalable Spark pipelines to process multi-terabyte datasets, ensuring optimal performance and integration with the wider Big Data ecosystem
  • Played a key role in developing an MVP using Quantexa for a major UK bank to solve AML-specific problems in trade finance
  • Utilized Quantexa's dynamic entity resolution and network analytics tools to visualize and understand the intricate relationships between individual customers, businesses, and other relevant entities.
  • Customized Quantexa's platform to better suit the organization's specific KYC and compliance needs
  • Integrated diverse internal and external data sources using Quantexa's platform to create a holistic view of customers.
  • Utilized Kafka Streams API with Scala to transform and aggregate high-throughput event data, enabling real-time analytics.
  • Engaged with business stakeholders to gather, analyze, and translate complex real-time data streaming requirements into technical stories for implementation.
  • Collaborated closely with Quantexa Business Analysts (BAs) to refine and break down high-level requirements into actionable user stories using Agile methodologies
  • Designed and optimized stateful stream processing with Kafka Streams + RocksDB for high-performance distributed state management.
  • Designed and implemented scalable data pipelines and custom Data Lake APIs using Apache Spark to efficiently ingest, process, and store large-scale structured and unstructured data across cloud-based storage systems.
  • Led the migration of the enterprise data warehousing platform from on-premise to AWS, leveraging Amazon S3 for scalable storage, AWS Glue for ETL orchestration, Amazon EMR for distributed data processing with Spark, and PostgreSQL on RDS for metadata and analytics storage; improved performance, reduced infrastructure costs, and enabled cloud-native scalability.
  • Designed and developed a real-time event-driven platform using Apache Flink (Scala API), following CQRS-style architecture to handle millions of events for ingestion, transformation, and projection into multiple stateful views

Scala Developer

BAML
Bromley
2022.05 - 2022.11
  • Worked as a Scala developer on a greenfield project for the BAML Security Settlement Engine.
  • Pushed updated holding positions and position changes back to downstream systems.
  • Developed services, created APIs, wrote unit, integration, and end-to-end tests, and deployed services.
  • Actively participated in the design discussions and development of a real-time system to handle and update positions of holdings, ensuring timely and accurate representation of data.
  • Architected and developed resilient distributed systems using Akka, leveraging event sourcing patterns to ensure data consistency, durability, and recoverability across microservices
  • Designed and developed highly performant, scalable, and resilient microservices using Scala, with gRPC as the communication protocol.
  • Defined and maintained Protocol Buffers (protobuf) specifications, ensuring a strong contract between microservices.
  • Designed and implemented resilient and scalable data processing pipelines using Scala and FS2 streams.
  • Applied functional programming patterns (Monads, Typeclasses, Implicits, Futures, Cats Effect, Monix Task) to write type-safe, scalable, and resilient Kafka streaming applications.
  • Designed and implemented event-driven pipelines using Apache Flink (Scala API) to process and enrich millions of real-time events per day.

Scala Data Engineer / Software Engineer

HSBC
Glasgow, Scotland
2021.08 - 2022.04
  • Worked as a Scala developer on a greenfield project.
  • Developed the application platform, created APIs, and wrote automation tests.
  • Performed data load, ingestion, and transformation using Azure Data Factory.
  • Developed, enhanced, re-engineered, and maintained applications with Scala and Cats (FP)
  • Utilized Azure Databricks to process and analyze large datasets, extracting valuable insights for informed business decisions
  • Designed and implemented scalable Spark pipelines to process multi-terabyte datasets, ensuring optimal performance and integration with the wider Big Data ecosystem
  • Collaborated with key stakeholders and data providers to define requirements, ensure data accuracy, and deliver actionable insights through advanced data analytics and reporting
  • Designed and optimized stateful stream processing with Kafka Streams + RocksDB for high-performance distributed state management.
  • Designed and implemented Kafka-based real-time streaming solutions on a Cloudera-based on-premise platform, ensuring seamless integration with Hadoop ecosystem.
  • Designed and owned components of a large-scale data platform, implementing real-time data pipelines using Kafka (KStreams, KSQL) and Flink.
  • Built and optimized event-driven architectures using Apache Kafka, ensuring low-latency, fault-tolerant data streaming.
  • Worked extensively with data warehousing solutions like ADLS, and Databricks, optimizing data ingestion, partitioning, and query performance.
  • Configured and managed Unity Catalog for fine-grained data access permissions, enabling secure data sharing across teams and compliance with organizational policies.
  • Integrated Flink with Kafka, HDFS, and Apache Pulsar, ensuring exactly-once semantics using checkpointed sources/sinks.
  • Designed stateful Flink jobs using KeyedProcessFunction, WindowFunction, and BroadcastState to implement dynamic business rules, fraud detection, and time-based alerts.
  • Developed and maintained scalable ETL pipelines on Azure Databricks with Unity Catalog integration to ensure consistent metadata management and lineage tracking.
  • Migrated existing Hive Metastore schemas to Unity Catalog, improving data discoverability, consistency, and manageability across Databricks environments.
  • Developed Spark Structured Streaming jobs in Scala to consume high-volume data from Azure Event Hubs, applying transformations, enrichment, and schema validations before landing into the bronze layer (Delta Lake).
  • Integrated Azure Event Hubs as the real-time ingestion layer for streaming data into the Lakehouse architecture, enabling scalable and resilient event-driven pipelines.
  • Designed and implemented modern Azure data platforms combining Microsoft Fabric, Azure Synapse, and ADF, with Spark-based transformations for lakehouse architecture (Bronze → Silver → Gold).
  • Led the migration of metadata from legacy Hive Metastore to Unity Catalog, enhancing data discoverability, improving access control, and enabling consistent governance across Azure Databricks environments

Scala Data Engineer

Morgan Stanley
Glasgow, Scotland
2021.06 - 2021.08
  • Worked as a Scala Developer on the MS Risk platform.
  • Developed web components using TDD and design patterns.
  • Practised TDD, BDD, and pair programming.
  • Wrote automation tests.
  • Designed and implemented scalable Spark pipelines to process multi-terabyte datasets, ensuring optimal performance and integration with the wider Big Data ecosystem
  • Collaborated with key stakeholders and data providers to define requirements, ensure data accuracy, and deliver actionable insights through advanced data analytics and reporting
  • Designed, developed, and managed automated workflows using Control-M to ensure timely and accurate data integration, transformation, and processing for end-to-end data pipeline solutions, optimizing data flow and improving data availability for analytical applications.
  • Led and executed the migration of on-premise data assets to Azure Data Lake, ensuring secure and seamless transition while optimizing storage and access costs.
  • Collaborated with cross-functional teams to translate business goals into actionable data strategies, promoting the use of governed, high-quality datasets and self-serve data products through Databricks, Delta Lake, and BI integrations.
  • Designed and implemented Delta Live Tables (DLT) pipelines to streamline data ingestion from raw landing zones to curated Delta Lake layers (bronze → silver → gold).
  • Integrated Auto Loader with DLT to create continuously updating bronze tables with schema inference and CDC support.
  • Implemented SCD Type 1 and Type 2 patterns in Silver layer using EXPECT and MERGE operations within DLT.
  • Designed and implemented modern Azure data platforms combining Microsoft Fabric, Azure Synapse, and ADF, with Spark-based transformations for lakehouse architecture (Bronze → Silver → Gold).
  • Built end-to-end data pipelines in ADF for orchestrating ingestion from Azure Blob/ADLS into Synapse Spark pools, performing data cleansing and enrichment using Scala Spark notebooks.
  • Leveraged Microsoft Fabric to centralize storage (OneLake), govern metadata, and simplify pipeline development across teams using Synapse Data Engineering and Data Factory experiences.

Software Engineer

Citibank
Glasgow, Scotland
2020.11 - 2021.04
  • Worked as a Scala Developer.
  • Developed web components using TDD and design patterns.
  • Wrote automation tests.
  • Utilized Kafka Streams API with Scala to transform and aggregate high-throughput event data, enabling real-time analytics.
  • Defined expectations and data quality controls using Delta Live Tables to ensure trust in AML datasets
  • Established enterprise-wide data lineage tracking using Unity Catalog and Delta Live Tables, ensuring traceability from raw ingestion to curated layers for audit and compliance.
  • Implemented Unity Catalog in Azure Databricks to centralize metadata management, streamline data discovery, and enforce fine-grained access control across domains.
  • Led the modernization of legacy data lakes to Delta Lake format, enabling ACID compliance, schema evolution, and governance-ready data management.
  • Modeled data products and domains using Delta Lake’s schema structure aligned with enterprise data mesh principles.
  • Led the migration of metadata from legacy Hive Metastore to Unity Catalog, enhancing data discoverability, improving access control, and enabling consistent governance across Azure Databricks environments
  • Developed high-performance Scala microservices using Cats and Akka for the consumption team, enabling low-latency access to curated datasets from the Delta Lake warehouse and supporting real-time analytical workflows.
  • Built scalable batch data pipelines using Scala-Spark in Azure, orchestrated via Azure Data Factory, to ingest and transform raw data into curated Delta Lake layers on ADLS Gen2, enabling a robust lakehouse architecture.
  • Deployed and managed Airflow on Azure Kubernetes Service (AKS) to orchestrate PySpark jobs on Azure Databricks, integrate with ADLS Gen2, and coordinate dependencies with Azure Data Factory.
  • Collaborated with cross-functional teams to translate business goals into actionable data strategies, promoting the use of governed, high-quality datasets and self-serve data products through Databricks, Delta Lake, and BI integrations.
  • Integrated Delta Lake as the single source of truth for both analytical and consumption layers, ensuring governance, performance, and scalability for real-time and batch use cases.
  • Collaborated closely with business stakeholders and domain SMEs to understand critical use cases, translating them into optimized Spark-based data pipelines that balance performance, scalability, and governance in Azure Databricks and Delta Lake environments.
  • Designed and implemented Delta Live Tables (DLT) pipelines to streamline data ingestion from raw landing zones to curated Delta Lake layers (bronze → silver → gold).
  • Leveraged DLT with SQL and Scala-Spark APIs to build declarative data pipelines, improving data quality, reliability, and lineage tracking.
  • Modeled datasets in a medallion architecture using Spark within Fabric’s Synapse runtime, ensuring incremental load support, schema enforcement, and optimized read patterns via Delta-like storage.
  • Built end-to-end data pipelines in ADF for orchestrating ingestion from Azure Blob/ADLS into Synapse Spark pools, performing data cleansing and enrichment using Scala Spark notebooks.
  • Utilized Azure Databricks to process and analyze large datasets, extracting valuable insights for informed business decisions.

Software Engineer

JP Morgan
Glasgow, Scotland
2019.09 - 2020.06
  • Worked as a Scala Developer for the Corporate Technology Research team.
  • Developed and built platforms and applications with cutting-edge technologies in a continuous integration environment.
  • Wrote integration tests and Selenium automation tests with the Serenity framework, practising Behaviour Driven Development (BDD), Test Driven Development (TDD), and pair programming.
  • Assessed new tools and technologies for Big Data (Hadoop) in Azure to define development and production architectures.
  • Designed, developed, and managed automated workflows using Control-M to ensure timely and accurate data integration, transformation, and processing for end-to-end data pipeline solutions, optimizing data flow and improving data availability for analytical applications.
  • Led and executed the migration of on-premise data assets to Azure Data Lake, ensuring secure and seamless transition while optimizing storage and access costs.
  • Developed a real-time event-driven data pipeline using Apache Kafka and Scala, optimizing low-latency message processing
  • Implemented data ingestion pipelines using Apache Spark (Structured Streaming & Batch) in Scala, ensuring incremental data processing and upserts (MERGE INTO) for real-time analytics.
  • Developed time-travel queries using Delta Lake’s versioning capabilities, allowing efficient historical data retrieval and rollback operations.
  • Designed and built a modern data warehouse on Delta Lake, enabling ACID-compliant, scalable, and high-performance data storage for analytical workloads.
  • Performed data load, ingestion, and transformation using Azure Data Factory.
  • Configured and managed Unity Catalog for centralized metadata management and fine-grained data access permissions
  • Developed ETL pipelines in Databricks integrated with Unity Catalog to ensure consistent metadata governance and lineage tracking.
  • Implemented Slowly Changing Dimension (SCD) Type 2 logic using Scala and Apache Spark, enabling historical tracking of dimensional data changes with efficient Delta Lake MERGE operations.
  • Contributed to enterprise-wide data governance by implementing secure, discoverable data assets aligned with organizational policies
  • Built microservices using Scala, Cats, and Akka for the data lake Consumption Team
  • Collaborated closely with business stakeholders and domain SMEs to understand critical use cases, translating them into optimized Spark-based data pipelines that balance performance, scalability, and governance in Azure Databricks and Delta Lake environments.

Data Engineer

Publicis Sapient
2014.09 - 2019.05
  • Produced data validation jobs to analyse quality of processed data at each step of the ETL pipeline.
  • Processed real-time data to analyse stock variation using Kafka Streams, Spark, and Scala
  • Designed and implemented real-time data processing pipelines using Apache Spark and Kafka to streamline data flow and optimize system performance
  • Developed a web-based system to manage engineering software configuration and produce BI reports.
  • Delivered input to estimates for design, coding, and unit testing tasks in Scala
  • Created large-scale distributed data processing systems and applications using Scala and Kafka
  • Developed and implemented advanced analytics solutions using Quantexa
  • Utilized Quantexa's dynamic entity resolution and network analytics tools to visualize and understand the intricate relationships between individual customers, businesses, and other relevant entities
  • Played a key role in development of an MVP using Quantexa for a major bank in the UK to solve KYC AML specific problems in trade finance.
  • Designed and implemented scalable Spark pipelines to process multi-terabyte datasets, ensuring optimal performance and integration with the wider Big Data ecosystem
  • Integrated diverse internal and external data sources using Quantexa's platform to create a holistic view of customers.
  • Utilized Azure Databricks to process and analyze large datasets
  • Collaborated closely with business stakeholders and domain SMEs to understand critical use cases, translating them into optimized Spark-based data pipelines that balance performance, scalability, and governance in Azure Databricks and Delta Lake environments.
  • Designed, developed, and managed automated workflows using Control-M to ensure timely and accurate data integration, transformation, and processing for end-to-end data pipeline solutions, optimizing data flow and improving data availability for analytical applications.
  • Designed and implemented a real-time data pipeline using Apache Kafka, Kafka Streams, and Kafka Connect to process and analyze large-scale data.
  • Designed and implemented Kafka-based real-time streaming solutions on a Cloudera-based on-premise platform, ensuring seamless integration with Hadoop ecosystem.
  • Designed and developed highly performant, scalable, and resilient microservices using Java 8
  • Performed data load, ingestion, and transformation using Azure Data Factory.
  • Developed time-travel queries using Delta Lake’s versioning capabilities, allowing efficient historical data retrieval and rollback operations.
  • Designed and built a modern data warehouse on Delta Lake, enabling ACID-compliant, scalable, and high-performance data storage for analytical workloads.
  • Configured and managed Unity Catalog for centralized metadata management and fine-grained data access permissions
  • Developed ETL pipelines in Databricks integrated with Unity Catalog to ensure consistent metadata governance and lineage tracking.
  • Contributed to enterprise-wide data governance by implementing secure, discoverable data assets aligned with organizational policies
  • Designed scalable domain-driven data architectures by modeling data products using Unity Catalog's catalog/schema/table hierarchy.
  • Defined and enforced data quality rules in Delta pipelines using expectations and validation logic.
  • Contributed to data governance strategy by enabling centralized access policies and auditability across data assets.
  • Designed and implemented scalable Spark-based ETL pipelines to transform raw datasets and load curated data into Snowflake Data Warehouse, enabling efficient analytics and reporting.
  • Integrated Apache Spark (Scala) with Snowflake via the Spark Connector, ensuring secure and optimized data writes using partitioning and bulk load strategies.
  • Built modular, reusable data ingestion and transformation frameworks in Spark (Scala) with Snowflake as the primary sink for structured data.
  • Tuned Snowflake queries and clustering strategies to enhance query performance, minimize storage cost, and support large-scale data analytics.
  • Engineered CDC pipelines using Spark Structured Streaming to ingest incremental changes into Snowflake for near real-time reporting and compliance.
  • Modeled Snowflake schemas (star/snowflake) to support enterprise-level financial and operational reporting, while enforcing data governance and lineage tracking.
  • Migrated legacy data marts to Snowflake, transforming on-prem ETL logic into PySpark jobs orchestrated via Airflow/ADF and integrated with Snowflake.
  • Containerized all key components (PySpark jobs, Kafka, databases, mock APIs) using Docker Compose to bootstrap a local development environment for end-to-end pipeline testing.
  • Developed Spark Structured Streaming jobs in Scala to consume high-volume data from Azure Event Hubs, applying transformations, enrichment, and schema validations before landing into the bronze layer (Delta Lake).
  • Leveraged Docker to simulate cloud-native Spark and streaming infrastructure locally, enabling faster integration testing cycles for data pipelines involving Azure/AWS, Databricks, Kafka, and Snowflake.
  • Developed and maintained Kubernetes manifests (YAML) for deploying Spark batch and streaming jobs in containerized environments to replicate production-like orchestration.
  • Built reusable test harnesses to validate data transformations and lineage using ScalaTest and Docker-based test suites, ensuring Spark jobs handled edge cases before production deployment.
  • Worked closely with business and technology stakeholders to support seamless onboarding and integration of data assets into the SST (Standardized Semantic Tier) model, ensuring consistency, governance, and business alignment.
  • Worked with compliance and legal teams to embed regulatory requirements into data pipelines, including audit logging, data retention rules, and lineage tracking using Delta Lake and Unity Catalog.
  • Designed and implemented scalable ingestion frameworks to onboard structured and semi-structured data into Snowflake using Spark (Python/Scala), Kafka, and RESTful APIs built with Akka, ensuring schema conformance and lineage tracking.
  • Developed modular and reusable Spark-Python pipelines that integrated with external APIs to fetch and ingest data into Snowflake's SST-aligned schemas for unified reporting and analytics.
  • Engineered robust ingestion pipelines with error handling, validation layers, and metadata tagging to ensure traceability and support audit and compliance requirements within Snowflake.
  • Designed APIs and ingestion interfaces to allow client applications and external systems to push data into the Snowflake-based SST ecosystem in a secure, scalable, and standardized format.
  • Ensured consistent enforcement of data contracts and semantic validations at the ingestion layer, leveraging PySpark, Snowflake's native features (Streams, Tasks, Procedures), and metadata repositories.
  • Built scalable batch data pipelines using PySpark in Azure, orchestrated via Azure Data Factory, to ingest and transform raw data into curated Delta Lake layers on ADLS Gen2, enabling a robust lakehouse architecture.
  • Leveraged Microsoft Fabric to centralize storage (OneLake), govern metadata, and simplify pipeline development across teams using Synapse Data Engineering and Data Factory experiences.
  • Modeled datasets in a medallion architecture using Spark within Fabric’s Synapse runtime, ensuring incremental load support, schema enforcement, and optimized read patterns via Delta-like storage.
  • Designed and implemented modern Azure data platforms combining Microsoft Fabric, Azure Synapse, and ADF, with Spark-based transformations for lakehouse architecture (Bronze → Silver → Gold).
  • Contributed to the consumption team by developing resilient Scala microservices using Cats and Akka, exposing APIs that allow downstream systems to query curated datasets directly from the Delta Lake with low latency and strict contract validation.
  • Designed stateful Flink jobs using KeyedProcessFunction, WindowFunction, and BroadcastState to implement dynamic business rules, fraud detection, and time-based alerts.
  • Tuned Flink jobs for performance and fault tolerance using checkpointing, restart strategies, backpressure handling, and operator state management.

Software Engineer

Genpact Capital Markets
2013.05 - 2014.08
  • Worked as a backend developer on platform engineering initiatives for Western Asset Management Bank, using Java 8 and Kotlin.
  • Built and maintained scalable APIs and backend services to support trading and investment workflows.
  • Contributed to the development of RESTful services and internal admin tools with a focus on modular design and testability.
  • Technologies used: Kotlin, Java 8, RESTful services, JSON, Jenkins, Git, JIRA, IntelliJ, Agile (Scrum).

Senior Java Developer

Cetpa Infotech pvt ltd
2009.12 - 2013.04
  • Developed backend components and RESTful APIs for critical banking products (e.g., customer onboarding, loan processing).
  • Participated in Agile ceremonies, sprint planning, and backlog grooming to ensure alignment with business goals.
  • Designed and delivered scalable microservices using Java 8 and Kotlin, with a focus on JVM-based functional programming patterns (e.g., immutability, higher-order functions).
  • Created and maintained REST APIs.
  • Performed project work estimation.
  • Technologies used: Kotlin, Java 8, RESTful services, JSON, Spring, Hibernate, Apache CXF, HTTP server, Maven, Git, Eclipse, Jenkins (continuous integration), JIRA (bug tracking), Mockito, JBehave, Selenium, Agile/Scrum.

Education

BTECH - Electronics and Communication

U.P.T.U University
India

Skills

  • 15 years of experience developing applications with Scala and Java technologies
  • Experience in Test Driven Development (TDD), BDD, pair programming, NoSQL databases, and Docker, with an understanding of Kubernetes
  • Experience in crafting FP-based algorithms to build a Storage API for storing data in a cloud-based data lake
  • Experience in functional programming using Scala and the FP libraries Cats and ZIO
  • Experience in NoSQL databases such as HBase, and in designing time-series data schemas
  • Credited with designing message data transformations using Akka
  • Experience in automation testing, including BDD with Cucumber
  • Experience in JUnit and Mockito
  • Experience with Maven, Ant, and Jenkins; have also used Vagrant and Docker
  • Experience using software development collaboration tools: Atlassian Confluence, JIRA, Fisheye, Crucible Code Review, GitHub, Bitbucket, Gemini
  • Experienced Scala Data Engineer/Streaming Engineer with a strong background in real-time data streaming, cloud platforms, and scalable data architectures
  • Hands-on expertise in Kafka, KStreams, and Spark Streaming, delivering high-throughput event-driven applications
  • Skilled in Azure Databricks with hands-on experience in Spark (Python), Delta Lake, and Unity Catalog for secure, scalable data pipelines and centralized governance
  • Proficient in modern cloud platforms (AWS, Azure) and data warehousing solutions such as Snowflake and Databricks
  • Experience with version control tools like Git, CVS, SVN, VSS
  • Sound knowledge of all aspects of software development life cycle which involves analysis, application design, development, testing, and deployment using Agile, Scrum and Kanban SDLC methodology
  • Familiar with Design Patterns
  • Experience in RDBMS like Oracle and MySQL, schema design and SQL, Tools: TOAD, Navicat, SQL Developer
  • Familiar with PL/SQL, Views, Stored Procedures, Functions, Triggers
  • Experience of Azure
  • Utilized Azure Databricks to process and analyze large datasets, extracting valuable insights for informed business decisions
  • Experience of Apache Spark 2.0, Hadoop, and Kafka
  • Certified Scala Data Engineer through Quantexa
  • Proficient in leveraging Apache Spark for large-scale data processing, with hands-on experience in optimizing Spark jobs, RDD/DataFrame/Dataset transformations, and integrating with Hadoop ecosystem tools like HDFS and Hive
  • Experienced in designing and developing real-time streaming applications using Apache Kafka, Spark, and Scala
  • Designed and optimized stateful stream processing with Kafka Streams + RocksDB for high-performance distributed state management
  • Skilled in Azure Databricks with hands-on experience in Spark (Scala), Delta Lake, and Unity Catalog for secure, scalable data pipelines and centralized governance
  • Containerized all key components (Spark jobs, Kafka, cloud databases, mock APIs) using Docker Compose to bootstrap a local development environment for end-to-end pipeline testing
  • Leveraged Docker to simulate cloud-native Spark infrastructure locally, enabling faster integration testing cycles for data pipelines involving Azure/AWS, Databricks, Kafka, and Snowflake
  • Developed and orchestrated end-to-end data pipelines using Apache Airflow, ensuring reliable and automated workflow execution
  • Developed and maintained Kubernetes manifests (YAML) for deploying Spark batch and streaming jobs in containerized environments to replicate production-like orchestration
  • Built reusable test harnesses to validate data transformations and lineage using ScalaTest and Docker-based test suites, ensuring Spark jobs handled edge cases before production deployment

Certification

  • 2010 Sun Certified Java Programmer Sun Prometric Centre
  • 2023 Scala Quantexa (AML) Certification Quantexa, London
  • Quantexa Certified Scala Data Engineer (https://www.credly.com/badges/a8645bd1-34bb-4575-820c-958f4ec5657d/linked_in_profile)

Career Summary

  • Dec 2023 – 31 March 2025 Santander through Lorien Scala Data Engineer
  • Dec 2022 – Aug 2023 HSBC through Synechron Scala Data Engineer
  • May 2022 – Nov 2022 BAML through Huxley Scala Data Engineer
  • Aug 2021 – April 2022 HSBC through TEKsystems Scala Data Engineer
  • June 2021 – Aug 2021 MS through Resource Solutions Scala Data Engineer
  • Nov 2020 – April 2021 Citi through LTI Scala Data Engineer
  • Sept 2019 – June 2020 JPMorgan through Mphasis Scala Data Engineer
  • Sept 2014 – May 2019 Sapient Software Engineer
  • May 2013 – Aug 2014 Headstrong Software Engineer
  • Dec 2009 – April 2013 Cetpa Infotech Software Engineer
