Detail-oriented and thorough individual with strong problem-solving and critical-thinking skills. Committed to creating secure network architecture and developing solutions that limit access to protected data and programs. Monitors computer virus reports and regularly updates virus protection systems.
Overview
8 years of professional experience
1 Certification
Work history
Data Engineer
RICHO DIGITAL SERVICE
London
10.2022 - 08.2023
Migrated data and constructed ETL (Extract, Transform, Load) workflows to extract information from on-premises sources, third-party APIs and other platforms into a Synapse workspace, utilizing Python, PySpark and SQL
Created, built and maintained scalable, automated and informative data tables that serve as core inputs for models, reports and dashboards
Designed best practices to facilitate seamless automation of data ingestion and data pipeline workflows in Azure Data Factory and Databricks, ensuring continuous operation
Evaluated and maintained workflows and enhanced the effectiveness of data pipelines responsible for handling more than 50 terabytes of data daily
Engaged with business stakeholders to understand business needs and translate them into actionable reports
Key achievement: implemented automated ETL procedures, simplifying data manipulation and cutting processing time by up to 50%
Enhanced the efficiency of existing ETL processes and SQL queries to optimize the weekly business report's performance.
Data Engineer
Booking.com
London
05.2021 - 10.2022
Constructed a data pipeline that migrated and processed transactional data from an on-premises MySQL database into Azure using Synapse Analytics, incorporating 10 million rows of records and reducing manual workload by 30% monthly
Maintained data pipeline uptime of 97% while ingesting daily transactional data; evaluated workflows and increased the efficiency of data pipelines that process over 10 TB of data daily
Used Databricks notebooks to create tables, partition tables, and build join conditions, correlated subqueries, nested queries and views for business application development
Utilized PySpark to distribute data processing across large datasets, improving data ingestion and processing by 70%
Engaged with business stakeholders to understand business needs and translate them into actionable reports.
Data Engineer
Department of Health and Social Care
London
09.2020 - 04.2021
Created pipelines in Azure Data Factory using linked services, datasets and pipelines to extract, transform and load data from sources such as Blob Storage and Azure SQL Data Warehouse
Developed Spark applications using PySpark and Spark SQL for data extraction, transformation and aggregation from multiple file formats, analyzing and transforming the data to uncover insights in COVID-19 data
Designed an analytics dashboard using Power BI to show real-time updates and test modelling
Solution utilized Power BI, an enterprise gateway and Azure SQL Data Warehouse
Used Databricks, Jupyter notebooks and spark-shell to develop, test and analyze Spark jobs before scheduling customized Spark jobs
Deployed and tested developed code (CI/CD) using Visual Studio Team Services
Worked in an agile development environment with two-week sprint cycles, dividing and organizing tasks
Used Azure DevOps as a SaaS-based code repository and for tracking agile/Scrum workflows in application development.
Data Engineer
Biffa
London
12.2018 - 09.2020
Maintained and created data flows using Data Factory, Stream Analytics, Data Lake and HDInsight
Used Spark in HDInsight to process streaming data flows and stored the processed data in Data Lake
Created dashboards to analyze and view relationships in the data using Microsoft Power BI
Improved and fixed bugs in existing pipelines and flows on Azure HDInsight
Built a data warehouse using Azure SQL Data Warehouse
Created and managed Kafka, Hadoop and Spark clusters in HDInsight
Used Azure Databricks to aggregate data, build a data warehouse and deploy notebook work into production
Created machine learning models using the scikit-learn library to predict waste volumes with random forest and linear regression
Used Python for data wrangling, e.g. pandas for merging, pivoting/spreading and melting/gathering data into DataFrames
Performed data manipulation, analysis and visualization using Python (pandas, matplotlib).
Data Engineer
DELIVEROO
London
07.2016 - 12.2018
Used AWS Kinesis to ingest data into Amazon S3
Created and maintained the AWS data pipeline using Kinesis, EMR and Amazon S3
Processed and analyzed stream and batch data using Spark on EMR
Processed unstructured and semi-structured data using EMR
Used AWS Glue to efficiently load and prepare data for analytics
Created dashboards for analyzing the streaming data
Configured the Kinesis Agent for Kinesis streaming and Kinesis Firehose
Managed Amazon IAM roles and attached policies to roles according to requirements
Built a serverless architecture using AWS Lambda with Amazon S3 and Amazon Redshift.
Hadoop Developer
Pairview
London
04.2015 - 07.2016
Developed a data ingestion system using Spark Streaming and Kafka
Created a Kafka cluster on Confluent Cloud and produced data to Kafka topics on the cluster
Created Spark applications in Scala for batch processing of data and deployed them on a Cloudera cluster
Loaded data from databases into Hive, HBase and HDFS using Sqoop
Developed Oozie workflows to analyze and solve the client's big data problems
Developed MapReduce applications using Hive and Pig to solve the client's big data problems
Managed databases including RDBMSs such as SQL Server and MySQL, and NoSQL databases such as Cassandra and MongoDB
Loaded semi-structured data into Hive and created Hive tables using HiveQL.
Education
BSc (Honours) Computing Science -
Staffordshire University
BTEC National Diploma, Applied Science -
Lambeth college
Skills
Spark
PySpark
Scala
Python
SQL
Hadoop
Hive
MapReduce
Pig
Sqoop
Cloudera
HDP
S3
EMR
EC2
Kinesis
Elasticsearch
HDInsight
Data Lake
Databricks
Cloud Storage
Dataproc
Pub/Sub
SQL Server
MySQL
MS Access
MongoDB
Cassandra
Tableau
Gliffy
Adobe Photoshop
JIRA
Power BI
Certification
SAP Certified Application Associate - Business Intelligence with SAP NetWeaver