Amarjeet Mishra

Slough, Berkshire

Summary

I am a GCP-certified Professional Data Engineer, Machine Learning & Deep Learning enthusiast, and entrepreneur with 4 years of early-stage business experience and 6+ years of industry experience across Big Data platforms, including cloud platforms.
• Design architecture, development & deployment
• Extensive experience in multiple life-cycle development projects, including gathering business requirements, scope definition, analysis of source systems, and design of data strategies for both transactional & analytical systems.
• Architecting & Modeling Data Integrity

• Hands-on experience with major AWS components such as S3, Lambda, Glue, Redshift, Athena, and SageMaker; GCP services such as Cloud Storage, Dataproc, and BigQuery; and Hadoop-ecosystem tools such as Spark, HDFS, Hive, HBase, ZooKeeper, Sqoop, Oozie, and Flume, as well as Kafka and Python. Develop scalable and reliable data solutions to move data across systems from multiple sources in real-time as well as batch modes.

Overview

11
years of professional experience

Work history

Lead Data Engineer

Insight International (UK) Ltd
London
05.2022 - Current
  • Gather requirements from various stakeholders, such as Finance and Risk Management, for Lloyds Bank (client)
  • Knowledge of all phases of Agile with a good understanding of System Study, Design, Client Interaction, Coordination, Development, and Implementation of data product build projects.
  • Processing Big Data using Hadoop, GCP, Spark & other big data tools.
  • Actively involved in Group Data Model with a holistic view of Data.
  • Data cataloging & Lineage using Collibra & manual methods.
  • Ensure the movement of Data with required transformations between different Layers in EDH & GCP from various sources.
  • Active participant in the architectural team for the design decisions
  • Implemented parallelization and other optimizations on Spark nodes to boost the efficiency of ETL/ELT tasks in the Hadoop ecosystem.
  • Deep knowledge of incremental imports, partitioning, and bucketing in Hive and Spark SQL for optimization
  • Created Hive tables with static & dynamic partitioning strategies & processed data using HQL & Scala-Spark programs
  • Produced synthetic data, ingested data from files into tables, and processed data for data products using a Scala-based framework on a Dataproc cluster (Hive tables)
  • Professional experience in using Python & PySpark.
  • Created BigQuery tables & migrated data from Hive to BigQuery
  • Set up CI/CD pipeline using Jenkins & UCD for automating the deployment process in higher environments.
  • Implemented various automation processes that reduced manual work by 80% using NLP & built-in modules in Python.
  • Built data product from batch data by analyzing data from scratch.

Big Data Engineer - Trainee

ITC
London
01.2022 - 05.2022
  • Created AWS CloudFormation templates to create infrastructure in the cloud
  • Populated a Data Lake using AWS Kinesis from various data sources such as S3
  • Processed data stored in S3 using AWS Lambda, Glue, Redshift and AWS Athena
  • Developed ETL jobs in AWS Glue to extract data from S3 buckets and load it into the data mart in Amazon Redshift. Authored AWS Lambda functions to run Python scripts in response to events in S3
  • Used Amazon EMR for processing Big Data with tools like Hadoop, Spark, and Hive. Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets
  • Implemented optimizations in Spark nodes and improved the performance of the Spark Cluster
  • Orchestrated workflows in Apache Airflow to run ETL pipelines using tools in AWS
  • Worked with AWS Lambda functions for event-driven processing using the boto3 module in Python
  • Used Spark, Spark SQL, and Spark Streaming for data analysis and processing. Implemented Spark using Scala and Spark SQL for faster testing and processing of data
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers
  • Worked with a Kafka cluster that used a schema to send structured data via micro-batching.

Client Interface Manager

ProPhoenixsoft Pvt Ltd
Bengaluru
01.2017 - 01.2020
  • Driving Microsoft, Google, Dell, HP, Lenovo, and Vodafone's business strategy as part of the One Commercial Partner Organization
  • Working in a defined customer territory across IT and non-IT clients & aligning with the respective teams to drive territory sales
  • Exploring the data generated by the Microsoft Data Team & visualizing it using Python Streamlit & Tableau
  • Using ML models to target potential customers & passing the results to the Lead-gen Team
  • Actively involved in developing a Fintech app called Paypro & a replica of TikTok called ALAP.

Software Engineer

Reverie Language Technologies
Bengaluru
01.2015 - 01.2017
  • Created EC2 instances and auto-scaling. Designed and developed ETL jobs to extract data from AWS S3 and load it in Amazon Redshift
  • Maintaining the database & loading tables from MySQL database
  • Performed exploratory data analysis in Python using Pandas
  • Wrote SQL queries to identify revenue-generating customers, response rates & other requirements
  • Building web apps using Streamlit & productionizing them on Heroku
  • Actively involved in finalizing client requirements, designing solutions, coordinating with the Testing & QA team, deploying, and interacting with clients to resolve their issues

Operational Manager

Khlonitrix soft pvt Ltd
Hyderabad
01.2013 - 01.2014
  • Installed and configured software
  • Designed and managed software projects & websites for clients.

Education

MSc - Data Science & Analytics

Royal Holloway, University of London

Bachelors - Computer Science and Engineering

Gandhi Engineering College
2012

Skills

  • BIG DATA PLATFORMS: Hadoop, AWS, GCP
  • DATA PIPELINES/ETL: Flume, Spark, Kafka, Hive, Spark Streaming, Spark Structured Streaming, Spark SQL, Data Frames, Kinesis, Dataproc, Dataflow
  • DATA VISUALIZATION: Tableau
  • DATABASES AND DATA WAREHOUSES: Amazon Redshift, DynamoDB, MongoDB, MySQL, Hive, BigQuery
  • DATA STORES: Data Lake, HDFS, S3, GCS
  • CLOUD COMPONENTS: AWS IAM, CloudFormation, Redshift, AWS EMR, AWS S3, EC2, AWS Lambda, AWS Kinesis, GCS, BigQuery, Dataflow, Dataproc
  • SOFTWARE DEVELOPMENT: Data Pipelining, Sprint planning, ETL processes, Spark, Python, Hive, PySpark, Keras, SQL, Shell Script, Scala, Machine learning
  • DEVELOPMENT TOOLS, CI/CD: Git, Agile, Scrum, Jenkins, UCD
  • SCHEDULER TOOLS: Oozie, Airflow, Cloud Composer
  • METHODOLOGIES: Agile
  • Problem solving
  • Analytical modelling
