
Praveen Kumar

Summary

Over 4 years of experience as a data engineer in analysis, design, development, and implementation, with working knowledge of the Azure data engineering stack.

  • Experience with the full software development life cycle (SDLC) on projects employing Agile and Waterfall methodologies.
  • Experienced in using Azure services such as Azure Data Factory, Azure Databricks, and Azure Synapse Analytics to develop reliable data pipelines, carry out advanced data analysis, and support data-driven decision-making.
  • Experience with Azure Databricks, Azure Data Lake, Azure Data Factory, Azure Cosmos DB, and Azure Synapse.
  • Good experience with Hadoop, Scala, Spark, SQL, Python, Hive, ETL, big data, Pig, PySpark, Snowflake, and MapReduce. Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experience working with Azure Blob Storage and Data Lake Storage Gen2 and loading data into Azure Synapse Analytics (SQL DW).
  • Created and maintained a modern data engineering practice in cooperation with partners, IT stakeholders, and business stakeholders.
  • Implemented complex, strongly typed Spark workloads in Azure Databricks, with dependency management and Git integration.
  • Experience in database design and development with business intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP cubes, and star and snowflake schemas.
  • Solid experience in data mining, data cleaning, data munging, and machine learning using Python, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL.
  • Experience building SSIS packages to extract, transform, and load (ETL) data from a variety of sources into data marts.
  • Experience processing massive volumes of data using Databricks and Azure Data Factory (ADF).
  • Created PySpark scripts to automate file validation in Databricks (see the sketch after this list).
  • Extensive experience in T-SQL programming, creating and using stored procedures, triggers, functions, and complex queries on various versions of SQL Server.
  • Skilled in handling, configuring, and managing MySQL and NoSQL databases. Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala. Experience with data pipeline building, backend microservice development, and REST APIs using Python and Scala.
  • Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools such as Azure SQL Data Warehouse.
  • Proficient in programming languages like Python and SQL for data manipulation, scripting, and analysis.
  • Skilled in software development methodologies like Agile/Scrum and Waterfall.
  • Coordinated tasks with other developers on the team to meet deadlines.
  • Responsible for completing assigned Jira tickets and updating their status on time.
  • Good experience in performance optimization for large-scale data solutions, data warehousing, and data modeling.
  • Strong experience with UNIX/Linux environments and shell scripts.
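
As a minimal illustration of the file-validation automation mentioned above, the sketch below checks that files landed in the raw zone are readable, non-empty, and carry the expected columns. The `spark` and `dbutils` handles are provided by the Databricks runtime; paths and column names are illustrative assumptions, not the original scripts.

```python
# Hypothetical Databricks file-validation sketch; paths and expected
# columns are placeholders.
EXPECTED_COLS = {"id", "name", "updated_at"}

def validate_file(path: str) -> bool:
    """Return True if the file is readable, non-empty, and has the expected columns."""
    df = spark.read.json(path)  # schema inferred from the file itself
    if df.rdd.isEmpty():
        print(f"VALIDATION FAILED: {path} is empty")
        return False
    missing = EXPECTED_COLS - set(df.columns)
    if missing:
        print(f"VALIDATION FAILED: {path} is missing columns {missing}")
        return False
    return True

# Validate every extract landed in the raw zone (illustrative mount path).
for f in dbutils.fs.ls("/mnt/raw/incoming/"):
    validate_file(f.path)
```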

Overview

4+ years of professional experience

Work History

Azure Data Engineer

EPAM
06.2022 - Current

Client: Citibank

Description: Conducted financial analysis, valuation, and risk assessment to support the team in making informed decisions for clients. Trained in various financial tools and software to analyze market impact on client securities and act accordingly.

Responsibilities:

• Followed the SDLC process, including requirements gathering, design, development, testing, deployment, and maintenance.

• Have good experience working with Azure Blob and Azure Data Lake Storage, and loading data into Azure Synapse Analytics (SQL DW).

• Worked on creating a Data Lake Analytics account and Data Lake Analytics jobs in the Azure portal using SQL scripts.

• Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics), as in the sketch below.
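
A minimal sketch of the Spark SQL step in such a pipeline, assuming data has been landed in ADLS Gen2 by Data Factory; the storage account, container, and column names are illustrative only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-etl").getOrCreate()

# Read raw extracts landed in the lake by Data Factory (illustrative path).
raw = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/sales/")
raw.createOrReplaceTempView("sales_raw")

# Cleanse and conform with Spark SQL before writing to the curated zone.
curated = spark.sql("""
    SELECT CAST(order_id AS BIGINT)       AS order_id,
           TRIM(customer_name)            AS customer_name,
           CAST(order_ts AS TIMESTAMP)    AS order_ts,
           CAST(amount AS DECIMAL(18, 2)) AS amount
    FROM sales_raw
    WHERE order_id IS NOT NULL
""")

curated.write.mode("overwrite").parquet(
    "abfss://curated@mydatalake.dfs.core.windows.net/sales/")
```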

• Utilized version control systems like Git and source code management best practices for collaborative development.

• Collaborated closely with cross-functional teams including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.

• Used Azure DevOps for CI/CD (continuous integration and continuous deployment) and Azure Repos for version control.

• Developed ETL pipelines in and out of the data warehouse using a combination of Python and SnowSQL.

• Implemented BigQuery data processing on GCP Pub/Sub topics, using Python-based cloud streaming pipelines and Python REST API clients to load data into BigQuery from other systems (sketched below).
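
One common way to do the BigQuery load half of this with the `google-cloud-bigquery` client library; the project, dataset, and bucket names below are placeholders, not the original pipeline.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Illustrative destination table and source files exported to GCS.
table_id = "my-project.analytics.events"
uri = "gs://my-bucket/exports/events-*.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    autodetect=True,  # infer the schema from the JSON payload
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # block until the load job completes
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```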

• Developed and maintained efficient ETL/ELT processes using SQL and T-SQL scripts to transform and cleanse data in Azure Synapse Analytics.

• Worked on Python scripting to automate script generation; data curation was done using Azure Databricks.

• Wrote Scala code for Spark applications, as Scala is the native language for Spark and provides better performance for certain operations.

• Worked with Azure DevOps for continuous integration, delivery, and deployment of Spark applications.

• Created several Databricks Spark jobs with PySpark to perform table-to-table operations, as in the sketch below.
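
A representative table-to-table job of this kind, assuming the Databricks-provided `spark` handle; schema and table names are illustrative.

```python
from pyspark.sql import functions as F

# Source table registered in the metastore (illustrative names).
orders = spark.table("staging.orders")

# Aggregate into a reporting table: daily order totals per customer.
daily_totals = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("customer_id", "order_date")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("order_count"))
)

# Overwrite the target table in the curated schema.
daily_totals.write.mode("overwrite").saveAsTable("curated.daily_order_totals")
```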

• Developed UNIX scripts to automate different tasks involved as part of the loading process and worked on Tableau software for reporting needs.

• Managed and tracked all project documentation through JIRA and Confluence software development tools.

• Experienced in Agile methodology, participating in bi-weekly sprints and daily scrum meetings with backlogs and story points.

Environment: Azure Blob, Azure Data Lake, Azure Synapse Analytics (SQL DW), Databricks, Spark, Scala, Data Lake Analytics, Azure Data Factory, T-SQL, Spark SQL, Azure DevOps CI/CD, ETL/ELT, Python, SnowSQL, REST API, SQL, Unix, Tableau, Jira, Confluence, Agile, Scrum.

Data Engineer

Quantiphi
03.2020 - 06.2022

Client: Fiserv

Description: This project involved the development of a data warehouse for the client's business. It involved extracting data from legacy applications, creating text extracts, and loading them to staging. The data was further cleaned and loaded into the corresponding dimensions and facts. The warehouse's incremental load continues today, and the business uses it extensively for data analysis and reporting.

Responsibilities:

• Developed a data pipeline using Azure stack components such as Azure Data Factory, Azure Data Lake, Azure Databricks, Azure Synapse Analytics, and Azure Key Vault for analytics.

• Developed strategies for handling large datasets using partitioning, Spark SQL, broadcast joins, and performance tuning; a broadcast-join sketch follows.
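
A minimal broadcast-join sketch in PySpark, assuming a large fact table and a small dimension table; all table and column names are illustrative.

```python
from pyspark.sql import functions as F

transactions = spark.table("warehouse.transactions")  # large fact table
merchants = spark.table("warehouse.merchants")        # small lookup table

# Broadcasting the small side ships a full copy of `merchants` to every
# executor, so the large fact table is joined map-side without a shuffle.
enriched = transactions.join(
    F.broadcast(merchants),
    on="merchant_id",
    how="left",
)

enriched.write.mode("overwrite").saveAsTable("warehouse.transactions_enriched")
```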

• Extracted, transformed, and loaded (ETL) data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics, then ran scripts in Databricks.

• Created data ingestion systems to pull data from traditional RDBMS platforms and store it in NoSQL databases such as MongoDB, as sketched below.
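
A hedged sketch of such an ingestion step, assuming a JDBC driver and the MongoDB Spark connector (v10+) are on the cluster classpath; every connection value below is a placeholder.

```python
# Pull a table from an RDBMS over JDBC (illustrative connection values).
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/appdb")
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Land it in MongoDB via the Spark connector.
(customers.write.format("mongodb")
    .option("connection.uri", "mongodb://mongo-host:27017")
    .option("database", "warehouse")
    .option("collection", "customers")
    .mode("append")
    .save())
```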

• Involved in loading data from the Linux file system to the Hadoop Distributed File System (HDFS) and setting up Hive, Pig, HBase, and Sqoop on Linux/Solaris operating systems.

• Developed and enhanced Snowflake tables, views, and schemas to enable effective data retrieval and storage for reporting and analytics requirements.

• Optimized Python code and SQL queries, created tables/views, and wrote custom queries and Hive-based exception processes.

• Created Hive tables per requirements, defining internal or external tables with appropriate static and dynamic partitions for efficiency; a DDL sketch follows.
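
An illustrative example of the partitioning pattern, run through Spark SQL against the Hive metastore; the table, path, and column names are assumptions.

```python
# External table partitioned by date, stored as Parquet.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS warehouse.web_events (
        event_id   BIGINT,
        user_id    BIGINT,
        event_type STRING
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/web_events'
""")

# Allow Hive to derive partition values from the data (dynamic partitioning).
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

spark.sql("""
    INSERT OVERWRITE TABLE warehouse.web_events PARTITION (event_date)
    SELECT event_id, user_id, event_type, to_date(event_ts) AS event_date
    FROM staging.web_events_raw
""")
```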

• Implemented continuous integration and deployment (CI/CD) pipelines through Jenkins to automate Hadoop job deployment, and managed Hadoop clusters with Cloudera.

• Built streaming ETL pipelines using Spark Streaming to extract data from various sources, transform it in real time, and load it into the Snowflake data warehouse; a sketch follows.
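
A sketch of one common shape for such a pipeline: Structured Streaming from Kafka with `foreachBatch` writes through the Snowflake Spark connector. The connector's availability and all connection options are assumptions, not the original code.

```python
# Snowflake connection options (placeholders only).
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
}

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload")
)

def write_to_snowflake(batch_df, batch_id):
    # Each micro-batch is written with the batch Snowflake connector
    # ("snowflake" is the short name for "net.snowflake.spark.snowflake").
    (batch_df.write.format("snowflake")
        .options(**sf_options)
        .option("dbtable", "RAW_TRANSACTIONS")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_to_snowflake).start()
query.awaitTermination()
```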

• Designed and built Spark/PySpark-based extract, transform, load (ETL) pipelines for migrating credit card transaction, account, and customer data into the enterprise Hadoop data lake.

• Wrote complex PL/SQL queries and procedures to extract, transform, and load data from various sources, ensuring data accuracy and completeness.

• Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.

• Utilized JIRA to manage project issues and workflow.

• Exposed to all aspects of the software development life cycle (SDLC), including analysis, planning, development, testing, implementation, and post-production analysis. Worked through Waterfall and Scrum/Agile methodologies.

• Created UNIX shell scripts to load data from flat files into Oracle tables.

Environment: Spark, Spark SQL, PL/SQL, HDFS, Kafka, Sqoop, Waterfall, Scrum, Agile, Snowflake, Hadoop, CI/CD, ETL, Cloudera, Linux, NoSQL, T-SQL, MongoDB.

Skills

  • Azure Cloud: Azure Data Factory, Azure Data Lake, Azure Databricks, Azure SQL Database, Azure Synapse Analytics, Active Directory, Azure Monitoring, Azure Search, Azure Event Hub, Key Vault, Azure Analysis Services, Spark, Azure Stream Analytics, Azure Storage
  • Other: PL/SQL, REST API
  • ETL Tools: Azure Data Factory (ADF), SSIS
  • Databases: Cosmos DB, Azure SQL Database, Azure SQL Data Warehouse, DB2
  • Scripting Languages: Python, Scala, Shell Scripting, SQL, T-SQL
  • Big Data Ecosystem: HDFS, Cloudera, YARN, Hive, HBase, Sqoop, Flume, Kafka, Impala, Spark, Python, and Scala
  • DevOps: Azure DevOps
  • Ticketing tools: ServiceNow, Salesforce, Jira
  • Operating System: Linux, Windows
  • Certifications: Azure Data Engineer Associate

Contact

+44 7770969088
