4+ years of experience as a data engineer in analysis, design, development, and implementation, with strong knowledge of the Azure data engineering stack.
Company:- EPAM
Role:- Azure Data Engineer
Client:- Citibank
Duration:- June 2022 to date
Description:- As part of my work, I conduct financial analysis, valuation, and risk assessment to support the team in making informed decisions for clients. Trained in various financial tools and software to analyze the impact on the market and client securities and act accordingly.
Responsibilities:-
• Followed the SDLC process, including requirements gathering, design, development, testing, deployment, and maintenance.
• Have good experience working with Azure Blob and Azure Data Lake Storage, and loading data into Azure Synapse Analytics (SQL DW).
• Created a Data Lake Analytics account and Data Lake Analytics jobs in the Azure portal using U-SQL scripts.
• Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
• Utilized version control with Git and followed source code management best practices for collaborative development.
• Collaborated closely with cross-functional teams including data scientists, data analysts, and business stakeholders, ensuring alignment with data requirements and delivering scalable and reliable data solutions.
• Used Azure DevOps for CI/CD (continuous integration and continuous deployment) and Azure Repos for version control.
• Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL (sketched below, after the Environment list).
• Implemented data processing in BigQuery on GCP Pub/Sub topics, using Python-based cloud data streams and the Python REST API to load data into BigQuery from other systems (sketched below, after the Environment list).
• Developed and maintained efficient ETL/ELT processes using SQL and T-SQL scripts to transform and cleanse data in Azure Synapse Analytics.
• Worked on Python scripting to automate script generation; performed data curation using Azure Databricks.
• Wrote Scala code for Spark applications, as Scala is the native language for Spark and provides better performance for certain operations.
• Worked with Azure DevOps for continuous integration, delivery, and deployment of Spark applications.
• Created several Databricks Spark jobs with PySpark to perform table-to-table operations (sketched below, after the Environment list).
• Developed UNIX scripts to automate tasks in the loading process and used Tableau for reporting needs.
• Managed and tracked all project documentation through JIRA and Confluence software development tools.
• Experienced in Agile methodology, involved in bi-weekly sprints and daily Scrum meetings with backlogs and story points.
Environment: Azure Blob, Azure Data Lake, Azure Synapse Analytics (SQL DW), Databricks, Spark, Scala, Data Lake Analytics, Azure Data Factory, T-SQL, Spark SQL, Azure DevOps CI/CD, ETL/ELT, Python, SnowSQL, REST API, SQL, Unix, Tableau, Jira, Confluence, Agile, Scrum.
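The following is a minimal, illustrative sketch of the Databricks table-to-table pattern referenced above: reading curated Parquet data from ADLS Gen2 with PySpark and appending it to Azure Synapse via the Databricks Synapse (SQL DW) connector. All storage accounts, paths, table names, and connection details are placeholders, and the transformation is only an example of the kind of operation involved.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls_to_synapse").getOrCreate()

# Placeholder ADLS Gen2 path for the curated source data.
source_path = "abfss://curated@examplelake.dfs.core.windows.net/trades/"

df = (
    spark.read.parquet(source_path)
    .filter(F.col("trade_date") >= "2023-01-01")      # example filter
    .withColumn("load_ts", F.current_timestamp())
)

(
    df.write.format("com.databricks.spark.sqldw")      # Databricks Synapse (SQL DW) connector
    .option("url", "jdbc:sqlserver://example.sql.azuresynapse.net:1433;database=dw")  # placeholder
    .option("tempDir", "abfss://staging@examplelake.dfs.core.windows.net/tmp/")       # placeholder
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.fact_trades")              # placeholder target table
    .mode("append")
    .save()
)
```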
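A minimal sketch of the Python + SnowSQL ETL step, assuming the snowflake-connector-python package; the account, credentials, file path, stage, and MERGE statement are placeholders used for illustration only.

```python
import snowflake.connector

# Placeholder connection details; real values would come from a secrets store.
conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
try:
    # Stage the flat-file extract into the table's internal stage, copy it in,
    # then merge it into the target table (example SQL only).
    cur.execute("PUT file:///data/extracts/accounts.csv @%ACCOUNTS_RAW OVERWRITE = TRUE")
    cur.execute("COPY INTO ACCOUNTS_RAW FROM @%ACCOUNTS_RAW FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    cur.execute("""
        MERGE INTO ANALYTICS.CORE.ACCOUNTS t
        USING ACCOUNTS_RAW s ON t.ACCOUNT_ID = s.ACCOUNT_ID
        WHEN MATCHED THEN UPDATE SET t.BALANCE = s.BALANCE
        WHEN NOT MATCHED THEN INSERT (ACCOUNT_ID, BALANCE) VALUES (s.ACCOUNT_ID, s.BALANCE)
    """)
finally:
    cur.close()
    conn.close()
```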
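A hedged sketch of the Pub/Sub-to-BigQuery load path using the google-cloud-pubsub and google-cloud-bigquery clients; the project, subscription, and table identifiers are placeholders, and the one-record-per-message assumption is for illustration.

```python
import json
from concurrent.futures import TimeoutError

from google.cloud import bigquery, pubsub_v1

PROJECT_ID = "example-project"                    # placeholder
SUBSCRIPTION_ID = "trades-sub"                    # placeholder
TABLE_ID = "example-project.market.trades"        # placeholder

bq_client = bigquery.Client(project=PROJECT_ID)
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Each Pub/Sub message carries one JSON record; stream it into BigQuery.
    row = json.loads(message.data.decode("utf-8"))
    errors = bq_client.insert_rows_json(TABLE_ID, [row])
    if not errors:
        message.ack()
    else:
        message.nack()   # redeliver on insert failure

with subscriber:
    streaming_pull = subscriber.subscribe(sub_path, callback=callback)
    try:
        streaming_pull.result(timeout=60)   # listen for one minute in this sketch
    except TimeoutError:
        streaming_pull.cancel()
        streaming_pull.result()
```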
Company:- Quantiphi
Role:- Data Engineer
Client:- Fiserv
Duration:- March 2020 to June 2022
Description:- This project involved the development of a data warehouse for the client's business. It involved extracting data from legacy applications, creating text extracts, and loading them into staging. The data was further cleaned and loaded into the corresponding dimensions and facts. The warehouse incremental load continues, and the business uses it extensively for data analysis and reporting.
Responsibilities:-
· Developed a data pipeline using Azure stack components such as Azure Data Factory, Azure Data Lake, Azure Databricks, Azure Synapse Analytics, and Azure Key Vault for analytics.
· Developed strategies for handling large datasets using partitions, Spark SQL, broadcast joins and performance tuning.
· Extracted, transformed, and loaded (ETL) data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics, then ran scripts in Databricks.
· Created data ingestion systems to pull data from traditional RDBMS platforms and store it in NoSQL databases such as MongoDB (sketched below, after the Environment list).
· Involved in loading data from the Linux file system to the Hadoop Distributed File System (HDFS) and setting up Hive, Pig, HBase, and Sqoop on Linux/Solaris operating systems.
· Developed and enhanced Snowflake tables, views, and schemas to enable effective data retrieval and storage for reporting and analytics requirements.
· Optimized Python code and SQL queries, created tables/views, and wrote custom queries and Hive-based exception processes.
· Created Hive tables per requirements, defining internal or external tables with appropriate static and dynamic partitions for efficiency (sketched below, after the Environment list).
· Implemented continuous integration and deployment (CI/CD) pipelines through Jenkins to automate Hadoop job deployment and managed Hadoop clusters with Cloudera.
· Built streaming ETL pipelines using Spark Streaming to extract data from various sources, transform it in real time, and load it into the Snowflake data warehouse (sketched below, after the Environment list).
· Designed and built Spark/PySpark-based extract, transform, load (ETL) pipelines to migrate credit card transaction, account, and customer data into the enterprise Hadoop data lake.
· Wrote complex PL/SQL queries and procedures to extract, transform, and load data from various sources, ensuring data accuracy and completeness.
· Used Spark and Spark SQL to read the Parquet data and create the tables in Hive using the Scala API.
· Utilized JIRA to manage project issues and workflow.
· Exposed to all aspects of the software development life cycle (SDLC), including analysis, planning, development, testing, implementation, and post-production analysis; worked with Waterfall and Scrum/Agile methodologies.
· Created UNIX shell scripts to load data from flat files into Oracle tables.
Environment: Spark, Spark SQL, PL/SQL, HDFS, Kafka, Sqoop, Waterfall, Scrum, Agile, Snowflake, Hadoop, CI/CD, ETL, Cloudera, Linux, NoSQL, T-SQL, MongoDB.
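A hedged sketch of the RDBMS-to-MongoDB ingestion pattern noted above, assuming SQLAlchemy for the relational source and pymongo for the target; the connection strings, Oracle driver/DSN, query, and table/collection names are placeholders.

```python
from pymongo import MongoClient
from sqlalchemy import create_engine, text

# Placeholder connection strings; the Oracle driver/DSN is only an example.
source = create_engine("oracle+oracledb://etl_user:****@source-db:1521/?service_name=ORCL")
mongo = MongoClient("mongodb://localhost:27017")
collection = mongo["staging"]["customers"]

with source.connect() as conn:
    rows = conn.execute(text("SELECT customer_id, name, segment FROM customers"))
    batch = []
    for row in rows.mappings():           # dict-like access to each row
        batch.append(dict(row))
        if len(batch) >= 1000:            # insert in chunks to bound memory use
            collection.insert_many(batch)
            batch.clear()
    if batch:
        collection.insert_many(batch)
```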
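A minimal sketch of the Hive external-table and dynamic-partition pattern from PySpark; the database, table, columns, and paths are placeholders chosen for illustration.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive_partitioned_load")
    .enableHiveSupport()
    .getOrCreate()
)

# Allow dynamic partition inserts.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS staging")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
    LOCATION '/data/warehouse/staging/transactions'
""")

# Load the day's extract and let Hive resolve the partitions from txn_date.
df = spark.read.parquet("/data/landing/transactions/")   # placeholder source path
df.createOrReplaceTempView("landing_transactions")
spark.sql("""
    INSERT INTO TABLE staging.transactions PARTITION (txn_date)
    SELECT txn_id, amount, txn_date FROM landing_transactions
""")
```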
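A hedged sketch of the streaming ETL pattern: events consumed from Kafka with Spark Structured Streaming, parsed, and appended to Snowflake per micro-batch via the Spark-Snowflake connector. The brokers, topic, schema, checkpoint path, and connection options are placeholders.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka_to_snowflake").getOrCreate()

# Placeholder event schema for the incoming JSON messages.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("merchant", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "card-transactions")           # placeholder topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

sf_options = {                                           # placeholder connection options
    "sfURL": "example.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "********",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "CORE",
    "sfWarehouse": "ETL_WH",
}

def write_batch(batch_df, batch_id):
    # Append each micro-batch to Snowflake via the Spark-Snowflake connector.
    (batch_df.write.format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", "CARD_TXNS")
        .mode("append")
        .save())

query = (
    events.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "/tmp/checkpoints/card_txns")  # placeholder path
    .start()
)
query.awaitTermination()
```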