Data Engineer
Responsibilities:
- Monitored, maintained, and reported on the Cloudera Hadoop cluster
- Responsible for data services and data movement infrastructure
- Processed millions of JSON, Avro, and Parquet events from various probes (sources) into the target HDFS data lake for real-time analysis (see the format-handling sketch below)
- Created secured pipelines for different protocols such as proxy, DHCP, NetFlow, and DNS
- Developed an ingestion and enrichment solution in Python using Apache Spark with Kafka streaming (see the streaming sketch below)
- Transformed data with an in-house API in the Spark ingestion solution and performed enrichment lookups against HBase (see the lookup sketch below)
- Implemented Kafka offset management using the Direct Stream API (illustrated in the streaming sketch below)
- Implemented partitioning and bucketing in Hive on external tables to optimize query performance (see the DDL sketch below)
- Created Oozie workflows to automate Hive and Impala jobs
- Used Azure Databricks and PySpark to enrich and transform the data
- Tuned the performance of Spark jobs for high-volume feeds in the production environment (see the tuning sketch below)
- Interacted regularly with clients for show-and-tell of end results
- Participated in the full software development lifecycle, including requirements, solution design, development, QA implementation, and product support, using Scrum and other Agile methodologies
- Collaborated with team members and stakeholders on the design and development of the data environment
- Helped prepare associated documentation for specifications, requirements, and testing
Environment: PySpark, Cloudera, Hadoop, Hive, Impala, Oozie, Kafka, Flume, Maven, GitHub.
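Illustrative code sketches (all names, paths, and values below are hypothetical placeholders, not the production configuration):

Format-handling sketch - a minimal PySpark sketch of reading the three probe formats and landing them in the data lake, assuming the spark-avro package is on the classpath and each format is landed under its own sub-path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("probe-formats").getOrCreate()

    # Read each source format from its (hypothetical) landing directory.
    json_df = spark.read.json("/landing/probes/json/")
    avro_df = spark.read.format("avro").load("/landing/probes/avro/")  # built-in "avro" name since Spark 2.4
    parquet_df = spark.read.parquet("/landing/probes/parquet/")

    # Land everything as Parquet in the HDFS data lake, one sub-path per
    # source format so differing schemas do not collide.
    for name, df in [("json", json_df), ("avro", avro_df), ("parquet", parquet_df)]:
        df.write.mode("append").parquet("/data/lake/events/src=%s" % name)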
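Streaming sketch - a minimal sketch of Spark-Kafka direct-stream ingestion with manual offset tracking, using the classic DStream API (spark-streaming-kafka-0-8, the package that provides the Direct Stream approach named above; removed in Spark 3.x). Topic, broker, and path names are hypothetical:

    import json

    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # requires spark-streaming-kafka-0-8

    spark = SparkSession.builder.appName("probe-ingestion").getOrCreate()
    ssc = StreamingContext(spark.sparkContext, 30)  # 30-second micro-batches

    # Direct stream: executors read Kafka partitions directly, so offsets
    # can be tracked in the job itself rather than by a receiver.
    stream = KafkaUtils.createDirectStream(
        ssc,
        ["dns_events"],                              # hypothetical topic
        {"metadata.broker.list": "broker1:9092"})    # hypothetical broker list

    offset_ranges = []

    def capture_offsets(rdd):
        # offsetRanges() exists only on the KafkaRDD produced by the direct
        # stream, so capture it before any other transformation.
        global offset_ranges
        offset_ranges = rdd.offsetRanges()
        return rdd

    def write_batch(rdd):
        if not rdd.isEmpty():
            # Messages arrive as (key, value) pairs; the value holds the JSON payload.
            df = spark.read.json(rdd.map(lambda kv: kv[1]))
            df.write.mode("append").parquet("/data/lake/dns_events")  # hypothetical path
        for o in offset_ranges:
            # In production these offsets would be persisted (e.g. to ZooKeeper or
            # HBase) so a restarted job resumes from the last committed position.
            print(o.topic, o.partition, o.fromOffset, o.untilOffset)

    stream.transform(capture_offsets).foreachRDD(write_batch)
    ssc.start()
    ssc.awaitTermination()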
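Lookup sketch - a minimal sketch of the HBase enrichment-lookup pattern, assuming the happybase client (an assumption; the actual client library is not stated above). One connection is opened per partition so lookup overhead stays bounded on high-volume feeds:

    import happybase
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("enrichment").getOrCreate()

    def enrich_partition(records):
        # One HBase connection per partition, not per record.
        conn = happybase.Connection("hbase-master.example.com")  # hypothetical host
        table = conn.table("ip_enrichment")                      # hypothetical table
        try:
            for rec in records:
                row = table.row(rec["client_ip"].encode("utf-8"))
                rec["geo"] = row.get(b"d:geo", b"unknown").decode("utf-8")  # hypothetical column
                yield rec
        finally:
            conn.close()

    events = spark.sparkContext.parallelize(
        [{"client_ip": "10.1.2.3", "query": "example.com"}])
    print(events.mapPartitions(enrich_partition).collect())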
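DDL sketch - a minimal sketch of partitioning and bucketing on a Hive external table, issued through spark.sql(); the table, columns, and location are hypothetical. Partitioning prunes scans to the relevant dates, while bucketing speeds up joins and sampling on the bucketed column:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("hive-ddl") \
        .enableHiveSupport() \
        .getOrCreate()

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS dns_events (
            query      STRING,
            client_ip  STRING,
            rcode      INT
        )
        PARTITIONED BY (event_date STRING)          -- prunes scans to the queried dates
        CLUSTERED BY (client_ip) INTO 32 BUCKETS    -- speeds joins/sampling on client_ip
        STORED AS PARQUET
        LOCATION '/data/lake/dns_events'            -- hypothetical HDFS path
    """)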
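Tuning sketch - the kind of job-level settings adjusted for high-volume feeds; the values shown are illustrative, not the actual production settings:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("high-volume-feed")
             # Kryo is faster and more compact than Java serialization.
             .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
             # Match shuffle width to data volume to avoid tiny or oversized tasks.
             .config("spark.sql.shuffle.partitions", "400")
             .config("spark.executor.memory", "8g")
             .config("spark.executor.cores", "4")
             # Let the cluster scale executors with the feed's load.
             .config("spark.dynamicAllocation.enabled", "true")
             .getOrCreate())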