Sri Gouri

Summary

Data Engineer with 8 years of experience in analysis, design, and development with Big Data technologies such as Spark, MapReduce, Hive, YARN, and HDFS, and with Azure cloud technologies. Experienced in extracting, transforming, and loading (ETL) data from various sources into data warehouses, and in data processing tasks such as collection and aggregation. Experienced in designing and implementing Azure cloud services, including Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure DevOps, and Azure Databricks, for Big Data solutions. Experienced in data visualization and analytics using Power BI. Good knowledge of data architecture, including data ingestion pipeline design, Hadoop/Spark architecture, data modelling, machine learning, and advanced data processing.

Work history

Data Engineer

British Telecom
04.2018 - 07.2021

Responsibilities:

  • Monitored, maintained, and reported on the Cloudera Hadoop cluster
  • Responsible for data services and data movement infrastructure
  • Processed millions of JSON, Avro, and Parquet events from various probes (sources) into a target HDFS data lake for real-time analysis
  • Created secured pipelines for protocols such as proxy, DHCP, NetFlow, and DNS
  • Developed an ingestion and enrichment solution with Apache Spark and Kafka streaming in Python (see the sketch after this list)
  • Transformed data using an in-house API within the Spark ingestion solution, with enrichment lookups against HBase
  • Implemented Kafka offset management using the direct stream API
  • Implemented partitioning and bucketing on Hive external tables to optimize performance
  • Created Oozie workflows to automate Hive and Impala jobs
  • Used Azure Databricks and PySpark to enrich and transform data
  • Tuned the performance of Spark jobs for high-volume feeds in the production environment
  • Interacted regularly with clients to demonstrate end results
  • Participated in the full software development lifecycle (requirements, solution design, development, QA, implementation, and product support) using Scrum and other Agile methodologies
  • Collaborated with team members and stakeholders on the design and development of the data environment
  • Prepared documentation for specifications, requirements, and testing

Environment: PySpark, Cloudera, Hadoop, Hive, Impala, Oozie, Kafka, Flume, Maven, GitHub.
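A minimal sketch of the ingestion pattern described above, not the production code: a PySpark direct stream that pulls probe events from Kafka, captures per-partition offsets for checkpointing, and lands the raw JSON in HDFS. The topic, broker, and lake path are hypothetical, and the KafkaUtils API assumes Spark 2.x with the spark-streaming-kafka-0-8 integration.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 2.x + spark-streaming-kafka-0-8

    sc = SparkContext(appName="probe-ingestion")
    ssc = StreamingContext(sc, 30)  # 30-second micro-batches

    # Direct stream: no receivers; Spark asks Kafka for offset ranges each batch
    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["probe-events"],                                # hypothetical topic
        kafkaParams={"metadata.broker.list": "broker-1:9092"},  # hypothetical broker
    )

    offset_ranges = []

    def capture_offsets(rdd):
        # Record per-partition from/until offsets on the driver so they can be
        # persisted once the batch has been written out
        global offset_ranges
        offset_ranges = rdd.offsetRanges()
        return rdd

    def write_batch(batch_time, rdd):
        if not rdd.isEmpty():
            # Values are the raw JSON payloads; land each batch in its own directory
            path = "hdfs:///datalake/raw/probe-events/%s" % batch_time.strftime("%Y%m%d%H%M%S")
            rdd.map(lambda kv: kv[1]).saveAsTextFile(path)  # hypothetical lake path
        for o in offset_ranges:
            # Persist these after a successful write (e.g. to ZooKeeper or HBase)
            print(o.topic, o.partition, o.fromOffset, o.untilOffset)

    stream.transform(capture_offsets).foreachRDD(write_batch)
    ssc.start()
    ssc.awaitTermination()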

Hadoop Developer

KPIT Cummins
08.2016 - 03.2018


Responsibilities:

  • Installed and configured Hive, HDFS, and NiFi; implemented an HDP Hadoop cluster
  • Assisted with performance tuning and monitoring
  • Loaded and transformed large sets of structured data from router locations into the EDW using a NiFi data pipeline flow
  • Worked with data serialization formats (Avro, JSON, and CSV) for converting complex objects into sequences of bits
  • Created Hive tables to load large structured data sets coming from WADL after transformation of the raw data
  • Created reports for the BI team, using Sqoop to import data into HDFS and Hive
  • Developed custom NiFi processors to parse data from XML to JSON format and filter out broken files
  • Created Hive queries to spot trends by comparing fresh data with EDW reference tables and historical metrics
  • Used the KafkaUtils module to create an input stream that pulls messages directly from the Kafka broker
  • Partitioned Hive tables and ran scripts in parallel to reduce their run time (see the sketch after this list)
  • Built end-to-end data pipeline orchestration with NiFi
  • Provided design recommendations and resolved technical problems
  • Assisted with data capacity planning and node forecasting
  • Performed performance tuning and troubleshooting of the Hadoop cluster
  • Administered Hive and Kafka, installing updates, patches, and upgrades
  • Supported code/design analysis, strategy development, and project planning
  • Managed and reviewed Hadoop log files
  • Evaluated the suitability of Hadoop and its ecosystem for the project and implemented proof-of-concept applications to support their adoption under the Hadoop initiative

Environment: Spark, Cloudera Distribution, HDFS, MapReduce, Hive, Kafka, Hue, Oozie, Sqoop, Maven, BLOB, microservices, GitHub.
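A minimal sketch of the Hive table partitioning mentioned above, using PySpark's Hive support in place of the original scripts; the table, columns, and paths are hypothetical. An external table partitioned by ingest date keeps each day's data in its own HDFS subdirectory, so date-filtered queries scan only the matching partitions.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()  # route spark.sql() through the Hive metastore
             .getOrCreate())

    # External table partitioned by ingest date: each event_date value maps
    # to its own HDFS subdirectory, so date-filtered queries skip the rest
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS router_events (  -- hypothetical table
            router_id STRING,
            payload   STRING
        )
        PARTITIONED BY (event_date STRING)
        STORED AS PARQUET
        LOCATION 'hdfs:///edw/router_events'                  -- hypothetical path
    """)

    # Register a new day's partition once its files have landed
    spark.sql("""
        ALTER TABLE router_events
        ADD IF NOT EXISTS PARTITION (event_date = '2017-06-01')
        LOCATION 'hdfs:///edw/router_events/event_date=2017-06-01'
    """)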

Data Analyst

IGATE
03.2012 - 01.2015

Responsibilities:

  • Created tables, views, and indexes
  • Wrote stored procedures (see the sketch after this list)
  • Tuned performance using indexes and SQL Profiler
  • Consolidated data into a database and loaded it into the live database using Integration Services (SSIS)
  • Created SSIS packages using transformations such as Derived Column, Lookup, Data Conversion, Conditional Split, Pivot, and Union All, along with the Execute SQL Task, to load data into the database
  • Contributed to performance tuning
  • Unit tested the packages
  • Scheduled jobs to send the reports

Environment: SQL, MySQL Server, Oracle 10g, Java, Eclipse, Maven, microservices, GitHub.
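A minimal sketch of the indexing and stored-procedure work described above; the server, database, table, and procedure names are all hypothetical, and Python with pyodbc stands in for the original tooling.

    import pyodbc

    # Connect to the reporting database (hypothetical server/database names)
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=reports-db;DATABASE=SalesDW;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Index the column the report queries filter on, to speed up lookups
    cursor.execute(
        "CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId)"  # hypothetical table
    )

    # Call a stored procedure that consolidates one day's data into the live tables
    cursor.execute("EXEC dbo.usp_LoadDailyOrders @LoadDate = ?", "2014-06-01")  # hypothetical proc
    conn.commit()
    conn.close()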

Education

Master of Science - Data Science

University of Essex
UK

Skills

  • Experienced with issue tracking and project management tools such as Jira and Rally
  • Experienced in requirements analysis, application development, application migration, and maintenance using the Software Development Lifecycle (SDLC) and Python technologies
  • Able to work effectively in cross-functional team environments, with excellent communication and interpersonal skills
  • Working knowledge of and experience with Agile (Scrum, Kanban) and Waterfall methodologies
