Sahithi Sirisetti

Data Engineer
Wokingham, WOK

Summary

Experienced IT professional with 8 years in the industry, specializing in Hadoop and Big Data technology, including 3 years in the Energy and Sales domain in the UK. My skill set covers the Hadoop ecosystem, including HDFS, Hive, Spark, and YARN, and I have worked through all phases of the Software Development Life Cycle using Agile, TDD, and Waterfall methodologies. My expertise extends to database design, normalization, and data flow diagrams, complemented by strong familiarity with SOA and AWS cloud services, including S3, IAM, EMR, and Glue.

I have hands-on experience in requirements gathering, data modeling, and backend development using Pentaho ETL tools, as well as data analysis for business case identification and system maintenance. I have collaborated closely with Power BI teams on reporting. I am proficient with the Hadoop Distributed File System and large-scale data processing and storage, deploy applications using Git, and automate reports through Redshift with Jenkins. I have carried numerous projects from testing through to post-production analysis.

I have extensive experience with relational databases and version control systems, including Oracle, SQL Server, PostgreSQL, AWS Redshift, and Git, along with Docker container management and CI/CD model deployment. I combine strong technical and business analysis skills with excellent communication, interpersonal, and debugging abilities. I am a quick learner, adept at collaborating with cross-domain experts and managing clients both onsite and remotely.

Overview

10 years of professional experience

Work History

Sr. Data Engineer

British Gas/Hive Home
03.2021 - Current
  • Contributed to IoT data processing for Hive, a smart home devices company owned by British Gas, using AWS services.
  • Managed ingestion and processing of data from sources such as Exertis and Amazon via Salesforce.
  • Took part in analysis, design, development, system testing, and user acceptance testing, following Agile methodology.
  • Processed raw data in S3 using Kinesis Data Streams, Glue, and Python scripts, converting it to file formats such as JSON and CSV.
  • Used Pentaho for data ingestion from SFTP to S3 and from S3 to Redshift, and managed Redshift tables.
  • Developed data pipelines and EMR jobs using Spark Scala for processing raw data, and monitored production environments.
  • Implemented Jenkins pipelines for data import/export, Redshift SQL for dataset creation, and Redshift Spectrum for data processing.
  • Adapted Hive data layouts to meet business needs, managed automated data extracts, and provided weekly data reports using Jenkins.
  • Implemented Slack notifications for job statuses and used data integration tools for Redshift.
  • Used Python scripting for data manipulation and collaborated with the Power BI team on data visualisation.

Sr. Data Engineer

Centrica PLC UK
10.2020 - 03.2021
  • Implemented and applied rules to customer details in the data lake table.
  • Applied around 40 provided rules to identify which customers were eligible or ineligible for smart meter installation.
  • Worked with dependent CRM and WMIS tables that were ingested and made available in the Data Lake before the project commenced.
  • Engaged in Agile processes, with daily scrums for regular discussion of development and release plans.
  • Carried out requirement analysis, design, and development.
  • Created Hive tables, loaded data into them, and wrote queries against them.
  • Developed a master wrapper script to trigger all HQL, create log files, and submit Spark jobs.
  • Used Git merge requests for code development in the CI/CD model, managed via Jenkins.
  • Assisted the Production team during deployments and with the resolution of initial production issues.
  • Migrated critical Hive tables to Spark Scala files using DataFrames for enhanced efficiency.

Data Engineer

Integrated Metrics System
01.2014 - 10.2020
  • IMS organizes client-provided unstructured data and generates the datasets required by client applications, replacing traditional databases.
  • It covers journal, author, publication, and citation metrics, tailored to client needs. To expedite results, the Hadoop Distributed File System (HDFS) replaces databases, storing information as text files for quicker query execution.
  • Collaboration with a leading Business Analytics Service provider facilitated this transition. Tasks involved pulling raw data, processing it, and ingesting it onto HDFS, along with developing MapReduce jobs for data processing.
  • Created datasets to fulfill client requirements, with daily data ingestion, preprocessing, and post-processing.
  • Additional tasks included metric calculation, data analysis on large datasets, performance optimization, and job scheduling and tracking using Skybot Scheduler.

Education

Skills

Hadoop Ecosystem (HDFS, Hive, Sqoop, YARN, Spark)

Sex

Female

Personal Information

  • Date of Birth: 09/06/1993
  • Marital Status: Married
