Sunitha Vallabhaneni

Guildford, Surrey

Summary

Certifications: Certified Kubernetes Administrator (CKA) – 2024, AWS Certified DevOps Engineer – Professional – 2022, AWS Certified Solutions Architect – Associate – 2020.

Experience:

Senior DevOps & Platform Engineer with 10+ years of expertise in architecting and scaling AWS cloud infrastructure. Specialist in multi-region Kubernetes (EKS) orchestration, secure Infrastructure as Code (Terraform/CDK), and Zero-Trust networking. Proven track record in leading enterprise migrations, including Aurora PostgreSQL (v16.8) and Kubernetes (v1.35), while implementing automated security gates with Semgrep and Sonatype. Expert in building developer-centric platforms that eliminate deployment bottlenecks through OIDC-based secretless CI/CD and GitOps workflows.

Core Technical Skills

  • Cloud Platform (10+ Years): Amazon Web Services (AWS) – EKS, ECS, Aurora PostgreSQL, S3, EC2, IAM, Route53, VPC Networking, CloudFront.
  • Orchestration & Containers: Kubernetes (EKS), Docker, Istio Service Mesh, Cluster API, Kyverno, Pod Security Standards (PSS), CRDs.
  • Infrastructure as Code (IaC): HashiCorp Terraform (Modular Architecture), AWS CDK, CloudFormation, Atlantis (GitOps), S3/DynamoDB State Management.
  • Security & Compliance (SecOps): OIDC Identity Federation, Sonatype Nexus/Lifecycle (SCA), Semgrep (SAST), Cloudflare Zero-Trust (Tunnels), Okta SSO, CIS Benchmarks.
  • CI/CD & GitOps: GitHub Actions (Custom YAML Workflows), ArgoCD, Self-Hosted Runners (macOS for iOS/Bare-metal Ubuntu), Nexus IQ.
  • Observability & Reliability: Prometheus, Grafana (Self-Service Dashboarding), PagerDuty (P0/P1 Incident Lead), AWS CloudWatch.

Overview

16 years of professional experience

Work history

Platform Engineer

Trust Wallet
2025.04 - Current

Kubernetes & Orchestration

  • Cluster Lifecycle Automation: Automated end-to-end provisioning, upgrades, and decommissioning using Terraform and Cluster API; implemented canary deployments and automated rollbacks to ensure environment stability.
  • Zero-Downtime Upgrades: Led multi-region cluster upgrades (v1.29 to v1.35) by conducting compatibility audits and utilising staggered node-draining with PodDisruptionBudgets.
  • Custom Resource Management: Engineered and managed Custom Resource Definitions (CRDs) to extend Kubernetes capabilities, enabling domain-specific automation and standardised resource abstraction for internal platform tools.
  • Multi-Region EKS Management: Orchestrated Amazon EKS clusters across multiple AWS regions, enforcing configuration consistency and high availability through GitOps workflows.
  • Traffic Management & Service Mesh: Deployed Istio Service Mesh to manage advanced traffic routing via VirtualServices and DestinationRules, enforcing cluster-wide mTLS and deep observability.
  • Security & Compliance Hardening: Secured environments using CIS Benchmarks, Pod Security Standards, and Kyverno policies; integrated automated container image scanning into CI/CD pipelines.
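
Illustrative sketch (not production code): the PodDisruptionBudget check behind the staggered node-draining above can be expressed roughly as follows. The `Workload` class, its fields, and both function names are hypothetical, and the model is simplified: it treats `minAvailable` as an absolute pod count and ignores percentage values and `maxUnavailable`.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical view of one Deployment guarded by a PodDisruptionBudget."""
    name: str
    healthy_pods: int
    min_available: int  # the PDB's minAvailable, assumed to be an absolute count


def allowed_disruptions(w: Workload) -> int:
    """Pods that may be evicted right now without violating the budget."""
    return max(0, w.healthy_pods - w.min_available)


def can_drain(pods_on_node: dict[str, int], workloads: dict[str, Workload]) -> bool:
    """True if evicting every pod on the node keeps each workload within budget."""
    return all(
        count <= allowed_disruptions(workloads[name])
        for name, count in pods_on_node.items()
    )


workloads = {
    "api": Workload("api", healthy_pods=5, min_available=4),
    "worker": Workload("worker", healthy_pods=3, min_available=1),
}
print(can_drain({"api": 1, "worker": 2}, workloads))  # True
print(can_drain({"api": 2}, workloads))               # False
```

In a real cluster the eviction API enforces the budget pod by pod; a pre-check of this kind only helps decide which node to drain next.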

Sonatype Nexus Enterprise Deployment

  • Enterprise Artifact Management: Led the end-to-end deployment of Sonatype Nexus IQ and Repository Manager across multi-region Kubernetes clusters, establishing a high-availability security gateway for global component lifecycle management.
  • Supply Chain Security: Mitigated npm supply chain attacks by enforcing strict organizational policies that automatically fail CI/CD builds upon detecting malicious packages, effectively eliminating risks from unvetted external downloads.
  • Secure Dependency Routing: Configured private repository proxies to intercept all third-party dependencies, ensuring every component underwent rigorous security scanning and compliance validation prior to environment ingestion.

Multi-Region Artifact Management

  • Global Artifact Orchestration: Designed a unified artifact management strategy across all regional clusters, ensuring consistent package availability and synchronized security scanning for distributed Kubernetes workloads.
  • Secretless GitHub Actions Integration: Engineered secure, secretless CI/CD pipelines by integrating GitHub Actions with AWS via OIDC, eliminating long-lived IAM credentials and significantly reducing the credential leak attack surface.
  • Automated Security Gates: Implemented Nexus IQ within GitHub workflows to provide automated security gates, enforcing build-level policy checks that prevent the promotion of vulnerable components before they reach the registry.
  • Secure Developer Self-Service: Empowered product teams with self-service deployment templates utilizing identity-based access control, shifting security "left" while maintaining strict organizational guardrails.

Terraform & Infrastructure as Code (IaC)

  • State Integrity & Refactoring: Decoupled monolithic Terraform states into project-specific files using S3 backends with DynamoDB for distributed state locking; significantly reduced the "blast radius" and ensured data integrity across global environments.
  • GitOps with Atlantis: Orchestrated the underlying AWS infrastructure lifecycle via Atlantis, implementing a pull-request-based workflow that facilitated transparent peer reviews and eliminated logical state contention between teams.
  • Custom Workflow Automation: Standardised the execution environment using an atlantis.yaml configuration to define custom pre-plan and post-apply hooks, ensuring consistent Terraform versions and automated checks across all projects.
  • Infrastructure Performance Tuning: Optimised deployment velocity by eliminating cascading dependency checks, resulting in a substantial reduction in terraform plan/apply execution times.
  • Modular Architecture: Developed reusable Terraform modules to standardise core infrastructure, ensuring consistent and repeatable provisioning for multi-region EKS networking and foundational layers.

Self-Hosted CI/CD Runner Infrastructure

  • Hybrid Mobile & Linux Fleet: Orchestrated the deployment of self-hosted macOS and bare-metal Ubuntu runners within private subnets to support high-performance iOS/Mobile builds and secure Linux workloads.
  • Infrastructure Observability: Engineered a comprehensive monitoring stack using Prometheus and custom Grafana dashboards to track runner health, disk utilisation, and build throughput, ensuring high-density job processing.
  • Incident Response & Reliability: Integrated automated alerting with PagerDuty, enabling proactive resolution of hardware and resource bottlenecks to maintain 99.9% runner availability for global product teams.
  • Performance & Security Optimisation: Achieved significant build-time reductions by moving to bare-metal hardware while hardening the environment through private networking and IAM/OIDC identity federation, eliminating public internet exposure.

Aurora PostgreSQL CDC Implementation & Upgrades

  • Real-Time Data Streaming (CDC): Engineered Change Data Capture (CDC) on AWS Aurora PostgreSQL clusters to facilitate low-latency data streaming to analytics platforms, enabling real-time business intelligence and data-driven reporting.
  • Zero-Downtime Major Upgrades: Orchestrated the upgrade of global Aurora clusters to v16.8 utilising Amazon RDS Blue/Green Deployments; ensured zero-downtime transitions and verified application compatibility through rigorous pre-deployment staging.

SAST Implementation with Semgrep

  • Automated Security Scanning: Engineered a scalable SAST framework using Semgrep across all microservices, integrating it directly into GitHub Actions to provide real-time vulnerability detection within the developer workflow.
  • Shift-Left Security Governance: Configured custom rule-sets and Semgrep Managed Policies aligned with organizational security standards, blocking insecure code patterns and compliance violations at the Pull Request stage.
  • Security Debt Reduction: Reduced production security vulnerabilities by enforcing automated fixes and developer feedback loops prior to merge, significantly lowering the long-term maintenance overhead and security risk profile.

PagerDuty On-Call Management

  • Critical Incident Orchestration: Acted as the primary responder for P0 and P1 incidents during peak business periods, managing the end-to-end incident lifecycle and ensuring strict adherence to SLA/SLO targets.
  • High-Stakes Triage & Resolution: Led cross-functional triage efforts, coordinating between SRE, Product, and Infrastructure teams to mitigate customer impact and restore services under high-pressure scenarios.
  • Root Cause Analysis (RCA) Leadership: Facilitated comprehensive post-mortem/RCA sessions, translating technical failures into actionable engineering tasks to prevent recurrence and improve overall system resilience.

Zero-Trust Network & Identity Management

  • Cloudflare Zero-Trust Architecture: Orchestrated Cloudflare Tunnels via Terraform to eliminate public ingress, exposing internal services securely without opening inbound firewall ports.
  • Enterprise SSO Integration: Integrated Okta SSO with Cloudflare Access to enforce identity-based authentication and MFA, ensuring granular access control for all private infrastructure.
  • Infrastructure Hardening: Leveraged Security Groups, NACLs, and IAM policies to restrict traffic exclusively to verified VPN and Cloudflare endpoints, effectively neutralizing the public attack surface.
  • Secure Remote Access: Enabled seamless, encrypted connectivity to private Amazon EKS resources for distributed teams while maintaining strict compliance with zero-trust principles.

Platform Engineering & Developer Enablement

  • Internal Developer Portal (IDP): Integrated Port.io to centralize service catalogs and automate infrastructure workflows, significantly reducing developer cognitive load and manual DevOps overhead.
  • Automated Secret Lifecycle: Leveraged Port.io for automated secrets rotation and password management, ensuring continuous security compliance and minimizing the risk of credential exposure across all microservices.

SCA & Software Supply Chain Security

  • Continuous Dependency Analysis: Orchestrated Software Composition Analysis (SCA) across all private repositories using Sonatype Lifecycle, ensuring 100% visibility into third-party library vulnerabilities and licensing risks.
  • Event-Driven Security Scanning: Engineered automated triggers within GitHub Actions that activate upon lock-file modifications (e.g., package-lock.json, go.sum), providing real-time feedback during the critical dependency ingestion phase.
  • Inline Developer Feedback: Configured Sonatype to surface security findings directly as GitHub Pull Request comments, enabling developers to remediate vulnerabilities before code is merged into protected branches.
  • Shift-Left Governance: Enforced organizational security policies at the build level, significantly reducing production security debt by preventing the introduction of high-risk or non-compliant components.
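
Illustrative sketch (not the production workflow): the lock-file trigger described under Event-Driven Security Scanning amounts to checking whether a change set touches a dependency lock file. The function names are hypothetical, and the file set lists only the two examples named above.

```python
from pathlib import PurePosixPath

# Only the two examples named in the bullet above; a real setup would list more.
LOCK_FILES = {"package-lock.json", "go.sum"}


def changed_lock_files(changed_paths: list[str]) -> list[str]:
    """Return the changed paths whose file name is a known lock file."""
    return sorted(p for p in changed_paths if PurePosixPath(p).name in LOCK_FILES)


def should_scan(changed_paths: list[str]) -> bool:
    """A dependency scan is warranted only when a lock file changed."""
    return bool(changed_lock_files(changed_paths))


changed = ["services/api/package-lock.json", "README.md", "go.sum"]
print(changed_lock_files(changed))       # ['go.sum', 'services/api/package-lock.json']
print(should_scan(["docs/index.md"]))    # False
```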

AWS Certified Senior DevOps Engineer

Definition Health
Worthing, West Sussex
2022.08 - 2024.12
  • Secure Video Upload Solution:
    Designed and implemented a scalable solution to scan patient-uploaded videos for viruses before storage. Leveraged Kubernetes for scalability by running virus scans in isolated pods and integrated AWS Fargate for dynamic workload scaling. Secured uploads using Ingress controllers and enforced strict network policies for isolation, delivering a secure, scalable, and efficient system that reduced malware risks by 100%.
  • Site-to-Site VPN Implementation:
    Designed and deployed site-to-site VPN connections using AWS VPN Gateway and Customer Gateway, enabling secure and seamless communication between on-premise infrastructure and AWS VPCs. This solution improved data transfer security and reduced latency by 30%.
  • Data Transformation with Iguana:
    Deployed Iguana on AWS EC2 Auto Scaling within a dedicated VPC to convert healthcare data formats (HL7, FHIR, CCD, X12, JSON) for compatibility with proprietary systems. Integrated a CI/CD pipeline for automated deployment and updates, reducing manual intervention by 50% and ensuring a secure, scalable, and efficient data transformation process.
  • Health Record Conversion and Analytics:
    Converted millions of health records from JSON to Parquet using AWS Glue ETL jobs and PySpark. Loaded processed data into AWS Athena, enabling efficient querying and visualization through Amazon QuickSight. This reduced query times by 60% and improved data accessibility for stakeholders.
  • IAM and Security Best Practices:
    Designed and implemented IAM policies and roles to enforce least privilege access. Enabled multi-factor authentication (MFA) and identity federation, enhancing security posture and reducing unauthorized access incidents by 90%.
  • Infrastructure as Code (IaC) Automation:
    Streamlined deployment processes by leveraging IaC tools like Terraform and AWS CDK, integrated into CI/CD pipelines. Automated infrastructure provisioning, reducing deployment times by 40% and ensuring consistent, repeatable environments.
  • SSL/TLS Certificate Management:
    Provisioned, managed, and deployed SSL/TLS certificates using AWS Certificate Manager to encrypt data in transit. Configured ACM integrations with AWS services like ELB, CloudFront, and API Gateway, enabling secure HTTPS connections and improving compliance with industry security standards.
  • GitOps Automation with ArgoCD:
    Implemented GitOps workflows using ArgoCD to automate Kubernetes deployments based on Git repository changes. This reduced deployment errors by 70% and improved deployment frequency by 50%.
  • AWS RDS Backup Optimization:
    Configured incremental and full backups for AWS RDS instances, optimizing storage usage while meeting Recovery Time Objective (RTO) and Recovery Point Objective (RPO) requirements. This reduced backup storage costs by 30% and ensured business continuity.

AWS Certified DevOps Engineer

Earth-i
Guildford, Surrey
2020.04 - 2022.08
  • End-to-End ML Inference Pipeline for Satellite Imagery:
    Designed and implemented a scalable, high-availability ML inference pipeline for object detection on satellite images using a Kubernetes (EKS) cluster with 8 GPU nodes. Containerized workloads were deployed using Helm charts, and Kubernetes Jobs and CronJobs were utilized to monitor S3 for image readiness. Preprocessed satellite images using GDAL and OpenCV, performed ML inference with TensorFlow, and stored object count outputs in CSV format on S3. Optimized the pipeline for GPU utilization, fault tolerance, and seamless scaling, achieving a 40% reduction in processing time and 99.9% uptime.
  • Semantic Segmentation Pipeline with Event-Driven Architecture:
    Built a semantic segmentation pipeline for satellite images by integrating an event-driven architecture with Kubernetes for automation and scalability. Managed deployments and updates through GitOps workflows using ArgoCD. Leveraged S3 event notifications and AWS Lambda to trigger preprocessing workflows, with GPU-accelerated inference deployed on Kubernetes. Utilized GPU-optimized pods, network policies for secure communication, and Ingress controllers for external access. Deployed and managed the solution using Helm charts, with Prometheus and Grafana providing real-time monitoring and performance visualization. This solution improved processing efficiency by 35% and reduced manual intervention by 50%.
  • Persistent Storage Solutions in Kubernetes:
    Designed and deployed Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) in Kubernetes for both dynamic and static storage provisioning. Configured storage solutions using local volumes, NFS, and cloud-based storage such as Amazon EBS, EFS, and Azure Disks to support client-specific workloads. Implemented dynamic volume provisioning with Storage Classes, automating the creation of persistent storage for Kubernetes applications and reducing provisioning time by 60%.
  • Multi-Account, Multi-Region Infrastructure Deployment:
    Launched infrastructure across multiple accounts and regions using CloudFormation StackSets to build Dev, Staging, and Prod environments. Ensured high availability and failover capabilities for RDS Read Replicas, reducing downtime by 90% and improving disaster recovery readiness. This setup supported seamless scaling and provided a robust foundation for critical workloads.

Senior Python Engineer AWS Certified

Earth-i
Guildford, Surrey
2018.04 - 2020.04
  • Designed and deployed cloud services on the AWS stack (EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM, ELB, SageMaker, Fargate, ECR), focusing on high availability, fault tolerance, and auto-scaling.
  • Set up and managed container workloads on EKS (Kubernetes) and ECS with Fargate.
  • Used Bash and Python (including Boto3) to supplement the automation provided by Terraform, for tasks such as encrypting EBS volumes backing AMIs and scheduling Lambda functions for routine AWS tasks.
  • Launched infrastructure across multiple accounts and regions using CloudFormation StackSets to build Dev, Staging, and Prod environments, providing high availability and failover through RDS Read Replicas.
  • Deployed Flask applications using Amazon Lightsail and AWS S3.
  • Built CI/CD pipelines using CodeCommit, CodeBuild, CodeDeploy, and CodePipeline.
  • Used Elastic Beanstalk to deploy and scale web applications and services, with multi-container Docker support alongside ASG and ELB.
  • Used AWS SageMaker Ground Truth to set up labelling tasks for datasets used to train machine learning models, with authentication integrated through Cognito User Pools.
  • Invoked Step Functions to run machine learning model inference and long-running batch jobs using Docker containers from ECR.
  • Queried CloudWatch Logs stored in an S3 bucket using Amazon Athena (which supports geospatial queries).
  • Managed account-level throttling issues in API Gateway.
  • Integrated an Auto Scaling Group with an Application Load Balancer to redirect traffic to HTTPS, and implemented lifecycle hooks for disaster recovery that create AMIs and snapshots, with policies that provide protection when scale-in happens.
  • Used Atlassian products such as JIRA and Confluence for issue tracking, documentation, and code integration.

    Python Experience
  • Object detection and classification on satellite images using OpenCV, PyTorch, TensorFlow, scikit-image, NumPy, and CNNs.
  • Semantic segmentation for land classification on high-resolution satellite imagery.
  • Ran models (video compression, cloud detection) on Jetson TX2 and Jetson AGX Xavier.
  • Deployed machine learning models using AWS infrastructure.

Software Python Developer

Flight Data Services
Fareham, Hampshire
2016.12 - 2018.04
  • Analytics Platform: Built an analytics platform from scratch using Python, Pyro4, and Django models (for data).
  • Applied the Swagger API framework to design, build, and document RESTful APIs.
  • Implemented new Django features for the Polaris website.
  • Soft De-Identification: Implemented soft de-identification of flight data.
  • Quarantine Flight Data: Restricted access to flight data containing level 3 event occurrences.
  • Flight Guest Access: Restricted guest access to specific portions of the visualisation.
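
Illustrative sketch (not the original Django code): the quarantine rule above, restricting access to flights with level 3 event occurrences, can be expressed as a simple filter. The `Flight` class, its fields, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

QUARANTINE_LEVEL = 3  # flights with an event at this level are quarantined


@dataclass
class Flight:
    """Hypothetical flight record carrying the levels of its detected events."""
    flight_id: str
    event_levels: list[int] = field(default_factory=list)


def is_quarantined(flight: Flight) -> bool:
    """A flight is quarantined if any of its events is at the quarantine level."""
    return QUARANTINE_LEVEL in flight.event_levels


def visible_flights(flights: list[Flight], can_view_quarantined: bool) -> list[Flight]:
    """Hide quarantined flights from users who lack the extra permission."""
    if can_view_quarantined:
        return list(flights)
    return [f for f in flights if not is_quarantined(f)]


flights = [Flight("F1", [1, 2]), Flight("F2", [3]), Flight("F3")]
print([f.flight_id for f in visible_flights(flights, can_view_quarantined=False)])
# ['F1', 'F3']
```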

Software Developer Trainee/Technical Trainer

TIME Private Limited
Hyderabad, INDIA
2010.08 - 2016.05
  • Trained graduate-level team members on subjects such as Design and Analysis of Algorithms, Graph Theory, Databases, SQL, and Programming.
  • Performed data analysis (loading, storage, plotting, and visualisation) of 70,000 records using Python with pandas, Matplotlib, and SciPy.
  • Contributed to the launch of an MVP (Minimum Viable Product) built with Python and Django to gather validated learning about the product and its continued development.

Education

Master of Science - Computer Science and Engineering

Osmania University
Hyderabad
2010-07

Bachelor of Science - Computer Science and Engineering

J N T UNIVERSITY
Hyderabad
2007-07

Skills

Certifications:

Certified Kubernetes Administrator (CKA) – 2024

AWS Certified DevOps Engineer – Professional – 2022

AWS Certified Solutions Architect – Associate – 2020

AWS: EKS, ECS, Fargate, Aurora PostgreSQL, EC2, VPC, Route53, IAM

Infrastructure as Code: Terraform, AWS CDK (Python), CloudFormation, Atlantis

Kubernetes: EKS, EKS Multi-Region, Istio, Kyverno, Cluster API, ArgoCD, Helm

Container Management: Docker, Docker Compose

CI/CD & GitOps: GitHub Actions, AWS CodePipeline, ArgoCD

DevSecOps & Supply Chain: Sonatype Lifecycle (SCA), Semgrep (SAST), Nexus IQ, GitHub Dependabot, Container Scanning

Identity & Zero-Trust: Okta SSO, Cloudflare Tunnels

Cloudflare: Domains, Pages, Tunnels

Observability: Prometheus, Grafana, ELK Stack, PagerDuty, Victoria Metrics

Networking: VPC, Subnets, Load Balancers, Site-to-Site VPN

Scripting & Software: Python, Go (Golang), Shell/Bash (10 years)

Tools: Jira, Confluence, Shortcut

Microservices: 5 years deploying Microservices Architecture on Kubernetes

Timeline

Platform Engineer

Trust Wallet
2025.04 - Current

AWS Certified Senior DevOps Engineer

Definition Health
2022.08 - 2024.12

AWS Certified DevOps Engineer

Earth-i
2020.04 - 2022.08

Senior Python Engineer AWS Certified

Earth-i
2018.04 - 2020.04

Software Python Developer

Flight Data Services
2016.12 - 2018.04

Software Developer Trainee/Technical Trainer

TIME Private Limited
2010.08 - 2016.05

Master of Science - Computer Science and Engineering

Osmania University

Bachelor of Science - Computer Science and Engineering

J N T UNIVERSITY