- Managed and administered enterprise-scale Microsoft Azure infrastructure, ensuring high availability, resilience, security, and operational stability across multiple business-critical systems while meeting strict SLA requirements.
- Acted as a subject matter expert for key cloud-hosted platforms, leading technical decision-making and onboarding activities, and guiding development and support teams throughout the software delivery lifecycle.
- Architected and delivered cloud infrastructure for multiple strategic business applications, balancing technical requirements, budget constraints, project timelines, and operational considerations to deliver scalable and cost-effective solutions.
- Designed, implemented, and maintained Infrastructure as Code (IaC) solutions using Terraform and Pulumi, establishing reusable deployment patterns and shared services that accelerated delivery and improved platform consistency.
- Engineered a reusable global infrastructure module adopted across multiple cloud environments, reducing infrastructure deployment times by 50% and improving standardisation across engineering teams.
- Designed and implemented automated CI/CD pipelines within Azure DevOps, incorporating automated testing, deployment validation, security scanning, and release controls to improve software delivery speed and reliability.
- Architected and deployed a dynamic self-service development platform leveraging Docker containers and Azure Kubernetes Service (AKS), enabling rapid provisioning of isolated development environments and significantly reducing developer onboarding and environment setup times.
- Led the implementation of DevOps best practices and automation-first principles, promoting repeatable, scalable, and highly available deployment strategies across multiple product teams.
- Participated in a 24/7 on-call support rota for business-critical services, diagnosing and resolving complex infrastructure, application, networking, and database incidents within defined SLA targets.
- Conducted detailed root cause analysis (RCA) following major incidents, identifying systemic weaknesses and implementing preventative improvements that reduced recurring incidents and increased overall platform reliability.
- Diagnosed and resolved SQL Server performance issues, including query bottlenecks, indexing problems, resource contention, and application-database integration faults, as part of incident management and performance optimisation.
- Designed and implemented centralised monitoring, logging, alerting, and observability solutions, providing real-time operational insights and enabling proactive service management across cloud-hosted platforms.
- Developed automated dashboards and API-driven reporting solutions, providing engineering leadership and stakeholders with actionable insights into platform health, service performance, operational metrics, and cloud expenditure.
- Architected and implemented a cloud cost optimisation and reporting platform, improving budget forecasting accuracy and reducing cloud expenditure by 20% through intelligent cost allocation, utilisation analysis, and governance controls.
- Configured and managed Akamai services, implementing Web Application Firewall (WAF) policies, bot mitigation controls, traffic management, and security best practices to strengthen application security and resilience.
- Developed and optimised high-performance applications, APIs, and integrations using C#, JavaScript, and React, supporting critical business operations and customer-facing services.
- Contributed to the successful relaunch of a major retail platform, optimising deployment processes, enhancing security controls, and delivering the project 75% ahead of schedule.
- Designed and implemented automated backup and disaster recovery solutions for business-critical systems and third-party platforms, ensuring that data integrity, resilience, and compliance requirements were consistently met.
- Automated operational processes through PowerShell scripting, API integrations, and workflow automation, reducing manual effort and eliminating repetitive support tasks.
- Performed peer reviews of infrastructure and application code, promoting engineering standards, knowledge sharing, maintainability, and continuous improvement across delivery teams.
- Collaborated closely with architects, product owners, stakeholders, and delivery teams to define technical solutions, delivery priorities, development estimates, and long-term platform roadmaps.
- Mentored junior engineers and developers, providing technical coaching, sharing best practices, and fostering a culture of continuous learning, operational excellence, and DevOps maturity.
- Redesigned and standardised organisational documentation processes, improving knowledge sharing, onboarding efficiency, and support capability across a user base of more than 300 employees.