I am a data scientist who is passionate about applying data science concepts and techniques to real-world problems, and who sees AI as a tool to aid people, not replace them. I offer a strong foundation in analytical thinking and problem solving. I am well versed in Python and SQL and have extensive knowledge of statistics and machine learning. As an end-to-end data scientist, I work from the ground up on all my projects, from the data build to the visualisation dashboards.
Built a churn model to predict the probability of a customer leaving:
Technical aspect:
Business aspect:
Built a segmentation model to create customer cohorts:
Technical aspect:
Business aspect:
Passed the resulting cohorts on to the commercial team for more tailored marketing and logistics.
Time-series model forecasting the number of faults coming into the business:
Technical aspect:
Business aspect:
Built a classification model to predict propensity to fault and further faults:
Technical aspect:
Business aspect:
Establishing data science practices within the business:
Building a model to predict price elasticity for a given product for a multinational retail client:
Technical aspect:
• Liaising with industry experts to determine which features should be used in our analytical dataset
• Cleaning the dataset, i.e. dealing with missing data and transforming features
• Harnessing linear models to predict price elasticity, writing production R code and building R packages
• Creating tests to see if outputs match business requirements of the client
• Documenting the methods used and presenting them to both technical and non-technical clients
• Creating the production environment using Docker, which runs R scripts and writes outputs to a database
Business aspect:
• Allows the company to determine, in real time, the price that maximises profit and demand
Building a topic modeller for categorising tweets from a customer service account:
Technical aspect:
• Ingesting live tweets
• Creating a dataset of tweets
• Cleaning tweets, i.e. removing emojis and special characters
• Analysis of tweets including word count and time analysis
• Preparation for modelling, i.e. lemmatising and count vectorisation
• Creating a Latent Dirichlet Allocation model
Business aspect:
• Allowed the business to see the topics behind the most frequently occurring current issues
Professional Program for Data Science, Microsoft