Versatile Data Scientist with expertise in data analysis, visualisation, and advanced analytics. Proficient in Power BI, Tableau, and Looker, creating visualisations that support informed decision-making. Skilled in Python, RStudio, and SQL, identifying insights in complex datasets. Strong in statistical modelling, predictive analytics, and regression for trend forecasting and data extraction. Experienced in developing machine learning models. Well-versed in SPSS and EViews for statistical validation. Enthusiastic learner and collaborative team player, ready to drive strategic data utilisation and contribute to organisational success.
I have completed almost 30 projects. I have experience in data wrangling, data processing, modelling and training machine learning models. I
also have experience building deep learning models such as neural networks and RNN-LSTMs. I mostly use Python and R as programming
languages, and I have worked with MySQL as my primary database technology. I have built dashboards with R Shiny and applications that
serve model predictions.
1. User click prediction based on articles [Python, machine learning, deep learning and NLP]
● Compared dimension reduction techniques on a high-dimensional dataset provided by Mondaq Ltd, using Python. Machine learning,
deep learning, NLP and text analytics techniques were used to predict user clicks on new articles.
2. Restaurant profitability prediction [Python and machine learning]
● Predicted restaurant profitability using different machine learning algorithms. A historical restaurant dataset was provided, and the
main aim was to predict whether a newly opened restaurant would be profitable. Main duties were cleaning the data,
building models and improving model performance.
3. Road accident analysis and prediction [R, statistics and machine learning]
● Predictive, descriptive and statistical analysis of a dataset of road incidents in Oman, using R. The main objective
was to build a machine learning model that could predict the number of deaths resulting from accidents.
4. Improving model performance using dimension reduction [Python and deep learning]
● Compared an autoencoder with PCA on a high-dimensional dataset, using Python. The main goal was to check the effect of a
neural network autoencoder on model performance. Main tasks included data pre-processing, model building and
improving the model by applying dimension reduction techniques.
5. Analysing children’s contacts data and building models [SQL, R, Shiny and machine learning]
● Project based on a company’s real-time dataset. Retrieved data from a SQL database, pre-processed it, built models in R
and then built a Shiny app so that a person with a non-technical background could view the model’s predictions.
6. Time series forecasting of insulation resistance values [Machine learning, deep learning and Python]
● Applied time series models such as ARIMA, Facebook Prophet and LSTM to forecast time series data.
7. Detecting anomalous points [Machine learning and Python]
● Anomaly detection machine learning models to catch points of interest, i.e. irregular points that lie far from the normal data points.
8. Predicting circuit fault type using machine learning [Machine learning and Python]
● Machine learning classification models to predict whether a fault will occur based on input features. Steps included data cleaning,
visualisation, building machine learning models and selecting a final model for predicting cable faults.
9. Sentiment analysis [Data cleaning pipeline, NLP techniques and sentiment analysis]
● Built a pipeline (HTML tag removal, tokenisation, stop word removal, stemming, lowercasing) for cleaning website data using
Python. After cleaning the data, applied sentiment analysis to real-time data using a naive Bayes classifier.
10. Analysing and visualising children’s contacts data [Dashboard and R]
● Data visualisation project: created a dashboard with different types of plots, including heat maps, histograms, bar plots, pyramids
and time series graphs.
11. Stochastic gradient descent optimisation for heart disease classification.
12. Analysis of a policing dataset from Dallas, Texas in 2016.
13. Study of biodiversity measures and comparison between different taxonomic groups.
14. Text analysis of the transcripts of two TED Talk speakers.
15. Analysis of a dataset covering various characteristics of Alzheimer’s disease.
16. Final Year Project - "Impact of Fintech Usage on Firm Performance".
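
Illustrative sketch for project 9: the steps of the text cleaning pipeline (HTML tag removal, lowercasing, tokenisation, stop word removal, stemming) could look roughly like the following, using only the Python standard library. This is not the code used in the project. The stop word list, the crude_stem helper and all function names are assumptions made for illustration, and a real pipeline would typically use a proper stemmer (for example a Porter stemmer) and a fuller stop word list.

```python
import html
import re

# Hypothetical, deliberately tiny stop word list (assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were", "this", "to", "of", "in", "it"}

TAG_RE = re.compile(r"<[^>]+>")       # matches anything that looks like an HTML tag
TOKEN_RE = re.compile(r"[a-z0-9]+")   # runs of lowercase letters or digits


def strip_html(text):
    """Replace HTML tags with spaces and decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub(" ", text))


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_text(raw):
    """Run the full pipeline and return a list of cleaned tokens."""
    text = strip_html(raw).lower()                       # tag removal, lowercasing
    tokens = TOKEN_RE.findall(text)                      # tokenisation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    return [crude_stem(t) for t in tokens]               # stemming


if __name__ == "__main__":
    sample = "<p>The food was <b>amazing</b> and the staff were helping!</p>"
    print(clean_text(sample))
```

The resulting token lists could then be turned into word counts and passed to a naive Bayes classifier for the sentiment step.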