As a Machine Learning Research Scientist, I have 5+ years of experience in applied deep learning and 7+ years of experience developing data science pipelines in Python. I developed a high-performance data science pipeline for processing billions of genomic sequences; reimplemented and enhanced existing analytical software, improving memory efficiency and speed 1000-fold; and created deployable deep neural network models that adhere to responsible-AI principles and achieve state-of-the-art results. I was awarded the prestigious Wellcome Trust ISSF fellowship to continue high-impact PhD research. I value collaboration and knowledge sharing, and I take the initiative in community building.
Following the success of my ongoing computational genomics research in the group, I was hired by Prof. Thomas to further develop the function-discovery pipeline (see Integrated Masters - research). I expanded the Python-based pipeline I had developed for querying databases, which helped explore the genomes of Gram-positive bacteria to discover exporters of novel natural antibiotic compounds that could serve as potential drug targets.
My first introduction to Python programming, software development, and data science. I completed multiple online courses on these topics from world-class universities, such as MIT and Harvard, while receiving hands-on training and supervision from computer scientists and senior researchers in my group, as well as from the head of the Genomics department.
I automated my phylogenomics pipeline, developed during my previous internship, in Python, which allowed me to access remote databases more readily and extend the pipeline to much larger and broader datasets. Notably, as part of an inter-departmental collaboration, I used this pipeline to analyse ABC transporters, among other more challenging transporter families, which resulted in the successful identification of a broader range of drug targets.
First introduction to computer science, algorithms, computational biology and genomics, frequentist vs. Bayesian statistical methods, statistical learning, and mathematical inference.
Focused particularly on the foundations of the following, within the context of the mathematical and statistical underpinnings of phylogenetic methods: optimisation algorithms, Monte Carlo sampling, maximum-likelihood estimation, hidden Markov models, MCMC algorithms, Bayesian inference, dynamic programming, traversal algorithms, search algorithms, and multiple sequence alignment algorithms.
Extracted sequences from the BLAST database and developed a rigorous phylogenetics pipeline for inferring functional relationships among bacterial transporter molecules for drug target identification.
Greenfield ML/DL models, HPC, Deep Learning, Data Science, Computational Biology
Python, Tensorflow, Keras, PySpark, Dask, Modin, Polars, Numpy, SciPy, Pandas, Scikit-Learn, Graph-tool, NetworkX, Jupyter, Plotly, Matplotlib, BioPython
Bash, PyTorch, PyTorch Lightning, JAX, Powerlaw, MrBayes, BLAST, Vaex, RAPIDS
Cython, R, Bioconductor, Bokeh, FLAX, SQL, CUDA
NLP, CV, LLM, Zero-shot Learning, Generative Learning, Supervised Learning, Self-Supervised Learning, Semi-Supervised Learning, Transformers, Deep Long-Tailed Learning, Deep Noisy Learning, Forecasting, Regression, Classification, VAE, GAN, Graph Neural Networks, Diffusion Models, Reinforcement Learning