
I am a PhD researcher with more than two years of AI benchmarking experience, building evaluation workflows for scientific AI systems. I have a strong background across biology, model evaluation, data analysis, and experimental design, and experience translating ambiguous scientific problems into structured benchmarks and reproducible workflows.
Part-time.
Building in the Psychometrics x AI space.
Developing novel psychometric assessments that leverage AI in the evaluation process.
AI evaluation: benchmark design, rubric design, model comparison, LLM-as-judge workflows
Technical: Python, pandas, JSON workflows, data analysis, API-based model evaluation
Scientific: cell biology, epithelial biology, biophysics, neuroscience, assay development, wet lab
1st & 2nd year BMBS poster award
Laurent, Jon M., et al. “LABBench2: An Improved Benchmark for AI Systems Performing Biology Research.” arXiv, 5 May 2026, arxiv.org/abs/2604.09554. DOI: 10.48550/arXiv.2604.09554.
Serger, E., Luengo-Gutierrez, L., Chadwick, J. S., et al. “The Gut Metabolite Indole-3 Propionate Promotes Nerve Regeneration and Repair.” Nature, vol. 607, 2022, pp. 585–592. DOI: 10.1038/s41586-022-04884-x.
Müller, Franziska, et al. “CBP/p300 Activation Promotes Axon Growth, Sprouting, and Synaptic Plasticity in Chronic Experimental Spinal Cord Injury with Severe Disability.” PLOS Biology, vol. 20, no. 9, 2022, e3001310. DOI: 10.1371/journal.pbio.3001310.