Raphael Sonabend

Hey, I'm Raphael Sonabend (PhD, GradStat) [They/Them, He/Him], welcome to my site!

I'm a Technology Manager in the Data for Science and Health team at the Wellcome Trust where I manage projects related to foundational tech, AI, and infectious diseases.

I also work as a freelance data scientist and am available to consult on anything related to data science research and management. My experience ranges from survey design and sampling methods to end-to-end management of data science projects, and from writing academic proposals to designing bespoke machine learning models, plus a whole lot more. Check out the 'DSRM' link above for more details.

In my academic time (Fridays) I pursue research around machine learning and survival analysis. I am currently affiliated with Imperial College London as a Visiting Researcher. See the 'Research' tab for more.

As well as theoretical research, I've published several R packages to help make machine learning more accessible, including a universe of packages to expand R's object-oriented capabilities. More recently I've started publishing Julia packages too!

Outside of work you'll probably find me walking my dog.

Explore my website to find out more.
Selected Research and Media

See my Google Scholar for a full list of up-to-date publications.

Research Interests

DSRM Consulting

On Fridays I consult as a sole trader under the name DSRM (Data Science Research and Management) Consulting. I am currently open to contracts focusing on data science and analytics. I have worked with organisations across a range of industries, as well as with third-sector organisations. Brief descriptions of my past experience are listed here.

Roche, 2019

On behalf of the Alan Turing Institute I consulted with the pharmaceutical company Roche to understand how to make use of machine learning for prediction of non-small cell lung cancer (NSCLC). I was hired by the Turing to demonstrate its expertise to the company and to secure a longer-term partnership. In a team of three, we held initial meetings to understand the client's needs, which ranged from integrating very large genomics datasets to understanding statistical biases in their data. I focused on developing a machine learning toolbox for survival analysis that could make survival-distribution predictions after being trained on their data. My work was released open-source as mlr3proba. As well as designing methods compatible with their data and research questions, I also identified biases in the data, discussed methods to prevent security breaches between datasets, and ran a benchmark experiment to highlight the potential of machine learning. The work was well received and the client signed a long-term agreement with the Turing.

Oaksure, 2018-2019

Oaksure contracted me to design and implement a CRM system that would meet the needs of the company. The company was a relatively small charity and therefore required a system that would meet their needs without being too costly. I provided a report on available solutions and together we selected Zoho. I consulted with all employees over a period of 1-2 months, drew up wireframe designs for the CRM, and presented these to the CEO. In addition, I identified key considerations related to GDPR and outlined how these would affect different employees. As well as designing the web interface, I also ensured that we made use of the Zoho mobile app and optimised the designs for this platform. As a result, employees were able to use the system for data collection whilst off-site. After refining the designs and implementing the system, I spent a month training all employees and producing written materials to ensure the system could be used beyond the end of the contract.

LEGO, 2018

On behalf of UCL Consultants I worked with LEGO to analyse a survey they had conducted on staff satisfaction and an intervention to improve it. I was brought onto the project at the last minute and had three days to analyse all their data and carry out statistical tests. Their survey asked staff questions relating to happiness at work before and after an intervention was introduced. Staff were not linked between surveys, which limited the statistical tests I could use. I used hypothesis tests, such as the chi-squared test, to test whether there was a statistically significant change in satisfaction between the surveys before and after the intervention, and whether there were differences between companies across the world. In addition, I performed factor analysis and principal components analysis to understand the key drivers of employee satisfaction. I presented the findings to the client in a PowerPoint slide deck and a PDF report; I included the complex statistical detail in the report and ensured this was well understood when presenting. I helped the client draw out informative findings that they could present to the wider team and additionally provided feedback on survey design and sampling methods.

Nuffield Health, 2017

On behalf of UCL I consulted with Nuffield Health to design a bespoke survival algorithm for predicting customer churn from their gyms. The client's primary concern was that the model be interpretable and understandable by key stakeholders. I developed reduction-based models that used classification models to make survival predictions over multiple time points of interest. I used Microsoft Azure to train and validate the model in a secure environment.

ISEH, 2017

On behalf of UCL I consulted with ISEH to analyse the results of a recent report into the benefits of surgery for anterior cruciate ligament (ACL) injuries. I examined the interaction between recovery, surgery, smoking status, and other demographic features, and created a slide deck that fed into their annual report. I used data visualisation techniques and discrete hypothesis testing to demonstrate dependence between variables. In addition, I used machine learning to generate hypotheses about causal relationships between variables, drawing on results that link strictly proper scoring rules to the dependence of variables.

Free2Learn and Koppel Project, 2016-2017

I was employed on a fixed-term contract as an in-house Systems Project Manager to consult with employees across the company and to design and implement a CRM system. I met with the CEO of the company, heads of departments, and core internal stakeholders to understand their individual needs for a CRM system. I designed a system using Salesforce and made full use of its advanced built-in automation. Using this automation, I designed and implemented a system that sent rent reminder emails and late-stage demands automatically and integrated with QuickBooks.

I use Medium to publish short blog posts about statistics, data science, machine learning, and occasionally ethics. Check out my articles at the links below (contact me if you have any trouble accessing these).

If you want to work, research, or collaborate with me, please feel free to reach out via any of the methods below. I'm actively looking for people to join our lab and to help maintain R packages.