Source: https://blog.f1000.com/2014/04/04/reproducibility-tweetchat-recap/

Ever since we started working on data science projects at Wellcome data labs we have been thinking a lot about reproducibility. As a team of data scientists, we wanted to ensure that the results of our work can be recreated by any member of the team. Among other things this allows us to easily collaborate on projects and offer support to each other. It also reduces the time it takes us to transition from an experimental idea to production.

How do we do that?

Our process has evolved over the years but the core ideas are heavily inspired by data science cookiecutter. Even though we…


Products that rely on data science run the risk of incorporating societal biases in the algorithms that power them — potentially causing unintended harms to their users. At Wellcome Data Labs we have been thinking and openly sharing our journey towards surfacing and mitigating those harms. We split our review process in two streams

  1. Product impact analysis which looks into harms that can be caused by the product irrespective of the algorithm
  2. Algorithmic review process which looks into harms introduced by the model or data irrespective of the product

We have written about product impact analysis in the past so…


Neural networks have been a ubiquitous part of the resurgence of Artificial Intelligence over the last few years. Unsurprisingly then, we decided to use a neural network as the modelling approach for tagging our grants with MeSH. Neural networks have raised the state of the art performance on the task to 71% from below 60%. Understandably neural networks may feel complicated to someone outside the field of machine learning, but in this piece my goal is to make them as understandable as logistic regression and Principal Component Analysis (PCA). …


My day to day job is to develop technologies that automate different processes at Wellcome through data science and machine learning. As a builder of these digital products, my team and I have to consider the unintended consequences they may have on users and on society more broadly. We know this is important because of famous cases where the consequences weren’t considered and harm has been done. For example the Google photos tagging system and Amazon hiring algorithm that have been accused of unintended racist and sexist bias. At Wellcome Data Labs we are developing an agile way of assessing…


As data scientists, our day job is around modelling. We create models to recommend new products, to increase conversion rates, to explain user behaviour etc. And depending on your background, it is more likely to be familiar either with machine learning techniques or regression type analysis. It is the difference among these two distinct approaches in modelling that motivated me to write this post which is a summary of a talk I gave recently in PyData London.

Generally speaking, we model so as to achieve one of two goals a) explain what is going on in our observations, or b)…

Nick Sorros

Data scientist minimising the cost function of negative emotions. www.nicksorros.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store