Unpopular opinion: Most biotech companies aren't ready for AI/ML, and it's not because of their data scientists. It's because of their bench scientists. Here's why: AI/ML requires large, consistent datasets. But in most biotechs, every scientist analyzes their assays slightly differently. They use different thresholds, different ways of normalizing data, different ways of identifying outliers — even different formulas for basic calculations. This inconsistency makes it impossible to compare results across experiments, let alone build AI models. "But we have SOPs!" you might say. Sure, for running the experiments. But for analyzing the data? Rarely. The hard truth is that without standardized analysis protocols, your data is useless for AI/ML. And implementing those protocols means taking away some of the autonomy bench scientists are used to. It's a tough pill to swallow, but it's necessary if biotechs want to harness the power of AI. At Sphinx Bio, we're building tools that make it easy to implement and enforce these protocols, without sacrificing the flexibility scientists need for exploratory analysis.
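To make the point about inconsistent analysis concrete, here is a minimal sketch of how one small choice changes a result. Everything in it is an assumption for illustration: the well values are made up, and `percent_inhibition` is a hypothetical helper, not any particular tool's API. Two analysts normalize the same raw plate to the same controls, but one summarizes the control wells with the mean and the other with the median.

```python
from statistics import mean, median

# Hypothetical raw signal from one assay plate (arbitrary units).
NEG_CONTROLS = [1000.0, 980.0, 1020.0, 1400.0]  # last well looks like an outlier
POS_CONTROLS = [100.0, 110.0, 90.0, 100.0]
SAMPLE = 550.0


def percent_inhibition(sample, neg, pos, center=mean):
    """Percent inhibition of `sample` relative to control wells.

    `center` is the function used to summarize each set of control wells.
    It is exactly the kind of detail that often goes unstandardized.
    """
    high = center(neg)  # uninhibited signal
    low = center(pos)   # fully inhibited signal
    return 100.0 * (high - sample) / (high - low)


# Same raw data, two defensible analysis choices.
analyst_a = percent_inhibition(SAMPLE, NEG_CONTROLS, POS_CONTROLS, center=mean)
analyst_b = percent_inhibition(SAMPLE, NEG_CONTROLS, POS_CONTROLS, center=median)

print(f"Analyst A (mean of controls):   {analyst_a:.1f}%")
print(f"Analyst B (median of controls): {analyst_b:.1f}%")
```

With these made-up numbers the two analysts report about 55% and about 50.5% inhibition for the same well. Neither choice is wrong on its own, but a dataset that mixes both is harder to compare across experiments.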
Leave the lab scientists be; they come to the lab every day and do hard work for less pay. What you need is a knowledge engineering layer between lab and ML, staffed by people who have done both.
"Yes" to standardizing data analysis and collaborating with your bioinformaticians during the experimental design process. 👏 "No" to the "it's them" mentality between the bench and data sides. 🙅‍♂️ I worked at an AI/ML company where finger-pointing and accountability volleyball between these two groups were rampant. That rhetoric causes more problems than it solves.
Don’t think there’s ever going to be a “standard way” to analyze data. This doesn’t work in other fields and isn’t going to work in biology. Best case scenario is that we end up with a dozen different standards. And that’s what machine intelligence should help us with: making sense of heterogeneous incomplete data. Otherwise, what’s the point? It’s a difficult task, yes. But whoever solves it wins.
This is why some companies use robots for bench work
It’s probably less expensive to use slightly more sophisticated ML methods that don’t rely on naive i.i.d. assumptions about their training data than to change the way biologists run experiments, though. In physics we learn early on that pens and paper cost less than particle colliders.
I wouldn't say it's *because* of bench scientists; it's because of the sheer time and expense of generating this type of data in general. In my experience it's a quantity problem, not a quality problem -- but then again I've worked with some great scientists and RAs!
Alternatively, design experiments so that ML task compatibility is baked in! (This sounds similar to a common cause of biostatisticians: get a consult before getting the data.)
In my experience, ML/AI teams often don’t understand the complexities of experiments and how different factors cause variations. It’s unfair to blame bench scientists. Instead of pushing for strict standards, we should focus on building AI that can learn from these differences and adapt. Adding a layer between bench scientists and ML teams, with dual knowledge of both areas, could help bridge the gap and lead to better solutions.
It’s not just that. If you generate an IC50 of 5 nM with two compounds, but one has a diffusion-controlled on-rate and a fast off-rate and the other has 1000x slower kinetics, then I doubt AI is going to tell you why they behave very differently in cells even though they’re ‘apparently’ identical...
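A rough worked example of the kinetics point above, under loudly stated assumptions: the rate constants are hypothetical, and the equilibrium dissociation constant Kd = k_off / k_on is used as a stand-in for potency (a measured IC50 also depends on assay conditions). The sketch only shows how two binders can share a 5 nM Kd while differing 1000-fold in how long they stay bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binder:
    """A compound described by hypothetical binding rate constants."""
    name: str
    k_on: float   # association rate constant, 1/(M*s)
    k_off: float  # dissociation rate constant, 1/s

    @property
    def kd_nM(self) -> float:
        # Equilibrium dissociation constant, converted from M to nM.
        return self.k_off / self.k_on * 1e9

    @property
    def residence_time_s(self) -> float:
        # Mean lifetime of the bound complex.
        return 1.0 / self.k_off


# Assumed values: a fast binder with an on-rate near the diffusion limit,
# and a second binder whose on- and off-rates are both 1000x slower.
fast = Binder("fast on / fast off", k_on=1e9, k_off=5.0)
slow = Binder("1000x slower kinetics", k_on=1e6, k_off=5e-3)

for b in (fast, slow):
    print(f"{b.name}: Kd = {b.kd_nM:.1f} nM, "
          f"residence time = {b.residence_time_s:g} s")
```

Both compounds come out at a Kd of 5 nM, yet the residence times are roughly 0.2 s and 200 s. A model trained only on the equilibrium number has no way to see that difference.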