Closing the biotech/AI loop at the source: the individual

dibidave
Jul 28, 2020

As I transitioned from software engineering and AI to bioengineering, I was immediately struck by the incredible power of the tools researchers had at their disposal thanks to the DNA manipulation technologies developed over the last few decades. We can now deliver specific genes into cells, read the exact genetic code behind every component our cells produce, visualize cellular behavior at the molecular level over time, monitor neurons as they fire en masse, and control the behavior of cells with targeted circuit perturbations. It was totally mind-blowing.

One of the first projects Tatyana and I wanted to undertake during our PhDs was to investigate whether there were early signs of Alzheimer’s in blood samples. This is a relatively rich field, and a lot of legwork has already gone into finding blood markers linked to Alzheimer’s, so it was hard to find a unique angle. Luckily, we were in a lab that was building out Caltech’s single-cell sequencing capabilities. In short, single-cell sequencing is a relatively new technique that lets researchers inspect the full spectrum of activity in individual cells by sequencing the RNA messages that determine what each cell produces. It is incredibly rich data: from a single sample we could look at tens of thousands of cells and measure each one's production of tens of thousands of different components. A big data dreamland.

But we immediately hit a roadblock: getting Alzheimer’s blood samples was hard. Needles are invasive for already vulnerable populations, and blood banks holding samples frozen for years were barred by federal regulations from following up with donors to see whether they had developed Alzheimer’s. So we decided: let’s collect it ourselves. As a school without an affiliated hospital, we didn’t have access to a phlebotomist, but some back-of-the-envelope calculations showed us that single-cell sequencing really doesn’t need that much blood. Most of the sample is discarded at the last step before sequencing anyway, so we could probably squeeze out enough from just some capillary blood.

We were in luck: Seventh Sense Biosystems had recently released a device that made this even easier than a finger prick. With a tap of a button, it painlessly collected 100 µL of blood from your arm, no phlebotomist needed. We contacted them, asked for evaluation devices, and set to work perfecting the single-cell sequencing workflow for small volumes. Our first test subjects: ourselves.

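
To give a feel for why 100 µL can be enough, here is a rough back-of-the-envelope sketch in Python. It is not our actual calculation. Apart from the 100 µL collection volume, every input is an assumption. White blood cell counts in healthy adults are typically around 4,000 to 11,000 per µL, and the sketch takes the low end. The 10,000-cell target and the 10% recovery rate after processing are hypothetical, pessimistic placeholders.

```python
# Back-of-the-envelope check: does ~100 uL of capillary blood contain enough cells?
# All values except the collection volume are illustrative assumptions.

SAMPLE_VOLUME_UL = 100   # volume the collection device draws (from the text)
WBC_PER_UL_LOW = 4_000   # assumed: approximate low end of a healthy adult white blood cell count
TARGET_CELLS = 10_000    # hypothetical number of cells wanted for one single-cell run
ASSUMED_RECOVERY = 0.10  # hypothetical, pessimistic: only 10% of cells survive processing

available = SAMPLE_VOLUME_UL * WBC_PER_UL_LOW  # white blood cells in the raw sample
recovered = available * ASSUMED_RECOVERY       # cells left after processing losses

print(f"{available:,} WBCs collected, ~{recovered:,.0f} after losses")
print("enough" if recovered >= TARGET_CELLS else "not enough")
```

Even with these pessimistic assumptions, the recovered cells exceed the hypothetical target several times over. That margin is why most of a full-size blood draw ends up unused.
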
The data was incredibly exciting: we could look at whether particular cells in our immune system had certain programs turned on or off. Are they currently fighting an infection? Are they predisposed to being overactive? I even found an enzyme deficiency, which I followed up on with a doctor and confirmed, that the doctor would never have known to look for until perhaps decades later, once it had started affecting my health. Seeing all this information about myself, and imagining that people could potentially know this kind of thing about themselves, was eye-opening.

Something started to strike me as supremely odd: how can it be that we have this incredible level of knowledge about what's happening at the cellular and molecular level, yet when I go to the doctor and tell them my abdomen has been hurting for a week and I think something's wrong, all they tell me is to come back in a month if it gets worse? And how can it be that, in the face of a global pandemic, the recommendations from government and academics are basically unchanged from the dark ages: go home, close your doors, wash your hands, and cover your face? Where is this disconnect coming from?

In the field of AI and machine learning, we often think of problems in terms of their objective function: the target metric we are trying to maximize or minimize. In recommender systems, it might be how often we predict the right product; in medicine, whether an image is correctly diagnosed as cancerous or not; in autonomous vehicles, whether objects in the road are correctly detected. From that perspective, humanity isn't much different from a machine learning algorithm. We, too, have objective functions set forth by governmental and economic incentives, and as a society we optimize for them. In machine learning, there's a common mistake that many early-career (and even late-career) engineers make: not defining the objective function so that it captures the goal they really want. If you tell your autonomous vehicle to avoid obstacles at all costs, it might never move at all, because the risk of moving is too high. If you tell your medical diagnostic algorithm to be as accurate as possible for a rare disease, it might conclude that it should always predict "negative" regardless of the data. If only 1 in 10,000 people has the disease, that alone earns it 99.99% accuracy. Sometimes the problem is exceedingly hard because you don't even have access to the true objective values, only to proxies. In autonomous vehicles, you don’t get to test your algorithm on real pedestrians; you work with dummies or simulations.
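
The rare-disease trap is easy to reproduce. Below is a minimal Python sketch using a made-up cohort in which 1 patient in 10,000 is sick. The `always_negative` model and the cohort are hypothetical illustrations. The point is only that accuracy rewards a model that ignores its input, while recall (the fraction of sick patients actually flagged) exposes it.

```python
# Hypothetical "model" that ignores its input and always predicts negative (0).
def always_negative(_features):
    return 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly sick patients (label 1) that the model flags as sick."""
    preds_for_sick = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_sick) / len(preds_for_sick) if preds_for_sick else 0.0


# Made-up cohort: 1 sick patient out of 10,000 (prevalence 0.01%).
y_true = [1] + [0] * 9_999
y_pred = [always_negative(None) for _ in y_true]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")  # accuracy = 0.9999
print(f"recall   = {recall(y_true, y_pred):.4f}")    # recall   = 0.0000
```

Swapping the objective from accuracy to something like recall, or balanced accuracy, is one way to make the metric reflect the goal we actually care about: catching the sick patient.
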

So why is it that we have all these great tools for biology and medicine, yet our ability to treat common maladies seems so far behind? I believe it has a lot to do with the fact that our systems are optimizing for proxy objectives. Researchers in academia and industry alike are separated from patient outcomes by many layers: animal models, clinical trials on select populations, and economic incentives, to name a few. We can't perturb the system and directly see how it affects the end result, that is, patient symptom relief, longevity, and happiness. These are only considered much later in the process.

Yet there is someone who has access to the true objective value: the consumer. We know whether we are feeling better today than we did yesterday. We know whether our disease got worse over the last year. And we as consumers are actively optimizing this objective function ourselves. We constantly experiment, take in advice from myriad sources, and apply it to ourselves to see whether it improves our objectives. Frankly, in many ways we’re better at this than the healthcare system is. People with chronic conditions often learn the routines and lifestyle changes they need to manage their symptoms; they demand specific drugs from their doctors because they learned for themselves what works. The healthcare system can’t hope to compete with this level of self-experimentation. Its objective function is different: it needs to cure the largest number of people with minimal resources, so the incentive just isn’t there to take the time to understand which experiments or drugs will work best for each patient. By that objective, it’s better to just give patients the most likely drug, see if it works, and move down the list if it doesn’t. The cost of pairing the right person with the right intervention at the right time is greater than the cost of iterating through a list of drugs. This is especially true given that someone continues to make money throughout the whole process, not just when the patient gets better.

But self-experimentation has its own slew of problems. We start experimenting semi-randomly, and we have to balance anecdotes, forum posts, and doctors’ advice. And it takes time. So much time. People spend years trying to identify their food triggers or the medication regimens that work best for them. And let’s not forget that our bodies change over time: what worked for you this month, year, or decade probably won’t work for the next. You have to re-learn. You have to keep experimenting.

This is why I chose to work on this problem. I believe that the individual is the best-positioned agent for gathering data and reporting the objective values for their own health, at least when it comes to non-surgical and non-emergency health objectives. But currently the individual is not equipped with the right tools from the biology and medical world to do this to the best of their ability. Nor do we have a way to share and report the results of our experiments; this kind of data is reserved for the healthcare and academic worlds.

Healthcare needs to take a page out of machine learning. People need to be empowered to take actions directly, report their own health objectives, and iterate with the help of data. They need to be the key element in the learning loop; any other proxy objective will only ever lead to suboptimal results. It’s time to close the loop.