3 Things That Google’s Healthcare Artificial Intelligence Desperately Needs

Original Article by Jeffery Alvarez

Google DeepMind recently published a paper in Nature that shared how their predictive algorithm can detect Acute Kidney Injury (AKI) up to 48 hours in advance.

Acute Kidney Injury is the biggest problem you have never heard of.

In the US, 1 in 5 hospitalized adults develops Acute Kidney Injury, which carries a fivefold increase in the risk of death[1].

It takes the lives of over 300,000 Americans every year[2].

Globally, the mortality burden is astronomical and hard to quantify. The most recent WHO analysis estimates that AKI is the primary cause of more than 1.7 million deaths every year[3].

 AKI affects everyone. It does not discriminate.

DeepMind has the right goal: identify and catch AKI before it happens. It would be a game-changer.

The World Health Organization (WHO) agrees, saying that “The timely identification and management of AKI represent the most effective strategy to address the growing global burden sustainably.”[4]

However, AKI is an illness that has found a way to quietly progress and kill millions every year, and we are not catching it today.

A recent poster presented at the National Kidney Foundation’s Spring Clinical Meetings by Dr. Ryann Sohaney analyzed the VA database from 2007 to 2018, covering over 3.5 million hospitalizations. It found that 25% of patients met the KDIGO criteria for AKI, but only 13.7% were diagnosed[5].

 Will Google DeepMind’s AI do better?

Unfortunately, right now, I’d say the answer is no.

DeepMind’s idea of leveraging AI to help identify AKI early is spot-on. They have been able to train on a large data set and have overcome many of the hurdles to delivering something practical.

What lies ahead are the same hurdles that all the other EMR-crawling algorithms face:

 Bad Data.

 For any AI system to be successful, it needs to have access to three key things:

  1. Data that is dependable – Dependability comes down to whether the data is acquired as part of the standard of care. Patient to patient, the same type of data needs to be collected, ideally at the same cadence. When an algorithm only has small snapshots of a patient’s biometrics over varying timeframes, the accuracy of its output can be significantly reduced (a small dependability check is sketched after this list).
  2. Data that is accurate – Accuracy is the ability of a measurement in the EMR, at any given time, to genuinely reflect reality. Many measurements in medicine are entered manually into EMRs. As a result, they can be prone to errors, significant time delays, and even averaging.
  3. Data in a standardized format – The data needs to arrive in an expected format that enables interpretation by the AI.
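
To make the dependability point concrete, here is a minimal sketch, with hypothetical names and thresholds, of how one might flag a measurement series whose sampling cadence breaks down:

```python
# A minimal sketch (hypothetical names and thresholds): flag a patient's
# measurement series if the gap between consecutive samples ever exceeds
# what the model was trained to expect.
from datetime import datetime, timedelta

def is_dependable(timestamps: list[datetime], max_gap: timedelta) -> bool:
    """True if consecutive measurements are never further apart than max_gap."""
    ordered = sorted(timestamps)
    return all(b - a <= max_gap for a, b in zip(ordered, ordered[1:]))

# Example: creatinine drawn on days 1-4 and then not again until day 21
# fails a 48-hour dependability requirement.
draws = [datetime(2019, 8, d) for d in (1, 2, 3, 4, 21)]
print(is_dependable(draws, max_gap=timedelta(hours=48)))  # -> False
```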

 In places where these three requirements are fulfilled, we have seen accelerated development and success of AI systems.

The Medical Futurist put together a fantastic diagram of FDA approvals for AI that illustrates this[6].

Figure 1 below shows the value of this alignment. In medical imaging, there is a concentration of AIs because DICOM standardization ensures that data arrives as expected.

The diagnostic standard of care ensures dependability (CTs are routinely taken for early lung screening, for example), and the maturity of computer vision ensures accuracy. Cardiology follows the same narrative with ECG: the data is dependable, accurate, and standardized.


 Figure 1: The Medical Futurist, FDA Approvals for AI (With Comments)
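
To make the standardization point concrete, here is a minimal sketch, assuming the open-source pydicom library and a hypothetical file name, of how uniformly a DICOM image exposes its metadata to a downstream AI pipeline:

```python
# A minimal sketch (assumes the pydicom package and a hypothetical file name).
# Because DICOM is standardized, every study exposes the same named tags,
# so an imaging AI pipeline always knows what shape of data to expect.
import pydicom

ds = pydicom.dcmread("chest_ct_slice_001.dcm")  # hypothetical file

# These attributes are defined by the DICOM standard itself,
# not by the vendor or the hospital that produced the scan.
print(ds.Modality)       # e.g. "CT"
print(ds.StudyDate)      # acquisition date, YYYYMMDD
print(ds.PixelSpacing)   # physical pixel size in mm
pixels = ds.pixel_array  # the image itself, as a NumPy array
```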

The DeepMind data set meets only one of these three requirements: its data is in a standardized format.

The data’s accuracy and dependability are compromised because a primary driver of the algorithm’s predictive output is serum creatinine.

 In his BBC Radio interview, Dominic King, the Medical Director of DeepMind, talked about Serum Creatinine measurements as one of the biggest challenges surrounding the prediction of AKI[7]. This is due to three things:

  1.  Serum creatinine changes significantly lag behind kidney damage
  2. Serum creatinine changes are multifactorial
  3. Serum creatinine measurements are erratic; a physician must order each one

 Let’s unpack these statements a bit more.

 Serum creatinine is a lagging indicator

By the time a meaningful increase in serum creatinine is measured in a blood test, a significant amount of renal damage has already occurred.

Dr. Claudio Ronco described the reason behind this to me as Renal Functional Reserve (RFR).

Our bodies are given two kidneys but can function sufficiently with only one.

We have a built-in reserve to handle fluctuating demands and redundancy to mitigate functional risk. The reserve means that for a functional decline to register in biochemistry, significant damage to the kidney must occur. 

Figure 2 below, from Dr. Ronco’s book Critical Care Nephrology, illustrates this reserve and shows when serum creatinine is finally impacted.


Figure 2: Serum creatinine tends to remain relatively normal even in the presence of kidney damage until approximately 50% of nephrons are lost[8]

 A patient’s serum creatinine baseline and fluctuations are multifactorial.

Table 1 illustrates that three very different patient profiles can have the same serum creatinine value. This patient-to-patient variance has driven the need for the computational models MDRD and CKD-EPI, which attempt to normalize serum creatinine against other biometrics and translate it into an estimated Glomerular Filtration Rate (eGFR).

To me, the scary thing about the 4-variable MDRD Study equation is that it was developed in 1999 using data from only 1,628 patients with CKD. The study’s limited statistical power is concerning, and CKD-EPI faces the same challenges. (The equation itself is sketched after Table 1 below.)

Table 1: Same Serum Creatinine, Different Patients

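For reference, here is a minimal sketch of the commonly cited 4-variable MDRD equation (the IDMS-traceable form with a coefficient of 175; the original 1999 publication used 186). It is illustration only, and it shows how the same creatinine value maps to very different function estimates for different patients:

```python
# A minimal sketch of the commonly cited 4-variable MDRD Study equation
# (IDMS-traceable coefficient 175; the original 1999 form used 186).
# Illustration only, not for clinical use.

def mdrd_egfr(scr_umol_l: float, age: int, female: bool, black: bool) -> float:
    """Estimate GFR in mL/min/1.73 m^2 from serum creatinine and demographics."""
    scr_mg_dl = scr_umol_l / 88.4  # convert umol/L to mg/dL
    egfr = 175.0 * scr_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

# The same 90 umol/L creatinine implies very different kidney function:
print(round(mdrd_egfr(90, age=25, female=False, black=False)))  # ~89 for a young man
print(round(mdrd_egfr(90, age=80, female=True, black=False)))   # ~52 for an elderly woman
```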

As a result of all of this, the National Kidney Foundation and KDIGO (Kidney Disease: Improving Global Outcomes) unanimously agree that:

“Serum creatinine alone is not the best way to detect kidney disease especially in the early stages”.

Stepping back to DeepMind, the biggest hurdle ahead for them is the same hurdle the clinician faces: serum creatinine.

The study’s supplementary information outlines the top 10 predictive influencers in the DeepMind model, which are shown in Table 2 below.

Surprisingly, 3 of the 10 are serum creatinine, just measured over different periods. A concerning 4th is whether lab values have been collected at all, which is often only done for higher-risk patients or patients presenting symptoms.

Table 2: DeepMind Predictive Influencers


Any algorithm that depends on serum creatinine inherits the deficiencies that come along with it, which is exceptionally apparent in the study’s supplementary data shown in Figures 3 and 4 below.

Figure 3 shows a patient who was admitted and had their serum creatinine taken for the first four days.

It is hard to tell whether the serum creatinine baseline is 90 or 72 umol/L; did the patient arrive with existing kidney dysfunction?

The patient then did not have their blood tested, and serum creatinine measured, for 16 consecutive days!

During that timeframe, the DeepMind algorithm did not escalate the patient’s risk score significantly. Then the next serum creatinine measurement is manually taken on day 21 and comes back high: uh-oh, the patient is at high risk.

If the baseline was 72 umol/L, then this patient had a twofold increase in serum creatinine, and the significant kidney injury it indicates, without the algorithm knowing about it. By the KDIGO creatinine criteria, sketched below, that doubling from baseline already constitutes Stage 2 AKI.
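
Here is a simplified sketch of those creatinine-based KDIGO staging thresholds (the absolute-increase, urine-output, and renal-replacement criteria are omitted); it is illustration only, not clinical decision support:

```python
# A simplified sketch of the KDIGO creatinine-ratio staging thresholds.
# The absolute-increase, urine-output, and renal-replacement-therapy
# criteria are omitted. Illustration only, not clinical decision support.

def kdigo_stage_from_creatinine(baseline_umol_l: float, current_umol_l: float) -> int:
    """Return the KDIGO AKI stage (0 = no AKI) implied by a creatinine ratio."""
    ratio = current_umol_l / baseline_umol_l
    if ratio >= 3.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5:
        return 1
    return 0

# The Figure 3 patient: a baseline of 72 umol/L roughly doubling by day 21
# would already be Stage 2 AKI when the lab result finally arrives.
print(kdigo_stage_from_creatinine(72, 144))  # -> 2
```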


Figure 3: My Markup of DeepMind Data, 16 Day Delay Patient

The same issue happens in tighter timeframes. The patient in Figure 4 below has a 40% increase in serum creatinine over 72 hours. The patient’s risk profile remains unchanged until that spike in serum creatinine is registered on day 5.


Figure 4: My Markup of DeepMind Data, 72 Hour Delay Patient

The result is a significant false-positive rate of 2 to 1: for every true alert, the model raises two false ones, so only about a third of alerts reflect an impending AKI. As Chuck Dinerstein points out in his review, Kidney Injury and Artificial Intelligence Still Not Ready for Primetime[9], the majority of false positives in the study were patients already at heightened risk. These are the patients with the highest need, and where bundled management is usually already in place.

So, if we can’t rely on Serum Creatinine, what can we rely on?

Dr. Kellum from UPMC has dedicated much of the last decade to understanding AKI: how to predict it, identify it, and monitor patient recovery. In a 32,000-patient retrospective cohort study, Dr. Kellum found that intensive monitoring of urine output increased detection of moderate to severe AKI[10]. He has said:

“it is an absolute necessity for urine output assessment for staging of AKI.”

The international guidelines set by KDIGO support this as well.
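
For context, here is a simplified sketch of the KDIGO urine-output staging thresholds (again omitting the creatinine, anuria, and renal-replacement criteria); it only works when hourly output is actually being recorded:

```python
# A simplified sketch of the KDIGO urine-output staging thresholds.
# The creatinine, anuria, and renal-replacement-therapy criteria are
# omitted. Illustration only, not clinical decision support.

def kdigo_stage_from_urine_output(ml_per_kg_per_h: float, duration_h: float) -> int:
    """Return the KDIGO AKI stage (0 = no AKI) implied by sustained low urine output."""
    if ml_per_kg_per_h < 0.3 and duration_h >= 24:
        return 3
    if ml_per_kg_per_h < 0.5 and duration_h >= 12:
        return 2
    if ml_per_kg_per_h < 0.5 and duration_h >= 6:
        return 1
    return 0

# Example: 0.4 mL/kg/h sustained for 8 hours already meets Stage 1,
# often well before serum creatinine has moved at all.
print(kdigo_stage_from_urine_output(0.4, 8))  # -> 1
```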

So why was the DeepMind AI not powered by Urine Output?

A 2015 study published in CHEST, which set out to compare AKI diagnosis by urine output and by serum creatinine, found that intensive urine output data was only available for 25% of the patients[11].

This means that in some of the best critical care facilities in the world, diligent, hour-to-hour urine output could only be measured in 25% of patients. Measuring UO is manual, time-consuming, and tedious.

As a result, urine output is not an accurate and dependable input, and so it was omitted from the algorithm.

In the age of AI assistants, smartphones, and self-driving cars, our measurements of urine production are still done with a graduated cylinder.

 Where do we go from here?

DeepMind has taken a few steps toward Acute Kidney Injury prediction and has several hurdles remaining. For clinically valid prediction, they must deliver hardware at the point of care to change the way we capture data.

With dependable, accurate, and standardized patient data powering an AI, Acute Kidney Injury does not stand a chance.

I am looking forward to that future.

 About Jeffery Alvarez

His expertise is in identifying, defining, and creating significant clinical value through a holistic design-thinking approach that delivers high-impact products.

Over the past 17 years, Jeff has helped define, design, manufacture, or launch over 22 different medical devices, including early product development on the Auris Monarch Robotic System and the Hansen Vascular System. He has led teams through need-finding, definition, development, and human clinical trials.

Jeff has a BS in Mechanical Engineering from Rensselaer Polytechnic Institute and an MBA from the Haas School of Business, UC Berkeley.

References

[1] Susantitaphong P, et al. CJASN 2013;8:1482–93.

[2] Raising awareness of Acute Kidney Injury: a global perspective of a silent killer. Kidney Int. 2013 Sep;84(3):457–67. doi: 10.1038/ki.2013.153.

[3] Bulletin of the World Health Organization 2018;96:414–422D. doi: http://dx.doi.org/10.2471/BLT.17.206441

[4] Bulletin of the World Health Organization 2018;96:414–422D. doi: http://dx.doi.org/10.2471/BLT.17.206441

[5] National Trends in Incidence of AKI Using Consensus Creatinine-Based Criteria Among US Veterans. American Journal of Kidney Diseases, Volume 73, Issue 5, 736.

[6] https://medicalfuturist.com/fda-approvals-for-algorithms-in-medicine

[7] The ‘Life-saving’ kidney app, Best of Today, BBC. https://www.bbc.co.uk/programmes/p07jghsb

[8] Critical Care Nephrology, 3rd edition. Ronco, Bellomo, Kellum & Ricci.

[9] Dinerstein, C. (2019, August). Kidney Injury and Artificial Intelligence Still Not Ready for Primetime. Retrieved from https://www.acsh.org/news/2019/08/05/kidney-injury-and-artificial-intelligence-still-not-ready-prime-time-14201

[10] J Am Soc Nephrol 26: 2231–2238, 2015.

[11] http://dx.doi.org/10.1016/j.chest.2017.05.011