Method for Prediction of Artificial Intelligence Model Generalizability for Unseen Data

Medical AI systems have seen increased use in recent years across a range of applications (e.g., diagnostics, prognostics, treatment response prediction). Their widespread adoption by the medical community is still restricted, however, primarily because of their limited generalizability. For AI systems, generalizability means the ability to maintain stable, dependable performance when input data originate from different geographic locations (institutions), different time periods, and different methodological settings (data acquisition parameters).

The Need

To address the limited generalizability of medical AI systems and the associated decrease in performance, research has focused on “how” to increase performance and achieve high generalizability. These investigations have explored larger data sets, transfer learning, data augmentation, and model regularization. The techniques have been informative but have achieved only limited success.

The Technology

Whereas previous research has asked “how” to increase generalizability, the developed technique asks “when” a high level of generalizability is achieved. It presents a formulation for an AI system that can predict, on the fly, its generalizability status for unseen data. The method uses a model that maps the underlying statistical distribution of the training data onto a standard multivariate Gaussian distribution; this mapping is what allows the model to predict the generalizability status of unseen data. The approach was evaluated on a brain metastases (BM) detection model that was trained using 175 T1c studies (OSU data) and validated using 42 acquired T1c exams and 72 T1 gradient echo post images (Stanford University). The method predicted low generalizability for 31% of the testing data (2 OSU and 33 Stanford studies). The detection model produced ~13.5 false positives (FP) at 76.1% BM detection sensitivity for the low generalizability set, and ~10.5 FP at a BM detection sensitivity for the high generalizability set.
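The idea of mapping the training distribution onto a standard multivariate Gaussian and then judging unseen inputs in that space can be illustrated with a deliberately simplified sketch. It is not the developed model: a plain affine whitening transform stands in for the learned mapping, and the function names, the 0.95 quantile cutoff, and the synthetic feature vectors are all assumptions made for illustration only.

```python
import numpy as np

def fit_whitening(train_features):
    """Estimate an affine map that sends the training features to roughly
    zero mean and identity covariance (a stand-in for a learned mapping)."""
    mean = train_features.mean(axis=0)
    cov = np.cov(train_features, rowvar=False)
    # Small ridge term keeps the covariance invertible (assumed value).
    cov = cov + 1e-6 * np.eye(cov.shape[0])
    # Cholesky factor L with cov = L @ L.T, so z = L^{-1} (x - mean).
    chol = np.linalg.cholesky(cov)
    return mean, chol

def to_latent(features, mean, chol):
    """Map feature vectors into the approximately standard Gaussian space."""
    return np.linalg.solve(chol, (features - mean).T).T

def generalizability_flags(train_features, unseen_features, quantile=0.95):
    """Return True for unseen samples whose latent squared norm falls within
    the chosen quantile of the training samples' squared norms."""
    mean, chol = fit_whitening(train_features)
    train_r2 = np.sum(to_latent(train_features, mean, chol) ** 2, axis=1)
    threshold = np.quantile(train_r2, quantile)
    unseen_r2 = np.sum(to_latent(unseen_features, mean, chol) ** 2, axis=1)
    return unseen_r2 <= threshold

# Synthetic demonstration data (hypothetical, not from the study).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
similar = rng.normal(size=(50, 4))            # drawn like the training data
shifted = rng.normal(loc=6.0, size=(50, 4))   # strongly shifted distribution

similar_flags = generalizability_flags(train, similar)
shifted_flags = generalizability_flags(train, shifted)
```

In this sketch, samples that land far from the origin of the Gaussian space are flagged as low generalizability. The fraction of flagged samples in a test set plays the role of the reported percentage, although the actual method's mapping and decision rule may differ.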

Commercial Applications

  • Medical AI systems
  • Autonomous vehicles
  • Energy, finance, and other industrial applications

Benefits/Advantages

  • Real-time confidence indicator for AI systems
