
Data solidarity for machine learning for embryo selection: a call for the creation of an open access repository of embryo data

Published: March 22, 2022. DOI: https://doi.org/10.1016/j.rbmo.2022.03.015

      Abstract

      The last decade has seen an explosion of machine learning applications in healthcare, with mixed and sometimes harmful results despite much promise and associated hype. A significant reason for the reversal in the reported benefit of these applications is the premature implementation of machine learning algorithms in clinical practice. This paper argues the critical need for ‘data solidarity’ for machine learning for embryo selection. A recent Lancet and Financial Times commission defined data solidarity as ‘an approach to the collection, use, and sharing of health data and data for health that safeguards individual human rights while building a culture of data justice and equity, and ensuring that the value of data is harnessed for public good’ (Kickbusch et al., 2021).

      Introduction

      Transparency and reproducibility are key features of the scientific process. Without them it is not possible to validate or build on the work of others. There are two necessary prerequisites: access to the actual data (or at least data of the same type) used in the study and, in the context of machine learning, computational reproducibility (Haibe-Kains et al., 2020). This paper focuses on the critical need for access to data.

      Developing a machine learning algorithm for embryo selection

      Clinical data from fertility treatment cycles, including outcome data (usually the success or failure of implantation, pregnancy, live birth or a healthy live-born baby), are combined with measurements of the embryo, usually static or video embryo images that are labelled to identify key points or features. The key steps are accessing the data, training and testing the algorithm, and then validating it. The goal of the machine learning process is to construct a predictive model (a formula) that turns the embryo images or measurements into a useful prediction of the outcome.
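      The steps above can be sketched in a few lines of code. This is a minimal illustration on synthetic data, not any published embryo-selection pipeline: the two features (a normalized maternal age and an embryo quality score), the toy outcome model, and the logistic-regression fit are all assumptions for demonstration only.

```python
import math
import random

random.seed(0)

# Synthetic "cases": two features (maternal age scaled to [0, 1], an embryo
# quality score in [0, 1]) and a binary outcome (live birth or not).
def make_case():
    age = random.uniform(25, 43)
    score = random.uniform(0.0, 1.0)
    # Assumed toy relationship: younger age and a higher score raise the odds.
    logit = 4.0 - 0.15 * age + 2.0 * score
    outcome = 1 if random.random() < 1.0 / (1.0 + math.exp(-logit)) else 0
    return [(age - 25.0) / 18.0, score], outcome

cases = [make_case() for _ in range(400)]
train_set, test_set = cases[:300], cases[300:]   # simple hold-out split

def predict(w, x):
    """The 'formula': a logistic model giving the probability of the outcome."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Train: plain gradient descent on the log-loss.
w = [0.0, 0.0, 0.0]
for _ in range(1000):
    grad = [0.0, 0.0, 0.0]
    for x, y in train_set:
        err = predict(w, x) - y
        grad[0] += err
        grad[1] += err * x[0]
        grad[2] += err * x[1]
    w = [wi - 0.5 * g / len(train_set) for wi, g in zip(w, grad)]

# Test on held-out cases the model has never seen.
accuracy = sum((predict(w, x) >= 0.5) == (y == 1)
               for x, y in test_set) / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

      Note that the held-out accuracy here describes performance only on data drawn from the same synthetic population as the training set; validation on an independent, external dataset is a separate step, discussed below.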

      Embryo data and known confounders

      The unique situation in IVF studies is that a wide range of factors, both known and unknown, contributes to the end-points of interest. For example, maternal age, the best-known confounding factor, has been ignored in some studies investigating the embryonic contribution to treatment outcomes such as implantation or live birth. It is paramount that the datasets used for model development include at least the known confounding factors so that bias is minimized. Indeed, there are increasing data showing that embryo morphokinetics are altered by patient-, treatment- and laboratory-related factors (Liu et al., 2019). Therefore, embryo selection models that include embryonic parameters alone could produce biased predictions, a point that deserves greater emphasis among interested professionals.
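      The effect of an ignored confounder can be made concrete with a toy calculation. All of the numbers below are invented for illustration: maternal age alone determines the live-birth rate, yet because age also correlates with how embryos develop, a crude embryo-only comparison wrongly makes embryo "speed" look strongly predictive.

```python
# counts[(age_group, embryo_speed)] = (cycles, live_births) - invented data in
# which outcome depends only on maternal age, while age and embryo speed are
# correlated (younger patients tend to have faster-developing embryos).
counts = {
    ("young", "fast"): (80, 60),   # 75% live-birth rate
    ("young", "slow"): (20, 15),   # 75%
    ("older", "fast"): (20, 5),    # 25%
    ("older", "slow"): (80, 20),   # 25%
}

def rate(pairs):
    """Pooled live-birth rate over a list of (cycles, births) pairs."""
    n = sum(c for c, _ in pairs)
    births = sum(b for _, b in pairs)
    return births / n

# Crude (unadjusted) comparison: embryo speed alone looks highly predictive.
crude_fast = rate([v for k, v in counts.items() if k[1] == "fast"])
crude_slow = rate([v for k, v in counts.items() if k[1] == "slow"])

# Age-stratified comparison: the apparent embryo effect vanishes entirely.
young_diff = rate([counts[("young", "fast")]]) - rate([counts[("young", "slow")]])
older_diff = rate([counts[("older", "fast")]]) - rate([counts[("older", "slow")]])

print(f"crude: fast {crude_fast:.0%} vs slow {crude_slow:.0%}")
print(f"within young: {young_diff:+.0%}; within older: {older_diff:+.0%}")
```

      A model trained on the embryo feature alone would learn the crude 65% vs 35% contrast, attributing to the embryo an effect that in this toy example belongs wholly to maternal age.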

      Access to data

      The key issue is whether machine learning algorithms developed in one population, particularly those based on deep learning, are generalizable to other groups. Machine learning algorithms are known to be subject to bias, through overfitting to the training data and data shifts over time (Wu et al., 2021). They are further subject to cognitive, automation, technical and other biases (Challen et al., 2019). Examples abound of deep learning algorithms that showed promise in the laboratory but failed to generalize in clinical practice (Heaven, 2020; Wang et al., 2020). An absolutely crucial step to reduce the risk of bias, and consequent failure in the real-life clinical setting, is to validate the machine learning algorithm on an external dataset, independent of the dataset on which it was trained.
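      External validation, and the data shift it guards against, can be illustrated with a deliberately trivial model. Everything here is invented: a one-feature threshold classifier is 'trained' on one clinic's scores and then applied, unchanged, to an external clinic whose scores are systematically shifted, mimicking a shift between sites.

```python
# Invented example data from "clinic A": (embryo score, live birth yes/no).
internal_data = [(0.2, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.8, 1)]

# "Training": place the decision threshold midway between the class means.
mean_neg = sum(s for s, y in internal_data if y == 0) / 3
mean_pos = sum(s for s, y in internal_data if y == 1) / 3
threshold = (mean_neg + mean_pos) / 2

def accuracy(data):
    return sum((s >= threshold) == (y == 1) for s, y in data) / len(data)

internal_acc = accuracy(internal_data)

# "Clinic B": same biology, but every score reads 0.25 lower, e.g. because of
# different imaging hardware or a different scoring convention.
external_data = [(s - 0.25, y) for s, y in internal_data]
external_acc = accuracy(external_data)

print(f"internal: {internal_acc:.2f}  external: {external_acc:.2f}")
```

      The classifier is perfect on the data it was fitted to, yet misclassifies a third of the shifted external cases; only validation on the independent dataset reveals this.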
      The vast majority of deep learning studies in medicine are not reproducible because of the lack of an open database. Breast cancer is one example, where the only publicly available dataset is DDSM (http://www.eng.usf.edu/cvprg/Mammography/Database.html), which was created from decades-old equipment and contains low-quality, poorly labelled images. More recent breast radiology databases, such as the INbreast database (Moreira et al., 2012), ask users to hand over intellectual property rights before even accessing the data.
      There are ethical issues with allowing access to patient data, relating to consent, privacy and confidentiality, even when the data are truly non-identifying (Adibuzzaman et al., 2018). However, clinical and laboratory data are essential for the ethical development of machine learning-based tools, to ensure that patients benefit and are not harmed, as well as for audit and research. More importantly, good-quality data are life-saving (or, for fertility patients, life-creating). Failure to access all available data (including unpublished commercial data) allows ineffective or harmful interventions to continue to be employed (Savulescu et al., 1996). There is a moral obligation to make data available.
      Sharing data requires effort, time and expense. Data must be ‘cleaned up’, made anonymous, formatted and corrected as necessary. Furthermore, the data owners (clinics) may be unsure how the data will be used or, worse, misused. They may also be concerned that they are undervaluing the data and might wish to hold out for a (higher) price. The late Hans Rosling coined the phrase ‘database hugging disorder’ to describe this reticence to share data (Khokhar, 2017).

      Development of an open access, comprehensive repository of embryo images and data

      New paradigms are needed for obtaining, storing and sharing data generally, as advocated and soon to be required by the National Institutes of Health (NIH; Jorgenson et al., 2021), and specifically for the use of digital technologies (Kickbusch et al., 2021).
      An open access, comprehensive repository would allow researchers to train and evaluate machine learning algorithms for embryo selection on the full range of real-world data that these models would typically encounter, across a wide variety of clinical environments. This database should therefore be as inclusive as possible, accepting data from all sources so long as they provide a minimum of information per case: for example, the embryo image, the age of the mother at egg collection, and whether the embryo developed into a live-born baby.
      Data repositories could be established under the auspices of government (e.g. the Human Fertilisation and Embryology Authority, although this would be UK only), professional bodies such as the Academy of Clinical Embryologists, or academic institutions, who would oversee the repository, evaluate proposals to access the data, allow access to researchers under licence, ensure that the data are used ethically, and see that the resulting studies are ultimately published. Such a repository would have the added benefit of permitting a comparison of different machine learning algorithms, as well as considerably speeding up the implementation of new, effective algorithms into clinical practice (Kamran et al., 2022). Professional bodies, as well as the reproductive medicine community itself, should urge the contribution of embryo images and relevant clinical and laboratory information (such as the results of genetic analyses) to this central repository.
      It is crucial at the outset of collecting and storing the data to consider the moral imperative for data sharing, and to build in systems and processes to make data solidarity an inherent attribute of the process.
      The reproductive community can benefit from the experience of other clinical databases that have been established for the purposes of sharing data with researchers, such as Nightingale Science (https://www.nightingalescience.org/), a database of CT scans released by Duke University (https://cvit.duke.edu/resource/database-for-benchmarking-organ-dose-estimates-in-ct/), the MIMIC database (https://lcp.mit.edu/mimic) or neuroimages in the Human Brain Project (https://www.humanbrainproject.eu/en/).
      Key characteristics of these accessible datasets appear to be that the project is driven by visionaries who cross medical and computer science boundaries, with the primary goal of contributing to the betterment of medical care, and that there is sufficient funding: for example, the Nightingale Science project is funded by a $2 million grant from Eric Schmidt, the former Google chief executive.

      Premature clinical implementation

      The output of a machine learning algorithm that has been developed and validated internally comprises performance statistics for that algorithm, trained and tested on particular datasets (representing particular populations), for predicting a specified outcome. Unfortunately, it is at this point that the algorithm is submitted for clinical-use regulatory approval, and companies start to aggressively market and sell their products (Wu et al., 2021).
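      As a concrete illustration (with invented confusion-matrix counts, not figures from any real device), the performance statistics produced at this stage are tied to the particular test dataset: sensitivity and specificity describe the model itself, but the headline accuracy also depends on the case mix of the dataset it was evaluated on.

```python
# Invented confusion-matrix counts for one internal test dataset.
tp, fn, fp, tn = 45, 5, 30, 120

accuracy = (tp + tn) / (tp + fn + fp + tn)
sensitivity = tp / (tp + fn)   # of truly positive cases, fraction detected
specificity = tn / (tn + fp)   # of truly negative cases, fraction rejected

print(f"accuracy {accuracy:.3f}, sensitivity {sensitivity:.2f}, "
      f"specificity {specificity:.2f}")

# The same sensitivity/specificity imply a different headline accuracy at a
# clinic with a different proportion of positive cases (prevalence):
for prevalence in (0.25, 0.50):
    expected = prevalence * sensitivity + (1 - prevalence) * specificity
    print(f"prevalence {prevalence:.0%}: expected accuracy {expected:.3f}")
```

      A single accuracy figure quoted at regulatory submission therefore says little about how the model will perform in a clinic whose population differs from the test dataset.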
      There are two significant problems with premature implementation. First, biases are rampant in data, and remain disguised, especially when black-box models are used (O'Connor, 2021). Second, there are large gaps in the evidence base on the interface of digital technologies and health (Kickbusch et al., 2021).

      Generating evidence of clinical effectiveness

      As noted, each algorithm provides measures of performance with respect to its own limited dataset. An absolutely crucial step is validating the machine learning algorithm on a large and diverse external dataset.
      After external validation, the product is ready for clinical testing in the field before machine learning tools can confidently, and ethically, be applied to clinical practice (Afnan et al., 2021). In view of the uncertainties, it would be wise to adopt a ‘precautionary, mission-orientated and value-based approach’ (Kickbusch et al., 2021).
      First, clinicians need to be trained in how to use the algorithm and interpret its output, including understanding levels of uncertainty, so that the whole system (AI + human) performs well (Kickbusch et al., 2021).
      Then the product needs to be tested in the way it is envisioned to be applied in clinical practice, with the patient-desired outcome pre-specified. The gold standard for testing any intervention in the clinical setting is the randomized controlled trial (RCT). Machine learning algorithms are no exception, yet they are rarely tested in this way (Nagendran et al., 2020).

      Safety concerns

      We have elsewhere argued that, although initially requiring more work, interpretable machine learning, rather than black-box machine learning, is safer and in the long run likely to be more accurate, as interpretable models allow problems to be detected and solved (Afnan et al., 2021). This is of particular importance when it comes to embryo selection, since these algorithms may determine which person will be born.
      Once clinical effectiveness has been determined, within a minimum predetermined safety standard, post-implementation surveillance is essential to ensure the safety of this intervention for embryo selection, for both the individual and society, as even small biases can be amplified over generations.

      Case study

      A good example of how a black-box algorithm might have benefitted from being tested in a large external dataset is the Virtus Health device registered as a medical device in Australia (https://www.ebs.tga.gov.au/servlet/xmlmillr6?dbid=ebs/PublicHTML/pdfStore.nsf&docid=7F71610B157CC231CA258687003CBCD3&agid=(PrintDetailsPublic)&actionid=1) ‘to provide clinical decision support for embryo assessment. The device evaluates early embryo development through acquired embryo time lapse images/videos to assist embryo selection during assisted reproduction. The device is intended as an adjunct to clinical decisions. The final assessment and decision shall be made by the embryologist.’ This device predicted fetal heart pregnancy with 93% accuracy (Tran et al., 2019), and a non-inferiority RCT is currently registered for this product (https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=379161&isReview=true).
      However, as noted in our previous paper (Afnan et al., 2021), the embryos that the algorithm was trained on involved a comparison between good-quality embryos and embryos of such poor quality that they would have been discarded and not considered for transfer. Had a large and diverse independent external embryo repository been available, the algorithm developed by Tran and colleagues could have been readily validated, not just for performance statistics between good- and poor-quality embryos in their own dataset, but also in a large and diverse dataset in more clinically relevant situations, giving the researchers, funder and trial participants more confidence (or otherwise) to investigate this algorithm's effectiveness in an RCT.

      A call to action

      This paper is primarily a call to professional embryology societies, such as the Academy of Clinical Embryologists, which is served by this journal, to develop and oversee access to a large comprehensive database, the Embryo Repository, independent of commercial interests, and to monitor the ethical use of the data. Access to such a database could be licensed and paid for, recouping the costs to society and to individual clinics. We call on computer scientists to share source code so that others can replicate their work (a key element of scientific rigor) and build on it to advance the field (Buda et al., 2021).
      We also call on clinicians to become knowledgeable about machine learning and not to be taken in by the hype; on regulators to insist on reliable evidence of clinical effectiveness, and not just performance statistics; and on governments or professional societies to add post-implementation surveillance of machine learning interventions.
      A change in culture is required to make data sharing the rule rather than the exception. This change can be encouraged by funders, such as the NIH (Jorgenson et al., 2021), and by journals, such as the Nature Portfolio journals, which state that ‘authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications’ (https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards). We note that this journal (RBMO) ‘encourages’ sharing of data and source code (https://www.rbmojournal.com/content/authorinfo).

      Conclusion

      A comprehensive, diverse, accessible database is essential for robust clinical development of effective machine learning and thus to provide benefit and prevent harm to patients. There is a moral obligation to contribute and share data.
      The Reproductive Medicine and Computer Science communities have the opportunity to lead the way for the ethical implementation of machine learning algorithms in embryo selection, by sharing data and code, thereby safely and rapidly improving patient care.

      References

      Adibuzzaman M., DeLaurentis P., Hill J., Benneyworth B.D. Big data in healthcare – the promises, challenges and opportunities from a research perspective: a case study with a model database. AMIA Annu. Symp. Proc. 2018; 16: 384–392.

      Afnan M.A.M., Liu Y., Conitzer V., Rudin C., Mishra A., Savulescu J., Afnan M. Interpretable, not black-box, artificial intelligence should be used for embryo selection. Hum. Reprod. Open. 2021. https://doi.org/10.1093/hropen/hoab040

      Afnan M.A.M., Rudin C., Conitzer V., Savulescu J., Mishra A., Liu Y., Afnan M. Ethical implementation of artificial intelligence to select embryos in in vitro fertilization. In: AIES 2021 – Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 2021: 316–326. https://doi.org/10.1145/3461702.3462589

      Challen R., Denny J., Pitt M., Gompels L., Edwards T., Tsaneva-Atanasova K. Artificial intelligence, bias and clinical safety. BMJ Qual. Saf. 2019; 28: 231–237. https://doi.org/10.1136/bmjqs-2018-008370

      Haibe-Kains B., Adam G.A., Hosny A., Khodakarami F., Shraddha T., Kusko R., Sansone S.A., Tong W., Wolfinger R.D., Mason C.E., Jones W., Dopazo J., Furlanello C., Waldron L., Wang B., McIntosh C., Goldenberg A., Kundaje A., Greene C.S., Broderick T., Hoffman M.M., Leek J.T., Korthauer K., Huber W., Brazma A., Pineau J., Tibshirani R., Hastie T., Ioannidis J.P.A., Quackenbush J., Aerts H.J.W.L. Transparency and reproducibility in artificial intelligence. Nature. 2020; 586: E14–E16. https://doi.org/10.1038/s41586-020-2766-y

      Heaven W.D. Google's medical AI was super accurate in a lab. Real life was a different story. MIT Technol. Rev. 2020.

      Jorgenson L.A., Wolinetz C.D., Collins F.S. Incentivizing a new culture of data stewardship. JAMA. 2021: 1–2. https://doi.org/10.1001/jama.2021.20489

      Kamran F., Tang S., Otles E., McEvoy D.S., Saleh S.N., Gong J., Li B.Y., Dutta S., Liu X., Medford R.J., Valley T.S., West L.R., Singh K., Blumberg S., Donnelly J.P., Shenoy E.S., Ayanian J.Z., Nallamothu B.K., Sjoding M.W., Wiens J. Early identification of patients admitted to hospital for covid-19 at risk of clinical deterioration: model development and multisite external validation study. BMJ. 2022; e068576. https://doi.org/10.1136/bmj-2021-068576

      Khokhar T. Hugs and databases: in memory of Hans Rosling. 2017. https://blogs.worldbank.org/opendata/hugs-and-databases-memory-hans-rosling (accessed 20 February 2022).

      Kickbusch I., Piselli D., Agrawal A., Balicer R., Banner O., Adelhardt M., Capobianco E., Fabian C., Singh Gill A., Lupton D., Medhora R.P., Ndili N., Ryś A., Sambuli N., Settle D., Swaminathan S., Morales J.V., Wolpert M., Wyckoff A.W., Xue L., Bytyqi A., Franz C., Gray W., Holly L., Neumann M., Panda L., Smith R.D., Georges Stevens E.A., Wong B.L.H. The Lancet and Financial Times Commission on governing health futures 2030: growing up in a digital world. Lancet. 2021; 398: 1727–1776. https://doi.org/10.1016/s0140-6736(21)01824-9

      Liu Y., Feenan K., Chapple V., Matson P. Assessing efficacy of Day 3 embryo time-lapse algorithms retrospectively: impacts of dataset type and confounding factors. Hum. Fertil. 2019; 22: 182–190.

      Moreira I., Amaral I., Domingues I., Cardoso A., Cardoso M., Cardoso J. INbreast: toward a full-field digital mammographic database. Acad. Radiol. 2012; 19: 236–248. https://doi.org/10.1016/j.acra.2011.09.014

      Nagendran M., Chen Y., Lovejoy C.A., Gordon A.C., Komorowski M., Harvey H., Topol E.J., Ioannidis J.P.A., Collins G.S., Maruthappu M. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies in medical imaging. BMJ. 2020; 368: 1–12. https://doi.org/10.1136/bmj.m689

      O'Connor M. Algorithm's ‘unexpected’ weakness raises larger concerns about AI's potential in broader populations. Health Imaging. 2021. https://www.healthimaging.com/topics/artificial-intelligence/weakness-ai-broader-patient- (accessed 12 November 2021).

      Savulescu J., Chalmers I., Blunt J. Are research ethics committees behaving unethically? Some suggestions for improving performance and accountability. BMJ. 1996; 313: 1390–1393.

      Tran D., Cooke S., Illingworth P.J., Gardner D.K. Deep learning as a predictive tool for fetal heart pregnancy following time-lapse incubation and blastocyst transfer. Hum. Reprod. 2019; 34: 1011–1018. https://doi.org/10.1093/humrep/dez064

      Wang X., Liang G., Zhang Y., Blanton H., Bessinger Z. Inconsistent performance of deep learning models on mammogram classification. J. Am. Coll. Radiol. 2020; 17: 796–803. https://doi.org/10.1016/j.jacr.2020.01.006

      Wu E., Wu K., Daneshjou R., Ouyang D., Ho D.E., Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat. Med. 2021; 27: 582–584. https://doi.org/10.1038/s41591-021-01312-x