Article | Volume 45, Issue 6, P1152-1159, December 2022

An interpretable machine learning model for individualized gonadotrophin starting dose selection during ovarian stimulation

Open Access. Published: July 28, 2022. DOI: https://doi.org/10.1016/j.rbmo.2022.07.010

      Abstract

      Research question

      Can we develop an interpretable machine learning model that optimizes starting gonadotrophin dose selection in terms of mature oocytes (metaphase II [MII]), fertilized oocytes (2 pronuclear [2PN]) and usable blastocysts?

      Design

      This was a retrospective study of patients undergoing autologous IVF cycles from 2014 to 2020 (n = 18,591) in three assisted reproductive technology centres in the USA. For each patient cycle, an individual dose–response curve was generated from the 100 most similar patients identified using a K-nearest neighbours model. Patients were labelled as dose-responsive if their dose–response curve showed a region that maximized MII oocytes, and flat-responsive otherwise.

      Results

      Analysis of the dose–response curves showed that 30% of cycles were dose-responsive and 64% were flat-responsive. After propensity score matching, patients in the dose-responsive group who received an optimal starting dose of FSH had on average 1.5 more MII oocytes, 1.2 more 2PN embryos and 0.6 more usable blastocysts using 10 IU less of starting FSH and 195 IU less of total FSH compared with patients given non-optimal doses. In the flat-responsive group, patients who received a low starting dose of FSH had on average 0.3 more MII oocytes, 0.3 more 2PN embryos and 0.2 more usable blastocysts using 149 IU less of starting FSH and 1375 IU less of total FSH compared with patients with a high starting dose.

      Conclusions

      This study demonstrates retrospectively that using a machine learning model for selecting starting FSH can achieve optimal laboratory outcomes while reducing the amount of starting and total FSH used.


      Introduction

      Ovarian stimulation during IVF cycles involves the use of exogenous gonadotrophins to promote multifollicular development. When planning the stimulation protocol, one of the key decisions is what starting dose of FSH to use. Too low a starting dose of FSH may lead to inadequate follicle recruitment and cycle cancellation, while too much FSH may lead to an excessive response. The choice of starting FSH is often individualized to patients based on their age and ovarian reserve, as well as the observed response to FSH in previous cycles. Ultimately, the goal is to select a cost-effective starting dose that will yield an optimal number of oocytes while reducing the risks of cancellation and hyperstimulation.
      The relationship between FSH and ovarian response is complex, and FSH dose–response studies for women undergoing IVF have been limited by sample size and other factors (Lensen S.F. et al., "Individualised gonadotropin dose selection using markers of ovarian reserve for women undergoing in vitro fertilisation plus intracytoplasmic sperm injection (IVF/ICSI)"). Some studies have reported a positive relationship between FSH and oocytes retrieved, but with conflicting evidence of whether higher doses are beneficial or detrimental to outcomes (Arce J.C. et al., "Ovarian response to recombinant human follicle-stimulating hormone: a randomized, antimüllerian hormone–stratified, dose–response trial in women undergoing in vitro fertilization/intracytoplasmic sperm injection"; Berkkanoglu M. and Ozgur K., "What is the optimum maximal gonadotropin dosage used in microdose flare-up cycles in poor responders?"; Olivennes F. et al., "Individualizing FSH dose for assisted reproduction using a novel algorithm: the CONSORT study"; Pal L. et al., "Less is more: increased gonadotropin use for ovarian stimulation adversely influences clinical pregnancy and live birth after in vitro fertilization"). Retrospective studies using the Society for Assisted Reproductive Technology database have shown a negative relationship between total FSH and live birth rates (Baker V.L. et al., "Gonadotropin dose is negatively correlated with live birth rate: analysis of more than 650,000 assisted reproductive technology cycles") as well as oocytes retrieved (Clark Z.L. et al., "FSH dose is negatively correlated with number of oocytes retrieved: analysis of a data set with ∼650,000 ART cycles that previously identified an inverse relationship between FSH dose and live birth rate"). While these analyses were confounded by the fact that poor-prognosis patients are often given higher doses of FSH, the negative correlation was observed for all subgroups including good-prognosis patients (Clark Z.L. et al.). A potential limitation of prior studies may be the linear models used, which cannot capture non-linear relationships between FSH and ovarian response. Whether or not higher doses of FSH lead to improved outcomes remains unclear, and further studies on this topic are warranted. In practice, FSH dosing is subjective and can vary significantly between different doctors, clinics and countries, with no universal framework for determining patient dosages.
      Recently, the field of assisted reproduction has investigated the application of artificial intelligence (AI) techniques to support clinical decision making during ovarian stimulation. One of the first studies in this area used machine learning algorithms to automate decisions such as whether to continue or cancel stimulation and whether or not to adjust doses (Letterie G. and mac Donald A., "Artificial intelligence in in vitro fertilization: a computer decision support system for day-to-day management of ovarian stimulation during in vitro fertilization"). More recent studies have developed machine learning techniques to optimize the day of trigger and achieve more metaphase II (MII) oocytes, 2 pronuclear (2PN) embryos and blastocysts (Fanton M. et al., "An interpretable machine learning model for predicting the optimal day of trigger during ovarian stimulation"; Hariton E. et al., "A machine learning algorithm can optimize the day of trigger to improve in vitro fertilization outcomes"). Other studies have designed machine learning models to streamline follicle monitoring and patient visits (Robertson I. et al., "Streamlining follicular monitoring during controlled ovarian stimulation: a data-driven approach to efficient IVF care in the new era of social distancing"). While these studies have demonstrated the potential of using machine learning for clinical decision support, there has been a lack of studies focused specifically on optimizing the selection of the starting FSH dose. Adoption of a machine learning tool for starting dose selection would represent a step towards selecting FSH doses using a more quantitative and standardized approach.
      This study presents a novel machine learning approach for the individualized selection of starting FSH dose during ovarian stimulation. The approach is based on the concept of patient similarity using the K-nearest neighbours algorithm, a non-linear yet interpretable machine learning model that retrieves the K most similar instances (i.e. patients) to the input. For a patient-of-interest, a group of similar patients was identified based on baseline parameters, and those similar patients were then used to create an individualized dose–response curve relating the starting FSH dose to the mature oocytes retrieved. This machine learning approach is interpretable because the creation of the dose curve and its predictions can be easily understood by a clinician. A key component of this approach is having a large and diverse dataset that allows the observation of outcomes for a range of different starting FSH doses. The hypothesis underlying this study was that using a patient similarity model for selecting the starting FSH dose might achieve optimal laboratory outcomes while reducing the amount of starting and total FSH given to the patient, which could help to lower the costs associated with IVF.

      Materials and methods

      Ethics approval

      This study was conducted following the research protocol approved by WCG IRB on 5 May 2021 (study no. 1308073). Patient information was de-identified before analysis.

      Study design and patients

      This was a retrospective study using data collected from three different IVF clinics in the USA. Historical, de-identified electronic medical record data were collected for IVF retrieval cycles started between 2014 and 2020. Records were filtered for autologous, non-cancelled cycles. A total of 18,591 cycles were included in this study. An overview of the study design and modelling approach is shown in Figure 1.
      Figure 1. Overview of the data sources and modelling approach. A total of 18,591 cycles were collected from three sites. For each patient, a K-nearest neighbours model trained on baseline fertility metrics was used to identify the 100 most similar patients. Using these similar patients, a dose–response curve was created by plotting the starting FSH dose against the metaphase II egg yield. AFC, antral follicle count; AMH, anti-Müllerian hormone; BMI, body mass index.

      Data preparation

      Data for training and testing the models were parsed from the electronic medical records. Cycles were excluded if they were missing any of the following input parameters or outcomes: age, body mass index (BMI), baseline anti-Müllerian hormone (AMH), baseline antral follicle count (AFC), oocytes retrieved, MII oocytes or fertilized (2PN) embryos. Furthermore, cycles with apparent data entry errors were excluded, such as cycles where the number of MII oocytes exceeded the number of oocytes retrieved.
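The two exclusion rules above can be sketched as a simple record filter. This is a minimal illustration, not the authors' pipeline: the dictionary layout and the field names (`amh`, `afc`, `fertilized_2pn` and so on) are hypothetical.

```python
# Hypothetical field names; the source only lists the quantities, not a schema.
REQUIRED_FIELDS = (
    "age", "bmi", "amh", "afc",
    "oocytes_retrieved", "mii_oocytes", "fertilized_2pn",
)

def is_usable_cycle(cycle):
    """Return True if a cycle record passes both exclusion rules."""
    # Rule 1: every required input parameter and outcome must be present.
    if any(cycle.get(field) is None for field in REQUIRED_FIELDS):
        return False
    # Rule 2: reject apparent data entry errors, such as more mature
    # oocytes than oocytes retrieved.
    if cycle["mii_oocytes"] > cycle["oocytes_retrieved"]:
        return False
    return True

def filter_cycles(cycles):
    """Keep only the cycles that pass the exclusion rules."""
    return [cycle for cycle in cycles if is_usable_cycle(cycle)]
```

Other data entry checks could be added in the same style; the source names only the MII example explicitly.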
      The number of usable blastocysts was defined as the total number of blastocysts transferred plus those frozen. In the dataset, frozen embryo transfers (FET) were not directly linked to a specific retrieval, and thus the cumulative live birth rate (CLBR) was estimated by assuming that each FET was associated with a patient's most recent retrieval cycle. The CLBR was defined as at least one live birth resulting from all embryo transfers. Cycles from all stimulation protocols were included in the study. Following data preparation, there were 1229 cycles from site 1, 11,233 cycles from site 2 and 6129 cycles from site 3.
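The outcome definitions in this paragraph can be illustrated with a short sketch. It assumes that "most recent retrieval" means the latest retrieval on or before the transfer date, and it handles only frozen transfers; the tuple layouts and function names are hypothetical.

```python
from datetime import date

def usable_blastocysts(transferred, frozen):
    """Usable blastocysts = blastocysts transferred plus blastocysts frozen."""
    return transferred + frozen

def link_fets_to_retrievals(retrievals, fets):
    """Attach each FET to the patient's most recent earlier retrieval.

    retrievals: list of (patient_id, retrieval_date)
    fets: list of (patient_id, fet_date, live_birth)
    Returns {(patient_id, retrieval_date): [live_birth, ...]}.
    """
    linked = {key: [] for key in retrievals}
    for patient_id, fet_date, live_birth in fets:
        prior = [d for pid, d in retrievals if pid == patient_id and d <= fet_date]
        if not prior:
            continue  # no earlier retrieval on record, so the FET is left unlinked
        linked[(patient_id, max(prior))].append(live_birth)
    return linked

def estimated_clbr(linked):
    """Share of retrievals with at least one live birth among linked transfers."""
    if not linked:
        return 0.0
    return sum(1 for births in linked.values() if any(births)) / len(linked)
```

Because the linkage is an assumption rather than a recorded fact, a rate computed this way is best read as an aggregate estimate, not a per-cycle outcome.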

      KNN model development

      A KNN regression model was developed to predict the number of MII oocytes retrieved using age, BMI, baseline AMH and baseline AFC as inputs. AMH and AFC were transformed by square root to better normalize their distribution. Each input parameter was standardized by subtracting its mean and dividing by its standard deviation across the dataset. Performance was evaluated using 5-fold cross-validation across all cycles. Once the optimal hyperparameters to maximize the performance of the KNN model were established, the KNN model was used for identifying similar patients in order to create the dose–response curves.
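A minimal pure-Python sketch of this pipeline follows: square-root transform, standardization, nearest-neighbour lookup and fold-wise error. The Manhattan distance is the metric reported in the Results. The function names, the interleaved fold assignment and the use of the population standard deviation are assumptions made for illustration.

```python
import math
from statistics import mean, pstdev

def transform(patient):
    """patient = (age, bmi, amh, afc); take the square root of AMH and AFC."""
    age, bmi, amh, afc = patient
    return [age, bmi, math.sqrt(amh), math.sqrt(afc)]

def fit_scaler(rows):
    """Per-column (mean, standard deviation) across the dataset."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]

def standardize(row, scaler):
    """Subtract the column mean and divide by the column standard deviation."""
    return [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, scaler)]

def manhattan(a, b):
    """Sum of absolute differences between two standardized patients."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(query, train_rows, train_targets, k):
    """Mean MII count of the k nearest training patients, plus their indices."""
    order = sorted(range(len(train_rows)),
                   key=lambda i: manhattan(query, train_rows[i]))
    nearest = order[:k]
    return mean(train_targets[i] for i in nearest), nearest

def cross_validated_mae(patients, targets, k, folds=5):
    """Mean absolute error over interleaved folds (fold assignment is an assumption)."""
    rows = [transform(p) for p in patients]
    scaler = fit_scaler(rows)
    z = [standardize(r, scaler) for r in rows]
    errors = []
    for fold in range(folds):
        train_idx = [i for i in range(len(z)) if i % folds != fold]
        train_rows = [z[i] for i in train_idx]
        train_targets = [targets[i] for i in train_idx]
        for i in range(fold, len(z), folds):
            prediction, _ = knn_predict(z[i], train_rows, train_targets, k)
            errors.append(abs(prediction - targets[i]))
    return mean(errors)
```

The indices returned by `knn_predict` are what the dose–response step reuses: the same neighbours that drive the prediction also supply the dose and outcome pairs for the curve.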

      Creation of dose–response curves

      Patient-specific dose–response curves were created by using the KNN model to identify the most similar neighbours to a patient-of-interest. Curves were generated by fitting a constrained second-order polynomial to the number of MII oocytes relative to the starting dose of FSH across all of the neighbours. Upper and lower 95th percentile confidence bands were added to the curve.
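As an illustration of the curve-fitting step, the sketch below fits an ordinary least-squares quadratic to neighbour dose and MII pairs. The source does not state which constraint was applied to the polynomial, so this unconstrained fit is a simplification, and the confidence bands are omitted. Doses are rescaled by 100 only to keep the normal equations well conditioned.

```python
def fit_quadratic(doses, mii_counts):
    """Least-squares fit of mii = a + b*x + c*x**2 with x = dose / 100.

    Needs at least three distinct doses; otherwise the system is singular.
    """
    xs = [d / 100.0 for d in doses]
    # Normal equations (X^T X) beta = X^T y for the basis [1, x, x^2].
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, mii_counts)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

def predict_mii(coeffs, dose):
    """Evaluate the fitted curve at a starting dose given in IU."""
    a, b, c = coeffs
    x = dose / 100.0
    return a + b * x + c * x * x
```

A negative quadratic coefficient gives a curve with an interior peak, while a coefficient near zero gives an almost linear or flat curve.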
      The starting dosage of FSH per patient was calculated as the sum of the pure FSH (e.g. Gonal-F or Follistim) plus the FSH component of the FSH/LH medication (e.g. Menopur) taken on the first day of treatment, measured in International Units (IU). Of all the cycles, 13% used only pure FSH, while 87% of cycles used a combination of FSH and LH. High leverage data points, defined as doses of FSH below the 1st percentile or above the 99th percentile, were excluded to prevent skewing of the dose curve at very high and very low dosages. An optimal dose region was defined as the window of dosages where the dose curve showed the maximum response in terms of MII oocytes retrieved. The width of the optimal dose region was set to doses that resulted in less than a 1-egg difference from the peak of the curve.
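The dose bookkeeping and the optimal-region rule described here can be sketched as follows. The percentile method (Python's default "exclusive" quantiles), the evaluation of the curve on a discrete dose grid and all function names are assumptions; the source gives only the 1st and 99th percentile cut-offs and the 1-egg tolerance.

```python
from statistics import quantiles

def starting_fsh(pure_fsh_iu, fsh_from_fsh_lh_iu):
    """Starting dose = pure FSH plus the FSH component of any FSH/LH product."""
    return pure_fsh_iu + fsh_from_fsh_lh_iu

def trim_high_leverage(neighbours):
    """Drop (dose, mii) pairs whose dose lies outside the 1st-99th percentile."""
    cuts = quantiles([dose for dose, _ in neighbours], n=100)
    low, high = cuts[0], cuts[-1]
    return [(dose, mii) for dose, mii in neighbours if low <= dose <= high]

def optimal_dose_region(curve, dose_grid, tolerance=1.0):
    """Contiguous window around the peak within `tolerance` eggs of the maximum.

    curve: callable mapping a dose in IU to predicted MII oocytes.
    Returns (lowest dose, highest dose, peak dose) taken from dose_grid.
    """
    preds = [curve(dose) for dose in dose_grid]
    peak = max(range(len(preds)), key=preds.__getitem__)
    lo = hi = peak
    # Walk outwards from the peak while the predicted loss stays under 1 egg.
    while lo > 0 and preds[peak] - preds[lo - 1] < tolerance:
        lo -= 1
    while hi < len(preds) - 1 and preds[peak] - preds[hi + 1] < tolerance:
        hi += 1
    return dose_grid[lo], dose_grid[hi], dose_grid[peak]
```

For example, a curve that peaks at 300 IU and loses one egg per 100 IU squared away from the peak yields a window of 225 to 375 IU on a 25 IU grid.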

      Expected benefit from using the dose model

      For each patient, the individual dose–response curve was used to determine whether there was an optimal dose that maximized the prediction of MII oocytes (dose-responsive patients) or whether the dose–response curve showed no optimal dose (flat-responsive patients) (Figure 2). Dose-responsive patients were then categorized as having received either an optimal or a non-optimal dose based on the prediction of the model. Flat-responsive patients were categorized as having received a high or low dose of FSH by comparison with the median value of their neighbours.
      Figure 2. Approach for evaluating the expected benefit of the model. A dose–response curve was created for each patient in the dataset. Patients were classified as either dose-responsive or flat-responsive based on the shape of the dose–response curve. For dose-responsive patients, those given an optimal starting dose were propensity matched to patients given a non-optimal dose, and average outcomes were compared. For flat-responsive patients, those given a low dose were propensity matched to patients given a high dose, and average outcomes were compared.
      Propensity matching was used when comparing categories, with propensity scores estimated by logistic regression on the patient's age, BMI, baseline AMH and baseline AFC. Outcomes were then compared between propensity-matched groups in terms of average starting FSH, total FSH, MII oocytes, 2PN embryos and usable blastocysts in order to calculate expected benefits. In addition, cost savings were estimated from the average differences in total FSH between groups.
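One way to picture the matching step is the sketch below: a small logistic regression fitted by gradient descent, followed by greedy 1:1 nearest-neighbour matching on the propensity score without replacement. The source names logistic regression but not the matching algorithm, so the greedy scheme, the learning rate and every function name here are assumptions. Features are expected to be standardized beforehand.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(features, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def propensity_scores(features, model):
    """Predicted probability of belonging to the 'treated' category."""
    weights, bias = model
    return [sigmoid(bias + sum(w * v for w, v in zip(weights, x))) for x in features]

def greedy_match(treated_scores, control_scores):
    """Pair each treated patient with the closest unused control patient."""
    available = set(range(len(control_scores)))
    pairs = []
    for t, score in enumerate(treated_scores):
        if not available:
            break
        c = min(available, key=lambda i: (abs(control_scores[i] - score), i))
        pairs.append((t, c))
        available.remove(c)
    return pairs

def mean_outcome_difference(pairs, treated_outcomes, control_outcomes):
    """Average treated-minus-control outcome across matched pairs."""
    diffs = [treated_outcomes[t] - control_outcomes[c] for t, c in pairs]
    return sum(diffs) / len(diffs)
```

Here "treated" would mean the optimal-dose category in the dose-responsive comparison and the low-dose category in the flat-responsive comparison. The equal group sizes reported in Tables 1 and 2 are consistent with 1:1 matching, although the source does not say so explicitly.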

      Results

      The best performing KNN model was obtained using K = 100 neighbours and the Manhattan distance metric, where the similarity between any two patients was calculated using the sum of the absolute differences for each normalized input parameter. In terms of predicting MII oocytes, the KNN had a mean absolute error of 3.79 mature oocytes and an R² of 0.45.
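For reference, the distance and the two performance measures named above can be written out as follows. The symbols are introduced here for illustration: z denotes a standardized input (age, BMI, square-root AMH, square-root AFC), y_i the observed MII count, ŷ_i the KNN prediction and ȳ the mean observed count. The R² expression is the usual coefficient of determination, which is assumed to be the definition used.

```latex
d(p, q) = \sum_{j=1}^{4} \left| z_{p,j} - z_{q,j} \right|
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```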

      Patient-specific dose–response curves identify two types of response to starting FSH doses

      For a given patient, the individual dose–response curve was generated using the 100 most similar patients identified using the KNN model (Figure 3). Using these curves, patients were labelled as dose-responsive if the optimal region of their dose–response curve showed a range of ≤175 IU of FSH, and flat-responsive otherwise. Across all the cycles, 30% of cycles were identified as dose-responsive and 64% were flat-responsive. The remaining 6% of patients showed an optimal dose region that was not well supported by the nearest neighbours (with ≤5% of neighbours in the optimal region) and were considered not reliable enough for further analysis. Additional examples of dose–response curves are shown in Supplementary Figure 1.
      Figure 3. Example of using the 100 nearest neighbours to create patient-specific dose–response curves. Neighbours are identified based on similar age, body mass index (BMI), anti-Müllerian hormone (AMH) concentration and antral follicle count (AFC) (left). Patients typically show a dose-responsive profile (middle) or flat-responsive profile (right). The dose–response curve (purple) shows the relationship between the starting FSH dose and the number of metaphase II (MII) oocytes retrieved, with confidence intervals shaded. Data points represent the MII oocytes retrieved (mean ± standard deviation) among all neighbours given that starting dose.
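The labelling rules in this section can be sketched as a small classifier over the optimal dose region. The thresholds (175 IU width, 5% neighbour support) come from the text; the function names, the handling of ties at the median and the order in which the checks are applied are assumptions.

```python
from statistics import median

def classify_response(region_low, region_high, neighbour_doses,
                      max_width=175, min_support=0.05):
    """Label a dose-response curve from its optimal dose region (doses in IU)."""
    if region_high - region_low > max_width:
        return "flat-responsive"
    in_region = sum(region_low <= dose <= region_high for dose in neighbour_doses)
    # A narrow region backed by 5% or fewer of the neighbours is not trusted.
    if in_region / len(neighbour_doses) <= min_support:
        return "unreliable"
    return "dose-responsive"

def received_optimal(own_dose, region_low, region_high):
    """Dose-responsive patients: was the actual starting dose inside the region?"""
    return region_low <= own_dose <= region_high

def dose_level(own_dose, neighbour_doses):
    """Flat-responsive patients: low or high relative to the neighbours' median."""
    return "low" if own_dose <= median(neighbour_doses) else "high"
```

With these rules, a region of 225 to 375 IU that contains most of the neighbours is labelled dose-responsive, whereas a region spanning 150 to 450 IU is labelled flat-responsive.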

      Dose-responsive patients can achieve more MII oocytes without increasing FSH

      Dose-responsive patients who received an optimal starting FSH dose had a mean age of 34.99 years, BMI of 24.74 kg/m², baseline AMH of 4.26 ng/ml and baseline AFC of 18.68. Dose-responsive patients who received a non-optimal starting dose of FSH had a mean age of 35.60 years, BMI of 24.80 kg/m², baseline AMH of 3.97 ng/ml and baseline AFC of 17.81.
      Before propensity score matching, patients who received an optimal starting dose of FSH had on average 2.4 more MII oocytes, 1.9 more 2PN embryos and 1.2 more usable blastocysts using 38 IU less of starting FSH and 550 IU less of total FSH compared with patients who received a non-optimal dose. The small differences in baseline parameters between patients given optimal and non-optimal starting doses indicated that the propensity matching might be appropriate. After propensity score matching, patients who received an optimal starting dose of FSH had on average 1.5 more MII oocytes, 1.2 more 2PN embryos and 0.6 more usable blastocysts using 10 IU less of starting FSH and 195 IU less of total FSH compared with propensity-matched patients with non-optimal doses (Table 1). The differences in MII oocytes, 2PN embryos and usable blastocysts were all statistically significant (Bonferroni-corrected P-values <0.001).
      Table 1. Patient characteristics and stimulation outcomes for dose-responsive patients (patients dose-responsive to starting FSH)

      | Parameter | Patients given optimal dose | Matched patients given non-optimal dose | Delta (optimal minus non-optimal) |
      | --- | --- | --- | --- |
      | Sample size (n) | 2061 | 2061 | N/A |
      | Age (years) | 34.99 | 34.79 | 0.20 |
      | BMI (kg/m²) | 24.74 | 25.09 | –0.35 |
      | Baseline AMH (ng/ml) | 4.26 | 4.20 | 0.06 |
      | Baseline AFC (n) | 18.68 | 19.11 | –0.43 |
      | MII oocytes (n) | 14.9 | 13.4 | 1.5* |
      | 2PN embryos (n) | 11.5 | 10.3 | 1.2* |
      | Usable blastocysts (n) | 5.8 | 5.2 | 0.6* |
      | Starting FSH (IU) | 310 | 320 | –10 |
      | Total FSH (IU) | 3555 | 3750 | –195* |

      Patients with an optimal dose were propensity matched to patients given a non-optimal dose.
      * For the comparison of mean differences, P-values <0.001 with Bonferroni correction. 2PN, 2 pronuclear; AFC, antral follicle count; AMH, anti-Müllerian hormone; BMI, body mass index; MII, metaphase II; N/A, not applicable.

      Flat-responsive patients can achieve a comparable number of MII oocytes with significantly less FSH

      Flat-responsive patients who received a low starting dose of FSH had a mean age of 36.68 years, BMI of 25.45 kg/m², baseline AMH of 2.36 ng/ml and baseline AFC of 14.30. Flat-responsive patients who received a high starting dose of FSH had a mean age of 38.11 years, BMI of 25.60 kg/m², baseline AMH of 1.56 ng/ml and baseline AFC of 11.22.
      Before propensity score matching, patients who received a low starting dose of FSH had on average 2.6 more MII oocytes, 2.1 more 2PN embryos and 1.1 more blastocysts using 175 IU less of starting FSH and 1875 IU less of total FSH compared with patients who received a high starting dose. The statistically significant differences in baseline parameters between patients given high and low starting doses indicated that the propensity matching was appropriate. After propensity score matching, patients who received a low starting dose of FSH had, on average, 0.3 more MII oocytes, 0.3 more 2PN embryos and 0.2 more usable blastocysts using 149 IU less of starting FSH and 1375 IU less of total FSH compared with propensity-matched patients with a high starting dose (Table 2). The differences in starting and total FSH were both statistically significant (Bonferroni-corrected P-values <0.001).
      Table 2. Patient characteristics and stimulation outcomes for flat-responsive patients (patients with a flat response to starting FSH)

      | Parameter | Patients given low dose | Matched patients given high dose | Delta (low minus high) |
      | --- | --- | --- | --- |
      | Sample size (n) | 2767 | 2767 | N/A |
      | Age (years) | 36.68 | 36.53 | 0.15 |
      | BMI (kg/m²) | 25.45 | 25.21 | 0.24 |
      | Baseline AMH (ng/ml) | 2.36 | 2.33 | 0.03 |
      | Baseline AFC (n) | 14.30 | 14.42 | –0.12 |
      | MII oocytes (n) | 10.8 | 10.5 | 0.3 |
      | 2PN embryos (n) | 8.3 | 8.0 | 0.3 |
      | Usable blastocysts (n) | 4.0 | 3.8 | 0.2 |
      | Starting FSH (IU) | 282 | 431 | –149* |
      | Total FSH (IU) | 3625 | 5000 | –1375* |

      Patients with low doses (lower 50th percentile of doses for their neighbours) were propensity matched to patients given high doses (upper 50th percentile of doses for their neighbours).
      * For the comparison of mean differences, P-values <0.001 with Bonferroni correction. 2PN, 2 pronuclear; AFC, antral follicle count; AMH, anti-Müllerian hormone; BMI, body mass index; MII, metaphase II; N/A, not applicable.

      Discussion

      This study is one of the first to use a machine learning model to optimize the starting dose of FSH during ovarian stimulation. By applying dose–response curves for selecting the starting FSH dose, it was shown that patients’ doses can be optimized to achieve optimal laboratory outcomes while reducing the amount of starting and total FSH used, potentially also lowering the costs associated with IVF.
      The KNN model relies upon having a large and diverse dataset, so that the nearest neighbours are as similar as possible to the patient-of-interest while also having different starting doses of FSH. For example, if all of the training data were collected from a single physician's patients, the data might show that all 100 similar patients were given the same starting dose of FSH, which would preclude the generation of a dose–response curve. The dataset was collected from three clinical sites, with over 18,000 total cycles across 7 years, which the authors believe is one of the largest assembled for this specific application. As shown in Supplementary Figure 1, the distributions of patient age, BMI, AFC and AMH are very similar between the three clinics, while the FSH dose varies more significantly.
      The results indicate that 30% of cycles in the dataset were dose-responsive to different levels of starting FSH and 64% were flat-responsive. In the analysis, a change of at least one MII oocyte within a dose range of less than 175 IU was used to classify a cycle as dose-responsive. A variety of other cut-offs were also evaluated for selecting the optimal dose region, as shown in Supplementary Tables 1 and 2. Using a stricter cut-off for the width of the optimal region would result in a higher expected benefit for dose-responsive patients, but also a reduction in the number of cycles classified as dose-responsive. In practice, the authors believe that this classification into dose-responsive versus flat-responsive may not be necessary to perform, as it would be up to the physician to evaluate the dose–response curve as a continuum and decide on the most appropriate starting dose for a particular patient. However, it was important to perform for this study as it allowed the calculation of an expected benefit for different types of response.
      Although the dose-responsive group had on average better-prognosis patients than the flat-responsive group (age 34.99 versus 36.68 years, and baseline AMH 4.26 versus 2.36 ng/ml), there were a variety of good responders and poor responders in each group (Supplementary Figure 2). In other words, whether a patient was classified as dose-responsive or flat-responsive was not simply a reflection of good or poor prognosis. Both groups showed a potential benefit of using the machine learning model for selecting the starting FSH dose. In the dose-responsive group, the retrospective analysis showed that patients did slightly better in terms of MII oocytes, 2PN embryos and usable blastocysts when they were given a dose in the optimal region. Importantly, these improvements in outcomes were achieved with slightly less starting and total FSH. This indicates that the model is not simply recommending higher doses of FSH in order to maximize outcomes. In the flat-responsive group, patients given a low starting dose of FSH had comparable numbers of MII oocytes, 2PN embryos and usable blastocysts compared with patients given higher doses, suggesting that those outcomes can be achieved with significantly less starting and total FSH.
      In the dose-responsive group of patients, the model often showed a non-linear response to starting FSH, where the number of MII oocytes is predicted to increase as FSH increases up until a point of maximal response, beyond which predicted MIIs can either plateau or start to decline. This result may seem counterintuitive; however, similar trends have been seen in previous studies where the highest doses of FSH led to fewer oocytes retrieved (Olivennes F. et al., "Individualizing FSH dose for assisted reproduction using a novel algorithm: the CONSORT study"). In addition, studies in cattle have confirmed that the maximal response to superovulation has a plateau, at which point higher doses of FSH negatively impact oocyte and embryo outcomes (Baker V.L. et al., "Gonadotropin dose is negatively correlated with live birth rate: analysis of more than 650,000 assisted reproductive technology cycles"). There are other possible explanations for these trends as well. One possibility is that patients who were given very high starting doses of FSH may have been triggered early due to rapidly rising estradiol concentrations, thereby resulting in fewer MII oocytes. It is also possible that higher doses of FSH are prescribed less often and that those regions of the dose–response curve are not as well supported by the nearest neighbour data.
      In the flat-responsive group of patients, the model predicted that comparable outcomes can be achieved using significantly less FSH than is often prescribed. Lowering the cost of medication during IVF could make treatment more affordable both for individuals paying out of pocket and for benefits providers. This is especially important given that poor-prognosis patients often require multiple cycles. Previous studies have reported that using lower doses of FSH could allow providers to cover more cycles of IVF within their budget constraints (Ledger W. et al., "Costs and outcomes associated with IVF using recombinant FSH"). Across all patients in the flat-responsive group, those receiving a lower-50th percentile dose of FSH needed on average a total of 1375 IU less than those receiving an upper-50th percentile dose, with comparable numbers of MII oocytes, 2PN embryos and usable blastocysts. In the US market, a saving of 1375 IU would correspond to approximately $2100 saved, assuming an average cost of $1.50/IU (Robins J.C. et al., "Economic evaluation of highly purified human menotropin or recombinant follicle-stimulating hormone for controlled ovarian stimulation in high-responder patients: analysis of the Menopur in Gonadotropin-releasing Hormone Antagonist Single Embryo Transfer–High Responder (MEGASET-HR) trial"). Given the high prevalence of flat-responsive patients in the current dataset, such savings could potentially be achieved by a large percentage of IVF patients.
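The cost estimate in this paragraph is a single multiplication, shown here so the rounding is explicit. The function name is hypothetical; the 1375 IU difference and the $1.50/IU price are the figures quoted in the text.

```python
def fsh_cost_savings(total_fsh_saved_iu, cost_per_iu_usd):
    """Medication cost avoided for a given reduction in total FSH."""
    return total_fsh_saved_iu * cost_per_iu_usd

# Flat-responsive group: 1375 IU less total FSH at an assumed $1.50 per IU.
savings = fsh_cost_savings(1375, 1.50)
print(f"${savings:,.2f}")  # $2,062.50, which the text rounds to about $2100
```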
      Prior studies have introduced nomograms (la Marca A. et al., "Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles"; Li Y. et al., "A Novel Nomogram for Individualized Gonadotropin Starting Dose in GnRH Antagonist Protocol") or recommendation tables (Nyboe Andersen A. et al., "Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial"; Yovich J.L. et al., "PIVET rFSH dosing algorithms for individualized controlled ovarian stimulation enables optimized pregnancy productivity rates and avoidance of ovarian hyperstimulation syndrome") for selecting individualized doses of starting FSH based on markers of ovarian reserve, such as age, AFC and AMH. However, these tools assume a linear response to FSH and do not capture the non-linear dose–response relationships that were observed here for many patients. Also, these tools are designed to recommend a starting dose that achieves an optimal number of oocytes (e.g. 10), whereas the model in this study predicts a patient's response over a range of starting FSH doses, leaving the decision of what is optimal up to the clinician.
      When creating the dose–response curves, this study explored different outcomes such as oocytes retrieved, MII oocytes, 2PN embryos and usable blastocysts. Although the goal of a successful IVF cycle is a healthy live birth, it was not possible with this study's data to calculate an accurate CLBR for each retrieval cycle. However, with some assumptions around linking FET to IVF retrievals, it was possible to estimate the aggregated CLBR across all patients (Supplementary Figure 3). It was found that the number of oocytes retrieved, MII oocytes and 2PN embryos all had a positive correlation with live birth rates, which agrees with prior studies (Hariton et al., 2017). The models showed the highest accuracy when predicting MII oocytes, which is expected because outcomes like 2PN embryos and usable blastocysts depend on sperm quality and other factors. It was therefore decided that MII oocytes would be used as the primary outcome in the study's models, while observing the expected benefits of using the model for all outcomes.
      The authors believe that demonstrating improvements in laboratory outcomes such as 2PN embryos and usable blastocysts is a meaningful result as it maximizes the opportunity for a woman to achieve a successful live birth. The estimates for CLBR (Supplementary Figure 3) suggest that higher numbers of MII oocytes lead to a higher CLBR, but with diminishing returns. For example, an increase from 1 to 2 MII oocytes represents a 5% increase in CLBR, from 10 to 11 MII oocytes a 2% increase in CLBR, and from 20 to 21 MII oocytes a 1% increase in CLBR.
      The choice of using nearest neighbours is an important step in the authors’ modelling approach. A nearest-neighbour algorithm with relatively few inputs is easy to explain and understand compared with more complicated black-box models, allowing clinicians to interpret how a dose curve was created and why a given prediction was made. The trained KNN selected nearest neighbours using a set of normalized input parameters that included age, BMI, baseline AMH and baseline AFC, which are all available prior to the start of stimulation, and have been shown to be clinically applicable biomarkers of ovarian reserve and oocyte quality (Polyzos et al., 2022).
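      The patient-similarity step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the use of z-score normalization and Euclidean distance, and the toy records are all assumptions. The text states only that the four baseline inputs were normalized and that the most similar patients were selected.

```python
import math
from statistics import mean, pstdev

FEATURES = ("age", "bmi", "amh", "afc")  # baseline inputs named in the text


def fit_scaler(cycles):
    """Return per-feature (mean, std) computed over the historical cycles."""
    scaler = {}
    for f in FEATURES:
        values = [c[f] for c in cycles]
        std = pstdev(values)
        # Guard against a feature with no spread, which would divide by zero.
        scaler[f] = (mean(values), std if std > 0 else 1.0)
    return scaler


def normalize(cycle, scaler):
    """Z-score one cycle's features (assumed normalization scheme)."""
    return [(cycle[f] - scaler[f][0]) / scaler[f][1] for f in FEATURES]


def nearest_neighbours(patient, cycles, k=100):
    """Return the k historical cycles closest to the patient.

    Distance is Euclidean in the normalized feature space (an assumption).
    """
    scaler = fit_scaler(cycles)
    target = normalize(patient, scaler)
    ranked = sorted(cycles, key=lambda c: math.dist(target, normalize(c, scaler)))
    return ranked[:k]


# Toy records, invented purely for illustration; not study data.
history = [
    {"age": 30, "bmi": 22.0, "amh": 3.0, "afc": 15, "start_fsh": 225, "mii": 12},
    {"age": 31, "bmi": 23.0, "amh": 2.8, "afc": 14, "start_fsh": 300, "mii": 11},
    {"age": 41, "bmi": 29.0, "amh": 0.6, "afc": 5, "start_fsh": 450, "mii": 3},
    {"age": 39, "bmi": 27.0, "amh": 0.9, "afc": 6, "start_fsh": 375, "mii": 4},
]
patient = {"age": 32, "bmi": 22.5, "amh": 2.9, "afc": 14}
neighbours = nearest_neighbours(patient, history, k=2)
neighbour_ages = [c["age"] for c in neighbours]
```

      Because each prediction is built from an explicit list of similar historical cycles, a clinician can inspect those cycles directly, which is the interpretability argument made above. With k = 100, as in the study, the starting doses and MII counts of the returned neighbours would supply the points for an individual dose–response curve.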
      Although AMH and AFC are moderately correlated when averaged across a population (with a Pearson correlation coefficient of 0.54 in the current dataset), in clinical practice there is a discordance between these parameters in up to 20% of women (Zhang et al., 2019), supporting the use of both AMH and AFC in the current model. The authors explored other parameters as well, such as day 3 FSH concentrations, but decided to exclude this parameter due to questions regarding its reliability (Alipour et al., 2015; Wang et al., 2018; Wolff and Taylor, 2004).
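      For readers less familiar with the statistic quoted above, the Pearson coefficient between two baseline markers can be computed as in the following Python sketch. The function name and the AMH and AFC values are hypothetical and chosen only for illustration; they do not reproduce the study's dataset or its reported value of 0.54.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical baseline values (AMH in ng/ml, AFC as a count); not study data.
amh = [0.5, 1.2, 2.0, 3.1, 4.0, 1.0]
afc = [6, 8, 18, 12, 20, 15]
r = pearson_r(amh, afc)
```

      A coefficient well below 1 means that one marker only partly predicts the other, so individual patients can have a high AMH with a low AFC or the reverse. This is consistent with the rationale given above for keeping both markers as inputs.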
      Out of the four parameters that were used, BMI had the least significant correlation with MII outcomes. Although the impact of BMI on required gonadotrophins is uncertain, some studies have shown that a higher BMI is associated with an increased need for FSH (Legge et al., 2014). The authors decided to include BMI because, similar to age, it is a reliably measured baseline parameter that is often used to stratify patient groups. Lastly, for the subset of patients with prior retrievals, the authors tried using prior outcomes as an input for the nearest-neighbour model but found that this did not meaningfully change the dose curves or expected benefit analysis. This is possibly due to the cycle-to-cycle variation in outcomes that exists even when using the same dose of gonadotrophins (Yildiz et al., 2020).
      This study is not without limitations. Its primary limitation is its retrospective nature. The data in this study are from a US population, and further testing is needed to determine whether the models are generalizable to other patient populations outside the US. Also, a number of other factors can affect the outcomes of a stimulation cycle, such as dose adjustments and timing of the trigger, which were not accounted for in the current modelling. However, since this study was focused on selecting the optimal starting dose of FSH, it is appropriate to focus on only the parameters available at the time of making that decision. In addition, no differentiation was made between types of protocols, as the majority of cycles in the dataset used an antagonist protocol. However, with larger datasets in the future the authors plan to stratify the analysis based on the protocol type, and investigate the effects of changing doses (e.g. step-up or step-down) during the stimulation protocol. Some cycles in the dataset had incomplete or missing data and were therefore excluded, which could have introduced a sampling bias. The calculations of starting FSH dose combined the contribution of pure FSH plus the FSH component of the FSH/LH medication, rather than evaluating each separately. This too will be investigated in future studies. Finally, future work should also investigate other methods for integrating prior cycle information into the dose–response curves, including metrics of ovarian sensitivity such as follicular output rate and ovarian sensitivity index (Genro et al., 2011; Yadav et al., 2019).
      In conclusion, this study is one of the first applications of machine learning for optimizing starting FSH dose selection. The results indicate that a patient similarity model for selecting starting FSH can achieve optimal laboratory outcomes while reducing the amount of starting and total FSH given to the patient, which may help to lower the costs associated with IVF. Importantly, this tool represents a step towards selecting FSH doses using a quantitative framework. Simple and interpretable models may be the first ones to achieve clinical trust and adoption as AI is introduced to the field of assisted reproduction (Wang et al., 2019). Ultimately, these tools will help to standardize care across clinics. The authors’ future work will include continuing to increase the size and diversity of the training dataset, performing prospective validation studies to show improved patient outcomes from using the model, and investigating optimizing gonadotrophin doses for other outcomes such as euploidy rate. In addition, the clinical applicability of the dose–response curves for assisting in patient–doctor communication and shared decision-making will be investigated.

      ACKNOWLEDGEMENTS

      The authors would like to thank Dmitry Gounko, Brianna Amaral and Daniel Duvall for help with data collection; these important data made this study possible. The authors also thank the team members of Alife Health for their helpful discussions and review of the manuscript.

      FUNDING

      This work was funded by Alife Health, Inc., USA.

      Appendix. Supplementary materials

      REFERENCES

        • Alipour F.
        • Jahromi A.R.
        • Maalhagh M.
        • Sobhanian S.
        • Hosseinpoor M.
        Comparison of specificity and sensitivity of AMH and FSH in diagnosis of premature ovarian failure.
        Disease Markers. 2015; 2015 https://doi.org/10.1155/2015/585604
        • Arce J.C.
        • Nyboe Andersen A.
        • Fernández-Sánchez M.
        • Visnova H.
        • Bosch E.
        • García-Velasco J.A.
        • Barri P.
        • de Sutter P.
        • Klein B.M.
        • Fauser B.C.J.M.
        Ovarian response to recombinant human follicle-stimulating hormone: a randomized, antimüllerian hormone–stratified, dose–response trial in women undergoing in vitro fertilization/intracytoplasmic sperm injection.
        Fertility and Sterility. 2014; 102: 1633-1640.e5 https://doi.org/10.1016/J.FERTNSTERT.2014.08.013
        • Baker V.L.
        • Brown M.B.
        • Luke B.
        • Smith G.W.
        • Ireland J.J.
        Gonadotropin dose is negatively correlated with live birth rate: analysis of more than 650,000 assisted reproductive technology cycles.
        Fertility and Sterility. 2015; 104: 1145-1152.e5 https://doi.org/10.1016/J.FERTNSTERT.2015.07.1151
        • Berkkanoglu M.
        • Ozgur K.
        What is the optimum maximal gonadotropin dosage used in microdose flare-up cycles in poor responders?
        Fertility and Sterility. 2010; 94: 662-665 https://doi.org/10.1016/J.FERTNSTERT.2009.03.027
        • Clark Z.L.
        • Thakur M.
        • Leach R.E.
        • Ireland J.J.
        FSH dose is negatively correlated with number of oocytes retrieved: analysis of a data set with ∼650,000 ART cycles that previously identified an inverse relationship between FSH dose and live birth rate.
        Journal of Assisted Reproduction and Genetics. 2021; 38: 1787-1797 https://doi.org/10.1007/S10815-021-02179-0
        • Fanton M.
        • Nutting V.
        • Solano F.
        • Maeder-York P.
        • Hariton E.
        • Barash O.
        • Weckstein L.
        • Sakkas D.
        • Copperman A.B.
        • Loewke K.
        An interpretable machine learning model for predicting the optimal day of trigger during ovarian stimulation.
        Fertility and Sterility. 2022; 118: 101-108 https://doi.org/10.1016/J.FERTNSTERT.2022.04.003
        • Genro V.K.
        • Grynberg M.
        • Scheffer J.B.
        • Roux I.
        • Frydman R.
        • Fanchin R.
        Serum anti-Müllerian hormone levels are negatively related to Follicular Output RaTe (FORT) in normo-cycling women undergoing controlled ovarian hyperstimulation.
        Human Reproduction. 2011; 26: 671-677 https://doi.org/10.1093/HUMREP/DEQ361
        • Hariton E.
        • Chi E.A.
        • Chi G.
        • Morris J.R.
        • Braatz J.
        • Rajpurkar P.
        • Rosen M.
        A machine learning algorithm can optimize the day of trigger to improve in vitro fertilization outcomes.
        Fertility and Sterility. 2021; 116: 1227-1235 https://doi.org/10.1016/J.FERTNSTERT.2021.06.018
        • Hariton E.
        • Kim K.
        • Mumford S.L.
        • Palmor M.
        • Bortoletto P.
        • Cardozo E.R.
        • Karmon A.E.
        • Sabatini M.E.
        • Styer A.K.
        Total number of oocytes and zygotes are predictive of live birth pregnancy in fresh donor oocyte in vitro fertilization cycles.
        Fertility and Sterility. 2017; 108: 262-268 https://doi.org/10.1016/J.FERTNSTERT.2017.05.021
        • la Marca A.
        • Papaleo E.
        • Grisendi V.
        • Argento C.
        • Giulini S.
        • Volpe A.
        Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles.
        BJOG: An International Journal of Obstetrics & Gynaecology. 2012; 119: 1171-1179 https://doi.org/10.1111/J.1471-0528.2012.03412.X
        • Ledger W.
        • Wiebinga C.
        • Anderson P.
        • Irwin D.
        • Holman A.
        • Lloyd A.
        Costs and outcomes associated with IVF using recombinant FSH.
        Reproductive BioMedicine Online. 2009; 19: 337-342 https://doi.org/10.1016/S1472-6483(10)60167-8
        • Legge A.
        • Bouzayen R.
        • Hamilton L.
        • Young D.
        The Impact of Maternal Body Mass Index on In Vitro Fertilization Outcomes.
        Journal of Obstetrics and Gynaecology Canada. 2014; 36: 613-619 https://doi.org/10.1016/S1701-2163(15)30541-7
        • Lensen S.F.
        • Wilkinson J.
        • Leijdekkers J.A.
        • la Marca A.
        • Mol B.W.J.
        • Marjoribanks J.
        • Torrance H.
        • Broekmans F.J.
        Individualised gonadotropin dose selection using markers of ovarian reserve for women undergoing in vitro fertilisation plus intracytoplasmic sperm injection (IVF/ICSI).
        Cochrane Database of Systematic Reviews. 2018; https://doi.org/10.1002/14651858.CD012693.pub2
        • Letterie G.
        • Mac Donald A.
        Artificial intelligence in in vitro fertilization: a computer decision support system for day-to-day management of ovarian stimulation during in vitro fertilization.
        Fertility and Sterility. 2020; 114: 1026-1031 https://doi.org/10.1016/J.FERTNSTERT.2020.06.006
        • Li Y.
        • Duan Y.
        • Yuan X.
        • Cai B.
        • Xu Y.
        • Yuan Y.
        A Novel Nomogram for Individualized Gonadotropin Starting Dose in GnRH Antagonist Protocol.
        Frontiers in Endocrinology. 2021; 12 https://doi.org/10.3389/fendo.2021.688654
        • Nyboe Andersen A.
        • Nelson S.M.
        • Fauser B.C.J.M.
        • García-Velasco J.A.
        • Klein B.M.
        • Arce J.C.
        • Tournaye H.
        • de Sutter P.
        • Decleer W.
        • Petracco A.
        • Borges E.
        • Barbosa C.P.
        • Havelock J.
        • Claman P.
        • Yuzpe A.
        • Višnová H.
        • Ventruba P.
        • Uher P.
        • Mrazek M.
        • Andersen A.N.
        • Knudsen U.B.
        • Dewailly D.
        • Leveque A.G.h.
        • la Marca A.
        • Papaleo E.
        • Kuczynski W.
        • Kozioł K.
        • Anshina M.
        • Zazerskaya I.
        • Gzgzyan A.
        • Bulychova E.
        • Verdú V.
        • Barri P.
        • García-Velasco J.A.
        • Fernández-Sánchez M.
        • Martin F.S.
        • Bosch E.
        • Serna J.
        • Castillon G.
        • Bernabeu R.
        • Ferrando M.
        • Lavery S.
        • Gaudoin M.
        • Fauser B.C.J.M.
        • Klein B.M.
        • Helmgaard L.
        • Mannaerts B.
        • Arce J.C.
        Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial.
        Fertility and Sterility. 2017; 107: 387-396.e4 https://doi.org/10.1016/j.fertnstert.2016.10.033
        • Olivennes F.
        • Howles C.M.
        • Borini A.
        • Germond M.
        • Trew G.
        • Wikland M.
        • Zegers-Hochschild F.
        • Saunders H.
        • Alam V.
        Individualizing FSH dose for assisted reproduction using a novel algorithm: the CONSORT study.
        Reproductive BioMedicine Online. 2011; 22: S73-S82 https://doi.org/10.1016/S1472-6483(11)60012-6
        • Pal L.
        • Jindal S.
        • Witt B.R.
        • Santoro N.
        Less is more: increased gonadotropin use for ovarian stimulation adversely influences clinical pregnancy and live birth after in vitro fertilization.
        Fertility and Sterility. 2008; 89: 1694-1701 https://doi.org/10.1016/J.FERTNSTERT.2007.05.055
        • Polyzos N.P.
        • Ayoubi J.M.
        • Pirtea P.
        General infertility workup in times of high assisted reproductive technology efficacy.
        Fertility and Sterility. 2022; 118: 8-18 https://doi.org/10.1016/j.fertnstert.2022.05.019
        • Robertson I.
        • Chmiel F.P.
        • Cheong Y.
        Streamlining follicular monitoring during controlled ovarian stimulation: a data-driven approach to efficient IVF care in the new era of social distancing.
        Human Reproduction. 2021; 36: 99-106 https://doi.org/10.1093/HUMREP/DEAA251
        • Robins J.C.
        • Khair A.F.
        • Widra E.A.
        • Alper M.M.
        • Nelson W.W.
        • Foster E.D.
        • Sinha A.
        • Ando M.
        • Heiser P.W.
        • Daftary G.S.
        Economic evaluation of highly purified human menotropin or recombinant follicle-stimulating hormone for controlled ovarian stimulation in high-responder patients: analysis of the Menopur in Gonadotropin-releasing Hormone Antagonist Single Embryo Transfer–High Responder (MEGASET-HR) trial.
        F&S Reports. 2020; 1: 257-263 https://doi.org/10.1016/J.XFRE.2020.09.010
        • Wang R.
        • Pan W.
        • Jin L.
        • Li Y.
        • Geng Y.
        • Gao C.
        • Chen G.
        • Wang H.
        • Ma D.
        • Liao S.
        Artificial intelligence in reproductive medicine.
        Reproduction. 2019; 158: R139-R154 https://doi.org/10.1530/REP-18-0523
        • Wang S.
        • Zhang Y.
        • Mensah V.
        • Huber W.J.
        • Huang Y.T.
        • Alvero R.
        Discordant anti-müllerian hormone (AMH) and follicle stimulating hormone (FSH) among women undergoing in vitro fertilization (IVF): Which one is the better predictor for live birth?
        Journal of Ovarian Research. 2018; 11: 1-8 https://doi.org/10.1186/S13048-018-0430-Z
        • Wolff E.F.
        • Taylor H.S.
        Value of the day 3 follicle-stimulating hormone measurement.
        Fertility and Sterility. 2004; 81: 1486-1488 https://doi.org/10.1016/J.FERTNSTERT.2003.10.055
        • Yadav V.
        • Malhotra N.
        • Mahey R.
        • Singh N.
        • Kriplani A.
        Ovarian Sensitivity Index (OSI): Validating the Use of a Marker for Ovarian Responsiveness in IVF.
        Journal of Reproduction & Infertility. 2019; 20: 83
        • Yildiz S.
        • Yakin K.
        • Ata B.
        • Oktem O.
        There is a cycle to cycle variation in ovarian response and pre-hCG serum progesterone level: an analysis of 244 consecutive IVF cycles.
        Scientific Reports. 2020; 10: 1-8 https://doi.org/10.1038/s41598-020-72597-0
        • Yovich J.L.
        • Alsbjerg B.
        • Conceicao J.L.
        • Hinchliffe P.M.
        • Keane K.N.
        PIVET rFSH dosing algorithms for individualized controlled ovarian stimulation enables optimized pregnancy productivity rates and avoidance of ovarian hyperstimulation syndrome.
        Drug Design, Development and Therapy. 2016; 10: 2561 https://doi.org/10.2147/DDDT.S104104
        • Zhang Y.
        • Xu Y.
        • Xue Q.
        • Shang J.
        • Yang X.
        • Shan X.
        • Kuai Y.
        • Wang S.
        • Zeng C.
        Discordance between antral follicle counts and anti-Müllerian hormone levels in women undergoing in vitro fertilization.
        Reproductive Biology and Endocrinology. 2019; 17: 1-6 https://doi.org/10.1186/S12958-019-0497-4

      Biography

      Michael Fanton is a Senior Data Scientist at Alife Health, USA, working on artificial intelligence tools for clinical decision support during IVF. He received his BS from UC San Diego and MS and PhD from Stanford University, and completed a postdoctoral fellowship at Northwestern University, USA.
      Key message
      The authors present an interpretable machine learning model to optimize the starting gonadotrophin dose during ovarian stimulation to maximize laboratory outcomes and reduce the utilization of FSH. This model represents a step towards selecting FSH doses using a quantitative framework that could help to standardize care across clinics.