Clínica Eugin-Eugin Group, Carrer de Balmes 236, Barcelona 08006, Spain
Instituto de Investigación en Inteligencia Artificial, Consejo Superior de Investigaciones Científicas (IIIA-CSIC), Campus de la UAB, Carrer de Can Planas, Zona 2, Cerdanyola de Valles Barcelona 08193, Spain
Universitat Autònoma de Barcelona (UAB), Plaça Cívica, Bellaterra Barcelona 08193, Spain
• We developed an ML model to recommend the first FSH dosage for all types of patients.
• The model performance surpassed the clinicians’ in both development and validation.
• The model can serve as a quality check, second opinion or learning tool for trainees.
Abstract
Research question
Is it possible to identify accurately the optimal first dose of FSH in ovarian stimulation by means of a machine learning model?
Design
Observational study (2011–2021) including first IVF cycles with own oocytes. A total of 2713 patients from five private reproductive centres were included in the development phase (2011–2019) and 774 in the validation phase (2020–2021). Predictor variables included age, BMI, AMH, AFC and previous live births. Performance was measured with a proposed score based on the number of MII oocytes retrieved and dose received, recommended, or both.
Results
The included cycles were from women aged 37.7 ± 4.4 years (18–45 years), with a BMI of 23.5 ± 4.2 kg/m2, AMH of 2.4 ± 2.3 ng/ml, AFC of 11.3 ± 7.6, and an average number of MII obtained 6.9 ± 5.4. The model reached a mean performance score of 0.87 (95% CI 0.86 to 0.88) in the development phase, significantly better than for doses prescribed by clinicians for the same patients (0.83, 95% CI 0.82 to 0.84; P = 2.44 e-10). Mean performance score of the model recommendations was 0.89 (95% CI 0.88 to 0.90) in the validation phase, also significantly better than clinicians (0.84, 95% CI 0.82 to 0.86; P = 3.81 e-05). The model was shown to surpass the performance of standard practice.
Conclusion
This machine learning model could be used as a training and learning tool for new clinicians, and as quality control for experienced clinicians.
Although significant strides have been made in the last 40 years, the mean pregnancy rate after an IVF cycle still hovers around 30%, with a 20% chance of delivery. An important requisite to the success of an IVF cycle is the availability of a certain number of mature oocytes (metaphase II [MII]), usually obtained after ovarian stimulation.
Ovarian stimulation, therefore, represents a key step for IVF success, as failing to ensure an optimal number of MII oocytes will likely hinder the procedure. As the number of MII oocytes retrieved increases, so does the chance of producing some embryos with high pregnancy potential (Conventional ovarian stimulation and single embryo transfer for IVF/ICSI. How many oocytes do we need to maximize cumulative live birth rates after utilization of all fresh and frozen embryos?; A novel predictive model to estimate the number of mature oocytes required for obtaining at least one euploid blastocyst for transfer in couples undergoing in vitro fertilization/intracytoplasmic sperm injection: The ART calculator), but stimulating a patient too much leads to an increased risk of ovarian hyperstimulation syndrome (OHSS). As such, a compromise must be reached: the number of oocytes retrieved should fall within a range considered optimal, one that does not increase the chance of OHSS but maintains good pregnancy potential. The optimal range considered in this study is 10 to 15 oocytes. Anything outside these values is considered too many or too few. Whenever a patient falls outside the defined range, the risk of an unsuccessful or cancelled cycle increases, as does the occurrence of OHSS. This implies the need to freeze all the embryos when generated, which increases costs and causes delays in treatment. Acceptance of an increased risk of OHSS, when properly managed with gonadotrophin-releasing hormone agonist trigger, in exchange for a higher number of MII oocytes, is controversial.
A previous study showed an increased cumulative LBR when the frozen embryo transfers were taken into account. This could specifically benefit patients of advanced maternal age but not patients with polycystic ovary syndrome. For the present study, uniform criteria for all patients were needed, so a conservative view was considered adequate. A response below 15 oocytes was therefore set as ideal.
Essential to all ovarian stimulation protocols is the starting dose of exogenous FSH. This dose should be sufficient to recruit all FSH-responsive follicles but no higher, to avoid adverse effects, i.e. OHSS or decreased oocyte quality. After about 8 days of stimulation, changing the FSH dose does not allow for a significant further recruitment of follicles. In other words, if the starting dose of exogenous FSH is inadequate, little can be done to fix its effects on MII yield.
The choice of the FSH starting dose is mostly based on the patient's characteristics, i.e. age, body mass index (BMI) or ovarian reserve, and clinical characteristics, i.e. past gravidity and parity. Sometimes, ovarian stimulation leads to unexpected and widely different results even among apparently similar patients, resulting in either too many or too few oocytes collected. Furthermore, where the MII oocytes retrieved are in the expected number range, they may still be of insufficient quality to achieve success, as only 30–40% of microinjected oocytes develop to blastocyst (Luteal phase after conventional stimulation in the same ovarian cycle might improve the management of poor responder patients fulfilling the Bologna criteria: a case series; The accumulation of vitrified oocytes is a strategy to increase the number of euploid available blastocysts for transfer after preimplantation genetic testing).
Clinicians use their knowledge and experience to prescribe a starting FSH dose to reach the appropriate range of follicular stimulation. So far, some machine learning models have been developed to encapsulate the medical experience reflected in historical data and to try to automate that decision. Two separate nomograms based on patient age, anti-Müllerian hormone (AMH) or antral follicle count (AFC) and basal FSH levels have been developed for this task (Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles; Novel nomogram-based integrated gonadotropin therapy individualization in in vitro fertilization/intracytoplasmic sperm injection: A modeling approach), reporting an increased number of patients with an optimal range of MII oocytes retrieved, and a decreased number of patients with a low response among those using the nomogram. These two nomograms did not include patients older than 40 years or those with irregular cycles, including patients with polycystic ovary syndrome. An RCT of another model, developed specifically for individualized dosage of FSH delta (Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial; Predictive factors and a corresponding treatment algorithm for controlled ovarian stimulation in patients treated with recombinant human follicle stimulating hormone (follitropin alfa) during assisted reproduction technology (ART) procedures. An analysis; Individualized recombinant human follicle-stimulating hormone dosing using the CONSORT calculator in assisted reproductive technology: A large, multicenter, observational study of routine clinical practice), showed that the model was able to reduce the risk of OHSS in patients while maintaining pregnancy rates comparable to those obtained with the clinician-chosen dose, despite a reduction in the number of retrieved oocytes.
The aim of the present study was to develop and validate a machine learning model designed to identify the optimal starting dose for all variants of FSH except delta (as it is not quantified in IU/ml), so as to collect a number of MII oocytes as close as possible to 12 (a middle point in the optimal range considered in this study), for all types of patients.
Materials and methods
Patient population and ethical approval
Data from a total of 2713 first IVF cycles, from 2011–2019, registered in five private centres from two countries pertaining to the same company, were used to develop the model. All five centres operated under similar quality-control protocols, but choice of stimulation and modifications to standard protocols were left to each clinician. Natural cycles and cycles in which gonadotrophin doses were not expressed in IU/ml were excluded. The inclusion of first cycles aimed to prevent bias caused by unrecorded clinician knowledge (such as FSH dosage and results of previous cycles). An additional 774 cycles between 2020 and 2021 were used for prospective validation of the model. Three categories of data were collected as variables. First, the input data, composed of age, BMI, proven fertility (Y/N) and reserve markers AMH and AFC; second, the intervention, namely the first dose of FSH prescribed by the clinician; and third, result data expressed as number of metaphase II oocytes (MII) collected after stimulation (Table 1). Throughout the study, only cases with complete data on all the variables were included. Cycles from both the development and validation databases corresponded to women aged 37.7 ± 4.4 years (18–45 years), with a BMI of 23.5 ± 4.2 kg/m2, AMH of 2.4 ± 2.3 ng/ml, AFC of 11.3 ± 7.6, and with an average number of MII obtained 6.9 ± 5.4.
Table 1. Patient characteristics in the two databases used in the study

| Characteristics | Development database (n = 2713) | Validation database (n = 774) | P-value |
| --- | --- | --- | --- |
| Age, years | 37.7 ± 4.6 | 38.3 ± 4.4 | 0.007 |
| AMH, ng/ml | 2.4 ± 2.3 | 2.2 ± 2.2 | 0.003 |
| AFC, n | 11.1 ± 7.3 | 11.3 ± 8.5 | 0.7 |
| BMI, kg/m2 | 23.6 ± 4.2 | 23.2 ± 3.9 | 0.007 |
| Number of MII retrieved | 6.9 ± 5.0 | 6.8 ± 6.5 | 0.005 |
| Proven female fertility, % | 13 | 10.1 | 0.067 |

Values are expressed as mean and SD or as %. Variables were compared using Mann-Whitney U test. For proportions, a two-sample z-test was conducted. P < 0.05 was considered statistically significant. AFC, antral follicle count; AMH, anti-Müllerian hormone; BMI, body mass index; MII, metaphase II.
The aim of the present study was to predict the initial dose of FSH needed to achieve a number of MII as close as possible to 12. The range 10–15 was considered desirable, the range four to nine suboptimal, and MII lower than four or above 15 not desirable. Given patient characteristics and limitations in the maximum dose of FSH administered, not every patient is considered able to reach the desired goal. The number of MII was selected because, as an end point, it is close in time to the intervention and closely associated with it, while maintaining clinical relevance (a recognized association exists between number of MII and chances of pregnancy and live birth). Live birth rate (LBR) and clinical pregnancy rate (CPR) were considered initially in the building of the model but were too distant in time from the intervention for any model to be able to predict accurately the effect of a specific treatment using, as in the present study, only the information at the start of treatment and from female participants.
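The outcome ranges described above can be written as a small helper. This is only an illustrative sketch: the function name `classify_mii` and the label strings are assumptions introduced here, while the thresholds are the ones stated in the text.

```python
def classify_mii(n_mii: int) -> str:
    """Map a number of retrieved MII oocytes to the ranges used in the study.

    Thresholds come from the text: 10-15 desirable, 4-9 suboptimal,
    below 4 or above 15 not desirable. Labels are illustrative.
    """
    if n_mii < 0:
        raise ValueError("number of MII oocytes cannot be negative")
    if 10 <= n_mii <= 15:
        return "desirable"
    if 4 <= n_mii <= 9:
        return "suboptimal"
    return "not desirable"


TARGET_MII = 12  # middle point of the desirable range, as stated in the text
```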
A predictive model was constructed to predict the patient's capability of reacting to the first dose of FSH. This capability can be described by the slope of a simplified linear dose-response function. For any patient, during a natural cycle (0 IU/ml of exogenous FSH) the outcome in number of MII collected would stay mainly between 0 and 1. Given that results of a specific dose of FSH were entered into the database, the value of individual slopes was easily computable. To avoid negative slope values, it was assumed that all patients would achieve 0 MII if given 0 exogenous FSH.
The slope of a linear function is defined as follows:
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
As the first data point (x1, y1) is set at the origin (0, 0), the slope value for every patient is computed by dividing the outcome or y2 (MII) by x2 (the first dose of FSH).
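As a minimal sketch of this computation, the per-patient slope can be obtained as below. The function and argument names are hypothetical and chosen only for illustration; the origin default reflects the assumption stated in the text that 0 exogenous FSH yields 0 MII.

```python
def dose_response_slope(mii_retrieved: float, fsh_dose: float,
                        origin: tuple = (0.0, 0.0)) -> float:
    """Slope of the simplified linear dose-response function.

    (x1, y1) is the assumed origin (no exogenous FSH, no MII);
    (x2, y2) is the observed point (first FSH dose, MII retrieved).
    """
    x1, y1 = origin
    if fsh_dose == x1:
        raise ValueError("FSH dose must differ from the origin dose")
    return (mii_retrieved - y1) / (fsh_dose - x1)


# Illustrative values only: 6 MII retrieved after a first dose of 150
example_slope = dose_response_slope(mii_retrieved=6, fsh_dose=150)
```

With the origin fixed at (0, 0), this reduces to MII divided by the first dose, which is the quantity the regression model is then trained to predict.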
A linear regression algorithm was trained to predict the slope for every case (defined by its values at the start of the stimulation in age, BMI, AFC, AMH and proven fertility). Training was conducted on a random 80% of the development database. The remaining 20% was reserved for testing purposes. The training process was cross-validated five times with five randomly selected training datasets, with their corresponding five test sets.
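The training scheme described above might look roughly like the following sketch. It assumes that the five random 80/20 partitions can be represented with scikit-learn's `ShuffleSplit`; the text does not state which splitting utility was used, and the data below are synthetic placeholders, not study data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit


def cross_validate_slope_model(X, slopes, n_splits=5, test_size=0.2, seed=0):
    """Fit a linear regression on each random training split and
    return (test indices, predicted slopes) for every split."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size,
                            random_state=seed)
    results = []
    for train_idx, test_idx in splitter.split(X):
        model = LinearRegression().fit(X[train_idx], slopes[train_idx])
        results.append((test_idx, model.predict(X[test_idx])))
    return results


# Synthetic placeholder data (NOT study data), one row per first cycle:
# age, BMI, AFC, AMH, proven fertility (0/1)
rng = np.random.default_rng(0)
n_cases = 100
X_demo = np.column_stack([
    rng.uniform(18, 45, n_cases),
    rng.uniform(18, 35, n_cases),
    rng.uniform(1, 30, n_cases),
    rng.uniform(0.1, 8, n_cases),
    rng.integers(0, 2, n_cases),
])
slopes_demo = rng.uniform(0.0, 0.1, n_cases)  # MII per unit of first dose

folds = cross_validate_slope_model(X_demo, slopes_demo)
```

Each element of `folds` pairs the indices of one test set with the slopes predicted for it, so predictions can later be turned into dose recommendations case by case.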
Dose recommendation by the model
For dose-recommending purposes, the predicted slope for each test patient was used to compute the necessary FSH to obtain an outcome of 12 MII using the following linear function:
$$y = m \cdot x$$
where y is the number of MII, m the value of the slope, and x the FSH quantity.
Prescribing more than 300 IU/ml of FSH has been reported to give little to no advantage (A prospective randomized clinical trial of differing starter doses of recombinant follicle-stimulating hormone (follitropin-β) for first time in vitro fertilization and intracytoplasmic sperm injection treatment cycles).
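Solving y = m · x for x gives the dose needed for a target number of MII. The sketch below illustrates this under two assumptions that are not confirmed by the text: that recommendations are limited to the 100 to 300 interval spanned by the dose ranks used in the study, and that a non-positive predicted slope is treated as an error. All names are hypothetical.

```python
def recommend_dose(predicted_slope: float, target_mii: float = 12.0,
                   min_dose: float = 100.0, max_dose: float = 300.0) -> float:
    """Invert y = m * x to obtain the FSH dose for a target MII count.

    The [min_dose, max_dose] bounds are an assumption of this sketch,
    taken from the lowest and highest dose ranks mentioned in the text.
    """
    if predicted_slope <= 0:
        raise ValueError("predicted slope must be positive to invert y = m * x")
    raw_dose = target_mii / predicted_slope
    return min(max(raw_dose, min_dose), max_dose)


# Illustrative slopes only
mid_responder = recommend_dose(0.0625)  # 12 / 0.0625 = 192.0
low_responder = recommend_dose(0.02)    # raw dose 600, limited to 300.0
high_responder = recommend_dose(0.5)    # raw dose 24, limited to 100.0
```

Because `target_mii` is a parameter, the same function covers situations in which a number of MII other than 12 is desired.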
A score function was designed to compare recommendations made by the model with the prescriptions made by the clinicians. Given any FSH prescription with its resulting MII outcome, the score function assigns a score for a hypothetical recommended dose from –1 (the recommended dose was too low) to 1 (the recommended dose was too high), 0 being the best possible value (the recommended dose was appropriate). Doses of FSH were categorized in four ordinal ranks (100 to 150, 151 to 200, 201 to 250 and 251 to 300) to create the score function.
The score function also allows clinical prescriptions to be assessed by setting the recommended dose equal to the clinician prescribed dose. In doing so, the function evaluates how close the MII outcome is from the optimal range (10 to 15), and if there is any room for improving the dose (Supplementary Information).
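The score function itself is described in the Supplementary Information and is not reproduced here. The sketch below covers only the dose-rank categorization it relies on; the function name and the zero-based rank numbering are assumptions made for illustration.

```python
from bisect import bisect_left

# Upper bounds of the first three ranks: 100-150, 151-200, 201-250;
# anything above 250 and up to 300 falls in the fourth rank.
_RANK_UPPER_BOUNDS = [150, 200, 250]


def dose_rank(dose: float) -> int:
    """Return the ordinal rank (0 to 3) of an FSH dose between 100 and 300."""
    if not 100 <= dose <= 300:
        raise ValueError("dose outside the 100 to 300 interval covered by the ranks")
    return bisect_left(_RANK_UPPER_BOUNDS, dose)


def same_rank(clinician_dose: float, model_dose: float) -> bool:
    """True when the model recommendation stays in the clinician's dose rank."""
    return dose_rank(clinician_dose) == dose_rank(model_dose)
```

Comparing ranks rather than raw doses is what allows statements such as "the dose rank was not modified in relation to the clinician-prescribed dose".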
Evaluating the performance of the model
The performance of model-based recommendations was evaluated using the proposed score in the 20% of the development database reserved for testing and in the prospective validation database. In both cases, two scores were computed for each patient: one for the dose prescribed by the clinician and another for the model-recommended dose. Absolute values of both scores were compared across all cases to identify which group (clinical or model recommended) had more scores closer to 0, regardless of whether the dose was too high or too low. The Wilcoxon signed-rank test was used for this purpose, as the distribution of the scores was not normal. For easier interpretation of the results, mean score values were expressed as 1 − |score|, where a value close to 1 is best. Scikit-learn 0.24 in Python 3.7.6 was used for all computations.
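A paired comparison of this kind could be sketched as follows. The use of `scipy.stats.wilcoxon` is an assumption of this example (the text names only scikit-learn and Python), and the scores are synthetic values generated purely for illustration.

```python
import numpy as np
from scipy.stats import wilcoxon


def performance(scores) -> float:
    """Mean of 1 - |score|; values close to 1 are best."""
    return float(np.mean(1.0 - np.abs(np.asarray(scores))))


def compare_scores(clinician_scores, model_scores):
    """Paired Wilcoxon signed-rank test on absolute scores, plus the
    mean performance of each group."""
    abs_clinician = np.abs(np.asarray(clinician_scores))
    abs_model = np.abs(np.asarray(model_scores))
    _, p_value = wilcoxon(abs_clinician, abs_model)
    return performance(clinician_scores), performance(model_scores), p_value


# Synthetic scores in [-1, 1] (NOT study data): the illustrative "model"
# scores are simply half the magnitude of the "clinician" scores.
rng = np.random.default_rng(0)
clinician_demo = rng.uniform(-1, 1, 200)
model_demo = 0.5 * clinician_demo

perf_clinician, perf_model, p_demo = compare_scores(clinician_demo, model_demo)
```

Taking absolute values before the test reflects the choice described above: only the distance from 0 matters, not whether the dose was too high or too low.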
Results
Predictive and recommendation performance
During the development phase of the model, the mean performance score for clinical doses was 0.83 (95% CI 0.82 to 0.84), and for model recommendations was 0.87 (95% CI 0.86 to 0.88; P = 2.44 e-10).
During validation, the mean score for prescriptions was calculated to be 0.84 (95% CI 0.82 to 0.86), and for the model's recommendations 0.89 (95% CI 0.88 to 0.90; P = 3.81 e-05).
Score and dosage analysis
To further understand the performance of the model and of the clinical prescriptions, the model's and the clinicians’ score distributions were compared graphically in the test set of the development database and in the validation database (Figure 1).
Figure 1. Performance during development and validation. Clinical and model scores during development in panel A, and during validation in panel B. The development data include the test results of the five cross validations.
The model's score approached 0 (the best possible dose) more often than the clinicians’ score; when it did not approach 0, the model tended to suggest a dose higher than the one favoured by clinicians. In 57.4% of cases in the test set and in 68.8% in the validation database, the dose rank was not modified in relation to the clinician-prescribed dose.
How the dosage was changed from clinician prescription to model recommendation was further analysed in relation to the real outcome in number of MII (Figure 2).
Figure 2. Dose ranks prescribed per range of metaphase II (MII) retrieved. Panel A: by the clinicians during development; panel B: by the model during development; panel C: by the clinicians during validation; and panel D: by the model during validation. The development data include the test results of the five cross validations.
The model tends to increase the dose for patients with low and sub-optimal oocyte retrievals, but also increases dosage for some of the hyper-responders.
Discussion
Currently, several FSH dosing models have been proposed and have provided optimistic results. Yet, some of them have not been tested by RCT, and those that have (Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial; Predictive factors and a corresponding treatment algorithm for controlled ovarian stimulation in patients treated with recombinant human follicle stimulating hormone (follitropin alfa) during assisted reproduction technology (ART) procedures. An analysis; Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles) restrict this new personalization of the first FSH dose to a small subset of patients. In this subset, however, this personalized dose finding is not as critical as for the excluded patients. As the model presented in this study includes every type of patient, the results are enhanced for all of them.
Age, AFC, AMH, BMI and presence of previous successful pregnancies, the variables in the core model, have been shown to be good predictors of the dose-response function slope. This value has already been used as ovarian sensitivity (oocytes recovered per unit of starting FSH) in the development of a nomogram tested by RCT (Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles). Its use as the objective variable of the core model mitigates the confounding effect produced in any non-randomized treatment database, which could lead a direct model (oocyte number as objective variable) to determine, for example, that higher doses, which are often prescribed for low-responders, lead to smaller oocyte yields. As the treatment in the present study is tailored to the patient by the clinician, it is especially important to account for this effect. Removing it also allows the recommender model to be extended to all types of patients without confusing the model; on the contrary, the core model learns that extreme patients (low-responders and hyper-responders) have extreme ovarian potential values.
Additionally, the model constructed around a linear approximation of the dose-response function enables the final user to select the number of MII desired to be retrieved, and then obtain the corresponding FSH dose recommendation. This opens the use of this model to all kinds of situations, not just those in which 12 MII are the desired result, as in the present study.
As a separate contribution to an inclusive recommender model, we have developed a way to test in silico whether the model would improve results compared with historical data, as a step preceding an RCT. To this end, the performance score was designed to encode and automate faithfully an expert clinical assessment of treatment-recommendation-outcome combinations. In other words, it enables us to assert if a recommended dose could fare better than the one already prescribed, given the real result in retrieved MII. In this way, it is possible to estimate reliably whether the model has the possibility to improve current clinical practice. With this information, the investment in a well-designed RCT can be made more confidently. Additionally, results of the in-silico performance of the model are more informative than the sole prediction scores of the core recommender model.
The scores of the present model were consistently better than those of clinical practice, both in the development and validation databases. This is of interest as the model holds its value even though the population of the validation database is significantly older. Therefore, it means that the core model has learnt the important aspects of the relationship between the patient's characteristics and her ovarian potential or slope in the dose-response function. It is worth noting that the most significant predicted improvement was for the patients whose oocyte yield was low or sub-optimal, in which doses are increased on average. Upon implementation, the system's recommendations may improve the average results and most probably avoid some cycle cancellations owing to lack of embryos for transfer.
Detailed analysis of the behaviour of the model revealed its tendency, when incorrect, to overdose some patients. This contrasts with clinical practice, in which the tendency is to underdose when the prescription is inadequate. These instances of overestimation by the model correspond mainly to hyper-responder patient profiles, which are under-represented in our databases. As such, the algorithm could not learn appropriately owing to the lack of a sufficient sample size. Importantly, although the model does tend to overdose these patients, it still recommends the same or lower doses than the clinician in most of these cases, i.e. the clinician also tends to overdose. Nonetheless, we cannot dismiss the possibility that this could lead to a small increase in the risk of OHSS. This contrasts with previously published results in which RCT-tested models reduced the risk of OHSS (Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial). Secondary results of these studies, however, failed to show an increase in either retrieved oocytes or pregnancy results, with one reporting a reduction in oocyte yield. Although the risk of OHSS must be taken seriously, it is also true that it can be managed within a cycle with proper prevention, such as gonadotrophin-releasing hormone agonist trigger. All things considered, perhaps a manageable risk for a small portion of patients could be a fair trade-off to avoid a lack of embryos suitable for transfer for other patients.
Further analysis of the instances in which the model made a suboptimal suggestion led to another conclusion. Instances in which the model had negative error scores seem to coincide frequently with negative error scores for the clinician's prescription. Analysis of these cases in more detail produced a profile of patients with good markers and an unexplained low retrieval of oocytes. This could possibly be related to undiagnosed genetic polymorphisms in the FSHR or LHB genes, which, obviously, neither the clinicians nor the model could detect.
Despite the possible limitations of the system, it is encouraging that the preliminary results show, in most cases, a similar or better performance score of the model's recommendation compared with the dose prescribed by the clinician.
In conclusion, clinicians prescribe the first FSH dose for each patient based on their characteristics, reserve markers and their own experience with similar cases. Although most of the time they prescribe the dose necessary for an optimal result, sometimes the outcome can unexpectedly vary and fall into suboptimal or extreme ranges. Our model could avoid most of these deviations by analysing the patient's profile and making suggestions for the medical professional to assess.
Once tested and its performance confirmed by RCT, the machine learning model that we have developed could be used as a training and learning tool for new clinicians and could serve as quality control for experienced ones; furthermore, it could provide a second opinion as the information could be useful in peer-to-peer case discussions.
Acknowledgements
We would like to thank Dr Maria Jesús López for kindly lending us her time and her expertise on ovarian stimulation protocols.
Funding
This work was supported by Doctorat Industrial funded by Generalitat de Catalunya [DI-2019-24], by project CI-SUSTAIN funded by the Spanish Ministry of Science and Innovation [PID2019-104156GB-I00], by EUROVA Innovative Training Network (MSCA-ITN-2019-860960), and by intramural funding by Clínica Eugin-Eugin Group.
References
The accumulation of vitrified oocytes is a strategy to increase the number of euploid available blastocysts for transfer after preimplantation genetic testing.
Conventional ovarian stimulation and single embryo transfer for IVF/ICSI. How many oocytes do we need to maximize cumulative live birth rates after utilization of all fresh and frozen embryos?
Novel nomogram-based integrated gonadotropin therapy individualization in in vitro fertilization/intracytoplasmic sperm injection: A modeling approach.
A novel predictive model to estimate the number of mature oocytes required for obtaining at least one euploid blastocyst for transfer in couples undergoing in vitro fertilization/intracytoplasmic sperm injection: The ART calculator.
A prospective randomized clinical trial of differing starter doses of recombinant follicle-stimulating hormone (follitropin-β) for first time in vitro fertilization and intracytoplasmic sperm injection treatment cycles.
Predictive factors and a corresponding treatment algorithm for controlled ovarian stimulation in patients treated with recombinant human follicle stimulating hormone (follitropin alfa) during assisted reproduction technology (ART) procedures. An analysis.
Development of a nomogram based on markers of ovarian reserve for the individualisation of the follicle-stimulating hormone starting dose in in vitro fertilisation cycles.
Individualized recombinant human follicle-stimulating hormone dosing using the CONSORT calculator in assisted reproductive technology: A large, multicenter, observational study of routine clinical practice.
Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial.
Luteal phase after conventional stimulation in the same ovarian cycle might improve the management of poor responder patients fulfilling the Bologna criteria: a case series.
Núria Correa is senior clinical embryologist and researcher at the R&D department of the Eugin Group. She is a PhD candidate at the Universitat Autònoma de Barcelona, working on a research project centred on the application of artificial intelligence in assisted reproduction.
Key message
A machine learning model was trained to recommend first FSH doses for ovarian stimulation. Compared with clinicians, the model achieved consistently better performance scores. The model could be used as a second opinion and as a learning tool for new clinicians to avoid as many non-optimal outcomes as possible.
Article info
Publication history
Published online: June 18, 2022
Accepted: June 13, 2022
Received in revised form: May 21, 2022
Received: February 2, 2022
Declaration: The authors report no financial or commercial conflicts of interest.