Large-scale analysis of interobserver agreement and reliability in cardiotocography interpretation during labor using an online tool.
Imane Ben M'Barek, Badr Ben M'Barek, Grégoire Jauvion, Emilia Holmström, Antoine Agman, Jade Merrer, Pierre-François Ceccaldi
Author Information
Imane Ben M'Barek: Service de Gynécologie Obstétrique, Assistance Publique Hôpitaux de Paris - Hôpital Beaujon, 100 boulevard du Général Leclerc, Clichy La Garenne, France. imane.benmbarek@aphp.fr.
Badr Ben M'Barek: Genos Care, Paris, France.
Grégoire Jauvion: Genos Care, Paris, France.
Emilia Holmström: Service de Gynécologie Obstétrique, Assistance Publique Hôpitaux de Paris - Hôpital Beaujon, 100 boulevard du Général Leclerc, Clichy La Garenne, France.
Antoine Agman: Service de Gynécologie Obstétrique, Assistance Publique Hôpitaux de Paris - Hôpital Beaujon, 100 boulevard du Général Leclerc, Clichy La Garenne, France.
Jade Merrer: AP-HP.Nord-Université Paris Cité, Hôpital Universitaire Robert Debré, Unité d'épidémiologie clinique, Inserm, CIC 1426, Paris, France.
Pierre-François Ceccaldi: Service de Gynécologie-Obstétrique et Médecine de la reproduction, Hôpital Foch, 40 Rue Worth, 92150, Suresnes, France.
BACKGROUND: While the effectiveness of cardiotocography in reducing neonatal morbidity is still debated, it remains the primary method for assessing fetal well-being during labor. Evaluating how accurately professionals interpret cardiotocography signals is essential for its effective use. The objective was to evaluate how accurately practitioners predict fetal hypoxia from cardiotocography signals and clinical variables during labor.
MATERIAL AND METHODS: We conducted a cross-sectional online survey involving 120 obstetric healthcare providers from several countries. One hundred cases, including fifty cases of fetal hypoxia, were randomly assigned to participants, who were invited to predict the fetal outcome (a binary criterion based on pH, with a threshold of 7.15) from the cardiotocography signals and clinical variables. After describing the participants, we calculated the success rate, sensitivity and specificity of fetal outcome prediction, each with a 95% confidence interval, for the whole population and according to pH ranges, professional groups and number of years of experience. Interobserver agreement and reliability were evaluated using the proportion of agreement and Cohen's kappa, respectively.
RESULTS: The overall ability to predict a pH level below 7.15 yielded a success rate of 0.58 (95% CI 0.56-0.60), a sensitivity of 0.58 (95% CI 0.56-0.60) and a specificity of 0.63 (95% CI 0.61-0.65). No significant difference in success rates was observed with respect to profession or number of years of experience. The success rate was higher for cases with a pH level below 7.05 (0.69) or above 7.20 (0.66) than for those falling between 7.05 and 7.20 (0.48). The proportion of agreement between participants was good (0.82), with an overall kappa coefficient indicating substantial reliability (0.63).
CONCLUSIONS: The use of an online tool enabled us to collect a large amount of data to analyze how practitioners interpret cardiotocography data during labor. Despite a good level of agreement and reliability among practitioners, the overall accuracy is poor, particularly for cases with a neonatal pH between 7.05 and 7.20. Factors such as profession and experience level do not have a notable impact on the accuracy of the annotations. Computerized cardiotocography analysis software has the potential to improve the accuracy of fetal hypoxia detection, especially for ambiguous cardiotocography tracings.
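For readers less familiar with the metrics named in the methods, the following is a minimal sketch of how success rate, sensitivity, specificity, proportion of agreement and Cohen's kappa could be computed for binary predictions of pH below 7.15. It is not the authors' analysis code. The function names and the toy data are hypothetical. The abstract does not state which confidence interval method was used, so a normal-approximation (Wald) interval is assumed here. It also does not state how kappa was aggregated across the 120 participants, so the sketch covers a single pair of raters only.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a Wald 95% interval.

    Assumption: the abstract does not name its CI method; Wald is used
    here only for illustration. Bounds are clipped to [0, 1].
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def prediction_metrics(truth, predicted):
    """Success rate, sensitivity and specificity for binary predictions.

    truth / predicted: sequences of bool, True meaning pH below 7.15.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    return {
        "success_rate": proportion_ci(tp + tn, len(pairs)),
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
    }


def agreement_and_kappa(rater_a, rater_b):
    """Proportion of agreement and Cohen's kappa for two raters.

    Kappa is undefined when chance agreement equals 1 (both raters give
    the same single label to every case); that case is not handled here.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pos_a = sum(rater_a) / n
    pos_b = sum(rater_b) / n
    # Agreement expected by chance from each rater's marginal rates.
    expected = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical toy data: 8 cases, 4 with pH below 7.15.
truth = [True, True, True, True, False, False, False, False]
rater_a = [True, True, True, False, False, False, True, False]
rater_b = [True, True, False, False, False, False, True, False]

metrics_a = prediction_metrics(truth, rater_a)
agreement, kappa = agreement_and_kappa(rater_a, rater_b)
print(metrics_a)
print(agreement, kappa)
```

The sketch keeps accuracy (each rater against the pH outcome) separate from agreement (raters against each other), which is the distinction the conclusions rely on: two raters can agree with each other often and still both be wrong about the outcome.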