Bastien Lagneaux

GitHub profile: https://github.com/Blagneaux

PhD student at ENS Rennes and IETR: Development of an Intelligent Experimental Setup for the Measurement and Manipulation of the Hydrodynamic State of a Basin with Submerged Bodies.

Accepted Talks:

From Realistic and Automated Fish Digital Twin to Data Ethics: Building Reproducible Pipelines and Dataset for Biological AI

This paper presents a preliminary, limited-scope automated pipeline for generating hydrodynamic digital twins of fish from real-world experimental data, combining computer vision, computational fluid dynamics (CFD), and ethical AI practices. Using a YOLOv8 segmentation model trained on manually labeled laboratory video recordings, the system extracts fish motion and integrates it into a dimensionless CFD simulation via the HAACHAMA framework. The resulting digital twins closely replicate vortex dynamics observed in nature and enable pressure field estimation without invasive techniques such as Particle Image Velocimetry (PIV). This work demonstrates the feasibility of automated, biologically respectful simulation in aquatic environments and contributes an open-source dataset to support reproducible research. Broader applications include ecological monitoring, bio-inspired robotics, and fish passage optimization, addressing emerging challenges in ethical, cross-disciplinary AI.
Building on this proof of concept, the second part of the paper turns to the critical challenge of scaling the system through the construction of a robust and ethically grounded dataset. Given the widely reported reproducibility crisis in experimental sciences, the creation of such a dataset must go beyond quantity and embrace a reflective framework of quality. We propose key considerations for dataset design in this context: methodological transparency to ensure experimental repeatability; tooling and annotation consistency for reproducibility across computational processes; adherence to ethical standards concerning animal handling and data privacy; durable and open-access storage solutions to promote equitable availability; and careful licensing choices that balance openness with appropriate use. This discussion aims to articulate a roadmap for constructing meaningful datasets at the intersection of AI, biology, and ethics, positioning data not only as a technical resource but also as a socio-technical infrastructure that underpins trustworthy scientific advancement.