Probability and Statistics Seminar
Monday, June 10, 2024 at 13:45 - Zoom
Dennis Shasha (New York University)
Bipartite Networks Represent Causality Better Than Simple Networks: evidence, algorithms, and applications
A network, whose nodes are genes and whose directed edges represent
positive or negative influences of a regulatory gene on its targets, is
often used as a representation of causality. To infer a network,
researchers often develop a machine learning model and then evaluate
the model based on its match with experimentally verified "gold standard"
edges. The hoped-for result of such a model is a network that may extend
the gold standard edges. Since networks are a form of visual
representation, one can compare their utility with architectural or
machine blueprints. Blueprints are clearly useful, because they give
precise guidance to builders in construction. If the primary role of
gene regulatory networks is to characterize causality, then such
networks should be good tools of prediction because prediction is the
actionable benefit of knowing causality. But are they?
In this paper, we compare the prediction quality of models based on
"gold standard" regulatory edges from previous experimental work with
that of non-linear models inferred from time series data, across four
different species.
We show that the machine learning model gives higher predictive
accuracy than linear (or non-linear) models based on the gold standard
edges. Having established that networks fail to characterize causality
properly, we suggest that causality research should focus on four
goals: (i) predictive accuracy, (ii) a parsimonious enumeration of
predictive regulatory genes for each target gene $g$, (iii) the
identification, for each target $g$, of disjoint sets of predictive
regulatory genes with roughly equal accuracy, and (iv) the construction
of a bipartite network representation of causality, whose node types are
genes and models. We provide algorithms for all four goals.
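As an illustration only, a bipartite network with gene nodes and model nodes, as in goal (iv), could be stored as in the following minimal Python sketch. It is not the speaker's algorithm: the class name `BipartiteCausalNetwork`, its methods, and the identifiers `m1`, `m2`, `r1`, `r2`, `r3` and `g` are all hypothetical. Each model node links a set of regulator genes to one target gene, so several models with disjoint regulator sets can point at the same target.

```python
from collections import defaultdict
from itertools import combinations


class BipartiteCausalNetwork:
    """Hypothetical bipartite graph with two node types: genes and models.

    Edges run regulator gene -> model -> target gene.
    """

    def __init__(self):
        self.regulators_of = {}              # model id -> frozenset of regulator genes
        self.target_of = {}                  # model id -> target gene
        self.models_for = defaultdict(list)  # target gene -> list of model ids

    def add_model(self, model_id, regulators, target):
        """Add a model node connecting its regulator genes to one target gene."""
        if model_id in self.target_of:
            raise ValueError(f"duplicate model id: {model_id}")
        self.regulators_of[model_id] = frozenset(regulators)
        self.target_of[model_id] = target
        self.models_for[target].append(model_id)

    def has_disjoint_alternatives(self, target):
        """True if the target has at least two models whose regulator sets
        are pairwise disjoint."""
        models = self.models_for.get(target, [])
        if len(models) < 2:
            return False
        return all(
            self.regulators_of[a].isdisjoint(self.regulators_of[b])
            for a, b in combinations(models, 2)
        )


# Two hypothetical models predicting the same target from disjoint regulators.
net = BipartiteCausalNetwork()
net.add_model("m1", {"r1", "r2"}, target="g")
net.add_model("m2", {"r3"}, target="g")
print(net.has_disjoint_alternatives("g"))
```

The sketch covers only the data structure. Fitting the models and measuring their predictive accuracy, as in goals (i) to (iii), are outside its scope.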
Seminar on Zoom only: https://umontpellier-fr.zoom.us/j/94087408185