Probability and Statistics Seminar:

February 14, 2022 at 14:00 - UM - Building 09 - Conference room (1st floor)

Presented by Marchet Camille - CRIStAL, CNRS, Université de Lille

Data-structures for querying large k-mer (collections of) sets. (KIM seminar)

High-throughput sequencing datasets are usually deposited in public repositories, e.g. the European Nucleotide Archive, to ensure reproducibility. Now that the amount of data has reached petabyte scale, these repositories do not support online sequence searches, yet such a feature would be highly useful to investigators. Towards this goal, several computational approaches have been introduced in the last few years to index and query large collections of datasets. In this seminar I will give an overview of methods for representing and indexing sets of k-mers efficiently. We will then review how these techniques were adapted to index collections of thousands of datasets (and more) for membership queries. I will present application examples for these techniques, with a focus on RNA and splicing.

Webinar, at 14:00.
Zoom link: https://umontpellier-fr.zoom.us/j/94087408185
