Sami Aydin: Syncmer Digest: A Lossy Compression Method for Sequencing Similarity
Time: Wed 2025-05-14 13.00 - 14.00
Location: Room Cramer
Participating: Sami Aydin
Abstract
As sequencing datasets continue to grow in size, there is increasing demand for methods that enable fast downstream analysis without relying on full-resolution representations. Instead of storing complete sequences, it is often sufficient to retain a compact sketch that preserves key properties such as sequence similarity. Such representations can dramatically reduce computational cost while supporting a wide range of downstream tasks, including clustering, alignment, and distance estimation.
Syncmer digest is introduced as a method for compactly representing sequencing data while preserving relative sequence similarity. The method applies syncmer-based subsampling to retain representative subsequences with strong positional properties, yielding a digest that is both concise and informative. The talk provides a detailed overview of the subsampling process and the underlying syncmer strategy. Initial comparisons of sequence similarity in the original and digested spaces offer early insights into the method's behavior and suggest its potential usefulness in downstream analyses. Future work will focus on characterizing the relationship between similarity measures in the original and digested spaces, with particular attention to its application in average nucleotide identity (ANI) estimation.
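To make the idea of syncmer-based subsampling concrete, the following is a minimal sketch and not the method presented in the talk. It assumes the common open-syncmer rule (a k-mer is kept when its smallest s-mer sits at a fixed offset), uses plain lexicographic order where practical tools would typically use a hash, and represents the digest as a set of k-mers compared by Jaccard similarity. The function names, the parameters `k`, `s`, and `offset`, and the choice of Jaccard are all illustrative assumptions.

```python
def is_open_syncmer(kmer: str, s: int, offset: int = 0) -> bool:
    """Return True if the smallest s-mer of `kmer` starts at `offset`.

    Assumption: s-mers are ordered lexicographically for simplicity;
    ties are broken by taking the leftmost minimum.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    return smers.index(min(smers)) == offset


def syncmer_digest(seq: str, k: int = 5, s: int = 2, offset: int = 0) -> set:
    """Collect the k-mers of `seq` that are open syncmers (hypothetical digest)."""
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if is_open_syncmer(seq[i:i + k], s, offset)
    }


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    x = "ACGTACGTTGCA"
    y = "ACGTACGATGCA"
    print(jaccard(syncmer_digest(x), syncmer_digest(y)))
```

Because the selection rule depends only on the k-mer itself, the same k-mer is kept or dropped in every sequence where it occurs, which is one reason a similarity computed on such digests can be expected to track similarity on the full k-mer sets.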