Sami Aydin: Syncmer Digest: A Lossy Compression Method for Sequencing Similarity

Time: Wed 2025-05-14 13.00 - 14.00

Location: Room Cramer

Participating: Sami Aydin

Abstract

As sequencing datasets continue to grow in size, there is increasing demand for methods that enable fast downstream analysis without relying on full-resolution representations. Instead of storing complete sequences, it is often sufficient to retain a compact sketch that preserves key properties such as sequence similarity. Such representations can dramatically reduce computational cost while supporting a wide range of downstream tasks, including clustering, alignment, and distance estimation.

Syncmer digest is introduced as a method for compactly representing sequencing data while preserving relative sequence similarity. The method applies syncmer-based subsampling to retain representative subsequences with strong positional properties, offering a digest that is both concise and informative. The talk provides a detailed overview of the subsampling process and the underlying syncmer strategy. Initial comparisons of sequence similarity in the original and digested spaces offer early insights into the method's behavior and suggest its potential usefulness in downstream analyses. Future work will focus on characterizing the relationship between similarity measures in the original and digested spaces, with particular attention to its application in average nucleotide identity (ANI) estimation.
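
For readers unfamiliar with syncmers, the following is a minimal sketch of the general idea, not the speaker's implementation. The abstract does not say which syncmer variant or parameters the digest uses, so this assumes open syncmers: a k-mer is kept when its smallest s-mer starts at a fixed offset t. Lexicographic ordering of s-mers (practical tools usually order by hash), the function names, the parameter values, and the toy sequences are all illustrative assumptions. Jaccard similarity is used here only as a simple stand-in for comparing the original and digested spaces.

```python
def kmers(seq, k):
    """All k-mers of seq, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def open_syncmers(seq, k, s, t=0):
    """k-mers of seq whose smallest s-mer starts at offset t (open syncmers).

    Assumption: s-mers are compared lexicographically; real tools
    typically compare hash values instead.
    """
    selected = set()
    for kmer in kmers(seq, k):
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the leftmost minimum, so ties resolve deterministically.
        if smers.index(min(smers)) == t:
            selected.add(kmer)
    return selected


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two toy sequences differing by a single substitution (made-up data).
x = "ACGTTGCATGCCGATAGCTAGGATCCA"
y = "ACGTTGCATGCCGTTAGCTAGGATCCA"
k, s = 7, 3  # illustrative parameter choices

print("Jaccard, all k-mers:", jaccard(kmers(x, k), kmers(y, k)))
print("Jaccard, syncmers:  ", jaccard(open_syncmers(x, k, s), open_syncmers(y, k, s)))
```

Because selection depends only on the content of each k-mer, the same k-mer is kept or dropped in every sequence that contains it, which is what makes it meaningful to compare similarity in the digested space.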