Professor Farzad Hassanzadeh at the Wilson Lecture Series
Data compression and sequence analysis for two non-Markovian sources
Although Markov models of sequences are ubiquitous in information theory and statistical signal processing, there are critical applications where the Markov property does not hold. This talk focuses on compression and analysis methods for two such cases: sources with long-range redundancy and evolutionary sources. Long-range redundancy, i.e., the presence of repeated blocks separated by large distances, is prevalent in large-scale data storage systems, where up to 85% of the data is commonly redundant. Evolutionary sources, which produce data through a sequence of consecutive edits, model the generation of genomic data. I will present our results on the compression of these sources, including information-theoretic lower bounds and algorithmic upper bounds. For evolutionary sources, I will also discuss parameter estimation and its applications to computational biology, including a novel approach for quantifying mutation activity from a single sequence.
Biography of Professor Hassanzadeh
Professor Hassanzadeh is an assistant professor in the Department of Electrical and Computer Engineering and the Department of Computer Science at the University of Virginia. Previously, he was a postdoctoral scholar at the California Institute of Technology. He received his Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2013. His current research interests include data compression, coding for storage, and machine learning. He received the 2013 Robert T. Chien Memorial Award from the University of Illinois for excellence in research, as well as the 2014 IEEE Data Storage Best Student Paper Award.