News and Announcements
Shilpa Garg
Assistant Professor
University of Copenhagen
Denmark
Project: Haplotype-aware de novo assembly
Scientific question. Humans are diploid, and hence there exist two versions of each
chromosome, one inherited from the mother and the other from the father. Determining the
DNA sequences of these two chromosomal copies---called haplotypes---is important for
many applications ranging from population history to clinical questions. Existing sequencing
technologies cannot read a chromosome from start to end, but instead deliver small pieces
of sequence (called reads). As in a jigsaw puzzle, the underlying genome sequences are
reconstructed from the reads by finding the overlaps between them. We develop
algorithms to solve the genome assembly problem for diploid genomes, that is, “to simultaneously
solve two jigsaw puzzles with very similar yet different images”. We will apply this method to
cancer genomes that have complex rearrangements.
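To make the jigsaw analogy concrete, the following minimal sketch finds the longest exact suffix-prefix overlap between two reads and joins them. The function names `overlap_length` and `merge_reads`, the `min_len` threshold, and the toy reads are assumptions made for illustration. Real reads contain sequencing errors, so practical assemblers use approximate rather than exact matching.

```python
def overlap_length(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(max_len, min_len - 1, -1):
        if k > 0 and a[-k:] == b[:k]:
            return k
    return 0


def merge_reads(a: str, b: str, min_len: int = 3) -> str:
    """Join two reads on their longest exact overlap.

    If there is no overlap, the reads are simply concatenated; a real
    assembler would keep them as separate pieces instead.
    """
    k = overlap_length(a, b, min_len)
    return a + b[k:]


if __name__ == "__main__":
    # Hypothetical toy reads: the suffix "TAC" of the first read
    # matches the prefix "TAC" of the second.
    print(overlap_length("ACGTAC", "TACGGA"))
    print(merge_reads("ACGTAC", "TACGGA"))
```

Checking every pair of reads this way is quadratic in the number of reads, which is one reason efficient overlap detection is a research topic in its own right.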
Approach. The assembly problem is challenging because of sequencing errors in the reads
and because of heterozygous and repetitive genomic regions. Over the past few decades,
researchers have solved it by casting it as an overlap graph problem, where nodes are the reads and edges
represent the overlaps between reads. To detect heterozygous regions, where the haplotypes
differ, we look for simple local structures called bubbles. A bubble is a type of
directed acyclic subgraph with a single source vertex and a single, distinct sink vertex that
consists of multiple edges (all in the same direction) between this pair of vertices. Once bubbles have
been identified, they are simplified by removing structures most likely resulting from
sequencing errors. The resulting bubbles can then be used to solve the “phasing problem”:
find haplotype paths based on a maximum-likelihood framework.
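The bubble idea can be illustrated with a small sketch. It assumes the overlap graph is given as a list of directed edges between hypothetical read labels, and it only detects the simplest case: two or more unbranched paths that leave one source and rejoin at one sink. The names `find_simple_bubbles` and `walk_branch` and the example graph are assumptions for illustration, not the method of the papers listed below. Nested or more general bubbles, error removal, and phasing all need more than this.

```python
from collections import defaultdict


def walk_branch(start, succ, pred):
    """Follow an unbranched path from `start`.

    Returns (interior, end): the nodes with exactly one predecessor and one
    successor that were passed through, and the first node that is not.
    """
    interior = []
    node = start
    while len(pred[node]) == 1 and len(succ[node]) == 1:
        interior.append(node)
        node = next(iter(succ[node]))
    return interior, node


def find_simple_bubbles(edges):
    """Return a list of (source, sink, branches) for simple bubbles."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    bubbles = []
    # Iterate over a sorted copy so that lookups in the defaultdicts
    # cannot change the collection being iterated.
    for source in sorted(succ):
        if len(succ[source]) < 2:
            continue  # a bubble needs at least two outgoing branches
        branches = []
        ends = set()
        for start in sorted(succ[source]):
            interior, end = walk_branch(start, succ, pred)
            branches.append(interior)
            ends.add(end)
        # All branches must rejoin at one sink that differs from the source.
        if len(ends) == 1 and source not in ends:
            bubbles.append((source, ends.pop(), branches))
    return bubbles


# Hypothetical overlap graph: r2 -> {r3, r4} -> r5 forms a bubble,
# while r5 -> r6 and r5 -> r7 are two dead ends that never rejoin.
EXAMPLE_EDGES = [
    ("r1", "r2"), ("r2", "r3"), ("r2", "r4"),
    ("r3", "r5"), ("r4", "r5"), ("r5", "r6"), ("r5", "r7"),
]

if __name__ == "__main__":
    for source, sink, branches in find_simple_bubbles(EXAMPLE_EDGES):
        print(source, sink, branches)
```

In the example graph only the structure between r2 and r5 is reported. The fork at r5 is skipped because its two branches end at different nodes, which is the kind of dead-end structure that an error-removal step would have to handle separately.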
Tasks.
1. Investigate local structures (bubbles) in graphs.
2. Formalize the problem of removing erroneous structures due to sequencing errors.
3. Design an efficient algorithm to detect structures in the graph that represent regions of
heterozygosity or genomic rearrangements.
4. Develop an efficient approach for phasing bubble chains.
Relevant papers.
1. A graph-based approach to diploid genome assembly, ISMB 2018/Bioinformatics
( https://academic.oup.com/bioinformatics/article/34/13/i105/5045759 )
2. SDip: A novel graph-based approach to haplotype-aware assembly based structural
variant calling in targeted segmental duplications sequencing
( https://doi.org/10.1101/2020.02.25.964445 )
Requirements.
1. Programming: C++, Python, shell scripting, graph algorithms
2. Basic knowledge of bioinformatics tools
3. Enthusiasm to solve the problem
Remote work is possible, with regular meetings on campus.
What you will get:
- Extensive mentorship in computational methods
- Knowledge of how, conceptually, we can solve biological problems using computational methods
- The opportunity to work in a diverse environment that includes people with vastly different, but complementary, skill sets
- Responsibility and satisfaction of owning your own project
Candidates will be called for a short discussion (interview) to assess their creativity, reasoning, and problem-solving skills.
Contact.
Please contact Shilpa Garg (shilpa.garg@bio.ku.dk, shilpa.garg2k7@gmail.com) and include your CV if you are interested in inventing the future of biology using computational techniques.