Jianfeng Sun

Postdoctoral Research Associate in Single-cell Sequencing Analysis

I obtained my Ph.D. degree (Nov. 2017 - Feb. 2021) in deep learning-based structural biology from the Technical University of Munich, Germany. Before that, I received my Bachelor's degree (B. Sci., Sep. 2010 - June 2014) in computational mathematics from Nanjing Tech University, China. I then completed a Master's program (M. Eng., Sep. 2014 - June 2016) in software engineering and bioinformatics at Beijing Forestry University (BJFU), followed by a one-year successive master-doctor training program (Sep. 2016 - June 2017) at BJFU. Since Jul. 2021, I have been a postdoctoral researcher in Dr. Adam Cribbs's lab at NDORMS, University of Oxford. During my Ph.D., I focused on protein-protein interaction networks and on structural and evolutionary biology, with the aim of illuminating the biological roles of these interactions in cellular activities. I was fascinated by deciphering intricate biological networks using artificial intelligence-based algorithms and mathematical models. I now work on algorithm design and computational analysis for single-cell sequencing data.

Impurities in the final sequencing library, arising from PCR duplicates and artifacts, reduce the accuracy with which DNA fragments or transcripts can be quantified. To eliminate PCR duplicates, unique molecular identifiers (UMIs) are attached experimentally to the original molecules so that PCR duplicates can be distinguished from the unique fragments to be sequenced. However, errors introduced into UMIs during PCR amplification and sequencing hamper the accurate identification of unique fragments, so computational and mathematical methods have been proposed to circumvent this problem. In addition, novel sequencing technologies are emerging as cost-effective solutions for long-read sequencing, but at the expense of per-base accuracy, producing large volumes of error-prone long reads. In purpose-built experiments based on these technologies, existing methods for UMI identification show unsatisfactory error-correction performance, especially under more stringent experimental settings such as a high number of UMIs. I have therefore recently branched out into designing algorithmic strategies that improve UMI identification both before and after sequencing. As the volume of sequencing data grows rapidly, I also plan to build more powerful computational workflows for analyzing it.
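
To illustrate the kind of computational UMI error correction mentioned above, here is a minimal sketch of the "directional" adjacency rule popularized by the UMI-tools package. It is not my own method. UMI a absorbs UMI b when the two differ at exactly one position and count(a) >= 2 * count(b) - 1, so rare one-mismatch variants of an abundant UMI are treated as PCR or sequencing errors. The function names are hypothetical. The sketch assumes all UMIs have equal length and handles substitutions only, not the insertions and deletions that are common in error-prone long reads.

```python
from collections import Counter, deque


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_dedup(umis: list[str]) -> dict[str, str]:
    """Map each observed UMI to the representative UMI of its molecule.

    Hypothetical helper sketching the directional rule: an edge runs from
    UMI a to UMI b if hamming(a, b) == 1 and count(a) >= 2 * count(b) - 1.
    Substitution errors only; indels are not modelled.
    """
    counts = Counter(umis)
    # Visit UMIs from most to least abundant (ties broken alphabetically),
    # so that abundant UMIs become the representatives of their molecules.
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    assigned: dict[str, str] = {}
    for parent in ordered:
        if parent in assigned:
            continue
        assigned[parent] = parent
        # Breadth-first search along directed edges from this representative.
        queue = deque([parent])
        while queue:
            node = queue.popleft()
            for child in ordered:
                if child in assigned:
                    continue
                if hamming(node, child) == 1 and counts[node] >= 2 * counts[child] - 1:
                    assigned[child] = parent
                    queue.append(child)
    return assigned


if __name__ == "__main__":
    reads = ["ATCG"] * 10 + ["ATCA"] * 2 + ["GGCC"] * 5
    mapping = directional_dedup(reads)
    print(mapping)                     # ATCA is merged into ATCG
    print(len(set(mapping.values())))  # estimated number of unique molecules
```

In this toy input, "ATCA" (2 reads) is one mismatch away from "ATCG" (10 reads) and satisfies 10 >= 2 * 2 - 1, so it is collapsed. "GGCC" stays separate, giving two unique molecules. The count condition is what stops two equally abundant UMIs that differ by one base from being merged, which is also why such rules degrade when many UMIs crowd the sequence space.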

Recent publications
