Widespread allele-specific topological domains in the human genome are not confined to imprinted gene clusters

Stephen Richer, Yuan Tian, Stefan Schoenfelder, Laurence Hurst, Adele Murrell, Giuseppina Pisignano

There is widespread interest in the three-dimensional chromatin conformation of the genome and its impact on gene expression. However, studies of chromatin conformation frequently do not consider parent-of-origin differences, such as genomic imprinting, which result in monoallelic expression. In addition, genome-wide allele-specific chromatin conformation associations have not been extensively explored. There are few accessible bioinformatic workflows for investigating allelic conformation differences, and these require pre-phased haplotypes, which are not widely available.

We developed a bioinformatic pipeline, “HiCFlow,” that performs haplotype assembly and visualization of parental chromatin architecture. We benchmarked the pipeline using prototype haplotype-phased Hi-C data from GM12878 cells at three disease-associated imprinted gene clusters. Using Region Capture Hi-C and Hi-C data from human cell lines (1-7HB2, IMR-90, and H1-hESCs), we can robustly identify the known stable allele-specific interactions at the IGF2-H19 locus. Other imprinted loci (DLK1 and SNRPN) are more variable and there is no “canonical imprinted 3D structure,” but we could detect allele-specific differences in A/B compartmentalization. Genome-wide, when topologically associating domains (TADs) are unbiasedly ranked according to their allele-specific contact frequencies, a set of allele-specific TADs can be defined. These occur in genomic regions of high sequence variation. In addition to imprinted genes, allele-specific TADs are also enriched for allele-specific expressed genes. We also find loci that have not previously been identified as allele-specific expressed genes, such as the bitter taste receptors (TAS2Rs).
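
The idea of ranking TADs by allele-specific contact frequency can be illustrated with a minimal sketch. This is not the HiCFlow implementation: the function name, the depth scaling, the pseudocount, and the use of a log2 ratio of intra-TAD contacts are all assumptions chosen for illustration, and the inputs are taken to be two binned contact matrices (one per haplotype) plus TAD intervals given as bin indices.

```python
import numpy as np

def rank_allele_specific_tads(hap1, hap2, tads, pseudocount=1.0):
    """Rank TADs by the absolute log2 ratio of intra-TAD contacts
    between two haplotype-resolved contact matrices (hypothetical helper)."""
    hap1 = np.asarray(hap1, dtype=float)
    hap2 = np.asarray(hap2, dtype=float)
    # Assumption: scale haplotype 2 to the total depth of haplotype 1 so
    # that sequencing-depth differences do not dominate the ratio.
    hap2 = hap2 * (hap1.sum() / hap2.sum())
    scores = []
    for start, end in tads:  # half-open bin interval [start, end)
        c1 = hap1[start:end, start:end].sum()
        c2 = hap2[start:end, start:end].sum()
        log_ratio = np.log2((c1 + pseudocount) / (c2 + pseudocount))
        scores.append((start, end, float(log_ratio)))
    # Largest allelic imbalance first, regardless of direction.
    return sorted(scores, key=lambda t: abs(t[2]), reverse=True)

# Toy example: haplotype 2 has extra contacts inside the first TAD.
hap1 = np.ones((6, 6))
hap2 = np.ones((6, 6))
hap2[0:3, 0:3] = 4
ranked = rank_allele_specific_tads(hap1, hap2, [(0, 3), (3, 6)])
for start, end, score in ranked:
    print(start, end, round(score, 3))
```

In this toy input the first TAD ranks highest because its contacts are skewed toward haplotype 2. A real analysis would also need matrix normalization and a statistical threshold, neither of which is shown here.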

This study highlights the widespread differences in chromatin conformation between heterozygous loci and provides a new framework for understanding allele-specific expressed genes.
Original language: English
Article number: 40
Journal: Genome Biology
Issue number: 1
Publication status: Published - 3 Mar 2023

Bibliographical note

This work has been supported by the Medical Research Council (MR/P000711/1 to A.M. and L.H.), the Leverhulme Trust
(RPG-2020-327 to A.M.), and the EPSRC DTP studentship (2106811 to S.R.).

Availability of data and materials
The Region Capture Hi-C datasets that we generated in this work are available in the NCBI repository under accession number PRJNA926951 [147].
Scripts used for downstream bioinformatics analysis are available under the MIT license on GitHub: https://github.com/StephenRicher/HiCFlow [148] and https://github.com/StephenRicher/AS-HiC-Analysis [149]. These scripts are also deposited in Zenodo: https://zenodo.org/record/7563515 [150] and https://zenodo.org/record/6510198 [151].
Further details of the HiCFlow workflow are provided below.
• Project name: HiCFlow
• Project home page: https://github.com/StephenRicher/HiCFlow
• Archived version: 10.5281/zenodo.7563515
• Operating system: Unix-based operating systems
• Programming language: Snakemake (Python)
• Other requirements: Snakemake 7.3.1 or higher, Conda
• License: MIT License
• Any restrictions to use by non-academics: None
Datasets supporting the conclusions of this study include publicly available Hi-C data (GSE63525 [98, 99], GSE163666 [100]); phased variant data (PRJEB338 [133]); CTCF ChIP data (GSE30263 [115, 116], GSE31477 [118], GSE29611 [119], PRJEB3073 [121], GSE51334 [117]); CpG data (GSE86765 [142], GSE17312 [143], GSE80911 [144]); allele-specific expression data (NA12878 [110], GSE16256 [102–108]); allele-specific methylation data (GSE40832 [112, 113]); chromatin loop data (http://3dgenome.fsm.northwestern.edu/downloads/loops-hg19.zip) [53]; and chromatin state data (15-core) (https://egg2.wustl.edu/roadmap/web_portal/) [122].

