Accurate mapping of repeat occupancy
A non-trivial problem in mapping protein occupancy is the assignment of short reads to repetitive regions. Over 50% of the genome is composed of repeats, so a read that falls entirely within a repeat often aligns equally well to many copies.
ChIP starts from chromatin fragments <400 bp in length, which prevents accurate assignment of reads to individual repeat copies following the IP. CUT&RUN begins under native conditions and uses antibody-based localization of a nuclease that cleaves DNA proximal to the site of localization. The liberated fragments are typically less than 400 bp (larger ones serve no utility, and sometimes appear despite Henikoff's assertion that they should not).
The most accurate means of assigning reads to repeats is long-read sequencing. PacBio sequencing permits this for WGS. However, there is currently no means of using it for occupancy mapping, as both ChIP and CUT&RUN end with short fragments as the library input.
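The short-read versus long-read difference can be illustrated with a toy example. This is only a sketch: the sequences, the `find_all` helper, and the exact-match "mapping" are all made up for illustration and stand in for a real aligner and a real genome.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Hypothetical toy genome: two identical repeat copies, each with unique flanks.
REPEAT = "ACGTTGCATCGATTAC"
genome = "GGATCCTA" + REPEAT + "TTAGGCAT" + REPEAT + "CCGATAGG"

# A short read that lies entirely inside the repeat.
short_read = REPEAT[3:13]

# A longer read that spans repeat copy 1 plus 4 bp of unique flank on each side.
long_read = genome[4:8 + len(REPEAT) + 4]

short_hits = find_all(genome, short_read)
long_hits = find_all(genome, long_read)

print("short read hits:", short_hits)
print("long read hits:", long_hits)
```

The short read matches both repeat copies and cannot be assigned to either, while the read that extends into unique flanking sequence matches a single location.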
It appears that the solution lies within the current long-read mapping approaches.
I propose to develop a system akin to CUT&RUN in which an antibody-based strategy recruits some DNA 'modifier'. The modifications are incorporated under native conditions but do not result in short fragments. Instead, genomic integrity is maintained, and the experiment is treated as a WGS experiment. Regions carrying modifications demarcate locations of occupancy. With long reads, this permits localization to specific instances of a repeat.
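One way the downstream analysis might look is sketched below. It assumes that some upstream step has already aligned the long reads and called modified positions on each read in genomic coordinates. The `LongRead` record, the `occupied_windows` function, the window size, and the thresholds are all hypothetical choices for illustration, not an established pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LongRead:
    chrom: str
    start: int                 # 0-based, inclusive alignment start
    end: int                   # 0-based, exclusive alignment end
    mod_positions: list[int]   # genomic coordinates of called modifications


def occupied_windows(reads, window=500, min_coverage=2, min_fraction=0.5):
    """Call windows where enough of the covering reads carry a modification."""
    covered = defaultdict(int)    # (chrom, bin) -> reads overlapping the bin
    modified = defaultdict(int)   # (chrom, bin) -> reads modified within the bin
    for read in reads:
        first_bin = read.start // window
        last_bin = (read.end - 1) // window
        mod_bins = {pos // window for pos in read.mod_positions}
        for b in range(first_bin, last_bin + 1):
            key = (read.chrom, b)
            covered[key] += 1
            if b in mod_bins:
                modified[key] += 1

    calls = []
    for key in sorted(covered):
        fraction = modified[key] / covered[key]
        if covered[key] >= min_coverage and fraction >= min_fraction:
            chrom, b = key
            calls.append((chrom, b * window, (b + 1) * window, fraction))
    return calls


# Made-up reads: two carry modifications near chr1:1200-1300, one carries none.
reads = [
    LongRead("chr1", 0, 3000, [1210, 1250, 1290]),
    LongRead("chr1", 500, 4000, [1230, 1275]),
    LongRead("chr1", 1000, 5000, []),
]
calls = occupied_windows(reads)
for chrom, start, end, fraction in calls:
    print(chrom, start, end, round(fraction, 2))
```

Because each read is anchored by its full alignment, a call inside a repeat inherits the read's unique placement; the per-window fraction is just one simple summary and could be replaced by any per-read scoring scheme.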
The major caveat is that a large portion of the sequenced material is uninformative for occupancy. For this reason, perhaps the experiment should be dual purpose: mapping occupancy and retrotransposition.
Stratifying sequencing locations may be another approach, but it verges on a priori hypothesis testing rather than unbiased discovery.
############################