We just posted our new paper, "DNA shape complements sequence-based representations of transcription factor binding sites," on bioRxiv!
This paper summarizes Peter's work on a model of DNA-protein interactions built around estimates of DNA shape. Read the abstract below:
The position weight matrix (PWM) has long been a useful tool for describing variation in the composition of regions of DNA such as transcription factor (TF) binding sites. It is difficult, however, to relate the sequence-based representation of a DNA motif to the biological features of the interaction of a TF with its binding site. Here we present an alternative strategy for representing DNA motifs – called Structural Motif (StruM) – that can easily represent different sets of structural features. Structural features are inferred from dinucleotide properties listed in the Dinucleotide Property Database. StruMs are able to specifically model TF binding sites, using an encoding strategy that is distinct from sequence-based models. This difference in encoding strategies makes StruMs complementary to sequence-based methods of TF binding site identification.
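For readers who want a feel for what "inferring structural features from dinucleotide properties" can look like, here is a minimal Python sketch. It is not the StruM implementation from the paper. The property table holds made-up placeholder numbers rather than values from the Dinucleotide Property Database, and the function names (`to_shape`, `fit_shape_motif`, `score`) are hypothetical. Scoring each position with an independent Gaussian is also an assumption made for illustration.

```python
import math
import statistics

BASES = "ACGT"

# PLACEHOLDER values for a single made-up dinucleotide property
# (AA=0.0, AC=1.0, ..., TT=15.0). These are NOT real measurements and do
# not come from the Dinucleotide Property Database.
TOY_PROPERTY = {
    a + b: float(4 * i + j)
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
}


def to_shape(seq):
    """Translate a DNA sequence into one property value per dinucleotide step."""
    return [TOY_PROPERTY[seq[i:i + 2]] for i in range(len(seq) - 1)]


def fit_shape_motif(sites, min_sd=0.1):
    """Estimate a (mean, sd) pair per position from equal-length binding sites.

    min_sd is an arbitrary floor so that positions with no variation in the
    training set still get a usable standard deviation.
    """
    columns = zip(*(to_shape(site) for site in sites))
    return [
        (statistics.fmean(col), max(statistics.pstdev(col), min_sd))
        for col in columns
    ]


def score(model, seq):
    """Log-likelihood of seq under independent per-position Gaussians (assumed)."""
    total = 0.0
    for (mu, sd), x in zip(model, to_shape(seq)):
        total += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    return total


if __name__ == "__main__":
    # Toy training sites, invented for this example.
    model = fit_shape_motif(["ACGT", "ACGT", "ACGA"])
    print(score(model, "ACGT"), score(model, "TTTT"))
```

The point of the sketch is the change of representation: a PWM asks which base appears at each position, while a shape-based motif asks what value a structural property takes at each dinucleotide step, so two sequences with different bases but similar property values can score similarly.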