eScholarship@UMMS
Prediction of protein-protein binding free energies
We present an energy function for predicting binding free energies of protein-protein complexes, using the three-dimensional structures of the complex and unbound proteins as input. Our function is a linear combination of nine terms and achieves a correlation coefficient of 0.63 with experimental measurements when tested on a benchmark of 144 complexes using leave-one-out cross validation. Although we systematically tested both atomic and residue-based scoring functions, the selected function is dominated by residue-based terms. Our function is stable for subsets of the benchmark stratified by experimental pH and extent of conformational change upon complex formation, with correlation coefficients ranging from 0.61 to 0.66.
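The evaluation scheme described above (a linear combination of terms, scored by leave-one-out cross validation and a correlation coefficient) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the ordinary least-squares fit, the added intercept, the function names, and the synthetic data are all assumptions, and the feature columns are hypothetical placeholders rather than the nine terms of the published function.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(features, targets):
    """Least-squares weights from the normal equations; the intercept is the last weight.

    Assumption: plain least squares stands in for whatever fitting procedure was used.
    """
    rows = [list(f) + [1.0] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Evaluate the linear combination of terms for one complex."""
    return sum(w * f for w, f in zip(weights, list(feature_row) + [1.0]))


def leave_one_out_predictions(features, targets):
    """Predict each complex with weights fitted on all the other complexes."""
    predictions = []
    for i in range(len(targets)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        predictions.append(predict(fit_weights(train_x, train_y), features[i]))
    return predictions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic stand-in data: 30 "complexes", 3 hypothetical energy terms each.
    rng = random.Random(0)
    feats = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(30)]
    energies = [1.5 * f[0] - 0.7 * f[1] + 0.2 * f[2] + rng.gauss(0, 0.5) for f in feats]
    held_out = leave_one_out_predictions(feats, energies)
    print("leave-one-out correlation:", round(pearson(held_out, energies), 2))
```

Because each prediction comes from weights fitted without the complex being predicted, the resulting correlation estimates performance on unseen complexes rather than goodness of fit.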
Unsupervised pattern discovery in human chromatin structure through genomic segmentation
We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.
Cutting edge: Evidence for a dynamically driven T cell signaling mechanism
T cells use the αβ TCR to bind peptides presented by MHC proteins (pMHC) on APCs. Formation of a TCR-pMHC complex initiates T cell signaling via a poorly understood process, potentially involving changes in oligomeric state, altered interactions with CD3 subunits, and mechanical stress. These mechanisms could be facilitated by binding-induced changes in the TCR, but the nature and extent of any such alterations are unclear. Using hydrogen/deuterium exchange, we demonstrate that ligation globally rigidifies the TCR, which via entropic and packing effects will promote associations with neighboring proteins and enhance the stability of existing complexes. TCR regions implicated in lateral associations and signaling are particularly affected. Computational modeling demonstrated a high degree of dynamic coupling between the TCR constant and variable domains that is dampened upon ligation. These results raise the possibility that TCR triggering could involve a dynamically driven, allosteric mechanism.
Modeling gene expression using chromatin features in various cellular contexts
BACKGROUND: Previous work has demonstrated that chromatin feature levels correlate with gene expression. The ENCODE project enables us to further explore this relationship using an unprecedented volume of data. Expression levels from more than 100,000 promoters were measured using a variety of high-throughput techniques applied to RNA extracted by different protocols from different cellular compartments of several human cell lines. ENCODE also generated the genome-wide mapping of eleven histone marks, one histone variant, and DNase I hypersensitivity sites in seven cell lines.
RESULTS: We built a novel quantitative model to study the relationship between chromatin features and expression levels. Our study not only confirms that the general relationships found in previous studies hold across various cell lines, but also makes new suggestions about the relationship between chromatin features and gene expression levels. We found that expression status and expression levels can be predicted by different groups of chromatin features, both with high accuracy. We also found that expression levels measured by CAGE are better predicted than by RNA-PET or RNA-Seq, and different categories of chromatin features are the most predictive of expression for different RNA measurement methods. Additionally, PolyA+ RNA is overall more predictable than PolyA- RNA among different cell compartments, and PolyA+ cytosolic RNA measured with RNA-Seq is more predictable than PolyA+ nuclear RNA, while the opposite is true for PolyA- RNA.
CONCLUSIONS: Our study provides new insights into transcriptional regulation by analyzing chromatin features in different cellular contexts.
Functional analysis of transcription factor binding sites in human promoters
BACKGROUND: The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1.
RESULTS: In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall rate for observing function in any of the cell lines was 70%. Transcription factor binding resulted in transcriptional repression in more than a third of functional sites. When compared with predicted binding sites whose function was not experimentally verified, the functional binding sites had higher conservation and were located closer to transcriptional start sites (TSSs). Among functional sites, repressive sites tended to be located further from TSSs than were activating sites. Our data provide significant insight into the functional characteristics of YY1 binding sites, most notably the detection of distinct activating and repressing classes of YY1 binding sites. Repressing sites were located closer to, and often overlapped with, translational start sites and presented a distinctive variation on the canonical YY1 binding motif.
CONCLUSIONS: The genomic properties that we found to associate with functional TF binding sites on promoters -- conservation, TSS proximity, motifs and their variations -- point the way to improved accuracy in future TFBS predictions.
Understanding transcriptional regulation by integrative analysis of transcription factor binding data
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant position and orientation preferences between many cobinding TFs. Genes specifically expressed in a cell line are often associated with a greater occurrence of nearby TF binding in that cell line. We observed cell-line-specific secondary motifs that mediate the binding of the histone deacetylase HDAC2 and the enhancer-binding protein EP300. TF binding sites are located in GC-rich, nucleosome-depleted, and DNase I sensitive regions, flanked by well-positioned nucleosomes, and many of these features show cell type specificity. The GC-richness may be beneficial for regulating TF binding because, when unoccupied by a TF, these regions are occupied by nucleosomes in vivo. We present the results of our analysis in a TF-centric web repository Factorbook (http://factorbook.org) and will continually update this repository as more ENCODE data are generated.
Dicer partner proteins tune the length of mature miRNAs in flies and mammals
Drosophila Dicer-1 produces microRNAs (miRNAs) from pre-miRNA, whereas Dicer-2 generates small interfering RNAs (siRNAs) from long dsRNA. Alternative splicing of the loquacious (loqs) mRNA generates three distinct Dicer partner proteins. To understand the function of each, we constructed flies expressing Loqs-PA, Loqs-PB, or Loqs-PD. Loqs-PD promotes both endo- and exo-siRNA production by Dicer-2. Loqs-PA or Loqs-PB is required for viability, but the proteins are not fully redundant: a specific subset of miRNAs requires Loqs-PB. Surprisingly, Loqs-PB tunes where Dicer-1 cleaves pre-miR-307a, generating a longer miRNA isoform with a distinct seed sequence and target specificity. The longer form of miR-307a represses glycerol kinase and taranis mRNA expression. The mammalian Dicer-partner TRBP, a Loqs-PB homolog, similarly tunes where Dicer cleaves pre-miR-132. Thus, Dicer-binding partner proteins change the choice of cleavage site by Dicer, producing miRNAs with target specificities different from those made by Dicer alone or Dicer bound to alternative protein partners.
The cellular EJC interactome reveals higher-order mRNP structure and an EJC-SR protein nexus
In addition to sculpting eukaryotic transcripts by removing introns, pre-mRNA splicing greatly impacts protein composition of the emerging mRNP. The exon junction complex (EJC), deposited upstream of exon-exon junctions after splicing, is a major constituent of spliced mRNPs. Here, we report comprehensive analysis of the endogenous human EJC protein and RNA interactomes. We confirm that the major "canonical" EJC occupancy site in vivo lies 24 nucleotides upstream of exon junctions and that the majority of exon junctions carry an EJC. Unexpectedly, we find that endogenous EJCs multimerize with one another and with numerous SR proteins to form megadalton sized complexes in which SR proteins are super-stoichiometric to EJC core factors. This tight physical association may explain known functional parallels between EJCs and SR proteins. Further, their protection of long mRNA stretches from nuclease digestion suggests that endogenous EJCs and SR proteins cooperate to promote mRNA packaging and compaction.
A flexible docking approach for prediction of T cell receptor-peptide-MHC complexes
T cell receptors (TCRs) are immune proteins that specifically bind to antigenic molecules, which are often foreign peptides presented by major histocompatibility complex proteins (pMHCs), playing a key role in the cellular immune response. To advance our understanding and modeling of this dynamic immunological event, we assembled a protein-protein docking benchmark consisting of 20 structures of crystallized TCR/pMHC complexes for which unbound structures exist for both TCR and pMHC. We used our benchmark to compare predictive performance using several flexible and rigid backbone TCR/pMHC docking protocols. Our flexible TCR docking algorithm, TCRFlexDock, improved predictive success over the fixed backbone protocol, leading to near-native predictions for 80% of the TCR/pMHC cases among the top 10 models, and 100% of the cases in the top 30 models. We then applied TCRFlexDock to predict the two distinct docking modes recently described for a single TCR bound to two different antigens, and tested several protein modeling scoring functions for prediction of TCR/pMHC binding affinities. This algorithm and benchmark should enable future efforts to predict and design uncharacterized TCR/pMHC complexes.
UAP56 couples piRNA clusters to the perinuclear transposon silencing machinery
piRNAs silence transposons during germline development. In Drosophila, transcripts from heterochromatic clusters are processed into primary piRNAs in the perinuclear nuage. The nuclear DEAD box protein UAP56 has been previously implicated in mRNA splicing and export, whereas the DEAD box protein Vasa has an established role in piRNA production and localizes to nuage with the piRNA binding PIWI proteins Ago3 and Aub. We show that UAP56 colocalizes with the cluster-associated HP1 variant Rhino, that nuage granules containing Vasa localize directly across the nuclear envelope from cluster foci containing UAP56 and Rhino, and that cluster transcripts immunoprecipitate with both Vasa and UAP56. Significantly, a charge-substitution mutation that alters a conserved surface residue in UAP56 disrupts colocalization with Rhino, germline piRNA production, transposon silencing, and perinuclear localization of Vasa. We therefore propose that UAP56 and Vasa function in a piRNA-processing compartment that spans the nuclear envelope.
Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium
The Encyclopedia of DNA Elements (ENCODE) consortium aims to identify all functional elements in the human genome, including transcripts and transcriptional regulatory regions, along with their chromatin states and DNA methylation patterns. The ENCODE project generates data utilizing a variety of techniques that can enrich for regulatory regions, such as chromatin immunoprecipitation (ChIP), micrococcal nuclease (MNase) digestion and DNase I digestion, followed by deep sequencing of the resulting DNA. As part of the ENCODE project, we have developed a Web-accessible repository at http://factorbook.org. In Wiki format, factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions, as well as the rich analysis results of these data. In the first release, factorbook contains 457 ChIP-seq datasets on 119 TFs in a number of human cell lines, the average profiles of histone modifications and nucleosome positioning around the TF-binding regions, sequence motifs enriched in the regions and the distance and orientation preferences between motif sites.
Strand-specific libraries for high throughput RNA sequencing (RNA-Seq) prepared without poly(A) selection
BACKGROUND: High throughput DNA sequencing technology has enabled quantification of all the RNAs in a cell or tissue, a method widely known as RNA sequencing (RNA-Seq). However, non-coding RNAs such as rRNA are highly abundant and can consume >70% of sequencing reads. A common approach is to extract only polyadenylated mRNA; however, such approaches are blind to RNAs with short or no poly(A) tails, leading to an incomplete view of the transcriptome. Another challenge of preparing RNA-Seq libraries is to preserve the strand information of the RNAs.
DESIGN: Here, we describe a procedure for preparing RNA-Seq libraries from 1 to 4 μg total RNA without poly(A) selection. Our method combines the deoxyuridine triphosphate (dUTP)/uracil-DNA glycosylase (UDG) strategy to achieve strand specificity with AMPure XP magnetic beads to perform size selection. Together, these steps eliminate gel purification, allowing a library to be made in less than two days. We barcode each library during the final PCR amplification step, allowing several samples to be sequenced in a single lane without sacrificing read length. Libraries prepared using this protocol are compatible with Illumina GAII, GAIIx and HiSeq 2000 platforms.
DISCUSSION: The RNA-Seq protocol described here yields strand-specific transcriptome libraries without poly(A) selection, which provide approximately 90% mappable sequences. Typically, more than 85% of mapped reads correspond to protein-coding genes and only 6% derive from non-coding RNAs. The protocol has been used to measure RNA transcript identity and abundance in tissues from flies, mice, rats, chickens, and frogs, demonstrating its general applicability.
Networking development by Boolean logic
Eric Davidson at Caltech has spent several decades investigating the molecular basis of animal development using the sea urchin embryo as an experimental system (1,2), although his scholarship extends to all of embryology as embodied in several editions of his landmark book (3). In recent years his laboratory has become a leading force in constructing gene regulatory networks (GRNs) operating in sea urchin development (4). This axis of his work has its roots in his laboratory's cDNA cloning of an actin mRNA from the sea urchin embryo (for the timeline see ref. 1), one of the first eukaryotic mRNAs to be cloned, as it turned out. From that point of departure, the Davidson lab has drilled down into other genes and gene families and the factors that govern their coordinated regulation, leading them into the GRN era (a field they helped to define) and the development of the computational tools needed to consolidate and advance the GRN field.
The relationship between adiposity and stature in prepubertal children with celiac disease
Background and Aim: The pathogenesis of short stature in celiac disease (CD) is unknown. Obese children are generally taller than their non-obese peers; however, the role of adiposity on stature in CD is unclear. Our aim was to determine the association between adiposity and stature in CD.
Subjects and methods: We compared the anthropometric characteristics of prepubertal children aged 3-12 years with biopsy-proven CD (n=40) who were not on a gluten-free diet to those of same-aged, prepubertal non-CD children (n=50). Body mass index (BMI) was calculated using the formula weight/height². Sex-adjusted midparental target height (MPTH) standard deviation score (SDS) was calculated using National Children Health Statistics data for 18-year-old adults. Data were expressed as mean±standard deviation.
Results: CD subjects had significantly lower BMI SDS than controls (0.61±1.22 vs. 1.28±1.60, p=0.027) but were not significantly shorter than the controls (-0.05±1.21 vs. 0.21±1.71, p=0.41). When the patients were subdivided into the normal-weight and overweight/obese groups, the normal-weight CD patients were of similar height to the normal-weight controls (p=0.76) but were significantly shorter than the overweight/obese controls (p=0.003). The MPTH SDS did not differ between the groups.
Conclusions: Overweight/obese prepubertal children with CD were taller than both their normal-weight CD peers and the normal-weight controls, but were of similar height as the overweight/obese control subjects.
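The two anthropometric quantities named in the methods above can be illustrated with a short sketch. The BMI formula (weight/height²) comes from the abstract; the kilogram and metre units, the function names, and the treatment of the SDS as a plain z-score against a reference mean and standard deviation are assumptions made for illustration. Clinical growth references often derive an SDS differently, and the reference values below are hypothetical, not taken from the study.

```python
def body_mass_index(weight_kg, height_m):
    """BMI as weight divided by height squared (assumed units: kg and m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def standard_deviation_score(value, reference_mean, reference_sd):
    """SDS as a simple z-score: distance from the reference mean in SD units.

    Assumption: a plain z-score stands in for the study's actual SDS calculation.
    """
    if reference_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (value - reference_mean) / reference_sd


if __name__ == "__main__":
    # Hypothetical child and hypothetical reference values, for illustration only.
    bmi = body_mass_index(weight_kg=25.0, height_m=1.25)
    sds = standard_deviation_score(bmi, reference_mean=15.5, reference_sd=1.5)
    print("BMI:", round(bmi, 2), "BMI SDS:", round(sds, 2))
```

Expressing BMI and height as SDS values is what allows children of different ages and sexes to be compared on a common scale, as in the results reported above.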
NN/LM NER Healthy Community, Community of Interest Final Report (May 1, 2011 - April 30, 2013)
The NN/LM New England Region’s Communities of Interest (COI) foster emerging roles for librarians in dynamic themes in the provision of health information. Members of the Communities of Interest share ideas, knowledge, and experiences to help each other improve their library’s services. The Communities of Interest focus on six themes: eScience, Healthcare Workforce, Health Literacy, Healthy Communities, HealthIT, and Knowledge Management. These themes were identified by the NN/LM New England Region at a Town Hall Meeting as priorities for professional development and collaboration. The Communities of Interest host e-learning programs to keep Network Members up-to-date with trends in the profession.
The Healthy Communities COI explores issues related to health information and education outreach to the general public as well as underserved populations. Topics include planning, implementing, and evaluating community outreach activities, communicating health information to patients and the public, and helping your institution provide community benefit. The COIs are facilitated by NER staff and led by a Network Member. The Healthy Communities COI is facilitated by Michelle Eberle, Consumer Health Information Coordinator, and led by Deborah Clark, Librarian at Stephens Memorial Hospital. Deborah served as the Leader for Years One and Two. This report summarizes activities from the Community of Interest's first two years.
Linda Cabral, Laura Sefton, and Kathy Muhr on Recruiting People with Mental Health Conditions for Data Collection
Blog post to AEA365, a blog sponsored by the American Evaluation Association (AEA) dedicated to highlighting Hot Tips, Cool Tricks, Rad Resources, and Lessons Learned for evaluators. The American Evaluation Association is an international professional association of evaluators devoted to the application and exploration of program evaluation, personnel evaluation, technology, and many other forms of evaluation. Evaluation involves assessing the strengths and weaknesses of programs, policies, personnel, products, and organizations to improve their effectiveness.
Translational control of mitochondrial energy production mediates neuron morphogenesis
Mitochondrial energy production is a tightly regulated process involving the coordinated transcription of several genes, catalysis of a plethora of posttranslational modifications, and the formation of very large molecular supercomplexes. The regulation of mitochondrial activity is particularly important for the brain, which is a high-energy-consuming organ that depends on oxidative phosphorylation to generate ATP. Here we show that brain mitochondrial ATP production is controlled by the cytoplasmic polyadenylation-induced translation of an mRNA encoding NDUFV2, a key mitochondrial protein. Knockout mice lacking the Cytoplasmic Polyadenylation Element Binding protein 1 (CPEB1) have brain-specific dysfunctional mitochondria and reduced ATP levels, which is due to defective polyadenylation-induced translation of electron transport chain complex I protein NDUFV2 mRNA. This reduced ATP results in defective dendrite morphogenesis of hippocampal neurons both in vitro and in vivo. These and other results demonstrate that CPEB1 control of mitochondrial activity is essential for normal brain development.
Inflammasomes and the Innate Immune Response Against Yersinia Pestis: A Dissertation
Yersinia pestis, the causative agent of plague, is estimated to have claimed the lives of 30-50% of the European population in five years. Although it can now be controlled through antibiotics, there are still lurking dangers of outbreaks from biowarfare and bioterrorism; therefore, ongoing research to further our understanding of its strong virulence factors is necessary for development of new vaccines. Many Gram-negative bacteria, including Y. pseudotuberculosis, the evolutionary ancestor of Y. pestis, produce a hexa-acylated lipid A/LPS which can strongly trigger innate immune responses via activation of Toll-like receptor 4 (TLR4)-MD2. In contrast, Y. pestis grown at 37°C generates a tetra-acylated lipid A/LPS that poorly induces TLR4-mediated immune activation. We have reported that expression of E. coli lpxL in Y. pestis, which lacks a homologue of this gene, forces the biosynthesis of a hexa-acylated LPS, and that this single modification dramatically reduces virulence in wild type mice, but not in mice lacking a functional TLR4. This emphasizes that avoiding activation of innate immunity is important for Y. pestis virulence. It also provides a model in which survival is strongly dependent on innate immune defenses, presenting a unique opportunity for evaluating the relative importance of innate immunity in protection against bacterial infection. TLR signaling is critical for the sensing of pathogens, and one implication of TLR4 engagement is the induction of the pro-forms of the potent inflammatory cytokines IL-1β and IL-18. Therefore, Y. pestis is able to suppress production of these cytokines, which are generated through caspase-1-activating nucleotide-binding domain and leucine-rich repeat (NLR)-containing inflammasomes. For my thesis, I sought to elucidate the role of NLRs and IL-18/IL-1β during bubonic and pneumonic plague infection. Lack of IL-18 signaling in mice led to increased susceptibility to wild type Y. pestis and to an attenuated strain producing a Y. pseudotuberculosis-like hexa-acylated lipid A. I found that the NLRP12, NLRP3 and NLRC4 inflammasomes were important protein complexes in maturing IL-18 and IL-1β during Y. pestis infection, and mice deficient in each of these NLRs were more susceptible to bacterial challenge. NLRC4 and NLRP12 also directed interferon-gamma production via induction of IL-18 against plague, and minimizing inflammasome activation may have been a central factor in evolution of the high virulence of Y. pestis. This is also the first study that elucidated a pro-inflammatory role for NLRP12 during bacterial infection.
Role of the Cytoplasmic Polyadenylation Element Binding Proteins in Neuron: A Dissertation
Genome regulation is an extremely complex phenomenon. There are various mechanisms in place to ensure smooth performance of the organism. Post-transcriptional regulation of gene expression is one such mechanism. Many proteins bind to mRNAs and regulate their translation. In this thesis, I have focused on the Cytoplasmic Polyadenylation Element Binding family of proteins (CPEB1-4), a group of sequence-specific RNA binding proteins important for cell cycle progression, senescence, neuronal function and plasticity. CPEB protein binds mRNAs containing a short Cytoplasmic Polyadenylation Element (CPE) in the 3’ untranslated region (UTR) and regulates the polyadenylation of these mRNAs, thereby controlling their translation. In Chapter II, I have presented my work on the regulation of mitochondrial function by CPEB. CPEB knockout mice have brain-specific defects in mitochondrial function owing to a reduction in the electron transport chain complex I component protein NDUFV2. CPEB controls the translation of the NDUFV2 mRNA and thus affects mitochondrial function. A consequence of this reduced bioenergetics is reduced growth and branching of neurons, again emphasizing the importance of this pathway. Chapter III focuses on the role of CPEB4 in neuronal survival and protection against apoptosis. CPEB4 shuttles between nucleus and cytoplasm and becomes nuclear in response to stimulation of ionotropic glutamate receptors, focal ischemia in vivo, and deprivation of oxygen and glucose in cultured neurons; nuclear CPEB4 affords protection against apoptosis in an ischemia model. The underlying cause of nuclear translocation is a reduction in endoplasmic reticulum calcium levels. These studies give an insight into the function and dynamics of these two RNA binding proteins and provide a better understanding of cellular biology.
