eScholarship@UMMS
Uncovering multiple molecular targets for caffeine using a drug target validation strategy combining A(2A) receptor knockout mice with microarray profiling
Caffeine is the most widely consumed psychoactive substance and has complex pharmacological actions in the brain. In this study, we employed a novel drug target validation strategy to uncover the multiple molecular targets of caffeine using combined A(2A) receptor (A(2A)R) knockouts (KO) and microarray profiling. Caffeine (10 mg/kg) elicited a distinct profile of striatal gene expression in wild-type (WT) mice compared with that elicited by A(2A)R gene deletion or by administering caffeine to A(2A)R KO mice. Thus, A(2A)Rs are required but not sufficient to elicit the striatal gene expression by caffeine (10 mg/kg). Caffeine (50 mg/kg) induced complex expression patterns with three distinct sets of striatal genes: 1) one subset overlapped with those elicited by genetic deletion of A(2A)Rs; 2) the second subset was elicited by caffeine in WT as well as A(2A)R KO mice; and 3) the third subset was elicited by caffeine only in A(2A)R KO mice. Furthermore, striatal gene sets elicited by the phosphodiesterase (PDE) inhibitor rolipram and the GABA(A) receptor antagonist bicuculline overlapped with the distinct subsets of striatal genes elicited by caffeine (50 mg/kg) administered to A(2A)R KO mice. Finally, Gene Set Enrichment Analysis revealed that adipocyte differentiation/insulin signaling is highly enriched in the striatal gene sets elicited by both low and high doses of caffeine. The identification of these distinct striatal gene populations and their corresponding multiple molecular targets, including A(2A)R, non-A(2A)R (possibly A(1)Rs and pathways associated with PDE and GABA(A)R) and their interactions, and the cellular pathways affected by low and high doses of caffeine, provides molecular insights into the acute pharmacological effects of caffeine in the brain.
Genomic binding profiles of functionally distinct RNA polymerase III transcription complexes in human cells
Genome-wide occupancy profiles of five components of the RNA polymerase III (Pol III) machinery in human cells identified the expected tRNA and noncoding RNA targets and revealed many additional Pol III-associated loci, mostly near short interspersed elements (SINEs). Several genes are targets of an alternative transcription factor IIIB (TFIIIB) containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike nonexpressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner.
Optical recognition of converted DNA nucleotides for single-molecule DNA sequencing using nanopore arrays
We demonstrate the feasibility of a nanopore-based single-molecule DNA sequencing method, which employs multicolor readout. Target DNA is converted according to a binary code, which is recognized by molecular beacons with two types of fluorophores. Solid-state nanopores are then used to sequentially strip off the beacons, leading to a series of detectable photon bursts, at high speed. We show that signals from multiple nanopores can be detected simultaneously, allowing straightforward parallelization to large nanopore arrays.
Mutational analysis of the latency-associated nuclear antigen DNA-binding domain of Kaposi's sarcoma-associated herpesvirus reveals structural conservation among gammaherpesvirus origin-binding proteins
The latency-associated nuclear antigen (LANA) of Kaposi's sarcoma-associated herpesvirus functions as an origin-binding protein (OBP) and transcriptional regulator. LANA binds the terminal repeats via the C-terminal DNA-binding domain (DBD) to support latent DNA replication. To date, the structure of LANA has not been solved. Sequence alignments among OBPs of gammaherpesviruses have revealed that the C terminus of LANA is structurally related to EBNA1, the OBP of Epstein-Barr virus. Based on secondary structure predictions for LANA(DBD) and published structures of EBNA1(DBD), this study used bioinformatics tools to model a putative structure for LANA(DBD) bound to DNA. To validate the predicted model, 38 mutants targeting the most conserved motifs, namely three alpha-helices and a conserved proline loop, were constructed and functionally tested. In agreement with data for EBNA1, residues in helices 1 and 2 mainly contributed to sequence-specific DNA binding and replication activity, whilst mutations in helix 3 affected replication activity and multimer formation. Additionally, several mutants were isolated with discordant phenotypes, which may aid further studies into LANA function. In summary, these data suggest that the secondary and tertiary structures of LANA and EBNA1 DBDs are conserved and are critical for (i) sequence-specific DNA binding, (ii) multimer formation, (iii) LANA-dependent transcriptional repression, and (iv) DNA replication.
Sequence features that drive human promoter function and tissue specificity
Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcription factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
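As an illustration of the AUC metric quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores, using the rank-based formulation (the probability that a randomly chosen active promoter outscores a randomly chosen inactive one). The function name `roc_auc` and the toy labels and scores are hypothetical and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied scores count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active promoter, 0 = inactive promoter.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the 91% figure is reported.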
Target RNA-directed trimming and tailing of small silencing RNAs
In Drosophila, microRNAs (miRNAs) typically guide Argonaute1 to repress messenger RNA (mRNA), whereas small interfering RNAs (siRNAs) guide Argonaute2 to destroy viral and transposon RNA. Unlike siRNAs, miRNAs rarely form extensive numbers of base pairs to the mRNAs they regulate. We find that extensive complementarity between a target RNA and an Argonaute1-bound miRNA triggers miRNA tailing and 3'-to-5' trimming. In flies, Argonaute2-bound small RNAs, but not those bound to Argonaute1, bear a 2'-O-methyl group at their 3' ends. This modification blocks target-directed small RNA remodeling: in flies lacking Hen1, the enzyme that adds the 2'-O-methyl group, Argonaute2-associated siRNAs are tailed and trimmed. Target complementarity also affects small RNA stability in human cells. These results provide an explanation for the partial complementarity between animal miRNAs and their targets.
Combinations of affinity-enhancing mutations in a T cell receptor reveal highly nonadditive effects within and between complementarity determining regions and chains
Understanding the energetic and structural response to multiple mutations in a protein-protein interface is a key aspect of rational protein design. Here we investigate the cooperativity of combinations of point mutations of a T cell receptor (TCR) that binds in vivo to HLA-A2 MHC and a viral peptide. The mutations were obtained from two sources: a structure-based design study on the TCR alpha chain (nine mutations) and an in vitro selection study on the TCR beta chain (four mutations). In addition to combining the highest-affinity variants from each chain, we tested other combinations of mutations within and among the chains, for a total of 23 TCR mutants that we measured for binding kinetics to the peptide and major histocompatibility complex. A wide range of binding affinities was observed, from 2- to 1000-fold binding improvement versus that of the wild type, with significant nonadditive effects observed within and between TCR chains. This included an amino acid-dependent cooperative interaction between CDR1 and CDR3 residues that are separated by more than 9 Å in the wild-type complex. When analyzing the kinetics of the mutations, we found that the association rates were primarily responsible for the cooperativity, while the dissociation rates were responsible for the anticooperativity (less-than-additive energetics). On the basis of structural modeling of anticooperative mutants, we determined that side chain clash between proximal mutants likely led to nonadditive binding energies. These results highlight the complex nature of TCR association and binding and will be informative in future design efforts that combine multiple mutant residues.
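The notion of additive versus nonadditive energetics used above can be made concrete with a short worked sketch. It assumes the standard thermodynamic relation between fold improvement in affinity and the change in binding free energy (ddG = -RT ln(fold)), and a temperature of 298.15 K; the function names and the fold values are hypothetical and are not measurements from the study.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_improvement, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) for a fold improvement in affinity.

    Negative values mean tighter binding than the wild type.
    """
    if fold_improvement <= 0:
        raise ValueError("fold improvement must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold_improvement)


def coupling_energy(fold_a, fold_b, fold_ab, temperature_k=298.15):
    """Deviation of the double mutant from the sum of the single mutants.

    Zero means additive, negative means cooperative (more than additive),
    positive means anticooperative (less than additive).
    """
    additive = ddg_from_fold(fold_a, temperature_k) + ddg_from_fold(fold_b, temperature_k)
    return ddg_from_fold(fold_ab, temperature_k) - additive


# Hypothetical example: two 10-fold single mutants combined.
print(coupling_energy(10, 10, 100))   # additive: about 0 kcal/mol
print(coupling_energy(10, 10, 1000))  # cooperative: negative
print(coupling_energy(10, 10, 20))    # anticooperative: positive
```

Because free energies add while fold changes multiply, two additive 10-fold mutations would be expected to give a 100-fold double mutant; departures from that product are what the sketch reports as coupling energy.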
Protein-protein docking benchmark version 4.0
We updated our protein-protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family-family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, Benchmark 4.0 provides 176 unbound-unbound cases that can be used for protein-protein docking method development and assessment. Seventeen of the newly added cases are enzyme-inhibitor complexes, and we found no new antigen-antibody complexes. Classifying the new cases according to expected difficulty for protein-protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/.
Performance of ZDOCK and ZRANK in CAPRI rounds 13-19
We report the performance of the ZDOCK and ZRANK algorithms in CAPRI rounds 13-19 and introduce a novel measure, atom contact frequency (ACF). To compute ACF, we identify the residues that most often make contact with the binding partner in the complete set of ZDOCK predictions for each target. We used ACF to predict the interface of the proteins, which, in combination with the biological data available in the literature, is a valuable addition to our docking pipeline. Furthermore, we incorporated a straightforward and efficient clustering algorithm with two purposes: (1) to determine clusters of similar docking poses (corresponding to energy funnels) and (2) to remove redundancies from the final set of predictions. With these new developments, we achieved at least one acceptable prediction for targets 29 and 36, at least one medium-quality prediction for targets 41 and 42, and at least one high-quality prediction for targets 37 and 40; thus, we succeeded for six out of a total of 12 targets.
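The contact-frequency idea described above can be sketched as a simple counting procedure. This is only an illustration under stated assumptions: each docking prediction is represented as a precomputed set of residue identifiers that contact the binding partner (the contact criterion itself is not specified here), and the names `contact_frequency` and `top_residues`, along with the residue labels, are hypothetical rather than taken from the ZDOCK pipeline.

```python
from collections import Counter


def contact_frequency(predictions):
    """Fraction of predictions in which each residue contacts the partner.

    `predictions` is a list with one collection of residue identifiers per
    docking pose; each residue is counted at most once per pose.
    """
    if not predictions:
        return {}
    counts = Counter()
    for contacts in predictions:
        counts.update(set(contacts))
    total = len(predictions)
    return {residue: n / total for residue, n in counts.items()}


def top_residues(frequencies, k):
    """The k most frequently contacting residues (ties broken by name)."""
    return sorted(frequencies, key=lambda r: (-frequencies[r], r))[:k]


# Hypothetical toy set of four docking poses.
poses = [
    {"A10", "A11"},
    {"A10", "A55"},
    {"A10", "A11", "A90"},
    {"A11"},
]
freqs = contact_frequency(poses)
predicted_interface = top_residues(freqs, 2)
print(predicted_interface)
```

Residues with high frequency across the whole prediction set are then treated as candidate interface residues, which is the use of the measure described in the abstract.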
Paternally induced transgenerational environmental reprogramming of metabolic gene expression in mammals
Epigenetic information can be inherited through the mammalian germline and represents a plausible transgenerational carrier of environmental information. To test whether transgenerational inheritance of environmental information occurs in mammals, we carried out an expression profiling screen for genes in mice that responded to paternal diet. Offspring of males fed a low-protein diet exhibited elevated hepatic expression of many genes involved in lipid and cholesterol biosynthesis and decreased levels of cholesterol esters, relative to the offspring of males fed a control diet. Epigenomic profiling of offspring livers revealed numerous modest (approximately 20%) changes in cytosine methylation depending on paternal diet, including reproducible changes in methylation over a likely enhancer for the key lipid regulator Ppara. These results, in conjunction with recent human epidemiological data, indicate that parental diet can affect cholesterol and lipid metabolism in offspring and define a model system to study environmental reprogramming of the heritable epigenome.
A structure-based benchmark for protein-protein binding affinity
We have assembled a nonredundant set of 144 protein-protein complexes that have high-resolution structures available for both the complexes and their unbound components, and for which dissociation constants have been measured by biophysical methods. The set is diverse in terms of the biological functions it represents, with complexes that involve G-proteins and receptor extracellular domains, as well as antigen/antibody, enzyme/inhibitor, and enzyme/substrate complexes. It is also diverse in terms of the partners' affinity for each other, with K(d) ranging between 10^-5 and 10^-14 M. Nine pairs of entries represent closely related complexes that have a similar structure, but a very different affinity, each pair comprising a cognate and a noncognate assembly. The unbound structures of the component proteins being available, conformation changes can be assessed. They are significant in most of the complexes, and large movements or disorder-to-order transitions are frequently observed. The set may be used to benchmark biophysical models aiming to relate affinity to structure in protein-protein interactions, taking into account the reactants and the conformation changes that accompany the association reaction, instead of just the final product.
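To put the quoted affinity range on an energy scale, the sketch below converts a dissociation constant into a standard binding free energy using the textbook relation dG = RT ln(K(d)), with K(d) expressed relative to a 1 M standard state. The temperature of 298.15 K and the function name `binding_free_energy` are assumptions made for illustration; the benchmark entries may have been measured under other conditions.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("dissociation constant must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# The two ends of the affinity range stated in the abstract.
weakest = binding_free_energy(1e-5)
tightest = binding_free_energy(1e-14)
print(f"K(d) = 1e-5 M  -> dG = {weakest:.1f} kcal/mol")
print(f"K(d) = 1e-14 M -> dG = {tightest:.1f} kcal/mol")
```

Under these assumptions, the nine orders of magnitude in K(d) correspond to a spread of roughly 12 kcal/mol in binding free energy.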
ACT: aggregation and correlation toolbox for analyses of genome tracks
We have implemented aggregation and correlation toolbox (ACT), an efficient, multifaceted toolbox for analyzing continuous signal and discrete region tracks from high-throughput genomic experiments, such as RNA-seq or ChIP-chip signal profiles from the ENCODE and modENCODE projects, or lists of single nucleotide polymorphisms from the 1000 Genomes Project. It is able to generate aggregate profiles of a given track around a set of specified anchor points, such as transcription start sites. It is also able to correlate related tracks and analyze them for saturation, i.e., how much of a certain feature is covered with each new succeeding experiment. The ACT site contains downloadable code in a variety of formats, interactive web servers (for use on small quantities of data), example datasets, documentation and a gallery of outputs. Here, we explain the components of the toolbox in more detail and apply them in various contexts. AVAILABILITY: ACT is available at http://act.gersteinlab.org. CONTACT: pi@gersteinlab.org.
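The aggregation step described above (averaging a signal track in a window around anchor points) can be sketched as follows. This is not ACT's code or interface: the function `aggregate_profile`, the representation of a track as a per-position list, and the choice to skip anchors whose window runs off the track are all assumptions made for illustration, and strand orientation is ignored.

```python
def aggregate_profile(signal, anchors, flank):
    """Mean signal at each offset from -flank to +flank around the anchors.

    `signal` is a list of per-position values, `anchors` a list of positions
    (for example, transcription start sites). Anchors whose window extends
    beyond the track are skipped.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    used = 0
    for anchor in anchors:
        start, end = anchor - flank, anchor + flank + 1
        if start < 0 or end > len(signal):
            continue
        for offset, value in enumerate(signal[start:end]):
            sums[offset] += value
        used += 1
    if used == 0:
        raise ValueError("no anchor has a complete window on the track")
    return [total / used for total in sums]


# Hypothetical toy track with peaks at positions 2 and 7.
track = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
profile = aggregate_profile(track, anchors=[2, 7, 0], flank=1)
print(profile)
```

The resulting list is indexed by offset relative to the anchor, so a peak in the middle of the profile indicates signal centered on the anchor points.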
A machine learning approach for the prediction of protein surface loop flexibility
Proteins often undergo conformational changes when binding to each other. A major fraction of backbone conformational changes involves motion on the protein surface, particularly in loops. Accounting for the motion of protein surface loops represents a challenge for protein-protein docking algorithms. A first step in addressing this challenge is to distinguish protein surface loops that are likely to undergo backbone conformational changes upon protein-protein binding (mobile loops) from those that are not (stationary loops). In this study, we developed a machine learning strategy based on support vector machines (SVMs). Our SVM uses three features of loop residues in the unbound protein structures (Ramachandran angles, crystallographic B-factors, and relative accessible surface area) to distinguish mobile loops from stationary ones. This method yields an average prediction accuracy of 75.3% compared with a random prediction accuracy of 50%, and an average of 0.79 area under the receiver operating characteristic (ROC) curve using cross-validation. Testing the method on an independent dataset, we obtained a prediction accuracy of 70.5%. Finally, we applied the method to 11 complexes that involve members from the Ras superfamily and achieved prediction accuracy of 92.8% for the Ras superfamily proteins and 74.4% for their binding partners.
Molecular basis of a million-fold affinity maturation process in a protein-protein interaction
Protein engineering is becoming increasingly important for pharmaceutical applications where controlling the specificity and affinity of engineered proteins is required to create targeted protein therapeutics. Affinity increases of several thousand-fold are now routine for a variety of protein engineering approaches, and the structural and energetic bases of affinity maturation have been investigated in a number of such cases. Previously, a 3-million-fold affinity maturation process was achieved in a protein-protein interaction composed of a variant T-cell receptor fragment and a bacterial superantigen. Here, we present the molecular basis of this affinity increase. Using X-ray crystallography, shotgun reversion/replacement scanning mutagenesis, and computational analysis, we describe, in molecular detail, a process by which extrainterfacial regions of a protein complex can be rationally manipulated to significantly improve protein engineering outcomes.
Integrating atom-based and residue-based scoring functions for protein-protein docking
Most scoring functions for protein-protein docking algorithms are either atom-based or residue-based, with the former being able to produce higher quality structures and the latter being more tolerant to conformational changes upon binding. Earlier, we developed the ZRANK algorithm for reranking docking predictions, with a scoring function that contained only atom-based terms. Here we combine ZRANK's atom-based potentials with five residue-based potentials published by other labs, as well as an atom-based potential IFACE that we published after ZRANK. We simultaneously optimized the weights for selected combinations of terms in the scoring function, using decoys generated with the protein-protein docking algorithm ZDOCK. We performed rigorous cross validation of the combinations using 96 test cases from a docking benchmark. Judged by the integrative success rate of making 1000 predictions per complex, addition of IFACE and the best residue-based pair potential reduced the number of cases without a correct prediction by 38% and 27% relative to ZDOCK and ZRANK, respectively. Thus combination of residue-based and atom-based potentials into a scoring function can improve performance for protein-protein docking. The resulting scoring function is called IRAD (integration of residue- and atom-based potentials for docking) and is available at http://zlab.umassmed.edu.
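The idea of combining weighted atom-based and residue-based terms to rerank decoys can be sketched as below. The sketch assumes a linear weighted sum in which lower scores are better; the term names, weights, decoy values, and the functions `combined_score` and `rerank` are hypothetical placeholders and do not reproduce the actual IRAD terms or optimized weights.

```python
def combined_score(terms, weights):
    """Weighted sum of scoring terms for one decoy (lower is assumed better)."""
    return sum(weight * terms[name] for name, weight in weights.items())


def rerank(decoys, weights):
    """Decoys sorted from best (lowest combined score) to worst."""
    return sorted(decoys, key=lambda d: combined_score(d["terms"], weights))


# Hypothetical weights and decoys; real weights would be fitted on training cases.
weights = {"atom_term": 1.0, "residue_term": 0.5}
decoys = [
    {"id": "d1", "terms": {"atom_term": -10.0, "residue_term": 2.0}},
    {"id": "d2", "terms": {"atom_term": -8.0, "residue_term": -6.0}},
    {"id": "d3", "terms": {"atom_term": -1.0, "residue_term": 0.0}},
]
ranking = [d["id"] for d in rerank(decoys, weights)]
print(ranking)
```

In this toy case the atom-based term alone would rank d1 first, while the added residue-based term promotes d2, which illustrates how a combined function can change which predictions reach the top of the list.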
Gene set enrichment analysis: performance evaluation and usage guidelines
A central goal of biology is understanding and describing the molecular basis of plasticity: the sets of genes that are combinatorially selected by exogenous and endogenous environmental changes, and the relations among the genes. The most viable current approach to this problem consists of determining whether sets of genes are connected by some common theme, e.g. genes from the same pathway are overrepresented among those whose differential expression in response to a perturbation is most pronounced. There are many approaches to this problem, and the results they produce show a fair amount of dispersion, but they all fall within a common framework consisting of a few basic components. We critically review these components, suggest best practices for carrying out each step, and propose a voting method for meeting the challenge of assessing different methods on a large number of experimental data sets in the absence of a gold standard.
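One common way to test whether a gene set is overrepresented among the most strongly responding genes is a one-sided hypergeometric test, sketched below. It is shown only as an example of the kind of method the review covers, not as the approach the review recommends; the function name `overrepresentation_pvalue` and the gene counts are hypothetical.

```python
from math import comb


def overrepresentation_pvalue(n_genes, set_size, n_selected, n_overlap):
    """One-sided hypergeometric p-value for gene set overrepresentation.

    Probability of seeing at least `n_overlap` members of a gene set of size
    `set_size` among `n_selected` genes drawn without replacement from a
    background of `n_genes`.
    """
    if not (0 <= set_size <= n_genes and 0 <= n_selected <= n_genes):
        raise ValueError("set and selection sizes must lie within the background")
    upper = min(set_size, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(set_size, i) * comb(n_genes - set_size, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes measured, a pathway of 5 genes,
# and 3 pathway members among the 4 most differentially expressed genes.
p_value = overrepresentation_pvalue(20, 5, 4, 3)
print(f"p = {p_value:.4f}")
```

A small p-value suggests the pathway is enriched among the selected genes; in practice the result would also need correction for the number of gene sets tested.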
Community-wide assessment of protein-interface modeling suggests improvements to design methodology
The CAPRI (Critical Assessment of Predicted Interactions) and CASP (Critical Assessment of protein Structure Prediction) experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting that there may be important physical chemistry missing in the energy calculations. A total of 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the nonpolar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a nonbinder.
The 3'-to-5' exoribonuclease Nibbler shapes the 3' ends of microRNAs bound to Drosophila Argonaute1
BACKGROUND: MicroRNAs (miRNAs) are ~22 nucleotide (nt) small RNAs that control development, physiology, and pathology in animals and plants. Production of miRNAs involves the sequential processing of primary hairpin-containing RNA polymerase II transcripts by the RNase III enzymes Drosha in the nucleus and Dicer in the cytoplasm. miRNA duplexes then assemble into Argonaute proteins to form the RNA-induced silencing complex (RISC). In mature RISC, a single-stranded miRNA directs the Argonaute protein to bind partially complementary sequences, typically in the 3' untranslated regions of messenger RNAs, repressing their expression.
RESULTS: Here, we show that after loading into Argonaute1 (Ago1), more than a quarter of all Drosophila miRNAs undergo 3' end trimming by the 3'-to-5' exoribonuclease Nibbler (CG9247). Depletion of Nibbler by RNA interference (RNAi) reveals that miRNAs are frequently produced by Dicer-1 as intermediates that are longer than ~22 nt. Trimming of miRNA 3' ends occurs after removal of the miRNA* strand from pre-RISC and may be the final step in RISC assembly, ultimately enhancing target messenger RNA repression. In vivo, depletion of Nibbler by RNAi causes developmental defects.
CONCLUSIONS: We provide a molecular explanation for the previously reported heterogeneity of miRNA 3' ends and propose a model in which Nibbler converts miRNAs into isoforms that are compatible with the preferred length of Ago1-bound small RNAs.
Epigenetic signatures of autism: trimethylated H3K4 landscapes in prefrontal neurons
CONTEXT: Neuronal dysfunction in cerebral cortex and other brain regions could contribute to the cognitive and behavioral defects in autism.
OBJECTIVE: To characterize epigenetic signatures of autism in prefrontal cortex neurons.
DESIGN: We performed fluorescence-activated sorting and separation of neuronal and nonneuronal nuclei from postmortem prefrontal cortex, digested the chromatin with micrococcal nuclease, and deeply sequenced the DNA from the mononucleosomes with trimethylated H3K4 (H3K4me3), a histone mark associated with transcriptional regulation. Approximately 15 billion base pairs of H3K4me3-enriched sequences were collected from 32 brains.
SETTING: Academic medical center.
PARTICIPANTS: A total of 16 subjects diagnosed as having autism and 16 control subjects ranging in age from 0.5 to 70 years.
MAIN OUTCOME MEASURES: Identification of genomic loci showing autism-associated H3K4me3 changes in prefrontal cortex neurons.
RESULTS: Subjects with autism showed no evidence for generalized disruption of the developmentally regulated remodeling of the H3K4me3 landscape that defines normal prefrontal cortex neurons in early infancy. However, excess spreading of H3K4me3 from the transcription start sites into downstream gene bodies and upstream promoters was observed specifically in neuronal chromatin from 4 of 16 autism cases but not in controls. Variable subsets of autism cases exhibit altered H3K4me3 peaks at numerous genes regulating neuronal connectivity, social behaviors, and cognition, often in conjunction with altered expression of the corresponding transcripts. Autism-associated H3K4me3 peaks were significantly enriched in genes and loci implicated in neurodevelopmental diseases.
CONCLUSIONS: Prefrontal cortex neurons from subjects with autism show changes in chromatin structures at hundreds of loci genome-wide, revealing considerable overlap between genetic and epigenetic risk maps of developmental brain disorders.
Adaptation to P element transposon invasion in Drosophila melanogaster
Transposons evolve rapidly and can mobilize and trigger genetic instability. Piwi-interacting RNAs (piRNAs) silence these genome pathogens, but it is unclear how the piRNA pathway adapts to invasion of new transposons. In Drosophila, piRNAs are encoded by heterochromatic clusters and maternally deposited in the embryo. Paternally inherited P element transposons thus escape silencing and trigger a hybrid sterility syndrome termed P-M hybrid dysgenesis. We show that P-M hybrid dysgenesis activates both P elements and resident transposons and disrupts the piRNA biogenesis machinery. As dysgenic hybrids age, however, fertility is restored, P elements are silenced, and P element piRNAs are produced de novo. In addition, the piRNA biogenesis machinery assembles, and resident elements are silenced. Significantly, resident transposons insert into piRNA clusters, and these new insertions are transmitted to progeny, produce novel piRNAs, and are associated with reduced transposition. P element invasion thus triggers heritable changes in genome structure that appear to enhance transposon silencing.
