prot references

[Bernstein1977Protein] F. C. Bernstein, T. F. Koetzle, G. J. Williams, E. F. Meyer, M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi. The protein data bank: a computer-based archival file for macromolecular structures. J. Mol. Biol., 112(3):535-542, May 1977.
Keywords: Computers; Great Britain; Information Systems; Japan; Protein Conformation; Proteins; United States
[Pepperrell1991Techniques] C. A. Pepperrell and P. Willett. Techniques for the calculation of three-dimensional structural similarity using inter-atomic distances. J Comput Aided Mol Des, 5(5):455-474, Oct 1991.
This paper reports a comparison of several methods for measuring the degree of similarity between pairs of 3-D chemical structures that are represented by inter-atomic distance matrices. The methods that have been tested use the distance information in very different ways and have very different computational requirements. Experiments with 10 small datasets, for which both structural and biological activity data are available, suggest that the most cost-effective technique is based on a mapping procedure that tries to match pairs of atoms, one from each of the molecules that are being compared, that have neighbouring atoms at approximately the same distances.

Keywords: Algorithms, Binding Sites, Chemical, Chemistry, Comparative Study, Computer Simulation, Databases, Factual, Macromolecular Substances, Models, Molecular Conformation, Molecular Structure, Non-U.S. Gov't, Physical, Protein Conformation, Protein Structure, Proteins, Research Support, Structure-Activity Relationship, Tertiary. PMID: 1770381
[Russell1992Multiple] R. B. Russell and G. J. Barton. Multiple protein sequence alignment from tertiary structure comparison: assignment of global and residue confidence levels. Proteins, 14(2):309-323, Oct 1992.
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.

Keywords: Algorithms; Amino Acid Sequence; Animals; Confidence Intervals; Globins; Humans; Molecular Sequence Data; Protein Structure, Tertiary; Sequence Alignment; Sequence Homology, Amino Acid; Serine Endopeptidases; Software
[Henikoff1992Amino] S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA, 89(22):10915-10919, Nov 1992.
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.

Keywords: Algorithms; Amino Acid Sequence; Animals; Caenorhabditis elegans; Drosophila; Lod Score; Mathematics; Molecular Sequence Data; Probability; Proteins; Sequence Homology, Amino Acid; Software
[Sette1994relationship] A. Sette, A. Vitiello, B. Reherman, P. Fowler, R. Nayersina, W. M. Kast, C. J. Melief, C. Oseroff, L. Yuan, J. Ruppert, J. Sidney, M. F. del Guercio, S. Southwood, R. T. Kubo, R. W. Chesnut, H. M. Grey, and F. V. Chisari. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. J. Immunol., 153(12):5586-5592, Dec 1994.
The relationship between binding affinity for HLA class I molecules and immunogenicity of discrete peptide epitopes has been analyzed in two different experimental approaches. In the first approach, the immunogenicity of potential epitopes ranging in MHC binding affinity over a 10,000-fold range was analyzed in HLA-A*0201 transgenic mice. In the second approach, the antigenicity of approximately 100 different hepatitis B virus (HBV)-derived potential epitopes, all carrying A*0201 binding motifs, was assessed by using PBL of acute hepatitis patients. In both cases, it was found that an affinity threshold of approximately 500 nM (preferably 50 nM or less) apparently determines the capacity of a peptide epitope to elicit a CTL response. These data correlate well with class I binding affinity measurements of either naturally processed peptides or previously described T cell epitopes. Taken together, these data have important implications for the selection of epitopes for peptide-based vaccines, and also formally demonstrate the crucial role of determinant selection in the shaping of T cell responses. Because in most (but not all) cases, high affinity peptides seem to be immunogenic, our data also suggest that holes in the functional T cell repertoire, if they exist, may be relatively rare.

Keywords: Amino Acid Sequence; Animals; Cell Line; Cytotoxicity Tests, Immunologic; Epitopes; HLA-A Antigens; Hepatitis B; Hepatitis B Antigens; Humans; Mice; Mice, Transgenic; Molecular Sequence Data; Peptides; Protein Binding; T-Lymphocytes, Cytotoxic
[Sidney1995Several] J. Sidney, M. F. del Guercio, S. Southwood, V. H. Engelhard, E. Appella, H. G. Rammensee, K. Falk, O. Rötzschke, M. Takiguchi, and R. T. Kubo. Several HLA alleles share overlapping peptide specificities. J. Immunol., 154(1):247-259, Jan 1995.
Herein we describe the establishment of assays to measure peptide binding to purified HLA-B*0701, -B*0801, -B*2705, -B*3501-03, -B*5401, -Cw*0401, -Cw*0602, and -Cw*0702 molecules. The binding of known peptide epitopes or naturally processed peptides correlates well with HLA restriction or origin, underscoring the immunologic relevance of these assays. Analysis of the sequences of various HLA class I alleles suggested that alleles with peptide motifs characterized by proline in position 2 and aromatic or hydrophobic residues at their C-terminus shared key consensus residues at positions 9, 63, 66, 67, and 70 (B pocket) and residue 116 (F pocket). Prediction of the peptide-binding specificity of HLA-B*5401, on the basis of this consensus B and F pocket structure, verified this hypothesis and suggested that a relatively large family of HLA-B alleles (which we have defined as the HLA-B7-like supertype) may significantly overlap in peptide binding specificity. Availability of quantitative binding assays allowed verification that, indeed, many (25%) of the peptide ligands carrying proline in position 2 and hydrophobic/aromatic residues at the C-terminus (the B7-like supermotif) were capable of binding at least three of five HLA-B7-like supertype alleles. Identification of epitopes carrying the B7-like supermotif and binding to a family of alleles represented in over 40% of individuals from all major ethnic groups may be of considerable use in the design of peptide vaccines.

Keywords: Alleles; Amino Acid Sequence; Cell Line, Transformed; Consensus Sequence; Epitopes; Genes, MHC Class I; HLA-B Antigens; HLA-C Antigens; Humans; Molecular Sequence Data; Peptide Fragments; Protein Binding; Protein Structure, Tertiary; Structure-Activity Relationship; Substrate Specificity
[Wilkins1996From] M. R. Wilkins, C. Pasquali, R. D. Appel, K. Ou, O. Golaz, J. C. Sanchez, J. X. Yan, A. A. Gooley, G. Hughes, I. Humphery-Smith, K. L. Williams, and D. F. Hochstrasser. From proteins to proteomes: large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Biotechnology (N Y), 14(1):61-65, Jan 1996.
Separation and identification of proteins by two-dimensional (2-D) electrophoresis can be used for protein-based gene expression analysis. In this report single protein spots, from polyvinylidene difluoride blots of micropreparative E. coli 2-D gels, were rapidly and economically identified by matching their amino acid composition, estimated pI and molecular weight against all E. coli entries in the SWISS-PROT database. Thirty proteins from an E. coli 2-D map were analyzed and identities assigned. Three of the proteins were unknown. By protein sequencing analysis, 20 of the 27 proteins were correctly identified. Importantly, correct identifications showed unambiguous "correct" score patterns, while incorrect protein identifications also showed distinctive score patterns, indicating that protein must be identified by other means. These techniques allow large-scale screening of the protein complement of simple organisms, or tissues in normal and disease states. The computer program described here is accessible via the World Wide Web at URL address (http://expasy.hcuge.ch/).

Keywords: Amino Acids; Bacterial Proteins; Blood Proteins; Databases, Factual; Electrophoresis, Gel, Two-Dimensional; Escherichia coli; Humans; Microchemistry; Molecular Weight; Multienzyme Complexes; Proteins; Reproducibility of Results; Software; Time Factors
[Sidney1996Definition] J. Sidney, H. M. Grey, S. Southwood, E. Celis, P. A. Wentworth, M. F. del Guercio, R. T. Kubo, R. W. Chesnut, and A. Sette. Definition of an HLA-A3-like supermotif demonstrates the overlapping peptide-binding repertoires of common HLA molecules. Hum Immunol, 45(2):79-93, Feb 1996.
An HLA-A3-like supertype (minimally comprised of products from the HLA class I alleles A3, A11, A31, A*3301, and A*6801) has been defined on the basis of (a) structural similarities in the antigen-binding groove, (b) shared main anchor peptide-binding motifs, (c) the identification of peptides cross-reacting with most or all of these molecules, and (d) the definition of an A3-like supermotif that efficiently predicts highly cross-reactive peptides. Detailed secondary anchor maps for A3, A11, A31, A*3301, and A*6801 are also described. The biologic relevance of the A3-like supertype is indicated by the fact that high frequencies of the A3-like supertype alleles are conserved in all major ethnic groups. Because A3-like supertype alleles are found in most major HLA evolutionary lineages, possibly a reflection of common ancestry, the A3-like supermotif might in fact represent a primeval human HLA class I peptide-binding specificity. It is also possible that these phenomena might be related to optimal exploitation of the peptide specificity by human TAP molecules. The grouping of HLA alleles into supertypes on the basis of their overlapping peptide-binding repertoires represents an alternative to serologic or phylogenetic classification.

Keywords: Alleles; Amino Acid Sequence; Cell Line, Transformed; Cross Reactions; HLA Antigens; HLA-A3 Antigen; HLA-B Antigens; Haplotypes; Humans; Molecular Sequence Data; Peptide Fragments; Protein Binding; Structure-Activity Relationship
[Rarey1996Placement] M. Rarey, S. Wefing, and T. Lengauer. Placement of medium-sized molecular fragments into active sites of proteins. J Comput Aided Mol Des, 10(1):41-54, Feb 1996.
We present an algorithm for placing molecular fragments into the active site of a receptor. A molecular fragment is defined as a connected part of a molecule containing only complete ring systems. The algorithm is part of a docking tool, called FLEXX, which is currently under development at GMD. The overall goal is to provide means of automatically computing low-energy conformations of the ligand within the active site, with an accuracy approaching the limitations of experimental methods for resolving molecular structures and within a run time that allows for docking large sets of ligands. The methods by which we plan to achieve this goal are the explicit exploitation of molecular flexibility of the ligand and the incorporation of physicochemical properties of the molecules. The algorithm for fragment placement, which is the topic of this paper, is based on pattern recognition techniques and is able to predict a small set of possible positions of a molecular fragment with low flexibility within seconds on a workstation. In most cases, a placement with rms deviation below 1.0 Å with respect to the X-ray structure is found among the 10 highest ranking solutions, assuming that the receptor is given in the bound conformation.

Keywords: Algorithms; Binding Sites; Databases, Factual; Ligands; Models, Chemical; Peptide Fragments, chemistry; Proteins, chemistry; Software
[Rarey1996fast] M. Rarey, B. Kramer, T. Lengauer, and G. Klebe. A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol., 261(3):470-489, Aug 1996.
We present an automatic method for docking organic ligands into protein binding sites. The method can be used in the design process of specific protein ligands. It combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand. If the ligand is flexible, it can adopt a large variety of different conformations. Each such minimum in conformational space presents a potential candidate for the conformation of the ligand in the complexed state. Our docking method samples the conformation space of the ligand on the basis of a discrete model and uses a tree-search technique for placing the ligand incrementally into the active site. For placing the first fragment of the ligand into the protein, we use hashing techniques adapted from computer vision. The incremental construction algorithm is based on a greedy strategy combined with efficient methods for overlap detection and for the search of new interactions. We present results on 19 complexes of which the binding geometry has been crystallographically determined. All considered ligands are docked in at most three minutes on a current workstation. The experimentally observed binding mode of the ligand is reproduced with 0.5 to 1.2 Å rms deviation. It is almost always found among the highest-ranking conformations computed.

Keywords: Aldehyde Reductase, Algorithms, Amiloride, Aminoimidazole Carboxamide, Animals, Arabinose, Automation, Binding Sites, Carbonic Anhydrases, Computational Biology, Computer Simulation, Concanavalin A, Crystallography, Databases, Drug Design, Drug Evaluation, Enzyme Inhibitors, Factual, Folic Acid, Folic Acid Antagonists, Fructose-Bisphosphatase, Humans, Internet, Ligands, Methotrexate, Models, Molecular, Non-U.S. Gov't, Pancreatic Elastase, Pentamidine, Pliability, Point Mutation, Preclinical, Protein Binding, Protein Conformation, Proteins, Reproducibility of Results, Research Support, Ribonucleosides, Software, Tetrahydrofolate Dehydrogenase, Thermolysin, Time Factors, Trypsin, X-Ray. PMID: 8780787
[Kristiansen1996database] K. Kristiansen, S. G. Dahl, and O. Edvardsen. A database of mutants and effects of site-directed mutagenesis experiments on G protein-coupled receptors. Proteins, 26(1):81-94, Sep 1996.
A database system and computer programs for storage and retrieval of information about guanine nucleotide-binding protein (G protein)-coupled receptor mutants and associated biological effects have been developed. Mutation data on the receptors were collected from the literature and a database of mutants and effects of mutations was developed. The G protein-coupled receptor, family A, point mutation database (GRAP) provides detailed information on ligand-binding and signal transduction properties of more than 2130 receptor mutants. The amino acid sequences of receptors for which mutation experiments have been reported were aligned, and from this alignment mutation data may be retrieved. Alternatively, a search form allowing detailed specification of which mutants to retrieve may be used, for example, to search for specific amino acid substitutions, substitutions in specific protein domains or reported biological effects. Furthermore, ligand and bibliographic oriented queries may be performed. GRAP is available on the Internet (URL: http://www-grap.fagmed.uit.no/GRAP/homepage.html) using the World-Wide Web system.

Keywords: Amino Acid Sequence; Computer Communication Networks; Computers; GTP-Binding Proteins; Information Systems; Molecular Sequence Data; Mutagenesis, Site-Directed; Mutation; Receptors, Cell Surface; Sequence Alignment
[Nielsen1997Identification] H. Nielsen, J. Engelbrecht, S. Brunak, and G. von Heijne. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10(1):1-6, 1997.
[Luo1997Mammalian] Y. Luo, A. Batalao, H. Zhou, and L. Zhu. Mammalian two-hybrid system: a complementary approach to the yeast two-hybrid system. Biotechniques, 22(2):350-352, Feb 1997.
Here we demonstrate the use of a mammalian two-hybrid system to study protein-protein interactions. Like the yeast two-hybrid system, this is a genetic, in vivo assay based on the reconstitution of the function of a transcriptional activator. In this system, one protein of interest is expressed as a fusion to the Gal4 DNA-binding domain and another protein is expressed as a fusion to the activation domain of the VP16 protein of the herpes simplex virus. The vectors that express these fusion proteins are cotransfected with a reporter chloramphenicol acetyltransferase (CAT) vector into a mammalian cell line. The reporter plasmid contains a cat gene under the control of five consensus Gal4 binding sites. If the two fusion proteins interact, there will be a significant increase in expression of the cat reporter gene. Previously, it was reported that mouse p53 antitumor protein and simian virus 40 large T antigen interact in a yeast two-hybrid system. Using a mammalian two-hybrid system, we were able to independently confirm this interaction. The mammalian two-hybrid system can be used as a complementary approach to verify protein-protein interactions detected by a yeast two-hybrid system screening. In addition, the mammalian two-hybrid system has two main advantages: (i) Assay results can be obtained within 48 h of transfection, and (ii) protein interactions in mammalian cells may better mimic actual in vivo interactions.

Keywords: Antigens, Polyomavirus Transforming; Binding Sites; Chloramphenicol O-Acetyltransferase; DNA; DNA-Binding Proteins; Fungal Proteins; Genes, Reporter; Genetic Vectors; Hela Cells; Herpes Simplex Virus Protein Vmw65; Humans; Promoter Regions, Genetic; Recombinant Fusion Proteins; Saccharomyces cerevisiae Proteins; Simian virus 40; Transcription Factors; Transfection; Tumor Suppressor Protein p53
[Jones1997Development] G. Jones, P. Willett, R. C. Glen, A. R. Leach, and R. Taylor. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol., 267(3):727-748, Apr 1997.
Prediction of small molecule binding modes to macromolecules of known three-dimensional structure is a problem of paramount importance in rational drug design (the "docking" problem). We report the development and validation of the program GOLD (Genetic Optimisation for Ligand Docking). GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique resulting in a substantial increase in the reliability and the applicability of the algorithm. The advanced algorithm has been tested on a dataset of 100 complexes extracted from the Brookhaven Protein DataBank. When used to dock the ligand back into the binding site, GOLD achieved a 71% success rate in identifying the experimental binding mode.

Keywords: Algorithms, Binding Sites, Computer Simulation, Crystallography, Genetic, Humans, Ligands, Models, Molecular, NADP, Protein Binding, Protein Conformation, Proteins, Tetrahydrofolate Dehydrogenase, X-Ray. PMID: 9126849
[Gulukota1997Two] K. Gulukota, J. Sidney, A. Sette, and C. DeLisi. Two complementary methods for predicting peptides binding major histocompatibility complex molecules. J. Mol. Biol., 267(5):1258-1267, Apr 1997.
Peptides that bind to major histocompatibility complex products (MHC) are known to exhibit certain sequence motifs which, though common, are neither necessary nor sufficient for binding: MHCs bind certain peptides that do not have the characteristic motifs and only about 30% of the peptides having the required motif, bind. In order to develop and test more accurate methods we measured the binding affinity of 463 nonamer peptides to HLA-A2.1. We describe two methods for predicting whether a given peptide will bind to an MHC and apply them to these peptides. One method is based on simulating a neural network and another, called the polynomial method, is based on statistical parameter estimation assuming independent binding of the side-chains of residues. We compare these methods with each other and with standard motif-based methods. The two methods are complementary, and both are superior to sequence motifs. The neural net is superior to simple motif searches in eliminating false positives. Its behavior can be coarsely tuned to the strength of binding desired and it is extendable in a straightforward fashion to other alleles. The polynomial method, on the other hand, has high sensitivity and is a superior method for eliminating false negatives. We discuss the validity of the independent binding assumption in such predictions.

Keywords: Artificial Intelligence; Computing Methodologies; HLA-A2 Antigen; Neural Networks (Computer); Oligopeptides; Protein Binding; Reproducibility of Results
[Schneider1998Artificial] G. Schneider and P. Wrede. Artificial neural networks for computer-based molecular design. Prog Biophys Mol Biol, 70(3):175-222, 1998.
The theory of artificial neural networks is briefly reviewed focusing on supervised and unsupervised techniques which have great impact on current chemical applications. An introduction to molecular descriptors and representation schemes is given. In addition, worked examples of recent advances in this field are highlighted and pioneering publications are discussed. Applications of several types of artificial neural networks to compound classification, modelling of structure-activity relationships, biological target identification, and feature extraction from biopolymers are presented and compared to other techniques. Advantages and limitations of neural networks for computer-aided molecular design and sequence analysis are discussed.

Keywords: Algorithms, Amino Acid Sequence, Amino Acids, Animals, Artificial Intelligence, Automated, Bacterial, Bacterial Proteins, Bicuculline, Binding Sites, Biological, Biological Availability, Blood Proteins, Blood-Brain Barrier, Cation Transport Proteins, Cats, Cell Membrane Permeability, Chemical, Chemistry, Cluster Analysis, Combinatorial Chemistry Techniques, Comparative Study, Computational Biology, Computer Simulation, Computer Systems, Computer-Aided Design, Computer-Assisted, Computing Methodologies, DNA-Binding Proteins, Databases, Dogs, Drug Design, Electric Stimulation, Electromyography, Enzyme Inhibitors, Ether-A-Go-Go Potassium Channels, Excitatory Amino Acid Antagonists, Factual, False Positive Reactions, Forecasting, Forelimb, GABA Antagonists, Gene Expression Profiling, Genome, Glutamic Acid, Humans, Hydrogen Bonding, Image Enhancement, Image Interpretation, Image Processing, Information Storage and Retrieval, Iontophoresis, Kynurenic Acid, Least-Squares Analysis, Linear Models, Liver, Markov Chains, Metabolic Clearance Rate, Metalloendopeptidases, Microelectrodes, Models, Molecular, Molecular Conformation, Molecular Sequence Data, Molecular Structure, Motor Cortex, Movement, Multivariate Analysis, Nerve Net, Neural Networks (Computer), Neuropeptides, Non-U.S. Gov't, Nonlinear Dynamics, Pattern Recognition, Pharmaceutical, Pharmaceutical Preparations, Pharmacokinetics, Phylogeny, Potassium Channels, Predictive Value of Tests, Protein Interaction Mapping, Protein Sorting Signals, Protein Structure, Proteins, Rats, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Shoulder, Signal Processing, Software, Statistical, Stereotaxic Techniques, Structure-Activity Relationship, Terminology, Tertiary, Trans-Activators, Voltage-Gated, Zinc, 9830312
[Poggio1998Sparse] T. Poggio and F. Girosi. A sparse representation for function approximation. Neural Comput, 10(6):1445-1454, Jul 1998.
We derive a new general representation for a function as a linear combination of local correlation kernels at optimal sparse locations (and scales) and characterize its relation to principal component analysis, regularization, sparsity principles, and support vector machines.

Keywords: Algorithms, Automated, Biometry, Computers, DNA, Databases, Factual, Fungal, Fungal Proteins, GTP-Binding Proteins, Gene Expression, Genes, Learning, Markov Chains, Models, Neural Networks (Computer), Neurological, Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Hybridization, Open Reading Frames, P.H.S., Pattern Recognition, Protein, Protein Structure, Proteins, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Sequence Alignment, Sequence Analysis, Software, Statistical, Tertiary, U.S. Gov't. PMID: 9698352
[Kononen1998Tissue] J. Kononen, L. Bubendorf, A. Kallioniemi, M. Bärlund, P. Schraml, S. Leighton, J. Torhorst, M. J. Mihatsch, G. Sauter, and O. P. Kallioniemi. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med, 4(7):844-847, Jul 1998.
Many genes and signalling pathways controlling cell proliferation, death and differentiation, as well as genomic integrity, are involved in cancer development. New techniques, such as serial analysis of gene expression and cDNA microarrays, have enabled measurement of the expression of thousands of genes in a single experiment, revealing many new, potentially important cancer genes. These genome screening tools can comprehensively survey one tumor at a time; however, analysis of hundreds of specimens from patients in different stages of disease is needed to establish the diagnostic, prognostic and therapeutic importance of each of the emerging cancer gene candidates. Here we have developed an array-based high-throughput technique that facilitates gene expression and copy number surveys of very large numbers of tumors. As many as 1000 cylindrical tissue biopsies from individual tumors can be distributed in a single tumor tissue microarray. Sections of the microarray provide targets for parallel in situ detection of DNA, RNA and protein targets in each specimen on the array, and consecutive sections allow the rapid analysis of hundreds of molecular markers in the same set of specimens. Our detection of six gene amplifications as well as p53 and estrogen receptor expression in breast cancer demonstrates the power of this technique for defining new subgroups of tumors.

Keywords: Animals; Breast Neoplasms, genetics/metabolism/pathology; Cyclin D1, genetics/metabolism; Female; Genetic Techniques; Humans; Immunoenzyme Techniques; In Situ Hybridization, Fluorescence; Mice; Oncogene Proteins v-myb; Proto-Oncogene Proteins c-myc, genetics/metabolism; Rabbits; Receptor, erbB-2, genetics/metabolism; Receptors, Estrogen, genetics/metabolism; Retroviridae Proteins, Oncogenic, genetics/metabolism; Tumor Markers, Biological, genetics/metabolism; Tumor Suppressor Protein p53, genetics/metabolism
[Girosi1998Equivalence] F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Comput, 10(6):1455-1480, Jul 1998.
This article shows a relationship between two different approximation techniques: the support vector machines (SVM), proposed by V. Vapnik (1995) and a sparse approximation scheme that resembles the basis pursuit denoising algorithm (Chen, 1995; Chen, Donoho, and Saunders, 1995). SVM is a technique that can be derived from the structural risk minimization principle (Vapnik, 1982) and can be used to estimate the parameters of several different approximation schemes, including radial basis functions, algebraic and trigonometric polynomials, B-splines, and some forms of multilayer perceptrons. Basis pursuit denoising is a sparse approximation technique in which a function is reconstructed by using a small number of basis functions chosen from a large set (the dictionary). We show that if the data are noiseless, the modified version of basis pursuit denoising proposed in this article is equivalent to SVM in the following sense: if applied to the same data set, the two techniques give the same solution, which is obtained by solving the same quadratic programming problem. In the appendix, we present a derivation of the SVM technique in one framework of regularization theory, rather than statistical learning theory, establishing a connection between SVM, sparse approximation, and regularization theory.

Keywords: Algorithms, Automated, Biometry, Computers, DNA, Databases, Factual, Fungal, Fungal Proteins, GTP-Binding Proteins, Gene Expression, Genes, Learning, Markov Chains, Models, Neural Networks (Computer), Neurological, Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Hybridization, Open Reading Frames, P.H.S., Pattern Recognition, Protein, Protein Structure, Proteins, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Sequence Alignment, Sequence Analysis, Software, Statistical, Tertiary, U.S. Gov't. PMID: 9698353
[Early1998Polychemotherapy] Early Breast Cancer Trialists’ Collaborative Group. Polychemotherapy for early breast cancer: an overview of the randomised trials. Lancet, 352(9132):930-942, Sep 1998.
There have been many randomised trials of adjuvant prolonged polychemotherapy among women with early breast cancer, and an updated overview of their results is presented.In 1995, information was sought on each woman in any randomised trial that began before 1990 and involved treatment groups that differed only with respect to the chemotherapy regimens that were being compared. Analyses involved about 18,000 women in 47 trials of prolonged polychemotherapy versus no chemotherapy, about 6000 in 11 trials of longer versus shorter polychemotherapy, and about 6000 in 11 trials of anthracycline-containing regimens versus CMF (cyclophosphamide, methotrexate, and fluorouracil).For recurrence, polychemotherapy produced substantial and highly significant proportional reductions both among women aged under 50 at randomisation (35% [SD 4] reduction; 2p<0.00001) and among those aged 50-69 (20% [SD 3] reduction; 2p<0.00001); few women aged 70 or over had been studied. For mortality, the reductions were also significant both among women aged under 50 (27% [SD 5] reduction; 2p<0.00001) and among those aged 50-69 (11% [SD 3] reduction; 2p=0.0001). The recurrence reductions emerged chiefly during the first 5 years of follow-up, whereas the difference in survival grew throughout the first 10 years. After standardisation for age and time since randomisation, the proportional reductions in risk were similar for women with node-negative and node-positive disease. Applying the proportional mortality reduction observed in all women aged under 50 at randomisation would typically change a 10-year survival of 71% for those with node-negative disease to 78% (an absolute benefit of 7%), and of 42% for those with node-positive disease to 53% (an absolute benefit of 11%). 
The smaller proportional mortality reduction observed in all women aged 50-69 at randomisation would translate into smaller absolute benefits, changing a 10-year survival of 67% for those with node-negative disease to 69% (an absolute gain of 2%) and of 46% for those with node-positive disease to 49% (an absolute gain of 3%). The age-specific benefits of polychemotherapy appeared to be largely irrespective of menopausal status at presentation, oestrogen receptor status of the primary tumour, and of whether adjuvant tamoxifen had been given. In terms of other outcomes, there was a reduction of about one-fifth (2p=0.05) in contralateral breast cancer, which has already been included in the analyses of recurrence, and no apparent adverse effect on deaths from causes other than breast cancer (death rate ratio 0.89 [SD 0.09]). The directly randomised comparisons of longer versus shorter durations of polychemotherapy did not indicate any survival advantage with the use of more than about 3-6 months of polychemotherapy. By contrast, directly randomised comparisons did suggest that, compared with CMF alone, the anthracycline-containing regimens studied produced somewhat greater effects on recurrence (2p=0.006) and mortality (69% vs 72% 5-year survival; log-rank 2p=0.02). But this comparison is one of many that could have been selected for emphasis, the 99% CI reaches zero, and the results of several of the relevant trials are not yet available. Some months of adjuvant polychemotherapy (eg, with CMF or an anthracycline-containing regimen) typically produces an absolute improvement of about 7-11% in 10-year survival for women aged under 50 at presentation with early breast cancer, and of about 2-3% for those aged 50-69 (unless their prognosis is likely to be extremely good even without such treatment).
Treatment decisions involve consideration not only of improvements in cancer recurrence and survival but also of adverse side-effects of treatment, and this report makes no recommendations as to who should or should not be treated.

Keywords: Adult; Aged; Antineoplastic Combined Chemotherapy Protocols, therapeutic use; Breast Neoplasms, chemistry/drug therapy/mortality; Chemotherapy, Adjuvant; Drug Administration Schedule; Female; Humans; Lymphatic Metastasis; Menopause; Middle Aged; Neoplasm Recurrence, Local; Randomized Controlled Trials as Topic; Receptors, Estrogen, analysis; Tamoxifen, administration & dosage
[Pontil1998Properties] M. Pontil and A. Verri. Properties of support vector machines. Neural Comput, 10(4):955-74, May 1998. [ bib ]
Support vector machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed support vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this article, we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending on only the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that m + 1 SVs are usually sufficient to determine the decision surface fully. For relatively small m, this latter result leads to a consistent reduction of the SV number.

Keywords: Algorithms, Artificial Intelligence, Automated, Biometry, Computers, DNA, Databases, Factual, Fungal, Fungal Proteins, GTP-Binding Proteins, Gene Expression, Genes, Learning, Linear Models, Markov Chains, Mathematics, Models, Neural Networks (Computer), Neurological, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Nucleic Acid Hybridization, Open Reading Frames, P.H.S., Pattern Recognition, Protein, Protein Structure, Proteins, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Sequence Alignment, Sequence Analysis, Software, Statistical, Tertiary, U.S. Gov't, 9573414
[Strahl1999Methylation] B. D. Strahl, R. Ohba, R. G. Cook, and C. D. Allis. Methylation of histone H3 at lysine 4 is highly conserved and correlates with transcriptionally active nuclei in Tetrahymena. Proc Natl Acad Sci U S A, 96(26):14967-14972, Dec 1999. [ bib ]
Studies into posttranslational modifications of histones, notably acetylation, have yielded important insights into the dynamic nature of chromatin structure and its fundamental role in gene expression. The roles of other covalent histone modifications remain poorly understood. To gain further insight into histone methylation, we investigated its occurrence and pattern of site utilization in Tetrahymena, yeast, and human HeLa cells. In Tetrahymena, transcriptionally active macronuclei, but not transcriptionally inert micronuclei, contain a robust histone methyltransferase activity that is highly selective for H3. Microsequence analyses of H3 from Tetrahymena, yeast, and HeLa cells indicate that lysine 4 is a highly conserved site of methylation, which to date, is the major site detected in Tetrahymena and yeast. These data document a nonrandom pattern of H3 methylation that does not overlap with known acetylation sites in this histone. In as much as H3 methylation at lysine 4 appears to be specific to macronuclei in Tetrahymena, we suggest that this modification pattern plays a facilitatory role in the transcription process in a manner that remains to be determined. Consistent with this possibility, H3 methylation in yeast occurs preferentially in a subpopulation of H3 that is preferentially acetylated.

Keywords: Acetyltransferases, metabolism; Amino Acid Sequence; Animals; Cell Nucleus, metabolism; Hela Cells; Histone Acetyltransferases; Histone-Lysine N-Methyltransferase; Histones, metabolism; Humans; Lysine, analogs & derivatives/metabolism; Methylation; Methyltransferases, metabolism; Molecular Sequence Data; Protein Methyltransferases; Protein Processing, Post-Translational; Saccharomyces cerevisiae Proteins; Species Specificity; Tetrahymena thermophila; Transcription, Genetic; Yeasts
[Rigaut1999generic] G. Rigaut, A. Shevchenko, B. Rutz, M. Wilm, M. Mann, and B. Séraphin. A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol, 17(10):1030-1032, Oct 1999. [ bib | DOI | http ]
We have developed a generic procedure to purify proteins expressed at their natural level under native conditions using a novel tandem affinity purification (TAP) tag. The TAP tag allows the rapid purification of complexes from a relatively small number of cells without prior knowledge of the complex composition, activity, or function. Combined with mass spectrometry, the TAP strategy allows for the identification of proteins interacting with a given target protein. The TAP method has been tested in yeast but should be applicable to other cells or organisms.

Keywords: Affinity Labels; Amino Acid Sequence; Electrophoresis, Polyacrylamide Gel; Methods; Molecular Sequence Data; Proteins; Proteome
[Gygi1999Quantitative] S. P. Gygi, B. Rist, S. A. Gerber, F. Turecek, M. H. Gelb, and R. Aebersold. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol, 17(10):994-999, Oct 1999. [ bib | DOI | http ]
We describe an approach for the accurate quantification and concurrent sequence identification of the individual proteins within complex mixtures. The method is based on a class of new chemical reagents termed isotope-coded affinity tags (ICATs) and tandem mass spectrometry. Using this strategy, we compared protein expression in the yeast Saccharomyces cerevisiae, using either ethanol or galactose as a carbon source. The measured differences in protein expression correlated with known yeast metabolic function under glucose-repressed conditions. The method is redundant if multiple cysteinyl residues are present, and the relative quantification is highly accurate because it is based on stable isotope dilution techniques. The ICAT approach should provide a widely applicable means to compare quantitatively global protein expression in cells and tissues.

Keywords: Affinity Labels; Amino Acid Sequence; Chromatography, Liquid; Isotope Labeling; Mass Spectrometry; Proteins
[Debouck1999DNA] C. Debouck and P. N. Goodfellow. DNA microarrays in drug discovery and development. Nat. Genet., 21(1 Suppl):48-50, Jan 1999. [ bib | DOI | http ]
DNA microarrays can be used to measure the expression patterns of thousands of genes in parallel, generating clues to gene function that can help to identify appropriate targets for therapeutic intervention. They can also be used to monitor changes in gene expression in response to drug treatments. Here, we discuss the different ways in which microarray analysis is likely to affect drug discovery.

Keywords: Agricultural, Alleles, Alternaria, Amino Acid, Amino Acid Chloromethyl Ketones, Amino Acid Sequence, Animal, Animals, Apoptosis, Asthma, Bacteria, Base Sequence, Binding Sites, Biotechnology, Blotting, Bone Density, Bone Matrix, Bone and Bones, CCR5, Camptothecin, Caspases, Cathepsins, Cell Surface, Central America, Chloroplast, Chondrocytes, Chromosome Mapping, Chromosomes, Cloning, Cluster Analysis, Collagen, Comparative Study, Coumarins, Crops, Crystallography, DNA, DNA Primers, Dipeptides, Disease, Disease Models, Drug Design, Drug Evaluation, Drug Industry, Enzyme Activation, Enzyme Inhibitors, Escherichia coli, Evolution, Exons, Expressed Sequence Tags, Female, Fetus, Fluorescent Dyes, Food Microbiology, Founder Effect, GTP-Binding Proteins, Gene Expression, Gene Frequency, Gene Library, Genes, Genetic, Genetic Predisposition to Disease, Genome, Geography, Growth Plate, Haplotypes, Hordeum, Human, Humans, Inclusion Bodies, Injections, Intraperitoneal, Introns, Isatin, Knockout, Male, Membrane Proteins, Messenger, Mice, Models, Molecular, Molecular Sequence Data, Molecular Structure, Mutation, Mycotoxins, Neutrophils, Non-U.S. Gov't, Northern, Oligonucleotide Array Sequence Analysis, Osteoarthritis, Osteochondrodysplasias, Osteoclasts, Osteopetrosis, Pair 15, Phaseolus, Polymorphism, Preclinical, Pregnancy, Promoter Regions (Genetics), Protein Precursors, Proteomics, RNA, Receptors, Recombinant Fusion Proteins, Recombinant Proteins, Research Support, Restriction Fragment Length, Ribosomal Proteins, Sequence Alignment, Sequence Analysis, Sequence Homology, South America, Species Specificity, Splenomegaly, Sulfonamides, Synteny, Tissue Distribution, Transcription, Trichothecenes, X-Ray, 9915501
[Mathews1999Expandeda] D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288(5):911-940, May 1999. [ bib | DOI | http ]
An improved dynamic programming algorithm is reported for RNA secondary structure prediction by free energy minimization. Thermodynamic parameters for the stabilities of secondary structure motifs are revised to include expanded sequence dependence as revealed by recent experiments. Additional algorithmic improvements include reduced search time and storage for multibranch loop free energies and improved imposition of folding constraints. An extended database of 151,503 nt in 955 structures determined by comparative sequence analysis was assembled to allow optimization of parameters not based on experiments and to test the accuracy of the algorithm. On average, the predicted lowest free energy structure contains 73% of known base-pairs when domains of fewer than 700 nt are folded; this compares with 64% accuracy for previous versions of the algorithm and parameters. For a given sequence, a set of 750 generated structures contains one structure that, on average, has 86% of known base-pairs. Experimental constraints, derived from enzymatic and flavin mononucleotide cleavage, improve the accuracy of structure predictions.

Keywords: 16S, 23S, 5S, Affinity, Algorithms, Aluminum Silicates, Amino Acid, Amino Acid Sequence, Amyloidosis, Archaeal, Bacillus, Bacterial, Bacterial Proteins, Bacteriophage T4, Base Sequence, Chloroplast, Chromatography, Circular Dichroism, Comparative Study, Computational Biology, Databases, Electrophoresis, Entropy, Enzyme Stability, Escherichia coli, Factual, Fibroblast Growth Factor 2, Flavin Mononucleotide, Fluorescence, Genetic, Guanidine, Humans, Huntington Disease, Kinetics, Light, Models, Molecular Sequence Data, Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Conformation, P.H.S., Peptides, Phylogeny, Polyacrylamide Gel, Predictive Value of Tests, Protein Binding, Protein Denaturation, Protein Folding, Protein Structure, RNA, Radiation, Recombinant Proteins, Research Support, Ribosomal, Scattering, Secondary, Sequence Homology, Solutions, Spectrometry, Statistical, Temperature, Thermodynamics, Time Factors, Trinucleotide Repeat Expansion, U.S. Gov't, alpha-Amylase, 10329189
[Wilbur2000Boosting] W. J. Wilbur. Boosting naive Bayesian learning on a large subset of MEDLINE. Proc AMIA Symp, pages 918-22, 2000. [ bib ]
We are concerned with the rating of new documents that appear in a large database (MEDLINE) and are candidates for inclusion in a small specialty database (REBASE). The requirement is to rank the new documents as nearly in order of decreasing potential to be added to the smaller database as possible, so as to improve the coverage of the smaller database without increasing the effort of those who manage this specialty database. To perform this ranking task we have considered several machine learning approaches based on the naïve Bayesian algorithm. We find that adaptive boosting outperforms naïve Bayes, but that a new form of boosting which we term staged Bayesian retrieval outperforms adaptive boosting. Staged Bayesian retrieval involves two stages of Bayesian retrieval and we further find that if the second stage is replaced by a support vector machine we again obtain a significant improvement over the strictly Bayesian approach.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Animals, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Brachyura, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Classification, Codon, Colonic Neoplasms, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diabetes Mellitus, Diagnosis, Discriminant Analysis, Discrimination Learning, Electric Conductivity, Electrophysiology, Escherichia coli Proteins, Factual, Feedback, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Genetic Predisposition to Disease, Genomics, Hemolysins, Humans, Indians, Information Storage and Retrieval, Initiator, Ion Channels, Kinetics, Leukemia, Likelihood Functions, Lipid Bilayers, Logistic Models, Lymphocytic, MEDLINE, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Neurological, Nevus, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Normal Distribution, North American, Nucleic Acid Conformation, Oligonucleotide Array Sequence Analysis, Organ Specificity, Organelles, Ovarian Neoplasms, Ovary, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Biosynthesis, Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Sound Spectrography, Statistical, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, Type 2, U.S. Gov't, Vertebrates, 11080018
[Strahl2000language] B. D. Strahl and C. D. Allis. The language of covalent histone modifications. Nature, 403(6765):41-45, Jan 2000. [ bib | DOI | http ]
Histone proteins and the nucleosomes they form with DNA are the fundamental building blocks of eukaryotic chromatin. A diverse array of post-translational modifications that often occur on tail domains of these proteins has been well documented. Although the function of these highly conserved modifications has remained elusive, converging biochemical and genetic evidence suggests functions in several chromatin-based processes. We propose that distinct histone modifications, on one or more tails, act sequentially or in combination to form a 'histone code' that is read by other proteins to bring about distinct downstream events.

Keywords: Acetylation; Amino Acid Sequence; Animals; Chromatin, physiology; Histones, chemistry/metabolism/physiology; Humans; Lysine, physiology; Microtubules, physiology; Models, Biological; Molecular Sequence Data; Phosphorylation; Protein Processing, Post-Translational; Serine, metabolism
[Risau-Gusman2000Generalization] Risau-Gusman and Gordon. Generalization properties of finite-size polynomial support vector machines. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 62(5 Pt B):7092-9, Nov 2000. [ bib ]
The learning properties of finite-size polynomial support vector machines are analyzed in the case of realizable classification tasks. The normalization of the high-order features acts as a squeezing factor, introducing a strong anisotropy in the patterns distribution in feature space. As a function of the training set size, the corresponding generalization error presents a crossover, more or less abrupt depending on the distribution's anisotropy and on the task to be learned, between a fast-decreasing and a slowly decreasing regime. This behavior corresponds to the stepwise decrease found by Dietrich et al. [Phys. Rev. Lett. 82, 2975 (1999)] in the thermodynamic limit. The theoretical results are in excellent agreement with the numerical simulations.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Animals, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Brachyura, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Classification, Codon, Colonic Neoplasms, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diabetes Mellitus, Diagnosis, Discriminant Analysis, Discrimination Learning, Electric Conductivity, Electrophysiology, Escherichia coli Proteins, Factual, Feedback, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Genetic Predisposition to Disease, Genomics, Hemolysins, Humans, Indians, Initiator, Ion Channels, Kinetics, Leukemia, Likelihood Functions, Lipid Bilayers, Logistic Models, Lymphocytic, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Neurological, Nevus, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Normal Distribution, North American, Nucleic Acid Conformation, Oligonucleotide Array Sequence Analysis, Organ Specificity, Organelles, Ovarian Neoplasms, Ovary, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Biosynthesis, Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Sound Spectrography, Statistical, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, Type 2, U.S. Gov't, Vertebrates, 0011102066
[Rea2000Regulation] S. Rea, F. Eisenhaber, D. O'Carroll, B. D. Strahl, Z. W. Sun, M. Schmid, S. Opravil, K. Mechtler, C. P. Ponting, C. D. Allis, and T. Jenuwein. Regulation of chromatin structure by site-specific histone H3 methyltransferases. Nature, 406(6796):593-599, Aug 2000. [ bib | DOI | http ]
The organization of chromatin into higher-order structures influences chromosome function and epigenetic gene regulation. Higher-order chromatin has been proposed to be nucleated by the covalent modification of histone tails and the subsequent establishment of chromosomal subdomains by non-histone modifier factors. Here we show that human SUV39H1 and murine Suv39h1 (mammalian homologues of Drosophila Su(var)3-9 and of Schizosaccharomyces pombe clr4) encode histone H3-specific methyltransferases that selectively methylate lysine 9 of the amino terminus of histone H3 in vitro. We mapped the catalytic motif to the evolutionarily conserved SET domain, which requires adjacent cysteine-rich regions to confer histone methyltransferase activity. Methylation of lysine 9 interferes with phosphorylation of serine 10, but is also influenced by pre-existing modifications in the amino terminus of H3. In vivo, deregulated SUV39H1 or disrupted Suv39h activity modulate H3 serine 10 phosphorylation in native chromatin and induce aberrant mitotic divisions. Our data reveal a functional interdependence of site-specific H3 tail modifications and suggest a dynamic mechanism for the regulation of higher-order chromatin.

Keywords: Amino Acid Sequence; Animals; Catalytic Domain; Chromatin, chemistry/metabolism; Drosophila; Hela Cells; Histone-Lysine N-Methyltransferase; Humans; Lysine, metabolism; Methylation; Methyltransferases, genetics/metabolism; Mice; Molecular Sequence Data; Phosphorylation; Protein Conformation; Protein Methyltransferases; Protein Structure, Tertiary; Recombinant Proteins, metabolism; Repressor Proteins, genetics/metabolism; Sequence Homology, Amino Acid; Serine, metabolism; Substrate Specificity
[Pandey2000Proteomics] A. Pandey and M. Mann. Proteomics to study genes and genomes. Nature, 405:837-846, 2000. [ bib | http | .pdf ]
[Opper2000Gaussian] M. Opper and O. Winther. Gaussian processes for classification: mean-field algorithms. Neural Comput, 12(11):2655-84, Nov 2000. [ bib ]
We derive a mean-field algorithm for binary classification with gaussian processes that is based on the TAP approach originally proposed in statistical physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error, which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler "naive" mean-field theory and support vector machines (SVMs) as limiting cases. For both mean-field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show that one may get state-of-the-art performance by using the leave-one-out estimator for model selection and the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The second result is taken as strong support for the internal consistency of the mean-field approach.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Animals, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Brachyura, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Classification, Colonic Neoplasms, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diabetes Mellitus, Diagnosis, Discriminant Analysis, Discrimination Learning, Electric Conductivity, Electrophysiology, Escherichia coli Proteins, Factual, Feedback, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Genetic Predisposition to Disease, Hemolysins, Humans, Indians, Ion Channels, Kinetics, Leukemia, Likelihood Functions, Lipid Bilayers, Logistic Models, Lymphocytic, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Neurological, Nevus, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Normal Distribution, North American, Nucleic Acid Conformation, Oligonucleotide Array Sequence Analysis, Organ Specificity, Organelles, Ovarian Neoplasms, Ovary, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Sound Spectrography, Statistical, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, Type 2, U.S. Gov't, 11110131
[Lazo2000Combinatorial] J. S. Lazo and P. Wipf. Combinatorial chemistry and contemporary pharmacology. J. Pharmacol. Exp. Ther., 293(3):705-709, Jun 2000. [ bib ]
Both solid- and liquid-phase combinatorial chemistry have emerged as powerful tools for identifying pharmacologically active compounds and optimizing the biological activity of a lead compound. Complementary high-throughput in vitro assays are essential for compound evaluation. Cell-based assays that use optical endpoints permit investigation of a wide variety of functional properties of these compounds including specific intracellular biochemical pathways, protein-protein interactions, and the subcellular localization of targets. Integration of combinatorial chemistry with contemporary pharmacology now represents an important factor in drug discovery and development.

Keywords: Alzheimer Disease, Animals, Antineoplastic Agents, Biological, Bleomycin, Cell Cycle, Cell Cycle Proteins, Cell Death, Cell Line, Cell Nucleus, Cell Shape, Cell Transformation, Combinatorial Chemistry Techniques, Cultured, Drug Delivery Systems, Drug Design, Drug Evaluation, Enzyme Inhibitors, Formazans, Gene Expression, Humans, Inhibitory Concentration 50, Kinetics, Magnetic Resonance Spectroscopy, Mass, Mitochondria, Models, Molecular, Neoplasms, Neoplastic, Non-P.H.S., Non-U.S. Gov't, P.H.S., Paclitaxel, Peptide Library, Pharmaceutical Preparations, Pharmacology, Phosphoprotein Phosphatase, Preclinical, Protease Inhibitors, Protein-Tyrosine-Phosphatase, Research Support, Sensitivity and Specificity, Signal Transduction, Spectrum Analysis, Stereoisomerism, Structure-Activity Relationship, Sulfonic Acids, Tetrazolium Salts, Thiazoles, Toxicity Tests, Tumor, Tumor Cells, U.S. Gov't, cdc25 Phosphatase, 10869367
[Klebe2000Recent] G. Klebe. Recent developments in structure-based drug design. J Mol Med, 78(5):269-281, 2000. [ bib ]
Structure-based design has emerged as a new tool in medicinal chemistry. A prerequisite for this new approach is an understanding of the principles of molecular recognition in protein-ligand complexes. If the three-dimensional structure of a given protein is known, this information can be directly exploited for the retrieval and design of new ligands. Structure-based ligand design is an iterative approach. First of all, it requires the crystal structure or a model derived from the crystal structure of a closely related homolog of the target protein, preferentially complexed with a ligand. This complex unravels the binding mode and conformation of a ligand under investigation and indicates the essential aspects determining its binding affinity. It is then used to generate new ideas about ways of improving an existing ligand or of developing new alternative bonding skeletons. Computational methods supplemented by molecular graphics are applied to assist this step of hypothesis generation. The features of the protein binding pocket can be translated into queries used for virtual computer screening of large compound libraries or to design novel ligands de novo. These initial proposals must be confirmed experimentally. Subsequently they are optimized toward higher affinity and better selectivity. The latter aspect is of utmost importance in defining and controlling the pharmacological profile of a ligand. A prerequisite to tailoring selectivity by rational design is a detailed understanding of molecular parameters determining selectivity. Taking examples from current drug development programs (HIV proteinase, t-RNA transglycosylase, thymidylate synthase, thrombin, and related serine proteinases), we describe recent advances in lead discovery via computer screening, iterative design, and understanding of selectivity discrimination.

Keywords: Animals, Chemistry, Computer Simulation, Cross-Over Studies, Crystallography, Deglutition, Deglutition Disorders, Drug Design, Endoscopy, Enzyme Inhibitors, Female, Fluoroscopy, Glossopharyngeal Nerve, HIV Protease Inhibitors, Horse Diseases, Horses, Male, Models, Molecular, Nerve Block, Non-U.S. Gov't, P.H.S., Pharmaceutical, Proteins, Quantitative Structure-Activity Relationship, Random Allocation, Research Support, Thrombin, Thymidylate Synthase, U.S. Gov't, X-Ray, 10954196
[Gether2000Uncovering] U. Gether. Uncovering molecular mechanisms involved in activation of G protein-coupled receptors. Endocr Rev, 21(1):90-113, Feb 2000. [ bib ]
G protein-coupled, seven-transmembrane segment receptors (GPCRs or 7TM receptors), with more than 1000 different members, comprise the largest superfamily of proteins in the body. Since the cloning of the first receptors more than a decade ago, extensive experimental work has uncovered multiple aspects of their function and challenged many traditional paradigms. However, it is only recently that we are beginning to gain insight into some of the most fundamental questions in the molecular function of this class of receptors. How can, for example, so many chemically diverse hormones, neurotransmitters, and other signaling molecules activate receptors believed to share a similar overall tertiary structure? What is the nature of the physical changes linking agonist binding to receptor activation and subsequent transduction of the signal to the associated G protein on the cytoplasmic side of the membrane and to other putative signaling pathways? The goal of the present review is to specifically address these questions as well as to depict the current awareness about GPCR structure-function relationships in general.

Keywords: Animals; GTP-Binding Proteins; Humans; Ligands; Models, Biological; Molecular Conformation; Receptors, Cell Surface
[Zhu2001Global] H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, T. Mitchell, P. Miller, R. A. Dean, M. Gerstein, and M. Snyder. Global analysis of protein activities using proteome chips. Science, 293(5537):2101-5, Sep 2001. [ bib | DOI | http | .pdf ]
To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.

Keywords: Amino Acid Motifs, Amino Acid Sequence, Calmodulin, Calmodulin-Binding Proteins, Cell Membrane, Cloning, Fungal Proteins, Glucose, Liposomes, Membrane Proteins, Molecular, Molecular Sequence Data, Non-U.S. Gov't, Open Reading Frames, P.H.S., Peptide Library, Phosphatidylcholines, Phosphatidylinositols, Phospholipids, Protein Binding, Proteome, Recombinant Fusion Proteins, Research Support, Saccharomyces cerevisiae, Signal Transduction, Streptavidin, U.S. Gov't, 11474067
[Xue2001Mini-fingerprints] L. Xue, F. L. Stahura, J. W. Godden, and J. Bajorath. Mini-fingerprints detect similar activity of receptor ligands previously recognized only by three-dimensional pharmacophore-based methods. J Chem Inf Comput Sci, 41(2):394-401, 2001. [ bib ]
Mini-fingerprints (MFPs) are short binary bit string representations of molecular structure and properties, composed of few selected two-dimensional (2D) descriptors and a number of structural keys. MFPs were specifically designed to recognize compounds with similar activity. Here we report that MFPs are capable of detecting similar activities of some druglike molecules, including endothelin A antagonists and alpha(1)-adrenergic receptor ligands, the recognition of which was previously thought to depend on the use of multiple point three-dimensional (3D) pharmacophore methods. Thus, in these cases, MFPs and pharmacophore fingerprints produce similar results, although they define, in terms of their complexity, opposite ends of the spectrum of methods currently used to study molecular similarity or diversity. For each of the studied compound classes, comparison of MFP bit settings identified a consensus or signature pattern. Scaling factors can be applied to these bits in order to increase the probability of finding compounds with similar activity by virtual screening.

Keywords: Adrenergic, Angiotensin II, Cell Surface, Combinatorial Chemistry Techniques, Databases, Drug Evaluation, Endothelins, Environmental Pollutants, Factual, Information Management, Ligands, Molecular Structure, Pharmaceutical Preparations, Platelet Glycoprotein GPIIb-IIIa Complex, Preclinical, Receptors, Serine Proteinase Inhibitors, Structure-Activity Relationship, User-Computer Interface, alpha-1, 11277728
[Wang2001Methylation] H. Wang, Z. Q. Huang, L. Xia, Q. Feng, H. Erdjument-Bromage, B. D. Strahl, S. D. Briggs, C. D. Allis, J. Wong, P. Tempst, and Y. Zhang. Methylation of histone H4 at arginine 3 facilitating transcriptional activation by nuclear hormone receptor. Science, 293(5531):853-857, Aug 2001. [ bib | DOI | http ]
Acetylation of core histone tails plays a fundamental role in transcription regulation. In addition to acetylation, other posttranslational modifications, such as phosphorylation and methylation, occur in core histone tails. Here, we report the purification, molecular identification, and functional characterization of a histone H4-specific methyltransferase PRMT1, a protein arginine methyltransferase. PRMT1 specifically methylates arginine 3 (Arg 3) of H4 in vitro and in vivo. Methylation of Arg 3 by PRMT1 facilitates subsequent acetylation of H4 tails by p300. However, acetylation of H4 inhibits its methylation by PRMT1. Most important, a mutation in the S-adenosyl-l-methionine-binding site of PRMT1 substantially crippled its nuclear receptor coactivator activity. Our finding reveals Arg 3 of H4 as a novel methylation site by PRMT1 and indicates that Arg 3 methylation plays an important role in transcriptional regulation.

Keywords: Acetylation; Amino Acid Sequence; Animals; Arginine, metabolism; Binding Sites; Cell Nucleus, metabolism; Hela Cells; Histones, chemistry/metabolism; Humans; Hydroxamic Acids, pharmacology; Lysine, metabolism; Methylation; Methyltransferases, chemistry/genetics/isolation /&/ purification/metabolism; Molecular Sequence Data; Mutation; Oocytes; Receptors, Androgen, metabolism; Recombinant Proteins, metabolism; S-Adenosylmethionine, metabolism; Transcriptional Activation; Xenopus
[Vercoutere2001Rapid] W. Vercoutere, S. Winters-Hilt, H. Olsen, D. Deamer, D. Haussler, and M. Akeson. Rapid discrimination among individual DNA hairpin molecules at single-nucleotide resolution using an ion channel. Nat Biotechnol, 19(3):248-52, Mar 2001. [ bib | DOI | http | .pdf ]
RNA and DNA strands produce ionic current signatures when driven through an alpha-hemolysin channel by an applied voltage. Here we combine this nanopore detector with a support vector machine (SVM) to analyze DNA hairpin molecules on the millisecond time scale. Measurable properties include duplex stem length, base pair mismatches, and loop length. This nanopore instrument can discriminate between individual DNA hairpins that differ by one base pair or by one nucleotide.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diagnosis, Discriminant Analysis, Electric Conductivity, Electrophysiology, Escherichia coli Proteins, Factual, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Hemolysins, Humans, Ion Channels, Kinetics, Leukemia, Lipid Bilayers, Logistic Models, Lymphocytic, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplastic, Neural Networks (Computer), Nevus, Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Conformation, Organ Specificity, Organelles, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Statistical, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, U.S. Gov't, 11231558
[Vazquez2001Modeling] A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Modeling of protein interaction networks. E-print cond-mat/0108043, Aug 2001. [ bib | http | .pdf ]
[Suykens2001Optimal] J. A. Suykens, J. Vandewalle, and B. De Moor. Optimal control by least squares support vector machines. Neural Netw, 14(1):23-35, Jan 2001. [ bib ]
Support vector machines have been very successful in pattern recognition and function estimation problems. In this paper we introduce the use of least squares support vector machines (LS-SVM's) for the optimal control of nonlinear systems. Linear and neural full static state feedback controllers are considered. The problem is formulated in such a way that it incorporates the N-stage optimal control problem as well as a least squares support vector machine approach for mapping the state space into the action space. The solution is characterized by a set of nonlinear equations. An alternative formulation as a constrained nonlinear optimization problem in less unknowns is given, together with a method for imposing local stability in the LS-SVM control scheme. The results are discussed for support vector machines with radial basis function kernel. Advantages of LS-SVM control are that no number of hidden units has to be determined for the controller and that no centers have to be specified for the Gaussian kernels when applying Mercer's condition. The curse of dimensionality is avoided in comparison with defining a regular grid for the centers in classical radial basis function networks. This is at the expense of taking the trajectory of state variables as additional unknowns in the optimization problem, while classical neural network approaches typically lead to parametric optimization problems. In the SVM methodology the number of unknowns equals the number of training data, while in the primal space the number of unknowns can be infinite dimensional. The method is illustrated both on stabilization and tracking problems including examples on swinging up an inverted pendulum with local stabilization at the endpoint and a tracking problem for a ball and beam system.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diagnosis, Discriminant Analysis, Electric Conductivity, Electrophysiology, Escherichia coli Proteins, Factual, Feedback, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Hemolysins, Humans, Ion Channels, Kinetics, Leukemia, Lipid Bilayers, Logistic Models, Lymphocytic, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplastic, Neural Networks (Computer), Nevus, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Normal Distribution, Nucleic Acid Conformation, Organ Specificity, Organelles, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Statistical, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, U.S. Gov't, 11213211
[Strahl2001Methylation] B. D. Strahl, S. D. Briggs, C. J. Brame, J. A. Caldwell, S. S. Koh, H. Ma, R. G. Cook, J. Shabanowitz, D. F. Hunt, M. R. Stallcup, and C. D. Allis. Methylation of histone H4 at arginine 3 occurs in vivo and is mediated by the nuclear receptor coactivator PRMT1. Curr Biol, 11(12):996-1000, Jun 2001. [ bib ]
Posttranslational modifications of histone amino termini play an important role in modulating chromatin structure and function. Lysine methylation of histones has been well documented, and recently this modification has been linked to cellular processes involving gene transcription and heterochromatin assembly. However, the existence of arginine methylation on histones has remained unclear. Recent discoveries of protein arginine methyltransferases, CARM1 and PRMT1, as transcriptional coactivators for nuclear receptors suggest that histones may be physiological targets of these enzymes as part of a poorly defined transcriptional activation pathway. Here we show by using mass spectrometry that histone H4, isolated from asynchronously growing human 293T cells, is methylated at arginine 3 (Arg-3) in vivo. In support, a novel antibody directed against histone H4 methylated at Arg-3 independently demonstrates the in vivo occurrence of this modification and reveals that H4 Arg-3 methylation is highly conserved throughout eukaryotes. Finally, we show that PRMT1 is the major, if not exclusive, H4 Arg-3 methyltransferase in human 293T cells. These findings suggest a role for arginine methylation of histones in the transcription process.

Keywords: Amino Acid Motifs; Animals; Arginine, metabolism; Cell Line; Genes, Reporter; Histones, metabolism; Humans; Immunoblotting; Methylation; Protein-Arginine N-Methyltransferases, metabolism; Recombinant Fusion Proteins, genetics/metabolism
[Sole2001Model] R. V. Solé, R. Pastor-Satorras, E. D. Smith, and T. Kepler. A Model of Large-Scale Proteome Evolution. Technical report, Santa Fe Institute, 2001. Working paper 01-08-041. [ bib | .html | .pdf ]
[Seol2001Skp1] J. H. Seol, A. Shevchenko, A. Shevchenko, and R. J. Deshaies. Skp1 forms multiple protein complexes, including RAVE, a regulator of V-ATPase assembly. Nat Cell Biol, 3(4):384-91, Apr 2001. [ bib | DOI | http | .pdf ]
SCF ubiquitin ligases are composed of Skp1, Cdc53, Hrt1 and one member of a large family of substrate receptors known as F-box proteins (FBPs). Here we report the identification, using sequential rounds of epitope tagging, affinity purification and mass spectrometry, of 16 Skp1 and Cdc53-associated proteins in budding yeast, including all components of SCF, 9 FBPs, Yjr033 (Rav1) and Ydr202 (Rav2). Rav1, Rav2 and Skp1 form a complex that we have named 'regulator of the (H+)-ATPase of the vacuolar and endosomal membranes' (RAVE), which associates with the V1 domain of the vacuolar membrane (H+)-ATPase (V-ATPase). V-ATPases are conserved throughout eukaryotes, and have been implicated in tumour metastasis and multidrug resistance, and here we show that RAVE promotes glucose-triggered assembly of the V-ATPase holoenzyme. Previous systematic genome-wide two-hybrid screens yielded 17 proteins that interact with Skp1 and Cdc53, only 3 of which overlap with those reported here. Thus, our results provide a distinct view of the interactions that link proteins into a comprehensive cellular network.

Keywords: Affinity, Affinity Labels, Amino Acid Sequence, Animals, Cell Cycle Proteins, Cells, Chromatography, Cloning, Comparative Study, Cullin Proteins, Cultured, Cytoplasm, DNA, DNA Damage, DNA Repair, Electrospray Ionization, Fungal, Fungal Proteins, Gene Targeting, Genetic, Glucose, Holoenzymes, Humans, Macromolecular Substances, Mass, Matrix-Assisted Laser Desorption-Ionization, Mitosis, Molecular, Molecular Sequence Data, Non-P.H.S., Non-U.S. Gov't, P.H.S., Phosphoric Monoester Hydrolases, Protein Binding, Protein Interaction Mapping, Protein Kinases, Proteome, Proteomics, Proton-Translocating ATPases, Recombinant Fusion Proteins, Research Support, Ribonucleoproteins, Ribosomes, S-Phase Kinase-Associated Proteins, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Alignment, Signal Transduction, Species Specificity, Spectrometry, Spectrum Analysis, Transcription, U.S. Gov't, Vacuolar Proton-Translocating ATPases, 11283612
[Rain2001protein-protein] J.-C. Rain, L. Selig, H. De Reuse, V. Battaglia, C. Reverdy, S. Simon, G. Lenzen, F. Petel, J. Wojcik, V. Schächter, Y. Chemama, A. Labigne, and P. Legrain. The protein-protein interaction map of Helicobacter pylori. Nature, 409:211-215, 2001. [ bib | http | .pdf ]
[Puig2001tandem] O. Puig, F. Caspary, G. Rigaut, B. Rutz, E. Bouveret, E. Bragado-Nilsson, M. Wilm, and B. Séraphin. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods, 24(3):218-229, Jul 2001. [ bib | DOI | http ]
Identification of components present in biological complexes requires their purification to near homogeneity. Methods of purification vary from protein to protein, making it impossible to design a general purification strategy valid for all cases. We have developed the tandem affinity purification (TAP) method as a tool that allows rapid purification under native conditions of complexes, even when expressed at their natural level. Prior knowledge of complex composition or function is not required. The TAP method requires fusion of the TAP tag, either N- or C-terminally, to the target protein of interest. Starting from a relatively small number of cells, active macromolecular complexes can be isolated and used for multiple applications. Variations of the method to specifically purify complexes containing two given components or to subtract undesired complexes can easily be implemented. The TAP method was initially developed in yeast but can be successfully adapted to various organisms. Its simplicity, high yield, and wide applicability make the TAP method a very useful procedure for protein purification and proteome exploration.

Keywords: Bacterial Proteins; Blotting, Western; DNA, Bacterial; Fungal Proteins; Genetic Vectors; Methods; Mutation; Polymerase Chain Reaction; Proteins; Proteome; Ribonucleases; Ribonucleoproteins; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins; Staphylococcus aureus
[Nakayama2001Role] J. Nakayama, J. C. Rice, B. D. Strahl, C. D. Allis, and S. I. Grewal. Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science, 292(5514):110-113, Apr 2001. [ bib | DOI | http ]
The assembly of higher order chromatin structures has been linked to the covalent modifications of histone tails. We provide in vivo evidence that lysine 9 of histone H3 (H3 Lys9) is preferentially methylated by the Clr4 protein at heterochromatin-associated regions in fission yeast. Both the conserved chromo- and SET domains of Clr4 are required for H3 Lys9 methylation in vivo. Localization of Swi6, a homolog of Drosophila HP1, to heterochromatic regions is dependent on H3 Lys9 methylation. Moreover, an H3-specific deacetylase Clr3 and a beta-propeller domain protein Rik1 are required for H3 Lys9 methylation by Clr4 and Swi6 localization. These data define a conserved pathway wherein sequential histone modifications establish a "histone code" essential for the epigenetic inheritance of heterochromatin assembly.

Keywords: Acetylation; Cell Cycle Proteins, chemistry/genetics/metabolism; Centromere, metabolism; Chromosomes, Fungal, metabolism; Fungal Proteins, genetics/metabolism; Gene Silencing; Genes, Fungal; Heterochromatin, metabolism; Histone Deacetylases, genetics/metabolism; Histone-Lysine N-Methyltransferase; Histones, chemistry/metabolism; Lysine, metabolism; Methylation; Methyltransferases, chemistry/genetics/metabolism; Mutation; Protein Methyltransferases; Protein Structure, Tertiary; Recombinant Proteins, chemistry/metabolism; Saccharomyces cerevisiae Proteins; Schizosaccharomyces pombe Proteins; Schizosaccharomyces, genetics/metabolism; Transcription Factors, metabolism
[Miwakeichi2001comparison] F. Miwakeichi, R. Ramirez-Padron, P. A. Valdes-Sosa, and T. Ozaki. A comparison of non-linear non-parametric models for epilepsy data. Comput. Biol. Med., 31(1):41-57, Jan 2001. [ bib ]
EEG spike and wave (SW) activity has been described through a non-parametric stochastic model estimated by the Nadaraya-Watson (NW) method. In this paper the performance of the NW, the local linear polynomial regression and support vector machines (SVM) methods were compared. The noise-free realizations obtained by the NW and SVM methods reproduced SW better than as reported in previous works. The tuning parameters had to be estimated manually. Adding dynamical noise, only the NW method was capable of generating SW similar to training data. The standard deviation of the dynamical noise was estimated by means of the correlation dimension.

Keywords: Acute, Acute Disease, Adenocarcinoma, Algorithms, Amino Acid Sequence, Animals, Artificial Intelligence, Automated, B-Lymphocytes, Bacterial Proteins, Base Pair Mismatch, Base Sequence, Bayes Theorem, Binding Sites, Biological, Bone Marrow Cells, Brachyura, Cell Compartmentation, Chemistry, Child, Chromosome Aberrations, Classification, Codon, Colonic Neoplasms, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Trees, Diabetes Mellitus, Diagnosis, Discriminant Analysis, Discrimination Learning, Electric Conductivity, Electroencephalography, Electrophysiology, Epilepsy, Escherichia coli Proteins, Factual, Feedback, Female, Fungal, Gastric Emptying, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Genetic Predisposition to Disease, Genomics, Hemolysins, Humans, Indians, Information Storage and Retrieval, Initiator, Ion Channels, Kinetics, Leukemia, Likelihood Functions, Linear Models, Lipid Bilayers, Logistic Models, Lymphocytic, MEDLINE, Male, Markov Chains, Melanoma, Models, Molecular, Myeloid, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Neurological, Nevus, Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Normal Distribution, North American, Nucleic Acid Conformation, Oligonucleotide Array Sequence Analysis, Organ Specificity, Organelles, Ovarian Neoplasms, Ovary, P.H.S., Pattern Recognition, Physical, Pigmented, Predictive Value of Tests, Promoter Regions (Genetics), Protein Biosynthesis, Protein Folding, Protein Structure, Proteins, Proteome, RNA, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Sex Characteristics, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Sound Spectrography, Statistical, Stochastic Processes, Stomach Diseases, T-Lymphocytes, Thermodynamics, Transcription, Transcription Factors, Tumor Markers, Type 2, U.S. Gov't, Vertebrates, 11058693
[Ma2001Hormone-dependent] H. Ma, C. T. Baumann, H. Li, B. D. Strahl, R. Rice, M. A. Jelinek, D. W. Aswad, C. D. Allis, G. L. Hager, and M. R. Stallcup. Hormone-dependent, CARM1-directed, arginine-specific methylation of histone H3 on a steroid-regulated promoter. Curr Biol, 11(24):1981-1985, Dec 2001. [ bib ]
Activation of gene transcription involves chromatin remodeling by coactivator proteins that are recruited by DNA-bound transcription factors. Local modification of chromatin structure at specific gene promoters by ATP-dependent processes and by posttranslational modifications of histone N-terminal tails provides access to RNA polymerase II and its accompanying transcription initiation complex. While the roles of lysine acetylation, serine phosphorylation, and lysine methylation of histones in chromatin remodeling are beginning to emerge, low levels of arginine methylation of histones have only recently been documented, and its physiological role is unknown. The coactivator CARM1 methylates histone H3 at Arg17 and Arg26 in vitro and cooperates synergistically with p160-type coactivators (e.g., GRIP1, SRC-1, ACTR) and coactivators with histone acetyltransferase activity (e.g., p300, CBP) to enhance gene activation by steroid and nuclear hormone receptors (NR) in transient transfection assays. In the current study, CARM1 cooperated with GRIP1 to enhance steroid hormone-dependent activation of stably integrated mouse mammary tumor virus (MMTV) promoters, and this coactivator function required the methyltransferase activity of CARM1. Chromatin immunoprecipitation assays and immunofluorescence studies indicated that CARM1 and the CARM1-methylated form of histone H3 specifically associated with a large tandem array of MMTV promoters in a hormone-dependent manner. Thus, arginine-specific histone methylation by CARM1 is an important part of the transcriptional activation process.

Keywords: Acetylation; Arginine, metabolism; Fluorescent Antibody Technique; Histones, chemistry/metabolism; Hormones, physiology; Lysine, metabolism; Mammary Tumor Virus, Mouse, genetics; Methylation; Phosphorylation; Precipitin Tests; Promoter Regions, Genetic; Protein-Arginine N-Methyltransferases, physiology; Serine, metabolism; Steroids, physiology
[Kim2001Evolving] J. Kim, P.L. Krapivsky, B. Kahng, and S. Redner. Evolving protein interaction networks. E-print cond-mat/0203167, 2001. [ bib | http | .pdf ]
[Dreiseitl2001comparison] S. Dreiseitl, L. Ohno-Machado, H. Kittler, S. Vinterbo, H. Billhardt, and M. Binder. A comparison of machine learning methods for the diagnosis of pigmented skin lesions. J Biomed Inform, 34(1):28-36, Feb 2001. [ bib | DOI | http | .pdf ]
We analyze the discriminatory power of k-nearest neighbors, logistic regression, artificial neural networks (ANNs), decision trees, and support vector machines (SVMs) on the task of classifying pigmented skin lesions as common nevi, dysplastic nevi, or melanoma. Three different classification tasks were used as benchmarks: the dichotomous problem of distinguishing common nevi from dysplastic nevi and melanoma, the dichotomous problem of distinguishing melanoma from common and dysplastic nevi, and the trichotomous problem of correctly distinguishing all three classes. Using ROC analysis to measure the discriminatory power of the methods shows that excellent results for specific classification problems in the domain of pigmented skin lesions can be achieved with machine-learning methods. On both dichotomous and trichotomous tasks, logistic regression, ANNs, and SVMs performed on about the same level, with k-nearest neighbors and decision trees performing worse.

Keywords: Algorithms, Amino Acid Sequence, Artificial Intelligence, Biological, Cell Compartmentation, Comparative Study, Computer Simulation, Computer-Assisted, Decision Trees, Diagnosis, Discriminant Analysis, Humans, Logistic Models, Melanoma, Models, Neural Networks (Computer), Nevus, Non-U.S. Gov't, Organelles, P.H.S., Pigmented, Predictive Value of Tests, Proteins, Reproducibility of Results, Research Support, Skin Diseases, Skin Neoplasms, Skin Pigmentation, U.S. Gov't, 11376540
[Chou2001Using] K.-C. Chou. Using subsite coupling to predict signal peptides. Protein Eng., 14(2):75-79, 2001. [ bib | http | .pdf ]
[Chou2001Prediction] K.-C. Chou. Prediction of protein signal sequences and their cleavage sites. Proteins: Struct. Funct. Genet., 42:136-139, 2001. [ bib | http | .pdf ]
[Briggs2001Histone] S. D. Briggs, M. Bryk, B. D. Strahl, W. L. Cheung, J. K. Davie, S. Y. Dent, F. Winston, and C. D. Allis. Histone H3 lysine 4 methylation is mediated by Set1 and required for cell growth and rDNA silencing in Saccharomyces cerevisiae. Genes Dev, 15(24):3286-3295, Dec 2001. [ bib | DOI | http ]
Histone methylation is known to be associated with both transcriptionally active and repressive chromatin states. Recent studies have identified SET domain-containing proteins such as SUV39H1 and Clr4 as mediators of H3 lysine 9 (Lys9) methylation and heterochromatin formation. Interestingly, H3 Lys9 methylation is not observed from bulk histones isolated from asynchronous populations of Saccharomyces cerevisiae or Tetrahymena thermophila. In contrast, H3 lysine 4 (Lys4) methylation is a predominant modification in these smaller eukaryotes. To identify the responsible methyltransferase(s) and to gain insight into the function of H3 Lys4 methylation, we have developed a histone H3 Lys4 methyl-specific antiserum. With this antiserum, we show that deletion of SET1, but not of other putative SET domain-containing genes, in S. cerevisiae, results in the complete abolishment of H3 Lys4 methylation in vivo. Furthermore, loss of H3 Lys4 methylation in a set1 Delta strain can be rescued by SET1. Analysis of histone H3 mutations at Lys4 revealed a slow-growth defect similar to a set1 Delta strain. Chromatin immunoprecipitation assays show that H3 Lys4 methylation is present at the rDNA locus and that Set1-mediated H3 Lys4 methylation is required for repression of RNA polymerase II transcription within rDNA. Taken together, these data suggest that Set1-mediated H3 Lys4 methylation is required for normal cell growth and transcriptional silencing.

Keywords: Animals; Antibody Formation; Blotting, Western; Cell Division; DNA Primers, chemistry; DNA, Bacterial, genetics; DNA, Ribosomal, genetics; DNA-Binding Proteins, metabolism; Fungal Proteins, metabolism; Gene Silencing; Genetic Vectors; Heterochromatin, chemistry/metabolism; Histone-Lysine N-Methyltransferase; Histones, metabolism; Lysine, metabolism; Methylation; Methyltransferases, genetics/metabolism; Mutation; Nucleosomes, chemistry/metabolism; Polymerase Chain Reaction; Precipitin Tests; Protein Methyltransferases; RNA Polymerase III, metabolism; Rabbits; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae, genetics; Transcription Factors, metabolism
[Bostroem2001Reproducing] J. Boström. Reproducing the conformations of protein-bound ligands: a critical evaluation of several popular conformational searching tools. J Comput Aided Mol Des, 15(12):1137-1152, Dec 2001. [ bib ]
Several programs (Catalyst, Confort, Flo99, MacroModel, and Omega) that are commonly used to generate conformational ensembles have been tested for their ability to reproduce bioactive conformations. The ligands from thirty-two different ligand-protein complexes determined by high-resolution (< 2.0 Å) X-ray crystallography have been analyzed. The Low-Mode Conformational Search method (with AMBER* and the GB/SA hydration model), as implemented in MacroModel, was found to perform better than the other algorithms. The rule-based method Omega, which is orders of magnitude faster than the other methods, also gave reasonable results, but these were found to be dependent on the input structure. The methods supporting diverse sampling (Catalyst, Confort) performed least well. For the seven ligands in the set having eight or more rotatable bonds, none of the bioactive conformations were ever found, save for one exception (Flo99). These ligands do not bind in a local minimum conformation according to AMBER*/SA. Taking these last two observations together, it is clear that geometrically similar structures should be collected in order to increase the probability of finding the bioactive conformation among the generated ensembles. Factors influencing bioactive conformational retrieval have been identified and are discussed.

Keywords: Algorithms; Crystallography, X-Ray; Ligands; Models, Molecular; Molecular Conformation; Protein Binding; Quantum Theory; Software
[Ballesteros2001G] J. Ballesteros and K. Palczewski. G protein-coupled receptor drug discovery: implications from the crystal structure of rhodopsin. Curr. Opin. Drug Discov. Devel., 4(5):561-574, Sep 2001. [ bib ]
G protein-coupled receptors (GPCRs) are a functionally diverse group of membrane proteins that play a critical role in signal transduction. Because of the lack of a high-resolution structure, the heptahelical transmembrane bundle within the N-terminal extracellular and C-terminal intracellular region of these receptors has initially been modeled based on the high-resolution structure of bacterial retinal-binding protein, bacteriorhodopsin. However, the low-resolution structure of rhodopsin, a prototypical GPCR, revealed that there is a minor relationship between GPCRs and bacteriorhodopsins. The high-resolution crystal structure of the rhodopsin ground state and further refinements of the model provide the first structural information about the entire organization of the polypeptide chain and post-translational moieties. These studies provide a structural template for Family 1 GPCRs that has the potential to significantly improve structure-based approaches to GPCR drug discovery.

Keywords: Amino Acid Sequence; Animals; Crystallography, X-Ray; Drug Design; GTP-Binding Proteins; Humans; Models, Molecular; Molecular Sequence Data; Receptors, Drug; Rhodopsin
[Opper2001Universal] M. Opper and R. Urbanczik. Universal learning curves of support vector machines. Phys Rev Lett, 86(19):4410-3, May 2001. [ bib ]
Using methods of statistical physics, we investigate the role of model complexity in learning with support vector machines (SVMs), which are an important alternative to neural networks. We show the advantages of using SVMs with kernels of infinite complexity on noisy target rules, which, in contrast to common theoretical beliefs, are found to achieve optimal generalization error although the training error does not converge to the generalization error. Moreover, we find a universal asymptotics of the learning curves which depend only on the target rule but not on the SVM kernel.

Keywords: Algorithms, Amino Acid Sequence, Artificial Intelligence, Biological, Cell Compartmentation, Chemistry, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, Databases, Decision Trees, Diagnosis, Discriminant Analysis, Electrophysiology, Factual, Gastric Emptying, Humans, Logistic Models, Melanoma, Models, Neural Networks (Computer), Nevus, Non-U.S. Gov't, Organelles, P.H.S., Physical, Pigmented, Predictive Value of Tests, Proteins, Proteome, Reproducibility of Results, Research Support, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Software, Stomach Diseases, U.S. Gov't, 11328187
[Liang2001Detection] H. Liang and Z. Lin. Detection of delayed gastric emptying from electrogastrograms with support vector machine. IEEE Trans Biomed Eng, 48(5):601-4, May 2001. [ bib ]
A recent study reported a conventional neural network (NN) approach for the noninvasive diagnosis of delayed gastric emptying from the cutaneous electrogastrograms. Using support vector machine, we show that this relatively new technique can be used for detection of delayed gastric emptying and is in fact able to outdo the conventional NN.

Keywords: Algorithms, Amino Acid Sequence, Artificial Intelligence, Biological, Cell Compartmentation, Comparative Study, Computer Simulation, Computer-Assisted, Decision Trees, Diagnosis, Discriminant Analysis, Electrophysiology, Gastric Emptying, Humans, Logistic Models, Melanoma, Models, Neural Networks (Computer), Nevus, Non-U.S. Gov't, Organelles, P.H.S., Pigmented, Predictive Value of Tests, Proteins, Reproducibility of Results, Research Support, Skin Diseases, Skin Neoplasms, Skin Pigmentation, Stomach Diseases, U.S. Gov't, 11341535
[Jeong2001] H. Jeong, S. P. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41-42, May 2001. [ bib | DOI | http ]
Keywords: Fungal Proteins, genetics/physiology; Gene Deletion; Protein Binding; Proteome; Saccharomyces cerevisiae, genetics/physiology; Signal Transduction
[Yu2002Methods] Kun Yu, Nikolai Petrovsky, Christian Schönbach, Judice Y L Koh, and Vladimir Brusic. Methods for prediction of peptide binding to MHC molecules: a comparative study. Mol Med, 8(3):137-148, Mar 2002. [ bib ]
BACKGROUND: A variety of methods for prediction of peptide binding to major histocompatibility complex (MHC) have been proposed. These methods are based on binding motifs, binding matrices, hidden Markov models (HMM), or artificial neural networks (ANN). There has been little prior work on the comparative analysis of these methods. MATERIALS AND METHODS: We performed a comparison of the performance of six methods applied to the prediction of two human MHC class I molecules, including binding matrices and motifs, ANNs, and HMMs. RESULTS: The selection of the optimal prediction method depends on the amount of available data (the number of peptides of known binding affinity to the MHC molecule of interest), the biases in the data set and the intended purpose of the prediction (screening of a single protein versus mass screening). When little or no peptide data are available, binding motifs are the most useful alternative to random guessing or use of a complete overlapping set of peptides for selection of candidate binders. As the number of known peptide binders increases, binding matrices and HMM become more useful predictors. ANN and HMM are the predictive methods of choice for MHC alleles with more than 100 known binding peptides. CONCLUSION: The ability of bioinformatic methods to reliably predict MHC binding peptides, and thereby potential T-cell epitopes, has major implications for clinical immunology, particularly in the area of vaccine design.

Keywords: Amino Acid Motifs; Computational Biology; Histocompatibility Antigens Class I; Humans; Models, Molecular; Peptides; Protein Binding
[Weber2002Building] Griffin Weber, Staal Vinterbo, and Lucila Ohno-Machado. Building an asynchronous web-based tool for machine learning classification. Proc AMIA Symp, pages 869-73, 2002. [ bib ]
Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not been conducted. We developed free software that implements logistic regression with stepwise variable selection as a quick and simple method for initial exploration of important genetic markers in disease classification. To implement the algorithm and allow our collaborators in remote locations to evaluate and compare its results against those of other methods, we developed a user-friendly asynchronous web-based application with a minimal amount of programming using free, downloadable software tools. With this program, we show that classification using logistic regression can perform as well as other more sophisticated algorithms, and it has the advantages of being easy to interpret and reproduce. By making the tool freely and easily available, we hope to promote the comparison of classification methods. In addition, we believe our web application can be used as a model for other bioinformatics laboratories that need to develop web-based analysis tools in a short amount of time and on a limited budget.

Keywords: Acute, Algorithms, Animals, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Drug, Drug Design, Eukaryotic Cells, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Hemolysins, Humans, Internet, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lymphocytic, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Conformation, Observer Variation, Oligonucleotide Array Sequence Analysis, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Protein Binding, Proteins, Quality Control, RNA, RNA Splicing, Receptors, Reference Values, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12463949
[Wahba2002Soft] Grace Wahba. Soft and hard classification by reproducing kernel Hilbert space methods. Proc Natl Acad Sci U S A, 99(26):16524-30, Dec 2002. [ bib | DOI | http | .pdf ]
Reproducing kernel Hilbert space (RKHS) methods provide a unified context for solving a wide variety of statistical modelling and function estimation problems. We consider two such problems: We are given a training set [yi, ti, i = 1, ..., n], where yi is the response for the ith subject, and ti is a vector of attributes for this subject. The value of yi is a label that indicates which category it came from. For the first problem, we wish to build a model from the training set that assigns to each t in an attribute domain of interest an estimate of the probability pj(t) that a (future) subject with attribute vector t is in category j. The second problem is in some sense less ambitious; it is to build a model that assigns to each t a label, which classifies a future subject with that t into one of the categories or possibly "none of the above." The approach to the first of these two problems discussed here is a special case of what is known as penalized likelihood estimation. The approach to the second problem is known as the support vector machine. We also note some alternate but closely related approaches to the second problem. These approaches are all obtained as solutions to optimization problems in RKHS. Many other problems, in particular the solution of ill-posed inverse problems, can be obtained as solutions to optimization problems in RKHS and are mentioned in passing. We caution the reader that although a large literature exists in all of these topics, in this inaugural article we are selectively highlighting work of the author, former students, and other collaborators.

Keywords: Acute, Algorithms, Animals, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Drug, Drug Design, Eukaryotic Cells, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Hemolysins, Humans, Leukemia, Ligands, Likelihood Functions, Lymphocytic, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Conformation, Observer Variation, Oligonucleotide Array Sequence Analysis, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Protein Binding, Proteins, Quality Control, RNA, RNA Splicing, Receptors, Reference Values, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Statistical, Stomach Neoplasms, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12477931
[Tucker2002Gene] D. L. Tucker, N. Tucker, and T. Conway. Gene expression profiling of the pH response in Escherichia coli. J Bacteriol., 184(23):6551-6558, Dec 2002. [ bib ]
Escherichia coli MG1655 acid-inducible genes were identified by whole-genome expression profiling. Cultures were grown to the mid-logarithmic phase on acidified glucose minimal medium, conditions that induce glutamate-dependent acid resistance (AR), while the other AR systems are either repressed or not induced. A total of 28 genes were induced in at least two of three experiments in which the gene expression profiles of cells grown in acid (pH 5.5 or 4.5) were compared to those of cells grown at pH 7.4. As expected, the genes encoding glutamate decarboxylase, gadA and gadB, were significantly induced. Interestingly, two acid-inducible genes code for small basic proteins with pIs of >10.5, and six code for small acidic proteins with pIs ranging from 5.7 to 4.0; the roles of these small basic and acidic proteins in acid resistance are unknown. The acid-induced genes represented only five functional grouping categories, including eight genes involved in metabolism, nine associated with cell envelope structures or modifications, two encoding chaperones, six regulatory genes, and six unknown genes. It is unlikely that all of these genes are involved in the glutamate-dependent AR. However, nine acid-inducible genes are clustered in the gadA region, including hdeA, which encodes a putative periplasmic chaperone, and four putative regulatory genes. One of these putative regulators, yhiE, was shown to significantly increase acid resistance when overexpressed in cells that had not been preinduced by growth at pH 5.5, and mutation of yhiE decreased acid resistance; yhiE could therefore encode an activator of AR genes. Thus, the acid-inducible genes clustered in the gadA region appear to be involved in glutamate-dependent acid resistance, although their specific roles remain to be elucidated.

Keywords: Culture Media; Escherichia coli; Escherichia coli Proteins; Gene Expression Profiling; Gene Expression Regulation, Bacterial; Heat-Shock Response; Hydrogen-Ion Concentration; Morpholines; Oligonucleotide Array Sequence Analysis
[Sturn2002Genesis:] Alexander Sturn, John Quackenbush, and Zlatko Trajanoski. Genesis: cluster analysis of microarray data. Bioinformatics, 18(1):207-8, Jan 2002. [ bib ]
A versatile, platform independent and easy to use Java suite for large-scale gene expression analysis was developed. Genesis integrates various tools for microarray data analysis such as filters, normalization and visualization tools, distance measures as well as common clustering algorithms including hierarchical clustering, self-organizing maps, k-means, principal component analysis, and support vector machines. The results of the clustering are transparent across all implemented methods and enable the analysis of the outcome of different algorithms and parameters. Additionally, mapping of gene expression data onto chromosomal sequences was implemented to enhance promoter analysis and investigation of transcriptional control mechanisms.

Keywords: Algorithms, Artificial Intelligence, Cluster Analysis, Comparative Study, Computational Biology, Databases, Gene Expression Profiling, Genetic, Models, Molecular Structure, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Principal Component Analysis, Programming Languages, Promoter Regions (Genetics), Protein, Proteins, Research Support, Software, Statistical, Transcription, 11836235
[Strahl2002Set2] Brian D Strahl, Patrick A Grant, Scott D Briggs, Zu-Wen Sun, James R Bone, Jennifer A Caldwell, Sahana Mollah, Richard G Cook, Jeffrey Shabanowitz, Donald F Hunt, and C. David Allis. Set2 is a nucleosomal histone H3-selective methyltransferase that mediates transcriptional repression. Mol Cell Biol, 22(5):1298-1306, Mar 2002. [ bib ]
Recent studies of histone methylation have yielded fundamental new insights pertaining to the role of this modification in gene activation as well as in gene silencing. While a number of methylation sites are known to occur on histones, only limited information exists regarding the relevant enzymes that mediate these methylation events. We thus sought to identify native histone methyltransferase (HMT) activities from Saccharomyces cerevisiae. Here, we describe the biochemical purification and characterization of Set2, a novel HMT that is site-specific for lysine 36 (Lys36) of the H3 tail. Using an antiserum directed against Lys36 methylation in H3, we show that Set2, via its SET domain, is responsible for methylation at this site in vivo. Tethering of Set2 to a heterologous promoter reveals that Set2 represses transcription, and part of this repression is mediated through the HMT activity of the SET domain. These results suggest that Set2 and methylation at H3 Lys36 play a role in the repression of gene transcription.

Keywords: Amino Acid Sequence; Gene Expression Regulation, Fungal; Histones, metabolism; Methyltransferases, metabolism; Molecular Sequence Data; Nucleosomes, enzymology; Saccharomyces cerevisiae Proteins, metabolism; Saccharomyces cerevisiae, enzymology/genetics; Substrate Specificity; Transcription, Genetic; Transcriptional Activation
[Song2002Prediction] Minghu Song, Curt M Breneman, Jinbo Bi, N. Sukumar, Kristin P Bennett, Steven Cramer, and Nihal Tugcu. Prediction of protein retention times in anion-exchange chromatography systems using support vector regression. J Chem Inf Comput Sci, 42(6):1347-57, 2002. [ bib ]
Quantitative Structure-Retention Relationship (QSRR) models are developed for the prediction of protein retention times in anion-exchange chromatography systems. Topological, subdivided surface area, and TAE (Transferable Atom Equivalent) electron-density-based descriptors are computed directly for a set of proteins using molecular connectivity patterns and crystal structure geometries. A novel algorithm based on Support Vector Machine (SVM) regression has been employed to obtain predictive QSRR models using a two-step computational strategy. In the first step, a sparse linear SVM was utilized as a feature selection procedure to remove irrelevant or redundant information. Subsequently, the selected features were used to produce an ensemble of nonlinear SVM regression models that were combined using bootstrap aggregation (bagging) techniques, where various combinations of training and validation data sets were selected from the pool of available data. A visualization scheme (star plots) was used to display the relative importance of each selected descriptor in the final set of "bagged" models. Once these predictive models have been validated, they can be used as an automated prediction tool for virtual high-throughput screening (VHTS).

Keywords: Acute, Algorithms, Animals, Anion Exchange Resins, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Carcinoma, Chemical, Chromatography, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Decision Making, Diagnosis, Differential, Drug, Drug Design, Electrostatics, Eukaryotic Cells, Feasibility Studies, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Markers, Hemolysins, Humans, Internet, Ion Exchange, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lung Neoplasms, Lymphocytic, Lymphoma, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-Small-Cell Lung, Non-U.S. Gov't, Nucleic Acid Conformation, Nucleic Acid Hybridization, Observer Variation, Oligonucleotide Array Sequence Analysis, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Protein Binding, Protein Conformation, Proteins, Quality Control, Quantum Theory, RNA, RNA Splicing, Receptors, Reference Values, Regression Analysis, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12444731
[Schmitt2002New] Stefan Schmitt, Daniel Kuhn, and Gerhard Klebe. A new method to detect related function among proteins independent of sequence and fold homology. J. Mol. Biol., 323(2):387-406, Oct 2002. [ bib ]
A new method has been developed to detect functional relationships among proteins independent of a given sequence or fold homology. It is based on the idea that protein function is intimately related to the recognition and subsequent response to the binding of a substrate or an endogenous ligand in a well-characterized binding pocket. Thus, recognition of similar ligands, supposedly linked to similar function, requires conserved recognition features exposed in terms of common physicochemical interaction properties via the functional groups of the residues flanking a particular binding cavity. Following a technique commonly used in the comparison of small molecule ligands, generic pseudocenters coding for possible interaction properties were assigned for a large sample set of cavities extracted from the entire PDB and stored in the database Cavbase. Using a particular query cavity a series of related cavities of decreasing similarity is detected based on a clique detection algorithm. The detected similarity is ranked according to property-based surface patches shared in common by the different clique solutions. The approach either retrieves protein cavities accommodating the same (e.g. co-factors) or closely related ligands or it extracts proteins exhibiting similar function in terms of a related catalytic mechanism. Finally the new method has strong potential to suggest alternative molecular skeletons in de novo design. The retrieval of molecular building blocks accommodated in a particular sub-pocket that shares similarity with the pocket in a protein studied by drug design can inspire the discovery of novel ligands.

Keywords: Algorithms; Binding Sites; Databases, Protein; Models, Molecular; Molecular Structure; Protein Binding; Protein Folding; Protein Structure, Tertiary; Proteins, chemistry/metabolism; Reproducibility of Results
[Pastor-Satorras2002Evolving] R. Pastor-Satorras, E. D. Smith, and R. V. Solé. Evolving protein interaction networks through gene duplication. Technical report, Santa Fe Institute, 2002. Working paper 02-02-008. [ bib | .html | .pdf ]
[Mateos2002Systematic] Alvaro Mateos, Joaquín Dopazo, Ronald Jansen, Yuhai Tu, Mark Gerstein, and Gustavo Stolovitzky. Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res., 12(11):1703-15, Nov 2002. [ bib | DOI | http | .pdf ]
Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for ~100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ~10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suitable for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.

Keywords: Acute, Algorithms, Animals, Anion Exchange Resins, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Carcinoma, Chemical, Chromatography, Citric Acid Cycle, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Databases, Decision Making, Diagnosis, Differential, Drug, Drug Design, Electrostatics, Eukaryotic Cells, Factual, Feasibility Studies, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Heterogeneity, Genetic Markers, Hemolysins, Humans, Internet, Ion Exchange, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lung Neoplasms, Lymphocytic, Lymphoma, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-Small-Cell Lung, Non-U.S. Gov't, Nucleic Acid Conformation, Nucleic Acid Hybridization, Observer Variation, Oligonucleotide Array Sequence Analysis, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Protein Binding, Protein Conformation, Proteins, Quality Control, Quantum Theory, RNA, RNA Splicing, Receptors, Reference Values, Regression Analysis, Reproducibility of Results, Research Support, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Structural, Structure-Activity Relationship, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12421757
[Marsland2002self-organising] Stephen Marsland, Jonathan Shapiro, and Ulrich Nehmzow. A self-organising network that grows when required. Neural Netw, 15(8-9):1041-58, 2002. [ bib ]
The ability to grow extra nodes is a potentially useful facility for a self-organising neural network. A network that can add nodes into its map space can approximate the input space more accurately, and often more parsimoniously, than a network with predefined structure and size, such as the Self-Organising Map. In addition, a growing network can deal with dynamic input distributions. Most of the growing networks that have been proposed in the literature add new nodes to support the node that has accumulated the highest error during previous iterations or to support topological structures. This usually means that new nodes are added only when the number of iterations is an integer multiple of some pre-defined constant, A. This paper suggests a way in which the learning algorithm can add nodes whenever the network in its current state does not sufficiently match the input. In this way the network grows very quickly when new data is presented, but stops growing once the network has matched the data. This is particularly important when we consider dynamic data sets, where the distribution of inputs can change to a new regime after some time. We also demonstrate the preservation of neighbourhood relations in the data by the network. The new network is compared to an existing growing network, the Growing Neural Gas (GNG), on an artificial dataset, showing how the network deals with a change in input distribution after some time. Finally, the new network is applied to several novelty detection tasks and is compared with both the GNG and an unsupervised form of the Reduced Coulomb Energy network on a robotic inspection task and with a Support Vector Machine on two benchmark novelty detection tasks.

Keywords: Acute, Algorithms, Animals, Anion Exchange Resins, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Carcinoma, Chemical, Chromatography, Citric Acid Cycle, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Databases, Decision Making, Diagnosis, Differential, Drug, Drug Design, Electrostatics, Eukaryotic Cells, Factual, Feasibility Studies, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Heterogeneity, Genetic Markers, Hemolysins, Humans, Internet, Ion Exchange, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lung Neoplasms, Lymphocytic, Lymphoma, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-Small-Cell Lung, Non-U.S. Gov't, Nucleic Acid Conformation, Nucleic Acid Hybridization, Observer Variation, Oligonucleotide Array Sequence Analysis, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Probability Learning, Protein Binding, Protein Conformation, Proteins, Quality Control, Quantum Theory, RNA, RNA Splicing, Receptors, Reference Values, Regression Analysis, Reproducibility of Results, Research Support, Robotics, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Structural, Structure-Activity Relationship, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12416693
[MacBeath2002Protein] Gavin MacBeath. Protein microarrays and proteomics. Nat Genet, 32 Suppl:526-532, Dec 2002. [ bib | DOI | http ]
The system-wide study of proteins presents an exciting challenge in this information-rich age of whole-genome biology. Although traditional investigations have yielded abundant information about individual proteins, they have been less successful at providing us with an integrated understanding of biological systems. The promise of proteomics is that, by studying many components simultaneously, we will learn how proteins interact with each other, as well as with non-proteinaceous molecules, to control complex processes in cells, tissues and even whole organisms. Here, I discuss the role of microarray technology in this burgeoning area.

Keywords: Forecasting; Humans; Immunoassay, methods; Protein Array Analysis, methods; Proteomics, methods
[Li2002Involvement] Jiwen Li, Qiushi Lin, Ho-Geun Yoon, Zhi-Qing Huang, Brian D Strahl, C. David Allis, and Jiemin Wong. Involvement of histone methylation and phosphorylation in regulation of transcription by thyroid hormone receptor. Mol Cell Biol, 22(16):5688-5697, Aug 2002. [ bib ]
Previous studies have established an important role of histone acetylation in transcriptional control by nuclear hormone receptors. With chromatin immunoprecipitation assays, we have now investigated whether histone methylation and phosphorylation are also involved in transcriptional regulation by thyroid hormone receptor (TR). We found that repression by unliganded TR is associated with a substantial increase in methylation of H3 lysine 9 (H3-K9) and a decrease in methylation of H3 lysine 4 (H3-K4), methylation of H3 arginine 17 (H3-R17), and a dual modification of phosphorylation of H3 serine 10 and acetylation of lysine 14 (pS10/acK14). On the other hand, transcriptional activation by liganded TR is coupled with a substantial decrease in both H3-K4 and H3-K9 methylation and a robust increase in H3-R17 methylation and the dual modification of pS10/acK14. Trichostatin A treatment results in not only histone hyperacetylation but also an increase in methylation of H3-K4, increase in dual modification of pS10/acK14, and reduction in methylation of H3-K9, revealing an extensive interplay between histone acetylation, methylation, and phosphorylation. In an effort to understand the underlying mechanism for an increase in H3-K9 methylation during repression by unliganded TR, we demonstrated that TR interacts in vitro with an H3-K9-specific histone methyltransferase (HMT), SUV39H1. Functional analysis indicates that SUV39H1 can facilitate repression by unliganded TR and in so doing requires its HMT activity. Together, our data uncover a novel role of H3-K9 methylation in repression by unliganded TR and provide strong evidence for the involvement of multiple distinct histone covalent modifications (acetylation, methylation, and phosphorylation) in transcriptional control by nuclear hormone receptors.

Keywords: Animals; Cell Fractionation; Gene Expression Regulation, drug effects; Genes, Reporter; Histone-Lysine N-Methyltransferase; Histones, chemistry/genetics/metabolism; Humans; Hydroxamic Acids, pharmacology; Methylation; Methyltransferases, metabolism; Oocytes, physiology; Phosphorylation; Protein Methyltransferases; Protein Synthesis Inhibitors, pharmacology; Receptors, Thyroid Hormone, metabolism; Transcription, Genetic; Xenopus laevis, physiology
[Ho2002Systematic] Yuen Ho, Albrecht Gruhler, Adrian Heilbut, Gary D Bader, Lynda Moore, Sally-Lin Adams, Anna Millar, Paul Taylor, Keiryn Bennett, Kelly Boutilier, Lingyun Yang, Cheryl Wolting, Ian Donaldson, Søren Schandorff, Juanita Shewnarane, Mai Vo, Joanne Taggart, Marilyn Goudreault, Brenda Muskat, Cris Alfarano, Danielle Dewar, Zhen Lin, Katerina Michalickova, Andrew R Willems, Holly Sassi, Peter A Nielsen, Karina J Rasmussen, Jens R Andersen, Lene E Johansen, Lykke H Hansen, Hans Jespersen, Alexandre Podtelejnikov, Eva Nielsen, Janne Crawford, Vibeke Poulsen, Birgitte D Sørensen, Jesper Matthiesen, Ronald C Hendrickson, Frank Gleeson, Tony Pawson, Michael F Moran, Daniel Durocher, Matthias Mann, Christopher W V Hogue, Daniel Figeys, and Mike Tyers. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415(6868):180-3, Jan 2002. [ bib | DOI | http | .pdf ]
The recent abundance of genome sequence data has brought an urgent need for systematic proteomics to decipher the encoded protein networks that dictate cellular function. To date, generation of large-scale protein-protein interaction maps has relied on the yeast two-hybrid system, which detects binary interactions through activation of reporter gene expression. With the advent of ultrasensitive mass spectrometric protein identification methods, it is feasible to identify directly protein complexes on a proteome-wide scale. Here we report, using the budding yeast Saccharomyces cerevisiae as a test case, an example of this approach, which we term high-throughput mass spectrometric protein complex identification (HMS-PCI). Beginning with 10% of predicted yeast proteins as baits, we detected 3,617 associated proteins covering 25% of the yeast proteome. Numerous protein complexes were identified, including many new interactions in various signalling pathways and in the DNA damage response. Comparison of the HMS-PCI data set with interactions reported in the literature revealed an average threefold higher success rate in detection of known complexes compared with large-scale two-hybrid studies. Given the high degree of connectivity observed in this study, even partial HMS-PCI coverage of complex proteomes, including that of humans, should allow comprehensive identification of cellular networks.

Keywords: Affinity Labels, Amino Acid Sequence, Animals, Cell Cycle Proteins, Cloning, Comparative Study, DNA, DNA Damage, DNA Repair, Electrospray Ionization, Fungal, Genetic, Humans, Macromolecular Substances, Mass, Mitosis, Molecular, Molecular Sequence Data, Non-P.H.S., Non-U.S. Gov't, P.H.S., Phosphoric Monoester Hydrolases, Protein Binding, Protein Interaction Mapping, Protein Kinases, Proteome, Proteomics, Research Support, Ribonucleoproteins, Ribosomes, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Sequence Alignment, Signal Transduction, Spectrometry, Spectrum Analysis, Transcription, U.S. Gov't, 11805813
[Gavin2002Functionala] Anne-Claude Gavin, Markus Bösche, Roland Krause, Paola Grandi, Martina Marzioch, Andreas Bauer, Jörg Schultz, Jens M Rick, Anne-Marie Michon, Cristina-Maria Cruciat, Marita Remor, Christian Höfert, Malgorzata Schelder, Miro Brajenovic, Heinz Ruffner, Alejandro Merino, Karin Klein, Manuela Hudak, David Dickson, Tatjana Rudi, Volker Gnau, Angela Bauch, Sonja Bastuck, Bettina Huhse, Christina Leutwein, Marie-Anne Heurtier, Richard R Copley, Angela Edelmann, Erich Querfurth, Vladimir Rybin, Gerard Drewes, Manfred Raida, Tewis Bouwmeester, Peer Bork, Bertrand Seraphin, Bernhard Kuster, Gitte Neubauer, and Giulio Superti-Furga. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415(6868):141-7, Jan 2002. [ bib | DOI | http | .pdf ]
Most cellular processes are carried out by multiprotein complexes. The identification and analysis of their components provides insight into how the ensemble of expressed proteins (proteome) is organized into functional units. We used tandem-affinity purification (TAP) and mass spectrometry in a large-scale approach to characterize multiprotein complexes in Saccharomyces cerevisiae. We processed 1,739 genes, including 1,143 human orthologues of relevance to human biology, and purified 589 protein assemblies. Bioinformatic analysis of these assemblies defined 232 distinct multiprotein complexes and proposed new cellular roles for 344 proteins, including 231 proteins with no previous functional annotation. Comparison of yeast and human complexes showed that conservation across species extends from single proteins to their molecular environment. Our analysis provides an outline of the eukaryotic proteome as a network of protein complexes at a level of organization beyond binary interactions. This higher-order map contains fundamental biological information and offers the context for a more reasoned and informed approach to drug discovery.

Keywords: Affinity, Affinity Labels, Amino Acid Sequence, Animals, Cell Cycle Proteins, Cells, Chromatography, Cloning, Comparative Study, Cultured, DNA, DNA Damage, DNA Repair, Electrospray Ionization, Fungal, Gene Targeting, Genetic, Humans, Macromolecular Substances, Mass, Matrix-Assisted Laser Desorption-Ionization, Mitosis, Molecular, Molecular Sequence Data, Non-P.H.S., Non-U.S. Gov't, P.H.S., Phosphoric Monoester Hydrolases, Protein Binding, Protein Interaction Mapping, Protein Kinases, Proteome, Proteomics, Recombinant Fusion Proteins, Research Support, Ribonucleoproteins, Ribosomes, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Alignment, Signal Transduction, Species Specificity, Spectrometry, Spectrum Analysis, Transcription, U.S. Gov't, 11805813
[Ekins2002Towards] S. Ekins, B. Boulanger, P. W. Swaan, and M. A. Z. Hupcey. Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J Comput Aided Mol Des, 16(5-6):381-401, 2002. [ bib ]
With the continual pressure to ensure follow-up molecules to billion dollar blockbuster drugs, there is a hurdle in profitability and growth for pharmaceutical companies in the next decades. With each success and failure we increasingly appreciate that a key to the success of synthesized molecules through the research and development process is the possession of drug-like properties. These properties include an adequate bioactivity as well as adequate solubility, an ability to cross critical membranes (intestinal and sometimes blood-brain barrier), reasonable metabolic stability and of course safety in humans. Dependent on the therapeutic area being investigated it might also be desirable to avoid certain enzymes or transporters to circumvent potential drug-drug interactions. It may also be important to limit the induction of these same proteins that can result in further toxicities. We have clearly moved the assessment of in vitro absorption, distribution, metabolism, excretion and toxicity (ADME/TOX) parameters much earlier in the discovery organization than a decade ago with the inclusion of higher throughput systems. We are also now faced with huge amounts of ADME/TOX data for each molecule that need interpretation and also provide a valuable resource for generating predictive computational models for future drug discovery. The present review aims to show what tools exist today for visualizing and modeling ADME/TOX data, what tools need to be developed, and how both the present and future tools are valuable for virtual filtering using ADME/TOX and bioactivity properties in parallel as a viable addition to present practices.

Keywords: ATP-Binding Cassette Transporters, Algorithms, Animals, Biological, Biological Availability, Computer Simulation, Drug Design, Drug Evaluation, Drug Industry, Gene Expression Profiling, Humans, Models, Organic Anion Transporters, P.H.S., Pharmaceutical, Pharmaceutical Preparations, Pharmacogenetics, Pharmacokinetics, Preclinical, Proteomics, Research Support, Software, Systems Biology, Technology, Toxicity Tests, U.S. Gov't, 12489686
[Dover2002Methylation] Jim Dover, Jessica Schneider, Mary Anne Tawiah-Boateng, Adam Wood, Kimberly Dean, Mark Johnston, and Ali Shilatifard. Methylation of histone H3 by COMPASS requires ubiquitination of histone H2B by Rad6. J Biol Chem, 277(32):28368-28371, Aug 2002. [ bib | DOI | http ]
The DNA of eukaryotes is wrapped around nucleosomes and packaged into chromatin. Covalent modifications of the histone proteins that comprise the nucleosome alter chromatin structure and have major effects on gene expression. Methylation of lysine 4 of histone H3 by COMPASS is required for silencing of genes located near chromosome telomeres and within the rDNA (Krogan, N. J., Dover, J., Khorrami, S., Greenblatt, J. F., Schneider, J., Johnston, M., and Shilatifard, A. (2002) J. Biol. Chem. 277, 10753-10755; Briggs, S. D., Bryk, M., Strahl, B. D., Cheung, W. L., Davie, J. K., Dent, S. Y., Winston, F., and Allis, C. D. (2001) Genes. Dev. 15, 3286-3295). To learn about the mechanism of histone methylation, we surveyed the genome of the yeast Saccharomyces cerevisiae for genes necessary for this process. By analyzing approximately 4800 mutant strains, each deleted for a different non-essential gene, we discovered that the ubiquitin-conjugating enzyme Rad6 is required for methylation of lysine 4 of histone H3. Ubiquitination of histone H2B on lysine 123 is the signal for the methylation of histone H3, which leads to silencing of genes located near telomeres.

Keywords: DNA, Ribosomal, metabolism; Electrophoresis, Polyacrylamide Gel; Gene Silencing; Histones, metabolism; Ligases, metabolism; Lysine, metabolism; Methylation; Models, Biological; Mutation; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae, genetics; Ubiquitin, metabolism; Ubiquitin-Conjugating Enzymes
[Dekker2002Capturing] Job Dekker, Karsten Rippe, Martijn Dekker, and Nancy Kleckner. Capturing chromosome conformation. Science, 295(5558):1306-1311, Feb 2002. [ bib | DOI | http ]
We describe an approach to detect the frequency of interaction between any two genomic loci. Generation of a matrix of interaction frequencies between sites on the same or different chromosomes reveals their relative spatial disposition and provides information about the physical properties of the chromatin fiber. This methodology can be applied to the spatial organization of entire genomes in organisms from bacteria to human. Using the yeast Saccharomyces cerevisiae, we could confirm known qualitative features of chromosome organization within the nucleus and dynamic changes in that organization during meiosis. We also analyzed yeast chromosome III at the G1 stage of the cell cycle. We found that chromatin is highly flexible throughout. Furthermore, functionally distinct AT- and GC-rich domains were found to exhibit different conformations, and a population-average 3D model of chromosome III could be determined. Chromosome III emerges as a contorted ring.

Keywords: AT Rich Sequence; Cell Fractionation; Cell Nucleus; Centromere; Chromatin; Chromosomes, Fungal; Cross-Linking Reagents; Deoxyribonuclease EcoRI; Formaldehyde; G1 Phase; GC Rich Sequence; Genome, Fungal; Mathematics; Meiosis; Mitosis; Polymerase Chain Reaction; Protein Conformation; Saccharomyces cerevisiae; Telomere
[Doennes2002Prediction] Pierre Dönnes and Arne Elofsson. Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics, 3:25, Sep 2002. [ bib ]
BACKGROUND: T-cells are key players in regulating a specific immune response. Activation of cytotoxic T-cells requires recognition of specific peptides bound to Major Histocompatibility Complex (MHC) class I molecules. MHC-peptide complexes are potential tools for diagnosis and treatment of pathogens and cancer, as well as for the development of peptide vaccines. Only one in 100 to 200 potential binders actually binds to a certain MHC molecule, therefore a good prediction method for MHC class I binding peptides can reduce the number of candidate binders that need to be synthesized and tested. RESULTS: Here, we present a novel approach, SVMHC, based on support vector machines to predict the binding of peptides to MHC class I molecules. This method seems to perform slightly better than two profile based methods, SYFPEITHI and HLA_BIND. The implementation of SVMHC is quite simple and does not involve any manual steps, therefore as more data become available it is trivial to provide prediction for more MHC types. SVMHC currently contains prediction for 26 MHC class I types from the MHCPEP database or alternatively 6 MHC class I types from the higher quality SYFPEITHI database. The prediction models for these MHC types are implemented in a public web service available at http://www.sbc.su.se/svmhc/. CONCLUSIONS: Prediction of MHC class I binding peptides using Support Vector Machines, shows high performance and is easy to apply to a large number of MHC class I types. As more peptide data are put into MHC databases, SVMHC can easily be updated to give prediction for additional MHC class I types. We suggest that the number of binding peptides needed for SVM training is at least 20 sequences.

Keywords: Animals; Artificial Intelligence; Comparative Study; Computational Biology; Databases, Protein; Epitopes, T-Lymphocyte; HLA Antigens; Histocompatibility Antigens Class I; Humans; Peptides; Predictive Value of Tests; Protein Binding; Research Support, Non-U.S. Gov't; Sensitivity and Specificity
[Chan2002Comparison] Kwokleung Chan, Te-Won Lee, Pamela A Sample, Michael H Goldbaum, Robert N Weinreb, and Terrence J Sejnowski. Comparison of machine learning and traditional classifiers in glaucoma diagnosis. IEEE Trans Biomed Eng, 49(9):963-74, Sep 2002. [ bib | DOI | http | .pdf ]
Glaucoma is a progressive optic neuropathy with characteristic structural changes in the optic nerve head reflected in the visual field. The visual-field sensitivity test is commonly used in a clinical setting to evaluate glaucoma. Standard automated perimetry (SAP) is a common computerized visual-field test whose output is amenable to machine learning. We compared the performance of a number of machine learning algorithms with STATPAC indexes mean deviation, pattern standard deviation, and corrected pattern standard deviation. The machine learning algorithms studied included multilayer perceptron (MLP), support vector machine (SVM), and linear (LDA) and quadratic discriminant analysis (QDA), Parzen window, mixture of Gaussian (MOG), and mixture of generalized Gaussian (MGG). MLP and SVM are classifiers that work directly on the decision boundary and fall under the discriminative paradigm. Generative classifiers, which first model the data probability density and then perform classification via Bayes' rule, usually give deeper insight into the structure of the data space. We have applied MOG, MGG, LDA, QDA, and Parzen window to the classification of glaucoma from SAP. Performance of the various classifiers was compared by the areas under their receiver operating characteristic curves and by sensitivities (true-positive rates) at chosen specificities (true-negative rates). The machine-learning-type classifiers showed improved performance over the best indexes from STATPAC. Forward-selection and backward-elimination methodology further improved the classification rate and also has the potential to reduce testing time by diminishing the number of visual-field location measurements.

Keywords: Acute, Algorithms, Animals, Anion Exchange Resins, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Carcinoma, Chemical, Chromatography, Citric Acid Cycle, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Databases, Decision Making, Diagnosis, Differential, Discriminant Analysis, Drug, Drug Design, Electrostatics, Epitopes, Eukaryotic Cells, Factual, False Negative Reactions, False Positive Reactions, Feasibility Studies, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Heterogeneity, Genetic Markers, Glaucoma, HLA Antigens, Hemolysins, Histocompatibility Antigens Class I, Humans, Internet, Intraocular Pressure, Ion Exchange, Lasers, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lung Neoplasms, Lymphocytic, Lymphoma, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Neurological, Non-P.H.S., Non-Small-Cell Lung, Non-U.S. Gov't, Nucleic Acid Conformation, Nucleic Acid Hybridization, Observer Variation, Oligonucleotide Array Sequence Analysis, Open-Angle, Ophthalmoscopy, Optic Disk, Optic Nerve Diseases, Ovarian Neoplasms, P.H.S., Pattern Recognition, Peptides, Perimetry, Predictive Value of Tests, Probability, Probability Learning, Protein, Protein Binding, Protein Conformation, Proteins, Quality Control, Quantum Theory, RNA, RNA Splicing, ROC Curve, Receptors, Reference Values, Regression Analysis, Reproducibility of Results, Research Support, Robotics, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Structural, Structure-Activity Relationship, T-Lymphocyte, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12214886
[Bryk2002Evidence] Mary Bryk, Scott D Briggs, Brian D Strahl, M. Joan Curcio, C. David Allis, and Fred Winston. Evidence that Set1, a factor required for methylation of histone H3, regulates rDNA silencing in S. cerevisiae by a Sir2-independent mechanism. Curr Biol, 12(2):165-170, Jan 2002. [ bib ]
Several types of histone modifications have been shown to control transcription. Recent evidence suggests that specific combinations of these modifications determine particular transcription patterns. The histone modifications most recently shown to play critical roles in transcription are arginine-specific and lysine-specific methylation. Lysine-specific histone methyltransferases all contain a SET domain, a conserved 130 amino acid motif originally identified in polycomb- and trithorax-group proteins from Drosophila. Members of the SU(VAR)3-9 family of SET-domain proteins methylate K9 of histone H3. Methylation of H3 has also been shown to occur at K4. Several studies have suggested a correlation between K4-methylated H3 and active transcription. In this paper, we provide evidence that K4-methylated H3 is required in a negative role, rDNA silencing in Saccharomyces cerevisiae. In a screen for rDNA silencing mutants, we identified a mutation in SET1, previously shown to regulate silencing at telomeres and HML. Recent work has shown that Set1 is a member of a complex and is required for methylation of K4 of H3 at several genomic locations. In addition, we demonstrate that a K4R change in H3, which prevents K4 methylation, impairs rDNA silencing, indicating that Set1 regulates rDNA silencing, directly or indirectly, via H3 methylation. Furthermore, we present several lines of evidence that the role of Set1 in rDNA silencing is distinct from that of the histone deacetylase Sir2. Together, these results suggest that Set1-dependent H3 methylation is required for rDNA silencing in a Sir2-independent fashion.

Keywords: Acetylation; DNA Methylation; DNA, Ribosomal, genetics; DNA-Binding Proteins, metabolism; Drosophila Proteins; Fungal Proteins, metabolism; Gene Silencing; Histone Deacetylases, metabolism; Histone-Lysine N-Methyltransferase; Histones, metabolism; Mutation; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae, metabolism; Silent Information Regulator Proteins, Saccharomyces cerevisiae; Sirtuin 2; Sirtuins; Trans-Activators, metabolism; Transcription Factors, metabolism
[Brusic2002Prediction] V. Brusic, N. Petrovsky, G. Zhang, and V. B. Bajic. Prediction of promiscuous peptides that bind HLA class I molecules. Immunol. Cell Biol., 80(3):280-285, Jun 2002. [ bib ]
Promiscuous T-cell epitopes make ideal targets for vaccine development. We report here a computational system, MULTIPRED, for the prediction of peptide binding to the HLA-A2 supertype. It combines a novel representation of peptide/MHC interactions with a hidden Markov model as the prediction algorithm. MULTIPRED is both sensitive and specific, and demonstrates high accuracy of peptide-binding predictions for HLA-A*0201, *0204, and *0205 alleles, good accuracy for *0206 allele, and marginal accuracy for *0203 allele. MULTIPRED replaces earlier requirements for individual prediction models for each HLA allelic variant and simplifies computational aspects of peptide-binding prediction. Preliminary testing indicates that MULTIPRED can predict peptide binding to HLA-A2 supertype molecules with high accuracy, including those allelic variants for which no experimental binding data are currently available.

Keywords: Algorithms, Amino Acid Motifs, Amino Acid Sequence, Antigen-Antibody Complex, Automated, Binding Sites, Computational Biology, Drug Delivery Systems, Drug Design, Epitopes, Forecasting, Genes, HLA Antigens, HLA-A Antigens, HLA-A2 Antigen, HLA-DR Antigens, Humans, Internet, MHC Class I, Markov Chains, Molecular Sequence Data, Neural Networks (Computer), Pattern Recognition, Peptide Fragments, Peptides, Protein, Protein Binding, Protein Interaction Mapping, Sensitivity and Specificity, Sequence Analysis, Software, T-Lymphocyte, User-Computer Interface, Viral Vaccines, 12067415
[Briggs2002Gene] Scott D Briggs, Tiaojiang Xiao, Zu-Wen Sun, Jennifer A Caldwell, Jeffrey Shabanowitz, Donald F Hunt, C. David Allis, and Brian D Strahl. Gene silencing: trans-histone regulatory pathway in chromatin. Nature, 418(6897):498, Aug 2002. [ bib | DOI | http ]
The fundamental unit of eukaryotic chromatin, the nucleosome, consists of genomic DNA wrapped around the conserved histone proteins H3, H2B, H2A and H4, all of which are variously modified at their amino- and carboxy-terminal tails to influence the dynamics of chromatin structure and function - for example, conjugation of histone H2B with ubiquitin controls the outcome of methylation at a specific lysine residue (Lys 4) on histone H3, which regulates gene silencing in the yeast Saccharomyces cerevisiae. Here we show that ubiquitination of H2B is also necessary for the methylation of Lys 79 in H3, the only modification known to occur away from the histone tails, but that not all methylated lysines in H3 are regulated by this 'trans-histone' pathway because the methylation of Lys 36 in H3 is unaffected. Given that gene silencing is regulated by the methylation of Lys 4 and Lys 79 in histone H3, we suggest that H2B ubiquitination acts as a master switch that controls the site-selective histone methylation patterns responsible for this silencing.

Keywords: Chromatin, chemistry/metabolism; Gene Expression Regulation, Fungal; Gene Silencing; Histone-Lysine N-Methyltransferase; Histones, chemistry/metabolism; Ligases, metabolism; Methylation; Models, Biological; Nuclear Proteins, metabolism; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae, genetics/metabolism; Ubiquitin, metabolism; Ubiquitin-Conjugating Enzymes
[Bowd2002Comparing] Christopher Bowd, Kwokleung Chan, Linda M Zangwill, Michael H Goldbaum, Te-Won Lee, Terrence J Sejnowski, and Robert N Weinreb. Comparing neural networks and linear discriminant functions for glaucoma detection using confocal scanning laser ophthalmoscopy of the optic disc. Invest Ophthalmol Vis Sci, 43(11):3444-54, Nov 2002. [ bib | http | .pdf ]
PURPOSE: To determine whether neural network techniques can improve differentiation between glaucomatous and nonglaucomatous eyes, using the optic disc topography parameters of the Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Heidelberg, Germany). METHODS: With the HRT, one eye was imaged from each of 108 patients with glaucoma (defined as having repeatable visual field defects with standard automated perimetry) and 189 subjects without glaucoma (no visual field defects with healthy-appearing optic disc and retinal nerve fiber layer on clinical examination) and the optic nerve topography was defined by 17 global and 66 regional HRT parameters. With all the HRT parameters used as input, receiver operating characteristic (ROC) curves were generated for the classification of eyes, by three neural network techniques: linear and Gaussian support vector machines (SVM linear and SVM Gaussian, respectively) and a multilayer perceptron (MLP), as well as four previously proposed linear discriminant functions (LDFs) and one LDF developed on the current data with all HRT parameters used as input. RESULTS: The areas under the ROC curves for SVM linear and SVM Gaussian were 0.938 and 0.945, respectively; for MLP, 0.941; for the current LDF, 0.906; and for the best previously proposed LDF, 0.890. With the use of forward selection and backward elimination optimization techniques, the areas under the ROC curves for SVM Gaussian and the current LDF were increased to approximately 0.96. CONCLUSIONS: Trained neural networks, with global and regional HRT parameters used as input, improve on previously proposed HRT parameter-based LDFs for discriminating between glaucomatous and nonglaucomatous eyes. The performance of both neural networks and LDFs can be improved with optimization of the features in the input. Neural network analyses show promise for increasing diagnostic accuracy of tests for glaucoma.

Keywords: Acute, Algorithms, Animals, Anion Exchange Resins, Artificial Intelligence, Automated, Base Pair Mismatch, Base Pairing, Base Sequence, Biological, Biosensing Techniques, Carcinoma, Chemical, Chromatography, Citric Acid Cycle, Classification, Cluster Analysis, Comparative Study, Computational Biology, Computer-Assisted, Cystadenoma, DNA, Databases, Decision Making, Diagnosis, Differential, Discriminant Analysis, Drug, Drug Design, Electrostatics, Eukaryotic Cells, Factual, Feasibility Studies, Female, Gene Expression, Gene Expression Profiling, Gene Expression Regulation, Genes, Genetic, Genetic Heterogeneity, Genetic Markers, Glaucoma, Hemolysins, Humans, Internet, Intraocular Pressure, Ion Exchange, Lasers, Leukemia, Ligands, Likelihood Functions, Logistic Models, Lung Neoplasms, Lymphocytic, Lymphoma, Markov Chains, Mathematics, Messenger, Models, Molecular, Molecular Probe Techniques, Molecular Sequence Data, Nanotechnology, Neoplasm, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-Small-Cell Lung, Non-U.S. Gov't, Nucleic Acid Conformation, Nucleic Acid Hybridization, Observer Variation, Oligonucleotide Array Sequence Analysis, Open-Angle, Ophthalmoscopy, Optic Disk, Ovarian Neoplasms, P.H.S., Pattern Recognition, Probability, Probability Learning, Protein Binding, Protein Conformation, Proteins, Quality Control, Quantum Theory, RNA, RNA Splicing, ROC Curve, Receptors, Reference Values, Regression Analysis, Reproducibility of Results, Research Support, Robotics, Saccharomyces cerevisiae Proteins, Sensitivity and Specificity, Sequence Analysis, Signal Processing, Software, Statistical, Stomach Neoplasms, Structural, Structure-Activity Relationship, Thermodynamics, Transcription, Tumor Markers, U.S. Gov't, 12407155
[Ong2002Stable] Shao-En Ong, Blagoy Blagoev, Irina Kratchmarova, Dan Bach Kristensen, Hanno Steen, Akhilesh Pandey, and Matthias Mann. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics, 1(5):376-386, May 2002. [ bib ]
Quantitative proteomics has traditionally been performed by two-dimensional gel electrophoresis, but recently, mass spectrometric methods based on stable isotope quantitation have shown great promise for the simultaneous and automated identification and quantitation of complex protein mixtures. Here we describe a method, termed SILAC, for stable isotope labeling by amino acids in cell culture, for the in vivo incorporation of specific amino acids into all mammalian proteins. Mammalian cell lines are grown in media lacking a standard essential amino acid but supplemented with a non-radioactive, isotopically labeled form of that amino acid, in this case deuterated leucine (Leu-d3). We find that growth of cells maintained in these media is no different from growth in normal media as evidenced by cell morphology, doubling time, and ability to differentiate. Complete incorporation of Leu-d3 occurred after five doublings in the cell lines and proteins studied. Protein populations from experimental and control samples are mixed directly after harvesting, and mass spectrometric identification is straightforward as every leucine-containing peptide incorporates either all normal leucine or all Leu-d3. We have applied this technique to the relative quantitation of changes in protein expression during the process of muscle cell differentiation. Proteins that were found to be up-regulated during this process include glyceraldehyde-3-phosphate dehydrogenase, fibronectin, and pyruvate kinase M2. SILAC is a simple, inexpensive, and accurate procedure that can be used as a quantitative proteomic approach in any cell culture system.

Keywords: 3T3 Cells; Amino Acids; Animals; Cell Culture Techniques; Cell Differentiation; Cell Line; Deuterium; Genetic Techniques; Hydrogen-Ion Concentration; Leucine; Mice; Muscles; Peptides; Proteomics; Time Factors; Up-Regulation
[Zhu2003Introduction] Lingyun Zhu, Baoming Wu, and Changxiu Cao. Introduction to medical data mining. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi, 20(3):559-62, Sep 2003. [ bib ]
Modern medicine generates a great deal of information, which is stored in medical databases. Extracting useful knowledge from these databases and supporting scientific decision-making for the diagnosis and treatment of disease is becoming increasingly necessary. Data mining in medicine can address this problem. It can also improve the management of hospital information and promote the development of telemedicine and community medicine. Because medical information is characterized by redundancy, multiple attributes, incompleteness, and a close relationship with time, medical data mining differs from data mining in other fields. In this paper we discuss the key techniques of medical data mining, including the preprocessing of medical data, the fusion of different patterns and resources, fast and robust mining algorithms, and the reliability of mining results. Methods and applications of medical data mining based on computational intelligence, such as artificial neural networks, fuzzy systems, evolutionary algorithms, rough sets, and support vector machines, are introduced. The features and problems of data mining are summarized in the last section.

Keywords: Algorithms, Anion Exchange Resins, Automatic Data Processing, Chemical, Chromatography, Computational Biology, Computer-Assisted, Data Interpretation, Databases, Decision Making, Decision Trees, English Abstract, Factual, Fuzzy Logic, Humans, Indicators and Reagents, Information Storage and Retrieval, Ion Exchange, Models, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nucleic Acid Conformation, P.H.S., Proteins, Quantitative Structure-Activity Relationship, RNA, ROC Curve, Research Support, Sequence Analysis, Statistical, Transfer, U.S. Gov't, 14565039
[Zhao2003Applicationa] Y. Zhao, C. Pinilla, D. Valmori, R. Martin, and R. Simon. Application of support vector machines for T-cell epitopes prediction. Bioinformatics, 19(15):1978-1984, Oct 2003. [ bib ]
MOTIVATION: The T-cell receptor, a major histocompatibility complex (MHC) molecule, and a bound antigenic peptide, play major roles in the process of antigen-specific T-cell activation. T-cell recognition was long considered exquisitely specific. Recent data also indicate that it is highly flexible, and one receptor may recognize thousands of different peptides. Deciphering the patterns of peptides that elicit a MHC restricted T-cell response is critical for vaccine development. RESULTS: For the first time we develop a support vector machine (SVM) for T-cell epitope prediction with an MHC type I restricted T-cell clone. Using cross-validation, we demonstrate that SVMs can be trained on relatively small data sets to provide prediction more accurate than those based on previously published methods or on MHC binding. SUPPLEMENTARY INFORMATION: Data for 203 synthesized peptides is available at http://linus.nci.nih.gov/Data/LAU203_Peptide.pdf

Keywords: Algorithms, Amino Acid Sequence, Antigen, Antigen Presentation, Antigen-Antibody Complex, Artificial Intelligence, Autoimmune Diseases, Autoimmunity, Bacterial Proteins, CD4-Positive T-Lymphocytes, Cell Proliferation, Cells, Clone Cells, Cluster Analysis, Conserved Sequence, Cross Reactions, Cultured, Cytokines, Databases, Epitope Mapping, Epitopes, Gene Products, Genetic, HIV-1, HLA-DQ Antigens, HLA-DR2 Antigen, Haplotypes, Helper-Inducer, Hemagglutination, Histocompatibility Antigens Class I, Humans, K562 Cells, Molecular Mimicry, Molecular Sequence Data, Multiple Sclerosis, Myelin Proteins, Neural Networks (Computer), Orthomyxoviridae, Peptide Library, Peptides, Protein, Protein Binding, Protein Interaction Mapping, ROC Curve, Receptors, Relapsing-Remitting, Reproducibility of Results, Reverse Transcriptase Polymerase Chain Reaction, Sensitivity and Specificity, Sequence Analysis, Structure-Activity Relationship, T-Cell, T-Lymphocyte, T-Lymphocytes, Torque teno virus, Viral, Viral Proteins, gag, 14555632
[Waterman2003Transcriptional] S. R. Waterman and P. L. C. Small. Transcriptional expression of Escherichia coli glutamate-dependent acid resistance genes gadA and gadBC in an hns rpoS mutant. J. Bacteriol., 185(15):4644-4647, Aug 2003. [ bib ]
Resistance to being killed by acidic environments with pH values lower than 3 is an important feature of both pathogenic and nonpathogenic Escherichia coli. The most potent E. coli acid resistance system utilizes two isoforms of glutamate decarboxylase encoded by gadA and gadB and a putative glutamate:gamma-aminobutyric acid antiporter encoded by gadC. The gad system is controlled by two repressors (H-NS and CRP), one activator (GadX), one repressor-activator (GadW), and two sigma factors (sigma(S) and sigma(70)). In contrast to results of previous reports, we demonstrate that gad transcription can be detected in an hns rpoS mutant strain of E. coli K-12, indicating that gad promoters can be initiated by sigma(70) in the absence of H-NS.

Keywords: Bacterial Proteins; DNA-Binding Proteins; Drug Resistance, Bacterial; Escherichia coli; Escherichia coli Proteins; Gene Expression Regulation, Bacterial; Glutamate Decarboxylase; Glutamates; Hydrogen-Ion Concentration; Membrane Proteins; Mutation; Sigma Factor; Transcription, Genetic
[Vickers2003Efficient] T. A. Vickers, S. Koo, C. F. Bennett, S. T. Crooke, N. M. Dean, and B. F. Baker. Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents. A comparative analysis. J. Biol. Chem., 278(9):7108-18, Feb 2003. [ bib | DOI | http ]
RNA interference can be considered as an antisense mechanism of action that utilizes a double-stranded RNase to promote hydrolysis of the target RNA. We have performed a comparative study of optimized antisense oligonucleotides designed to work by an RNA interference mechanism to oligonucleotides designed to work by an RNase H-dependent mechanism in human cells. The potency, maximal effectiveness, duration of action, and sequence specificity of optimized RNase H-dependent oligonucleotides and small interfering RNA (siRNA) oligonucleotide duplexes were evaluated and found to be comparable. Effects of base mismatches on activity were determined to be position-dependent for both siRNA oligonucleotides and RNase H-dependent oligonucleotides. In addition, we determined that the activity of both siRNA oligonucleotides and RNase H-dependent oligonucleotides is affected by the secondary structure of the target mRNA. To determine whether positions on target RNA identified as being susceptible for RNase H-mediated degradation would be coincident with siRNA target sites, we evaluated the effectiveness of siRNAs designed to bind the same position on the target mRNA as RNase H-dependent oligonucleotides. Examination of 80 siRNA oligonucleotide duplexes designed to bind to RNA from four distinct human genes revealed that, in general, activity correlated with the activity to RNase H-dependent oligonucleotides designed to the same site, although some exceptions were noted. The one major difference between the two strategies is that RNase H-dependent oligonucleotides were determined to be active when directed against targets in the pre-mRNA, whereas siRNAs were not. These results demonstrate that siRNA oligonucleotide- and RNase H-dependent antisense strategies are both valid strategies for evaluating function of genes in cell-based assays.

Keywords: Animals, Antisense, Base Sequence, COS Cells, Calf Thymus, Cultured, Dose-Response Relationship, Drug, Flow Cytometry, Humans, Intercellular Adhesion Molecule-1, Introns, Luciferases, Messenger, Molecular Sequence Data, Nucleic Acid Conformation, Oligonucleotides, PTEN Phosphohydrolase, Phosphoric Monoester Hydrolases, Protein Structure, RNA, Ribonuclease H, Small Interfering, Tertiary, Time Factors, Tumor Cells, Tumor Suppressor Proteins, 12500975
[Takahashi2003Proteomic] Nobuhiro Takahashi, Mitsuaki Yanagida, Sally Fujiyama, Toshiya Hayano, and Toshiaki Isobe. Proteomic snapshot analyses of preribosomal ribonucleoprotein complexes formed at various stages of ribosome biogenesis in yeast and mammalian cells. Mass Spectrom Rev, 22(5):287-317, 2003. [ bib | DOI | http | .pdf ]
Proteomic technologies powered by advancements in mass spectrometry and bioinformatics and coupled with accumulated genome sequence data allow a comprehensive study of cell function through large-scale and systematic identification of the protein constituents of cells and tissues, as well as of multi-protein complexes that carry out many cellular functions in a higher-order network in the cell. One of the cellular functions most extensively analyzed by proteomics is the production of the ribosome, the protein-synthesis machinery, in the nucle(ol)us, the main site of ribosome biogenesis. The use of tagged proteins as affinity bait, coupled with mass spectrometric identification, enabled us to isolate synthetic intermediates of ribosomes that might represent snapshots of nascent ribosomes at particular stages of ribosome biogenesis and to identify their constituents, some of which showed dynamic changes in their association with the intermediates at various stages of ribosome biogenesis. In this review, in conjunction with the results from yeast cells, our proteomic approach to analyze ribosome biogenesis in mammalian cells is described.

Keywords: Affinity Labels, Animals, Comparative Study, Electrospray Ionization, Genetic, Macromolecular Substances, Mass, Mitosis, Non-P.H.S., Non-U.S. Gov't, P.H.S., Protein Interaction Mapping, Proteome, Proteomics, Research Support, Ribonucleoproteins, Ribosomes, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Signal Transduction, Spectrometry, Transcription, U.S. Gov't, 12949916
[Sheinerman2003Sequence] Felix B Sheinerman, Bissan Al-Lazikani, and Barry Honig. Sequence, structure and energetic determinants of phosphopeptide selectivity of SH2 domains. J. Mol. Biol., 334(4):823-841, Dec 2003. [ bib ]
Here, we present an approach for the prediction of binding preferences of members of a large protein family for which structural information for a number of family members bound to a substrate is available. The approach involves a number of steps. First, an accurate multiple alignment of sequences of all members of a protein family is constructed on the basis of a multiple structural superposition of family members with known structure. Second, the methods of continuum electrostatics are used to characterize the energetic contribution of each residue in a protein to the binding of its substrate. Residues that make a significant contribution are mapped onto the protein sequence and are used to define a "binding site signature" for the complex being considered. Third, sequences whose structures have not been determined are checked to see if they have binding-site signatures similar to one of the known complexes. Predictions of binding affinity to a given substrate are based on similarities in binding-site signature. An important component of the approach is the introduction of a context-specific substitution matrix suitable for comparison of binding-site residues. The methods are applied to the prediction of phosphopeptide selectivity of SH2 domains. To this end, the energetic roles of all protein residues in 17 different complexes of SH2 domains with their cognate targets are analyzed. The total number of residues that make significant contributions to binding is found to vary from nine to 19 in different complexes. These energetically important residues are found to contribute to binding through a variety of mechanisms, involving both electrostatic and hydrophobic interactions. Binding-site signatures are found to involve residues in different positions in SH2 sequences, some of them as far as 9 Å away from a bound peptide. Surprisingly, similarities in the signatures of different domains do not correlate with whole-domain sequence identities unless the latter is greater than 50%. An extensive comparison with the optimal binding motifs determined by peptide library experiments, as well as other experimental data indicate that the similarity in binding preferences of different SH2 domains can be deduced on the basis of their binding-site signatures. The analysis provides a rationale for the empirically derived classification of SH2 domains described by Songyang & Cantley, in that proteins in the same group are found to have similar residues at positions important for binding. Confident predictions of binding preference can be made for about 85% of SH2 domain sequences found in SWISSPROT. The approach described in this work is quite general and can, in principle, be used to analyze binding preferences of members of large protein families for which structural information for a number of family members is available. It also offers a strategy for predicting cross-reactivity of compounds designed to bind to a particular target, for example in structure-based drug design.

Keywords: Amino Acid Sequence; Binding Sites; Molecular Sequence Data; Peptide Library; Phosphopeptides; Protein Binding; Sequence Alignment; Substrate Specificity; src Homology Domains
[Salim2003Combination] N. Salim, J. Holliday, and P. Willett. Combination of fingerprint-based similarity coefficients using data fusion. J Chem Inf Comput Sci, 43(2):435-442, 2003. [ bib | DOI | http ]
Many different types of similarity coefficients have been described in the literature. Since different coefficients take into account different characteristics when assessing the degree of similarity between molecules, it is reasonable to combine them to further optimize the measures of similarity between molecules. This paper describes experiments in which data fusion is used to combine several binary similarity coefficients to get an overall estimate of similarity for searching databases of bioactive molecules. The results show that search performances can be improved by combining coefficients with little extra computational cost. However, there is no single combination which gives a consistently high performance for all search types.

Keywords: 80 and over, Acid-Base Imbalance, Acute, Acute Disease, Adolescent, Adult, African Americans, Aged, Anemia, Animals, Anti-HIV Agents, Anti-Infective Agents, Antibiotics, Antibodies, Antineoplastic, Antineoplastic Agents, Antineoplastic Combined Chemotherapy Protocols, Antitubercular Agents, Aorta, Asparaginase, Autoimmune, B-Cell, Bangladesh, Bicarbonates, Biological Markers, Blood Glucose, California, Camptothecin, Cellulitis, Chorionic Gonadotropin, Chronic Disease, Ciprofloxacin, Clinical Protocols, Colorectal Neoplasms, Combination, Comparative Study, Daunorubicin, Decision Trees, Dexamethasone, Diabetes Mellitus, Dideoxynucleosides, Directly Observed Therapy, Disease Transmission, Drug Administration Schedule, Drug Resistance, Drug Therapy, English Abstract, Female, Fluorouracil, Follow-Up Studies, Glucose Tolerance Test, Glucosephosphate Dehydrogenase, Glyburide, HIV Infections, HIV-1, Health Planning, Health Resources, Helminth, Hemolysis, Hemolytic, Hormonal, Hospital Mortality, Human, Humans, Hypoglycemic Agents, Immunoglobulin M, In Vitro, Incidence, Indinavir, Insulin, Intensive Care Units, Interstitial, Lactates, Leucovorin, Leukemia, Male, Maternal Age, Middle Aged, Motor Activity, Multidrug-Resistant, Mutation, Nephritis, Non-U.S. Gov't, Organoplatinum Compounds, Pennsylvania, Phytotherapy, Plant Extracts, Plant Leaves, Population Dynamics, Potassium Channels, Prednisone, Pregnancy, Pregnancy Outcome, Prenatal, Prenatal Care, Progesterone, Prognosis, Prospective Studies, Pulmonary, Rabbits, Randomized Controlled Trials, Rats, Research Support, Retrospective Studies, Risk Assessment, Scalp Dermatoses, Schistosomiasis japonica, Severity of Illness Index, Spondylarthropathies, Streptozocin, Survival Rate, Trauma Centers, Trauma Severity Indices, Tubal, Tuberculosis, Type 2, Ultrasonography, Vertical, Vincristine, Viral, Viral Load, Wistar, Wounds and Injuries, Ziziphus, beta Subunit, 12653506
[Roschke2003Karyotypic] Anna V Roschke, Giovanni Tonon, Kristen S Gehlhaus, Nicolas McTyre, Kimberly J Bussey, Samir Lababidi, Dominic A Scudiero, John N Weinstein, and Ilan R Kirsch. Karyotypic complexity of the NCI-60 drug-screening panel. Cancer Res, 63(24):8634-8647, Dec 2003. [ bib ]
We used spectral karyotyping to provide a detailed analysis of karyotypic aberrations in the diverse group of cancer cell lines established by the National Cancer Institute for the purpose of anticancer drug discovery. Along with the karyotypic description of these cell lines we defined and studied karyotypic complexity and heterogeneity (metaphase-to-metaphase variations) based on three separate components of genomic anatomy: (a) ploidy; (b) numerical changes; and (c) structural rearrangements. A wide variation in these parameters was evident in these cell lines, and different association patterns between them were revealed. Analysis of the breakpoints and other specific features of chromosomal changes across the entire set of cell lines or within particular lineages pointed to a striking lability of centromeric regions that distinguishes the epithelial tumor cell lines. We have also found that balanced translocations are as frequent in absolute number within the cell lines derived from solid as from hematopoietic tumors. Important similarities were noticed between karyotypic changes in cancer cell lines and those seen in primary tumors. This dataset offers insights into the causes and consequences of the destabilizing events and chromosomal instability that may occur during tumor development and progression. It also provides a foundation for investigating associations between structural genome anatomy and cancer molecular markers and targets, gene expression, gene dosage, and resistance or sensitivity to tens of thousands of molecular compounds.

Keywords: Cell Line, Tumor; Chromosome Aberrations; DNA Repair, genetics; Drug Screening Assays, Antitumor; Humans; Neoplasms, genetics/pathology; Ploidies; Retinoblastoma Protein, genetics; Spectral Karyotyping; Translocation, Genetic; Tumor Suppressor Protein p53, genetics
[Patterson2003Proteomics] Scott D Patterson and Ruedi H Aebersold. Proteomics: the first decade and beyond. Nat Genet, 33 Suppl:311-323, Mar 2003. [ bib | DOI | http ]
Proteomics is the systematic study of the many and diverse properties of proteins in a parallel manner with the aim of providing detailed descriptions of the structure, function and control of biological systems in health and disease. Advances in methods and technologies have catalyzed an expansion of the scope of biological studies from the reductionist biochemical analysis of single proteins to proteome-wide measurements. Proteomics and other complementary analysis methods are essential components of the emerging 'systems biology' approach that seeks to comprehensively describe biological systems through integration of diverse types of data and, in the future, to ultimately allow computational simulations of complex biological systems.

Keywords: Amino Acid Sequence; Base Sequence; Chromatography, Liquid; Computational Biology; DNA; Genetic Techniques; History, 20th Century; History, 21st Century; Mass Spectrometry; Oligonucleotide Array Sequence Analysis; Proteins; Proteomics
[Jambon2003New] Martin Jambon, Anne Imberty, Gilbert Deléage, and Christophe Geourjon. A new bioinformatic approach to detect common 3D sites in protein structures. Proteins, 52(2):137-145, Aug 2003. [ bib | DOI | http ]
An innovative bioinformatic method has been designed and implemented to detect similar three-dimensional (3D) sites in proteins. This approach allows the comparison of protein structures or substructures and detects local spatial similarities: this method is completely independent from the amino acid sequence and from the backbone structure. In contrast to already existing tools, the basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed. The implementation of this heuristic constitutes a software named SuMo (Surfing the Molecules), which allows the dynamic definition of chemical groups, the selection of sites in the proteins, and the management and screening of databases. To show the relevance of this approach, we focused on two extreme examples illustrating convergent and divergent evolution. In two unrelated serine proteases, SuMo detects one common site, which corresponds to the catalytic triad. In the legume lectins family composed of >100 structures that share similar sequences and folds but may have lost their ability to bind a carbohydrate molecule, SuMo discriminates between functional and non-functional lectins with a selectivity of 96%. The time needed for searching a given site in a protein structure is typically 0.1 s on a PIII 800MHz/Linux computer; thus, in further studies, SuMo will be used to screen the PDB.

Keywords: Algorithms; Catalytic Domain; Chymotrypsin, chemistry/genetics; Computational Biology, methods; Evolution, Molecular; Fabaceae, chemistry; Models, Molecular; Plant Lectins, chemistry/genetics; Protein Conformation; Proteins, chemistry; Reproducibility of Results; Subtilisin, chemistry/genetics
[Bild2003] E. Huang, S. Ishida, J. Pittman, H. Dressman, A. Bild, M. Kloos, M. D'Amico, R. G. Pestell, M. West, and J. R. Nevins. Gene expression phenotypic models that predict the activity of oncogenic pathways. Nat Genet, 34(2):226-30, 2003. [ bib ]
High-density DNA microarrays measure expression of large numbers of genes in one assay. The ability to find underlying structure in complex gene expression data sets and rigorously test association of that structure with biological conditions is essential to developing multi-faceted views of the gene activity that defines cellular phenotype. We sought to connect features of gene expression data with biological hypotheses by integrating 'metagene' patterns from DNA microarray experiments in the characterization and prediction of oncogenic phenotypes. We applied these techniques to the analysis of regulatory pathways controlled by the genes HRAS (Harvey rat sarcoma viral oncogene homolog), MYC (myelocytomatosis viral oncogene homolog) and E2F1, E2F2 and E2F3 (encoding E2F transcription factors 1, 2 and 3, respectively). The phenotypic models accurately predict the activity of these pathways in the context of normal cell proliferation. Moreover, the metagene models trained with gene expression patterns evoked by ectopic production of Myc or Ras proteins in primary tissue culture cells properly predict the activity of in vivo tumor models that result from deregulation of the MYC or HRAS pathways. We conclude that these gene expression phenotypes have the potential to characterize the complex genetic alterations that typify the neoplastic state, whether in vitro or in vivo, in a way that truly reflects the complexity of the regulatory pathways that are affected.

Keywords: Animals; Cell Cycle Proteins; DNA-Binding Proteins; E2F Transcription Factors; E2F1 Transcription Factor; E2F2 Transcription Factor; E2F3 Transcription Factor; Female; Gene Expression; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genes, myc; Genes, ras; Mammary Neoplasms, Experimental/genetics; Mice; Mice, Transgenic; Models, Genetic; Oligonucleotide Array Sequence Analysis; Oncogenes; Phenotype; Transcription Factors/genetics
[Harborth2003Sequence] J. Harborth, S. M. Elbashir, K. Vandenburgh, H. Manninga, S. A. Scaringe, K. Weber, and T. Tuschl. Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing. Antisense Nucleic Acid Drug Dev., 13(2):83-105, Apr 2003. [ bib | DOI | http ]
Small interfering RNAs (siRNAs) induce sequence-specific gene silencing in mammalian cells and guide mRNA degradation in the process of RNA interference (RNAi). By targeting endogenous lamin A/C mRNA in human HeLa or mouse SW3T3 cells, we investigated the positional variation of siRNA-mediated gene silencing. We find cell-type-dependent global effects and cell-type-independent positional effects. HeLa cells were about 2-fold more responsive to siRNAs than SW3T3 cells but displayed a very similar pattern of positional variation of lamin A/C silencing. In HeLa cells, 26 of 44 tested standard 21-nucleotide (nt) siRNA duplexes reduced the protein expression by at least 90%, and only 2 duplexes reduced the lamin A/C proteins to <50%. Fluorescent chromophores did not perturb gene silencing when conjugated to the 5'-end or 3'-end of the sense siRNA strand and the 5'-end of the antisense siRNA strand, but conjugation to the 3'-end of the antisense siRNA abolished gene silencing. RNase-protecting phosphorothioate and 2'-fluoropyrimidine RNA backbone modifications of siRNAs did not significantly affect silencing efficiency, although cytotoxic effects were observed when every second phosphate of an siRNA duplex was replaced by phosphorothioate. Synthetic RNA hairpin loops were subsequently evaluated for lamin A/C silencing as a function of stem length and loop composition. As long as the 5'-end of the guide strand coincided with the 5'-end of the hairpin RNA, 19-29 base pair (bp) hairpins effectively silenced lamin A/C, but when the hairpin started with the 5'-end of the sense strand, only 21-29 bp hairpins were highly active.

Keywords: Adaptor Protein Complex alpha Subunits, Animal, Animals, Antisense, Apolipoproteins B, Base Sequence, Biological Transport, Blotting, Catalytic, Cell Line, Cell Membrane, Cell Survival, Chemical, Cholesterol, Clathrin, Clathrin Heavy Chains, Disease Models, Endocytosis, Epidermal Growth Factor, Fluorescence, Gene Expression Profiling, Gene Silencing, Gene Therapy, Hela Cells, Humans, Injections, Intravenous, Jejunum, Kinetics, Lamin Type A, Liver, Messenger, Metabolic Syndrome X, Mice, Microscopy, Models, Molecular Sequence Data, NIH 3T3 Cells, Non-U.S. Gov't, Nucleic Acid, Oligonucleotides, Open Reading Frames, Post-Transcriptional, Protein Isoforms, Pyrimidines, RNA, RNA Interference, RNA Processing, RNA Stability, Research Support, Reverse Transcriptase Polymerase Chain Reaction, Sensitivity and Specificity, Sequence Homology, Small Interfering, Subcellular Fractions, Swiss 3T3 Cells, Thionucleotides, Time Factors, Transfection, Transferrin, Transgenic, Tumor, Western, 12804036
[Ge2003Reducing] Xijin Ge, Shuichi Tsutsumi, Hiroyuki Aburatani, and Shuichi Iwata. Reducing false positives in molecular pattern recognition. Genome Inform Ser Workshop Genome Inform, 14:34-43, 2003. [ bib ]
In the search for new cancer subtypes by gene expression profiling, it is essential to avoid misclassifying samples of unknown subtypes as known ones. In this paper, we evaluated the false positive error rates of several classification algorithms through a 'null test' by presenting classifiers a large collection of independent samples that do not belong to any of the tumor types in the training dataset. The benchmark dataset is available at www2.genome.rcast.u-tokyo.ac.jp/pm/. We found that k-nearest neighbor (KNN) and support vector machine (SVM) have very high false positive error rates when fewer genes (<100) are used in prediction. The error rate can be partially reduced by including more genes. On the other hand, prototype matching (PM) method has a much lower false positive error rate. Such robustness can be achieved without loss of sensitivity by introducing suitable measures of prediction confidence. We also proposed a cluster-and-select technique to select genes for classification. The nonparametric Kruskal-Wallis H test is employed to select genes differentially expressed in multiple tumor types. To reduce the redundancy, we then divided these genes into clusters with similar expression patterns and selected a given number of genes from each cluster. The reliability of the new algorithm is tested on three public datasets.

Keywords: Amino Acid Sequence, Amino Acids, Animals, Automated, Base Sequence, Bayes Theorem, Biological, Carbohydrate Conformation, Carbohydrate Sequence, Cattle, Computational Biology, Computer Simulation, Crystallography, DNA, Databases, Factual, False Positive Reactions, Gene Expression Profiling, Genes, Genetic, Genetic Techniques, Genome, Histocompatibility Antigens Class I, Human, Humans, Introns, Least-Squares Analysis, MHC Class I, Major Histocompatibility Complex, Markov Chains, Messenger, Mice, Models, Monosaccharides, Neoplasms, Non-U.S. Gov't, Nonparametric, Pattern Recognition, Peptides, Phylogeny, Plants, Poly A, Polysaccharides, Predictive Value of Tests, Protein, Protein Structure, Proteins, RNA, Rats, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Secondary, Sequence Alignment, Software, Species Specificity, Statistics, Theoretical, X-Ray, 15706518
[Formosa2003Changing] T. Formosa. Changing the DNA landscape: putting a SPN on chromatin. Curr Top Microbiol Immunol, 274:171-201, 2003. [ bib ]
In eukaryotic cells, transcription and replication each occur on DNA templates that are incorporated into nucleosomes. Formation of chromatin generally limits accessibility of specific DNA sequences and inhibits progression of polymerases as they copy information from the DNA. The processes that select sites for initiating either transcription or replication are therefore strongly influenced by factors that modulate the properties of chromatin proteins. Further, in order to elongate their products, both DNA and RNA polymerases must be able to overcome the inhibition presented by chromatin (Lipford and Bell 2001; Workman and Kingston 1998). One way to adjust the properties of chromatin proteins is to covalently modify them by adding or removing chemical moieties. Both histone and non-histone chromatin proteins are altered by acetylation, methylation, and other changes, and the 'nucleosome modifying' complexes that perform these reactions are important components of pathways of transcriptional regulation (Cote 2002; Orphanides and Reinberg 2000; Roth et al. 2001; Strahl and Allis 2000; Workman and Kingston 1998). Another way to alter the effects of nucleosomes is to change the position of the histone octamers relative to specific DNA sequences (Orphanides and Reinberg 2000; Verrijzer 2002; Wang 2002; Workman and Kingston 1998). Since the ability of a sequence to be bound by specific proteins can vary significantly whether the sequence is in the linkers between nucleosomes or at various positions within a nucleosome, 'nucleosome remodeling' complexes that rearrange nucleosome positioning are also important regulators of transcription. Since the DNA replication machinery has to encounter many of the same challenges posed by chromatin, it seems likely that modifying and remodeling complexes also act during duplication of the genome, but most of the current information on these factors relates to regulation of transcription. 
This chapter describes the factor known variously as FACT in humans, where it promotes elongation of RNA polymerase II on nucleosomal templates in vitro (Orphanides et al. 1998, 1999), DUF in frogs, where it is needed for DNA replication in oocyte extracts (Okuhara et al. 1999), and CP or SPN in yeast, where it is linked in vivo to both transcription and replication (Brewster et al. 2001; Formosa et al. 2001). Like the nucleosome modifying and remodeling complexes, it is broadly conserved among eukaryotes, affects a wide range of processes that utilize chromatin, and directly alters the properties of nucleosomes. However, it does not have nucleosome modifying or standard ATP-dependent remodeling activity, and therefore represents a third class of chromatin modulating factors. It is also presently unique in the extensive connections it displays with both transcription and replication: FACT/DUF/CP/SPN appears to modify nucleosomes in a way that is directly important for the efficient functioning of both RNA polymerases and DNA polymerases. While less is known about the mechanisms it uses to promote its functions than for other factors that affect chromatin, it is clearly an essential part of the complex mixture of activities that modulate access to DNA within chromatin. Physical and genetic interactions suggest that FACT/DUF/CP/SPN affects multiple pathways within replication and transcription as a member of several distinct complexes. Some of the interactions are easy to assimilate into models for replication or transcription, such as direct binding to DNA polymerase alpha (Wittmeyer and Formosa 1997; Wittmeyer et al. 1999), association with nucleosome modifying complexes (John et al. 2000), and interaction with factors that participate in elongation of RNA Polymerase II (Gavin et al. 2002; Squazzo et al. 2002). Others are more surprising such as an association with the 19S complex that regulates the function of the 20S proteasome (Ferdous et al. 2001; Xu et al. 1995), and the indication that FACT/DUF/CP/SPN can act as a specificity factor for casein kinase II (Keller et al. 2001). This chapter reviews the varied approaches that have each revealed different aspects of the function of FACT/DUF/CP/SPN, and presents a picture of a factor that can both alter nucleosomes and orchestrate the assembly or activity of a broad range of complexes that act upon chromatin.

Keywords: Animals; Cell Cycle Proteins, metabolism; Chromatin, metabolism; DNA, metabolism; Eukaryotic Cells, metabolism; Gene Expression Regulation; Humans; Saccharomyces cerevisiae Proteins; Transcription Factors, metabolism; Transcription, Genetic; Transcriptional Elongation Factors
[Diekman2003Hybrid] Casey Diekman, Wei He, Nagabhushana Prabhu, and Harvey Cramer. Hybrid methods for automated diagnosis of breast tumors. Anal Quant Cytol Histol, 25(4):183-90, Aug 2003. [ bib ]
OBJECTIVE: To design and analyze a new family of hybrid methods for the diagnosis of breast tumors using fine needle aspirates. STUDY DESIGN: We present a radically new approach to the design of diagnosis systems. In the new approach, a nonlinear classifier with high sensitivity but low specificity is hybridized with a linear classifier having low sensitivity but high specificity. Data from the Wisconsin Breast Cancer Database are used to evaluate, computationally, the performance of the hybrid classifiers. RESULTS: The diagnosis scheme obtained by hybridizing the nonlinear classifier ellipsoidal multisurface method (EMSM) with the linear classifier proximal support vector machine (PSVM) was found to have a mean sensitivity of 97.36% and a mean specificity of 95.14% and was found to yield a 2.44% improvement in the reliability of positive diagnosis over that of EMSM at the expense of 0.4% degradation in the reliability of negative diagnosis, again compared to EMSM. At the 95% confidence level we can trust the hybrid method to be 96.19-98.53% correct in its malignant diagnosis of new tumors and 93.57-96.71% correct in its benign diagnosis. CONCLUSION: Hybrid diagnosis schemes represent a significant paradigm shift and provide a promising new technique to improve the specificity of nonlinear classifiers without seriously affecting the high sensitivity of nonlinear classifiers.

Keywords: Algorithms, Amino Acid Sequence, Amino Acids, Anion Exchange Resins, Antigen-Antibody Complex, Artificial Intelligence, Automated, Automatic Data Processing, Benchmarking, Biological, Biological Markers, Biopsy, Blood Cells, Blood Proteins, Breast Neoplasms, Cell Line, Cellular Structures, Chemical, Chromatography, Chromosome Aberrations, Cluster Analysis, Colonic Neoplasms, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, Computing Methodologies, DNA, Data Interpretation, Databases, Decision Making, Decision Trees, Diagnosis, Diffusion Magnetic Resonance Imaging, Disease, English Abstract, Epitopes, Expert Systems, Factual, Female, Fine-Needle, Fusion, Fuzzy Logic, Gene Expression Profiling, Gene Expression Regulation, Gene Targeting, Genetic, Genome, Histocompatibility Antigens Class I, Humans, Hydrogen Bonding, Hydrophobicity, Image Interpretation, Image Processing, In Vitro, Indicators and Reagents, Information Storage and Retrieval, Ion Exchange, Least-Squares Analysis, Leiomyosarcoma, Liver Cirrhosis, Lung Neoplasms, Magnetic Resonance Imaging, Male, Mass, Mathematical Computing, Matrix-Assisted Laser Desorption-Ionization, Models, Molecular, Molecular Sequence Data, Neoplasm Proteins, Neoplasms, Neoplastic, Nephroblastoma, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Nucleic Acid Conformation, Nucleic Acid Hybridization, Oligonucleotide Array Sequence Analysis, Oncogene Proteins, Ovarian Neoplasms, P.H.S., Pattern Recognition, Predictive Value of Tests, Proteome, Prostatic Neoplasms, Protein, Protein Binding, Protein Interaction Mapping, Protein Structure, Proteins, Quantitative Structure-Activity Relationship, RNA, ROC Curve, Reproducibility of Results, Research Support, Rhabdomyosarcoma, Secondary, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Severity of Illness Index, Software, Solubility, Spectrometry, Statistical, Structure-Activity Relationship, Subcellular Fractions, Subtraction Technique, T-Lymphocyte, Tissue Distribution, Transcription Factors, Transfer, Treatment Outcome, Tumor, Tumor Markers, U.S. Gov't, User-Computer Interface, 12961824
[Chenna2003Multiple] R. Chenna, H. Sugawara, T. Koike, R. Lopez, T. J. Gibson, D. G. Higgins, and J. D. Thompson. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res., 31(13):3497-3500, Jul 2003. [ bib ]
The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).

Keywords: Algorithms; Amino Acid Sequence; Internet; Nucleic Acids; Phylogeny; Sequence Alignment; Sequence Analysis; Sequence Analysis, Protein; Software
[Chan2003Detection] Ian Chan, William Wells, Robert V Mulkern, Steven Haker, Jianqing Zhang, Kelly H Zou, Stephan E Maier, and Clare M C Tempany. Detection of prostate cancer by integration of line-scan diffusion, T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statistical classifier. Med Phys, 30(9):2390-8, Sep 2003. [ bib | .pdf ]
A multichannel statistical classifier for detecting prostate cancer was developed and validated by combining information from three different magnetic resonance (MR) methodologies: T2-weighted, T2-mapping, and line scan diffusion imaging (LSDI). From these MR sequences, four different sets of image intensities were obtained: T2-weighted (T2W) from T2-weighted imaging, Apparent Diffusion Coefficient (ADC) from LSDI, and proton density (PD) and T2 (T2 Map) from T2-mapping imaging. Manually segmented tumor labels from a radiologist, which were validated by biopsy results, served as tumor "ground truth." Textural features were extracted from the images using co-occurrence matrix (CM) and discrete cosine transform (DCT). Anatomical location of voxels was described by a cylindrical coordinate system. A statistical jack-knife approach was used to evaluate our classifiers. Single-channel maximum likelihood (ML) classifiers were based on 1 of the 4 basic image intensities. Our multichannel classifiers: support vector machine (SVM) and Fisher linear discriminant (FLD), utilized five different sets of derived features. Each classifier generated a summary statistical map that indicated tumor likelihood in the peripheral zone (PZ) of the prostate gland. To assess classifier accuracy, the average areas under the receiver operator characteristic (ROC) curves over all subjects were compared. Our best FLD classifier achieved an average ROC area of 0.839(+/-0.064), and our best SVM classifier achieved an average ROC area of 0.761(+/-0.043). The T2W ML classifier, our best single-channel classifier, only achieved an average ROC area of 0.599(+/-0.146). Compared to the best single-channel ML classifier, our best multichannel FLD and SVM classifiers have statistically superior ROC performance (P=0.0003 and 0.0017, respectively) from pairwise two-sided t-test. By integrating the information from multiple images and capturing the textural and anatomical features in tumor areas, summary statistical maps can potentially aid in image-guided prostate biopsy and assist in guiding and controlling delivery of localized therapy under image guidance.

Keywords: Algorithms, Anion Exchange Resins, Antigen-Antibody Complex, Artificial Intelligence, Automated, Automatic Data Processing, Biological, Blood Cells, Chemical, Chromatography, Cluster Analysis, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, Data Interpretation, Databases, Decision Making, Decision Trees, Diffusion Magnetic Resonance Imaging, English Abstract, Epitopes, Expert Systems, Factual, Fuzzy Logic, Gene Expression Profiling, Gene Expression Regulation, Gene Targeting, Genome, Histocompatibility Antigens Class I, Humans, Image Interpretation, Image Processing, In Vitro, Indicators and Reagents, Information Storage and Retrieval, Ion Exchange, Least-Squares Analysis, Liver Cirrhosis, Magnetic Resonance Imaging, Male, Models, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Nucleic Acid Conformation, P.H.S., Pattern Recognition, Proteome, Prostatic Neoplasms, Protein, Protein Binding, Protein Interaction Mapping, Proteins, Quantitative Structure-Activity Relationship, RNA, ROC Curve, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Analysis, Severity of Illness Index, Statistical, Structure-Activity Relationship, Subtraction Technique, T-Lymphocyte, Transcription Factors, Transfer, Treatment Outcome, U.S. Gov't, User-Computer Interface, 14528961
[Buus2003Sensitive] S. Buus, S. L. Lauemøller, P. Worning, C. Kesmir, T. Frimurer, S. Corbet, A. Fomsgaard, J. Hilden, A. Holm, and S. Brunak. Sensitive quantitative predictions of peptide-MHC binding by a 'query by committee' artificial neural network approach. Tissue Antigens, 62(5):378-384, Nov 2003. [ bib ]
We have generated Artificial Neural Networks (ANN) capable of performing sensitive, quantitative predictions of peptide binding to the MHC class I molecule, HLA-A*0204. We have shown that such quantitative ANN are superior to conventional classification ANN, that have been trained to predict binding vs non-binding peptides. Furthermore, quantitative ANN allowed a straightforward application of a 'Query by Committee' (QBC) principle whereby particularly information-rich peptides could be identified and subsequently tested experimentally. Iterative training based on QBC-selected peptides considerably increased the sensitivity without compromising the efficiency of the prediction. This suggests a general, rational and unbiased approach to the development of high quality predictions of epitopes restricted to this and other HLA molecules. Due to their quantitative nature, such predictions will cover a wide range of MHC-binding affinities of immunological interest, and they can be readily integrated with predictions of other events involved in generating immunogenic epitopes. These predictions have the capacity to perform rapid proteome-wide searches for epitopes. Finally, it is an example of an iterative feedback loop whereby advanced, computational bioinformatics optimize experimental strategy, and vice versa.

Keywords: HLA-A Antigens; Humans; Neural Networks (Computer); Peptides; Protein Binding; Proteome; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S.
[Bhasin2003MHCBN] Manoj Bhasin, Harpreet Singh, and G. P S Raghava. MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics, 19(5):665-666, Mar 2003. [ bib ]
MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptide on query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED.

Keywords: Amino Acid Sequence; Binding Sites; Database Management Systems; Databases, Protein; Histocompatibility Antigens; Information Storage and Retrieval; Macromolecular Substances; Major Histocompatibility Complex; Molecular Sequence Data; Peptide Fragments; Peptides; Protein Binding; Protein Conformation; Sequence Alignment; Sequence Analysis, Protein; Structure-Activity Relationship; User-Computer Interface
[Bagirov2003New] A. M. Bagirov, B. Ferguson, S. Ivkovic, G. Saunders, and J. Yearwood. New algorithms for multi-class cancer diagnosis using tumor gene expression signatures. Bioinformatics, 19(14):1800-7, Sep 2003. [ bib | http | .pdf ]
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used, contains over 16 000 genes and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm perform slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

Keywords: Algorithms, Amino Acid Sequence, Anion Exchange Resins, Antigen-Antibody Complex, Artificial Intelligence, Automated, Automatic Data Processing, Biological, Blood Cells, Chemical, Chromatography, Cluster Analysis, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA, Data Interpretation, Databases, Decision Making, Decision Trees, Diffusion Magnetic Resonance Imaging, English Abstract, Epitopes, Expert Systems, Factual, Fuzzy Logic, Gene Expression Profiling, Gene Expression Regulation, Gene Targeting, Genetic, Genome, Histocompatibility Antigens Class I, Humans, Image Interpretation, Image Processing, In Vitro, Indicators and Reagents, Information Storage and Retrieval, Ion Exchange, Least-Squares Analysis, Liver Cirrhosis, Magnetic Resonance Imaging, Male, Models, Molecular Sequence Data, Neoplasms, Neoplastic, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Nucleic Acid Conformation, Oligonucleotide Array Sequence Analysis, P.H.S., Pattern Recognition, Proteome, Prostatic Neoplasms, Protein, Protein Binding, Protein Interaction Mapping, Proteins, Quantitative Structure-Activity Relationship, RNA, ROC Curve, Reproducibility of Results, Research Support, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Severity of Illness Index, Statistical, Structure-Activity Relationship, Subtraction Technique, T-Lymphocyte, Transcription Factors, Transfer, Treatment Outcome, Tumor Markers, U.S. Gov't, User-Computer Interface, 14512351
[Attwood2003PRINTS] T. K. Attwood, P. Bradley, D. R. Flower, A. Gaulton, N. Maudling, A. L. Mitchell, G. Moulton, A. Nordle, K. Paine, P. Taylor, A. Uddin, and C. Zygouri. PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res., 31(1):400-402, Jan 2003. [ bib ]
The PRINTS database houses a collection of protein fingerprints. These may be used to assign uncharacterised sequences to known families and hence to infer tentative functions. The September 2002 release (version 36.0) includes 1800 fingerprints, encoding approximately 11 000 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. In addition to its continued steady growth, we report here the development of an automatic supplement, prePRINTS, designed to increase the coverage of the resource and reduce some of the manual burdens inherent in its maintenance. The databases are accessible for interrogation and searching at http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/.

Keywords: Amino Acid Motifs; Animals; Automation; Conserved Sequence; Databases, Protein; Proteins; Software
[Arakawa2003Application] M. Arakawa, K. Hasegawa, and K. Funatsu. Application of the novel molecular alignment method using the Hopfield Neural Network to 3D-QSAR. J Chem Inf Comput Sci, 43(5):1396-1402, 2003. [ bib | DOI | http ]
Recently, we investigated and proposed the novel molecular alignment method with the Hopfield Neural Network (HNN). Molecules are represented by four kinds of chemical properties (hydrophobic group, hydrogen-bonding acceptor, hydrogen-bonding donor, and hydrogen-bonding donor/acceptor), and then those properties between two molecules correspond to each other using HNN. The 12 pairs of enzyme-inhibitors were used for validation, and our method could successfully reproduce the real molecular alignments obtained from X-ray crystallography. In this paper, we apply the molecular alignment method to three-dimensional quantitative structure-activity relationship (3D-QSAR) analysis. The two data sets (human epidermal growth factor receptor-2 inhibitors and cyclooxygenase-2 inhibitors) were investigated to validate our method. As a result, the robust and predictive 3D-QSAR models were successfully obtained in both data sets.

Keywords: Chemical, Cyclooxygenase 2, Cyclooxygenase 2 Inhibitors, Cyclooxygenase Inhibitors, Databases, Enzyme Inhibitors, Epidermal Growth Factor, Factual, Humans, Isoenzymes, Membrane Proteins, Models, Molecular, Neural Networks (Computer), Prostaglandin-Endoperoxide Synthases, Quantitative Structure-Activity Relationship, Receptor, 14502472
[Anderson2003new] D.C. Anderson, W. Li, D.G. Payan, and W.S. Noble. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J Proteome Res, 2(2):137-146, 2003. [ bib | .pdf ]
Shotgun tandem mass spectrometry-based peptide sequencing using programs such as SEQUEST allows high-throughput identification of peptides, which in turn allows the identification of corresponding proteins. We have applied a machine learning algorithm, called the support vector machine, to discriminate between correctly and incorrectly identified peptides using SEQUEST output. Each peptide was characterized by SEQUEST-calculated features such as delta Cn and Xcorr, measurements such as precursor ion current and mass, and additional calculated parameters such as the fraction of matched MS/MS peaks. The trained SVM classifier performed significantly better than previous cutoff-based methods at separating positive from negative peptides. Positive and negative peptides were more readily distinguished in training set data acquired on a QTOF, compared to an ion trap mass spectrometer. The use of 13 features, including four new parameters, significantly improved the separation between positive and negative peptides. Use of the support vector machine and these additional parameters resulted in a more accurate interpretation of peptide MS/MS spectra and is an important step toward automated interpretation of peptide tandem mass spectrometry data in proteomics.

Keywords: biosvm proteomics
[Bleicher2003Hit] K. H. Bleicher, H.-J. Böhm, K. Müller, and A. I. Alanine. Hit and lead generation: beyond high-throughput screening. Nat Rev Drug Discov, 2(5):369-378, May 2003. [ bib | DOI | http ]
The identification of small-molecule modulators of protein function, and the process of transforming these into high-content lead series, are key activities in modern drug discovery. The decisions taken during this process have far-reaching consequences for success later in lead optimization and even more crucially in clinical development. Recently, there has been an increased focus on these activities due to escalating downstream costs resulting from high clinical failure rates. In addition, the vast emerging opportunities from efforts in functional genomics and proteomics demands a departure from the linear process of identification, evaluation and refinement activities towards a more integrated parallel process. This calls for flexible, fast and cost-effective strategies to meet the demands of producing high-content lead series with improved prospects for clinical success.

Keywords: Amino Acid Motifs, Combinatorial Chemistry Techniques, Drug Design, Drug Evaluation, Genomics, Preclinical, Proteomics, 12750740
[Xia2004RNAi] H. Xia, Q. Mao, S. L. Eliason, S. Q. Harper, I. H. Martins, H. T. Orr, H. L. Paulson, L. Yang, R. M. Kotin, and B. L. Davidson. RNAi suppresses polyglutamine-induced neurodegeneration in a model of spinocerebellar ataxia. Nat. Med., 10(8):816-820, Aug 2004. [ bib | DOI | http ]
The dominant polyglutamine expansion diseases, which include spinocerebellar ataxia type 1 (SCA1) and Huntington disease, are progressive, untreatable, neurodegenerative disorders. In inducible mouse models of SCA1 and Huntington disease, repression of mutant allele expression improves disease phenotypes. Thus, therapies designed to inhibit expression of the mutant gene would be beneficial. Here we evaluate the ability of RNA interference (RNAi) to inhibit polyglutamine-induced neurodegeneration caused by mutant ataxin-1 in a mouse model of SCA1. Upon intracerebellar injection, recombinant adeno-associated virus (AAV) vectors expressing short hairpin RNAs profoundly improved motor coordination, restored cerebellar morphology and resolved characteristic ataxin-1 inclusions in Purkinje cells of SCA1 mice. Our data demonstrate in vivo the potential use of RNAi as therapy for dominant neurodegenerative disease.

Keywords: Adenoviridae, Animal, Animals, Blotting, Brain, Cells, Comparative Study, Cultured, Disease Models, Gene Expression, Genetic, Glutamine, Immunohistochemistry, Messenger, Mice, Nerve Degeneration, Nerve Tissue Proteins, Non-U.S. Gov't, Northern, Nuclear Proteins, P.H.S., Plasmids, Psychomotor Performance, Purkinje Cells, RNA, RNA Interference, Research Support, Reverse Transcriptase Polymerase Chain Reaction, Small Interfering, Spinocerebellar Ataxias, Transduction, Transgenic, U.S. Gov't, 15286770
[Wang2004Simple] Kai Wang, Ekachai Jenwitheesuk, Ram Samudrala, and John E Mittler. Simple linear model provides highly accurate genotypic predictions of HIV-1 drug resistance. Antivir Ther, 9(3):343-52, Jun 2004. [ bib ]
Drug resistance is a major obstacle to the successful treatment of HIV-1 infection. Genotypic assays are used widely to provide indirect evidence of drug resistance, but the performance of these assays has been mixed. We used standard stepwise linear regression to construct drug resistance models for seven protease inhibitors and 10 reverse transcriptase inhibitors using data obtained from the Stanford HIV drug resistance database. We evaluated these models by hold-one-out experiments and by tests on an independent dataset. Our linear model outperformed other publicly available genotypic interpretation algorithms, including decision tree, support vector machine and four rules-based algorithms (HIVdb, VGI, ANRS and Rega) under both tests. Interestingly, our model did well despite the absence of any terms for interactions between different residues in protease or reverse transcriptase. The resulting linear models are easy to understand and can potentially assist in choosing combination therapy regimens.

Keywords: Algorithms, Computational Biology, Databases, Drug Resistance, Forecasting, Genetic, Genotype, HIV Protease Inhibitors, HIV-1, Humans, Information Management, Information Storage and Retrieval, Kinetics, Linear Models, Microbial Sensitivity Tests, Models, Non-U.S. Gov't, P.H.S., Periodicals, Point Mutation, Pyrimidinones, Research Support, Reverse Transcriptase Inhibitors, Theoretical, U.S. Gov't, Viral, 15259897
[Vallabhaneni2004Motor] Anirudh Vallabhaneni and Bin He. Motor imagery task classification for brain computer interface applications using spatiotemporal principle component analysis. Neurol Res, 26(3):282-7, Apr 2004. [ bib | DOI | http ]
Classification of single-trial imagined left- and right-hand movements recorded through scalp EEG are explored in this study. Classical event-related desynchronization/synchronization (ERD/ERS) calculation approach was utilized to extract ERD features from the raw scalp EEG signal. Principle Component Analysis (PCA) was used for feature extraction and applied on spatial, as well as temporal dimensions in two consecutive steps. A Support Vector Machine (SVM) classifier using a linear decision function was used to classify each trial as either left or right. The present approach has yielded good classification results and promises to have potential for further refinement for increased accuracy as well as application in online brain computer interface (BCI).

Keywords: Amino Acids, Antibodies, Artificial Intelligence, Biological, Brain, Brain Mapping, Calibration, Comparative Study, Computational Biology, Cysteine, Cystine, Electrodes, Electroencephalography, Evoked Potentials, Female, Horseradish Peroxidase, Humans, Imagery (Psychotherapy), Imagination, Laterality, Male, Monoclonal, Movement, Neoplasms, Non-P.H.S., Non-U.S. Gov't, P.H.S., Perception, Principal Component Analysis, Protein, Protein Array Analysis, Proteins, Research Support, Sensitivity and Specificity, Sequence Analysis, Tumor Markers, U.S. Gov't, User-Computer Interface, 15142321
[Tzeng2004Predicting] Huey-Ming Tzeng, Jer-Guang Hsieh, and Yih-Lon Lin. Predicting nurses' intention to quit with a support vector machine: a new approach to set up an early warning mechanism in human resource management. Comput Inform Nurs, 22(4):232-42, 2004. [ bib ]
This project developed a Support Vector Machine for predicting nurses' intention to quit, using working motivation, job satisfaction, and stress levels as predictors. This study was conducted in three hospitals located in southern Taiwan. The target population was all nurses (389 valid cases). For cross-validation, we randomly split cases into four groups of approximately equal sizes, and performed four training runs. After the training, the average percentage of misclassification on the training data was 0.86, while that on the testing data was 10.8, resulting in predictions with 89.2% accuracy. This Support Vector Machine can predict nurses' intention to quit, without asking these nurses whether they have an intention to quit.

Keywords: Adolescent, Adult, Algorithms, Amino Acid Sequence, Amino Acids, Anatomic, Attitude of Health Personnel, Bacterial Proteins, Bias (Epidemiology), Brain, Brain Mapping, Burnout, Comparative Study, Computer Simulation, Computer-Assisted, Data Interpretation, Diffusion Magnetic Resonance Imaging, Facial Asymmetry, Facial Expression, Facial Paralysis, Female, Gene Expression Profiling, Gram-Negative Bacteria, Gram-Positive Bacteria, Hospital, Humans, Image Interpretation, Intention, Job Satisfaction, Logistic Models, Magnetoencephalography, Male, Middle Aged, Models, Motion, Neural Networks (Computer), Neural Pathways, Non-U.S. Gov't, Nonlinear Dynamics, Nursing Administration Research, Nursing Staff, Personnel Management, Personnel Turnover, Photography, Predictive Value of Tests, Professional, Protein, Proteins, Proteome, Psychological, Questionnaires, Regression Analysis, Reproducibility of Results, Research Support, Retina, Risk Factors, Sequence Alignment, Sequence Analysis, Severity of Illness Index, Software, Statistical, Subcellular Fractions, Taiwan, Theoretical, Workplace, 15494654
[Sun2004protein] Zhenghong Sun, Xiaoli Fu, Lu Zhang, Xiaoli Yang, Feizhou Liu, and Gengxi Hu. A protein chip system for parallel analysis of multi-tumor markers and its application in cancer detection. Anticancer Res, 24(2C):1159-65, 2004. [ bib ]
BACKGROUND: Tumor markers are routinely measured in clinical oncology. However, their value in cancer detection has been controversial largely because no single tumor marker is sensitive and specific enough to meet strict diagnostic criteria. One strategy to overcome the shortcomings of single tumor markers is to measure a combination of tumor markers to increase sensitivity and look for distinct patterns to increase specificity. This study aimed to develop a system for parallel detection of tumor markers as a tool for tumor detection in both cancer patients and asymptomatic populations at high risk. MATERIALS AND METHODS: A protein chip was fabricated with twelve monoclonal antibodies against the following tumor markers respectively: CA125, CA15-3, CA19-9, CA242, CEA, AFP, PSA, free-PSA, HGH, beta-HCG, NSE and ferritin. Tumor markers were captured after the protein chip was incubated with serum samples. A secondary antibody conjugated with HRP was used to detect the captured tumor markers using chemiluminescence technique. Quantification of the tumor markers was obtained after calibration with standard curves. RESULTS: The chip system showed an overall sensitivity of 68.18% after testing 1147 cancer patients, with high sensitivities for liver, pancreas and ovarian tumors and low sensitivities for gastrointestinal tumors, and a specificity of 97.1% after testing 793 healthy individuals. Application of the chip system in physical checkups of 15,867 individuals resulted in 16 cases that were subsequently confirmed as having cancers. Analysis of the detection results with a Support Vector Machine algorithm considerably increased the specificity of the system as reflected in healthy individuals and hepatitis/cirrhosis patients, but only modestly decreased the sensitivity for cancer patients. CONCLUSION: This protein chip system is a potential tool for assisting cancer diagnosis and for screening cancer in high-risk populations.

Keywords: Antibodies, Artificial Intelligence, Biological, Calibration, Female, Horseradish Peroxidase, Humans, Male, Monoclonal, Neoplasms, Protein Array Analysis, Sensitivity and Specificity, Tumor Markers, 15154641
[Stahura2004Virtual] Florence L Stahura and Jürgen Bajorath. Virtual screening methods that complement HTS. Comb Chem High Throughput Screen, 7(4):259-69, Jun 2004. [ bib ]
In this review, we discuss a number of computational methods that have been developed or adapted for molecule classification and virtual screening (VS) of compound databases. In particular, we focus on approaches that are complementary to high-throughput screening (HTS). The discussion is limited to VS methods that operate at the small molecular level, which is often called ligand-based VS (LBVS), and does not take into account docking algorithms or other structure-based screening tools. We describe areas that greatly benefit from combining virtual and biological screening and discuss computational methods that are most suitable to contribute to the integration of screening technologies. Relevant approaches range from established methods such as clustering or similarity searching to techniques that have only recently been introduced for LBVS applications such as statistical methods or support vector machines. Finally, we discuss a number of representative applications at the interface between VS and HTS.

Keywords: Algorithms, Animals, Antisense, Artificial Intelligence, Cell Line, Cluster Analysis, Comparative Study, Computational Biology, Computer Simulation, DNA Fingerprinting, Drug Evaluation, Fluorescence, Fuzzy Logic, Gene Silencing, Gene Targeting, Genetic, Hela Cells, Humans, Imaging, Intracellular Space, Microscopy, Models, Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotides, P.H.S., Preclinical, Prognosis, Proteomics, Quantitative Structure-Activity Relationship, RNA, RNA Interference, Research Support, Sensitivity and Specificity, Small Interfering, Thionucleotides, Three-Dimensional, Tumor, U.S. Gov't, 15200375
[Song2004Comparison] Xiaowei Song, Arnold Mitnitski, Jafna Cox, and Kenneth Rockwood. Comparison of machine learning techniques with classical statistical models in predicting health outcomes. Medinfo, 11(Pt 1):736-40, 2004. [ bib ]
Several machine learning techniques (multilayer and single layer perceptron, logistic regression, least square linear separation and support vector machines) are applied to calculate the risk of death from two biomedical data sets, one from patient care records, and another from a population survey. Each dataset contained multiple sources of information: history of related symptoms and other illnesses, physical examination findings, laboratory tests, medications (patient records dataset), health attitudes, and disabilities in activities of daily living (survey dataset). Each technique showed very good mortality prediction in the acute patients data sample (AUC up to 0.89) and fair prediction accuracy for six year mortality (AUC from 0.70 to 0.76) in individuals from epidemiological database surveys. The results suggest that the nature of data is of primary importance rather than the learning technique. However, the consistently superior performance of the artificial neural network (multi-layer perceptron) indicates that nonlinear relationships (which cannot be discerned by linear separation techniques) can provide additional improvement in correctly predicting health outcomes.

Keywords: Aged, Air, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Artificial Intelligence, Atrial, Automated, Canada, Carotid Stenosis, Cerebrovascular Accident, Cerebrovascular Circulation, Comparative Study, Computer-Assisted, Cysteine, Decision Trees, Dementia, Diagnosis, Disulfides, Doppler, Embolism, Expert Systems, Extramural, Factor Analysis, Female, Gene Expression, Gene Expression Profiling, Health Status, Heart Septal Defects, Humans, Intracranial Embolism, Male, Models, Molecular, Myocardial Infarction, N.I.H., Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Oxidation-Reduction, P.H.S., Pattern Recognition, Prognosis, Protein Binding, Protein Folding, Proteins, ROC Curve, Research Support, Sensitivity and Specificity, Software, Statistical, Transcranial, Treatment Outcome, U.S. Gov't, Ultrasonography, 15360910
[Smith2004Towards] P. A. Smith, M. J. Sorich, L. S C Low, R. A. McKinnon, and J. O. Miners. Towards integrated ADME prediction: past, present and future directions for modelling metabolism by UDP-glucuronosyltransferases. J Mol Graph Model, 22(6):507-17, Jul 2004. [ bib | DOI | http | .pdf ]
Undesirable absorption, distribution, metabolism, excretion (ADME) properties are the cause of many drug development failures and this has led to the need to identify such problems earlier in the development process. This review highlights computational (in silico) approaches that have been used to identify the characteristics of ligands influencing molecular recognition and/or metabolism by the drug-metabolising enzyme UDP-glucuronosyltransferase (UGT). Current studies applying pharmacophore elucidation, 2D-quantitative structure metabolism relationships (2D-QSMR), 3D-quantitative structure metabolism relationships (3D-QSMR), and non-linear pattern recognition techniques such as artificial neural networks and support vector machines for modelling metabolism by UGT are reported. An assessment of the utility of in silico approaches for the qualitative and quantitative prediction of drug glucuronidation parameters highlights the benefit of using multiple pharmacophores and also non-linear techniques for classification. Some of the challenges facing the development of generalisable models for predicting metabolism by UGT, including the need for screening of more diverse structures, are also outlined.

Keywords: Algorithms, Animals, Antisense, Artificial Intelligence, Astrocytoma, Automated, Autonomic Nervous System, Brain, Brain Neoplasms, Cell Line, Cerebral Cortex, Child, Cluster Analysis, Cognition, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA Fingerprinting, Databases, Diagnosis, Discriminant Analysis, Drug Design, Drug Evaluation, Electroencephalography, Emotions, Event-Related Potentials, Evoked Potentials, Factual, Fluorescence, Fuzzy Logic, Gene Silencing, Gene Targeting, Genetic, Glucuronosyltransferase, Hand, Hela Cells, Humans, Imaging, Intracellular Space, Magnetic Resonance Spectroscopy, Male, Meningeal Neoplasms, Meningioma, Microscopy, Models, Molecular Structure, Monitoring, Motor, Neoplasm Metastasis, Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotides, P.H.S., P300, Pattern Recognition, Peptides, Pharmaceutical Preparations, Physiologic, Preclinical, Predictive Value of Tests, Preschool, Prognosis, Protein Interaction Mapping, Protein Structure, Proteins, Proteomics, Quantitative Structure-Activity Relationship, Quaternary, RNA, RNA Interference, Recognition (Psychology), Reproducibility of Results, Research Support, Sensitivity and Specificity, Signal Processing, Small Interfering, Software, Thionucleotides, Three-Dimensional, Tumor, U.S. Gov't, User-Computer Interface, Word Processing, 15182810
[Shulman-Peleg2004Recognition] Alexandra Shulman-Peleg, Ruth Nussinov, and Haim J Wolfson. Recognition of functional sites in protein structures. J Mol Biol, 339(3):607-633, Jun 2004. [ bib | DOI | http ]
Recognition of regions on the surface of one protein, that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Base and make functional predictions.

Keywords: Algorithms; Catalytic Domain; Hydrogen Bonding; Models, Molecular; Protein Conformation; Proteins, chemistry
[Shoeb2004Patient-specific] Ali Shoeb, Herman Edwards, Jack Connolly, Blaise Bourgeois, S. Ted Treves, and John Guttag. Patient-specific seizure onset detection. Epilepsy Behav, 5(4):483-98, Aug 2004. [ bib | DOI | http | .pdf ]
This article presents an automated, patient-specific method for the detection of epileptic seizure onset from noninvasive electroencephalography. We adopt a patient-specific approach to exploit the consistency of an individual patient's seizure and nonseizure electroencephalograms. Our method uses a wavelet decomposition to construct a feature vector that captures the morphology and spatial distribution of an electroencephalographic epoch, and then determines whether that vector is representative of a patient's seizure or nonseizure electroencephalogram using the support vector machine classification algorithm. Our completely automated method was tested on noninvasive electroencephalograms from 36 pediatric subjects suffering from a variety of seizure types. It detected 131 of 139 seizure events within 8.0+/-3.2 seconds of electrographic onset, and declared 15 false detections in 60 hours of clinical electroencephalography. Our patient-specific method can be used to initiate delay-sensitive clinical procedures following seizure onset, for example, the injection of a functional imaging radiotracer.
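
Note: the detection figures quoted in this abstract can be restated as a sensitivity and a false-detection rate. The following is only a minimal sketch of that arithmetic; the function and argument names are hypothetical and do not come from the paper.

```python
def detection_summary(detected, total_events, false_detections, hours):
    """Return (sensitivity, false detections per hour).

    Hypothetical helper for restating the counts quoted in the abstract;
    it is not part of the authors' method.
    """
    sensitivity = detected / total_events
    false_per_hour = false_detections / hours
    return sensitivity, false_per_hour


# Counts taken from the abstract: 131 of 139 seizures detected,
# 15 false detections in 60 hours of recording.
sensitivity, false_per_hour = detection_summary(131, 139, 15, 60)
print(f"sensitivity: {sensitivity:.1%}")           # about 94.2%
print(f"false detections per hour: {false_per_hour}")  # 0.25
```

Under these counts, the method detects roughly 94% of seizure events at one false detection every four hours.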

Keywords: Algorithms, Comparative Study, Computational Biology, Computer-Assisted, Databases, Diagnosis, Drug Resistance, Electroencephalography, Epilepsy, Forecasting, Genetic, Genotype, HIV Protease Inhibitors, HIV-1, Humans, Information Management, Information Storage and Retrieval, Kinetics, Linear Models, Microbial Sensitivity Tests, Models, Monitoring, Non-U.S. Gov't, P.H.S., Periodicals, Physiologic, Point Mutation, Pyrimidinones, Reaction Time, Research Support, Reverse Transcriptase Inhibitors, Signal Processing, Theoretical, Time Factors, U.S. Gov't, Viral, 15256184
[Seeger2004Gaussian] Matthias Seeger. Gaussian processes for machine learning. Int J Neural Syst, 14(2):69-106, Apr 2004. [ bib ]
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countably or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and very many deep theoretical analyses of various properties are available. This paper gives an introduction to Gaussian processes on a fairly elementary level with special emphasis on characteristics relevant in machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to show up precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.

Keywords: Algorithms, Amino Acids, Antibodies, Artificial Intelligence, Astrocytoma, Automated, Bayes Theorem, Biological, Biopsy, Brain, Brain Mapping, Brain Neoplasms, Calibration, Comparative Study, Computational Biology, Computer-Assisted, Computing Methodologies, Cysteine, Cystine, Dysplastic Nevus Syndrome, Electrodes, Electroencephalography, Entropy, Eosine Yellowish-(YS), Evoked Potentials, Female, Gene Expression Profiling, Hematoxylin, Horseradish Peroxidase, Humans, Image Interpretation, Image Processing, Imagery (Psychotherapy), Imagination, Laterality, Linear Models, Male, Melanoma, Models, Monoclonal, Movement, Neoplasms, Neural Networks (Computer), Neuropeptides, Non-P.H.S., Non-U.S. Gov't, Nonparametric, Normal Distribution, P.H.S., Pattern Recognition, Perception, Principal Component Analysis, Protein, Protein Array Analysis, Protein Interaction Mapping, Proteins, Regression Analysis, Research Support, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Skin Neoplasms, Software, Statistical, Statistics, Tumor Markers, U.S. Gov't, User-Computer Interface, World Health Organization, 15112367
[Ross2004Multiplexed] Philip L Ross, Yulin N Huang, Jason N Marchese, Brian Williamson, Kenneth Parker, Stephen Hattan, Nikita Khainovski, Sasi Pillai, Subhakar Dey, Scott Daniels, Subhasish Purkayastha, Peter Juhasz, Stephen Martin, Michael Bartlet-Jones, Feng He, Allan Jacobson, and Darryl J Pappin. Multiplexed protein quantitation in saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics, 3(12):1154-1169, Dec 2004. [ bib | DOI | http ]
We describe here a multiplexed protein quantitation strategy that provides relative and absolute measurements of proteins in complex mixtures. At the core of this methodology is a multiplexed set of isobaric reagents that yield amine-derivatized peptides. The derivatized peptides are indistinguishable in MS, but exhibit intense low-mass MS/MS signature ions that support quantitation. In this study, we have examined the global protein expression of a wild-type yeast strain and the isogenic upf1Delta and xrn1Delta mutant strains that are defective in the nonsense-mediated mRNA decay and the general 5' to 3' decay pathways, respectively. We also demonstrate the use of 4-fold multiplexing to enable relative protein measurements simultaneously with determination of absolute levels of a target protein using synthetic isobaric peptide standards. We find that inactivation of Upf1p and Xrn1p causes common as well as unique effects on protein expression.

Keywords: Cations; Chromatography, Ion Exchange; Chromatography, Liquid; Down-Regulation; Exoribonucleases; Fungal Proteins; Indicators and Reagents; Ions; Mass Spectrometry; Models, Chemical; Peptides; Phenotype; Proteomics; RNA Helicases; RNA, Messenger; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins; Succinimides
[Qin2004[Automated] Dong mei Qin, Zhan yi Hu, and Yong heng Zhao. Automated classification of celestial spectra based on support vector machines. Guang Pu Xue Yu Guang Pu Fen Xi, 24(4):507-11, Apr 2004. [ bib ]
The main objective of an automatic recognition system of celestial objects via their spectra is to classify celestial spectra and estimate physical parameters automatically. This paper proposes a new automatic classification method based on support vector machines to separate non-active objects from active objects via their spectra. With low SNR and unknown red-shift value, it is difficult to extract true spectral lines, and as a result, active objects cannot be determined by finding strong spectral lines and the spectral classification between non-active and active objects becomes difficult. The proposed method in this paper combines the principal component analysis with support vector machines, and can automatically recognize the spectra of active objects with unknown red-shift values from non-active objects. It finds its applicability in the automatic processing of voluminous observed data from large sky surveys in astronomy.

Keywords: 80 and over, Adult, Aged, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Automated, Birefringence, Brain Chemistry, Brain Neoplasms, Comparative Study, Computer-Assisted, Cornea, Cross-Sectional Studies, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Face, Female, Genetic, Glaucoma, Humans, Intraocular Pressure, Lasers, Least-Squares Analysis, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Male, Middle Aged, Models, Molecular, Nerve Fibers, Non-U.S. Gov't, Numerical Analysis, Ophthalmological, Optic Nerve Diseases, Optical Coherence, P.H.S., Pattern Recognition, Photic Stimulation, Prospective Studies, Protein, ROC Curve, Regression Analysis, Research Support, Retinal Ganglion Cells, Sensitivity and Specificity, Sequence Analysis, Statistics, Tomography, U.S. Gov't, Visual Fields, beta-Lactamases, 15766170
[Prados2004Mining] J. Prados, A. Kalousis, J.C. Sanchez, L. Allard, O. Carrette, and M. Hilario. Mining mass spectra for diagnosis and biomarker discovery of cerebral accidents. Proteomics, 4(8):2320-2332, 2004. [ bib | DOI | http | .pdf ]
In this paper we try to identify potential biomarkers for early stroke diagnosis using surface-enhanced laser desorption/ionization mass spectrometry coupled with analysis tools from machine learning and data mining. Data consist of 42 specimen samples, i.e., mass spectra divided in two big categories, stroke and control specimens. Among the stroke specimens two further categories exist that correspond to ischemic and hemorrhagic stroke; in this paper we limit our data analysis to discriminating between control and stroke specimens. We performed two suites of experiments. In the first one we simply applied a number of different machine learning algorithms; in the second one we have chosen the best performing algorithm as it was determined from the first phase and coupled it with a number of different feature selection methods. The reason for this was 2-fold, first to establish whether feature selection can indeed improve performance, which in our case it did not seem to confirm, but more importantly to acquire a small list of potentially interesting biomarkers. Of the different methods explored the most promising one was support vector machines which gave us high levels of sensitivity and specificity. Finally, by analyzing the models constructed by support vector machines we produced a small set of 13 features that could be used as potential biomarkers, and which exhibited good performance both in terms of sensitivity, specificity and model stability.

Keywords: biosvm proteomics
[Pavlidis2004Support] Paul Pavlidis, Ilan Wapinski, and William Stafford Noble. Support vector machine classification on the web. Bioinformatics, 20(4):586-7, Mar 2004. [ bib | DOI | http | .pdf ]
The support vector machine (SVM) learning algorithm has been widely applied in bioinformatics. We have developed a simple web interface to our implementation of the SVM algorithm, called Gist. This interface allows novice or occasional users to apply a sophisticated machine learning algorithm easily to their data. More advanced users can download the software and source code for local installation. The availability of these tools will permit more widespread application of this powerful learning algorithm in bioinformatics.

Keywords: Adaptation, Algorithms, Ambergris, Amino Acid Sequence, Animals, Artifacts, Artificial Intelligence, Automated, Cadmium, Candida, Candida albicans, Capillary, Clinical, Cluster Analysis, Combinatorial Chemistry Techniques, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, Computing Methodologies, Databases, Decision Support Systems, Electrophoresis, Enzymes, Europe, Eye Enucleation, Humans, Image Interpretation, Image Processing, Information Storage and Retrieval, Internet, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Markov Chains, Melanoma, Models, Molecular, Molecular Conformation, Molecular Sequence Data, Molecular Structure, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Odors, P.H.S., Pattern Recognition, Perfume, Physiological, Predictive Value of Tests, Prognosis, Prospective Studies, Protein, Protein Structure, Proteins, Proteomics, Quantitative Structure-Activity Relationship, Rats, Reproducibility of Results, Research Support, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Secondary, Sensitivity and Specificity, Signal Processing, Single-Blind Method, Soft Tissue Neoplasms, Software, Statistical, U.S. Gov't, Uveal Neoplasms, Visual, 14990457
[Li2004Fusing] Shutao Li, James Tin-Yau Kwok, Ivor Wai-Hung Tsang, and Yaonan Wang. Fusing images with different focuses using support vector machines. IEEE Trans Neural Netw, 15(6):1555-61, Nov 2004. [ bib ]
Many vision-related processing tasks, such as edge detection, image segmentation and stereo matching, can be performed more easily when all objects in the scene are in good focus. However, in practice, this may not be always feasible as optical lenses, especially those with long focal lengths, only have a limited depth of field. One common approach to recover an everywhere-in-focus image is to use wavelet-based image fusion. First, several source images with different focuses of the same scene are taken and processed with the discrete wavelet transform (DWT). Among these wavelet decompositions, the wavelet coefficient with the largest magnitude is selected at each pixel location. Finally, the fused image can be recovered by performing the inverse DWT. In this paper, we improve this fusion procedure by applying the discrete wavelet frame transform (DWFT) and the support vector machines (SVM). Unlike DWT, DWFT yields a translation-invariant signal representation. Using features extracted from the DWFT coefficients, a SVM is trained to select the source image that has the best focus at each pixel location, and the corresponding DWFT coefficients are then incorporated into the composite wavelet representation. Experimental results show that the proposed method outperforms the traditional approach both visually and quantitatively.
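
Note: the conventional fusion rule described in this abstract (keep, at each pixel location, the wavelet coefficient with the largest magnitude) can be sketched as below. This is an illustrative sketch only: it assumes the coefficient arrays of the source images have already been computed by some wavelet transform and stacked along the first axis, and the function name is hypothetical. It shows the baseline rule, not the SVM-based selection the paper proposes.

```python
import numpy as np


def fuse_max_magnitude(coeff_stack):
    """Fuse coefficient arrays by the maximum-magnitude rule.

    coeff_stack: array-like of shape (n_sources, H, W), one coefficient
    array per source image (assumed precomputed; hypothetical input layout).
    Returns (fused, winner), where winner[i, j] is the index of the source
    whose coefficient was kept at location (i, j).
    """
    coeff_stack = np.asarray(coeff_stack, dtype=float)
    # Index of the source with the largest absolute coefficient per location.
    winner = np.argmax(np.abs(coeff_stack), axis=0)
    # Gather the winning (signed) coefficient at each location.
    fused = np.take_along_axis(coeff_stack, winner[None, ...], axis=0)[0]
    return fused, winner


# Two toy 2x2 coefficient arrays standing in for two source images.
a = [[1.0, -5.0], [0.2, 3.0]]
b = [[-2.0, 4.0], [0.1, -3.5]]
fused, winner = fuse_max_magnitude([a, b])
print(fused)
print(winner)
```

The fused coefficients would then be passed to the inverse transform to recover the image, as the abstract describes.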

Keywords: Algorithms, Amino Acid, Amino Acids, Artificial Intelligence, Ascomycota, Automated, Base Sequence, Chromosome Mapping, Codon, Colonic Neoplasms, Comparative Study, Computer Simulation, Computer-Assisted, Computing Methodologies, Crystallography, DNA, DNA Primers, Databases, Diagnostic Imaging, Enzymes, Fixation, Gene Expression Profiling, Genetic, Hordeum, Host-Parasite Relations, Humans, Image Enhancement, Image Interpretation, Informatics, Information Storage and Retrieval, Kinetics, Magnetic Resonance Spectroscopy, Models, Nanotechnology, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Ocular, Oligonucleotide Array Sequence Analysis, P.H.S., Pattern Recognition, Plant, Plants, Predictive Value of Tests, Protein, Protein Conformation, Research Support, Sample Size, Selection (Genetics), Sequence Alignment, Sequence Analysis, Sequence Homology, Signal Processing, Skin, Software, Statistical, Subtraction Technique, Theoretical, Thermodynamics, U.S. Gov't, Viral Proteins, X-Ray, 15565781
[Lal2004Support] Thomas Navin Lal, Michael Schröder, Thilo Hinterberger, Jason Weston, Martin Bogdan, Niels Birbaumer, and Bernhard Schölkopf. Support vector channel selection in BCI. IEEE Trans Biomed Eng, 51(6):1003-10, Jun 2004. [ bib ]
Designing a brain computer interface (BCI) system one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying electroencephalogram (EEG) signals we propose the usage of the state of the art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization which are based on the training of support vector machines (SVM). These algorithms can provide more accurate solutions than standard filter methods for feature selection. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore we show how time dependent task specific information can be visualized.

Keywords: Algorithms, Animals, Antisense, Artificial Intelligence, Automated, Autonomic Nervous System, Brain, Cell Line, Cerebral Cortex, Child, Cluster Analysis, Cognition, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA Fingerprinting, Databases, Drug Evaluation, Electroencephalography, Emotions, Event-Related Potentials, Evoked Potentials, Factual, Fluorescence, Fuzzy Logic, Gene Silencing, Gene Targeting, Genetic, Hand, Hela Cells, Humans, Imaging, Intracellular Space, Male, Microscopy, Models, Monitoring, Motor, Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotides, P.H.S., P300, Pattern Recognition, Peptides, Physiologic, Preclinical, Predictive Value of Tests, Preschool, Prognosis, Protein Interaction Mapping, Protein Structure, Proteins, Proteomics, Quantitative Structure-Activity Relationship, Quaternary, RNA, RNA Interference, Recognition (Psychology), Reproducibility of Results, Research Support, Sensitivity and Specificity, Signal Processing, Small Interfering, Software, Thionucleotides, Three-Dimensional, Tumor, U.S. Gov't, User-Computer Interface, Word Processing, 15188871
[Kharchenko2004Filling] P. Kharchenko, D. Vitkup, and G. M. Church. Filling gaps in a metabolic network using expression information. Bioinformatics, 20 Suppl 1:I178-I185, Aug 2004. [ bib | DOI | http ]
MOTIVATION: The metabolic models of both newly sequenced and well-studied organisms contain reactions for which the enzymes have not been identified yet. We present a computational approach for identifying genes encoding such missing metabolic enzymes in a partially reconstructed metabolic network. RESULTS: The metabolic expression placement (MEP) method relies on the coexpression properties of the metabolic network and is complementary to the sequence homology and genome context methods that are currently being used to identify missing metabolic genes. The MEP algorithm predicts over 20% of all known Saccharomyces cerevisiae metabolic enzyme-encoding genes within the top 50 out of 5594 candidates for their enzymatic function, and 70% of metabolic genes whose expression level has been significantly perturbed across the conditions of the expression dataset used. AVAILABILITY: Freely available (in Supplementary information). SUPPLEMENTARY INFORMATION: Available at the following URL http://arep.med.harvard.edu/kharchenko/mep/supplements.html

Keywords: Bacterial, Binding Sites, Biological, Comparative Study, DNA, Energy Metabolism, Enzyme Induction, Enzymes, Escherichia coli Proteins, Fungal, Gene Expression Regulation, Genes, Genetic, Genome, Models, Non-P.H.S., Non-U.S. Gov't, Phylogeny, Promoter Regions (Genetics), Protein, Research Support, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Sequence Analysis, Systems Biology, Transcription Factors, U.S. Gov't, 15262797
[Kaper2004BCI] Matthias Kaper, Peter Meinicke, Ulf Grossekathoefer, Thomas Lingner, and Helge Ritter. BCI Competition 2003-Data set IIb: support vector machines for the P300 speller paradigm. IEEE Trans Biomed Eng, 51(6):1073-6, Jun 2004. [ bib ]
We propose an approach to analyze data from the P300 speller paradigm using the machine-learning technique support vector machines. In a conservative classification scheme, we found the correct solution after five repetitions. While the classification within the competition is designed for offline analysis, our approach is also well-suited for a real-world online solution: It is fast, requires only 10 electrode positions and demands only a small amount of preprocessing.

Keywords: Algorithms, Animals, Antisense, Artificial Intelligence, Automated, Autonomic Nervous System, Brain, Cell Line, Child, Cluster Analysis, Cognition, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA Fingerprinting, Databases, Drug Evaluation, Electroencephalography, Emotions, Event-Related Potentials, Factual, Fluorescence, Fuzzy Logic, Gene Silencing, Gene Targeting, Genetic, Hela Cells, Humans, Imaging, Intracellular Space, Microscopy, Models, Monitoring, Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotides, P.H.S., P300, Pattern Recognition, Peptides, Physiologic, Preclinical, Predictive Value of Tests, Preschool, Prognosis, Protein Interaction Mapping, Protein Structure, Proteins, Proteomics, Quantitative Structure-Activity Relationship, Quaternary, RNA, RNA Interference, Recognition (Psychology), Reproducibility of Results, Research Support, Sensitivity and Specificity, Signal Processing, Small Interfering, Software, Thionucleotides, Three-Dimensional, Tumor, U.S. Gov't, User-Computer Interface, Word Processing, 15188881
[Hizukuri2004Extraction] Yoshiyuki Hizukuri, Yoshihiro Yamanishi, Kosuke Hashimoto, and Minoru Kanehisa. Extraction of species-specific glycan substructures. Genome Inform Ser Workshop Genome Inform, 15(1):69-81, 2004. [ bib | .html | .pdf ]
Glycans, which are carbohydrate sugar chains attached to some lipids or proteins, have a huge variety of structures and play a key role in cell communication, protein interaction and immunity. The availability of a number of glycan structures stored in the KEGG/GLYCAN database makes it possible for us to conduct a large-scale comparative research of glycans. In this paper, we present a novel approach to compare glycan structures and extract characteristic glycan substructures of certain organisms. In the algorithm we developed a new similarity measure of glycan structures taking into account several biological aspects of glycan synthesis and glycosyltransferases, and we confirmed the validity of our similarity measure by conducting experiments on its ability to classify glycans between organisms in the framework of a support vector machine. Finally, our method successfully extracted a set of candidates of substructures which are characteristic to human, rat, mouse, bovine, pig, chicken, yeast, wheat and sycamore, respectively. We confirmed that the characteristic substructures extracted by our method correspond to the substructures which are known as the species-specific sugar chain of gamma-glutamyltranspeptidases in the kidney.

Keywords: Amino Acid Sequence, Animals, Carbohydrate Conformation, Carbohydrate Sequence, Cattle, Computer Simulation, Databases, Genes, Histocompatibility Antigens Class I, Humans, Least-Squares Analysis, MHC Class I, Major Histocompatibility Complex, Mice, Monosaccharides, Non-U.S. Gov't, Peptides, Phylogeny, Plants, Polysaccharides, Protein, Rats, Research Support, Saccharomyces cerevisiae, Species Specificity, 15712111
[Graumann2004Applicability] Johannes Graumann, Leslie A Dunipace, Jae Hong Seol, W. Hayes McDonald, John R Yates, Barbara J Wold, and Raymond J Deshaies. Applicability of tandem affinity purification MudPIT to pathway proteomics in yeast. Mol Cell Proteomics, 3(3):226-37, Mar 2004. [ bib | DOI | http | .pdf ]
A combined multidimensional chromatography-mass spectrometry approach known as "MudPIT" enables rapid identification of proteins that interact with a tagged bait while bypassing some of the problems associated with analysis of polypeptides excised from SDS-polyacrylamide gels. However, the reproducibility, success rate, and applicability of MudPIT to the rapid characterization of dozens of proteins have not been reported. We show here that MudPIT reproducibly identified bona fide partners for budding yeast Gcn5p. Additionally, we successfully applied MudPIT to rapidly screen through a collection of tagged polypeptides to identify new protein interactions. Twenty-five proteins involved in transcription and progression through mitosis were modified with a new tandem affinity purification (TAP) tag. TAP-MudPIT analysis of 22 yeast strains that expressed these tagged proteins uncovered known or likely interacting partners for 21 of the baits, a figure that compares favorably with traditional approaches. The proteins identified here comprised 102 previously known and 279 potential physical interactions. Even for the intensively studied Swi2p/Snf2p, the catalytic subunit of the Swi/Snf chromatin remodeling complex, our analysis uncovered a new interacting protein, Rtt102p. Reciprocal tagging and TAP-MudPIT analysis of Rtt102p revealed subunits of both the Swi/Snf and RSC complexes, identifying Rtt102p as a common interactor with, and possible integral component of, these chromatin remodeling machines. Our experience indicates it is feasible for an investigator working with a single ion trap instrument in a conventional molecular/cellular biology laboratory to carry out proteomic characterization of a pathway, organelle, or process (i.e. "pathway proteomics") by systematic application of TAP-MudPIT.

Keywords: Affinity Labels, Comparative Study, Electrospray Ionization, Genetic, Mass, Mitosis, Non-P.H.S., Non-U.S. Gov't, P.H.S., Protein Interaction Mapping, Proteome, Proteomics, Research Support, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Signal Transduction, Spectrometry, Transcription, U.S. Gov't, 14660704
[Glotsos2004Computer-based] Dimitris Glotsos, Panagiota Spyridonos, Panagiotis Petalas, Dionisis Cavouras, Panagiota Ravazoula, Petroula-Arampatoni Dadioti, Ioanna Lekka, and George Nikiforidis. Computer-based malignancy grading of astrocytomas employing a support vector machine classifier, the WHO grading system and the regular hematoxylin-eosin diagnostic staining procedure. Anal Quant Cytol Histol, 26(2):77-83, Apr 2004. [ bib ]
OBJECTIVE: To investigate and develop an automated technique for astrocytoma malignancy grading compatible with the clinical routine. STUDY DESIGN: One hundred forty biopsies of astrocytomas were collected from 2 hospitals. The degree of tumor malignancy was defined as low or high according to the World Health Organization grading system. From each biopsy, images were digitized and segmented to isolate nuclei from background tissue. Morphologic and textural nuclear features were quantified to encode tumor malignancy. Each case was represented by a 40-dimensional feature vector. An exhaustive search procedure in feature space was utilized to determine the best feature combination that resulted in the smallest classification error. Low and high grade tumors were discriminated using support vector machines (SVMs). To evaluate the system performance, all available data were split randomly into training and test sets. RESULTS: The best vector combination consisted of 3 textural and 2 morphologic features. Low and high grade cases were discriminated with an accuracy of 90.7% and 88.9%, respectively, using an SVM classifier with polynomial kernel of degree 2. CONCLUSION: The proposed methodology was based on standards that are common in daily clinical practice and might be used in parallel with conventional grading as a second-opinion tool to reduce subjectivity in the classification of astrocytomas.

Keywords: Amino Acids, Antibodies, Artificial Intelligence, Astrocytoma, Biological, Biopsy, Brain, Brain Mapping, Brain Neoplasms, Calibration, Comparative Study, Computational Biology, Computer-Assisted, Cysteine, Cystine, Electrodes, Electroencephalography, Eosine Yellowish-(YS), Evoked Potentials, Female, Hematoxylin, Horseradish Peroxidase, Humans, Image Processing, Imagery (Psychotherapy), Imagination, Laterality, Male, Monoclonal, Movement, Neoplasms, Non-P.H.S., Non-U.S. Gov't, P.H.S., Perception, Principal Component Analysis, Protein, Protein Array Analysis, Proteins, Research Support, Sensitivity and Specificity, Sequence Analysis, Software, Tumor Markers, U.S. Gov't, User-Computer Interface, World Health Organization, 15131894
[Glotsos2004Automated] Dimitris Glotsos, Panagiota Spyridonos, Dionisis Cavouras, Panagiota Ravazoula, Petroula-Arampantoni Dadioti, and George Nikiforidis. Automated segmentation of routinely hematoxylin-eosin-stained microscopic images by combining support vector machine clustering and active contour models. Anal Quant Cytol Histol, 26(6):331-40, Dec 2004. [ bib ]
OBJECTIVE: To develop a method for the automated segmentation of images of routinely hematoxylin-eosin (H-E)-stained microscopic sections to guarantee correct results in computer-assisted microscopy. STUDY DESIGN: Clinical material was composed of 50 H-E-stained biopsies of astrocytomas and 50 H-E-stained biopsies of urinary bladder cancer. The basic idea was to use a support vector machine clustering (SVMC) algorithm to provide gross segmentation of regions holding nuclei and subsequently to refine nuclear boundary detection with active contours. The initialization coordinates of the active contour model were defined using a SVMC pixel-based classification algorithm that discriminated nuclear regions from the surrounding tissue. Starting from the boundaries of these regions, the snake fired and propagated until converging to nuclear boundaries. RESULTS: The method was validated for 2 different types of H-E-stained images. Results were evaluated by 2 histopathologists. On average, 94% of nuclei were correctly delineated. CONCLUSION: The proposed algorithm could be of value in computer-based systems for automated interpretation of microscopic images.

Keywords: Adenosinetriphosphatase, Adolescent, Adult, Algorithms, Amino Acid Sequence, Amino Acids, Animals, Astrocytoma, Automated, Automation, Base Sequence, Bayes Theorem, Biological, Biopsy, Bladder Neoplasms, Breast Neoplasms, Carbohydrate Conformation, Carbohydrate Sequence, Cattle, Cell Cycle Proteins, Cell Nucleus, Computational Biology, Computer Simulation, Computer-Assisted, Crystallography, DNA, Databases, Diagnosis, Differential, Eosine Yellowish-(YS), Exoribonucleases, Factual, False Negative Reactions, False Positive Reactions, Female, Gene Expression, Gene Expression Profiling, Genes, Genetic, Genetic Techniques, Genetic Vectors, Genome, Hematoxylin, Histocompatibility Antigens Class I, Human, Humans, Image Interpretation, Image Processing, Introns, Least-Squares Analysis, MHC Class I, Major Histocompatibility Complex, Markov Chains, Messenger, Mice, Middle Aged, Models, Molecular Structure, Monosaccharides, Multigene Family, Mutation, Neoplasms, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonparametric, Nucleotidyltransferases, Observer Variation, Oligonucleotide Array Sequence Analysis, P.H.S., Pattern Recognition, Peptides, Phenotype, Phylogeny, Plants, Poly A, Polysaccharides, Predictive Value of Tests, Protein, Protein Biosynthesis, Protein Kinase Inhibitors, Protein Structure, Proteins, RNA, RNA Helicases, RNA Splicing, Rats, Reproducibility of Results, Research Support, Retrospective Studies, Saccharomyces cerevisiae, Saccharomyces cerevisiae Proteins, Secondary, Sensitivity and Specificity, Sequence Alignment, Software, Species Specificity, Staining and Labeling, Statistics, Theoretical, Transcription, U.S. Gov't, Ultrasonography, X-Ray, 15678615
[Faugeras2004Variational] Olivier Faugeras, Geoffray Adde, Guillaume Charpiat, Christophe Chefd'hotel, Maureen Clerc, Thomas Deneux, Rachid Deriche, Gerardo Hermosillo, Renaud Keriven, Pierre Kornprobst, Jan Kybic, Christophe Lenglet, Lucero Lopez-Perez, Théo Papadopoulo, Jean-Philippe Pons, Florent Segonne, Bertrand Thirion, David Tschumperlé, Thierry Viéville, and Nicolas Wotawa. Variational, geometric, and statistical methods for modeling brain anatomy and function. Neuroimage, 23 Suppl 1:S46-55, 2004. [ bib | DOI | http | .pdf ]
We survey the recent activities of the Odyssée Laboratory in the area of the application of mathematics to the design of models for studying brain anatomy and function. We start with the problem of reconstructing sources in MEG and EEG, and discuss the variational approach we have developed for solving these inverse problems. This motivates the need for geometric models of the head. We present a method for automatically and accurately extracting surface meshes of several tissues of the head from anatomical magnetic resonance (MR) images. Anatomical connectivity can be extracted from diffusion tensor magnetic resonance images but, in the current state of the technology, it must be preceded by a robust estimation and regularization stage. We discuss our work based on variational principles and show how the results can be used to track fibers in the white matter (WM) as geodesics in some Riemannian space. We then go to the statistical modeling of functional magnetic resonance imaging (fMRI) signals from the viewpoint of their decomposition in a pseudo-deterministic and stochastic part that we then use to perform clustering of voxels in a way that is inspired by the theory of support vector machines and in a way that is grounded in information theory. Multimodal image matching is discussed next in the framework of image statistics and partial differential equations (PDEs) with an eye on registering fMRI to the anatomy. The paper ends with a discussion of a new theory of random shapes that may prove useful in building anatomical and functional atlases.

Keywords: Adolescent, Adult, Algorithms, Anatomic, Bacterial Proteins, Brain, Brain Mapping, Comparative Study, Computer Simulation, Computer-Assisted, Diffusion Magnetic Resonance Imaging, Facial Asymmetry, Facial Expression, Facial Paralysis, Female, Gene Expression Profiling, Gram-Negative Bacteria, Gram-Positive Bacteria, Humans, Image Interpretation, Magnetoencephalography, Male, Middle Aged, Models, Motion, Neural Pathways, Non-U.S. Gov't, Photography, Protein, Proteome, Research Support, Retina, Sequence Alignment, Sequence Analysis, Severity of Illness Index, Software, Statistical, Subcellular Fractions, 15501100
[Doytchinova2004Identifying] Irini A Doytchinova, Pingping Guan, and Darren R Flower. Identifying human MHC supertypes using bioinformatic methods. J. Immunol., 172(7):4314-4323, Apr 2004. [ bib ]
Classification of MHC molecules into supertypes in terms of peptide-binding specificities is an important issue, with direct implications for the development of epitope-based vaccines with wide population coverage. In view of extremely high MHC polymorphism (948 class I and 633 class II HLA alleles) the experimental solution of this task is presently impossible. In this study, we describe a bioinformatics strategy for classifying MHC molecules into supertypes using information drawn solely from three-dimensional protein structure. Two chemometric techniques, hierarchical clustering and principal component analysis, were used independently on a set of 783 HLA class I molecules to identify supertypes based on structural similarities and molecular interaction fields calculated for the peptide binding site. Eight supertypes were defined: A2, A3, A24, B7, B27, B44, C1, and C4. The two techniques gave 77% consensus, i.e., 605 HLA class I alleles were classified in the same supertype by both methods. The proposed strategy allowed "supertype fingerprints" to be identified. Thus, the A2 supertype fingerprint is Tyr(9)/Phe(9), Arg(97), and His(114) or Tyr(116); the A3: Tyr(9)/Phe(9)/Ser(9), Ile(97)/Met(97) and Glu(114) or Asp(116); the A24: Ser(9) and Met(97); the B7: Asn(63) and Leu(81); the B27: Glu(63) and Leu(81); for B44: Ala(81); the C1: Ser(77); and the C4: Asn(77).

Keywords: Alleles; Amino Acid Motifs; Binding Sites; Computational Biology; DNA Fingerprinting; HLA Antigens; HLA-A Antigens; HLA-B Antigens; HLA-C Antigens; Histocompatibility Antigens Class I; Histocompatibility Testing; Humans; Multigene Family; Protein Interaction Mapping
[Darbellay2004Solid] Georges A Darbellay, Rebecca Duff, Jean-Marc Vesin, Paul-André Despland, Dirk W Droste, Carlos Molina, Joachim Serena, Roman Sztajzel, Patrick Ruchat, Theodoros Karapanayiotides, Afksendyios Kalangos, Julien Bogousslavsky, Erich B Ringelstein, and Gérald Devuyst. Solid or gaseous circulating brain emboli: are they separable by transcranial ultrasound? J Cereb Blood Flow Metab, 24(8):860-8, Aug 2004. [ bib ]
High-intensity transient signals (HITS) detected by transcranial Doppler (TCD) ultrasound may correspond to artifacts or to microembolic signals, the latter being either solid or gaseous emboli. The goal of this study was to assess what can be achieved with an automatic signal processing system for artifact/microembolic signals and solid/gas differentiation in different clinical situations. The authors studied 3,428 HITS in vivo in a multicenter study, i.e., 1,608 artifacts in healthy subjects, 649 solid emboli in stroke patients with a carotid stenosis, and 1,171 gaseous emboli in stroke patients with patent foramen ovale. They worked with the dual-gate TCD combined with three types of statistical classifiers: binary decision trees (BDT), artificial neural networks (ANN), and support vector machines (SVM). The sensitivity and specificity of BDT in separating artifacts from microembolic signals reached 94% and 97%, respectively. For the discrimination between solid and gaseous emboli, the classifier achieved a sensitivity and specificity of 81% and 81% for BDT, 84% and 84% for ANN, and 86% and 86% for SVM, respectively. The current results for artifact elimination and solid/gas differentiation are already useful to extract data for future prospective clinical studies.

Keywords: Air, Algorithms, Amino Acids, Animals, Artifacts, Atrial, Carotid Stenosis, Cerebrovascular Accident, Cerebrovascular Circulation, Comparative Study, Cysteine, Decision Trees, Disulfides, Doppler, Embolism, Heart Septal Defects, Humans, Intracranial Embolism, Models, Molecular, Neural Networks (Computer), Non-U.S. Gov't, Oxidation-Reduction, Protein Binding, Protein Folding, Proteins, Research Support, Sensitivity and Specificity, Transcranial, Ultrasonography, 15362716
[Cohen2004application] Gilles Cohen, Mélanie Hilario, Hugo Sax, Stéphane Hugonnet, Christian Pellegrini, and Antoine Geissbuhler. An application of one-class support vector machine to nosocomial infection detection. Medinfo, 11(Pt 1):716-20, 2004. [ bib ]
Nosocomial infections (NIs), those acquired in health care settings, are among the major causes of increased mortality among hospitalized patients. They are a significant burden for patients and health authorities alike; it is thus important to monitor and detect them through an effective surveillance system. This paper describes a retrospective analysis of a prevalence survey of NIs done in the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey. In this two-class classification task, the main difficulty lies in the significant imbalance between positive or infected (11%) and negative (89%) cases. To cope with class imbalance, we investigate one-class SVMs which can be trained to distinguish two classes on the basis of examples from a single class (in this case, only "normal" or non-infected patients). The infected ones are then identified as "abnormal" cases or outliers that deviate significantly from the normal profile. Experimental results are encouraging: whereas standard 2-class SVMs scored a baseline sensitivity of 50.6% on this problem, the one-class approach increased sensitivity to as much as 92.6%. These results are comparable to those obtained by the authors in a previous study on asymmetrical soft margin SVMs; they suggest that one-class SVMs can provide an effective and efficient way of overcoming data imbalance in classification problems.

Keywords: Aged, Air, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Artificial Intelligence, Atrial, Automated, Canada, Carotid Stenosis, Cerebrovascular Accident, Cerebrovascular Circulation, Comparative Study, Computer-Assisted, Cross Infection, Cysteine, Data Collection, Decision Trees, Dementia, Diagnosis, Disulfides, Doppler, Embolism, Expert Systems, Extramural, Factor Analysis, Female, Gene Expression, Gene Expression Profiling, Health Status, Heart Septal Defects, Hospitals, Humans, Infection Control, Intracranial Embolism, Male, Models, Molecular, Myocardial Infarction, N.I.H., Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Oxidation-Reduction, P.H.S., Pattern Recognition, Population Surveillance, Prevalence, Prognosis, Protein Binding, Protein Folding, Proteins, ROC Curve, Research Support, Retrospective Studies, Sensitivity and Specificity, Software, Statistical, Switzerland, Transcranial, Treatment Outcome, U.S. Gov't, Ultrasonography, University, 15360906
[Causier2004Studying] Barry Causier. Studying the interactome with the yeast two-hybrid system and mass spectrometry. Mass Spectrom Rev, 23(5):350-367, 2004. [ bib | DOI | http ]
Protein interactions are crucial to the life of a cell. The analysis of such interactions is allowing biologists to determine the function of uncharacterized proteins and the genes that encode them. The yeast two-hybrid system has become one of the most popular and powerful tools to study protein-protein interactions. With the advent of proteomics, the two-hybrid system has found a niche in interactome mapping. However, it is clear that only by combining two-hybrid data with that from complementary approaches such as mass spectrometry (MS) can the interactome be analyzed in full. This review introduces the yeast two-hybrid system to those unfamiliar with the technique, and discusses how it can be used in combination with MS to unravel the network of protein interactions that occur in a cell.

Keywords: Genes, Fungal; Genome; Mass Spectrometry; Proteins; Proteomics; Yeasts
[Bowd2004Confocal] Christopher Bowd, Linda M Zangwill, Felipe A Medeiros, Jiucang Hao, Kwokleung Chan, Te-Won Lee, Terrence J Sejnowski, Michael H Goldbaum, Pamela A Sample, Jonathan G Crowston, and Robert N Weinreb. Confocal scanning laser ophthalmoscopy classifiers and stereophotograph evaluation for prediction of visual field abnormalities in glaucoma-suspect eyes. Invest Ophthalmol Vis Sci, 45(7):2255-62, Jul 2004. [ bib | DOI | http | .pdf ]
PURPOSE: To determine whether Heidelberg Retina Tomograph (HRT; Heidelberg Engineering, Dossenheim, Germany) classification techniques and investigational support vector machine (SVM) analyses can detect optic disc abnormalities in glaucoma-suspect eyes before the development of visual field abnormalities. METHODS: Glaucoma-suspect eyes (n = 226) were classified as converts or nonconverts based on the development of repeatable (either two or three consecutive) standard automated perimetry (SAP)-detected abnormalities over the course of the study (mean follow-up, approximately 4.5 years). Hazard ratios for development of SAP abnormalities were calculated based on baseline classification results, follow-up time, and end point status (convert, nonconvert). Classification techniques applied were HRT classification (HRTC), Moorfields Regression Analysis, forward-selection optimized SVM (SVM fwd) and backward elimination-optimized SVM (SVM back) analysis of HRT data, and stereophotograph assessment. RESULTS: Univariate analyses indicated that all classification techniques were predictors of the development of two repeatable abnormal SAP results, with hazard ratios (95% confidence interval [CI]) ranging from 1.32 (1.00-1.75) for HRTC to 2.0 (1.48-2.76) for stereophotograph assessment (all P ≤ 0.05). Only SVM (SVM fwd and SVM back) analysis of HRT data and stereophotograph assessment were univariate predictors of the development of three repeatable abnormal SAP results, with hazard ratios (95% CI) ranging from 1.73 (1.16-2.82) for SVM fwd to 1.82 (1.19-3.12) for SVM back (both P < 0.007).
Multivariate analyses including each classification technique individually in a model with age, baseline SAP pattern standard deviation [PSD], and baseline IOP indicated that all classification techniques except HRTC (P = 0.06) were predictors of the development of two repeatable abnormal SAP results, with hazard ratios ranging from 1.30 (0.99, 1.73) for HRTC to 1.90 (1.37, 2.69) for stereophotograph assessment. Only SVM (SVM fwd and SVM back) analysis of HRT data and stereophotograph assessment were significant predictors of the development of three repeatable abnormal SAP results in multivariate analyses; hazard ratios of 1.57 (1.03, 2.59) and 1.70 (1.18, 2.51), respectively. SAP PSD was a significant predictor of two repeatable abnormal SAP results in multivariate models with all classification techniques, with hazard ratios ranging from 3.31 (1.39, 7.89) to 4.70 (2.02, 10.93) per 1-dB increase. CONCLUSIONS: HRT classification techniques and stereophotograph assessment can detect optic disc topography abnormalities in glaucoma-suspect eyes before the development of SAP abnormalities. These data strongly support the importance of optic disc examination for early glaucoma diagnosis.

Keywords: 80 and over, Adolescent, Adult, Aged, Algorithms, Artificial Intelligence, Auditory, Benchmarking, Binding Sites, Brain Stem, Breast Diseases, Chemical, Child, Chromosomes, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, Data Interpretation, Databases, Diagnosis, Diagnostic Errors, Differential, Drug Resistance, Electroencephalography, Epilepsy, Evoked Potentials, Female, Forecasting, Gene Expression, Gene Expression Profiling, Genetic, Genotype, Glaucoma, Greece, HIV Protease Inhibitors, HIV-1, Human, Humans, Infant, Information Management, Information Storage and Retrieval, Intraocular Pressure, Kinetics, Language Development Disorders, Lasers, Least-Squares Analysis, Linear Models, Male, Microbial Sensitivity Tests, Middle Aged, Models, Molecular, Monitoring, Nephroblastoma, Non-U.S. Gov't, Nonlinear Dynamics, Ocular Hypertension, Oligonucleotide Array Sequence Analysis, Ophthalmoscopy, Optic Disk, Optic Nerve Diseases, P.H.S., Pair 1, Perimetry, Periodicals, Phosphorylation, Phosphotransferases, Photography, Physiologic, Point Mutation, Preschool, Prognosis, Protein, Proteins, Pyrimidinones, Reaction Time, Recurrence, Reproducibility of Results, Research Support, Reverse Transcriptase Inhibitors, Sensitivity and Specificity, Sequence Alignment, Sequence Analysis, Signal Processing, Software, Sound Localization, Statistical, Stochastic Processes, Structure-Activity Relationship, Theoretical, Time Factors, U.S. Gov't, Viral, Vision Disorders, Visual Fields, 15223803
[Bern2004Automatic] M. Bern, D. Goldberg, W. H. McDonald, and J. R. Yates III. Automatic Quality Assessment of Peptide Tandem Mass Spectra. Bioinformatics, 20(Suppl. 1):i49-i54, 2004. [ bib | http | .pdf ]
Motivation: A powerful proteomics methodology couples high-performance liquid chromatography (HPLC) with tandem mass spectrometry and database-search software, such as SEQUEST. Such a set-up, however, produces a large number of spectra, many of which are of too poor quality to be useful. Hence a filter that eliminates poor spectra before the database search can significantly improve throughput and robustness. Moreover, spectra judged to be of high quality, but that cannot be identified by database search, are prime candidates for still more computationally intensive methods, such as de novo sequencing or wider database searches including post-translational modifications. Results: We report on two different approaches to assessing spectral quality prior to identification: binary classification, which predicts whether or not SEQUEST will be able to make an identification, and statistical regression, which predicts a more universal quality metric involving the number of b- and y-ion peaks. The best of our binary classifiers can eliminate over 75% [...] while losing only 10% [...] regression can pick out spectra of modified peptides that can be identified by a de novo program but not by SEQUEST. In a section of independent interest, we discuss intensity normalization of mass spectra.

Keywords: biosvm proteomics
[Baumgartner2004Supervised] C. Baumgartner, C. Bohm, D. Baumgartner, G. Marini, K. Weinberger, B. Olgemoller, B. Liebl, and A. A. Roscher. Supervised machine learning techniques for the classification of metabolic disorders in newborns. Bioinformatics, 20(17):2985-2996, 2004. [ bib | DOI | http | .pdf ]
Motivation: During the Bavarian newborn screening programme all newborns have been tested for about 20 inherited metabolic disorders. Owing to the amount and complexity of the generated experimental data, machine learning techniques provide a promising approach to investigate novel patterns in high-dimensional metabolic data which form the source for constructing classification rules with high discriminatory power. Results: Six machine learning techniques have been investigated for their classification accuracy focusing on two metabolic disorders, phenylketonuria (PKU) and medium-chain acyl-CoA dehydrogenase deficiency (MCADD). Logistic regression analysis led to superior classification rules (sensitivity >96.8% [...] to all investigated algorithms. Including novel constellations of metabolites into the models, the positive predictive value could be strongly increased (PKU 71.9% [...] 54.6% [...] clearly prove that the mined data confirm the known and indicate some novel metabolic patterns which may contribute to a better understanding of newborn metabolism. Availability: WEKA machine learning package: www.cs.waikato.ac.nz/ml/weka and statistical software package ADE-4: http://pbil.univ-lyon1.fr/ADE-4

Keywords: biosvm proteomics
[Perola2004Conformational] Emanuele Perola and Paul S Charifson. Conformational analysis of drug-like molecules bound to proteins: an extensive study of ligand reorganization upon binding. J. Med. Chem., 47(10):2499-2510, May 2004. [ bib | DOI | http ]
This paper describes a large-scale study on the nature and the energetics of the conformational changes drug-like molecules experience upon binding. Ligand strain energies and conformational reorganization were analyzed with different computational methods on 150 crystal structures of pharmaceutically relevant protein-ligand complexes. The common knowledge that ligands rarely bind in their lowest calculated energy conformation was confirmed. Additionally, we found that over 60% of the ligands do not bind in a local minimum conformation. While approximately 60% of the ligands were calculated to bind with strain energies lower than 5 kcal/mol, strain energies over 9 kcal/mol were calculated in at least 10% of the cases regardless of the method used. A clear correlation was found between acceptable strain energy and ligand flexibility, while there was no correlation between strain energy and binding affinity, thus indicating that expensive conformational rearrangements can be tolerated in some cases without overly penalizing the tightness of binding. On the basis of the trends observed, thresholds for the acceptable strain energies of bioactive conformations were defined with consideration of the impact of ligand flexibility. An analysis of the degree of folding of the bound ligands confirmed the general tendency of small molecules to bind in an extended conformation. The results suggest that the unfolding of hydrophobic ligands during binding, which exposes hydrophobic surfaces to contact with protein residues, could be one of the factors accounting for high reorganization energies. Finally, different methods for conformational analysis were evaluated, and guidelines were defined to maximize the prevalence of bioactive conformations in computationally generated ensembles.

Keywords: Drug Design; Endopeptidases; Ligands; Molecular Conformation; Pharmaceutical Preparations; Phosphotransferases; Protein Binding; Protein Folding; Proteins; Thermodynamics
[Mestres2004Computational] Jordi Mestres. Computational chemogenomics approaches to systematic knowledge-based drug discovery. Curr Opin Drug Discov Devel, 7(3):304-313, May 2004. [ bib ]
Chemogenomics, the identification of all possible drugs for all possible targets, has recently emerged as a new paradigm in drug discovery in which efficiency in the compound design and optimization process is achieved through the gain and reuse of targeted knowledge. As targeted knowledge resides at the interface between chemistry and biology, computational tools aimed at integrating the chemical and biological spaces play a central role in chemogenomics. This review covers the recent progress made in integrative computational approaches to data annotation and knowledge generation for the systematic knowledge-based design and screening of chemical libraries.

Keywords: Chemistry, Pharmaceutical; Combinatorial Chemistry Techniques; Computational Biology; Drug Design; Genomics; Ligands; Proteins; Receptors, G-Protein-Coupled
[Luo2004gene-silencing] K. Q. Luo and D. C. Chang. The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region. Biochem. Biophys. Res. Commun., 318(1):303-10, May 2004. [ bib | DOI | http ]
The gene-silencing effect of short interfering RNA (siRNA) is known to vary strongly with the targeted position of the mRNA. A number of hypotheses have been suggested to explain this phenomenon. We would like to test if this positional effect is mainly due to the secondary structure of the mRNA at the target site. We proposed that this structural factor can be characterized by a single parameter called "the hydrogen bond (H-b) index," which represents the average number of hydrogen bonds formed between nucleotides in the target region and the rest of the mRNA. This index can be determined using a computational approach. We tested the correlation between the H-b index and the gene-silencing effects on three genes (Bcl-2, hTF, and cyclin B1) using a variety of siRNAs. We found that the gene-silencing effect is inversely dependent on the H-b index, indicating that the local mRNA structure at the targeted site is the main cause of the positional effect. Based on this finding, we suggest that the H-b index can be a useful guideline for future siRNA design.

Keywords: Animals, Apoptosis, Base Composition, Base Pairing, Base Sequence, Binding Sites, Cell Cycle, Cell Proliferation, Comparative Study, Cultured, Cyclin B, Cyclin D1, DNA-Binding Proteins, Down-Regulation, Extramural, Fluorescence, Gene Silencing, Gene Targeting, Genetic Vectors, Green Fluorescent Proteins, Hela Cells, Humans, Hydrogen Bonding, Luminescent Proteins, Male, Messenger, Mice, Microscopy, Models, Molecular, Molecular Sequence Data, N.I.H., Non-U.S. Gov't, Nucleic Acid Conformation, Nude, P.H.S., Prostatic Neoplasms, Proto-Oncogene Proteins c-bcl-2, Proto-Oncogene Proteins c-myc, RNA, Regression Analysis, Research Support, STAT3 Transcription Factor, Small Interfering, Thromboplastin, Trans-Activators, Tumor Cells, U.S. Gov't, 15110788
[Kim2004Emotion] K. H. Kim, S. W. Bang, and S. R. Kim. Emotion recognition system using short-term monitoring of physiological signals. Med Biol Eng Comput, 42(3):419-27, May 2004. [ bib ]
A physiological signal-based emotion recognition system is reported. The system was developed to operate as a user-independent system, based on physiological signal databases obtained from multiple subjects. The input signals were electrocardiogram, skin temperature variation and electrodermal activity, all of which were acquired without much discomfort from the body surface, and can reflect the influence of emotion on the autonomic nervous system. The system consisted of preprocessing, feature extraction and pattern classification stages. Preprocessing and feature extraction methods were devised so that emotion-specific characteristics could be extracted from short-segment signals. Although the features were carefully extracted, their distribution formed a classification problem, with large overlap among clusters and large variance within clusters. A support vector machine was adopted as a pattern classifier to resolve this difficulty. Correct-classification ratios for 50 subjects were 78.4% and 61.8%, for the recognition of three and four categories, respectively.

Keywords: Algorithms, Animals, Antisense, Artificial Intelligence, Autonomic Nervous System, Cell Line, Child, Cluster Analysis, Comparative Study, Computational Biology, Computer Simulation, Computer-Assisted, DNA Fingerprinting, Drug Evaluation, Emotions, Fluorescence, Fuzzy Logic, Gene Silencing, Gene Targeting, Genetic, Hela Cells, Humans, Imaging, Intracellular Space, Microscopy, Models, Monitoring, Neoplasms, Neural Networks (Computer), Non-U.S. Gov't, Oligonucleotides, P.H.S., Physiologic, Preclinical, Preschool, Prognosis, Proteomics, Quantitative Structure-Activity Relationship, RNA, RNA Interference, Recognition (Psychology), Research Support, Sensitivity and Specificity, Signal Processing, Small Interfering, Thionucleotides, Three-Dimensional, Tumor, U.S. Gov't, User-Computer Interface, 15191089
[Zhang2005MULTIPRED] G. L. Zhang, A. M. Khan, K. N. Srinivasan, J. T. August, and V. Brusic. MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res, 33(Web Server issue):W172-W179, Jul 2005. [ bib | DOI | http ]
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules (proteins) belonging to human leukocyte antigens (HLA) class I A2, A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific and it has good predictive ability (area under the receiver operating characteristic curve A(ROC) > 0.80). MULTIPRED can be used for the mapping of promiscuous T-cell epitopes as well as the regions of high concentration of these targets, termed T-cell epitope hotspots. MULTIPRED is available at http://antigen.i2r.a-star.edu.sg/multipred/.

Keywords: Algorithms, Amino Acid Sequence, Antigen-Antibody Complex, Automated, Binding Sites, Computational Biology, Drug Delivery Systems, Drug Design, Epitopes, HLA Antigens, HLA-A Antigens, HLA-DR Antigens, Humans, Internet, Markov Chains, Molecular Sequence Data, Neural Networks (Computer), Pattern Recognition, Peptides, Protein, Protein Binding, Protein Interaction Mapping, Sequence Analysis, Software, T-Lymphocyte, User-Computer Interface, Viral Vaccines, 15980449
[Thukral2005Prediction] Sushil K Thukral, Paul J Nordone, Rong Hu, Leah Sullivan, Eric Galambos, Vincent D Fitzpatrick, Laura Healy, Michael B Bass, Mary E Cosenza, and Cynthia A Afshari. Prediction of nephrotoxicant action and identification of candidate toxicity-related biomarkers. Toxicol Pathol, 33(3):343-55, 2005. [ bib | DOI | http ]
A vast majority of pharmacological compounds and their metabolites are excreted via the urine, and within the complex structure of the kidney, the proximal tubules are a main target site of nephrotoxic compounds. We used the model nephrotoxicants mercuric chloride, 2-bromoethylamine hydrobromide, hexachlorobutadiene, mitomycin, amphotericin, and puromycin to elucidate time- and dose-dependent global gene expression changes associated with proximal tubular toxicity. Male Sprague-Dawley rats were dosed via intraperitoneal injection once daily for mercuric chloride and amphotericin (up to 7 doses), while a single dose was given for all other compounds. Animals were exposed to 2 different doses of these compounds and kidney tissues were collected on day 1, 3, and 7 postdosing. Gene expression profiles were generated from kidney RNA using 17K rat cDNA dual dye microarray and analyzed in conjunction with histopathology. Analysis of gene expression profiles showed that the profiles clustered based on similarities in the severity and type of pathology of individual animals. Further, the expression changes were indicative of tubular toxicity showing hallmarks of tubular degeneration/regeneration and necrosis. Use of gene expression data in predicting the type of nephrotoxicity was then tested with a support vector machine (SVM)-based approach. An SVM prediction module was trained using 120 profiles of total profiles divided into four classes based on the severity of pathology and clustering. Although mitomycin C and amphotericin B treatments did not cause toxicity, their expression profiles were included in the SVM prediction module to increase the sample size. Using this classifier, the SVM predicted the type of pathology of 28 test profiles with 100% selectivity and 82% sensitivity. These data indicate that valid predictions could be made based on gene expression changes from a small set of expression profiles.
A set of potential biomarkers showing a time- and dose-response with respect to the progression of proximal tubular toxicity were identified. These include several transporters (Slc21a2, Slc15, Slc34a2), Kim 1, IGFbp-1, osteopontin, alpha-fibrinogen, and Gstalpha.

Keywords: Algorithms, Animals, Antibiotics, Antineoplastic, Artificial Intelligence, Butadienes, Chloroplasts, Comparative Study, Computer Simulation, Computer-Assisted, Diagnosis, Disinfectants, Dose-Response Relationship, Drug, Drug Toxicity, Electrodes, Electroencephalography, Ethylamines, Expert Systems, Feedback, Fungicides, Gene Expression Profiling, Genes, Genetic Markers, Humans, Implanted, Industrial, Information Storage and Retrieval, Kidney, Kidney Tubules, MEDLINE, Male, Mercuric Chloride, Microarray Analysis, Molecular Biology, Motor Cortex, Movement, Natural Language Processing, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Plant Proteins, Predictive Value of Tests, Proteins, Proteome, Proximal, Puromycin Aminonucleoside, Rats, Reproducibility of Results, Research Support, Sprague-Dawley, Subcellular Fractions, Terminology, Therapy, Time Factors, Toxicogenetics, U.S. Gov't, User-Computer Interface, 15805072
[Stelzl2005human] Ulrich Stelzl, Uwe Worm, Maciej Lalowski, Christian Haenig, Felix H Brembeck, Heike Goehler, Martin Stroedicke, Martina Zenkner, Anke Schoenherr, Susanne Koeppen, Jan Timm, Sascha Mintzlaff, Claudia Abraham, Nicole Bock, Silvia Kietzmann, Astrid Goedde, Engin Toksöz, Anja Droege, Sylvia Krobitsch, Bernhard Korn, Walter Birchmeier, Hans Lehrach, and Erich E Wanker. A human protein-protein interaction network: a resource for annotating the proteome. Cell, 122(6):957-968, Sep 2005. [ bib | DOI | http ]
Protein-protein interaction maps provide a valuable framework for a better understanding of the functional organization of the proteome. To detect interacting pairs of human proteins systematically, a protein matrix of 4456 baits and 5632 preys was screened by automated yeast two-hybrid (Y2H) interaction mating. We identified 3186 mostly novel interactions among 1705 proteins, resulting in a large, highly connected network. Independent pull-down and co-immunoprecipitation assays validated the overall quality of the Y2H interactions. Using topological and GO criteria, a scoring system was developed to define 911 high-confidence interactions among 401 proteins. Furthermore, the network was searched for interactions linking uncharacterized gene products and human disease proteins to regulatory cellular pathways. Two novel Axin-1 interactions were validated experimentally, characterizing ANP32A and CRMP1 as modulators of Wnt signaling. Systematic human protein interaction screens can lead to a more comprehensive understanding of protein function and cellular processes.

Keywords: Databases as Topic; Humans; Intracellular Signaling Peptides and Proteins; Models, Molecular; Nerve Tissue Proteins; Protein Binding; Proteins; Proteomics; Repressor Proteins; Two-Hybrid System Techniques
[Shulman-Peleg2005SiteEngines] Alexandra Shulman-Peleg, Ruth Nussinov, and Haim J Wolfson. SiteEngines: recognition and comparison of binding sites and protein-protein interfaces. Nucleic Acids Res, 33(Web Server issue):W337-W341, Jul 2005. [ bib | DOI | http ]
Protein surface regions with similar physicochemical properties and shapes may perform similar functions and bind similar binding partners. Here we present two web servers and software packages for recognition of the similarity of binding sites and interfaces. Both methods recognize local geometrical and physicochemical similarity, which can be present even in the absence of overall sequence or fold similarity. The first method, SiteEngine (http://bioinfo3d.cs.tau.ac.il/SiteEngine), receives as an input two protein structures and searches the complete surface of one protein for regions similar to the binding site of the other. The second, Interface-to-Interface (I2I)-SiteEngine (http://bioinfo3d.cs.tau.ac.il/I2I-SiteEngine), compares protein-protein interfaces, which are regions of interaction between two protein molecules. It receives as an input two structures of protein-protein complexes, extracts the interfaces and finds the three-dimensional transformation that maximizes the similarity between two pairs of interacting binding sites. The output of both servers consists of a superimposition in PDB file format and a list of physicochemical properties shared by the compared entities. The methods are highly efficient and the freely available software packages are suitable for large-scale database searches of the entire PDB.

Keywords: Amino Acids, chemistry; Binding Sites; Internet; Multiprotein Complexes, chemistry/metabolism; Protein Conformation; Protein Interaction Mapping, methods; Software; User-Computer Interface
[Shen2005Detection] Li Shen, Jie Yang, and Yue Zhou. Detection of PVCs with support vector machine. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi, 22(1):78-81, Feb 2005. [ bib ]
The classification of heart beats is the foundation for automated arrhythmia monitoring devices. Support vector machines (SVMs) have meant a great advance in solving classification or pattern recognition. This study describes SVM for the identification of premature ventricular contractions (PVCs) in surface ECGs. Features for the classification task are extracted by analyzing the heart rate, morphology and wavelet energy of the heart beats from a single lead. The performance of different SVMs is evaluated on the MIT-BIH arrhythmia database following the association for the advancement of medical instrumentation (AAMI) recommendations.

Keywords: 80 and over, Adult, Aged, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Automated, Birefringence, Brain Chemistry, Brain Neoplasms, Comparative Study, Computer-Assisted, Cornea, Cross-Sectional Studies, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Face, Female, Genetic, Glaucoma, Humans, Intraocular Pressure, Lasers, Least-Squares Analysis, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Male, Middle Aged, Models, Molecular, Nerve Fibers, Non-U.S. Gov't, Numerical Analysis, Ophthalmological, Optic Nerve Diseases, Optical Coherence, P.H.S., Pattern Recognition, Photic Stimulation, Prospective Studies, Protein, ROC Curve, Regression Analysis, Research Support, Retinal Ganglion Cells, Sensitivity and Specificity, Sequence Analysis, Statistics, Tomography, U.S. Gov't, Visual Fields, beta-Lactamases, 15762121
[Sheinerman2005High] Felix B Sheinerman, Elie Giraud, and Abdelazize Laoui. High affinity targets of protein kinase inhibitors have similar residues at the positions energetically important for binding. J. Mol. Biol., 352(5):1134-1156, Oct 2005. [ bib | DOI | http ]
Inhibition of protein kinase activity is a focus of intense drug discovery efforts in several therapeutic areas. Major challenges facing the field include understanding of the factors determining the selectivity of kinase inhibitors and the development of compounds with the desired selectivity profile. Here, we report the analysis of sequence variability among high and low affinity targets of eight different small molecule kinase inhibitors (BIRB796, Tarceva, NU6102, Gleevec, SB203580, balanol, H89, PP1). It is observed that all high affinity targets of each inhibitor are found among a relatively small number of kinases, which have similar residues at the specific positions important for binding. The findings are highly statistically significant, and allow one to exclude the majority of kinases in a genome from a list of likely targets for an inhibitor. The findings have implications for the design of novel inhibitors with a desired selectivity profile (e.g. targeted at multiple kinases), the discovery of new targets for kinase inhibitor drugs, comparative analysis of different in vivo models, and the design of "a-la-carte" chemical libraries tailored for individual kinases.

Keywords: Amino Acid Sequence; Amino Acids; Binding Sites; Electrostatics; Humans; Ligands; Molecular Sequence Data; Piperazines; Protein Binding; Protein Kinase Inhibitors; Protein Kinases; Pyrazoles; Pyrimidines; Sequence Alignment; Thermodynamics
[Shadforth2005Protein] Ian Shadforth, Daniel Crowther, and Conrad Bessant. Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines. Proteomics, 5(16):4082-4095, Nov 2005. [ bib | DOI | http ]
Current proteomics experiments can generate vast quantities of data very quickly, but this has not been matched by data analysis capabilities. Although there have been a number of recent reviews covering various aspects of peptide and protein identification methods using MS, comparisons of which methods are either the most appropriate for, or the most effective at, their proposed tasks are not readily available. As the need for high-throughput, automated peptide and protein identification systems increases, the creators of such pipelines need to be able to choose algorithms that are going to perform well both in terms of accuracy and computational efficiency. This article therefore provides a review of the currently available core algorithms for PMF, database searching using MS/MS, sequence tag searches and de novo sequencing. We also assess the relative performances of a number of these algorithms. As there is limited reporting of such information in the literature, we conclude that there is a need for the adoption of a system of standardised reporting on the performance of new peptide and protein identification algorithms, based upon freely available datasets. We go on to present our initial suggestions for the format and content of these datasets.

Keywords: Algorithms; Alternative Splicing; Databases, Protein; Peptides; Polymorphism, Genetic; Proteins; Proteomics; Sequence Analysis; Software; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
[Seike2005Proteomic] M. Seike, T. Kondo, K. Fujii, T. Okano, T. Yamada, Y. Matsuno, A. Gemma, S. Kudoh, and S. Hirohashi. Proteomic signatures for histological types of lung cancer. Proteomics, Jul 2005. [ bib | DOI | http | .pdf ]
We performed proteomic studies on lung cancer cells to elucidate the mechanisms that determine histological phenotype. Thirty lung cancer cell lines with three different histological backgrounds (squamous cell carcinoma, small cell lung carcinoma and adenocarcinoma) were subjected to two-dimensional difference gel electrophoresis (2-D DIGE) and grouped by multivariate analyses on the basis of their protein expression profiles. 2-D DIGE achieves more accurate quantification of protein expression by using highly sensitive fluorescence dyes to label the cysteine residues of proteins prior to two-dimensional polyacrylamide gel electrophoresis. We found that hierarchical clustering analysis and principal component analysis divided the cell lines according to their original histology. Spot ranking analysis using a support vector machine algorithm and unsupervised classification methods identified 32 protein spots essential for the classification. The proteins corresponding to the spots were identified by mass spectrometry. Next, lung cancer cells isolated from tumor tissue by laser microdissection were classified on the basis of the expression pattern of these 32 protein spots. Based on the expression profile of the 32 spots, the isolated cancer cells were categorized into three histological groups: the squamous cell carcinoma group, the adenocarcinoma group, and a group of carcinomas with other histological types. In conclusion, our results demonstrate the utility of quantitative proteomic analysis for molecular diagnosis and classification of lung cancer cells.

Keywords: biosvm proteomics
[Sassi2005automated] Alexander P Sassi, Frank Andel, Hans-Marcus L Bitter, Michael P S Brown, Robert G Chapman, Jeraldine Espiritu, Alfred C Greenquist, Isabelle Guyon, Mariana Horchi-Alegre, Kathy L Stults, Ann Wainright, Jonathan C Heller, and John T Stults. An automated, sheathless capillary electrophoresis-mass spectrometry platform for discovery of biomarkers in human serum. Electrophoresis, 26(7-8):1500-12, Apr 2005. [ bib | DOI | http | .pdf ]
A capillary electrophoresis-mass spectrometry (CE-MS) method has been developed to perform routine, automated analysis of low-molecular-weight peptides in human serum. The method incorporates transient isotachophoresis for in-line preconcentration and a sheathless electrospray interface. To evaluate the performance of the method and demonstrate the utility of the approach, an experiment was designed in which peptides were added to sera from individuals at each of two different concentrations, artificially creating two groups of samples. The CE-MS data from the serum samples were divided into separate training and test sets. A pattern-recognition/feature-selection algorithm based on support vector machines was used to select the mass-to-charge (m/z) values from the training set data that distinguished the two groups of samples from each other. The added peptides were identified correctly as the distinguishing features, and pattern recognition based on these peptides was used to assign each sample in the independent test set to its respective group. A twofold difference in peptide concentration could be detected with statistical significance (p-value < 0.0001). The accuracy of the assignment was 95%, demonstrating the utility of this technique for the discovery of patterns of biomarkers in serum.

Keywords: 80 and over, Adult, Aged, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Automated, Birefringence, Brain Chemistry, Brain Neoplasms, Comparative Study, Computer-Assisted, Cornea, Cross-Sectional Studies, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Face, Female, Genetic, Glaucoma, Humans, Intraocular Pressure, Lasers, Least-Squares Analysis, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Male, Middle Aged, Models, Molecular, Nerve Fibers, Non-U.S. Gov't, Numerical Analysis, Ophthalmological, Optic Nerve Diseases, Optical Coherence, P.H.S., Pattern Recognition, Photic Stimulation, Prospective Studies, Protein, ROC Curve, Regression Analysis, Research Support, Retinal Ganglion Cells, Sensitivity and Specificity, Sequence Analysis, Statistics, Tomography, U.S. Gov't, Visual Fields, beta-Lactamases, 15765480
[Rual2005Towards] Jean-François Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-Kishikawa, Amélie Dricot, Ning Li, Gabriel F Berriz, Francis D Gibbons, Matija Dreze, Nono Ayivi-Guedehoussou, Niels Klitgord, Christophe Simon, Mike Boxem, Stuart Milstein, Jennifer Rosenberg, Debra S Goldberg, Lan V Zhang, Sharyl L Wong, Giovanni Franklin, Siming Li, Joanna S Albala, Janghoo Lim, Carlene Fraughton, Estelle Llamosas, Sebiha Cevik, Camille Bex, Philippe Lamesch, Robert S Sikorski, Jean Vandenhaute, Huda Y Zoghbi, Alex Smolyar, Stephanie Bosak, Reynaldo Sequerra, Lynn Doucette-Stamm, Michael E Cusick, David E Hill, Frederick P Roth, and Marc Vidal. Towards a proteome-scale map of the human protein-protein interaction network. Nature, 437(7062):1173-1178, Oct 2005. [ bib | DOI | http ]
Systematic mapping of protein-protein interactions, or 'interactome' mapping, was initiated in model organisms, starting with defined biological processes and then expanding to the scale of the proteome. Although far from complete, such maps have revealed global topological and dynamic features of interactome networks that relate to known biological properties, suggesting that a human interactome map will provide insight into development and disease mechanisms at a systems level. Here we describe an initial version of a proteome-scale map of human binary protein-protein interactions. Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise interactions among the products of approximately 8,100 currently available Gateway-cloned open reading frames and detected approximately 2,800 interactions. This data set, called CCSB-HI1, has a verification rate of approximately 78% as revealed by an independent co-affinity purification assay, and correlates significantly with other biological attributes. The CCSB-HI1 data set increases by approximately 70% the set of available binary interactions within the tested space and reveals more than 300 new connections to over 100 disease-associated proteins. This work represents an important step towards a systematic and comprehensive human interactome project.

Keywords: Cloning, Molecular; Humans; Open Reading Frames; Protein Binding; Proteome; RNA; Saccharomyces cerevisiae; Two-Hybrid System Techniques
[Rice2005Reconstructing] J.J. Rice, Y. Tu, and G. Stolovitzky. Reconstructing biological networks using conditional correlation analysis. Bioinformatics, 21(6):765-773, Mar 2005. [ bib | DOI | http ]
MOTIVATION: One of the present challenges in biological research is the organization of the data originating from high-throughput technologies. One way in which this information can be organized is in the form of networks of influences, physical or statistical, between cellular components. We propose an experimental method for probing biological networks, analyzing the resulting data and reconstructing the network architecture. METHODS: We use networks of known topology consisting of nodes (genes), directed edges (gene-gene interactions) and a dynamics for the genes' mRNA concentrations in terms of the gene-gene interactions. We proposed a network reconstruction algorithm based on the conditional correlation of the mRNA equilibrium concentration between two genes given that one of them was knocked down. Using simulated gene expression data on networks of known connectivity, we investigated how the reconstruction error is affected by noise, network topology, size, sparseness and dynamic parameters. RESULTS: Errors arise from correlation between nodes connected through intermediate nodes (false positives) and when the correlation between two directly connected nodes is obscured by noise, non-linearity or multiple inputs to the target node (false negatives). Two critical components of the method are as follows: (1) the choice of an optimal correlation threshold for predicting connections and (2) the reduction of errors arising from indirect connections (for which a novel algorithm is proposed). With these improvements, we can reconstruct networks with the topology of the transcriptional regulatory network in Escherichia coli with a reasonably low error rate.

Keywords: Algorithms; Computer Simulation; Gene Expression Profiling; Gene Expression Regulation; Models, Biological; Models, Statistical; Oligonucleotide Array Sequence Analysis; Protein Interaction Mapping; Signal Transduction; Statistics as Topic; Transcription Factors
[Perez-Cruz2005Convergence] Fernando Pérez-Cruz, Carlos Bousoño-Calzón, and Antonio Artés-Rodríguez. Convergence of the IRWLS Procedure to the Support Vector Machine Solution. Neural Comput, 17(1):7-18, Jan 2005. [ bib ]
An iterative reweighted least squares (IRWLS) procedure recently proposed is shown to converge to the support vector machine solution. The convergence to a stationary point is ensured by modifying the original IRWLS procedure.

Keywords: 80 and over, Aged, Algorithms, Amino Acids, Animals, Area Under Curve, Automated, Brain Chemistry, Brain Neoplasms, Comparative Study, Computer-Assisted, Cross-Sectional Studies, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Face, Genetic, Glaucoma, Humans, Lasers, Least-Squares Analysis, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Middle Aged, Models, Molecular, Nerve Fibers, Non-U.S. Gov't, Numerical Analysis, Ophthalmological, Optic Nerve Diseases, P.H.S., Pattern Recognition, Photic Stimulation, Protein, ROC Curve, Regression Analysis, Research Support, Retinal Ganglion Cells, Sensitivity and Specificity, Sequence Analysis, Statistics, U.S. Gov't, beta-Lactamases, 15779160
[Peters2005Generating] Bjoern Peters and Alessandro Sette. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics, 6:132, 2005. [ bib | DOI | http ]
BACKGROUND: Many processes in molecular biology involve the recognition of short sequences of nucleic- or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences. RESULTS: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence. CONCLUSION: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.

Keywords: Algorithms; Amino Acid Sequence; Biology; Computational Biology; Computer Simulation; Data Interpretation, Statistical; Databases, Protein; Models, Biological; Models, Statistical; Neural Networks (Computer); Peptide Library; Peptides; Programming Languages; Protein Binding; Sensitivity and Specificity; Software
[Papadopoulos2005Characterization] A. Papadopoulos, D. I. Fotiadis, and A. Likas. Characterization of clustered microcalcifications in digitized mammograms using neural networks and support vector machines. Artif. Intell. Med., 34(2):141-50, Jun 2005. [ bib | DOI | http | .pdf ]
OBJECTIVE: Detection and characterization of microcalcification clusters in mammograms is vital in daily clinical practice. The scope of this work is to present a novel computer-based automated method for the characterization of microcalcification clusters in digitized mammograms. METHODS AND MATERIAL: The proposed method has been implemented in three stages: (a) the cluster detection stage to identify clusters of microcalcifications, (b) the feature extraction stage to compute the important features of each cluster and (c) the classification stage, which provides the final characterization. In the classification stage, a rule-based system, an artificial neural network (ANN) and a support vector machine (SVM) have been implemented and evaluated using receiver operating characteristic (ROC) analysis. The proposed method was evaluated using the Nijmegen and Mammographic Image Analysis Society (MIAS) mammographic databases. The original feature set was enhanced by the addition of four rule-based features. RESULTS AND CONCLUSIONS: In the case of Nijmegen dataset, the performance of the SVM was Az=0.79 and 0.77 for the original and enhanced feature set, respectively, while for the MIAS dataset the corresponding characterization scores were Az=0.81 and 0.80. Utilizing neural network classification methodology, the corresponding performance for the Nijmegen dataset was Az=0.70 and 0.76 while for the MIAS dataset it was Az=0.73 and 0.78. Although the obtained high classification performance can be successfully applied to microcalcification clusters characterization, further studies must be carried out for the clinical evaluation of the system using larger datasets. The use of additional features originating either from the image itself (such as cluster location and orientation) or from the patient data may further improve the diagnostic value of the system.

Keywords: Apoptosis, Gene Expression Profiling, Humans, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Polymerase Chain Reaction, Proteins, Research Support, Subcellular Fractions, Unknown Primary, 15894178
[Nabieva2005Whole-proteome] Elena Nabieva, Kam Jim, Amit Agarwal, Bernard Chazelle, and Mona Singh. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 21 Suppl 1:i302-i310, Jun 2005. [ bib | DOI | http ]
MOTIVATION: Determining protein function is one of the most important problems in the post-genomic era. For the typical proteome, there are no functional annotations for one-third or more of its proteins. Recent high-throughput experiments have determined proteome-scale protein physical interaction maps for several organisms. These physical interactions are complemented by an abundance of data about other types of functional relationships between proteins, including genetic interactions, knowledge about co-expression and shared evolutionary history. Taken together, these pairwise linkages can be used to build whole-proteome protein interaction maps. RESULTS: We develop a network-flow based algorithm, FunctionalFlow, that exploits the underlying structure of protein interaction maps in order to predict protein function. In cross-validation testing on the yeast proteome, we show that FunctionalFlow has improved performance over previous methods in predicting the function of proteins with few (or no) annotated protein neighbors. By comparing several methods that use protein interaction maps to predict protein function, we demonstrate that FunctionalFlow performs well because it takes advantage of both network topology and some measure of locality. Finally, we show that performance can be improved substantially as we consider multiple data sources and use them to create weighted interaction networks. AVAILABILITY: http://compbio.cs.princeton.edu/function

Keywords: Algorithms; Computational Biology, methods; Evolution, Molecular; Fungal Proteins, chemistry; Genomics; Models, Statistical; Models, Theoretical; Protein Interaction Mapping, methods; Proteins, chemistry; Proteomics, methods
[Miteva2005Fast] M. A. Miteva, W. H. Lee, M. O. Montes, and B. O. Villoutreix. Fast structure-based virtual ligand screening combining FRED, DOCK, and Surflex. J. Med. Chem., 48(19):6012-6022, Sep 2005. [ bib | DOI | http ]
A protocol was devised in which FRED, DOCK, and Surflex were combined in a multistep virtual ligand screening (VLS) procedure to screen the pocket of four different proteins. One goal was to evaluate the impact of chaining "freely available packages to academic users" on docking/scoring accuracy and CPU time consumption. A bank of 65 660 compounds including 49 known actives was generated. Our procedure is successful because docking/scoring parameters are tuned according to the nature of the binding pocket and because a shape-based filtering tool is applied prior to flexible docking. The obtained enrichment factors are in line with those reported in recent studies. We suggest that consensus docking/scoring could be valuable to some drug discovery projects. The present protocol could process the entire bank for one receptor in less than a week on one processor, suggesting that VLS experiments could be performed even without large computer resources.

Keywords: Binding Sites, Databases, Estrogen, Factor VIIa, Factual, Ligands, Molecular Structure, Neuraminidase, Non-U.S. Gov't, Protein Binding, Quantitative Structure-Activity Relationship, Receptors, Research Support, Thymidine Kinase, 16162004
[Micchelli2005On] Charles A Micchelli and Massimiliano Pontil. On learning vector-valued functions. Neural Comput, 17(1):177-204, Jan 2005. [ bib | DOI | http ]
In this letter, we provide a study of learning in a Hilbert space of vector-valued functions. We motivate the need for extending learning theory of scalar-valued functions by practical considerations and establish some basic results for learning vector-valued functions that should prove useful in applications. Specifically, we allow an output space Y to be a Hilbert space, and we consider a reproducing kernel Hilbert space of functions whose values lie in Y. In this setting, we derive the form of the minimal norm interpolant to a finite set of data and apply it to study some regularization functionals that are important in learning theory. We consider specific examples of such functionals corresponding to multiple-output regularization networks and support vector machines, for both regression and classification. Finally, we provide classes of operator-valued kernels of the dot product and translation-invariant type.

Keywords: Algorithms, Amino Acid, Amino Acids, Artificial Intelligence, Ascomycota, Automated, Base Sequence, Chromosome Mapping, Codon, Colonic Neoplasms, Comparative Study, Computer Simulation, Computer-Assisted, Computing Methodologies, Crystallography, DNA, DNA Primers, Databases, Decision Support Techniques, Diagnostic Imaging, Enzymes, Feedback, Fixation, Gene Expression Profiling, Genetic, Hordeum, Host-Parasite Relations, Humans, Image Enhancement, Image Interpretation, Informatics, Information Storage and Retrieval, Kinetics, Logistic Models, Magnetic Resonance Spectroscopy, Mathematical Computing, Models, Nanotechnology, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Nonlinear Dynamics, Ocular, Oligonucleotide Array Sequence Analysis, P.H.S., Pattern Recognition, Plant, Plants, Predictive Value of Tests, Protein, Protein Conformation, Regression Analysis, Research Support, Sample Size, Selection (Genetics), Sequence Alignment, Sequence Analysis, Sequence Homology, Signal Processing, Skin, Software, Statistical, Subtraction Technique, Theoretical, Thermodynamics, U.S. Gov't, Viral Proteins, X-Ray, 15563752
[Mavroforakis2005Significance] Michael Mavroforakis, Harris Georgiou, Nikos Dimitropoulos, Dionisis Cavouras, and Sergios Theodoridis. Significance analysis of qualitative mammographic features, using linear classifiers, neural networks and support vector machines. Eur J Radiol, 54(1):80-9, Apr 2005. [ bib | DOI | http | .pdf ]
Advances in modern technologies and computers have enabled digital image processing to become a vital tool in conventional clinical practice, including mammography. However, the core problem of the clinical evaluation of mammographic tumors remains a highly demanding cognitive task. In order for these automated diagnostic systems to perform in levels of sensitivity and specificity similar to that of human experts, it is essential that a robust framework on problem-specific design parameters is formulated. This study is focused on identifying a robust set of clinical features that can be used as the base for designing the input of any computer-aided diagnosis system for automatic mammographic tumor evaluation. A thorough list of clinical features was constructed and the diagnostic value of each feature was verified against current clinical practices by an expert physician. These features were directly or indirectly related to the overall morphological properties of the mammographic tumor or the texture of the fine-scale tissue structures as they appear in the digitized image, while others contained external clinical data of utmost importance, like the patient's age. The entire feature set was used as an annotation list for describing the clinical properties of mammographic tumor cases in a quantitative way, such that subsequent objective analyses were possible. For the purposes of this study, a mammographic image database was created, with complete clinical evaluation descriptions and positive histological verification for each case. All tumors contained in the database were characterized according to the identified clinical features' set and the resulting dataset was used as input for discrimination and diagnostic value analysis for each one of these features. Specifically, several standard methodologies of statistical significance analysis were employed to create feature rankings according to their discriminating power.
Moreover, three different classification models, namely linear classifiers, neural networks and support vector machines, were employed to investigate the true efficiency of each one of them, as well as the overall complexity of the diagnostic task of mammographic tumor characterization. Both the statistical and the classification results have proven the explicit correlation of all the selected features with the final diagnosis, qualifying them as an adequate input base for any type of similar automated diagnosis system. The underlying complexity of the diagnostic task has justified the high value of sophisticated pattern recognition architectures.

Keywords: Algorithms, Animals, Antibiotics, Antineoplastic, Artificial Intelligence, Butadienes, Chloroplasts, Comparative Study, Computer Simulation, Computer-Assisted, Diagnosis, Disinfectants, Dose-Response Relationship, Drug, Drug Toxicity, Electrodes, Electroencephalography, Ethylamines, Expert Systems, Feedback, Fungicides, Gene Expression Profiling, Genes, Genetic Markers, Humans, Implanted, Industrial, Information Storage and Retrieval, Kidney, Kidney Tubules, MEDLINE, Male, Mercuric Chloride, Microarray Analysis, Molecular Biology, Motor Cortex, Movement, Natural Language Processing, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Plant Proteins, Predictive Value of Tests, Proteins, Proteome, Proximal, Puromycin Aminonucleoside, Rats, Reproducibility of Results, Research Support, Sprague-Dawley, Subcellular Fractions, Terminology, Therapy, Time Factors, Toxicogenetics, U.S. Gov't, User-Computer Interface, 15797296
[Lasso2005Vessel] András Lassó and Emanuele Trucco. Vessel enhancement in digital X-ray angiographic sequences by temporal statistical learning. Comput Med Imaging Graph, 29(5):343-55, Jul 2005. [ bib | DOI | http ]
Keywords: Apoptosis, Gene Expression Profiling, Humans, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Polymerase Chain Reaction, Proteins, Research Support, Subcellular Fractions, Unknown Primary, 15893453
[Larsen2005integrative] Mette Voldby Larsen, Claus Lundegaard, Kasper Lamberth, Søren Buus, Søren Brunak, Ole Lund, and Morten Nielsen. An integrative approach to CTL epitope prediction: a combined algorithm integrating MHC class I binding, TAP transport efficiency, and proteasomal cleavage predictions. Eur. J. Immunol., 35(8):2295-2303, Aug 2005. [ bib | DOI | http ]
Reverse immunogenetic approaches attempt to optimize the selection of candidate epitopes, and thus minimize the experimental effort needed to identify new epitopes. When predicting cytotoxic T cell epitopes, the main focus has been on the highly specific MHC class I binding event. Methods have also been developed for predicting the antigen-processing steps preceding MHC class I binding, including proteasomal cleavage and transporter associated with antigen processing (TAP) transport efficiency. Here, we use a dataset obtained from the SYFPEITHI database to show that a method integrating predictions of MHC class I binding affinity, TAP transport efficiency, and C-terminal proteasomal cleavage outperforms any of the individual methods. Using an independent evaluation dataset of HIV epitopes from the Los Alamos database, the validity of the integrated method is confirmed. The performance of the integrated method is found to be significantly higher than that of the two publicly available prediction methods BIMAS and SYFPEITHI. To identify 85% of the epitopes in the HIV dataset, 9% and 10% of all possible nonamers in the HIV proteins must be tested when using the BIMAS and SYFPEITHI methods, respectively, for the selection of candidate epitopes. This number is reduced to 7% when using the integrated method. In practical terms, this means that the experimental effort needed to identify an epitope in a hypothetical protein with 85% probability is reduced by 20-30% when using the integrated method. The method is available at http://www.cbs.dtu.dk/services/NetCTL. Supplementary material is available at http://www.cbs.dtu.dk/suppl/immunology/CTL.php.

Keywords: Algorithms; Data Interpretation, Statistical; Epitopes, T-Lymphocyte; Histocompatibility Antigens Class I; Humans; Hydrolysis; Predictive Value of Tests; Proteasome Endopeptidase Complex; Protein Binding; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S.; T-Lymphocytes, Cytotoxic
[LaBaer2005Protein] Joshua LaBaer and Niroshan Ramachandran. Protein microarrays as tools for functional proteomics. Curr Opin Chem Biol, 9(1):14-19, Feb 2005. [ bib | DOI | http ]
Protein microarrays present an innovative and versatile approach to study protein abundance and function at an unprecedented scale. Given the chemical and structural complexity of the proteome, the development of protein microarrays has been challenging. Despite these challenges there has been a marked increase in the use of protein microarrays to map interactions of proteins with various other molecules, and to identify potential disease biomarkers, especially in the area of cancer biology. In this review, we discuss some of the promising advances made in the development and use of protein microarrays.

Keywords: Protein Array Analysis; Proteins; Proteomics; Surface Properties
[Ikeda2005asymptotic] Kazushi Ikeda and Tsutomu Aoishi. An asymptotic statistical analysis of support vector machines with soft margins. Neural Netw, 18(3):251-9, Apr 2005. [ bib | DOI | http | .pdf ]
The generalization properties of support vector machines (SVMs) are examined. From a geometrical point of view, the estimated parameter of an SVM is the one nearest the origin in the convex hull formed with given examples. Since introducing soft margins is equivalent to reducing the convex hull of the examples, an SVM with soft margins has a different learning curve from the original. In this paper we derive the asymptotic average generalization error of SVMs with soft margins in simple cases, that is, only when the dimension of inputs is one, and quantitatively show that soft margins increase the generalization error.

Keywords: Apoptosis, Gene Expression Profiling, Humans, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Polymerase Chain Reaction, Proteins, Research Support, Subcellular Fractions, Unknown Primary, 15896573
[Haasdonk2005Feature] Bernard Haasdonk. Feature space interpretation of SVMs with indefinite kernels. IEEE Trans Pattern Anal Mach Intell, 27(4):482-92, Apr 2005. [ bib | DOI | http | .pdf ]
Kernel methods are becoming increasingly popular for various kinds of machine learning tasks, the most famous being the support vector machine (SVM) for classification. The SVM is well understood when using conditionally positive definite (cpd) kernel functions. However, in practice, non-cpd kernels arise and demand application in SVMs. The procedure of "plugging" these indefinite kernels in SVMs often yields good empirical classification results. However, they are hard to interpret due to missing geometrical and theoretical understanding. In this paper, we provide a step toward the comprehension of SVM classifiers in these situations. We give a geometric interpretation of SVMs with indefinite kernel functions. We show that such SVMs are optimal hyperplane classifiers not by margin maximization, but by minimization of distances between convex hulls in pseudo-Euclidean spaces. By this, we obtain a sound framework and motivation for indefinite SVMs. This interpretation is the basis for further theoretical analysis, e.g., investigating uniqueness, and for the derivation of practical guidelines like characterizing the suitability of indefinite SVMs.

Keywords: Algorithms, Animals, Antibiotics, Antineoplastic, Artificial Intelligence, Automated, Automatic Data Processing, Butadienes, Chloroplasts, Cluster Analysis, Comparative Study, Computer Simulation, Computer-Assisted, Computing Methodologies, Database Management Systems, Databases, Diagnosis, Disinfectants, Dose-Response Relationship, Drug, Drug Toxicity, Electrodes, Electroencephalography, Ethylamines, Expert Systems, Factual, Feedback, Fungicides, Gene Expression Profiling, Genes, Genetic Markers, Humans, Image Enhancement, Image Interpretation, Implanted, Industrial, Information Storage and Retrieval, Kidney, Kidney Tubules, MEDLINE, Male, Mercuric Chloride, Microarray Analysis, Molecular Biology, Motor Cortex, Movement, Natural Language Processing, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Numerical Analysis, Pattern Recognition, Plant Proteins, Predictive Value of Tests, Proteins, Proteome, Proximal, Puromycin Aminonucleoside, Rats, Reproducibility of Results, Research Support, Sensitivity and Specificity, Signal Processing, Sprague-Dawley, Subcellular Fractions, Terminology, Therapy, Time Factors, Toxicogenetics, U.S. Gov't, User-Computer Interface, 15794155
[Golland2005Detection] Polina Golland, W. Eric L Grimson, Martha E Shenton, and Ron Kikinis. Detection and analysis of statistical differences in anatomical shape. Med Image Anal, 9(1):69-86, Feb 2005. [ bib | DOI | http ]
We present a computational framework for image-based analysis and interpretation of statistical differences in anatomical shape between populations. Applications of such analysis include understanding developmental and anatomical aspects of disorders when comparing patients versus normal controls, studying morphological changes caused by aging, or even differences in normal anatomy, for example, differences between genders. Once a quantitative description of organ shape is extracted from input images, the problem of identifying differences between the two groups can be reduced to one of the classical questions in machine learning of constructing a classifier function for assigning new examples to one of the two groups while making as few misclassifications as possible. The resulting classifier must be interpreted in terms of shape differences between the two groups back in the image domain. We demonstrate a novel approach to such interpretation that allows us to argue about the identified shape differences in anatomically meaningful terms of organ deformation. Given a classifier function in the feature space, we derive a deformation that corresponds to the differences between the two classes while ignoring shape variability within each class. Based on this approach, we present a system for statistical shape analysis using distance transforms for shape representation and the support vector machines learning algorithm for the optimal classifier estimation and demonstrate it on artificially generated data sets, as well as real medical studies.

Keywords: Algorithms, Amino Acid, Artificial Intelligence, Ascomycota, Automated, Base Sequence, Chromosome Mapping, Codon, Colonic Neoplasms, Comparative Study, Computer-Assisted, Crystallography, DNA, DNA Primers, Databases, Diagnostic Imaging, Gene Expression Profiling, Hordeum, Host-Parasite Relations, Humans, Image Interpretation, Informatics, Kinetics, Magnetic Resonance Spectroscopy, Models, Nanotechnology, Non-P.H.S., Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, P.H.S., Pattern Recognition, Plant, Plants, Predictive Value of Tests, Protein, Research Support, Selection (Genetics), Sequence Alignment, Sequence Analysis, Sequence Homology, Skin, Software, Statistical, Theoretical, Thermodynamics, U.S. Gov't, Viral Proteins, X-Ray, 15581813
[Ehlers2005NBS1] Justis P Ehlers and J. William Harbour. NBS1 expression as a prognostic marker in uveal melanoma. Clin. Cancer Res., 11(5):1849-53, Mar 2005. [ bib | DOI | http | .pdf ]
PURPOSE: Up to half of uveal melanoma patients die of metastatic disease. Treatment of the primary eye tumor does not improve survival in high-risk patients due to occult micrometastatic disease, which is present at the time of eye tumor diagnosis but is not detected and treated until months to years later. Here, we use microarray gene expression data to identify a new prognostic marker. EXPERIMENTAL DESIGN: Microarray gene expression profiles were analyzed in 25 primary uveal melanomas. Tumors were ranked by support vector machine (SVM) and by cytologic severity. Nbs1 protein expression was assessed by quantitative immunohistochemistry in 49 primary uveal melanomas. Survival was assessed using Kaplan-Meier life-table analysis. RESULTS: Expression of the Nijmegen breakage syndrome (NBS1) gene correlated strongly with SVM and cytologic tumor rankings (P < 0.0001). Further, immunohistochemistry expression of the Nbs1 protein correlated strongly with both SVM and cytologic rankings (P < 0.0001). The 6-year actuarial survival was 100% in patients with low immunohistochemistry expression of Nbs1 and 22% in those with high Nbs1 expression (P = 0.01). CONCLUSIONS: NBS1 is a strong predictor of uveal melanoma survival and potentially could be used as a clinical marker for guiding clinical management.

Keywords: 80 and over, Adult, Aged, Algorithms, Amino Acid Sequence, Amino Acids, Analysis of Variance, Animals, Area Under Curve, Artifacts, Automated, Bacteriophage T4, Base Sequence, Biological, Birefringence, Brain Chemistry, Brain Neoplasms, Cell Cycle Proteins, Comparative Study, Computational Biology, Computer-Assisted, Cornea, Cross-Sectional Studies, Databases, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Extramural, Face, Female, Gene Expression Profiling, Genetic, Glaucoma, Humans, Immunohistochemistry, Intraocular Pressure, Lasers, Least-Squares Analysis, Likelihood Functions, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Male, Markov Chains, Melanoma, Middle Aged, Models, Molecular, Mutation, N.I.H., Nerve Fibers, Non-P.H.S., Non-U.S. Gov't, Nuclear Proteins, Nucleic Acid, Nucleic Acid Conformation, Numerical Analysis, Oligonucleotide Array Sequence Analysis, Ophthalmological, Optic Nerve Diseases, Optical Coherence, P.H.S., Pattern Recognition, Photic Stimulation, Polymorphism, Prognosis, Prospective Studies, Protein, Protein Structure, Proteins, RNA, ROC Curve, Regression Analysis, Reproducibility of Results, Research Support, Retinal Ganglion Cells, Secondary, Sensitivity and Specificity, Sequence Analysis, Single Nucleotide, Single-Stranded Conformational, Software, Statistics, Survival Analysis, Tertiary, Tomography, Tumor Markers, U.S. Gov't, Untranslated, Uveal Neoplasms, Visual Fields, beta-Lactamases, 15756009
[Doyle2005PlosBiol] John Doyle and Marie Csete. Motifs, control, and stability. PLoS Biol, 3(11):e392, Nov 2005. [ bib | DOI | http ]
Keywords: Amino Acid Motifs; Bacterial Physiological Phenomena; Bacterial Proteins, chemistry; Escherichia coli, metabolism; Genes, Bacterial; Genes, Plant; Glycolysis; Heat-Shock Proteins, chemistry; Models, Biological; Models, Theoretical; Molecular Chaperones, chemistry; Plant Proteins, chemistry; Protein Interaction Mapping; Protein Structure, Tertiary; Transcription Factors, chemistry; Transcription, Genetic
[Dong2005Fast] Jian xiong Dong, Adam Krzyzak, and Ching Y Suen. Fast SVM training algorithm with decomposition on very large data sets. IEEE Trans Pattern Anal Mach Intell, 27(4):603-18, Apr 2005. [ bib ]
Training a support vector machine on a data set of huge size with thousands of classes is a challenging problem. This paper proposes an efficient algorithm to solve this problem. The key idea is to introduce a parallel optimization step to quickly remove most of the nonsupport vectors, where block diagonal matrices are used to approximate the original kernel matrix so that the original problem can be split into hundreds of subproblems which can be solved more efficiently. In addition, some effective strategies such as kernel caching and efficient computation of kernel matrix are integrated to speed up the training process. Our analysis of the proposed algorithm shows that its time complexity grows linearly with the number of classes and size of the data set. In the experiments, many appealing properties of the proposed algorithm have been investigated and the results show that the proposed algorithm has a much better scaling capability than Libsvm, SVMlight, and SVMTorch. Moreover, the good generalization performances on several large databases have also been achieved.

Keywords: Algorithms, Animals, Antibiotics, Antineoplastic, Artificial Intelligence, Automated, Automatic Data Processing, Butadienes, Chloroplasts, Comparative Study, Computer Simulation, Computer-Assisted, Database Management Systems, Databases, Diagnosis, Disinfectants, Dose-Response Relationship, Drug, Drug Toxicity, Electrodes, Electroencephalography, Ethylamines, Expert Systems, Factual, Feedback, Fungicides, Gene Expression Profiling, Genes, Genetic Markers, Humans, Image Enhancement, Image Interpretation, Implanted, Industrial, Information Storage and Retrieval, Kidney, Kidney Tubules, MEDLINE, Male, Mercuric Chloride, Microarray Analysis, Molecular Biology, Motor Cortex, Movement, Natural Language Processing, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Numerical Analysis, Pattern Recognition, Plant Proteins, Predictive Value of Tests, Proteins, Proteome, Proximal, Puromycin Aminonucleoside, Rats, Reproducibility of Results, Research Support, Sensitivity and Specificity, Signal Processing, Sprague-Dawley, Subcellular Fractions, Terminology, Therapy, Time Factors, Toxicogenetics, U.S. Gov't, User-Computer Interface, 15794164
[Dong2005Prediction] Hai-Long Dong and Yan-Fang Sui. Prediction of HLA-A2-restricted CTL epitope specific to HCC by SYFPEITHI combined with polynomial method. World J Gastroenterol, 11(2):208-211, Jan 2005. [ bib ]
AIM: To predict the HLA-A2-restricted CTL epitopes of tumor antigens associated with hepatocellular carcinoma (HCC). METHODS: MAGE-1, MAGE-3, MAGE-8, P53 and AFP were selected as objective antigens in this study for their close association with HCC. The HLA-A*0201 restricted CTL epitopes of objective tumor antigens were predicted by the SYFPEITHI prediction method combined with the polynomial quantitative motifs method. The threshold of polynomial scores was set to -24. RESULTS: The SYFPEITHI prediction values of all possible nonamers of a given protein sequence were added together and the ten high-scoring peptides of each protein were chosen for further analysis in primary prediction. Thirty-five candidates of CTL epitopes (nonamers) derived from the primary prediction results were selected by analyzing with the polynomial method and compared with reported CTL epitopes. CONCLUSION: The combination of the SYFPEITHI prediction method and the polynomial method can improve the prediction efficiency and accuracy. These nonamers may be useful in the design of therapeutic peptide vaccines for HCC and as immunotherapeutic strategies against HCC after identification by immunological experiments.

Keywords: Amino Acid Sequence; Carcinoma, Hepatocellular; Databases, Protein; Epitopes; HLA-A2 Antigen; Humans; Liver Neoplasms; Major Histocompatibility Complex; Research Support, Non-U.S. Gov't; T-Lymphocytes, Cytotoxic
[Ding2005Minimum] Chris Ding and Hanchuan Peng. Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol, 3(2):185-205, Apr 2005. [ bib ]
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naive Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/ cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/.

Keywords: Adult, Aged, Aging, Algorithms, Animals, Apoptosis, Artificial Intelligence, Automated, Biological, Bone Marrow, Breast Neoplasms, Classification, Cluster Analysis, Comparative Study, Computer Simulation, Computer-Assisted, Diagnosis, Dose-Response Relationship, Drug, Female, Foot, Gait, Gene Expression Profiling, Gene Expression Regulation, Gene Silencing, Genetic Vectors, Humans, Image Interpretation, Information Storage and Retrieval, Kidney, Liver, Logistic Models, Male, Messenger, Models, Myocardium, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Pattern Recognition, Pharmaceutical Preparations, Polymerase Chain Reaction, Principal Component Analysis, Proteins, RNA, Rats, Reproducibility of Results, Research Support, Sensitivity and Specificity, Small Interfering, Sprague-Dawley, Statistical, Subcellular Fractions, Unknown Primary, 15852500
[Dhingra2005Substantial] Vikas Dhingra, Mukta Gupta, Tracy Andacht, and Zhen F Fu. New frontiers in proteomics research: a perspective. Int. J. Pharm., 299(1-2):1-18, Aug 2005. [ bib | DOI | http ]
Substantial advances have been made in the fundamental understanding of human biology, ranging from DNA structure to identification of diseases associated with genetic abnormalities. Genome sequence information is becoming available in unprecedented amounts. The absence of a direct functional correlation between gene transcripts and their corresponding proteins, however, represents a significant roadblock for improving the efficiency of biological discoveries. The success of proteomics depends on the ability to identify and analyze protein products in a cell or tissue, and this is reliant on the application of several key technologies. Proteomics is in its exponential growth phase. Two-dimensional electrophoresis complemented with mass spectrometry provides a global view of the state of the proteins from the sample. Protein identification is a requirement to understand their functional diversity. Subtle differences in protein structure and function can contribute to the complexity and diversity of life. This review focuses on the progress and the applications of proteomics science with special reference to integration of the evolving technologies involved to address biological questions.

Keywords: Computational Biology; Electrophoresis, Gel, Two-Dimensional; Humans; Peptide Mapping; Protein Interaction Mapping; Proteomics; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
[Cole2005Comparing] Jason C Cole, Christopher W Murray, J. Willem M Nissink, Richard D Taylor, and Robin Taylor. Comparing protein-ligand docking programs is difficult. Proteins, 60(3):325-332, Aug 2005. [ bib | DOI | http ]
There is currently great interest in comparing protein-ligand docking programs. A review of recent comparisons shows that it is difficult to draw conclusions of general applicability. Statistical hypothesis testing is required to ensure that differences in pose-prediction success rates and enrichment rates are significant. Numerical measures such as root-mean-square deviation need careful interpretation and may profitably be supplemented by interaction-based measures and visual inspection of dockings. Test sets must be of appropriate diversity and of good experimental reliability. The effects of crystal-packing interactions may be important. The method used for generating starting ligand geometries and positions may have an appreciable effect on docking results. For fair comparison, programs must be given search problems of equal complexity (e.g. binding-site regions of the same size) and approximately equal time in which to solve them. Comparisons based on rescoring require local optimization of the ligand in the space of the new objective function. Re-implementations of published scoring functions may give significantly different results from the originals. Ostensibly minor details in methodology may have a profound influence on headline success rates.

Keywords: Algorithms; Artificial Intelligence; Binding Sites; Computational Biology, methods; Computer Simulation; Crystallization; Crystallography, X-Ray; Databases, Protein; Ligands; Models, Molecular; Molecular Structure; Programming Languages; Protein Binding; Proteins, chemistry; Proteomics, methods; Reproducibility of Results; Software
[Bui2005Automated] Huynh-Hoa Bui, John Sidney, Bjoern Peters, Muthuraman Sathiamurthy, Asabe Sinichi, Kelly-Anne Purton, Bianca R Mothé, Francis V Chisari, David I Watkins, and Alessandro Sette. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics, 57(5):304-314, Jun 2005. [ bib | DOI | http ]
Prediction of which peptides can bind major histocompatibility complex (MHC) molecules is commonly used to assist in the identification of T cell epitopes. However, because of the large numbers of different MHC molecules of interest, each associated with different predictive tools, tool generation and evaluation can be a very resource intensive task. A methodology commonly used to predict MHC binding affinity is the matrix or linear coefficients method. Herein, we described Average Relative Binding (ARB) matrix methods that directly predict IC(50) values allowing combination of searches involving different peptide sizes and alleles into a single global prediction. A computer program was developed to automate the generation and evaluation of ARB predictive tools. Using an in-house MHC binding database, we generated a total of 85 and 13 MHC class I and class II matrices, respectively. Results from the automated evaluation of tool efficiency are presented. We anticipate that this automation framework will be generally applicable to the generation and evaluation of large numbers of MHC predictive methods and tools, and will be of value to centralize and rationalize the process of evaluation of MHC predictions. MHC binding predictions based on ARB matrices were made available at http://epitope.liai.org:8080/matrix web server.

Keywords: Animals; Binding Sites; Computer Simulation; Databases, Protein; Epitopes; Histocompatibility Antigens; Humans; Major Histocompatibility Complex; Models, Biological; Protein Binding
[Bernardo2005Chemogenomica] D. di Bernardo, M.J. Thompson, T.S. Gardner, S.E. Chobot, E.L. Eastwood, A.P. Wojtovich, S.J. Elliott, S.E. Schaus, and J.J. Collins. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat Biotechnol, 23(3):377-383, Mar 2005. [ bib | DOI | http ]
A major challenge in drug discovery is to distinguish the molecular targets of a bioactive compound from the hundreds to thousands of additional gene products that respond indirectly to changes in the activity of the targets. Here, we present an integrated computational-experimental approach for computing the likelihood that gene products and associated pathways are targets of a compound. This is achieved by filtering the mRNA expression profile of compound-exposed cells using a reverse-engineered model of the cell's gene regulatory network. We apply the method to a set of 515 whole-genome yeast expression profiles resulting from a variety of treatments (compounds, knockouts and induced expression), and correctly enrich for the known targets and associated pathways in the majority of compounds examined. We demonstrate our approach with PTSB, a growth inhibitory compound with a previously unknown mode of action, by predicting and validating thioredoxin and thioredoxin reductase as its target.

Keywords: Algorithms; Artificial Intelligence; Computer Simulation; Drug Delivery Systems; Drug Design; Gene Expression Profiling; Gene Expression Regulation; Models, Biological; Models, Statistical; Protein Engineering; Protein Interaction Mapping; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins; Signal Transduction; Thioredoxin-Disulfide Reductase; Thioredoxins
[Bagos2005Evaluation] Pantelis G Bagos, Theodore D Liakopoulos, and Stavros J Hamodrakas. Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method. BMC Bioinformatics, 6(1):7, Jan 2005. [ bib | DOI | http | .pdf ]
BACKGROUND: Prediction of the transmembrane strands and topology of beta-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of beta-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 beta-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method. RESULTS: We assess the statistical significance of the performance of each prediction scheme and conclude that Hidden Markov Model based methods, HMM-B2TMR, ProfTMB and PRED-TMBB, are currently the best predictors, according to either the per-residue accuracy, the segments overlap measure (SOV) or the total number of proteins with correctly predicted topologies in the test set. Furthermore, we show that the available predictors perform better when only transmembrane beta-barrel domains are used for prediction, rather than the precursor full-length sequences, even though the HMM-based predictors are not influenced significantly. The consensus prediction method performs significantly better than each individual available predictor, since it increases the accuracy up to 4% regarding SOV and up to 15% in correctly predicted topologies.
CONCLUSIONS: The consensus prediction method described in this work optimizes the predicted topology with a dynamic programming algorithm and is implemented in a web-based application freely available to non-commercial users at http://bioinformatics.biol.uoa.gr/ConBBPRED.

Keywords: Algorithms, Cell Nucleus, Cytoplasm, Databases, Genetic Vectors, Humans, Internet, Mitochondria, Models, Non-U.S. Gov't, Peptides, Protein, Proteins, Proteomics, Reproducibility of Results, Research Support, Software, Theoretical, 15647112
[Bagga2005Quantitative] Harmohina Bagga, David S Greenfield, and William J Feuer. Quantitative assessment of atypical birefringence images using scanning laser polarimetry with variable corneal compensation. Am J Ophthalmol, 139(3):437-46, Mar 2005. [ bib | DOI | http | .pdf ]
PURPOSE: To define the clinical characteristics of atypical birefringence images and to describe a quantitative method for their identification. DESIGN: Prospective, comparative, clinical observational study. METHODS: Normal and glaucomatous eyes underwent complete examination, standard automated perimetry, scanning laser polarimetry with variable corneal compensation (GDx-VCC), and optical coherence tomography (OCT) of the macula, peripapillary retinal nerve fiber layer (RNFL), and optic disk. Eyes were classified into two groups: normal birefringence pattern (NBP) and atypical birefringence pattern (ABP). Clinical, functional, and structural characteristics were assessed separately. A multiple logistic regression model was used to predict eyes with ABP on the basis of a quantitative scan score generated by a support vector machine (SVM) with GDx-VCC. RESULTS: Sixty-five eyes of 65 patients were enrolled. ABP images were observed in 5 of 20 (25%) normal eyes and 23 of 45 (51%) glaucomatous eyes. Compared with eyes with NBP, glaucomatous eyes with ABP demonstrated significantly lower SVM scores (P < .0001, < 0.0001, 0.008, 0.03, and 0.03, respectively) and greater temporal, mean, inferior, and nasal RNFL thickness using GDx-VCC; and a weaker correlation with OCT generated RNFL thickness (R(2) = .75 vs .27). ABP images were significantly correlated with older age (R(2) = .16, P = .001). The SVM score was the only significant (P < .0001) predictor of ABP images and provided high discriminating power between eyes with NBP and ABP (area under the receiver operator characteristic curve = 0.98). CONCLUSIONS: ABP images exist in a subset of normal and glaucomatous eyes, are associated with older patient age, and produce an artifactual increase in RNFL thickness using GDx-VCC. The SVM score is highly predictive of ABP images.

Keywords: 80 and over, Adult, Aged, Algorithms, Amino Acids, Animals, Area Under Curve, Artifacts, Automated, Birefringence, Brain Chemistry, Brain Neoplasms, Comparative Study, Computer-Assisted, Cornea, Cross-Sectional Studies, Decision Trees, Diagnosis, Diagnostic Imaging, Diagnostic Techniques, Discriminant Analysis, Evolution, Face, Female, Genetic, Glaucoma, Humans, Intraocular Pressure, Lasers, Least-Squares Analysis, Magnetic Resonance Imaging, Magnetic Resonance Spectroscopy, Male, Middle Aged, Models, Molecular, Nerve Fibers, Non-U.S. Gov't, Numerical Analysis, Ophthalmological, Optic Nerve Diseases, Optical Coherence, P.H.S., Pattern Recognition, Photic Stimulation, Prospective Studies, Protein, ROC Curve, Regression Analysis, Research Support, Retinal Ganglion Cells, Sensitivity and Specificity, Sequence Analysis, Statistics, Tomography, U.S. Gov't, Visual Fields, beta-Lactamases, 15767051
[Asefa2005Support] Tirusew Asefa, Mariush Kemblowski, Gilberto Urroz, and Mac McKee. Support vector machines (SVMs) for monitoring network design. Ground Water, 43(3):413-22, 2005. [ bib | DOI | http | .pdf ]
In this paper we present a hydrologic application of a new statistical learning methodology called support vector machines (SVMs). SVMs are based on minimization of a bound on the generalized error (risk) model, rather than just the mean square error over a training set. Due to Mercer's conditions on the kernels, the corresponding optimization problems are convex and hence have no local minima. In this paper, SVMs are illustratively used to reproduce the behavior of Monte Carlo-based flow and transport models that are in turn used in the design of a ground water contamination detection monitoring system. The traditional approach, which is based on solving transient transport equations for each new configuration of a conductivity field, is too time consuming in practical applications. Thus, there is a need to capture the behavior of the transport phenomenon in random media in a relatively simple manner. The objective of the exercise is to maximize the probability of detecting contaminants that exceed some regulatory standard before they reach a compliance boundary, while minimizing cost (i.e., number of monitoring wells). Application of the method at a generic site showed a rather promising performance, which leads us to believe that SVMs could be successfully employed in other areas of hydrology. The SVM was trained using 510 monitoring configuration samples generated from 200 Monte Carlo flow and transport realizations. The best configurations of well networks selected by the SVM were identical with the ones obtained from the physical model, but the reliabilities provided by the respective networks differ slightly.

Keywords: Adult, Aged, Aging, Algorithms, Apoptosis, Artificial Intelligence, Automated, Computer-Assisted, Female, Foot, Gait, Gene Expression Profiling, Humans, Image Interpretation, Male, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Pattern Recognition, Polymerase Chain Reaction, Proteins, Reproducibility of Results, Research Support, Sensitivity and Specificity, Subcellular Fractions, Unknown Primary, 15882333
[Yu2005Ovarian] J. S. Yu, S. Ongarello, R. Fiedler, X. W. Chen, G. Toffolo, C. Cobelli, and Z. Trajanoski. Ovarian cancer identification based on dimensionality reduction for high-throughput mass spectrometry data. Bioinformatics, 21(10):2200-9, May 2005. [ bib | DOI | http | .pdf ]
MOTIVATION: High-throughput and high-resolution mass spectrometry instruments are increasingly used for disease classification and therapeutic guidance. However, the analysis of immense amount of data poses considerable challenges. We have therefore developed a novel method for dimensionality reduction and tested on a published ovarian high-resolution SELDI-TOF dataset. RESULTS: We have developed a four-step strategy for data preprocessing based on: (1) binning, (2) Kolmogorov-Smirnov test, (3) restriction of coefficient of variation and (4) wavelet analysis. Subsequently, support vector machines were used for classification. The developed method achieves an average sensitivity of 97.38% (sd = 0.0125) and an average specificity of 93.30% (sd = 0.0174) in 1000 independent k-fold cross-validations, where k = 2, ..., 10. AVAILABILITY: The software is available for academic and non-commercial institutions.

Keywords: biosvm proteomics
[Begg2005Support] Rezaul K Begg, Marimuthu Palaniswami, and Brendan Owen. Support vector machines for automated gait classification. IEEE Trans Biomed Eng, 52(5):828-38, May 2005. [ bib | DOI | http | .pdf ]
Ageing influences gait patterns causing constant threats to control of locomotor balance. Automated recognition of gait changes has many advantages, including early identification of at-risk gait and monitoring the progress of treatment outcomes. In this paper, we apply an artificial intelligence technique [support vector machines (SVM)] for the automatic recognition of young-old gait types from their respective gait-patterns. Minimum foot clearance (MFC) data of 30 young and 28 elderly participants were analyzed using a PEAK-2D motion analysis system during a 20-min continuous walk on a treadmill at self-selected walking speed. Gait features extracted from individual MFC histogram-plot and Poincaré-plot images were used to train the SVM. Cross-validation test results indicate that the generalization performance of the SVM was on average 83.3% (+/-2.9) to recognize young and elderly gait patterns, compared to a neural network's accuracy of 75.0+/-5.0%. A "hill-climbing" feature selection algorithm demonstrated that a small subset (3-5) of gait features extracted from MFC plots could differentiate the gait patterns with 90% accuracy. Performance of the gait classifier was evaluated using areas under the receiver operating characteristic plots. Improved performance of the classifier was evident when trained with reduced number of selected good features and with radial basis function kernel. These results suggest that SVMs can function as an efficient gait classifier for recognition of young and elderly gait patterns, and have the potential for wider applications in gait identification for falls-risk minimization in the elderly.

Keywords: Adult, Aged, Aging, Algorithms, Apoptosis, Artificial Intelligence, Automated, Computer-Assisted, Female, Foot, Gait, Gene Expression Profiling, Humans, Image Interpretation, Male, Neoplasms, Non-U.S. Gov't, Oligonucleotide Array Sequence Analysis, Pattern Recognition, Polymerase Chain Reaction, Proteins, Reproducibility of Results, Research Support, Sensitivity and Specificity, Subcellular Fractions, Unknown Primary, 15887532
[Surgand2006chemogenomic] Jean-Sebastien Surgand, Jordi Rodrigo, Esther Kellenberger, and Didier Rognan. A chemogenomic analysis of the transmembrane binding cavity of human G-protein-coupled receptors. Proteins, 62(2):509-538, Feb 2006. [ bib | DOI | http ]
The amino acid sequences of 369 human nonolfactory G-protein-coupled receptors (GPCRs) have been aligned at the seven transmembrane domain (TM) and used to extract the nature of 30 critical residues supposed, from the X-ray structure of bovine rhodopsin bound to retinal, to line the TM binding cavity of ground-state receptors. Interestingly, the clustering of human GPCRs from these 30 residues mirrors the recently described phylogenetic tree of full-sequence human GPCRs (Fredriksson et al., Mol Pharmacol 2003;63:1256-1272) with few exceptions. A TM cavity could be found for all investigated GPCRs with physicochemical properties matching that of their cognate ligands. The current approach allows a very fast comparison of most human GPCRs from the focused perspective of the predicted TM cavity and permits easy detection of key residues that drive ligand selectivity or promiscuity.

Keywords: Amino Acid Sequence; Binding Sites; Genomics; Humans; Ligands; Models, Molecular; Phylogeny; Receptors, G-Protein-Coupled
[Salomon2006Predicting] J. Salomon and D. R. Flower. Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores. BMC Bioinformatics, 7:501, 2006. [ bib | DOI | http ]
BACKGROUND: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. RESULTS: The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP 1, MCHBN 2, and MHCBench 3. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. CONCLUSION: The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.

Keywords: Amino Acid, Binding Sites, Computational Biology, Databases, Epitope Mapping, Genetic, HLA-A Antigens, HLA-DR Antigens, Histocompatibility Antigens Class II, Humans, Peptides, Protein, Protein Binding, Protein Conformation, ROC Curve, Reproducibility of Results, Sequence Alignment, Sequence Analysis, Sequence Homology, 17105666
[Paik2006Gene] Soonmyung Paik, Gong Tang, Steven Shak, Chungyeul Kim, Joffre Baker, Wanseop Kim, Maureen Cronin, Frederick L. Baehner, Drew Watson, John Bryant, Joseph P. Costantino, Charles E Geyer, Jr, D Lawrence Wickerham, and Norman Wolmark. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol, 24(23):3726-3734, Aug 2006. [ bib | DOI | http ]
The 21-gene recurrence score (RS) assay quantifies the likelihood of distant recurrence in women with estrogen receptor-positive, lymph node-negative breast cancer treated with adjuvant tamoxifen. The relationship between the RS and chemotherapy benefit is not known. The RS was measured in tumors from the tamoxifen-treated and tamoxifen plus chemotherapy-treated patients in the National Surgical Adjuvant Breast and Bowel Project (NSABP) B20 trial. Cox proportional hazards models were utilized to test for interaction between chemotherapy treatment and the RS. A total of 651 patients were assessable (227 randomly assigned to tamoxifen and 424 randomly assigned to tamoxifen plus chemotherapy). The test for interaction between chemotherapy treatment and RS was statistically significant (P = .038). Patients with high-RS (> or = 31) tumors (ie, high risk of recurrence) had a large benefit from chemotherapy (relative risk, 0.26; 95% CI, 0.13 to 0.53; absolute decrease in 10-year distant recurrence rate: mean, 27.6%; SE, 8.0%). Patients with low-RS (< 18) tumors derived minimal, if any, benefit from chemotherapy treatment (relative risk, 1.31; 95% CI, 0.46 to 3.78; absolute decrease in distant recurrence rate at 10 years: mean, -1.1%; SE, 2.2%). Patients with intermediate-RS tumors did not appear to have a large benefit, but the uncertainty in the estimate can not exclude a clinically important benefit. The RS assay not only quantifies the likelihood of breast cancer recurrence in women with node-negative, estrogen receptor-positive breast cancer, but also predicts the magnitude of chemotherapy benefit.

Keywords: Adult; Aged; Antineoplastic Combined Chemotherapy Protocols, administration /&/ dosage/therapeutic use; Breast Neoplasms, drug therapy/metabolism/pathology/prevention /&/ control; Cisplatin, administration /&/ dosage; Female; Fluorouracil, administration /&/ dosage; Gene Expression Regulation, Neoplastic; Humans; Linear Models; Lymphatic Metastasis; Methotrexate, administration /&/ dosage; Middle Aged; Mitomycins, administration /&/ dosage; Neoplasm Proteins, metabolism; Neoplasm Recurrence, Local, metabolism/prevention /&/ control; Odds Ratio; Predictive Value of Tests; Prognosis; Proportional Hazards Models; Randomized Controlled Trials as Topic; Receptors, Estrogen, metabolism; Recurrence, prevention /&/ control; Reverse Transcriptase Polymerase Chain Reaction; Risk Assessment; Risk Factors; Tamoxifen, administration /&/ dosage; Tumor Markers, Biological, metabolism
[Oti2006Predicting] M. Oti, B. Snel, M. A. Huynen, and H. G. Brunner. Predicting disease genes using protein-protein interactions. J Med Genet, 43(8):691-698, Aug 2006. [ bib | DOI | http ]
BACKGROUND: The responsible genes have not yet been identified for many genetically mapped disease loci. Physically interacting proteins tend to be involved in the same cellular process, and mutations in their genes may lead to similar disease phenotypes. OBJECTIVE: To investigate whether protein-protein interactions can predict genes for genetically heterogeneous diseases. METHODS: 72,940 protein-protein interactions between 10,894 human proteins were used to search 432 loci for candidate disease genes representing 383 genetically heterogeneous hereditary diseases. For each disease, the protein interaction partners of its known causative genes were compared with the disease associated loci lacking identified causative genes. Interaction partners located within such loci were considered candidate disease gene predictions. Prediction accuracy was tested using a benchmark set of known disease genes. RESULTS: Almost 300 candidate disease gene predictions were made. Some of these have since been confirmed. On average, 10% or more are expected to be genuine disease genes, representing a 10-fold enrichment compared with positional information only. Examples of interesting candidates are AKAP6 for arrhythmogenic right ventricular dysplasia 3 and SYN3 for familial partial epilepsy with variable foci. CONCLUSIONS: Exploiting protein-protein interactions can greatly increase the likelihood of finding positional candidate disease genes. When applied on a large scale they can lead to novel candidate gene predictions.

Keywords: Animals; Benchmarking; Databases, Protein; Disease; Genetic Predisposition to Disease; Humans; Protein Binding; Proteins
[Mishra2006Human] G.R. Mishra, M. Suresh, K. Kumaran, N. Kannabiran, S. Suresh, P. Bala, K. Shivakumar, N. Anuradha, R. Reddy, T.M. Raghavan, S. Menon, G. Hanumanthu, M. Gupta, S. Upendran, S. Gupta, M. Mahesh, B. Jacob, P. Mathew, P. Chatterjee, K.S. Arun, S. Sharma, K.N. Chandrika, N. Deshpande, K. Palvankar, R. Raghavnath, R. Krishnakanth, H. Karathia, B. Rekha, R. Nayak, G. Vishnupriya, H.G.M. Kumar, M. Nagini, G.S.S. Kumar, R. Jose, P. Deepthi, S.S. Mohan, T.K.B. Gandhi, H.C. Harsha, K.S. Deshpande, M. Sarker, T.S.K. Prasad, and A. Pandey. Human protein reference database-2006 update. Nucleic Acids Res, 34(Database issue):D411-D414, Jan 2006. [ bib | DOI | http ]
Human Protein Reference Database (HPRD) (http://www.hprd.org) was developed to serve as a comprehensive collection of protein features, post-translational modifications (PTMs) and protein-protein interactions. Since the original report, this database has increased to >20 000 proteins entries and has become the largest database for literature-derived protein-protein interactions (>30 000) and PTMs (>8000) for human proteins. We have also introduced several new features in HPRD including: (i) protein isoforms, (ii) enhanced search options, (iii) linking of pathway annotations and (iv) integration of a novel browser, GenProt Viewer (http://www.genprot.org), developed by us that allows integration of genomic and proteomic information. With the continued support and active participation by the biomedical community, we expect HPRD to become a unique source of curated information for the human proteome and spur biomedical discoveries based on integration of genomic, transcriptomic and proteomic data.

Keywords: Databases, Protein; Genomics; Humans; Internet; Protein Interaction Mapping; Protein Isoforms; Protein Processing, Post-Translational; Proteins; Proteome; Proteomics; Signal Transduction; Systems Integration; User-Computer Interface
[Ma2006MSB] Wenzhe Ma, Luhua Lai, Qi Ouyang, and Chao Tang. Robustness and modular design of the Drosophila segment polarity network. Mol Syst Biol, 2:70, 2006. [ bib | DOI | http ]
Biomolecular networks have to perform their functions robustly. A robust function may have preferences in the topological structures of the underlying network. We carried out an exhaustive computational analysis on network topologies in relation to a patterning function in Drosophila embryogenesis. We found that whereas the vast majority of topologies can either not perform the required function or only do so very fragilely, a small fraction of topologies emerges as particularly robust for the function. The topology adopted by Drosophila, that of the segment polarity network, is a top ranking one among all topologies with no direct autoregulation. Furthermore, we found that all robust topologies are modular, each being a combination of three kinds of modules. These modules can be traced back to three subfunctions of the patterning function, and their combinations provide a combinatorial variability for the robust topologies. Our results suggest that the requirement of functional robustness drastically reduces the choices of viable topology to a limited set of modular combinations among which nature optimizes its choice under evolutionary and other biological constraints.

Keywords: Animals; Biological Evolution; Body Patterning; Computer Simulation; Drosophila Proteins, physiology; Drosophila melanogaster, anatomy /&/ histology/physiology; Feedback, Physiological; Gene Expression Regulation, Developmental; Genes, Insect; Models, Biological; Signal Transduction; Systems Biology, methods; Transcription Factors
[Leach2006Prediction] A. R. Leach, B. K. Shoichet, and C. E. Peishoff. Prediction of protein-ligand interactions. Docking and scoring: successes and gaps. J. Med. Chem., 49(20):5851-5855, Oct 2006. [ bib | DOI | http ]
Keywords: Binding Sites; Drug Design; Ligands; Models, Molecular; Protein Binding; Proteins, chemistry; Quantitative Structure-Activity Relationship
[Kurata2006PlosCompBio] Hiroyuki Kurata, Hana El-Samad, Rei Iwasaki, Hisao Ohtake, John C Doyle, Irina Grigorova, Carol A Gross, and Mustafa Khammash. Module-based analysis of robustness tradeoffs in the heat shock response system. PLoS Comput Biol, 2(7):e59, Jul 2006. [ bib | DOI | http ]
Biological systems have evolved complex regulatory mechanisms, even in situations where much simpler designs seem to be sufficient for generating nominal functionality. Using module-based analysis coupled with rigorous mathematical comparisons, we propose that in analogy to control engineering architectures, the complexity of cellular systems and the presence of hierarchical modular structures can be attributed to the necessity of achieving robustness. We employ the Escherichia coli heat shock response system, a strongly conserved cellular mechanism, as an example to explore the design principles of such modular architectures. In the heat shock response system, the sigma-factor sigma32 is a central regulator that integrates multiple feedforward and feedback modules. Each of these modules provides a different type of robustness with its inherent tradeoffs in terms of transient response and efficiency. We demonstrate how the overall architecture of the system balances such tradeoffs. An extensive mathematical exploration nevertheless points to the existence of an array of alternative strategies for the existing heat shock response that could exhibit similar behavior. We therefore deduce that the evolutionary constraints facing the system might have steered its architecture toward one of many robustly functional solutions.

Keywords: Computer Simulation; Escherichia coli Proteins, metabolism; Escherichia coli, metabolism; Feedback, physiology; Gene Expression Regulation, Bacterial, physiology; Heat-Shock Proteins, metabolism; Heat-Shock Response, physiology; Models, Biological; Oxidative Stress, physiology; Signal Transduction, physiology; Systems Biology, methods
[Kubinyi2006Chemogenomics] H. Kubinyi. Chemogenomics in drug discovery. Ernst Schering Res Found Workshop, 58:1-19, 2006. [ bib ]
Chemogenomics is a new strategy in drug discovery which, in principle, searches for all molecules that are capable of interacting with any biological target. Because of the almost infinite number of drug-like organic molecules, this is an impossible task. Therefore chemogenomics has been defined as the investigation of classes of compounds (libraries) against families of functionally related proteins. In this definition, chemogenomics deals with the systematic analysis of chemical-biological interactions. Congeneric series of chemical analogs are probes to investigate their action on specific target classes, e.g., GPCRs, kinases, phosphodiesterases, ion channels, serine proteases, and others. Whereas such a strategy developed in pharmaceutical industry almost 20 years ago, it is now more systematically applied in the search for target- and subtype-specific ligands. The term "privileged structures" has been defined for scaffolds, such as the benzodiazepines, which very often produce biologically active analogs in a target family, in this case in the class of G-protein-coupled receptors. The SOSA approach is a strategy to modify the selectivity of biologically active compounds, generating new drug candidates from the side activities of therapeutically used drugs.

Keywords: Animals; Chemistry, Pharmaceutical; Combinatorial Chemistry Techniques; Drug Design; Drug Industry; Genomics; Humans; Models, Chemical; Molecular Structure; Mutation; Pharmacogenetics; Protein Binding
[Huebert2006Genome-wide] Dana J Huebert, Michael Kamal, Aisling O'Donovan, and Bradley E Bernstein. Genome-wide analysis of histone modifications by ChIP-on-chip. Methods, 40(4):365-369, Dec 2006. [ bib | DOI | http ]
Post-translational modifications to histone proteins regulate the packaging of genomic DNA into chromatin, gene activity and other functions of the genome. They are understood to play key roles in embryonic development and disease pathogenesis. Recent advances in technology have made it possible to analyze chromatin structure genome-wide in mammalian cells. Global patterns of histone modifications can be observed using a technique called ChIP-on-chip, which combines the specificity of chromatin immunoprecipitation with the unbiased, high-throughput capabilities of microarrays. The resulting maps provide insight into the functions of, and relationships between, different modifications. Here, we provide validated ChIP-on-chip methods for analyzing histone modification patterns at genome-scale in mammalian cells.

Keywords: Animals; Chromatin Immunoprecipitation; Chromosomes, Mammalian; Genomics; Histone Code; Histones; Oligonucleotide Array Sequence Analysis; Protein Processing, Post-Translational
[Huang2006Ligsite] Bingding Huang and Michael Schroeder. LIGSITEcsc: predicting ligand binding sites using the Connolly surface and degree of conservation. BMC Struct Biol, 6:19, 2006. [ bib | DOI | http ]
BACKGROUND: Identifying pockets on protein surfaces is of great importance for many structure-based drug design applications and protein-ligand docking algorithms. Over the last ten years, many geometric methods for the prediction of ligand-binding sites have been developed. RESULTS: We present LIGSITEcsc, an extension and implementation of the LIGSITE algorithm. LIGSITEcsc is based on the notion of surface-solvent-surface events and the degree of conservation of the involved surface residues. We compare our algorithm to four other approaches, LIGSITE, CAST, PASS, and SURFNET, and evaluate all on a dataset of 48 unbound/bound structures and 210 bound-structures. LIGSITEcsc performs slightly better than the other tools and achieves a success rate of 71% and 75%, respectively. CONCLUSION: The use of the Connolly surface leads to slight improvements, the prediction re-ranking by conservation to significant improvements of the binding site predictions. A web server for LIGSITEcsc and its source code is available at scoppi.biotec.tu-dresden.de/pocket

Keywords: Algorithms; Binding Sites; Databases, Protein; Ligands; Models, Molecular; Proteins, chemistry
[Theres2006Structural] Theres Fagerberg, Jean-Charles Cerottini, and Olivier Michielin. Structural prediction of peptides bound to MHC class I. J. Mol. Biol., 356(2):521-546, Feb 2006. [ bib | DOI | http ]
An ab initio structure prediction approach adapted to the peptide-major histocompatibility complex (MHC) class I system is presented. Based on structure comparisons of a large set of peptide-MHC class I complexes, a molecular dynamics protocol is proposed using simulated annealing (SA) cycles to sample the conformational space of the peptide in its fixed MHC environment. A set of 14 peptide-human leukocyte antigen (HLA) A0201 and 27 peptide-non-HLA A0201 complexes for which X-ray structures are available is used to test the accuracy of the prediction method. For each complex, 1000 peptide conformers are obtained from the SA sampling. A graph theory clustering algorithm based on heavy atom root-mean-square deviation (RMSD) values is applied to the sampled conformers. The clusters are ranked using cluster size, mean effective or conformational free energies, with solvation free energies computed using Generalized Born MV 2 (GB-MV2) and Poisson-Boltzmann (PB) continuum models. The final conformation is chosen as the center of the best-ranked cluster. With conformational free energies, the overall prediction success is 83% using a 1.00 Angstroms crystal RMSD criterion for main-chain atoms, and 76% using a 1.50 Angstroms RMSD criterion for heavy atoms. The prediction success is even higher for the set of 14 peptide-HLA A0201 complexes: 100% of the peptides have main-chain RMSD values < or =1.00 Angstroms and 93% of the peptides have heavy atom RMSD values < or =1.50 Angstroms. This structure prediction method can be applied to complexes of natural or modified antigenic peptides in their MHC environment with the aim to perform rational structure-based optimizations of tumor vaccines.

Keywords: Algorithms, Amino Acid Sequence, Antibodies, Artificial Intelligence, Automated, Binding Sites, Chemical, Computer Simulation, Databases, Epitope Mapping, Genes, HLA-A Antigens, HLA-DQ Antigens, Histocompatibility Antigens Class I, Humans, Immunoassay, Immunological, MHC Class I, Models, Molecular, Molecular Sequence Data, Pattern Recognition, Peptides, Protein, Protein Binding, Protein Conformation, Protein Interaction Mapping, Protein Structure, Sequence Alignment, Sequence Analysis, Software, Tertiary, Water, 16368108
[Citri2006MolCelBiol] Ami Citri and Yosef Yarden. EGF-ERBB signalling: towards the systems level. Nat Rev Mol Cell Biol, 7(7):505-516, Jul 2006. [ bib | DOI | http ]
Signalling through the ERBB/HER receptors is intricately involved in human cancer and already serves as a target for several cancer drugs. Because of its inherent complexity, it is useful to envision ERBB signalling as a bow-tie-configured, evolvable network, which shares modularity, redundancy and control circuits with robust biological and engineered systems. Because network fragility is an inevitable trade-off of robustness, systems-level understanding is expected to generate therapeutic opportunities to intercept aberrant network activation.

Keywords: Animals; Endocytosis, physiology; Epidermal Growth Factor, metabolism; Feedback, Physiological; Humans; Ligands; Models, Molecular; Oncogene Proteins v-erbB, genetics/metabolism; Phosphatidylinositol 3-Kinases, metabolism; Protein Conformation; Receptor, Epidermal Growth Factor, chemistry/genetics/metabolism; Signal Transduction, physiology
[Bulyk2006DNA] Martha L Bulyk. DNA microarray technologies for measuring protein-DNA interactions. Curr Opin Biotechnol, 17(4):422-430, Aug 2006. [ bib | DOI | http ]
DNA-binding proteins have key roles in many cellular processes, including transcriptional regulation and replication. Microarray-based technologies permit the high-throughput identification of binding sites and enable the functional roles of these binding proteins to be elucidated. In particular, microarray readout either of chromatin immunoprecipitated DNA-bound proteins (ChIP-chip) or of DNA adenine methyltransferase fusion proteins (DamID) enables the identification of in vivo genomic target sites of proteins. A complementary approach to analyse the in vitro binding of proteins directly to double-stranded DNA microarrays (protein binding microarrays; PBMs), permits rapid characterization of their DNA binding site sequence specificities. Recent advances in DNA microarray synthesis technologies have facilitated the definition of DNA-binding sites at much higher resolution and coverage, and advances in these and emerging technologies will further increase the efficiencies of these exciting new approaches.

Keywords: Animals; Chromatin Immunoprecipitation, methods; Cross-Linking Reagents, chemistry; DNA, analysis/chemistry/metabolism; DNA-Binding Proteins, analysis/genetics/metabolism; Humans; Oligonucleotide Array Sequence Analysis, methods; Protein Binding
[Bui2006Structural] H.-H. Bui, A. J. Schiewe, H. von Grafenstein, and I. S. Haworth. Structural prediction of peptides binding to MHC class I molecules. Proteins, 63(1):43-52, Apr 2006. [ bib | DOI | http ]
Peptide binding to class I major histocompatibility complex (MHCI) molecules is a key step in the immune response and the structural details of this interaction are of importance in the design of peptide vaccines. Algorithms based on primary sequence have had success in predicting potential antigenic peptides for MHCI, but such algorithms have limited accuracy and provide no structural information. Here, we present an algorithm, PePSSI (peptide-MHC prediction of structure through solvated interfaces), for the prediction of peptide structure when bound to the MHCI molecule, HLA-A2. The algorithm combines sampling of peptide backbone conformations and flexible movement of MHC side chains and is unique among other prediction algorithms in its incorporation of explicit water molecules at the peptide-MHC interface. In an initial test of the algorithm, PePSSI was used to predict the conformation of eight peptides bound to HLA-A2, for which X-ray data are available. Comparison of the predicted and X-ray conformations of these peptides gave RMSD values between 1.301 and 2.475 Å. Binding conformations of 266 peptides with known binding affinities for HLA-A2 were then predicted using PePSSI. Structural analyses of these peptide-HLA-A2 conformations showed that peptide binding affinity is positively correlated with the number of peptide-MHC contacts and negatively correlated with the number of interfacial water molecules. These results are consistent with the relatively hydrophobic binding nature of the HLA-A2 peptide binding interface. In summary, PePSSI is capable of rapid and accurate prediction of peptide-MHC binding conformations, which may in turn allow estimation of MHCI-peptide binding affinity.

Keywords: Algorithms, Amino Acid Sequence, Antigens, Artificial Intelligence, Automated, Binding Sites, Chemical, Computational Biology, Computer Simulation, Crystallography, Electrostatics, Genes, Genetic, HLA Antigens, Histocompatibility Antigens Class I, Humans, Hydrogen Bonding, Ligands, MHC Class I, Major Histocompatibility Complex, Models, Molecular, Molecular Conformation, Molecular Sequence Data, Pattern Recognition, Peptides, Protein, Protein Binding, Protein Conformation, Proteomics, Quantitative Structure-Activity Relationship, Sequence Alignment, Sequence Analysis, Software, Structural Homology, Structure-Activity Relationship, Thermodynamics, Water, X-Ray, X-Rays, 16447245
[Bhavani2006Substructure-based] S. Bhavani, A. Nagargadde, A. Thawani, V. Sridhar, and N. Chandra. Substructure-based support vector machine classifiers for prediction of adverse effects in diverse classes of drugs. J. Chem. Inform. Model., 46(6):2478-2486, 2006. [ bib | DOI | http ]
Unforeseen adverse effects exhibited by drugs contribute heavily to late-phase failure and even withdrawal of marketed drugs. Torsade de pointes (TdP) is one such important adverse effect, which causes cardiac arrhythmia and, in some cases, sudden death, making it crucial for potential drugs to be screened for torsadogenicity. The need to tap the power of computational approaches for the prediction of adverse effects such as TdP is increasingly becoming evident. The availability of screening data including those in organized databases greatly facilitates exploration of newer computational approaches. In this paper, we report the development of a prediction method based on a support vector machine algorithm. The method uses a combination of descriptors, encoding both the type of toxicophore as well as the position of the toxicophore in the drug molecule, thus considering both the pharmacophore and the three-dimensional shape information of the molecule. For delineating toxicophores, a novel pattern-recognition method that utilizes substructures within a molecule has been developed. The results obtained using the hybrid approach have been compared with those available in the literature for the same data set. An improvement in prediction accuracy is clearly seen, with the accuracy reaching up to 97% in predicting compounds that can cause TdP and 90% for predicting compounds that do not cause TdP. The generic nature of the method has been demonstrated with four data sets available for carcinogenicity, where prediction accuracies were significantly higher, with a best receiver operating characteristics (ROC) value of 0.81 as against a best ROC value of 0.7 reported in the literature for the same data set. Thus, the method holds promise for wide applicability in toxicity prediction.

Keywords: Algorithms; Carcinogens; Chemistry, Pharmaceutical; Computational Biology; Drug Evaluation, Preclinical; Drug Industry; Humans; Models, Chemical; Models, Statistical; Neural Networks (Computer); Pattern Recognition, Automated; ROC Curve; Sequence Analysis, Protein; Software; Torsades de Pointes
[Glaser2006Method] F. Glaser, R. J. Morris, R. J. Najmanovich, R. A. Laskowski, and J. M. Thornton. A method for localizing ligand binding pockets in protein structures. Proteins, 62(2):479-488, February 2006. [ bib | DOI | http ]
The accurate identification of ligand binding sites in protein structures can be valuable in determining protein function. Once the binding site is known, it becomes easier to perform in silico and experimental procedures that may allow the ligand type and the protein function to be determined. For example, binding pocket shape analysis relies heavily on the correct localization of the ligand binding site. We have developed SURFNET-ConSurf, a modular, two-stage method for identifying the location and shape of potential ligand binding pockets in protein structures. In the first stage, the SURFNET program identifies clefts in the protein surface that are potential binding sites. In the second stage, these clefts are trimmed in size by cutting away regions distant from highly conserved residues, as defined by the ConSurf-HSSP database. The largest clefts that remain tend to be those where ligands bind. To test the approach, we analyzed a nonredundant set of 244 protein structures from the PDB and found that SURFNET-ConSurf identifies a ligand binding pocket in 75% of them. The trimming procedure reduces the original cleft volumes by 30% on average, while still encompassing an average 87% of the ligand volume. From the analysis of the results we conclude that for those cases in which the ligands are found in large, highly conserved clefts, the combined SURFNET-ConSurf method gives pockets that are a better match to the ligand shape and location. We also show that this approach works better for enzymes than for nonenzyme proteins.

Keywords: ligand-volume, protein-ligand, surface
[Valentini2007Mosclust:] Giorgio Valentini. Mosclust: a software library for discovering significant structures in bio-molecular data. Bioinformatics, 23(3):387-389, Feb 2007. [ bib | DOI | http ]
The R package mosclust (model order selection for clustering problems) implements algorithms based on the concept of stability for discovering significant structures in bio-molecular data. The software library provides stability indices obtained through different data perturbation methods (resampling, random projections, noise injection), as well as statistical tests to assess the significance of multi-level structures singled out from the data. Availability: http://homes.dsi.unimi.it/~valenti/SW/mosclust/download/mosclust_1.0.tar.gz. Supplementary information: http://homes.dsi.unimi.it/~valenti/SW/mosclust.

Keywords: Algorithms; Artificial Intelligence; Cluster Analysis; Gene Expression Profiling, methods; Oligonucleotide Array Sequence Analysis, methods; Pattern Recognition, Automated, methods; Programming Languages; Proteome, metabolism; Signal Transduction, physiology; Software
[Tung2007POPI:] Chun-Wei Tung and Shinn-Ying Ho. POPI: predicting immunogenicity of MHC class I binding peptides by mining informative physicochemical properties. Bioinformatics, 23(8):942-949, Apr 2007. [ bib | DOI | http ]
MOTIVATION: Both modeling of antigen-processing pathway including major histocompatibility complex (MHC) binding and immunogenicity prediction of those MHC-binding peptides are essential to develop a computer-aided system of peptide-based vaccine design that is one goal of immunoinformatics. Numerous studies have dealt with modeling the immunogenic pathway but not the intractable problem of immunogenicity prediction due to complex effects of many intrinsic and extrinsic factors. Moderate affinity of the MHC-peptide complex is essential to induce immune responses, but the relationship between the affinity and peptide immunogenicity is too weak to use for predicting immunogenicity. This study focuses on mining informative physicochemical properties from known experimental immunogenicity data to understand immune responses and predict immunogenicity of MHC-binding peptides accurately. RESULTS: This study proposes a computational method to mine a feature set of informative physicochemical properties from MHC class I binding peptides to design a support vector machine (SVM) based system (named POPI) for the prediction of peptide immunogenicity. High performance of POPI arises mainly from an inheritable bi-objective genetic algorithm, which aims to automatically determine the best number m out of 531 physicochemical properties, identify these m properties and tune SVM parameters simultaneously. The dataset consisting of 428 human MHC class I binding peptides belonging to four classes of immunogenicity was established from MHCPEP, a database of MHC-binding peptides (Brusic et al., 1998). POPI, utilizing the m = 23 selected properties, performs well with the accuracy of 64.72% using leave-one-out cross-validation, compared with two sequence alignment-based prediction methods ALIGN (54.91%) and PSI-BLAST (53.23%). POPI is the first computational system for prediction of peptide immunogenicity based on physicochemical properties. 
AVAILABILITY: A web server for prediction of peptide immunogenicity (POPI) and the used dataset of MHC class I binding peptides (PEPMHCI) are available at http://iclab.life.nctu.edu.tw/POPI

Keywords: Algorithms; Artificial Intelligence; Binding Sites; Epitope Mapping; Histocompatibility Antigens Class I; Oligopeptides; Pattern Recognition, Automated; Protein Binding; Software; Structure-Activity Relationship
[Taylor2007minimum] Chris F Taylor, Norman W Paton, Kathryn S Lilley, Pierre-Alain Binz, Randall K Julian, Andrew R Jones, Weimin Zhu, Rolf Apweiler, Ruedi Aebersold, Eric W Deutsch, Michael J Dunn, Albert J R Heck, Alexander Leitner, Marcus Macht, Matthias Mann, Lennart Martens, Thomas A Neubert, Scott D Patterson, Peipei Ping, Sean L Seymour, Puneet Souda, Akira Tsugita, Joel Vandekerckhove, Thomas M Vondriska, Julian P Whitelegge, Marc R Wilkins, Ioannis Xenarios, John R Yates, and Henning Hermjakob. The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol, 25(8):887-893, Aug 2007. [ bib | DOI | http ]
Both the generation and the analysis of proteomics data are now widespread, and high-throughput approaches are commonplace. Protocols continue to increase in complexity as methods and technologies evolve and diversify. To encourage the standardized collection, integration, storage and dissemination of proteomics data, the Human Proteome Organization's Proteomics Standards Initiative develops guidance modules for reporting the use of techniques such as gel electrophoresis and mass spectrometry. This paper describes the processes and principles underpinning the development of these modules; discusses the ramifications for various interest groups such as experimentalists, funders, publishers and the private sector; addresses the issue of overlap with other reporting guidelines; and highlights the criticality of appropriate tools and resources in enabling 'MIAPE-compliant' reporting.

Keywords: Databases, Protein; Gene Expression Profiling; Genome, Human; Guidelines as Topic; Humans; Information Storage and Retrieval; Internationality; Proteomics; Research
[Rhodes2007Oncomine] Daniel R. Rhodes, Shanker Kalyana-Sundaram, Vasudeva Mahavisno, Radhika Varambally, Jianjun Yu, Benjamin B. Briggs, Terrence R. Barrette, Matthew J. Anstet, Colleen Kincead-Beal, Prakash Kulkarni, Sooryanarayana Varambally, Debashis Ghosh, and Arul M. Chinnaiyan. Oncomine 3.0: genes, pathways, and networks in a collection of 18,000 cancer gene expression profiles. Neoplasia, 9(2):166-180, Feb 2007. [ bib ]
DNA microarrays have been widely applied to cancer transcriptome analysis; however, the majority of such data are not easily accessible or comparable. Furthermore, several important analytic approaches have been applied to microarray analysis; however, their application is often limited. To overcome these limitations, we have developed Oncomine, a bioinformatics initiative aimed at collecting, standardizing, analyzing, and delivering cancer transcriptome data to the biomedical research community. Our analysis has identified the genes, pathways, and networks deregulated across 18,000 cancer gene expression microarrays, spanning the majority of cancer types and subtypes. Here, we provide an update on the initiative, describe the database and analysis modules, and highlight several notable observations. Results from this comprehensive analysis are available at http://www.oncomine.org.

Keywords: Antineoplastic Agents, pharmacology; Automatic Data Processing; Chromosome Mapping; Chromosomes, Human, genetics; Computational Biology, organization /&/ administration; Data Collection; Data Display; Data Interpretation, Statistical; Databases, Genetic; Drug Design; Gene Expression Profiling, statistics /&/ numerical data; Gene Expression Regulation, Neoplastic; Genes, Neoplasm; Humans; Internet; Models, Biological; Neoplasm Proteins, biosynthesis/chemistry/genetics; Neoplasms, classification/genetics/metabolism; Oligonucleotide Array Sequence Analysis; Subtraction Technique; Transcription, Genetic
[Morris2007Identification] Stephanie A Morris, Bhargavi Rao, Benjamin A Garcia, Sandra B Hake, Robert L Diaz, Jeffrey Shabanowitz, Donald F Hunt, C. David Allis, Jason D Lieb, and Brian D Strahl. Identification of histone H3 lysine 36 acetylation as a highly conserved histone modification. J Biol Chem, 282(10):7632-7640, Mar 2007. [ bib | DOI | http ]
Histone lysine acetylation is a major mechanism by which cells regulate the structure and function of chromatin, and new sites of acetylation continue to be discovered. Here we identify and characterize histone H3K36 acetylation (H3K36ac). By mass spectrometric analyses of H3 purified from Tetrahymena thermophila and Saccharomyces cerevisiae (yeast), we find that H3K36 can be acetylated or methylated. Using an antibody specific to H3K36ac, we show that this modification is conserved in mammals. In yeast, genome-wide ChIP-chip experiments show that H3K36ac is localized predominantly to the promoters of RNA polymerase II-transcribed genes, a pattern inversely related to that of H3K36 methylation. The pattern of H3K36ac localization is similar to that of other sites of H3 acetylation, including H3K9ac and H3K14ac. Using histone acetyltransferase complexes purified from yeast, we show that the Gcn5-containing SAGA complex that regulates transcription specifically acetylates H3K36 in vitro. Deletion of GCN5 completely abolishes H3K36ac in vivo. These data expand our knowledge of the genomic targets of Gcn5, show H3K36ac is highly conserved, and raise the intriguing possibility that the transition between H3K36ac and H3K36me acts as an "acetyl/methyl switch" governing chromatin function along transcription units.

Keywords: Acetylation; Amino Acid Sequence; Animals; Chromatin Immunoprecipitation; Conserved Sequence; Histone Acetyltransferases, physiology; Histones, chemistry; Humans; Lysine; Methylation; Mice; Molecular Sequence Data; Promoter Regions, Genetic; Saccharomyces cerevisiae Proteins, physiology; Saccharomyces cerevisiae, chemistry; Tetrahymena, chemistry
[Kroemer2007Structure] Romano T Kroemer. Structure-based drug design: docking and scoring. Curr. Protein Pept. Sci., 8(4):312-328, Aug 2007. [ bib ]
This review gives an introduction to ligand-receptor docking and illustrates the basic underlying concepts. An overview of different approaches and algorithms is provided. Although the application of docking and scoring has led to some remarkable successes, there are still some major challenges ahead, which are outlined here as well. Approaches to address some of these challenges and the latest developments in the area are presented. Some aspects of the assessment of docking program performance are discussed. A number of successful applications of structure-based virtual screening are described.

Keywords: Algorithms; Artificial Intelligence; Computational Biology; Computer Simulation; Computer-Aided Design; Drug Design; Imaging, Three-Dimensional; Ligands; Models, Molecular; Protein Binding; Protein Conformation; Software; Structure-Activity Relationship
[Kahraman2007Shape] A. Kahraman, R. J. Morris, R. A. Laskowski, and J. M. Thornton. Shape variation in protein binding pockets and their ligands. J. Mol. Biol., 368(1):283-301, Apr 2007. [ bib | DOI | http ]
A common assumption about the shape of protein binding pockets is that they are related to the shape of the small ligand molecules that can bind there. But to what extent is that assumption true? Here we use a recently developed shape matching method to compare the shapes of protein binding pockets to the shapes of their ligands. We find that pockets binding the same ligand show greater variation in their shapes than can be accounted for by the conformational variability of the ligand. This suggests that geometrical complementarity in general is not sufficient to drive molecular recognition. Nevertheless, we show when considering only shape and size that a significant proportion of the recognition power of a binding pocket for its ligand resides in its shape. Additionally, we observe a "buffer zone" or a region of free space between the ligand and protein, which results in binding pockets being on average three times larger than the ligand that they bind.

Keywords: Binding Sites; Computer Simulation; Ligands; Models, Molecular; Models, Statistical; Protein Binding; Protein Conformation; Protein Folding
[Garcia2007Organismal] Benjamin A Garcia, Sandra B Hake, Robert L Diaz, Monika Kauer, Stephanie A Morris, Judith Recht, Jeffrey Shabanowitz, Nilamadhab Mishra, Brian D Strahl, C. David Allis, and Donald F Hunt. Organismal differences in post-translational modifications in histones H3 and H4. J Biol Chem, 282(10):7641-7655, Mar 2007. [ bib | DOI | http ]
Post-translational modifications (PTMs) of histones play an important role in many cellular processes, notably gene regulation. Using a combination of mass spectrometric and immunobiochemical approaches, we show that the PTM profile of histone H3 differs significantly among the various model organisms examined. Unicellular eukaryotes, such as Saccharomyces cerevisiae (yeast) and Tetrahymena thermophila (Tet), for example, contain more activation than silencing marks as compared with mammalian cells (mouse and human), which are generally enriched in PTMs more often associated with gene silencing. Close examination reveals that many of the better-known modified lysines (Lys) can be either methylated or acetylated and that the overall modification patterns become more complex from unicellular eukaryotes to mammals. Additionally, novel species-specific H3 PTMs from wild-type asynchronously grown cells are also detected by mass spectrometry. Our results suggest that some PTMs are more conserved than previously thought, including H3K9me1 and H4K20me2 in yeast and H3K27me1, -me2, and -me3 in Tet. On histone H4, methylation at Lys-20 showed a similar pattern as H3 methylation at Lys-9, with mammals containing more methylation than the unicellular organisms. Additionally, modification profiles of H4 acetylation were very similar among the organisms examined.

Keywords: Acetylation; Animals; Hela Cells; Histones, chemistry/metabolism; Humans; Methylation; Mice; NIH 3T3 Cells; Protein Processing, Post-Translational; Saccharomyces cerevisiae, metabolism; Species Specificity; Tandem Mass Spectrometry; Tetrahymena, metabolism
[Dalton2007Evaluation] James A R Dalton and Richard M Jackson. An evaluation of automated homology modelling methods at low target template sequence similarity. Bioinformatics, 23(15):1901-1908, Aug 2007. [ bib | DOI | http ]
MOTIVATION: There are two main areas of difficulty in homology modelling that are particularly important when sequence identity between target and template falls below 50%: sequence alignment and loop building. These problems become magnified with automatic modelling processes, as there is no human input to correct mistakes. As such we have benchmarked several stand-alone strategies that could be implemented in a workflow for automated high-throughput homology modelling. These include three new sequence-structure alignment programs: 3D-Coffee, Staccato and SAlign, plus five homology modelling programs and their respective loop building methods: Builder, Nest, Modeller, SegMod/ENCAD and Swiss-Model. The SABmark database provided 123 targets with at least five templates from the same SCOP family and sequence identities ≤50%. RESULTS: When using Modeller as the common modelling program, 3D-Coffee outperforms Staccato and SAlign using both multiple templates and the best single template, and across the sequence identity range 20-50%. The mean model RMSD generated from 3D-Coffee using multiple templates is 15 and 28% (or using single templates, 3 and 13%) better than those generated by Staccato and SAlign, respectively. 3D-Coffee gives equivalent modelling accuracy from multiple and single templates, but Staccato and SAlign are more successful with single templates, their quality deteriorating as additional lower sequence identity templates are added. Evaluating the different homology modelling programs, on average Modeller performs marginally better in overall modelling than the others tested. However, on average Nest produces the best loops with an 8% improvement by mean RMSD compared to the loops generated by Builder.

Keywords: Algorithms; Amino Acid Sequence; Computer Simulation; Models, Chemical; Models, Molecular; Molecular Sequence Data; Proteins, chemistry; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment, methods; Sequence Analysis, Protein, methods; Software; Software Validation
[Bock2007Effective] Mary Ellen Bock, Claudio Garutti, and Concettina Guerra. Effective labeling of molecular surface points for cavity detection and location of putative binding sites. Comput Syst Bioinformatics Conf, 6:263-274, 2007. [ bib ]
We present a method for detecting and comparing cavities on protein surfaces that is useful for protein binding site recognition. The method is based on a representation of the protein structures by a collection of spin-images and their associated spin-image profiles. Results of the cavity detection procedure are presented for a large set of non-redundant proteins and compared with SURFNET-ConSurf. Our comparison method is used to find a surface region in one cavity of a protein that is geometrically similar to a surface region in the cavity of another protein. Such a finding would be an indication that the two regions likely bind to the same ligand. Our overall approach for cavity detection and comparison is benchmarked on several pairs of known complexes, obtaining a good coverage of the atoms of the binding sites.

Keywords: Binding Sites; Computer Simulation; Models, Chemical; Models, Molecular; Protein Binding; Protein Conformation; Protein Folding; Proteins, chemistry/ultrastructure; Sequence Analysis, Protein, methods; Surface Properties
[Bantscheff2007Quantitative] Marcus Bantscheff, Markus Schirle, Gavain Sweetman, Jens Rick, and Bernhard Kuster. Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem, 389(4):1017-1031, Oct 2007. [ bib | DOI | http ]
The quantification of differences between two or more physiological states of a biological system is among the most important but also most challenging technical tasks in proteomics. In addition to the classical methods of differential protein gel or blot staining by dyes and fluorophores, mass-spectrometry-based quantification methods have gained increasing popularity over the past five years. Most of these methods employ differential stable isotope labeling to create a specific mass tag that can be recognized by a mass spectrometer and at the same time provide the basis for quantification. These mass tags can be introduced into proteins or peptides (i) metabolically, (ii) by chemical means, (iii) enzymatically, or (iv) provided by spiked synthetic peptide standards. In contrast, label-free quantification approaches aim to correlate the mass spectrometric signal of intact proteolytic peptides or the number of peptide sequencing events with the relative or absolute protein quantity directly. In this review, we critically examine the more commonly used quantitative mass spectrometry methods for their individual merits and discuss challenges in arriving at meaningful interpretations of quantitative proteomic data.

Keywords: Automatic Data Processing; Isotope Labeling; Mass Spectrometry; Peptides; Proteins; Proteome; Proteomics; Reference Standards
[Qiu2007structural] J. Qiu, J. Hue, A. Ben-Hur, J.-P. Vert, and W. S. Noble. A structural alignment kernel for protein structures. Bioinformatics, 23(9):1090-1098, May 2007. [ bib | DOI | http ]
MOTIVATION: This work aims to develop computational methods to annotate protein structures in an automated fashion. We employ a support vector machine (SVM) classifier to map from a given class of structures to their corresponding structural (SCOP) or functional (Gene Ontology) annotation. In particular, we build upon recent work describing various kernels for protein structures, where a kernel is a similarity function that the classifier uses to compare pairs of structures. RESULTS: We describe a kernel that is derived in a straightforward fashion from an existing structural alignment program, MAMMOTH. We find in our benchmark experiments that this kernel significantly out-performs a variety of other kernels, including several previously described kernels. Furthermore, in both benchmarks, classifying structures using MAMMOTH alone does not work as well as using an SVM with the MAMMOTH kernel. AVAILABILITY: http://noble.gs.washington.edu/proj/3dkernel

Keywords: Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Proteins; Sequence Alignment; Sequence Analysis, Protein; Sequence Homology, Amino Acid
[Jin2007yeast] Fulai Jin, Larisa Avramova, Jing Huang, and Tony Hazbun. A yeast two-hybrid smart-pool-array system for protein-interaction mapping. Nat Methods, 4(5):405-407, May 2007. [ bib | DOI | http ]
We present here a new two-hybrid smart pool array (SPA) system in which, instead of individual activation domain strains, well-designed activation domain pools are screened in an array format that allows built-in replication and prey-bait deconvolution. Using this method, a Saccharomyces cerevisiae genome SPA increases yeast two-hybrid screening efficiency by an order of magnitude.

Keywords: Genome, Fungal; Protein Interaction Mapping; Saccharomyces cerevisiae; Saccharomyces cerevisiae Proteins; Two-Hybrid System Techniques
[Wu2008Network-based] X. Wu, R. Jiang, M.Q. Zhang, and S. Li. Network-based global inference of human disease genes. Mol. Syst. Biol., 4:189, 2008. [ bib | DOI | http ]
Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein-protein interactions, disease phenotype similarities, and known gene-phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and also extendable to explore gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype-genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci or even those lacking known association, is publicly released to facilitate future discovery of disease genes.

Keywords: BRCA1 Protein; Bias (Epidemiology); Breast Neoplasms; Disease; Female; Gene Regulatory Networks; Genes; Genome, Human; Genotype; Humans; Linkage (Genetics); Phenotype; Software
[Weis2008Structural] William I Weis and Brian K Kobilka. Structural insights into G-protein-coupled receptor activation. Curr Opin Struct Biol, 18(6):734-740, Dec 2008. [ bib | DOI | http ]
G-protein-coupled receptors (GPCRs) are the largest family of eukaryotic plasma membrane receptors, and are responsible for the majority of cellular responses to external signals. GPCRs share a common architecture comprising seven transmembrane (TM) helices. Binding of an activating ligand enables the receptor to catalyze the exchange of GTP for GDP in a heterotrimeric G protein. GPCRs are in a conformational equilibrium between inactive and activating states. Crystallographic and spectroscopic studies of the visual pigment rhodopsin and two beta-adrenergic receptors have defined some of the conformational changes associated with activation.

Keywords: Animals; Crystallography; Humans; Membrane Proteins; Models, Molecular; Receptors, Adrenergic, beta; Receptors, G-Protein-Coupled; Rhodopsin
[Taylor2008Guidelines] Chris F Taylor, Pierre-Alain Binz, Ruedi Aebersold, Michel Affolter, Robert Barkovich, Eric W Deutsch, David M Horn, Andreas Hühmer, Martin Kussmann, Kathryn Lilley, Marcus Macht, Matthias Mann, Dieter Müller, Thomas A Neubert, Janice Nickson, Scott D Patterson, Roberto Raso, Kathryn Resing, Sean L Seymour, Akira Tsugita, Ioannis Xenarios, Rong Zeng, and Randall K Julian. Guidelines for reporting the use of mass spectrometry in proteomics. Nat Biotechnol, 26(8):860-861, Aug 2008. [ bib | DOI | http ]
Keywords: Databases, Protein; Guidelines as Topic; Mass Spectrometry; Proteomics
[Suter2008Two-hybrid] Bernhard Suter, Saranya Kittanakom, and Igor Stagljar. Two-hybrid technologies in proteomics research. Curr Opin Biotechnol, 19(4):316-323, Aug 2008. [ bib | DOI | http ]
Given that protein-protein interactions (PPIs) regulate nearly every living process, the exploration of global and pathway-specific protein interaction networks is expected to have major implications in the understanding of diseases and for drug discovery. Consequently, the development and application of methodologies that address physical associations among proteins is of major importance in today's proteomics research. The most widely and successfully used methodology to assess PPIs is the yeast two-hybrid system (YTH). Here we present an overview on the current applications of YTH and variant technologies in yeast and mammalian systems. Two-hybrid-based methods will not only continue to have a dominant role in the assessment of protein interactomes but will also become important in the development of novel compounds that target protein interaction interfaces for therapeutic intervention.

Keywords: Animals; Drug Design; Mammals; Proteomics; Two-Hybrid System Techniques
[Schalon2008Simple] C. Schalon, J-S. Surgand, E. Kellenberger, and D. Rognan. A simple and fuzzy method to align and compare druggable ligand-binding sites. Proteins, 71(4):1755-1778, Jun 2008. [ bib | DOI | http ]
A novel method to measure distances between druggable protein cavities is presented. Starting from user-defined ligand binding sites, eight topological and physicochemical properties are projected from cavity-lining protein residues to an 80 triangle-discretised sphere placed at the centre of the binding site, thus defining a cavity fingerprint. Representing binding site properties onto a discretised sphere presents many advantages: (i) a normalised distance between binding sites of different sizes may be easily derived by summing up the normalised differences between the 8 computed descriptors; (ii) a structural alignment of two proteins is simply done by systematically rotating/translating one mobile sphere around one immobile reference; (iii) a certain degree of fuzziness in the comparison is reached by projecting global amino acid properties (e.g., charge, size, functional groups count, distance to the site centre) independently of local rotameric/tautomeric states of cavity-lining residues. The method was implemented in a new program (SiteAlign) and tested in a number of various scenarios: measuring the distance between 376 related active site pairs, computing the cross-similarity of members of a protein family, predicting the targets of ligands with various promiscuity levels. The proposed method is robust enough to detect local similarity among active sites of different sizes, to discriminate between protein subfamilies and to recover the known targets of promiscuous ligands by virtual screening.

Keywords: Algorithms; Amino Acid Sequence; Binding Sites, drug effects; Drug Design; Hydrogen Bonding; Ligands; Protein Binding; Sequence Alignment; Structure-Activity Relationship
[Moitessier2008Towards] N. Moitessier, P. Englebienne, D. Lee, J. Lawandi, and C. R. Corbeil. Towards the development of universal, fast and highly accurate docking/scoring methods: a long way to go. Br. J. Pharmacol., 153 Suppl 1:S7-26, Mar 2008. [ bib | DOI | http ]
Accelerating the drug discovery process requires predictive computational protocols capable of reducing or simplifying the synthetic and/or combinatorial challenge. Docking-based virtual screening methods have been developed and successfully applied to a number of pharmaceutical targets. In this review, we first present the current status of docking and scoring methods, with exhaustive lists of these. We next discuss reported comparative studies, outlining criteria for their interpretation. In the final section, we describe some of the remaining developments that would potentially lead to a universally applicable docking/scoring method.

Keywords: Algorithms; Animals; Artificial Intelligence; Computer Simulation; Drug Evaluation, Preclinical, methods; Humans; Metals, chemistry; Models, Molecular; Molecular Conformation; Nucleic Acids, chemistry/drug effects; Proteins, chemistry/drug effects; Reproducibility of Results; Stochastic Processes
[Launay2008Homology] G. Launay and T. Simonson. Homology modelling of protein-protein complexes: a simple method and its possibilities and limitations. BMC Bioinformatics, 9:427, 2008. [ bib | DOI | http ]
BACKGROUND: Structure-based computational methods are needed to help identify and characterize protein-protein complexes and their function. For individual proteins, the most successful technique is homology modelling. We investigate a simple extension of this technique to protein-protein complexes. We consider a large set of complexes of known structures, involving pairs of single-domain proteins. The complexes are compared with each other to establish their sequence and structural similarities and the relation between the two. Compared to earlier studies, a simpler dataset, a simpler structural alignment procedure, and an additional energy criterion are used. Next, we compare the Xray structures to models obtained by threading the native sequence onto other, homologous complexes. An elementary requirement for a successful energy function is to rank the native structure above any threaded structure. We use the DFIREbeta energy function, whose quality and complexity are typical of the models used today. Finally, we compare near-native models to distinctly non-native models. RESULTS: If weakly stable complexes are excluded (defined by a binding energy cutoff), as well as a few unusual complexes, a simple homology principle holds: complexes that share more than 35% sequence identity share similar structures and interaction modes; this principle was less clearcut in earlier studies. The energy function was then tested for its ability to identify experimental structures among sets of decoys, produced by a simple threading procedure. On average, the experimental structure is ranked above 92% of the alternate structures. Thus, discrimination of the native structure is good but not perfect. The discrimination of near-native structures is fair. Typically, a single, alternate, non-native binding mode exists that has a native-like energy. 
Some of the associated failures may correspond to genuine, alternate binding modes and/or native complexes that are artefacts of the crystal environment. In other cases, additional model filtering with more sophisticated tools is needed. CONCLUSION: The results suggest that the simple modelling procedure applied here could help identify and characterize protein-protein complexes. The next step is to apply it on a genomic scale.

Keywords: Algorithms; Protein Binding; Protein Conformation; Protein Interaction Domains and Motifs; Proteins, chemistry/metabolism; Structural Homology, Protein
[Kohler2008Walking] S. Köhler, S. Bauer, D. Horn, and P.N. Robinson. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet., 82(4):949-958, Apr 2008. [ bib | DOI | http ]
The identification of genes associated with hereditary disorders has contributed to improving medical care and to a better understanding of gene functions, interactions, and pathways. However, there are well over 1500 Mendelian disorders whose molecular basis remains unknown. At present, methods such as linkage analysis can identify the chromosomal region in which unknown disease genes are located, but the regions could contain up to hundreds of candidate genes. In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for definition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming previous methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend beyond the disease proteins themselves.

Keywords: Animals; Chromosome Mapping; Computational Biology; Databases, Genetic; Genetic Diseases, Inborn; Genetic Predisposition to Disease; Humans; Internet; Linkage (Genetics); Mice; Pedigree; Protein Interaction Mapping; Software
[Ala2008Prediction] U. Ala, R.M. Piro, E. Grassi, C. Damasco, L. Silengo, M. Oti, P. Provero, and F. Di Cunto. Prediction of human disease genes by human-mouse conserved coexpression analysis. PLoS Comput. Biol., 4(3):e1000043, Mar 2008. [ bib | DOI | http ]
BACKGROUND: Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates. METHODOLOGY/PRINCIPAL FINDINGS: We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases. CONCLUSION: Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes.

Keywords: Algorithms; Animals; Biological Markers; Chromosome Mapping; Conserved Sequence; Diagnosis, Computer-Assisted; Gene Expression Profiling; Genetic Diseases, Inborn; Genetic Predisposition to Disease; Humans; Mice; Proteome
[Xie2009Unified] Lei Xie, Li Xie, and Philip E Bourne. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics, 25(12):i305-i312, Jun 2009. [ bib | DOI | http ]
Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or is dependent on the bound ligand making these approaches limited. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model performs at least two-orders faster and is more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.

Keywords: Binding Sites; Computational Biology, methods; Drug Discovery, methods; Genome; Ligands; Models, Statistical; Mycobacterium tuberculosis, genetics/metabolism; Proteins, chemistry
[Topiol2009X-ray] Sid Topiol and Michael Sabio. X-ray structure breakthroughs in the GPCR transmembrane region. Biochem Pharmacol, 78(1):11-20, Jul 2009. [ bib | DOI | http ]
G-protein-coupled receptor (GPCR) proteins [Lundstrom KH, Chiu ML, editors. G protein-coupled receptors in drug discovery. CRC Press; 2006] are the single largest drug target, representing 25-50% of marketed drugs [Overington JP, Al-Lazikani B, Hopkins AL. How many drug targets are there? Nat Rev Drug Discov 2006;5(12):993-6; Parrill AL. Crystal structures of a second G protein-coupled receptor: triumphs and implications. ChemMedChem 2008;3:1021-3]. While there are six subclasses of GPCR proteins, the hallmark of all GPCR proteins is the transmembrane-spanning region. The general architecture of this transmembrane (TM) region has been known for some time to contain seven alpha-helices. From a drug discovery and design perspective, structural information of the GPCRs has been sought as a tool for structure-based drug design. The advances in the past decade of technologies for structure-based design have proven to be useful in a number of areas. Invoking these approaches for GPCR targets has remained challenging. Until recently, the most closely related structures available for GPCR modeling have been those of bovine rhodopsin. While a representative of class A GPCRs, bovine rhodopsin is not a ligand-activated GPCR and is fairly distant in sequence homology to other class A GPCRs. Thus, there is a variable degree of uncertainty in the use of the rhodopsin X-ray structure as a template for homology modeling of other GPCR targets. Recent publications of X-ray structures of class A GPCRs now offer the opportunity to better understand the molecular mechanism of action at the atomic level, to deploy X-ray structures directly for their use in structure-based design, and to provide more promising templates for many other ligand-mediated GPCRs. We summarize herein some of the recent findings in this area and provide an initial perspective of the emerging opportunities, possible limitations, and remaining questions. 
Other aspects of the recent X-ray structures are described by Weis and Kobilka [Weis WI, Kobilka BK. Structural insights into G-protein-coupled receptor activation. Curr Opin Struct Biol 2008;18:734-40] and Mustafi and Palczewski [Mustafi D, Palczewski K. Topology of class A G protein-coupled receptors: insights gained from crystal structures of rhodopsins, adrenergic and adenosine receptors. Mol Pharmacol 2009;75:1-12].

Keywords: Animals; Cell Membrane; Humans; Models, Molecular; Molecular Conformation; Pindolol; Propanolamines; Protein Conformation; Receptor, Adenosine A2A; Receptors, Adrenergic, beta-2; Receptors, G-Protein-Coupled; Retinaldehyde; Rhodopsin; X-Ray Diffraction
[Terentiev2009Dynamic] A. A. Terentiev, N. T. Moldogazieva, and K. V. Shaitan. Dynamic proteomics in modeling of the living cell. Protein-protein interactions. Biochemistry (Mosc), 74(13):1586-1607, Dec 2009. [ bib ]
This review is devoted to describing, summarizing, and analyzing of dynamic proteomics data obtained over the last few years and concerning the role of protein-protein interactions in modeling of the living cell. Principles of modern high-throughput experimental methods for investigation of protein-protein interactions are described. Systems biology approaches based on integrative view on cellular processes are used to analyze organization of protein interaction networks. It is proposed that finding of some proteins in different protein complexes can be explained by their multi-modular and polyfunctional properties; the different protein modules can be located in the nodes of protein interaction networks. Mathematical and computational approaches to modeling of the living cell with emphasis on molecular dynamics simulation are provided. The role of the network analysis in fundamental medicine is also briefly reviewed.

Keywords: Animals; Humans; Mass Spectrometry; Models, Theoretical; Molecular Dynamics Simulation; Multiprotein Complexes; Protein Conformation; Protein Interaction Mapping; Proteins; Proteomics; Systems Biology; Two-Hybrid System Techniques
[Park2009ChIP] Peter J Park. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet, 10(10):669-680, Oct 2009. [ bib | DOI | http ]
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.

Keywords: Animals; Chromatin Immunoprecipitation, methods; Computational Biology; DNA-Binding Proteins, genetics; Epigenesis, Genetic; Humans; Nucleosomes, genetics; Sequence Analysis, DNA, methods
[Mustafi2009Topology] Debarshi Mustafi and Krzysztof Palczewski. Topology of class A G protein-coupled receptors: insights gained from crystal structures of rhodopsins, adrenergic and adenosine receptors. Mol Pharmacol, 75(1):1-12, Jan 2009. [ bib | DOI | http ]
Biological membranes are densely packed with membrane proteins that occupy approximately half of their volume. In almost all cases, membrane proteins in the native state lack the higher-order symmetry required for their direct study by diffraction methods. Despite many technical difficulties, numerous crystal structures of detergent solubilized membrane proteins have been determined that illustrate their internal organization. Among such proteins, class A G protein-coupled receptors have become amenable to crystallization and high resolution X-ray diffraction analyses. The derived structures of native and engineered receptors not only provide insights into their molecular arrangements but also furnish a framework for designing and testing potential models of transformation from inactive to active receptor signaling states and for initiating rational drug design.

Keywords: Animals; Crystallography, X-Ray; Humans; Models, Molecular; Protein Structure, Secondary; Receptors, Adrenergic; Receptors, G-Protein-Coupled; Receptors, Purinergic P1; Rhodopsin
[Lievens2009Mammalian] Sam Lievens, Irma Lemmens, and Jan Tavernier. Mammalian two-hybrids come of age. Trends Biochem Sci, 34(11):579-588, Nov 2009. [ bib | DOI | http ]
A diverse series of mammalian two-hybrid technologies for the detection of protein-protein interactions have emerged in the past few years, complementing the established yeast two-hybrid approach. Given the mammalian background in which they operate, these assays open new avenues to study the dynamics of mammalian protein interaction networks, i.e. the temporal, spatial and functional modulation of protein-protein associations. In addition, novel assay formats are available that enable high-throughput mammalian two-hybrid applications, facilitating their use in large-scale interactome mapping projects. Finally, as they can be applied in drug discovery and development programs, these techniques also offer exciting new opportunities for biomedical research.

Keywords: Animals; Genes, Reporter; Humans; Models, Biological; Protein Binding; Protein Interaction Mapping; Proteins; Recombinant Fusion Proteins; Transfection; Two-Hybrid System Techniques
[LeCao2009Sparse] K.-A. Lê Cao, P. G. P. Martin, C. Robert-Granié, and P. Besse. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics, 10:34, 2009. [ bib | DOI | http ]
In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. It is however an important and fundamental issue that will be widely encountered in post genomic studies, when simultaneously analyzing transcriptomics, proteomics and metabolomics data using different platforms, so as to understand the mutual interactions between the different data sets. In this high dimensional setting, variable selection is crucial to give interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed either for a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results. sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to bring similar results, although they highlighted the same phenomenons with a different priority. 
They outperformed CIA that tended to select redundant information.

Keywords: Computational Biology, methods; Genomics; Metabolomics; Proteomics; Systems Biology, methods
[Jensen2009STRING] L.J. Jensen, M. Kuhn, M. Stark, S. Chaffron, C. Creevey, J. Muller, T. Doerks, P. Julien, A. Roth, M. Simonovic, P. Bork, and C. von Mering. STRING 8 - a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res, 37(Database issue):D412-D416, Jan 2009. [ bib | DOI | http ]
Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein-protein interactions currently available. STRING can be reached at http://string-db.org/.

Keywords: Databases, Protein; Genomics; Multiprotein Complexes; Protein Interaction Mapping; Proteins; User-Computer Interface
[Fullwood2009oestrogen-receptor-alpha-bound] Melissa J Fullwood, Mei Hui Liu, You Fu Pan, Jun Liu, Han Xu, Yusoff Bin Mohamed, Yuriy L Orlov, Stoyan Velkov, Andrea Ho, Poh Huay Mei, Elaine G Y Chew, Phillips Yao Hui Huang, Willem-Jan Welboren, Yuyuan Han, Hong Sain Ooi, Pramila N Ariyaratne, Vinsensius B Vega, Yanquan Luo, Peck Yean Tan, Pei Ye Choy, K. D Senali Abayratna Wansa, Bing Zhao, Kar Sian Lim, Shi Chi Leow, Jit Sin Yow, Roy Joseph, Haixia Li, Kartiki V Desai, Jane S Thomsen, Yew Kok Lee, R. Krishna Murthy Karuturi, Thoreau Herve, Guillaume Bourque, Hendrik G Stunnenberg, Xiaoan Ruan, Valere Cacheux-Rataboul, Wing-Kin Sung, Edison T Liu, Chia-Lin Wei, Edwin Cheung, and Yijun Ruan. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 462(7269):58-64, Nov 2009. [ bib | DOI | http ]
Genomes are organized into high-level three-dimensional structures, and DNA elements separated by long genomic distances can in principle interact functionally. Many transcription factors bind to regulatory DNA elements distant from gene promoters. Although distal binding sites have been shown to regulate transcription by long-range chromatin interactions at a few loci, chromatin interactions and their impact on transcription regulation have not been investigated in a genome-wide manner. Here we describe the development of a new strategy, chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) for the de novo detection of global chromatin interactions, with which we have comprehensively mapped the chromatin interaction network bound by oestrogen receptor alpha (ER-alpha) in the human genome. We found that most high-confidence remote ER-alpha-binding sites are anchored at gene promoters through long-range chromatin interactions, suggesting that ER-alpha functions by extensive chromatin looping to bring genes together for coordinated transcriptional regulation. We propose that chromatin interactions constitute a primary mechanism for regulating transcription in mammalian genomes.

Keywords: Binding Sites; Cell Line; Chromatin; Chromatin Immunoprecipitation; Cross-Linking Reagents; Estrogen Receptor alpha; Formaldehyde; Genome, Human; Humans; Promoter Regions, Genetic; Protein Binding; Reproducibility of Results; Sequence Analysis, DNA; Transcription, Genetic; Transcriptional Activation
[Wallace2010Identification] Emma V B Wallace, David Stoddart, Andrew J Heron, Ellina Mikhailova, Giovanni Maglia, Timothy J Donohoe, and Hagan Bayley. Identification of epigenetic DNA modifications with a protein nanopore. Chem Commun (Camb), 46(43):8195-8197, Nov 2010. [ bib | DOI | http ]
Two DNA bases, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (hmC), marks of epigenetic modification, are recognized in immobilized DNA strands and distinguished from G, A, T and C by nanopore current recording. Therefore, if further aspects of nanopore sequencing can be addressed, the approach will provide a means to locate epigenetic modifications in unamplified genomic DNA.

Keywords: 5-Methylcytosine, chemistry; Cyclodextrins, chemistry; Cytosine, analogs & derivatives/chemistry; DNA, chemistry; Epigenesis, Genetic; Hemolysin Proteins, chemistry; Nanopores
[Vanunu2010Associating] O. Vanunu, O. Magger, E. Ruppin, T. Shlomi, and R. Sharan. Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol., 6(1):e1000641, Jan 2010. [ bib | DOI | http ]
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.

Keywords: Algorithms; Alzheimer Disease; Databases, Genetic; Diabetes Mellitus; Disease; Genes; Humans; Male; Multiprotein Complexes; Prostatic Neoplasms; Protein Interaction Mapping; Proteins; Reproducibility of Results
[Stoddart2010Nucleobase] David Stoddart, Andrew J Heron, Jochen Klingelhoefer, Ellina Mikhailova, Giovanni Maglia, and Hagan Bayley. Nucleobase recognition in ssDNA at the central constriction of the alpha-hemolysin pore. Nano Lett, 10(9):3633-3637, Sep 2010. [ bib | DOI | http ]
Nanopores are under investigation for single-molecule DNA sequencing. The alpha-hemolysin (alphaHL) protein nanopore contains three recognition points capable of nucleobase discrimination in individual immobilized ssDNA molecules. We have modified the recognition point R(1) by extensive mutagenesis of residue 113. Amino acids that provide an energy barrier to ion flow (e.g., bulky or hydrophobic residues) strengthen base identification, while amino acids that lower the barrier weaken it. Amino acids with related side chains produce similar patterns of nucleobase recognition providing a rationale for the redesign of recognition points.

Keywords: Amino Acid Substitution; Base Sequence; DNA, Single-Stranded, chemistry; Hemolysin Proteins, chemistry; Models, Molecular; Mutagenesis
[Gehlenborg2010Visualization] Nils Gehlenborg, Seán I O'Donoghue, Nitin S Baliga, Alexander Goesmann, Matthew A Hibbs, Hiroaki Kitano, Oliver Kohlbacher, Heiko Neuweger, Reinhard Schneider, Dan Tenenbaum, and Anne-Claude Gavin. Visualization of omics data for systems biology. Nat Methods, 7(3 Suppl):S56-S68, Mar 2010. [ bib | DOI | http ]
High-throughput studies of biological systems are rapidly accumulating a wealth of 'omics'-scale data. Visualization is a key aspect of both the analysis and understanding of these data, and users now have many visualization methods and tools to choose from. The challenge is to create clear, meaningful and integrated visualizations that give biological insight, without being overwhelmed by the intrinsic complexity of the data. In this review, we discuss how visualization tools are being used to help interpret protein interaction, gene expression and metabolic profile data, and we highlight emerging new directions.

Keywords: Genomics; Image Processing, Computer-Assisted; Mass Spectrometry; Metabolomics; Nuclear Magnetic Resonance, Biomolecular; Protein Binding; Proteomics; Systems Biology
[Choudhary2010Decoding] Chunaram Choudhary and Matthias Mann. Decoding signalling networks by mass spectrometry-based proteomics. Nat Rev Mol Cell Biol, 11(6):427-439, Jun 2010. [ bib | DOI | http ]
Signalling networks regulate essentially all of the biology of cells and organisms in normal and disease states. Signalling is often studied using antibody-based techniques such as western blots. Large-scale 'precision proteomics' based on mass spectrometry now enables the system-wide characterization of signalling events at the levels of post-translational modifications, protein-protein interactions and changes in protein expression. This technology delivers accurate and unbiased information about the quantitative changes of thousands of proteins and their modifications in response to any perturbation. Current studies focus on phosphorylation, but acetylation, methylation, glycosylation and ubiquitylation are also becoming amenable to investigation. Large-scale proteomics-based signalling research will fundamentally change our understanding of signalling networks.

Keywords: Animals; Humans; Mass Spectrometry; Protein Processing, Post-Translational; Proteome; Proteomics; Signal Transduction
[Aranda2010IntAct] B. Aranda, P. Achuthan, Y. Alam-Faruque, I. Armean, A. Bridge, C. Derow, M. Feuermann, A. T. Ghanbarian, S. Kerrien, J. Khadake, J. Kerssemakers, C. Leroy, M. Menden, M. Michaut, L. Montecchi-Palazzi, S. N. Neuhauser, S. Orchard, V. Perreau, B. Roechert, K. van Eijk, and H. Hermjakob. The IntAct molecular interaction database in 2010. Nucleic Acids Res, 38(Database issue):D525-D531, Jan 2010. [ bib | DOI | http ]
IntAct is an open-source, open data molecular interaction database and toolkit. Data is abstracted from the literature or from direct data depositions by expert curators following a deep annotation model providing a high level of detail. As of September 2009, IntAct contains over 200,000 curated binary interaction evidences. In response to the growing data volume and user requests, IntAct now provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results are provided at any stage in a simplified, tabular view. Specialized views then allow 'zooming in' on the full annotation of interactions, interactors and their properties. IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.

Keywords: Animals; Computational Biology; Databases, Genetic; Databases, Protein; False Positive Reactions; Humans; Information Storage and Retrieval; Internet; Programming Languages; Protein Interaction Mapping; Protein Structure, Tertiary; Proteins; Software; User-Computer Interface; Vocabulary, Controlled
[Tomizaki2010Protein] Kin-ya Tomizaki, Kenji Usui, and Hisakazu Mihara. Protein-protein interactions and selection: array-based techniques for screening disease-associated biomarkers in predictive/early diagnosis. FEBS J, 277(9):1996-2005, May 2010. [ bib | DOI | http ]
There has been considerable interest in recent years in the development of miniaturized and parallelized array technology for protein-protein interaction analysis and protein profiling, namely 'protein-detecting microarrays'. Protein-detecting microarrays utilize a wide variety of capture agents (antibodies, fusion proteins, DNA/RNA aptamers, synthetic peptides, carbohydrates, and small molecules) immobilized at high spatial density on a solid surface. Each capture agent binds selectively to its target protein in a complex mixture, such as serum or cell lysate samples. Captured proteins are subsequently detected and quantified in a high-throughput fashion, with minimal sample consumption. Protein-detecting microarrays were first described by MacBeath and Schreiber in 2000, and the number of publications involving this technology is rapidly increasing. Furthermore, the first multiplex immunoassay systems have been cleared by the US Food and Drug Administration, signaling recognition of the usefulness of miniaturized and parallelized array technology for protein detection in predictive/early diagnosis. Although genetic tests still predominate, with further development protein-based diagnosis will become common in clinical use within a few years.

Keywords: Animals; Biological Markers, analysis/metabolism; Early Diagnosis; Humans; Mass Screening, methods; Protein Array Analysis, methods; Proteins, analysis/metabolism; Risk Factors
