On the Reliability of the Alignment Procedure for Primary Structures of Biopolymers
Dissertation
A shortcoming of all the cited works is that algorithmic alignments were compared not with the true evolutionary alignment, which is unknown, but with an approximation of it. This introduces an error into the results whose magnitude cannot be estimated. To assess the quality of alignment algorithms, we propose comparing artificially generated sequences for which the true alignment…
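The idea of benchmarking against a known true alignment can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `evolve` and `alignment_accuracy`, the four-letter alphabet, and the simple model of independent single-letter substitutions, deletions and insertions are hypothetical and are not taken from the dissertation, whose generation model is not described in the text above.

```python
import random

ALPHABET = "ACGT"  # assumed nucleotide alphabet, for illustration only


def evolve(ancestor, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=None):
    """Derive a descendant sequence and record the true alignment.

    Returns (descendant, true_pairs), where each (i, j) in true_pairs
    means ancestor[i] is homologous to descendant[j].
    """
    rng = rng or random.Random()
    descendant = []
    true_pairs = []
    for i, letter in enumerate(ancestor):
        # Insertion before position i: the new letter is aligned to nothing.
        if rng.random() < p_ins:
            descendant.append(rng.choice(ALPHABET))
        # Deletion: ancestor[i] has no counterpart in the descendant.
        if rng.random() < p_del:
            continue
        # Substitution: the letter changes, but the position stays homologous.
        if rng.random() < p_sub:
            letter = rng.choice([c for c in ALPHABET if c != letter])
        true_pairs.append((i, len(descendant)))
        descendant.append(letter)
    return "".join(descendant), true_pairs


def alignment_accuracy(predicted_pairs, true_pairs):
    """Fraction of true aligned pairs reproduced by a predicted alignment."""
    true_set = set(true_pairs)
    if not true_set:
        return 1.0
    return len(true_set & set(predicted_pairs)) / len(true_set)


# Usage: generate a pair with a known alignment, then score any aligner's output.
rng = random.Random(0)
ancestor = "".join(rng.choice(ALPHABET) for _ in range(50))
descendant, truth = evolve(ancestor, rng=rng)
```

Because the set of homologous positions is recorded while the sequences are generated, the output of any alignment algorithm can be scored against it directly, without relying on an approximate reference alignment.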
Contents
- LIST OF TERMS AND ABBREVIATIONS
- Chapter 1. LITERATURE REVIEW
- 1.1. Fundamentals of genome variability
- 1.2. Sequence comparison and alignment algorithms
- 1.3. Sequence alignment by dynamic programming
- 1.4. Alignment and distance
- 1.5. Alignment and similarity
- 1.6. Occurrence of one sequence within another
- 1.7. Search for similar fragments
References
- Alberts B., Johnson A., Lewis J., Raff M., Roberts K., Walter P. 2002. Molecular Biology of the Cell. 4th Edition. Garland Science, US.
- International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature. 409, 860−921.
- Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. 1990. Basic local alignment search tool. J Mol Biol. 215(3), 403−410.
- Amit I, Wides R, Yarden Y. 2007. Evolvable signaling networks of receptor tyrosine kinases: relevance of robustness to malignancy and to cancer therapy. Mol. Syst. Biol. 3, 151−172.
- Arratia R., Gordon L. and Waterman M.S. 1986. An extreme value theory for sequence matching. Ann. Stat. 14, 971−993.
- Benner S.A., Cohen M.A. and Gonnet G.H. 1993. Empirical and structural models for insertions and deletions in the divergent evolution of proteins. J. Mol. Biol., 229, 1065−1082.
- Blaisdell B.E. 1985. Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eukaryotic nuclear DNA sequences both protein coding and noncoding. J. Mol. Evol., 21, 278−288.
- Bourque G., Pevzner P.A., Tesler G. 2004. Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res. 14(4), 507−516.
- Boussau B, Gueguen L, Gouy M. 2008. Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria. BMC Evol Biol. 8(1), 272.
- Bray J.E., Todd A.E., Pearl F.M., Thornton J.M., Orengo C.A. 2000. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues. Protein Eng. 13(3), 153−165.
- Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 437(7055), 69−87. Comment in: Nature. 2005. 437(7055), 50−1.
- Dayhoff M., Schwartz R. and Orcutt B. 1978. A model of evolutionary change in proteins. 345−352. In: Dayhoff M., ed., Atlas of protein sequence and structure. National Biomedical Research Foundation, Washington, DC.
- Denker E, Bapteste E, Le Guyader H, Manuel M, Rabet N. 2008. Horizontal gene transfer and the evolution of cnidarian stinging cells. Curr Biol. 18(18), R858−859.
- Domingues F.S., Lackner P., Andreeva A., et al. 2000. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy. J. Mol. Biol. 297, 1003−1013.
- Doolittle R.F. 1981. Similar amino acid sequences: chance or common ancestry? Science. 214, 149−159.
- Eddy S.R. 1998. Profile hidden Markov models. Bioinformatics, 14, 755−763.
- Ewing B., Green P. 2000. Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genet., 25, 232−234.
- Finkelstein A.V., Roytberg M.A. 1993. Computation of biopolymers: a general approach to different problems. Biosystems. 30(1−3), 1−19.
- Garnier N., Friedrich A., Bolze R., Bettler E., Moulinier L., Geourjon C., Thompson J.D., Deleage G., Poch O. 2006. MAGOS: multiple alignment and modelling server. Bioinformatics. 22(17), 2164−2165.
- Gotoh O. 1982. An improved algorithm for matching biological sequences. J. Mol. Biol., 162, 705−708.
- Gotoh O. 1999. Multiple sequence alignment: algorithms and applications. Adv Biophys. 36, 159−206. Review.
- Hollich V, Milchert L, Arvestad L, Sonnhammer ELL. 2005. Assessment of Protein Measures and Tree Building Methods for Phylogenetic Tree Reconstruction. Mol. Biol. Evol. 22(11), 2257−2264.
- John B., Sali A. 2004. Detection of homologous proteins by an intermediate sequence search. Protein Sci. 13(1), 54−62.
- Jordan I.K., Kondrashov F.A., Adzhubei I.A., et al. 2005. A universal trend of amino acid gain and loss in protein evolution. Nature. 433, 633−638.
- Kaback D.B., Guacci V., Barber D., Mahon J.V. 1992. Chromosome size-dependent control of meiotic recombination. Science. 256, 228−232.
- Karlin S. and Ost F. 1987. Counts of long aligned word matches among random letter sequences. Adv. Appl. Prob. 19, 293−351.
- Karlin S., Morris M., Ghandour G. and Leung M.-Y. 1988. Efficient algorithms for molecular sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 85, 841−845.
- Karlin S., Morris M., Ghandour G., Leung M.Y. 1988. Algorithms for identifying local molecular sequence features. Comput Appl Biosci. 4(1), 41−51.
- Keese P. 2008. Risks from GMOs due to Horizontal Gene Transfer. Environ Biosafety Res. 7(3), 123−149.
- Kolchinskii A.M., Barskii V.E., Zasedatelev A.S. 2007. Biochips in the laboratory of A.D. Mirzabekov: 1988-2007. Mol Biol. 41(5), 757−764. Russian.
- Koonin E.V., Aravind L., Kondrashov A.S. 2000. The impact of comparative genomics on our understanding of evolution. Cell. 101(6), 573−576.
- Kruskal J.B. and Sankoff D. 1983. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. In: Sankoff, D. and Kruskal, J.B., Eds., Addison-Wesley, London.
- Laurie D.A. and Hulten M.A. 1985. Further studies on bivalent chiasma in human males with normal karyotypes. Ann. Hum. Genet. 49, 189−201.
- Lipman D.J. and Pearson W.R. 1985. Rapid and sensitive protein similarity searches. Science. 227, 1435−1441.
- Madden T.L., Shavirin S., Spouge J.L., Wolf Y.I., Koonin E.V., Altschul S.F. 2001. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29(14), 2994−3005.
- Makarova Iu.A., Kramerov D.A. 2007. Small nucleolar RNA genes. Genetika. 43(2), 149−58.
- Mayr G., Domingues F.S., Lackner P. 2007. Comparative analysis of protein structure alignments. BMC Struct Biol. 26(7), 50.
- Mevissen H.Th. and Vingron M. 1996. Quantifying the local reliability of a sequence alignment. Prot. Eng. 9, 127−132.
- Mitrophanov A.Y., Borodovsky M. 2006. Statistical significance in biological sequence analysis. Brief Bioinform. 7(1), 2−24.
- Mouse Genome Sequencing Consortium. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature. 420(6915), 520−62. Comments in: Nat. Biotechnol. 2003. 21(1), 31−2. Nature. 2002. 420(6915), 512−4. Nature. 2002. 420(6915), 515−6.
- Myers E., Miller W. 1988. Sequence comparison with concave weighting functions. Bull. Math. Biol. 50, 97−120.
- Needleman S.B. and Wunsch C.D. 1970. A general method applicable to the search of similarity in the amino-acid sequence of two proteins. J. Mol. Biol. 48, 443−453.
- Notredame C., Higgins D.G., Heringa J. 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 302(1), 205−217.
- Pearson W.R. and Lipman D.J. 1988. Improved tools for biological sequence comparisons. Proc. Natl. Acad. Sci. U.S.A. 85, 2444−2448.
- Perrodou E., Chica C., Poch O., Gibson T.J., Thompson J.D. 2008. A new protein linear motif benchmark for multiple sequence alignment software. BMC Bioinformatics. 25(9), 213.
- Pontius J.U., Mullikin J.C., Smith D.R. et al. 2007. Initial sequence and comparative analysis of the cat genome. Genome Res. 17(11), 1675−89. Comment in: Genome Res. 2007. 17(11), 1547−1549.
- Reese J.T., Pearson W.R. 2002. Empirical determination of effective gap penalties for sequence comparison. Bioinformatics, 18, 1500−1507.
- Rigoutsos I., Huynh T., Floratos A., Parida L., Platt D. 2002. Dictionary-driven protein annotation. Nucleic Acids Res. 30(17), 3901−3916.
- Rosenberg M.S. 2005. Evolutionary distance estimation and fidelity of pair wise sequence alignment. BMC Bioinformatics. 19(6), 102.
- Rubina A.Y., Kolchinsky A., Makarov A.A., Zasedatelev A.S. 2008. Why 3-D? Gel-based microarrays in proteomics. Proteomics. 8(4), 817−831.
- Saitou N., Nei M. 1987. The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees. Mol. Biol. Evol. 4, 406−425.
- Sankararaman S, Sjolander K. 2008. INTREPID INformation-theoretic TREe traversal for Protein functional site IDentification. Bioinformatics. 24(21), 2445−2452.
- Schlosshauer M. and Ohlsson M. 2002. A novel approach to local reliability of sequence alignments. Bioinformatics. 6, 847−854.
- Sellers P. 1974. On the theory and computation of evolutionary distances. SIAM J. Appl. Math., 26, 787−793.
- Sellers P. 1980. The theory and computation of evolutionary distances: pattern recognition. J. Algorithms, 1, 359−373.
- Sellers, P. 1984. Pattern recognition in genetic sequences by mismatch density. Bull. Math. Biol. 46, 501−514.
- Senchenko V.N., Liu J., Loginov W. et al. 2004. Discovery of frequent homozygous deletions in chromosome 3p21.3 LUCA and AP20 regions in renal, lung and breast carcinomas. Oncogene. 23(34), 5719−5728.
- Shah A.R., Oehmen C.S., Webb-Robertson B.J. 2008. SVM-HUSTLE an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics. 24(6), 783−790.
- Smith T.F. and Waterman M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147, 195−197.
- Stark A., Sunyaev S., Russell R.B. 2003. A model for statistical significance of local similarities in structure. J Mol Biol. 326(5), 1307−1316.
- Sunyaev S.R., Eisenhaber F., Rodchenkov I.V., Eisenhaber B., Tumanyan V.G., Kuznetsov E.N. 1999. PSIC: profile extraction from sequence alignment with position-specific counts of independent observations. Prot. Eng. 12, 387−394.
- Thomas P.D., Campbell M.J., Kejariwal A., Mi H., Karlak B., Daverman R., Diemer K., Muruganujan A., Narechania A. 2003. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13(9), 2129−41.
- Thompson J.D., Plewniak F., Poch O. 1999. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics. 15, 87−88.
- Thompson J.D., Prigent V., Poch O. 2004. LEON: multiple alignment Evaluation Of Neighbours. Nucleic Acids Res. 32(4), 1298−307.
- Thompson J.D., Koehl P., Ripp R., Poch O. 2005. BAliBASE 3.0: latest developments of the multiple alignment benchmark. Proteins. 61(1), 127−136.
- Thompson J.D., Muller A., Waterhouse A., Procter J., Barton G.J., Plewniak F., Poch O. 2006. MACSIMS: multiple alignment of complete sequences information management system. BMC Bioinformatics. 23, 7, 318−329.
- Vingron M. and Argos P. 1990. Determination of reliable regions in protein sequence alignments. Prot. Eng. 3, 565−569.
- Vingron M and Argos P. 1991. Motif recognition and alignment for many sequences by comparison of dot-matrices. J Mol Biol. 218(1), 33−43.
- Vogt G., Etzold T., Argos P. 1995. An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol. 249, 816−831.
- Waterman M.S., Smith T.F. and Beyer W.A. 1976. Some biological sequence metrics. Adv. Math. 20, 367−387.
- Waterman M.S. 1984. Efficient sequence alignment algorithms. J. Theor. Biol. 108, 333−337.
- Waterman M.S., Galas D. and Arratia R. 1984. Pattern recognition in several sequences: consensus and alignment. Bull. Math. Biol., 46, 515−527.
- Waterman M.S. 1987. A new algorithm for best subsequence alignment with application to tRNA-rRNA comparisons. J. Mol. Biol., 197, 723−728.
- Waterman M.S., ed. 1989. Mathematical methods for DNA sequences. CRC Press, Boca Raton, Florida.
- Wilbur W.J. and Lipman D.J. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. U.S.A. 80, 726−730.
- Yamada S., Gotoh O., Yamana H. 2006. Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost. BMC Bioinformatics. 7, 524.
- Yang AS. 2002. Structure-dependent sequence alignment for remotely related proteins. Bioinformatics. 18(12), 1658−1665.
- Litvinov I.I., Lobanov M.Yu., Mironov A.A., Finkelstein A.V., Roytberg M.A. 2006. Information on protein secondary structure improves alignment quality. Mol. Biol., 40(3), 533−540. Russian.
- Poroikov V.V., Esipova N.G., Tumanyan V.G. 1984. Distribution of amino acid residues in the primary structures of proteins. Mol. Biol., 18(2), Russian.
- Roitt I., Brostoff J., Male D. 2000. In: Immunoglobulins. Mir, Moscow, pp. 131−144. Russian.
- Roytberg M.A. 1984. An algorithm for determining primary structures. Pushchino, Preprint of the Scientific Center for Biological Research (NTsBI), 24. Russian.
- Samarskii A.A., Gulin A.V. 1989. In: Numerical Methods. Nauka, Moscow, p. 209. Russian.
- Tumanyan V.G., Sotnikova L.E., Kholopov A.V. 1966. On determining the secondary structure of RNA from the nucleotide sequence. Dokl. Akad. Nauk, 166(6), 1465−1468. Russian.