BLASTP 2.2.25+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Reference for composition-based statistics: Alejandro A. Schäffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           15,229,318 sequences; 5,219,829,388 total letters

Query= Rv1914c
Length=135

                                                                     Score   E
Sequences producing significant alignments:                          (Bits)  Value

gi|15609051|ref|NP_216430.1| hypothetical protein Rv1914c [Mycob...  273     4e-72
gi|323719577|gb|EGB28701.1| hypothetical protein TMMG_01174 [Myc...  248     1e-64
gi|254550930|ref|ZP_05141377.1| hypothetical protein Mtube_10796...  243     8e-63
gi|118618349|ref|YP_906681.1| hypothetical protein MUL_2941 [Myc...  178     2e-43
gi|183982824|ref|YP_001851115.1| hypothetical protein MMAR_2820 ...  174     3e-42
gi|240172358|ref|ZP_04751017.1| hypothetical protein MkanA1_2378...  122     2e-26
gi|317507100|ref|ZP_07964861.1| hypothetical protein HMPREF9336_...  82.0    2e-14
gi|148265072|ref|YP_001231778.1| hypothetical protein Gura_3033 ...  80.5    7e-14
gi|296395124|ref|YP_003660008.1| hypothetical protein Srot_2743 ...  79.3    2e-13
gi|343921194|gb|EGV31918.1| hypothetical protein ThidrDRAFT_1803...  64.3    6e-09
gi|170577352|ref|XP_001893973.1| Immunoglobulin I-set domain con...  35.8    2.0
gi|21752415|dbj|BAC04191.1| unnamed protein product [Homo sapiens]   35.0    3.2
gi|312067154|ref|XP_003136609.1| hypothetical protein LOAG_01021...  34.3    5.8
gi|242054577|ref|XP_002456434.1| hypothetical protein SORBIDRAFT...  34.3    6.7
gi|90023210|ref|YP_529037.1| hypothetical protein Sde_3570 [Sacc...  33.9    7.4


>gi|15609051|ref|NP_216430.1| hypothetical protein Rv1914c [Mycobacterium tuberculosis H37Rv]
 gi|15841386|ref|NP_336423.1| hypothetical protein MT1965 [Mycobacterium tuberculosis CDC1551]
 gi|31793107|ref|NP_855600.1| hypothetical protein Mb1949c [Mycobacterium bovis AF2122/97]
 76 more sequence titles
Length=135

 Score = 273 bits (699), Expect = 4e-72, Method: Compositional matrix adjust.
 Identities = 135/135 (100%), Positives = 135/135 (100%), Gaps = 0/135 (0%)

Query  1    MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP  60
            MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP
Sbjct  1    MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP  60

Query  61   LTSITKAEATNARVYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV  120
            LTSITKAEATNARVYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV
Sbjct  61   LTSITKAEATNARVYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV  120

Query  121  SVTDPDALVTACTAK  135
            SVTDPDALVTACTAK
Sbjct  121  SVTDPDALVTACTAK  135


>gi|323719577|gb|EGB28701.1| hypothetical protein TMMG_01174 [Mycobacterium tuberculosis CDC1551A]
 gi|339294849|gb|AEJ46960.1| hypothetical protein CCDC5079_1770 [Mycobacterium tuberculosis CCDC5079]
 gi|339298474|gb|AEJ50584.1| hypothetical protein CCDC5180_1747 [Mycobacterium tuberculosis CCDC5180]
Length=122

 Score = 248 bits (634), Expect = 1e-64, Method: Compositional matrix adjust.
 Identities = 121/122 (99%), Positives = 122/122 (100%), Gaps = 0/122 (0%)

Query  14   VPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNAR  73
            +PTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNAR
Sbjct  1    MPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNAR  60

Query  74   VYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTACT  133
            VYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTACT
Sbjct  61   VYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTACT  120

Query  134  AK  135
            AK
Sbjct  121  AK  122


>gi|254550930|ref|ZP_05141377.1| hypothetical protein Mtube_10796 [Mycobacterium tuberculosis '98-R604 INH-RIF-EM']
Length=135

 Score = 243 bits (619), Expect = 8e-63, Method: Compositional matrix adjust.
 Identities = 123/135 (92%), Positives = 123/135 (92%), Gaps = 0/135 (0%)

Query  1    MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP  60
            MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP
Sbjct  1    MVLSRTSTGRVILVPTQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIP  60

Query  61   LTSITKAEATNARVYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV  120
            LTS            AAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV
Sbjct  61   LTSPPACGRAPRGFEAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWV  120

Query  121  SVTDPDALVTACTAK  135
            SVTDPDALVTACTAK
Sbjct  121  SVTDPDALVTACTAK  135


>gi|118618349|ref|YP_906681.1| hypothetical protein MUL_2941 [Mycobacterium ulcerans Agy99]
 gi|118570459|gb|ABL05210.1| conserved hypothetical protein - truncated [Mycobacterium ulcerans Agy99]
Length=122

 Score = 178 bits (452), Expect = 2e-43, Method: Compositional matrix adjust.
 Identities = 86/117 (74%), Positives = 98/117 (84%), Gaps = 0/117 (0%)

Query  16  TQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVY  75
           TQLRF+RWFLPL+VPLGLGPKN  + V AG+LHV+MGWAFAADIP+TSI  A  TNARV+
Sbjct  3   TQLRFERWFLPLSVPLGLGPKNCAVRVEAGNLHVRMGWAFAADIPVTSIKSAALTNARVF  62

Query  76  AAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTAC  132
           AAGVH+  GRWLVNGS KGLVALTI+PP QAK    S+ VR LW+SVTDPDAL+ AC
Sbjct  63  AAGVHYSGGRWLVNGSGKGLVALTIEPPAQAKAVFMSVRVRSLWISVTDPDALIAAC  119


>gi|183982824|ref|YP_001851115.1| hypothetical protein MMAR_2820 [Mycobacterium marinum M]
 gi|183176150|gb|ACC41260.1| conserved hypothetical protein [Mycobacterium marinum M]
Length=122

 Score = 174 bits (442), Expect = 3e-42, Method: Compositional matrix adjust.
 Identities = 84/118 (72%), Positives = 96/118 (82%), Gaps = 0/118 (0%)

Query  16  TQLRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVY  75
           TQLRF+RWFLPL+VPLGLGPKN  + V AG+LHV+MGWAFAADIP+ SI  A  TNARV+
Sbjct  3   TQLRFERWFLPLSVPLGLGPKNCAVGVEAGNLHVRMGWAFAADIPVASIKSAALTNARVF  62

Query  76  AAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTACT  133
           AAGVH+  GRWLVNGS KGLVAL I+PP QA     S+ VR LW+SVTDPDAL+ ACT
Sbjct  63  AAGVHYSGGRWLVNGSGKGLVALMIEPPAQATAVFMSVRVRSLWISVTDPDALIAACT  120


>gi|240172358|ref|ZP_04751017.1| hypothetical protein MkanA1_23788 [Mycobacterium kansasii ATCC 12478]
Length=108

 Score = 122 bits (305), Expect = 2e-26, Method: Compositional matrix adjust.
 Identities = 60/107 (57%), Positives = 77/107 (72%), Gaps = 0/107 (0%)

Query  29  VPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVYAAGVHFGFGRWLV  88
           +PLG GPK+SE+ V  G+L VK GW F A+IPL SI  A+  N RVY+ G H   GRWLV
Sbjct  1   MPLGCGPKHSEVRVQGGTLRVKFGWGFNAEIPLASIKDAKPNNERVYSWGAHGFRGRWLV  60

Query  89  NGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPDALVTACTAK  135
           NGS KG+V LT+DPP +AK+    +T++ L+VSVTDPDAL+   +AK
Sbjct  61  NGSSKGIVELTVDPPTRAKVMGVPVTLKTLYVSVTDPDALIAEVSAK  107


>gi|317507100|ref|ZP_07964861.1| hypothetical protein HMPREF9336_01232 [Segniliparus rugosus ATCC BAA-974]
 gi|316254594|gb|EFV13903.1| hypothetical protein HMPREF9336_01232 [Segniliparus rugosus ATCC BAA-974]
Length=122

 Score = 82.0 bits (201), Expect = 2e-14, Method: Compositional matrix adjust.
 Identities = 45/116 (39%), Positives = 64/116 (56%), Gaps = 1/116 (0%)

Query  18  LRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVYAA  77
           +R+DRW+  LA    +GPK + + +    LHV+ GW+F  D+PL +I+ A     R
Sbjct  5   MRYDRWYQLLATVFWMGPKRTVIRIVDDVLHVRHGWSFRIDVPLANISSARIYGKRPLGW  64

Query  78  GVHFGFGRWLVNGSRKGLVALTIDPP-EQAKMWKKSMTVRELWVSVTDPDALVTAC  132
           GVH     WLVNGSR G+V +  D P + AK       +R L +S+TDPD  + A
Sbjct  65  GVHAFQNGWLVNGSRDGIVIVQFDAPIKPAKAPLFRWPIRSLAISLTDPDGFLAAL  120


>gi|148265072|ref|YP_001231778.1| hypothetical protein Gura_3033 [Geobacter uraniireducens Rf4]
 gi|146398572|gb|ABQ27205.1| hypothetical protein Gura_3033 [Geobacter uraniireducens Rf4]
Length=135

 Score = 80.5 bits (197), Expect = 7e-14, Method: Compositional matrix adjust.
 Identities = 46/129 (36%), Positives = 73/129 (57%), Gaps = 2/129 (1%)

Query  5    RTSTGRVILVPTQ--LRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLT  62
            R+  GR  L P +  ++FD W+  L+  L L P +S + V +  + V+MGWAF A  P
Sbjct  2    RSFIGRRPLSPQRFPIQFDPWYGILSSALFLRPSSSYVEVNSEEIRVRMGWAFRACFPRA  61

Query  63   SITKAEATNARVYAAGVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSV  122
            ++  A  T+ R  + GVH   GRWLVNGS +G++ + + P ++  +    + +R+L VSV
Sbjct  62   AVALAAETHGRPLSRGVHGFAGRWLVNGSGQGILTIDLTPTQRGYVMGFPVRLRQLMVSV  121

Query  123  TDPDALVTA  131
             +P  L  A
Sbjct  122  AEPATLAAA  130


>gi|296395124|ref|YP_003660008.1| hypothetical protein Srot_2743 [Segniliparus rotundus DSM 44985]
 gi|296182271|gb|ADG99177.1| hypothetical protein Srot_2743 [Segniliparus rotundus DSM 44985]
Length=167

 Score = 79.3 bits (194), Expect = 2e-13, Method: Compositional matrix adjust.
 Identities = 47/115 (41%), Positives = 62/115 (54%), Gaps = 1/115 (0%)

Query  18   LRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVYAA  77
            +R+DRW  PL+   G+GPK + + V    L VK GWAF  D+PL +I  A     R  A
Sbjct  45   MRYDRWCRPLSTVFGMGPKRAVIRVDDNMLRVKHGWAFQIDVPLGNIASARLYGKRPLAW  104

Query  78   GVHFGFGRWLVNGSRKGLVALTIDPP-EQAKMWKKSMTVRELWVSVTDPDALVTA  131
            GVH     WLVNGSR G+V +    P +  K    +  VR + VS+  PDA + A
Sbjct  105  GVHGAEDGWLVNGSRDGVVIVRFATPVKPVKAPLGAWPVRCVLVSLEKPDAFLAA  159


>gi|343921194|gb|EGV31918.1| hypothetical protein ThidrDRAFT_1803 [Thiorhodococcus drewsii AZ1]
Length=128

 Score = 64.3 bits (155), Expect = 6e-09, Method: Compositional matrix adjust.
 Identities = 40/109 (37%), Positives = 62/109 (57%), Gaps = 0/109 (0%)

Query  18  LRFDRWFLPLAVPLGLGPKNSELWVGAGSLHVKMGWAFAADIPLTSITKAEATNARVYAA  77
           +RFD W+  L+  L L P  S + +    + V+MGWAF+A  P T+I    + N    +
Sbjct  8   IRFDGWYEFLSTLLLLPPSTSYVSISRDLVEVRMGWAFSARFPRTAIASVASLNRPPVSR  67

Query  78  GVHFGFGRWLVNGSRKGLVALTIDPPEQAKMWKKSMTVRELWVSVTDPD  126
           GVH   GRWLVNGS +G++ L + P ++  +    + +REL VS+  P+
Sbjct  68  GVHGFAGRWLVNGSGRGILTLDLKPAQRGYVMGFPVRLRELQVSLERPE  116


>gi|170577352|ref|XP_001893973.1| Immunoglobulin I-set domain containing protein [Brugia malayi]
 gi|158599677|gb|EDP37190.1| Immunoglobulin I-set domain containing protein [Brugia malayi]
Length=4791

 Score = 35.8 bits (81), Expect = 2.0, Method: Composition-based stats.
 Identities = 15/36 (42%), Positives = 22/36 (62%), Gaps = 0/36 (0%)

Query  54   AFAADIPLTSITKAEATNARVYAAGVHFGFGRWLVN  89
            AF +DIP T+IT+ E+ N +V   G    F +W +N
Sbjct  551  AFLSDIPATTITEGESLNVKVIITGDPTPFTKWYIN  586


>gi|21752415|dbj|BAC04191.1| unnamed protein product [Homo sapiens]
Length=158

 Score = 35.0 bits (79), Expect = 3.2, Method: Compositional matrix adjust.
 Identities = 18/49 (37%), Positives = 27/49 (56%), Gaps = 9/49 (18%)

Query  40  LWVGAGSLHVK----MGWAFAADIPLTSITKAEATNARVYAAGVHFGFG  84
           +W GA +LHVK    + W  A+ IP+     AE+ N  ++A  +H G G
Sbjct  13  IWTGAENLHVKISCSLDWLMASVIPV-----AESRNLYIFADELHLGMG  56


>gi|312067154|ref|XP_003136609.1| hypothetical protein LOAG_01021 [Loa loa]
 gi|307768228|gb|EFO27462.1| hypothetical protein LOAG_01021 [Loa loa]
Length=5884

 Score = 34.3 bits (77), Expect = 5.8, Method: Composition-based stats.
 Identities = 13/36 (37%), Positives = 22/36 (62%), Gaps = 0/36 (0%)

Query  54    AFAADIPLTSITKAEATNARVYAAGVHFGFGRWLVN  89
             +F +DIP T++T+ E+ N +V   G    F +W +N
Sbjct  1467  SFLSDIPATTVTEGESLNVKVIVTGDPTPFTKWYIN  1502


>gi|242054577|ref|XP_002456434.1| hypothetical protein SORBIDRAFT_03g036240 [Sorghum bicolor]
 gi|241928409|gb|EES01554.1| hypothetical protein SORBIDRAFT_03g036240 [Sorghum bicolor]
Length=1074

 Score = 34.3 bits (77), Expect = 6.7, Method: Composition-based stats.
 Identities = 13/33 (40%), Positives = 22/33 (67%), Gaps = 0/33 (0%)

Query  101  DPPEQAKMWKKSMTVRELWVSVTDPDALVTACT  133
            DP ++A++WK+ M   E  + + +PDA+  ACT
Sbjct  297  DPDDEARLWKEHMNQIEATMVLLEPDAVARACT  329


>gi|90023210|ref|YP_529037.1| hypothetical protein Sde_3570 [Saccharophagus degradans 2-40]
 gi|89952810|gb|ABD82825.1| type II secretion system protein N [Saccharophagus degradans 2-40]
Length=256

 Score = 33.9 bits (76), Expect = 7.4, Method: Compositional matrix adjust.
 Identities = 35/107 (33%), Positives = 47/107 (44%), Gaps = 26/107 (24%)

Query  43   GAGSLHVKMGWAFAADIPLTSITKAEA----TNARVYAAGVHFGFGRWLVN---GSRKGL  95
            G GS+ V      +A++   SI K +A     NARV+A G  F  G +  N     R GL
Sbjct  141  GTGSIEV-----ISAEVTPQSIQKMDARVSWQNARVFADGTWFSLGSYAANVKENGRGGL  195

Query  96   VA--LTIDPPEQAKM---------WKKSMTVRELWVSVTDPDALVTA  131
             A    +D P Q K+         WK + TV+ L      P+ LV A
Sbjct  196  AADVFDLDAPFQTKLNADWMANQGWKLNGTVKPL---SNAPELLVQA  239


Lambda      K        H
   0.321    0.134    0.429

Gapped
Lambda      K        H
   0.267    0.0410   0.140

Effective search space used: 129391415580

  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
  excluding environmental samples from WGS projects
    Posted date: Sep 5, 2011 4:36 AM
  Number of letters in database: 5,219,829,388
  Number of sequences in database: 15,229,318

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40
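
Note on the scores: each "Score =" line reports a bit score followed by the raw score in parentheses. The two are related by the standard Karlin-Altschul conversion S' = (lambda * S - ln K) / ln 2, using the gapped Lambda and K printed in the footer. The sketch below is not part of the BLAST output; the function name bit_score is hypothetical, and because Lambda and K are printed to only three significant figures the results agree with the report only to display precision. Bit scores of 100 or more appear in this report without a fractional part (699 is shown as 273 bits).

```python
import math

# Gapped Karlin-Altschul parameters as printed in the report footer
# (rounded values, so results are approximate).
GAPPED_LAMBDA = 0.267
GAPPED_K = 0.0410


def bit_score(raw_score, lam=GAPPED_LAMBDA, k=GAPPED_K):
    """Convert a raw alignment score to bits: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Raw scores are the parenthesised values on the "Score =" lines.
    for raw in (699, 201, 81, 76):
        print(raw, round(bit_score(raw), 1))
```

For example, bit_score(81) reproduces the 35.8 bits reported for the Brugia malayi hit, and bit_score(76) the 33.9 bits reported for the Saccharophagus degradans hit.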