BLASTP 2.2.25+
Reference:
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs", Nucleic Acids Res. 25:3389-3402.
Reference for composition-based statistics:
Alejandro A. Schäffer, L. Aravind, Thomas L. Madden, Sergei
Shavirin, John L. Spouge, Yuri I. Wolf, Eugene V. Koonin, and
Stephen F. Altschul (2001), "Improving the accuracy of PSI-BLAST
protein database searches with composition-based statistics and
other refinements", Nucleic Acids Res. 29:2994-3005.
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
15,229,318 sequences; 5,219,829,388 total letters
Query= Rv2630
Length=179
                                                                      Score    E
Sequences producing significant alignments:                          (Bits)  Value

gi|15609767|ref|NP_217146.1| hypothetical protein Rv2630 [Mycoba...     357  6e-97
gi|340627651|ref|YP_004746103.1| hypothetical protein MCAN_26761...     354  3e-96
gi|289553580|ref|ZP_06442790.1| hypothetical protein TBXG_01330 ...    353  6e-96
gi|308232195|ref|ZP_07415245.2| hypothetical protein TMAG_02439 ...    323  5e-87
gi|308380421|ref|ZP_07489902.2| hypothetical protein TMKG_03062 ...    303  6e-81
gi|306790035|ref|ZP_07428357.1| hypothetical protein TMDG_00348 ...    255  2e-66
gi|339295506|gb|AEJ47617.1| hypothetical protein CCDC5079_2427 [...     243  6e-63
gi|240169277|ref|ZP_04747936.1| hypothetical protein MkanA1_0818...    192  2e-47
gi|145594926|ref|YP_001159223.1| hypothetical protein Strop_2398...     131  4e-29
gi|159038127|ref|YP_001537380.1| hypothetical protein Sare_2547 ...     130  6e-29
gi|337768996|emb|CCB77709.1| conserved protein of unknown functi...     123  1e-26
gi|330468160|ref|YP_004405903.1| hypothetical protein VAB18032_2...     119  2e-25
gi|296268819|ref|YP_003651451.1| hypothetical protein Tbis_0834 ...     115  4e-24
gi|302555771|ref|ZP_07308113.1| conserved hypothetical protein [...    102  2e-20
gi|269126263|ref|YP_003299633.1| hypothetical protein Tcur_2028 ...    94.7  5e-18
gi|291435590|ref|ZP_06574980.1| conserved hypothetical protein [...   91.7  4e-17
gi|302562260|ref|ZP_07314602.1| conserved hypothetical protein [...   84.7  5e-15
gi|296268412|ref|YP_003651044.1| hypothetical protein Tbis_0423 ...    65.5  3e-09
gi|297154768|gb|ADI04480.1| hypothetical protein SBI_01359 [Stre...    53.9  8e-06
gi|167042797|gb|ABZ07515.1| putative protein of unknown function...    48.1  6e-04
gi|322420995|ref|YP_004200218.1| hypothetical protein GM18_3508 ...    47.8  7e-04
gi|327400701|ref|YP_004341540.1| hypothetical protein Arcve_0809...    46.6  0.001
gi|320160546|ref|YP_004173770.1| hypothetical protein ANT_11360 ...    46.2  0.002
gi|73668780|ref|YP_304795.1| archease family protein [Methanosar...    46.2  0.002
gi|328952466|ref|YP_004369800.1| protein of unknown function DUF...    44.7  0.006
gi|328949729|ref|YP_004367064.1| protein of unknown function DUF...    41.2  0.060
gi|295798135|emb|CAX68976.1| conserved hypothetical protein of u...    41.2  0.060
gi|320449451|ref|YP_004201547.1| hypothetical protein TSC_c03600...    40.8  0.074
gi|284162832|ref|YP_003401455.1| hypothetical protein Arcpr_1737...    40.4  0.11
gi|322793517|gb|EFZ17043.1| hypothetical protein SINV_12010 [Sol...    40.4  0.12
gi|225709956|gb|ACO10824.1| archease [Caligus rogercresseyi]           40.0  0.15
gi|119719189|ref|YP_919684.1| hypothetical protein Tpen_0271 [Th...    39.7  0.17
gi|154151334|ref|YP_001404952.1| hypothetical protein Mboo_1793 ...    39.7  0.19
gi|225712042|gb|ACO11867.1| archease [Lepeophtheirus salmonis] >...    39.7  0.19
gi|150003078|ref|YP_001297822.1| hypothetical protein BVU_0490 [...    39.3  0.24
gi|21227639|ref|NP_633561.1| hypothetical protein MM_1537 [Metha...    39.3  0.24
gi|254882357|ref|ZP_05255067.1| acetyl-CoA carboxylase [Bacteroi...   38.9  0.30
gi|55981714|ref|YP_145011.1| hypothetical protein TTHA1745 [Ther...    38.9  0.30
gi|294775898|ref|ZP_06741397.1| conserved hypothetical protein [...   38.5  0.36
gi|303277509|ref|XP_003058048.1| predicted protein [Micromonas p...    38.5  0.39
gi|253701867|ref|YP_003023056.1| hypothetical protein GM21_3272 ...    38.5  0.39
gi|77917636|ref|YP_355451.1| hypothetical protein Pcar_0018 [Pel...    38.5  0.42
gi|344287554|ref|XP_003415518.1| PREDICTED: protein archease-lik...    38.5  0.44
gi|171185173|ref|YP_001794092.1| hypothetical protein Tneu_0707 ...    38.5  0.46
gi|19075632|ref|NP_588132.1| UTP-glucose-1-phosphate uridylyltra...    38.1  0.54
gi|70999021|ref|XP_754232.1| APSES transcription factor [Aspergi...    38.1  0.56
gi|197117378|ref|YP_002137805.1| hypothetical protein Gbem_0988 ...    37.7  0.62
gi|300710833|ref|YP_003736647.1| hypothetical protein HacjB3_073...    37.7  0.63
gi|332016564|gb|EGI57445.1| Protein archease-like protein [Acrom...    37.7  0.72
gi|218294652|ref|ZP_03495506.1| protein of unknown function DUF1...   37.4  0.97
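
Each row of the summary table has the same shape: an identifier, a title (truncated with "..." when long), the bit score, and the E-value. A minimal Python sketch for turning the rows into records, assuming the plain-text layout of this report; `parse_summary_line`, `significant` and the 1e-3 cutoff are illustrative names and values, not part of BLAST. For routine work, BLAST+ tabular output (`-outfmt 6`) avoids scraping this layout at all.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    seq_id: str        # first token, e.g. "gi|15609767|ref|NP_217146.1|"
    description: str   # title text; may end in "..." where BLAST truncated it
    bits: float        # bit score column
    evalue: float      # E-value column


def parse_summary_line(line: str) -> Hit:
    """Split one row of the 'Sequences producing significant alignments' table."""
    # Titles contain spaces, but the score and E-value are always the last
    # two whitespace-separated fields, so split from the right.
    head, bits, evalue = line.rsplit(maxsplit=2)
    seq_id, _, description = head.partition(" ")
    return Hit(seq_id, description.strip(), float(bits), float(evalue))


def significant(lines: Iterable[str], max_evalue: float = 1e-3) -> List[Hit]:
    """Parse table rows (lines starting with 'gi|') and keep those at or below the cutoff."""
    hits = [parse_summary_line(l) for l in lines if l.startswith("gi|")]
    return [h for h in hits if h.evalue <= max_evalue]
```

Splitting from the right is the design choice that matters: it tolerates truncated, space-filled titles and any amount of column padding.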
>gi|15609767|ref|NP_217146.1| hypothetical protein Rv2630 [Mycobacterium tuberculosis H37Rv]
gi|15842170|ref|NP_337207.1| hypothetical protein MT2705 [Mycobacterium tuberculosis CDC1551]
gi|31793816|ref|NP_856309.1| hypothetical protein Mb2663 [Mycobacterium bovis AF2122/97]
54 more sequence titles
Length=179
Score = 357 bits (915), Expect = 6e-97, Method: Compositional matrix adjust.
Identities = 179/179 (100%), Positives = 179/179 (100%), Gaps = 0/179 (0%)
Query 1 MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA 60
MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA
Sbjct 1 MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA 60
Query 61 WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVG 120
WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVG
Sbjct 61 WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVG 120
Query 121 ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV
Sbjct 121 ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
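
Each Score line pairs a normalized bit score with the raw score in parentheses, related by S' = (lambda*S - ln K) / ln 2, and the E-value follows from the bit score as E = m*n*2^(-S') for a search space of m x n residues (Karlin-Altschul statistics). The sketch below assumes lambda = 0.267 and K = 0.041, the gapped values for BLOSUM62 with gap costs 11/1 (the blastp defaults); the matrix and gap costs are not printed in this excerpt, so treat them as an assumption. With these values the raw scores in this report reproduce the printed bit scores (915 gives 357.07, shown as 357; 234 gives 94.7), and above 100 bits the report appears to truncate rather than round (906 gives 353.6, shown as 353).

```python
import math

# ASSUMED Karlin-Altschul parameters: BLOSUM62, gap open 11, gap extend 1.
LAMBDA = 0.267
K = 0.041


def bit_score(raw: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue(bits: float, m: float, n: float) -> float:
    """Expected chance hits E = m * n * 2**(-S') for an m x n search space."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    s = bit_score(915)           # raw score of the Rv2630 self-hit above
    print(f"{s:.2f} bits")       # prints 357.07 bits; the report shows 357
    # Raw query length (179) times total database letters overestimates E:
    # BLAST uses smaller *effective* lengths after an edge-effect correction,
    # so this naive estimate is several times the reported 6e-97.
    print(f"naive E ~ {evalue(s, 179, 5_219_829_388):.1e}")
```

Composition-based adjustment also enters the reported E-values, so the naive estimate is only a rough order-of-magnitude check.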
>gi|340627651|ref|YP_004746103.1| hypothetical protein MCAN_26761 [Mycobacterium canettii CIPT
140010059]
gi|340005841|emb|CCC45007.1| hypothetical protein MCAN_26761 [Mycobacterium canettii CIPT
140010059]
Length=179
Score = 354 bits (908), Expect = 3e-96, Method: Compositional matrix adjust.
Identities = 178/179 (99%), Positives = 178/179 (99%), Gaps = 0/179 (0%)
Query 1 MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA 60
MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA
Sbjct 1 MLHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEA 60
Query 61 WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVG 120
WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTAD DDDLLVAVLEEVIYLLDTVG
Sbjct 61 WAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADHDDDLLVAVLEEVIYLLDTVG 120
Query 121 ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV
Sbjct 121 ETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
>gi|289553580|ref|ZP_06442790.1| hypothetical protein TBXG_01330 [Mycobacterium tuberculosis KZN
605]
gi|289438212|gb|EFD20705.1| hypothetical protein TBXG_01330 [Mycobacterium tuberculosis KZN
605]
Length=178
Score = 353 bits (906), Expect = 6e-96, Method: Compositional matrix adjust.
Identities = 177/178 (99%), Positives = 178/178 (100%), Gaps = 0/178 (0%)
Query 2 LHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAW 61
+HRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAW
Sbjct 1 MHRDDHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAW 60
Query 62 APTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGE 121
APTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGE
Sbjct 61 APTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGE 120
Query 122 TPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
TPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV
Sbjct 121 TPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 178
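
The Identities and Positives counts can be checked against the midline: a letter marks an identical residue, '+' a positive-scoring substitution, and a blank a mismatch or gap. In the TBXG_01330 alignment above, the three midline rows contain 177 letters and one '+', which gives Identities = 177/178 and Positives = 178/178. A small Python sketch; `midline_counts` is a hypothetical helper, not a BLAST function.

```python
from typing import Iterable, Tuple


def midline_counts(midline_rows: Iterable[str]) -> Tuple[int, int]:
    """Return (identities, positives) from the midline rows of one alignment.

    Letters are identities; '+' marks a positive-scoring substitution, and
    positives include identities. Blanks are never counted, so the result
    does not depend on exact column spacing.
    """
    rows = list(midline_rows)
    identities = sum(ch.isalpha() for row in rows for ch in row)
    plus = sum(ch == "+" for row in rows for ch in row)
    return identities, identities + plus
```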
>gi|308232195|ref|ZP_07415245.2| hypothetical protein TMAG_02439 [Mycobacterium tuberculosis SUMu001]
gi|308369806|ref|ZP_07419146.2| hypothetical protein TMBG_02768 [Mycobacterium tuberculosis SUMu002]
gi|308371077|ref|ZP_07423757.2| hypothetical protein TMCG_01878 [Mycobacterium tuberculosis SUMu003]
11 more sequence titles
Length=164
Score = 323 bits (829), Expect = 5e-87, Method: Compositional matrix adjust.
Identities = 163/164 (99%), Positives = 164/164 (100%), Gaps = 0/164 (0%)
Query 16 LDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLG 75
+DVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLG
Sbjct 1 MDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLG 60
Query 76 TVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGV 135
TVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGV
Sbjct 61 TVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGV 120
Query 136 DVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
DVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV
Sbjct 121 DVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 164
>gi|308380421|ref|ZP_07489902.2| hypothetical protein TMKG_03062 [Mycobacterium tuberculosis SUMu011]
gi|308361533|gb|EFP50384.1| hypothetical protein TMKG_03062 [Mycobacterium tuberculosis SUMu011]
Length=163
Score = 303 bits (776), Expect = 6e-81, Method: Compositional matrix adjust.
Identities = 153/153 (100%), Positives = 153/153 (100%), Gaps = 0/153 (0%)
Query 27 NPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESA 86
NPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESA
Sbjct 11 NPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESA 70
Query 87 HAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST 146
HAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST
Sbjct 71 HAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST 130
Query 147 LVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
LVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV
Sbjct 131 LVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 163
>gi|306790035|ref|ZP_07428357.1| hypothetical protein TMDG_00348 [Mycobacterium tuberculosis SUMu004]
gi|307085322|ref|ZP_07494435.1| hypothetical protein TMLG_02361 [Mycobacterium tuberculosis SUMu012]
gi|308333522|gb|EFP22373.1| hypothetical protein TMDG_00348 [Mycobacterium tuberculosis SUMu004]
gi|308365146|gb|EFP53997.1| hypothetical protein TMLG_02361 [Mycobacterium tuberculosis SUMu012]
gi|323718782|gb|EGB27940.1| hypothetical protein TMMG_02642 [Mycobacterium tuberculosis CDC1551A]
Length=130
Score = 255 bits (651), Expect = 2e-66, Method: Compositional matrix adjust.
Identities = 129/130 (99%), Positives = 130/130 (100%), Gaps = 0/130 (0%)
Query 50 VPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVL 109
+PHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVL
Sbjct 1 MPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVL 60
Query 110 EEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRH 169
EEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRH
Sbjct 61 EEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRH 120
Query 170 GWRCAVTLDV 179
GWRCAVTLDV
Sbjct 121 GWRCAVTLDV 130
>gi|339295506|gb|AEJ47617.1| hypothetical protein CCDC5079_2427 [Mycobacterium tuberculosis
CCDC5079]
gi|339299124|gb|AEJ51234.1| hypothetical protein CCDC5180_2397 [Mycobacterium tuberculosis
CCDC5180]
Length=124
Score = 243 bits (621), Expect = 6e-63, Method: Compositional matrix adjust.
Identities = 123/124 (99%), Positives = 124/124 (100%), Gaps = 0/124 (0%)
Query 56 LRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYL 115
+RIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYL
Sbjct 1 MRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEEVIYL 60
Query 116 LDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAV 175
LDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAV
Sbjct 61 LDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAV 120
Query 176 TLDV 179
TLDV
Sbjct 121 TLDV 124
>gi|240169277|ref|ZP_04747936.1| hypothetical protein MkanA1_08184 [Mycobacterium kansasii ATCC
12478]
Length=140
Score = 192 bits (487), Expect = 2e-47, Method: Compositional matrix adjust.
Identities = 95/137 (70%), Positives = 108/137 (79%), Gaps = 0/137 (0%)
Query 43 TSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDD 102
+GHRSVPHTADLRIEAWAPTRDGCIRQAVLG VESFLD SA TR RR+TA+ DD
Sbjct 4 NKAGHRSVPHTADLRIEAWAPTRDGCIRQAVLGVVESFLDTSSAPVQQTRRRRVTANSDD 63
Query 103 DLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNEL 162
LLVA L+EVIYLLDT G+ PV+L L + DGGVD+T DA + QVGAVPKAVSLN+L
Sbjct 64 SLLVAALDEVIYLLDTTGQAPVELMLSEADGGVDMTLEMVDAGAVPQVGAVPKAVSLNDL 123
Query 163 RFSQGRHGWRCAVTLDV 179
+G HGWRC+VT+DV
Sbjct 124 CLVRGEHGWRCSVTVDV 140
>gi|145594926|ref|YP_001159223.1| hypothetical protein Strop_2398 [Salinispora tropica CNB-440]
gi|145304263|gb|ABP54845.1| protein of unknown function DUF101 [Salinispora tropica CNB-440]
Length=177
Score = 131 bits (330), Expect = 4e-29, Method: Compositional matrix adjust.
Identities = 72/155 (47%), Positives = 99/155 (64%), Gaps = 3/155 (1%)
Query 27 NPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESA 86
P +A+ + +P GHR +PHTAD+RIEAWAPTR+ C+ +AV V++F+D A
Sbjct 24 GPEQAIVADDRGVEPQPERGHRCLPHTADVRIEAWAPTREACVAEAVTALVDTFVDPGPA 83
Query 87 HAVHTRLRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDG--GVDVTFATTDA 144
R R A D DLLV +LEEVI+ ++T+GE P+ + D DG G+ V + TTDA
Sbjct 84 QPTAERAYRAPAAEDGDLLVNILEEVIFRMETMGELPLRTEVHD-DGTDGLHVRWQTTDA 142
Query 145 STLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
T+ +GAVPKA+SL+ELRF W CA+T+DV
Sbjct 143 DTVELIGAVPKAISLHELRFGPDGPRWSCALTVDV 177
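
In a gapped alignment such as Strop_2398, each '-' occupies a column but consumes no residue, so a row's end coordinate is start + residues - 1. The second Query row starts at 87 and has 60 columns with 2 gaps, so it ends at 87 + 58 - 1 = 144; the matching Sbjct row starts at 84 with 1 gap and ends at 142. Those 3 gap columns are the "Gaps = 3/155" in the header. A minimal sketch of that bookkeeping; `segment_stats` is a hypothetical helper and assumes protein rows read left to right.

```python
from typing import Tuple


def segment_stats(start: int, row: str) -> Tuple[int, int]:
    """Return (end_coordinate, gap_count) for one Query or Sbjct row.

    Gap characters ('-') take up alignment columns but not sequence
    positions, so only the remaining characters advance the coordinate.
    """
    gaps = row.count("-")
    residues = len(row) - gaps
    return start + residues - 1, gaps
```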
>gi|159038127|ref|YP_001537380.1| hypothetical protein Sare_2547 [Salinispora arenicola CNS-205]
gi|157916962|gb|ABV98389.1| protein of unknown function DUF101 [Salinispora arenicola CNS-205]
Length=142
Score = 130 bits (328), Expect = 6e-29, Method: Compositional matrix adjust.
Identities = 74/141 (53%), Positives = 96/141 (69%), Gaps = 1/141 (0%)
Query 40 KPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTAD 99
+P GHR VPHTAD+RIEAWAPTR+ C+ +AV +++F+D A R AD
Sbjct 2 EPRPERGHRCVPHTADVRIEAWAPTREACVAEAVAALLDTFVDHRPARPTAERTYHAPAD 61
Query 100 RDDDLLVAVLEEVIYLLDTVGETPVDLRLR-DVDGGVDVTFATTDASTLVQVGAVPKAVS 158
+DDDLLV+VL+EVI+ +DT GE P+ +R D DGG+ V + TTD + +GAVPKAVS
Sbjct 62 QDDDLLVSVLDEVIFRMDTTGELPLRTEVRDDGDGGLHVRWQTTDTGEMELIGAVPKAVS 121
Query 159 LNELRFSQGRHGWRCAVTLDV 179
L+ELRF GW CAVT+DV
Sbjct 122 LHELRFGPDAEGWACAVTVDV 142
>gi|337768996|emb|CCB77709.1| conserved protein of unknown function [Streptomyces cattleya
NRRL 8057]
Length=150
Score = 123 bits (308), Expect = 1e-26, Method: Compositional matrix adjust.
Identities = 71/142 (50%), Positives = 91/142 (65%), Gaps = 0/142 (0%)
Query 38 AGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLT 97
+G+ GHR++PHTADLRIEAWAPTR+GC+ +AV TVE+F D A L L
Sbjct 9 SGRDRAERGHRALPHTADLRIEAWAPTREGCLAEAVAATVEAFADPAGARPSREHLVELG 68
Query 98 ADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAV 157
+D LVA+L+EV+Y LD+ GE PV + + G+ V A DA++L GAVPKAV
Sbjct 69 GAGAEDRLVALLDEVVYRLDSAGEVPVATEVTWLPEGLRVRLAMADAASLAVTGAVPKAV 128
Query 158 SLNELRFSQGRHGWRCAVTLDV 179
+ + L F G GWRCAVTLDV
Sbjct 129 TWHRLEFGGGPSGWRCAVTLDV 150
>gi|330468160|ref|YP_004405903.1| hypothetical protein VAB18032_21015 [Verrucosispora maris AB-18-032]
gi|328811131|gb|AEB45303.1| hypothetical protein VAB18032_21015 [Verrucosispora maris AB-18-032]
Length=131
Score = 119 bits (297), Expect = 2e-25, Method: Compositional matrix adjust.
Identities = 68/130 (53%), Positives = 81/130 (63%), Gaps = 0/130 (0%)
Query 50 VPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVL 109
VPHTAD+RIEAWAP R GC+ +AV VESF DL A R R D+DLLVAVL
Sbjct 2 VPHTADVRIEAWAPDRAGCLAEAVTAMVESFADLAGARLHAEREFRPPPAADEDLLVAVL 61
Query 110 EEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRH 169
+EVIY ++T E P+ + D GG+ V + T + GAVPKAVSL+ELRF
Sbjct 62 DEVIYRMETADELPLVTEITDDAGGLRVRWGVTATGEVELTGAVPKAVSLHELRFGGNDA 121
Query 170 GWRCAVTLDV 179
GW AVTLDV
Sbjct 122 GWSGAVTLDV 131
>gi|296268819|ref|YP_003651451.1| hypothetical protein Tbis_0834 [Thermobispora bispora DSM 43833]
gi|296091606|gb|ADG87558.1| protein of unknown function DUF101 [Thermobispora bispora DSM
43833]
Length=153
Score = 115 bits (287), Expect = 4e-24, Method: Compositional matrix adjust.
Identities = 63/148 (43%), Positives = 82/148 (56%), Gaps = 12/148 (8%)
Query 39 GKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDL------ESAHAVHTR 92
G P GHR++PHTAD+R+ AWAPTR CI +AVLG ESF DL A +H
Sbjct 11 GGPAARRGHRTLPHTADMRVAAWAPTRAECIAEAVLGVAESFTDLGGDPPCRGAETLH-- 68
Query 93 LRRLTADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGA 152
L +D L+A L+EVI+ LDT P R+ + G+ + T+ + Q GA
Sbjct 69 ---LEPGPPEDQLLAALDEVIFQLDTTARVPFRAEAREEENGISLRLWHTELGAVTQTGA 125
Query 153 VPKAVSLNELRFSQGRHG-WRCAVTLDV 179
PK ++L LRF QG G W C VT+DV
Sbjct 126 SPKGIALENLRFEQGPDGTWTCEVTVDV 153
>gi|302555771|ref|ZP_07308113.1| conserved hypothetical protein [Streptomyces viridochromogenes
DSM 40736]
gi|302473389|gb|EFL36482.1| conserved hypothetical protein [Streptomyces viridochromogenes
DSM 40736]
Length=151
Score = 102 bits (254), Expect = 2e-20, Method: Compositional matrix adjust.
Identities = 59/133 (45%), Positives = 80/133 (61%), Gaps = 0/133 (0%)
Query 46 GHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLL 105
GHR+VPH D+RIEAWA R+ C+ +AVL VE F D+ L DDDLL
Sbjct 18 GHRAVPHPGDIRIEAWAACREHCLAEAVLAMVECFADVSGVRPTAVDQVWLAEASDDDLL 77
Query 106 VAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFS 165
++L+EVI+ L+ G+ PVD+ + DGG+DV A T + + GAVP AV+ +ELR
Sbjct 78 ASLLDEVIFRLEAYGQVPVDVEADEADGGLDVRLAVTGVADVAITGAVPTAVAWDELRIG 137
Query 166 QGRHGWRCAVTLD 178
+GW CAV +D
Sbjct 138 PDPYGWSCAVRVD 150
>gi|269126263|ref|YP_003299633.1| hypothetical protein Tcur_2028 [Thermomonospora curvata DSM 43183]
gi|268311221|gb|ACY97595.1| protein of unknown function DUF101 [Thermomonospora curvata DSM
43183]
Length=152
Score = 94.7 bits (234), Expect = 5e-18, Method: Compositional matrix adjust.
Identities = 59/138 (43%), Positives = 84/138 (61%), Gaps = 8/138 (5%)
Query 44 SSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDD 103
++GHR++PHTAD+RIEAWAPT C+ +A VESF DL +HA T +R + +D
Sbjct 21 AAGHRTLPHTADMRIEAWAPTLQECLTEAARAMVESFTDL--SHAKPTAVREFPLENNDP 78
Query 104 --LLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNE 161
LL+++LEE+IY +D E P+ + L G TD + L +GAVPKAVS +
Sbjct 79 ENLLLSLLEELIYRMDAHAELPLSITLH----GSQARCTMTDTTRLPTLGAVPKAVSRAD 134
Query 162 LRFSQGRHGWRCAVTLDV 179
L ++ +GW C T+DV
Sbjct 135 LHVTKTPNGWHCTATVDV 152
>gi|291435590|ref|ZP_06574980.1| conserved hypothetical protein [Streptomyces ghanaensis ATCC
14672]
gi|291338485|gb|EFE65441.1| conserved hypothetical protein [Streptomyces ghanaensis ATCC
14672]
Length=152
Score = 91.7 bits (226), Expect = 4e-17, Method: Compositional matrix adjust.
Identities = 57/138 (42%), Positives = 81/138 (59%), Gaps = 1/138 (0%)
Query 41 PGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADR 100
PG +GHR+VP D+RIE WA +R+ C+ +AV VE F D+ RL
Sbjct 14 PG-DNGHRTVPQEDDVRIEVWAESRESCLAEAVAAVVECFADVSGVRPTGVGRIRLDEAS 72
Query 101 DDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLN 160
DDDLL A+L+E+++ L G+ PVD+ + +GG+DV A + + GA+P AV+
Sbjct 73 DDDLLAALLDEILHRLRVHGQVPVDVEADEAEGGLDVRLAVAGLADVRVTGALPTAVAWE 132
Query 161 ELRFSQGRHGWRCAVTLD 178
ELR G +GW CAVT+D
Sbjct 133 ELRIGPGPYGWSCAVTVD 150
>gi|302562260|ref|ZP_07314602.1| conserved hypothetical protein [Streptomyces griseoflavus Tu4000]
gi|302479878|gb|EFL42971.1| conserved hypothetical protein [Streptomyces griseoflavus Tu4000]
Length=168
Score = 84.7 bits (208), Expect = 5e-15, Method: Compositional matrix adjust.
Identities = 50/135 (38%), Positives = 78/135 (58%), Gaps = 0/135 (0%)
Query 44 SSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDD 103
++GHR+V ++RIEAWA +R+ C+ +AV VE F D+ RL DDD
Sbjct 33 ANGHRAVSQEDEVRIEAWAASRENCLAEAVAAMVECFADVSGVRPTGVGRVRLEEPSDDD 92
Query 104 LLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELR 163
LL ++L+E++Y L+ G+ PVD+ +GG+DV A + + G P AV+ +LR
Sbjct 93 LLASLLDEILYRLEEHGQVPVDVEADAAEGGLDVRLALAALTDVRLTGPPPTAVAWEDLR 152
Query 164 FSQGRHGWRCAVTLD 178
G +GW CA+T++
Sbjct 153 IHPGPYGWSCALTIE 167
>gi|296268412|ref|YP_003651044.1| hypothetical protein Tbis_0423 [Thermobispora bispora DSM 43833]
gi|296091199|gb|ADG87151.1| protein of unknown function DUF101 [Thermobispora bispora DSM
43833]
Length=142
Score = 65.5 bits (158), Expect = 3e-09, Method: Compositional matrix adjust.
Identities = 44/141 (32%), Positives = 71/141 (51%), Gaps = 4/141 (2%)
Query 43 TSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDD 102
SSG+R++P A L +EAW +R+ C+ +AV V+SF ++ A + L DD
Sbjct 2 VSSGYRAIPRAAGLVVEAWGDSREECLAEAVRAVVDSFAEIGDAVPDGSVEFALAVRDDD 61
Query 103 DLLVAVLEEVIYLLDTVGETPVDLRLRD----VDGGVDVTFATTDASTLVQVGAVPKAVS 158
LL+ VL+EVI + PVD+ + + G ++ AT + +VGA+P+AV
Sbjct 62 ALLLTVLDEVIDQIQVEERVPVDVSIDEGTGIAMGEFEIRLATVPLEAVREVGALPEAVQ 121
Query 159 LNELRFSQGRHGWRCAVTLDV 179
F + WR ++V
Sbjct 122 PAGSWFRRESGVWRAHALIEV 142
>gi|297154768|gb|ADI04480.1| hypothetical protein SBI_01359 [Streptomyces bingchenggensis
BCW-1]
Length=54
Score = 53.9 bits (128), Expect = 8e-06, Method: Compositional matrix adjust.
Identities = 27/54 (50%), Positives = 33/54 (62%), Gaps = 0/54 (0%)
Query 126 LRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
+ L V GGV TD +L GA PKAV+L+ L F +G GWRC+VTLDV
Sbjct 1 MELTAVPGGVRACLHMTDTGSLRATGAAPKAVTLHGLEFGRGPDGWRCSVTLDV 54
>gi|167042797|gb|ABZ07515.1| putative protein of unknown function DUF101 [uncultured marine
microorganism HF4000_ANIW137I15]
Length=142
Score = 48.1 bits (113), Expect = 6e-04, Method: Compositional matrix adjust.
Identities = 40/141 (29%), Positives = 60/141 (43%), Gaps = 8/141 (5%)
Query 44 SSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDD 103
S+G R P TAD+ IE W P +A+ G D + + R++ + +D
Sbjct 5 SAGFRLSPTTADVAIETWGPDAKDIFLEAIRGLFSVMCDPGTVEPRRSIPCRVSGENAED 64
Query 104 LLVAVLEEVIYLLDTVGETPVDLRLR-----DVDGGVDVTFATTDASTLVQVGAVPKAVS 158
L L E IYL + G D+R+ V+G + + L G KAV+
Sbjct 65 LFFRWLNEAIYLHEIHGFLACDVRISRWTNTQVEGLLIGEAVDPERHAL---GLEVKAVT 121
Query 159 LNELRFSQGRHGWRCAVTLDV 179
L+ L+ GWR V +DV
Sbjct 122 LHRLKVVHEPGGWRAYVIVDV 142
>gi|322420995|ref|YP_004200218.1| hypothetical protein GM18_3508 [Geobacter sp. M18]
gi|320127382|gb|ADW14942.1| protein of unknown function DUF101 [Geobacter sp. M18]
Length=143
Score = 47.8 bits (112), Expect = 7e-04, Method: Compositional matrix adjust.
Identities = 42/135 (32%), Positives = 63/135 (47%), Gaps = 12/135 (8%)
Query 54 ADLRIEAWAPTRDGCIRQAVLGT----VESFLDLESAHAVHTRLRRLTADRDDDLLVAVL 109
AD+ +AWAPT + + A T VE+ DL A + +L + + D+ LL L
Sbjct 12 ADIAFDAWAPTLEELFQDAARATMQVMVENLPDLRPAQTLEVKLEQ---ENDEMLLFDFL 68
Query 110 EEVIYLLDT--VGETPVDLRLRDVDGGVDV--TFATTDAS-TLVQVGAVPKAVSLNELRF 164
E+I+ D + P +L + D GV + T A + T Q+ KAV++ +
Sbjct 69 NELIFYKDARRLILLPTELSILPGDSGVTLRATLAGEEIDVTRHQMNTDVKAVTMLRYKV 128
Query 165 SQGRHGWRCAVTLDV 179
Q GWR V LDV
Sbjct 129 EQVPQGWRATVVLDV 143
>gi|327400701|ref|YP_004341540.1| hypothetical protein Arcve_0809 [Archaeoglobus veneficus SNP6]
gi|327316209|gb|AEA46825.1| protein of unknown function DUF101 [Archaeoglobus veneficus SNP6]
Length=137
Score = 46.6 bits (109), Expect = 0.001, Method: Compositional matrix adjust.
Identities = 40/143 (28%), Positives = 61/143 (43%), Gaps = 18/143 (12%)
Query 47 HRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLV 106
+R + HTAD+ E + + + I A L E+F E R + D +D LL
Sbjct 3 YRFIDHTADIAFEVFGSSIEELIENATLAFCEAFAYSEKIEGEVEREVEVEGDAEDMLLY 62
Query 107 AVLEEVIYLLDT----------VGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKA 156
L E++YL DT + E L+ R V G +T V+V PKA
Sbjct 63 HWLNELLYLFDTEFFAAKQAKAIVEGDGMLKARGVLKGGKLT------PEAVKVE--PKA 114
Query 157 VSLNELRFSQGRHGWRCAVTLDV 179
++L+ R + GW V +D+
Sbjct 115 ITLHNYRVEKRNGGWYAFVVVDI 137
>gi|320160546|ref|YP_004173770.1| hypothetical protein ANT_11360 [Anaerolinea thermophila UNI-1]
gi|319994399|dbj|BAJ63170.1| hypothetical protein ANT_11360 [Anaerolinea thermophila UNI-1]
Length=139
Score = 46.2 bits (108), Expect = 0.002, Method: Compositional matrix adjust.
Identities = 41/144 (29%), Positives = 62/144 (44%), Gaps = 8/144 (5%)
Query 38 AGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLT 97
A +P T G + HTAD + WAP +QA G + + +ES V L LT
Sbjct 2 ASEPKTLMGFEEIEHTADWSLRVWAPDLPTFFKQAATGMLH-LMGVESLD-VPAVLSDLT 59
Query 98 ADRDDD--LLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPK 155
+ DD LLVA L E+++LL+ + P L+ + A + L + K
Sbjct 60 VEGDDPEALLVAFLTEILFLLEK-SKVPRSFHLKFEGTKLK---AQLECVPLKSLAKEIK 115
Query 156 AVSLNELRFSQGRHGWRCAVTLDV 179
AV+ + L + +T DV
Sbjct 116 AVTFHNLSIQKQNGVLEATITFDV 139
>gi|73668780|ref|YP_304795.1| archease family protein [Methanosarcina barkeri str. Fusaro]
gi|121718682|sp|Q46D27.1|ARCH_METBF RecName: Full=Protein archease
gi|72395942|gb|AAZ70215.1| archease family protein [Methanosarcina barkeri str. Fusaro]
Length=146
Score = 46.2 bits (108), Expect = 0.002, Method: Compositional matrix adjust.
Identities = 37/145 (26%), Positives = 65/145 (45%), Gaps = 18/145 (12%)
Query 47 HRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLV 106
+ + HTAD++ +A+ TR+ A L +D + R L + + LLV
Sbjct 8 YEYLDHTADIKFQAYGKTREEVFENAALAMFNVIIDTKKISGDTAREIFLKSPDLESLLV 67
Query 107 AVLEEVIYLLD----TVGETPVDLRLRDVDGGVDVTFATTDASTL---VQVGAVP----- 154
L E++YL + E VD +R+ +G +T A L + ++P
Sbjct 68 DWLSELLYLFEVDEIVFREFRVD-NIREENGEYSIT-----AQALGEKYDLKSLPFETEI 121
Query 155 KAVSLNELRFSQGRHGWRCAVTLDV 179
KAV+ N+L ++ GW+ V +D+
Sbjct 122 KAVTYNQLEITKTADGWKAQVVVDI 146
>gi|328952466|ref|YP_004369800.1| protein of unknown function DUF101 [Desulfobacca acetoxidans
DSM 11109]
gi|328452790|gb|AEB08619.1| protein of unknown function DUF101 [Desulfobacca acetoxidans
DSM 11109]
Length=143
Score = 44.7 bits (104), Expect = 0.006, Method: Compositional matrix adjust.
Identities = 39/139 (29%), Positives = 62/139 (45%), Gaps = 6/139 (4%)
Query 45 SGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDL 104
SG RS+ HTAD+ + + T Q +D A+ +R+ + A + L
Sbjct 7 SGFRSIAHTADVGFKLYGETLADIFVQGAHALYGMMVDRRRLRALESRMVAVDAPDREAL 66
Query 105 LVAVLEEVIYLLDTVG--ETPVD-LRLRDVDGGVDVTFATTDASTL-VQVGAVPKAVSLN 160
L+A L ++YL DT G +D L L DV+ G + D ++ G KA + +
Sbjct 67 LIAWLNHLLYLFDTTGFLGKQIDILDLSDVNLGARMQGEKLDPERHDLKTGV--KAATYH 124
Query 161 ELRFSQGRHGWRCAVTLDV 179
+L Q + GW V D+
Sbjct 125 KLAVRQTQAGWEATVIFDL 143
>gi|328949729|ref|YP_004367064.1| protein of unknown function DUF101 [Marinithermus hydrothermalis
DSM 14884]
gi|328450053|gb|AEB10954.1| protein of unknown function DUF101 [Marinithermus hydrothermalis
DSM 14884]
Length=139
Score = 41.2 bits (95), Expect = 0.060, Method: Compositional matrix adjust.
Identities = 39/138 (29%), Positives = 53/138 (39%), Gaps = 7/138 (5%)
Query 48 RSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVA 107
R + HTAD+ E APTR+ A G + TR L A + LLV
Sbjct 3 RMLDHTADVGFEVRAPTREALFETAREGLLRLMFQHPPTQGSATREVTLQAPDLEFLLVR 62
Query 108 VLEEVIYLLDTVGETPVDLRLRDVDGGVDVTF------ATTDASTLVQVGAVPKAVSLNE 161
L E+IYL+ T G P + + T DA + G + K+ + +
Sbjct 63 WLNELIYLVQTAGFVPARAAITITEDAQGFTLRARLEGQPFDAEAMGWQGEI-KSATFHG 121
Query 162 LRFSQGRHGWRCAVTLDV 179
L GW V LDV
Sbjct 122 LEVRPETSGWWARVVLDV 139
>gi|295798135|emb|CAX68976.1| conserved hypothetical protein of unknown function DUF101 [uncultured
bacterium]
Length=141
Score = 41.2 bits (95), Expect = 0.060, Method: Compositional matrix adjust.
Identities = 36/138 (27%), Positives = 61/138 (45%), Gaps = 11/138 (7%)
Query 50 VPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVL 109
+ HTAD+ I +A +R+ + G L E +A + + L + + D+LLV L
Sbjct 7 IEHTADIGIRVFARSREELFANSAEGLFSLLLKKEQTNASNIKKVTLKSPKLDELLVQWL 66
Query 110 EEVIYLLDTVGETPVDL--RLRDVDGGVDVTFATT------DASTLVQVGAVPKAVSLNE 161
E++ L P L + DG + + AT D+ TL + KA + +
Sbjct 67 NELLSLFYAEAFVPAKFSASLAETDGTLTLN-ATVEGDEIEDSVTLAKTEV--KAATYHR 123
Query 162 LRFSQGRHGWRCAVTLDV 179
L+ Q + G++ V DV
Sbjct 124 LKVEQFKGGYKAEVIFDV 141
>gi|320449451|ref|YP_004201547.1| hypothetical protein TSC_c03600 [Thermus scotoductus SA-01]
gi|320149620|gb|ADW20998.1| hypothetical protein TSC_c03600 [Thermus scotoductus SA-01]
Length=139
Score = 40.8 bits (94), Expect = 0.074, Method: Compositional matrix adjust.
Identities = 39/135 (29%), Positives = 62/135 (46%), Gaps = 10/135 (7%)
Query 52 HTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDD--DLLVAVL 109
HTAD+ A + +G + A+ G ++ +R RRL+ + +D LLV L
Sbjct 8 HTADVGFLLEAESLEGLFQAALKGLLQVMFLFPPEGG--SRRRRLSLEAEDLETLLVRFL 65
Query 110 EEVIYLLDTVGETPVDLRLR--DVDGGVDVT---FATTDASTLVQVGAVPKAVSLNELRF 164
E+IYL+ T G P R+R GG +T + + G V K+ + + L+
Sbjct 66 NELIYLIQTKGFVPGKARVRVQKEAGGYRLTATLWGEPFQESFGFQGEV-KSATFHGLQV 124
Query 165 SQGRHGWRCAVTLDV 179
S+ W+ V LDV
Sbjct 125 SRENGAWKAQVILDV 139
>gi|284162832|ref|YP_003401455.1| hypothetical protein Arcpr_1737 [Archaeoglobus profundus DSM
5631]
gi|284012829|gb|ADB58782.1| protein of unknown function DUF101 [Archaeoglobus profundus DSM
5631]
Length=138
Score = 40.4 bits (93), Expect = 0.11, Method: Compositional matrix adjust.
Identities = 32/147 (22%), Positives = 55/147 (38%), Gaps = 25/147 (17%)
Query 47 HRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLV 106
+R + HTAD+ E + + + I A E+F+ E + AD D LL
Sbjct 3 YRFIDHTADVAFEVFGNSLEELIENATYAFYEAFVYTEKLDENRVLNVNVEADSPDYLLY 62
Query 107 AVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQ--------------VGA 152
L +++ DT + GG V F + +++ V
Sbjct 63 NWLSKLLIAFDT-----------EFFGGKTVEFVKVEEGEILKATGKIRGGTLRPEIVKV 111
Query 153 VPKAVSLNELRFSQGRHGWRCAVTLDV 179
PKA++L+ + GW V +D+
Sbjct 112 EPKAITLHNFVVEKKNGGWYAYVVVDI 138
>gi|322793517|gb|EFZ17043.1| hypothetical protein SINV_12010 [Solenopsis invicta]
Length=151
Score = 40.4 bits (93), Expect = 0.12, Method: Compositional matrix adjust.
Identities = 22/72 (31%), Positives = 36/72 (50%), Gaps = 4/72 (5%)
Query 47 HRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDD--DL 104
+ + HTAD+++ AW T D Q + DLE A T++ + A+ DD L
Sbjct 12 YEYLDHTADVQLHAWGDTLDEAFEQCAMAMFGYMTDLERVQA--TQVHYIEAEGDDMESL 69
Query 105 LVAVLEEVIYLL 116
L L+E++Y+
Sbjct 70 LFHFLDELLYMF 81
>gi|225709956|gb|ACO10824.1| archease [Caligus rogercresseyi]
Length=167
Score = 40.0 bits (92), Expect = 0.15, Method: Compositional matrix adjust.
Identities = 22/71 (31%), Positives = 35/71 (50%), Gaps = 0/71 (0%)
Query 44 SSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDD 103
+SG+ + HTAD++I AW+ T I +A LG DL + A + + +
Sbjct 16 TSGYEYLDHTADVQIHAWSSTLREAICEAALGVYNYMTDLSAVTASEDLILKAQGHDLES 75
Query 104 LLVAVLEEVIY 114
LL L+E +Y
Sbjct 76 LLYNFLDECLY 86
>gi|119719189|ref|YP_919684.1| hypothetical protein Tpen_0271 [Thermofilum pendens Hrk 5]
gi|119524309|gb|ABL77681.1| protein of unknown function DUF101 [Thermofilum pendens Hrk 5]
Length=165
Score = 39.7 bits (91), Expect = 0.17, Method: Compositional matrix adjust.
Identities = 26/90 (29%), Positives = 40/90 (45%), Gaps = 0/90 (0%)
Query 31 ALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVH 90
A+ + V G GT G +PHTAD+ + A + + A +G E + S +
Sbjct 11 AVKKGVVEGMSGTCGGFELLPHTADIMVRAKGRSIEEAFSNAAVGMYEVMTSVASIEPLE 70
Query 91 TRLRRLTADRDDDLLVAVLEEVIYLLDTVG 120
R ++LL LE ++ LLDT G
Sbjct 71 DREVVAEGFDLENLLYNFLENLLVLLDTEG 100
>gi|154151334|ref|YP_001404952.1| hypothetical protein Mboo_1793 [Candidatus Methanoregula boonei
6A8]
gi|153999886|gb|ABS56309.1| protein of unknown function DUF101 [Methanoregula boonei 6A8]
Length=135
Score = 39.7 bits (91), Expect = 0.19, Method: Compositional matrix adjust.
Identities = 30/87 (35%), Positives = 44/87 (51%), Gaps = 2/87 (2%)
Query 47 HRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLV 106
+ + HTAD++I A APTR+G +A ++ + V TR L AD LL
Sbjct 3 YEELSHTADVKIRARAPTREGLFEEAFRALMQVMYGEDRIGNV-TRTIDLCADDPQSLLC 61
Query 107 AVLEEVIYLLDTVGETPVDLRLRDVDG 133
L EV+Y+ + G D R+R +DG
Sbjct 62 DFLSEVLYVSEVDGLVFRDARVR-LDG 87
>gi|225712042|gb|ACO11867.1| archease [Lepeophtheirus salmonis]
gi|290561178|gb|ADD37991.1| Protein archease [Lepeophtheirus salmonis]
Length=161
Score = 39.7 bits (91), Expect = 0.19, Method: Compositional matrix adjust.
Identities = 27/84 (33%), Positives = 39/84 (47%), Gaps = 5/84 (5%)
Query 34 RCVQAGKPGTSS-GHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTR 92
Q P S+ G+ + HTAD++I +WA T I Q+ LG DL+ +
Sbjct 5 HATQESDPDRSTVGYEYLDHTADVQIHSWASTLREAIEQSALGLFNYMTDLQKVES--KS 62
Query 93 LRRLTADRDD--DLLVAVLEEVIY 114
+ L AD D LL L+E +Y
Sbjct 63 ILVLKADGHDLESLLYNFLDECLY 86
>gi|150003078|ref|YP_001297822.1| hypothetical protein BVU_0490 [Bacteroides vulgatus ATCC 8482]
gi|149931502|gb|ABR38200.1| conserved hypothetical protein [Bacteroides vulgatus ATCC 8482]
Length=783
Score = 39.3 bits (90), Expect = 0.24, Method: Composition-based stats.
Identities = 32/119 (27%), Positives = 53/119 (45%), Gaps = 11/119 (9%)
Query 32 LARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHT 91
L RC A G G VP+ + W +G IR + G + ++ L + H ++
Sbjct 122 LKRCQDAAGDGYLCG---VPNGRKM----WKEIEEGNIRASGFGLNDRWVPLYNIHKIYA 174
Query 92 RLRRLTADRDD----DLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST 146
LR T D ++LV + + +I L+ + + + LR GG++ TFA A T
Sbjct 175 GLRDATLQTDSREAKEMLVKLTDWMIRLVSKLSDEQIQEMLRSEHGGLNETFADVAAIT 233
>gi|21227639|ref|NP_633561.1| hypothetical protein MM_1537 [Methanosarcina mazei Go1]
gi|23396956|sp|Q8PWN9.1|ARCH_METMA RecName: Full=Protein archease
gi|20906029|gb|AAM31233.1| conserved protein [Methanosarcina mazei Go1]
Length=146
Score = 39.3 bits (90), Expect = 0.24, Method: Compositional matrix adjust.
Identities = 34/158 (22%), Positives = 55/158 (35%), Gaps = 32/158 (20%)
Query 41 PGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADR 100
P + + HTAD++ A+ T + A L +D E R LT+
Sbjct 2 PSQGKKYEYLEHTADIKFLAYGETVEEVFENAALAMFNVIIDTEKVSGETEREVLLTSPD 61
Query 101 DDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVP------ 154
+ LLV L E++YL + VD V F + ++
Sbjct 62 LESLLVDWLSELLYLFE-------------VDEVVFWKFQVEEIREEEGEYSIKALASGE 108
Query 155 -------------KAVSLNELRFSQGRHGWRCAVTLDV 179
KAV+ N+L + GW+ V +D+
Sbjct 109 KYYPESHPFETEIKAVTYNQLELEKTAGGWKAQVVVDI 146
>gi|254882357|ref|ZP_05255067.1| acetyl-CoA carboxylase [Bacteroides sp. 4_3_47FAA]
gi|319640591|ref|ZP_07995310.1| hypothetical protein HMPREF9011_00907 [Bacteroides sp. 3_1_40A]
gi|254835150|gb|EET15459.1| acetyl-CoA carboxylase [Bacteroides sp. 4_3_47FAA]
gi|317387761|gb|EFV68621.1| hypothetical protein HMPREF9011_00907 [Bacteroides sp. 3_1_40A]
Length=783
Score = 38.9 bits (89), Expect = 0.30, Method: Composition-based stats.
Identities = 32/119 (27%), Positives = 53/119 (45%), Gaps = 11/119 (9%)
Query 32 LARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHT 91
L RC A G G VP+ + W +G IR + G + ++ L + H ++
Sbjct 122 LKRCQDAAGDGYLCG---VPNGRKM----WKEIEEGNIRASGFGLNDRWVPLYNIHKIYA 174
Query 92 RLRRLTADRDD----DLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST 146
LR T D ++LV + + +I L+ + + + LR GG++ TFA A T
Sbjct 175 GLRDATLQTDSREAKEMLVKLTDWMIRLVSKLSDEQIQDMLRSEHGGLNETFADVAAIT 233
>gi|55981714|ref|YP_145011.1| hypothetical protein TTHA1745 [Thermus thermophilus HB8]
gi|55773127|dbj|BAD71568.1| conserved hypothetical protein [Thermus thermophilus HB8]
Length=139
Score = 38.9 bits (89), Expect = 0.30, Method: Compositional matrix adjust.
Identities = 38/137 (28%), Positives = 59/137 (44%), Gaps = 6/137 (4%)
Query 48 RSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVA 107
R + HTAD+ A + + + A+ G ++ R RL A+ + LLV
Sbjct 4 RPLDHTADVGFALEAQSLEELFQAALKGLLDVMFTAPPQGGRKRRHLRLFAEDLETLLVR 63
Query 108 VLEEVIYLLDTVGETPVDLRLR--DVDGGVDVT---FATTDASTLVQVGAVPKAVSLNEL 162
L E+IYL+ T G P R+R + +GG +T F G V K+ + + L
Sbjct 64 FLNELIYLIQTKGFVPGRARIRVEEEEGGYRLTATLFGEPFQERFGFQGEV-KSATFHGL 122
Query 163 RFSQGRHGWRCAVTLDV 179
+ W+ V LDV
Sbjct 123 SVRKEDGRWKAQVILDV 139
>gi|294775898|ref|ZP_06741397.1| conserved hypothetical protein [Bacteroides vulgatus PC510]
gi|294450267|gb|EFG18768.1| conserved hypothetical protein [Bacteroides vulgatus PC510]
Length=783
Score = 38.5 bits (88), Expect = 0.36, Method: Composition-based stats.
Identities = 32/119 (27%), Positives = 53/119 (45%), Gaps = 11/119 (9%)
Query 32 LARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHT 91
L RC A G G VP+ + W DG IR + G + ++ L + H ++
Sbjct 122 LKRCQDAAGDGYLCG---VPNGRKM----WKEIEDGNIRASGFGLNDRWVPLYNIHKIYA 174
Query 92 RLRRLTADRDD----DLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDAST 146
LR T + ++LV + + +I L+ + + + LR GG++ TFA A T
Sbjct 175 GLRDATLQTGNKEAKEMLVKLTDWMIRLVSKLSDEQIQDMLRSEHGGLNETFADVAAIT 233
>gi|303277509|ref|XP_003058048.1| predicted protein [Micromonas pusilla CCMP1545]
gi|226460705|gb|EEH57999.1| predicted protein [Micromonas pusilla CCMP1545]
Length=143
Score = 38.5 bits (88), Expect = 0.39, Method: Compositional matrix adjust.
Identities = 22/67 (33%), Positives = 33/67 (50%), Gaps = 0/67 (0%)
Query 52 HTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVAVLEE 111
HTAD++I AW T + C +A LG + L+ +R R A LL A L+E
Sbjct 8 HTADIQIHAWGATVEECFARAALGMFDYMTPLDKLSEGASRARDEGAHDAHSLLFAFLDE 67
Query 112 VIYLLDT 118
+++ T
Sbjct 68 LLFHFHT 74
>gi|253701867|ref|YP_003023056.1| hypothetical protein GM21_3272 [Geobacter sp. M21]
gi|251776717|gb|ACT19298.1| protein of unknown function DUF101 [Geobacter sp. M21]
Length=143
Score = 38.5 bits (88), Expect = 0.39, Method: Compositional matrix adjust.
Identities = 36/132 (28%), Positives = 53/132 (41%), Gaps = 6/132 (4%)
Query 54 ADLRIEAWAPTRDGCIRQAVLGTVESFLD-LESAHAVHTRLRRLTADRDDDLLVAVLEEV 112
AD+ +AWA T + R A TV+ + LE T +L + ++ LL L E+
Sbjct 12 ADVAFDAWAKTLEELFRDAARATVQVMAENLEGIRRTQTVEVKLIQENEEMLLFDFLNEL 71
Query 113 IYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLV-----QVGAVPKAVSLNELRFSQG 167
I+ D + L V G V T + ++ KAV++ Q
Sbjct 72 IFYKDARRLLLLPAELTIVRGATSVKLGGTLQGEEIDPARHEMNTDVKAVTMLRYAVEQT 131
Query 168 RHGWRCAVTLDV 179
GWR V LDV
Sbjct 132 DEGWRATVVLDV 143
>gi|77917636|ref|YP_355451.1| hypothetical protein Pcar_0018 [Pelobacter carbinolicus DSM 2380]
gi|77543719|gb|ABA87281.1| conserved hypothetical protein [Pelobacter carbinolicus DSM 2380]
Length=159
Score = 38.5 bits (88), Expect = 0.42, Method: Compositional matrix adjust.
Identities = 42/147 (29%), Positives = 68/147 (47%), Gaps = 10/147 (6%)
Query 38 AGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLR-RL 96
A + G ++ R + HTAD+ IEA A + + QA G + + A ++ +
Sbjct 18 AKRKGGNAAFRLLEHTADMGIEARASSCEELFVQAARGMLAVLAGQADSTAPPKKITLEV 77
Query 97 TADRDDDLLVAVLEEVIYLLDTVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVP-- 154
A ++LLV L E++YL+ + G P D+ L + D+ A T+ + VP
Sbjct 78 RAGDVEELLVVWLNELLYLIQSKGLWPRDIVLSGMQP--DLLEARL---TVAPLAGVPQR 132
Query 155 --KAVSLNELRFSQGRHGWRCAVTLDV 179
KAV+ + L S WR V LD+
Sbjct 133 EIKAVTYHHLLVSCFHGLWRGRVYLDL 159
>gi|344287554|ref|XP_003415518.1| PREDICTED: protein archease-like [Loxodonta africana]
Length=180
Score = 38.5 bits (88), Expect = 0.44, Method: Compositional matrix adjust.
Identities = 20/81 (25%), Positives = 34/81 (42%), Gaps = 0/81 (0%)
Query 34 RCVQAGKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRL 93
+ V+A P + + + HTAD+++ AW T + Q + D + +HT
Sbjct 31 KAVKAKYPPVNQKYEYLDHTADVQLHAWGDTLEEAFEQCAMAMFGYMTDTGTVEPLHTVE 90
Query 94 RRLTADRDDDLLVAVLEEVIY 114
D LL L+E +Y
Sbjct 91 VETQGDDLQSLLFHFLDEWLY 111
>gi|171185173|ref|YP_001794092.1| hypothetical protein Tneu_0707 [Thermoproteus neutrophilus V24Sta]
gi|226708028|sp|B1YCY3.1|ARCH_THENV RecName: Full=Protein archease
gi|170934385|gb|ACB39646.1| protein of unknown function DUF101 [Thermoproteus neutrophilus
V24Sta]
Length=147
Score = 38.5 bits (88), Expect = 0.46, Method: Compositional matrix adjust.
Identities = 36/149 (25%), Positives = 60/149 (41%), Gaps = 13/149 (8%)
Query 39 GKPGTSSGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTA 98
GKP + +R HTAD+ I+A+ T + A + E H+R +
Sbjct 4 GKP---ADYRYGEHTADVLIQAYGCTLEEAFVNAAVALAEVTYSTSKVEPKHSREVEVEY 60
Query 99 DRDDDLLVAVLEEVIYLLDTVG---ETPVDLRLRDVDGGVDVTFATTDASTLVQ-----V 150
D + LL ++E++YL D +DL+L DGG + A Q
Sbjct 61 DDLEGLLFKWIDELLYLFDAEKFAISRKIDLKLEK-DGGYRIK-AVLYGDIYSQEKHGFT 118
Query 151 GAVPKAVSLNELRFSQGRHGWRCAVTLDV 179
G + KA++ + + Q W +D+
Sbjct 119 GLIVKAMTFHMMEIRQIGDYWMVQYVVDI 147
>gi|19075632|ref|NP_588132.1| UTP-glucose-1-phosphate uridylyltransferase (predicted) [Schizosaccharomyces
pombe 972h-]
gi|12231053|sp|P78811.2|UGPA1_SCHPO RecName: Full=Probable UTP--glucose-1-phosphate uridylyltransferase;
AltName: Full=UDP-glucose pyrophosphorylase; Short=UDPGP;
Short=UGPase
gi|4176544|emb|CAA22857.1| UTP-glucose-1-phosphate uridylyltransferase (predicted) [Schizosaccharomyces
pombe]
Length=506
Score = 38.1 bits (87), Expect = 0.54, Method: Compositional matrix adjust.
Identities = 42/192 (22%), Positives = 83/192 (44%), Gaps = 38/192 (19%)
Query 6 DHINPPRPRGLDVPCARLRATNPLRALARCV---QAGKPGTSSGHRS------------V 50
+H+N R ++VP + + N A A+ + +A K + ++S V
Sbjct 149 EHLN--RKYNVNVPFVLMNSFNTDEATAKVIKKYEAHKIDILTFNQSRYPRVHKETLLPV 206
Query 51 PHTADLRIEAWAPTRDGCIRQAVL--GTVESFLDLESAHAVHTRLRRLTADRDDDLLVAV 108
PHTAD I+ W P G + +A+ G +++ + + + + L A D ++L +
Sbjct 207 PHTADSAIDEWYPPGHGDVFEALTNSGIIDTLIAQGKEYLFVSNIDNLGAVVDLNILNHM 266
Query 109 LE-EVIYLLDTVGETPVDLR---LRDVDGGVDVTFATTDASTLVQVGAVP-----KAVSL 159
+E YL++ +T D++ L D DG V L+++ VP + S+
Sbjct 267 VETNAEYLMELTNKTKADVKGGTLIDYDGNV----------RLLEIAQVPPQHVEEFKSI 316
Query 160 NELRFSQGRHGW 171
+ ++ + W
Sbjct 317 KKFKYFNTNNLW 328
>gi|70999021|ref|XP_754232.1| APSES transcription factor [Aspergillus fumigatus Af293]
gi|66851869|gb|EAL92194.1| APSES transcription factor, putative [Aspergillus fumigatus Af293]
gi|159127250|gb|EDP52365.1| APSES transcription factor, putative [Aspergillus fumigatus A1163]
Length=668
Score = 38.1 bits (87), Expect = 0.56, Method: Compositional matrix adjust.
Identities = 37/134 (28%), Positives = 55/134 (42%), Gaps = 14/134 (10%)
Query 6 DHINPPRPRGLDVPCARLRATNPLRALARCVQAGKPGTSSGHRSVPHTADLRIEAWAPTR 65
DH PP P+ +R RA+ + A KP + G S PH + E
Sbjct 81 DHTPPPAPKHTSAASSRPRASKKKAVNEQVFSAAKPIRNMGPPSFPHE---QFEINPGYD 137
Query 66 DG-CIRQAVLGTVESFLDLE----SAHAVHTRLRRLTADRDDDLLVAVLEEVIY---LLD 117
D I QA L + D E S H ++R R+ + + ++ E ++Y LLD
Sbjct 138 DNESIEQATLESSSMAADEEMMSMSQHGAYSRKRKREMNEVTAMSISEQEHILYGDQLLD 197
Query 118 ---TVGETPVDLRL 128
TVG+ P R+
Sbjct 198 YFMTVGDAPEATRI 211
>gi|197117378|ref|YP_002137805.1| hypothetical protein Gbem_0988 [Geobacter bemidjiensis Bem]
gi|197086738|gb|ACH38009.1| protein of unknown function DUF101 [Geobacter bemidjiensis Bem]
Length=143
Score = 37.7 bits (86), Expect = 0.62, Method: Compositional matrix adjust.
Identities = 38/138 (28%), Positives = 58/138 (43%), Gaps = 18/138 (13%)
Query 54 ADLRIEAWAPTRDGCIRQAVLGTVESFLD-LESAHAVHTRLRRLTADRDDDLLVAVLEEV 112
AD+ +AWA T + R A TV+ + LE T +LT + ++ LL L E+
Sbjct 12 ADVAFDAWAKTLEELFRDAARATVQVMAENLEGIRRSQTVEVKLTQENEEMLLFDFLNEL 71
Query 113 IYLLD-----------TVGETPVDLRLRDVDGGVDVTFATTDASTLVQVGAVPKAVSLNE 161
I+ D T+ + LR G ++ A + +T V KAV++
Sbjct 72 IFYKDARRLLLLPAELTIMRGESGVELRGTLQGEEIDPARHEMNTDV------KAVTMLR 125
Query 162 LRFSQGRHGWRCAVTLDV 179
+ GWR V LDV
Sbjct 126 YAVEKTDEGWRATVVLDV 143
>gi|300710833|ref|YP_003736647.1| hypothetical protein HacjB3_07350 [Halalkalicoccus jeotgali B3]
gi|299124516|gb|ADJ14855.1| hypothetical protein HacjB3_07350 [Halalkalicoccus jeotgali B3]
Length=137
Score = 37.7 bits (86), Expect = 0.63, Method: Compositional matrix adjust.
Identities = 43/131 (33%), Positives = 61/131 (47%), Gaps = 6/131 (4%)
Query 52 HTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLR-RLTADRDDDLLVAVLE 110
HTAD+ I A PT D LG + +S A R L A+ + LL L+
Sbjct 10 HTADVGIGASGPTLDSVF--GALGDGLAAAQCDSIPADGERFSFSLIAESREALLFDYLD 67
Query 111 EVIYLLDTVGETPVDLRLRDVDGGVDVTF-ATTDASTLVQVGAVP-KAVSLNELRFSQGR 168
++IY D PVD R+ +D G + A++ L V A KAV+ +E+R +
Sbjct 68 QLIYERDVRLVLPVDNRIT-IDPGEEWRLDASSRGVPLEAVEAREVKAVTYSEMRIEEVE 126
Query 169 HGWRCAVTLDV 179
+GW V LDV
Sbjct 127 NGWEAYVVLDV 137
>gi|332016564|gb|EGI57445.1| Protein archease-like protein [Acromyrmex echinatior]
Length=157
Score = 37.7 bits (86), Expect = 0.72, Method: Compositional matrix adjust.
Identities = 20/74 (28%), Positives = 36/74 (49%), Gaps = 4/74 (5%)
Query 45 SGHRSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDD-- 102
+ + + HTAD+++ AW T D Q + DLE T++ + A+ DD
Sbjct 16 AKYEYLDHTADVQLHAWGDTMDEAFEQCAMAMFGYMTDLERVQV--TQVHHIEAEGDDME 73
Query 103 DLLVAVLEEVIYLL 116
LL L+E++++
Sbjct 74 SLLFHFLDELLFMF 87
>gi|218294652|ref|ZP_03495506.1| protein of unknown function DUF101 [Thermus aquaticus Y51MC23]
gi|218244560|gb|EED11084.1| protein of unknown function DUF101 [Thermus aquaticus Y51MC23]
Length=139
Score = 37.4 bits (85), Expect = 0.97, Method: Compositional matrix adjust.
Identities = 24/76 (32%), Positives = 36/76 (48%), Gaps = 0/76 (0%)
Query 48 RSVPHTADLRIEAWAPTRDGCIRQAVLGTVESFLDLESAHAVHTRLRRLTADRDDDLLVA 107
R + HTAD+ E A + +G + A+ G ++ R L A+ + LLV
Sbjct 4 RPLDHTADVGFELEAESLEGLFQAALAGLLQVMFQNPPQRGKRRRRVVLEAEDLETLLVR 63
Query 108 VLEEVIYLLDTVGETP 123
L E+IYL+ T G P
Sbjct 64 YLNELIYLIQTKGFVP 79
Lambda K H
0.322 0.137 0.416
Gapped
Lambda K H
0.267 0.0410 0.140
Effective search space used: 158760884352
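The gapped Lambda and K above convert each hit's raw score S into the bit score it reports, using S' = (Lambda*S - ln K)/ln 2. The effective search space then gives an expectation, E ~ m'n' * 2^(-S'). The sketch below reproduces this arithmetic. The constants are copied from this report, and the function names are illustrative.

```python
import math

# Gapped Karlin-Altschul parameters from the footer of this report
GAPPED_LAMBDA = 0.267
GAPPED_K = 0.0410
# "Effective search space used" from the same footer
SEARCH_SPACE = 158_760_884_352


def bit_score(raw: int, lam: float = GAPPED_LAMBDA, k: float = GAPPED_K) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def expect(bits: float, space: float = SEARCH_SPACE) -> float:
    """Expected chance hits: E = m'n' * 2^(-S'), with m'n' the effective search space."""
    return space * 2.0 ** (-bits)


if __name__ == "__main__":
    # Raw scores that appear in the hits above
    for raw in (91, 90, 89, 88, 87, 86, 85):
        s = bit_score(raw)
        print(f"raw {raw:3d} -> {s:5.1f} bits, E ~ {expect(s):.2f}")
```

For a raw score of 91 this gives about 39.7 bits, which matches the hits above, and E of roughly 0.18. The E-values reported for that score range from 0.17 to 0.19. The small difference presumably comes from the per-hit composition-based adjustments noted on each alignment.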
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
Posted date: Sep 5, 2011 4:36 AM
Number of letters in database: 5,219,829,388
Number of sequences in database: 15,229,318
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40