BLASTP 2.2.25+

Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Reference for composition-based statistics: Alejandro A. Schäffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           15,229,318 sequences; 5,219,829,388 total letters

Query= Rv0962c
Length=224

                                                                      Score    E
Sequences producing significant alignments:                          (Bits)  Value

gi|15608102|ref|NP_215477.1| lipoprotein LprP [Mycobacterium tub...   458    2e-127
gi|15840387|ref|NP_335424.1| putative lipoprotein [Mycobacterium...   456    1e-126
gi|339293968|gb|AEJ46079.1| putative lipoprotein [Mycobacterium ...   424    7e-117
gi|340625975|ref|YP_004744427.1| putative lipoprotein LPRP [Myco...   394    5e-108
gi|289749488|ref|ZP_06508866.1| lipoprotein lprP [Mycobacterium ...   369    3e-100
gi|240169359|ref|ZP_04748018.1| putative lipoprotein [Mycobacter...   361    6e-98
gi|85717436|ref|ZP_01048385.1| ISPsy14, transposase [Nitrobacter...   40.8   0.17
gi|289579403|ref|YP_003478030.1| copper amine oxidase domain pro...   38.5   0.70
gi|13488090|ref|NP_085699.1| oligopeptide ABC transporter ATP-bi...   37.7   1.4
gi|126289989|ref|XP_001364012.1| PREDICTED: transcription factor...   36.6   3.0
gi|170041738|ref|XP_001848610.1| conserved hypothetical protein ...   36.2   4.1
gi|313891814|ref|ZP_07825419.1| outer membrane protein [Dialiste...   35.8   4.7
gi|334128477|ref|ZP_08502365.1| dihydroxyacetone kinase [Centipe...   35.0   7.3
gi|345355686|gb|EGW87895.1| N-ethylmaleimide reductase domain pr...   35.0   7.6

>gi|15608102|ref|NP_215477.1| lipoprotein LprP [Mycobacterium tuberculosis H37Rv]
 gi|148660742|ref|YP_001282265.1| putative lipoprotein LprP [Mycobacterium tuberculosis H37Ra]
 gi|167967735|ref|ZP_02550012.1| putative lipoprotein LprP [Mycobacterium tuberculosis H37Ra]
 gi|307083485|ref|ZP_07492598.1| lipoprotein lprP [Mycobacterium tuberculosis SUMu012]
 gi|2829499|sp|P71548.1|LPRP_MYCTU RecName: Full=Uncharacterized lipoprotein lprP; Flags: Precursor
 gi|1524200|emb|CAB01988.1| POSSIBLE LIPOPROTEIN LPRP [Mycobacterium tuberculosis H37Rv]
 gi|148504894|gb|ABQ72703.1| putative lipoprotein LprP [Mycobacterium tuberculosis H37Ra]
 gi|308366900|gb|EFP55751.1| lipoprotein lprP [Mycobacterium tuberculosis SUMu012]
Length=224

 Score = 458 bits (1179), Expect = 2e-127, Method: Compositional matrix adjust.
 Identities = 224/224 (100%), Positives = 224/224 (100%), Gaps = 0/224 (0%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60
            MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA
Sbjct  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60

Query  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120
            NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW
Sbjct  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120

Query  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180
            LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY
Sbjct  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180

Query  181  NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224
            NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY
Sbjct  181  NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224

>gi|15840387|ref|NP_335424.1| putative lipoprotein [Mycobacterium tuberculosis CDC1551]
 gi|31792151|ref|NP_854644.1| lipoprotein LprP [Mycobacterium bovis AF2122/97]
 gi|121636888|ref|YP_977111.1| putative lipoprotein lprP [Mycobacterium bovis BCG str.
Pasteur 1173P2]
71 more sequence titles
Length=224

 Score = 456 bits (1172), Expect = 1e-126, Method: Compositional matrix adjust.
 Identities = 223/224 (99%), Positives = 223/224 (99%), Gaps = 0/224 (0%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60
            MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA
Sbjct  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60

Query  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120
            NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW
Sbjct  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120

Query  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180
            LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY
Sbjct  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180

Query  181  NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224
            NTGCH PAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY
Sbjct  181  NTGCHLPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224

>gi|339293968|gb|AEJ46079.1| putative lipoprotein [Mycobacterium tuberculosis CCDC5079]
Length=225

 Score = 424 bits (1089), Expect = 7e-117, Method: Compositional matrix adjust.
 Identities = 223/225 (99%), Positives = 223/225 (99%), Gaps = 1/225 (0%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQ-L  59
            MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQ L
Sbjct  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQQL  60

Query  60   ANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQ  119
            ANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQ
Sbjct  61   ANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQ  120

Query  120  WLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYS  179
            WLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYS
Sbjct  121  WLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYS  180

Query  180  YNTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224
            YNTGCH PAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY
Sbjct  181  YNTGCHLPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  225

>gi|340625975|ref|YP_004744427.1| putative lipoprotein LPRP [Mycobacterium canettii CIPT 140010059]
 gi|340004165|emb|CCC43303.1| putative lipoprotein LPRP [Mycobacterium canettii CIPT 140010059]
Length=223

 Score = 394 bits (1012), Expect = 5e-108, Method: Compositional matrix adjust.
 Identities = 208/224 (93%), Positives = 213/224 (96%), Gaps = 1/224 (0%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60
            MKR SRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA
Sbjct  1    MKRLSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60

Query  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120
            NLDATIRAMIAKYSPQT+FS+ +   H   GCNDPFTRTIGRQE S+ FFGRPAPTPQQW
Sbjct  61   NLDATIRAMIAKYSPQTQFSS-LATGHPPGGCNDPFTRTIGRQEESDHFFGRPAPTPQQW  119

Query  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180
            LQIVTELAPVFKAAGFRPN+SVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY
Sbjct  120  LQIVTELAPVFKAAGFRPNDSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  179

Query  181  NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224
            NTGCH PAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY
Sbjct  180  NTGCHLPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  223

>gi|289749488|ref|ZP_06508866.1| lipoprotein lprP [Mycobacterium tuberculosis T92]
 gi|289690075|gb|EFD57504.1| lipoprotein lprP [Mycobacterium tuberculosis T92]
Length=182

 Score = 369 bits (946), Expect = 3e-100, Method: Compositional matrix adjust.
 Identities = 179/181 (99%), Positives = 180/181 (99%), Gaps = 0/181 (0%)

Query  44   KIVNGRPDLETVQQQLANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQ  103
            +IVNGRPDLETVQQQLANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQ
Sbjct  2    EIVNGRPDLETVQQQLANLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQ  61

Query  104  EASELFFGRPAPTPQQWLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGV  163
            EASELFFGRPAPTPQQWLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGV
Sbjct  62   EASELFFGRPAPTPQQWLQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGV  121

Query  164  TINLVNGDNRGPLGYSYNTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDA  223
            TINLVNGDNRGPLGYSYNTGCH PAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDA
Sbjct  122  TINLVNGDNRGPLGYSYNTGCHLPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDA  181

Query  224  Y  224
            Y
Sbjct  182  Y  182

>gi|240169359|ref|ZP_04748018.1| putative lipoprotein [Mycobacterium kansasii ATCC 12478]
Length=223

 Score = 361 bits (926), Expect = 6e-98, Method: Compositional matrix adjust.
 Identities = 176/224 (79%), Positives = 190/224 (85%), Gaps = 1/224 (0%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60
            M R  R  TA LL +  LL GC+KPNT DPYANPGRGELDR Q+IVN RPDLETVQQQLA
Sbjct  1    MNRPHRYATAGLLSVTLLLTGCLKPNTLDPYANPGRGELDRLQQIVNKRPDLETVQQQLA  60

Query  61   NLDATIRAMIAKYSPQTRFSTGVTVSHLTNGCNDPFTRTIGRQEASELFFGRPAPTPQQW  120
            NLDATIRA+IA+YSPQT+FS+  +VSH TNGCNDPF RTIGRQ  S+ FFG PAPTP+QW
Sbjct  61   NLDATIRAVIAEYSPQTKFSS-TSVSHPTNGCNDPFVRTIGRQVGSDHFFGEPAPTPEQW  119

Query  121  LQIVTELAPVFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSY  180
            L+IVTELAPVFKAAGFRPNN+ PGD P PLG+ N SQIRDDGV I LVNGD   PL YSY
Sbjct  120  LRIVTELAPVFKAAGFRPNNAAPGDAPLPLGSANDSQIRDDGVLIRLVNGDEHSPLSYSY  179

Query  181  NTGCHPPAAWRTAPPPLNMRPANDPDVHYPYLYGSPGGRTRDAY  224
            +TGCH PAAWRTAPPPLNMRP NDPDVHYPYLYGSPGGRTRDAY
Sbjct  180  DTGCHLPAAWRTAPPPLNMRPPNDPDVHYPYLYGSPGGRTRDAY  223

>gi|85717436|ref|ZP_01048385.1| ISPsy14, transposase [Nitrobacter sp. Nb-311A]
 gi|85695738|gb|EAQ33647.1| ISPsy14, transposase [Nitrobacter sp.
Nb-311A]
Length=396

 Score = 40.8 bits (94), Expect = 0.17, Method: Compositional matrix adjust.
 Identities = 26/75 (35%), Positives = 36/75 (48%), Gaps = 7/75 (9%)

Query  144  GDPPQPLGAPNYS----QIRDDGVTINLVNGDNR--GPLGYSYNTGCHPPAAW-RTAPPP  196
            G PPQ LG P+++    +++  GVT+ L+  + R   P GY Y   C   AAW R A P
Sbjct  67   GRPPQDLGEPDWARVAQELKRKGVTLTLLWQEYRTAHPEGYGYTWFCERFAAWQRRAHPT  126

Query  197  LNMRPANDPDVHYPY  211
               R A    +   Y
Sbjct  127  FRHRHAAGAVLQTDY  141

>gi|289579403|ref|YP_003478030.1| copper amine oxidase domain protein [Thermoanaerobacter italicus Ab9]
 gi|289529116|gb|ADD03468.1| copper amine oxidase domain protein [Thermoanaerobacter italicus Ab9]
Length=452

 Score = 38.5 bits (88), Expect = 0.70, Method: Compositional matrix adjust.
 Identities = 28/96 (30%), Positives = 44/96 (46%), Gaps = 11/96 (11%)

Query  57   QQLANLDATIRAMI-------AKYSPQTRFSTGVTVSH--LTNGCNDPFTRTIGRQEASE  107
            QQL   ++ +R +I       +KY P+     G  V    L NG    F    G++ AS
Sbjct  27   QQLPKKNSNVRVLIEVDDYKLSKYEPKKGAYLGAYVYQDTLINGDMKKFNELTGKKHASF  86

Query  108  -LFFGRPAPTPQQWLQIVTELAPVFKAAGFRPNNSV  142
             ++ G  +P PQ+W+  + E+      A F PNN +
Sbjct  87   FFIYVGYGSPFPQKWIDQLKEVGAAAHIA-FEPNNGL  121

>gi|13488090|ref|NP_085699.1| oligopeptide ABC transporter ATP-binding protein [Mesorhizobium loti MAFF303099]
 gi|14027948|dbj|BAB54540.1| oligopeptide ABC transporter ATP-binding protein [Mesorhizobium loti MAFF303099]
Length=335

 Score = 37.7 bits (86), Expect = 1.4, Method: Compositional matrix adjust.
 Identities = 48/167 (29%), Positives = 67/167 (41%), Gaps = 32/167 (19%)

Query  6    RSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGR-----PDLETVQQQLA  60
            R  T ALL    L     +P+ F  Y +   G   +RQ+++  R     PDL    + ++
Sbjct  133  REKTQALLSTVGL-----RPDQFASYPHELSG--GQRQRVILARALVLDPDLLVCDEPVS  185

Query  61   NLDATIRAMIAKYSPQTRFSTGVT---VSHLTNGCNDPFTRTI----GR--QEAS--ELF  109
             LD ++ A +       + S G+T   +SH          R I    GR  +EAS  ELF
Sbjct  186  ALDVSVGAQVVNLLKDVQASRGLTYLFISHDLKIVRQIADRVIVMYLGRIMEEASATELF  245

Query  110  FGRPAPTPQQWLQIVTELAPVFKAAGFRPNNS--VPGDPPQPLGAPN  154
                 P  Q  L  V  L P        P     + GDPP P+  PN
Sbjct  246  RNPLHPYTQALLSAVPSLKP-------HPQRRLIIQGDPPNPMEVPN  285

>gi|126289989|ref|XP_001364012.1| PREDICTED: transcription factor IIIB 90 kDa subunit [Monodelphis domestica]
Length=681

 Score = 36.6 bits (83), Expect = 3.0, Method: Composition-based stats.
 Identities = 24/60 (40%), Positives = 31/60 (52%), Gaps = 5/60 (8%)

Query  9    TAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLANLDATIRA  68
            TAA LGI   +  CI P + +P  N G GELD     +NG  D E  +  L   +A I+A
Sbjct  421  TAASLGITESIRECISPQSREPNENSGDGELD-----LNGIDDSEIDRYILNENEAQIKA  475

>gi|170041738|ref|XP_001848610.1| conserved hypothetical protein [Culex quinquefasciatus]
 gi|167865270|gb|EDS28653.1| conserved hypothetical protein [Culex quinquefasciatus]
Length=280

 Score = 36.2 bits (82), Expect = 4.1, Method: Compositional matrix adjust.
 Identities = 26/80 (33%), Positives = 37/80 (47%), Gaps = 8/80 (10%)

Query  105  ASELFFGRPAPTPQQWLQIVTELAPVFKAAGFRPNNSVP-GDPPQPLGAPNYSQIRDDGV  163
            +S  F G+PAP   Q+   V  LAP     G+ P   +P  + P PL   N+  +
Sbjct  170  SSSFFSGKPAPIYNQF--TVQHLAPHLSTLGYNPRFGLPLNNAPSPLLTKNHGPVALGSG  227

Query  164  TINLVNGDN-----RGPLGY  178
            +I +V+G N      G LGY
Sbjct  228  SIGIVHGPNGVALGSGSLGY  247

>gi|313891814|ref|ZP_07825419.1| outer membrane protein [Dialister microaerophilus UPII 345-E]
 gi|329121152|ref|ZP_08249783.1| outer membrane chaperone Skp (OmpH) [Dialister micraerophilus DSM 19965]
 gi|313119808|gb|EFR42995.1| outer membrane protein [Dialister microaerophilus UPII 345-E]
 gi|327471314|gb|EGF16768.1| outer membrane chaperone Skp (OmpH) [Dialister micraerophilus DSM 19965]
Length=144

 Score = 35.8 bits (81), Expect = 4.7, Method: Compositional matrix adjust.
 Identities = 23/75 (31%), Positives = 39/75 (52%), Gaps = 10/75 (13%)

Query  1    MKRTSRSLTAALLGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLA  60
            MK   +S   A++  A +L+GC          +   G +D  +KI N  P ++ +Q+++
Sbjct  1    MKLWKKSALIAMIAAATVLSGC---------GSEKVGVVDF-EKIQNESPKIKEIQKEVI  50

Query  61   NLDATIRAMIAKYSP  75
            N DA IRA +AK +
Sbjct  51   NKDAEIRARLAKETE  65

>gi|334128477|ref|ZP_08502365.1| dihydroxyacetone kinase [Centipeda periodontii DSM 2778]
 gi|333387154|gb|EGK58357.1| dihydroxyacetone kinase [Centipeda periodontii DSM 2778]
Length=334

 Score = 35.0 bits (79), Expect = 7.3, Method: Compositional matrix adjust.
 Identities = 25/79 (32%), Positives = 41/79 (52%), Gaps = 5/79 (6%)

Query  13   LGIAALLAGCIKPNTFDPYANPGRGELDRRQKIVNGRPDLETVQQQLANLDATIRAMIAK  72
            +G+A  L  CI P       + G  E++    I +G P +E  + +L  +DAT+  M+A+
Sbjct  189  MGVA--LTPCIVPEAGKATFSIGDDEMEIGMGI-HGEPGIE--RTKLQTIDATVETMLAR  243

Query  73   YSPQTRFSTGVTVSHLTNG  91
                  F++G TV+ L NG
Sbjct  244  ILDDLPFASGDTVAVLVNG  262

>gi|345355686|gb|EGW87895.1| N-ethylmaleimide reductase domain protein [Escherichia coli 3030-1]
 gi|345359731|gb|EGW91906.1| N-ethylmaleimide reductase domain protein [Escherichia coli STEC_DG131-3]
 gi|345394001|gb|EGX23766.1| N-ethylmaleimide reductase domain protein [Escherichia coli TX1999]
Length=120

 Score = 35.0 bits (79), Expect = 7.6, Method: Compositional matrix adjust.
 Identities = 27/90 (30%), Positives = 39/90 (44%), Gaps = 8/90 (8%)

Query  130  VFKAAGFRPNNSVPGDPPQPLGAPNYSQIRDDGVTINLVNGDNRGPLGYSYNTGCHPP--  187
            +F A   R  +  PGD P PL A  Y Q    G+ I+     +    GY+   G H P
Sbjct  21   IFMAPLTRLRSIEPGDIPTPLMAEYYRQRASAGLIISEATQISAQAKGYAGAPGIHSPEQ  80

Query  188  -AAWRTAPPPLNMRPANDPDVHYPYLYGSP  216
             AAW+ +P    ++      V +P   G+P
Sbjct  81   IAAWKKSPLAFMLK-----MVIWPCSCGTP  105

Lambda      K        H
   0.317    0.136    0.427

Gapped
Lambda      K        H
   0.267   0.0410    0.140

Effective search space used: 286119069840

  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
  excluding environmental samples from WGS projects
    Posted date: Sep 5, 2011 4:36 AM
  Number of letters in database: 5,219,829,388
  Number of sequences in database: 15,229,318

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40