<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1751-0473-3-9</ui>
   <ji>1751-0473</ji>
   <fm>
      <dochead>Brief reports</dochead>
      <bibl>
         <title>
            <p>FASH: A web application for nucleotides sequence search</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Veksler-Lublinsky</snm>
               <fnm>Isana</fnm>
               <insr iid="I1"/>
               <email>vaksler@cs.bgu.ac.il</email>
            </au>
            <au id="A2">
               <snm>Barash</snm>
               <fnm>Danny</fnm>
               <insr iid="I1"/>
               <email>dbarash@cs.bgu.ac.il</email>
            </au>
            <au id="A3">
               <snm>Avisar</snm>
               <fnm>Chai</fnm>
               <insr iid="I1"/>
               <email>chaiavisar@gmail.com</email>
            </au>
            <au id="A4">
               <snm>Troim</snm>
               <fnm>Einav</fnm>
               <insr iid="I1"/>
               <email>einav.troim@gmail.com</email>
            </au>
            <au id="A5">
               <snm>Chew</snm>
               <fnm>Paul</fnm>
               <insr iid="I2"/>
               <email>chew@cs.cornell.edu</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Kedem</snm>
               <fnm>Klara</fnm>
               <insr iid="I1"/>
               <email>klara@cs.bgu.ac.il</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Computer Science, Ben-Gurion University, 84105 Beer-Sheva, Israel</p>
            </ins>
            <ins id="I2">
               <p>Computer Science Department, 721 Rhodes Hall, Cornell University, Ithaca, NY 14853, USA</p>
            </ins>
         </insg>
         <source>Source Code for Biology and Medicine</source>
         <issn>1751-0473</issn>
         <pubdate>2008</pubdate>
         <volume>3</volume>
         <issue>1</issue>
         <fpage>9</fpage>
         <url>http://www.scfbm.org/content/3/1/9</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18505581</pubid>
               <pubid idtype="doi">10.1186/1751-0473-3-9</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>19</day>
               <month>12</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Veksler-Lublinsky et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p/>
               </st>
               <p>FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to BLAST/FASTA for querying long RNA/DNA sequences. It differs from these approaches in that it does not depend on the existence of <it>contiguous </it>seed sequences in its initial detection phase. The FASH web server is user-friendly and easy to operate.</p>
            </sec>
            <sec>
               <st>
                  <p>Availability</p>
               </st>
               <p>FASH can be accessed at</p>
               <p><url>https://fash.bgu.ac.il:8443/fash/default.jsp</url> (secured website)</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Recent discoveries <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> suggest that long RNA sequences, acting as natural sensors, exist in eukaryotic genomes and that such RNA sequences have not yet been found by commonly used bioinformatics methods. Although packages such as BLAST <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and FASTA <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> are tremendously useful, alternative approaches may locate candidates that have been missed by traditional approaches.</p>
      </sec>
      <sec>
         <st>
            <p>Algorithm</p>
         </st>
         <p>Our algorithm is based on the Fast Fourier Transform (FFT) and is similar to a method originally developed in the 1980s <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. We define a "Query vs. Text" matrix M where each entry (i, j) is assigned the value 1 if Query [i] is identical to Text [j], and 0 otherwise.</p>
         <p>This matrix is the product of matrices Q and T where Q is derived from the Query and T is derived from the Text (see Figure <figr fid="F1">1</figr>). Let <it>m </it>and <it>n </it>represent the length of the Query and the length of the Text, respectively. Matrix Q consists of <it>m </it>rows and 4 columns with one column for each base (e.g., <it>U</it>, <it>C</it>, <it>G</it>, and <it>A </it>for RNA). The entries in Q consist of 0s and 1s indicating which bases are present; there is exactly one 1 in each row. Matrix T is similar, but with <it>n </it>rows. M, the "Query vs. Text" matrix, is equal to QT', where T' represents the transpose of T: the dot product of row <it>i </it>of Q with row <it>j </it>of T is 1 exactly when both rows mark the same base, and 0 otherwise.</p>
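         <p>As a small illustration of the product M = QT', the following Python sketch builds the indicator matrices for a toy Query and Text and multiplies them explicitly. The function names, the column order of the bases, and the toy sequences are assumptions made for this example only; they are not taken from the FASH source code.</p>
         <p><![CDATA[
```python
BASES = "ACGU"  # column order is an arbitrary choice for this sketch


def one_hot(seq):
    """Return a len(seq) x 4 indicator matrix: one row per position, one column per base."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def match_matrix(query, text):
    """Compute M = Q T' by explicit dot products of rows of Q with rows of T."""
    Q, T = one_hot(query), one_hot(text)
    return [[sum(q * t for q, t in zip(q_row, t_row)) for t_row in T] for q_row in Q]


# Toy sequences (hypothetical, chosen only to keep the matrix small).
query, text = "GAU", "UGAUC"
M = match_matrix(query, text)
for row in M:
    print(row)
```
]]></p>
         <p>Each entry of the printed matrix is 1 exactly where the Query base equals the Text base, so the run of 1s along one diagonal marks the position where the toy Query occurs in the toy Text.</p>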
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Veksler-Lublinsky et al</p>
            </caption>
            <text>
               <p><b>Veksler-Lublinsky et al</b>. Matrices Q, T, and M.</p>
            </text>
            <graphic file="1751-0473-3-9-1"/>
         </fig>
         <p>To locate a substring similar to the query string, we do not need the entire matrix M; we need only the sum along each diagonal of M. The sum of M's elements along diagonal <it>d </it>indicates the number of identities between nucleotides in Query [1..m] and in Text [d..d+m-1]. Using the FFT, we can efficiently calculate the sum for each diagonal of the matrix, obtaining the number of matches along each such diagonal in far less time than it takes to build matrix M.</p>
         <p>Note that a gap (an unmatched nucleotide) has the effect of switching the "match-path" to an adjacent diagonal. If gaps are randomly distributed and if the Query sequence is sufficiently long, then there should be enough similarity along one of the diagonals to detect the match. Thus, FASH works best for detecting remote homologs if the Query string is fairly long (say, 400 nucleotides or longer).</p>
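         <p>The diagonal sums can be sketched in Python as follows. This is a minimal illustration, not the FASH implementation: it uses a simple recursive radix-2 FFT written for clarity rather than speed, 0-based offsets instead of the 1-based diagonal index used in the text, and only the diagonals on which the whole Query overlaps the Text. The function names and the toy sequences are assumptions. For each base, the reversed Query indicator vector is convolved with the Text indicator vector, and the four results are added.</p>
         <p><![CDATA[
```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/n scaling is applied."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def diagonal_sums(query, text, alphabet="ACGT"):
    """Number of identities between query and text[d:d+m] for every 0-based offset d."""
    m, n = len(query), len(text)
    size = 1
    while size < m + n - 1:  # pad so circular convolution equals linear convolution
        size *= 2
    spectrum = [0j] * size
    for base in alphabet:
        q = [1.0 if c == base else 0.0 for c in reversed(query)] + [0.0] * (size - m)
        t = [1.0 if c == base else 0.0 for c in text] + [0.0] * (size - n)
        fq, ft = fft(q), fft(t)
        spectrum = [s + x * y for s, x, y in zip(spectrum, fq, ft)]
    conv = fft(spectrum, invert=True)
    # Index d + m - 1 of the convolution holds the match count for offset d.
    return [round(conv[d + m - 1].real / size) for d in range(n - m + 1)]


def diagonal_sums_naive(query, text):
    """Direct O(mn) count, used here only to check the FFT result."""
    m = len(query)
    return [sum(query[i] == text[d + i] for i in range(m)) for d in range(len(text) - m + 1)]


print(diagonal_sums("ACGT", "TTACGTAACGA"))
```
]]></p>
         <p>On the toy input the FFT-based counts agree with the direct count, and the largest sum appears at the offset where the Query occurs exactly.</p>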
         <p>Once we have located the significant diagonals, we apply traditional sequence alignment methods to the portions of the Text near those diagonals. Significance is determined by straightforward statistical considerations. Assuming that each element of the diagonal is an independent Bernoulli trial, the expected number of matches along a diagonal is <it>mp</it>, where <it>m </it>is the length of the query and <it>p </it>is the probability of a match in each position. The variance is <it>mp</it>(1 - <it>p</it>). For example, assuming each base has probability <it>p </it>= 0.25 of appearing in a particular position, if the Query is of length 600 then the expected sum along a random diagonal is (600)(0.25) = 150, with variance (600)(0.25)(0.75) = 112.5 and standard deviation <it>&#963; </it>&#8776; 10.61. In other words, the expected sum for a randomly chosen diagonal is 150, and a sum above 192.5 is more than 4<it>&#963; </it>from the mean. If 4<it>&#963; </it>is used as the significance threshold, then false positives occur on fewer than 1 in 10,000 diagonals.</p>
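         <p>The arithmetic of this example can be reproduced with a few lines of Python. The function names are hypothetical, and the tail probability is computed under a normal approximation to the binomial distribution, which is an assumption of this sketch rather than a statement of how FASH computes significance.</p>
         <p><![CDATA[
```python
import math


def significance_threshold(m, p=0.25, k=4.0):
    """Mean, variance, standard deviation and k-sigma threshold for a diagonal of length m."""
    mean = m * p
    variance = m * p * (1 - p)
    sigma = math.sqrt(variance)
    return mean, variance, sigma, mean + k * sigma


def upper_tail_normal(k):
    """One-sided normal tail probability P(Z > k); a normal approximation is assumed."""
    return 0.5 * math.erfc(k / math.sqrt(2))


mean, variance, sigma, threshold = significance_threshold(600)
print(mean, variance, round(sigma, 2), round(threshold, 2))
print(upper_tail_normal(4.0))
```
]]></p>
         <p>For a Query of length 600 this gives a mean of 150, a variance of 112.5, and a threshold just below 192.5; the approximate one-sided tail probability at four standard deviations is well below 1 in 10,000.</p>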
         <p>If <it>n </it>is very large (e.g., the size of a genome), we break the text sequence into pieces of size 2<sup>13 </sup>where each such piece has an overlap of size 2<sup>10 </sup>with the previous piece. We use the FFT to calculate sums for each piece separately.</p>
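         <p>The splitting into overlapping pieces can be sketched as follows. The function name and the half-open (start, end) index convention are assumptions of this example; only the piece size 2<sup>13 </sup>and the overlap 2<sup>10 </sup>come from the text. One consequence of this layout is that any window of the Text no longer than the overlap lies entirely inside at least one piece.</p>
         <p><![CDATA[
```python
def overlapping_pieces(n, size=2**13, overlap=2**10):
    """Return half-open (start, end) ranges covering [0, n) with the given piece size and overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the piece size")
    if n <= 0:
        return []
    step = size - overlap  # distance between the starts of consecutive pieces
    pieces = []
    start = 0
    while True:
        end = min(start + size, n)
        pieces.append((start, end))
        if end == n:  # the last piece may be shorter than `size`
            break
        start += step
    return pieces


pieces = overlapping_pieces(20000)
print(pieces)
```
]]></p>
         <p>For a Text of length 20000 this produces three pieces, each starting 2<sup>13 </sup>- 2<sup>10 </sup>= 7168 positions after the previous one, with the last piece truncated at the end of the Text.</p>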
      </sec>
      <sec>
         <st>
            <p>Complexity analysis</p>
         </st>
         <p>Using the FFT, all diagonal sums for M can be found in time <it>O</it>(<it>n </it>log <it>n</it>), where <it>n </it>is the length of the longer sequence (the Text, in our case). For our application, <it>n </it>is very large, so direct use of the FFT is impractical. Thus, we divide the Text sequence into pieces of size 2<sup><it>S </it></sup>with 2<sup><it>K </it></sup>overlap, where <it>S </it>and <it>K </it>are small constants (13 and 10, respectively, in our application). There are roughly <inline-formula><m:math name="1751-0473-3-9-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mfrac><m:mi>n</m:mi><m:mrow><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mo>&#8722;</m:mo><m:msup><m:mn>2</m:mn><m:mi>K</m:mi></m:msup></m:mrow></m:mfrac></m:mrow></m:semantics></m:math></inline-formula> such pieces. For each piece, the FFT takes time <it>O</it>(2<sup><it>S</it></sup> log 2<sup><it>S</it></sup>), leading to an overall computation time of: <inline-formula><m:math name="1751-0473-3-9-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mrow><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:mfrac><m:mi>n</m:mi><m:mrow><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mo>&#8722;</m:mo><m:msup><m:mn>2</m:mn><m:mi>K</m:mi></m:msup></m:mrow></m:mfrac><m:mo>&#215;</m:mo><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mi>log</m:mi><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:mfrac><m:mi>n</m:mi><m:mrow><m:msup><m:mn>2</m:mn><m:mi>K</m:mi></m:msup><m:mo stretchy="false">(</m:mo><m:msup><m:mn>2</m:mn><m:mrow><m:mi>S</m:mi><m:mo>&#8722;</m:mo><m:mi>K</m:mi></m:mrow></m:msup><m:mo>&#8722;</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo></m:mrow></m:mfrac><m:mo>&#215;</m:mo><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mi>log</m:mi><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:mi>n</m:mi><m:mo>&#215;</m:mo><m:mi>log</m:mi><m:msup><m:mn>2</m:mn><m:mi>S</m:mi></m:msup><m:mo>&#215;</m:mo><m:mfrac><m:mrow><m:msup><m:mn>2</m:mn><m:mrow><m:mi>S</m:mi><m:mo>&#8722;</m:mo><m:mi>K</m:mi></m:mrow></m:msup></m:mrow><m:mrow><m:msup><m:mn>2</m:mn><m:mrow><m:mi>S</m:mi><m:mo>&#8722;</m:mo><m:mi>K</m:mi></m:mrow></m:msup><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:mfrac><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mi>O</m:mi><m:mo stretchy="false">(</m:mo><m:mi>S</m:mi><m:mi>n</m:mi><m:mo stretchy="false">)</m:mo></m:mrow></m:semantics></m:math></inline-formula>. The last step holds because log 2<sup><it>S </it></sup>is proportional to <it>S </it>and the factor 2<sup><it>S</it>-<it>K</it></sup>/(2<sup><it>S</it>-<it>K</it></sup> - 1) is a constant (8/7 for <it>S </it>= 13 and <it>K </it>= 10).</p>
         <p>Additional time is needed for the dynamic-programming-based alignment methods that are run on the region around each significant diagonal, but this time is negligible compared to the FFT time.</p>
      </sec>
      <sec>
         <st>
            <p>Server overview</p>
         </st>
         <p>FASH was designed as a user-friendly application. The GUI is based on J2EE technology and was built using JSP pages and servlets. It runs on an Apache Tomcat web server. The application uses a Java N-Tier architecture containing the web layer, business layer, and data access layer. A MySQL database is used for saving all results from the <it>Request </it>and <it>Process </it>steps (see below), and for storing several pre-loaded genome files (taken from NCBI) that can be used as search Text.</p>
         <p>We briefly describe some of FASH's features. For a more detailed description, we refer the reader to the documentation available at our website. The search is divided into three main steps.</p>
         <p>&#8226; Request: The user enters a Query and either chooses a pre-loaded Text genome or uploads a Text sequence (see Figure <figr fid="F2">2</figr>). After submitting the request, the user gets a serial key which will be needed for the <it>Process </it>step. All diagonal sums are calculated using the FFT. An email message is sent to the user when this step is complete.</p>
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Veksler-Lublinsky et al</p>
            </caption>
            <text>
               <p><b>Veksler-Lublinsky et al</b>. The request screen.</p>
            </text>
            <graphic file="1751-0473-3-9-2"/>
         </fig>
         <p>&#8226; Process: The user enters a <it>Request </it>serial key and parameters: the alignment method, the scoring matrix, gap penalties, and the threshold (see Figure <figr fid="F3">3</figr>). After submitting the parameters, the user gets a new serial key needed for the <it>Results </it>step. Sequence alignment methods are applied to all diagonals whose sum exceeds the specified threshold. For each <it>Request</it>, the user can submit several <it>Processes</it>, each with its own serial key. The division into the <it>Request </it>and <it>Process </it>phases enables the user to modify the parameters of the search, as well as the threshold, without reinvesting time in calculating sums using the FFT. An email message is sent to the user when the <it>Process </it>step is complete.</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Veksler-Lublinsky et al</p>
            </caption>
            <text>
               <p><b>Veksler-Lublinsky et al</b>. The process screen.</p>
            </text>
            <graphic file="1751-0473-3-9-3"/>
         </fig>
         <p>&#8226; Results: The user enters a <it>Process </it>serial key and indicates the number of alignment results to view (see Figure <figr fid="F4">4</figr>). FASH supports viewing up to 1000 results. Results are ranked by score and percent of identity between the Query and a position in the Text. The user can view the alignment, number and percent of identical positions, mismatches and gaps.</p>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Veksler-Lublinsky et al</p>
            </caption>
            <text>
               <p><b>Veksler-Lublinsky et al</b>. The results screen.</p>
            </text>
            <graphic file="1751-0473-3-9-4"/>
         </fig>
         <p>The user can check the <it>Request </it>or <it>Process </it>status at any time on the <it>Check Status </it>page. The answer is supplied from the online database.</p>
         <p>The user can try the system with a guided example by following the "Guide to FASH" link. Information about different options and parameters is available via the "Help" link.</p>
      </sec>
      <sec>
         <st>
            <p>Illustrative example where our application is advantageous</p>
         </st>
         <p>To illustrate a potential success of our method, we extracted a 560-nt sequence from chromosome Y of the human genome:</p>
         <p>taaccctaaccctaaccctaaccctaaccctaaccctctgaaagtggacctatcagcaggatgtgggtgggagcagattagagaataaaagcagactgcct  gagccagcagtggcaacccaatggggtccctttccatactgtggaagcttcgttctttcactctttgcaataaatcttgctattgctcactctttgggtccaca  ctgcctttatgagctgtgacactcaccgcaaaggtctgcagcttcactcctgagccagtgagaccacaaccccaccagaaagaagaaactcagaacacatc tgaacatcagaagaaacaaactccggacgcgccacctttaagaactgtaacactcaccgcgaggttccgcgtcttcattcttgaagtcagtgagaccaaga  acccaccaattccagacacactaggaccctgagacaacccctagaagagcacctggttgataacccagttcccatctgggatttaggggacctggacagcc  cggaaaatgagctcctcatctctaacccagttcccctgtggggatttaggg</p>
         <p>We searched for this sequence with both the BLAST tool and our FASH application; both were able to find the query. Next, we mutated every 7<sup><it>th </it></sup>nucleotide in the query sequence and ran the searches again. BLAST uses an exact-match word heuristic (with word size W): it looks for short runs of at least W contiguous matches and then extends them to produce an alignment.</p>
         <p>In this run, we chose the minimal word size that BLAST allows (7 nucleotides). Because every 7<sup><it>th </it></sup>nucleotide was mutated, the mutated query shares no run of more than 6 contiguous matches with its source, so no exact word of size 7 exists on the correct diagonal. BLAST was not able to find our query in the human genome, while FASH ranked it as its highest hit, with 86% identity to the query.</p>
         <p>The mutated sequence was (mutated positions are underlined): taaccc<ul>c</ul>aaccct<ul>g</ul>acccta<ul>c</ul>ccctaa<ul>a</ul>cctaac<ul>t</ul>ctctga<ul>c</ul>agtgga<ul>t</ul>ctatca<ul>c</ul>caggat<ul>c</ul>tgggtg<ul>c</ul>gagcag<ul>c</ul>ttagag<ul>g</ul>ataaaa<ul>c</ul>cagact<ul>a</ul>cct gag<ul>t</ul>cagcag<ul>c</ul>ggcaac<ul>t</ul>caatgg<ul>a</ul>gtccct<ul>g</ul>tccata<ul>t</ul>tgtgga<ul>t</ul>gcttcg<ul>g</ul>tctttc<ul>t</ul>ctcttt<ul>c</ul>caataa<ul>g</ul>tcttgc<ul>g</ul>attgct<ul>a</ul>actctt<ul>c</ul>gggtcc<ul>t</ul>cactgc<ul>t </ul>tttatg<ul>t</ul>gctgtg<ul>t</ul>cactca<ul>a</ul>cgcaaa<ul>a</ul>gtctgc<ul>t</ul>gcttca<ul>a</ul>tcctga<ul>c</ul>ccagtg<ul>c</ul>gaccac<ul>t</ul>acccca<ul>g</ul>cagaaa<ul>c</ul>aagaaa<ul>a</ul>tcagaa<ul>a</ul>acatct<ul>c</ul>aacatc<ul> t</ul>gaagaa<ul>g</ul>caaact<ul>a</ul>cggacg<ul>t</ul>gccacc<ul>g</ul>ttaaga<ul>c</ul>ctgtaa<ul>t</ul>actcac<ul>t</ul>gcgagg<ul>c</ul>tccgcg<ul>a</ul>cttcat<ul>c</ul>cttgaa<ul>c</ul>tcagtg<ul>t</ul>gaccaa<ul>t</ul>aaccca<ul>t</ul>caattc <ul>a</ul>agacac<ul>t</ul>ctagga<ul>t</ul>cctgag<ul>g</ul>caaccc<ul>g</ul>tagaag<ul>t</ul>gcacct<ul>t</ul>gttgat<ul>t</ul>acccag<ul>c</ul>tcccat<ul>t</ul>tgggat<ul>a</ul>tagggg<ul>c</ul>cctgga<ul>g</ul>agcccg<ul>t</ul>aaaatg<ul>c</ul>gctcct<ul>a</ul> atctct<ul>g</ul>acccag<ul>a</ul>tcccct<ul>t</ul>tgggga<ul>c</ul>ttaggg</p>
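         <p>The effect of this mutation pattern on an exact-word seed can be checked with a short Python sketch. The substitution rule below is a hypothetical stand-in (the substitutions actually used are those underlined above), the function names are assumptions, and only a prefix of the query is used. The check looks only at aligned positions on the correct diagonal; it says nothing about chance word matches elsewhere in a genome.</p>
         <p><![CDATA[
```python
# Arbitrary illustrative substitution rule; the paper's substitutions differ.
SUBSTITUTE = {"a": "c", "c": "g", "g": "t", "t": "a"}


def mutate_every_kth(seq, k=7):
    """Replace every k-th nucleotide (1-based positions k, 2k, ...) with a different base."""
    return "".join(SUBSTITUTE[c] if (i + 1) % k == 0 else c for i, c in enumerate(seq))


def longest_exact_run(a, b):
    """Length of the longest run of identical aligned positions in two equal-length strings."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def percent_identity(a, b):
    """Percentage of aligned positions that are identical."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Prefix of the 560-nt query given in the text.
original = "taaccctaaccctaaccctaaccctaaccctaaccctctgaaagtggacc"
mutated = mutate_every_kth(original)
print(mutated)
print(longest_exact_run(original, mutated), percent_identity(original, mutated))
```
]]></p>
         <p>With every 7th position changed, the longest exact run on the correct diagonal is 6, one short of the minimal word size of 7. For a 560-nt query, 80 positions are changed, leaving 480/560, or about 85.7%, identity, in line with the 86% reported for the FASH hit.</p>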
         <p>The above is a case where our method is superior to BLAST. Admittedly, our test case is a synthetic example, but given the growing number and variety of biologically important problems, our FASH application may well prove helpful in the future.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>Project Name: FASH</p>
         <p>Project home page: <url>https://fash.bgu.ac.il:8443/fash/default.jsp</url>.</p>
         <p>Operating System(s): The FASH web application is platform independent.</p>
         <p>Programming language: Java</p>
         <p>Other requirements: None</p>
         <p>License: None</p>
         <p>Any restrictions to use by non-academics: None</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>FASH: Fourier Alignment Sequence Heuristics.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>All authors have read and approved the final manuscript. KK defined the problem and designed the project; IV supervised the code writing by CA and ET; IV, DB, CA, ET, PC, and KK tested and debugged the programs. All authors participated in the manuscript preparation.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Evgeny Nudler for the initial motivation for this work. The research was supported by the Lynn and William Frankel Center for Computer Sciences at Ben-Gurion University and by grant BSF 2003291 from the Israel-USA Binational Science Foundation.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>RNA-mediated response to heat shock in mammalian cells</p>
            </title>
            <aug>
               <au>
                  <snm>Shamovsky</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ivannikov</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kandel</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Gershon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Nudler</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>440</volume>
            <issue>7083</issue>
            <fpage>556</fpage>
            <lpage>560</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature04518</pubid>
                  <pubid idtype="pmpid" link="fulltext">16554823</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Sch&#228;ffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Improved tools for biological sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>PNAS</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid" link="fulltext">3162770</pubid>
                  <pubid idtype="doi">10.1073/pnas.85.8.2444</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>An efficient method for matching nucleic acid sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sawyer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kochin</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1982</pubdate>
            <volume>10</volume>
            <fpage>133</fpage>
            <lpage>139</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">326121</pubid>
                  <pubid idtype="pmpid" link="fulltext">6174932</pubid>
                  <pubid idtype="doi">10.1093/nar/10.1.133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Digital signal processing methods for biosequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Benson</snm>
                  <fnm>DC</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1990</pubdate>
            <volume>18</volume>
            <issue>10</issue>
            <fpage>3001</fpage>
            <lpage>3006</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">330830</pubid>
                  <pubid idtype="pmpid" link="fulltext">2349096</pubid>
                  <pubid idtype="doi">10.1093/nar/18.10.3001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

