Jan 18, 2012

How to Extract Entries From Multiple Fasta

FASTA is a text-based format used in bioinformatics to represent nucleotide and peptide sequences, with each base or amino acid written as a single-letter code. Each record in a FASTA file consists of a single-line description that begins with a “greater than” symbol (>), followed by one or more lines of nucleotide or peptide sequence. A file that contains several such records is called a multiple-FASTA file. You can extract sequences from it with BioPerl, a collection of Perl modules that includes parsers for the FASTA format. You can also write a short Perl script that matches patterns in the file, or use other available tools, such as Biopieces, to extract FASTA sequences.
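To make the record structure described above concrete, here is a minimal plain-Perl sketch (not BioPerl) that splits FASTA text into header and sequence pairs. The subroutine name parse_fasta and the sample records are hypothetical, chosen only for illustration. It assumes every record starts with a > line, joins the sequence lines of each record into one string, and ignores any text before the first header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split FASTA text into [header, sequence] pairs (hypothetical helper).
# A header line starts with ">"; every following line up to the next
# header is appended, without its newline, to that record's sequence.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];       # start a new record
        } elsif (@records) {
            $records[-1][1] .= $line;      # append a sequence line
        }                                  # text before the first ">" is ignored
    }
    return @records;
}

# Example: two records, the first with its sequence split over two lines
my $example = ">seq1 first example\nACGTACGT\nGGCC\n>seq2\nTTAA\n";
for my $rec (parse_fasta($example)) {
    print "$rec->[0]\t$rec->[1]\n";
}
# prints "seq1 first example", a tab, "ACGTACGTGGCC", then "seq2", a tab, "TTAA"
```

Joining the sequence lines is what lets a multi-line FASTA entry be treated as one string, which is also why the extraction methods below work on whole records rather than single lines.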



Instructions

Things You'll Need

  • FASTA file
  • Perl editor
  • BioPerl
  • ActiveState Perl
  • Biopieces
    • 1
      Launch your Perl editor application. You may use a simple text editor, such as Notepad. You will need to save the file with a “.pl” extension to indicate that it is a Perl program.
    • 2
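      Extract a single sequence from a multiple-FASTA file with Perl pattern matching by typing the following code into the editor. The script takes two command-line arguments: the name of the FASTA file and the ID of the sequence you want.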
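      #!/usr/bin/perl
      use strict;
      use warnings;
      @ARGV == 2 or die "Usage: $0 fasta_file sequence_id\n";
      my ($fasta_file, $sequence_id) = @ARGV;
      open my $fh, '<', $fasta_file or die "Cannot open $fasta_file: $!\n";
      my $workfile = do { local $/; <$fh> };   # read the whole file into one string
      # \Q...\E treats the ID as literal text; [^>]+ runs to the next ">" or the end of the file
      my ($fasta_seq) = $workfile =~ /(>\Q$sequence_id\E[^>]+)/;
      print defined $fasta_seq ? $fasta_seq : "Sequence $sequence_id not found\n";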
    • 3
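      Extract the sequences from the FASTA file using BioPerl. Type the following code into the editor, replacing fasta_file_path with the path to your FASTA file:
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Bio::SeqIO;
      my $sequenceobject = Bio::SeqIO->new(-file => "fasta_file_path", -format => "fasta");
      The Bio::SeqIO module reads sequence files in many formats, including FASTA, and returns each record as a sequence object. Each call to next_seq returns the next record in the file, so you can retrieve a single sequence with the following statement:
      my $retrievedsequence = $sequenceobject->next_seq;
      To retrieve the remaining sequences, loop until next_seq returns undef at the end of the file. Because next_seq always moves forward, the loop starts with the record after the one retrieved above. The seq method returns each sequence as a plain string:
      while ($retrievedsequence = $sequenceobject->next_seq)
      {
      print $retrievedsequence->seq, "\n";
      }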
    • 4
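      Extract the sequences from the FASTA file using Biopieces, a framework of modular command-line tools for manipulating bioinformatics data. Run the following command at the command line, replacing fasta_file with your input file, sequence with the pattern to search for, and sequence_file with the output file:
      read_fasta -i fasta_file | grab -p sequence | write_fasta -o sequence_file -x
      Here read_fasta parses the FASTA records, grab keeps only the records that match the pattern, and write_fasta writes them to the output file; the -x option stops the records from also being passed on to the terminal. This is a good option if you prefer not to write code, because the framework handles the parsing and output of the FASTA records for you.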
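The regular expression in step 2 extracts one entry per run and matches IDs by prefix, so asking for seq1 can return seq10 if that record comes first. A minimal plain-Perl sketch of pulling several entries at once by exact ID is shown below. The subroutine name extract_entries and the sample data are hypothetical, and it assumes the ID is the first whitespace-delimited word after the > symbol.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the FASTA records whose ID (first word after ">") is in @wanted,
# as text with the header line included, in file order (hypothetical helper).
sub extract_entries {
    my ($text, @wanted) = @_;
    my %want;
    $want{$_} = 1 for @wanted;               # exact-match lookup table
    my @found;
    my $keep = 0;                            # are we inside a wanted record?
    for my $line (split /\r?\n/, $text) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S+)/;   # ID = first word of the header
            $keep = (defined $id && $want{$id}) ? 1 : 0;
            push @found, '' if $keep;        # start a new output record
        }
        $found[-1] .= "$line\n" if $keep;    # copy header and sequence lines
    }
    return @found;
}

# Example: seq10 is skipped even though its ID starts with "seq1"
my $fasta = ">seq1 first\nACGT\n>seq10 second\nGGGG\n>seq2\nTTAA\nCC\n";
print extract_entries($fasta, 'seq1', 'seq2');
```

To use it on a real file, read the file into a string (for example, as the step 2 script does) and pass that string along with the IDs you want. A hash lookup keeps the check for each header constant-time, so a long list of IDs costs little more than a short one.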


HowToYo Copyright © 2011