How to Extract Entries From a Multi-FASTA File

FASTA is a text-based format used in bioinformatics to represent nucleotide and peptide sequences, with each nucleotide or amino acid written as a single-letter code. Each entry consists of a single-line description that begins with a “greater than” symbol (>), followed by one or more lines of sequence data. A multi-FASTA file holds several such entries one after another. You can extract entries from it with BioPerl, a collection of Perl modules developed for bioinformatics work that includes FASTA support. You can also write a plain Perl script that matches patterns in the file, or use other available tools.

Instructions

Things You'll Need

- FASTA file
- Perl editor
- BioPerl
- ActiveState Perl
- Biopieces

- 1
  
  Open your Perl editor. A plain text editor, such as Notepad, is enough. Save the script with a “.pl” extension, the conventional suffix for Perl programs.
- 2
  
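  Extract a single entry from a multi-FASTA file by pattern matching in plain Perl. Type the following code into the editor, then run the script with the FASTA file path and the sequence ID as its two arguments:
  
  #!/usr/bin/perl
  use strict;
  use warnings;
  # Usage: perl script.pl fasta_file sequence_id
  my $fasta_file = shift;
  my $sequence   = shift;
  open(my $fh, '<', $fasta_file) or die "Cannot open $fasta_file: $!";
  my $workfile = do { local $/; <$fh> };   # read the whole file into one string
  # Capture the header that starts with the ID, plus everything up to the next ">"
  my ($entry) = $workfile =~ /(>\Q$sequence\E(?!\S)[^>]+)/;
  print $entry if defined $entry;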
- 3
  
  Extract the sequences from the FASTA file using BioPerl. Start by typing the following code into the editor, replacing "fasta_file_path" with the path to your file:
  
  #!/usr/bin/perl -w
  
  use Bio::SeqIO;
  
  my $sequenceobject = Bio::SeqIO->new(-file => "fasta_file_path", -format => "fasta");
  
  The Bio::SeqIO module reads the file and returns each entry as a sequence object. You can retrieve a single sequence using the following statement:
  
  my $retrievedsequence = $sequenceobject->next_seq;
  
  Each call to next_seq returns the next entry in the file, so you can loop through the object and retrieve multiple sequences, as follows:
  
  while (my $retrievedsequence = $sequenceobject->next_seq)
  {
      # seq returns the sequence string; display_id returns the entry's ID
      print $retrievedsequence->seq, "\n";
  }
- 4
  
  Extract the sequences from the FASTA file using Biopieces, a framework of modular tools for manipulating bioinformatics data. Biopieces commands run at the command line and are chained together with pipes. In the following command, "fasta_file", "sequence" and "sequence_file" are placeholders for your input file, the pattern to match and the output file:
  
  read_fasta -i fasta_file | grab -p sequence | write_fasta -o sequence_file -x
  
  This is a good option if you would rather not write code, as the framework handles most of the programming needed to read the FASTA file and output the matched sequences.
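
Step 2 returns only one entry per run. As a minimal sketch of the same pattern-matching idea applied to several IDs at once, the script below reads the file one record at a time and prints every entry whose ID is given on the command line. The subroutine name extract_entries, the script name extract_entries.pl and the rule that an ID is the header text up to the first whitespace are assumptions made for this illustration; they are not part of BioPerl or Biopieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the FASTA records (header plus sequence lines)
# whose ID is a key of %$wanted. The ID is assumed to be the text between
# ">" and the first whitespace on the header line.
sub extract_entries {
    my ($fh, $wanted) = @_;
    my @records;
    local $/ = "\n>";                 # read one record per iteration
    while (my $record = <$fh>) {
        chomp $record;                # drop the trailing "\n>" separator
        $record =~ s/^>//;            # the first record still carries its ">"
        my ($id) = $record =~ /^(\S+)/;
        next unless defined $id && $wanted->{$id};
        $record =~ s/\s+\z//;         # trim trailing whitespace
        push @records, ">$record\n";
    }
    return @records;
}

# Usage: perl extract_entries.pl input.fasta id1 [id2 ...]
if (@ARGV) {
    my ($fasta_file, @ids) = @ARGV;
    my %wanted = map { $_ => 1 } @ids;
    open(my $fh, '<', $fasta_file) or die "Cannot open $fasta_file: $!";
    print extract_entries($fh, \%wanted);
    close($fh);
}
```

For example, running perl extract_entries.pl input.fasta seq1 seq3 > subset.fasta would write the two matching entries to subset.fasta, where input.fasta, seq1 and seq3 are placeholder names. Reading record by record avoids loading the whole file into memory, which can matter for large FASTA files.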


Jan 18, 2012
