ReadMe documentation for release of RJE_UNIPROT software

Distribution compiled: Fri Oct 10 16:34:13 2008

Questions/Comments?: please contact software@cabbagesofdoom.co.uk


Installation Instructions

  1. Place the peat.zip file in a chosen directory (e.g. c:\bioware\) and unzip it.
  2. A subdirectory, peat, will be created containing all the files necessary to run the software.

The software should run on any system that has Python installed. Additional software may be necessary for full functionality. Further details can be found in the manuals supplied.


GNU License

Copyright (C) 2007 Richard J. Edwards <software@cabbagesofdoom.co.uk>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Author contact: <software@cabbagesofdoom.co.uk> / 25 Bassett Court, Bassett Avenue, Southampton SO16 7DR, UK.

To incorporate this module into your own programs, please see the GNU Lesser General Public License disclaimer in rje.py.


Distributed Files

Python Modules

Other Files


*** Module rje ***

Module: rje
Description: Contains General Objects for all my (Rich's) scripts
Version: 3.9
Last Edit: 24/06/08
Copyright (C) 2005 Richard J. Edwards - See source code for GNU License Notice

Function:
General module containing Classes used by all my scripts plus a number of miscellaneous methods.
- Output to Screen, Commandline parameters and Log Files

Commandline options are all in the form X=Y. Where Y is to include spaces, use X="Y".
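As an illustration of the X=Y convention, here is a minimal Python sketch. It is not rje's own parser: the names parse_cmd_list and as_bool are hypothetical, and it assumes that options without an "=" (such as help) act as simple switches and that T/F values are read case-insensitively. When options are typed in a shell, the quotes in X="Y" are removed before the program sees them; shlex.split is used here to mimic that.

```python
import shlex


def parse_cmd_list(argv):
    """Parse X=Y style arguments into a dict (hypothetical helper, not rje's parser).

    Arguments without '=' (e.g. 'help') are treated as switches set to True.
    In this sketch, later arguments overwrite earlier ones with the same key.
    """
    options = {}
    for arg in argv:
        if "=" in arg:
            key, value = arg.split("=", 1)  # split on the first '=' only
            options[key] = value
        else:
            options[arg] = True
    return options


def as_bool(value, default=False):
    """Interpret T/F style values; unrecognised values fall back to default."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().upper()
    if text in ("T", "TRUE"):
        return True
    if text in ("F", "FALSE"):
        return False
    return default


# A shell would strip the quotes around "my run.log"; shlex.split does the same here.
argv = shlex.split('v=1 newlog=T log="my run.log" help')
opts = parse_cmd_list(argv)
print(opts)  # {'v': '1', 'newlog': 'T', 'log': 'my run.log', 'help': True}
```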

General Commandline:

  • v=X : Sets verbosity (-1 for silent) [0]
  • i=X : Sets interactivity (-1 for full auto) [0]
  • log=FILE : Redirect log to FILE [Default = calling_program.log]
  • newlog=T/F : Create new log file. [Default = False: append log file]
  • silent=T/F : If set to True will not write to screen or log. (Log object attribute only.) [False]
  • help : Print help to screen

Program-Specific Commands (some programs only):
  • basefile=FILE : This will set the 'root' filename for output files (FILE.*), including the log
  • outfile=FILE : This will set the 'root' filename for output files (FILE.*), excluding the log
  • delimit=X : Sets standard delimiter for results output files [\t]
  • mysql=T/F : MySQL output
  • append=T/F : Append to results files rather than overwrite [False]
  • force=T/F : Force to regenerate data rather than keep old results [False]
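The basefile/outfile, delimit and append options can be sketched as follows. This is a hypothetical illustration, not rje code: output_roots and write_results are made-up names, the choice that outfile takes precedence over basefile for results when both are given is an assumption, and the file name and rows in the usage lines are invented.

```python
import csv
import os
import tempfile


def output_roots(program, basefile=None, outfile=None):
    """Return (results_root, log_file) for the basefile/outfile options.

    Assumptions: basefile sets the root for results *and* the log; outfile sets
    the results root only and, if both are given, wins for results; the
    default log is program.log.
    """
    log_file = f"{basefile}.log" if basefile else f"{program}.log"
    results_root = outfile or basefile or program
    return results_root, log_file


def write_results(path, rows, fields, delimit="\t", append=False):
    """Write dict rows as a delimited table (hypothetical helper).

    append=False overwrites the file; append=True adds rows and only writes
    the header line if the file did not already exist.
    """
    exists = os.path.exists(path)
    with open(path, "a" if append else "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields,
                                delimiter=delimit, lineterminator="\n")
        if not (append and exists):
            writer.writeheader()
        writer.writerows(rows)


# Usage with made-up rows: the second call appends without repeating the header.
with tempfile.TemporaryDirectory() as tmp:
    table = os.path.join(tmp, "results.tsv")
    write_results(table, [{"id": "seq1", "len": 120}], ["id", "len"])
    write_results(table, [{"id": "seq2", "len": 85}], ["id", "len"], append=True)
    with open(table) as handle:
        contents = handle.read()
print(repr(contents))  # 'id\tlen\nseq1\t120\nseq2\t85\n'
```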

System Commandline:
  • win32=T/F : Run in Win32 Mode [False]
  • pwin : Run in PythonWin (** Must be given on the commandline, not in an ini file! **)
  • cerberus : Run on the Cerberus cluster at RCSI
  • memsaver=T/F : Some modules will have a memsaver option to save memory usage [False]
  • runpath=PATH : Run program from given path (log files and some programs only) [path called from]
  • rpath=PATH : Path to installation of R ['c:\\Program Files\\R\\R-2.6.2\\bin\\R.exe']
  • soaplab=T/F : Implement special options/defaults for SoapLab implementations [False]

Forking Commandline:
  • noforks=T/F : Whether to avoid forks [False]
  • forks=X : Number of parallel sequences to process at once [0]
  • killforks=X : Number of seconds of no activity before killing all remaining forks. [3600]

Classes:
RJE_Object(log=None,cmd_list=[]):
  - Base class for inheritance by other classes.
  >> log:Log = rje.Log object
  >> cmd_list:List = List of commandline variables
  On initiation, this object:
  - sets the Log object (if any)
  - sets verbosity and interactivity attributes
  - calls the _setAttributes() method to set up class attributes
  - calls the _cmdList() method to process relevant commandline parameters
Log(itime=time.time(),cmd_list=[]):
  - Handles log output: printing to the log file and error reporting
  >> itime:float = initiation time
  >> cmd_list:list = list of commandline variables
Info(prog='Unknown',vers='X',edit='??/??/??',desc='Python script',author='Unknown',ptime=None):
  - Stores intro information for a program.
  >> prog:str = program name
  >> vers:str = version number
  >> edit:str = last edit date
  >> desc:str = program description
  >> author:str = author name
  >> ptime:float = starting time of program, time.time()
Out(cmd=[]):
  - Handles basic generic output to screen, based on verbosity and interactivity, for modules without classes.
  >> cmd:list = list of command-line arguments
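The initiation order described for RJE_Object can be illustrated with a short Python sketch. This is hypothetical and simplified, not the rje source: RJEObjectSketch and DisorderSketch are invented names, only v=X and i=X are read by the base class, and cmd_list defaults to None rather than [] to avoid Python's shared-mutable-default pitfall. The subclass borrows two option names (iucut, iumethod) and their defaults from rje_disorder purely for illustration.

```python
class RJEObjectSketch:
    """Hypothetical sketch of the RJE_Object initiation order (not the rje source)."""

    def __init__(self, log=None, cmd_list=None):
        self.log = log                            # 1. store the Log object (if any)
        self.cmd_list = list(cmd_list or [])      # None avoids a shared mutable default
        self.verbose = 0                          # 2. verbosity and interactivity,
        self.interactive = 0                      #    read from v=X and i=X
        for cmd in self.cmd_list:
            key, _, value = cmd.partition("=")
            if key == "v":
                self.verbose = int(value)
            elif key == "i":
                self.interactive = int(value)
        self._setAttributes()                     # 3. subclass declares its defaults
        self._cmdList()                           # 4. subclass reads its own options

    def _setAttributes(self):
        """Override in subclasses to set default attribute values."""

    def _cmdList(self):
        """Override in subclasses to process module-specific options."""


class DisorderSketch(RJEObjectSketch):
    """Toy subclass reusing two rje_disorder option names."""

    def _setAttributes(self):
        self.iucut = 0.2
        self.iumethod = "short"

    def _cmdList(self):
        for cmd in self.cmd_list:
            key, _, value = cmd.partition("=")
            if key == "iucut":
                self.iucut = float(value)
            elif key == "iumethod":
                self.iumethod = value


obj = DisorderSketch(cmd_list=["v=1", "iucut=0.5"])
print(obj.verbose, obj.iucut, obj.iumethod)  # 1 0.5 short
```

The hook methods let each subclass declare its defaults before its options are read, so a commandline value always overrides the default.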

Uses general modules: glob, math, os, random, re, string, sys, time, traceback


*** Module rje_disorder ***

Module: rje_disorder
Description: Disorder Prediction Module
Version: 0.5
Last Edit: 16/05/08
Copyright (C) 2006 Richard J. Edwards - See source code for GNU License Notice

Function:
This module currently has limited functionality and no standalone capability, though this may be added with time. It
is designed for use with other modules. The Disorder class can be given a sequence; it will then run the appropriate
disorder prediction software and store the results for use in other programs. Any gaps are removed from the
sequence.

Currently two disorder prediction methods are implemented:
* IUPred : Dosztanyi Z, Csizmok V, Tompa P & Simon I (2005). J. Mol. Biol. 347, 827-839. This has to be installed
  locally. It is available on request from the IUPred website and any use of results should cite the method. (See
  http://iupred.enzim.hu/index.html for more details.) IUPred returns a score for each residue; by default, a residue
  is classed as disordered if its score is > 0.5.
* FoldIndex : This is run directly from the website (http://bioportal.weizmann.ac.il/fldbin/findex) and, more simply,
  returns a list of disordered regions. You must have a live web connection to use this method!

For IUPred, the individual residue results are stored in Disorder.list['ResidueDisorder']. For both methods, the
disordered regions are stored in Disorder.list['RegionDisorder'] as (start,stop) tuples.
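The step from per-residue scores to (start,stop) region tuples might look like the Python sketch below. It is an illustration, not the module's code; disordered_regions is a hypothetical name. Assumptions: positions are 1-based and inclusive, a residue is disordered when its score is strictly greater than the cut-off, and minregion only filters disordered regions here (the module option also mentions ordered regions, which this sketch does not model). The cut-off is left as an explicit argument because the text gives 0.5 as IUPred's default while the iucut option lists [0.2]. The scores in the usage lines are made up.

```python
def disordered_regions(scores, cutoff, minregion=0):
    """Turn per-residue disorder scores into (start, stop) regions.

    A residue counts as disordered if its score is > cutoff. Positions are
    1-based and inclusive (an assumption). Regions shorter than minregion
    residues are discarded; ordered regions are not filtered in this sketch.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos                   # a disordered run begins
        elif start is not None:
            regions.append((start, pos - 1))  # the run ended at the previous residue
            start = None
    if start is not None:                     # run continues to the C-terminus
        regions.append((start, len(scores)))
    return [(a, b) for a, b in regions if b - a + 1 >= minregion]


scores = [0.1, 0.6, 0.7, 0.2, 0.9, 0.8, 0.95, 0.3]  # made-up scores
print(disordered_regions(scores, cutoff=0.5))               # [(2, 3), (5, 7)]
print(disordered_regions(scores, cutoff=0.5, minregion=3))  # [(5, 7)]
```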

Commandline:
### General Options ###

  • disorder=X : Disorder method to use (iupred/foldindex/parse) [iupred]
  • iucut=X : Cut-off for IUPred results [0.2]
  • iumethod=X : IUPred method to use (long/short) [short]
  • sequence=X : Sequence to predict disorder for (autorun) []
  • name=X : Name of sequence to predict disorder for []
  • minregion=X : Minimum length of an ordered/disordered region [0]

### System Settings ###
  • iupath=PATH : The full path to the IUPred executable [c:/bioware/iupred/iupred.exe]
  • filoop=X : Number of times to try connecting to FoldIndex server [10]
  • fisleep=X : Number of seconds to sleep between attempts [2]
  • iuchdir=T/F : Whether to change to IUPred directory and run (True) or rely on IUPred_PATH env variable [False]
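The filoop/fisleep settings describe a simple retry loop. A minimal Python 3 sketch follows. It is an illustration under assumptions: fetch_with_retries is a hypothetical name, fetch stands in for whatever actually queries the FoldIndex server, and only OSError is retried (in Python 3, urllib.error.URLError is a subclass of OSError; the original module uses Python 2's urllib2).

```python
import time


def fetch_with_retries(fetch, filoop=10, fisleep=2):
    """Call fetch() up to filoop times, sleeping fisleep seconds after each failure.

    fetch is any zero-argument callable (hypothetical stand-in for a FoldIndex
    query). The last OSError is re-raised if every attempt fails.
    """
    if filoop < 1:
        raise ValueError("filoop must be at least 1")
    for attempt in range(1, filoop + 1):
        try:
            return fetch()
        except OSError:
            if attempt == filoop:
                raise                 # out of attempts: propagate the error
            time.sleep(fisleep)       # wait before the next attempt


# Usage: a stand-in that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("server busy")
    return "ok"


print(fetch_with_retries(flaky, filoop=5, fisleep=0), calls["n"])  # ok 3
```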

Uses general modules: copy, os, string, sys, time, urllib2
Uses RJE modules: rje
Other modules needed: None


*** Module rje_sequence ***

Module: rje_sequence
Description: DNA/Protein sequence object
Version: 1.10
Last Edit: 28/08/08
Copyright (C) 2006 Richard J. Edwards - See source code for GNU License Notice

Function:
This module contains the Sequence Object used to store sequence data for all PEAT applications that use DNA or
protein sequences. It has no standalone functionality.

This module contains all the methods for parsing out sequence information, including species and source database,
based on the format of the input sequences. If you use a consistent but custom format for fasta description lines,
please contact me and I can add it to the list of formats currently recognised.
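Format-based parsing of fasta description lines can be illustrated with one common case: UniProt-style headers of the form db|AccNum|NAME_SPECIES description. The Python sketch below is hypothetical and recognises only that one pattern, falling back to "first word = name". It is not the list of formats rje_sequence actually supports.

```python
import re

# UniProt-style header: db|AccNum|NAME_SPECIES description
UNIPROT_HEADER = re.compile(r"^(sp|tr)\|([^|]+)\|(\S+)_(\S+)(?:\s+(.*))?$")
SOURCE_DB = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_fasta_header(line):
    """Split a fasta description line into ID, AccNum, species code and source db.

    Only the UniProt pattern above is recognised; anything else falls back to
    'first word = name, remainder = description'.
    """
    text = line.lstrip(">").strip()
    match = UNIPROT_HEADER.match(text)
    if match:
        db, acc, name, spcode, desc = match.groups()
        return {"db": SOURCE_DB[db], "accnum": acc, "id": f"{name}_{spcode}",
                "spcode": spcode, "desc": desc or ""}
    name, _, desc = text.partition(" ")
    return {"db": None, "accnum": None, "id": name, "spcode": None, "desc": desc}


header = parse_fasta_header(">sp|P04637|P53_HUMAN Cellular tumor antigen p53")
print(header["spcode"], header["db"])  # HUMAN Swiss-Prot
```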

Uses general modules: copy, os, random, re, sre_constants, string, sys, time
Uses RJE modules: rje, rje_disorder


*** Module rje_uniprot ***

Module: rje_uniprot
Description: RJE Module to Handle UniProt Files
Version: 3.3
Last Edit: 10/10/08
Copyright (C) 2007 Richard J. Edwards - See source code for GNU License Notice

Function:
This module contains methods for handling UniProt files, primarily in other rje modules but also with some
standalone functionality. To get the most out of the module with big UniProt files (such as the downloads from EBI),
first index the UniProt data using the rje_dbase module.

This module can be used to extract a list of UniProt entries from a larger database and/or to produce summary tables
from UniProt flat files.
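Extracting entries by ID or AccNum from a UniProt flat file comes down to splitting the file into entries (each terminated by a "//" line) and checking the ID and AC lines. The Python sketch below shows that idea under assumptions: iter_entries, entry_keys and extract are hypothetical names, the whole file is scanned rather than using an rje_dbase index, and the two sample entries are made up.

```python
def iter_entries(lines):
    """Yield each UniProt flat-file entry as a list of lines; entries end with '//'."""
    entry = []
    for line in lines:
        entry.append(line)
        if line.startswith("//"):
            yield entry
            entry = []
    # In this sketch, trailing lines without a closing '//' are ignored.


def entry_keys(entry):
    """Collect the entry ID (ID line) and all accession numbers (AC lines)."""
    keys = set()
    for line in entry:
        if line.startswith("ID   "):
            keys.add(line.split()[1])
        elif line.startswith("AC   "):
            keys.update(acc.strip() for acc in line[5:].split(";") if acc.strip())
    return keys


def extract(lines, wanted):
    """Return the entries whose ID or any AccNum is in wanted."""
    wanted = set(wanted)
    return [entry for entry in iter_entries(lines) if entry_keys(entry) & wanted]


# Two made-up entries in flat-file layout.
SAMPLE = """\
ID   TEST1_HUMAN   Reviewed;   5 AA.
AC   ACC001; ACC001B;
SQ   SEQUENCE   5 AA;
     MKVLA
//
ID   TEST2_MOUSE   Unreviewed;   4 AA.
AC   ACC002;
SQ   SEQUENCE   4 AA;
     MSTP
//
"""

hits = extract(SAMPLE.splitlines(keepends=True), ["ACC001B"])
datout_text = "".join(line for entry in hits for line in entry)  # e.g. for datout=FILE
print(datout_text.splitlines()[0])  # ID   TEST1_HUMAN   Reviewed;   5 AA.
```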

In addition to the methods associated with the classes of this module, there are a number of methods that are called
(primarily) from the rje_dbase module to download and process the UniProt sequence database.

Input Options:

  • unipath=PATH : Path to UniProt Datafile (will look here for DB Index file made with rje_dbase)
  • dbindex=FILE : Database index file [uniprot.index]
  • uniprot=FILE : Name of UniProt file [None]
  • extract=LIST : Extract IDs/AccNums in list. LIST can be FILE or list of IDs/AccNums X,Y,.. []
  • acclist=LIST : As Extract.
  • specdat=LIST : Make a UniProt DAT file of the listed species from the index (over-rules extract=LIST) []
  • splicevar=T/F : Whether to search for AccNum allowing for splice variants (AccNum-X) [True]
  • tmconvert=T/F : Whether to convert TOPO_DOM features, using first description word as Type [False]
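The splicevar and tmconvert options can be pictured with the Python sketch below. It is hypothetical throughout: base_accession, find_entry and convert_topo are invented names, features are assumed to be (type, start, end, description) tuples, splice variants are assumed to carry a numeric -X suffix, and trailing punctuation on the first description word is dropped as an additional assumption. The accession and feature values are made up.

```python
import re


def base_accession(acc):
    """Strip a splice-variant suffix, e.g. 'ACC001-2' -> 'ACC001' (hypothetical helper)."""
    return re.sub(r"-\d+$", "", acc)


def find_entry(acc, index, splicevar=True):
    """Look up acc; with splicevar=True, fall back to the AccNum without its -X suffix."""
    if acc in index:
        return index[acc]
    if splicevar:
        return index.get(base_accession(acc))
    return None


def convert_topo(features):
    """Relabel TOPO_DOM features using the first word of their description.

    Features are (type, start, end, description) tuples (an assumed layout);
    trailing punctuation on the word is dropped (also an assumption).
    """
    converted = []
    for ftype, start, end, desc in features:
        words = desc.split()
        if ftype == "TOPO_DOM" and words:
            ftype = words[0].rstrip(".,;")
        converted.append((ftype, start, end, desc))
    return converted


index = {"ACC001": "entry for ACC001"}
print(find_entry("ACC001-2", index))                   # entry for ACC001
print(find_entry("ACC001-2", index, splicevar=False))  # None
features = [("TOPO_DOM", 1, 20, "Extracellular."),
            ("TRANSMEM", 21, 43, "Helical."),
            ("TOPO_DOM", 44, 100, "Cytoplasmic.")]
print([f[0] for f in convert_topo(features)])  # ['Extracellular', 'TRANSMEM', 'Cytoplasmic']
```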

Output Options:
  • makeindex=T/F : Generate UniProt index files [False]
  • makespec=T/F : Generate species table [False]
  • makefas=T/F : Generate fasta files [False]
  • datout=FILE : Name of new (reduced) UniProt DAT file of extracted sequences [None]
  • tabout=FILE : Table of extracted UniProt details [None]
  • linkout=FILE : Table of extracted Database links [None]
  • ftout=FILE : Outputs table of features into FILE [None]
  • domtable=T/F : Makes a table of domains from uniprot file [False]
  • cc2ft=T/F : Extra whole-length features added for TISSUE and LOCATION (not in datout) [False]

UniProt Conversion Options:
  • ucft=X : Feature to add for UpperCase portions of sequence []
  • lcft=X : Feature to add for LowerCase portions of sequence []
  • maskft=LIST : List of Features to mask out []
  • invmask=T/F : Whether to invert the masking and only retain maskft features [False]
  • caseft=LIST : List of Features to make upper case with rest of sequence lower case []
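The conversion options above (maskft/invmask, caseft, and finding upper- or lower-case runs for ucft/lcft) can be illustrated with the Python sketch below. It is a hypothetical sketch, not rje_uniprot's implementation. Assumptions: features are (type, start, end) tuples with 1-based inclusive positions, 'X' is the mask character, and the function names are invented.

```python
import re


def _positions(features, wanted, length):
    """0-based indices covered by features whose type is in wanted (1-based inclusive input)."""
    covered = set()
    for ftype, start, end in features:
        if ftype in wanted:
            covered.update(range(start - 1, min(end, length)))
    return covered


def mask_features(seq, features, maskft, invmask=False, mask_char="X"):
    """Replace residues inside maskft features with mask_char.

    With invmask=True only the maskft features are retained and everything
    else is masked instead.
    """
    covered = _positions(features, maskft, len(seq))
    return "".join(mask_char if ((i in covered) != invmask) else aa
                   for i, aa in enumerate(seq))


def case_features(seq, features, caseft):
    """Upper-case residues inside caseft features; lower-case the rest."""
    covered = _positions(features, caseft, len(seq))
    return "".join(aa.upper() if i in covered else aa.lower()
                   for i, aa in enumerate(seq))


def case_runs(seq, upper=True):
    """(start, end) runs of upper- or lower-case residues, e.g. for ucft/lcft features."""
    pattern = r"[A-Z]+" if upper else r"[a-z]+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq)]


seq = "MKVLAAGIRS"                                   # made-up sequence
features = [("SIGNAL", 1, 3), ("DOMAIN", 6, 8)]      # made-up features
print(mask_features(seq, features, ["SIGNAL"]))                # XXXLAAGIRS
print(mask_features(seq, features, ["SIGNAL"], invmask=True))  # MKVXXXXXXX
cased = case_features(seq, features, ["DOMAIN"])
print(cased, case_runs(cased))                                 # mkvlaAGIrs [(6, 8)]
```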

General Options:
  • append=T/F : Append to results files rather than overwrite [False]
  • memsaver=T/F : Memsaver option to save memory usage - does not retain entries in UniProt object [False]

Uses general modules: glob, os, re, string, sys, time
Uses RJE modules: rje, rje_sequence


*** Module rje_zen ***

Module: rje_zen
Description: Random Zen Wisdom Generator
Version: 1.0
Last Edit: 15/04/08
Copyright (C) 2007 Richard J. Edwards - See source code for GNU License Notice

Function:
Generates random (probably nonsensical) Zen wisdoms. Just for fun.

Commandline:

  • wisdoms=X : Number of Zen Wisdoms to return [10]

Uses general modules: copy, glob, os, string, sys, time
Uses RJE modules: rje
Other modules needed: None